Scientific Writing Guide
Guide of the
Empirical Software Engineering Research Group,
Department of Computer Science,
University of Helsinki
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0
International License (CC BY-NC-SA 4.0). http://creativecommons.org/licenses/by-nc-sa/4.0/
Introduction
These guidelines aim to assist students in writing scientific text. The guidelines are
applicable to seminar reports, B.Sc. and M.Sc. theses, and a first scientific publication.
These guidelines are intended for software engineering and related
areas of research. They are not necessarily directly applicable to other fields, e.g.,
theoretical computer science.
Note that guidelines and instructions vary in many respects between institutes, individual
supervisors and scientific forums. However, some principles are more or less generally
applicable. First learn to follow the rules, before even considering bending them. With
experience, you may reasonably justify not following a particular rule.
Learning scientific writing provides the ability to express one’s thoughts with particular clarity
and communicate them in a manner that seasoned scientists find easy to follow.
Writing Process
(intertwined with the research process)
Writing is a personal process. Different people have very different ways of getting the words
on paper. You need to learn to know yourself in this sense: what gets you to write even if
cleaning windows seems much more interesting and urgent?
There are good guidebooks for writing a thesis; look for them. (One example for Finnish
readers is Svinhufvud, K. (2015). Gradutakuu. Art House. Also see the accompanying web
site at http://www.gradutakuu.fi/)
● Previous work. Understanding the previous, published scientific work is the bedrock
of scientific research and one of its distinguishing aspects. Simply read, search for
information, and take notes as you go. In your writing, the previous work is clearly
reflected in the beginning of the text. Too often, too little effort is spent on
previous work and something important is noticed late in the process. In the worst
case, this may invalidate much of the effort spent or otherwise cause major
challenges or rework.
● Research methods. Another bedrock is applying proper methods in your research.
In order to do that, you need to know the methods and how and when to use them.
Then, of course, you need to use the selected method(s) properly. When writing,
explain the methods you used and exactly how you applied them. In other words, describe
your study design including data collection methods and sources, analysis methods,
etc. Look for good examples from existing articles.
● Research questions. We recommend spending notable effort on thinking and
(re)phrasing your research questions. Defining research questions is both an
essential part of your research work and also a powerful tool for your writing process.
It focuses and scopes your work. Do not underestimate the importance of your
research questions or the effort to specify and justify them. There is a section below
devoted to research questions.
Research questions
A research question summarises an issue that you want to investigate in a clear and focused
way with exact and carefully considered wording. Good research questions are the basis of
your research work. They guide your research work, help to identify data to be collected and
analysed, facilitate the construction of logical argumentation, and provide extremely valuable
assistance for writing.
Formulating the research questions is a major and difficult task. It tends to require multiple
iterations and consideration of many alternatives. The process is thus iterative, converging
towards the explication of what your research work is about.
The relationship between a hypothesis and a research question: A hypothesis shares the same
purpose of being a clear explication of what exactly is being investigated. A hypothesis is a
proposed explanation for a phenomenon – a tentative, testable answer or “an educated
guess” to a scientific question. A well-formed hypothesis consists of one or more
assumptions that can be tested through study or experimentation. For example, an
assumption makes it possible to create a setup for an experiment, where it is possible to
measure the outcome of a factor that you change in the experiment. A research question,
however, also bears the idea of being a question, that is, not primarily assuming, stating or
claiming something, but asking.
Avoid superlatives
RQ2: What is the best way to utilise sentiment analysis in requirements elicitation?
● Extreme words, such as best, optimal, maximal, etc., are difficult. You often already
have a hard time defining what "best" means, not to mention measuring it.
● Finding the best, optimal or maximal something is probably not even the real
question, although formulating such a question is tempting. Often, finding something
"good" could be more appropriate. So, you may consider, for example:
RQ2': How can we improve requirements elicitation by sentiment analysis? or just
RQ2'': How can sentiment analysis support requirements elicitation?
Be neutral
One particular and important aspect when asking a research question is trying to be neutral.
That is, do not state the question so that it is leading towards particular answers. E.g.,
RQ3: What are the biggest flaws in feature models?
● Well, again "the biggest".
● When looking for flaws, you easily find flaws (even if not that remarkable). For
example, if your initial standpoint is reflected in how you interview developers or analyse
the models, you may be (mis)led to digging only for challenges. Therefore, think
carefully about whether to consider both sides, e.g., benefits and challenges. This helps
to gain a more balanced view and a context for the severity of the identified
challenges. This is certainly not a rule, but merely a note to make you think a bit
harder. The following might work for you as well:
RQ3': What are the benefits and challenges of using feature models?
● This already hints that the interest is in the use of feature models in practice rather
than, say, analysis at a conceptual level.
● Or, consider splitting into two subquestions and making the view of practice
clearer:
RQ3'': How are feature models used in practice?
RQ3''.1: What are the benefits of using feature models?
RQ3''.2: What are the challenges in using feature models?
Structure
These guidelines adopt the so-called IMRAD structure of scientific articles as the backbone.
With some modifications, IMRAD structure is applicable to a thesis as well. IMRAD is an
acronym from the letters of the main headings a scientific article should include: Introduction,
Methods, Results and Discussion.
Figure 1. Overview of the structure of a scientific article.
In addition to the major IMRAD sections, a research article also has other elements. An
article begins with:
● Title
● List of authors, their affiliations and contact information
● Abstract
and an article ends with:
● Conclusions
● Acknowledgements of help in doing the research, e.g., funding
● References
● Appendices (if any)
Next, we detail the IMRAD sections and also Abstract and Conclusions. Figure 1 shows an
overview of a typical structure of a scientific article.
Abstract
It is often useful to structure the abstract so that the reader can get a concise overview of the
paper. The basic structure can follow the IMRAD format; essentially, the abstract is an
extremely condensed version of the paper. Some forums (e.g., ESEM conference) have
adopted a so-called structured abstract, which may have the following sections:
● Background – Rationale and motivation for conducting the study and context of the
study
● Aims – Aims of the study, or in other words, goal, objective(s), research problem,
main research question, maybe hypotheses or propositions
● Method – Methodological approach, method(s) used to address the aims; data the
study is based on
● Results – Very concise summary of the main results
● Conclusions – So what?, takeaway message, what to learn from the study,
implications of the results. A very concise summary of the implications of the study.
Possibly main limitations, e.g., if crucial for interpreting the results.
The sections can be written out in the abstract using a colon after each section name (e.g.
Conclusions: …). No line breaks are required, in order to keep the abstract compact. Even if
not using the section names, using a similar structure can help write a good abstract. Some
different wordings may be used, e.g., Context, Objective, Method, Results, and Conclusions.
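As an illustration of the compact form described above, the following Python sketch joins the five labelled sections into a single paragraph without line breaks. The function name build_structured_abstract and the placeholder texts are hypothetical and only for illustration; the section names are the ones listed above, and a particular forum may prescribe different ones.

```python
# Section names as listed in this guide; a forum may require other wordings.
SECTION_ORDER = ["Background", "Aims", "Method", "Results", "Conclusions"]


def build_structured_abstract(sections: dict[str, str]) -> str:
    """Join the labelled sections into one compact paragraph (no line breaks)."""
    # Refuse to build the abstract if a section is absent or empty.
    missing = [name for name in SECTION_ORDER if not sections.get(name, "").strip()]
    if missing:
        raise ValueError("Missing abstract sections: " + ", ".join(missing))
    # Collapse all whitespace (including line breaks) inside each section text.
    parts = [name + ": " + " ".join(sections[name].split()) for name in SECTION_ORDER]
    return " ".join(parts)


# Placeholder texts only; they do not describe a real study.
example = build_structured_abstract({
    "Background": "Why the study was conducted and in which context.",
    "Aims": "The main research question of the study.",
    "Method": "The methodological approach and\nthe data the study is based on.",
    "Results": "A very concise summary of the main results.",
    "Conclusions": "The takeaway message and main limitations.",
})
print(example)
```

The sketch fails loudly when a section is missing, which mirrors the idea that each part of the structure deserves at least one sentence.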
Abstracts should not include references. They should stand on their own without the rest of
the article. The reference list is not necessarily available when a reader scans titles and
abstracts for interesting articles.
Introduction
One of the main purposes of the Introduction is to give the reader a good reason
to read on. The introduction should set the context for the problem, maybe with the help of a
few key references or by describing the problem setting and situation in reality, if your work
is more empirical in nature. You should also give an overview of what is known of the
problem, mainly to motivate why doing your research is meaningful. In addition, explain how
you have approached the problem with proper justifications for what was done so that the
reader gets the idea that given the particular situation and problem your research is
interesting and worth reading more about.
The last part of the introduction is typically a summarising paragraph on the structure of
the article or thesis. It is not supposed to be just a list of empty sentences mentioning all the
main sections, e.g., as for Methods: “In Section 2 the research methods are presented”, but
rather something more insightful and informative, such as “Section 2 motivates the selection
of case study as the main research method and describes the software development
environment of ACME.”
Methods
This is the place to describe your study design and detail how you collected your data and
how you analysed it. Give enough appropriate details, which you must have collected while
doing the research, such as the number of people interviewed, their background, context, etc.,
and any other sources of information, e.g., log files, memos, presentation material, white
papers, etc. When performing literature studies, document your search strings, the databases
searched and the snowballing procedures, if any.
The research question can be presented in Introduction before addressing previous work
when it is understandable without previous scientific work. Typically this is the case when the
research problem comes from elsewhere than scientific literature, e.g., from practice in an
organisation, or if it is otherwise a generally known issue.
A natural place for the research questions is after addressing previous work when your work
addresses an identified gap in the literature. Here, previous work is discussed in a way that
leads to the identification of a gap in the previous knowledge. Research questions are then
placed so that you aim to fill the identified gap. Depending on the details of the structure, the
research questions may be placed (late) in Introduction or as a separate section after
Previous work. In some cases you may do both, i.e., introduce the research questions or
at least the main problem in Introduction and then later repeat the research questions with,
e.g., some more details about how they are scoped and approached in your research.
Results
Figure 1 highlights the line between Results and Discussion. This is sometimes hard to
understand, or define and maintain in practice. The main idea is to make it explicitly clear
what was found as results and how those results are discussed. Section Results avoids
discussion, commenting and any speculation when presenting the results. On the other
hand, Section Discussion (discussed below) discusses results based on what is presented in
Results – no new results should be introduced in Discussion.
In a thesis document, you may have multiple sections for reporting your results. A general
guideline for the Results section(s) is to write the facts, e.g., what you built or what you
observed and what you analysed from the data observed. You do not present only raw data
as your results, but also analyse how to interpret it and what it means, so that the reader can
understand what was found.
While reporting the results, you should keep looking back to your research questions to see
that you write about answering them. You may need to rephrase your research questions, if
your results do not provide answers to them and a different wording may better describe
your results. Just remember to maintain consistency between the different parts of your
thesis document.
Discussion
In Discussion, you take a look at what you achieved as reported in Results (not necessarily
in the order shown below). You reflect on your results against the research questions, basically to
discuss how well you were able to answer them. Typically, you will discuss the validity and
generalisability of the results and compare them to the related work.
Validity
In short, validity is about how good an approximation of the truth your knowledge claims,
i.e., your results and conclusions, are. You can think of this as how good (valid) the conclusions
you have been able to draw from your research are. Although validity is based on the
rigour of research methods, it is actually not a characteristic of research methods. Even
though good use of methods improves the likelihood of achieving valid conclusions, there is
no guarantee. Thus, it is important to discuss the validity of the final outcome of your
research in addition to the rigour of the research process.
You need to very honestly bring up any potential problems (threats to validity, limitations) in
terms of the validity of your results. For example, if you interviewed some persons to
understand the problems, you need to think carefully whether you had the right persons to
talk to, whether they understood your questions correctly, whether you understood their
answers, whether you analysed the answers and drew conclusions correctly, and so on. If your results are a
designed artefact, you need to explain how well the design solves the original problem and
how you know that.
Explain everything you did to improve the validity of your results, such as the justification for
the selection of the interviewees, how you tested your questions, the analysis process you
used, whether you presented the results to the original informants and asked if they agree with your
interpretation and conclusions, how you tested, measured or otherwise validated your
design, etc. Your research approach and study design play an important role here.
Nevertheless, remember to be humble with your results, as absolute truths are not that
common in software engineering research. On the other hand, be proud of the results
you have worked hard to get. Discussing validity is not about bashing your work as hard
as you can, but about providing the reader with justified and intelligent views on your results.
Related work
You also discuss your results in the light of related work. That is, what similar work have
others done and how do your results compare with theirs.
The term Previous work is used in these instructions to refer to the literature that is
presented as a basis for your work, whereas the term Related work refers to the comparison
of your results with the work of others. Therefore, Related work is placed after the Results
section. Depending on your topic, Previous work may be rather brief and does not deserve a
section of its own – you can embed it in the Introduction. Similarly, brief Related work may
be embedded in Discussion. That is, it depends on your topic whether it makes more sense
to present the literature mainly as Previous work or as Related work, or to have them both in
a prominent role.
Generalisability
Depending on the context of your work, you should also consider generalisability of the
results. For example, if you learnt some lessons in a case study, consider if those might be
applicable to other companies as well. Think: what was special in your case? What seems
typical or common to other similar cases? What lessons or findings from your work could be
applicable in different contexts? Give your suggestions and justify them. Make sure to
understand the difference between generalising from a study population to a larger population
(based on representativeness and statistics) and from a single case to another case (based
on in-depth understanding and contextualisation).
Conclusions
Select and write concluding remarks on your work with the perspective the reader has at this
point, that is, after the results and discussion. Summarise briefly what you did, but keep in
mind that giving conclusions is not the same as simply summarising all that has been said
earlier. In layman’s terms, you should answer the question: “So what?” It is important to think
over and clarify the central results / main contributions of your work. Consequently, write
them crystal clear in Conclusions. It may help to revisit your research question(s) to ensure
that your contributions have relevance for the reader who was interested in the answers
when starting to read the paper.
Finally, some ideas about new research questions may have emerged during your project.
You can now give some suggestions on how you or someone else could carry on doing
further research based on your work.
A typical length of Conclusions is half a page or less in articles and 1–2 pages in a thesis.
Variations to IMRAD
Wikipedia:
Usually, the IMRAD article sections use the IMRAD words as headings. A few variations may
occur, as follows:
● Many journals have a convention of omitting the "Introduction" heading, based on the
idea that the reader who begins reading an article does not need to be told that the
beginning of the text is the introduction. This print-era proscription is fading since the
advent of the Web era, when having an explicit "Introduction" heading helps with
navigation via document maps and collapsible/expandable TOC (Table of Contents)
trees. The same considerations are true regarding the presence or proscription of an
explicit "Abstract" heading.
● In some journals, the "Methods" heading may vary, being "Methods and materials",
"Materials and methods", or similar phrases. Some journals mandate that exactly the
same wording for this heading be used for all articles without exception; other
journals reasonably accept whatever each submitted manuscript contains, as long as
it is one of these sensible variants.
● The "Discussion" section may subsume any "Summary", "Conclusion", or
"Conclusions" section, in which case there may or may not be any explicit
"Summary", "Conclusion", or "Conclusions" subheading; or the
"Summary"/"Conclusion"/"Conclusions" section may be a separate section, using an
explicit heading on the same heading hierarchy level as the "Discussion" heading.
Which of these variants to use as the default is a matter of each journal's chosen
style, as is the question of whether the default style must be forced onto every article
or whether sensible inter-article flexibility will be allowed.
Referencing
References have a very important role in scientific writing. Fundamental scientific principles
rely on building research on others’ work and correcting or refining previous work whenever
needed. In addition, as novelty is one of the essential characteristics and valued aspects of
scientific work, proper acknowledgement should be given to all previous and related work.
Appropriate use of references makes all this explicit. Furthermore, based on your selection
of references, the reader can determine your maturity in positioning your work in the
underlying field of research.
Well-written scientific text uses references in a manner that clearly communicates the
necessary information and links it to the work at hand without unnecessarily breaking the
text flow. Necessary information may include, for example, the use of the referred article for
justifying your claim, or the quality and reliability of the referred work – not all publications are
created equal. Achieving this is hard and requires some practice. We try to give some
practical advice below.
● Cut non-informative clutter. For example:
“Möttönen et al. (2012) write in their publication that pink is a challenging colour.”
can be simplified to:
“Pink is a challenging colour (Möttönen et al. 2012).”
The meaning of a reference at the end of the sentence is that this is what Möttönen
et al. say in that publication. In any case, be careful not to simplify to the point of
misrepresenting the original information.
● Prefer information-prominence over author-prominence. Using an author’s name
in text, e.g., as in “Möttönen (2014) writes...”, tends to indicate a reason for doing so,
which may be to highlight that it is Möttönen who said so. This may be the case, for
example, if you tend to disagree with the authors, if there is something controversial or
unconvincing in the statement made or research conducted, or if the point is merely an
opinion and you want to highlight that. Generally in scientific writing, what is said
should be more important than who said it. Sometimes, however, you may need to refer to
Möttönen’s results later in the text and thus use the authors’ names as a means of
reference. Even in this case, other options can be considered, e.g., name of the
research group, country of the research (e.g., “a Finnish study”), etc.
In addition, it is often far more important to reveal the context of the research or
results. Consider the following examples, which contain information that could be much
more important to the reader than Möttönen’s name:
○ “The participants of a boxing event considered pink a useless colour
(Möttönen et al. 2012).”
○ “The developers in a Finnish game company categorised pink as a positively
challenging UI colour (Möttönen et al. 2012).”
Regarding the formatting of the references, we strongly encourage you to use the
Author-Date model (e.g., Möttönen et al. 2012), particularly in a thesis. You may use, for
example, the following guideline for details:
● A good and clear guide: Harvard Referencing Guide by Monash University (for a
quick guide, check the Appendices of that document; referred 2018-06-27)
● Or material from apastyle.org (referred 2018-06-27)
Details and other instructions
Let’s consider a six-page, two-column article. Roughly, the typical numbers are:
These rough figures already give some basis for considering how to allocate the available
space and, if something needs more or less emphasis (e.g., a more rigorous description of
methods is required), how to adjust the whole.
In a thesis, the total number of pages typically has no strict upper limit. It is more important to
maintain a good balance. One practical piece of advice is to first figure out how much space
the results require. Then allocate adequately for Discussion; it is an important part
demonstrating the nature and relevance of your results and your own maturity within and
around the thesis topic. The length of theoretical background is constrained by remaining
available space. Select what the reader needs from what you have read and what fits in. A
common mistake is to write overly detailed previous work, e.g., 40 pages, only to find out that
there is not enough space for the rest. This warning is important, as summarising existing
literature is often relatively easy and produces a lot of text. Do not just summarise: write your
own critical analysis of the literature as a background and motivation to the rest of your work.
In other words, do not try to write a textbook on your thesis topic.
For an M.Sc. thesis aiming to be no longer than 50 pages, you can start, for example, by
allocating some 10–15 pages each to Previous work, Results and Discussion. This may
leave about 10 pages to the rest, i.e., Introduction (3–5 pages), Conclusions (1–2) and
References (5–10). The length of references is not that critical, as the appropriate number of
references is determined by other factors.
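The arithmetic behind the example allocation above can be checked with a few lines of Python. This is only a sketch: the function name remaining_pages and the dictionary layout are hypothetical, and the page ranges are the ones quoted in the paragraph above.

```python
def remaining_pages(total: int, allocations: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (fewest, most) pages left after the given (min, max) allocations."""
    min_used = sum(low for low, _ in allocations.values())
    max_used = sum(high for _, high in allocations.values())
    # The fewest pages remain when every section takes its maximum.
    return total - max_used, total - min_used


# Page ranges taken from the example above for a 50-page thesis.
main_sections = {
    "Previous work": (10, 15),
    "Results": (10, 15),
    "Discussion": (10, 15),
}
rest = {
    "Introduction": (3, 5),
    "Conclusions": (1, 2),
    "References": (5, 10),
}

left_min, left_max = remaining_pages(50, main_sections)
rest_min = sum(low for low, _ in rest.values())
rest_max = sum(high for _, high in rest.values())

print(f"Left after main sections: {left_min} to {left_max} pages")  # 5 to 20
print(f"Needed for the rest: {rest_min} to {rest_max} pages")       # 9 to 17
```

The two ranges overlap around 10 pages, which matches the rough figure given above. If the main sections all grow to their upper limits, the remaining sections no longer fit, so something has to be trimmed.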
Figures and tables
Scientific results are often summarised as figures and tables. Table 1 summarises criteria for
selecting between presenting data as a table, a figure or text.
Table 1: How to choose between tables, figures and text to present data (Adapted from
Editage Insights, tips on effective use of tables and figures in research papers, retrieved
2018-06-27)
● Use a table:
○ To show many and precise numerical values and other specific data in a small
space
○ To compare and contrast data values or characteristics among related items or
items with several shared characteristics or variables
○ To show the presence or absence of specific characteristics
● Use a figure:
○ To show trends, patterns, and relationships across and between data sets when
the general pattern is more important than the exact data values (what to use:
graphs and data plots)
○ To summarise research results (what to use: graphs, data plots, maps, and pie
charts)
○ To present a visual explanation of a sequence of events, procedures, geographic
features, or physical characteristics (what to use: schematic diagrams, images,
photographs, and maps)
● Use text:
○ When you don’t have extensive or complicated data to present
○ When putting your data into a table would mean creating a table with 2 or fewer
columns
○ When the data that you are planning to present is peripheral to the study or
irrelevant to the main study findings
● The distinction between your own original contribution (own figures) and figures from
other sources is made by a reference.
○ Basically: no reference in the caption means the figure is presented as an
original contribution!
○ In principle, it is a serious error, even if accidental, to give the impression that
a figure or graph taken from elsewhere would be your own contribution.
● A figure and its caption together should form a self-explanatory whole that is
understandable even without reading the text.
● Figures should be redrawn and elements not supporting the text removed. Potentially
something is reorganised or further annotated with the terminology of your article.
The original is referred to, e.g. as:
○ (modified from Möttönen 2012) or (adapted from Möttönen 2012)
● Long lists are not very readable, so include only the necessary and make the text
readable, not just a bulleted list.
Language
Use grammatically and idiomatically correct written language. Do not use slang terms,
colloquialisms, or informal expressions (unless quoting verbatim). However, do not try to
make the text fancier by introducing difficult words or sentence structures, as this will only
make the text difficult to read.
The phrase “In this thesis...” is sometimes confusing, as it can be understood as referring
either to the thesis project or to the written thesis. It is best to use it only to refer to the
written document and to clarify otherwise, e.g., with “thesis project” if that is what is meant.