Understanding the factors for fast answers in technical Q&A websites

Wang, Shaowei; Chen, Tse-Hsun; Hassan, Ahmed E.

doi:10.1007/s10664-017-9558-5

An empirical study of four stack exchange websites

Published: 17 October 2017

Volume 23, pages 1552–1593, (2018)
Empirical Software Engineering

Abstract

Technical questions and answers (Q&A) websites accumulate a significant amount of knowledge from users. Developers are especially active on these Q&A websites, since developers are constantly facing new development challenges that require help from other experts. Over the years, Q&A website designers have derived several incentive systems (e.g., gamification) to encourage users to answer questions that are posted by others. However, the current incentive systems primarily focus on the quantity and quality of the answers instead of encouraging the rapid answering of questions. Improving the speed of getting an answer can significantly improve the user experience and increase user engagement on such Q&A websites. In this paper, we explore how one may improve the current incentive systems to motivate fast answering of questions. We use a logistic regression model to analyze 46 factors along four dimensions (i.e., question, asker, answer, and answerer dimension) in order to understand the relationship between the studied factors and the needed time to get an accepted answer. We conduct our study on the four most popular (i.e., with the most questions) Q&A Stack Exchange websites: Stack Overflow, Mathematics, Ask Ubuntu, and Superuser. We find that i) factors in the answerer dimension have the strongest effect on the needed time to get an accepted answer, after controlling for other factors; ii) the current incentive system does not recognize non-frequent answerers who often answer questions which frequent answerers are not able to answer. Such questions that are answered by non-frequent answerers are as important (i.e., have similar range of scores) as those that are answered by frequent answerers; iii) the current incentive system motivates frequent answerers well, but such frequent answerers tend to answer short questions. 
Our findings suggest that Q&A website designers should improve their incentive systems to motivate non-frequent answerers to be more active and to answer questions fast, in order to shorten the waiting time to receive an answer (especially for questions that require specific knowledge that frequent answerers might not possess). In addition, the question answering incentive system needs to factor in the value and difficulty of answering the questions (e.g., providing more rewards to harder questions or questions that remain unanswered for a long period of time).

References

Adamic LA, Zhang J, Bakshy E, Ackerman MS (2008) Knowledge sharing and yahoo answers: everyone knows something. In: Proceedings of the 17th international conference on world wide web, WWW ’08, pp 665–674
Aizawa A (2003) An information-theoretic perspective of tf–idf measures. Inf. Process. Manage. 39(1):45–65
Anderson A, Huttenlocher D, Kleinberg J, Leskovec J (2012) Discovering value from community activity on focused question answering sites: a case study of stack overflow. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12, pp 850–858
Anderson A, Huttenlocher D, Kleinberg J, Leskovec J (2013) Steering user behavior with badges. In: Proceedings of the 22nd international conference on world wide web, WWW ’13, pp 95–106
Antin J, Churchill EF (2011) Badges in social media: a social psychological perspective. In: CHI 2011 Gamification workshop proceedings (Vancouver, BC, Canada, 2011)
Asaduzzaman M, Mashiyat AS, Roy CK, Schneider KA (2013) Answering questions about unanswered questions of stack overflow. In: Proceedings of the 10th working conference on mining software repositories, MSR ’13, pp 97–100
Bird C, Nagappan N, Murphy B, Gall H, Devanbu P (2011) Don’t touch my code!: Examining the effects of ownership on software quality. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th european conference on foundations of software engineering, pp 4–14
Briggs A, Clark T, Wolstenholme J, Clark P (2003) Missing... presumed at random: cost analysis of incomplete data. Health Econ 12(5):377–393
Bulmer M (1979) Principles of statistics. Dover books on mathematics series. Dover Publications, New York
Cavusoglu H, Li Z, Huang K-W (2015) Can gamification motivate voluntary contributions?: the case of stackoverflow Q&A community. In: Proceedings of the 18th ACM conference companion on computer supported cooperative work & social computing, CSCW’15 companion, pp 171–174
Chambers JM (1991) Statistical models in S. CRC Press, Inc., Boca Raton
Chen T-H, Thomas SW, Nagappan M, Hassan AE (2012) Explaining software defects using topic models. In: Proceedings of the 9th IEEE working conference on mining software repositories, MSR ’12, pp 189–198
Cliff N (1993) Dominance statistics: ordinal analyses to answer ordinal questions. Psychol Bull 114:494–509
Colburn L (2016) SLI systems granted auto complete patent. https://www.sli-systems.com/sli-systems-granted-auto-complete-patent
Cornea R, Weininger N (2014) Providing autocomplete suggestions. US Patent 8,645,825
Dunn OJ (1961) Multiple comparisons among means. American Statistical Association 56:52–64
Efron B (1986) How biased is the apparent error rate of a prediction rule? J Am Stat Assoc 81(394):461–470
Freedman D (2005) Statistical models: theory and practice. Cambridge University Press, Cambridge
Han J (2005) Data mining: concepts and techniques. Morgan Kaufmann Publishers Inc, San Francisco
Harrell FE Jr (2006) Regression modeling strategies. Springer-Verlag New York, Inc., Secaucus
Linares-Vásquez M, Bavota G, Di Penta M, Oliveto R, Poshyvanyk D (2014) How do api changes trigger stack overflow discussions? A study on the android sdk. In: Proceedings of the 22nd international conference on program comprehension, ICPC 2014, New York, NY, USA, pp 83–94
McIntosh S, Kamei Y, Adams B, Hassan AE (2016) An empirical study of the impact of modern code review practices on software quality. Empir Softw Eng 21(5):2146–2189
Miller HJ, Han J (2001) Geographic data mining and knowledge discovery. Taylor & Francis Inc, New York
Mockus A (2008) Missing data in software engineering. Springer, London, pp 185–200
Moore D, Maccabe G, Craig B (2009) Introduction to the practice of statistics. W.H. Freeman and Company, San Francisco
Ponzanelli L, Mocci A, Bacchelli A, Lanza M, Fullerton D (2014a) Improving low quality stack overflow post detection. In: 30th IEEE international conference on software maintenance and evolution, Victoria, BC, Canada, September 29–October 3, 2014, pp 541–544
Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014b) Mining stackoverflow to turn the IDE into a self-confident programming prompter. In: Proceedings of the 11th working conference on mining software repositories, MSR ’14, pp 102–111
Ponzanelli L, Mocci A, Bacchelli A, Lanza M (2014c) Understanding and classifying the quality of technical forum questions. In: 14th international conference on quality software, pp 343–352
Rahman MM, Roy CK (2015) An insight into the unresolved questions at stack overflow. In: Proceedings of the 12th working conference on mining software repositories, MSR ’15, pp 426–429
San Pedro J, Karatzoglou A (2014) Question recommendation for collaborative question answering systems with RankSLDA. In: Proceedings of the 8th ACM conference on recommender systems. ACM, pp 193–200
Squire M (2015) Should we move to stack overflow? Measuring the utility of social media for developer support. In: Proceedings of the 37th international conference on software engineering, pp 219–228
StackOverflow (2016) Developer survey results 2016. http://stackoverflow.com/research/user-survey-2016
Thongtanunam P, McIntosh S, Hassan AE, Iida H (2016) Revisiting code ownership and its relationship with software quality in the scope of modern code review. In: Proceedings of the 38th international conference on software engineering, pp 1039–1050
Treude C, Barzilay O, Storey M-A (2011) How do programmers ask and answer questions on the web? (nier track). In: Proceedings of the 33rd international conference on software engineering, pp 804–807
Vasilescu B, Serebrenik A, Devanbu P, Filkov V (2014) How social Q&A sites are changing knowledge sharing in open source software communities. In: Proceedings of the 17th ACM conference on computer supported cooperative work & social computing, CSCW ’14, pp 342–354
Wang S, Lo D, Jiang L (2013) An empirical study on developer interactions in stackoverflow. In: Proceedings of the 28th annual ACM symposium on applied computing, SAC ’13, Coimbra, Portugal, March 18–22, 2013, pp 1019–1024
Wang S, Lo D, Vasilescu B, Serebrenik A (2014) Entagrec: an enhanced tag recommendation system for software information sites. In: Proceedings of the international conference on software maintenance and evolution, pp 291–300
Xia X, Lo D, Wang X, Zhou B (2013) Tag recommendation in software information sites. In: Proceedings of the 10th working conference on mining software repositories, MSR ’13, pp 287–296
Yang L, Bao S, Lin Q, Wu X, Han D, Su Z, Yu Y (2011) Analyzing and predicting not-answered questions in community-based question answering services. In: Proceedings of the twenty-fifth AAAI conference on artificial intelligence, AAAI’11. AAAI Press, pp 1273–1278
Yao Y, Tong H, Xie T, Akoglu L, Xu F, Lu J (2013) Want a good answer? Ask a good question first! arXiv:1311.6876
Zhou M, Mockus A (2011) Does the initial environment impact the future of developers?. In: Proceedings of the 33rd international conference on software engineering, ICSE ’11, pp 271–280

Author information

Authors and Affiliations

Software Analysis and Intelligence Lab (SAIL), School of Computing, Queen’s University, Kingston, ON, Canada
Shaowei Wang & Ahmed E. Hassan
Department of Computer Science and Engineering, Concordia University, Montreal, QC, Canada
Tse-Hsun Chen

Corresponding author

Correspondence to Shaowei Wang.

Additional information

Communicated by: Per Runeson

Appendix A: Model Building and Analysis Process

In this appendix, we present the details of our model building process.

Figure 12 shows an overview of our model building process. We use the R package rms (Footnote 17) as the implementation of our logistic regression model. Below, we describe the detailed steps of our model building process.

1. Label Assignment

Since we use a classification model to understand the impact of the studied factors on answering speed, we first need to select the questions that are used to build the model and assign a label (i.e., fast-answered question or slow-answered question) to each of these questions.

As the results in Section 5 show, more than half of the questions were answered within one hour. Thus, the time needed to answer a question is very close (i.e., within minutes) for most questions. Such skewness in the data would have a negative impact on the resulting model (i.e., it would increase bias).

Figure 13 presents the percentage of questions that receive an accepted answer within time windows around the median cut-off point (i.e., the median of TimeToGetAcceptedAnswer). We see that the number of questions around the median cut-off point is notably large. For example, on Stack Overflow, 10.8% (6,027) of the questions receive an accepted answer within 5 minutes of the median value of TimeToGetAcceptedAnswer (either below or above it). If we widen the time window to 20 minutes, 53.4% (29,834) of the questions receive an accepted answer within 20 minutes of the median value. In other words, more than half of the questions on Stack Overflow land near the boundary, which would likely introduce a large amount of noise into our model.

To reduce such noise, we sort the questions by the time needed to get an accepted answer, and then label the top 20% of questions (i.e., the fastest) as fast-answered questions and the bottom 20% (i.e., the slowest) as slow-answered questions. This approach fits the goal of our study, since a difference of a few minutes should not be used to distinguish a fast-answered question from a slow-answered question. The mean values of TimeToGetAcceptedAnswer for the two groups are shown in Table 10. We observe that the fast-answered questions were answered within 0.1 hours on average, while slow-answered questions needed at least 10 days to be answered.
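The labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `label_fast_slow`, the `fraction` parameter, and the toy times are hypothetical and are not taken from the study's scripts.

```python
def label_fast_slow(times_hours, fraction=0.2):
    """Label the fastest and slowest `fraction` of questions.

    Returns a dict mapping question index -> "fast" or "slow".
    Questions in the middle of the ranking are left unlabeled (dropped).
    """
    # Sort question indices by the time needed to get an accepted answer.
    order = sorted(range(len(times_hours)), key=lambda i: times_hours[i])
    k = int(len(order) * fraction)  # size of each extreme group
    labels = {}
    for i in order[:k]:
        labels[i] = "fast"
    for i in order[len(order) - k:]:
        labels[i] = "slow"
    return labels


# Toy data (hypothetical): hours until the accepted answer arrived.
times = [0.05, 0.1, 0.5, 0.8, 1.0, 2.0, 5.0, 30.0, 240.0, 500.0]
labels = label_fast_slow(times)
```

Dropping the middle 60% means the two classes are separated by a wide margin in answering time, so questions that differ by only a few minutes never end up on opposite sides of the boundary.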

Table 10 The comparison of mean values of TimeToGetAcceptedAnswer between fast-answered and slow-answered questions

2. Normality Adjustment

When building a logistic regression model, normally distributed explanatory variables are preferred in order to produce a more stable and robust model (Freedman 2005). In our case, most of the studied factors are skewed: all of them are considered highly skewed (i.e., the skewness is larger than 1) (Bulmer 1979) except for Tag_Level_Difference, Tag_Number, Q_Title_Popularity, Mean_Down_Votes, Median_Down_Votes, and Sum_Down_Votes. Therefore, we apply a logarithmic transformation, ln(x + 1), to all the studied factors to reduce skewness.
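As a rough illustration of this step, the sketch below computes a moment-based skewness coefficient and applies the ln(x + 1) transformation. The helper names and the toy values are hypothetical, and the skewness estimator used here (third central moment over the 1.5 power of the variance) is an assumption; the text does not say which estimator was used.

```python
import math


def skewness(values):
    """Moment-based skewness: m3 / m2^(3/2). Returns 0.0 for constant data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def log1p_transform(values):
    """Apply ln(x + 1) to every value; zero stays zero."""
    return [math.log1p(v) for v in values]


# Toy factor with one very large value (hypothetical data).
raw = [0, 1, 1, 2, 3, 5, 8, 200]
transformed = log1p_transform(raw)

print(skewness(raw), skewness(transformed))
```

The transformation compresses the long right tail, so the skewness of the transformed factor is lower than that of the raw factor, although it does not necessarily drop below 1.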

3. Correlation & Redundancy Analysis

We remove correlated and redundant factors using the following steps: i) removing factors with zero variance; ii) removing highly correlated factors; and iii) removing redundant factors.

We first remove factors with zero variance, since these factors do not contribute to the model. For example, the variance of Median_Down_Votes on Super User is 0, which indicates that Median_Down_Votes takes a single constant value (0 in this case) across the studied Super User data (top 20% and bottom 20%).

Highly correlated factors can cause multicollinearity problems in our model. Thus, we perform a correlation analysis to remove highly correlated factors using a variable clustering analysis technique by following prior studies (Thongtanunam et al. 2016; McIntosh et al. 2016). We construct a hierarchical overview of the correlation among the factors and select one factor from each cluster of highly-correlated variables, i.e., |ρ| > 0.7 (Thongtanunam et al. 2016).
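The first two filtering steps can be approximated as follows. Note the assumptions: this sketch uses Spearman's ρ and a simple greedy pass that keeps a factor only if it is not highly correlated with any factor already kept. That greedy pass is a simplification swapped in for the hierarchical variable clustering described above, and which factor survives from a correlated group depends on input order. The factor values are toy data.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def filter_factors(factors, threshold=0.7):
    """Drop zero-variance factors, then greedily drop factors with |rho| > threshold."""
    kept = []
    for name, values in factors.items():
        if len(set(values)) <= 1:  # zero variance: a single constant value
            continue
        if all(abs(spearman(values, factors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


factors = {
    "Sum_Down_Votes": [1, 2, 3, 4, 5, 6],
    "Mean_Down_Votes": [2, 4, 5, 8, 10, 13],   # monotone in Sum_Down_Votes
    "Median_Down_Votes": [0, 0, 0, 0, 0, 0],   # constant
    "Tag_Number": [3, 1, 2, 1, 3, 2],
}
kept = filter_factors(factors)
```

On this toy input, Median_Down_Votes is dropped for having zero variance and Mean_Down_Votes is dropped because it is perfectly rank-correlated with Sum_Down_Votes.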

After this step, 28, 28, 26, and 27 factors remain in the Stack Overflow, Mathematics, Ask Ubuntu, and Super User data, respectively (see the remaining factors in Table 4).

Correlation analysis reduces multicollinearity among the factors, but it may not detect all of the redundant factors (i.e., factors that do not have a unique signal relative to the other factors). We remove redundant factors by using the redun function in the R package rms (Footnote 18) with the default R² threshold of 0.9. However, no factors were removed in this step. The final factors are presented in Table 4.

4. Non-linear Term Allocation

When building a logistic regression model, some factors potentially share non-linear relationships with the response variable. However, logistic regression models are mainly used for modeling linear relationships. Thus, we use restricted cubic splines (Harrell 2006) to add non-linear terms of the factors to the model, following prior studies (Thongtanunam et al. 2016; McIntosh et al. 2016). We measure the non-linear relationship by calculating the Spearman multiple ρ² between the dependent variable y and the linear and quadratic forms of each factor (x_i, x_i²). A large ρ² indicates that there is a high chance of a non-linear relationship between a factor and the response variable, which indicates that the factor should be assigned more degrees of freedom. By observing the rough clustering of the factors according to their ρ², we cluster the factors into four groups according to the Spearman multiple ρ² values across the four websites (see Figure 14). We give factors in the first, second, and third groups five, four, and three degrees of freedom, respectively.

5. Logistic Regression Model Building

Finally, after selecting the factors and specifying the non-linear terms of the factors, we build our regression models using the preprocessed data. When building the model, we consider text-related question factors, asker factors, and answer factors as control variables by including them in the model, an approach that is commonly used in regression models (Miller and Han 2001; Bird et al. 2011; Chen et al. 2012). We use the function lrm in the R package rms as the implementation of the logistic regression model and use the rcs function in rms as the implementation of restricted cubic splines.

6. Model assessment

We use AUC and bootstrapping to assess the explanatory power of the logistic regression model (i.e., ability of the model to capture the relationship between the explanatory variables and the response variable). AUC is the area under the Receiver Operating Characteristic (ROC) curve (Han 2005). The area under ROC curve is often used as a measure of the quality of classification models. A random classifier has an AUC of 0.5, while the AUC for a perfect classifier is equal to 1. In practice, most of the regression models have an AUC between 0.5 and 1.
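To make the AUC definition concrete, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the pairwise (quadratic-time) loop are illustrative choices, not the implementation used in the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class, 0 for the negative class.
    scores: model outputs, where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
uninformative = auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
partial = auc([1, 1, 0, 0], [0.9, 0.3, 0.6, 0.1])
```

A scorer that ranks every positive above every negative reaches 1, while one that assigns the same score to everything sits at 0.5, matching the two reference points given above.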

Since AUC can be an overestimation (i.e., higher than it actually is) if the model is overfitted to the data, we further evaluate the stability of our model. Similar to prior work (McIntosh et al. 2016; Thongtanunam et al. 2016), we reduce such overestimation by using a bootstrap-derived approach (Efron 1986). The steps of the bootstrap-derived approach are listed below:

1. From the original dataset with n records (i.e., 55,853, 70,336, 7,134, and 10,776 for Stack Overflow, Mathematics, Ask Ubuntu, and Super User, respectively), select a bootstrap sample, i.e., a random sample of n records with replacement.
2. In the bootstrap sample, we build a model using the same allocation of knots as was used in the original dataset.
3. Apply the model that is built using the bootstrap sample on the bootstrapped and the original datasets. We calculate the AUC for each model.
4. The optimism is the difference in the AUC of the bootstrap sample and the original sample. Note that optimism is not an absolute value. A positive sign indicates that AUC of the original sample is larger than that of the bootstrap sample; a negative sign indicates that AUC of the bootstrap sample is larger than that of the original sample.

The above process is repeated 1,000 times and the average (mean) optimism is calculated. Small optimism values indicate that the model does not suffer from overfitting.
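The bootstrap procedure above can be sketched as follows. Several things here are assumptions made for illustration: `fit_direction_model` is a hypothetical one-feature stand-in for the spline-based logistic regression model, the data are toy values, and the difference is computed as original minus bootstrap so that the sign matches the convention stated in step 4.

```python
import random


def auc(labels, scores):
    """Pairwise AUC; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_direction_model(rows, labels):
    """Hypothetical stand-in model: learns only whether larger x means positive."""
    pos = [x for x, y in zip(rows, labels) if y == 1]
    neg = [x for x, y in zip(rows, labels) if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: sign * x


def mean_optimism(rows, labels, fit, repeats=1000, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    diffs = []
    for _ in range(repeats):
        # Step 1: sample n records with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        b_rows = [rows[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        if len(set(b_labels)) < 2:
            continue  # AUC is undefined without both classes
        # Step 2: build the model on the bootstrap sample.
        model = fit(b_rows, b_labels)
        # Step 3: apply it to the bootstrap sample and to the original data.
        auc_boot = auc(b_labels, [model(x) for x in b_rows])
        auc_orig = auc(labels, [model(x) for x in rows])
        # Step 4: signed difference, positive when the original AUC is larger.
        diffs.append(auc_orig - auc_boot)
    if not diffs:
        raise ValueError("no usable bootstrap samples")
    return sum(diffs) / len(diffs)


rows = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6, 0.3, 0.75]
labels = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
optimism = mean_optimism(rows, labels, fit_direction_model, repeats=200, seed=0)
```

Passing the model-building step in as a callable keeps the resampling loop independent of the model, so the same loop would work with any fitting routine that returns a scoring function.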

7. Explanatory Variables Analysis

After our model assessment step, if the AUC value is high and the optimism value is low (i.e., our model can explain the TimeToGetAcceptedAnswer well with low bias), we can then use the model to study the impact of each factor on the TimeToGetAcceptedAnswer. We measure the impact of each factor on the TimeToGetAcceptedAnswer using the Wald χ² test (Chambers 1991). The Wald χ² test is commonly used in biostatistics (Harrell 2006) and software engineering (McIntosh et al. 2016; Thongtanunam et al. 2016) research to understand the impact of factors in a model.
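For intuition, the simplest form of the Wald χ² statistic, for a single coefficient with one degree of freedom, is the squared ratio of the estimate to its standard error. The sketch below covers only that case; the function name and the example numbers are hypothetical. A factor modeled with several spline terms would need a joint test over all of its coefficients, which this sketch does not attempt.

```python
import math


def wald_chi2(beta, std_error):
    """Single-coefficient Wald test (1 degree of freedom).

    Returns (statistic, p_value). For a chi-square variable with 1 degree of
    freedom, P(X > w) equals erfc(sqrt(w / 2)).
    """
    stat = (beta / std_error) ** 2
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical coefficient estimate and standard error.
stat, p_value = wald_chi2(1.96, 1.0)
```

A larger statistic means the coefficient is further from zero relative to its uncertainty, which is why the statistic can be used to rank how much each factor contributes to the model.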

Cite this article

Wang, S., Chen, TH. & Hassan, A.E. Understanding the factors for fast answers in technical Q&A websites. Empir Software Eng 23, 1552–1593 (2018). https://doi.org/10.1007/s10664-017-9558-5

Issue Date: June 2018
