Abstract
Human papillomavirus (HPV) infection is the main risk factor for cervical cancer, a leading cause of cancer deaths in women worldwide. Because there are more than 100 HPV types, it is critical to discriminate the types associated with cervical cancer from those that are not. In this paper, we classify the risk type of HPVs from their textual descriptions. The key issue in this problem is that false negatives and false positives must be treated differently: we must find the high-risk HPVs, even at the expense of misclassifying some low-risk HPVs. For this purpose, AdaCost, a cost-sensitive learner, is adopted so that different misclassification costs can be assigned to training examples. Experimental results on the HPV sequence database show that taking costs into account yields higher performance. The F-score is higher than the accuracy, which implies that most high-risk HPVs are found.
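To illustrate how a cost-sensitive booster can favor finding high-risk types, the following is a minimal sketch of one AdaCost-style reweighting round, written from the general description in Fan et al. (1999) and not taken from this paper's implementation. The function names, the toy labels and predictions, and the cost values (0.8 for high-risk, 0.2 for low-risk) are all assumptions made for illustration.

```python
import math

def cost_adjustment(cost, correct):
    # Cost-adjustment factor (assumed form): a costly example loses little
    # weight when classified correctly and gains more when misclassified.
    return 0.5 - 0.5 * cost if correct else 0.5 + 0.5 * cost

def adacost_round(weights, labels, preds, costs):
    """One AdaCost-style reweighting step (illustrative sketch).

    labels and preds take values in {-1, +1}; costs lie in [0, 1];
    weights are non-negative and sum to 1.
    """
    betas = [cost_adjustment(c, y == h) for c, y, h in zip(costs, labels, preds)]
    # Cost-weighted agreement between the weak hypothesis and the labels.
    r = sum(w * y * h * b for w, y, h, b in zip(weights, labels, preds, betas))
    # Hypothesis weight; undefined if |r| reaches 1 (degenerate case not handled).
    alpha = 0.5 * math.log((1 + r) / (1 - r))
    raw = [w * math.exp(-alpha * y * h * b)
           for w, y, h, b in zip(weights, labels, preds, betas)]
    z = sum(raw)  # normalizer so the new weights sum to 1
    return alpha, [v / z for v in raw]

# Hypothetical toy data: +1 = high-risk, -1 = low-risk.
labels = [+1] * 5 + [-1] * 5
preds = [+1, +1, +1, +1, -1,   # index 4: a missed high-risk type (false negative)
         -1, -1, -1, -1, +1]   # index 9: a low-risk type flagged as high-risk
costs = [0.8] * 5 + [0.2] * 5  # assumed misclassification costs
weights = [0.1] * 10

alpha, new_weights = adacost_round(weights, labels, preds, costs)
```

In this toy round the missed high-risk example ends up with the largest weight, so the next weak learner is pushed hardest to correct false negatives, which matches the priority stated in the abstract.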
References
Chan, S., Chew, S., Egawa, K., Grussendorf-Conen, E., Honda, Y., Rubben, A., Tan, K., Bernard, H.: Phylogenetic Analysis of the Human Papillomavirus Type 2 (HPV-2), HPV-27, and HPV-57 Group, Which is Associated with Common Warts. Virology 239, 296–302 (1997)
Fan, W., Stolfo, S., Zhang, J., Chan, P.: AdaCost: Misclassification Cost-Sensitive Boosting. In: Proceedings of the 16th International Conference on Machine Learning, pp. 97–105 (1999)
Favre, M., Kremsdorf, D., Jablonska, S., Obalek, S., Pehau-Arnaudet, G., Croissant, O., Orth, G.: Two New Human Papillomavirus Types (HPV54 and 55) Characterized from Genital Tumours Illustrate the Plurality of Genital HPVs. International Journal of Cancer 45, 40–46 (1990)
Furumoto, H., Irahara, M.: Human Papilloma Virus (HPV) and Cervical Cancer. The Journal of Medical Investigation 49(3–4), 124–133 (2002)
Ishiji, T.: Molecular Mechanism of Carcinogenesis by Human Papillomavirus-16. The Journal of Dermatology 27(2), 73–86 (2000)
Janicek, M., Averette, H.: Cervical Cancer: Prevention, Diagnosis, and Therapeutics. Cancer Journal for Clinicians 51, 92–114 (2001)
Kim, Y.-H., Hahn, S.-Y., Zhang, B.-T.: Text Filtering by Boosting Naive Bayes Classifiers. In: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 168–175 (2000)
McCallum, A., Nigam, K.: Employing EM in Pool-based Active Learning for Text Classification. In: Proceedings of the 15th International Conference on Machine Learning, pp. 350–358 (1998)
Meyer, T., Arndt, R., Christophers, E., Beckmann, E., Schroder, S., Gissmann, L., Stockfleth, E.: Association of Rare Human Papillomavirus Types with Genital Premalignant and Malignant Lesions. The Journal of Infectious Diseases 178, 252–255 (1998)
Nuovo, G., Crum, C., De Villiers, E., Levine, R., Silverstein, S.: Isolation of a Novel Human Papillomavirus (Type 51) from a Cervical Condyloma. Journal of Virology 62, 1452–1455 (1988)
Provost, F., Fawcett, T.: Analysis and Visualization of Classifier Performance: Comparison Under Imprecise Class and Cost Distributions. In: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pp. 43–48 (1997)
Park, S.-B., Zhang, B.-T.: A Boosted Maximum Entropy Model for Learning Text Chunking. In: Proceedings of the 19th International Conference on Machine Learning, pp. 482–489 (2002)
Ting, K.-M., Zheng, Z.: Boosting Trees for Cost-Sensitive Classifications. In: Proceedings of the 10th European Conference on Machine Learning, pp. 190–195 (1998)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Park, S.B., Hwang, S., Zhang, B.T. (2003). Mining the Risk Types of Human Papillomavirus (HPV) by AdaCost. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds) Database and Expert Systems Applications. DEXA 2003. Lecture Notes in Computer Science, vol 2736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45227-0_40
DOI: https://doi.org/10.1007/978-3-540-45227-0_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40806-2
Online ISBN: 978-3-540-45227-0
eBook Packages: Springer Book Archive