Automated Short-Answer Grading using Semantic Similarity based on Word Embedding

Authors and Affiliations

Fetty Fitriyanti Lubis, Mutaqin, Atina Putri, Dana Waskita, Tri Sulistyaningtyas, Arry Akhmad Arman, Yusep Rosmansyah

Corresponding email: fettyfitriyanti@gmail.com

Published: 19 Jul 2021
Volume: IJTech Vol 12, No 3 (2021)
DOI: https://doi.org/10.14716/ijtech.v12i3.4651

Cite this article as:
Lubis, F.F., Mutaqin, Putri, A., Waskita, D., Sulistyaningtyas, T., Arman, A.A., Rosmansyah, Y., 2021. Automated Short-Answer Grading using Semantic Similarity based on Word Embedding. International Journal of Technology. Volume 12(3), pp. 571-581

Fetty Fitriyanti Lubis	School of Electrical Engineering and Informatics, Smart City & Community Innovation Center, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Mutaqin	School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Atina Putri	School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Dana Waskita	Faculty of Art and Design, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Tri Sulistyaningtyas	Faculty of Art and Design, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Arry Akhmad Arman	School of Electrical Engineering and Informatics, Smart City & Community Innovation Center, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Yusep Rosmansyah	School of Electrical Engineering and Informatics, Smart City & Community Innovation Center, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia

Abstract

Automatic short-answer grading (ASAG) aims to speed up assessment by scoring responses without an instructor’s intervention. Previous research built an ASAG system that achieved a correlation of 0.66 and a mean absolute error (MAE) starting from 0.94 against a conventionally graded set. However, that system required more than one reference answer for each question, and it relied on a string-based equation method and keyword matching to measure sentence similarity and produce an assessment rubric. Our study therefore aimed to build a more concise automatic short-answer scoring system that uses a single reference answer. The mechanism measures semantic similarity through word-embedding techniques and syntactic analysis to assess the accuracy of the learner’s answer. In our experiments, the semantic similarity approach achieved a correlation of 0.70 and an MAE of 0.70 against the grading reference.

Keywords

Automated grading; Short answer; Semantic similarity; Syntax analysis; Word embeddings

Introduction

In education, assessing learners is essential for evaluating their knowledge and understanding. Subjective assessment, such as short-answer questions, is better suited to exploring understanding and basic knowledge than objective assessment, such as multiple-choice or true/false questions. Short-answer questions require learners to respond by composing and integrating ideas in their own sentences. However, grading short-answer exams is challenging, especially when done manually and at large scale: it takes significant time and suffers from inconsistent assessment. Automated scoring is a feasible solution for the short-answer scoring process. In developing such a solution, we adopted the sentiment analysis process, as in the studies of Santosh and Vardhan (2015), Mahadzir et al. (2018), and Surjandari et al. (2019).

Automatic short-answer grading (ASAG) is the process of evaluating responses to this type of question with a computer program by matching them against a related reference model (Sahu and Bhowmick, 2020). Unlike automated essay scoring (AES), automated short-answer scoring (ASAS), another term for ASAG, emphasizes content rather than style (Brew and Leacock, 2013). A simple way of assessing short-essay answers is therefore to measure their similarity to an appropriate answer model. Combining syntactic and lexical approaches helps the model identify equivalent semantic meaning in short-essay answers more simply.

Semantic similarity between learner answers (LA) and reference answers (RA) is the focus of much ASAG research (Mohler and Mihalcea, 2009; Mohler et al., 2011; Luchoomun et al., 2019). Three approaches to determining semantic similarity are knowledge-based, corpus-based, and word-embedding-based measures (Gomaa and Fahmy, 2013; Sahu and Bhowmick, 2020). Corpus-based similarity measures determine how similar words are according to information obtained from large corpora (Gomaa and Fahmy, 2013). Latent semantic analysis (LSA) is the most popular corpus-based similarity technique. LSA assumes that words with close meanings appear in similar segments of text. It uses a “bag of words” representation that does not consider word order when gathering related words (Cutrone et al., 2011; Ratna et al., 2013).
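The bag-of-words assumption can be illustrated with a minimal sketch. This is not LSA itself (LSA additionally applies a singular value decomposition to a term-document matrix); it only shows that a count-based representation ignores word order. The function names and example sentences are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens; word order is discarded."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative sentences (assumptions, not data from the study).
reference = "the dog chased the cat"
reordered = "the cat chased the dog"   # same words, opposite meaning
unrelated = "rain fell on the roof"

sim_reordered = cosine_similarity(bag_of_words(reference), bag_of_words(reordered))
sim_unrelated = cosine_similarity(bag_of_words(reference), bag_of_words(unrelated))
```

Because both sentences contain exactly the same tokens, `sim_reordered` is at the maximum even though the meaning is reversed, while `sim_unrelated` is low because only "the" is shared. This is one motivation for combining similarity measures with syntactic analysis.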

Knowledge-based similarity measures determine how words are related using information derived from semantic networks (Gomaa and Fahmy, 2013). WordNet is the most popular semantic network for measuring knowledge-based similarity among words. However, knowledge-based resources such as WordNet have inherent limitations: they are not available for all languages, and proper names and domain-specific technical terms are underrepresented (Kenter and De Rijke, 2015).
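As a rough sketch of how a knowledge-based measure works, the example below scores two words by the length of the shortest path between them in a small is-a hierarchy, using one common path-based score, 1 / (1 + path length). The hierarchy is a hypothetical stand-in for a real semantic network such as WordNet, and all names are assumptions made for illustration.

```python
from collections import deque

# Hypothetical is-a hierarchy (child -> parent); a toy stand-in for a semantic network.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}


def build_graph(hypernym):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernym.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the edge count, or None if a word is missing or unreachable."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Path-based score in (0, 1]; None when the network does not cover a word."""
    dist = shortest_path_length(graph, a, b)
    return None if dist is None else 1.0 / (1.0 + dist)


GRAPH = build_graph(HYPERNYM)
```

With this toy network, `path_similarity(GRAPH, "dog", "wolf")` is higher than `path_similarity(GRAPH, "dog", "cat")`, and a word outside the network, such as a proper name, returns `None`. That last case mirrors the coverage limitation noted above.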

The word-embedding model, initially proposed by Mikolov and colleagues, has shown successful results in representing words semantically in a vector space (Mikolov et al., 2013a; Mikolov et al., 2013b; see also Bengio et al., 2003; Levy and Goldberg, 2014). Word representation in a vector space reflects the semantics of the words. This paper proposes a semantic similarity calculation method based on this type of word embedding for grading short-answer responses.
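To make the idea concrete, the following sketch compares a learner answer with a reference answer by averaging word vectors and taking the cosine of the two averages. It is a generic baseline under stated assumptions, not the method proposed in this paper: the three-dimensional vectors are invented for illustration (a trained word2vec model would supply real ones), and the function names are hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
EMBEDDINGS = {
    "car":        [0.90, 0.10, 0.05],
    "automobile": [0.85, 0.15, 0.10],
    "fast":       [0.10, 0.90, 0.05],
    "quick":      [0.15, 0.85, 0.10],
    "banana":     [0.05, 0.10, 0.90],
    "yellow":     [0.10, 0.05, 0.85],
}


def sentence_vector(tokens, embeddings):
    """Average the vectors of known tokens; None if no token is in the vocabulary."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def answer_similarity(learner, reference, embeddings=EMBEDDINGS):
    """Similarity between a learner answer (LA) and a reference answer (RA)."""
    la = sentence_vector(learner.lower().split(), embeddings)
    ra = sentence_vector(reference.lower().split(), embeddings)
    if la is None or ra is None:
        return 0.0  # assumption: treat answers with no known words as dissimilar
    return cosine(la, ra)
```

With these toy vectors, `answer_similarity("quick automobile", "fast car")` is close to 1 even though the two answers share no words, whereas `answer_similarity("yellow banana", "fast car")` is low. Exact string matching would score both pairs as zero overlap.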

The following section reviews related work in automated short-answer scoring. Section 3 covers our proposed method. Section 4 reports on and analyses the experimental results. Finally, Section 5 presents the conclusion.

Conclusion

This paper explored a semantic similarity approach to automatic short-answer grading. We believe it makes two significant contributions. First, whereas the previous research used multiple reference answers, our proposed method uses only a single reference answer. Second, to improve on the previous study, we applied syntactic analysis using POS tagging and dependency relationships together with the word-embedding method. In the future, we intend to improve the word2vec model by adding more text corpora to its training input. We would also like to extend the research problem, especially to short essay answers that require a sequence of solutions.

Acknowledgement

This research was jointly funded by the Indonesian Ministry of Research, Technology and Higher Education under the WCU Program managed by the Bandung Institute of Technology and the Research, Community Service, and Innovation Program (P2MI) of the Faculty of Arts and Design, Bandung Institute of Technology (ITB). This study was also partially funded by MIT-Indonesia Research Alliance (MIRA) IMPACT, to whom the authors are grateful. It was also supported by the Smart City & Community Innovation Center, Bandung Institute of Technology (ITB).

References

Arora, S., Liang, Y., Ma, T., 2016. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. In: International Conference on Learning Representations, pp. 416–424

Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C., 2003. A Neural Probabilistic Language Model. Journal of Machine Learning Research, Volume 3, pp. 1137–1155

Bin, L., Jun, L., Jian-Min, Y., Qiao-Ming, Z., 2008. Automated Essay Scoring using the KNN Algorithm. In: Proceedings—International Conference on Computer Science and Software Engineering, CSSE 2008, pp. 735–738

Brew, C., Leacock, C., 2013. Automated Short Answer Scoring Principles and Prospects. In: Handbook of Automated Essay Evaluation, Current Applications and New Directions, Routledge, pp. 136–152

Cutrone, L., Chang, M., Kinshuk, 2011. Auto-Assessor: Computerized Assessment System for Marking Student's Short-Answers Automatically. In: Proceedings IEEE International Conference on Technology for Education, T4E 2011, pp. 81–88

Dandibhotla, T.S., Vardhan, B.V., 2015. Obtaining Feature and Sentiment Based Linked Instance RDF Data from Unstructured Reviews using Ontology Based Machine Learning. International Journal of Technology, Volume 6(2), pp. 198–206

Gomaa, W., Fahmy, A., 2013. A Survey of Text Similarity Approaches. International Journal of Computer Applications, Volume 68(13), pp. 13–18

Green, N., Larasati, S.D., Žabokrtský, Z., 2012. Indonesian Dependency Treebank: Annotation and Parsing. In: Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, pp. 137–145

Gutierrez, F., Dou, D., Martini, A., Fickas, S., Zong, H., 2013. Hybrid Ontology-Based Information Extraction for Automated Text Grading. In: International Conference on Machine Learning and Applications, pp. 359–364

Hasanah, U., Astuti, T., Wahyudi, R., Rifai, Z., Pambudi, R.A., 2018. An Experimental Study of Text Preprocessing Techniques for Automatic Short Answer Grading in Indonesian. In: Proceedings—Third International Conference on Information Technology, Information Systems and Electrical Engineering, pp. 230–234

Hasanah, U., Permanasari, A.E., Kusumawardani, S.S., Pribadi, F.S., 2019. A Scoring Rubric for Automatic Short Answer Grading System. Telkomnika (Telecommunication Computing Electronics and Control), Volume 17(2), pp. 763–770

Heilman, M., Madnani, N., 2013. ETS: Domain Adaptation and Stacking for Short Answer Scoring. In: *SEM 2013, Second Joint Conference on Lexical and Computational Semantics, 2(SemEval), pp. 275–279

Kenter, T., De Rijke, M., 2015. Short Text Similarity with Word Embeddings. In: Proceedings International Conference on Information and Knowledge Management, pp. 1411–1420

Levy, O., Goldberg, Y., 2014. Linguistic Regularities in Sparse and Explicit Word Representations. In: Proceedings of the Eighteenth Conference on Computational Natural Language Learning, pp. 171–180

Luchoomun, T., Chumroo, M., Ramnarain-Seetohul, V., 2019. A Knowledge Based System for Automated Assessment of Short Structured Questions. In: IEEE Global Engineering Education Conference, pp. 1349–1352

Mahadzir, N.H., Omar, M.F., Nawi, M.N.M., 2018. A Sentiment Analysis Visualization System for the Property Industry. International Journal of Technology, Volume 9(8), pp. 1609–1617

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J., 2013a. Distributed Representations of Words and Phrases and Their Compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, Volume 2, pp. 3111–3119

Mikolov, T., Yih, W., Zweig, G., 2013b. Linguistic Regularities in Continuous Space Word Representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 746–751

Mohler, M., Bunescu, R., Mihalcea, R., 2011. Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments. In: ACL-HLT 2011, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 752–762

Mohler, M., Mihalcea, R., 2009. Text-to-Text Semantic Similarity for Automatic Short Answer Grading. In: Proceedings EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 567–575

Ratna, A.A.P., Artajaya, H., Adhi, B.A., 2013. GLSA Based Online Essay Grading System. In: Proceedings of 2013 IEEE International Conference on Teaching, Assessment and Learning for Engineering, pp. 358–361

Roy, S., Bhatt, H.S., Narahari, Y., 2016. An Iterative Transfer Learning Based Ensemble Technique for Automatic Short Answer Grading. ArXiv, abs/1609.0

Sahu, A., Bhowmick, P.K., 2020. Feature Engineering and Ensemble-Based Approach for Improving Automatic Short-Answer Grading Performance. IEEE Transactions on Learning Technologies, Volume 13, pp. 77–90

Sakaguchi, K., Heilman, M., Madnani, N., 2015. Effective Feature Integration for Automated Short Answer Scoring. In: NAACL HLT 2015—2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, pp. 1049–1054

Santosh, D.T., Vardhan, B.V., 2015. Obtaining Feature and Sentiment Based Linked Instance RDF Data from Unstructured Reviews using Ontology Based Machine Learning. International Journal of Technology, Volume 6(2), pp. 198–206

Sultan, M.A., Salazar, C., Sumner, T., 2016. Fast and Easy Short Answer Grading with High Accuracy. In: 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016–Proceedings of the Conference, pp. 1070–1075

Surjandari, I., Wayasti, R.A., Zulkarnain, Laoh, E., 2019. Mining Public Opinion on Ride Hailing Service Providers using Aspect Based Sentiment Analysis. International Journal of Technology, Volume 10(4), pp. 818–828

Xu, X., Ye, F., 2017. Sentences Similarity Analysis based on Word Embedding and Syntax Analysis. In: 2017 17th IEEE International Conference on Communication Technology, pp. 1896–1900
