|Fetty Fitriyanti Lubis||School of Electrical Engineering and Informatics, Smart City & Community Innovation Center, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia|
|Mutaqin||School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia|
|Atina Putri||School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia|
|Dana Waskita||Faculty of Art and Design, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia|
|Tri Sulistyaningtyas||Faculty of Art and Design, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia|
|Arry Akhmad Arman||School of Electrical Engineering and Informatics, Smart City & Community Innovation Center, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia|
|Yusep Rosmansyah||School of Electrical Engineering and Informatics, Smart City & Community Innovation Center, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia|
Automatic short-answer grading (ASAG) is a system that aims to speed up the
assessment process without an instructor’s intervention. Previous research
successfully built an ASAG system whose scores had a correlation of 0.66 and a
mean absolute error (MAE) starting from 0.94 with a conventionally graded set.
However, that system needed more than one reference answer for each question.
It used a string-based comparison method and a keyword matching process to
measure sentence similarity in order to produce an assessment rubric. Our study
therefore aimed to build a more concise automatic short-answer scoring system
that uses a single reference answer. The mechanism uses a semantic similarity
measurement approach, based on word-embedding techniques and syntactic analysis,
to assess the accuracy of learners’ answers. In the experiments, the semantic
similarity approach achieved a correlation of 0.70 and an MAE of 0.70 when
compared with the grading reference.
Automated grading; Short answer; Semantic similarity; Syntax analysis; Word embeddings
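The abstract reports agreement between automatic and human scores as a correlation and an MAE. A minimal sketch of how these two metrics can be computed is given below. It assumes the correlation is Pearson's coefficient (the abstract does not name the coefficient), and the score lists are hypothetical, not data from this study.

```python
import math

# Sketch only: the correlation is assumed to be Pearson's r; the scores
# used with these functions are hypothetical, not results from the study.

def pearson(x, y):
    # Covariance of x and y divided by the product of their spreads.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mae(x, y):
    # Average magnitude of the per-answer score difference.
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

if __name__ == "__main__":
    human = [5, 3, 4, 1, 2]   # hypothetical human grades
    system = [4, 3, 5, 1, 1]  # hypothetical automatic grades
    print(round(pearson(system, human), 3), mae(system, human))
```

A higher correlation means the system ranks answers the way human graders do, while a lower MAE means its scores are closer to the human scores in absolute terms; the two can disagree, which is why both are reported.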
In education, assessing learners is essential for evaluating their knowledge and understanding. Subjective assessment, such as short-answer questions, is better suited than objective assessment, such as multiple-choice or true/false questions, to exploring learners’ understanding and basic knowledge. Short-answer questions require learners to compose and integrate ideas expressed in their own sentences. However, grading short-answer exams is challenging, especially when grading is manual and testing is large-scale: it takes significant time, and the assessment is prone to inconsistency. Automated scoring is a feasible solution for the short-answer scoring process. To implement it, we adopted the sentiment analysis process, as in the studies of Santosh and Vardhan (2015), Mahadzir et al. (2018), and Surjandari et al. (2019).
Automatic short-answer grading (ASAG) is the process of evaluating responses to this type of question with a computer program by matching them against a related reference model (Sahu and Bhowmick, 2020). Unlike automated essay scoring (AES), automated short-answer scoring (ASAS), another term for ASAG, emphasizes content rather than style (Brew and Leacock, 2013). Therefore, a simple way of assessing short-essay answers is to measure their similarity to an appropriate answer model. Combining syntactic and lexical approaches makes it easier for the model to recognize answers that carry the same semantic meaning as the reference.
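One simple way to combine a lexical and a syntactic view of a learner answer (LA) and a reference answer (RA) is sketched below. This is an illustration under stated assumptions, not the authors’ method: tokens are assumed to be already POS-tagged, lexical similarity is Jaccard overlap of words, syntactic similarity is Jaccard overlap of adjacent POS-tag pairs, and the example sentences and the 0.5 weight are hypothetical.

```python
# Sketch only: hypothetical lexical + syntactic similarity between a learner
# answer and a reference answer, both given as lists of (word, POS-tag) pairs.

def jaccard(a: set, b: set) -> float:
    # Share of items the two sets have in common; 0.0 when both are empty.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def pos_bigrams(tagged):
    # Adjacent POS-tag pairs give a coarse view of sentence structure.
    tags = [tag for _, tag in tagged]
    return set(zip(tags, tags[1:]))

def combined_similarity(la, ra, weight=0.5):
    # weight is a hypothetical mixing parameter between the two views.
    lexical = jaccard({w.lower() for w, _ in la}, {w.lower() for w, _ in ra})
    syntactic = jaccard(pos_bigrams(la), pos_bigrams(ra))
    return weight * lexical + (1 - weight) * syntactic

ra = [("Photosynthesis", "NN"), ("produces", "VBZ"), ("glucose", "NN")]
la = [("photosynthesis", "NN"), ("makes", "VBZ"), ("glucose", "NN")]
score = combined_similarity(la, ra)
```

Here the learner swaps "produces" for "makes": the lexical view penalizes this, while the syntactic view sees an identical structure. A purely lexical score would miss that the two answers are built the same way, and a purely syntactic one would accept any sentence with the same tag pattern, so the two are mixed.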
Semantic similarity between learner answers (LA) and reference answers (RA) is the focus of much ASAG research (Mohler and Mihalcea, 2009; Mohler et al., 2011; Luchoomun et al., 2019). Semantic similarity can be determined with three kinds of measures: knowledge-based, corpus-based, and word-embedding-based (Gomaa and Fahmy, 2013; Sahu and Bhowmick, 2020). Corpus-based similarity measures determine how similar words are according to information obtained from large corpora (Gomaa and Fahmy, 2013). Latent semantic analysis (LSA) is the most popular corpus-based similarity technique. LSA assumes that words with close meanings will appear in similar segments of text. It relies on a “bag of words” representation, which ignores word order when gathering related words (Cutrone et al., 2011; Ratna et al., 2013).
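The LSA idea described above can be sketched in a few lines: count words per document (discarding order), keep the top-k singular dimensions of the term-document matrix, and compare documents by cosine similarity in that reduced space. The toy documents and the choice k=2 are hypothetical; real LSA is trained on a large corpus.

```python
import numpy as np

# Minimal LSA sketch; documents and k are hypothetical illustration choices.

def lsa_doc_vectors(docs, k=2):
    # Term-document count matrix: word order is discarded (bag of words).
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1
    # Truncated SVD, A ~ U_k S_k V_k^T; rows of (S_k V_k^T)^T are document vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

docs = ["the cat sat on the mat", "on the mat the cat sat", "stocks fell sharply today"]
V = lsa_doc_vectors(docs, k=2)
```

Because only word counts enter the matrix, the first two documents, which differ only in word order, receive identical latent vectors. This is the bag-of-words limitation noted in the text and one motivation for adding syntactic analysis.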
Knowledge-based similarity measures determine how related words are, using information derived from semantic networks (Gomaa and Fahmy, 2013). WordNet is the most popular semantic network for measuring knowledge-based similarity between words. However, WordNet has inherent limitations related to the availability of qualified resources: such resources are not available for all languages, and proper names and domain-specific technical terms are underrepresented (Kenter and De Rijke, 2015).
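To make the knowledge-based idea concrete, the sketch below scores two words by 1 / (1 + shortest path length) in a semantic network, which is the idea behind WordNet’s path similarity. The small is-a network is hypothetical and hand-built; a real system would query WordNet instead.

```python
from collections import deque

# Hypothetical is-a edges (child -> parent) standing in for a real semantic network.
IS_A = {
    "cat": "feline", "lion": "feline", "feline": "mammal",
    "dog": "canine", "canine": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal",
}

def neighbours(word):
    # Edges are walked in both directions (towards parents and towards children).
    links = {child for child, parent in IS_A.items() if parent == word}
    if word in IS_A:
        links.add(IS_A[word])
    return links

def path_similarity(a, b):
    # Breadth-first search finds the fewest edges between a and b.
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (1 + dist)
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0  # the words are not connected in the network
```

For example, "cat" and "lion" are two edges apart and score 1/3, while "cat" and "dog" are four edges apart and score 0.2. A domain term missing from the network, such as "transistor" here, always scores 0.0, which illustrates the coverage limitation described above.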
Word-embedding models, proposed by Mikolov and colleagues (Mikolov et al., 2013a; Mikolov et al., 2013b; see also Bengio et al., 2003; Levy and Goldberg, 2014), have successfully represented words semantically in a vector space, where a word’s representation reflects its meaning. This paper proposes a semantic similarity calculation method for grading short-answer responses based on this type of word embedding.
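A common baseline for turning word embeddings into an answer-level similarity is to average the word vectors of each answer and take the cosine similarity to the single reference answer. The sketch below follows that baseline; it is not presented as the exact calculation used in this paper. The three-dimensional vectors are hypothetical, whereas in practice they would come from a trained word2vec model.

```python
import numpy as np

# Hypothetical toy embeddings; semantically close words get nearby vectors.
EMB = {
    "plants": np.array([0.9, 0.1, 0.0]),
    "produce": np.array([0.1, 0.8, 0.1]),
    "make": np.array([0.1, 0.7, 0.2]),
    "oxygen": np.array([0.0, 0.1, 0.9]),
    "cars": np.array([0.8, -0.5, 0.1]),
}

def sentence_vector(text):
    # Average the vectors of known words; out-of-vocabulary words are skipped.
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def semantic_similarity(answer, reference):
    # Cosine similarity between averaged vectors; 0.0 if either is all zeros.
    u, v = sentence_vector(answer), sentence_vector(reference)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

reference = "plants produce oxygen"
good = semantic_similarity("plants make oxygen", reference)
poor = semantic_similarity("cars make oxygen", reference)
```

The paraphrase "plants make oxygen" scores higher than "cars make oxygen" even though both share two words with the reference, because "make" lies close to "produce" in the vector space. Plain averaging ignores word order; weighted schemes such as that of Arora et al. (2016) or an added syntactic component are ways to address this.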
The following section reviews related work in automated short-answer scoring.
Section 3 covers our proposed method. Section 4 reports on and analyses the
experimental results. Finally, Section 5 presents the conclusion.
This paper explored the semantic similarity approach for automatic short-answer
grading. We believe it makes two significant contributions. First, whereas the
previous research used multiple reference answers, our proposed method uses
only a single reference answer. Second, to improve on the previous study, we
combined syntactic analysis, using POS tagging and dependency relationships,
with the word-embedding method. In the future, we intend to improve the
word2vec model by adding more text corpora to its training input. Furthermore,
we would like to extend the research to short-essay answers that require a
sequence of solutions.
This research was jointly funded by the Indonesian Ministry of Research,
Technology and Higher Education under the WCU Program managed by the Bandung
Institute of Technology, and by the Research, Community Service, and Innovation
Program (P2MI) of the Faculty of Arts and Design, Bandung Institute of
Technology (ITB). This study was also partially funded by the MIT-Indonesia
Research Alliance (MIRA) IMPACT, to whom the authors are grateful. It was also
supported by the Smart City & Community Innovation Center, Bandung Institute of
Technology (ITB).
Arora, S., Liang, Y., Ma, T., 2016. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. In: International Conference on Learning Representations, pp. 416–424
Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C., 2003. A Neural Probabilistic Language Model. Journal of Machine Learning Research, Volume 3, pp. 1137–1155
Bin, L., Jun, L., Jian-Min, Y., Qiao-Ming, Z., 2008. Automated Essay Scoring using the KNN Algorithm. In: Proceedings—International Conference on Computer Science and Software Engineering, CSSE 2008, pp. 735–738
Brew, C., Leacock, C., 2013. Automated Short Answer Scoring: Principles and Prospects. In: Handbook of Automated Essay Evaluation: Current Applications and New Directions, Routledge, pp. 136–152
Cutrone, L., Chang, M., Kinshuk, 2011. Auto-Assessor: Computerized Assessment System for Marking Student's Short-Answers Automatically. In: Proceedings IEEE International Conference on Technology for Education, T4E 2011, pp. 81–88
Gomaa, W., Fahmy, A., 2013. A Survey of Text Similarity Approaches. International Journal of Computer Applications, Volume 68(13), pp. 13–18
Green, N., Larasati, S.D., Žabokrtský, Z., 2012. Indonesian Dependency Treebank: Annotation and Parsing. In: Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, pp. 137–145
Gutierrez, F., Dou, D., Martini, A., Fickas, S., Zong, H., 2013. Hybrid Ontology-Based Information Extraction for Automated Text Grading. In: International Conference on Machine Learning and Applications, pp. 359–364
Hasanah, U., Astuti, T., Wahyudi, R., Rifai, Z., Pambudi, R.A., 2018. An Experimental Study of Text Preprocessing Techniques for Automatic Short Answer Grading in Indonesian. In: Proceedings—Third International Conference on Information Technology, Information Systems and Electrical Engineering, pp. 230–234
Hasanah, U., Permanasari, A.E., Kusumawardani, S.S., Pribadi, F.S., 2019. A Scoring Rubric for Automatic Short Answer Grading System. Telkomnika (Telecommunication Computing Electronics and Control), Volume 17(2), pp. 763–770
Heilman, M., Madnani, N., 2013. ETS: Domain Adaptation and Stacking for Short Answer Scoring. In: SEM 2013—Second Joint Conference on Lexical and Computational Semantics, 2(SemEval), pp. 275–279
Kenter, T., De Rijke, M., 2015. Short Text Similarity with Word Embeddings. In: Proceedings International Conference on Information and Knowledge Management, pp. 1411–1420
Levy, O., Goldberg, Y., 2014. Linguistic Regularities in Sparse and Explicit Word Representations. In: Proceedings of the Eighteenth Conference on Computational Natural Language Learning, pp. 171–180
Luchoomun, T., Chumroo, M., Ramnarain-Seetohul, V., 2019. A Knowledge Based System for Automated Assessment of Short Structured Questions. In: IEEE Global Engineering Education Conference, pp. 1349–1352
Mahadzir, N.H., Omar, M.F., Nawi, M.N.M., 2018. A Sentiment Analysis Visualization System for the Property Industry. International Journal of Technology, Volume 9(8), pp. 1609–1617
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J., 2013a. Distributed Representations of Words and Phrases and Their Compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems–Volume 2, pp. 3111–3119
Mikolov, T., Yih, W., Zweig, G., 2013b. Linguistic Regularities in Continuous Space Word Representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 746–751
Mohler, M., Bunescu, R., Mihalcea, R., 2011. Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments. In: ACL-HLT 2011—Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 752–762
Mohler, M., Mihalcea, R., 2009. Text-to-Text Semantic Similarity for Automatic Short Answer Grading. In: Proceedings EACL 2009—12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 567–575
Ratna, A.A.P., Artajaya, H., Adhi, B.A., 2013. GLSA Based Online Essay Grading System. In: Proceedings of 2013 IEEE International Conference on Teaching, Assessment and Learning for Engineering, pp. 358–361
Roy, S., Bhatt, H.S., Narahari, Y., 2016. An Iterative Transfer Learning Based Ensemble Technique for Automatic Short Answer Grading. ArXiv, abs/1609.0
Sahu, A., Bhowmick, P.K., 2020. Feature Engineering and Ensemble-Based Approach for Improving Automatic Short-Answer Grading Performance. IEEE Transactions on Learning Technologies, Volume 13, pp. 77–90
Sakaguchi, K., Heilman, M., Madnani, N., 2015. Effective Feature Integration for Automated Short Answer Scoring. In: NAACL HLT 2015—2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, pp. 1049–1054
Santosh, D.T., Vardhan, B.V., 2015. Obtaining Feature and Sentiment Based Linked Instance RDF Data from Unstructured Reviews using Ontology Based Machine Learning. International Journal of Technology, Volume 6(2), pp. 198–206
Sultan, M.A., Salazar, C., Sumner, T., 2016. Fast and Easy Short Answer Grading with High Accuracy. In: 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016–Proceedings of the Conference, pp. 1070–1075
Surjandari, I., Wayasti, R.A., Zulkarnain, Laoh, E., 2019. Mining Public Opinion on Ride Hailing Service Providers using Aspect Based Sentiment Analysis. International Journal of Technology, Volume 10(4), pp. 818–828
Xu, X., Ye, F., 2017. Sentences Similarity Analysis based on Word Embedding and Syntax Analysis. In: 2017 17th IEEE International Conference on Communication Technology, pp. 1896–1900