Published: 19 Jul 2021
Journal: IJtech, Vol 12, No 3 (2021)
DOI: https://doi.org/10.14716/ijtech.v12i3.4651
Fetty Fitriyanti Lubis | School of Electrical Engineering and Informatics, Smart City & Community Innovation Center, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Mutaqin | School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Atina Putri | School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Dana Waskita | Faculty of Art and Design, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Tri Sulistyaningtyas | Faculty of Art and Design, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Arry Akhmad Arman | School of Electrical Engineering and Informatics, Smart City & Community Innovation Center, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Yusep Rosmansyah | School of Electrical Engineering and Informatics, Smart City & Community Innovation Center, Institut Teknologi Bandung, Jl. Ganesa No.10, Kota Bandung 40132, Indonesia
Automatic short-answer grading (ASAG) is a system that aims to speed up the assessment process without an instructor's intervention. Previous research successfully built an ASAG system whose scores correlated at 0.66 with a conventionally graded set, with a mean absolute error (MAE) starting from 0.94. However, that study had a weakness: it required more than one reference answer for each question. It used a string-based method and keyword matching to measure the similarity of sentences in order to produce an assessment rubric. Our study therefore aimed to build a more concise automatic short-answer scoring system using a single reference answer. The mechanism used a semantic similarity measurement approach, combining word-embedding techniques and syntactic analysis, to assess the accuracy of learners' answers. In our experiments, the semantic similarity approach achieved a correlation of 0.70 and an MAE of 0.70 when compared with the grading reference.
Keywords: Automated grading; Short answer; Semantic similarity; Syntax analysis; Word embeddings
In education, assessing learners is essential for evaluating their knowledge and understanding. Subjective assessment, such as short-answer questions, is better suited than objective assessment, such as multiple-choice or true/false questions, for probing understanding and basic knowledge. Short-answer questions require learners to respond by composing and integrating ideas expressed in their own sentences. However, grading short-answer exams is challenging, especially when done manually and at large scale: it takes significant time and suffers from inconsistent assessment. Automated scoring is a feasible solution for the short-answer scoring process. To build one, we adopted the sentiment-analysis workflow used in the studies of Santosh and Vardhan (2015), Mahadzir et al. (2018), and Surjandari et al. (2019).
Automatic short-answer grading (ASAG) is the process of evaluating this type of question response with a computer program by matching it against a related reference model (Sahu and Bhowmick, 2020). Unlike automated essay scoring (AES), automated short-answer scoring (ASAS), another term used for ASAG, emphasizes content rather than style (Brew and Leacock, 2013). A simple way of assessing short-essay answers is therefore to measure their similarity to an appropriate answer model. Combining syntactic and lexical approaches helps the model recognize, in a simpler way, when short-essay answers carry the same semantic meaning.
Semantic similarity between learner answers (LA) and reference answers (RA) is the focus of much research related to ASAG (Mohler and Mihalcea, 2009; Mohler et al., 2011; Luchoomun et al., 2019). Three approaches to determining semantic similarity are knowledge-based, corpus-based, and word-embedding-based measures (Gomaa and Fahmy, 2013; Sahu and Bhowmick, 2020). Corpus-based similarity measures determine how alike words are according to information obtained from large corpora (Gomaa and Fahmy, 2013). Latent semantic analysis (LSA) is the most popular corpus-based similarity technique. LSA assumes that words with close meanings will appear in similar segments of text. LSA uses the concept of a metaphorical "bag of words" that does not consider actual word order when gathering related words (Cutrone et al., 2011; Ratna et al., 2013).
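To make the corpus-based approach concrete, the sketch below computes an LSA-style similarity between a reference answer and a learner answer. It is a minimal illustration under assumed inputs, not the implementation of any cited study; the toy corpus, the scikit-learn pipeline, and the two-component projection are all assumptions.

```python
# Illustrative LSA similarity sketch (not the implementation of any cited study).
# Assumes a toy corpus; a real system would use a large domain corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "A stack is a last-in first-out data structure.",
    "A queue removes elements in first-in first-out order.",
    "Stacks push and pop elements at the same end.",
]
reference = "A stack stores items in last-in first-out order."
learner = "Items leave a stack in the reverse order they entered."

# Bag-of-words TF-IDF ignores word order, matching the LSA assumption.
tfidf = TfidfVectorizer().fit(corpus + [reference, learner])
matrix = tfidf.transform(corpus + [reference, learner])

# Truncated SVD projects the sparse vectors into a low-rank "topic" space.
lsa = TruncatedSVD(n_components=2, random_state=0).fit(matrix)
ref_vec, ans_vec = lsa.transform(tfidf.transform([reference, learner]))

print(cosine_similarity([ref_vec], [ans_vec])[0][0])
```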
Knowledge-based similarity measures determine how words are related using information derived from semantic networks (Gomaa and Fahmy, 2013). WordNet is the most popular semantic network in the field of measuring knowledge-based similarities among words. However, WordNet has inherent limitations related to the availability of qualified resources: they are not available for all languages, and proper names and domain-specific technical terms are underrepresented (Kenter and De Rijke, 2015).
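For comparison, a knowledge-based measure can be sketched with NLTK's WordNet interface. The Wu-Palmer score used here is one common choice among several, and the word pairs are purely illustrative; this is not the setup of any cited study.

```python
# Illustrative WordNet-based word similarity (not any cited study's setup).
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def word_similarity(w1, w2):
    """Maximum Wu-Palmer similarity over all synset pairs of two words."""
    scores = [
        s1.wup_similarity(s2) or 0.0  # wup_similarity may return None
        for s1 in wn.synsets(w1)
        for s2 in wn.synsets(w2)
    ]
    return max(scores, default=0.0)

print(word_similarity("car", "automobile"))  # near 1.0 (shared synset)
print(word_similarity("car", "banana"))      # much lower
```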
The word-embedding model, initially proposed by Mikolov and various colleagues, has shown successful results in representing words semantically in a vector space (Mikolov et al., 2013a; Mikolov et al., 2013b; see also Bengio et al., 2003; Levy and Goldberg, 2014). Word representations in a vector space reflect the semantics of the words. This paper proposes a semantic similarity calculation method based on this type of word embedding for grading short-answer responses.
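A minimal word-embedding sketch, assuming gensim and a toy training corpus, shows the idea: train word2vec, average token vectors into sentence vectors, and compare them by cosine similarity. A real grader would train on a much larger corpus; the corpus, hyperparameters, and averaging scheme here are assumptions.

```python
# Illustrative word2vec sentence similarity (not the paper's trained model).
import numpy as np
from gensim.models import Word2Vec

corpus = [
    ["the", "stack", "is", "last", "in", "first", "out"],
    ["a", "queue", "is", "first", "in", "first", "out"],
    ["push", "and", "pop", "operate", "on", "a", "stack"],
]
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, epochs=50, seed=0)

def sentence_vector(tokens):
    """Average the embeddings of in-vocabulary tokens."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0)

ref = sentence_vector(["stack", "is", "last", "in", "first", "out"])
ans = sentence_vector(["pop", "and", "push", "on", "a", "stack"])
cos = float(np.dot(ref, ans) / (np.linalg.norm(ref) * np.linalg.norm(ans)))
print(cos)
```

Plain averaging is the simplest composition; weighted schemes such as the baseline of Arora et al. (2016) are common refinements.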
The following section reviews related work in automated short-answer scoring. Section 3 covers our proposed method. Section 4 reports on and analyses the experimental results. Finally, Section 5 presents the conclusion.
This paper explored the semantic similarity approach for automatic short-answer grading. We believe it makes two significant contributions. First, while previous research used multiple reference answers, our proposed method uses only a single reference answer. Second, to strengthen our method relative to the previous study, we applied syntactic analysis, utilizing POS tagging and dependency relationships, alongside the word-embedding method. In the future, we intend to improve the word2vec model by adding more text corpora as training input. Furthermore, we would like to expand the research problem, especially to short-essay answers that require a sequence of solutions.
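As an illustration of the syntactic features mentioned above, the sketch below extracts POS tags and dependency relations with spaCy. The English model and example sentence are assumptions for demonstration only; the paper's own pipeline targets Indonesian (cf. Green et al., 2012) and is not reproduced here.

```python
# Illustrative POS-tag and dependency extraction with spaCy
# (an assumed stand-in; the paper's pipeline and language differ).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("A stack removes the most recently added element first.")

for token in doc:
    # POS tag, dependency relation, and syntactic head for each token.
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} head={token.head.text}")
```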
This research was jointly funded by the Indonesian Ministry of Research, Technology and Higher Education under the WCU Program managed by the Bandung Institute of Technology, and by the Research, Community Service, and Innovation Program (P2MI) of the Faculty of Arts and Design, Bandung Institute of Technology (ITB). This study was also partially funded by MIT-Indonesia Research Alliance (MIRA) IMPACT, to whom the authors are grateful. It was also supported by the Smart City & Community Innovation Center, Bandung Institute of Technology (ITB).
Arora, S., Liang, Y., Ma, T., 2016. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. In: International Conference on Learning Representations, pp. 416–424
Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C., 2003. A Neural Probabilistic Language Model. Journal of Machine Learning Research, Volume 3, pp. 1137–1155
Bin, L., Jun, L., Jian-Min, Y., Qiao-Ming, Z., 2008. Automated Essay Scoring using the KNN Algorithm. In: Proceedings—International Conference on Computer Science and Software Engineering, CSSE 2008, pp. 735–738
Brew, C., Leacock, C., 2013. Automated Short Answer Scoring Principles and Prospects. In: Handbook of Automated Essay Evaluation, Current Applications and New Directions, Routledge, pp. 136–152
Cutrone, L., Chang, M., Kinshuk, 2011. Auto-Assessor: Computerized Assessment System for Marking Student's Short-Answers Automatically. In: Proceedings IEEE International Conference on Technology for Education, T4E 2011, pp. 81–88
Gomaa, W., Fahmy, A., 2013. A Survey of Text Similarity Approaches. International Journal of Computer Applications, Volume 68(13), pp. 13–18
Green, N., Larasati, S.D., Žabokrtský, Z., 2012. Indonesian Dependency Treebank: Annotation and Parsing. In: Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, pp. 137–145
Gutierrez, F., Dou, D., Martini, A., Fickas, S., Zong, H., 2013. Hybrid Ontology-Based Information Extraction for Automated Text Grading. In: International Conference on Machine Learning and Applications, pp. 359–364
Hasanah, U., Astuti, T., Wahyudi, R., Rifai, Z., Pambudi, R.A., 2018. An Experimental Study of Text Preprocessing Techniques for Automatic Short Answer Grading in Indonesian. In: Proceedings—Third International Conference on Information Technology, Information Systems and Electrical Engineering, pp. 230–234
Hasanah, U., Permanasari, A.E., Kusumawardani, S.S., Pribadi, F.S., 2019. A Scoring Rubric for Automatic Short Answer Grading System. Telkomnika (Telecommunication Computing Electronics and Control), Volume 17(2), pp. 763–770
Heilman, M., Madnani, N., 2013. ETS: Domain Adaptation and Stacking for Short Answer Scoring. In: SEM 2013—Second Joint Conference on Lexical and Computational Semantics, 2(SemEval), pp. 275–279
Kenter, T., De Rijke, M., 2015. Short Text Similarity with Word Embeddings. In: Proceedings International Conference on Information and Knowledge Management, pp. 1411–1420
Levy, O., Goldberg, Y., 2014. Linguistic Regularities in Sparse and Explicit Word Representations. In: Proceedings of the Eighteenth Conference on Computational Natural Language Learning, pp. 171–180
Luchoomun, T., Chumroo, M., Ramnarain-Seetohul, V., 2019. A Knowledge Based System for Automated Assessment of Short Structured Questions. In: IEEE Global Engineering Education Conference, pp. 1349–1352
Mahadzir, N.H., Omar, M.F., Nawi, M.N.M., 2018. A Sentiment Analysis Visualization System for the Property Industry. International Journal of Technology, Volume 9(8), pp. 1609–1617
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J., 2013a. Distributed Representations of Words and Phrases and Their Compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems—Volume 2, pp. 3111–3119
Mikolov, T., Yih, W., Zweig, G., 2013b. Linguistic Regularities in Continuous Space Word Representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 746–751
Mohler, M., Bunescu, R., Mihalcea, R., 2011. Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments. In: ACL-HLT 2011—Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 752–762
Mohler, M., Mihalcea, R., 2009. Text-to-Text Semantic Similarity for Automatic Short Answer Grading. In: Proceedings EACL 2009—12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 567–575
Ratna, A.A.P., Artajaya, H., Adhi, B.A., 2013. GLSA Based Online Essay Grading System. In: Proceedings of 2013 IEEE International Conference on Teaching, Assessment and Learning for Engineering, pp. 358–361
Roy, S., Bhatt, H.S., Narahari, Y., 2016. An Iterative Transfer Learning Based Ensemble Technique for Automatic Short Answer Grading. ArXiv, abs/1609.0
Sahu, A., Bhowmick, P.K., 2020. Feature Engineering and Ensemble-Based Approach for Improving Automatic Short-Answer Grading Performance. IEEE Transactions on Learning Technologies, Volume 13, pp. 77–90
Sakaguchi, K., Heilman, M., Madnani, N., 2015. Effective Feature Integration for Automated Short Answer Scoring. In: NAACL HLT 2015—2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, pp. 1049–1054
Santosh, D.T., Vardhan, B.V., 2015. Obtaining Feature and Sentiment Based Linked Instance RDF Data from Unstructured Reviews using Ontology Based Machine Learning. International Journal of Technology, Volume 6(2), pp. 198–206
Sultan, M.A., Salazar, C., Sumner, T., 2016. Fast and Easy Short Answer Grading with High Accuracy. In: 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016—Proceedings of the Conference, pp. 1070–1075
Surjandari, I., Wayasti, R.A., Zulkarnain, Laoh, E., 2019. Mining Public Opinion on Ride Hailing Service Providers using Aspect Based Sentiment Analysis. International Journal of Technology, Volume 10(4), pp. 818–828
Xu, X., Ye, F., 2017. Sentences Similarity Analysis based on Word Embedding and Syntax Analysis. In: 2017 17th IEEE International Conference on Communication Technology, pp. 1896–1900