• International Journal of Technology (IJTech)
  • Vol 8, No 5 (2017)

A Dependency Annotation Scheme to Extract Syntactic Features in Indonesian Sentences

Budi Irmawati, Hiroyuki Shindo, Yuji Matsumoto

Published: 31 Oct 2017
Volume: IJTech Vol 8, No 5 (2017)
DOI: https://doi.org/10.14716/ijtech.v8i5.878

Cite this article as:

Irmawati, B., Shindo, H., Matsumoto, Y., 2017. A Dependency Annotation Scheme to Extract Syntactic Features in Indonesian Sentences. International Journal of Technology. Volume 8(5), pp. 957-967

Budi Irmawati, Computational Linguistics Laboratory, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan; Informatics Department, University of Mataram, Mataram 83125, Indonesia
Hiroyuki Shindo, Computational Linguistics Laboratory, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
Yuji Matsumoto, Computational Linguistics Laboratory, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan

In languages with fixed word orders, syntactic information is useful when solving natural language processing (NLP) problems. In languages like Indonesian, however, which has a relatively free word order, the usefulness of syntactic information has yet to be determined. In this study, a dependency annotation scheme for extracting syntactic features from a sentence is proposed. This annotation scheme adapts the Stanford typed dependency (SD) annotation scheme to cope with such phenomena in the Indonesian language as ellipses, clitics, and non-verb clauses. This adapted annotation scheme is then extended because certain ambiguities in assigning heads and relations could not otherwise be avoided. The accuracy of these two annotation schemes is then compared, and the usefulness of the extended annotation scheme is assessed using the syntactic features extracted from dependency-annotated sentences in a preposition error correction task. The experimental results indicate that the extended annotation scheme improved the accuracy of a dependency parser, and the error correction task demonstrates that training data using syntactic features yield better corrections than training data that do not use such features, thus lending a positive answer to the research question.

Dependency annotation; Dependency relation; Error correction; Indonesian language; Syntactic information


In many natural language processing (NLP) tasks, along with part-of-speech (PoS) tagging, parsing[1] plays an important role in preprocessing (Buchholz & Marsi, 2006). Kübler et al. (2009) state that dependency grammar is useful for languages with a free or relatively free word order. However, in the results of the 2007 Conference on Computational Natural Language Learning (CoNLL) Shared Task on Dependency Parsing, fixed word order languages such as English and Catalan scored highest in accuracy, while free word order languages such as Arabic and Basque scored lower (Nivre et al., 2007).

Indonesian is rich in morphology and has a relatively free word order compared to English (Stack, 2005). It has yet to be confirmed, however, whether syntactic information extracted from dependency relations is useful for NLP tasks in the case of Indonesian. Addressing this question in the present study required building our own dependency parser, as previous Indonesian dependency parsers (Kamayani & Purwarianti, 2011; Green et al., 2012) and their dependency annotation schemes cannot be accessed freely.

To build the parser, we developed a dependency annotation scheme for Indonesian and trained an automatic parser. To start, we adapted the Stanford typed dependency (SD) manual (de Marneffe & Manning, 2008) to accommodate relations not covered by the original SD annotation scheme (explained in Subsection 2.1). In doing so, we found that the universal dependency annotation scheme (Nivre et al., 2016) covered Indonesian. However, that scheme did not include the morphological features and stem words relevant to the language, nor did it handle date and time notations (explained in item d of Subsection 2.2.2) or apply language-specific relations as it does for Danish and Finnish. Moreover, although the scheme splits off clitics in French, it did not do so for Indonesian. Because our adapted scheme could not avoid certain ambiguities in assigning heads and relations, we extended it.

In this experiment, we calculated the accuracy of our annotation scheme before and after extending it. We also evaluated the effectiveness of our proposed annotation scheme in correcting preposition errors made by second language (L2) learners in real learner data (Irmawati et al., 2016a), explained in item c of Subsection 3.1. We also used this annotation scheme to extract syntactic information from Indonesian sentences and generate more sophisticated artificial error data (Irmawati et al., 2016b; Irmawati et al., 2017).

Section 2 explains the dependency annotation scheme by first describing the relevant phenomena in Indonesian and then the proposed annotation scheme. Section 3 describes the experimental setting for assessing the extension and the effectiveness of the annotation scheme in a real implementation scenario. Section 4 presents the results of the experiment and provides a discussion.

[1] Parsing is a formal analysis that splits a sentence into its constituents; the result is a parse tree showing the relations between those constituents.

Experimental Methods

3.1.    Language Tools and Resources

The following language tools and resources were used:

a.    Morphind: an Indonesian morphological analyzer system[1] (Larasati et al., 2011) that provides lemmatization information and handles derivations and inflections in a word.

b.    Minimum Spanning Tree (MST) parser (McDonald et al., 2006): this parser served to evaluate the annotation scheme and to parse native sentences to be used as baseline training data for a real implementation (an error correction model).

c.    Learner data: error-corrected learner sentence pairs drawn from the lang-8 website[2]  (Mizumoto et al., 2011). The data contained 6,488 pairs of learners’ sentences and 77,201 tokens. On lang-8, L2 learners write in journals and native speakers correct their sentences. In this study, after we PoS-tagged the learners’ sentences and their corrected versions, we automatically aligned learners’ sentences and their corrections using a heuristic rule to point out the location of errors and the error types (Irmawati et al., 2016a). We used some of the corrected sentences as training data and others as test data to evaluate the annotation scheme. We then used the learners’ sentences as the test data for the error correction model.

d.   Native data: one million newspaper sentences taken from the Indonesian part of the Leipzig Corpora (Quasthoff et al., 2006) as the baseline training data for the error correction model.


3.2.    Pre-processing

To extract the necessary syntactic information, we first tagged all sentences using Morphind. Next, we ran a rule-based script to check whether a word had a clitic; if so, the clitic was split from its host word and assigned an appropriate PoS tag. The rule-based script also identified adverbs with the suffix -nya, which were not to be processed. We then formatted the data in the CoNLL format (Buchholz & Marsi, 2006).
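The clitic-handling step described above can be sketched as a small rule-based function. The enclitic inventory and the adverb lexicon below are illustrative assumptions for exposition, not the authors' actual rules:

```python
# Sketch of rule-based enclitic splitting for Indonesian tokens.
# ENCLITICS and ADVERBS_NYA are assumed inventories, not the paper's rules.
ENCLITICS = ("nya", "ku", "mu")  # pronominal enclitics (assumed list)
ADVERBS_NYA = frozenset({"biasanya", "sebenarnya", "akhirnya"})

def split_clitics(tokens):
    """Split enclitic pronouns off their host words.

    Adverbs formed with -nya are left intact, mirroring the rule in the
    text that such words are not to be processed.
    """
    out = []
    for word in tokens:
        if word in ADVERBS_NYA:  # adverbial -nya: keep the word whole
            out.append(word)
            continue
        for clitic in ENCLITICS:
            # require a plausible host of at least three characters
            if word.endswith(clitic) and len(word) >= len(clitic) + 3:
                out.append(word[: -len(clitic)])
                out.append("-" + clitic)
                break
        else:
            out.append(word)
    return out
```

For example, `split_clitics(["bukunya", "biasanya", "pergi"])` splits *bukunya* into *buku* plus the clitic *-nya* while leaving the adverb *biasanya* intact.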

The final step was to manually annotate the 1,032 corrected learners' sentences with dependency relations. We then set aside 100 sentences as test data. To enable syntactic information extraction, we trained the MST parser on the remaining annotated sentences. The trained parser then automatically assigned dependency relations to the native sentences and the learners' sentences. However, to build gold test data annotated with error tags and dependency relations, we manually corrected the dependency relations that the trained parser had assigned to the learners' sentences. We extracted syntactic features such as the head and modifiers of a target word, as well as the PoS tags of its head and modifiers.
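The extracted features (head, modifiers, and their PoS tags) can be read directly off dependency-parsed rows. The tuple layout below is a simplified stand-in for the full CoNLL-X columns, and the toy parse is an invented example:

```python
# Extract syntactic features of a target token from a dependency-parsed
# sentence. Each row is a simplified CoNLL-style tuple:
#   (id, form, pos, head, deprel), with 1-based ids and head 0 = root.
def syntactic_features(rows, target_id):
    by_id = {r[0]: r for r in rows}
    _, form, pos, head, _ = by_id[target_id]
    feats = {"form": form, "pos": pos}
    if head != 0:  # head word and its PoS tag
        feats["head"] = by_id[head][1]
        feats["head_pos"] = by_id[head][2]
    mods = [r for r in rows if r[3] == target_id]
    feats["modifiers"] = [r[1] for r in mods]     # modifier words
    feats["modifier_pos"] = [r[2] for r in mods]  # and their PoS tags
    return feats

# "dia pergi ke pasar" (he goes to the market), toy parse:
sent = [
    (1, "dia",   "PRP", 2, "nsubj"),
    (2, "pergi", "VB",  0, "root"),
    (3, "ke",    "IN",  2, "prep"),
    (4, "pasar", "NN",  3, "pobj"),
]
```

Here `syntactic_features(sent, 3)` yields the head *pergi* (VB) and the modifier *pasar* (NN) for the preposition *ke*.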


3.3.    Experiment Setting

The goal of these experiments was to evaluate the effectiveness of our defined annotation schemes. First, we evaluated the accuracy of the adapted and extended schemes. We did not compare our annotation schemes with the universal dependency annotation scheme because the two use different data and different annotation conventions, so the results are not directly comparable; converting between the schemes is left for future research. Second, we evaluated the extent to which our annotation schemes improved an error correction model.

For this evaluation, we compared the model generated from data using the syntactic features with one that did not use those features and tested it on real learner data.

3.3.1.   Evaluation of dependency annotation

To evaluate whether the extensions and chunking were useful to the annotation scheme, we trained and tested an MST parser on the sentences annotated with three annotation schemes as follows:

a.    Adapt: This scheme followed Adapt-1 to Adapt-7 (Subsection 2.2.1). It was the baseline of our evaluation.

b.    Extend1-2: This scheme followed the proposed Extend 1 and Extend 2.

c.    Chunk1-2: This scheme followed Chunk-1 and Chunk-2. In this scheme, we chunked the sentences before assigning heads and labeling the relations.


3.3.2. Evaluation of the error correction task

Before performing an error correction task on a large amount of training data, we conducted a preliminary experiment to evaluate the extended annotation scheme. Two feature sets were compared: (1) a feature set not using syntactic information (nSynFeat); and (2) a feature set using syntactic information (synFeat). The experiment detected five error types (prepositions, adjectives, adverbs, verbs, and nouns) in 550 learners' sentences manually annotated with error types, using five-fold cross-validation.

To obtain a reliable conclusion, 13 preposition errors with error frequencies above five in the learner data were selected for study. We focused on preposition errors because the number of candidate corrections was small compared to other error types; this allowed us to obtain sufficient error samples for each preposition error. From all the learner data, we obtained 382 learners’ sentences containing at least one preposition error (Learner) as the test data.

For the training data, we employed the native sentences (Native) and the generated artificial error sentences. We constructed the artificial error sentences by injecting preposition errors randomly to obtain more erroneous sentences easily. We then generated training data that did not use the syntactic features (nDRndArt), as well as data that did (wDRndArt).
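Random preposition-error injection of this kind can be sketched as below. The confusion set is an assumed list of common Indonesian prepositions, not the authors' exact inventory:

```python
import random

# Assumed confusion set of common Indonesian prepositions.
CONFUSION_SET = ("di", "ke", "dari", "pada", "untuk", "dengan")

def inject_preposition_error(tokens, rng=None):
    """Replace one preposition in the sentence with a different one
    from the confusion set. Returns (corrupted_tokens, error_index),
    or (tokens, None) if the sentence contains no target preposition."""
    rng = rng or random.Random(0)
    positions = [i for i, t in enumerate(tokens) if t in CONFUSION_SET]
    if not positions:
        return list(tokens), None
    i = rng.choice(positions)
    wrong = rng.choice([p for p in CONFUSION_SET if p != tokens[i]])
    corrupted = list(tokens)
    corrupted[i] = wrong
    return corrupted, i
```

Recording the error index alongside the corrupted sentence gives each artificial example the error-location information that the plain native data lacks.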

To build the error correction model, we trained a Naïve Bayes classifier. We used the one-versus-all approach (Rozovskaya & Roth, 2010) to perform multi-class classification: for M target prepositions, the feature vectors of one preposition served as positive examples and those of the remaining M-1 prepositions as negative examples. We then used the confidence score obtained from each classifier to rank the candidate corrections.
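A minimal version of this classifier-and-ranking step might look like the following. It scores each candidate preposition with a multinomial Naive Bayes model under add-one smoothing and ranks candidates by that confidence score; this is a plausible but assumed reading of the paper's setup, not the authors' implementation:

```python
from collections import Counter
import math

class OneVsAllNB:
    """Minimal one-vs-all ranker: each target preposition gets a Naive
    Bayes score (add-one smoothing), and candidates are ranked by it."""

    def fit(self, feature_sets, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)            # class frequencies
        self.counts = {c: Counter() for c in self.classes}
        self.totals = Counter()                 # feature tokens per class
        self.vocab = set()
        for feats, y in zip(feature_sets, labels):
            self.counts[y].update(feats)
            self.totals[y] += len(feats)
            self.vocab.update(feats)
        self.n = len(labels)
        return self

    def score(self, feats, c):
        # log P(c) + sum of smoothed log-likelihoods of the features
        s = math.log(self.prior[c] / self.n)
        v = len(self.vocab)
        for f in feats:
            s += math.log((self.counts[c][f] + 1) / (self.totals[c] + v))
        return s

    def rank(self, feats):
        """Candidate corrections, best-scoring preposition first."""
        return sorted(self.classes,
                      key=lambda c: self.score(feats, c), reverse=True)
```

The top-ranked class is the proposed correction; the full ranked list supports the candidate-ranking evaluation described above.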

To train our model, we employed context word features in a ±2 window, a bigram, a trigram, PoS n-grams, the head and object of the preposition, and the PoS tags of the head and object of the preposition. For the native training data, we dropped features that contained the target preposition and, for nDRndArt, we dropped all dependency features from the training data.
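The feature template can be sketched as a single function; the feature names and padding token are illustrative choices, not those of the paper:

```python
def preposition_features(tokens, pos, i, head=None, head_pos=None,
                         obj=None, obj_pos=None):
    """Build the feature dict for the preposition at index i:
    ±2 context window, word and PoS n-grams around the target, plus
    the dependency features (head and object of the preposition)."""
    def tok(seq, j):
        return seq[j] if 0 <= j < len(seq) else "<PAD>"
    f = {}
    for d in (-2, -1, 1, 2):  # ±2 context window, words and PoS tags
        f["w%+d" % d] = tok(tokens, i + d)
        f["p%+d" % d] = tok(pos, i + d)
    f["bigram"] = tok(tokens, i - 1) + "_" + tok(tokens, i + 1)
    f["trigram"] = "_".join(tok(tokens, i + d) for d in (-1, 1, 2))
    f["pos_bigram"] = tok(pos, i - 1) + "_" + tok(pos, i + 1)
    if head is not None:      # dependency features from the parse
        f["head"], f["head_pos"] = head, head_pos
    if obj is not None:
        f["obj"], f["obj_pos"] = obj, obj_pos
    return f
```

Dropping the `head`/`obj` entries from this dict reproduces the nDRndArt condition, which uses no dependency features.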

For Learner, we trained and tested the model using five-fold cross-validation. We then compared the model trained on Native, nDRndArt, and wDRndArt with the model trained on Learner. We checked whether Learner performed better as training data as had been claimed in previous works on English prepositions (Cahill et al., 2013; Han et al., 2010).

Then, to assess this task, we followed the evaluation metrics proposed by Dahlmeier and Ng (2011), defined as:

F1 = (2 × precision × recall) / (precision + recall)

where precision is the number of correct prepositions proposed by the system divided by the number of corrections given by the system, and recall is the number of correct prepositions proposed by the system divided by the number of preposition errors.
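These metrics can be computed directly from the sets of proposed and gold corrections. The tuple encoding of a correction below is an assumption for illustration:

```python
def correction_scores(proposed, gold):
    """Precision, recall, and F1 over correction sets, where each
    correction is identified by a (sentence_id, position, word) tuple."""
    tp = len(proposed & gold)  # proposed corrections matching the gold
    precision = tp / len(proposed) if proposed else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```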


[1] http://larasati.com.morphind/

[2] http://lang-8.com/

Results and Discussion

4.1.    Evaluation of the Dependency Annotation

Table 1 lists the three annotation schemes evaluated in this experiment. It shows that Extend1-2 and Chunk1-2 improved both the unlabeled attachment score (UAS) and the labeled attachment score (LAS) compared to the baseline Adapt.

Table 1 Dependency parser accuracy of three annotation schemes

(table values not recoverable from this extraction; rows: Adapt, Extend1-2, Chunk1-2; columns: Complete, UAS, LAS)

Complete: % of sentences whose relations are all correct

Unlabeled Attachment Score (UAS): % of tokens with a correct head

Labeled Attachment Score (LAS): % of tokens with a correct dependency relation
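The scores defined in the legend can be computed from gold and predicted (head, relation) pairs; the sketch below scores one sentence, and a corpus-level Complete score averages the per-sentence flag:

```python
def attachment_scores(gold, pred):
    """UAS, LAS, and completeness for one parsed sentence.
    gold and pred are per-token lists of (head_index, deprel) pairs."""
    assert len(gold) == len(pred)
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # correct heads
    las = sum(g == p for g, p in zip(gold, pred)) / n        # head + label
    complete = las == 1.0  # every relation in the sentence correct
    return uas, las, complete
```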


The completeness of the sentences improved as well. These improvements occurred because our proposed Extend1-2 and Chunk1-2 provided a more consistent annotation scheme than our Adapt annotation scheme. The Extend1-2 scheme accounted for compound prepositions (in Extend1) and for prepositions that introduce an adverb (in Extend2). The Chunk1-2 scheme handled transition words and date and time notations as mwe. However, Chunk1-2 did not help completeness under LAS, as some labels within a sentence were still assigned incorrectly.

4.2.    Evaluation of the Error Correction System

Table 2 presents the results of the preliminary experiment on detecting five error types. Under a two-tailed test with a confidence interval of 0.95, four error detection models (preposition, adjective, verb, and noun) trained on the data using syntactic information were significantly better than those trained on the data that did not use syntactic information. The results indicate that syntactic information was effective in Indonesian, which has a relatively free word order; this makes our dependency annotation scheme important for extracting such syntactic information. For adverbs, however, the results were not significantly different, because adverb errors were still detectable from the local context (adverbs are usually written not far from the related verb).


Table 2 Error detection results of five error types

(table values not recoverable from this extraction; columns: Error Categories, F1 score)

The dependency features significantly improve the F1 score (except for adverbs) by a two-tailed test with a confidence interval of 0.95.


For the preposition error corrections, Figure 3a shows that Learner worked best when the training size was less than 300 instances (see Figure 3b for more detail). However, Learner's F1 score stayed below 0.5 because of the lack of training data. Figure 3a shows that Native received the lowest score: it could not outperform Learner even when we increased its size to 150K instances. Native performed poorly because it carried no information about error types. Compared with Native, both sets of artificial training data worked better because the model could learn what was wrong in the artificial error sentences and correct those errors in real learner data. Further, Figure 3 shows that the artificial error data using syntactic features (wDRndArt) outperformed the other training data. Because extracting syntactic information requires a dependency annotation scheme, the wDRndArt results indicate that our dependency annotation scheme does indeed improve the preposition error correction task.


(a) Comparison of large training data

(b) Comparison of small training data

Figure 3 Comparison of preposition error correction results trained on different sizes of training data


Learner uses learner data as the training data. Native uses large amounts of native data as the training data, but cannot outperform Learner. nDRndArt and wDRndArt use artificial training data, with wDRndArt using dependency features and nDRndArt not. (a) nDRndArt outperforms Learner when the data are larger than 15K. The artificial training data using dependency features (wDRndArt) perform the best. (b) Learner performs the best when its size amounts to only 300 sentences.


In this study, a dependency annotation scheme was proposed for Indonesian by extending and chunking an adapted version of the SD annotation scheme. By training and testing an MST parser on our adapted and extended annotation schemes, we confirmed that extension and chunking increased the accuracy and completeness of the adapted annotation scheme. The results demonstrated that our annotations were useful in extracting dependency features for correcting preposition errors in real learner data. We further evaluated the annotation scheme to correct preposition errors using larger amounts of training data.

Our experimental results demonstrate that artificial training data using syntactic features extracted from dependency annotated sentences outperform data that do not make use of syntactic features. We plan to continue this work to improve our annotation scheme and extract more features to solve other NLP problems in Indonesian.


We would like to thank the anonymous reviewers, Erlyn Manguilimotan, and Prof. David Sell for their valuable discussions and comments. This study was supported in part by the DGHE, Republic of Indonesia, under the BPPLN scholarship, Batch 7, fiscal years 2012-2015.


Buchholz, S., Marsi, E., 2006. CoNLL-X Shared Task on Multilingual Dependency Parsing. In: Proceedings of the 10th Conference on Computational Natural Language Learning. Association for Computational Linguistics, Stroudsburg, USA, pp. 149–164

Cahill, A., Madnani, N., Tetreault, J., Napolitano, D., 2013. Robust Systems for Preposition Error Correction using Wikipedia Revisions. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Atlanta, Georgia, pp. 507–517

Dahlmeier, D., Ng, H.T., 2011. Grammatical Error Correction with Alternating Structure Optimization. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, Stroudsburg, USA, pp. 915–923

de Marneffe, M., Manning, C.D., 2008. Stanford Typed Dependencies Representation. In: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation. Association for Computational Linguistics, Stroudsburg, USA, pp. 1–8

Green, N., Larasati, S.D., Žabokrtský, Z., 2012. Indonesian Dependency Treebank: Annotation and Parsing. In: Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation. Faculty of Computer Science, Universitas Indonesia, Bali, Indonesia, pp. 137–145

Han, N., Tetreault, J., Lee, S., Ha, J., 2010. Using an Error-annotated Learner Corpus to Develop an ESL/EFL Error Correction System. In: Proceedings of the 7th International Conference on Language Resources and Evaluation. European Language Resources Association, Valletta, Malta, pp. 763–770

Irmawati, B., Komachi, M., Matsumoto, Y., 2016a. Towards Construction of an Error-corrected Corpus of Indonesian Second Language Learners. In: Francisco Alonso Almeida, Ivalla Ortega Barrera, Elena Quintana Toledo, Margarita Sánchez Cuervo (Eds), Input a Word, Analyse the World: Selected Approaches to Corpus Linguistics. Cambridge Scholars Publishing, Newcastle upon Tyne, United Kingdom, pp. 425–443

Irmawati, Budi, Shindo, Hiroyuki, Matsumoto, Yuji, 2016b. Exploiting Syntactic Similarities for Preposition Error Correction on Indonesian. In: Proceedings of the 5th Workshop on Spoken Language Technologies for Under Resource Languages. International Research Institute Multimedia, Information, Communication & Applications, Jogjakarta, Indonesia, pp. 214–220

Irmawati, Budi, Shindo, Hiroyuki, Matsumoto, Yuji, 2017. Generating Artificial Error Data for Indonesian Preposition Error Correction. International Journal of Technology, Volume 8(3), pp. 549–558

Kamayani, M., Purwarianti, A., 2011. Dependency Parsing for Indonesian. In: International Conference on Electrical Engineering and Informatics, pp. 1–5

Kübler, S., McDonald, R., Nivre, J., 2009. Dependency Parsing. Morgan & Claypool Publishers

Larasati, S.D., Kuboň, V., Zeman, D., 2011. Indonesian Morphology Tool (MorphInd): Towards an Indonesian Corpus. In: Proceedings of the 2nd International Workshop on Systems and Frameworks for Computational Morphology. Zurich, Switzerland, pp. 119–129

McDonald, R., Lerman, K., Pereira, F., 2006. Multilingual Dependency Analysis with a Two-stage Discriminative Parser. In: Proceedings of the 10th Conference on Computational Natural Language Learning. Association for Computational Linguistics, Stroudsburg, USA, pp. 216–220

Mizumoto, T., Komachi, M., Nagata, M., Matsumoto, Y., 2011. Mining Revision Log of Language Learning SNS for Automated Japanese Error Correction of Second Language Learners. In: Proceedings of the 5th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, Chiang Mai, Thailand, pp. 147–155

Nivre, J., de Marneffe, M., Ginter, F., Goldberg, Y., Hajič, J., Manning, C.D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R., Zeman, D., 2016. Universal Dependencies v1: A Multilingual Treebank Collection. In: Proceedings of the 10th International Conference on Language Resources and Evaluation. European Language Resources Association, Portorož, Slovenia, pp. 1659–1666

Nivre, J., Hall, J., Kübler, S., McDonald, R.T., Nilsson, J., Riedel, S., Yuret, D., 2007. The CoNLL 2007 Shared Task on Dependency Parsing. In: Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, Prague, Czech Republic, pp. 915–932

Quasthoff, U., Richter, M., Biemann, C., 2006. Corpus Portal for Search in Monolingual Corpora. In: Proceedings of the 5th International Conference on Language Resources and Evaluation. Genoa, Italy, pp. 1799–1802

Rozovskaya, A., Roth, D., 2010. Generating Confusion Sets for Context-sensitive Error Correction. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Stroudsburg, USA, pp. 961–970

Sneddon, J.N., Adelaar, A., Djenar, D.N., Ewing, M.C., 2010. Indonesian: A Comprehensive Grammar. Routledge, London, United Kingdom

Stack, M., 2005. Word Order and Intonation in Indonesian. In: Lexical Semantic Ontology Working Papers in Linguistics 5: Proceedings of Workshop in General Linguistics. Milan, Italy, pp. 168–182