Published: 31 October 2017
International Journal of Technology (IJTech), Vol. 8, No. 5 (2017)
DOI: https://doi.org/10.14716/ijtech.v8i5.878
Irmawati, B., Shindo, H., Matsumoto, Y., 2017. A Dependency Annotation Scheme to Extract Syntactic Features in Indonesian Sentences. International Journal of Technology, Volume 8(5), pp. 957-967
Budi Irmawati, Computational Linguistics Laboratory, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan; Informatics Department, University of Mataram, Mataram 83125, Indonesia
Hiroyuki Shindo, Computational Linguistics Laboratory, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
Yuji Matsumoto, Computational Linguistics Laboratory, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
In languages with fixed word orders, syntactic information is useful for solving natural language processing (NLP) problems. In a language such as Indonesian, however, which has a relatively free word order, the usefulness of syntactic information has yet to be determined. In this study, a dependency annotation scheme for extracting syntactic features from a sentence is proposed. This annotation scheme adapts the Stanford typed dependency (SD) annotation scheme to cope with such phenomena in the Indonesian language as ellipses, clitics, and non-verb clauses. This adapted annotation scheme is then extended in response to its inability to avoid certain ambiguities in assigning heads and relations. The accuracy of these two annotation schemes is compared, and the usefulness of the extended annotation scheme is assessed in a preposition error correction task using the syntactic features extracted from dependency-annotated sentences. The experimental results indicate that the extended annotation scheme improves the accuracy of a dependency parser, and the error correction task demonstrates that training data using syntactic features yield better corrections than training data that do not, thus lending a positive answer to the research question.
Keywords: Dependency annotation; Dependency relation; Error correction; Indonesian language; Syntactic information
In many natural
language processing (NLP) tasks, along with part-of-speech (PoS) tagging,
parsing[1]
plays an important role in preprocessing (Buchholz & Marsi, 2006). Kübler
et al. (2009) state that dependency grammar is useful in languages with a free
or a relatively free word order. However, based on the 2007 Conference on
Computational Natural Language Learning (CoNLL) Shared Task on Dependency
Parsing’s results, fixed word order languages such as English and Catalan score
highest in accuracy, while free word order languages such as Arabic and Basque
are less accurate (Nivre et al., 2007).
Indonesian is rich
in morphology and has a relatively free word order compared to English (Stack,
2005). It has yet to be confirmed, however, whether syntactic information
extracted from dependency relations is useful for NLP tasks in the case of
Indonesian. Addressing this question in the present study required building our
own dependency parser, as previous
Indonesian dependency
parsers (Kamayani & Purwarianti, 2011; Green et al., 2012) and their
dependency annotation schemes cannot be accessed freely.
To build the parser, we developed a dependency annotation scheme for Indonesian
and trained an automatic parser. To start, we adapted the Stanford typed
dependency (SD) manual (de Marneffe & Manning, 2008) to accommodate
relations not covered by the original SD annotation scheme (explained in Subsection
2.1). In doing so, we found a universal dependency annotation scheme (Nivre et
al., 2016) that covered Indonesian. However, this scheme did not include the
morphological features and stem words relevant to the language, nor did it
handle date and time notations (explained in item d of Subsection 2.2.2) or
apply language-specific relations as it does for Danish and Finnish. Moreover, unlike its treatment of French, the scheme did not split off clitics in the case of Indonesian. As our chosen scheme could not avoid ambiguities in assigning heads and relations, we extended it.
In this experiment, we calculated the accuracy of our annotation scheme
before and after extending it. We also evaluated the effectiveness of our
proposed annotation scheme in correcting preposition errors made by second
language (L2) learners in real learner data (Irmawati et al., 2016a), explained
in item c of Subsection 3.1. We also used this annotation scheme to extract
syntactic information from Indonesian sentences and generate more sophisticated
artificial error data (Irmawati et al., 2016b; Irmawati et al., 2017).
This dependency annotation scheme is explained in Section 2, which first describes the relevant phenomena in Indonesian and then presents the proposed annotation scheme. Section 3 describes the experimental setting for assessing the extension and the effectiveness of the annotation scheme in a real implementation scenario. Section 4 presents the results of the experiment and provides a discussion.
[1] Parsing is a formal analysis that splits a sentence into its constituents. It produces a parse tree showing the relations between the constituents.
3.1. Language Tools and
Resources
The following language tools and resources were used:
a. MorphInd: an Indonesian morphological analyzer[1] (Larasati et al., 2011) that provides lemmatization information and handles derivation and inflection within a word.
b. Minimum Spanning Tree (MST) parser (McDonald et al., 2006): this
parser served to evaluate the annotation scheme and to parse native sentences
to be used as baseline training data for a real implementation (an error
correction model).
c. Learner data: error-corrected learner sentence pairs drawn from the lang-8 website[2] (Mizumoto et al., 2011). The data contained 6,488 pairs of learners’ sentences and 77,201 tokens. On lang-8, L2 learners write journal entries and native speakers correct their sentences. In this study, after PoS-tagging the learners’ sentences and their corrected versions, we automatically aligned each learner sentence with its correction using a heuristic rule that points out the location and type of each error (Irmawati et al., 2016a); a minimal sketch of such an alignment appears after this list. We used some of the corrected sentences as training data and others as test data to evaluate the annotation scheme. We then used the learners’ sentences as the test data for the error correction model.
d. Native data: 1 million newspaper sentences taken from the Indonesian part of the Leipzig Corpora Collection (Quasthoff et al., 2006) served as the baseline training data for the error correction model.
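As mentioned in item c above, a minimal sketch of the learner-correction alignment idea is given here. It is only an illustration, not the actual heuristic rule of Irmawati et al. (2016a); Python’s difflib stands in for the aligner, and the example sentences are hypothetical.

```python
# Sketch: align a learner sentence with its correction to locate edit sites.
# difflib stands in for the heuristic alignment rule described in the text.
import difflib

def align_tokens(learner_tokens, corrected_tokens):
    """Return (operation, learner_span, corrected_span) triples for each edit."""
    matcher = difflib.SequenceMatcher(a=learner_tokens, b=corrected_tokens)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # 'replace', 'delete', or 'insert' marks an error site
            edits.append((op, learner_tokens[i1:i2], corrected_tokens[j1:j2]))
    return edits

learner = "Saya pergi di sekolah".split()    # hypothetical learner sentence
corrected = "Saya pergi ke sekolah".split()  # its native correction
print(align_tokens(learner, corrected))      # [('replace', ['di'], ['ke'])]
```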
3.2. Pre-processing
To extract the necessary syntactic information, we first tagged all sentences using MorphInd. Next, we ran a rule-based script to check whether a word had a clitic; if so, the clitic was split from its host word and assigned an appropriate PoS tag. The rule-based script also identified adverbs with the suffix -nya, which were not to be split. We then formatted the data in the CoNLL format (Buchholz & Marsi, 2006).
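As an illustration of the clitic-handling rule described above, the sketch below splits a trailing clitic from its host word while leaving -nya adverbs intact. The clitic inventory, the PoS tag assigned to the clitic, and the adverb list are simplifying assumptions, not the paper’s actual rules.

```python
# Sketch of the rule-based clitic splitting step (simplified assumptions).
ENCLITICS = {"nya": "PRP", "ku": "PRP", "mu": "PRP"}  # clitic -> assumed PoS tag
ADVERB_NYA = {"biasanya", "sebenarnya", "akhirnya"}    # -nya adverbs left unsplit

def split_clitic(token):
    """Return (form, tag) pairs; tag is None when the host keeps its own tag."""
    if token.lower() in ADVERB_NYA:
        return [(token, "ADV")]                        # adverb in -nya: do not split
    for clitic, tag in ENCLITICS.items():
        if token.lower().endswith(clitic) and len(token) > len(clitic):
            return [(token[: -len(clitic)], None), (clitic, tag)]
    return [(token, None)]

print(split_clitic("bukunya"))   # [('buku', None), ('nya', 'PRP')]
print(split_clitic("biasanya"))  # [('biasanya', 'ADV')]
```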
The final step was to manually annotate the 1,032 corrected learners’ sentences with dependency relations. We then set aside 100 sentences as test data. To enable syntactic information extraction, we trained the MST parser on the remaining annotated sentences. The trained parser then automatically assigned dependency relations to the native sentences and the learners’ sentences. However, to build gold test data annotated with both error tags and dependency relations, we manually corrected the dependency relations that the trained parser had assigned to the learners’ sentences. We extracted syntactic features such as the head and modifiers of a target word, as well as the PoS tags of its head and its modifiers.
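The dependency features mentioned above can be read directly off the CoNLL-formatted parses. The following minimal sketch assumes the CoNLL-X column order (ID, FORM, LEMMA, CPOS, POS, FEATS, HEAD, DEPREL, ...); the function name and output keys are ours, not the paper’s.

```python
# Sketch: extract the head, modifiers, and their PoS tags for a target token
# from one sentence in CoNLL-X format (each row is a list of column values).
def syntactic_features(conll_rows, target_id):
    by_id = {int(r[0]): r for r in conll_rows}
    target = by_id[target_id]
    head_id = int(target[6])                        # column 7: HEAD
    head = by_id.get(head_id)                       # None when the head is the root
    modifiers = [r for r in conll_rows if int(r[6]) == target_id]
    return {
        "head": head[1] if head else "ROOT",        # column 2: FORM
        "head_pos": head[4] if head else "ROOT",    # column 5: POSTAG
        "modifiers": [r[1] for r in modifiers],
        "modifier_pos": [r[4] for r in modifiers],
    }
```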
3.3. Experiment Setting
The goal of these experiments was to evaluate the effectiveness of our defined annotation schemes. First, we evaluated the accuracy of the adapted and extended schemes. We did not compare our annotation schemes with the universal dependency annotation scheme because the data and the annotation labels differ and are not directly comparable; the conversions needed for such a comparison are left for future research. Second, we evaluated the extent to which our annotation schemes improved an error correction model. For this evaluation, we compared the model generated from data using the syntactic features with one that did not use those features, and we tested both on real learner data.
3.3.1. Evaluation of dependency annotation
To evaluate whether the extensions and chunking improved the annotation scheme, we trained and tested an MST parser on the sentences annotated with the following three annotation schemes:
a. Adapt: This scheme
followed Adapt-1 to Adapt-7 (Subsection 2.2.1). It was the
baseline of our evaluation.
b. Extend1-2: This scheme followed the proposed Extend-1 and Extend-2.
c. Chunk1-2: This scheme followed
Chunk-1 and Chunk-2. In this scheme, we chunked the sentences before assigning
heads and labeling the relations.
3.3.2. Evaluation of the error correction task
Before performing an error correction task on a large amount of training data, we conducted a preliminary experiment to evaluate the extended annotation scheme. Two feature sets were compared: (1) a feature set that does not use syntactic information (nSynFeat); and (2) a feature set that uses syntactic information (synFeat). The experiment detected five error types (preposition, adjective, adverb, verb, and noun errors) in 550 learners’ sentences manually annotated with error types, using five-fold cross-validation.
To obtain a reliable conclusion, we selected for study the 13 prepositions whose error frequencies in the learner data were above five. We focused on preposition errors because the number of candidate corrections is small compared to other error types; this allowed us to obtain sufficient error samples for each preposition. From all the learner data, we obtained 382 learners’ sentences containing at least one preposition error (Learner) as the test data.
For the training data, we employed the native sentences (Native) and generated artificial error sentences. We constructed the artificial error sentences by randomly injecting preposition errors, which allowed us to obtain a large number of erroneous sentences easily. We then generated training data that did not use the syntactic features (nDRndArt), as well as data that did (wDRndArt).
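A minimal sketch of the random injection step is given below. The preposition list is only a partial, assumed confusion set and the injection rate is arbitrary; the paper’s actual generation procedure may differ.

```python
# Sketch: inject an artificial preposition error into a native sentence.
import random

TARGET_PREPOSITIONS = ["di", "ke", "dari", "pada", "dengan", "untuk"]  # partial, assumed

def inject_preposition_error(tokens, error_rate=0.3, rng=random):
    """Randomly replace one target preposition with a different one."""
    candidates = [i for i, t in enumerate(tokens) if t.lower() in TARGET_PREPOSITIONS]
    if not candidates or rng.random() > error_rate:
        return tokens, None                          # sentence left unchanged
    i = rng.choice(candidates)
    original = tokens[i]
    confusions = [p for p in TARGET_PREPOSITIONS if p != original.lower()]
    return tokens[:i] + [rng.choice(confusions)] + tokens[i + 1:], (i, original)
```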
To build the error correction model, we trained a Naïve Bayes classifier. We used the one-versus-all approach (Rozovskaya & Roth, 2010) to perform multi-class classification. For M target prepositions, we used the feature vectors of a given preposition p as positive examples and the feature vectors of the remaining M-1 prepositions as negative examples. We then used the confidence score obtained from each classifier to rank the candidate corrections.
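The one-versus-all setup can be sketched as follows, using scikit-learn’s multinomial Naive Bayes as a stand-in for the classifier described above. The feature matrix X and label vector y are hypothetical, and the sketch assumes every target preposition has at least one positive training example.

```python
# Sketch: one binary Naive Bayes classifier per preposition; candidates are
# ranked by each classifier's confidence score, as described in the text.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def train_one_vs_all(X, y, prepositions):
    models = {}
    for p in prepositions:                  # positives: p; negatives: the other M-1
        binary_y = (np.asarray(y) == p).astype(int)
        models[p] = MultinomialNB().fit(X, binary_y)
    return models

def rank_candidates(models, x):
    """Rank candidate corrections by confidence for one feature vector x."""
    scores = {p: m.predict_proba(x.reshape(1, -1))[0][1] for p, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)
```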
To train our model, we employed context word features in a ±2 window, a bigram, a trigram, PoS n-grams, the head and object of the preposition, and the PoS tags of the head and object of the preposition. For the native training data, we dropped features that contained the target preposition and, for nDRndArt, we dropped all dependency features from the training data.
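The feature template above can be summarized with the following sketch; the feature names are illustrative rather than the paper’s exact labels, and the dependency arguments (head, object, and their PoS tags) are the ones dropped for nDRndArt.

```python
# Sketch: build the feature dictionary for one preposition occurrence at index i.
def preposition_features(tokens, pos_tags, i, head, head_pos, obj, obj_pos):
    feats = {}
    for offset in (-2, -1, 1, 2):                        # context words in a +/-2 window
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"w[{offset}]"] = tokens[j]
    feats["bigram"] = " ".join(tokens[max(0, i - 1): i + 1])
    feats["trigram"] = " ".join(tokens[max(0, i - 1): i + 2])
    feats["pos_trigram"] = " ".join(pos_tags[max(0, i - 1): i + 2])
    feats["head"], feats["head_pos"] = head, head_pos    # dependency features
    feats["obj"], feats["obj_pos"] = obj, obj_pos        # dropped for nDRndArt
    return feats
```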
For Learner, we trained and tested the model
using five-fold cross-validation. We then compared the model trained on Native, nDRndArt, and wDRndArt
with the model trained on Learner. We
checked whether Learner performed
better as training data as had been claimed in previous works on English
prepositions (Cahill et al., 2013; Han et al., 2010).
Then, to assess this task, we followed the evaluation metric proposed by Dahlmeier and Ng (2011), defined as

F1 = (2 × precision × recall) / (precision + recall)

where precision is the number of correct prepositions proposed by the system divided by the number of corrections given by the system, and recall is the number of correct prepositions proposed by the system divided by the number of preposition errors.
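As a worked illustration of the metric, the following small function computes precision, recall, and F1 from the counts defined above.

```python
# Sketch: precision, recall, and F1 from raw counts.
def f1_metrics(n_correct, n_proposed, n_errors):
    precision = n_correct / n_proposed if n_proposed else 0.0
    recall = n_correct / n_errors if n_errors else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```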
4.1. Evaluation of the Dependency Annotation
Table 1 reports the parsing accuracy obtained with the three annotation schemes evaluated in this experiment. It shows that Extend1-2 and Chunk1-2 improved both the unlabeled attachment score (UAS) and the labeled attachment score (LAS) compared to the baseline Adapt.
Table 1 Dependency parser accuracy of three annotation schemes

| Annotation | Accuracy (UAS) | Accuracy (LAS) | Complete (UAS) | Complete (LAS) |
|------------|----------------|----------------|----------------|----------------|
| Adapt      | 0.646          | 0.576          | 0.217          | 0.130          |
| Extend1-2  | 0.788          | 0.707          | 0.357          | 0.214          |
| Chunk1-2   | 0.812          | 0.731          | 0.394          | 0.214          |

Complete: % of sentences whose relations are all correct. Unlabeled Attachment Score (UAS): % of tokens with a correct head. Labeled Attachment Score (LAS): % of tokens with a correct dependency relation.
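For reference, the UAS, LAS, and completeness figures in Table 1 follow the definitions in the table notes; a minimal sketch of the computation (our own illustration, not the evaluation script used in the experiments) is:

```python
# Sketch: UAS, LAS, and sentence completeness over a list of sentences, where each
# sentence is a list of (gold_head, gold_rel, pred_head, pred_rel) tuples per token.
def parser_scores(sentences):
    tokens = uas = las = 0
    complete_uas = complete_las = 0
    for sent in sentences:
        heads_ok = labels_ok = True
        for g_head, g_rel, p_head, p_rel in sent:
            tokens += 1
            if g_head == p_head:
                uas += 1
                if g_rel == p_rel:
                    las += 1
                else:
                    labels_ok = False
            else:
                heads_ok = labels_ok = False
        complete_uas += heads_ok
        complete_las += labels_ok
    n = len(sentences)
    return uas / tokens, las / tokens, complete_uas / n, complete_las / n
```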
The completeness of the sentences improved as well. These improvements occurred because our proposed Extend1-2 and Chunk1-2 provided a more consistent annotation scheme than our Adapt annotation scheme. The Extend1-2 scheme accounted for compound prepositions (in Extend1) and for prepositions that function to introduce an adverb (in Extend2). The Chunk1-2 scheme handled transition words and date and time notations as mwe. However, Chunk1-2 did not help the completeness of the LAS, as some labels within a sentence were still not assigned correctly.
4.2. Evaluation of the Error Correction System
Table 2 presents the results of the preliminary experiment on detecting five error types. Using a two-tailed test with a confidence interval of 0.95, the four error detection models for prepositions, adjectives, verbs, and nouns trained on the data using syntactic information were significantly better than those trained on the data that did not use syntactic information. The results indicate that syntactic information is effective for Indonesian, which has a relatively free word order; this makes our dependency annotation scheme important for extracting such syntactic information. For adverbs, however, the results were not significantly different, because adverb errors are still detectable from the local context (adverbs are usually written not far from the related verb).
Table 2 Error detection results of five error types

| Error Category | F1 score (nSynFeat) | F1 score (synFeat) |
|----------------|---------------------|--------------------|
| Preposition    | 0.135               | 0.372              |
| Adjective      | 0.292               | 0.338              |
| Adverb         | 0.351               | 0.357              |
| Verb           | 0.471               | 0.584              |
| Noun           | 0.093               | 0.132              |

The dependency features significantly improve the F1 score (except for adverbs) by a two-tailed test with a confidence interval of 0.95.
For the preposition error corrections, Figure 3a shows
that Learner worked best when the
training size was less than 300 instances (see Figure 3b for more detail).
However, Learner’s F1 score
was below 0.5 because of a lack of training data. Figure 3a shows that Native received the lowest score. It was
unable to outperform Learner even
when we increased its size to 150K instances. Native performed poorly because it did not include information
about error types. If we compare Native
with the two sets of artificial training data, both sets of data worked better
because the model could learn what was mistaken in the artificial error
sentences and correct those errors in real learner data. Further, Figure 3
shows that the artificial error data using syntactic features (wDRndArt) outperformed the other
training data. Because extracting syntactic information requires a dependency
annotation scheme, the wDRndArt’s
results indicate that our dependency annotation scheme does indeed improve the
preposition error correction task.
Figure 3 Comparison of preposition error correction results trained on different sizes of training data: (a) comparison of large training data; (b) comparison of small training data
Learner uses the learner data as the training data. Native uses large amounts of native data as the training data but cannot outperform Learner. nDRndArt and wDRndArt use artificial training data, with wDRndArt using dependency features and nDRndArt not. (a) nDRndArt outperforms Learner when the data are larger than 15K instances. The artificial training data using dependency features (wDRndArt) performs the best. (b) Learner performs the best when its size amounts to only 300 sentences.
In
this study, a dependency annotation scheme was proposed for Indonesian by
extending and chunking an adapted version of the SD annotation scheme. By
training and testing an MST parser on our adapted and extended annotation
schemes, we confirmed that extension and chunking increased the accuracy and
completeness of the adapted annotation scheme. The results demonstrated that
our annotations were useful in extracting dependency features for correcting
preposition errors in real learner data. We further evaluated the annotation
scheme to correct preposition errors using larger amounts of training data.
Our
experimental results demonstrate that artificial training data using syntactic
features extracted from dependency-annotated sentences outperform data that do not
not make use of syntactic features. We plan to continue this work to improve
our annotation scheme and extract more features to solve other NLP problems in
Indonesian.
We would like to thank the anonymous reviewers, Erlyn Manguilimotan, and Prof. David Sell for their valuable discussion and comments. This study was supported in part by the DGHE, Republic of Indonesia, under the BPPLN scholarship, Batch 7, fiscal years 2012-2015.
Buchholz,
S., Marsi, E., 2006. CoNLL-X Shared Task on Multilingual Dependency Parsing. In: Proceedings of the 10th
Conference on Computational Natural Language Learning. Association for
Computational Linguistics, Stroudsburg, USA, pp. 149–164
Cahill,
A., Madnani, N., Tetreault, J., Napolitano, D., 2013. Robust Systems for
Preposition Error Correction using
Wikipedia Revisions. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Atlanta, Georgia, pp. 507–517
Dahlmeier,
D., Ng, H.T., 2011. Grammatical Error Correction with Alternating Structure
Optimization. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, Stroudsburg, USA, pp. 915–923
de
Marneffe, M., Manning, C.D., 2008. Stanford Typed Dependencies Representation. In: Proceedings of the Workshop on
Cross-Framework and Cross-Domain Parser Evaluation. Association for
Computational Linguistics, Stroudsburg, USA, pp. 1–8
Green, N., Larasati, S.D., Žabokrtský, Z., 2012. Indonesian Dependency Treebank: Annotation and Parsing. In: Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation. Faculty of Computer Science, Universitas Indonesia, Bali, Indonesia, pp. 137–145
Han, N., Tetreault, J., Lee, S., Ha, J., 2010. Using an Error-annotated Learner Corpus to Develop an ESL/EFL Error Correction System. In: Proceedings of the 7th International Conference on Language Resources and Evaluation. European Language Resources Association, Valletta, Malta, pp. 763–770
Irmawati,
B., Komachi, M., Matsumoto, Y., 2016a. Towards Construction of an
Error-corrected Corpus of Indonesian Second Language Learners. In: Francisco Alonso Almeida, Ivalla
Ortega Barrera, Elena Quintana Toledo, Margarita Sánchez Cuervo (Eds), Input a
Word, Analyse the World: Selected Approaches to Corpus Linguistics. Cambridge
Scholars Publishing, Newcastle upon Tyne, United Kingdom, pp. 425–443
Irmawati, B., Shindo, H., Matsumoto, Y., 2016b. Exploiting Syntactic Similarities for Preposition Error Correction on Indonesian. In: Proceedings of the 5th Workshop on Spoken Language Technologies for Under-resourced Languages. International Research Institute Multimedia, Information, Communication & Applications, Jogjakarta, Indonesia, pp. 214–220
Irmawati, B., Shindo, H., Matsumoto, Y., 2017. Generating Artificial Error Data for Indonesian Preposition Error Correction. International Journal of Technology, Volume 8(3), pp. 549–558
Kamayani,
M., Purwarianti, A., 2011. Dependency Parsing for Indonesian. In: International Conference on
Electrical Engineering and Informatics, pp. 1–5
Kübler,
S., McDonald, R., Nivre, J., 2009. Dependency
Parsing. Morgan & Claypool Publishers
Larasati, S.D., Kuboň, V., Zeman, D., 2011. Indonesian Morphology Tool (MorphInd): Towards an Indonesian Corpus. In: Proceedings of the 2nd International Workshop on Systems and Frameworks for Computational Morphology. Zurich, Switzerland, pp. 119–129
McDonald,
R., Lerman, K., Pereira, F., 2006. Multilingual Dependency Analysis with a
Two-stage Discriminative Parser. In: Proceedings
of the 10th Conference on Computational Natural Language Learning.
Association for Computational Linguistics, Stroudsburg, USA, pp. 216–220
Mizumoto,
T., Komachi, M., Nagata, M., Matsumoto, Y., 2011. Mining Revision Log of
Language Learning SNS for Automated Japanese Error Correction of Second
Language Learners. In: Proceedings of
the 5th International Joint Conference on Natural Language
Processing. Asian Federation of Natural Language Processing, Chiang Mai,
Thailand, pp. 147–155
Nivre, J., de Marneffe, M., Ginter, F., Goldberg, Y., Hajič, J., Manning, C.D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R., Zeman, D., 2016. Universal Dependencies v1: A Multilingual Treebank Collection. In: Proceedings of the 10th International Conference on Language Resources and Evaluation. European Language Resources Association, Portorož, Slovenia, pp. 1659–1666
Nivre,
J., Hall, J., Kübler, S., McDonald, R.T., Nilsson, J., Riedel, S., Yuret, D.,
2007. The Conference on Computational Natural Language Learning 2007 Shared
Task on Dependency Parsing. In: Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, Prague, Czech Republic, pp. 915–932
Quasthoff, U., Richter, M., Biemann, C., 2006. Corpus Portal for Search in Monolingual Corpora. In: Proceedings of the 5th International Conference on Language Resources and Evaluation. Genoa, Italy, pp. 1799–1802
Rozovskaya,
A., Roth, D., 2010. Generating Confusion Sets for Context-sensitive Error
Correction. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Stroudsburg, USA, pp. 961–970
Sneddon,
J.N., Adelaar, A., Djenar, D.N., Ewing, M.C., 2010. Indonesian: A Comprehensive Grammar. Routledge,
London, United Kingdom
Stack,
M., 2005. Word Order and Intonation in Indonesian. In: LSO Working Papers in Linguistics 5: Proceedings of the Workshop in General Linguistics. Madison, Wisconsin, USA, pp. 168–182