Published at: 21 Apr 2020
Journal: IJtech
Volume: Vol 11, No 2 (2020)
DOI: https://doi.org/10.14716/ijtech.v11i2.3678
Risanuri Hidayat | Department of Electrical Engineering and Information Technology, Faculty of Engineering, Universitas Gadjah Mada, Jl. Grafika No. 2, Yogyakarta 55281, Indonesia
Anggun Winursito | Department of Electrical Engineering and Information Technology, Faculty of Engineering, Universitas Gadjah Mada, Jl. Grafika No. 2, Yogyakarta 55281, Indonesia
Speech recognition systems are being continually developed and improved, especially in terms of accuracy. This study developed an isolated word recognition system that uses the number of syllables in the speech signal to be recognized. First, the syllable count of the input speech signal was detected; the detection result was then used to select the database group that matched that syllable count. This design was intended to reduce the possibility of error in the matching process between the test data features and the database features. This study used Mel frequency cepstral coefficients (MFCC) for feature extraction and the K-nearest neighbor (KNN) method for classification. Three versions of the proposed method were designed. The results showed that version three increased accuracy by 4% compared with the conventional recognition system and had the shortest computation time of the methods compared. The addition of the syllable detection algorithm in version three increased the computation time by only 0.151 s relative to the conventional MFCC method. The data cut length and the threshold value of the filter also influenced the accuracy of the speech recognition system.
Isolated word; K-nearest neighbor (KNN); Mel frequency cepstral coefficients (MFCC); Number of syllables; Speech recognition
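For concreteness, the following is a minimal sketch (an illustration, not the authors' implementation) of the pipeline described in the abstract: the detected syllable count selects which reference database group is searched, and the match itself is made on MFCC features with a KNN classifier. It assumes librosa and scikit-learn are available; `count_syllables` is a stand-in for the paper's threshold-based detector (one possible realization is sketched after the conclusions).

```python
# Minimal sketch of the syllable-gated recognition pipeline described above
# (an illustration, not the authors' code). Assumes librosa and scikit-learn;
# count_syllables is a placeholder for the paper's detector.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier


def mfcc_features(signal, sr, n_mfcc=13):
    """Frame-wise MFCCs averaged into a single fixed-length feature vector."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)


def recognize(signal, sr, db_by_syllables, count_syllables, k=3):
    """db_by_syllables maps a syllable count (e.g. 1 or 2) to (features, labels)."""
    n_syl = count_syllables(signal, sr)       # detect the syllable count first
    feats, labels = db_by_syllables[n_syl]    # use only the matching database group
    knn = KNeighborsClassifier(n_neighbors=k).fit(feats, labels)
    return knn.predict(mfcc_features(signal, sr).reshape(1, -1))[0]
```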
Technology plays a crucial role in daily human life and is developing at a rapid rate. One such technology is speech recognition, which is widely used in applications such as mobile phones, home security systems, and global positioning systems; related techniques have even been applied to recognize drone sounds (Shi et al., 2018). Studies on speech recognition systems are continually improving their recognition results. A speech recognition system comprises several main stages, including feature extraction and classification, to identify speech patterns (Dahake et al., 2016). The feature extraction process obtains the characteristics of a sound frame, and the classification process chooses a word by analyzing the extracted features (Jo et al., 2016). Mel frequency cepstral coefficients (MFCC) are widely used for feature extraction (Adiwijaya et al., 2017; Vijayan et al., 2017; Hidayat et al., 2018; Kumar et al., 2018; Marlina et al., 2018; Winursito et al., 2018; Li et al., 2020). Although MFCCs are widely used in speech recognition systems, they still require further development, especially in terms of accuracy (Winursito et al., 2018).
Several studies have tried to improve the performance of the MFCC method. One study improved MFCC by adding delta coefficients (Hossan et al., 2010) and compared this MFCC + delta method with the ordinary MFCC method; the results indicated that the added delta coefficients improved the accuracy of the speech recognition system. Another study (Hidayat et al., 2018) added a wavelet-transform-based noise reduction stage, because MFCC features are quite susceptible to noise in the input sound, which degrades recognition accuracy. Other studies used wavelets and a psychoacoustic model for speech compression (Gunjal and Raut, 2015). A study on noise removal in speech signals (Tomchuk, 2018) aimed at high recognition accuracy for signals both with and without noise. A recent speech recognition system added a data compression step (Winursito et al., 2018), compressing the full set of MFCC feature outputs with principal component analysis (PCA); the compression removed unnecessary data and retained only the important data. That study reported increased accuracy at the cost of increased computational time.
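As a rough illustration of two of these refinements, the sketch below (an assumption about implementation details, not the cited code) appends first-order delta coefficients to the MFCCs, as in Hossan et al. (2010), and then compresses the resulting feature matrix with PCA, as in Winursito et al. (2018). The file name `word.wav` is hypothetical.

```python
# Illustrative sketch of MFCC + delta features followed by PCA compression,
# using librosa and scikit-learn (not the cited implementations).
import numpy as np
import librosa
from sklearn.decomposition import PCA

y, sr = librosa.load("word.wav", sr=None)           # hypothetical input file

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
delta = librosa.feature.delta(mfcc)                 # first-order delta coefficients
features = np.vstack([mfcc, delta]).T               # shape: (n_frames, 26)

# Keep only the principal components explaining 95% of the variance,
# discarding the less informative part of the feature data.
compressed = PCA(n_components=0.95).fit_transform(features)
print(features.shape, "->", compressed.shape)
```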
Developing the speech recognition system with syllable-count characteristics improved recognition accuracy. Version three of the proposed method improved accuracy by 4% compared with the conventional MFCC method. The method was built by dividing the reference database into two parts according to the number of syllables. In the proposed method, overall accuracy depends strongly on the accuracy of the syllable-count detection: if the system detects the syllable count incorrectly, the classification process uses the wrong database group and the word is misrecognized. The data cut length and threshold values also affected recognition accuracy. Version three had the shortest computation time of the methods compared; adding the syllable detection algorithm to version three increased the computation time by only 0.151 s relative to the conventional MFCC method.
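Since the exact detector is not reproduced here, the following is only one possible realization, under the assumption that syllables are counted from the short-time energy envelope: regions where the normalized energy rises above a threshold are counted as syllables. The threshold value and how much of the signal is kept (the data cut length) are exactly the tuning parameters whose influence on accuracy is noted above.

```python
# A hedged sketch of threshold-based syllable counting (an assumption about
# the detection approach, not the paper's algorithm). Counts the regions
# where the normalized short-time energy rises above a threshold.
import numpy as np


def count_syllables(signal, sr, frame_ms=25, hop_ms=10, threshold=0.15):
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    # Short-time energy envelope (assumes the signal is longer than one frame).
    energy = np.array([np.sum(signal[i:i + frame] ** 2)
                       for i in range(0, len(signal) - frame, hop)])
    energy /= energy.max() + 1e-12
    above = energy > threshold
    # Each rising edge across the threshold is taken as one syllable.
    rising = np.logical_and(above[1:], ~above[:-1])
    return int(rising.sum()) + int(above[0])
```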
| Filename | Description |
|---|---|
| R1-EECE-3678-20200330204616.JPG | Figure 1 and Figure 2 |
| R1-EECE-3678-20200330204651.JPG | Figure 3 |
| R1-EECE-3678-20200330204715.JPG | Figure 4 |
| R1-EECE-3678-20200330204745.JPG | Figure 5 |
Adiwijaya, A., Aulia, M.N., Mubarok, M.S., Novia, W.U., Nhita, F., 2017. A Comparative Study of MFCC-KNN and LPC-KNN for Hijaiyyah Letters Pronunciation Classification System. In: IEEE Fifth International Conference on Information and Communication Technology (ICoICT), pp. 1–5

Banaeeyan, R., Karim, H.A., Lye, H., Fauzi, M.F.A., Mansor, S., See, J., 2019. Acoustic Pornography Recognition using Fused Pitch and Mel-Frequency Cepstrum Coefficients. International Journal of Technology, Volume 10(7), pp. 1335–1343

Can, B., Artuner, H., 2013. A Syllable-based Turkish Speech Recognition System by using Time Delay Neural Networks (TDNNs). In: International Conference on Soft Computing and Pattern Recognition (SoCPaR), Hanoi, Vietnam, pp. 219–224

Dahake, P.P., Shaw, K., Malathi, P., 2016. Speaker Dependent Speech Emotion Recognition Using MFCC and Support Vector Machine. In: IEEE International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT), pp. 1080–1084

Enriko, I.K.A., Suryanegara, M., Gunawan, D., 2016. Heart Disease Prediction System using k-Nearest Neighbor Algorithm with Simplified Patient's Health Parameters. Journal of Telecommunication, Electronic and Computer Engineering, Volume 8(12), pp. 59–65

Gunjal, S., Raut, R., 2015. Traditional Psychoacoustic Model and Daubechies Wavelets for Enhanced Speech Coder Performance. International Journal of Technology, Volume 6(2), pp. 190–197

Hidayat, R., Bejo, A., Sumaryono, S., Winursito, A., 2018. Denoising Speech for MFCC Feature Extraction using Wavelet Transformation in Speech Recognition System. In: IEEE 10th International Conference on Information Technology and Electrical Engineering (ICITEE), Kuta, pp. 280–284

Hossan, Md.A., Memon, S., Gregory, M.A., 2010. A Novel Approach for MFCC Feature Extraction. In: IEEE 4th International Conference on Signal Processing and Communication Systems, Gold Coast, Australia, pp. 1–5

Jo, J., Yoo, H., Park, I.-C., 2016. Energy-Efficient Floating-Point MFCC Extraction Architecture for Speech Recognition Systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 24(2), pp. 754–758

Kristomo, D., Hidayat, R., Soesanti, I., 2017. Classification of the Syllables Sound using Wavelet, Renyi Entropy and AR-PSD Features. In: IEEE 13th International Colloquium on Signal Processing & Its Applications (CSPA), Penang, Malaysia, pp. 94–99

Kumar, C., ur Rehman, F., Kumar, S., Mehmood, A., Shabir, G., 2018. Analysis of MFCC and BFCC in a Speaker Identification System. In: International Conference on Computing, Mathematics and Engineering Technologies (ICoMET), Sukkur, pp. 1–5

Li, Q., Yang, Y., Lan, T., Zhu, H., Wei, Q., Qiao, F., Liu, X., Yang, H., 2020. MSP-MFCC: Energy-Efficient MFCC Feature Extraction Method with Mixed-Signal Processing Architecture for Wearable Speech Recognition Applications. IEEE Access, Volume 8

Marlina, L., Wardoyo, C., Sanjaya, W.S.M., Anggraeni, D., Dewi, S.F., Roziqin, A., Maryanti, S., 2018. Makhraj Recognition of Hijaiyah Letter for Children based on Mel-Frequency Cepstrum Coefficients (MFCC) and Support Vector Machines (SVM) Method. In: IEEE International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, pp. 935–940

Masood, S., Mehta, M., Namrata, Rizvi, D.R., 2015. Isolated Word Recognition using Neural Network. In: Annual IEEE India Conference (INDICON), New Delhi, India, pp. 1–5

Mufarroha, F.A., Utaminingrum, F., 2017. Hand Gesture Recognition using Adaptive Network Based Fuzzy Inference System and K-Nearest Neighbor. International Journal of Technology, Volume 8(3), pp. 559–567

Raczynski, M., 2018. Speech Processing Algorithm for Isolated Words Recognition. In: IEEE International Interdisciplinary PhD Workshop (IIPhDW), Świnoujście, pp. 27–31

Sawant, S., Deshpande, M., 2018. Isolated Spoken Marathi Words Recognition using HMM. In: IEEE Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, pp. 1–4

Shi, L., Ahmad, I., He, Y., Chang, K., 2018. Hidden Markov Model Based Drone Sound Recognition using MFCC Technique in Practical Noisy Environments. Journal of Communication and Network, Volume 20, pp. 509–518

Soe, W., Theins, Y., 2015. Syllable-based Myanmar Language Model for Speech Recognition. In: IEEE/ACIS 14th International Conference on Computer and Information Science (ICIS), Las Vegas, NV, USA, pp. 291–296

Tomchuk, K.K., 2018. Spectral Masking in MFCC Calculation for Noisy Speech. In: IEEE Wave Electronics and Its Application in Information and Telecommunication Systems (WECONF), St. Petersburg, pp. 1–4

Vijayan, A., Mathai, B.M., Valsalan, K., Johnson, R.R., Mathew, L.R., Gopakumar, K., 2017. Throat Microphone Speech Recognition using MFCC. In: IEEE International Conference on Networks & Advances in Computational Technologies (NetACT), pp. 392–395

Winursito, A., Hidayat, R., Bejo, A., Utomo, M.N.Y., 2018. Feature Data Reduction of MFCC using PCA and SVD in Speech Recognition System. In: IEEE International Conference on Smart Computing and Electronic Enterprise (ICSCEE), Shah Alam, pp. 1–6