IMPROVED LIP-READING LANGUAGE USING GATED RECURRENT UNITS
DOI: https://doi.org/10.12962/j24068535.v19i2.a1080

Abstract
Lip-reading is one of the most challenging problems in computer vision: it demands a large amount of training data, considerable computation time and power, and must cope with variation in word length. Existing methods, such as Mel Frequency Cepstrum Coefficients (MFCC) with Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) with LSTM, still suffer from low accuracy or long processing time because they rely on LSTM. In this study, we address these problems with an approach that achieves high accuracy at low time cost. Specifically, we develop a lip-reading pipeline that combines face detection, lip detection, data filtering to avoid overfitting caused by data imbalance, CNN-based image feature extraction, MFCC-based voice feature extraction, and model training with LSTM and Gated Recurrent Units (GRU). Experiments on the Lip Reading Sentences dataset show that, compared to the state-of-the-art, our framework obtains higher accuracy when the input array is deep and consumes less time.
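To make the pipeline concrete, the following is a minimal sketch, assuming a Keras/TensorFlow setup, of the visual branch the abstract describes: a per-frame CNN feature extractor followed by a GRU classifier. All layer sizes, the frame count, the lip-crop resolution, and the dummy training data are illustrative assumptions, not the authors' configuration.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 29            # frames per clip (hypothetical)
FRAME_H, FRAME_W = 48, 48  # cropped lip-region size (hypothetical)
NUM_CLASSES = 10           # vocabulary size (hypothetical)

# Per-frame CNN encoder; TimeDistributed applies it to every frame.
frame_encoder = models.Sequential([
    layers.Conv2D(32, 3, activation="relu",
                  input_shape=(FRAME_H, FRAME_W, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
])

model = models.Sequential([
    layers.TimeDistributed(frame_encoder,
                           input_shape=(NUM_FRAMES, FRAME_H, FRAME_W, 1)),
    layers.GRU(128),       # GRU in place of LSTM, as the abstract proposes
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Fit on random data only to demonstrate the expected tensor shapes.
x = np.random.rand(8, NUM_FRAMES, FRAME_H, FRAME_W, 1).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=(8,))
model.fit(x, y, epochs=1, verbose=0)

A GRU has two gates where an LSTM has three, so it carries fewer parameters per unit and typically trains faster, which matches the abstract's motivation for replacing LSTM with GRU.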
License
All papers should be submitted electronically. All submitted manuscripts must be original work that is not under submission at another journal or under consideration for publication in another form, such as a monograph or a book chapter. Authors of submitted papers are obligated not to submit their paper for publication elsewhere until an editorial decision is rendered on their submission. Further, authors of accepted papers are prohibited from publishing the results in other publications that appear before the paper is published in JUTI unless they receive approval to do so from the Editor-in-Chief.
JUTI open access articles are distributed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others remix, transform, or build upon the material, provided they give appropriate credit, provide a link to the license, indicate if changes were made, and distribute their contributions under the same license as the original.