Nafa Zulfa, Nanik Suciati, Shintami Chusnul Hidayati


Lip reading is one of the most challenging problems in computer vision: it requires a large amount of training data and considerable computation time and power, and it must cope with variation in word length. Previous methods, such as Mel Frequency Cepstrum Coefficients (MFCC) with Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) with LSTM, still yield low accuracy or long processing times because they rely on LSTM. In this study, we address this problem with a novel approach that aims for high accuracy and low time consumption. In particular, we propose a lip-reading framework that combines face detection, lip detection, filtering of the amount of data to avoid overfitting caused by data imbalance, CNN-based image feature extraction, MFCC-based voice feature extraction, and model training using LSTM and Gated Recurrent Units (GRU). Experiments on the Lip Reading Sentences dataset show that the proposed framework achieves higher accuracy when the input array dimension is deep, and lower time consumption, compared with the state of the art.
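The time-consumption argument for pairing LSTM with GRU can be illustrated by counting parameters. The sketch below uses the textbook formulation, in which an LSTM layer has four gate blocks and a GRU layer has three, each with an input weight matrix, a recurrent weight matrix, and one bias vector. The function names and the dimensions (512 input features, 256 hidden units) are hypothetical and are not taken from the paper; some library implementations add a second bias vector per gate, so their counts differ slightly.

```python
def recurrent_params(input_dim: int, hidden_dim: int, gates: int) -> int:
    """Count weights and biases of one recurrent layer with `gates` gate blocks."""
    # Each gate block: input weights + recurrent weights + one bias vector.
    per_gate = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return gates * per_gate


def lstm_params(input_dim: int, hidden_dim: int) -> int:
    """LSTM: input, forget, and output gates plus the candidate cell state."""
    return recurrent_params(input_dim, hidden_dim, gates=4)


def gru_params(input_dim: int, hidden_dim: int) -> int:
    """GRU: update and reset gates plus the candidate hidden state."""
    return recurrent_params(input_dim, hidden_dim, gates=3)


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    d, h = 512, 256
    print("LSTM parameters:", lstm_params(d, h))
    print("GRU parameters: ", gru_params(d, h))
```

Under these assumptions a GRU layer carries three quarters of the parameters of an LSTM layer of the same size, which is one reason it tends to be cheaper per time step; the actual speed-up depends on the implementation and hardware.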




