Identification of Spoken Language from Webcast Using Deep Convolutional Recurrent Neural Networks
Abstract
This paper investigates two end-to-end approaches to identifying spoken language from webcast sources. Long short-term memory (LSTM) and self-attention architectures are adopted and compared against a deep convolutional network baseline model. These methods focus on the performance of spoken language identification (LID) on variable-length utterances. The dataset used for experimental evaluation comprises data in five languages collected from webcasts (Webcast-5) and a ten-dialect Chinese dataset from IFLYTEK (IFLYTEK-10). The end-to-end LID systems were trained on five kinds of acoustic features: Mel-frequency cepstral coefficients (MFCCs), shifted delta cepstral coefficients (SDCs), perceptual linear prediction (PLP) features, log Mel-scale filter bank energies (Fbank), and spectrogram energies. The best model using a single feature set achieves an accuracy of 79.6% and a Cavg of 10.87%.
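To make the feature inventory concrete, the following is a minimal sketch, not the authors' pipeline, of extracting three of the five listed frame-level feature types with the librosa library. The filename, the 16 kHz sampling rate, and the feature dimensions (13 MFCCs, 40 Mel filters) are illustrative assumptions; SDC and PLP features would require additional tooling not shown here. Note how the frame axis varies with utterance length, which is exactly what the variable-length models must accommodate.

```python
# Sketch of frame-level acoustic feature extraction for LID.
# Assumptions: librosa installed, "utterance.wav" is a placeholder file,
# 16 kHz sampling and the feature dimensions are illustrative choices.
import numpy as np
import librosa

audio, sr = librosa.load("utterance.wav", sr=16000)

# 13-dimensional MFCCs (number of coefficients is an assumption)
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)

# Log Mel-scale filter bank energies (Fbank), here with 40 filters
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=40)
fbank = librosa.power_to_db(mel)

# Spectrogram energies from the magnitude STFT
spectrogram = np.abs(librosa.stft(audio)) ** 2

# Each matrix has shape (n_features, n_frames); n_frames depends on
# utterance duration, so downstream models must handle variable length.
print(mfcc.shape, fbank.shape, spectrogram.shape)
```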
Keywords
Spoken language identification, Deep convolutional recurrent neural network, Variable-length utterances
DOI
10.12783/dtcse/iteee2019/28737