EMOTION DETECTION WITH SPEECH AS INPUT USING CONVOLUTIONAL NEURAL NETWORKS
Keywords:
Emotion recognition, Feature Selection, Feature Extraction, Emotion Analysis

Abstract
With stress rising in day-to-day life, sudden emotional changes can affect serious health parameters, in some cases leading to surgery. Although emotions in spontaneous speech are challenging to identify and recognise, doing so is a principal part of interaction between humans and computers. The difficulty is mainly that emotions expressed by a speaker in spontaneous speech do not stand out as they do in acted speech. This paper proposes a framework for automatic speech emotion recognition that exploits relevant prior information. The framework is motivated by the observation that human annotators disagree considerably when labelling spontaneous speech, and that this disagreement is greatly reduced when additional information is provided. The proposed framework uses the known emotion labels of the sound bites in the RAVDESS dataset, together with knowledge of how spoken words change over time during an audio call, to identify the speaker's current emotional state during a conversation. After extracting features from the audio, we use convolutional neural networks with a softmax activation function to classify the data into multiple classes. Our experimental results show that emotion detection with speech as input achieves an accuracy of 68.7% and is feasible to deploy in a real-life scenario.
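The pipeline described above, extracting features from audio and passing them through convolutional filters and a softmax layer to obtain class probabilities, can be sketched minimally in NumPy. This is an illustrative toy, not the authors' implementation: the feature vector stands in for extracted audio features such as MFCCs, the filter values and dense weights are arbitrary, and the class count of 8 reflects the emotion labels in RAVDESS.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid-mode 1-D convolution (cross-correlation) over a feature sequence."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def softmax(z):
    """Numerically stable softmax, mapping logits to class probabilities."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

# Toy stand-in for one frame of extracted audio features (e.g. MFCCs).
features = np.array([0.2, -0.1, 0.5, 0.3, -0.4, 0.1])

# One illustrative convolution filter; a trained CNN learns many such filters.
kernel = np.array([0.5, -0.25, 0.1])
hidden = conv1d(features, kernel)

# Hypothetical dense layer projecting to 8 emotion classes (as in RAVDESS).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, len(hidden)))
probs = softmax(W @ hidden)

predicted_class = int(np.argmax(probs))  # index of the most probable emotion
```

In a real system the convolution filters and dense weights are learned from labelled training data, and the argmax over the softmax output gives the predicted emotion class.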
This work is licensed under a Creative Commons Attribution 4.0 International License.