EMOTION DETECTION WITH SPEECH AS INPUT USING CONVOLUTIONAL NEURAL NETWORKS

Authors

  • KARTIK SHARMA, Dept. of Computer Engineering, MPSTME, NMIMS University, Mumbai, India
  • JEET SHAH, Dept. of Computer Engineering, MPSTME, NMIMS University, Mumbai, India
  • SHREYANSH KUMAR, Dept. of Computer Engineering, Bennett University, Greater Noida, India

Keywords:

Emotion Recognition, Feature Selection, Feature Extraction, Emotion Analysis

Abstract

With stress rising in day-to-day life, sudden emotional changes are triggering serious health problems, in some cases severe enough to require surgery. Even though it is challenging to identify and recognise emotions in spontaneous speech, doing so is a principal part of interaction between humans and computers. The difficulty of identifying emotions in spontaneous speech is mainly attributed to the fact that the emotions expressed by the speaker do not stand out as they do in acted speech. This paper proposes a framework for automatic speech emotion recognition that makes use of relevant prior information. The framework is motivated by the observation that there is a great deal of inconsistency between human annotations of spontaneous speech, and that this disagreement is greatly reduced when additional information is provided. The proposed framework employs the labelled emotions of the audio clips in the RAVDESS dataset, together with an understanding of how spoken words change over time during an audio call, to accurately identify the speaker's current emotional state during a conversation. Once the audio features have been extracted, we use a convolutional neural network with a softmax activation function to classify the data into multiple emotion classes. Our experimental results demonstrate that emotion detection using speech as input achieves an accuracy of 68.7% and is feasible to deploy in a real-life scenario.
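The abstract describes a pipeline of audio feature extraction followed by a convolutional network with a softmax output over emotion classes, but it does not specify the feature type or architecture. The sketch below illustrates one common way such a pipeline is put together in Python: MFCC features computed with librosa and a small 1-D CNN built with Keras. The choice of 40 MFCC coefficients, the layer sizes, and the RAVDESS-style file name are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of the pipeline described in the abstract, under assumptions:
# MFCC features (40 coefficients, averaged over time) and a small 1-D CNN
# ending in a softmax layer over the 8 RAVDESS emotion classes.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers

N_MFCC = 40          # assumed number of MFCC coefficients per clip
NUM_CLASSES = 8      # RAVDESS labels eight emotions (neutral, calm, happy, ...)

def extract_features(path: str) -> np.ndarray:
    """Load a clip and average its MFCCs over time into one fixed-length vector."""
    signal, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                     # (n_mfcc,)

def build_model() -> tf.keras.Model:
    """1-D CNN over the feature vector with a softmax output for the emotion classes."""
    model = tf.keras.Sequential([
        layers.Input(shape=(N_MFCC, 1)),
        layers.Conv1D(64, kernel_size=5, activation="relu", padding="same"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation="relu", padding="same"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),  # multi-class probabilities
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # "03-01-05-01-02-01-12.wav" is a hypothetical RAVDESS-style file name.
    features = extract_features("03-01-05-01-02-01-12.wav")
    model = build_model()
    probs = model.predict(features.reshape(1, N_MFCC, 1))
    print("predicted emotion class:", int(np.argmax(probs)))

In practice the model would first be trained with model.fit on feature vectors and integer emotion labels extracted from the whole dataset; the snippet above only shows the shape of the feature-extraction and classification stages.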



Published

31-12-2022

How to Cite

KARTIK SHARMA, JEET SHAH, & SHREYANSH KUMAR. (2022). EMOTION DETECTION WITH SPEECH AS INPUT USING CONVOLUTIONAL NEURAL NETWORKS. International Journal for Research Publication and Seminar, 13(5), 61–65. Retrieved from https://jrps.shodhsagar.com/index.php/j/article/view/245

Issue

Vol. 13 No. 5 (2022)

Section

Original Research Article