Music Stimuli Recognition in Electroencephalogram Signal

When humans listen to music, they perceive beats, rhythms and melodies. Music stimuli induce motor system activity and have a powerful emotion-triggering effect. Since music is a potential stimulus in electroencephalogram-based emotion research, we supposed that different kinds of songs are recognizable from the electroencephalogram signal. In this study we try to recognize music-induced electroencephalogram responses with the popular Neurosky Mindwave device. This paper describes the test conditions and the efficiency of an artificial neural network in combination with different data pre-processing techniques. The final outcomes show the negative effect of frequency decomposition and that the meditation level has a more significant effect on the recognition than a particular song. DOI: http://dx.doi.org/10.5755/j01.eie.24.4.21482


I. INTRODUCTION
The electroencephalogram (EEG) signal is a voltage fluctuation in the human brain that comes from ionic current flow within neurons. EEG signals have been used mainly for the detection and evaluation of neurological disorders in clinical environments. In the early stage of EEG research, diagnosis was usually done by a clinical expert familiar with brain rhythms. The vast majority of EEG-related papers deal with medical applications of EEG readers, such as epileptic seizure and mental disorder detection [1]. Advances in modern microelectromechanical technology have made EEG sensors available to the whole research community and have taken measurements outside the special clinical environment. Currently the Neurosky Mindwave headset is one of the most popular and cheapest EEG readers on the market [2].
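Nowadays machine learning (ML) methods have replaced the cumbersome EEG analysis based on visual inspection. Several automated solutions have already been suggested for medical and non-medical purposes. The type of ML algorithm can differ according to the EEG processing task. For instance, the authors of [3] used a support vector machine (SVM) for emotional state classification. Amarasinghe et al. worked with an artificial neural network (ANN) and tried to recognize two patterns for robot control [4]. Lin et al. used an ANN to classify four emotion categories that come from music stimuli [5]. In this work we also applied an ANN because the results in [6] demonstrated that a well-constructed ANN can outperform other ML methods.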
Music has a well-known impact on emotional states. Several neuroscientists have already investigated the connection between music and EEG signals in different contexts [7]. The authors of [8] developed an automatic music rating system from brainwaves where the output is like or dislike. In [9] the researchers described another music recommendation program which utilizes the user's emotional states. In many works scientists tried to recognize the six elementary emotions (disgust, fear, sadness, anger, surprise and happiness) via brainwaves. In almost all earlier emotion recognition articles, model training and testing were performed with the subject's self-reported emotions. However, this feedback can be unreliable in many cases because emotions are not exact and change slowly [10]. Unlike other ML problems such as voice or image recognition, EEG classification suffers from label uncertainty because it is rather difficult to decide whether the emotion was correctly determined and labeled. In addition, the wide overlap between emotional states complicates classification. Therefore in this work we tried to recognize the concrete songs from the EEG signal instead of emotional states. Our results also reveal the capabilities of the Neurosky headset, which can be useful in later works.

II. BRAIN-COMPUTER INTERFACE
Currently the brain-computer interface (BCI) is an outstanding research area where the goal is to create a communication link between the computer and the human brain. It provides a direct way to transform brainwaves into physical effects without using muscles.
A widely used EEG reader is the single-channel Neurosky Mindwave Mobile headset. It consists of two dry electrodes. One of them is positioned on the forehead in the Fp1 position while the second one (an ear clip) provides the reference point at the A1 position according to the 10-20 electrode system. An illustration of the sensor and its electrode locations can be seen in Fig. 1. The device provides noninvasive and painless measurement for up to 8 hours with a fully charged AAA battery. Its sampling frequency is 512 Hz. The output data contain 12-bit raw brainwaves (between 3 Hz and 100 Hz) and pre-determined attention and meditation levels from the built-in eSense meter (discrete values between 0 and 100). The communication between the computer and the sensor takes place via a Bluetooth connection with a 57600 baud rate.
This EEG reader has already been used for control tasks where the attention level was the input of controllers. For instance, in [11] and [12] the authors controlled a toy car and a robot whose speed depended on the average or a threshold attention level. For this study we developed our own EEG signal acquisition framework in the Processing programming environment. The software establishes a Bluetooth connection with the sensor and saves the raw data into text files.
The data were collected from 5 volunteers aged between 14 and 52 years. They were informed about the purpose of the experiment and the operation of the brainwave recording device. Each subject sat comfortably in a chair in front of the computer in a silent room. During data acquisition, the participants kept their eyes closed and remained as motionless as possible. The songs were separated by 10-second silent intervals without any stimulus, to let the subject relax and to avoid emotional contamination from previous songs. The complete data set was divided into 20 % test, 20 % validation and 60 % training data. The data acquisition software, the collected data, the list of songs and other materials are freely available from the http://irh.inf.unideb.hu/user/sutoj/eeg.php webpage after an official request.
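The 60 % / 20 % / 20 % division mentioned above can be illustrated with a minimal sketch. The function name `split_dataset`, the shuffling step and the fixed seed are assumptions made for this illustration; the text does not state how samples were assigned to the three subsets.

```python
import random

def split_dataset(samples, test_frac=0.2, val_frac=0.2, seed=0):
    """Shuffle samples and split them into training, validation and test lists.

    Shuffling with a fixed seed is an assumption of this sketch, not a
    detail taken from the study.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # the remaining ~60 %
    return train, val, test

# Hypothetical usage with 1000 placeholder samples.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

With heavily overlapping training windows, a split made per recording segment rather than per window may be preferable, because neighbouring windows share almost all of their samples.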

III. METHOD
Traditionally, brainwaves can be decomposed into particular frequency bands. Each band relates to special feelings. A summary of the frequency bands can be seen in Table I. The band limits differ slightly between articles [13], [14]. Frequency decomposition is a generally accepted technique in music stimuli analysis, independently of the application type. However, the influence of music on the individual bands is not clear. For example, Fachner and Stegemann demonstrated that an increasing tendency can be observed in the frontal θ power during pleasant music listening [15]. The authors of [8] showed that an increase can be seen in γ and in frontal θ band power during music perception, depending on pleasant and unpleasant sounds, while Sun et al. claimed that different music styles influence EEG bands differently because the energy magnitude and intensity vary with the music type [7]. Therefore we separated the frequency bands and investigated them independently of each other.
Every change in the EEG signal which does not come directly from the brain potential difference is called an artifact. The real signal which comes from brain activity is significantly distorted by several known artifacts such as power line frequency, electrostatic interference, pulse, breathing, eyelid, eye and muscle movements, etc. [2]. The interfering noise can be more than 10 times stronger than the real brain activity. Eye movement in particular is one of the major noise sources in recordings.
Probably the most obvious strategy for dealing with noisy data is digital filtering. The Neurosky sensor already has an embedded band-pass filter (3 Hz-100 Hz band). At first we used the whole band without frequency interval subdivision. Later the frequency bands from Table I were separated. Since the δ band is suppressed by the device's built-in band-pass filter, it was ignored. For the band separation, finite impulse response (FIR) filters with a Hamming window were used. Linear-phase FIR filters introduce only a constant delay and no phase distortion, and they are predominantly used in EEG signal processing [15]. The filter size was determined by (1), where fs is the sampling frequency, fl and fu are the lower and upper cut-off frequencies and N is the window size (nearest odd number). According to it, the sizes of the θ, α, β and γ filters are 423, 339, 101 and 25, respectively. The frequency responses of the filters can be seen in Fig. 2. Additional information about FIR filter design can be found in [16]. In order to reduce muscle artifacts, participants had to remain in one place.
Most ML solutions follow a general machine learning chain. It consists of data acquisition, segmentation, feature extraction, classifier training and classification stages. In our case the signal is a single-channel, discretized data flow. The data is divided into small pieces, in other words into windows. The window is the basis of the training and classification phases because the entire information content of the window feeds the ML algorithm. In this work the window size was 3 seconds, 5 seconds and 10 seconds, with approximately 99 % overlap between adjacent windows in the training phase and without overlap in the classification phase. With the wide overlap we obtained a richer training data set.
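The band separation step can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of the study: the band edges in `BANDS` are conventional values chosen here because Table I is not reproduced, and `filter_length` uses the common Hamming-window rule of thumb N ≈ 3.3·fs/(fu − fl), rounded up to the next odd integer. With these assumptions the rule happens to reproduce the reported sizes 423, 339, 101 and 25, but it should not be read as a statement of (1). The helper `bandpass_taps` is a plain Hamming-windowed sinc design written for this example.

```python
import math

FS = 512  # sampling frequency of the headset in Hz (from the text)

# Assumed band edges in Hz; conventional values used for illustration only.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 100)}

def filter_length(fs, f_low, f_high, factor=3.3):
    """Rule-of-thumb Hamming-window length, rounded up to the next odd integer."""
    n = math.ceil(factor * fs / (f_high - f_low))
    return n if n % 2 == 1 else n + 1

def bandpass_taps(fs, f_low, f_high, n_taps):
    """Hamming-windowed sinc band-pass filter with symmetric (linear-phase) taps."""
    mid = (n_taps - 1) / 2
    taps = []
    for k in range(n_taps):
        t = k - mid
        if t == 0:
            # Limit of the ideal band-pass impulse response at the centre tap.
            ideal = 2 * (f_high - f_low) / fs
        else:
            ideal = (math.sin(2 * math.pi * f_high * t / fs)
                     - math.sin(2 * math.pi * f_low * t / fs)) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (n_taps - 1))
        taps.append(ideal * window)
    return taps

sizes = {name: filter_length(FS, lo, hi) for name, (lo, hi) in BANDS.items()}
print(sizes)

theta_taps = bandpass_taps(FS, *BANDS["theta"], sizes["theta"])
```

The taps are symmetric around the centre, which is what gives the filter its constant group delay of (N − 1)/2 samples. Filtering a recording then amounts to convolving the raw signal with the taps of each band.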
With feature extraction we try to extract the useful characteristics of the raw signal. With an appropriate feature set the classifier model will be simpler and its performance will be better [6]. In this study the features were extracted from the time and frequency domains as in [17], where the authors collected the most relevant feature extraction methods for the human activity recognition problem. Although activity recognition and music stimuli recognition have different objectives, many similarities exist between them.
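The segmentation and feature extraction stages can be sketched as below. The function names, the synthetic 10 Hz test signal and the four features in `example_features` are assumptions chosen for illustration; they are generic time-domain features and not the feature set of Table II.

```python
import math

def sliding_windows(signal, fs, window_s, overlap):
    """Yield fixed-length windows; overlap is a fraction in [0, 1)."""
    size = int(window_s * fs)
    step = max(1, round(size * (1 - overlap)))
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]

def example_features(window):
    """A few generic time-domain features (illustrative, not the Table II set)."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    rms = math.sqrt(sum(x * x for x in window) / n)
    # Count sign changes of the mean-removed signal.
    zero_crossings = sum(
        1 for a, b in zip(window, window[1:]) if (a - mean) * (b - mean) < 0
    )
    return [mean, std, rms, zero_crossings]

fs = 512
# Synthetic 20-second, 10 Hz sine wave standing in for a recording.
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs * 20)]

# About 99 % overlap for training, no overlap for classification.
train_windows = list(sliding_windows(signal, fs, window_s=5, overlap=0.99))
test_windows = list(sliding_windows(signal, fs, window_s=5, overlap=0.0))
train_features = [example_features(w) for w in train_windows]
print(len(train_windows), len(test_windows))
```

The same short recording yields many more training windows than classification windows, which is the enrichment effect of the wide overlap.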
The features used can be found in Table II. Each feature was normalized with the standard scaler (2), where Fu and Fn indicate the initial and normalized feature matrices while σj and µj are the standard deviation and mean of the j-th feature type:
\[ F_n(i, j) = \frac{F_u(i, j) - \mu_j}{\sigma_j} \qquad (2) \]
Normalization makes features equally important.
The ANN is based on [6], where the authors compared three ANN structures with different activation and error functions. Their results showed that an ANN with mean square error function, L2 regularization and two layers with tangent and linear activation functions can be a good choice in several ML problems. The main hyper-parameters of the network, namely the regularization strength (ν), the initial learning rate (ϕ0) and the learning decay factor (τ), were chosen according to (3), where U refers to the uniform distribution. The additional network parameters were the following. Neurons on the hidden layer: 50. Training algorithm: gradient descent with 0.15 momentum and mini-batches of 10 samples. Initial bias and weights: bias values were 0 while weights came from (4), where Wi indicates the weight matrix and η denotes the number of inputs on the i-th layer. Learning decay tendency: exponential according to (5), where τ is the decay factor, ϕ0 is the initial learning rate and ε refers to the epoch counter.
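The standard scaling of (2) can be sketched in a few lines. The helper names `fit_scaler` and `transform` and the toy feature matrix are hypothetical, and computing the statistics once and reusing them is an assumption of this sketch; the text does not say which data the mean and standard deviation were computed from.

```python
import math

def fit_scaler(rows):
    """Return per-feature means and (population) standard deviations."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / n)
        for c, m in zip(cols, means)
    ]
    return means, stds

def transform(rows, means, stds):
    """Apply (x - mean) / std column-wise; constant features map to 0."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Toy feature matrix: two features on very different scales.
features = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
means, stds = fit_scaler(features)
scaled = transform(features, means, stds)
print(scaled)
```

After scaling, both columns have zero mean and unit standard deviation, so a feature with large raw values no longer dominates the network input.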

IV. RESULTS
At the beginning of the investigation the effect of frequency decomposition was tested with a 5-second window size. At first the features from Table II were extracted from the original 3 Hz-100 Hz signal and they were the input of the ANN. Thereafter the 3 Hz-100 Hz range was divided into its bands with the FIR filters described above, and the features from the individual bands were the new input of the network. As an example, a segment of the original signal and its frequency-decomposed versions can be seen in Fig. 3. Finally, the features from the individual bands were concatenated (θ-γ bands) and the extended feature vectors (with 56 elements) were the information source of the classifier. The results can be seen in Table III, where each value is the best result after 100 trials. Surprisingly, frequency band decomposition caused performance degradation: neither the features from the individual bands nor the joint features from all bands were more efficient than the features from the entire 3 Hz-100 Hz band. Usually the narrow θ and α bands produced low recognition rates, while the widest band, γ, contained the most useful information of the four bands. The joint features from θ-γ improved the accuracy relative to the individual bands, but this extension was not as efficient as we expected.
In the following step the effect of window size expansion and reduction was examined without frequency decomposition. In the first case the window size was 3 seconds while in the second case it was 10 seconds. The results can also be seen in Table III. The narrower window decreased the recognition rate but the wider window caused significant improvement. Since the highest recognition rate was more than 35 %, which is significantly higher than the 10 % chance probability, we supposed that some kind of pattern might exist in the EEG data. To find patterns, the confusion matrices of each subject were also examined. As an example, Table IV contains the confusion matrix of the first subject.
The confusion matrices show a very interesting correlation between elapsed time and recognition rate improvement. We can conclude that the recognition accuracy is related to the relaxation level rather than to the stimulation effect of a particular song.
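The kind of analysis described above can be sketched with a small confusion matrix helper. The labels below are made-up toy data with three classes, not results from the study, and the function names are hypothetical. If songs are indexed in playback order (an assumption of this sketch), the per-class recall read from first to last song shows how recognition changes with elapsed time.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def recognition_rate(matrix):
    """Overall fraction of correctly classified windows."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def per_class_recall(matrix):
    """Fraction of each true class that was recognized correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

# Toy example with 3 songs; the study itself used 10 (10 % chance level).
true_labels = [0, 0, 1, 1, 2, 2]
predicted = [1, 0, 1, 1, 2, 0]
cm = confusion_matrix(true_labels, predicted, n_classes=3)
print(cm)
print(recognition_rate(cm), per_class_recall(cm))
```

With equally represented classes, the chance level is 1 divided by the number of classes, which gives the 10 % reference value for ten songs.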

V. CONCLUSIONS
This paper described a novel music stimuli recognition approach which does not require any feedback from users. The results showed that the recognition of a concrete song from the EEG signal is very difficult. In spite of the weak accuracy, two interesting conclusions can be drawn from the measurements. Firstly, the popular frequency decomposition approach, which has been used in several earlier articles such as [7] and [10], caused accuracy loss. Secondly, the most important conclusion comes from the confusion matrices. They show that the recognition rate increases discontinuously with the elapsed time. Based on this observation we suppose that recognition depends on the relaxation state rather than on the real effect of the music. We did not find any mention of this in previous papers. Perhaps the Neurosky Mindwave headset causes this phenomenon. In order to investigate this hypothesis, the next step in this research direction is to test multi-channel EEG readers such as the Emotiv EPOC headset, which measures EEG activity at 14 points on the head.

Index Terms: Artificial neural networks; Brain-computer interfaces; Digital filtering; Electroencephalography.

Fig. 3. The original signal and its filtered versions.
Manuscript received 4 December, 2017; accepted 29 May, 2018. This research was supported by the ÚNKP-17-3-IV New National Excellence Program of the Ministry of Human Capacities.

TABLE I. EEG FREQUENCY BANDS.

TABLE II. USED FEATURES.

TABLE III. HIGHEST RECOGNITION RATES AFTER 100 TRIALS.