Detection of EMG Signals by Neural Networks Using Autoregression and Wavelet Entropy for Bruxism Diagnosis

1 Abstract —Bruxism is known as the rhythmical clenching of the lower jaw (mandibular) by the contraction of the masticatory muscles and parafunctional grinding of the teeth. It affects patients’ quality of life adversely due to tooth wear, pain, and fatigue in the jaw muscles. Recently, effective diagnosis methods that use electromyography, electrocardiography, and electroencephalography have been developed for bruxism. However, these methods are not economical since they require specialization and can be performed in clinical conditions. Although using surface electromyography signals alone is an economical solution, it is difficult to identify fatigue and parafunctional movements of the jaw muscles via electromyography signals due to peripheral effects. In this study, to achieve an accurate diagnosis of bruxism economically


I. INTRODUCTION
Bruxism is a chronic dental problem seen in many people today. Its treatment is difficult and has a low success rate [1]. Bruxism is defined as the parafunctional grinding of the teeth and involuntary contraction of the lower jaw [2]- [4]. In bruxism patients, hypertrophy in masticatory muscles, tooth wear, implant fractures, tooth loss, and fatigue and pain in jaw muscles can be seen. These symptoms are due to the application of a force that is very high for the physiological structure by the involuntary contraction of jaw muscles [5]. Earliest studies performed with sensors for the diagnosis of bruxism aimed at detecting force. In these studies, healthy and afflicted individuals can be identified by monitoring the number of dental contacts in individuals with bruxism [6]. In bruxism events, the duration of contractions per hour during sleep is measured as 47.8 s-174.9 s. Forces that occur in the mouth during bruxing can be measured with strain gauge sensor systems placed in the lower and Manuscript received 29 November, 2020; accepted 6 April, 2021. upper jaws. In experiments on patients, it was found that 22.5 kg average force of duration 7.1 s can be generated 3.6 times in one hour [7]. Moreover, occlusal forces during bruxism can be detected by an artificial intelligence algorithm via strain gauge pressure sensors placed inside the mouth [8].
An examination of surface electromyography (sEMG) studies in the literature reveals that rhythmic parafunctional activity of jaw muscles can be observed in the population due to stress and anxiety [9], [10]. During these incidents, bursts of rhythmic masticatory muscle activity at a frequency of approximately 1 Hz and up to approximately 10 % of maximum voluntary clenching (MVC) can occur in EMG signals of masticatory muscles. In addition, these rhythmic contractions last more than 8 seconds during sleep [11], [12]. Detection of this activity, especially during sleep, is difficult and requires expert evaluation in the laboratory environment [13].
Recently, studies based on joint analysis of electrocardiography (ECG) and electroencephalography (EEG) signals, together with sEMG, have been carried out for the clinical diagnosis of bruxism. By classifying two features, namely, the mean absolute value (MAV) of the EMG signal, that is, its average amplitude over 1 s and heart rate (HR) obtained from ECG signals using multi-layer perceptrons (MLPs), it was demonstrated that changes in HR occur simultaneously with bruxism-based contractions of the jaw muscles. When MAV is considered together with HR, bruxism diagnosis results improve so that 80 % of healthy individuals and 60 % of low-frequency patients are predicted correctly, whereas 64 % classification accuracy can be achieved for the whole group [14]. When EMG signals are considered in combination with ECG and EEG by using the decision tree (DT) algorithm, 97.21 % ± 0.836 % accuracy can be achieved in evaluating bruxism [15]. However, these methods for clinical diagnosis of bruxism are costly and time-consuming, and they require specialization.
Currently, a grinding sound reported by the partner of the patient or tooth wear is used in the assessment of bruxism. Additionally, the diagnosis is made through various methods that make use of EMG, as well as clinical findings, such as Detection of EMG Signals by Neural Networks Using Autoregression and Wavelet Entropy for Bruxism Diagnosis pain in the temporomandibular joint and masticatory muscles. Assessment of bruxism through clinical practice is important for the diagnosis, but the accuracy of this assessment is not considered sufficient [16], [17]. For this reason, the golden standard for bruxism diagnosis, especially during sleep, is the evaluation of EMG of the masticatory muscles together with audio-video recordings through polysomnography in a sleep laboratory or hospital setting [18]- [20]. Currently, the complexity and high cost of this method is a disadvantage. Therefore, it is crucial to develop practical, efficient, and economical solutions based only on the use of EMG signals. In this study, experimental work has been performed for accurately detecting jaw clenching, jaw relaxation, teeth grinding through rhythmical chewing movements of the mandibular, and pain and fatigue in the lower masseter muscles for the diagnosis and treatment of bruxism disorder by using sEMG signals. For an accurate classification, autoregression (AR) coefficients yielded by the autoregression analysis of each EMG signal obtained from the subjects were used together with artificial neural networks (ANN) and 95.2 % success rate was achieved. The same signals were decomposed by using discrete wavelet transform (DWT). The components of the signal were obtained by using the Daubechies wavelet function (db4 8level) in this decomposition and the Shannon entropies of the discrete wavelet transform energy spectra (WShEn) of these components were calculated. All these calculated parameters were reclassified by ANN, and the highest success rate of 100% was achieved so that the physical effects seen in masseter muscles during bruxism could be classified most accurately. For this reason, the most accurate classification was obtained based on EMG measurements only. EMG measurements have been used in bruxism diagnosis for years. However, through the use of autoregression and wavelet transform energy spectra in bruxism diagnosis with EMG measurements, this study brings a novel approach to the literature. It also contributes to the literature as a practical, efficient, and economical solution by accurately identifying lower jaw movements that occur in bruxism diagnosis with only an EMG sensor.
This article is organized as follows. Section II introduces the materials and methods used towards diagnosing the bruxism conditions. Section III presents the experimental and analytical results through graphics and tables, and Section IV discusses the impact of these results. Section V concludes the article.

A. EMG Signal Acquisition and Pre-Processing
The experimental setup used for acquiring data from the subjects is shown in Fig. 1. A dual-channel data logger was designed for sEMG signals collected from the skin surface. EMG signals received from the skin surface were amplified with AD624 instrumentation amplifier (Gain: 1000, CMMR: 130 dB). The amplified signals should be freed from noise. Since the meaningful frequency of the EMG signals obtained from the muscles is within the 10 Hz-500 Hz range, a second-order Butterworth band-pass filter with corner frequencies of 10 Hz and 500 Hz was designed [21]. In the next stage, a 12-bit A/D converter was used for digitization. EMG signals consist of bipolar positive and negative signals. To sample the negative amplitudes of the signal in the positive region, DC voltage was added to the signals and they were passed through a shifting circuit. Analog EMG signals are digitized and transferred to PC environment by using a microcontroller with a 32-bit Atmel SAM3X8E ARM Cortex-M3 processor. Data received from the subjects are sampled to have 4500 samples per second (fs: 4500 Hz). The data are analysed by using MATLAB software.
In this study, data from ten subjects between the ages of 14 and 42 were taken repetitively at different times. The mean age of the subjects was 27.2 ± 9.57. All subjects gave their informed consent in accordance with the Helsinki Declaration and understood that they were free to opt out of the experiment at any time. Moreover, the participants filled a questionnaire about bruxism awareness, anxiety, sleep habits, stress, irritability, fatigue, current facial pain intensity, fatigue of the jaw muscles, and jaw pain. For the recording of measurements, AgCl bipolar electrodes of 15 mm diameter were placed on the skin surface over the masseter muscle of the subjects with 1 cm intervals [22]. Calibration and normalization of the amplitude values of the EMG signals were performed and MVC value was determined for each subject [23]. To do this, the subjects were first made to wear an oral apparatus to protect their teeth, and then amplitude values were recorded for each subject by asking them to perform a biting (jaw squeezing) movements for 3 seconds with the maximum contraction of the lower jaw together with relaxing (jaw opening) movement. In the next stage, the bruxism states of the signals obtained from the subjects should be categorized. It is especially difficult to interpret muscle fatigue and pain states. The pain threshold for muscle fatigue can vary from person to person. Therefore, muscle fatigue/pain in case of muscle fatigue was evaluated on a numerical scale from 0 to 10 according to the research diagnostic criteria in [24]. By this means, the relationship between fatigue/pain conditions in the classification of EMG signals obtained from the subjects was determined. To do this, each individual was asked to squeeze his/her jaw muscles until pain and fatigue occurred while also monitoring the biting (jaw clenching) movement at 50 % MVC through the amplitude values of real-time EMG signals. Even though there are variations from person to person in the onset of fatigue and pain, the average recording time was 60 s ± 20 s. This procedure was repeated at different times for each subject. Thus, EMG data for the fatigue and pain states of the subjects were obtained.
At another time, all subjects were asked to chew and grind their teeth randomly for 10 s repeatedly. At this stage, EMG data were recorded for each individual at 10 % MVC value of the jaw muscles. For analysing the recorded EMG signal data, windowing of 1 s periods was applied to each data group. Figure 2 depicts samples of real-time EMG signals obtained from the subjects in this study.
A total of 125 data records of EMG signals obtained from the subjects are used in the signal analyses to classify four different states. These states are: The jaw muscles are relaxed (0), The jaw muscles are closed (clenching) (1), Rhythmical teeth grinding (2), and Fatigue/pain (3). To identify these states, unlike other bruxism-related studies in the literature, DWT and AR models are used together to extract the most salient features from each EMG signal. From these two methods, 20 different parametric coefficients are obtained for each EMG signal of 1 s period (fs: 4500 Hz). Considering that there are 125 different EMG signals collected from each subject, a total of 20 × 125 = 2500 data are obtained. The block architecture of the proposed model that outlines the entire study is shown in Fig. 3.
In this model, the bruxism diagnosis is made by using the AR coefficients and WShEn values. The model proposed in this study will be explained below with mathematical equations. B. EMG Feature Extraction Using AR Modelling AR contains important state parameters of a system and is of great importance for time series analysis since it is very sensitive to time-dependent changes in a system. This model is an important one in terms of capturing the key characteristics of any data set [25], [26]. A linear AR model can be expressed as in (1) In (1), X m is the predicted value of the signal at a time and i a (i = 1, 2, …, p) are the weight coefficients. The parameter represents the order of the AR model. m  is the error term that denotes the difference between the actual and predicted values. This value includes the white noise signal or nonstationary signals in the form of impulses [28], [29].
In this model, m X is the time series and is expressed in terms of mi X  values. Thus, autoregressive modelling of a signal results in a set of parameters. These parameters contain important information about the pattern of the EMG signal and a feature set is created during classification by means of this information. AR coefficients of order 12 are generated from the expansion of m X the time series of each EMG signal. In the literature, AR coefficients may be created from the orders of 2, 3, 4, 6, and 8 [25]- [31]. However, in this study, AR coefficients of order 12 are created due to the complexity of the EMG signals. Employment of 12 orders in neural network classification provides high performance. In this context, we can express the AR coefficients of order 12 as i a (i = 1, 2, …, 12) in (2) and (3): In this study, (a 1 , a 2 , …, a 12 ) can be used in feature selection from EMG signals for bruxism diagnosis as coefficients of order 12 that represent m X time series.

C. EMG Feature Extraction Using Discrete Wavelet
Transform Two important functions are involved in the application of DWT to an EMG signal. These are the scaling function    t and the wavelet function  .
In (4), m denotes the number of scaling levels and n denotes the amount of shifting. The mathematical expression of the approximation coefficients formed by the convolution of this function with the input signal is as shown in (5)     In (7), T m,n denotes the detail coefficients and  m,n (t) represents the wavelet function. Here, translation of the scaling function is important for calculating the approximation and detail components. This is shown in (8) In this equation, the function   t  is obtained by multiplying the function   2tk   , which is obtained through translation in the time axis, with the scaling coefficient k c that corresponds to each k value. Another scaling function can be obtained from a previous scaling function through this equation. Here, there are two criteria that should be noted. To ensure orthogonality, the squared sum of the scaling coefficients and the sum of the scaling coefficients c k corresponding to each k value should be equal to two [32]- [35]. We can obtain the wavelet equation from (9) by using the scaling coefficients In (9), the mathematical expression   1 the wavelet coefficient. It is expressed as in (10)     2.
To decompose the signal into different bandwidths to calculate the coefficients c k in the scaling function and k b in the wavelet function, it should be passed through high-pass (HP) filter h(k) and low-pass filter (LP) g(k) as shown in Fig. 4. The functions obtained through this transformation are shown in (11) and (12) [32]- [35]: The coefficients c k and b k should be known to obtain these functions that are passed through low-pass and high-pass filters. Although there are different wavelet models in the literature to determine these coefficients, Daubechies wavelet makes the transformation more practical. This wavelet is symbolized as dbN. This kind of wavelet has 2N scaling coefficients [36]. Therefore, in this study, the Daubechies db4 function with a scaling of 8 (k = 0, 1, 2, …, 7) as in Fig. 5 is applied. By using this function, c k and b k can be obtained in the time domain as in (13) and (14) Approximation and detail components A(t) and D(t) can be obtained from c k and b k . In this study, Shannon entropies (E 1 , E 2 , …, E 8 ) of the detail components obtained through a scaling of 8 are calculated. For this, the wavelet energy spectra (E jk ) of each component are considered. This is expressed in (15) In (15), D j (k) expresses the detail components obtained through wavelet transformation in the frequency domain depending on the sampling frequency (fs) at scale j and time k. Hence, WShEn of each detail component as shown in Fig.  6 is expressed in (16) as (E j ) [39]- [41] .

D. Diagnosis of Bruxism by Neural Network Classifier
It is essential that the system gives the correct responses in real-time measurements. In other words, it should discern the characteristics correctly even though the states that arise during jaw movement are different from the states previously encountered. For this reason, multilayer perceptron (MLP) based on feed-forward back-propagation algorithm, which is the most commonly used artificial intelligence algorithm, was used in classification. Thus, it was preferred due to its advantage of offering a higher number of variable parameters compared to other artificial intelligence algorithms. In this study, to perform a fast back propagation, network structures are trained by using Levenberg-Marquardt back propagation. The network structure is implemented by using MLP as a multilayer detector [42]- [44]. The main reason behind our preference of Levenberg-Marquardt algorithm for training the network is that this algorithm has a high learning speed due to its requiring a second degree derivative. Currently, it is the preferred method in network training [45], [46]. In these network structures, the data from the input layer produce a result at the output layer and the weights are updated to minimize the average mean square error (MSE) through back propagation by comparing with the desired result. However, sometimes the network memorizes instead of learning. Therefore, the network should be trained properly. The data set is divided into three groups, such as training, validation, and test. MSE, bias, and weights in the training set are important for the performance of the created network. In other words, it is desired that the weights are updated correctly depending on the input data. For this reason, it should be ensured that the input data are meaningful and correlated, i.e., the regression coefficients should be high. Training operation is stopped when there is no more improvement in the validation set. The test set is for evaluating the performance of the network. Therefore, the interpretation of MSE in the test results is important for the success of the network. Here, it is expected that MSE is as close to zero as possible as a performance criterion. However, it is a major disadvantage for this network structure that there does not exist an effective method to minimize MSE. For this reason, in determining the weights of the input data, training, validation, and test ratios are found through trial and error by using different values of the neuron numbers of hidden layers with the MATLAB software. Having regression (R) close to 1 (slope approximately 45 º) and MSE close to 0 are major criteria for evaluating the success of the network structure. If (R) is close to 0, but MSE is far from 0, there does not exist a linear relationship between the variables. Therefore, the network can be retrained to obtain an improved network structure by changing the initial weights and biases [47]. Once a successful network structure is formed, data obtained from different subjects at various times can be distinguished quickly and easily. This way, the disorder states of the data from the subjects can be detected accurately and quickly.
In this study, AR coefficients and WShEn values of the 125 different EMG signals obtained from the subjects for the cases of parafunctional teeth grinding, jaw squeezing, and the resulting pain and fatigue that occur during bruxism comprise the input to the MLP. For this reason, two different network structures with respect to the input characteristics of the EMG signals are formed. These structures are depicted in Fig. 7. The data are divided into three groups in different ratios, such as training, validation, and test with respect to the input characteristics of the created structures. Regression (R) coefficients of the network created according to these ratios are shown in Table  I (16) Here, N is the number of classes, NCCS is the number of correctly classified samples, TNS is the total number of samples, which was obtained from the confusion matrix [48] by using the MATLAB software.

III. EXPERIMENTAL RESULTS
In this study, AR coefficients obtained from the autoregression analyses and WShEn values obtained from wavelet transformation of the pre-processed EMG signals received from the subjects are used by the created MLP structures for bruxism diagnosis to accurately classify the closing of the lower jaw (teeth clenching), muscle fatigue and pain in the lower jaw muscles (masseter), and especially parafunctional rhythmical jaw movements (teeth grinding). To do this, the signals shown in Fig. 1 are collected from the subjects and AR and WShEn feature parameters are obtained to classify these signals for diagnosis.
When AR and WShEn are used together for the most accurate classification, the performance of the network is high. This can be understood from the accumulation of a significant portion of the MSE values at the zero error line in the histogram bar graphics in Fig. 8 and the best validation performance error values being lower than AR and WShEn in Fig. 9. Moreover, it can be seen from Fig. 10 that the slope is approximately 45 o and the regression coefficient (R) is very close to 1 when the best correlation performance is obtained by using AR and WShEn together.
In Table I, when the dataset is divided in ratios of 60 % training, 30 % validation, and 10 % test with 10 hidden layers and tested with the input feature parameters of both AR and WShEn, R = 0.99 and R = 0.98 are obtained (Fig. 10).  As a result, it consists of work aimed at determining the performance of the network described so far. It can be clearly seen that an MLP structure with much higher performance can be created when AR and WShEn parameters are used together. This created structure can be trained quickly with high performance by the Levenberg-Marquardt algorithm.
In the next stage, EMG data, which are composed of the data obtained from the subjects and represent bruxismrelated states, were classified with the MLP structure created using AR and WShEn parameters. Performance results obtained from these classifications were given in confusion matrices for 4 different output states (Table II). State 0 represents jaw opening and comprises 23 % of our dataset of 125 data obtained from the subjects. The remaining 77 % of the data are composed of data related to bruxism states. These data represent the states 1, 2, and 3 shown in Table II. By using AR parameters, 95.2 % of the data were classified correctly, whereas 4.8 % were classified incorrectly. With WShEn parameters, classification was made with a success rate of 94.4 %. When both parameters (AR + WShEn) were used for classification, the data were classified with 100 % accuracy so that the highest performance was achieved.
Since only EMG signals are used in this study, the number and weights of features are essential for an accurate classification in terms of the effect of the selected features. In this regard, before classification with the MLP structures, the performance of the network was determined in accordance with the features used. For an accurate classification, the decision is made by looking at R and MSE values with respect to the performance criteria of the network.
To obtain the maximum R and minimum MSE, data obtained from the subjects were simulated with MATLAB by using AR and WShEn features both separately and in combination in different ratios and the most appropriate network topology was determined as 60 % training, 30 % validation, and 10 % test. Having high training and validation ratios is an important factor for the performance of the network. The network structure with the highest performance can be obtained by using AR and WShEn features together as shown in Table I and Figs. 8-10. For this reason, the classification with these two feature groups has 100 % accuracy.   [14]; however, this is not sufficient.
Recently, there are studies that also take EEG measurements into consideration. In these studies, when EMG measurements are considered in combination with ECG and EEG measurements by DT algorithm, an accuracy as high as 97.21 % ± 0.836 % can be achieved in the clinical diagnosis of bruxism [15]. However, it is impractical, as well as costly to supplement EMG measurements with ECG and EEG.
Additionally, currently polysomnography using audiovideo recordings is evaluated together with the EMG of the masticatory muscles for the diagnosis of bruxism [18]- [20].
As the application of these methods necessitates a sleep laboratory or clinical hospital setting, they are timeconsuming considering the duration of hospital stay and require expert supervision. In the method proposed in this study, AR coefficients of order 12 of the EMG signals received from the subjects are obtained through autoregression and used together with 8-level WShEn values of the same signals through Wavelet transform. It is possible to achieve 100 % accuracy by using only MLP with these selected features. When evaluated with respect to the outputs of this system, the electromyographic states of bruxism that occur in the masticatory muscles are determined by taking the measurements for the case of muscle fatigue into consideration in the classifications. Thus, in this study, all electromyographic activity that occurs during bruxism is supported by an accurate classification based on EMG measurements. By virtue of its identifying these signals with the highest accuracy, this study is important especially for developing therapeutic electronic-based stimulating biofeedback systems [49]- [51]. Furthermore, by being based only on EMG measurements, it can be implemented by using a portable EMG device that allows obtaining signals in a home setting without the need for a clinical setting. Therefore, it is practical, economical, and not time-consuming.

V. CONCLUSIONS
In this study, unlike other studies in the literature that consider EMG signals for bruxism disorder, jaw locking and random teeth grinding movements are considered together with pain and fatigue in the jaw muscles as a whole and an effective system for accurate bruxism diagnosis is proposed. By virtue of its being based on the design of a system that records only EMG signals in a home setting without the need for a laboratory setting, this study is a practical and low-cost system. Activities that occur during bruxism should be recorded real-time through EMG signals and the most effective features should be extracted by way of time and frequency analyses. This way, the signals acquired from EMG measurements can be classified with 100 % accuracy by a neural network (MLP) algorithm by using AR (12) coefficients of order 12 and WShEn obtained from the Wavelet transform of the same signals. With the approach proposed in this study, when the cost and time spent for using only EMG data are considered, it is a practical method for wearable or portable EMG devices and a suitable approach for the effective use of therapeutic wearable electronic stimulation systems in future studies.