Combine Harvester Cooling Water Temperature Prediction Based on CDAE-LSTM Hybrid Model

Abstract: The cooling water temperature of a combine harvester during operation reflects changes in its power consumption and can reveal overloads caused by extreme workloads. Extracting water temperature information from harvesters faces two problems: data redundancy and the loss of time series features. To solve these problems, a hybrid model combining a convolutional denoising autoencoder and a Long Short-Term Memory artificial neural network (CDAE-LSTM), based on parameter migration (transfer learning), is proposed to predict temperature trends. First, correlation analysis is performed on historical data of the combine harvester to verify the rationality of the model inputs. Second, pre-training is performed to determine the model's initial migration parameters, and the CDAE is adopted to denoise and reconstruct the input data. Finally, after the migration, the CNN-LSTM hybrid model is trained on a real dataset and used to predict the cooling water temperature. The accuracy of the model was verified with field test data gathered in June 2019. Results show that the root mean squared error (RMSE) of the model is 0.0817 and the mean absolute error (MAE) is 0.0989. Compared with LSTM on the same prediction data, the RMSE improvement rate is 2.272 % and the MAE improvement rate is 20.113 %. The results indicate that the CDAE stabilizes the model and that the CDAE-LSTM hybrid model achieves higher accuracy and lower uncertainty in time series prediction.


I. INTRODUCTION
Combine harvesters are the most widely used machinery in crop harvesting, and their performance directly affects harvest quality. In actual operation, the harvester load is not constant because of factors such as crop density, surface undulation, and crop moisture content, so the engine output power also changes dynamically. When the rated working load is exceeded, the harvester may even suffer a severe failure [1]-[5]. Because the cooling water temperature follows changes in engine power during operation, the harvester's load status can be judged from the cooling water temperature. Collecting water temperature data and analysing how its changes relate to harvester performance is therefore an urgent problem [6], [7].
In recent years, machine learning and deep learning combined with traditional feature extraction methods, such as frequency-domain features [8]-[12] and wavelet energy entropy, have performed well in fault diagnosis and prediction. Since the introduction of AlexNet [13] in 2012, deep neural networks have been widely used in this field, so faults can be diagnosed and predicted by analysing their characteristic parameters (such as speed, torque, and vibration). Building data-driven predictive models therefore offers a new way of solving traditional problems. During parameter tuning, however, considerable noise and vibration were detected in the working conditions of combine harvesters [14]-[17]. These disturbances interfere with feature extraction, so the extracted key feature parameters are nonlinear, unlabelled, and mutually interfering. How to choose a feature extraction model is therefore one of the current research issues in detection data processing.
Noise interferes with feature extraction and causes abnormal values in the extracted key feature parameters, some of which are also nonlinear, unlabelled, and mutually interfering. On the one hand, the temporal order of the data is usually ignored when working condition parameters are analysed, and fault parameters must be labelled manually, which makes them difficult to use for post-hoc fault analysis under real working conditions [18]-[20]. On the other hand, traditional research treats different acquisition parameters separately, so correlations under variable load are hard to capture. By contrast, the deep feature differences between the whole system and the analysis objects, which are mostly vibration signals, are easy to extract [21]-[24].
To address these problems, this paper proposes an improved hybrid forecasting model, based on a convolutional autoencoder with transfer learning, to explore the relationship between the engine cooling water temperature and the combine harvester's performance.

A. Input Selection of Prediction Model
The working parameters of combine harvesters are related in complex ways: one result may have several causes, and one cause may have several results. Different researchers have studied the key parts of the combine harvester [25]-[27]. For the correlation calculations, independent time series of key components, such as speed, temperature, and engine working indices, are selected. To couple the feature information affecting these parameters and fully exploit their potential internal relations and spatial patterns, correlations between the parameters are calculated to determine which parameters to acquire. The rotation speeds of four important load-bearing parts, fuel consumption, and other engine parameters are selected as the inputs of the model.

B. Correlation Calculation
Correlation analysis is carried out on the original data. For the combine harvester, a working condition parameter acquisition system was developed; the data come from field experiment acquisition points and from the operation and maintenance management platform. Pearson's correlation coefficient is used as the correlation indicator:

$$\mathrm{Corr}(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $\operatorname{cov}(x, y)$ is the covariance between parameters $x$ and $y$, and $\sigma_x$, $\sigma_y$ are their standard deviations. Details of the dataset and the calculated correlations are shown in Fig. 1. As illustrated, the selected key part parameters satisfy $0.2 \le |\mathrm{Corr}| < 0.52$, so they can be used as predictive input parameters given the actual distribution of the training data over time. Figure 2 shows the correlation between the parameters.
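A minimal pure-Python sketch of the Pearson coefficient and of the |Corr| ≥ 0.2 input screen described above may help. The function names `pearson` and `select_inputs` and the toy series are hypothetical; the paper does not publish its computation. In practice `numpy.corrcoef` or `pandas.DataFrame.corr` yields the same coefficients.

```python
import math


def pearson(x, y):
    """Pearson correlation r = cov(x, y) / (sigma_x * sigma_y).

    The 1/n factors of the covariance and the standard deviations cancel,
    so plain sums of deviations are used.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_x * ss_y)


def select_inputs(candidates, target, threshold=0.2):
    """Return {name: r} for candidate series whose |r| with the target >= threshold.

    Hypothetical helper: `candidates` maps a parameter name to its series;
    the 0.2 default follows the lower bound reported in the text.
    """
    selected = {}
    for name, series in candidates.items():
        r = pearson(series, target)
        if abs(r) >= threshold:
            selected[name] = r
    return selected
```

For example, `select_inputs({"engine_speed": speeds, "fuel_rate": fuel}, water_temp)` would keep only the series that pass the screen; the threshold is applied to the absolute value so that strong negative correlations are also retained.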

C. Time Series Analysis
In this paper, the data are obtained from different sensors over a Controller Area Network bus (CAN-bus). Speed, fuel consumption, and cooling water temperature together form a multivariate time series whose components are unary time series sampled at the same time points. All data can therefore be represented as a matrix in which each row is a time point and each column is a unary time series:

$$X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \in \mathbb{R}^{T \times n}$$

where the column $\mathbf{x}_j = (x_{1j}, x_{2j}, \ldots, x_{Tj})^{\top}$ is the $j$-th unary time series over $T$ time points. The Pearson correlation matrix from the previous section is adopted as the time series correlation matrix. To predict and control the cooling water temperature series as it is affected by the workload, the model must learn both the univariate process of each component and the relations and variation patterns between the components.
In effect, an LSTM combines multiple regression with time series analysis. Before the LSTM model is used, its prediction structure must be determined from the cooling water temperature parameters. To meet the requirements of operation and maintenance management for combine harvesters, multi-step prediction is needed, that is, predicting the values over a period starting from the current time. The input data can be obtained through a sliding window, from which the persistence model is built; the details are described in Section III.
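The sliding-window construction for multi-step prediction can be sketched as follows, assuming the row/column layout described in Section C (rows are time points, columns are unary series). The function name `make_windows` and the parameters `n_in` and `n_out` are hypothetical; the window lengths actually used are reported later in the paper.

```python
def make_windows(rows, target_col, n_in, n_out):
    """Cut a multivariate series into supervised (input window, future targets) pairs.

    rows: list of time points; each row holds one value per unary series.
    target_col: index of the series to forecast (e.g. cooling water temperature).
    n_in: number of past time points fed to the model.
    n_out: number of future target values to predict (multi-step horizon).
    """
    if n_in < 1 or n_out < 1:
        raise ValueError("n_in and n_out must be positive")
    X, Y = [], []
    # Slide the window one time point at a time; each step yields one sample.
    for start in range(len(rows) - n_in - n_out + 1):
        end = start + n_in
        X.append([list(row) for row in rows[start:end]])
        Y.append([rows[t][target_col] for t in range(end, end + n_out)])
    return X, Y
```

With rows of the form `[engine_speed, fuel_rate, water_temp]` and `target_col=2`, each sample pairs `n_in` complete past rows with the next `n_out` water temperatures, which is the input/target shape a multi-step LSTM expects.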

III. RESEARCH METHOD
In the model design, a convolutional denoising autoencoder is first built on the structure of the original autoencoder to reduce noise, keeping it as close as possible to the pre-trained model. The data, which are one-dimensional sensor data, are then input into the improved CNN-LSTM model for prediction. Using the trained CNN-LSTM parameters as the model parameters of the convolutional denoising autoencoder (CDAE) reduces the training difficulty, and using the deep features extracted from the data as the LSTM input reduces the influence of abnormal data on the time series and improves prediction accuracy.
Specifically, the initial parameter range of the model structure is obtained through pre-training, and the optimal hyperparameter combination is found by random search. The pre-trained model parameters are then transferred to initialize the CDAE and CNN-LSTM models.
Transfer learning is a branch of machine learning that aims to obtain general knowledge representations from existing tasks and apply them to other tasks to optimize the model and accelerate convergence. In parameter transfer, instead of training a model for a task from scratch, the node weights of part of a trained network are transferred to an untrained network with the same structure. Training and fine-tuning a new model from transferred parameters reduces the time spent on hyperparameter tuning, saves computing resources, and makes the model more robust, with stronger generalization than a model retrained from scratch.
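Parameter transfer can be illustrated with a framework-free sketch in which each model is a dict from layer name to a nested list of weights (a hypothetical format chosen for readability). Only the listed layers are copied, and only when their shapes match, mirroring the "same structure" requirement; all other layers keep their initial values and are fine-tuned afterwards. In Keras the equivalent per-layer step is `target_layer.set_weights(source_layer.get_weights())`.

```python
import copy


def weight_shape(w):
    """Shape of a nested-list weight tensor, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(w, list):
        shape.append(len(w))
        w = w[0] if w else None
    return tuple(shape)


def transfer_parameters(source, target, layers):
    """Return a copy of `target` whose named layers take their weights from `source`.

    Layers not listed keep the target's own initial weights. Raises ValueError
    if a layer is missing or its shape differs, because parameter transfer
    requires an identical structure for the transferred layers.
    """
    result = copy.deepcopy(target)  # never mutate the untrained model in place
    for name in layers:
        if name not in source or name not in target:
            raise ValueError(f"layer {name!r} missing in source or target")
        if weight_shape(source[name]) != weight_shape(target[name]):
            raise ValueError(f"shape mismatch for layer {name!r}")
        result[name] = copy.deepcopy(source[name])
    return result
```

Copying rather than aliasing the weights keeps later fine-tuning of the new model from silently altering the pre-trained one.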

A. One-Dimensional Convolutional Network
Convolutional layers in a CNN are divided into one-dimensional and two-dimensional convolutions according to the dimensionality of the input. A one-dimensional convolution layer (Conv1D) is used to construct the hybrid model and to mine the deep features and latent information of the multi-parameter time series input. In the following, Conv1D always denotes a single one-dimensional convolution layer; a convolutional neural network is formed by stacking several Conv1D layers with other operation layers. A complete convolution operation followed by a pooling layer automatically extracts useful feature representations from the original data and outputs a feature vector. The convolution layer computes

$$y_j = f\left(\sum_{m=1}^{K} \sum_{c=1}^{C} w_{m,c}\, x_{j+m-1,\,c} + b\right)$$

where $x$ is the input with $C$ channels, $w$ is a kernel of width $K$, $b$ is the bias, and $f$ is the activation function.
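The arithmetic of one Conv1D filter followed by pooling can be sketched without a framework. As in Keras `Conv1D`, the kernel is not flipped (strictly a cross-correlation). The single filter, stride 1, "valid" padding, ReLU activation, and non-overlapping max pooling are assumptions for illustration, not the paper's reported configuration.

```python
def conv1d(x, kernel, bias=0.0):
    """One Conv1D filter over a multichannel series, stride 1, 'valid' padding.

    x: T x C list (time points x channels); kernel: K x C list.
    Output j is ReLU(sum_m sum_c kernel[m][c] * x[j + m][c] + bias); length T - K + 1.
    """
    k = len(kernel)
    if k == 0 or k > len(x):
        raise ValueError("kernel width must be between 1 and len(x)")
    out = []
    for j in range(len(x) - k + 1):
        acc = bias
        for m in range(k):
            # Dot product of kernel row m with the input time point j + m.
            acc += sum(w * v for w, v in zip(kernel[m], x[j + m]))
        out.append(max(0.0, acc))  # ReLU activation
    return out


def max_pool1d(x, pool_size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + pool_size]) for i in range(0, len(x) - pool_size + 1, pool_size)]
```

A Conv1D layer with several filters simply repeats `conv1d` with different kernels and stacks the outputs as channels for the next layer.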

B. Convolutional Noise Reduction Autoencoder
An autoencoder is strongly affected by noise in the original data; to learn from data with superimposed noise, a denoising autoencoder (DAE) gives better results. A convolutional denoising autoencoder compresses and decompresses the data: the original data are mixed with noise, the original data are then reconstructed through unsupervised learning, and the back-propagation optimization algorithm reduces the reconstruction error until it meets the noise reduction requirement [28]. In this paper, the mean absolute error (MAE) is used as the reconstruction loss of the autoencoder; the DAE structure is shown in Fig. 3. The autoencoder uses the mapping between the input layer and the output layer to reconstruct samples and extract features:

$$\mathbf{h} = f(W\tilde{\mathbf{x}} + \mathbf{b}), \qquad \hat{\mathbf{x}} = g(W'\mathbf{h} + \mathbf{b}'), \qquad L_{\mathrm{MAE}} = \frac{1}{N}\sum_{i=1}^{N} \lvert x_i - \hat{x}_i \rvert$$

where $\tilde{\mathbf{x}}$ is the noise-corrupted input, $\mathbf{h}$ is the encoded feature, $\hat{\mathbf{x}}$ is the reconstruction of the clean input $\mathbf{x}$ with $N$ elements, $W$, $\mathbf{b}$ and $W'$, $\mathbf{b}'$ are the encoder and decoder weights and biases, and $f$, $g$ are activation functions. The structure of the convolutional autoencoder follows the CNN encoder of the pre-trained model, with the same number of one-dimensional convolution and pooling layers, and the LSTM layer serves as the outermost layer of the decoder in place of the deconvolution layer. The hyperparameters are adjusted starting from the initial state parameters transferred from the CNN-LSTM. The inner parameters of each layer and of the outermost layer are then adjusted further, finally yielding a convolutional denoising autoencoder that effectively meets the noise reduction requirement.
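The denoising objective can be illustrated with a deliberately simple, untrained stand-in for the CDAE: Gaussian corruption of the input, an average-pooling "encoder", a nearest-neighbour up-sampling "decoder" (the operation Keras `UpSampling1D` performs), and the MAE between the reconstruction and the clean signal, matching the reconstruction loss named in the text. The fixed weights, the noise level, and all helper names are assumptions; a real CDAE learns its Conv1D kernels by back-propagation on this same loss.

```python
import random


def add_gaussian_noise(x, sigma, seed=None):
    """Corrupt a series with zero-mean Gaussian noise of standard deviation sigma."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in x]


def encode(x, pool=2):
    """Toy encoder: average pooling, i.e. a fixed kernel of 1/pool with stride pool."""
    return [sum(x[i:i + pool]) / pool for i in range(0, len(x) - pool + 1, pool)]


def decode(z, factor=2):
    """Toy decoder: nearest-neighbour up-sampling (each step repeated `factor` times)."""
    return [v for v in z for _ in range(factor)]


def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def denoising_loss(clean, sigma, seed=0):
    """Reconstruct from a noisy copy, but score the reconstruction against the clean input."""
    noisy = add_gaussian_noise(clean, sigma, seed)
    recon = decode(encode(noisy))
    return mae(recon, clean[:len(recon)])
```

The key design point is in `denoising_loss`: the network sees only the corrupted input, while the loss is computed against the clean signal, which is what pushes a trained autoencoder to remove noise rather than copy it.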

C. Long-Short Term Memory Artificial Neural Network
The Long Short-Term Memory artificial neural network (LSTM) is an improved recurrent neural network (RNN) that handles time-sensitive sequence problems effectively and alleviates the vanishing gradient problem while retaining the characteristics of the RNN model [29]. Because the working parameters of a combine harvester are time series, an LSTM can effectively model the harvester's working logic and learn the implicit relations between parameters. The structure of a single-layer multi-step LSTM is shown in Fig. 4. An LSTM adds information to or removes it from the cell state through gates: the forget gate, the input gate, and the output gate. In the gate structure, σ denotes the sigmoid layer and tanh denotes the tanh layer. The cell state C and the hidden state h are updated by vector operations:

$$\begin{aligned} f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\ \tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) \\ C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\ o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\ h_t &= o_t \odot \tanh(C_t) \end{aligned}$$

where $f_t$, $i_t$, and $o_t$ are the forget, input, and output gate activations, $\tilde{C}_t$ is the candidate cell state, $W$ and $b$ are the weight matrix and bias of each gate, $[h_{t-1}, x_t]$ denotes concatenation, and $\odot$ denotes element-wise multiplication.
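To make the gate equations concrete, here is a sketch of one LSTM step with a single hidden unit (scalar C and h) and any number of inputs. The parameter layout `params[gate] = (weights over [h_prev, *x], bias)` with gates `"f"`, `"i"`, `"c"`, `"o"` is a hypothetical choice for readability; a Keras `LSTM` layer stores the same four gates in packed kernel matrices.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step with a scalar hidden state.

    params maps 'f', 'i', 'c', 'o' to (weights, bias), where weights apply to
    the concatenation [h_prev, *x]. Returns the new (h, c).
    """
    z = [h_prev] + list(x)  # concatenation [h_{t-1}, x_t]

    def affine(gate):
        w, b = params[gate]
        if len(w) != len(z):
            raise ValueError(f"gate {gate!r} expects {len(z)} weights")
        return sum(wi * zi for wi, zi in zip(w, z)) + b

    f = sigmoid(affine("f"))          # forget gate
    i = sigmoid(affine("i"))          # input gate
    c_tilde = math.tanh(affine("c"))  # candidate cell state
    o = sigmoid(affine("o"))          # output gate
    c = f * c_prev + i * c_tilde      # cell state update
    h = o * math.tanh(c)              # hidden state / output
    return h, c


def run_lstm(sequence, params):
    """Run the cell over a sequence of input vectors from a zero state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h, c
```

With all weights and biases at zero, every gate outputs 0.5 and the candidate is 0, so the cell state simply halves at each step; this is a quick way to check that the update order matches the equations.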

IV. EXPERIMENTS AND ANALYSIS
To verify the stability and effectiveness of the prediction method, field experiments were carried out (Fig. 5) on a combine harvester with a nominal feed rate of 8 kg/s, a cutting width of 2560 mm, and a matching power of 140 kW.

A. Equipment and Steps of Experiment
The experimental equipment includes a combine harvester working condition acquisition system based on a CAN bus using the SAE J1939 protocol (Fig. 6). To obtain the key working condition parameters of the combine harvester, a maintenance management platform was developed to upload real-time data. The sampling frequency of the acquisition device was 1 Hz, and the acquisition accuracy of the sensors is shown in Table I. Before the harvester started, the driver set the engine speed with the manual throttle and kept it stable at the rated condition. After the operation began, the driver accelerated to 6-8 km/h and kept the travel speed relatively stable; the working condition acquisition system then collected the rotating speeds of the engine, threshing drum, re-thresher, grain conveyor screw, and other components, as well as the temperatures of key components. The engine parameters were transmitted to the acquisition system over the CAN bus.
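The paper does not state which J1939 messages were decoded. As an illustration only, the sketch below assumes the standard SAE J1939-71 definitions for two engine signals: engine speed (SPN 190, bytes 4-5 of PGN 61444 "EEC1", 0.125 rpm/bit) and engine coolant temperature (SPN 110, byte 1 of PGN 65262 "ET1", 1 °C/bit with a -40 °C offset). Component speeds such as the threshing drum are outside these standard messages and are not covered; the function names are hypothetical.

```python
EEC1 = 61444  # Electronic Engine Controller 1 (carries engine speed, SPN 190)
ET1 = 65262   # Engine Temperature 1 (carries engine coolant temperature, SPN 110)


def pgn_from_can_id(can_id):
    """Extract the parameter group number from a 29-bit J1939 CAN identifier."""
    pgn = (can_id >> 8) & 0x3FFFF  # EDP, DP, PDU format (PF) and PDU specific (PS)
    if ((pgn >> 8) & 0xFF) < 240:
        # PDU1 format: PS is a destination address, not part of the PGN.
        pgn &= 0x3FF00
    return pgn


def decode_frame(can_id, data):
    """Return (signal_name, value) for EEC1/ET1 frames, or None otherwise.

    Raw values in the J1939 'error / not available' ranges are rejected.
    """
    if len(data) < 8:
        raise ValueError("J1939 single-frame messages carry 8 data bytes")
    pgn = pgn_from_can_id(can_id)
    if pgn == EEC1:
        raw = data[3] | (data[4] << 8)  # bytes 4-5, little-endian
        if raw > 0xFAFF:
            return None
        return ("engine_speed_rpm", raw * 0.125)
    if pgn == ET1:
        raw = data[0]  # byte 1
        if raw > 0xFA:
            return None
        return ("coolant_temp_c", raw - 40.0)
    return None
```

For instance, an ET1 frame whose first data byte is 0x7D (125) decodes to a coolant temperature of 85 °C, and an EEC1 frame with bytes 4-5 equal to 0x40, 0x38 decodes to 1800 rpm.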

B. Experimental Index and Data Processing
According to the requirements of the Chinese national standard GB/T 8097-2008 "Equipment for harvesting Combine-Harvester-Test procedure" on crop and field conditions, the experimental field should be flat and the crops should grow uniformly.
During the experiment, the ambient temperature was 20-34 ℃ and the wind speed was 1.6-3.3 m/s. To evaluate the performance of the CNN-LSTM prediction model, the coefficient of determination R², the MAE, and the root mean squared error (RMSE) are used as experimental indexes:

$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$

where SSE is the sum of squares of residuals, SST is the total sum of squares, $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the measured values.

The dataset is built from the data uploaded by the sensors. Because the parameter sources and acquisition frequencies differ, the data must be pre-processed. The 9900 collected data entries were divided into 8000 training samples and 1900 validation samples. The original water temperature and speed sensor data were smoothed with a sliding window implemented as a convolution:

$$(a * v)_n = \sum_{m} a_m\, v_{n-m}$$

where $a$ and $v$ are the input arrays, and the convolution kernel uses a sliding window of size 10. Min-max scaling, $x' = (x - x_{\min})/(x_{\max} - x_{\min})$, is used to improve the convergence speed and prediction accuracy of the model. The computer used in this paper has an Intel(R) Core(TM) i7-9750H processor, 16 GB of RAM, an NVIDIA GeForce GTX 1660 Ti graphics card, and the 64-bit Windows 10 operating system. The software framework is Keras with TensorFlow 2.0 as the back end, the programming language is Python 3.7, and the integrated development environment is PyCharm. The overall process is shown in Fig. 7.
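The pre-processing chain can be sketched as smoothing by discrete convolution with a uniform kernel (using the window of 10 from the text), min-max scaling, and a chronological 8000/1900 split. Reading the sliding window as a uniform moving-average kernel, and fitting the min-max range on the training portion only so that validation statistics do not leak into training, are assumptions; the helper names are hypothetical. With NumPy the smoothing step corresponds to `numpy.convolve(a, v, mode="valid")`.

```python
def convolve_valid(a, v):
    """Discrete convolution (a * v)[n] = sum_m a[m] v[n - m], keeping only full overlaps."""
    if not v or len(v) > len(a):
        raise ValueError("kernel must be non-empty and no longer than the signal")
    k = len(v)
    # The kernel index runs backwards (v[k - 1 - m]): a true convolution flips v.
    return [sum(a[n + m] * v[k - 1 - m] for m in range(k)) for n in range(len(a) - k + 1)]


def moving_average(a, window=10):
    """Sliding-window smoothing as a convolution with a uniform kernel."""
    return convolve_valid(a, [1.0 / window] * window)


def fit_min_max(train):
    """Learn the scaling range from the training data only."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    return lo, hi


def scale(x, lo, hi):
    return [(v - lo) / (hi - lo) for v in x]


def inverse_scale(x, lo, hi):
    """Map model outputs back to physical units (e.g. degrees Celsius)."""
    return [v * (hi - lo) + lo for v in x]


def chronological_split(series, n_train=8000):
    """Split without shuffling so validation data lie strictly after training data."""
    return series[:n_train], series[n_train:]
```

Splitting chronologically rather than at random keeps the validation set in the "future" relative to training, which matches how the model is used for prediction; note that validation values scaled with the training range may fall slightly outside [0, 1].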

C. Model Training
The model was evaluated with these indexes. Fine-tuning showed that RMSE and MAE were strongly affected by abnormal values. Therefore, R² is first used as the optimization criterion to determine the model's goodness of fit, which gives the approximate range of the parameters. The number of filters and layers may then be increased according to the pre-training results. When the number of layers was increased, the loss decreased and R² decreased as well, whereas RMSE first increased and then decreased, and MAE first decreased and then increased.
Different numbers of layers and different hyperparameters affect R² to some degree, and with too few layers it is difficult to determine the best structure, so further parameter and structure adjustments are needed. Moreover, R² cannot fully reflect the prediction ability of the model. For the water temperature of combine harvesters, the actual values range only from 0 °C to 80 °C, and the R² of the model is close to 1. The best prediction model is therefore determined by RMSE and MAE.
This analysis is the basis for training once the model has been designed. As described in Section III, pre-training is required first, and its results provide the initial values transferred to the CDAE and CNN-LSTM for the next stage of training.
1. Pre-training of the Initial Model
The layers and initialization parameters of the CNN-LSTM model [30] are trained in the pre-training stage. The training steps for the Conv1D convolution layer and the pooling layer are as follows:
(1) Determine the size and dimension of the samples from the input samples and parameters, and initialize the kernel size;
(2) Input the pre-processed data into Conv1D in the input format of a one-dimensional convolutional neural network;
(3) Adjust the kernel size and the number of filters to determine the best CNN feature extraction parameters.
In this paper, the grid search method "GridSearchCV" is first used to traverse batch_size and epochs before the other parameters are tuned [31]. The ranges of the parameters are shown in Fig. 8. The combination of 100 epochs and a batch_size of 3 gives better results. The other parameters are then tuned by random sampling.
RandomizedSearchCV is used to tune the hyperparameters of the CNN-LSTM model and obtain the optimal structure. The main parameters are introduced in Section III. With these key parameters determined, the search is carried out with R² as the objective function. The hyperparameters of the hybrid model determine the parameter types shown in the figure below. Because correlations between the parameters increase the training cost, an optimization strategy with relatively low cost and high accuracy is selected for pre-training. The best combination of initialization parameters is obtained through pre-training, and the structure and parameters of the pre-trained model are taken as the transfer objects (Table II). Pre-training reduces the amount of computation and improves accuracy in the initial training of the convolutional denoising autoencoder and the CNN-LSTM model. The final model structure is obtained after further adjustment of the inner structure.
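The two-stage search (an exhaustive grid over batch_size and epochs, then random sampling of the remaining hyperparameters, with R² as the objective) can be sketched framework-free. Here `evaluate` stands for a hypothetical function that trains the CNN-LSTM with the given settings and returns validation R²; the parameter names such as `filters` are illustrative. The paper itself used scikit-learn's `GridSearchCV` and `RandomizedSearchCV`, which additionally cross-validate each candidate.

```python
import itertools
import random


def grid_search(evaluate, grid):
    """Score every combination in `grid` (dict name -> list of values); keep the best."""
    names = sorted(grid)
    best, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score


def random_search(evaluate, space, n_iter, fixed=None, seed=0):
    """Score n_iter random draws from `space`, holding the `fixed` parameters constant."""
    rng = random.Random(seed)
    fixed = dict(fixed or {})
    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = dict(fixed)
        params.update({name: rng.choice(values) for name, values in space.items()})
        score = evaluate(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score
```

A typical call sequence would be `stage1, _ = grid_search(evaluate, {"batch_size": [...], "epochs": [...]})` followed by `random_search(evaluate, {"filters": [...], "kernel_size": [...]}, n_iter=50, fixed=stage1)`, so the expensive exhaustive search is limited to two parameters and the rest are sampled.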

2. Convolutional Denoising Autoencoder Training
The structure and initialization parameters of the CDAE are obtained by parameter transfer from the convolutional encoder of the pre-trained model (Table III). To ensure low-level feature extraction, a Conv1D layer is placed at the outermost layer. The autoencoder is built from Conv1D and UpSampling layers; its structure is shown in Fig. 9. The CDAE reconstructs the original data from inputs corrupted with Gaussian noise, and the degree of restoration depends on the center of the added noise. The reconstruction effect of different encoder structures is shown in Fig. 10.
Fig. 9. Structure of different CDAE models. The structure of the hybrid model is transferred from pre-training (Conv1D is a one-dimensional convolutional layer; the LSTM layers are finally replaced with Conv1D to obtain the structure in Table III).

3. CNN-LSTM Model Training
In this paper, the structure of the pre-trained model is used to initialize the CNN-LSTM. The time step is set to 5, the data from t - 1 to t + 4 are input, and the model parameters are adjusted continuously with the Adam optimization algorithm to optimize performance. To avoid falling into a local minimum, the initial learning rate is set to 0.001, and the parameters are adjusted until the result is optimal. The parameter adjustment process is the same as in pre-training. The training process is shown in Fig. 11.
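For reference, here is a sketch of the Adam update applied to a single scalar parameter, using the learning rate of 0.001 from the text. The values beta_1 = 0.9, beta_2 = 0.999, and epsilon = 1e-7 mirror the tf.keras `Adam` defaults and are assumed here, since only the learning rate is reported; `adam_minimize` and the quadratic toy loss are illustrative.

```python
import math


def adam_minimize(grad, theta0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7, steps=1000):
    """Minimize a 1-D function with Adam given its gradient function `grad`."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment (uncentered variance) estimate
        m_hat = m / (1 - beta1 ** t)           # bias corrections for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta
```

Because the step is normalized by the root of the second moment, each early update has a magnitude close to the learning rate regardless of the gradient scale: minimizing θ² from θ = 1 with `grad=lambda th: 2 * th` moves θ to about 0.999 after one step.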
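To verify the prediction accuracy of the model, RMSE and MAE are used as evaluation indicators, and the training effect is evaluated by the improvement rates of RMSE and MAE relative to the baseline model:

$$P_{\mathrm{RMSE}} = \frac{\mathrm{RMSE}_{\mathrm{base}} - \mathrm{RMSE}_{\mathrm{hyb}}}{\mathrm{RMSE}_{\mathrm{base}}} \times 100\%, \qquad P_{\mathrm{MAE}} = \frac{\mathrm{MAE}_{\mathrm{base}} - \mathrm{MAE}_{\mathrm{hyb}}}{\mathrm{MAE}_{\mathrm{base}}} \times 100\%$$

where the subscripts base and hyb denote the baseline (LSTM) model and the hybrid model, respectively.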
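The evaluation indexes and the improvement (promotion) rate can be computed as below. The function names are illustrative, and the improvement-rate definition used here, the percentage reduction of an error metric relative to the LSTM baseline, is the usual one and is assumed. Because RMSE is the quadratic mean of the absolute residuals, RMSE ≥ MAE holds for any single set of residuals, which gives a quick consistency check on computed values.

```python
import math


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true, y_pred):
    """Root mean squared error."""
    _check(y_true, y_pred)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SSE / SST."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    sse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    sst = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - sse / sst


def improvement_rate(baseline_error, model_error):
    """Percentage reduction of an error metric relative to the baseline model."""
    return (baseline_error - model_error) / baseline_error * 100.0
```

For measured values `[1, 2, 3, 4]` and predictions `[1, 2, 3, 5]`, these give RMSE = 0.5, MAE = 0.25, and R² = 0.8.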

4. Analysis of Model Prediction Effect
The training effect of the CDAE is shown in Fig. 12. The CDAE denoises the data effectively and, together with data pre-processing, improves prediction efficiency. As Fig. 12 shows, different CDAE structures denoise the data to different degrees, and the CDAE based on transfer learning denoises most effectively. Figure 13(b) shows the fit between the cooling water temperature predicted by the CDAE-CNN-LSTM model and the actual experimental results. To verify the accuracy and stability of the proposed prediction model, the Back Propagation (BP) network model, the LSTM model (Fig. 13(a)), and the CDAE-CNN-LSTM model are compared. Fig. 13 shows that the CDAE-CNN-LSTM fits better: its RMSE, MAE, and R² are 0.0817, 0.0989, and 0.9785, respectively. The training loss shows that the CNN-LSTM overfits less. Compared with the LSTM model, the improvement rates of RMSE and MAE are 2.272 % and 20.113 %, respectively. The prediction effects of the different models are shown in Table IV. The charts and the table show that the CDAE-LSTM hybrid model fits considerably better than the traditional BP neural network, which can hardly fit the time series data (its R² indicates a poor fit). The R² of the hybrid model is slightly lower than that of the original LSTM, but the drop is less than 1 %. Since the fitting accuracy requirement is still met, the hybrid model carries a lower risk of overfitting. For accuracy, on the one hand, RMSE better reflects the influence of discrete points, and the table shows that CDAE-LSTM handles abnormal training data well. On the other hand, MAE reflects the prediction effect most intuitively, and the fitting diagrams of CDAE-LSTM and LSTM show that the hybrid model has better fitting accuracy.

V. CONCLUSIONS
1. Traditional regression models process time series inefficiently and cannot predict the parameters of unstable time series. Therefore, deep learning is introduced into the modelling and analysis of multi-source parameter time series, and a neural network based on CDAE and CNN-LSTM is proposed. Experiments show that the resulting cooling water temperature prediction model has high accuracy.
2. The CDAE-LSTM model based on transfer (migration) learning predicts better than the original LSTM model. The CDAE effectively eliminates data anomalies caused by noise under limited sampling and provides the best data for the migration process. Since DAE processing is lossy, output degradation may be studied further in future work.
3. The accuracy of the prediction model was analysed with RMSE, MAE, and R², which verified on the field test data that the model overfits little. While maintaining a high goodness of fit (R² = 0.9785), it achieves higher accuracy (MAE = 0.0989, RMSE = 0.0817), which shows the model's robustness. The resulting regression model can provide an early warning basis for subsequent changes in the combine harvester, and the trends of the fault parameters can provide ideas for the predictive diagnosis of combine harvesters.

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.