Discriminating Feed Rate of Combine Harvester by Using Association Rule Mining

Abstract: The feed rate is an important evaluation index of combine harvester performance. Quickly identifying the feed rate entering the combine during harvesting is of great significance for the efficiency and operational quality of the combine harvester. To address this issue, this study proposes a feed rate discrimination method based on association rule mining. Taking a wheat combine harvester as the object, a self-designed data acquisition system collected seven speed signals and three torque signals at feed rates of 6 kg/s~8 kg/s, 8 kg/s~10 kg/s, and 10 kg/s~11 kg/s. The collected time series data were discretized to facilitate the construction of transaction sets. The association rules in the constructed transaction set were then mined by FP-Growth, and the rules with weak or no correlation with the increase in feed rate were filtered out using min-support, min-confidence, and min-lift of 0.3, 0.8, and 3, respectively, to obtain strong association rules. The strong association rules were then used to construct classifiers. The test results showed that the accuracy of the constructed classifier in identifying the 6 kg/s~8 kg/s, 8 kg/s~10 kg/s, and 10 kg/s~11 kg/s feed rates was 100 %, 96 %, and 98.7 %, respectively. The results can provide a basis for adjusting the working state of the combine harvester.


I. INTRODUCTION
Wheat is China's main food source. In 2021, the wheat planting area reached 23.57 million hectares and production reached 136.94 million tons, both among the highest levels in the world. In China, although the manufacturing standard of combine harvesters has improved greatly, there are still difficulties such as a high fault rate and unstable operating performance in the wheat harvester working process, which seriously restrict the development of the wheat industry. Feed rate is a key indicator of the work performance of the combine harvester; it refers to the sum of the grains and stalks that pass through the combine harvester per unit of time. Generally, the combine harvester has a rated feed rate according to its structural characteristics. When the actual feed rate is higher than the rated value, it often leads to abnormal operation of load-bearing parts, such as blockage of the stalk auger, conveyor, threshing cylinder, and other parts. It also causes the loss rate and the crushing rate to exceed the standard, affecting the economic benefits of farmers. In summary, determining the feed rate during harvesting is of great importance, as it can enhance the overall performance and efficiency of the combine harvester.
Manuscript received 10 January, 2023; accepted 2 April, 2023.
When the crop enters the combine harvester, it can be regarded as a crop flow, which successively passes through the stalk auger, conveyor, threshing cylinder, tailing auger, and grain auger. By installing torque or speed sensors on these critical rotating components, it is possible to monitor the feed rate. For example, Chen, Li, and Ji [1] constructed an indoor bench and used it to carry out experiments, analysing the relationship between conveyor torque and feed rate. Meanwhile, Wang, Hu, Wu, Yu, and Cao [2] established the quadratic function relationship between the torque of the peanut combine harvester picking platform and the feed rate through field experiments. Although torque information is very useful for feed rate determination, torque sensor installation is often very difficult. Therefore, Zhang, Sun, Liu, Zhang, Li, and Li [3] designed a feed rate monitoring system based on torque of the header drive shaft. The system integrates ZigBee wireless transmission technology and 4G communication technology, effectively solving the problem that the torque sensor is difficult to install. In addition to directly using torque, some researchers use power to reflect the level of feed rate. Sun, Liu, Ou, Zhang, Zhang, and Li [4] built two feed rate detection methods based on the power of the stalk auger and the conveyor, respectively, by monitoring the torque and speed of the stalk auger and conveyor. Using power to detect the feed rate provides greater accuracy than using torque alone. Furthermore, to extract useful information more accurately from the collected sensor signals, Jiang et al. [5] analysed the noise frequency range based on wavelet transform, decomposed, denoised and reconstructed the signal, and thus obtained the fitting relationship between torque and feed rate. In addition to the traditional signal analysis technology, machine learning technology has also been applied in feed rate monitoring. Fan et al. 
[6] realised real-time monitoring of the feed rate using a particle swarm optimisation-back propagation neural network, with header height, grain moisture, and the torque of the header shaft as network inputs and feed rate as output.
In existing studies, the torque change of the stalk auger or conveyor was mainly used to reflect the feed rate, while the relationship between other critical components (such as the threshing cylinder, grain auger, and tailing auger) and the feed rate was less studied. Therefore, we introduced the torque or rotational speed of the threshing cylinder, grain auger, and tailing auger into feed rate monitoring to measure the feed rate more comprehensively. The increase in monitoring positions generated a large amount of data to be processed. Although neural networks have shown good performance in processing large data, they are black-box models, so the interpretability of their results is weak. Therefore, we use Association Rule Mining (ARM) to mine the relationship between each monitoring parameter and the feed rate. ARM is one of the active research methods in data mining and was first proposed by Agrawal [7]. Initially, its purpose was to discover the relationships between different commodities in a transaction database. ARM not only automatically mines the relationship between two things, but also presents this relationship as a "rule" that better reflects the internal mechanism of things. The association rules reflect the interdependence and correlation between one thing and another, which can be regarded as an "if-then" relationship. In ARM, the most common method is Apriori, but Apriori needs to scan the database several times during the mining process, which is computationally very expensive.
To solve this problem, many researchers have proposed various improvement methods, such as Park-Chen-Yu method (PCY), XML Frequent Pattern Tree (XFP-Tree), Graphical Processing Unit Apriori (GP-Apriori), and Frequent Pattern Growth (FP-Growth), which can effectively reduce the number of database scans in the mining process and improve mining efficiency [8], [9]. ARM has been widely used in different fields, such as fault diagnosis of complex industrial machinery [10], etiological analysis of diseases in medical science [11], [12], stock price prediction in economics [13], etc.
The remainder of this paper is structured as follows. Data acquisition, preprocessing methods, and association rule mining methods are presented in Section II. Section III describes the results of the proposed method. The conclusions are given in Section IV.

A. The Basic Principle of ARM
Let $I = \{i_1, i_2, \ldots, i_m\}$ be an item set, the elements of which are called "items". Suppose D is a transaction set, and each transaction T is a set of items such that $T \subseteq I$. Each transaction has a unique identifier, the "transaction ID", recorded as TID. An association rule can be denoted as $X \Rightarrow Y$. Usually, X and Y are called the "antecedent" and "consequent" of the rule, respectively [14].
In this study, the process of using ARM to discriminate the state of feed rate is shown in Fig. 1.
The data collected in the field are divided into a training data set and a test data set. After preprocessing, both sets are converted into transaction sets to facilitate the mining of association rules. In the transaction set, parameters such as feed rate, speed, and torque are converted into items; the training data set forms a training item set, and the test data set forms a test item set. The training item set was used to mine the association rules between the different feed rate states and the torque or speed of the key components. FP-Growth was chosen so that the association rules could be obtained quickly. In addition, to reduce the number of mined rules, three indicators (support, confidence, and lift) were used to filter them. Rules that did not meet the indicator thresholds were removed, and rules that met them were called "frequent rules" and used as classifiers. The test item set was used to verify the accuracy of the classifier: each test transaction is input into the classifier and searched for the frequent rules the classifier contains, which realises the discrimination of the different feed rate states.

B. Data Acquisition
The feed rate of the combine harvester is determined by (1), where $Q$ represents the feed rate, $\rho$ represents the crop density, $H$ represents the header width, $v$ represents the travel speed, and $\delta$ represents the ratio of grass to grain.
When the combine harvester is working, $\rho$, $H$, and $\delta$ can be regarded as constants. Therefore, the feed rate can be regarded as a variable that is affected only by $v$, and the feed rate in different states can be obtained by controlling the travel speed.
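As a rough illustration of why travel speed alone can be used to set the feed rate, the sketch below assumes the commonly used form $Q = \rho H v (1 + \delta)$, with $\rho$ the grain mass per square metre and $\delta$ the grass-to-grain mass ratio. This form and the symbol names are assumptions made for the example, since equation (1) is not reproduced here; the crop values are hypothetical, and only the 2.5 m header width is taken from the test description.

```python
def feed_rate(rho, header_width, speed_kmh, delta):
    """Feed rate Q in kg/s under the ASSUMED form Q = rho * H * v * (1 + delta).

    rho          -- grain mass per unit area, kg/m^2 (assumed meaning of crop density)
    header_width -- header width H, m
    speed_kmh    -- travel speed v, km/h
    delta        -- grass-to-grain mass ratio (dimensionless)
    """
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return rho * header_width * speed_ms * (1.0 + delta)


# Hypothetical crop values; with rho, H, and delta fixed, Q scales with speed only.
for v in (3.6, 7.2):
    print(v, "km/h ->", round(feed_rate(0.8, 2.5, v, 2.0), 3), "kg/s")
```

Under this assumed form, doubling the travel speed doubles the feed rate while the crop and header parameters stay fixed.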
Before the start of the experiment, the crop density $\rho$ and the grass-to-grain ratio $\delta$ were measured by the "five-point method": one square metre of wheat was harvested manually at each of five points, and the weight of the grains and residues was obtained by manual threshing. The average grain weight over the five points was taken as the density, and the grass-to-grain ratio was likewise taken as the five-point average. The header width of the combine harvester used in the test is 2.5 m. Figure 2 shows the monitored locations and parameters. Hall sensors and strain-resistor torque sensors were used to collect speed and torque, respectively. The signal acquisition frequency was 50 Hz. The sensors were connected to a data acquisition card, which transmitted the collected data to the computer. The rated feed rate of the combine harvester used in the test is 6 kg/s. Through a pretest, the feed rate was divided into three levels, and the state of the combine harvester at the different feed rates is shown in Table I.
To obtain the data in the three feed rate states, the test procedure is shown in Fig. 3, which consists of two main stages as follows.
1. The combine harvester enters the preparation area, reaches a stable condition, and then drives out of the preparation area.
2. To achieve an increase in feed rate, three acceleration zones were established within the data acquisition area. In the first acceleration zone, the combine harvester accelerates from 4 km/h to 6 km/h (feed rate from 6 kg/s to 8 kg/s); in the second, it accelerates from 6 km/h to 8 km/h (feed rate from 8 kg/s to 10 kg/s); in the last zone, it accelerates from 8 km/h to 10 km/h. In this stage, once the feed rate reaches 11 kg/s, it no longer increases with the travel speed.
The test was conducted over three days, and the test procedure was repeated three times between 14:00 and 15:00 each day.

C. Data Preprocessing and Transaction Set Construction
Because of uneven field terrain, dust, mechanism vibration, etc., outliers or missing data often appear in the original data, so the original data needs to be cleaned. Outliers in the data were removed and missing values were filled by cubic-spline interpolation. These two processing steps are conventional and simple and will not be described in detail here.
Typically, the set of transactions used to mine association rules contains two parts: the TID and the items contained in each transaction. In this study, the data acquisition time was used as the TID and the readings of each sensor were used as the items. After outlier rejection and interpolation, the original data formed a complete continuous time series, which did not satisfy the requirement that the items in a transaction set be discrete. The Piecewise Aggregate Approximation (PAA) method was therefore used to discretise the time series.
The PAA process is shown in Fig. 4.  To facilitate the representation, the names of the speed and torque data collected from different parts are redefined according to Table Ⅱ. After PAA conversion, the final content format of the obtained transaction set is shown in Table Ⅲ.
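The discretisation step can be sketched as follows. PAA replaces each equal-length segment of a series with its mean; the mapping from segment means to level labels such as "STL3" is shown here with equal-width bins, which is an assumption for illustration because the level boundaries are not given in the text. The function names, the synthetic signal, and the value range are likewise hypothetical.

```python
from statistics import fmean


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    seg_len = len(series) // n_segments
    if seg_len == 0:
        raise ValueError("n_segments must not exceed the series length")
    # Trailing samples that do not fill a whole segment are ignored.
    return [fmean(series[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]


def to_item(value, lo, hi, n_levels, prefix):
    """Map a PAA value to an item label using equal-width bins (assumed scheme)."""
    width = (hi - lo) / n_levels
    level = int((value - lo) // width) + 1
    level = max(1, min(n_levels, level))  # clamp values at or beyond the range ends
    return f"{prefix}L{level}"


# Synthetic stand-in for one sensor channel with 23150 samples.
signal = [float(i % 100) for i in range(23150)]
reduced = paa(signal, 926)  # 25 samples per segment
items = [to_item(v, 0.0, 100.0, 5, "ST") for v in reduced]
print(len(reduced), items[:4])
```

With 23150 samples reduced to 926 segments, each segment covers 25 samples, i.e., 0.5 s of signal at the 50 Hz acquisition frequency.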

D. Association Rules Mining
Mining association rules from transaction sets is divided into two main steps: mining of frequent item sets and generation of association rules. Of these, the mining of frequent item sets tends to take a long time. Therefore, the FP-Growth algorithm was selected to reduce the time spent in the mining process. The method is a mining algorithm proposed by Zhang [15] that does not require the generation of candidate item sets and needs only two scans of the data set during the whole execution, which greatly reduces the mining time. The core of the FP-Growth algorithm is the FP-tree, on which it relies to achieve high mining efficiency. The implementation of FP-Growth consists of two main parts: the construction of the FP-tree and the mining of frequent items.
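The two-scan idea can be illustrated with a minimal sketch of FP-tree construction in its standard textbook form; it is not claimed to match the implementation used in this study, and it covers only tree building, not the subsequent mining of frequent items. The class and function names are hypothetical, and the example transactions are abbreviated for brevity.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Scan 1: count in how many transactions each item occurs,
    # then drop items below the minimum count.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None, None)
    # Scan 2: insert each transaction with its frequent items sorted by
    # descending count (ties broken by name) so common prefixes share a path.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


# Abbreviated example transactions (three items each).
transactions = [
    ["FRL3", "STL2", "CTL2"],
    ["FRL3", "STL2", "CTL2"],
    ["FRL3", "STL2", "CTL2"],
    ["FRL4", "STL3", "CTL3"],
    ["FRL4", "STL3", "CTL3"],
]
root, frequent = build_fp_tree(transactions, min_count=2)
print({item: child.count for item, child in root.children.items()})
```

Because identical transactions share one path, the five transactions collapse into two branches, which is what keeps the later mining step cheap.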
The construction of FP-tree mainly includes the following six steps [16].
1. To begin, the original transaction set undergoes its initial scan to count the frequency of each item in all transaction records. The items are then sorted by number of occurrences, and the items that do not satisfy min-sup (i.e., minimum support) are removed. […] (4) and (5) until the tree contains only one item.
In the construction of the FP-tree, support is used to measure the frequency of the items. The definition of support is shown in (2):

$$\text{supp}(A \Rightarrow B) = \frac{N_{AB}}{N_{all}}, \tag{2}$$

where $A$ and $B$ are two items, and $N_{AB}$ and $N_{all}$ denote the number of transactions in which both $A$ and $B$ occur and the number of all transactions, respectively. It can be seen from (2) that when $N_{all}$ is large, $\text{supp}(A \Rightarrow B)$ will be very small, so several attempts are needed to determine min-sup, and the number of frequent item sets mined increases greatly when min-sup is too small. To address this, the support is improved and denoted im-support, as shown in (3):

$$\text{im-supp}(A \Rightarrow B) = \frac{N_{AB}}{N_{A}}. \tag{3}$$

In the denominator, only the number of transactions containing $A$, $N_{A}$, is retained. That is, im-support is the ratio of the number of transactions in which $A$ and $B$ appear simultaneously to the number of transactions in which $A$ appears, rather than to the number of all transactions.
In addition to support, confidence and lift were also introduced into the determination of frequent item sets in this study. Confidence indicates the probability that item B occurs given that item A occurs; it can be regarded as a conditional probability and is defined by (4). The definition of lift is shown in (5), which reflects the correlation between A and B. A lift greater than 1 indicates a positive correlation between the two items, which is stronger the larger the value; a lift less than 1 indicates a low correlation, which is weaker the smaller the value; and a lift equal to 1 means that the two items are independent of each other:

$$\text{conf}(A \Rightarrow B) = P(B \mid A) = \frac{P(AB)}{P(A)} = \frac{\text{supp}(A \Rightarrow B)}{\text{im-supp}(A)}. \tag{4}$$
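To make the indicators concrete, the following sketch computes them by counting over the five example transactions listed in Table Ⅳ. Support and im-support follow the verbal definitions given above; confidence is computed as P(AB)/P(A); lift uses the standard definition P(AB)/(P(A)P(B)), which is assumed here because equation (5) is not reproduced. The function name and the chosen rule are illustrative only.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Count-based indicators for the rule antecedent => consequent."""
    a, b = set(antecedent), set(consequent)
    n_all = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_b = sum(1 for t in transactions if b <= set(t))
    n_ab = sum(1 for t in transactions if (a | b) <= set(t))

    support = n_ab / n_all                       # N_AB / N_all
    im_support = n_ab / n_a if n_a else 0.0      # N_AB / N_A
    confidence = support / (n_a / n_all) if n_a else 0.0  # P(AB) / P(A)
    # Standard lift (assumed): P(AB) / (P(A) * P(B)) = confidence / P(B).
    lift = confidence / (n_b / n_all) if n_b else 0.0
    return {"support": support, "im_support": im_support,
            "confidence": confidence, "lift": lift}


# The five transactions shown in Table IV.
rows = [
    "FRL3 RSL3 STL2 SSL3 CTL2 CSL3 TCTL2 TSSL3 BSL2 TSL2 GSL2",
    "FRL3 RSL3 STL2 SSL3 CTL2 CSL3 TCTL2 TSSL2 BSL2 TSL2 GSL2",
    "FRL3 RSL3 STL2 SSL3 CTL2 CSL3 TCTL2 TSSL2 BSL2 TSL2 GSL2",
    "FRL4 RSL2 STL3 SSL2 CTL3 CSL2 TCTL3 TSSL2 BSL2 TSL2 GSL2",
    "FRL4 RSL3 STL3 SSL2 CTL3 CSL2 TCTL3 TSSL2 BSL2 TSL2 GSL2",
]
transactions = [row.split() for row in rows]
metrics = rule_metrics(transactions, ["FRL4"], ["STL3", "CTL3", "TCTL3"])
print(metrics)
```

On such a small sample the values are only indicative. Note that, computed by counting in this way, im-support and confidence take the same value; the sketch keeps them as separate entries only to mirror the text.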
The mining process of the FP-Growth algorithm after introducing three indicators is shown in Fig. 5.
In the mining process, after thresholds were set for support, confidence, and lift, support was used first to filter the item sets and preliminarily determine the frequent item sets. The rules generated from these frequent item sets were then filtered by confidence and lift, and those retained were taken as strong association rules and used as classifiers. The strong association rules were mined at each of the three feed rate levels to form three classifiers. These classifiers were then combined into a single classifier covering all three feed rate levels.
The process of discriminating the feed rate using the test item set is shown in Fig. 6. The association rules contained in the three classifiers differ, so the test item set is searched for rules that match the strong association rules contained in one of the classifiers, which determines the feed rate level. As shown in Fig. 6, the rules contained in the test item set are the same as the rules in the first classifier, so the combine harvester is judged to be in the 6 kg/s to 8 kg/s feed rate stage at this time.
Fig. 6. The process of discriminating feed rate using classifiers.
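The matching step can be sketched as a subset test: a rule "fires" when all of its sensor items appear in a test transaction. In this sketch the feed rate item is left out of the rule bodies, on the assumption that the feed rate level is unknown at test time. The rule for 6 kg/s to 8 kg/s reuses the sensor items of the first strong rule reported in the results; the rules for the other two levels, the function name, and the tie-breaking choice are hypothetical.

```python
def classify(sensor_items, classifiers):
    """Return the feed rate label whose rules match the observed items most often."""
    observed = set(sensor_items)
    # Count, per label, how many rules are fully contained in the transaction.
    hits = {label: sum(1 for rule in rules if set(rule) <= observed)
            for label, rules in classifiers.items()}
    best = max(hits, key=hits.get, default=None)  # ties: first label wins
    return best if best is not None and hits[best] > 0 else None


classifiers = {
    "6-8 kg/s": [{"RSL2", "STL4", "CTL4", "TCTL4"}],
    "8-10 kg/s": [{"STL5", "CTL5"}],      # hypothetical rule
    "10-11 kg/s": [{"STL6", "TCTL6"}],    # hypothetical rule
}

test_transaction = ["RSL2", "STL4", "SSL2", "CTL4", "CSL2", "TCTL4", "BSL2"]
print(classify(test_transaction, classifiers))
print(classify(["STL1", "CTL1"], classifiers))
```

Returning no label when nothing matches keeps unmatched transactions visible instead of forcing them into one of the three levels.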

A. Results of Transaction Set Construction
PAA processing effectively reduces the dimension of the original data. Figure 7 shows the result of PAA processing for an original sequence of length 23150. After processing, the sequence length is only 926, and the discretised data sequence still maintains the trend of data variation. When the feed rate increased from 6 kg/s to 8 kg/s, the three torque signals showed an upward trend from L2 to L3, while the seven speed signals were concentrated in L2. The processed data are stored in the transaction set format, and the final result is shown in Table Ⅳ.

TABLE Ⅳ. THE CONSTRUCTED TRANSACTION SET.
TID  Items
1    FRL3  RSL3  STL2  SSL3  CTL2  CSL3  TCTL2  TSSL3  BSL2  TSL2  GSL2
2    FRL3  RSL3  STL2  SSL3  CTL2  CSL3  TCTL2  TSSL2  BSL2  TSL2  GSL2
3    FRL3  RSL3  STL2  SSL3  CTL2  CSL3  TCTL2  TSSL2  BSL2  TSL2  GSL2
4    FRL4  RSL2  STL3  SSL2  CTL3  CSL2  TCTL3  TSSL2  BSL2  TSL2  GSL2
5    FRL4  RSL3  STL3  SSL2  CTL3  CSL2  TCTL3  TSSL2  BSL2  TSL2  GSL2
…    …

B. Distribution of Strong Association Rules
After preprocessing, all data were stored in the form of transaction set. In the constructed transaction set, the distribution of transaction numbers under the three feed rate levels is shown in Table Ⅴ. 70 % of the data at the three feed rate levels were used as the training item set and 30 % as the test item set.
The association rules were mined using the data from the training item set. To fully understand the distribution of rules embedded in the data, the rules were first mined without specifying minimum thresholds for the three indicators. A total of 515 association rules were mined at a feed rate of 6 kg/s-8 kg/s, 503 at 8 kg/s-10 kg/s, and 483 at 10 kg/s-11 kg/s. The distribution of the association rules obtained is shown in Fig. 8.
Fig. 8. Distribution of all association rules in the three feed rate states: (a) All association rules in the feed rate at 6 kg/s-8 kg/s; (b) All association rules in the feed rate at 8 kg/s-10 kg/s; (c) All association rules in the feed rate at 10 kg/s-11 kg/s.
Among the association rules initially mined, a large number are independent of the feed rate. As can be seen from Fig. 8, only a few of them fall within the region where all three indicators are high, implying that most rules are not useful. To eliminate the less relevant rules, minimum thresholds were set: 0.3 for support, 0.8 for confidence, and 3 for lift. Rules that did not satisfy these criteria were filtered out. This yielded 14, 16, and 42 strong association rules for the feed rate states of 6 kg/s to 8 kg/s, 8 kg/s to 10 kg/s, and 10 kg/s to 11 kg/s, respectively; their distribution is shown in Fig. 9. Table Ⅵ shows the specifics of some strong association rules.
The first strong association rule in the feed rate state of 6 kg/s-8 kg/s is "FRL4 RSL2 STL4 CTL4 TCTL4", which indicates that when the feed rate is at the fourth level (L4), i.e., near 8 kg/s, the stalk auger torque, the conveyor torque, and the threshing cylinder torque are all at level 4.
As can be seen from Table Ⅵ, these rules express the relationship between the feed rate and each important parameter. The strong association rules mined at the three feed rate levels are used as three classifiers. When the monitored torque or speed state is the same as the state in a strong association rule, the corresponding feed rate state can be obtained.
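The threshold filtering described in this subsection can be sketched as a simple predicate over mined rules, using the reported thresholds (0.3 for support, 0.8 for confidence, 3 for lift). The `Rule` container, the use of "greater than or equal to" at the thresholds, and the candidate rule values below are assumptions made for illustration; they are not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    support: float      # whichever support measure is used for filtering
    confidence: float
    lift: float


def strong_rules(rules, min_support=0.3, min_confidence=0.8, min_lift=3.0):
    """Keep only the rules that meet all three thresholds (inclusive, assumed)."""
    return [r for r in rules
            if r.support >= min_support
            and r.confidence >= min_confidence
            and r.lift >= min_lift]


# Hypothetical candidate rules with made-up indicator values.
candidates = [
    Rule(frozenset({"FRL4"}), frozenset({"STL4", "CTL4"}), 0.45, 0.92, 3.6),
    Rule(frozenset({"FRL4"}), frozenset({"BSL2"}), 0.60, 0.95, 1.0),   # lift too low
    Rule(frozenset({"FRL3"}), frozenset({"STL2"}), 0.10, 0.85, 3.2),   # support too low
]
kept = strong_rules(candidates)
print(len(kept), sorted(kept[0].consequent))
```

The second candidate shows why lift matters: an item that is present in almost every transaction gives high support and confidence but carries no information about the feed rate.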

C. Feed Rate State Discriminant Results
The test item set was input into the classifier, and the resulting feed rate recognition accuracy is shown in Table Ⅶ. The constructed classifier reached correct discrimination rates of 100 %, 96 %, and 98.7 % for the three levels of feed rate, respectively. This indicates that strong association rules can effectively discriminate the state of the feed rate.

IV. CONCLUSIONS
In this study, we propose an ARM-based method to discriminate the feed rate, to achieve the monitoring of the working state of the combine harvester during operation.
During the field test, not only was the torque of the stalk auger, conveyor, and threshing cylinder collected, but seven speed signals were also collected. In addition, the change of feed rate was simulated by controlling the travel speed at different levels, and torque and speed data were obtained for three states of feed change.
To improve mining efficiency, the PAA method was first used to reduce the data dimension of the original time series. The dimensionally reduced data were stored in the form of transaction sets to facilitate the mining of association rules. To obtain high-quality association rules, the FP-Growth mining algorithm was improved: in addition to support, confidence and lift were also used to filter out association rules that are less relevant. Finally, strong association rules were obtained and used as classifiers.
Based on the final results, the discrimination accuracy for the three feed rate states of 6 kg/s to 8 kg/s, 8 kg/s to 10 kg/s, and 10 kg/s to 11 kg/s was 100 %, 96 %, and 98.7 %, respectively. This indicates that the method used can effectively monitor the changes in feed rate states during the combine harvesting process.
In the future, the aim is to combine optimisation algorithms such as Particle Swarm Optimisation (PSO) or ant colony optimisation algorithm into the mining process, which could speed up the process of determining the optimal value of support, confidence and lift, and further improve the efficiency of mining association rules.

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.