Clustering-Interpolation Method and Its Application to Wind Turbine Generator Curve

The real-time operating wind turbine power curve (WPC) of a wind turbine generator (WTG) is not completely identical to a WPC provided by the manufacturer because of various factors. In order to obtain an accurate WPC model that can consider various factors, this paper improves a bisecting k-means clustering algorithm. The improved clustering algorithm is used for partitioning the measured data into a certain number of groups, which can be expressed in their centroids. The interpolation method based on the polynomial is carried out for modelling a WPC of WTG. The modelled WPC is applied to the reliability analysis of the generating systems with a wind farm. The results show that the accuracy of the linear interpolation is higher than that of quadratic interpolation and cubic spline interpolation when there are a relatively large number of clusters. DOI: http://dx.doi.org/10.5755/j01.eee.20.8.5195

Index Terms-Wind turbine power curve, bisecting k-means clustering, interpolation, wind farm, reliability analysis.

I. INTRODUCTION
Wind energy is a clean and renewable energy source, and it is also the world's fastest growing energy resource [1], [2]. Because of the stochastic and intermittent nature of wind speed, an accurate prediction of wind power is difficult [3], [4]. The measured wind turbine power curve (WPC) of a wind turbine generator (WTG) in a specific wind farm differs from the WPC provided by the WTG manufacturer [5], [6]. The difference may arise from various factors, such as the wake effect, air density, barometric pressure, temperature variations, clouds, and rain. However, some factors might not be accurately included in a mathematical model for calculating the power output of a WTG [6]. Therefore, in order to calculate the power output of a WTG under a given wind condition while considering the impact of the complex environmental factors, the historical wind data (including wind speed and wind direction) and the corresponding power output data of a WTG in a real wind farm can be used for modelling a new WPC of the WTG.
The difference between a WPC provided by a WTG manufacturer and an empirical WPC measured from a real wind farm has been analysed [5], [6]. In order to improve the short-term wind power prediction, four direction-dependent WPC models for different wind direction ranges have been built on the basis of the measured wind data of a real wind farm [5]. A statistical forecasting system is described in [6] for the short-term prediction of wind energy production on the basis of the adaptive combination of alternative dynamic models.
Considerable work has been done on the development of WPC models for the monitoring and forecasting of wind power. In [7], four data-mining approaches for monitoring WPC are compared. This comparison shows that the adaptive neuro-fuzzy inference system model has the best performance among the analysed approaches. In [8], a probabilistic model of WPC for monitoring purposes based on copula theory is developed, and the copula function is used for dealing with the complexity of the relationship between the wind speed and the wind power. Parametric and nonparametric models of WPC have been developed in [9]-[11], aiming to obtain accurate models for the online monitoring and forecasting of wind power. In [12], existing WPC models, such as the polynomial power curve, exponential power curve, cubic power curve, and approximate cubic power curve, have been compared to the manufacturer's power curve. In [13], a nonlinear formula for approximating the manufacturer's WPC is proposed on the basis of the interpolation method; the parameters in this formula can be analytically determined. However, the WPC models in [12], [13] do not consider the environmental factors. In [14], three discrete operational WTG curves (i.e., the power curve, the rotor curve, and the blade pitch curve) are introduced for monitoring a wind farm's performance, and k-means clustering is applied. In [15], three different machine learning models are used for estimating the relationship between the wind speed and the power output of a wind farm, and an equivalent power curve model of an entire wind farm under normal operating conditions is built for detecting the anomalous functioning conditions of the wind farm.
The measured power output of a WTG in a real wind farm is the real power that synthesizes the impact of complex environmental factors. Therefore, in order to incorporate the impact of these factors, a WPC model based on an improved bisecting k-means clustering algorithm and the interpolation method is presented in this paper. As the wind speed is the dominant factor, the measured wind speed and power output data, treated as two-dimensional coordinate points, are clustered into several groups on the basis of the Euclidean distance, and each group is represented by its centroid. However, the layout of the wind farm from which the measured data are obtained is fixed. Because the wake effect differs between layouts, the wake model should be considered independently in order to apply the proposed model to a wind farm with an arbitrary layout. Thus, a wake model will be built for accurately calculating the power output of a wind farm. An analytical WPC formula can be constructed on the basis of the method proposed in this paper. The proposed WPC model will be applied to the reliability analysis of generating systems with the integration of a wind farm.

A. Improved Bisecting k-means Clustering Algorithm
Data mining is an automatic or semiautomatic process of extracting valuable information from large amounts of data [16], [17]. In recent years, data mining has attracted a considerable amount of attention from the information industry and society, because of the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge [18]. Clustering analysis is a data-mining technique used for partitioning data objects into a certain number of groups [17]. The partitioned objects within a group are similar to one another and different from the objects in the other groups. Further, the greater the similarity within a group and the greater the difference between groups, the better or more distinct is the clustering [19]. The standard k-means clustering algorithm (SKMC) is one of the best-known and most popular algorithms used in clustering, and it seeks an optimal partition of the data by using different criteria [20], [21]. However, the results obtained from the SKMC highly depend on the initialization of the clustering parameters; in other words, different initializations may produce different results. In order to overcome this disadvantage of SKMC, a bisecting k-means clustering algorithm (BKMC) is proposed in [22]. However, there is room for improvement of the BKMC. In order to describe an improved bisecting k-means clustering algorithm (IBKMC), the clustering processes of SKMC and BKMC are summarized below, and the processes are illustrated in Fig. 1 and Fig. 2, respectively, where the circular points represent a set of data that need to be clustered and the star-shaped points represent the centroids of the groups.
Finding k groups by using the SKMC algorithm:
Step 1) Select k points at random as the initial centroids (initialization).
Step 2) Assign all points to the closest centroid, and recalculate the centroid of each cluster.
Step 3) Repeat Step 2) until the centroid of each group remains constant.
Finding k groups by using BKMC:
Step 1) Select a group to split.
Step 2) Assign all points that belong to the selected group to two groups using SKMC.
Step 3) Repeat Step 2) until the centroids of two groups remain constant.
Assume that all points on a plane need to be partitioned into k groups. SKMC initially needs to randomly select k centroids. BKMC, in contrast, initially splits all points into only two groups and then continues to select one group to split, on the basis of a criterion function, until k groups are constructed.
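The SKMC steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration and not the implementation used in this paper: the function name `skmc`, the `rng` and `max_iter` parameters, and the handling of empty groups are assumptions introduced here.

```python
import numpy as np

def skmc(points, k, rng=None, max_iter=100):
    """Standard k-means on an (N, 2) array; returns (centroids, labels)."""
    rng = np.random.default_rng(rng)
    points = np.asarray(points, dtype=float)
    # Step 1: pick k distinct data points at random as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid (Euclidean distance).
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centroid as the mean of its group.
        # Assumption: an empty group keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 3: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment that is consistent with the returned centroids.
    dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dist.argmin(axis=1)
```

Because Step 1 is random, two runs with different seeds may converge to different partitions on less well-separated data, which is the sensitivity that BKMC is intended to reduce.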
Each split based on BKMC assigns only the points from the selected group into two new groups. However, some points in the previously split groups may be more similar to a new group, and BKMC does not reassign the points that belong to the previously split groups. Therefore, the paper presents an IBKMC algorithm based on SKMC and BKMC. Assume that a possible result is obtained using BKMC, as shown in Fig. 3(a). If IBKMC is conducted instead, the result shown in Fig. 3(b) may be formed. It can be seen from Fig. 3 that three points and one point in group 1 shown in Fig. 3(a) are further assigned to group 2 and group 3, respectively, as shown in Fig. 3(b), and the centroids of the groups are also changed. In this paper, the wind speed and the corresponding power output of a WTG are depicted as a coordinate point in a Cartesian coordinate system. The x axis represents the wind speed, and the y axis represents the power output. The Euclidean distance is used for representing the distance between a point Qji(vji, pji) and its centroid Mj(mvj, mpj), where j represents group j and i represents point i in group j. The values of mvj and mpj can be calculated as follows:
$$ m_{vj} = \frac{1}{N_j} \sum_{i=1}^{N_j} v_{ji}, \qquad m_{pj} = \frac{1}{N_j} \sum_{i=1}^{N_j} p_{ji}, $$
where Nj denotes the number of points in group j.
The root mean square error (RMSE) for group j, designated as RMSEGj, is used as the criterion function for selecting the group that needs to be split. The entire clustering error for a given number of groups k, designated as RMSECE, is evaluated over all measured data, where Nm denotes the number of measured wind data points. SKMC is sensitive to the selection of the initial centroids. BKMC can overcome this sensitivity, but it does not reassign the points that belong to the previously split groups. Therefore, an IBKMC combining SKMC and BKMC is presented in this paper. In IBKMC, BKMC is used for splitting a group into two groups and producing two centroids that are used as the initial centroids of SKMC. The clustering processes of IBKMC are described in detail as follows:
Step 1) Initially, all points form a single group. Let j = 1.
Step 2) Select randomly two points from a selected group as the initial centroids.
Step 3) Calculate the Euclidean distances between all points in the selected group and the two centroids, and assign all points to the closest centroid.
Step 4) Recalculate the centroids of the two groups.
Step 6) Use the j centroids as the initial centroids of SKMC for reassigning all points that belong to the j centroids, and recalculate the centroids of the groups.
Step 7) Go to Step 2) to split the next group with the maximum RMSEGj until j=k.
Step 8) Check all centroids. For any i, j ∈ (1, 2, …, k) with i ≠ j, if mvi = mvj, the centroid of the jth group is replaced by the point closest to (mvj, mpj), and the procedure returns to Step 6) until all mvi are unequal.
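The IBKMC procedure can be sketched as follows. This is an illustrative reading of the steps and not the authors' code. Several points are assumptions made here: the group RMSE is taken as the root mean square Euclidean distance of a group's points to its centroid; the bisection of Steps 2) to 4) is iterated until the two centroids stop moving (the listing above contains no Step 5); and the distinct-wind-speed check of Step 8) is omitted. The names `ibkmc`, `_lloyd`, `_assign` and `_group_rmse` are hypothetical.

```python
import numpy as np

def _assign(points, centroids):
    # Index of the closest centroid (Euclidean distance) for every point.
    dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

def _lloyd(points, centroids, max_iter=100):
    """k-means iterations started from the given centroids."""
    centroids = np.array(centroids, dtype=float)
    for _ in range(max_iter):
        labels = _assign(points, centroids)
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, _assign(points, centroids)

def _group_rmse(members, centroid):
    # Assumed criterion: RMS Euclidean distance of the members to their centroid.
    return np.sqrt(np.mean(np.sum((members - centroid) ** 2, axis=1)))

def ibkmc(points, k, rng=None):
    rng = np.random.default_rng(rng)
    points = np.asarray(points, dtype=float)
    if k > len(points):
        raise ValueError("k must not exceed the number of points")
    # Step 1: all points form one group.
    centroids = points.mean(axis=0, keepdims=True)
    labels = np.zeros(len(points), dtype=int)
    while len(centroids) < k:
        # Step 7: choose the group with the largest RMSE (needs at least 2 points).
        errors = [
            _group_rmse(points[labels == j], centroids[j])
            if np.sum(labels == j) >= 2 else -1.0
            for j in range(len(centroids))
        ]
        target = int(np.argmax(errors))
        members = points[labels == target]
        # Steps 2-4: bisect the selected group from two random member points.
        seeds = members[rng.choice(len(members), size=2, replace=False)]
        pair, _ = _lloyd(members, seeds)
        # Step 6: use all current centroids as SKMC seeds and reassign ALL points.
        init = np.vstack([np.delete(centroids, target, axis=0), pair])
        centroids, labels = _lloyd(points, init)
    return centroids, labels
```

The global reassignment after each bisection is what distinguishes this sketch from plain BKMC, where points in previously split groups would keep their labels.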

B. WPC Modelling Based on Interpolation
In mathematics, a polynomial equation of degree n can be expressed as
$$ p(v) = \sum_{i=0}^{n} a_i v^{i}, \tag{5} $$
where v denotes a variable (it denotes wind speed in this paper); a0, a1, …, an represent the coefficients of the nth degree polynomial equation; and p(v) denotes the functional value of the polynomial equation (it denotes the power output of a WTG with a wind speed v in this paper).
According to the proposed IBKMC from Steps 1) to 8), k clustering centroids (M1(mv1, mp1), M2(mv2, mp2), …, Mk(mvk, mpk)) are obtained. A function that matches the values at the clustering centroids is developed. A (k-1)th degree polynomial equation that passes through all the clustering centroids can be formulated analytically for obtaining the algebraic equation of the WPC. With n = k-1, equation (5) can be rewritten as follows:
$$ m_{pj} = \sum_{i=0}^{k-1} a_i m_{vj}^{i}, \qquad j = 1, 2, \dots, k. \tag{6} $$
In (6), mvj and mpj for any j are known values, whereas a0, a1, …, ak-1 are the unknown values. Therefore, in order to obtain the deterministic expression of a polynomial of degree k-1, equation (6) needs to be solved for the values of a0, a1, …, ak-1. Equation (6) can be expressed using a matrix as follows:
$$ \begin{bmatrix} 1 & m_{v1} & m_{v1}^{2} & \cdots & m_{v1}^{k-1} \\ 1 & m_{v2} & m_{v2}^{2} & \cdots & m_{v2}^{k-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & m_{vk} & m_{vk}^{2} & \cdots & m_{vk}^{k-1} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{k-1} \end{bmatrix} = \begin{bmatrix} m_{p1} \\ m_{p2} \\ \vdots \\ m_{pk} \end{bmatrix}. \tag{7} $$
Equation (7) can be written compactly as Mv×A=Mp. Matrix Mv is a Vandermonde matrix [23], [24]. Thus, the determinant of matrix Mv can be calculated using (8):
$$ |M_v| = \prod_{1 \le i < j \le k} \left( m_{vj} - m_{vi} \right). \tag{8} $$
Because mvi≠mvj for any i and j (i≠j), which Step 8) of IBKMC guarantees, the determinant is nonzero and matrix Mv is invertible. Therefore, A can be calculated using (9):
$$ A = M_v^{-1} M_p = \frac{M_v^{*}}{|M_v|} M_p, \tag{9} $$
where Mv* denotes the adjugate of matrix Mv (the transpose of its cofactor matrix). In this paper, the linear interpolation and quadratic interpolation based on the above analysis are used for modelling the WPCs of a WTG. The cubic spline interpolation introduced in [25] is also used for modelling a WPC. RMSEWPC from (10) is used for calculating the error of the different interpolation methods, where Pm(v) denotes the measured power output of a WTG at the wind speed v, and PWPC(v) represents the power output calculated by the WPC at the wind speed v.
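The Vandermonde system Mv×A=Mp can be solved numerically as sketched below. This is a minimal illustration and not the authors' code: the function names are hypothetical, and the RMSE helper assumes the plain, unnormalised root mean square of the differences between measured and modelled power, which may differ from the exact definition in (10).

```python
import numpy as np

def polynomial_through(mv, mp):
    """Coefficients a0..a_{k-1} of the degree-(k-1) polynomial through k centroids."""
    mv = np.asarray(mv, dtype=float)
    mp = np.asarray(mp, dtype=float)
    if len(np.unique(mv)) != len(mv):
        # Equal wind-speed coordinates make the Vandermonde determinant zero.
        raise ValueError("centroid wind speeds must be pairwise distinct")
    # Row j of Mv is [1, mv_j, mv_j**2, ..., mv_j**(k-1)].
    Mv = np.vander(mv, increasing=True)
    # Solve Mv @ A = Mp directly instead of forming the adjugate.
    return np.linalg.solve(Mv, mp)

def evaluate(coeffs, v):
    # p(v) = a0 + a1*v + ... with coefficients in increasing order of degree.
    return np.polynomial.polynomial.polyval(v, coeffs)

def rmse(measured, modelled):
    # Assumed form: plain root mean square of the differences.
    diff = np.asarray(measured, dtype=float) - np.asarray(modelled, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Three illustrative points on p(v) = 1 + v + v**2.
a = polynomial_through([0.0, 1.0, 2.0], [1.0, 3.0, 7.0])
```

Vandermonde matrices become ill-conditioned as k grows, so a single high-degree polynomial through many centroids is numerically fragile. Low-order piecewise interpolation avoids this problem.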

A. Wake Model
As the wake effect has a significant influence on the energy production of a wind farm [26], it should be incorporated into the analysis of a power system with the integration of a wind farm. References [27] and [28] introduced the Jensen wake model used for a flat terrain and the Lissaman model used for a complex terrain.
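For orientation, the commonly cited single-wake form of the Jensen model can be sketched as below. The exact variant used in [27] is not reproduced in this paper, so the formula, the thrust coefficient `ct`, and the wake decay constant `decay` are assumptions. The 28 m rotor radius and 400 m spacing correspond to the wind farm described in the case studies.

```python
import math

def jensen_wake_speed(v0, x, rotor_radius, ct=0.75, decay=0.075):
    """Wind speed at distance x (m) directly downstream of one turbine.

    Commonly cited Jensen form:
        v(x) = v0 * (1 - (1 - sqrt(1 - ct)) * (r / (r + decay * x))**2)
    ct (thrust coefficient) and decay (wake decay constant) are illustrative values.
    """
    deficit = (1.0 - math.sqrt(1.0 - ct)) * (rotor_radius / (rotor_radius + decay * x)) ** 2
    return v0 * (1.0 - deficit)

# Free-stream speed 10 m/s, 28 m rotor radius, next row 400 m downstream.
v_next_row = jensen_wake_speed(10.0, 400.0, 28.0)
```

The deficit shrinks with distance because the wake radius r + decay·x grows linearly downstream, so wider row spacing reduces the wake loss.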

B. Reliability Analysis
A wind farm is incorporated into a conventional generating system, and an analytic method [29], [30] is used for evaluating the reliability of the combined generating system (CGS). In the wind farm, the wind speed data are used as the input variable for calculating the wind power output using the WPC model proposed in this paper.
In this paper, reliability indices, such as the loss of load probability (LOLP), loss of load expectation (LOLE, hours/year), and expected energy not supplied (EENS, MWh/year), are evaluated. These indices can be expressed as follows:
$$ \mathrm{LOLP} = \sum_{i \in S_f} p_i, \qquad \mathrm{LOLE} = T \sum_{i \in S_f} p_i, \qquad \mathrm{EENS} = T \sum_{i \in S_f} p_i C_i, $$
where Sf denotes the set of all system failure states; pi and Ci indicate the probability and the capacity of failure state i, respectively; and T denotes the total time length. An index for describing the energy production of a wind farm, designated as EPWF (MWh/year), is calculated for comparing the accuracy of the different WPC models. The error of EPWF for the different WPC models, designated as EPER, is calculated for directly reflecting the accuracy, where EPWFm denotes the energy production of the measured data in a real wind farm and EPWF indicates the energy production obtained from a WPC model.
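A small numerical illustration of these indices is given below. It is a sketch under stated assumptions and not the analytic method of [29], [30]: the load is held constant, a state counts as a failure when the available capacity is below the load, Ci is interpreted as the capacity shortfall in that state, and the two-unit system is invented for the example.

```python
from itertools import product

def enumerate_states(units):
    """units: list of (capacity_MW, forced_outage_rate).

    Yields (probability, available_capacity_MW) for every up/down combination.
    """
    for status in product([True, False], repeat=len(units)):
        prob, cap = 1.0, 0.0
        for is_up, (capacity, outage_rate) in zip(status, units):
            prob *= (1.0 - outage_rate) if is_up else outage_rate
            cap += capacity if is_up else 0.0
        yield prob, cap

def reliability_indices(units, load_mw, hours=8760.0):
    """Return (LOLP, LOLE in hours/year, EENS in MWh/year) for a constant load."""
    lolp = 0.0
    expected_shortfall = 0.0
    for prob, cap in enumerate_states(units):
        if cap < load_mw:                                # failure state
            lolp += prob
            expected_shortfall += prob * (load_mw - cap)  # assumed meaning of C_i
    return lolp, lolp * hours, expected_shortfall * hours

# Hypothetical system: two 50 MW units, each with outage rate 0.1, serving 60 MW.
lolp, lole, eens = reliability_indices([(50.0, 0.1), (50.0, 0.1)], 60.0)
```

Full state enumeration grows as 2^n with the number of units, so it is only practical for small examples such as this one.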

IV. CASE STUDIES
WTGs with a rated power of 850 kW are installed in a real wind farm, and the cut-in speed, rated speed, and cut-out speed of the WTG are 3 m/s, 11 m/s, and 20 m/s, respectively. The nonlinear relationship between the wind speed and the power output of a WTG measured in the real wind farm is illustrated in Fig. 4. A WPC provided by the wind turbine manufacturer, designated as MWPC, and a typical WPC [31], designated as TWPC, are also illustrated in Fig. 4. The power output data recorded in failure states of the WTG are removed from the data used to model the WPC. Because of the impact of various factors, the real relationship between the wind speed and the power output of a WTG is not always completely in accordance with the WPC provided by the manufacturer. Therefore, it is essential to model a WPC of the WTG that precisely reflects this relationship by using the method proposed in this paper.

A. WPC Modelling based on Proposed Method
1) Clustering based on IBKMC.
The measured wind speed and power output data, as points in a two-dimensional Euclidean space, are clustered into a certain number of groups (k) on the basis of the proposed IBKMC. The percentage of RMSECE shown in Fig. 5 is used for assessing the clustering error. It can be seen from Fig. 5 that RMSECE gradually decreases with an increase in the number of clusters, but the decline becomes slow from k = 15 onward.
The clustering centroids with k = 15 are shown in Table I, and for a comparative analysis, the centroids are also illustrated in Fig. 6.
It can be seen from Fig. 6 that in the vicinity of the rated wind speed, there is a relatively large difference between MWPC and the clustering centroids.
2) WPC modelling based on interpolation method.
According to the clustering centroids, linear interpolation, quadratic interpolation, and cubic spline interpolation are used for modelling the WPC of a WTG. The RMSEWPC of the different interpolation methods is shown in Fig. 7.
In Fig. 7, RMSEWPC based on these three interpolation methods first decreases rapidly with an increase in k and then tends to stabilize. However, when the value of k is greater than 9, cubic spline interpolation has a greater error than linear interpolation and quadratic interpolation.
In this paper, the WPC obtained by linear interpolation is designated as LWPC, whereas that obtained by quadratic interpolation is designated as QWPC. On the basis of numerical analysis theory, the coefficients of the piecewise LWPC and QWPC are obtained, as shown in Table II and Table III, where LB and UB denote the lower bound and the upper bound, respectively. When the wind speed is less than 2.24 m/s, the power output is zero, and when the wind speed is higher than 13.21 m/s, the power output is 850 kW. LWPC and QWPC are illustrated in Fig. 8.
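A piecewise linear WPC with these two bounds can be evaluated as sketched below. The centroid arrays are hypothetical placeholders and not the values of Table I. Only the bounds of 2.24 m/s and 13.21 m/s and the 850 kW rating are taken from the text, and behaviour above the cut-out speed is not modelled.

```python
import numpy as np

# Hypothetical centroids sorted by wind speed (m/s) with power in kW.
# Only the end speeds 2.24 and 13.21 and the 850 kW rating come from the text.
CENTROID_SPEED = np.array([2.24, 7.00, 13.21])
CENTROID_POWER = np.array([0.0, 300.0, 850.0])

def lwpc(v):
    """Piecewise linear power curve: 0 kW below the first centroid, 850 kW above the last."""
    return np.interp(v, CENTROID_SPEED, CENTROID_POWER, left=0.0, right=850.0)
```

For example, `lwpc(np.array([1.0, 4.62, 15.0]))` evaluates the curve below the lower bound, on the first segment, and above the upper bound in a single call.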

B. Application of WPC Model to Reliability Analysis
The wind farm with 20 identical WTGs is incorporated into the Roy Billinton Test System (RBTS), which has an annual peak load of 185 MW and an installed capacity of 240 MW. The detailed basic reliability data are presented in [32].
In the wind farm, the WTGs are distributed in four rows and five columns. The distance between any two adjacent rows is 400 m, and the distance between two adjacent columns is 300 m. The rotor diameter (2r) is 56 m, and the hub height is 70 m. The forced outage rate of a WTG is 0.04. It is assumed that a failed upstream WTG does not produce a wake.
In order to compare the accuracy of the proposed WPC model, the measured wind speed data are used for evaluating the reliability of the CGS. As LWPC has a smaller RMSEWPC when k has a relatively large value, it will be analysed in this section. LWPC with k = 15, 20, 30, 40, 50 is used for evaluating the reliability of the CGS and for calculating EPWF and EPER. The results without and with the wake effect are shown in Table IV and Table V, respectively. From Table IV, the indices of LWPC and TWPC can be compared with those of the measured data. It can be seen that LWPC with any value of k given in Table IV has a higher accuracy than TWPC, and EPER gradually decreases with an increase in k. According to the comparison of Table IV and Table V, the wake effect decreases the reliability of the generating system and the wind energy production.

V. CONCLUSIONS
The energy production of a wind farm is affected by various factors. However, some factors might not be expressed using an accurate mathematical model for calculating the wind energy production. Therefore, it is essential to model the power curve of a wind turbine generator (WTG) according to the measured wind speed and the power output data for including the impacts of various factors.
In this paper, we present an improved bisecting k-means clustering algorithm (IBKMC). IBKMC is used for partitioning the measured data into a certain number of groups that can be expressed using their centroids. The interpolation method based on polynomials is applied to the clustering centroids for modelling the wind turbine power curve (WPC) of a WTG. The linear interpolation method (LIM), quadratic interpolation method (QIM), and cubic spline interpolation method (CSIM) are compared on the basis of the root mean square error (RMSE). The WPC model is applied to the reliability analysis of a combined generating system (CGS) with the integration of a wind farm. The CGS is evaluated with and without the wake effect in order to assess the impact of the wake effect. The results show that the error of the WPC models first decreases rapidly with an increase in the number of clusters and then tends to stabilize. Moreover, LIM has a superior performance in comparison with QIM and CSIM when there are a relatively large number of clusters.

TABLE I. CLUSTERING CENTROIDS WITH K = 15.

TABLE II. EQUATION COEFFICIENTS BASED ON LINEAR INTERPOLATION.

TABLE IV. RESULTS WITHOUT CONSIDERING WAKE EFFECT.