Modelling Correlated Forecast Error for Wind Power in Probabilistic Load Flow

Abstract—The deepening penetration of wind power has brought increasing uncertainty to the power grid. In system operation, this uncertainty is mainly attributed to forecast uncertainty, which remains a challenging issue in uncertainty analysis. In this paper, a statistical model based on the mixed skewed distribution is developed to provide a close fit to the conditional wind power forecast error of a single wind farm. The dependence structure of the forecast errors of multiple wind farms is obtained by the pair-copula method, which takes the mutual dependence between any two wind farms into account. A case study on a realistic transmission network in China is presented, and different modelling schemes are compared to demonstrate the effectiveness of the proposed model in probabilistic load flow.


I. INTRODUCTION
The rapid proliferation of wind power generation in the power grid has posed severe challenges to system planning and operation. The increased uncertainty due to the stochastic nature of wind power is the main obstacle to accommodating more wind generation. Forecast systems that provide short-term wind power predictions are not fully reliable, and their accuracy deteriorates dramatically as the horizon increases [1]. Probabilistic load flow (PLF) [2] is an important tool in uncertainty analysis and can reveal the range of possible working conditions in a probabilistic framework. A number of methodologies, such as Monte Carlo simulation (MCS) [3], the cumulant method [4] and the point estimate method (PEM) [5], have been developed in PLF studies. However, most of this work concentrated on advanced algorithms and on the long-term time scale, and little attention has been paid to uncertainty modelling.
To date, few publications have focused on modelling the forecast error of wind power, even though it is the most general and crucial uncertainty to be considered in PLF studies. In [6], wind power forecast error is assumed to follow a normal or near-normal distribution. This assumption has been shown to work well for geographically dispersed wind farms in [7]. However, Bludszuweit et al. [8] challenged the normally distributed model because it fails to capture the variable kurtosis and skewness of the error, and proposed the beta distribution instead. Hybrid approaches that combine several typical distributions have also been reported in [7], [9].
The main contribution of this work is to address the PLF problem from the novel perspective of forecast error modelling. It includes: 1) developing a mixed skewness model (MSM) that is flexible enough to capture the biased and long-tailed conditional forecast error of wind power; 2) constructing the intricate dependence structure of the high-dimensional forecast error among multiple wind farms using pair-copula; and 3) investigating the impact of different modelling schemes on the results of PLF simulation.
The remainder of the paper is organized as follows. Section II describes the characteristics of short-term wind power forecast error. Section III introduces the concept of MSM and explains how it can be used to generate forecast error samples. Section IV employs pair-copula theory to model the dependence of correlated forecast errors among multiple wind farms. Statistical tests on the model and a case study of PLF simulation in a real transmission system are presented in Section V. Section VI discusses limitations and future work, and Section VII concludes the paper.

II. CHARACTERISTICS OF SHORT-TERM WIND POWER FORECAST ERROR
To keep the study general, we chose datasets from several onshore wind farms located on the east coast of China. The day-ahead (24 h) forecasts of wind power were produced at a temporal resolution of 15 min using the numerical weather prediction (NWP)-based method, which is the most widely used approach in short-term wind power prediction.
Figure 1 illustrates the histogram of an example of short-term wind power forecast error for a single wind farm. All the data are standardized in per unit with the installed capacity as the base value. The error distribution has two main characteristics, bias and a long tail, and statistical analysis shows that both also hold for the other wind farms. Regarding bias, the mean value of the distribution is non-zero (equal to 0.04), and the whole distribution is shifted to the right with a skewness of 0.43.
The forecast errors are also found to be conditional upon their forecasts, so that different forecast levels present different distribution characteristics. In practice, we may divide the forecast results into power bins (e.g., [0, 0.05], [0.05, 0.1], ..., [0.95, 1]) and then sort the errors into their corresponding bins. As shown in Fig. 2, the error distributions in four consecutive forecast bins (spanning [0.4, 0.6] p.u.) differ in either shape or position. Besides the two aforementioned characteristics of bias and long tail, multimodality may occur in some specific bins, as is the case in Fig. 2(d), which poses great challenges to any typical fitting distribution.
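The sorting of errors into forecast bins can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function name `bin_errors`, the default bin width of 0.05 p.u. and the toy data are assumptions, and bins are indexed from zero here.

```python
from collections import defaultdict

def bin_errors(forecasts, errors, bin_width=0.05):
    """Group forecast errors by the power bin of their forecast (values in p.u.)."""
    n_bins = round(1.0 / bin_width)
    bins = defaultdict(list)
    for forecast, error in zip(forecasts, errors):
        # Clamp the index so that a forecast of exactly 1.0 p.u. lands in the last bin.
        idx = min(int(forecast / bin_width), n_bins - 1)
        bins[idx].append(error)
    return dict(bins)

# Hypothetical toy data, in p.u. of installed capacity (not taken from the paper).
forecasts = [0.02, 0.07, 0.42, 0.44, 0.97, 1.00]
errors = [0.01, -0.03, 0.10, 0.05, -0.20, -0.15]
binned = bin_errors(forecasts, errors)
```

Each entry of `binned` then holds the error sample to which one conditional distribution would be fitted.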

III. MODELLING FORECAST ERROR OF A SINGLE WIND FARM USING MIXED SKEWNESS MODEL
It is not trivial to fit the conditional forecast error (shown in Fig. 2) using one single distribution. In this section, we resort to a novel mixed model, i.e., MSM, which can capture the characteristics of the forecast error of a single wind farm without requiring much effort in parameter estimation and model post-processing.
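An MSM can be viewed as a combination of several skewed distribution components. Its PDF, for the one-dimensional case, can be defined as

$$f_{MSM}(x) = \sum_{i=1}^{n} \omega_i \, f_{SN}(x;\, \mu_i, \sigma_i^2, \lambda_i) \qquad (1)$$

where $n$ is the number of components and $\omega_i$ is the weight of the $i$th component of MSM, with $\omega_i \ge 0$ and $\sum_{i=1}^{n} \omega_i = 1$. $\mu$, $\sigma$ and $\lambda$ denote the parameters that reflect the location, scale and skewness of the distribution, respectively. $f_{SN}(\cdot)$ is the skewed (normal) distribution, defined as

$$f_{SN}(x;\, \mu, \sigma^2, \lambda) = \frac{2}{\sigma}\, \varphi\!\left(\frac{x-\mu}{\sigma}\right) \Phi\!\left(\lambda\, \frac{x-\mu}{\sigma}\right) \qquad (2)$$

where $\varphi(\cdot)$ and $\Phi(\cdot)$ are the PDF and CDF of the standard normal distribution, respectively. In particular, when $\mu = 0$, $\sigma = 1$ and $\lambda = 0$, $x$ is a standard normal random variable, denoted as $x \sim N(0, 1)$.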
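To make the mixture density concrete, the sketch below evaluates a skew-normal component and a weighted sum of such components. It assumes the standard skew-normal form $(2/\sigma)\,\varphi(z)\,\Phi(\lambda z)$ with $z = (x-\mu)/\sigma$; the function names and the three-component parameter values are hypothetical and are not fitted to any data in this paper.

```python
import math

def skew_normal_pdf(x, mu, sigma, lam):
    """Skew-normal PDF: (2 / sigma) * phi(z) * Phi(lam * z), with z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))
    return 2.0 / sigma * phi * big_phi

def msm_pdf(x, components):
    """Mixture PDF; components is a list of (weight, mu, sigma, lam), weights summing to 1."""
    return sum(w * skew_normal_pdf(x, mu, sigma, lam) for w, mu, sigma, lam in components)

# Hypothetical three-component parameter set (illustrative only).
components = [(0.5, -0.05, 0.05, 2.0), (0.3, 0.05, 0.08, -1.0), (0.2, 0.20, 0.10, 3.0)]

# Crude Riemann-sum check that the mixture density integrates to about 1 on [-2, 2].
step = 0.001
area = sum(msm_pdf(-2.0 + k * step, components) * step for k in range(4001))
```

With $\lambda = 0$ the component reduces to an ordinary normal density, which is a convenient sanity check on the implementation.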
Compared with other commonly used distributions, the proposed MSM is more flexible and can better represent the asymmetry, heavy tail and multimodality of the short-term wind power forecast error [10]. Figure 3 presents the PDF of a nonstandard random variable modelled by MSM with three components. Theoretically, as the number of components $n$ approaches infinity, MSM can fit any atypical distribution arbitrarily well. However, a large $n$ is time-consuming and can make the estimation intractable. Usually, $n = 2$ or $n = 3$ is accurate enough for engineering purposes. The parameters $\omega_i$, $\mu_i$, $\sigma_i^2$ and $\lambda_i$ of each component are estimated iteratively by applying the expectation-maximization (EM) algorithm [11], which is commonly used for large measured datasets. Once all parameters are determined, the analytical form of the wind power forecast error distribution is obtained. We then employ the scenario generation technique in [12] to yield a set of scenarios that follow the MSM for later simulations. The scenario generating process is shown explicitly in Fig. 4.
 Fig. 4. Random number generation of the mixed skew distribution.
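One possible way to draw random numbers from a mixture of skew-normal components is sketched below. It is not necessarily the procedure of Fig. 4 or of [12]: it assumes the well-known stochastic representation of a skew-normal variable, $x = \mu + \sigma\,(\delta\,|z_0| + \sqrt{1-\delta^2}\, z_1)$ with $\delta = \lambda/\sqrt{1+\lambda^2}$ and independent standard normal $z_0$, $z_1$. All names and parameter values are hypothetical.

```python
import math
import random

def sample_skew_normal(rng, mu, sigma, lam):
    """Draw one skew-normal sample via x = mu + sigma * (delta*|z0| + sqrt(1-delta^2)*z1)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    z0 = abs(rng.gauss(0.0, 1.0))
    z1 = rng.gauss(0.0, 1.0)
    return mu + sigma * (delta * z0 + math.sqrt(1.0 - delta * delta) * z1)

def sample_msm(rng, components, n):
    """Draw n samples: pick a component according to its weight, then sample from it."""
    weights = [c[0] for c in components]
    samples = []
    for _ in range(n):
        _, mu, sigma, lam = rng.choices(components, weights=weights, k=1)[0]
        samples.append(sample_skew_normal(rng, mu, sigma, lam))
    return samples

# Hypothetical (weight, mu, sigma, lam) tuples; not fitted values from the paper.
components = [(0.5, -0.05, 0.05, 2.0), (0.3, 0.05, 0.08, -1.0), (0.2, 0.20, 0.10, 3.0)]
rng = random.Random(0)  # fixed seed for reproducibility
samples = sample_msm(rng, components, 20000)
```

The sample mean can be checked against the analytical mixture mean $\sum_i \omega_i\,(\mu_i + \sigma_i \delta_i \sqrt{2/\pi})$ as a quick validation of the generator.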

IV. MODELLING FORECAST ERROR OF MULTIPLE WIND FARMS USING PAIR-COPULA
The wind power outputs of wind farms in geographically neighbouring areas are considered to be mutually correlated, and so are their forecast errors. To extend our model to multiple wind farms, the dependence structure between forecast errors needs to be handled carefully.
Copula theory [13] provides an effective way of modelling stochastic dependence. According to Sklar's theorem [14], there exists a copula function $C$ such that

$$C\big(F_1(\varepsilon_1), F_2(\varepsilon_2), \ldots, F_m(\varepsilon_m)\big) = F(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m) \qquad (3)$$

where $F$ is the joint CDF of the forecast errors and $F_i$ is the marginal CDF of $\varepsilon_i$.
In most cases, bivariate dependence can be accurately modelled by some typical copulas, such as the Normal-copula, the t-copula and the Archimedean-copula family. However, as the number of random variables increases, a single high-dimensional copula function in (3) fails to capture the distinct mutual dependence between any two variables. To overcome the constraint of using just one type of copula, we employ the pair-copula method [15] to model forecast errors considering the correlation among multiple wind farms.
Finally, the main steps of generating samples of correlated forecast errors in multiple wind farms can be summarized as follows.
Step 1: Obtain the marginal distributions of $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m$ in $m$ wind farms by MSM, and set $u_1 = F_1(\varepsilon_1)$, $u_2 = F_2(\varepsilon_2)$, ..., $u_m = F_m(\varepsilon_m)$.
Step 2: Select a suitable pair-copula in each layer and perform the parameter estimation.
Step 4: For each sampling point of $z_i$, calculate the uniform random variable $u_i$ by iteratively using (6).
Step 5: Obtain the corresponding forecast error samples by the inverse marginal transformation $\varepsilon_i = F_i^{-1}(u_i)$; these samples serve as the inputs of the subsequent PLF analysis.
Note that no analytical form can be found for the inverse marginal CDF of the forecast error under MSM. Hence, the empirical CDF $F_e(\varepsilon)$, based on the discrete data generated through the process in Fig. 4, is used instead. For more information on the empirical CDF, one may refer to [17].
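The last two steps can be illustrated for the simplest case of two wind farms linked by a single Gaussian (Normal) pair-copula. This is a sketch under stated assumptions, not the full pair-copula construction: the inverse conditional CDF used here is the standard closed form for the bivariate Gaussian copula, $u_2 = \Phi\big(\rho\,\Phi^{-1}(u_1) + \sqrt{1-\rho^2}\,\Phi^{-1}(z_2)\big)$, the empirical inverse CDF is a simple order-statistic lookup, and the marginal samples, the value of $\rho$ and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1
EPS = 1e-12

def clip_unit(u):
    """Keep u strictly inside (0, 1) so that the normal inverse CDF is defined."""
    return min(max(u, EPS), 1.0 - EPS)

def empirical_inverse_cdf(sorted_samples, u):
    """Empirical quantile: the order statistic at position floor(u * N), clamped to the last index."""
    idx = min(int(u * len(sorted_samples)), len(sorted_samples) - 1)
    return sorted_samples[idx]

def gaussian_inverse_h(z, u_cond, rho):
    """Inverse conditional CDF of a bivariate Gaussian copula with correlation rho."""
    a = STD_NORMAL.inv_cdf(clip_unit(z))
    b = STD_NORMAL.inv_cdf(clip_unit(u_cond))
    return STD_NORMAL.cdf(rho * b + math.sqrt(1.0 - rho * rho) * a)

def sample_two_farms(rng, errors_1, errors_2, rho, n):
    """Generate n correlated error pairs from two marginal sample sets."""
    s1, s2 = sorted(errors_1), sorted(errors_2)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.random(), rng.random()       # independent uniforms
        u1 = z1                                   # first variable is used as drawn
        u2 = gaussian_inverse_h(z2, u1, rho)      # second variable conditioned on the first
        pairs.append((empirical_inverse_cdf(s1, u1), empirical_inverse_cdf(s2, u2)))
    return pairs

# Hypothetical marginal samples standing in for MSM-generated forecast errors.
rng = random.Random(0)
errors_1 = [rng.gauss(0.04, 0.10) for _ in range(2000)]
errors_2 = [rng.gauss(0.02, 0.08) for _ in range(2000)]
pairs = sample_two_farms(rng, errors_1, errors_2, rho=0.8, n=5000)
```

For more than two wind farms, the same conditional step would be applied iteratively through the layers of the pair-copula structure, with one fitted pair-copula per edge.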

A. Source of the Data
Here, we demonstrate the value of modelling the correlated wind power forecast errors in probabilistic load flow with a real regional power system. As shown in Fig. 6, six wind farms along the east coast of Fujian, China were chosen for this study. The wind power outputs and their day-ahead forecasts in 2015 were obtained with 1-h resolution. The real transmission system in the same area was extracted from part of the Fujian power grid and includes three voltage levels (500 kV, 220 kV and 33 kV). The topology of the 34-bus network is shown in Fig. 7, with the six wind farms connected to buses 12, 18, 19, 22, 25 and 33. The load at each bus is well monitored, and its standard deviation (STD) is set to 5 % of the mean value.

B. Model Validation
It is apparent from Fig. 8 that MSM offers a better fit to the biased and long-tailed forecast error histograms than the normal and Laplace distributions. To show this numerically, the chi-square test ($\chi^2$ statistic) is applied to measure the goodness-of-fit of the various distributions, as displayed in Table I. MSM has the lowest value, which confirms it as the best fit of all.
The performance of the pair-copula method in modelling the high-dimensional forecast errors among the six wind farms was also tested. Three typical copulas (Normal-copula, t-copula and Gumbel-copula) were chosen to fit the dependence, and the goodness-of-fit was measured using two indices (the $\chi^2$ statistic and the Euclidean distance $d_e$). As shown in Table II, the pair-copula has the smallest values of both $\chi^2$ and $d_e$, which shows its superiority in multivariate dependence modelling over the single copulas.
Although the Euclidean distance of the Gumbel-copula is slightly smaller than those of the Normal-copula and the t-copula, its parameter fitting takes much more time. Hence, to reduce the computational burden, we simply choose the Normal-copula in the pair-copula construction.
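As an illustration of the goodness-of-fit measure, the sketch below computes a Pearson-type $\chi^2$ statistic, $\sum_k (O_k - E_k)^2 / E_k$, between histogram counts $O_k$ and the counts $E_k$ expected under a candidate CDF. The exact form of the statistic used for Tables I and II is not stated in the text, so this form is an assumption; the synthetic data, bin edges and candidate distributions are likewise hypothetical.

```python
import random
from statistics import NormalDist

def chi_square_statistic(samples, edges, cdf):
    """Pearson-type chi-square between histogram counts and counts expected under cdf."""
    n = len(samples)
    stat = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(1 for s in samples if lo <= s < hi)
        expected = n * (cdf(hi) - cdf(lo))
        if expected > 0.0:  # skip bins to which the candidate assigns no probability
            stat += (observed - expected) ** 2 / expected
    return stat

# Synthetic data drawn from N(0, 1), compared against two candidate distributions.
rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
edges = [-4.0 + 0.5 * k for k in range(17)]  # 16 bins covering [-4, 4]

chi2_matching = chi_square_statistic(samples, edges, NormalDist(0.0, 1.0).cdf)
chi2_shifted = chi_square_statistic(samples, edges, NormalDist(1.0, 1.0).cdf)
```

A smaller statistic indicates a closer fit, so the matching candidate is expected to score well below the shifted one.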

C. PLF Analysis of the Test System
This section studies the impact of the proposed wind power forecast error model on PLF analysis. The PLF simulation is based on MCS with 10 000 samples. More advanced sampling techniques could improve the efficiency, but they are beyond the scope of this paper. The programs were implemented in Matlab 2014b on a PC with an Intel Core i5 3.0 GHz processor and 3 GB of RAM.
To compare the results obtained by different modelling methods, four subcases are defined and described as follows.
Base case: The exact measurement data of wind power outputs for the six wind farms are applied to the PLF calculation at a certain time spot, and the load uncertainty is modelled with 5 % STD.
Case 1: The output samples for each wind farm are obtained directly from the typical wind power model (using the combined Weibull wind speed distribution and the power conversion curve [18]), and the load uncertainty is set as before with 5 % STD.
Case 2: It is assumed that the forecast errors of different wind farms are uncorrelated. The wind power samples are generated by adding forecast errors that follow the MSM to their forecast at a certain time spot, and the load uncertainty is set with 5 % STD.
Case 3: The same as Case 2, except that the dependence of forecast errors among different wind farms is modelled with pair-copula.
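The relative error indices introduced in [19] are used to demonstrate the accuracy of the results, defined as

$$e_{\mu}^{\gamma} = \left| \frac{\mu_{Ci} - \mu_{B}}{\mu_{B}} \right| \times 100\%, \qquad e_{\sigma}^{\gamma} = \left| \frac{\sigma_{Ci} - \sigma_{B}}{\sigma_{B}} \right| \times 100\%,$$

where $i = 1, 2, 3$ and the superscript $\gamma$ refers to the type of output variable (voltage magnitude $V$, angle $\theta$, line active power $P$ or reactive power $Q$). The simulation results of the base case are regarded as the reference, and their mean and STD are expressed as $\mu_B$ and $\sigma_B$, respectively. Similarly, $\mu_{Ci}$ and $\sigma_{Ci}$ are the mean and STD obtained from case $i$.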
As there is more than one output variable of each type, we use both the average and the maximal error indices to evaluate the performance of the different cases. The corresponding results for all the output variables are listed in Table III. There is little difference in the expected value among the three cases, but this is not true of the STD. The relative errors of the STD in Case 1 are the largest of all, which demonstrates that the traditional model of wind output is outperformed by our error-based model in PLF computation because it does not consider the forecast information. Moreover, the error indices of the STD in Case 3 are smaller than those in Case 2, which substantiates the importance of taking the multivariate dependence among multiple wind farms into account in the modelling. The PDF curves of Case 3 in both Fig. 9(a) and Fig. 9(b) are the closest to the base case, which is consistent with the preceding error analysis. The large deviations of the PDF in Case 1 from the actual distribution (base case) would affect the system risk assessment, resulting in highly conservative decisions with high operational costs.
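The average and maximal indices can be computed as in the sketch below. It assumes that each index is the absolute deviation of a case statistic from the base-case statistic, relative to the base-case value and expressed in percent; the function names and the numerical values are hypothetical and do not reproduce Table III.

```python
def relative_error(case_value, base_value):
    """Relative error in percent with respect to the base case (base_value must be non-zero)."""
    return abs(case_value - base_value) / abs(base_value) * 100.0

def error_indices(base_values, case_values):
    """Average and maximal relative error over all output variables of one type."""
    errs = [relative_error(c, b) for c, b in zip(case_values, base_values)]
    return sum(errs) / len(errs), max(errs)

# Hypothetical STDs of three bus voltage magnitudes (p.u.); not values from the paper.
base_std = [0.010, 0.020, 0.040]
case_std = [0.011, 0.018, 0.040]
avg_err, max_err = error_indices(base_std, case_std)
```

Variables whose base-case value is zero, such as the voltage angle at a reference bus, would need to be excluded before applying this index.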

VI. DISCUSSION
The variability of wind power generation is studied here by characterizing the forecast error rather than by directly modelling the actual output. The biggest advantage of the proposed model is its versatility: it can accommodate a wide range of error distributions and spatial dependence structures, for both a single wind farm and a cluster of wind farms. However, the performance of our methodology can be degraded by a low sampling rate of the forecasting system or by insufficient measurement data; hence, adding pseudo-measurements through linear interpolation [20] may be necessary.
There is always a trade-off between model accuracy and execution time. For example, the higher the number of skewed normal components, the better the approximation of MSM, but also the higher the number of parameters to estimate. Akaike's Information Criterion (AIC) in [19] can therefore be employed to determine the appropriate number of components for our model. Also, the Gaussian copula is sometimes preferred for mutual dependence modelling because of its flexibility and low computation time, even if it is not the optimal selection.
To the best of our knowledge, this is the first work that studies PLF from an error modelling perspective. However, several limitations still exist in the current work. Firstly, future work on bias correction may help ensure a zero mean of the forecast errors and eliminate seasonal trends. Secondly, a time-dependent model of wind power forecast error that incorporates autocorrelation needs further investigation, because the error at a certain time step depends on the errors at the previous and subsequent time steps. Finally, it is worth developing better sampling techniques to improve the computational efficiency.
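The use of AIC for choosing the number of components can be sketched as follows, using the standard definition $\mathrm{AIC} = 2k - 2\ln L$. The parameter count $k = 4n - 1$ for an $n$-component mixture (three parameters per component plus $n - 1$ free weights) is an assumption of this sketch, as are the function names and the toy data; the candidate parameters would normally come from an EM fit.

```python
import math

def skew_normal_pdf(x, mu, sigma, lam):
    """Skew-normal PDF: (2 / sigma) * phi(z) * Phi(lam * z), with z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))
    return 2.0 / sigma * phi * big_phi

def msm_log_likelihood(data, components):
    """Log-likelihood of data under a mixture given as (weight, mu, sigma, lam) tuples."""
    return sum(
        math.log(sum(w * skew_normal_pdf(x, mu, s, lam) for w, mu, s, lam in components))
        for x in data
    )

def aic(data, components):
    """AIC = 2k - 2 ln L, with k = 4n - 1 free parameters for n components (assumed count)."""
    k = 4 * len(components) - 1
    return 2.0 * k - 2.0 * msm_log_likelihood(data, components)

# Hypothetical error samples and two hand-picked candidate models (not EM estimates).
data = [-0.12, -0.05, -0.02, 0.00, 0.01, 0.03, 0.04, 0.08, 0.15, 0.30]
one_component = [(1.0, 0.0, 0.12, 1.0)]
two_components = [(0.8, 0.0, 0.08, 0.5), (0.2, 0.2, 0.10, 1.0)]
aic_values = {1: aic(data, one_component), 2: aic(data, two_components)}
```

The candidate with the lowest AIC would be preferred, since the penalty term $2k$ discourages adding components that improve the likelihood only marginally.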

VII. CONCLUSIONS
This paper proposes a comprehensive model for wind power forecast error and shows how it improves PLF analysis. The PDF of the forecast errors of a single wind farm is conditional on the forecasts and is modelled by MSM. The spatial correlation of forecast errors among adjacent wind farms is obtained from the pair-copula method, which takes the mutual dependence into account. Statistical tests show the effectiveness of the proposed model, and several important remarks are as follows:
1. Owing to its flexibility and versatility, the MSM performs better than the other typical distributions tested in capturing the characteristics of the conditional forecast error of a single wind farm.
2. By utilizing the forecast information, our error-based model is superior to the traditional model of wind output in short-term PLF studies, and the impact of the high-dimensional dependence of forecast errors among adjacent wind farms on PLF results is pronounced.
3. The dependence structure of forecast errors among different wind farms can be constructed more precisely with the pair-copula method than with multivariate copulas, and the improvement is nontrivial in PLF computation.
It should be noted that the proposed modelling method can also be extended to other uncertainty problems of power systems such as stochastic unit commitment (SUC) and probabilistic optimal power flow (POPF).

Fig. 6. Sites used for wind power data in this study.

Fig. 7. Network topology of the real transmission system.

To verify the effectiveness of the proposed MSM, two other distributions (normal and Laplace) were compared by fitting the forecast error histograms for wind farm B in specific bins. The fitted distributions along with the MSM in bins 6 and 11 are shown in Fig. 8.

Fig. 9. PDF comparison of bus voltage and line flow in the four cases: (a) Voltage magnitude of bus 21; (b) Real power of line 20-21.
Manuscript received 15 March, 2017; accepted 6 June, 2017. This research was funded by a grant (No. 2242016K41064) from the Central Universities of China. This research was performed in cooperation with State Grid Electric Power Research Institute of China.

TABLE I. CHI-SQUARE GOODNESS-OF-FIT FOR VARIOUS DISTRIBUTIONS.

TABLE II. FITTING TEST OF DEPENDENCE MODELLING USING COPULAS.

TABLE III. ERROR COMPARISONS OF THE REAL TRANSMISSION SYSTEM (%).