Estimating the Distributed Generation Unit Sizing and Its Effects on the Distribution System by Using Machine Learning Methods

Many approaches to the planning and operation of power systems, such as network reconfiguration and distributed generation (DG), have been proposed to overcome the challenges caused by the increase in electricity consumption. Rapid developments in renewable energy technologies, together with the positive effects of DG on the grid and its contribution to reducing environmental pollution, have made DG resources an important issue. However, improper DG allocation may damage the network. Many studies have used analytical and heuristic methods based on load flow for optimal DG integration into the network. In this paper, a novel estimation-based method is proposed to determine the size of DG and its effects on the network without resorting to demanding and time-consuming load flow techniques. Machine learning algorithms, such as Linear Regression, Artificial Neural Network, Support Vector Regression, K-Nearest Neighbor, and Decision Tree, have been used for the estimations and have been applied to well-known test systems, such as the IEEE 12-bus, 33-bus, and 69-bus distribution systems. The accuracy of the proposed estimation methods has been verified with R-squared and mean absolute percentage error. Results show that the proposed DG allocation method is effective, applicable, and flexible.


I. INTRODUCTION
Since installing new central power plants and transmission lines to meet the increase in electricity consumption is costly, it is recommended to integrate smaller production units close to consumption areas. These small power generating units are known as distributed generation (DG). Interest in DG has increased thanks to various benefits, such as reducing system losses, improving the voltage profile, reducing pollutant emissions, and increasing system reliability. Incorrect DG allocation to the distribution system (DS), however, does not benefit the power system but harms it. To ensure effective DG allocation, various optimization studies have been reported in the literature. Since a predictive method is used in this study, literature studies on estimations related to the power system also follow.
Manuscript received 3 January, 2021; accepted 2 April, 2021.
In [1], DG placement and network reconfiguration by using heuristic algorithms, such as genetic algorithm (GA), particle swarm optimization (PSO), differential evolution (DE), and artificial bee colony (ABC), have been carried out to decrease losses and increase system efficiency. In [2], to minimize the annual energy losses, a technique based on mixed integer nonlinear programming (MINLP) for various renewable DG placements has been developed and applied on a rural DS. Size of DG units, feeder capacity, voltage limits, and penetration limits have been chosen as the constraints of the problem. In [3], the aim has been to integrate DG with the minimum possible power and to minimize active power losses by using various PSO and DE algorithms for IEEE 33-bus and 69-bus radial DSs. In [4], DGs have been placed in the weakest buses to reduce system losses and improve voltage magnitude and stability. Whereas sensitivity indices and a quadratic curve fitting technique have been used for single DG allocation, loss enhancement index and power loss reduction index have been used for multiple DGs. In [5], the whale optimization algorithm has been used to determine the optimal size and location of four different renewable DG types in four distribution systems, namely the IEEE 15-bus, 33-bus, 69-bus, and 85-bus test systems. Thus, the authors have reduced the system losses, enhanced the voltage magnitude, and increased reliability. In addition, the results obtained have been compared with other studies in the literature. In [6], a bacterial foraging optimization algorithm has been used to find the optimum DG size. Whereas minimizing network losses and operating costs and increasing the voltage stability have been determined as the objective function, the current carrying capacities of the lines have been taken as a constraint. In [7], three analytical approaches have been proposed to determine the best location, size, and power factor of a DG unit considering energy loss minimization.
In [8], an analytical method has been used to find the sizing and siting of DGs in a balanced radial DS for ensuring minimum network loss. In [9], the optimal location and size of solar photovoltaic DG have been found by using biogeography-based optimization to minimize losses considering the voltage profile and voltage harmonic distortion limits. In [10], an analytical method for the allocation of DGs in radial DS has been introduced to reduce active and reactive power losses. In [11], a hybrid method based on tabu search and GA has been proposed to integrate DG and capacitor banks to DS for enhancing system performance. In [12], a modified Jaya algorithm-based multi-objective function for allocation of DG with high penetration has been produced to reduce active losses and improve the voltage profile and has been carried out on the standard IEEE 33-bus DS. In [13], an optimization technique based on a hybrid shuffled frog leap teaching and learning algorithm has been proposed and applied for optimum location and size of electric vehicle fast charging stations and DG units in DS. In [14], fault current limiters (FCLs) in series with DGs have been used to minimize the negative effects of DG allocation to reduce power system losses. Nondominated sorting GAs have been used to determine DG location, DG size, and FCL size and have been applied to two test systems. In [15], an optimization technique based on PSO for determining the size of a solar photovoltaic system (SPV) has been proposed to minimize the cost of SPV integrated to the grid. In [16], the DG placement optimization problem has been solved in the practical test system of Korea using the optimal locator index to determine DG location and the Kalman filter to determine DG size. In [17], the ant colony system algorithm has been used to increase system reliability by allocating DG and reclosers. In [18], PSO has been used to allocate DG for increasing the loadability of distribution systems.
In [19], to minimize network losses and maximize voltage magnitude, the optimal DG placement has been performed by using the cuckoo search algorithm. Harmony search algorithm [20], an analytical approach [21], ordinal optimization [22], and MINLP [23] have been used to allocate DG in DS.
In [24], a multiple linear regression (LR)-based methodology has been proposed for long-term load estimation using hourly resolution load data of 57 countries. In [25], by using meteorological data of Muzaffarabad city in Pakistan, electricity consumption has been estimated with the help of traditional machine learning (ML) algorithms. In [26], to estimate the electricity load of Jeju island, a GA-optimized approach consisting of ML algorithms, such as support vector regression (SVR), k-nearest neighbor (KNN), and XGBoost, has been proposed. In [27], the authors have aimed to increase the performance of intraday load forecasts by grouping according to customer behavior similarities and using smart meter data. In [28], in addition to short-term load and generation estimates for the Slovenian power system, active power losses have been estimated by using fuzzy logic decision. In [29], the power consumptions of the state of Maine, the region of New England, Singapore, and New South Wales of Australia have been estimated in the short term by using the method based on the second decision mechanism and cross multimodel. In [30], KNN has been used for short-term load forecasting (STLF) on Smart City Demo Aspern buildings. In [31], a locally weighted SVR has been used to forecast two real-world electric loads. In [32], the fuzzy LR method has been used to predict electricity loads of holidays in the short term. In [33], multivariate LR and feed-forward neural network (FFNN) have been suggested for load prediction of Delhi, India. In [34], a knowledge-based expert system has been implemented for annual load forecasting. In [35], recurrent artificial NN has been used to forecast mid-term daily peak load. In [36], a hybrid model for load forecasting has been created using data preprocessing technology, individual forecast algorithm, and weight determination theory.
A method based on wavelet decomposition and quadratic gray NN combined with the enhanced Dickey-Fuller test [37], dynamic model selection based on Q-learning [38], boosting based multiple kernel learning method [39], a hybrid method consisting of convolutional NN and long short-term memory based deep learning (DL) [40], a DL method [41], and a hybrid model consisting of clustering and FFNN [42] have been proposed for STLF.
In [43], regression-based analysis has been used to estimate the bus voltages of IEEE 12-bus DS. In [44], wind speed estimation has been made for Batman province in Turkey by using ANN.
Works in [1]-[23] are studies on DG allocation, and the works in [24]-[44] are on predictions related to the power system. As can be seen from the literature review, the load is mostly assumed to be constant and the DG output to be controllable when allocating DG. In practice, the loads and DG output vary constantly. Calculating losses and other factors using power flow-based algorithms is difficult and time-consuming.
The contribution of this study is the estimation of DG size, network active losses, reactive losses, and minimum bus voltages without power flow calculations. Machine learning algorithms, such as Linear Regression (LR), Artificial Neural Network (ANN), Support Vector Regression (SVR), K-Nearest Neighbor (KNN), and Decision Tree (DT), are used for five estimation cases. The cases are applied to IEEE 12-bus, 33-bus, and 69-bus standard test systems and the obtained results are compared.
The remainder of the paper is organized as follows. The methods used for estimation are summarized in Section II. Estimation error and performance evaluation methods are explained in Section III. In Section IV, brief information is given about the data collection and the programs used in this study. Section V presents the results of the five cases created for estimations. Results are discussed in Section VI. Finally, the conclusions are given in Section VII.

II. METHODS USED FOR THE PROPOSED APPROACH
Machine learning is the computer modelling of systems that make predictions by making inferences from data with mathematical and statistical operations. It has become more popular as computers have become more powerful in recent years. In this paper, machine learning algorithms are proposed for the estimation process and they are as follows.
A. Linear Regression
LR is a prediction method that determines the relationship between two or more variables that have a cause-effect relationship and uses this relationship to predict unknown future values.
In this method, a mathematical model is used to explain the relationship between two or more variables and this model is called the "linear regression model". There are two types of regression models: simple LR with one independent variable and multivariate LR with more than one independent variable. The linear equation can be written as
$$y = w_0 + \sum_{i=1}^{n} w_i x_i + \varepsilon, \qquad (1)$$
where $y$ is the dependent variable (estimated value), $w_i$ is the weight coefficient (with $w_0$ the intercept), $x_i$ is the independent variable, and $\varepsilon$ is the error value. In this study, the independent variables are factors, such as load change, DG size, and location, and the dependent variable is system losses affected by these factors.
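As a hedged illustration of the simple LR case with one independent variable, the sketch below fits y = w0 + w1·x by ordinary least squares in plain Python. The function name `fit_simple_lr`, the variable names, and the data values are hypothetical and are not taken from the study, which used WEKA rather than custom code.

```python
def fit_simple_lr(x, y):
    """Ordinary least-squares fit of y = w0 + w1 * x (one independent variable)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


# Hypothetical samples: normalized load variation vs. DG size (made-up numbers).
nlv = [0.4, 0.6, 0.8, 1.0]
dg_size = [1.0, 1.5, 2.0, 2.5]

w0, w1 = fit_simple_lr(nlv, dg_size)
predicted = w0 + w1 * 0.7  # estimate for an unseen load level
```

Because the made-up data lie exactly on a line, the fit recovers that line; with real data the residual term ε in the model would be nonzero.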

B. Artificial Neural Network
ANN, first modelled in 1943 by neurophysiologist Warren McCulloch and mathematician Walter Pitts, is one of the artificial intelligence algorithms and has a wide range of uses [45]. Since it is modelled by imitating the human brain, ANN can generate, form, and interpret new information, as well as learn information. It is a good classifier that gives successful results in the analysis of repeatedly measured data sets. ANN has varieties, such as feed-forward, back propagation, single layer, and multilayer. Figure 1 shows a multilayer network structure with three inputs. Although ANN has many advantages, it also has some disadvantages. These drawbacks can be given as follows [46]:
• It is not possible to know what is in the system;
• Stability analysis is problematic, except for some networks;
• It can be difficult to apply to different systems.
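To make the multilayer structure with three inputs more concrete, the following is a minimal sketch of a feed-forward pass, assuming one hidden layer with sigmoid activation and a single linear output. The layer sizes beyond the three inputs, the activation choice, and all weight values are illustrative assumptions; they are not trained values and do not describe the WEKA configuration used in this study.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> sigmoid hidden layer -> single linear output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical, untrained parameters: 3 inputs, 2 hidden neurons, 1 output.
hidden_weights = [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]]
hidden_biases = [0.0, 0.1]
output_weights = [1.0, -1.0]
output_bias = 0.5

estimate = forward([0.8, 1.0, 0.3], hidden_weights, hidden_biases,
                   output_weights, output_bias)
```

Training would adjust the weights and biases to reduce the prediction error; that step is omitted here.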

C. Support Vector Regression
SVR was modelled by Drucker, Burges, Kaufman, Smola, and Vapnik using support vector machines [47]. It is a kernel-based machine learning algorithm used for classification and regression. This method has much better performance and ability to solve nonlinear problems compared to other traditional learning methods.
Examples of linear and nonlinear SVR are shown in Fig. 2 and Fig. 3, respectively [48]. The error is taken as zero for the data inside the area defined by ε. Notably, the variable ξ represents the fault tolerance outside the sensitive area and is also referred to as training error in the literature. In the region within the sensitive area, the value of ξ is zero. The fault tolerance can be expressed as in (2)
$$L(x, y) = \begin{cases} |y - f(x)| - \varepsilon, & |y - f(x)| \ge \varepsilon, \\ 0, & \text{otherwise}, \end{cases} \qquad (2)$$
where $f(x)$ is the model output for input $x$ and $y$ is the actual value. If the deviation is within the defined range, the error is equal to zero. The term C is a constant that balances the empirical error, expressed as the cost, against the weight vector. It is important to determine these values for high-accuracy support vector regression modelling.
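The behaviour described above, zero error inside the ε band and a growing error outside it, can be sketched with the standard ε-insensitive loss. This is a hedged illustration: the function name `eps_insensitive_loss` and the numbers are hypothetical, and the standard form of the loss is assumed to match the one intended in (2).

```python
def eps_insensitive_loss(actual, predicted, eps):
    """Return 0 inside the eps band, otherwise the excess deviation beyond eps."""
    deviation = abs(actual - predicted)
    return deviation - eps if deviation >= eps else 0.0


# Hypothetical values: one prediction inside the band, one outside it.
inside = eps_insensitive_loss(1.00, 1.05, eps=0.1)   # deviation 0.05 < eps
outside = eps_insensitive_loss(1.00, 1.50, eps=0.1)  # deviation 0.50 >= eps
```

A wider ε tolerates larger deviations without penalty, while the constant C controls how strongly the remaining deviations are penalized relative to the size of the weight vector.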

D. K-Nearest Neighbor
KNN, first introduced by Fix and Hodges in 1951, is a learning algorithm that finds the closest neighbors among the data points [49]. KNN is also known as a lazy learner in the literature and is often preferred because of its high performance in a wide range of applications. The most important points here are the distance between data points and the k value. The parameter k sets the number of nearest neighbors considered and is chosen as an odd number. Manhattan, Minkowski, Mahalanobis, and Euclidean distance measures are generally used to calculate the distance between data [50]. Figure 4 shows a schematic diagram of KNN. When k is taken as 3, the sample classification result is prismatic. When k = 7 and k = 11, the results are triangle and prismatic, respectively. It is clear from Fig. 4 that the classification results change significantly when k takes different values. Depending on the distance calculation method used, different nearest neighbors can be found, which affects the classification results [51].
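The dependence of the result on k can be reproduced with a small majority-vote sketch using the Euclidean distance. The points below are hypothetical and are not the data of Fig. 4; only the class labels are borrowed from the text, and the function name `knn_classify` is an assumption for illustration.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled points around a query at the origin.
train = [
    ((0.1, 0.0), "prismatic"),
    ((0.0, 0.2), "prismatic"),
    ((0.3, 0.0), "triangle"),
    ((1.0, 0.0), "triangle"),
    ((0.0, 1.1), "triangle"),
    ((-1.2, 0.0), "triangle"),
    ((0.0, -1.3), "triangle"),
    ((2.0, 0.0), "prismatic"),
]
query = (0.0, 0.0)

label_k3 = knn_classify(train, query, k=3)  # two prismatic vs. one triangle
label_k7 = knn_classify(train, query, k=7)  # two prismatic vs. five triangle
```

With two classes, an odd k guarantees that the vote cannot end in a tie, which is one reason odd values are commonly chosen.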

E. Decision Tree
DT is a classification algorithm in the form of a tree structure consisting of leaves, branches, and root nodes. DT works similarly to other ML algorithms. The algorithm generates decision trees using information gain and entropy.
A decision tree classification example can be seen in Fig. 5. In this example, the decision to go out is classified as yes or no, depending on the weather. Outlook is defined as the root node. Weather conditions are first classified as sunny, overcast, and rainy, and these are the branch node values. The second branch node values are the variables of humidity and wind. When the weather is sunny and the humidity is high, or when it is rainy and the wind is strong, the decision to go out is determined as no, and in other cases, it is classified as yes.
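How entropy and information gain select the root node can be sketched as follows. The eight rows are hypothetical samples constructed to obey the rule stated in the example (no when sunny with high humidity or rainy with strong wind, yes otherwise); they are not the data behind Fig. 5, and the function and key names are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical samples that follow the rule described in the text.
raw = [
    ("sunny", "high", "weak", "no"),
    ("sunny", "normal", "weak", "yes"),
    ("overcast", "high", "weak", "yes"),
    ("rainy", "normal", "strong", "no"),
    ("rainy", "normal", "weak", "yes"),
    ("sunny", "high", "strong", "no"),
    ("overcast", "normal", "strong", "yes"),
    ("rainy", "high", "weak", "yes"),
]
keys = ("outlook", "humidity", "wind", "go_out")
rows = [dict(zip(keys, r)) for r in raw]

gains = {a: information_gain(rows, a, "go_out") for a in keys[:3]}
root = max(gains, key=gains.get)  # attribute with the largest gain
```

For these samples the largest gain belongs to outlook, which agrees with outlook being the root node in the example.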

III. ESTIMATION ERROR AND PERFORMANCE EVALUATION

A. R-Squared
Actual values are used to test the accuracy of the estimation models. An R-squared error-based analysis is used to evaluate the performance and reliability of the model [24]. The R-squared error is calculated as follows
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}, \qquad (3)$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the actual values, and $N$ is the number of samples. R-squared takes values between zero and one ($0 \le R^2 \le 1$).
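A minimal sketch of the R-squared computation from actual and predicted values is given below, using the usual definition (one minus the ratio of the residual sum of squares to the total sum of squares). The function name `r_squared` and the sample numbers are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values: a perfect model and a model that always predicts the mean.
actual = [1.0, 2.0, 3.0]
perfect = r_squared(actual, [1.0, 2.0, 3.0])
mean_only = r_squared(actual, [2.0, 2.0, 2.0])
```

A perfect model reaches one, while a model no better than predicting the mean of the actual values reaches zero.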

B. Mean Absolute Percentage Error (MAPE)
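The ratio between absolute prediction errors and the real values is defined as MAPE. It is used to measure the performance of predictions and evaluate the results [27]. MAPE is calculated as follows
$$\mathrm{MAPE} = \frac{100\,\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|, \qquad (4)$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $N$ is the number of samples.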
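The MAPE described above, the mean of the absolute errors relative to the real values expressed as a percentage, can be sketched as follows. The function name `mape` and the numbers are hypothetical, and the sketch assumes that no actual value is zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical values: both predictions are off by 10 % of the actual value.
error_percent = mape([100.0, 200.0], [110.0, 180.0])
```

Because each error is divided by the actual value, MAPE allows quantities of different magnitudes, such as losses and voltages, to be compared on the same percentage scale.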

IV. DATA COLLECTION AND USED PROGRAMS
In this study, data consisting of normalized load variation, DG power injections, DG locations, active power losses, reactive power losses, and minimum busbar voltages have been used for three test systems: the IEEE 12-bus, 33-bus, and 69-bus distribution systems. The normalized load variation data, which change according to season and time of day, are given in [2] and shown graphically in Fig. 6. These data are provided in the GitHub repository [52]. The prediction models developed in this study have been created with WEKA, whose name is an acronym of Waikato Environment for Knowledge Analysis. It is a free data mining software that includes many machine learning algorithms. While making predictions, 75 % of the data have been used for training and 25 % for testing. The algorithms selected in WEKA for the ML methods are given in Table I. Predictions have been made by optimizing the necessary parameters of the ML methods. Related graphics and error performance analysis have been obtained by processing the outputs of WEKA in the MATLAB software.
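The 75 % / 25 % division of the data can be sketched as below. This is only an illustration in plain Python: the study performed the split in WEKA, and whether the samples were shuffled first, as well as the seed, are assumptions made here. The function name `split_train_test` is hypothetical.

```python
import random


def split_train_test(samples, train_fraction=0.75, seed=0):
    """Shuffle a copy of the samples and cut it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data set of 100 sample indices.
samples = list(range(100))
train_set, test_set = split_train_test(samples)
```

Keeping the test part unseen during training is what allows R-squared and MAPE on that part to reflect the performance on new data.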

V. RESULTS OF PREDICTION MODELS
A. Case 1
In this case, DG sizing is estimated using normalized load variation (NLV). The five estimation methods mentioned in Section II are applied for 33-bus and 69-bus DSs.
The regression model equations of LR for DG sizing estimation are given in Table II. Estimation results and actual values (AV) are shown in Fig. 7 for 33-bus and in Fig. 8 for 69-bus DSs. Calculated error and performance evaluations are compared in Table III.

B. Case 2
In this case, the active power losses in 33-bus and 69-bus DSs are estimated using NLV through all mentioned methods.
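The regression model equations of LR are given in Table IV. Relevant forecast values and actual values are shown in Fig. 9 and Fig. 10 for 33-bus and 69-bus DSs, respectively.
Comparison of the calculated evaluation values is given in Table V.

C. Case 3
In this case, the reactive power losses are estimated using NLV for 33-bus and 69-bus DSs. Estimation results and actual values are shown in Fig. 11 for 33-bus and in Fig. 12 for 69-bus distribution system.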

D. Case 4
In this case, the minimum, in other words the worst, busbar voltages are forecasted using NLV for 33-bus and 69-bus DSs. The regression model equations of LR are given in Table VIII for minimum voltage estimation. Estimation and actual values are shown in Fig. 13 and Fig. 14.

E. Case 5
In this case, the active power losses are estimated using normalized load variation, DG sizing, and location. Except for LR, the other four methods are applied on 12-bus, 33-bus, and 69-bus DSs. LR is not used in this case because it gives large errors well above the acceptable limit.
Results estimated by the four methods and actual values are shown in Fig. 15 for the 12-bus DS and in Fig. 16 for the 33-bus DS. The error and performance evaluations are given in Table X.
Fig. 15. Comparison of actual value and active power loss predictions using the normalized load variation, DG sizing, and DG location for 12-bus distribution system.

VI. DISCUSSION
The following inferences can be made by looking at all tables and figures in Section V. In case 1, the estimates made with all proposed methods are very close to the real values and the best results (0.756 % and 0.9695 %) are obtained by SVR. In the second and third cases, the predictions made with the four methods other than LR are very successful, whereas LR gives a high MAPE (25 %) and a low R-squared value (0.6); SVR gives the best results (approximately 0.2 %). In case 4, all methods make successful predictions with nearly zero (< 0.05 %) errors and the best (0.001 %) is SVR. In case 5, the best results (< 1.5 %) are obtained by KNN.

VII. CONCLUSIONS
In this paper, a new DG allocation approach based on estimation is proposed. In the first four cases, DG size, active losses, reactive losses, and worst voltages are estimated using the normalized load variation (single input), while in the last case, only active losses are estimated using normalized load level, DG location, and DG size (three inputs). Machine learning algorithms in WEKA, namely LR, ANN, SVR, KNN, and DT, are applied on IEEE 33-bus and 69-bus test systems for single-input predictions. The same algorithms, except LR, are applied on 12-bus, 33-bus, and 69-bus distribution systems for the last case. SVR gives the best results for single-input estimations (cases 1-4), whereas the best results are obtained with KNN for multi-input predictions. SVR can therefore be used to obtain the best predicted outputs, such as DG size, system losses, and minimum voltages, in systems before DG integration, while KNN can be used for estimation after DG integration. If there is no linear relationship between the input and output data, it is not appropriate to make predictions with LR. The results of the analysis demonstrate that the proposed approach can be adequate to determine DG size and its effects on DS.

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.