• International Journal of Technology (IJTech)
  • Vol 13, No 6 (2022)

Anomaly Prediction in Electricity Consumption Using a Combination of Machine Learning Techniques

Rawan ELhadad, Yi-Fei Tan, Wooi-Nee Tan


Cite this article as:
EL-Hadad, R., Tan, Y.-F., Tan, W.-N., 2022. Anomaly Prediction in Electricity Consumption Using a Combination of Machine Learning Techniques. International Journal of Technology, Volume 13(6), pp. 1317–1325

Rawan ELhadad Faculty of Engineering, Multimedia University, Persiaran Multimedia, 63100, Cyberjaya, Selangor, Malaysia
Yi-Fei Tan Faculty of Engineering, Multimedia University, Persiaran Multimedia, 63100, Cyberjaya, Selangor, Malaysia
Wooi-Nee Tan Faculty of Engineering, Multimedia University, Persiaran Multimedia, 63100, Cyberjaya, Selangor, Malaysia

Abstract
Electricity demand continues to grow as power usage increases. As a result, energy efficiency has gained significant importance and attention, with one of the primary concerns being the detection and forecasting of abnormal consumption. This paper proposes a method to predict the occurrence of abnormal consumption behavior in advance. The proposed method uses the Isolation Forest algorithm to label smart meter electricity consumption readings as normal or abnormal, and then generates sequences of labels of varying lengths. Based on these sequences, two supervised machine learning models, Random Forest and Decision Tree, were developed to forecast the occurrence of anomalous power consumption. Experimental results showed that the proposed methods consistently predict the abnormal status 30 minutes ahead. There is no significant difference between Random Forest and Decision Tree performance across different smart meter readings, dataset sizes, and sequence lengths. The proposed methods offer an alternative approach that automatically labels normal and abnormal data and then works with the sequence of labels in the prediction process, thereby avoiding the dynamic behavior of the raw power consumption data.

Keywords: Decision Tree; Isolation Forest; Power consumption anomaly; Prediction; Random Forest

Introduction

Power consumption is expected to rise sharply in emerging economies. According to the World Energy Outlook (2019), the share of electricity in total final energy consumption will increase from 19% in 2018 to 24% in 2040, with global electricity demand growing by 2.1% annually until 2040. As fossil fuels are burned to generate energy, increasing power consumption demand may contribute to global warming. Furthermore, according to the report, power theft accounts for 50% of all generated energy in developing countries. Any excessive use of electricity should be avoided. Moreover, increasing the energy efficiency of existing energy systems and lowering harmful emissions into the atmosphere is necessary to transition from fossil fuels to "green" energy (Brazovskaia & Gutman, 2021).
It is therefore the responsibility of all parties to consume electricity efficiently. Analyzing, detecting, and forecasting abnormal power consumption are among the energy efficiency challenges that must be overcome to reduce power consumption. When consumers are alerted to unusual consumption, they can take appropriate action immediately, thereby helping to reduce their electricity bills.

Smart sensors and sub-meters installed in residential buildings create vast amounts of data daily. If these data are utilized correctly, they can assist end-users, energy suppliers, and utility corporations in recognizing and explaining unusual power consumption. Anomaly detection could therefore prevent minor issues from turning into major ones. It will also help with better decision-making, decreasing energy waste, and promoting sustainable and energy-efficient behavior (Himeur et al., 2021b).

Anomaly detection, as the name implies, is a technique for detecting data that is out of the ordinary. Anomalies in data are events that do not follow the expected pattern of behavior. Anomalies are classified into three types: point anomalies, contextual anomalies, and collective anomalies. A point anomaly occurs when a single point in the data is excessively high or low compared to other points. A contextual anomaly occurs when a data instance is regarded as abnormal in one context but normal in another. A collective anomaly occurs when a group of linked data instances is anomalous relative to the entire data set: the individual data instances are not anomalies on their own, but their presence as a group is (Chandola et al., 2009).

Anomaly detection in power consumption can be applied to various domains. One of them involves non-technical loss. Villar-Rodriguez et al. (2017) established a method for detecting smart-meter failures using load curve profiling and time series analysis. The dataset was obtained from a Spanish utility. Various distance-based learning algorithms and an inner classification technique were applied, such as the Local Outlier Factor and Least Square Anomaly detection. To group non-technical losses, the authors used a clustering technique. Ford et al. (2014), in turn, suggested a fraud detection framework for the smart grid using artificial neural networks. The dataset in that research was gathered from 5000 residential and 600 business buildings from 2009 to 2011. The authors claimed that the suggested model outperformed existing models in the same area of research at that time.

Detecting unusual power usage in domestic appliances is critical for grid optimization and for reducing undesired electrical absorption in residential structures. Abnormal power usage in home appliances can cause severe damage. According to Krishnan et al. (2019), none of the products on the market could provide complete protection from utility risks, and the power protection systems in single-phase homes are not entirely hazard-proof. This is evidenced by the continuing occurrence of electrical shock injuries and fatalities, as well as fires caused by electrical hazards. Thus, anomaly feedback can alert customers and assist them in detecting faulty equipment or devices that consume more energy than necessary.

Several data mining and statistical methodologies have been used to uncover data patterns and find anomalies in energy use. Most of these systems recommend detecting irregularities in an offline mode due to the high volume of data involved (Zhang et al., 2021). Nevertheless, Bhattacharya and Sinha (2017) proposed intelligent fault analysis in the electrical power grid to detect grid faults. In that project, two classifiers were built: Support Vector Machine (SVM) and Long Short-Term Memory (LSTM). With the proposed model, the authors were able to predict the maximum voltage level from historical fault data and to locate where faults occurred. Meanwhile, Castangia et al. (2021) employed an anomaly detection framework to track the hourly energy usage of three common power absorption sources: the baseline, the refrigerator, and electrical gadgets. Their research concentrated on single-point deviations and aberrant patterns. By analyzing single-point variances, they could identify short-term power surges caused by unplanned electrical faults or abrupt changes in end-user routines.

Another application of anomaly detection is identifying unusual consumption patterns and thus reducing excessive power consumption. It is a crucial step in developing efficient energy-saving systems that reduce total power consumption and carbon emissions. In this context, Himeur et al. (2021a) proposed two novel schemes, one unsupervised and the other supervised. The first scheme was an Unsupervised Abnormality Detection based on One-Class Support Vector Machine (UAD-OCSVM) to detect abnormal consumption data. The second scheme was a Supervised Abnormality Detection based on Micro-moments (SADM2) and an enhanced K-nearest neighbors model. The results demonstrated that SADM2 outperforms other machine learning algorithms in anomaly detection and real-time processing capabilities at a much lower computational cost. On the other hand, Yuan et al. (2020) used the Breadth-First Search (BFS) mechanism and a threshold to detect anomalies in power consumption. Their proposed method could achieve high detection accuracy over a small number of smart meters. Himeur et al. (2020) proposed a new model for detecting abnormal energy consumption by extracting micro-moment features using a rule-based model and deploying a Deep Neural Network (DNN) to enhance the efficiency of classifying and detecting anomalies.

Many researchers deploy LSTM to forecast electricity consumption because it is powerful, flexible, and can deal with complex multi-dimensional time-series data (Alraddadi & Othman, 2022). Chahla et al. (2020) provided an unsupervised method for detecting power consumption abnormalities in which the K-means algorithm groups the data and an LSTM forecasts the electricity consumption for the next 24 hours. Fenza et al. (2019) also deployed LSTM and K-means to propose a drift-aware algorithm for detecting anomalies in smart grids. Both works relied on deep learning approaches and reported that the proposed algorithms could accurately predict the anomalies.

In this research, we develop a model that can predict anomalies in power consumption patterns in advance by using a combination of machine learning algorithms. Most existing projects in the same field of study either only predict power consumption or forecast the consumption value before applying an algorithm to detect anomalies. The dynamic nature of power consumption data complicates data analysis. This project aims to avoid that dynamic behavior by transforming power consumption data into a sequence of labeled data, which is then used in the machine learning-based prediction process. The proposed work can predict anomalous power consumption one unit of time in advance, where the unit depends on the frequency of smart meter readings. One of the challenges is labeling the power consumption data as abnormal or normal. Another is the way the data is prepared for the machine learning algorithms that predict the anomaly. The paper is organized as follows: the second section is dedicated to methodology, the third to results and discussion, and the fourth to conclusion and future work.

Experimental Methods

The proposed methodology consists of two major components: the first is the data labeling for power consumption, and the second is the construction of a prediction model using supervised machine learning techniques. The classifiers used in this work are Random Forest (RF) and Decision Tree (DT). Subsections 2.1 to 2.4 elaborate on the methodology framework. Subsection 2.1 discusses data collection and data cleaning as essential components of data analysis. Subsections 2.2 and 2.3 discuss the labeling process and the predictive models. The performance indicators used in evaluating the proposed method are detailed in Subsection 2.4.

2.1. Data Collection

The electrical dataset was obtained from the Irish Social Science Data Archive and originates from the Commission for Energy Regulation Smart Metering Project (2012). It includes power usage data from 987 smart meters collected between 2009 and 2010. The dataset has three attributes: "Meter ID", "Five Digit Code", and "Electricity Consumed (in kWh)". "Meter ID" is the identifier of the smart meter, and the "Five Digit Code" represents the date and time. "Electricity Consumed (in kWh)" refers to the amount of electricity consumed during a 30-minute interval. Table 1 shows an example of some data.

Table 1 Example of data instances

| Meter ID | Five Digit Code | Electricity Consumed (in kWh) |
|---|---|---|
| 1392 | 19501 | 0.157 |
| 1392 | 19502 | 0.144 |
| 1392 | 19504 | 0.138 |
| 1287 | 19501 | 0.840 |
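To make the "Five Digit Code" concrete, the following Python sketch decodes it into the start time of the corresponding 30-minute interval. It assumes the encoding given in the CER dataset documentation, which is not spelled out in the text above: the first three digits are the day number (day 1 = 1 January 2009) and the last two digits are the half-hour slot (1 to 48). The function name `decode_five_digit_code` is hypothetical, and slots outside 1 to 48 are simply rejected in this sketch.

```python
from datetime import datetime, timedelta

# Assumption: day 1 corresponds to 1 January 2009 (per the CER dataset notes).
DAY_ONE = datetime(2009, 1, 1)


def decode_five_digit_code(code: int) -> datetime:
    """Return the start time of the 30-minute interval encoded by `code`."""
    day, slot = divmod(code, 100)  # e.g. 19501 -> day 195, slot 1
    if not 1 <= slot <= 48:
        raise ValueError(f"unexpected half-hour slot {slot} in code {code}")
    return DAY_ONE + timedelta(days=day - 1, minutes=30 * (slot - 1))


print(decode_five_digit_code(19501))  # 2009-07-14 00:00:00
print(decode_five_digit_code(19504))  # 2009-07-14 01:30:00
```

Under this assumption, the first three rows of Table 1 are readings of meter 1392 on the same day, with the reading for slot 3 absent from the excerpt.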

The classifiers' performance in predicting the occurrence of anomalies was investigated using 11 smart meters in this study. Of these 11 smart meters, 10 contain about 23,000 to 25,000 readings each and are used in the experiment as large datasets. The remaining smart meter contains 3,984 readings and is categorized as a small dataset. To investigate the impact of data size on detection, the 9 smart meters in large datasets are trimmed to 3,984 readings and used as small datasets in the experiment. Consequently, the size of a small dataset is approximately 16% of the size of a large dataset.

2.2. Data labeling using Isolation Forest

The data provided is solely power consumption data, with no indication of normal or abnormal usage. Thus, the Isolation Forest (IF) algorithm is used to classify each power consumption reading as normal or abnormal. Table 2 illustrates an example of the labeled data after applying IF. The anomaly scores calculated from IF range from 0 to 1. If the score is less than 0.5, the reading is labeled "A" (Abnormal); otherwise, it is labeled "N" (Normal).
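The labeling step can be sketched in Python with scikit-learn's `IsolationForest`. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the consumption value is the only feature, the readings are synthetic, and the helper name `label_readings` is hypothetical. Note that scikit-learn's `score_samples` returns the negated score of the original Isolation Forest paper, so the direction and scale of its score differ from the 0-to-1 description above; the sketch therefore relies on `fit_predict`, which applies the library's own threshold.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def label_readings(kwh, random_state=0):
    """Label each reading "N" (normal) or "A" (abnormal) with Isolation Forest."""
    # Assumption: a single feature, the kWh consumed per 30-minute interval.
    X = np.asarray(kwh, dtype=float).reshape(-1, 1)
    model = IsolationForest(n_estimators=100, contamination="auto",
                            random_state=random_state)
    # fit_predict returns +1 for inliers and -1 for outliers.
    flags = model.fit_predict(X)
    return ["A" if flag == -1 else "N" for flag in flags]


# Synthetic readings (not from the dataset): typical usage plus one large spike.
rng = np.random.default_rng(0)
readings = rng.uniform(0.1, 0.6, size=200)
readings[50] = 5.0
labels = label_readings(readings)
print(labels[50], "|", labels.count("A"), "abnormal out of", len(labels))
```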

Following the labeling stage, the power consumption values become a sequence of "N" and "A" labels. This sequence is then divided into shorter, overlapping sequences of m elements each. For example, elements 1 to m form the first sequence, which is the input to the prediction model, with the (m+1)th element being the expected output. Elements 2 to m+1 form the second sequence, with the (m+2)th element as the output, and so forth. If the consumption sequence is "NAAANNNA..." and m = 5, then the first sequence is "NAAAN", the second is "AAANN", the third is "AANNN", and so on. The corresponding outputs for these sequences are "N", "N", and "A", respectively. After the data was labeled and segmented into sequences, the number of normal labels was greater than the number of abnormal labels. As a result, the dataset is unbalanced.
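The windowing described above can be sketched in a few lines of Python. The function name `make_sequences` is hypothetical; the example reproduces the "NAAANNNA" case with m = 5 from the text.

```python
def make_sequences(labels, m):
    """Split a label string into overlapping windows of length m and their next label."""
    inputs = [labels[i:i + m] for i in range(len(labels) - m)]
    targets = [labels[i + m] for i in range(len(labels) - m)]
    return inputs, targets


inputs, targets = make_sequences("NAAANNNA", m=5)
print(inputs)   # ['NAAAN', 'AAANN', 'AANNN']
print(targets)  # ['N', 'N', 'A']
```

A label sequence of length n yields n - m input/output pairs, since the last m labels have no following label to predict.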

Table 2 Example of labeled data

| Electricity consumed (kWh) | Label |
|---|---|
| 0.728 | N |
| 1.046 | A |
| 0.625 | N |
| 0.625 | N |

2.3. Predictive Models

One of the objectives of this research is to build a predictive model that can predict whether power consumption is "N" or "A" 30 minutes in advance. Since the data have been labeled by IF, a supervised machine learning approach based on classification is chosen to build the model. According to Himeur et al. (2021b), annotated training datasets that label both normal and abnormal power are required to train machine learning classifiers (binary or multi-class) for supervised anomaly identification. In classification, tree-based machine learning algorithms are widely used for prediction. The two most well-known tree-based machine learning algorithms are DT and RF. DT is a supervised learning technique in which the process starts at the root node and branches through parent and child nodes down to the leaf nodes. The leaf nodes represent the outcomes, whereas the parent and child nodes represent the splits of the data. RF is built from a large number of DTs, and the final result is based on a majority vote (Nallathambi & Ramasamy, 2017).

The built model is generic and flexible enough to be applied to the datasets from any smart meter. The whole dataset is split into training data and test data. The model is trained using the training data only, so the test data is treated as unseen data.
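A minimal sketch of this training step with scikit-learn is shown below. Several details are assumptions that the text does not specify: the label sequence is synthetic, labels are encoded as 0 ("N") and 1 ("A"), the split is a chronological 80/20 split, and the hyperparameters are library defaults.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic label sequence (not from the dataset): mostly "N" with short "A" runs.
random.seed(0)
labels, state = [], "N"
for _ in range(2000):
    # Hypothetical transition probabilities, chosen only to create some structure.
    if state == "N":
        state = "A" if random.random() < 0.1 else "N"
    else:
        state = "A" if random.random() < 0.7 else "N"
    labels.append(state)

m = 5  # window length
encode = {"N": 0, "A": 1}
X = [[encode[c] for c in labels[i:i + m]] for i in range(len(labels) - m)]
y = [encode[labels[i + m]] for i in range(len(labels) - m)]

# Assumption: chronological 80/20 split, so test windows follow the training ones.
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))
```

Because each input is a binary vector of length m, there are at most 2^m distinct inputs (32 for m = 5), so both models have very few distinct patterns to separate.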

2.4. Model Evaluation

In the classification model evaluation process, a confusion matrix is first obtained; it consists of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts. Precision, Recall, F1 score, and Accuracy are used as the metrics to assess the model. The formulas for Precision, Recall, F1 score, and Accuracy are given in Equations (1) to (4).
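In terms of the confusion-matrix counts, the standard definitions of these four metrics are as follows. The numbering is assumed to follow the order in which the metrics are listed above.

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP} \tag{1}\\
\text{Recall} &= \frac{TP}{TP + FN} \tag{2}\\
\text{F1 score} &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}\\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{4}
\end{align}
```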

Precision is the ratio of correctly predicted positives to all positive predictions, while Recall, also known as sensitivity, is the ratio of correctly predicted positives to all actual positive examples. The F1 score is a balanced measure based on Precision and Recall (Poh et al., 2019); it is their harmonic mean. For each model, the Precision, Recall, F1 score, and Accuracy are calculated.
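As an illustration of how these metrics can be computed separately for the normal and abnormal classes, the following Python sketch treats each class in turn as the positive class. The helper names `class_metrics` and `accuracy` are hypothetical, and the toy labels are for illustration only, not results from this study.

```python
def class_metrics(actual, predicted, positive):
    """Precision, Recall, and F1 score with `positive` treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Toy labels for illustration only.
actual = list("NNNNNNANAA")
predicted = list("NNNNNANNAA")

for cls in ("N", "A"):
    p, r, f = class_metrics(actual, predicted, positive=cls)
    print(cls, round(p, 2), round(r, 2), round(f, 2))
print("accuracy:", accuracy(actual, predicted))  # accuracy: 0.8
```

In this toy example the abnormal class scores lower than the normal class on every per-class metric even though the overall accuracy is 0.8, which is why per-class metrics are reported alongside accuracy.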

Results and Discussion

To evaluate the performance of the proposed approach, readings from various smart meters are used. The data from these smart meters exhibit different characteristics. The labeled power consumption status sequence of length m is input into the predictive model to predict the (m+1)th status, which indicates the power consumption status for the next 30 minutes. The predicted and actual statuses are then compared.

Tables 3 and 4 show the average performance of the proposed models across the smart meters for varying sequence lengths m, for the large and small datasets, respectively. In addition to calculating the models' performance in detecting anomalies, similar calculations are performed to evaluate the robustness of the proposed models in predicting the normal status.

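Table 3 Average values of the Precision, Recall, and F1 scores for the large dataset

| m | Status | Metric | RF | DT |
|---|---|---|---|---|
| 3 | Normal | Precision | 0.94 | 0.94 |
| 3 | Normal | Recall | 0.94 | 0.94 |
| 3 | Normal | F1 | 0.94 | 0.94 |
| 3 | Abnormal | Precision | 0.75 | 0.75 |
| 3 | Abnormal | Recall | 0.71 | 0.71 |
| 3 | Abnormal | F1 | 0.73 | 0.73 |
| 4 | Normal | Precision | 0.94 | 0.94 |
| 4 | Normal | Recall | 0.94 | 0.94 |
| 4 | Normal | F1 | 0.94 | 0.94 |
| 4 | Abnormal | Precision | 0.76 | 0.76 |
| 4 | Abnormal | Recall | 0.73 | 0.73 |
| 4 | Abnormal | F1 | 0.74 | 0.74 |
| 5 | Normal | Precision | 0.93 | 0.93 |
| 5 | Normal | Recall | 0.95 | 0.95 |
| 5 | Normal | F1 | 0.94 | 0.94 |
| 5 | Abnormal | Precision | 0.76 | 0.76 |
| 5 | Abnormal | Recall | 0.72 | 0.72 |
| 5 | Abnormal | F1 | 0.74 | 0.74 |
| 6 | Normal | Precision | 0.94 | 0.94 |
| 6 | Normal | Recall | 0.94 | 0.94 |
| 6 | Normal | F1 | 0.94 | 0.94 |
| 6 | Abnormal | Precision | 0.76 | 0.76 |
| 6 | Abnormal | Recall | 0.73 | 0.73 |
| 6 | Abnormal | F1 | 0.74 | 0.74 |
| 7 | Normal | Precision | 0.92 | 0.92 |
| 7 | Normal | Recall | 0.94 | 0.94 |
| 7 | Normal | F1 | 0.93 | 0.93 |
| 7 | Abnormal | Precision | 0.76 | 0.73 |
| 7 | Abnormal | Recall | 0.71 | 0.72 |
| 7 | Abnormal | F1 | 0.74 | 0.74 |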

 

Overall, the proposed methods perform better at predicting normal status than abnormal status for both RF and DT, on both the small and large datasets. The average values of the Precision, Recall, and F1 scores are above 90% for the small and large datasets over different lengths of m in detecting normal cases. In contrast, these three metrics vary from 68% to 76% for various lengths of m in predicting abnormal status. The detection of abnormal status is not as good as the detection of normal status; this could be because abnormal readings occur much less often than normal ones. Consequently, the models do not receive adequate training on abnormal patterns. According to Feng et al. (2020), imbalanced datasets can cause a decrease in the detection accuracy rate. Traditional classification algorithms frequently struggle to learn from unbalanced datasets when the training set contains a disproportionate number of samples from the majority class compared to the minority classes (Sisodia & Verma, 2019).

In terms of accuracy, Figure 1 illustrates the average performance of the RF model on the large dataset versus the small dataset. The highest average accuracy was 90%, attained at m = 3 and m = 4 for the large dataset; it then decreased slightly to 89.9% from m = 5 to m = 7. For the small dataset, the accuracy of the RF model remained at about 89% from m = 3 to m = 7, with a negligible drop.

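Table 4 Average values of the Precision, Recall, and F1 scores for the small dataset

| m | Status | Metric | RF | DT |
|---|---|---|---|---|
| 3 | Normal | Precision | 0.94 | 0.94 |
| 3 | Normal | Recall | 0.94 | 0.94 |
| 3 | Normal | F1 | 0.94 | 0.94 |
| 3 | Abnormal | Precision | 0.75 | 0.75 |
| 3 | Abnormal | Recall | 0.71 | 0.71 |
| 3 | Abnormal | F1 | 0.73 | 0.73 |
| 4 | Normal | Precision | 0.95 | 0.95 |
| 4 | Normal | Recall | 0.94 | 0.94 |
| 4 | Normal | F1 | 0.94 | 0.94 |
| 4 | Abnormal | Precision | 0.76 | 0.76 |
| 4 | Abnormal | Recall | 0.73 | 0.73 |
| 4 | Abnormal | F1 | 0.74 | 0.74 |
| 5 | Normal | Precision | 0.93 | 0.93 |
| 5 | Normal | Recall | 0.94 | 0.94 |
| 5 | Normal | F1 | 0.93 | 0.93 |
| 5 | Abnormal | Precision | 0.74 | 0.73 |
| 5 | Abnormal | Recall | 0.68 | 0.70 |
| 5 | Abnormal | F1 | 0.70 | 0.71 |
| 6 | Normal | Precision | 0.93 | 0.93 |
| 6 | Normal | Recall | 0.94 | 0.94 |
| 6 | Normal | F1 | 0.93 | 0.93 |
| 6 | Abnormal | Precision | 0.73 | 0.73 |
| 6 | Abnormal | Recall | 0.69 | 0.69 |
| 6 | Abnormal | F1 | 0.71 | 0.71 |
| 7 | Normal | Precision | 0.92 | 0.92 |
| 7 | Normal | Recall | 0.94 | 0.92 |
| 7 | Normal | F1 | 0.93 | 0.94 |
| 7 | Abnormal | Precision | 0.74 | 0.73 |
| 7 | Abnormal | Recall | 0.69 | 0.68 |
| 7 | Abnormal | F1 | 0.71 | 0.71 |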


Figure 1  Average values of the Accuracy of RF for the large dataset and the small dataset

Figure 2 shows the average accuracy of DT on the large and small datasets. The highest accuracy was 90%, obtained on the large dataset at m = 4, and it dropped slightly to 89.8% from m = 5 to m = 7. On the small dataset, the DT model achieved 89.8% at m = 3 and fluctuated between 89.4% and 89.5% from m = 4 to m = 7.

In general, the performance scores are quite stable in terms of Precision, Recall, F1 score, and Accuracy, regardless of the different electricity consumption profiles, the prediction techniques employed, the size of the training datasets, or the length of the sequence used (m).    


Figure 2 Average values of the Accuracy of DT for the large dataset and the small dataset

Conclusion

This study introduces a framework for forecasting electrical usage anomalies one time unit (30 minutes) ahead. The proposed method provides an approach to label the power consumption data as either normal or abnormal while also transforming the data from continuous values to discrete categories, thus reducing the complexity of the power consumption data. The sequence of discrete data can then be used with supervised machine learning approaches (DT or RF) to predict the abnormality. Experiments were performed on data from 11 smart meters obtained from the Irish Social Science Data Archive (ISSDA). The results show that the proposed methods predict the consumption status with an overall accuracy of about 90%, with Precision, Recall, and F1 scores of 68% to 76% for the abnormal class. The experiments also show that the proposed approaches perform consistently, regardless of the dataset size used in the training phase, the prediction approach, or the length of the sequence used (m). A few issues, such as data imbalance and the adaptability of the approach to other smart meters, must be addressed in the future.

Acknowledgement

        This research is partially funded by TM R&D, Malaysia (MMUE/ 220024) and Multimedia University IR Fund (MMUI/ 210128).

References

Alraddadi, G.H., Othman, M.T.B., 2022. Development of an Efficient Electricity Consumption Prediction Model using Machine Learning Techniques. International Journal of Advanced Computer Science and Applications, Volume 13(1), pp. 376–384

Bhattacharya, B., Sinha, A., 2017. Intelligent Fault Analysis in Electrical Power Grids. In: IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI)

Brazovskaia, V., Gutman, S., 2021. Classification of Regions by Climatic Characteristics for the Use of Renewable Energy Sources. International Journal of Technology, Volume 12(7), pp. 1537–1545

Castangia, M., Sappa, R., Girmay, A., Camarda, C., Macii, E., Patti, E., 2021. Detection of Anomalies in Household Appliances from Disaggregated Load Consumption. In: 2021 International Conference on Smart Energy Systems and Technologies (SEST), pp. 1–6

Chahla, C., Snoussi, H., Merghem, L., Esseghir, M., 2020. A Deep Learning Approach for Anomaly Detection and Prediction in Power Consumption Data. Energy Efficiency, Volume 13(8), pp. 1633–1651

Chandola, V., Banerjee, A., Kumar, V., 2009. Anomaly Detection: A Survey. ACM Computing Surveys, Volume 41, pp. 1–58

Commission for Energy Regulation (CER), 2012. CER Smart Metering Project - Electricity Customer Behaviour Trial, 2009-2010 [dataset]. 1st Edition. Irish Social Science Data Archive. SN: 0012-00.

Feng, L., Xu, S., Zhang, L., Wu, J., Zhang, J., Chu, C., Wang, Z., Shi, H., 2020. Anomaly Detection for Electricity Consumption in Cloud Computing: Framework, Methods, Applications, and Challenges. EURASIP Journal on Wireless Communications and Networking, Volume (194), pp. 1–12

Fenza, G., Gallo, M., Loia, V., 2019. Drift-Aware Methodology for Anomaly Detection in Smart Grid. IEEE Access, Volume 7, pp. 9645–9657

Ford, V., Siraj, A., Eberle, W., 2014. Smart Grid Energy Fraud Detection using Artificial Neural Networks. In: IEEE Symposium on Computational Intelligence Applications In Smart Grid (CIASG)

Himeur, Y., Alsalemi, A., Bensaali, F., Amira, A., 2020. A Novel Approach for Detecting Anomalous Energy Consumption Based on Micro-Moments and Deep Neural Networks. Cognitive Computation, Volume 12(6), pp. 1381–1401

Himeur, Y., Alsalemi, A., Bensaali, F., Amira, A., 2021a. Smart Power Consumption Abnormality Detection in Buildings Using Micro-Moments and Improved K-Nearest Neighbors. International Journal of Intelligent Systems, Volume 36(6), pp. 2865–2894

Himeur, Y., Ghanem, K., Alsalemi, A., Bensaali, F., Amira, A., 2021b. Artificial Intelligence Based Anomaly Detection of Energy Consumption in Buildings: A Review, Current Trends, and New Perspectives. Applied Energy, Volume 287, p. 116601

Krishnan, S., Chinthakunta, V., Kok Swee, S., 2019. Smart Home Meter Profiler with Load Authentication, Shock Protection, Fault Proof, and Restricted Demand Management. International Journal of Technology, Volume 10(7), pp. 1286–1296

Nallathambi, S., Ramasamy, K., 2017. Prediction of Electricity Consumption Based on DT And RF: An Application on USA Country Power Consumption. In: IEEE International Conference on Electrical, Instrumentation, And Communication Engineering (ICE ICE)

Poh, S., Tan, Y., Cheong, S., Ooi, C., Tan, W., 2019. Anomaly Detection on In-Home Activities Data Based on Time Interval. Indonesian Journal of Electrical Engineering and Computer Science, Volume 15(2), p. 778

Sisodia, D.S., Verma, U., 2019. Distinct Multiple Learner-Based Ensemble SMOTEBagging (ML-ESB) Method for Classification of Binary Class Imbalance Problems. International Journal of Technology, Volume 10(4), pp. 721–730

Villar-Rodriguez, E., Del Ser, J., Oregi, I., Bilbao, M., Gil-Lopez, S., 2017. Detection of Non-Technical Losses in Smart Meter Data Based on Load Curve Profiling and Time Series Analysis. Energy, Volume 137, pp. 118–128

World Energy Outlook, 2019. Part of World Energy Outlook. Available online at https://www.iea.org/reports/world-energy-outlook-2019, Accessed on 3 May 2022

Yuan, Y., Dehghanpour, K., Bu, F., Wang, Z., 2020. Outage Detection in Partially Observable Distribution Systems Using Smart Meters and Generative Adversarial Networks. IEEE Transactions on Smart Grid, Volume 11(6), pp. 5418–5430

Zhang, J., Zhang, H., Ding, S., Zhang, X., 2021. Power Consumption Predicting and Anomaly Detection Based on Transformer and K-Means. Frontiers In Energy Research, Volume 9, pp. 1–8