Published at : 03 Nov 2022
Volume : IJtech
Vol 13, No 6 (2022)
DOI : https://doi.org/10.14716/ijtech.v13i6.5931
Rawan ELhadad | Faculty of Engineering, Multimedia University, Persiaran Multimedia, 63100, Cyberjaya, Selangor, Malaysia
Yi-Fei Tan | Faculty of Engineering, Multimedia University, Persiaran Multimedia, 63100, Cyberjaya, Selangor, Malaysia
Wooi-Nee Tan | Faculty of Engineering, Multimedia University, Persiaran Multimedia, 63100, Cyberjaya, Selangor, Malaysia
Electricity demand is increasing in proportion to the growth in power usage. Energy efficiency has therefore gained significant importance and attention, with one of the primary concerns being the detection and forecasting of abnormal consumption. In this paper, the authors propose a method to predict the occurrence of abnormal consumption behavior in advance. The proposed method uses the Isolation Forest algorithm to label smart meter electricity consumption readings as normal or abnormal, and then generates sequences of labels of varying lengths. Based on these sequences, two supervised machine learning algorithms, Random Forest and Decision Tree, were developed to forecast the occurrence of anomalous power consumption. Experimental results showed that the proposed methods consistently detect and predict the abnormal status 30 minutes ahead. There is no significant difference between Random Forest and Decision Tree performance across different smart meter readings, dataset sizes, and data sequence lengths. The proposed methods offer an alternative approach that automatically labels data as normal or abnormal and then works with the sequence of labels in the prediction process, thereby avoiding the dynamic behavior of the power consumption data.
Keywords: Decision Tree; Isolation Forest; Power consumption anomaly; Prediction; Random Forest
1. Introduction
Smart sensors and sub-meters installed in residential buildings generate vast amounts of data daily. If these data are used correctly, they can help end-users, energy suppliers, and utility corporations recognize and explain unusual power consumption. Anomaly detection can therefore prevent minor issues from turning into major ones. It also supports better decision-making, reduces energy waste, and promotes sustainable and energy-efficient behavior (Himeur et al., 2021b).
Anomaly detection, as the name implies, is a technique for detecting data that are out of the ordinary. Anomalies in data are events that do not follow the expected pattern of behavior. They are commonly classified into three types: point anomalies, contextual anomalies, and collective anomalies. A point anomaly occurs when a single data point is excessively high or low compared to other points. A contextual anomaly occurs when a data instance is regarded as abnormal in one context but normal in another. A collective anomaly occurs when a group of linked data instances is anomalous relative to the entire data set, even though the individual instances are not anomalies on their own (Chandola et al., 2009).
Anomaly detection in power consumption can be applied to various domains. One of them is the detection of non-technical losses. Villar-Rodriguez et al. (2017) established a method for detecting smart-meter failures using load curve profiling and time series analysis. The dataset was obtained from a Spanish utility. Various distance-based learning algorithms and an inner classification technique were applied, such as the Local Outlier Factor and Least Squares Anomaly Detection. To group non-technical losses, the authors used a clustering technique. Ford et al. (2014) proposed a fraud detection framework for the smart grid using artificial neural networks. The dataset in that research was gathered from 5000 residential and 600 business buildings from 2009 to 2011. The authors claimed that their model outperformed existing models in the same area of research at that time.
Detecting unusual power usage in domestic appliances is critical for grid optimization and for reducing undesired electrical absorption in residential buildings. Abnormal power usage in home appliances can have severe consequences. According to Krishnan et al. (2019), none of the products on the market could provide complete protection from utility risks, and the power protection systems in single-phase homes are not entirely hazard-proof. This is evidenced by the continuing occurrence of electrical shock injuries and fatalities, as well as fires caused by electrical hazards. Anomaly feedback can therefore alert customers and help them detect faulty equipment or devices that consume more energy than necessary. Several data mining and statistical methodologies have been used to uncover data patterns and find anomalies in energy use. Most of these systems detect irregularities in offline mode because of the high volume of data involved (Zhang et al., 2021).
Nevertheless, Bhattacharya and Sinha (2017) proposed intelligent fault analysis for the electrical power grid to detect grid faults. Two classifiers were built: Support Vector Machine (SVM) and Long Short-Term Memory (LSTM). Using historical fault data, the proposed model enabled the authors to predict the maximum voltage level and to locate where a fault occurred. Meanwhile, Castangia et al. (2021) employed an anomaly detection framework to track the hourly energy usage of three common sources of power absorption: the baseline, the refrigerator, and electrical gadgets. Their research concentrated on single-point deviations and aberrant patterns. By analyzing single-point deviations, they could identify short-term power surges caused by unplanned electrical faults or abrupt changes in end-user routines.
Another application of anomaly detection is identifying unusual consumption patterns and thereby reducing excessive power consumption. It is a crucial step in developing efficient energy-saving systems that reduce total power consumption and carbon emissions. In this context, Himeur et al. (2021a) proposed two novel schemes, one unsupervised and one supervised. The first scheme was Unsupervised Abnormality Detection based on One-Class Support Vector Machine (UAD-OCSVM) to detect abnormal consumption data. The second scheme was Supervised Abnormality Detection based on Micro-moments (SADM2) and an enhanced K-nearest neighbors model. The results demonstrated that SADM2 outperformed other machine learning algorithms in anomaly detection and real-time processing capability at a much lower computational cost. Yuan et al. (2020), on the other hand, used a Breadth-First Search (BFS) mechanism and a threshold to detect anomalies in power consumption. Their method achieved high detection accuracy over a small number of smart meters. Himeur et al. (2020) proposed a model for detecting abnormal energy consumption that extracts micro-moment features using a rule-based model and deploys a Deep Neural Network (DNN) to improve the efficiency of classifying and detecting anomalies.
Many researchers deploy LSTM to forecast electricity consumption because it is powerful and flexible and can deal with complex multi-dimensional time-series data (Alraddadi & Othman, 2022). Chahla et al. (2020) provided an unsupervised method for detecting power consumption abnormalities in which the K-means algorithm groups the data and LSTM forecasts the electricity consumption for the next 24 hours. Fenza et al. (2019) also combined LSTM and K-means in a drift-aware algorithm for detecting anomalies in smart grids. The authors showed that the proposed deep learning approach could accurately predict anomalies.
In this research, we develop a model that can detect anomalies in power consumption patterns in advance by using a combination of machine learning algorithms. Most existing work in this field either predicts power consumption only or forecasts the consumption value first and then applies an algorithm to detect anomalies. The dynamic nature of power consumption data complicates such analysis. This work aims to avoid that dynamic behavior by transforming power consumption data into a sequence of labeled data, which is then used in a machine learning-based prediction process. The proposed work can predict anomalous power consumption one unit of time in advance, where the unit depends on the frequency of the smart meter readings. One challenge is labeling the power consumption data as abnormal or normal. Another is preparing the data so that machine learning algorithms can predict the anomaly. The paper is organized as follows: the second section is dedicated to methodology, the third to results and discussion, and the fourth to conclusion and future work.
2. Methodology
The proposed methodology
consists of two major components: the first is the data labeling for power
consumption, and the second is the construction of a prediction model utilizing
supervised machine learning techniques. The classifiers used in this work are
Random Forest (RF) and Decision Tree (DT). Subsections 2.1 to 2.4 elaborate on
the methodology framework. To begin, data collection and data cleaning are
discussed as essential components of data analysis. Subsections 2.2 and 2.3
discuss the labeling process and predictive models. The performance indicator
used in evaluating the proposed method is detailed in Subsection 2.4.
2.1. Data Collection
The electricity dataset was obtained from the Irish Social Science Data Archive (ISSDA), Commission for Energy Regulation (CER) Smart Metering Project (2012), and includes power usage data from 987 smart meters collected between 2009 and 2010. The dataset has three attributes: "Meter ID", "Electricity Consumed (in kWh)", and "Five Digit Code". "Meter ID" identifies the smart meter, and the "Five Digit Code" represents the date and time. "Electricity Consumed (in kWh)" is the amount of electricity consumed during a 30-minute interval. Table 1 shows some example data instances.
Table 1 Example of data instances
Meter ID | Five Digit Code | Electricity Consumed (in kWh)
1392 | 19501 | 0.157
1392 | 19502 | 0.144
1392 | 19504 | 0.138
1287 | 19501 | 0.840
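The text describes the "Five Digit Code" only as encoding the date and time. The following is a minimal sketch of how such a code could be decoded, assuming the convention usually documented for this dataset: the first three digits are a day counter with day 1 being 1 January 2009, and the last two digits are the index of the half-hour slot within the day, starting at 1. Both the convention and the function name `decode_five_digit_code` are assumptions and are not taken from the paper.

```python
from datetime import datetime, timedelta

# Assumed convention (not stated in the paper): digits 1-3 are a day counter
# with day 1 = 1 January 2009; digits 4-5 are the half-hour slot, starting at 1.
BASE_DATE = datetime(2009, 1, 1)


def decode_five_digit_code(code):
    """Return the start time of the 30-minute interval encoded by `code`."""
    text = str(code).zfill(5)
    day, slot = int(text[:3]), int(text[3:])
    return BASE_DATE + timedelta(days=day - 1, minutes=30 * (slot - 1))


# Decode the codes that appear in Table 1.
for code in (19501, 19502, 19504):
    print(code, decode_five_digit_code(code))
```

Under this assumption, consecutive codes within a day are 30 minutes apart, so a gap such as the missing 19503 in Table 1 shows up as a 60-minute jump between decoded timestamps.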
The classifiers' performance in predicting the occurrence of anomalies was investigated using 11 smart meters. Ten of these smart meters contain about 23,000 to 25,000 readings each and are used in the experiment as large datasets. The remaining smart meter contains 3,984 readings and is categorized as a small dataset. To investigate the impact of data size on detection, 9 smart meters from the large datasets are trimmed to 3,984 readings and used as small datasets in the experiment. Consequently, the size of a small dataset is approximately 16% of the size of a large dataset.
2.2. Data Labeling Using Isolation Forest
The data provided consist solely of power consumption readings, with no indication of normal or abnormal usage. The Isolation Forest (IF) algorithm is therefore used to classify each power consumption reading as normal or abnormal. Table 2 illustrates an example of the labeled data after applying IF. The anomaly scores calculated by IF range from 0 to 1. If the score is less than 0.5, the reading is labeled "A" (Abnormal); otherwise, it is labeled "N" (Normal).
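The labeling step can be illustrated with a small, self-contained sketch of an isolation forest for one-dimensional readings. It is not the authors' implementation, and all names (`anomaly_scores`, `label_readings`) and parameter values (100 trees, subsample of 256, toy readings) are assumptions. The sketch follows the original Isolation Forest score convention, in which scores lie in (0, 1] and higher values mean a point is easier to isolate. Library implementations may flip or shift this score, so the direction of the comparison against 0.5 has to match whichever convention is in use.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively split a 1-D sample at uniformly random cut points."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return ("leaf", len(values))
    cut = rng.uniform(min(values), max(values))
    left = [v for v in values if v < cut]
    right = [v for v in values if v >= cut]
    return ("node", cut,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x lands, plus a correction for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, cut, left, right = tree
    return path_length(x, left if x < cut else right, depth + 1)


def anomaly_scores(values, n_trees=100, sample_size=256, seed=0):
    """Score each reading; short average paths give scores close to 1."""
    if len(values) < 2:
        raise ValueError("need at least two readings")
    rng = random.Random(seed)
    psi = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(values, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / (n_trees * norm))
            for x in values]


def label_readings(values, threshold=0.5):
    """Label readings 'A' or 'N' (here: higher score = more anomalous)."""
    return ["A" if s > threshold else "N" for s in anomaly_scores(values)]


# Toy data (assumed): 200 ordinary readings plus one 5.0 kWh spike at the end.
readings = [0.1 + 0.005 * (i % 100) for i in range(200)] + [5.0]
labels = label_readings(readings)
print("label of the 5.0 kWh spike:", labels[-1])
```

Scores of ordinary readings tend to sit close to the threshold in this toy setting, so in practice the threshold (or a contamination rate) would need to be tuned to the data.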
Following the labeling stage, the power consumption values become a sequence of "N" and "A" labels. This sequence is then divided into shorter sequences of m elements each. Elements 1 to m form the first sequence, which is the input to the prediction model, with the (m+1)th element being the output of the prediction model. Elements 2 to m+1 form the second sequence, with the (m+2)th element as the output, and so forth. For example, if the label sequence is "NAAANNNA..." and m = 5, then the first sequence is "NAAAN", the second is "AAANN", the third is "AANNN", and so on. The corresponding outputs for these sequences are "N", "N", and "A". After the data were labeled and segmented into sequences, the number of normal labels was greater than the number of abnormal labels. As a result, the dataset is unbalanced.
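The sliding-window construction described above can be sketched in a few lines. The function name `make_sequences` is hypothetical; the label string and m = 5 are the example values given in the text.

```python
def make_sequences(labels, m):
    """Split a label string into (input window of length m, next label) pairs."""
    return [(labels[i:i + m], labels[i + m]) for i in range(len(labels) - m)]


# Example from the text: consumption sequence "NAAANNNA" with m = 5.
pairs = make_sequences("NAAANNNA", m=5)
for window, target in pairs:
    print(window, "->", target)
```

Each window overlaps the previous one by m - 1 labels, so a sequence of n labels yields n - m training pairs.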
Table 2 Example of labeled data
Electricity consumed (kWh) | Label
0.728 | N
1.046 | A
0.625 | N
0.625 | N
2.3. Predictive Models
One of the objectives of this research is to build a predictive model that can predict whether power consumption will be "N" or "A" 30 minutes in advance. Since the data have been labeled by IF, a supervised machine learning classification approach is chosen to build the model. According to Himeur et al. (2021b), annotated training datasets that label both normal and abnormal power consumption are required to train machine learning classifiers (binary or multi-class) for supervised anomaly identification. Tree-based machine learning algorithms are widely used for classification. The two best-known tree-based algorithms are DT and RF. DT is a supervised learning technique in which the process starts at the root node and then branches out to parent nodes, child nodes, and leaf nodes. The leaf nodes represent the outcomes, whereas the parent and child nodes represent the splits of the data. RF is built from a large number of DTs, and the final result is based on a majority vote (Nallathambi & Ramasamy, 2017).
The model is generic and can be applied to the dataset from any smart meter. Each dataset is split into training data and test data. The model is trained using the training data only, so the test data are treated as unseen data.
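To make the two classifiers concrete, here is a from-scratch toy sketch of a decision tree on "N"/"A" windows and a random forest that takes a majority vote over bootstrapped trees. It is not the authors' implementation, and a library implementation would normally be used in practice. The Gini split criterion, the 80/20 chronological split, the number of trees, the synthetic label string, and all function names are assumptions.

```python
import random
from collections import Counter


def gini(targets):
    """Gini impurity of a list of class labels."""
    n = len(targets)
    return 1.0 - sum((c / n) ** 2 for c in Counter(targets).values())


def majority(targets):
    return Counter(targets).most_common(1)[0][0]


def build_tree(rows, features, rng=None, n_try=None):
    """rows: list of (window, target); features: window positions still unused."""
    targets = [t for _, t in rows]
    if len(set(targets)) == 1 or not features:
        return majority(targets)  # leaf node holds the predicted label
    # A forest tree (rng given) only considers a random subset of positions.
    candidates = features if rng is None else rng.sample(features, min(n_try, len(features)))

    def split_cost(f):
        parts = [[t for w, t in rows if w[f] == v] for v in "NA"]
        return sum(len(p) * gini(p) for p in parts if p)

    best = min(candidates, key=split_cost)
    rest = [f for f in features if f != best]
    default = majority(targets)
    node = {"feature": best}
    for v in "NA":
        subset = [(w, t) for w, t in rows if w[best] == v]
        node[v] = build_tree(subset, rest, rng, n_try) if subset else default
    return node


def predict_tree(tree, window):
    while isinstance(tree, dict):
        tree = tree[window[tree["feature"]]]
    return tree


def build_forest(rows, m, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample with random feature subsets."""
    rng = random.Random(seed)
    n_try = max(1, round(m ** 0.5))
    return [build_tree([rng.choice(rows) for _ in rows], list(range(m)), rng, n_try)
            for _ in range(n_trees)]


def predict_forest(forest, window):
    return majority([predict_tree(t, window) for t in forest])


# Toy label sequence (assumed), windowed with m = 5 and split 80/20 in time order.
labels = "NNNNNNAAANNNNNNNNAANNNNNNNNNNAAAANNNNNN" * 20
m = 5
rows = [(labels[i:i + m], labels[i + m]) for i in range(len(labels) - m)]
cut = int(0.8 * len(rows))
train, test = rows[:cut], rows[cut:]

tree = build_tree(train, list(range(m)))
forest = build_forest(train, m)
dt_hits = sum(predict_tree(tree, w) == t for w, t in test)
rf_hits = sum(predict_forest(forest, w) == t for w, t in test)
print("DT accuracy:", round(dt_hits / len(test), 3))
print("RF accuracy:", round(rf_hits / len(test), 3))
```

Because each input is a window of only m binary labels, there are at most 2^m distinct inputs, which may help explain why a single tree and a forest can end up with very similar predictions.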
2.4. Model Evaluation
In the classification model evaluation process, a confusion matrix is first obtained; it consists of the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts. Precision, Recall, F1 score, and Accuracy are used as the metrics to assess the model. The formulas for Precision, Recall, F1 score, and Accuracy are shown in Equations (1) to (4).
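For reference, the standard definitions of these four metrics in terms of the confusion-matrix counts are given below. The numbering (1) to (4) is assumed to follow the order in which the metrics are listed in the text.

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP} \tag{1}\\
\text{Recall} &= \frac{TP}{TP + FN} \tag{2}\\
\text{F1 score} &= 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}\\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{4}
\end{align}
```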
Precision is defined as the ratio of correctly predicted positives to the total number of positive predictions. Recall, also known as sensitivity, is the ratio of correctly predicted positives to the total number of actual positive examples. The F1 score is a balanced measure based on Precision and Recall (Poh et al., 2019). For each model, the Precision, Recall, F1 score, and Accuracy are calculated.
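Since results are reported separately for the normal and abnormal status, each class can be treated in turn as the positive class. The sketch below computes the four metrics that way from a pair of label strings. The function name `class_metrics` and the two toy label strings are assumptions made for illustration.

```python
def class_metrics(actual, predicted, positive):
    """Precision, Recall, F1 and Accuracy with `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy example (assumed labels): ten actual statuses and ten predictions.
actual = "NNNNANNAAN"
predicted = "NNNAANNANN"
for status in ("N", "A"):
    print(status, class_metrics(actual, predicted, positive=status))
```

Accuracy is the same whichever class is treated as positive, whereas Precision, Recall, and F1 differ per class, which is why accuracy alone can hide weak performance on the rarer abnormal class.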
3. Results and Discussion
To evaluate the performance of the proposed approach, readings from various smart meters are used. The data from these smart meters exhibit different characteristics. The labeled power consumption status sequence of length m is input into the predictive model to predict the (m+1)th status, which indicates the power consumption status for the next 30 minutes. The predicted and actual statuses are then compared.
Tables 3 and 4 report the average performance of the proposed models for varying sequence lengths m on the large and small datasets, respectively. In addition to the models' performance in detecting anomalies, the same metrics are calculated for the normal status to evaluate the robustness of the proposed models.
Table 3 Average values of the Precision, Recall, and F1 scores for the large dataset
m | Status | Metric | RF | DT
3 | Normal | Precision | 0.94 | 0.94
3 | Normal | Recall | 0.94 | 0.94
3 | Normal | F1 | 0.94 | 0.94
3 | Abnormal | Precision | 0.75 | 0.75
3 | Abnormal | Recall | 0.71 | 0.71
3 | Abnormal | F1 | 0.73 | 0.73
4 | Normal | Precision | 0.94 | 0.94
4 | Normal | Recall | 0.94 | 0.94
4 | Normal | F1 | 0.94 | 0.94
4 | Abnormal | Precision | 0.76 | 0.76
4 | Abnormal | Recall | 0.73 | 0.73
4 | Abnormal | F1 | 0.74 | 0.74
5 | Normal | Precision | 0.93 | 0.93
5 | Normal | Recall | 0.95 | 0.95
5 | Normal | F1 | 0.94 | 0.94
5 | Abnormal | Precision | 0.76 | 0.76
5 | Abnormal | Recall | 0.72 | 0.72
5 | Abnormal | F1 | 0.74 | 0.74
6 | Normal | Precision | 0.94 | 0.94
6 | Normal | Recall | 0.94 | 0.94
6 | Normal | F1 | 0.94 | 0.94
6 | Abnormal | Precision | 0.76 | 0.76
6 | Abnormal | Recall | 0.73 | 0.73
6 | Abnormal | F1 | 0.74 | 0.74
7 | Normal | Precision | 0.92 | 0.92
7 | Normal | Recall | 0.94 | 0.94
7 | Normal | F1 | 0.93 | 0.93
7 | Abnormal | Precision | 0.76 | 0.73
7 | Abnormal | Recall | 0.71 | 0.72
7 | Abnormal | F1 | 0.74 | 0.74
Overall, the proposed methods perform better at predicting normal status than abnormal status for both RF and DT on both the small and large datasets. In detecting normal cases, the average Precision, Recall, and F1 scores are above 90% for both dataset sizes across the different lengths of m. In contrast, these three metrics range from 68% to 76% when predicting abnormal status. The detection of abnormal status is not as good as that of normal status, possibly because abnormal readings occur much less often than normal ones. Consequently, the models do not receive adequate training on abnormal patterns. According to Feng et al. (2020), imbalanced datasets can decrease the detection accuracy rate. Traditional classification algorithms frequently struggle to learn from unbalanced datasets in which the training set contains a disproportionate number of samples from the majority class compared to the minority classes (Sisodia & Verma, 2019).
In terms of accuracy, Figure 1 illustrates the average performance of the RF model on the large dataset versus the small dataset. For the large dataset, the highest average accuracy attained was 90% at m = 3 and m = 4, which then decreased slightly to 89.9% from m = 5 to m = 7. For the small dataset, the accuracy of the RF model remained at about 89% from m = 3 to m = 7, with only a negligible drop.
Table 4 Average values of the Precision, Recall, and F1 scores for the small dataset
m | Status | Metric | RF | DT
3 | Normal | Precision | 0.94 | 0.94
3 | Normal | Recall | 0.94 | 0.94
3 | Normal | F1 | 0.94 | 0.94
3 | Abnormal | Precision | 0.75 | 0.75
3 | Abnormal | Recall | 0.71 | 0.71
3 | Abnormal | F1 | 0.73 | 0.73
4 | Normal | Precision | 0.95 | 0.95
4 | Normal | Recall | 0.94 | 0.94
4 | Normal | F1 | 0.94 | 0.94
4 | Abnormal | Precision | 0.76 | 0.76
4 | Abnormal | Recall | 0.73 | 0.73
4 | Abnormal | F1 | 0.74 | 0.74
5 | Normal | Precision | 0.93 | 0.93
5 | Normal | Recall | 0.94 | 0.94
5 | Normal | F1 | 0.93 | 0.93
5 | Abnormal | Precision | 0.74 | 0.73
5 | Abnormal | Recall | 0.68 | 0.70
5 | Abnormal | F1 | 0.70 | 0.71
6 | Normal | Precision | 0.93 | 0.93
6 | Normal | Recall | 0.94 | 0.94
6 | Normal | F1 | 0.93 | 0.93
6 | Abnormal | Precision | 0.73 | 0.73
6 | Abnormal | Recall | 0.69 | 0.69
6 | Abnormal | F1 | 0.71 | 0.71
7 | Normal | Precision | 0.92 | 0.92
7 | Normal | Recall | 0.94 | 0.92
7 | Normal | F1 | 0.93 | 0.94
7 | Abnormal | Precision | 0.74 | 0.73
7 | Abnormal | Recall | 0.69 | 0.68
7 | Abnormal | F1 | 0.71 | 0.71
Figure 1 Average values of the Accuracy of RF for the large dataset and the small dataset
Figure 2 shows the average accuracy of DT on the large and small datasets. For the large dataset, the highest accuracy was 90% at m = 4, dropping slightly to 89.8% from m = 5 to m = 7. For the small dataset, the DT model achieved 89.8% at m = 3 and fluctuated between 89.4% and 89.5% from m = 4 to m = 7.
In general, the performance scores are quite stable in terms of Precision, Recall, F1 score, and Accuracy, regardless of the different electricity consumption profiles, the prediction techniques employed, the size of the training datasets, or the length of the sequence used (m).
Figure 2 Average values of the Accuracy of DT for the large dataset and the small dataset
4. Conclusion
This study introduces a framework for forecasting electricity usage anomalies one unit of time (30 minutes) ahead. The proposed method labels the power consumption data as either normal or abnormal, transforming the data from continuous values into discrete categories and thus reducing the complexity of the power consumption data. The sequence of discrete labels can then be used with supervised machine learning approaches, DT or RF, to predict abnormality. Experiments were performed on data from 11 smart meters obtained from the Irish Social Science Data Archive (ISSDA). The results show that the proposed methods perform well in predicting abnormal occurrences, with an accuracy of about 90%. The experiments also show that the proposed approaches perform consistently regardless of the dataset size used in the training phase, the prediction approach, or the length of the sequence used (m). A few issues, such as data imbalance and the adaptability of the approach to other smart meters, must be addressed in the future.
This research is partially
funded by TM R&D, Malaysia (MMUE/ 220024) and Multimedia University IR Fund
(MMUI/ 210128).
References
Alraddadi, G.H., Othman, M.T.B., 2022. Development of an Efficient Electricity Consumption Prediction Model using Machine Learning Techniques. International Journal of Advanced Computer Science and Applications, Volume 13(1), pp. 376–384
Bhattacharya, B., Sinha, A., 2017. Intelligent Fault Analysis in Electrical Power Grids. In: IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI)
Brazovskaia, V., Gutman, S., 2021. Classification of Regions by Climatic Characteristics for the Use of Renewable Energy Sources. International Journal of Technology, Volume 12(7), pp. 1537–1545
Castangia, M., Sappa, R., Girmay, A., Camarda, C., Macii, E., Patti, E., 2021. Detection of Anomalies in Household Appliances from Disaggregated Load Consumption. In: 2021 International Conference on Smart Energy Systems and Technologies (SEST), pp. 1–6
Chahla, C., Snoussi, H., Merghem, L., Esseghir, M., 2020. A Deep Learning Approach for Anomaly Detection and Prediction in Power Consumption Data. Energy Efficiency, Volume 13(8), pp. 1633–1651
Chandola, V., Banerjee, A., Kumar, V., 2009. Anomaly Detection: A Survey. ACM Computing Surveys, Volume 41, pp. 1–58
Commission for Energy Regulation (CER), 2012. CER Smart Metering Project - Electricity Customer Behaviour Trial, 2009-2010 [dataset]. 1st Edition. Irish Social Science Data Archive. SN: 0012-00
Feng, L., Xu, S., Zhang, L., Wu, J., Zhang, J., Chu, C., Wang, Z., Shi, H., 2020. Anomaly Detection for Electricity Consumption in Cloud Computing: Framework, Methods, Applications, and Challenges. EURASIP Journal on Wireless Communications and Networking, Volume (194), pp. 1–12
Fenza, G., Gallo, M., Loia, V., 2019. Drift-Aware Methodology for Anomaly Detection in Smart Grid. IEEE Access, Volume 7, pp. 9645–9657
Ford, V., Siraj, A., Eberle, W., 2014. Smart Grid Energy Fraud Detection using Artificial Neural Networks. In: IEEE Symposium on Computational Intelligence Applications in Smart Grid (CIASG)
Himeur, Y., Alsalemi, A., Bensaali, F., Amira, A., 2020. A Novel Approach for Detecting Anomalous Energy Consumption Based on Micro-Moments and Deep Neural Networks. Cognitive Computation, Volume 12(6), pp. 1381–1401
Himeur, Y., Alsalemi, A., Bensaali, F., Amira, A., 2021a. Smart Power Consumption Abnormality Detection in Buildings Using Micro-Moments and Improved K-Nearest Neighbors. International Journal of Intelligent Systems, Volume 36(6), pp. 2865–2894
Himeur, Y., Ghanem, K., Alsalemi, A., Bensaali, F., Amira, A., 2021b. Artificial Intelligence Based Anomaly Detection of Energy Consumption in Buildings: A Review, Current Trends, and New Perspectives. Applied Energy, Volume 287, p. 116601
Krishnan, S., Chinthakunta, V., Kok Swee, S., 2019. Smart Home Meter Profiler with Load Authentication, Shock Protection, Fault Proof, and Restricted Demand Management. International Journal of Technology, Volume 10(7), pp. 1286–1296
Nallathambi, S., Ramasamy, K., 2017. Prediction of Electricity Consumption Based on DT and RF: An Application on USA Country Power Consumption. In: IEEE International Conference on Electrical, Instrumentation, and Communication Engineering (ICE ICE)
Poh, S., Tan, Y., Cheong, S., Ooi, C., Tan, W., 2019. Anomaly Detection on In-Home Activities Data Based on Time Interval. Indonesian Journal of Electrical Engineering and Computer Science, Volume 15(2), p. 778
Sisodia, D.S., Verma, U., 2019. Distinct Multiple Learner-Based Ensemble SMOTEBagging (ML-ESB) Method for Classification of Binary Class Imbalance Problems. International Journal of Technology, Volume 10(4), pp. 721–730
Villar-Rodriguez, E., Del Ser, J., Oregi, I., Bilbao, M., Gil-Lopez, S., 2017. Detection of Non-Technical Losses in Smart Meter Data Based on Load Curve Profiling and Time Series Analysis. Energy, Volume 137, pp. 118–128
World Energy Outlook, 2019. Part of World Energy Outlook. Available online at https://www.iea.org/reports/world-energy-outlook-2019, Accessed on 3 May 2022
Yuan, Y., Dehghanpour, K., Bu, F., Wang, Z., 2020. Outage Detection in Partially Observable Distribution Systems Using Smart Meters and Generative Adversarial Networks. IEEE Transactions on Smart Grid, Volume 11(6), pp. 5418–5430
Zhang, J., Zhang, H., Ding, S., Zhang, X., 2021. Power Consumption Predicting and Anomaly Detection Based on Transformer and K-Means. Frontiers in Energy Research, Volume 9, pp. 1–8