Published at : 29 Nov 2019
Volume : IJtech
Vol 10, No 7 (2019)
DOI : https://doi.org/10.14716/ijtech.v10i7.3252
Tong-Sheng Wong | Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, 63100 Cyberjaya, Selangor, Malaysia
Gaik-Yee Chan | Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, 63100 Cyberjaya, Selangor, Malaysia
Fang-Fang Chua | Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, 63100 Cyberjaya, Selangor, Malaysia
One major challenge in delivering and accessing cloud applications is the
management of Quality of Service (QoS). Cloud service providers must maintain
service performance and meet the QoS levels defined in the Service Level
Agreement (SLA). In this paper, we propose a Scaling and Fault Tolerance (SFT)
algorithm that deploys preventive or remedial measures based on 16 decision
rules for QoS violation detection and prediction. We simulate the SFT algorithm
in a cloud simulator under four scenarios to measure its effectiveness in
handling events such as faulty virtual machines (VMs) and the over- or
under-provisioning of resources. Our experimental results show that the
proposed SFT algorithm is effective (an effective rate in the range of roughly
90% to 100%) in providing preventive or remedial measures and in reducing the
number of VMs when they are not needed.
Cloud computing; Fault tolerance; Quality of service violation; Replication; Scalability
According to the National Institute of Standards and Technology (NIST) (Mell &
Grance, 2011), cloud computing has emerged as a compelling paradigm for
providing convenient, on-demand network access to a shared pool of configurable
computing resources that can be rapidly provisioned and released with minimal
management effort or service provider interaction. This has made it possible
for cloud service providers to host cloud services such as Software as a
Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service
(IaaS). An increasing number of organizations are adopting cloud computing
platforms for their daily business functions. These organizations are showing
determination in embracing Industry 4.0, as cloud computing helps to pool and
centralize information for making better business decisions. This paper
focuses on the Quality of Service (QoS) of SaaS.
Cloud service providers are required to maintain the performance and quality
of their services, as defined in the Service Level Agreement (SLA). From their
perspective, maintaining the conditions defined in the SLA and maximizing the
QoS metrics are important tasks. For evaluating the performance aspect of cloud
services, we take CPU load, response time and throughput as the QoS metrics
(Bardsiri & Hashemi, 2014).
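As a minimal sketch of how these three metrics might be derived from monitoring
data, the Python below reduces one monitoring window of request records to a
mean response time and a throughput figure, alongside a CPU load reading. The
record layout (`RequestRecord`), the `QosSample` container and the function
`summarize_window` are illustrative assumptions, not the monitoring tool used
in the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RequestRecord:
    """One completed request, as a hypothetical monitoring tool might log it."""
    start_s: float   # time the request was received, in seconds
    finish_s: float  # time the response was sent, in seconds


@dataclass(frozen=True)
class QosSample:
    """The three performance metrics for one monitoring window."""
    cpu_load: float         # fraction of CPU in use, 0.0 to 1.0
    response_time_s: float  # mean response time over the window
    throughput_rps: float   # completed requests per second


def summarize_window(records, cpu_load, window_s):
    """Reduce one monitoring window to a QosSample (assumed definitions)."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if not records:
        # No completed requests: report zero response time and throughput.
        return QosSample(cpu_load, 0.0, 0.0)
    response_time = mean(r.finish_s - r.start_s for r in records)
    throughput = len(records) / window_s
    return QosSample(cpu_load, response_time, throughput)
```

A caller would collect the records completed during, say, a 10-second window
and pass them in together with the CPU load observed for that window.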
Cloud monitoring tools measure and collect cloud QoS data, and this information
is used for making decisions on scaling cloud resources horizontally, and also
for providing fault tolerance.
The work by Wong et al. (2019) demonstrated the feasibility of using horizontal
scaling as a preventive measure, and replication as a fault tolerance mechanism
for rectifying QoS violations. In this paper, we propose a scaling and fault
tolerance (SFT) algorithm and evaluate its effectiveness using CloudSim Plus
(Filho et al., 2017), a toolkit with libraries for the simulation of cloud
computing scenarios. A total of four scenarios were used in the simulation to
evaluate the effectiveness of the proposed SFT algorithm. One scenario provided
a preventive measure for probable cloud QoS violation; another provided a
remedial measure for certain cloud QoS violation; and the other two addressed
the over- and under-provisioning of virtual machines (VMs). One example of
managing VM provisioning is to reduce the number of idle VMs based on
real-world QoS measurements of cloud services, as discussed by Zheng et al.
(2014).
Our experimental results show that the proposed SFT algorithm is effective (an
effective rate in the range of roughly 90% to 100%) in providing preventive or
remedial measures and in reducing the number of VMs when they are not needed,
consequently guaranteeing QoS performance, as defined in the cloud services
SLA. Additionally, the 16 decision rules classify QoS status at four levels,
namely no violation, normal, probable violation and certain violation. Unlike
many other works, which distinguish only between normal and violation, the four
levels make it possible to detect and predict a violation before it actually
happens. The SFT algorithm takes appropriate action to prevent an actual
violation from occurring, or rectifies it if a violation has already occurred.
Together with the 16 decision rules, it thus contributes a further aspect of
detection, prediction, prevention and rectification measures with regard to
response time and throughput for cloud QoS violations.
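The relationship between the four levels and the two kinds of measure can be
sketched as a simple lookup. This is an illustration only: the 16 decision
rules themselves are not reproduced here, the names `ViolationLevel`, `Action`
and `choose_action` are hypothetical, and mapping both "no violation" and
"normal" to "no action" is an assumption.

```python
from enum import Enum


class ViolationLevel(Enum):
    """The four QoS violation levels named in the text."""
    NO_VIOLATION = "no violation"
    NORMAL = "normal"
    PROBABLE = "probable violation"
    CERTAIN = "certain violation"


class Action(Enum):
    """Hypothetical labels for the measures described in the text."""
    NONE = "no action"
    SCALE_OUT = "horizontal scaling (preventive)"
    REPLICATE_AND_RESUBMIT = "replication and task resubmission (remedial)"


# Assumed mapping: prevention for probable violations, remedy for certain ones.
_ACTION_BY_LEVEL = {
    ViolationLevel.NO_VIOLATION: Action.NONE,
    ViolationLevel.NORMAL: Action.NONE,
    ViolationLevel.PROBABLE: Action.SCALE_OUT,
    ViolationLevel.CERTAIN: Action.REPLICATE_AND_RESUBMIT,
}


def choose_action(level):
    """Return the measure associated with a violation level."""
    return _ACTION_BY_LEVEL[level]
```

The point of the intermediate "probable violation" level is visible in the
table: it is the only level that triggers a measure before any violation has
actually happened.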
Unlike
the work of Aruna and Aramudhan (2016), which includes cost in its proposed
method of using fuzzy sets to shortlist providers based on the QoS agreed in
the SLA, the SFT algorithm does not include the cost factor in resolving QoS
violations. This will be left for future work.
Scalability is defined as the handling of increasing workloads by allocating
more resources to the system (Lehrig et al., 2015). There are two general
scalability approaches, namely horizontal and vertical scaling. Horizontal
scaling involves adding or removing VMs to spread the load across multiple
distributed VMs, while vertical scaling involves increasing or decreasing the
capacity of an existing VM by adjusting its memory (RAM), storage (HDD/SSD) or
processors (CPUs). In this paper, the focus is on application scalability,
which is defined as the maintenance of cloud service application performance
goals by avoiding QoS violation events when the workload submitted by users
increases (Kuperberg et al., 2011).
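The difference between the two approaches can be illustrated with a small
Python sketch. The `Vm` record and the functions `scale_out`, `scale_in` and
`scale_up` are hypothetical names chosen for this illustration; they do not
correspond to the API of any simulator or cloud provider.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Vm:
    """A hypothetical VM description: processor count and memory size."""
    cpus: int
    ram_gb: int


def scale_out(pool, template):
    """Horizontal scaling: return a new pool with one more VM added."""
    return list(pool) + [template]


def scale_in(pool, min_size=1):
    """Horizontal scaling: drop one VM, but never go below min_size."""
    if len(pool) > min_size:
        return list(pool[:-1])
    return list(pool)


def scale_up(vm, extra_cpus=0, extra_ram_gb=0):
    """Vertical scaling: return the same VM with more CPU and/or RAM."""
    return replace(vm, cpus=vm.cpus + extra_cpus, ram_gb=vm.ram_gb + extra_ram_gb)
```

Horizontal scaling changes the length of the pool and leaves each VM as it is,
whereas vertical scaling changes one VM and leaves the pool size unchanged.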
Fault
tolerance, as defined by Ganesh et al. (2014), is the ability of the cloud
environment to handle unanticipated changes, such as hardware failure, software
defects or network congestion. Two standard policies, namely proactive and
reactive fault tolerance, can be used for real-time cloud applications.
Proactive fault tolerance can predict faults, errors and failures, and once a
suspicious component has been detected, it will be replaced proactively.
Proactive fault tolerance techniques include pre-emptive migration, software
rejuvenation and self-healing.
Reactive fault tolerance reduces the effect of a failure on the applications
being executed once the failure has actually occurred. Examples of reactive
fault tolerance techniques are checkpointing and restart, replication, job
migration and task resubmission. In this paper, the focus is on implementing a
reactive fault-tolerance policy for computation failure, which involves
hardware or infrastructure failure.
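A minimal sketch of a reactive policy that combines replication with task
resubmission is given below. It assumes that each replica VM can be modelled as
a callable that either returns a result or raises an exception; `VmFailure` and
`run_with_resubmission` are hypothetical names for this illustration and are
not taken from the SFT implementation.

```python
class VmFailure(Exception):
    """Raised by a hypothetical VM handle when the VM cannot run a task."""


def run_with_resubmission(task, replicas, max_attempts=3):
    """Try `task` on replica VMs in turn, resubmitting after each failure.

    `replicas` is a list of callables; each takes the task and either
    returns its result or raises VmFailure. Returns (result, attempts).
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # Rotate through the replicas so a failed VM is not retried at once.
        vm = replicas[(attempt - 1) % len(replicas)]
        try:
            return vm(task), attempt
        except VmFailure as err:
            # Reactive: nothing is done until the failure has occurred.
            last_error = err
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error
```

The policy is reactive in the sense described above: it makes no attempt to
predict which VM will fail and only resubmits once a failure is observed.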
In this paper, we have presented the design and implementation of a system that
can perform VM scaling, replication and task retry. We developed a scaling and
fault tolerance (SFT) algorithm that deploys preventive measures or takes
remedial action based on QoS decision outcomes with regard to response time and
throughput. We conducted experiments based on four scenarios to measure the
effectiveness of the algorithm in handling events such as faulty VMs and over-
and under-provisioning. Our experimental results show that the algorithm was
effective 90% to 100% of the time in three respects: handling probable
violation events using scaling as the preventive measure; taking remedial
action using replication and task resubmission as fault tolerance techniques;
and resolving over-provisioning. The SFT algorithm, together with the 16
decision rules, thus contributes an additional aspect of detection, prediction,
prevention and rectification measures with regard to response time and
throughput for cloud QoS violations.
Aruna, L., Aramudhan, M., 2016.
Framework for Ranking Service Providers of Federated Cloud Architecture using
Fuzzy Sets. International Journal of Technology,
Volume 7(4), pp. 643–653
AWS (Amazon Web Services), 2018.
Amazon EC2 T2 Instances – Amazon Web Services, Inc. Available Online at
https://aws.amazon.com/ec2/instance-types/t2/, Accessed on July 20, 2018
Bardsiri, A.K., Hashemi, S.M.,
2014. QoS Metrics for Cloud Computing Services Evaluation. International Journal of Intelligent Systems and Applications, Volume 6(12), pp. 27–33
Calheiros, R.N., Ranjan, R.,
Beloglazov, A., De Rose, C.A.F., Buyya, R., 2010. CloudSim: A Toolkit for Modeling and
Simulation of Cloud Computing Environments and Evaluation of Resource
Provisioning Algorithms. Software:
Practice and Experience, Volume 41(1), pp. 23–50
Filho, M.C.S., Oliveira, R.L.,
Monteiro, C.C., Inacio, P.R.M., Freire, M.M., 2017. CloudSim Plus: A Cloud
Computing Simulation Framework Pursuing Software Engineering Principles for
Improved Modularity, Extensibility and Correctness. In: Proceedings of IFIP/IEEE International Symposium on Integrated
Network Management, 8-12 May, 2017, pp. 400–406
Ganesh, A., Sandhya, M., Shankar,
S., 2014. A Study on Fault Tolerance Methods in Cloud Computing. In: Proceedings of IEEE International
Advance Computing Conference (IACC), pp. 844–849
Kuperberg, M., Herbst, N.,
Kistowski, J.V., Reussner, R., 2011. Defining and Quantifying Elasticity of
Resources in Cloud Computing and Scalable Platforms. In: Karlsruhe Reports in Informatics, Volume 16, pp. 1–17
Lehrig, S., Eikerling, H., Becker,
S., 2015. Scalability, Elasticity, and Efficiency in Cloud Computing: A
Systematic Literature Review of Definitions and Metrics. In: Proceedings of the 11th International ACM SIGSOFT
Conference on Quality of Software Architectures - QoSA '15, pp. 83–92
Mell, P., Grance, T., 2011. The NIST Definition of Cloud Computing.
NIST Special Publication 800-145, pp. 1–3
Wong, T.S., Chan, G.Y., Chua,
F.F., 2018. A Machine Learning Model for Detection and Prediction of Cloud
Quality of Service Violation. In:
International Conference on Computational Science and Its Applications (ICCSA),
LNCS, pp. 498–513
Wong, T.S., Chan, G.Y., Chua,
F.F., 2019. Adaptive Preventive and Remedial Measures in Resolving Cloud Quality
of Service Violation. In: Proceedings
of IEEE 33rd International Conference on Information Networking
(ICOIN 2019), 9-11 January 2019, Kuala Lumpur, Malaysia, pp. 473–479
Zheng, Z., Zhang, Y., Lyu, M.R.,
2014. Investigating QoS of Real-World Web Services. In: IEEE Transactions on Services Computing, Volume 7(1), pp. 32–39