Published at : 09 Dec 2021
Volume : IJtech
Vol 12, No 5 (2021)
DOI : https://doi.org/10.14716/ijtech.v12i5.5223
Arian Dhini | Department of Industrial Engineering, Faculty of Engineering, Universitas Indonesia, Kampus UI Depok, Depok 16424, Indonesia |
Muhammad Fauzan | Department of Industrial Engineering, Faculty of Engineering, Universitas Indonesia, Kampus UI Depok, Depok 16424, Indonesia |
Technological advancement has shifted customer
perceptions toward expecting better service from internet providers, and the ease of moving
to another provider to secure improved quality results in customer churn.
Internet service providers must detect the risk of churn at the earliest opportunity
if they want to retain their customers. This study aimed to predict churn using
recent developments in machine learning approaches, and customer data from one of the biggest
fixed broadband companies in Indonesia was selected as a case study. Ensemble learning combines
multiple learners through meta-algorithms to improve model performance, and two such approaches were
applied in this study, namely random forest and extreme gradient boosting (XGBoost).
The results show that the ensemble learning models outperform the classical
technique and that XGBoost is the best algorithm for predicting customer churn. Customers
are thereby clustered as being at high, medium, or low risk of churn, and the
company can specify particular retention strategies towards each customer
cluster.
Customer churn prediction; Ensemble learning; Fixed broadband; Random forest (RF); Extreme gradient boosting (XGBoost)
The internet
has become a necessity in the daily activities of modern life. The number of
fixed broadband subscribers in Indonesia has proliferated in recent years, with
an average annual increase of 2.8% (Statista, 2019),
and the global volume of fixed broadband-based internet usage has itself risen
by a yearly average of 6.8% (World Bank, 2020).
The provision of high-speed internet connections will be an essential step
towards achieving Industry 4.0 in terms of boosting productivity and economic
performance (Sarachuk and Mißler-Behr, 2020).
The emergence of these increasingly demanding internet needs has
intensified competition among internet providers, and this high level of
competition has encouraged companies to provide the best possible service (Huang et al., 2009; Khan et al., 2010; Qureshi et al.,
2013; Do et al., 2017).
To
sustain sales performance, an internet service provider must acquire and retain
customers. Based on 2020 data from one of Indonesia’s largest fixed
broadband companies, the costs incurred through acquisition
exceed those of retention, and a monthly customer
churn rate of 8.2% has affected the company’s growth. Churn is often due to customer
dissatisfaction, with higher prices, low service quality, incomplete features,
and privacy concerns being some of the reasons that motivate customers to
switch providers (Amin et al., 2019).
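To give a sense of scale for the 8.2% monthly churn rate, the following minimal sketch compounds it over a year. It assumes the rate stays constant and that no churned customers return; both are simplifying assumptions made here for illustration, and the function name is hypothetical rather than taken from the study.

```python
def retained_fraction(monthly_churn: float, months: int) -> float:
    """Fraction of an initial cohort still subscribed after `months`.

    Assumes a constant monthly churn rate and no re-subscriptions
    (illustrative assumptions, not claims from the study).
    """
    if not 0.0 <= monthly_churn <= 1.0:
        raise ValueError("monthly_churn must be between 0 and 1")
    return (1.0 - monthly_churn) ** months


monthly_churn = 0.082  # the 8.2% monthly rate reported above
after_year = retained_fraction(monthly_churn, 12)

print(f"Retained after 12 months: {after_year:.1%}")
print(f"Implied annual churn:     {1 - after_year:.1%}")
```

Under these assumptions, only about 36% of a customer cohort would remain after twelve months, which helps explain why early detection of churn matters to the provider.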
Different
ways to reduce or manage churn rate have been reported, including investing in
retention activities, targeted marketing, campaign management, and customer
relationship management (CRM) (Chen and Popovich, 2003). Building and
maintaining long-term relationships with customers through CRM can gain both
empathy and loyalty (Jain et al., 2021), and
loyal customers can become great ambassadors in the market and help attract new
business (Amin et al., 2019).
Due to shifting telecommunications behavior and intensified competition
from deregulation, there is an increasing need to secure core business by
strengthening CRM and to improve the profitability of each customer (Wei and Chiu, 2002; Huang et al., 2009; Khan et al.,
2010; Qureshi et al., 2013; Vafeiadis et al., 2015; Do et al., 2017). To do so,
companies need to understand the churn characteristics of their customers by grouping
them according to risk level. Some companies have faced difficulty in executing
churn management because they are unable to predict who will leave from high
volumes of customer data (Ahmad et al., 2019).
Machine learning (ML) algorithms have been extensively applied to help
decision makers predict the possibility of customer churn (Ahmad et al., 2019), and the
current study focuses on ensemble learning in this regard. Ensemble learning is
an ML paradigm in which multiple models are combined through processes
known as bagging, boosting, or stacking (Breiman, 2001; Chen and Guestrin, 2016; Jain et al.,
2020). With decision trees, bagging trains
multiple trees on bootstrap samples of the data and aggregates their predictions into
a single, more general model. Boosting builds its trees sequentially, with each new tree fitted to reduce the errors of those before it (Chen and Guestrin, 2016; Do et al., 2017).
Stacking combines several algorithms and passes their predictions to a
meta-learner, which produces the final prediction (Abbasimehr et al., 2014).
Previous studies have demonstrated that ensemble learning performs strongly in
prediction models for both classification and regression problems, and
the approach has been shown to increase model performance through
generalization and error adjustment (Jain et al., 2021).
Stacking techniques are particularly complex and time-consuming, and so this
study employs bagging in the form of random forest (RF) and boosting in the form of extreme gradient
boosting (XGBoost) for their simplicity and robustness in predicting
customer churn (Do et al., 2017). These
techniques can also handle missing values, control overfitting, and generalize
output (Breiman, 2001; Chen and Guestrin, 2016).
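The difference between bagging and boosting can be illustrated with a minimal, standard-library sketch. It is not the study's implementation: it uses one-split trees (stumps) instead of full trees, classic AdaBoost in place of XGBoost's gradient boosting, and a ten-point toy dataset with one made-up feature. All function names are hypothetical.

```python
import math
import random


def fit_stump(X, y, w):
    """Find the one-split tree (stump) with the lowest weighted error.

    Returns ((feature, threshold, polarity), error); labels are +1 / -1.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] > t else -pol) != yi)
                if best is None or err < best[1]:
                    best = ((j, t, pol), err)
    return best


def stump_predict(stump, row):
    j, t, pol = stump
    return pol if row[j] > t else -pol


def bagging(X, y, n_trees, rng):
    """Bagging: fit each stump on a bootstrap resample, then majority-vote."""
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stump, _ = fit_stump([X[i] for i in idx], [y[i] for i in idx],
                             [1.0 / n] * n)
        stumps.append(stump)
    return lambda row: 1 if sum(stump_predict(s, row) for s in stumps) >= 0 else -1


def boosting(X, y, n_rounds):
    """AdaBoost: after each round, up-weight the examples the stump got wrong."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote weight
        model.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda row: 1 if sum(a * stump_predict(s, row) for a, s in model) >= 0 else -1


def accuracy(predict, X, y):
    return sum(predict(row) == yi for row, yi in zip(X, y)) / len(y)


# Toy data (not the study's): one made-up feature; +1 = churn, -1 = stay.
X = [(v,) for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single, _ = fit_stump(X, y, [0.1] * 10)
bagged = bagging(X, y, n_trees=25, rng=random.Random(0))
boosted = boosting(X, y, n_rounds=3)

print("single stump:  ", accuracy(lambda r: stump_predict(single, r), X, y))
print("bagged stumps: ", accuracy(bagged, X, y))
print("boosted stumps:", accuracy(boosted, X, y))
```

On this toy data the best single stump classifies 7 of the 10 points correctly, whereas three boosting rounds classify all 10, because each round concentrates on the points the previous stumps missed. Bagging instead averages stumps trained on resampled data, which mainly stabilizes predictions rather than correcting systematic errors.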
Through the early detection of customer churn, the alignment of sales
and marketing strategies can be improved and churn management activities made
more efficient (Mattison, 2006). A
relevant method here is to cluster
customers according to their churn risk into
groups, an approach that has been used in various industries to profile
customers and apply appropriate strategies (Larasati et al., 2019; Ullah et al., 2019). Using a
clustering approach, this study develops profiling to group customers into
several churn clusters so that the company can determine specific retention
strategies for each group. A nonhierarchical clustering technique, K-means, is
used to find the criteria for churn behavior, as in previous studies (Ullah et al., 2019).
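As a rough illustration of how K-means can separate customers into low-, medium-, and high-risk groups, the sketch below clusters a single value per customer. It assumes that value is a predicted churn probability, which is an assumption made here; the study's actual clustering features are not stated in this section. The scores are made up, and the function name and the deterministic initialization are illustrative choices.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's K-means algorithm on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members
                                 else centroids[c])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


# Made-up churn probabilities, as a classifier might output them.
scores = [0.03, 0.05, 0.08, 0.10, 0.42, 0.47, 0.51, 0.55, 0.88, 0.91, 0.95]
centroids, labels = kmeans_1d(scores, k=3)

# Name the clusters by centroid order: lowest centroid = low risk.
order = sorted(range(3), key=lambda c: centroids[c])
rank = {c: name for name, c in zip(["low", "medium", "high"], order)}
risk = [rank[lab] for lab in labels]

for s, r in zip(scores, risk):
    print(f"{s:.2f} -> {r}")
```

With these scores, the four lowest values form the low-risk group, the four middle values the medium-risk group, and the three highest values the high-risk group. In practice the cluster boundaries depend on the data, so the labels should be read from the fitted centroids rather than from fixed thresholds.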
By predicting churn with ensemble learning approaches and clustering customers according to risk, this study identified the most appropriate model for assisting decision makers in designing customer retention activities. As a result, the productivity of CRM in terms of retention could be improved, and both product and service quality could be enhanced, which would increase customer commitment toward the company. The optimized XGBoost algorithm produced the best accuracy and recall in predicting customer churn, at 98.82% and 87.48%, respectively. This study also provides a framework for assessing churn as low, medium, or high risk, which could help the company focus and prioritize its retention activities by group and ultimately improve its profitability.
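Accuracy and recall answer different questions, which is why both are reported. The following sketch computes them from confusion-matrix counts. The counts are hypothetical and chosen only to show the effect of class imbalance; they are not the study's results, and the function names are illustrative.

```python
def accuracy_from_counts(tp, fp, fn, tn):
    """Share of all customers that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall_from_counts(tp, fn):
    """Share of actual churners that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical month: 1,000 customers, 82 of whom churn (an 8.2% rate).
# tp = churners flagged, fn = churners missed,
# fp = loyal customers flagged, tn = loyal customers left alone.
models = {
    "predict nobody churns": dict(tp=0, fp=0, fn=82, tn=918),
    "illustrative classifier": dict(tp=70, fp=20, fn=12, tn=898),
}

results = {}
for name, c in models.items():
    acc = accuracy_from_counts(**c)
    rec = recall_from_counts(c["tp"], c["fn"])
    results[name] = (acc, rec)
    print(f"{name}: accuracy={acc:.1%}, recall={rec:.1%}")
```

A model that never predicts churn still reaches 91.8% accuracy on these counts while catching no churners at all, so recall is the more informative measure when the goal is to find customers who are about to leave.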
The main limitation of the study is that it considers competitor availability without categorizing competitors, for example as strong or weak. This feature therefore deserves close attention in future studies. Two key adjustments to the study’s methods have been identified: first, competitors should be categorized, and second, more advanced methods, such as stacking and deep learning (DL), should be applied to obtain more accurate and quicker predictions of customer churn. Stacking algorithms would be expected to significantly improve model performance because they combine multiple classifiers, such as logistic regression and random forest, to generate fewer errors (Abbasimehr et al., 2014). DL, on the other hand, uses multiple layers to progressively extract higher-level features from the raw input, allowing the machine to learn customer behavioural patterns more thoroughly and effectively (Agrawal et al., 2018). However, this approach would cost much more in computational time, which must also be considered in any future studies.
Abbasimehr, H., Setak, M., Tarokh,
M.J., 2014. A Comparative Assessment of the Performance of Ensemble Learning in
Customer Churn Prediction. International Arab Journal of Information
Technology, Volume 11(6), pp. 599–606
Agrawal, S., Das, A., Gaikwad, A.,
Dhage, S., 2018. Customer Churn Prediction Modelling based on Behavioural
Patterns Analysis using Deep Learning. In: 2018 International Conference
on Smart Computing and Electronic Enterprise (ICSCEE), pp. 1–6
Ahmad, A.K., Jafar, A., Aljoumaa, K.,
2019. Customer Churn Prediction in Telecom using Machine Learning in Big Data
Platform. Journal of Big Data, Volume 6(1), pp. 1–24
Amin, A., Al-Obeidat, F., Shah, B.,
Adnan, A., Loo, J., Anwar, S., 2019. Customer Churn Prediction in
Telecommunication Industry using Data Certainty. Journal of Business
Research, Volume 94, pp. 290–301
Bergstra, J., Bengio, Y., 2012.
Random Search for Hyper-Parameter Optimization. Journal of Machine Learning
Research, Volume 13(2), pp. 281–305
Breiman, L., 2001. Random Forests. Machine
Learning, Volume 45(1), pp. 5–32
Chawla, N.V., Bowyer, K.W., Hall,
L.O., Kegelmeyer, W.P., 2002. SMOTE: Synthetic Minority Over-Sampling Technique.
Journal of Artificial Intelligence Research, Volume 16, pp. 321–357
Chen, I.J., Popovich, K., 2003.
Understanding Customer Relationship Management (CRM). Business Process
Management Journal, Volume 9(5), pp. 672–688
Chen, T., Guestrin, C., 2016. XGBoost:
A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794
Do, D., Huynh, P., Vo, P., Vu, T., 2017.
Customer Churn Prediction in an Internet Service Provider. In: 2017 IEEE
International Conference on Big Data (Big Data), pp. 3928–3933
Fawcett, T., 2006. An Introduction to
ROC Analysis. Pattern Recognition Letters, Volume 27(8), pp. 861–874
Huang, B., Kechadi, M.T., Buckley, B.,
2012. Customer Churn Prediction in Telecommunications. Expert Systems with
Applications, Volume 39(1), pp. 1414–1425
Huang, B.Q., Kechadi, M.T., Buckley,
B., 2009. Customer Churn Prediction for Broadband Internet Services. In: International
Conference on Data Warehousing and Knowledge Discovery, pp. 229–243
Jain, H., Khunteta, A., Srivastava,
S., 2020. Churn Prediction in Telecommunication using Logistic Regression and
Logit Boost. Procedia Computer Science, Volume 167, pp. 101–112
Jain, H., Yadav, G., Manoov, R., 2021.
Churn Prediction and Retention in Banking, Telecom and IT Sectors using Machine
Learning Techniques. In: Advances in Machine Learning and Computational
Intelligence, pp. 137–156
Khan, A.A., Jamwal, S., Sepehri, M.M.,
2010. Applying Data Mining to Customer Churn Prediction in an Internet Service
Provider. International Journal of Computer Applications, Volume 9(7), pp.
8–14
Larasati, A., Hajji, A.M., Handayani,
A.N., Azzahra, N., Farhan, M., Rahmawati, P., 2019. Profiling Academic Library
Patrons using K-means and X-means Clustering. International Journal of
Technology, Volume 10(8), pp. 1567–1575
Mattison, R., 2006. The Telco
Churn Management Handbook. Lulu.com
Qureshi, S.A., Rehman, A.S., Qamar,
A.M., Kamal, A., Rehman, A., 2013. Telecommunication Subscribers’ Churn
Prediction Model using Machine Learning. In: Eighth International
Conference on Digital Information Management (ICDIM 2013), pp. 131–136
Sarachuk, K., Mißler-Behr, M., 2020.
Is Ultra-Broadband Enough? The Relationship between High-Speed Internet and
Entrepreneurship in Brandenburg. International Journal of Technology, Volume
11(6), pp. 1103–1114
Sisodia, D.S., Verma, U., 2019.
Distinct Multiple Learner-Based Ensemble Smotebagging (ML-ESB) Method for
Classification of Binary Class Imbalance Problems. International Journal of
Technology, Volume 10(4), pp. 721–730
Soldani, D., Hou, X.J., Luck, B., 2011.
Strategies for Mobile Broadband Growth: Traffic Segmentation for Better
Customer Experience. In: 2011 IEEE 73rd Vehicular Technology
Conference (VTC Spring), pp. 1–5
Springer, T., Kim, C., Debruyne, F.,
Azzarello, D., Melton, J., 2014. Breaking the Back of Customer Churn. Bain
& Company, pp. 1–8
Statista, 2019. Telkom Indonesia:
Fixed Broadband Market Share 2019 | Statista. Available Online at https://www.statista.com/statistics/1058240/telkom-indonesia-fixed-broadband-market-share
Ullah, I., Raza, B., Malik, A.K.,
Imran, M., Islam, S.U., Kim, S.W., 2019. A Churn Prediction Model using Random
Forest: Analysis of Machine Learning Techniques for Churn Prediction and Factor
Identification in Telecom Sector. IEEE Access, Volume 7, pp. 60134–60149
Vafeiadis, T., Diamantaras, K.I.,
Sarigiannidis, G., Chatzisavvas, K.C., 2015. A Comparison of Machine Learning
Techniques for Customer Churn Prediction. Simulation Modelling Practice and
Theory, Volume 55, pp. 1–9
Wei, C.-P., Chiu, I.-T., 2002. Turning Telecommunications Call Details to Churn
Prediction: A Data Mining Approach. Expert Systems with Applications,
Volume 23(2), pp. 103–112
World Bank, 2020. World
Development Indicators | DataBank. Available Online at https://databank.worldbank.org/reports.aspx?source=2&series=IT.NET.BBND&country=IDN#
Yang, M., 2013. Churn Management and
Policy: Measuring the Effectiveness of Fixed-Mobile Bundling on Mobile
Subscriber Retention. Journal of Media Economics, Volume 26(4), pp. 170–185
Zhu, B., Baesens, B., vanden Broucke, S.K.L.M., 2017. An Empirical
Comparison of Techniques for the Class Imbalance Problem in Churn Prediction. Information
Sciences, Volume 408, pp. 84–99