Published at : 09 Dec 2021
Volume : IJtech Vol 12, No 5 (2021)
DOI : https://doi.org/10.14716/ijtech.v12i5.5223
Arian Dhini, Department of Industrial Engineering, Faculty of Engineering, Universitas Indonesia, Kampus UI Depok, Depok 16424, Indonesia
Muhammad Fauzan, Department of Industrial Engineering, Faculty of Engineering, Universitas Indonesia, Kampus UI Depok, Depok 16424, Indonesia
Advances in technology have shifted customer perceptions towards expecting better service from internet providers, and the ease of moving to another provider in search of better quality results in customer churn. Internet service providers must detect the risk of churn at the earliest opportunity if they want to retain their customers. This study aimed to predict churn using recent developments in machine learning, with customer data from one of the biggest fixed broadband companies in Indonesia selected as a case study. Ensemble learning combines multiple models through meta-algorithms to improve model performance, and two such approaches were applied in this study, namely random forest and extreme gradient boosting (XGBoost). The results show that the ensemble learning models outperform classical techniques and that XGBoost is the best algorithm for predicting customer churn. Customers are then clustered as being at high, medium, or low risk of churn, so that the company can specify particular retention strategies for each customer group.
Customer churn prediction; Ensemble learning; Fixed broadband; Random forest (RF); Extreme gradient boosting (XGBoost)
The internet has become a necessity in the daily activities of modern life. The number of fixed broadband subscribers in Indonesia has grown in recent years, with an average annual increase of 2.8% (Statista, 2019), and the global volume of fixed broadband-based internet usage has itself risen by a yearly average of 6.8% (World Bank, 2020). The provision of high-speed internet connections will be an essential step towards achieving Industry 4.0 in terms of boosting productivity and economic performance (Sarachuk and Mißler-Behr, 2020). These increasingly demanding internet needs have intensified competition among internet providers, and this high level of competition has encouraged companies to provide the best possible service (Huang et al., 2009; Khan et al., 2010; Qureshi et al., 2013; Do et al., 2017).
To sustain sales performance, an internet service provider must acquire and retain customers. Based on 2020 data from one of Indonesia’s largest fixed broadband companies, the costs incurred through acquisition exceeded those of retention, and a monthly customer churn rate of 8.2% affected the company’s growth. Churn is often due to customer dissatisfaction, with higher prices, low service quality, incomplete features, and privacy concerns being some of the reasons that motivate customers to switch providers (Amin et al., 2019).
Various ways to reduce or manage the churn rate have been reported, including investing in retention activities, targeted marketing, campaign management, and customer relationship management (CRM) (Chen and Popovich, 2003). Building and maintaining long-term relationships with customers through CRM can earn both empathy and loyalty (Jain et al., 2021), and loyal customers can become great ambassadors in the market and help attract new business (Amin et al., 2019).
Due to shifting telecommunications behavior and intensified competition from deregulation, there is an increasing need to secure core business by strengthening CRM and to improve the profitability of each customer (Wei and Chiu, 2002; Huang et al., 2009; Khan et al., 2010; Qureshi et al., 2013; Vafeiadis et al., 2015; Do et al., 2017). To do so, companies need to understand the churn characteristics of their customers by grouping them according to risk level. Some companies have faced difficulty in executing churn management because they are unable to predict who will leave from high volumes of customer data (Ahmad et al., 2019).
Machine learning (ML) algorithms have been extensively applied to help decision makers predict the possibility of customer churn (Ahmad et al., 2019), and the current study focuses on ensemble learning in this regard. Ensemble learning is a progressive ML paradigm in which multiple models are combined through processes known as bagging, boosting, or stacking (Breiman, 2001; Chen and Guestrin, 2016; Jain et al., 2020). With decision trees, bagging trains multiple trees on bootstrap samples of the data and aggregates their predictions into a single model. Boosting builds its trees sequentially, with each new tree fitted to reduce the errors of those before it (Chen and Guestrin, 2016; Do et al., 2017). Stacking combines several algorithms and feeds their outputs to a meta-learner, which produces the final prediction (Abbasimehr et al., 2014).
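The bagging idea described above can be illustrated with a minimal sketch. It is not the study's random forest: it uses one-split decision stumps on a single hypothetical feature (complaints per quarter), a toy, perfectly separable data set, and a majority vote over bootstrap-trained stumps. All names and numbers are assumptions made for illustration.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(sample):
    """Fit a one-split tree on (x, label) pairs by minimizing training errors."""
    values = sorted({x for x, _ in sample})
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in sample if x <= thr]
        right = [y for x, y in sample if x > thr]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, thr, left_label, right_label)
    if best is None:
        # The bootstrap sample held one distinct x value: predict a constant.
        label = majority([y for _, y in sample])
        return (float("inf"), label, label)
    return best[1:]

def predict_stump(stump, x):
    thr, left_label, right_label = stump
    return left_label if x <= thr else right_label

def bagged_stumps(data, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    return majority([predict_stump(m, x) for m in models])

# Hypothetical toy data: (complaints per quarter, churned 0/1).
data = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0),
        (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
models = bagged_stumps(data)
print(predict(models, 1), predict(models, 10))
```

A random forest additionally samples a random subset of features at each split (Breiman, 2001); that step is omitted here because the toy data has only one feature.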
Previous studies have demonstrated that ensemble learning performs well for prediction models in both classification and regression problems, and that the approach can increase model performance through generalization and error adjustment (Jain et al., 2021). Stacking techniques are particularly complex and time-consuming, and so this study employs bagging in the form of random forest (RF) and boosting in the form of extreme gradient boosting (XGBoost) for their simplicity and robustness in predicting customer churn (Do et al., 2017). These techniques can also handle missing values, control overfitting, and generalize output (Breiman, 2001; Chen and Guestrin, 2016).
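To make the boosting side concrete, the following is a minimal sketch of plain gradient boosting with regression stumps and logistic loss. It is not XGBoost itself, which adds a regularized objective and second-order approximations (Chen and Guestrin, 2016), and it is not the study's model. The single feature, the data, the number of rounds, and the learning rate are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Least-squares regression stump: (threshold, left_value, right_value).

    Assumes xs contains at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1:]

def log_loss(ys, scores):
    eps = 1e-12
    total = 0.0
    for y, s in zip(ys, scores):
        p = min(max(sigmoid(s), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(ys)

def boost(xs, ys, rounds=20, lr=0.3):
    """Each round fits a stump to the current errors and adds a damped correction."""
    prior = sum(ys) / len(ys)
    base = math.log(prior / (1 - prior))      # start from the log-odds of churn
    scores = [base] * len(xs)
    stumps, history = [], [log_loss(ys, scores)]
    for _ in range(rounds):
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        thr, lv, rv = fit_stump(xs, residuals)
        stumps.append((thr, lv, rv))
        scores = [s + lr * (lv if x <= thr else rv) for x, s in zip(xs, scores)]
        history.append(log_loss(ys, scores))
    return base, stumps, history

def predict_proba(base, stumps, x, lr=0.3):
    score = base + sum(lr * (lv if x <= thr else rv) for thr, lv, rv in stumps)
    return sigmoid(score)

# Hypothetical toy data: complaints per quarter and churn labels (one noisy case).
xs = [0, 1, 1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 0, 1, 0, 1, 1]
base, stumps, history = boost(xs, ys)
print(f"log loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

The `history` list records the training log loss after each round, which is one simple way to see that later trees are correcting the errors left by earlier ones.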
Through the early detection of customer churn, the alignment of sales and marketing strategies can be improved and churn management activities made more efficient (Mattison, 2006). A relevant method here is to cluster the churning customers into groups, an approach that has been used in various industries to profile customers and apply appropriate strategies (Larasati et al., 2019; Ullah et al., 2019). Using a clustering approach, this study develops profiling to group customers into several churn clusters so that the company can determine specific retention strategies for each group. A nonhierarchical clustering technique, K-means, is used to find the criteria for churn behavior, as in previous studies (Ullah et al., 2019).
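As a rough illustration of grouping customers by risk with K-means, the sketch below clusters a single score per customer into three groups and names them by centroid order. The passage above does not state which features were clustered, so the use of one predicted churn probability per customer, the sample values, and the deterministic initialization are all assumptions.

```python
def kmeans_1d(values, k=3, iters=100):
    """Lloyd's algorithm on one-dimensional data; returns sorted centroids."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:   # assignments stopped changing
            break
        centroids = updated
    return sorted(centroids)

def assign_risk(values, centroids, names=("low", "medium", "high")):
    """Label each value with the name of its nearest centroid."""
    labels = []
    for v in values:
        nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
        labels.append(names[nearest])
    return labels

# Hypothetical predicted churn probabilities for ten customers.
probs = [0.02, 0.05, 0.08, 0.10, 0.45, 0.50, 0.55, 0.88, 0.92, 0.97]
centroids = kmeans_1d(probs)
risk = assign_risk(probs, centroids)
for p, label in zip(probs, risk):
    print(f"{p:.2f} -> {label}")
```

In practice the clustering would likely use several behavioural features rather than one score, in which case the distance becomes Euclidean over the feature vector and the cluster names have to be assigned by inspecting each cluster's churn rate.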
Predicting churn using ensemble learning approaches and clustering customers according to risk has identified the most appropriate model for assisting decision makers in designing customer retention activities. As a result, the productivity of CRM in terms of retention could be improved, and both product and service quality could be enhanced, which would increase customer commitment toward the company. The optimized XGBoost algorithm produced the best accuracy and recall in predicting customer churn, at 98.82% and 87.48%, respectively. This study also provides a framework for assessing churn based on low, medium, and high risk, which could help the company focus and prioritize its retention activities by group and ultimately improve its profitability.
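Accuracy and recall measure different things, which matters when churners are a small minority. The sketch below computes both from confusion-matrix counts for two hypothetical models on 100 customers, 8 of whom churn. These counts are invented for illustration and are not the study's results.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all customers classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    """Share of actual churners that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical model A: predicts "no churn" for everyone.
acc_a = accuracy(tp=0, tn=92, fp=0, fn=8)
rec_a = recall(tp=0, fn=8)

# Hypothetical model B: catches 7 of 8 churners with 3 false alarms.
acc_b = accuracy(tp=7, tn=89, fp=3, fn=1)
rec_b = recall(tp=7, fn=1)

print(f"A: accuracy={acc_a:.2f}, recall={rec_a:.3f}")
print(f"B: accuracy={acc_b:.2f}, recall={rec_b:.3f}")
```

Model A reaches 92% accuracy while identifying no churners at all, which is why recall is worth reporting alongside accuracy for churn prediction.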
The main limitation of the study is that it considers competitor availability without categorizing competitors, for example as strong or weak. Attention to this particular feature is therefore of high importance to future studies. Two key adjustments to the study’s methods have been identified: firstly, competitors should be categorized, and secondly, more advanced methods, for example stacking and deep learning (DL), should be applied to obtain more accurate and quicker outcomes in predicting customer churn. Stacking algorithms would be expected to significantly improve model performance because they combine multiple classifiers, such as logistic regression and random forest, to generate fewer errors (Abbasimehr et al., 2014). DL, on the other hand, uses multiple layers to progressively extract higher-level features from the raw input, by which the machine can learn customer behavioural patterns more thoroughly and effectively (Agrawal et al., 2018). However, this approach would cost much more in terms of computational time, which must also be considered in any future studies.
Abbasimehr, H., Setak, M., Tarokh, M.J., 2014. A Comparative Assessment of the Performance of Ensemble Learning in Customer Churn Prediction. International Arab Journal of Information Technology, Volume 11(6), pp. 599–606
Agrawal, S., Das, A., Gaikwad, A., Dhage, S., 2018. Customer Churn Prediction Modelling based on Behavioural Patterns Analysis using Deep Learning. In: 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), pp. 1–6
Ahmad, A.K., Jafar, A., Aljoumaa, K., 2019. Customer Churn Prediction in Telecom using Machine Learning in Big Data Platform. Journal of Big Data, Volume 6(1), pp. 1–24
Amin, A., Al-Obeidat, F., Shah, B., Adnan, A., Loo, J., Anwar, S., 2019. Customer Churn Prediction in Telecommunication Industry using Data Certainty. Journal of Business Research, Volume 94, pp. 290–301
Bergstra, J., Bengio, Y., 2012. Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research, Volume 13(2), pp. 281–305
Breiman, L., 2001. Random Forests. Machine Learning, Volume 45(1), pp. 5–32
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: Synthetic Minority Over-Sampling Technique. Journal of Artificial Intelligence Research, Volume 16, pp. 321–357
Chen, I.J., Popovich, K., 2003. Understanding Customer Relationship Management (CRM). Business Process Management Journal, Volume 9(5), pp. 672–688
Chen, T., Guestrin, C., 2016. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794
Do, D., Huynh, P., Vo, P., Vu, T., 2017. Customer Churn Prediction in an Internet Service Provider. In: 2017 IEEE International Conference on Big Data (Big Data), pp. 3928–3933
Fawcett, T., 2006. An Introduction to ROC Analysis. Pattern Recognition Letters, Volume 27(8), pp. 861–874
Huang, B., Kechadi, M.T., Buckley, B., 2012. Customer Churn Prediction in Telecommunications. Expert Systems with Applications, Volume 39(1), pp. 1414–1425
Huang, B.Q., Kechadi, M.T., Buckley, B., 2009. Customer Churn Prediction for Broadband Internet Services. In: International Conference on Data Warehousing and Knowledge Discovery, pp. 229–243
Jain, H., Khunteta, A., Srivastava, S., 2020. Churn Prediction in Telecommunication using Logistic Regression and Logit Boost. Procedia Computer Science, Volume 167, pp. 101–112
Jain, H., Yadav, G., Manoov, R., 2021. Churn Prediction and Retention in Banking, Telecom and IT Sectors using Machine Learning Techniques. In: Advances in Machine Learning and Computational Intelligence, pp. 137–156
Khan, A.A., Jamwal, S., Sepehri, M.M., 2010. Applying Data Mining to Customer Churn Prediction in an Internet Service Provider. International Journal of Computer Applications, Volume 9(7), pp. 8–14
Larasati, A., Hajji, A.M., Handayani, A.N., Azzahra, N., Farhan, M., Rahmawati, P., 2019. Profiling Academic Library Patrons using K-means and X-means Clustering. International Journal of Technology, Volume 10(8), pp. 1567–1575
Mattison, R., 2006. The Telco Churn Management Handbook. Lulu.com
Qureshi, S.A., Rehman, A.S., Qamar, A.M., Kamal, A., Rehman, A., 2013. Telecommunication Subscribers’ Churn Prediction Model using Machine Learning. In: Eighth International Conference on Digital Information Management (ICDIM 2013), pp. 131–136
Sarachuk, K., Mißler-Behr, M., 2020. Is Ultra-Broadband Enough? The Relationship between High-Speed Internet and Entrepreneurship in Brandenburg. International Journal of Technology, Volume 11(6), pp. 1103–1114
Sisodia, D.S., Verma, U., 2019. Distinct Multiple Learner-Based Ensemble Smotebagging (ML-ESB) Method for Classification of Binary Class Imbalance Problems. International Journal of Technology, Volume 10(4), pp. 721–730
Soldani, D., Hou, X.J., Luck, B., 2011. Strategies for Mobile Broadband Growth: Traffic Segmentation for Better Customer Experience. In: 2011 IEEE 73rd Vehicular Technology Conference (VTC Spring), pp. 1–5
Springer, T., Kim, C., Debruyne, F., Azzarello, D., Melton, J., 2014. Breaking the Back of Customer Churn. Bain & Company, pp. 1–8
Statista, 2019. Telkom Indonesia: Fixed Broadband Market Share 2019 | Statista. Available Online at https://www.statista.com/statistics/1058240/telkom-indonesia-fixed-broadband-market-share
Ullah, I., Raza, B., Malik, A.K., Imran, M., Islam, S.U., Kim, S.W., 2019. A Churn Prediction Model using Random Forest: Analysis of Machine Learning Techniques for Churn Prediction and Factor Identification in Telecom Sector. IEEE Access, Volume 7, pp. 60134–60149
Vafeiadis, T., Diamantaras, K.I., Sarigiannidis, G., Chatzisavvas, K.C., 2015. A Comparison of Machine Learning Techniques for Customer Churn Prediction. Simulation Modelling Practice and Theory, Volume 55, pp. 1–9
Wei, C.-P., Chiu, I.-T., 2002. Turning Telecommunications Call Details to Churn Prediction: A Data Mining Approach. Expert Systems with Applications, Volume 23(2), pp. 103–112
World Bank, 2020. World Development Indicators | DataBank. Available Online at https://databank.worldbank.org/reports.aspx?source=2&series=IT.NET.BBND&country=IDN#
Yang, M., 2013. Churn Management and Policy: Measuring the Effectiveness of Fixed-Mobile Bundling on Mobile Subscriber Retention. Journal of Media Economics, Volume 26(4), pp. 170–185
Zhu, B., Baesens, B., vanden Broucke, S.K.L.M., 2017. An Empirical Comparison of Techniques for the Class Imbalance Problem in Churn Prediction. Information Sciences, Volume 408, pp. 84–99