• International Journal of Technology (IJTech)
  • Vol 10, No 8 (2019)

Profiling Academic Library Patrons using K-means and X-means Clustering

Aisyah Larasati, Apif Miftahul Hajji, Anik Nur Handayani, Nabila Azzahra, Muhammad Farhan, Puji Rahmawati

Cite this article as:
Larasati, A., Hajji, A.M., Handayani, A.N., Azzahra, N., Farhan, M., Rahmawati, P., 2019. Profiling Academic Library Patrons using K-means and X-means Clustering. International Journal of Technology. Volume 10(8), pp. 1567-1575

Aisyah Larasati 1. Department of Industrial Engineering, Universitas Negeri Malang, Jl. Semarang No.5, Malang 65145, Indonesia 2. PUI-PT Disruptive Learning Innovation (DLI) Universitas Negeri Malang, Jl. Semarang N
Apif Miftahul Hajji 1. Department of Civil Engineering, Universitas Negeri Malang, Jl. Semarang No.5, Malang 65145, Indonesia 2. PUI-PT Disruptive Learning Innovation (DLI) Universitas Negeri Malang, Jl. Semarang No.5,
Anik Nur Handayani Department of Electrical Engineering, Universitas Negeri Malang, Jl. Semarang No.5, Malang 65145, Indonesia
Nabila Azzahra Department of Industrial Engineering, Universitas Negeri Malang, Jl. Semarang No.5, Malang 65145, Indonesia
Muhammad Farhan Department of Industrial Engineering, Universitas Negeri Malang, Jl. Semarang No.5, Malang 65145, Indonesia
Puji Rahmawati Department of Industrial Engineering, Universitas Negeri Malang, Jl. Semarang No.5, Malang 65145, Indonesia

Information technology is now used very often, especially by individuals born between 1982 and 2002 (the Millennial generation). The academic library, which from its beginnings has been a storehouse for information through its collections, is becoming less attractive to Millennials because of the influence of information technology. This study aimed to use k-means and x-means clustering algorithms to identify the characteristics of academic library patrons, particularly Millennial patrons. K-means is a well-known algorithm due to its simplicity, while x-means is a relatively new clustering algorithm that can determine an optimal number of clusters, that is, the number of clusters that minimizes differences within each cluster and maximizes differences between clusters. In this study, data were collected using questionnaires in both online and offline forms. A total of 935 responses were collected. The results show that k-means performs better than x-means since it results in a lower Davies-Bouldin index value. However, x-means provides better descriptions of the patrons’ behavior in each cluster. Both k-means and x-means clustering methods create five clusters based on the behavior of academic library patrons. One of the clusters resulting from k-means and x-means also confirms that not all patrons come to the academic library for the book collection; they come because of invitations from friends or to use internet services.

Academic library; Clustering; K-means; X-means


The Millennial generation (also known as Generation Y) is the group of individuals born between 1982 and 2002 (Kotz, 2016). Members of this generation have unique characteristics. For example, many Millennials do not wear watches because their cell phones display the time. Furthermore, rather than using physical photograph albums, Millennials store their photographs on Facebook, Instagram, and other social media platforms. Millennials enjoy using technology. Indeed, they are the first generation to have become dependent on technology (Smith & Nichols, 2015). They live in an age when they can instantly access whatever information they want, such as academic data, through their smartphones (Maiers, 2017) as part of their use of existing technology. According to Maiers (2017), social networks, rather than interactions in the real world, have become one of the strongest motivators for Millennials.

In this age, information technology is one of the most frequently used tools in life, including for academic libraries because most have chosen to integrate technology into their information content (Walton, 2014). Many academic library patrons surveyed for the study reported that obtaining information from the internet is easier than having to search in a library. It is also easier because not everything patrons need is available in libraries. Moreover, technology is one of the most commonly used communication tools, and the majority of users who use it to communicate are Millennials (Maiers, 2017).

Learning methods must continually adapt to engage and educate this generation (Nicholas, 2008). Millennials tend to learn differently from previous generations. They prefer to develop skills and creativity through arts, games, video lectures, field trips, and other activities that do not depend only on books and theories. They tend to work beyond required working hours and have less social time (DeVaney, 2015). Moreover, Millennials are fluent in the use of technology or perhaps even dependent on it (Nicholas, 2008).

To keep pace with and adjust to evolutions in Millennials’ learning methods, academic libraries must be able to convert some of their traditional services into digital services. Academic libraries are transitioning from a collections-based model to a broader services-based model (Gleason, 2018). Library services, most of which are printed books, must be converted into digital services augmented by free internet services and other offerings to accommodate Millennials’ needs. Fulfilling users’ needs may increase customer satisfaction and affect an institution’s success. User satisfaction may be achieved by identifying service quality attributes and their effects on user satisfaction (Zuna et al., 2016).

As mentioned above, Millennials often look for academic references on the internet rather than physically searching in a library. In one study, 79.5% of college students reported that they are experts at using the internet to search for information efficiently and effectively, but only 56.4% said that they are skilled in using the college library (Lippincott, 2012). Thus, academic libraries must understand the characteristics of Millennials in order to create an environment that is attractive to them; for example, by providing the books Millennials need and promoting such services, libraries can attract the attention of patrons and persuade them to continue using books (Lippincott, 2012). Each customer may have a different perspective on the attributes that affect their preferences, since customer preferences can be influenced by the completeness of the product/service attributes and the transaction process (Suzianti et al., 2015).

At present, academic libraries provide a number of services that Millennials would find attractive and may not be able to find in online sources, such as data management, information about digital scholarship, copyright management, citation management, open educational resources, and others (Dempsey & Malpas, 2018).

Millennial students exhibit a number of common characteristics: they are more focused on achievement, they prefer to question everything and use all means available to get information, and they use technology not only to find information on the internet but also to type notes in class (Freeman et al., 2014). Currently, data integration and analysis are still rarely used to support decision-making, although many academic libraries have applied technology to obtain various reader information. Large amounts of collected data are still examined only through simple analyses such as correlation (Wang et al., 2011). Thus, in the present study, clustering was chosen as the most suitable method for obtaining information about the characteristics of students who often use the library, because it can be used to identify unique distributions or patterns in data and to discover groups of data (Halkidi et al., 2001).

Table 1 Changes in the function of the library

Collection-Based Library | Services-Based Library
Explained as library collection, reference | Explained as users’ needs, such as lecturer and student research
The system used is a bureaucracy that prioritizes the production of the facilities offered by the library. | The system used is enterprising, which is focused on changing its goals.
Process and subject | Focused on learning, research, skills, etc.
Back office | Shared system with workflow systems (scholarship information and e-books)
Focused on collection of books | Focused on service or user experience
Based on consumption from users | The facilities already exist and between one facility and another are collective.

Source: (Dempsey & Malpas, 2018)

Clustering methods are unsupervised classification methods aimed at facilitating the discovery process by partitioning a set of objects into homogeneous groups (Bader et al., 2006; Padmaja et al., 2008). Members of one cluster are maximally similar to each other and minimally similar to members of other clusters. Clustering is different from classification: clustering segments data into groups that are not defined in advance, while classification assigns data to predefined groups (Chen & Chen, 2006). The quality of a clustering depends on how high the intra-class similarities are and how low the inter-class similarities are. A common basis for measuring cluster quality is the Euclidean distance. Computational time may also be used as a measure of cluster performance (Aparna & Mydhili, 2016).
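The relationship between intra-class and inter-class similarity can be illustrated with a minimal Python sketch. It is not the procedure used in this study; the function name `mean_intra_inter` and the four toy points are assumptions made only for illustration. The sketch averages the Euclidean distance over pairs of objects in the same cluster and over pairs in different clusters, so a good clustering shows a small first value and a large second value.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_intra_inter(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Hypothetical helper: assumes at least one same-cluster pair and
    at least one different-cluster pair exist.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(list(zip(points, labels)), 2):
        (intra if lp == lq else inter).append(euclidean(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy data (illustrative only): two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]

intra, inter = mean_intra_inter(points, labels)
print(f"mean intra-cluster distance: {intra:.3f}")
print(f"mean inter-cluster distance: {inter:.3f}")
```

For questionnaire data such as that collected here, the points would be the encoded responses of each patron; how those responses are encoded is outside the scope of this sketch.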

By using a data mining method such as clustering, it is possible to discover different behaviors of patrons and possibly use those behaviors to determine whether a library’s service and collection match Millennials’ learning methods. Two methods used to conduct the data integration are k-means and x-means clustering. K-means clustering is a data mining algorithm that divides n objects into k clusters so that the members of one cluster have highly similar characteristics while the members of different clusters are dissimilar (Ahmar et al., 2018). X-means clustering is an extension of k-means clustering that refines the clustering by repeatedly splitting clusters until a selection criterion is met.
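The k-means procedure described above can be sketched in a few lines of Python. This is a minimal illustration of the standard assign-and-update iteration (Lloyd's algorithm) under stated assumptions, not the implementation used in this study: the function name `kmeans`, the random initialization from the data points, the fixed seed, and the toy data are all hypothetical choices. X-means would wrap such a routine in a loop that tries splitting each cluster in two and keeps the split only if a model selection criterion improves; that step is not shown here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: returns (labels, centroids).

    Assumption: initial centroids are k distinct data points drawn at random.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster becomes empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Toy data (illustrative only): two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because k must be supplied in advance, the result depends on that choice and on the initial centroids, which is the limitation that x-means is designed to address.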

The aim of the present study was to profile the behavior of academic library patrons, particularly patrons who are categorized as members of the Millennial generation, by comparing the clusters resulting from the k-means and x-means clustering methods.


Based on the Davies-Bouldin index, k-means produced a value of 4.831 and x-means produced a value of 4.882. Thus, this study demonstrates that k-means performs better than x-means at clustering academic library patrons’ behavior, since its Davies-Bouldin index value is lower. However, although x-means has a higher Davies-Bouldin index value, it is better able to provide detailed information about the characteristics of the respondents in each cluster.
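For readers unfamiliar with the Davies-Bouldin index, the following Python sketch shows how the standard definition can be computed: for each cluster, the worst-case ratio of combined within-cluster scatter to centroid separation is taken, and these ratios are averaged over clusters. Lower values indicate more compact, better separated clusters. The function name `davies_bouldin` and the toy data are assumptions for illustration; the values reported in this study were not produced by this code.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled partition (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Centroid of each cluster.
    centroids = {
        lab: tuple(sum(col) / len(members) for col in zip(*members))
        for lab, members in clusters.items()
    }
    # Scatter: mean distance of members to their own centroid.
    scatter = {
        lab: sum(euclidean(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst (largest) similarity ratio between cluster i and any other.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in clusters if j != i
        )
    return total / len(clusters)


# Toy data (illustrative only): the same points under two labelings.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = davies_bouldin(points, [0, 0, 1, 1])  # compact, well separated
bad = davies_bouldin(points, [0, 1, 0, 1])   # stretched, poorly separated
print(good, bad)
```

In this toy case the first labeling yields a much lower index than the second, which mirrors how the index is used above to compare the k-means and x-means partitions.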

This study is limited by the number of iterations used to compare k-means and x-means clustering. Further research is needed to deepen the analysis of the research findings.


The authors would like to acknowledge Universitas Negeri Malang (UM) and PUI-PT Disruptive Learning Innovation (DLI) Universitas Negeri Malang for their funding of this research through an Islamic Development Bank (IsDB)-UM Research Grant No. 26.3.34/UN32.14.1/LT/2019.


Ahmar, A.S., Napitupulu, D., Rahim, R., Hidayat, R., Sonatha, Y., Azmi, M., 2018. Using K-Means Clustering to Cluster Provinces in Indonesia. Journal of Physics: Conference Series, Volume 1028, pp. 1–6

Aparna, K., Mydhili, K.N., 2016. Incorporating Stability and Error-based Constraints for a Novel Partitional Clustering Algorithm. International Journal of Technology, Volume 7(4), pp. 691–700

Bader, S., Urfer, W., Baumbach, J.I., 2006. Reduction of Ion Mobility Spectrometry Data by Clustering Characteristic Peak Structures. Journal of Chemometrics: A Journal of the Chemometrics Society, Volume 20(3–4), pp. 128–135

Chen, A.-P., Chen, C.-C., 2006. A New Efficient Approach for Data Clustering in Electronic Library using Ant Colony Clustering Algorithm. The Electronic Library, Volume 24(4), pp. 548–559

Dempsey, L., Malpas, C., 2018. Academic Library Futures in a Diversified University System. Higher Education in the Era of the Fourth Industrial Revolution. Springer, pp. 65–89

DeVaney, S.A., 2015. Understanding the Millennial Generation. Journal of Financial Service Professionals, Volume 69(6), pp. 11–14

Dubey, A.K., Gupta, U., Jain, S., 2018. Comparative Study of K-means and Fuzzy C-means Algorithms on the Breast Cancer Data. International Journal on Advanced Science, Engineering and Information Technology, Volume 8(1), pp. 18–29

Freeman, S., Eddy, S.L., McDonough, M., Smith, M.K., Okoroafor, N., Jordt, H., Wenderoth, M.P., 2014. Active Learning Increases Student Performance in Science, Engineering, and Mathematics. Proceedings of the National Academy of Sciences, Volume 111(23), pp. 8410–8415

Gleason, N.W., 2018. Higher Education in the Era of the Fourth Industrial Revolution. Springer

Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2001. On Clustering Validation Techniques. Journal of Intelligent Information Systems, Volume 17(2–3), pp. 107–145

Kotz, P.E., 2016. Reaching the Millennial Generation in the Classroom. Universal Journal of Educational Research, Volume 4(5), pp. 1163–1166

Kryszczuk, K., Hurley, P., 2010. Estimation of the Number of Clusters using Multiple Clustering Validity Indices. In: International Workshop on Multiple Classifier Systems, Springer, pp. 114–123

Lippincott, J.K., 2012. Information Commons: Meeting Millennials’ Needs. Journal of Library Administration, Volume 52(6–7), pp. 538–548

Maiers, M., 2017. Our Future in the Hands of Millennials. The Journal of the Canadian Chiropractic Association, Volume 61(3), pp. 212–217

Nicholas, A., 2008. Preferred Learning Methods of the Millennial Generation. Faculty and Staff - Articles & Papers. 18. Available Online at https://digitalcommons.salve.edu/fac_staff_pub/18

Padmaja, P., Vikkurty, S., Siddiqui, N.I., Dasari, P., Ambica, B., Rao, V.V., Rudraraju, V.J.P.R., 2008. Characteristic Evaluation of Diabetes Data using Clustering Techniques. IJCSNS, Volume 8(11), pp. 244–251

Pelleg, D., Moore, A.W., 2000. X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In: Proceedings of the 17th International Conference on Machine Learning, Volume 1, pp. 727–734

Smith, T.J., Nichols, T., 2015. Understanding the Millennial Generation. The Journal of Business Diversity, Volume 15(1), pp. 39–47

Suzianti, A., Faradilla, N.D.P., Anjani, S., 2015. Customer Preference Analysis on Fashion Online Shops using the Kano Model and Conjoint Analysis. International Journal of Technology, Volume 6(5), pp. 881–885

Tan, P.-N., Steinbach, M., Kumar, V., 2005. Chapter 8: Cluster Analysis: Basic Concepts and Algorithms. In: Introduction to Data Mining. Available Online at: https://doi.org/10.1016/0022-4405(81)90007-8

Walton, E.W., 2014. Why Undergraduate Students Choose to use E-books. Journal of Librarianship and Information Science, Volume 46(4), pp. 263–270

Wang, R., Tang, Y., Liu, G., Li, Y., 2011. K-means Clustering Algorithm Application in University Libraries. In: IEEE 10th International Conference on Cognitive Informatics and Cognitive Computing (ICCI-CC’11), IEEE, pp. 419–422

Zuna, H.T., Hadiwardoyo, S.P., Rahadian, H., 2016. Developing a Model of Toll Road Service Quality using an Artificial Neural Network Approach. International Journal of Technology, Volume 7(4), pp. 562–570