Published at : 03 Nov 2022
Volume : IJtech
Vol 13, No 6 (2022)
DOI : https://doi.org/10.14716/ijtech.v13i6.5882
Choo-Yee Ting | Faculty of Computing and Informatics, Multimedia University, 63000 Cyberjaya, Selangor, Malaysia |
Helmi Zakariah | AIME Healthcare Sdn Bhd, Mid Valley City, 59200 Kuala Lumpur, Malaysia |
Yasmin Zulaikha Mohd Yusri | Faculty of Computing and Informatics, Multimedia University, 63000 Cyberjaya, Selangor, Malaysia |
COVID-19 started impacting Malaysia in early 2020, and the cases had reached 4.4 million as of April 27, 2022, with 35,507 deaths. Since the outbreak began, federal and state governments have set up COVID-19 Assessment Centres (CACs) to monitor, manage and assess the risk of COVID-19-positive patients. However, the large number of patients arriving each day has left the CACs short of medical officers and overwhelmed by administrative work. A misassignment of a patient to home quarantine, a COVID-19 Quarantine and Treatment Center (PKRC), or immediate hospital admission could potentially increase the Brought-In-Dead (BID) cases. Therefore, this study aimed to overcome these challenges by achieving the following two main objectives: (i) to identify the optimal feature sets for predicting whether adult and child patients require hospital admission, (ii) to construct predictive models that perform a preliminary assessment of a patient, which a medical officer later confirms. In this study, the predictive models developed were Naive Bayes, Random Forest, K-Nearest Neighbors, Logistic Regression and Decision Tree. The datasets were obtained from one of the CACs in Malaysia and were imbalanced in nature. The empirical findings showed that Logistic Regression outperformed the rest by a slight margin. The findings suggested that while adult and child patients share features such as runny nose and cough, the child patients exhibited additional features such as vomiting, lung disease, and persistent fever.
Covid-19 assessment centre; Home quarantine; Hospitalization; Machine learning
A pneumonia outbreak was discovered in Wuhan, China, by the Chinese health
authorities in late December 2019 (Ciotti et al., 2019; Berawi et al., 2020),
and the disease was later named “COVID-19” by the World Health Organisation
(WHO). The COVID-19 pandemic has affected various economic sectors of
developing countries (Al-Doori et al., 2021). In Malaysia, the earliest
incidence of COVID-19 was detected on January 25, 2020. The virus was brought
in by three Chinese nationals who had been in close contact with an infected
individual in Singapore (Elengoe, 2020). Following the major outbreak linked to
a religious event named Tabligh at Sri Petaling, Malaysia, the Ministry of
Health (MOH) developed the standard operating procedure for the treatment of
COVID-19 (Othman & Babulal, 2020). Thirty-four government hospitals, covering
every state in Malaysia, established screening centers for COVID-19 patients
(Ravindran, 2020).
Malaysia has gone through several waves of COVID-19.
The first COVID-19 wave lasted from January 25 to February 15, 2020, with 22
cases, while the second wave lasted from February 27 to June 30, 2020 (Rampal &
Liew, 2021). The third wave began on September 20, 2020 (Rampal & Liew, 2021).
Since then, the demand for beds in critical care units has increased, not
only for COVID-19 patients, but also for a number of other severe
non-coronavirus cases (Zoey, 2021). On March 23, 2021, Malaysia declared that
the third wave of COVID-19 infections was progressively subsiding following the
increase in vaccine distribution (Anand, 2021). The fourth wave then arrived in
May 2021, as little as four weeks after the third (Singh, 2021). The Ministry
of Health has mandated that individuals who test positive for COVID-19 and have
mild or no symptoms must undergo home self-quarantine for ten days (MOH, 2021).
The home quarantine applies to COVID-19 Category 1 and Category 2 patients,
i.e., patients who have tested positive but are experiencing no or only minor
symptoms, as existing health facilities are unable to cope with the increase in
daily cases (The Straits Times, 2021). With the increase in COVID-19 patients
over time, the government established the COVID-19 Assessment Centre (CAC) as a
one-stop center that serves as a referral center for the identification,
monitoring, management, and assessment of COVID-19-positive patients. It was
established to assess infected people undergoing home monitoring treatment
(MOH, 2021). The treatment plan for COVID-19 patients in Malaysia is divided
into five clinical categories (MOH, 2020), namely (i) Category 1: asymptomatic,
(ii) Category 2: symptomatic without pneumonia, (iii) Category 3: symptomatic
with pneumonia, (iv) Category 4: symptomatic with pneumonia and requiring
supplemental oxygen, (v) Category 5: critically ill with multiple organ failure.
2. Related Work
3.1. Dataset Description
In this study, two COVID-19 patient datasets covering February 22, 2021 to
August 21, 2021 were collected from one of the COVID-19 Assessment Centres in
Malaysia (Table 1). Both datasets recorded whether a patient required hospital
admission. As shown in Table 1, both datasets are imbalanced: most of the
patients did not require hospital admission. The first dataset, Dad, consists
of basic profiles and clinical symptoms for adult patients. It has 36 columns
with 94666 records; 91834 did not require hospital admission, while 2832 were
admitted. Examples of clinical variables for this dataset are adult active,
adult runny nose, adult cough, adult difficulty breathing, adult fever, and
adult loss of smell.
Table 1 Original Data Distribution (before SMOTE) for Dad and Dch
Dataset | Category | Records | Hospital Admission (before SMOTE): No | Hospital Admission (before SMOTE): Yes |
Dad | Adult | 94666 | 91834 | 2832 |
Dch | Children | 27575 | 26707 | 868 |
Similar to the first dataset, the second dataset, Dch, captured the basic
profiles and clinical symptoms of children (i.e., patients aged under 18).
This dataset is smaller; it has 26 columns and only approximately 30% as many
records as Dad. Examples of clinical variables are child active, child fever,
child runny nose, child cough, child vomit, child seven days, child lethargic,
and child loss appetite. While the two datasets differ in some variables, they
share common variables such as blood pressure, respiratory rate, height,
weight, and BMR.
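The degree of imbalance can be read directly from the counts in Table 1. The short Python sketch below is illustrative only: the dictionary layout and the helper name `imbalance_summary` are assumptions introduced here, not part of the study's pipeline.

```python
# Record counts copied from Table 1 (before SMOTE).
# The dictionary layout and helper name are illustrative, not from the study.
TABLE_1 = {
    "Dad": {"no": 91834, "yes": 2832},  # adult patients
    "Dch": {"no": 26707, "yes": 868},   # child patients
}


def imbalance_summary(counts):
    """Return total records, admission rate, and no:yes ratio for one dataset."""
    total = counts["no"] + counts["yes"]
    return {
        "total": total,
        "admission_rate": counts["yes"] / total,
        "no_to_yes_ratio": counts["no"] / counts["yes"],
    }


summaries = {name: imbalance_summary(c) for name, c in TABLE_1.items()}
# Size of the child dataset relative to the adult dataset.
relative_size = summaries["Dch"]["total"] / summaries["Dad"]["total"]

for name, s in summaries.items():
    print(f"{name}: {s['total']} records, "
          f"{s['admission_rate']:.1%} admitted, "
          f"about {s['no_to_yes_ratio']:.0f} non-admissions per admission")
print(f"Dch holds {relative_size:.1%} of the records in Dad")
```

With these counts, roughly 3% of the patients in each dataset were admitted, and Dch holds about 29% as many records as Dad.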
3.2. Feature Selection
One of the objectives of this study was to rank the features by their
importance for predicting hospital admission. To achieve this objective, the
study employed a feature selection algorithm named BORUTA to rank the features
in both datasets. It was hypothesized that some features would differ between
the datasets while others would overlap. Only features with scores greater than
50% were retained for predictive model construction.
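The idea behind shadow-feature selection can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the BORUTA implementation used in the study: it swaps in the absolute Pearson correlation with the target as the importance measure (BORUTA relies on Random Forest importances and a statistical test), it runs on synthetic toy data, and reading the fraction of "hits" as the score compared against the 50% threshold is an assumption made here.

```python
import random


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def shadow_hit_scores(features, target, trials=20, seed=0):
    """Fraction of trials in which each feature beats the best shadow feature.

    Simplification: importance is |Pearson r| with the target, whereas BORUTA
    itself uses Random Forest importances and a statistical test.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(trials):
        # Shadow features: shuffled copies that keep each column's
        # distribution but destroy any relationship with the target.
        best_shadow = 0.0
        for column in features.values():
            shadow = column[:]
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs_pearson(shadow, target))
        for name, column in features.items():
            if abs_pearson(column, target) > best_shadow:
                hits[name] += 1
    return {name: count / trials for name, count in hits.items()}


# Synthetic toy data (not the study's data): one informative, one noise feature.
rng = random.Random(42)
target = [rng.randint(0, 1) for _ in range(200)]
features = {
    "informative": [y + rng.gauss(0, 0.5) for y in target],
    "noise": [rng.gauss(0, 1) for _ in target],
}
scores = shadow_hit_scores(features, target)
# Keep only features whose hit fraction exceeds the 50% threshold.
selected = [name for name, score in scores.items() if score > 0.5]
print(scores, selected)
```

A feature that carries real signal beats its shuffled counterparts in nearly every trial, while a pure-noise feature does so only occasionally, which is what makes a threshold on the hit fraction usable as a filter.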
3.3. Model Construction
Five machine learning models were developed using the selected clinical
variables as predictors and “require hospital admission” as the class variable.
Each dataset was split into 75% for training and 25% for testing. The F1-Score
and the ROC curve were the main instruments used to compare the performance of
Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), K-Nearest
Neighbors (K-NN), and Decision Tree (DT). Accuracy was not treated as a primary
metric because both datasets are highly imbalanced (Table 1); the AUC and
F1-Score were therefore used to identify the most suitable model. The Synthetic
Minority Over-sampling Technique (SMOTE) was not applied to either dataset
because the study aimed to investigate the performance of predictive models
without any manipulation of the datasets.
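A small worked example shows why accuracy is a weak guide on data this imbalanced. The sketch below is hypothetical: it evaluates a baseline that predicts "no admission" for every adult record, using the full-dataset counts from Table 1 rather than the study's 25% test split, and the function name `confusion_metrics` is introduced here for illustration.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical baseline: predict "no admission" for every record in Dad.
# Counts are the full-dataset figures from Table 1, not the 25% test split.
ADMITTED, NOT_ADMITTED = 2832, 91834
baseline = confusion_metrics(tp=0, fp=0, fn=ADMITTED, tn=NOT_ADMITTED)
print(f"accuracy={baseline['accuracy']:.4f}, f1={baseline['f1']:.2f}")
```

Such a baseline reaches about 97% accuracy while never identifying a single admission, so its F1-Score is 0. A model can therefore score high on accuracy and still be of no use for the task of flagging patients who need a hospital bed.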
This section begins with a discussion of the features influencing the decision
to admit a patient (adult or child) to a hospital for further treatment.
Subsequently, the predictive models trained on the datasets are examined to
assess their predictive power under imbalanced conditions. The findings are
presented using AUC, F1-Score, and Accuracy to compare the models.
4.1. Features Contributing to Hospital Admission
Table 2 compares the top 10 features influencing hospital admission for adults
and children. As shown in the table, runny nose, cough, respiration rate, pulse
rate, and BMI are among the crucial factors in recommending whether an adult or
a child requires hospital admission. In contrast to adult patients,
hypertension and loss of smell and taste do not appear among the top features
for children. However, lung disease does appear among the children's top
features, indicating that children with lung disease could end up requiring
hospital admission. In addition, fever (with no indication of low, medium, or
high) and persistent fever are also common indicators of whether a child
requires hospital admission. BMI is the third-highest contributor for children,
suggesting that BMI is informative of whether a child requires hospital
admission; maintaining a healthy BMI may therefore be associated with a lower
need for hospitalization. The overall ranking of the features for child
patients is shown in Figure 1, while Figure 2 shows the ranking of features for
adult patients.
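The overlap between the two top-10 lists in Table 2 can be checked mechanically. The Python sketch below copies the feature names from Table 2; the variable names are illustrative.

```python
# Top-10 feature names copied from Table 2, in ranked order.
ADULT_TOP10 = [
    "runny nose", "cough", "BMI", "respiration rate", "pulse rate",
    "loss of smell", "hypertension", "active", "loss of taste",
    "diabetes mellitus",
]
CHILD_TOP10 = [
    "fever", "pulse rate", "BMI", "cough", "runny nose",
    "respiration rate", "active", "persistent fever", "lung disease",
    "vomit",
]

# List comprehensions (rather than sets) preserve the ranked order.
shared = [f for f in ADULT_TOP10 if f in CHILD_TOP10]
adult_only = [f for f in ADULT_TOP10 if f not in CHILD_TOP10]
child_only = [f for f in CHILD_TOP10 if f not in ADULT_TOP10]

print("shared:", shared)
print("adult only:", adult_only)
print("child only:", child_only)
```

Six of the ten features are shared (runny nose, cough, BMI, respiration rate, pulse rate, and active), while four are specific to each group.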
Table 3 compares the five models on Dad using three metrics, namely AUC,
F1-Score, and Accuracy. Figure 3 shows the AUC comparison of the five models
using a line chart. The dataset provided by the CAC is imbalanced; therefore,
AUC and F1-Score should be the primary criteria for judging the fitness of a
model. Based on the findings, Logistic Regression outperformed the rest with
AUC=0.65 and F1-Score=0.10, despite having the lowest accuracy (68%). In Table
3, the highest accuracy, achieved by K-Nearest Neighbour, could be attributed
to its ability to correctly classify cases that did not require hospital
admission. However, K-NN had the lowest F1-Score, indicating that it should not
be used to classify hospital admission. The ROC curves in Figure 3 likewise
show that Logistic Regression (green line) outperformed the other classifiers.
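For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen admitted patient receives a higher predicted score than a randomly chosen non-admitted patient, so a value near 0.5 means the ranking is close to chance. The Python sketch below computes it directly from that definition on hypothetical toy scores; it is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. The pairwise loop is quadratic in the number of
    records, so this is meant for illustration rather than for large datasets.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and predicted scores (hypothetical, not from the study).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of the 4 positive/negative pairs are ranked correctly: 0.75
```

Because the AUC depends only on how positives are ranked against negatives, it is not inflated by the sheer number of non-admitted patients in the way accuracy is.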
Table 2 Top-10 Features for Hospital Admission among Adults and Children
Features (Adult) | Score | Features (Children) | Score |
runny nose | 1.00 | fever | 1.00 |
cough | 1.00 | pulse rate | 1.00 |
BMI | 0.97 | BMI | 0.96 |
respiration rate | 0.94 | cough | 0.91 |
pulse rate | 0.91 | runny nose | 0.87 |
loss of smell | 0.88 | respiration rate | 0.83 |
hypertension | 0.85 | active | 0.78 |
active | 0.82 | persistent fever | 0.74 |
loss of taste | 0.79 | lung disease | 0.70 |
diabetes mellitus | 0.79 | vomit | 0.65 |
Figure 1 Feature Ranking for Dch
Figure 2 Feature Ranking for Dad
Figure 3 ROC Comparison using Dad
Figure 4 ROC Comparison using Dch
4.2. Model Comparison using Dad and Dch
Table 4 shows the model performance comparison using Dch. Figure 4 shows the
AUC comparison of the five models using a line chart. The same patterns can be
observed for Dch when the same evaluation metrics were used. Logistic
Regression performed best in terms of AUC and F1-Score, while scoring the
lowest in accuracy. K-Nearest Neighbour, Random Forest, and Decision Tree
scored 90% or higher in accuracy but suffered in AUC and F1-Score. This
suggests that the models found no obvious patterns distinguishing the patients
who required hospital admission. Another reason could be the small number of
data points for admitted patients, as can be seen in Table 1. The ROC curves
again demonstrated that Logistic Regression performed better than the other
models on Dch.
Table 3 Performance Comparison between Models for Dad
Model | AUC | F1 | Acc |
Naive Bayes | 0.64 | 0.07 | 0.95 |
Random Forest | 0.53 | 0.08 | 0.80 |
K-Nearest Neighbour | 0.51 | 0.00 | 0.97 |
Logistic Regression | 0.65 | 0.10 | 0.68 |
Decision Tree | 0.54 | 0.08 | 0.75 |
Table 4 Model Comparison using Dch
Model | AUC | F1 | Acc |
Naive Bayes | 0.63 | 0.06 | 0.80 |
Random Forest | 0.58 | 0.05 | 0.94 |
K-Nearest Neighbour | 0.54 | 0.02 | 0.97 |
Logistic Regression | 0.65 | 0.10 | 0.76 |
Decision Tree | 0.51 | 0.06 | 0.90 |
This study aimed to investigate (i) the features contributing to hospital
admission for both adult and child patients, and (ii) the performance of
different classifiers when the datasets are imbalanced. Empirical findings from
the feature selection algorithm suggested that runny nose, cough, BMI,
respiration rate, and pulse rate are the top 5 features contributing to
hospital admission among adult patients. In contrast, the variables that most
commonly appeared among child patients who required hospital admission are
fever, pulse rate, BMI, cough, and runny nose. Despite these differences in
ranking, respiration rate and pulse rate were common to both groups. The study
also found that lung disease and persistent fever can indicate whether hospital
admission is needed for a child. When the models were trained and evaluated
using the imbalanced datasets, all of them suffered from low F1-Score and AUC.
The high accuracy was mainly due to the models correctly detecting patients who
did not require hospital admission. While applying SMOTE to the datasets is a
common practice to boost the performance of the models, we do not see it as the
way forward because it tends to alter the natural characteristics of the
datasets. Instead, we propose using cluster analysis such as DBSCAN to detect
possible clusters inherent in the datasets. In addition, we propose deploying
deep learning models to extract the hidden patterns within the datasets. Future
work can also investigate whether symptoms differ significantly across COVID-19
variants.
Al-Doori, J.A., Khdour, N., Shaban, E.A., al Qaruty, T.M., 2021. The Impact of Economic Indicators on Food Supply Chain of Palestine. International Journal of Technology, Volume 12(2), pp. 371–377
Alotaibi, A., Shiblee, M., Alshahrani, A., 2021. Prediction of Severity of COVID-19-Infected Patients Using Machine Learning Techniques. Computers, Volume 10(3), p. 31
An, C., Lim, H., Kim, D.W., Chang, J.H., Choi, Y.J., Kim, S.W., 2020. Machine Learning Prediction for Mortality of Patients Diagnosed with COVID-19: A Nationwide Korean Cohort Study. Scientific Reports, Volume 10(1), pp. 1–11
Anand, R., 2021. Malaysia Recovering Gradually from Third Wave of Covid-19 Infections. The Straits Times. Available online at https://www.straitstimes.com/asia/se-asia/malaysia-in-gradual-recovery-from-third-wave-of-covid-19-infections, Accessed on October 11, 2022
Assaf, D., Gutman, Y., Neuman, Y., Segal, G., Amit, S., Gefen-Halevi, S., Shilo, N., Epstein, A., Mor-Cohen, R., Biber, A., Rahav, G., 2020. Utilization of Machine-Learning Models to Accurately Predict the Risk for Critical COVID-19. Internal and Emergency Medicine, Volume 15(8), pp. 1435–1443
Berawi, M.A., Suwartha, N., Kusrini, E., Yuwono, A.H., Harwahyu, R., Setiawan, E.A., Yatmo, Y.A., Atmodiwirjo, P., Zagloel, Y.T., Suryanegara, M., Putra, N., Budiyanto, M.A., Whulanza, Y., 2020. Tackling the COVID-19 Pandemic: Managing the Cause, Spread, and Impact. International Journal of Technology, Volume 11(2), pp. 209–214
Casiraghi, E., Malchiodi, D., Trucco, G., Frasca, M., Cappelletti, L., Fontana, T., Esposito, A.A., Avola, E., Jachetti, A., Reese, J., Rizzi, A., Robinson, P.N., Valentini, G., 2020. Explainable Machine Learning for Early Assessment of Covid-19 Risk Prediction in Emergency Departments. IEEE Access, Volume 8, pp. 196299–196325
Chassagnon, G., Vakalopoulou, M., Battistella, E., Christodoulidis, S., Hoang-Thi, T.-N., Dangeard, S., Deutsch, E., Andre, F., Guillo, E., Halm, N., El Hajj, S., 2021. AI-Driven Quantification, Staging and Outcome Prediction of COVID-19 Pneumonia. Medical Image Analysis, Volume 67, p. 101860
Cheng, F.Y., Joshi, H., Tandon, P., Freeman, R., Reich, D.L., Mazumdar, M., Kohli-Seth, R., Levin, M.A., Timsina, P., Kia, A., 2020. Using Machine Learning to Predict ICU Transfer in Hospitalized COVID-19 Patients. Journal of Clinical Medicine, Volume 9(6), p. 1668
Ciotti, M., Angeletti, S., Minieri, M., Giovannetti, M., Benvenuto, D., Pascarella, S., Sagnelli, C., Bianchi, M., Bernardini, S., Ciccozzi, M., 2019. Covid-19 Outbreak: An Overview. Chemotherapy, Volume 64(5-6), pp. 215–223
Elengoe, A., 2020. COVID-19 Outbreak in Malaysia. Osong Public Health Res Perspect, Volume 11(3), pp. 93–100
Heldt, F.S., Vizcaychipi, M.P., Peacock, S., Cinelli, M., McLachlan, L., Andreotti, F., Jovanović, S., Dürichen, R., Lipunova, N., Fletcher, R.A., Hancock, A., 2021. Early Risk Assessment for COVID-19 Patients from Emergency Department Data Using Machine Learning. Scientific Reports, Volume 11(1), pp. 1–13
Izquierdo, J.L., Ancochea, J., Soriano, J.B., Savana COVID-19 Research Group, 2020. Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients with Covid-19: Retrospective Study Using Machine Learning and Natural Language Processing. Journal of Medical Internet Research, Volume 22(10), p. e21801
Kim, H.J., Han, D., Kim, J.H., Kim, D., Ha, B., Seog, W., Lee, Y.K., Lim, D., Hong, S.O., Park, M.J., Heo, J., 2020. An Easy-To-Use Machine Learning Model to Predict the Prognosis of Patients with Covid-19: Retrospective Cohort Study. Journal of Medical Internet Research, Volume 22(11), p. e24225
Ministry of Health (MOH), 2020. COVID-19 Malaysia Updates. Ministry of Health Malaysia. Available online at https://covid-19.moh.gov.my/semasa-kkm/112020/prestasi-rawatan-pesakit-covid-19-di-malaysia-27112020, Accessed on October 11, 2022
Ministry of Health (MOH), 2021. Pusat Penilaian COVID-19 (COVID-19 Assessment Center – CAC). Available online at https://covid-19.moh.gov.my/semasa-kkm/2021/feb/covid-19-assessment-center-cac, Accessed on October 11, 2022
Othman, N.Z., Babulal, V., 2020. Sri Petaling Tabligh Gathering Remains MSIA's Largest Covid-19 Cluster. New Straits Times. Available online at https://www.nst.com.my/news/nation/2020/04/583127/sri-petaling-tabligh-gathering-remains-msias-largest-covid-19-cluster, Accessed on October 11, 2022
Rampal, L., Liew, B., 2021. Malaysia’s Third Covid-19 Wave-A Paradigm Shift Required. The Medical Journal of Malaysia, Volume 76(1), pp. 1–4
Ravindran, A., 2020. From Zero to 2,000: Inside Malaysia's Pandemic Year. CodeBlue. Available online at https://codeblue.galencentre.org/2020/12/28/from-zero-to-2000-inside-malaysias-pandemic-year/, Accessed on October 11, 2022
Saeed, A., Habib, R., Zaffar, M., Quraishi, K.S., Altaf, O., Irfan, M., Glowacz, A., Tadeusiewicz, R., Huneif, M.A., Abdulwahab, A., Alduraibi, S.K., 2021. Analyzing the Features Affecting the Performance of Teachers During Covid-19: A Multilevel Feature Selection. Electronics, Volume 10(14), p. 1673
Singh, A., 2021. Comment: Beware the Fifth Wave of Covid-19. Malaysiakini. Available online at https://www.malaysiakini.com/columns/580430, Accessed on October 11, 2022
The Straits Times, 2021. Malaysia Orders Covid-19 Patients with Mild or No Symptoms to be Treated at Home. The Straits Times. Available online at https://str.sg/JzkC, Accessed on October 11, 2022
Xu, W., Sun, N.N., Gao, H.N., Chen, Z.Y., Yang, Y., Ju, B., Tang, L.L., 2021. Risk Factors Analysis of COVID-19 Patients with ARDS and Prediction Based on Machine Learning. Scientific Reports, Volume 11(1), pp. 1–12
Yatmo, Y.A., Harahap, M.M.Y., Atmodiwirjo, P., 2021. Modular Isolation Units for Patients with Mild-to-Moderate Conditions in Response to Hospital Surges Resulting from the COVID-19 Pandemic. International Journal of Technology, Volume 12(1), pp. 43–53
Zoey, L., 2021. Malaysia Reports Record 6,075 New Cases Amid Covid-19 Third Wave. CNA. Available online at https://www.channelnewsasia.com/asia/covid-19-malaysia-record-high-6075-new-cases-third-wave-mco-1379066, Accessed on October 11, 2022