International Journal of Technology (IJTech), Vol. 14, No. 6 (2023)

Pre- and Post-Depressive Detection using Deep Learning and Textual-based Features
Wei-Long Tey, Hui-Ngo Goh, Amy Hui-Lan Lim, Cheng-Kar Phang


Cite this article as:
Tey, W., Goh, H., Lim, A.H., Phang, C., 2023. Pre- and Post-Depressive Detection using Deep Learning and Textual-based Features. International Journal of Technology. Volume 14(6), pp. 1334-1343

Wei-Long Tey Faculty of Computing and Informatics, Multimedia University, Cyberjaya, 63100, Malaysia
Hui-Ngo Goh Faculty of Computing and Informatics, Multimedia University, Cyberjaya, 63100, Malaysia
Amy Hui-Lan Lim Faculty of Computing and Informatics, Multimedia University, Cyberjaya, 63100, Malaysia
Cheng-Kar Phang Behavioral Health Centre, Sunway Medical Centre, Bandar Sunway, 47500, Malaysia

Abstract

The rise of mental disorders, specifically depression, has shown an upward trend, especially during the COVID-19 pandemic. Previous studies have suggested that it is feasible to learn the textual and behavioral features of a user to identify depression on social media. This paper makes three new contributions. Firstly, it introduces the Patient Health Questionnaire-9 (PHQ-9) as a survey-based method to complement the TWINT API in collecting Twitter data. Secondly, it proposes a Bidirectional Encoder Representations from Transformers (BERT)-based model, along with emoji decoding and PHQ-9-based lexicon features, for predicting the likelihood that a user will exhibit depressive symptoms. The results are promising, achieving an F1 score of 0.98 on a baseline dataset and an F1 score of 0.90 on a benchmark dataset, outperforming previous researchers' work, which achieved an F1 score of 0.85 using solely textual features. Thirdly, whereas previous work focuses on differentiating between depressed and non-depressed users only, this paper further separates users in the depressive class into before (pre-) and after (post-) self-reported diagnosis, which can potentially be used to detect early symptoms of depression. It was found that the top TF-IDF terms of the post-depressive class contain more frequent negatively implied words than those of the pre-depressive class.

Keywords: Deep learning; Depressive symptoms; PHQ-9; Predictive modelling; Sentiment analysis

1.    Introduction

Researchers have highlighted that the cases of mental health disorders have increased during the COVID-19 pandemic (Fofana et al., 2020; Berawi, 2020). However, the continuous improvement in research on depression has helped to increase mental health literacy and change the public’s perception of the once highly stigmatized illness. People with symptoms related to depression are more willing to seek professional help when needed (Schomerus et al., 2012).

Existing research on depression has examined its relationship with lifestyle factors such as smoking (Park and Romer, 2007), neurobiological markers in patients with depressive symptoms (Fu et al., 2008), and gender differences among people with depressive symptoms (Piccinelli and Wilkinson, 2000).

The advancement of big data analytics in healthcare has motivated many researchers to gather insights that can contribute to mental health literacy (Murdoch and Detsky, 2013). The study by Lin et al. (2016) shows a positive linear relationship between depression and social media usage. The study suggests that because depression lowers self-esteem, it encourages people to turn to social media for validation instead. Another perspective is that people might develop depression from feelings of guilt over spending too much time on social media rather than doing meaningful things, which would also produce the linear trend shown in the study. Other related research on identifying predictive markers of depression has been conducted on various social media platforms, drawing on their user-generated content (Surjandari et al., 2019), including Instagram (Reece and Danforth, 2017), Facebook (De-Choudhury et al., 2014), and Twitter (Tsugawa et al., 2015).

In this paper, the terms predicting “depressive symptoms” and “predicting depression” will be used interchangeably. However, one must note that a machine learning model will not give a formal diagnosis, nor can the prediction replace a physical visit to a certified medical professional. This paper covers research motivation, contributions, related studies, the proposed framework, results, discussions, and a conclusion. Statements, declarations, and references can be found at the end of the paper.

 

2.    Research Motivation and Contribution

The research motivations and contributions are described here. Firstly, for the clinical survey-based method, existing work relies on the Center for Epidemiologic Studies Depression Scale (CES-D) (Reece and Danforth, 2017; De-Choudhury et al., 2014; Tsugawa et al., 2015) rather than the PHQ-9. Milette et al. (2010) report that the PHQ-9 offers reliability comparable to the CES-D despite being only half its length. Therefore, in our study, we propose the PHQ-9 as the survey-based method for collecting data alongside the scraped data.

Secondly, we propose the use of a BERT-based model, along with emoji decoding and PHQ-9-based lexicon features, as the analytics method to predict the likelihood that a user exhibits depressive symptoms via Twitter. The lexicon features are symptom-wise terms that are derived from collected PHQ-9 survey feedback during the data collection.

Thirdly, we have further converted the dataset from a two-class dataset to a three-class dataset where the users in the depressive class are further divided into before (pre-) and after (post-) self-reported diagnosis.

3.    Related Studies

3.1   Data Collection Methods

We can collect tweets using scraping tools (Shrestha, 2018; Poldi, 2019), clinical surveys (Guntuku et al., 2017), or publicly available datasets. Saxena (2018) uses a symptom-wise lexicon as a list of keywords for extracting depressive-indicative tweets; however, the likelihood of false positives tends to be high. Another method is to query the Twitter API for a self-report phrase such as “I’m/I was/I am/I’ve been diagnosed with depression” appearing anywhere in a user’s tweets (Shen et al., 2017; Coppersmith et al., 2015a). The non-depressed dataset is then constructed simply by scraping users whose tweets contain no depression-indicative terms at all.

Analysis by Smarr and Keefer (2011) reveals that the PHQ-9, CES-D, Beck’s Depression Inventory-II (BDI-II), Hospital Anxiety and Depression Scale (HADS), and Geriatric Depression Scale (GDS) are all adequate for measuring depressive symptoms in adults. Although surveys can help reach a more reliable audience, the number of responses collected is limited, and utilizing crowdsourcing platforms may incur additional costs to compensate participants. The Computational Linguistics and Clinical Psychology (CLPsych) shared task is an example of a publicly available dataset consisting of depression-related tweets collected using the Twitter API (Coppersmith et al., 2015b).

3.2. Word Embedding Methods

Existing works (Saxena, 2018; Pedersen, 2015; De-Choudhury et al., 2014; Schwartz et al., 2014) use an n-gram approach to understand patterns between words. Coppersmith et al. (2015a) use a single-word language model, in addition to a character language model (CLM), to capture semantics when classifying mental illnesses. That work trains on three million tweets using Word2Vec, with the PHQ-9 used to determine whether a tweet is depression-indicative. Another popular word embedding method is BERT, which uses a stack of bidirectional Transformer encoders (rather than a recurrent network such as an LSTM) to generate word embeddings based on the context before and after each word. BERT is pre-trained with masked language modeling: in each input, 15% of the tokens are masked (hidden), and the model is trained to predict the missing token(s).
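As a brief illustration of this masking objective, the sketch below uses the Hugging Face transformers library; the tooling choice is our assumption and is not prescribed by the cited works.

```python
# Minimal demonstration of BERT's masked-language-modeling objective:
# the model predicts a hidden token from the context on both sides.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("I have been diagnosed with [MASK] depression."):
    print(pred["token_str"], round(pred["score"], 3))
```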

In the work by De-Choudhury et al. (2014), a lexicon is built directly by scraping keywords from the “mental health” category of Yahoo! Answers and from Wikipedia. The authors report significantly higher usage of such lexicon terms by depressed users than by non-depressed users (89% higher, p < 0.0001).

Other than textual features, Tsugawa et al. (2015) select features such as the number of followers, number of followees, overall mention rate, tweet frequency, number of words, and retweet rate. Perhaps due to geographical differences, posting frequency, number of followers, and number of followees differ significantly between depressive and non-depressive users (De-Choudhury et al., 2014). The retweet rate and the ratio of tweets containing URLs also appear significant in differentiating the two groups.

3.3.  Machine Learning Techniques to Predict Depressive Symptoms

Alharahsheh and Abdullah (2021) applied several machine learning techniques with hyperparameter tuning, namely Support Vector Machine (SVM), Logistic Regression, Decision Tree (DT), Random Forest (RF), and ensemble methods, to tweets collected from a survey conducted by the Busara Center in Kenya. The results reveal that RF, AdaBoost, and voting-ensemble models achieve the highest F1 score (0.78) and accuracy (85%), making them the better techniques for predicting users with depressive symptoms.

Bhargava (2021) combines two data sources, namely the Sentiment140 dataset and depressive tweets collected using the TWINT API. The author compares a Convolutional Neural Network (CNN) with a hybrid of CNN and LSTM, and the results reveal that the hybrid CNN-LSTM performs better than the CNN alone. Other works reveal that DT provides the highest accuracy and shorter completion times compared to the other chosen techniques (Tiwari et al., 2021; AlSagri and Ykhlef, 2020).

Chen and Sokolova (2021) focus on identifying depression posts in Reddit data, specifically text posts from ‘r/depression’. The performance of Naïve Bayes (NB), SVM, XLNet, and BERT is compared, and BERT achieves the highest accuracy of 72%. Dinkel, Wu, and Yu (2019) propose a text-based multi-task bidirectional GRU (BGRU) network to detect depression from text transcripts of clinical interviews to support the treatment of mental illness. The high F1 score of 0.84 indicates the viability of a learning approach to detecting depression.

Apriliani and Maharani (2023) crawled the tweets of 159 respondents to the Depression Anxiety Stress Scales (DASS-42) questionnaire. The DASS-42 scores are calculated and used to label the respondents’ tweets, and XLNet and the effect of its hyperparameter tuning are analyzed, yielding an average accuracy of 93.33%. Nurfadhila and Girsang (2023) collected 1424 Indonesian-language tweets from Indonesia between August and September 2021. Multinomial NB and SVM, two traditional machine learning techniques, are compared with CNN; the results reveal that both are comparatively good choices among traditional algorithms but lose out to CNN, which produces the highest accuracy of 91.23%.

Work by Vasha et al. (2023) focuses on analysing 10,000 posts and comments from Facebook and YouTube. Six machine learning algorithms are compared, namely RF, Logistic Regression (LR), DT, SVM, K-Nearest Neighbour (KNN), and multinomial NB. Based on precision and F1 score, SVM is identified as the most suitable algorithm for building the best predictive model.

Experimental Methods

4.    Proposed Framework

4.1. Baseline (Two-Class) Data Collection

        A hybrid of survey and self-scraped data is proposed for building the two-class dataset, which we label the baseline dataset. PHQ-9 surveys are distributed via a crowdsourcing platform, and participants must fulfill the following criteria:

·         Their Twitter account must be active.

·         They claim to have / have not suffered from depression (depending on the sample).

·         They are willing to share their data for this research.

·         They have not participated in this research before.

·         Their account should contain only English tweets.

        For scraping public tweets, users are labeled as the depressive class if their tweets follow the strict pattern “(I’m/ I am/ I was/ I’ve been/ I have been) diagnosed (with) (clinical/severe) depression” (a regex sketch of this pattern is given after Table 1). For the non-depressive class, we adopted the negatively labeled dataset from Shen et al. (2017) and re-scraped 250 randomly selected users’ tweets using the TWINT API to get their latest status. Table 1 shows the landscape of the baseline dataset; it contains only tweets and their class labels.

Table 1 Landscape of the baseline dataset

Method                   | Class          | Total Users | Total Tweets
Survey (PHQ-9)           | Depressive     | 15          | >160,000
Scraped                  | Depressive     | 235         | >2.5 million
Scraped (from benchmark) | Non-depressive | 250         | >1 million
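As a sketch, the strict self-report pattern above can be expressed as a regular expression; the regex below is our reading of the quoted phrase, not the authors’ published code.

```python
# Hypothetical regex for the self-report pattern "(I'm/I am/I was/
# I've been/I have been) diagnosed (with) (clinical/severe) depression".
import re

DIAGNOSIS_PATTERN = re.compile(
    r"\bI(?:'m| am| was|'ve been| have been)\s+diagnosed\s+"
    r"(?:with\s+)?(?:clinical\s+|severe\s+)?depression\b",
    re.IGNORECASE,
)

def is_self_reported_depressive(tweet: str) -> bool:
    """True if the tweet contains a self-reported depression diagnosis."""
    return DIAGNOSIS_PATTERN.search(tweet) is not None

print(is_self_reported_depressive("I was diagnosed with clinical depression"))  # True
```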

4.2. Data Pre-processing

4.2.1. Data pre-processing on baseline dataset

        The baseline dataset is manually reviewed to remove accounts that are irrelevant to the research and Twitter accounts with fewer than 5 tweets. Participants with a PHQ-9 score of < 10 are labeled as non-depressive, and those with ≥ 10 are labeled as depressive. For the depressive class, the user ID of each qualified user is used to scrape all tweets of that user, which are merged with the scraped data of the depressive class. All collected tweets then undergo the data pre-processing steps of emoji decoding and spellchecking.
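A minimal sketch of this labeling rule follows; the helper is illustrative, with the PHQ-9 itself defining nine items scored 0-3 each.

```python
# PHQ-9 labeling rule from Section 4.2.1: total score >= 10 -> depressive.
def phq9_label(item_scores):
    """item_scores: the nine PHQ-9 item scores, each in the range 0-3."""
    return "depressive" if sum(item_scores) >= 10 else "non-depressive"

print(phq9_label([1, 2, 1, 2, 1, 2, 1, 1, 1]))  # total 12 -> "depressive"
```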

4.2.2. Transforming baseline dataset to three-class dataset

        The baseline dataset’s depressive-class users are further separated into before (pre-) and after (post-) self-reported diagnosis, namely pre-depressive and post-depressive, by scanning each user’s tweets for the earliest self-reported diagnosis matching the keywords used for scraping. No changes are made to the non-depressive data.
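A sketch of this split, assuming one user’s tweets are available as (timestamp, text) pairs and reusing the hypothetical DIAGNOSIS_PATTERN regex sketched in Section 4.1:

```python
def split_pre_post(tweets):
    """Split one user's tweets at the earliest self-reported diagnosis."""
    timeline = sorted(tweets)  # chronological order by timestamp
    diagnosis_times = [t for t, text in timeline
                       if DIAGNOSIS_PATTERN.search(text)]
    if not diagnosis_times:
        return timeline, []    # no self-report found; nothing is "post"
    cutoff = diagnosis_times[0]
    pre = [(t, x) for t, x in timeline if t < cutoff]
    post = [(t, x) for t, x in timeline if t >= cutoff]
    return pre, post
```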

4.3. Exploratory Data Analysis

        In this step, analyses such as lexicon analysis, TF-IDF, syntactic structure, and n-gram analysis are performed to find patterns among users within the baseline dataset, in the hope of identifying features for building the predictive model.

4.4. Neural Modeling

        BERT is chosen as the predictive modeling technique. The rationale stems from work on predicting depression on Reddit (Chen and Sokolova, 2021) and on clinical interview transcripts (Dinkel, Wu, and Yu, 2019). Both works produce remarkable results using BERT, and this research further shows that BERT is well suited to this Natural Language Processing (NLP) task.

        Once the BERT architecture has been set up, an ablation study is conducted in which three configurations, namely BERT, BERT with emoji decoding, and BERT with emoji decoding plus lexicon frequencies passed as an additional input to the decoder of the BERT architecture, are applied to three datasets: the baseline dataset, the three-class dataset, and a benchmark dataset available from earlier research (Shen et al., 2017).
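One plausible realization of the third configuration is to concatenate the lexicon frequency with BERT’s pooled output before the classification layer. The sketch below (PyTorch and Hugging Face transformers, with an assumed 768-dimensional pooled output and a single lexicon feature) is our reading, not the authors’ exact architecture.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertLexiconClassifier(nn.Module):
    def __init__(self, num_classes=3, lexicon_dim=1):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Concatenate BERT's 768-dim pooled output with lexicon feature(s).
        self.classifier = nn.Linear(768 + lexicon_dim, num_classes)

    def forward(self, input_ids, attention_mask, lexicon_freq):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        combined = torch.cat([out.pooler_output, lexicon_freq], dim=1)
        return self.classifier(combined)
```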

Results and Discussion

    Exploratory data analysis and neural modeling have been carried out to examine how writing styles can be used to identify depressive symptoms.

5.1. PHQ-9 Lexicon Analysis

        Based on the ten questions in PHQ-9, we extracted words indicating depressive symptoms, such as 'hopeless,' 'tired,' and 'failure.' These seed words were then augmented by identifying their synonyms through Thesaurus.com. In total, we constructed a lexicon comprising 86 words.

        Each matched lexicon word in a tweet is called a “hit”. Depressive-labeled users include depressive terms in 0.6% of their tweets, whereas non-depressive-labeled users include them in only 0.4%. This analysis suggests that a lexicon constructed from the PHQ-9 may be an important feature for detecting depression.
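A minimal sketch of this “hit” computation follows; the lexicon shown is a tiny illustrative subset, not the full 86-word list.

```python
PHQ9_LEXICON = {"hopeless", "tired", "failure"}  # illustrative subset only

def hit_rate(tweets):
    """Fraction of tweets containing at least one PHQ-9 lexicon word."""
    hits = sum(any(word in tweet.lower().split() for word in PHQ9_LEXICON)
               for tweet in tweets)
    return hits / len(tweets) if tweets else 0.0
```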

5.2. Term Frequency – Inverse Document Frequency (TF-IDF)

        TF-IDF scores indicate the importance of words within documents. All tweets are passed through the Porter Stemmer prior to the TF-IDF measurement; stemming reduces a word’s morphological variants to a common root form. Table 2 lists the words with the highest TF-IDF scores in each group for the two-class dataset.

Table 2 Words with highest TF-IDF scores for the two-class dataset

Group          | Words
Depressive     | keep, bad, even, touch, thank, sever, wrong, mostli, hate, free, beauti, mood, blue, style, jesu, poor, tire, hair, cours, choic, amen, straight, stuck, option, chicago, ugh, fast, aww, yup, omg, gross, curiousca
Non-depressive | manag, update, said, good, mr, date, singl, true, everybodi, mouth, light, help, sorri, china, chill, favorit, present, absolut, okay, forgot

        The non-depressive group does not display clear, significant clusters of terms. The depressive group does: it features words indicating negative emotions, such as “bad”, “severe”, “wrong”, “hate”, “mood”, “blue”, “poor”, “tire”, “stuck”, and “gross”, as well as religion-indicative terms such as “jesus” and “amen”.
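A minimal sketch of the stemming-plus-TF-IDF pipeline behind Tables 2 and 3, assuming NLTK’s Porter stemmer and scikit-learn’s vectorizer (the paper does not name its tooling):

```python
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

stemmer = PorterStemmer()

def stem_tokens(text):
    # Stem each whitespace token; "severe" -> "sever", "mostly" -> "mostli".
    return [stemmer.stem(tok) for tok in text.lower().split()]

docs = ["I feel so tired and hopeless", "what a beautiful day"]
vectorizer = TfidfVectorizer(tokenizer=stem_tokens, token_pattern=None)
tfidf = vectorizer.fit_transform(docs)
print(dict(zip(vectorizer.get_feature_names_out(), tfidf.toarray()[0].round(2))))
```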

        Table 3 lists the words with the highest TF-IDF scores in the depressive groups for the three-class dataset. The words with the highest TF-IDF scores in the non-depressive group are the same as in Table 2.

Table 3 Words with highest TF-IDF scores for the three-class dataset

Group           | Words
Pre-depressive  | really, petxpe, fc3gt, 15, ku, stupid, ask, sweet, run, girls, moofmurphy, christmas, gender, eye, choice, gun, pizza, status, must
Post-depressive | christmas, curiouscat, reading, hell, petxpe, scared, cis, lilyluchesi, evil, voice, sleep, lost, deserve, series, lots, coming, bill, fan, rock, type

        Noticeably fewer negatively implied words are used in the pre-depressive group than in the post-depressive group. Depressed patients are more likely to experience frequent melancholic moods, which may explain why they include more depressive terms in their tweets.

5.3. Syntactical Structure

        All tweets are pre-processed (lowercase normalization, emoji replacement, and spelling correction) prior to part-of-speech (POS) tagging. Table 4 summarizes the most used words under the three tags of NOUN, VERB, and ADJECTIVE in the respective groups.

Table 4 Top usage of POS tags by group

POS Tag   | Group          | Words
NOUN      | Depressive     | time, day, life, today, someone, depression, way, lt, thing, everyone, something, lol, year, video, night
NOUN      | Non-depressive | time, video, gt, day, year, life, man, lol, way, today, thing, lt, shit, pa, love, something, girl
VERB      | Depressive     | be, get, do, have, go, know, see, make, love, take, follow, help, feel, say, want, give, let, think, tell
VERB      | Non-depressive | be, get, do, have, go, know, make, see, take, let, say, love, give, tell, think, stop, want, keep, feel
ADJECTIVE | Depressive     | good, new, much, u, i, happy, last, bad, other, great, same, first, many, little, real, old, mental
ADJECTIVE | Non-depressive | good, i, u, much, happy, last, other, same, bad, first, real, great, many, ur, little, next, sure, old, own

        The most significant difference lies in the NOUN tag: the depressive group uses more arbitrary references and indefinite pronouns such as “someone”, “everyone”, or “something”. This is possibly because users with depressive symptoms tend to report poorer concentration and memory and therefore cannot recall subjects precisely (Zuckerman et al., 2018).

        As for the VERB and ADJECTIVE tags, depressive users tend to express themselves more. A notable difference is observed for the words “help” and “mental”, as patients with major depression are more likely to open up about themselves on the Internet (Ybarra, Alexander, and Mitchell, 2005).
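A sketch of the POS counting behind Table 4, assuming NLTK’s universal tagset (which uses ADJ for adjectives); the paper does not name its tagger.

```python
import nltk
from collections import Counter

for res in ("punkt", "averaged_perceptron_tagger", "universal_tagset"):
    nltk.download(res, quiet=True)

def top_words_by_tag(tweets, tag="NOUN", k=15):
    """Most frequent words carrying a given universal POS tag."""
    counts = Counter()
    for tweet in tweets:
        for word, pos in nltk.pos_tag(nltk.word_tokenize(tweet.lower()),
                                      tagset="universal"):
            if pos == tag:
                counts[word] += 1
    return counts.most_common(k)
```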

5.4. N-grams

        We experimented with different N values, i.e., N ∈ {1, 2, 3}; the bigram setting (N = 2) shows a significant difference between the two classes. Table 5 shows the most used bigrams.

 

Table 5 Bigram distribution

Depressive group bigram | Normalized Frequency | Non-depressive group bigram | Normalized Frequency
I wa                    | 0.12                 | I want                      | 0.09
I think                 | 0.10                 | I love                      | 0.08
I love                  | 0.08                 | I wa                        | 0.07
I know                  | 0.07                 | werewolf germani            | 0.06

        Users in the depressive group tend to use phrases that relate to their feelings, like “I think”, whereas users in the non-depressive group tend to use phrases that relate to expressing their opinions, like “I want”. This is an indicator to further investigate feelings-related words and phrases for the depressive group.
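A sketch of the bigram counting behind Table 5, with frequencies normalized within each group (NLTK is an assumed tool choice):

```python
from collections import Counter
from nltk import bigrams, word_tokenize

def top_bigrams(tweets, k=4):
    """Most frequent bigrams with group-normalized frequencies."""
    counts = Counter()
    for tweet in tweets:
        counts.update(bigrams(word_tokenize(tweet.lower())))
    total = sum(counts.values()) or 1
    return [(" ".join(bg), n / total) for bg, n in counts.most_common(k)]
```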

5.5. Neural Modelling Results

        As described in Section 4.4, an ablation study is conducted, and the performance of each architecture model on datasets is studied, as shown in Table 6.

Table 6 Results of different architecture models on the baseline (two-class) and three-class datasets

No | Dataset              | Architecture            | F1 score | Accuracy
1  | Baseline (two-class) | BERT                    | 0.98     | 0.99
2  | Baseline (two-class) | BERT + emoji            | 0.99     | 0.99
3  | Baseline (two-class) | BERT + emoji + lexicon  | 0.98     | 0.99
4  | Three-class          | BERT                    | 0.73     | 0.72
5  | Three-class          | BERT + emoji            | 0.75     | 0.76
6  | Three-class          | BERT + emoji + lexicon  | 0.76     | 0.77

        The additional lexicon features do not improve the F1 score on the baseline dataset; however, they do improve it on the three-class dataset. Overall, this suggests that a model able to capture a more detailed representation of the data learns and performs better.

        To verify the effectiveness of the architecture models, we also applied them to the two-class dataset used by Shen et al. (2017), which we label as the benchmark. We adopt and customize the benchmark dataset to suit our ablation, as shown in Table 7.

Table 7 Results of different architectures on the benchmark dataset

No | Dataset   | Architecture            | F1 score | Accuracy
1  | Benchmark | BERT                    | 0.87     | 0.84
2  | Benchmark | BERT + emoji            | 0.88     | 0.84
3  | Benchmark | BERT + emoji + lexicon  | 0.90     | 0.87

        BERT with emoji decoding and lexicon features performs best, with an F1 score of 0.90 and an accuracy of 0.87. After experimenting with various token lengths, we found that a length of 256 tokens works best in this case.
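As a sketch, the 256-token setting corresponds to the tokenizer’s maximum sequence length; the Hugging Face transformers call below is an assumed implementation, not the authors’ published code.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer(
    "example tweet text",
    padding="max_length",
    truncation=True,
    max_length=256,      # sequence length found to work best here
    return_tensors="pt",
)
```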

        In general, the BERT-based architecture models report high F1 scores across the baseline, three-class, and benchmark datasets. Because BERT is pre-trained on masked language modeling and next-sentence prediction, it develops a much deeper understanding of each token’s context, making it more powerful at recognizing the linguistic features of tweets than older embedding methods such as Word2Vec or GloVe.

Conclusion

    In this paper, a method to predict users with depressive symptoms on Twitter is proposed, using a BERT-based model with emoji decoding along with a handcrafted lexicon-based feature. We have demonstrated that, using only textual features, we can achieve outstanding results and outperform a model built on linguistic features. We also describe the conversion of the two-class dataset into a three-class dataset. Our study shows that it is indeed possible to distinguish between the “pre-depressive” and “post-depressive” groups. However, finding the differences is significantly harder because we rely on self-reported diagnoses: a patient might already have been diagnosed before, or only after, posting the tweets used for the split. Future researchers should take note of the methodologies used, especially during data collection, to ensure that the cut-off line is more accurate. As for applications, the trained model can be integrated into social media platforms or tools, where it can analyze user input and indicate potential depression based on learned patterns and characteristics. Future work may include non-textual features in this deep learning model for possibly better performance.

References

Alharahsheh, Y.E., Abdullah, M.A., 2021. Predicting Individuals Mental Health Status in Kenya using Machine Learning Methods. In: 2021 12th International Conference on Information and Communication Systems (ICICS), Institute of Electrical and Electronics Engineers (IEEE), pp. 94–98

AlSagri, H.S., Ykhlef, M., 2020. Machine Learning-based Approach for Depression Detection in Twitter using Content and Activity Features. Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Information and Systems, Volume 103(8), pp. 1825–1832

Apriliani, F., Maharani, W., 2023. Depression Detection on Social Media Twitter using XLNet Method. Jurnal Ilmiah Penelitian dan Pembelajaran Informatika (Scientific Journal of Informatics Research and Learning), Volume 8(1), pp. 172–180

Berawi, M.A., 2020. Empowering Healthcare, Economic, and Social Resilience During Global Pandemic COVID-19. International Journal of Technology, Volume 11(3), pp. 436–439

Bhargava, C., 2021. Depression Detection using Sentiment Analysis of Tweets. Turkish Journal of Computer and Mathematics Education (TURCOMAT), Volume 12(11), pp. 5411–5418

Chen, Z., Sokolova, M., 2021. Sentiment Analysis of the COVID-related r/Depression Posts. arXiv preprint arXiv:2108.06215

Coppersmith, G., Dredze, M., Harman, C., Hollingshead, K., 2015a. From Attention Deficit Hyperactivity Disorder (ADHD) to Seasonal Affective Disorder (SAD): Analyzing the Language of Mental Health on Twitter Through Self-reported Diagnoses. In: Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pp. 1–10

Coppersmith, G., Dredze, M., Harman, C., Hollingshead, K., Mitchell, M., 2015b. Computational Linguistics and Clinical Psychology (CLPsych) 2015 Shared Task: Depression and Post Traumatic Stress Disorder (PTSD) on Twitter. In: Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pp. 31–39

De-Choudhury, M., Counts, S., Horvitz, E.J., Hoff, A., 2014. Characterizing and Predicting Postpartum Depression from Shared Facebook Data. In: Proceedings of the 17th Association for Computing Machinery (ACM) Conference on Computer Supported Cooperative Work & Social Computing, pp. 626–638

Dinkel, H., Wu, M., Yu, K., 2019. Text-based Depression Detection on Sparse Data. arXiv preprint arXiv:1904.05154

Fofana, N.K., Latif, F., Sarfraz, S., Bashir, M.F., Komal, B., 2020. Fear and Agony of the Pandemic Leading to Stress and Mental Illness: An Emerging Crisis in the Novel Coronavirus (COVID-19) Outbreak. Psychiatry Research, Volume 291, p. 113230

Fu, C.H., Mourao-Miranda, J., Costafreda, S.G., Khanna, A., Marquand, A.F., Williams, S.C., Brammer, M.J., 2008. Pattern Classification of Sad Facial Processing: Toward the Development of Neurobiological Markers in Depression. Biological Psychiatry, Volume 63(7), pp. 656–662

Guntuku, S.C., Yaden, D.B., Kern, M.L., Ungar, L.H., Eichstaedt, J.C., 2017. Detecting Depression and Mental Illness on Social Media: An Integrative Review. Current Opinion in Behavioral Sciences, Volume 18, pp. 43–49

Lin, L.Y., Sidani, J.E., Shensa, A., Radovic, A., Miller, E., Colditz, J.B., Hoffman, B.L., Giles, L.M., Primack, B.A., 2016. Association Between Social Media Use and Depression Among US Young Adults. Depression and Anxiety, Volume 33(4), pp. 323–331

Milette, K., Hudson, M., Baron, M., Thombs, B.D., Canadian Scleroderma Research Group, 2010. Comparison of the Patient Health Questionnaire Depression Scale (PHQ-9) and Center for Epidemiologic Studies Depression Scale (CES-D) Depression Scales in Systemic Sclerosis: Internal Consistency Reliability, Convergent Validity and Clinical Correlates. Rheumatology, Volume 49(4), pp. 789–796

Murdoch, T.B., Detsky, A.S., 2013. The Inevitable Application of Big Data to Health Care. JAMA, Volume 309(13), pp. 1351–1352

Nurfadhila, B., Girsang, A.S., 2023. Identifying Indication of Depression of Twitter User in Indonesia Using Text Mining. International Journal of Intelligent Systems and Applications in Engineering, Volume 11(2), pp. 523–530

Park, S., Romer, D., 2007. Associations Between Smoking and Depression in Adolescence: An Integrative Review. Journal of Korean Academy of Nursing, Volume 37(2), pp. 227–241

Pedersen, T., 2015. Screening Twitter Users for Depression and Post Traumatic Stress Disorder (PTSD) with Lexical Decision Lists. In: Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pp. 46–53

Piccinelli, M., Wilkinson, G., 2000. Gender Differences in Depression: Critical Review. The British Journal of Psychiatry, Volume 177(6), pp. 486–492

Poldi, F., 2019. TWINT—Twitter Intelligence Tool. Available online at https://github.com/twintproject/twint/wiki, accessed on May 21, 2020

Reece, A.G., Danforth, C.M., 2017. Instagram Photos Reveal Predictive Markers of Depression. European Physical Journal (EPJ) Data Science, Volume 6(1), p. 15

Saxena, A., 2018. A Semantically Enhanced Approach to Identify Depression-Indicative Symptoms Using Twitter Data.

Schomerus, G., Schwahn, C., Holzinger, A., Corrigan, P.W., Grabe, H.J., Carta, M.G., Angermeyer, M.C., 2012. Evolution of Public Attitudes About Mental Illness: A Systematic Review and Meta-Analysis. Acta Psychiatrica Scandinavica, Volume 125(6), pp. 440–452

Schwartz, H.A., Eichstaedt, J., Kern, M., Park, G., Sap, M., Stillwell, D., Kosinski, M., Ungar, L., 2014. Towards Assessing Changes in Degree of Depression Through Facebook. In: Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pp. 118–125

Shen, G., Jia, J., Nie, L., Feng, F., Zhang, C., Hu, T., Chua, T.S., Zhu, W., 2017. Depression Detection Via Harvesting Social Media: A Multimodal Dictionary Learning Solution. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 3838–3844

Shrestha, K., 2018. Machine Learning for Depression Diagnosis using Twitter Data. International Journal of Computer Engineering in Research Trends, Volume 5(2)

Smarr, K.L., Keefer, A.L., 2011. Measures of Depression and Depressive Symptoms: Beck Depression Inventory-II (BDI-II), Center for Epidemiologic Studies Depression Scale (CES-D), Geriatric Depression Scale (GDS), Hospital Anxiety and Depression Scale (HADS), and Patient Health Questionnaire-9 (PHQ-9). Arthritis Care & Research, Volume 63(S11), pp. S454–S466

Surjandari, I., Wayasti, R.A., Laoh, E., Rus, A.M.M., Prawiradinata, I., 2019. Mining Public Opinion on Ride-hailing Service Providers using Aspect-based Sentiment Analysis. International Journal of Technology, Volume 10, pp. 818–828

Tiwari, P.K., Sharma, M., Garg, P., Jain, T., Verma, V.K., Hussain, A., 2021. A Study on Sentiment Analysis of Mental Illness using Machine Learning Techniques. In: IOP Conference Series: Materials Science and Engineering, Volume 1099(1), p. 012043

Tsugawa, S., Kikuchi, Y., Kishino, F., Nakajima, K., Itoh, Y., Ohsaki, H., 2015. Recognizing Depression from Twitter Activity. In: Proceedings of the 33rd Annual Association for Computing Machinery (ACM) Conference on Human Factors in Computing Systems, pp. 3187–3196

Vasha, Z.N., Sharma, B., Esha, I.J., Al Nahian, J., Polin, J.A., 2023. Depression Detection in Social Media Comments Data using Machine Learning Algorithms. Bulletin of Electrical Engineering and Informatics, Volume 12(2), pp. 987–996

Ybarra, M.L., Alexander, C., Mitchell, K.J., 2005. Depressive Symptomatology, Youth Internet Use, and Online Interactions: A National Survey. Journal of Adolescent Health, Volume 36(1), pp. 9–18

Zuckerman, H., Pan, Z., Park, C., Brietzke, E., Musial, N., Shariq, A.S., Iacobucci, M., Yim, S.J., Lui, L.M., Rong, C., McIntyre, R.S., 2018. Recognition and Treatment of Cognitive Dysfunction in Major Depressive Disorder. Frontiers in Psychiatry, Volume 9, p. 655