Published in Vol 13 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/55511.
Predicting the Transition From Depression to Suicidal Ideation Using Facebook Data Among Indian-Bangladeshi Individuals: Protocol for a Cohort Study

Protocol

1North South University, Dhaka, Bangladesh

2Vishwakarma Institute of Technology, Pune, India

*these authors contributed equally

Corresponding Author:

Manoshi Das Turjo, BSc

North South University

Bashundhara

Dhaka, 1219

Bangladesh

Phone: 880 1701754745

Email: manoshi.turjo@northsouth.edu


Background: Suicide stands as a global public health concern with a pronounced impact, especially in low- and middle-income countries, where it remains largely unnoticed as a significant health concern, leading to delays in diagnosis and intervention. South Asia, in particular, has seen limited development in this area of research, and applying existing models from other regions is challenging due to cost constraints and the region’s distinct linguistics and behavior. Social media analysis, notably on platforms such as Facebook (Meta Platforms Inc), offers the potential for detecting major depressive disorder and aiding individuals at risk of suicidal ideation.

Objective: This study primarily focuses on India and Bangladesh, both South Asian countries. It aims to construct a predictive model for suicidal ideation by incorporating unique, unexplored features along with masked content from both public and private Facebook profiles. Moreover, the research aims to fill the existing research gap by addressing the distinct challenges posed by South Asia’s unique behavioral patterns, socioeconomic conditions, and linguistic nuances. Ultimately, this research strives to enhance suicide prevention efforts in the region by offering a cost-effective solution.

Methods: This quantitative research study will gather data through a web-based platform. Initially, participants will be asked a few demographic questions and to complete the 9-item Patient Health Questionnaire assessment. Eligible participants who provide consent will receive an email requesting them to upload a ZIP file of their Facebook data. The study will begin by determining whether Facebook is the primary application for the participants based on their active hours and Facebook use duration. Subsequently, the predictive model will incorporate a wide range of previously unexplored variables, including anonymous postings, and textual analysis features, such as captions, biographic information, group membership, preferred pages, interactions with advertisement content, and search history. The model will also analyze the use of emojis and the types of games participants engage with on Facebook.

Results: The study obtained approval from the scientific review committee on October 2, 2023, and subsequently received institutional review committee ethical clearance on December 8, 2023. Our system is anticipated to automatically detect posts related to depression by analyzing the text and use pattern of the individual with the best accuracy possible. Ultimately, our research aims to have practical utility in identifying individuals who may be at risk of depression or in need of mental health support.

Conclusions: This initiative aims to enhance engagement in suicidal ideation medical care in South Asia to improve health outcomes. It is set to be the first study to consider predicting participants’ primary social application use before analyzing their content to forecast behavior and mental states. The study holds the potential to revolutionize strategies and offer insights for scalable, accessible interventions while maintaining quality through comprehensive Facebook feature analysis.

International Registered Report Identifier (IRRID): DERR1-10.2196/55511

JMIR Res Protoc 2024;13:e55511

doi:10.2196/55511

Background

In Bangladesh and India, both low- or middle-income countries (LMICs) in South Asia, the rapidly rising suicide rate is a dire situation [1]. One of the main reasons for this considerable rate, which is 75.5% [2], is that people experiencing these conditions are not diagnosed early [3]. Stigma and gaps in proper treatment for such mental health conditions have made this problem more widespread [4]. To help reduce this rising rate of suicide, an automated and easy-to-access solution is required. This work focuses on the early detection of mental health problems in a resource-constrained setting [5].

The conventional approach to diagnosing depression requires patients to complete medical questionnaires and is subjective in nature [4]. Many studies have reported that behavior analysis through social media (SM) is a convenient and fast way to make the entire process easier and less time-consuming for patients [3,6-8]. Even by inspecting behavioral patterns alone, especially through text analysis, it is possible to detect major depressive disorder (MDD) [9]. Studies have shown that identifying posts related to suicide begins with diagnosing depression levels, as suicidal ideation is particularly linked to MDD [10,11]. Numerous studies have shown that individuals with depression are at higher risk of attempting suicide [3]. Systems for detecting suicidal ideation using natural language processing (NLP) and machine learning (ML) on SM data are not fully suitable for LMICs such as India and Bangladesh because of different behavioral patterns and socioeconomic conditions [12-14]. For example, texting style and grammar patterns are different [15]. Some sensor-based devices can detect depression [16], but this solution is not suited to this region [17].

In Bangladesh and India, the rising rate of suicides presents a grave concern, and statistics reveal an alarmingly high increase [18], with a significant portion (75.5%) attributed to late diagnosis [19]. The stigma surrounding mental health issues and gaps in appropriate treatment are at the root of this problem [20]. Notably, discussions on mental health problems remain neglected and a taboo in these communities, leading individuals to express their concerns primarily through SM platforms, such as Facebook (Meta Platforms Inc), rather than sharing them directly with others [21]. Moreover, SM continues to play a pivotal role in shaping communication patterns and societal dynamics. Bangladesh ranks among the top 3 countries for Facebook active user growth [22], and similarly, India currently has one of the largest user bases for Facebook, which is >314.6 million [23]. By focusing specifically on Bangladeshi and Indian users’ Facebook data [24], the study aims to detect posts related to depression and observe the transition to suicidal ideation. This targeted approach within a specific cultural context will offer unique insights and outcomes, setting it apart from previous research efforts in this domain. Therefore, while building upon existing literature, the study seeks to provide valuable contributions by exploring new dimensions and nuances in the detection and understanding of mental health issues on SM platforms.

We aim to build an automatic system to analyze posts and engagement on SM and predict the person’s behavior and mental state. Moreover, we will use the 9-item Patient Health Questionnaire (PHQ-9) as a validated and reliable patient self-report measure to assess depression [25-27]. We will focus on the interaction and participation of people on SM, which can indicate their interest in communicating their tendencies with others, and on text pattern analysis, which can show their emotional state. Most studies have analyzed public account data, but we will also analyze some private account data with users’ consent. We are focusing on some unexplored and unique variables, such as studying engagement with Facebook games and advertisements, analyzing the content of anonymous posts, and establishing a correlation between them and depressive symptoms. In addition, we focus on analyzing data on people from South Asian countries because very few studies have targeted this region. Moreover, we will ensure the efficiency and usefulness of the model, which we will assess by examining a group of people. In addition, the model will function as a screening tool and symptom-tracking tool, enabling comprehensive assessments of suicidal thoughts, depression severity, and treatment progress as depicted in Figure 1 [28].

Figure 1. Flowchart demonstrating the steps involved in the suicidal ideation identification system using machine learning (ML) and Facebook data. NLP: natural language processing. PHQ-9: 9-item Patient Health Questionnaire.

Related Work

Depression and Suicidal Tendency in LMICs

Suicide, a latent problem, has ranked as the fourth leading cause of death among young adults and adolescents [1]. Close to 800,000 people commit suicide every year [13]. In 2022, Bangladesh reported a total of 446 student suicides [29]. In many LMICs, including India and Bangladesh, suicide rates are alarmingly high, necessitating urgent attention [30]. Depression is a common phenomenon in LMICs due to several factors, including unemployment, poverty, and financial and food crises [31,32]. In LMICs, about one-third of the people commit suicide through the ingestion and inhalation of noxious pesticides; in the Southeast Asian region, 39% of people commit suicide through this method [33,34]. India, in particular, grapples with a substantial number of suicides each year, surpassing 100,000 cases [35]. Notably, suicide ranks as the fourth leading cause of death among young Indians in the 15-29 age group [36]. In Bangladesh, a survey conducted by the Aachol Foundation showed that between January and August 2022, 364 students committed suicide, out of which 194 (53.3%) were school students and the rest (n=170, 46.7%) were university students [37,38]. Suicide represents a significant social and public health issue that has profound implications for individuals, families, and communities [35]. The causes of suicide are diverse, including professional struggles, social isolation, abuse, family problems, mental disorders, substance addiction, financial challenges, and chronic pain [39]. However, addressing suicide is a complex issue due to various socioeconomic factors prevalent in countries such as India [40] and Bangladesh. Therefore, it is a demanding task to address suicide, which requires multidimensional prevention programs that mirror the multifaceted nature of the issue [4,41].

SM Engagement Analysis

People tend to express their emotional state to others. These days, SM has become the main platform [42] where people express themselves through writing. However, engagement on SM varies because of the mental condition of the individual. Studies have shown that people with clinical depression spend more time on SM than people without clinical depression [10,30,43]. By measuring users’ engagement, it is possible to detect their behavioral patterns [44]. To understand behavioral patterns, it is very important to observe emotions such as anger, disgust, fear, happiness, sadness, surprise, shame, and confusion [45]. The observation of the number of daily posts is a fundamental evaluation metric for this project [46], and the details of depressive lexicon analysis will be provided elsewhere. Post segmentation and word representation are 2 approaches that have been used to analyze the results of the depression detection test more effectively. In post segmentation, the sentences are split into words; in word representation, the words are converted into vectors [43]. However, the understanding of how people communicate their feelings through text has not received much attention. After analyzing the premedication and postmedication longitudinal data of a person diagnosed with MDD, it has been clearly stated that the posting pattern has significantly changed the outcome measure, indicating that SM use is affected by a person’s mood [47]. Therefore, an efficient strategy for this study is to concentrate on the text features to understand the user’s mental state and transform them into a quantifiable shape.

Researchers have investigated whether users’ activity and engagement levels right before using Facebook affect how they use it and, if so, then how. They found a noteworthy result from this study: users were affected by the time they spent on SM, and this had a high impact even on their academic performance [48]. However, gathering longitudinal data is a crucial part of this case because these behaviors are entirely dependent on circumstances, which can change very quickly. Modern studies have used the internet and social network information to discover links between individual traits and different disorders. As an example, SM sentiments have been used to predict various factors, such as population happiness [49] and even suicide rates [35,50]. Therefore, it is very important to analyze long-term data to mitigate biases and establish a strong relationship between behavior patterns and quantifying methods.

Natural Language Processing

NLP techniques for analyzing Facebook posts encompass a wide range of methods aimed at extracting insights and gaining understanding from text data [51,52]. These techniques include sentiment analysis, which evaluates the emotional tone of posts, enabling the classification of content as positive, negative, or neutral. Named entity recognition identifies and categorizes entities mentioned in posts, such as people, organizations, locations, and dates. Text summarization condenses lengthy posts into shorter, more digestible versions while preserving the main ideas. Topic modeling algorithms, such as latent Dirichlet allocation and nonnegative matrix factorization, uncover underlying themes and patterns within posts, facilitating content organization and categorization [53]. Furthermore, part-of-speech tagging assigns grammatical categories to words, enabling syntactic analysis and the understanding of sentence structures [54]. These NLP techniques allow researchers to draw on Facebook posts for purposes ranging from understanding user sentiment and behavior to applications such as SM monitoring and user profiling.

Text Pattern Analysis

Earlier studies have demonstrated that users openly express signs of depression on SM platforms such as Facebook [55] and Twitter (X Corp) [56]. In certain scenarios, the shared information provides enough insights for researchers to diagnose a major depressive episode. Most studies that have identified depression through SM data are primarily based on the analysis of text from publicly available data. In a particular study, the analysis of Facebook users’ “liked” content unveiled the potential for the precise prediction of diverse traits. Similarly, another investigation revealed that individuals experiencing loneliness are inclined to share more negative content, have fewer friends, and engage in less communication activity on the platform [57]. Valence and arousal, which are the representations of sentiment and intensity, are the 2 most important parameters to examine in any text [58]. Linguistic Inquiry and Word Count (LIWC) is a fully dedicated system for this textual analysis, where it is possible to divide any post into segments and identify the frequently used words, social words, and cognitive processes [59]. Another tool named Natural Language Processing with Java is used to annotate expressions of sentiment and other sociopsychological phenomena using custom lexicons and grammar. It extends the possibilities for the assessment of attitudes expressed in the text beyond sentiment analysis [60]. However, a limitation of these studies is their inability to determine whether the high-frequency words actually indicate suicidal ideation (ie, whether the person has written a post sarcastically).


Our procedure will consist of several key steps. These are data collection and preparation, feature engineering, model selection, model training, model evaluation, prediction, and model deployment.

Data Collection

Our data collection procedure will have 2 parts. The first one is data collection from Facebook, and the second one is data collection with our website, where we will be asking some questions regarding participants’ Facebook use pattern and administering the PHQ-9 to validate our result, which will be predicted using Facebook data.

To make data collection robust and privacy-conscious, we aim to build a secure web-based system where users can upload their data files [4]. Upon accessing the system, participants will first be directed to a web-based survey where they can provide relevant information and respond to specific questions related to our study. Facebook complies with the General Data Protection Regulation [61] by offering a feature that enables users to download their data, as mandated by the regulation for SM companies. Therefore, following the survey’s completion, participants will be guided to download their Facebook data files directly from the Facebook platform. They will then be instructed to securely upload the data files to our web-based system so that we can conduct a comprehensive and informed analysis of suicidal thoughts.

Privacy is a multidimensional concept that goes beyond a binary value [62]. Therefore, it is essential to respect individuals’ autonomy [63] and data control, making it inappropriate to ask users to make their accounts public for research purposes. For individuals facing suicidal thoughts, privacy is crucial [64,65] in fostering a supportive and safe environment, allowing them to seek help without the fear of stigma or reluctance [66,67]. Respecting privacy in research is crucial to upholding ethical standards and protecting participants’ personal information [64]. Therefore, we will refrain from asking participants to make their accounts public to uphold the principles of privacy and respect individuals’ confidentiality in our research.

While Facebook restricts external entities from accessing its data [68], authorized application programming interfaces (APIs) such as the Facebook Graph API [69] enable the retrieval of publicly available information about Facebook users while adhering to privacy and legal guidelines. However, the API comes with several limitations and concerns [70,71] that can introduce biases into the model for predicting suicidal ideation, affecting the accuracy and effectiveness of the predictions.

Clinical Assessment

To diagnose and comprehend the severity of depression, we will use the PHQ-9. This questionnaire comprises 9 questions with a scoring range of 0 to 27. The questionnaire categorizes depression into 5 segments: 0 to 4 as minimal, 5 to 9 as mild, 10 to 14 as moderate, 15 to 19 as moderately severe, and 20 to 27 as severe. The ninth item, “Thoughts that you would be better off dead or of hurting yourself in some way,” directly indicates users’ perspective on suicidal thoughts. This item is therefore an important focus for understanding the shift from depression to suicidal ideation.
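The scoring bands above can be expressed as a short lookup. The following is a minimal Python sketch, not the study's implementation: the function name score_phq9 and the item9_flag field are hypothetical, and the sketch assumes the standard PHQ-9 convention that each of the 9 items is answered on a 0 to 3 scale, which yields the 0 to 27 total described in the text.

```python
from typing import Dict, List

# Severity bands as listed in the text: (lower bound, upper bound, label).
SEVERITY_BANDS = [
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def score_phq9(responses: List[int]) -> Dict[str, object]:
    """Sum nine item responses (each 0-3) and map the total to a severity band."""
    if len(responses) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item response must be an integer from 0 to 3")

    total = sum(responses)
    severity = next(
        label for low, high, label in SEVERITY_BANDS if low <= total <= high
    )
    return {
        "total": total,
        "severity": severity,
        # Hypothetical flag: any nonzero answer on item 9 is surfaced separately,
        # because that item asks directly about thoughts of death or self-harm.
        "item9_flag": responses[8] > 0,
    }
```

Keeping the item 9 answer separate from the total lets a participant with a low overall score but a nonzero ninth item still be identified.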

Ethical Considerations

This study has undergone a comprehensive human participant research ethics review and has obtained the necessary approvals. Approval was granted by the scientific review committee from North South University, Dhaka, Bangladesh, on October 2, 2023 (NonCTRG-23-38), followed by institutional review committee ethical clearance on December 8, 2023 (2023/OR-NSU/IRB/1033). Informed consent was diligently obtained from all participants involved in the study, ensuring transparency and ethical integrity. Moreover, stringent measures were implemented to anonymize and deidentify study data, thereby safeguarding the privacy and confidentiality of human participants involved in the research. These ethical considerations underscore our commitment to upholding the highest standards of research integrity and participant welfare throughout the study process.

Participants

The studies investigating the prevalence of suicidal ideation outcomes among adolescents or young adults are dependent on factors such as age, gender, socioeconomic status, and cultural differences in India [72] and Bangladesh [73], as well as factors such as the frequency and intensity of SM use, types of content shared, and the presence of mental health indicators in web-based behavior. Given the complexity of the study and the need for robust statistical analysis, we anticipate determining the sample size based on a power analysis approach and earlier studies conducted in similar settings. We have carefully considered previous research approaches and sample sizes. The study by Patel et al [74] used larger sample sizes, ranging from 1500 to 30,000 participants, allowing for comprehensive analyses and insightful conclusions. Conversely, smaller sample sizes, ranging from 100 to 500 participants, were used in studies such as those by Vornholt and De Choudhury [75] and De Choudhury et al [76], demonstrating the feasibility of conducting impactful research even with fewer participants.

Given the diverse populations and SM use patterns in India and Bangladesh, we have tailored our sample size determination strategy accordingly. While these countries have millions of inhabitants [77,78], the subset that actively engages with SM, particularly Facebook, and is willing to share personal data for research purposes is smaller. In addition, to ensure balanced representation, we will carefully select participants to achieve a proportionate distribution of positive and negative samples once the minimum threshold value is reached. It is worth noting that despite India’s larger population, our study anticipates a greater participation rate from Bangladesh due to higher Facebook use. Therefore, we aim to recruit 300 to 1500 participants from Bangladesh and 300 to 700 participants from India. This approach ensures a sufficient sample size to yield meaningful insights while considering the unique demographic and SM landscape of each country.

Text and Behavior Analysis

Overview

We are planning to use NLP techniques to analyze the linguistic patterns and sentiments in user-generated content. We will also identify language indicative of depressive symptoms, negative emotions, and expressions of hopelessness. We will use LIWC and Affective Norms for English Words lexicon categorization for more accurate results in the text analysis [79]. In addition, we will use topic modeling and keyword extraction to identify prevalent themes within the text. These insights are essential for understanding trends, preferences, or concerns within the data set.

Behavior has 2 divisions: post-centric behavior and user-centric behavior. Post-centric behaviors include post time, number of posts, posting frequency, friends, followers, likes, replies, comments, and emoji counts. User-centric behaviors include positive affect, negative affect, activation, dominance, linguistic style, depression lexicon, and antidepressant use. Linguistic style specifically shows the total word count, first-person and third-person pronouns, functional words, assent, negation, etc.

Facebook Game Analysis

According to studies, people who play video games excessively are more likely to experience sadness and social anxiety [80]. This concern is relevant here because, in our observation, the number of people playing Facebook games grew dramatically throughout the COVID-19 era [55].

A gaming function on Facebook offers a variety of games that can be played by one person or several people. The accompanying player does not necessarily need to be on the friend list of the user but can be anyone from the Facebook community. A few well-known games are carrom, Bubble Shooter, Uno, Ludo, and chess. People can challenge one another in the game, and if the other person is interested, they can accept the challenge and finish the game. Most likely, these games are web based and can be played directly from a browser without downloading, but some of them also offer an Android version that needs to be downloaded on the phone.

A few factors to consider while analyzing gaming are frequency, duration, genres, single-player versus multiplayer, and in-game behavior.

There are currently no set standards for identifying addictive and nonaddictive players. However, there is a strong relationship among weekly gaming time (WGT), craving, and the problem video game playing (PVP) score. Hard-core or even excessive players (extremely high WGT and PVP scores) as well as nonexcessive players (low WGT and PVP scores) show distinct patterns in their gameplay behavior [56]. It has been established that negative emotional ratings for excessive gamers are higher than for nonexcessive players. In addition, they also have higher “self-devaluation when facing failure” scores. The excessive players displayed higher negative feelings and negative bodily manifestations in terms of both emotions and physical symptoms. This seems to be in line with the idea of addiction, when the action no longer produces happy results but instead causes negative ones. Playing video games over a long period might cause a disorder called alexithymia, which impairs the capacity to recognize one’s own internal emotions. People’s inability to identify particular emotions contributing to their current state of mind might make it worse [81]. Due to the greater focus and cognitive processing required by skill games, these players favor action games over them [82].

Games are categorized based on their intensity level: casual games are simple and easy-to-play games designed for short play sessions, whereas hard-core games are more complex and require a higher level of skill and commitment [50].

Categorizing games according to intensity level, we distinguish between hard-core and casual categories. The hard-core category includes games such as Magic Swap Puzzle, Coin Match, Solitaire, Words With Friends, Quiz Planet, Krunker FRVR, Uno, Chess, and Word Search. In contrast, the casual category comprises games such as Bubble Shooter Pro, Bubble Pop, Basketball FRVR, Ludo Club, Cooking Trendy, Carrom Board, Card Party, Card Wars, Eight Ball Pool, and The Test. This distinction helps in understanding player engagement and preferences based on the intensity of gameplay.

Description Analysis

In the bio section, people generally write about themselves or any famous quote they like or that resembles them. Sometimes, it can be one or multiple lines or even just a word. In the details section, they can add current and previous workplaces, institutions, relationship status, hobbies, and cities they previously lived in and currently live in. People with mental health issues have a tendency not to share their information with everyone. They have a close group of people with whom they feel comfortable sharing information [57]. It is often seen that the bio and details sections of these people remain almost empty or filled with very little information.

Content Analysis

Studies have consistently shown that people are willing to share their depression and medical conditions on various social networking sites [83]. Related studies have investigated language and emotional patterns and successfully assessed new mothers’ postpartum behavioral changes from prenatal assessments [84]. These findings highlight the potential of SM as an important indicator in assessing the occurrence of current or future depression.

In our study, we aim to introduce measures that evaluate and characterize users’ linguistic styles in posts, comments, and search history, irrespective of whether they are anonymous. An important observation is that users who are depressed post personal content in various groups or pages anonymously. This behavior seems driven by the need for a safe, nonjudgmental space to express emotions and seek support. In addition, depression-related content, such as quotes, internet memes, and music, is prevalent among users. Examples include posts including “sad songs” and “mental illness quotes,” along with expressions of feelings of despair and overwhelm, such as “I am so lonely” and “Everything seems too overwhelming and pointless.” Understanding such patterns of anonymously shared data on SM platforms can provide valuable insights into predicting the mental health challenges people face. Moreover, analyzing such content and the frequency of posting may be instrumental in identifying depression and the potential areas for targeted interventions and support.

This study also highlighted the untapped opportunity in understanding users’ search activity, which holds promising potential for enhancing clinical and public health efforts in suicide risk assessment and prevention. Participants’ search themes encompass suicide, help-seeking, mood and anxiety symptoms, and trauma and negative life events [85]. Often, users facing trauma or negative life events search for support resources, coping strategies, and information on seeking professional help. This includes asking questions about outpatient resources, health clinics, and other medically relevant topics, analyzing which helps gain a comprehensive understanding of users’ mental health concerns. Further, to enhance the robustness of the predictive model, in this study, we focus on additional insights into users' engagement with relevant resources and communities over Facebook. By integrating a comprehensive list of the user’s group memberships, liked pages, and notification messages, we provide a holistic approach to understanding users’ mental health, allowing us to develop effective strategies while assessing suicide risk at an early stage.

In this research paper, we will preprocess the text data to ensure their suitability for subsequent model building. Given the unstructured nature of SM data, we will apply customized preprocessing and cleaning techniques. These techniques include removing accented characters; expanding contractions; converting to lowercase; eliminating URLs, symbols, digits, and special characters; fixing word lengthening; applying spelling correction; removing stop words; and performing lemmatization. These steps will aim to standardize the text, reduce dimensionality, improve data quality, and ensure consistent representation in our models. By implementing these preprocessing steps, we will be able to successfully clean and prepare the text data, making them suitable for subsequent analysis and model training [74].
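Several of the preprocessing steps listed above can be sketched with the Python standard library alone. The sketch below is illustrative only: the function name preprocess, the tiny CONTRACTIONS and STOP_WORDS tables, and the rule that collapses runs of 3 or more identical letters to 2 are assumptions, not the study's actual resources. It also assumes Latin-script text; posts written in Bengali or Devanagari script would need different handling. Spelling correction and lemmatization are omitted because they require external dictionaries or libraries.

```python
import re
import unicodedata
from typing import List

# Hypothetical, deliberately tiny resources; a real pipeline would use fuller lists.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am", "it's": "it is"}
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "i", "it", "to", "and", "of"}


def preprocess(text: str) -> List[str]:
    """Return cleaned tokens for one post (Latin-script text assumed)."""
    # Remove accents: decompose characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and normalize curly apostrophes so contractions can be matched.
    text = text.lower().replace("\u2019", "'")
    # Remove URLs before punctuation is stripped.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Expand contractions.
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # Drop digits, symbols, and special characters.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Fix word lengthening: "sooooo" becomes "soo".
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)
    # Tokenize on whitespace and remove stop words.
    return [tok for tok in text.split() if tok not in STOP_WORDS]
```

The order of the steps matters: URLs and contractions are handled before symbols are stripped, because both rely on punctuation that the later step removes.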

During the data cleaning phase, we will take additional steps to refine the preprocessed text data. These steps will involve removing irrelevant words, empty rows, and outlier rows with high word counts. By eliminating irrelevant words, we will focus on more meaningful features for analysis. The empty rows, which will not contain any useful information, will be dropped to ensure data integrity. At the same time, the outlier rows with high word counts, deviating significantly from the majority, will be removed to optimize model training efficiency. These measures will improve the data set’s quality and relevance, preparing it for further analysis and model training.
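The three cleaning steps can be sketched as follows, again in plain Python. The protocol does not state how an outlier row is defined, so the rule used here (drop rows longer than outlier_factor times the median row length) is an assumption chosen for simplicity, as are the function name clean_rows and the default factor of 10.

```python
from statistics import median
from typing import Iterable, List, Set


def clean_rows(
    rows: Iterable[List[str]],
    irrelevant: Set[str],
    outlier_factor: float = 10.0,
) -> List[List[str]]:
    """Remove irrelevant words, then empty rows, then unusually long rows."""
    # 1. Remove irrelevant words from every row.
    filtered = [[tok for tok in row if tok not in irrelevant] for row in rows]
    # 2. Drop rows that contain no tokens.
    non_empty = [row for row in filtered if row]
    if not non_empty:
        return []
    # 3. Drop outlier rows (hypothetical rule: longer than factor x median length).
    cutoff = outlier_factor * median(len(row) for row in non_empty)
    return [row for row in non_empty if len(row) <= cutoff]
```

A median-based cutoff is used in this sketch because a few very long posts inflate the mean and SD, which would make a mean-based rule less sensitive on small samples.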

Further, we will use a term frequency-inverse document frequency (TF-IDF) vectorizer to transform our textual data into numerical features suitable for ML algorithms. The vectorizer will be configured with specific parameters: min_df=50 and max_features=5000. The min_df parameter specifies that a word must appear in at least 50 documents (posts, in our case) to be included in the feature representation. This will help filter out infrequently occurring words that may not contribute significantly to the overall understanding of the data. The max_features parameter sets an upper limit on the number of features (words) to be considered. In our case, we will set it to 5000, indicating that only the top 5000 most relevant features (based on their frequency) will be selected for further analysis. This approach will enable us to represent the text data numerically, capturing the importance of different words while reducing the dimensionality of the feature space.
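The effect of the two parameters can be illustrated with a small from-scratch sketch in Python. It is not the study's implementation: the function name fit_tfidf, the smoothed inverse document frequency, the L2 normalization, and the alphabetical tie-breaking are assumptions made for illustration. The parameter names min_df and max_features match those of scikit-learn's TfidfVectorizer, which a production pipeline would more likely use.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def fit_tfidf(
    docs: List[List[str]],
    min_df: int = 50,
    max_features: int = 5000,
) -> Tuple[List[str], List[Dict[str, float]]]:
    """Return the selected vocabulary and one sparse TF-IDF vector per document."""
    n_docs = len(docs)
    # Document frequency: in how many posts each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Corpus-wide term frequency, used to rank terms for max_features.
    term_freq = Counter(term for doc in docs for term in doc)

    # Keep terms that occur in at least min_df documents...
    candidates = [t for t in doc_freq if doc_freq[t] >= min_df]
    # ...then keep the max_features most frequent (ties broken alphabetically).
    candidates.sort(key=lambda t: (-term_freq[t], t))
    vocab = sorted(candidates[:max_features])

    # Smoothed inverse document frequency (an assumed, commonly used variant).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(t for t in doc if t in idf)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so that long and short posts are comparable.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vocab, vectors
```

With min_df=50, a word that appears in only a handful of posts never enters the vocabulary, so a post made up entirely of such words is mapped to an empty vector.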

Diurnal Activity

Diurnal activity indicates the activeness of a person on SM both during the daytime and at night. It has been mentioned that individuals who are depressed are most active during the night [6,30,86]. In our study, we will investigate the moods [87] of all people who have written posts on SM. We intend to determine the emotions of a person who is depressed (ie, to visualize whether the person is actually depressed).

Diurnal activity will be calculated using the following formula:

XG(d,h) / ∑XG(d,i) (1)

where XG(d,h) indicates the average number of posts made in hour h of day d, and the G in XG indicates the country where the posts were made. The denominator sums XG(d,i) over all hours i of the same day, so the ratio gives the share of that day's posts made in hour h. In addition, h takes a value from 0 to 23 (ie, the last hourly bin starts at 11 PM).
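A minimal sketch of this normalization, assuming each post carries a `datetime` time stamp: the function names and the night window (9 PM to 6 AM) are assumptions made for illustration, and the posts passed in may come from a single day or be pooled over several days.

```python
from collections import Counter
from datetime import datetime


def diurnal_activity(timestamps):
    """Return a 24-element list with the share of posts made in each hour (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 24
    return [counts.get(hour, 0) / total for hour in range(24)]


def night_share(timestamps, night_hours=(21, 22, 23, 0, 1, 2, 3, 4, 5)):
    """Share of posts made during the (assumed) night window."""
    profile = diurnal_activity(timestamps)
    return sum(profile[hour] for hour in night_hours)
```

For example, a user with 4 posts at 11 PM, 11 PM, 2 AM, and 2 PM has a share of 0.5 for hour 23 and a night share of 0.75.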

Another study indicates a positive relationship between the use of emojis and genders (ie, women use more emojis than men), and the most frequently used emoji was the emoji representing tears of joy, with a use percentage of 22.2% and 18.9% among women and men, respectively. However, to investigate the emoji choices of both genders, a statistical analysis was carried out, named “mutual Information” (MI), which calculates the use of emojis by both men and women [88]:

MI(X;Y)e = ∑x ∑y P(x,y)e log [P(x,y)e / (P(x)e × P(y)e)] (2)

where X indicates whether a user applies emojis: the value of x will be equal to 1 if the user applies emojis and 0 if the user does not. Y indicates the user’s gender, coded as a binary value (eg, y=1 for men and y=0 for women). Here, x and y denote the values taken by X and Y, respectively; P(x,y)e indicates the joint probability of X and Y, whereas P(x)e and P(y)e indicate the marginal probabilities of X and Y [88].

For analyzing the data from Facebook, we will observe the time stamps of negative posts and the types of emojis that are used frequently in posts related to depression. We will build on the formulas given earlier, but we will adjust the variables to suit our needs.

To identify “negative” emotions in posts for the diurnal activity analysis, we will apply the text2emotion Python library to extract the emotional words used in the posts. The text2emotion library reports 5 emotions, namely happy, sad, angry, surprise, and fear; here, we will deal only with the negative feelings expressed in Facebook posts.

First, our main purpose is to clean the data to make them appropriate for analyzing emotions. The data cleaning process involves several steps: removing unwanted texts or words from the posts (ie, words that have a low frequency based on term frequency–inverse document frequency), applying various tools of the Natural Language Toolkit for sentiment analysis, and retaining the better-preprocessed posts from the previously preprocessed posts. Second, we need to identify the emotional words in the preprocessed texts and then use them for further visualization of the emotions. Finally, we need to observe the scores of the emotions. The higher the score in 1 of the 5 categories, the more strongly the post expresses that emotion.
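The final scoring step can be sketched as follows. The sketch assumes the emotion scores arrive as a dictionary keyed by the 5 emotion names (the shape that text2emotion's `get_emotion` function is understood to return); the function name `summarize_emotions` and the grouping of sad, angry, and fear as negative emotions are illustrative choices.

```python
# Emotions treated as "negative" in this sketch (an assumption).
NEGATIVE_EMOTIONS = ("Sad", "Angry", "Fear")


def summarize_emotions(scores):
    """Return (dominant emotion, total negative score, is the dominant emotion negative?).

    scores: dict such as {"Happy": 0.1, "Angry": 0.2, "Surprise": 0.0,
                          "Sad": 0.5, "Fear": 0.2}.
    """
    # The category with the highest score is taken as the post's emotion.
    dominant = max(scores, key=scores.get)
    negative_score = sum(scores.get(name, 0.0) for name in NEGATIVE_EMOTIONS)
    return dominant, negative_score, dominant in NEGATIVE_EMOTIONS
```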

After that, we intend to observe the emojis from the posts to investigate what type of nonverbal attributes a person who is depressed has both during the daytime and nighttime while writing posts on SM. This will help us figure out which emojis a person who is depressed uses repeatedly during the night. Several studies were conducted on emojis [89-92], indicating how they coincide with the linguistic style of a person. For instance, the emojis that are inclined with the negative words from LIWC are “anger,” “money,” “ingest,” “family,” “home,” and “death” [89].

As mentioned earlier, we will be focusing on the sentiments in the posts and the frequently used emojis by an individual during nighttime. Moreover, we will be considering how the emotions in a person’s post and the number of emojis used on a post are correlated with diurnal activity. For that, we will be using the formula for MI, as it consists of both joint probability and marginal probability. “Joint probability” will help investigate the relationship between the emotions in the posts and the number of emojis used, applying a condition, that is, if we work on the sentiments in the posts, then the number of emojis will be kept constant, and vice versa. In contrast, “marginal probability” will help us extract 1 feature at a time (ie, the features will be independent of one another).

There will be a change in the formulas for both diurnal activity and MI: in the diurnal activity formula, the number of posts will be denoted as “Po” instead of “XG.” For MI, instead of “X” and “Y,” we will write “Po” and “E,” respectively. Here, “Po” will indicate the posts written by an individual during nighttime, and “E” will indicate the types of emojis used by that individual.

The modified formulas for diurnal activity and MI are as follows:

(Po(d,h)) / (∑Po(d,i)) (3)

MI(Po,E) = ∑Po ∑E P(Po,E)e log [P(Po,E)e / (P(Po)e × P(E)e)] (4)

To find out the MI, we will consider various combinations for the number of “negative” posts and emojis used by an individual. We will consider the values of both Po and E as binary 1 (ie, Po=1 and E=1) if the posts are “negative” and the emojis are used.

In contrast, we will consider the values of both Po and E as binary 0 (ie, Po=0 and E=0) if the posts are “nonnegative” and no emojis are used in the post.

There are other combinations as well, that is, Po=0 and E=1 if the posts are “nonnegative” and emojis are used in the posts, and Po=1 and E=0 if the posts are “negative” and no emojis are used in the posts.

Here, we will eliminate Po=0 and E=1 and Po=0 and E=0, as we will not deal with “nonnegative” posts (ie, Po=0) and “zero emojis” (ie, E=0).

The highest value of MI will indicate both the number of “negative” posts and “emojis” (Po=1 and E=1) used. In contrast, the lowest value will indicate either the number of “nonnegative” posts and “emojis” (Po=0 and E=1) used or the number of “nonnegative” posts and “zero emojis” (Po=0 and E=0).
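As an illustration of the MI calculation, the sketch below estimates MI from a list of binary (Po, E) observations, one pair per post. The base-2 logarithm is an assumption (the formula does not fix the base), and the function name `mutual_information` is hypothetical. The sketch sums over all 4 (Po, E) combinations because MI is defined over the full joint distribution; restricting the analysis to selected combinations would be a separate filtering step.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Estimate MI (in bits) from a list of (po, e) observations."""
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                      # counts of each (po, e) cell
    po_counts = Counter(po for po, _ in pairs)  # marginal counts of Po
    e_counts = Counter(e for _, e in pairs)     # marginal counts of E
    mi = 0.0
    for (po, e), count in joint.items():
        p_joint = count / n
        p_po = po_counts[po] / n
        p_e = e_counts[e] / n
        mi += p_joint * log2(p_joint / (p_po * p_e))
    return mi
```

When negative posts always carry emojis and nonnegative posts never do, the estimate reaches 1 bit; when the 2 variables are independent, it is 0.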

Advertisement Analysis

Participants revealed a tendency to reduce their SM posts during periods of low moods, with the belief that their SM content might not accurately depict their depressive state. SM platforms such as Facebook use algorithms to direct advertisements to suitable users, leveraging search keywords and previously accessed links. A subset of participants acknowledged the practice of targeted advertising, while a minority reported posting specifically to seek support. In addition, participants acknowledged the existing use of targeted advertising and expressed a preference for its application in mental health care provisioning as opposed to its current use. Individuals experiencing depression tend to conduct searches related to health-related topics, resulting in personalized advertisements that align with their search preferences.

Network Analysis

For measuring the network, we will calculate reciprocity, clustering coefficient, and betweenness centrality. This will help understand the user’s close connections and how frequently they communicate with them. To analyze the post-centric and user-centric behavior in detail, we need to calculate reciprocity, prestige ratio, clustering coefficient, degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality.

In terms of analyzing the network, we will consider “u” as the user whose network property we will analyze. A node indicates a person, and an edge is considered a connection or link. Reciprocity will reflect how often “u” has responded to any message.

The prestige ratio indicates the bidirectional communication of the user. That means after texting back to any message, it tracks the number of times they have participated and the frequency of their interactions.

Mathematically, the reciprocity formula is as follows:

Reciprocity = (number of reciprocated connections) / (total number of connections) (5)

To calculate the reciprocity, we have to count the number of connections between pairs of users in a network and determine how many of these connections are reciprocated. A reciprocated connection occurs when 2 nodes have a mutual connection with each other. For example, if user A is connected to user B and user B is also connected to user A, it is considered a reciprocated connection. A higher reciprocity score in network analysis indicates a more tightly connected and interactive network with stronger bidirectional relationships between users.
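A minimal sketch of equation 5, assuming the network is stored as directed (sender, receiver) pairs and that each direction of a mutual link counts as 1 reciprocated connection; the function name `reciprocity` is illustrative.

```python
def reciprocity(edges):
    """Fraction of directed connections whose reverse connection also exists."""
    # Deduplicate and ignore self-loops.
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    reciprocated = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return reciprocated / len(edge_set)
```

With the connections A→B, B→A, and A→C, 2 of the 3 connections are reciprocated, giving a reciprocity of about 0.67.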

Clustering coefficient measures how likely it is for the friends or connections of a user to be friends or connected to each other. Basically, it provides a quantifiable measure of how much users tend to form groups or clusters within the network.

For user i, who has ni neighbors, the clustering coefficient Ci is defined as the ratio of ei connected pairs to the number of all possible connections among the ni neighbors:

Ci = 2ei / (ni(ni – 1)) (6)

The formula calculates the clustering coefficient by dividing the number of actual connections between the neighbors of user i (ei) by the number of possible connections between those neighbors (ni(ni – 1)/2).

A higher clustering coefficient score in SM analysis signifies a network with well-connected subgroups or clusters, indicating the presence of close relationships and social interactions among users within these groups. A lower clustering coefficient score in social network analysis indicates a network with less local interconnectedness and fewer cohesive groups, suggesting a more diverse and less tightly knit social structure among the nodes.
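Equation 6 can be sketched for an undirected friendship network stored as a dictionary that maps each user to the set of their connections; this data layout and the function name `clustering_coefficient` are assumptions for illustration.

```python
from itertools import combinations


def clustering_coefficient(adj, user):
    """Local clustering coefficient of `user` in an undirected network."""
    neighbors = adj.get(user, set()) - {user}
    n = len(neighbors)
    if n < 2:
        return 0.0  # fewer than 2 neighbors: no pair can be connected
    # e_i: number of neighbor pairs that are themselves connected.
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj.get(a, set()))
    return 2 * links / (n * (n - 1))
```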

Closeness centrality annotates how easily “u” can reach other users in a community. To be more specific, it measures how close or directly connected a user is to other users. A high closeness centrality indicates that “u” is closely connected to many other users. A low closeness centrality indicates that “u” is distant from others in the network. It means “u” may need to go through multiple intermediate connections to reach other users.

Closeness centrality, as the name suggests, is an index defined in terms of distance. The length of an (s, r) path is the number of edges it contains. We define the (shortest path) distance, dist(s, r), of s, r ∈ V as the minimum length of any (s, r) path. We reiterate that we consider only connected graphs for now and observe that dist(s, s)=0 for all s ∈ V. The distance matrix D=(dist[s, r])s, r∈V of an undirected graph is symmetrical, so the total distance, dist(v), of a vertex v ∈ V is obtained as either the row or column sum:

dist(v) = ∑r∈V dist(v, r) = ∑r∈V dist(r, v) (7)

A vertex with a small total distance is close to the rest of the network, so a lower total distance corresponds to a more central vertex. Because of this reversal in ranking, the closeness centrality, cC(s), of a vertex s ∈ V is usually defined as the inverse of the total (or, equivalently, average) distance [49]:

cC(s) = 1 / ∑r∈V dist(s, r) (8)

Higher closeness centrality indicates that a user is more central and well connected in the network, while lower closeness centrality suggests that a node is less central and may have less efficient access to other nodes.
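The distance-based definition above can be sketched with a breadth-first search, which yields shortest-path distances in an unweighted network. The sketch assumes a connected, undirected network stored as a dictionary of neighbor sets; the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distance (number of edges) from source to every reachable user."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness_centrality(adj, source):
    """Inverse of the total distance from source to all other users."""
    total = sum(bfs_distances(adj, source).values())
    return 1 / total if total > 0 else 0.0
```

In the 3-user chain a–b–c, user b has a total distance of 2 (closeness 0.5), whereas user a has a total distance of 3 (closeness of about 0.33).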

Betweenness centrality is a widely used measure that captures a user’s role in allowing information to pass from one part of the network to another. It shows the potential of a user in terms of communication:

BWC(k) = ∑i≠k≠j spij(k) / spij (9)

where spij is the total number of shortest paths from a vertex i to a vertex j. The number of shortest paths from a vertex i to a vertex j that go through a vertex k, denoted spij(k), is the maximum of the number of shortest paths from vertex i to vertex k in the shortest path tree rooted at i and the number of shortest paths from vertex j to vertex k in the shortest path tree rooted at j [93].

Users with higher betweenness centrality have more control over the flow of communication in the network and act as important mediators or connectors between different parts of the network. In contrast, users with lower betweenness centrality have less influence over the overall communication patterns and may be more localized in their interactions. Betweenness centrality is a measure of a user’s importance in facilitating communication, and it provides insights into the user’s role in maintaining network connectivity and communication.
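For illustration, the sketch below computes betweenness centrality under its standard definition (for each pair of other users, the fraction of shortest paths that pass through the user), using Brandes' accumulation scheme; it may differ in detail from the computation described in [93]. It assumes an undirected, unweighted network stored as a dictionary in which every user appears as a key, and the function name is illustrative.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness centrality of every user in an undirected network."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths and record predecessors.
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # -1 marks "not reached yet"
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, starting from the farthest users.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: value / 2 for v, value in score.items()}
```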

ML Modeling

We are planning to train and validate ML models using labeled data, encompassing a balanced number of posts from individuals with diagnosed depression and those without. After that, we will use a combination of features, including sentiment scores, word frequency, and contextual information. We will conduct an extensive analysis using classification models to predict suicidal risk by ensembling multiple classification models based on the provided features. The customized models used in the voting classifier will include random forest, decision tree, gradient boosting, XGBoost, and k-nearest neighbors.

We will compare several classification models for predicting suicide risk based on the provided features. These models include a voting classifier, random forest classifier, decision tree classifier, gradient boosting classifier, XGBoost classifier, and k-nearest neighbors classifier. Initially, we will split the data set into training and testing sets with a ratio of 7:3. We will use a voting classifier to combine the predictions of 3 naive Bayes classifiers. In addition, we will optimize the performance of the random forest classifier, decision tree classifier, gradient boosting classifier, XGBoost classifier, and k-nearest neighbors classifier by tuning their hyperparameters using techniques such as RandomizedSearchCV. At the same time, we will consider the characteristics of our data set and the nature of the classification task to choose an appropriate model that can effectively capture the patterns and relationships in the text data. We will then evaluate the performance of each model by calculating training and testing scores. This comprehensive methodology will allow us to identify the most effective model for predicting suicide risk based on the given features.
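Two of the ideas in this paragraph, the 7:3 split and hard (majority) voting, can be sketched in plain Python. This is only an illustration of the logic under stated assumptions: the function names and the fixed seed are hypothetical, and in practice scikit-learn's `train_test_split`, `VotingClassifier`, and `RandomizedSearchCV` would presumably handle these steps.

```python
import random
from collections import Counter


def split_70_30(rows, seed=42):
    """Shuffle the rows and split them into training (70%) and testing (30%) sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for reproducibility
    cut = round(len(rows) * 0.7)
    return rows[:cut], rows[cut:]


def hard_vote(predictions):
    """Return the label predicted by the majority of models for one sample."""
    return Counter(predictions).most_common(1)[0][0]
```

For example, if 3 classifiers predict the labels 1, 0, and 1 for a post, the voting classifier outputs 1.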

One of the potential concerns of this study is addressing the data imbalance challenge, particularly regarding the occurrence of suicide. To mitigate this imbalance, we have studied the techniques that were previously implemented in similar settings. Firstly, we will use appropriate data preprocessing techniques, such as oversampling or undersampling methods, based on the collected data to ensure that the predictive model is trained on a balanced data set. In addition, feature selection methods will be used to prioritize the most informative variables related to depression and suicide prediction, thereby minimizing the impact of data skewness on the model’s performance. Furthermore, we will consider the use of robust evaluation metrics, such as precision, recall, and F1-score, which will allow for a comprehensive assessment of the model’s predictive capability across different outcome classes, effectively addressing the challenge posed by data imbalance. These approaches would provide valuable insights into addressing data imbalance challenges in predictive modeling studies related to mental health.
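Random oversampling, 1 of the options mentioned above, can be sketched as follows: minority-class samples are duplicated at random until every class is as large as the largest one. The function name and seed are illustrative, and the sketch assumes the resampling is applied to the training split only, so that duplicated samples do not leak into the testing set.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, y in zip(samples, labels):
        by_label[y].append(x)
    if not by_label:
        return [], []
    target = max(len(items) for items in by_label.values())
    out_x, out_y = [], []
    for y, items in by_label.items():
        # Draw extra samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```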

Evaluation Metric

Overview

In the process of evaluating our models, we will use various ML metrics that offer critical insights into their performance. These metrics play a pivotal role in determining the accuracy and efficacy of the models. Specifically, we will be focusing on the following metrics [48]: true positive, false negative, true negative, and false positive. These metrics hold significance in assessing the predictive capability of the models in a classification context.

Sensitivity (True Positive Rate or Recall)

This metric measures the ability of the model to correctly identify people who are depressed. It calculates the ratio of true positives (correctly identified people who are depressed) to the sum of true positives and false negatives (people who are depressed incorrectly classified as people who are not depressed).

Specificity (True Negative Rate)

This metric evaluates the proportion of actual negatives (people who are not depressed) that are correctly identified as such. It measures the accuracy of identifying healthy individuals who do not have depression.

Precision (Positive Predictive Value)

Precision calculates the fraction of relevant instances (positively labeled) among the predicted instances. It provides an assessment of the accuracy of positive predictions made by the model:

TP / (TP + FP) (10)

F1-Score

The F1-score is a measure that combines precision and sensitivity (or recall) into a single metric. It calculates the harmonic mean of precision and sensitivity and is particularly useful when dealing with imbalanced data:

(2 × precision × recall) / (precision + recall) (11)

Accuracy

Accuracy represents the overall performance of the model, measuring the proportion of correctly classified instances out of the total instances. The error rate, which complements accuracy, quantifies the proportion of misclassified instances:

(TP + TN) / (TP + TN + FP + FN) (12)
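The metrics defined in this section can be computed directly from the 4 confusion matrix counts. The sketch below is illustrative; the function name is hypothetical, and returning 0 for an undefined ratio (zero denominator) is a convention chosen here, not one stated in the protocol.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, precision, F1-score, and accuracy."""

    def ratio(numerator, denominator):
        # Convention for this sketch: an undefined ratio is reported as 0.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)     # equation 10
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)  # equation 11
    accuracy = ratio(tp + tn, tp + tn + fp + fn)                      # equation 12
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }
```

For example, with TP=40, FP=10, TN=45, and FN=5, precision is 0.80, sensitivity is about 0.89, and accuracy is 0.85.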

Evaluation

Among the array of standard evaluation metrics, precision, recall, and F1-score are given more weight in assessing the efficacy of models in accurately classifying depression classes. When assessing early detection algorithms, it is crucial to consider not only the accuracy of decisions but also the associated delays. In line with this, Wang and Mark [48] introduced the early risk detection error metric, which penalizes untimely correct decisions to address this concern. Flatency, introduced in the CLEF eRisk 2021 [59], is an evaluation metric proposed alongside the early risk detection error metric to enhance interpretability [30]. Also known as latency-weighted F1, Flatency combines the F1-score with speed, reflecting both accuracy and efficiency in identifying users’ content and particularly emphasizing early and accurate classification. By considering these metrics, we can gain a comprehensive understanding of the performance of our models in terms of their ability to correctly identify individuals who are depressed and individuals who are not depressed. The streamlined flow of the depression identification system is depicted in Figure 2.

Figure 2. Deployment of the suicide ideation prediction model based on the 9-item Patient Health Questionnaire (PHQ-9) and Facebook data examination. ML: machine learning; NLP: natural language processing.

Our system is anticipated to automatically detect posts related to depression by analyzing the text and use pattern of the individual with the best accuracy possible. Ultimately, our research aims to have practical utility in identifying individuals who may be at risk of depression or in need of mental health support.

Simultaneously, by analyzing daily activities on SM, we aim to determine how frequently individuals write “negative” posts on Facebook, as such posts are often associated with depression [94]. Additionally, we will examine the types of emojis that are used repeatedly by these individuals. This will help us identify how often an individual uses emojis while writing a post. Moreover, if a person does not use an emoji, their posts will be detected based on the score of their “negative” emotional words (ie, “sad,” “angry,” or “fear”).

By accurately classifying individuals based on their text analysis, we hope to contribute to the development of early detection systems or interventions that can enhance mental health screening and well-being.


Overview

In this study, we will focus on the South Asian region, mainly Bangladesh and India, which is still dealing with economic issues and some problems with regard to basic rights. People from this region do not pay adequate attention to mental health and well-being. Therefore, we aim to build such a solution that will cover most of the gaps regarding all the mentioned facts. Our goal is to establish a procedure that will analyze how a collection of behavioral indicators that appear on SM may be used to forecast postings that are suggestive of depression and, in turn, comprehend widespread suicidal tendencies in populations.

Principal Findings

Our study will begin by administering basic questions to users, followed by a filtering process to select suitable participants for our study. Upon obtaining the Facebook data, our analysis will commence. We will initially determine whether Facebook serves as the primary SM platform for the participants, a crucial factor influencing the accuracy of our predictions. In case it is their main platform, we will focus on their interactions, encompassing messages, posts, comments, and gaming activities. Our analysis will extend to anonymous actions and their engagement with content related to depression.

Conversely, in instances where Facebook is not the predominant medium, judicious focus is shifted to the advertisement analysis model. This model hinges on deciphering users’ web-based behavior, scrutinizing their browsing history, visited sites, and thematic inclinations, particularly within realms such as consultancy or support services. For participants who primarily use other platforms, our analysis will shift to evaluating the types of websites they frequent and the content they engage with. This system will work privately to assess the risks associated with anticipated changes in the future while taking into account metrics related to SM activity. We aim to demonstrate a significant correlation among Facebook games, user profile information, and content such as advertisements on mental health as depicted in Textbox 1.

Finally, we will incorporate an assessment of the frequency of their posting, commenting, friend engagement, and previous questionnaire responses, thus enhancing the depth and accuracy of our predictive model. Our ultimate prediction will amalgamate these distinct analyses through advanced techniques. Informed by the outcome of these individual models, our predictive synthesis advances, culminating in a definitive assessment of the presence or absence of suicidal ideation.

Textbox 1. Consideration of data variables for suicidal ideation detection based on Facebook activity status.

If a user is active on Facebook

  • 9-item Patient Health Questionnaire (PHQ-9) score
  • Facebook games analysis
  • Bio analysis
  • Post and comment analysis with or without the user’s name
  • Notification analysis
  • Kinds of groups joined and pages liked analysis
  • Search history analysis
  • Advertisement analysis
  • Diurnal activity value
  • Network attributes calculation

If a user is not active on Facebook but has created an account

  • PHQ-9 score
  • Advertisement content analysis

Comparison With Previous Work

Our study will present a novel investigation, first exploring the prevalence of Facebook as the dominant social application, whereas existing studies focused on Twitter, Reddit (Reddit Inc), and Instagram (Meta Platforms Inc) as data sources [95]. We will use a process to collect the data that will not require any extra website or setup to be downloaded or installed on the user’s end. This is the reason why we are claiming our process is easy to access. Other existing solutions need some external infrastructure, which makes the data donation process a bit complex for users. Along with all the engagement analysis parameters, including texting patterns, we will also consider some unique parameters such as used emojis, stickers, Facebook games, bio sections, and advertisement suggestions that have a direct connection with users’ behavior. People with MDD have noticeable difficulties with their attention span [96]. On this account, we are showing a connection between these parameters and users’ behavior. This exploration will provide valuable insights into the emotional state of users.

Limitations

However, this study may have a few limitations and biases that are crucial to consider. First, the recruitment of participants through web-based platforms may introduce selection bias, as individuals who are active on SM may be different from the general population [97]. In addition, reliance on self-reported data, such as responses to demographic questions and mental health assessments, may introduce response and social desirability biases. Participants may underreport sensitive information or provide socially acceptable responses, leading to inaccuracies in the data collected [98]. Moreover, the interpretation of textual data, including language nuances and sentiment analysis, is subject to inherent biases in NLP algorithms [99]. Future research should aim to mitigate these challenges through methodological refinements and validation studies.

Future research directions can include comprehensive validation studies that directly compare the outcomes of our screening method with those of clinical assessments conducted by mental health professionals. This validation process would involve recruiting control groups and conducting direct comparisons to assess the accuracy and reliability of our screening results in identifying individuals with suicidal ideation. This will provide valuable insights into the strengths and limitations of our approach. By establishing the concordance between our screening results and clinical diagnoses, we can enhance the credibility and utility of our screening methodology in real-world settings [100]. Researchers can also conduct a broader exploration of alternative screening methods and interventions based on the screening results. Incorporating additional data sources or using advanced ML techniques could enhance accuracy and efficiency. In addition, developing targeted interventions based on the screening results holds promise for improving mental health outcomes and preventing suicide [101].

Conclusions

The study’s timeline unfolds as follows. The data collection, which includes recruiting participants, conducting surveys, and gathering Facebook data, is scheduled to conclude by November 2024. Anticipating the system’s ability to accurately detect signs of depression in text and use patterns, we aim to advance early detection systems or interventions, completing the ML model by January 2025. The final results are expected to be available in February 2025.

By delving into these multifaceted aspects, this study will offer a comprehensive and pioneering approach to the field of mental health. Our approach to predicting depression through SM activity is not intended to replace conventional surveillance systems or laboratory-based depression diagnoses. We believe it is important to highlight the possibilities of these approaches and concepts to maximize their advantages and improve people’s quality of life. This will also encourage discussion and raise awareness of any potential issues that should be resolved on individual and societal levels.

Acknowledgments

JMIR Publications along with North South University (NSU) graciously provided article processing fee (APF) support for the publication of this paper. We would like to express our sincere gratitude for this support.

Conflicts of Interest

None declared.

  1. Scherer S, Hammal Z, Yang Y, Morency LP, Cohn JF. Dyadic behavior analysis in depression severity assessment interviews. Proc ACM Int Conf Multimodal Interact. Nov 2014;2014:112-119. [FREE Full text] [CrossRef] [Medline]
  2. Paelecke-Habermann Y, Pohl J, Leplow B. Attention and executive functions in remitted major depression patients. J Affect Disord. Dec 2005;89(1-3):125-135. [CrossRef] [Medline]
  3. Most popular social networks worldwide as of April 2024, ranked by number of monthly active users. Statista. 2023. URL: https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/ [accessed 2024-01-30]
  4. Razi A, Alsoubai A, Kim S, Naher N, Ali S, Stringhini G, et al. Instagram data donation: a case study on collecting ecologically valid social media data for the purpose of adolescent online risk detection. In: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems Extended Abstracts. 2022. Presented at: CHI EA '22; April 29-May 5, 2022:1-9; New Orleans, LA. URL: https://dl.acm.org/doi/10.1145/3491101.3503569 [CrossRef]
  5. Leiva V, Freire A. Towards suicide prevention: early detection of depression on social media. Universitat Pompeu Fabra. URL: https://repositori.upf.edu/bitstream/handle/10230/33315/freire_insci_towards.pdf [accessed 2024-01-30]
  6. Number of monthly active Facebook users worldwide as of 4th quarter 2023. Statista. URL: https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/ [accessed 2024-01-30]
  7. De Choudhury M, Counts S, Horvitz E. Social media as a measurement tool of depression in populations. In: Proceedings of the 5th Annual ACM Web Science Conference. 2013. Presented at: WebSci '13; May 2-4, 2013:47-56; Paris, France. URL: https://dl.acm.org/doi/10.1145/2464464.2464480 [CrossRef]
  8. Yasir Arafat SM. Current challenges of suicide and future directions of management in Bangladesh: a systematic review. Global Psychiatry. 2018;2(1):9-20. [CrossRef]
  9. Facebook - statistics and facts. Statista. URL: https://www.statista.com/topics/751/facebook/ [accessed 2024-01-30]
  10. Desseilles M, Perroud N, Guillaume S, Jaussent I, Genty C, Malafosse A, et al. Is it valid to measure suicidal ideation by depression rating scales? J Affect Disord. Feb 2012;136(3):398-404. [FREE Full text] [CrossRef] [Medline]
  11. De Choudhury M, Kiciman E, Dredze M, Coppersmith G, Kumar M. Discovering shifts to suicidal ideation from mental health content in social media. Proc SIGCHI Conf Hum Factor Comput Syst. May 2016;2016:2098-2110. [FREE Full text] [CrossRef] [Medline]
  12. Suicide. World Health Organization. URL: https://www.who.int/india/health-topics/suicide [accessed 2024-01-30]
  13. Accidental deaths and suicides in India. National Crime Records Bureau. URL: https://ncrb.gov.in/en/ADSI-2021 [accessed 2024-01-30]
  14. Wongaptikaseree K, Yomaboot P, Katchapakirin K, Kaewpitakkun Y. Social behavior analysis and Thai mental health questionnaire (TMHQ) optimization for depression detection system. IEICE Trans Inf Syst. 2020;E103.D(4):771-778. [FREE Full text] [CrossRef]
  15. Ahmed M, Ahmed N. A fast and minimal system to identify depression using smartphones: explainable machine learning-based approach. JMIR Form Res. Aug 10, 2023;7:e28848. [FREE Full text] [CrossRef] [Medline]
  16. Yu L, Jiang W, Ren Z, Xu S, Zhang L, Hu X. Detecting changes in attitudes toward depression on Chinese social media: a text analysis. J Affect Disord. Feb 01, 2021;280(Pt A):354-363. [CrossRef] [Medline]
  17. Rathod S, Pinninti N, Irfan M, Gorczynski P, Rathod P, Gega L, et al. Mental health service provision in low- and middle-income countries. Health Serv Insights. Mar 28, 2017;10:1178632917694350. [FREE Full text] [CrossRef] [Medline]
  18. Renaud J, MacNeil SL, Vijayakumar L, Spodenkiewicz M, Daniels S, Brent DA, et al. Suicidal ideation and behavior in youth in low- and middle-income countries: a brief review of risk factors and implications for prevention. Front Psychiatry. 2022;13:1044354. [FREE Full text] [CrossRef] [Medline]
  19. Fritz K, Russell AM, Allwang C, Kuiper S, Lampe L, Malhi GS. Is a delay in the diagnosis of bipolar disorder inevitable? Bipolar Disord. Aug 22, 2017;19(5):396-400. [CrossRef] [Medline]
  20. Cogan NA, Liu X, Chin-Van CY, Kelly SW, Anderson T, Flynn C, et al. The taboo of mental health problems, stigma and fear of disclosure among Asian international students: implications for help-seeking, guidance and support. Br J Guid Counc. 2023:1-19. [CrossRef]
  21. Soron TR, Shariful Islam SM. Suicide on Facebook-the tales of unnoticed departure in Bangladesh. Glob Ment Health (Camb). May 26, 2020;7:e12. [FREE Full text] [CrossRef] [Medline]
  22. Bangladesh among top 3 countries for Facebook active user growth: Meta. The Daily Star. URL: https://www.thedailystar.net/news/bangladesh/news/bangladesh-among-top-3-countries-facebook-active-user-growth-meta-3238806 [accessed 2024-02-16]
  23. Facebook users by country 2024. World Population Review. URL: https://worldpopulationreview.com/country-rankings/facebook-users-by-country [accessed 2024-02-16]
  24. Statistics: Facebook users in Bangladesh. NapoleonCat. 2022. URL: https://napoleoncat.com/stats/facebook-users-in-bangladesh/2022/01/ [accessed 2024-01-30]
  25. PHQ-9 depression scale questionnaire. AIMS Center, University of Washington. URL: https://aims.uw.edu/resource-library/phq-9-depression-scale [accessed 2024-01-30]
  26. A cultural outlook on behaviors between Asia and North America. TTI Success Insights. URL: https://blog.ttisi.com/a-cultural-outlook-on-behaviors-between-asia-and-north-america [accessed 2024-01-30]
  27. Wearable sensor can detect hidden anxiety and depression in children. Medical Device Network. Jan 2019. URL: https://www.medicaldevice-network.com/news/wearable-sensor-can-detect-hidden-anxiety-and-depression-in-children/ [accessed 2024-01-30]
  28. What is suicidal ideation? MedicalNewsToday. URL: https://www.medicalnewstoday.com/articles/193026 [accessed 2024-01-30]
  29. Study: Bangladesh saw 446 student suicides in 2022. Dhaka Tribune. 2023. URL: https://www.dhakatribune.com/bangladesh/303677/study-bangladesh-saw-446-student-suicides-in-2022 [accessed 2024-01-30]
  30. De Choudhury M, Counts S, Gamon M. Not all moods are created equal! exploring human emotional states in social media. Proc Int AAAI Conf Web Soc Media. Aug 03, 2021;6(1):66-73. [CrossRef]
  31. Yatham S, Sivathasan S, Yoon R, da Silva TL, Ravindran AV. Depression, anxiety, and post-traumatic stress disorder among youth in low and middle income countries: a review of prevalence and treatment interventions. Asian J Psychiatr. Dec 2018;38:78-91. [CrossRef] [Medline]
  32. Roy AD. How can we prevent suicides among the youth? The Daily Star. 2022. URL: https://www.thedailystar.net/opinion/views/news/how-can-we-prevent-suicides-among-the-youth-3124791 [accessed 2024-01-30]
  33. De Sousa A, Mohandas E, Javed A. Psychological interventions during COVID-19: challenges for low and middle income countries. Asian J Psychiatr. Jun 2020;51:102128. [FREE Full text] [CrossRef] [Medline]
  34. Brandão DJ, Fontenelle LF, da Silva SA, Menezes PR, Pastor-Valero M. Depression and excess mortality in the elderly living in low- and middle-income countries: systematic review and meta-analysis. Int J Geriatr Psychiatry. Jan 15, 2019;34(1):22-30. [CrossRef] [Medline]
  35. Suicide. World Health Organization. URL: https://www.who.int/news-room/fact-sheets/detail/suicide [accessed 2024-01-30]
  36. Aggarwal S. Suicide in India. Br Med Bull. Jun 09, 2015;114(1):127-134. [CrossRef] [Medline]
  37. Preventing suicide: a global imperative. World Health Organization. 2014. URL: https://www.who.int/publications/i/item/9789241564779 [accessed 2023-12-11]
  38. Cristóbal-Narváez P, Haro JM, Koyanagi A. Perceived stress and depression in 45 low- and middle-income countries. J Affect Disord. Sep 01, 2020;274:799-805. [FREE Full text] [CrossRef] [Medline]
  39. Yusuf HR, Akhter HH, Rahman MH, Chowdhury ME, Rochat RW. Injury-related deaths among women aged 10-50 years in Bangladesh, 1996-97. Lancet. Apr 08, 2000;355(9211):1220-1224. [CrossRef] [Medline]
  40. Migliore LA. Relation between big five personality traits and Hofstede's cultural dimensions: samples from the USA and India. Cross Cult Manag Int J. 2011;18(1):38-54. [CrossRef]
  41. National suicide prevention strategies: progress, examples and indicators. World Health Organization. URL: https://www.who.int/publications-detail-redirect/national-suicide-prevention-strategies-progress-examples-and-indicators [accessed 2024-01-30]
  42. Number of social media users worldwide from 2017 to 2028. Statista. URL: https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/ [accessed 2024-01-30]
  43. Huang YC, Chiang CF, Chen AL. Predicting depression tendency based on image, text and behavior data from Instagram. In: Proceedings of the 8th International Conference on Data Science, Technology and Applications. 2019. Presented at: DATA '19; July 26-28, 2019:32-40; Prague, Czech Republic. URL: https://www.scitepress.org/Papers/2019/78336/78336.pdf [CrossRef]
  44. De Choudhury M, Gamon M, Counts S, Horvitz E. Predicting depression via social media. Proc Int AAAI Conf Web Soc Media. Aug 03, 2021;7(1):128-137. [CrossRef]
  45. Chen X, Sykora M, Jackson TW, Elayan S. What about mood swings: identifying depression on Twitter with temporal measures of emotions. In: Companion Proceedings of the Web Conference 2018. 2018. Presented at: WWW '18; April 23-27, 2018:1653-1660; Lyon, France. URL: https://dl.acm.org/doi/10.1145/3184558.3191624 [CrossRef]
  46. Sadeque F, Xu D, Bethard S. Measuring the latency of depression detection in social media. In: Proceedings of the 11th ACM International Conference on Web Search and Data Mining. 2018. Presented at: WSDM '18; February 5-9, 2018:495-503; Marina Del Rey, CA. URL: https://dl.acm.org/doi/10.1145/3159652.3159725 [CrossRef]
  47. Saha K, Sugar B, Torous J, Abrahao B, Kıcıman E, De Choudhury M. A social media study on the effects of psychiatric medication use. Proc Int AAAI Conf Web Soc Media. Jul 06, 2019;13:440-451. [FREE Full text] [CrossRef]
  48. Wang Y, Mark G. The context of college students' Facebook use and academic performance: an empirical study. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 2018. Presented at: CHI '18; April 21-26, 2018:1-11; Montreal, QC. URL: https://dl.acm.org/doi/10.1145/3173574.3173992 [CrossRef]
  49. Brandes U, Borgatti SP, Freeman LC. Maintaining the duality of closeness and betweenness centrality. Soc Netw. Jan 2016;44:153-159. [CrossRef]
  50. Gajaria A, Ravindran AV. Interventions for perinatal depression in low and middle-income countries: a systematic review. Asian J Psychiatr. Oct 2018;37:112-120. [CrossRef] [Medline]
  51. Kabir MK, Islam M, Kabir AN, Haque A, Rhaman MK. Detection of depression severity using Bengali social media posts on mental health: study using natural language processing techniques. JMIR Form Res. Sep 28, 2022;6(9):e36118. [FREE Full text] [CrossRef] [Medline]
  52. Gharehchopogh FS, Khalifelu ZA. Analysis and evaluation of unstructured data: text mining versus natural language processing. In: Proceedings of the 5th International Conference on Application of Information and Communication Technologies. 2011. Presented at: AICT '11; October 12-14, 2011:1-4; Baku, Azerbaijan. URL: https://ieeexplore.ieee.org/document/6111017 [CrossRef]
  53. Badal VD, Kundrotas PJ, Vakser IA. Natural language processing in text mining for structural modeling of protein complexes. BMC Bioinformatics. Mar 05, 2018;19(1):84. [FREE Full text] [CrossRef] [Medline]
  54. Elbattah M, Arnaud É, Gignon M, Dequen G. The role of text analytics in healthcare: a review of recent developments and applications. In: Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies. 2021. Presented at: BIOSTEC '21; February 11-13, 2021:825-832; Virtual Event. URL: https://www.scitepress.org/PublishedPapers/2021/104145/104145.pdf [CrossRef]
  55. Wei HT, Chen MS, Huang PC, Bai YM. The association between online gaming, social phobia, and depression: an internet survey. BMC Psychiatry. Jul 28, 2012;12(1):92. [FREE Full text] [CrossRef] [Medline]
  56. Growth of interaction on Facebook gaming during the COVID-19 pandemic in Vietnam in 2020. Statista. URL: https://www.statista.com/statistics/1278152/vietnam-facebook-gaming-interaction-growth-during-covid-19/ [accessed 2024-01-30]
  57. Morin R, Léger PM, Senecal S, Bastarache-Roberge MC, Lefèbrve M, Fredette M. The effect of game tutorial: a comparison between casual and hardcore gamers. In: Proceedings of the 2016 Annual Symposium on Computer-Human Interaction in Play Companion Extended Abstracts. 2016. Presented at: CHI PLAY Companion '16; October 16-19, 2016:229-237; Austin, TX. URL: https://dl.acm.org/doi/10.1145/2968120.2987730 [CrossRef]
  58. Preoţiuc-Pietro D, Schwartz HA, Park G, Eichstaedt J, Kern M, Ungar L, et al. Modelling valence and arousal in Facebook posts. In: Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2016. Presented at: NAACL-HLT '16; June 12-17, 2016:9-15; San Diego, CA. URL: https://aclanthology.org/W16-0404.pdf [CrossRef]
  59. Rude S, Gortner EM, Pennebaker J. Language use of depressed and depression-vulnerable college students. Cogn Emot. Dec 2004;18(8):1121-1133. [CrossRef]
  60. Miháltz M, Váradi T, Csertő I, Fülöp É, Pólya T, Kővágó P. Beyond sentiment: social psychological analysis of political Facebook comments in Hungary. In: Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2015. Presented at: WASSA '15; September 17, 2015:127-133; Lisboa, Portugal. URL: https://aclanthology.org/W15-2918.pdf [CrossRef]
  61. Art. 20 GDPR: right to data portability. General Data Protection Regulation (GDPR). URL: https://gdpr-info.eu/art-20-gdpr/ [accessed 2024-01-30]
  62. Zook M, Barocas S, Boyd D, Crawford K, Keller E, Gangadharan SP, et al. Ten simple rules for responsible big data research. PLoS Comput Biol. Mar 30, 2017;13(3):e1005399. [FREE Full text] [CrossRef] [Medline]
  63. De Choudhury M, Kiciman E. Integrating artificial and human intelligence in complex, sensitive problem domains: experiences from mental health. AI Mag. Sep 2018;39(3):69-80. [CrossRef]
  64. Nebeker C, Murray K, Holub C, Haughton J, Arredondo EM. Acceptance of mobile health in communities underrepresented in biomedical research: barriers and ethical considerations for scientists. JMIR Mhealth Uhealth. Jun 28, 2017;5(6):e87. [FREE Full text] [CrossRef] [Medline]
  65. Myers J, Frieden TR, Bherwani KM, Henning KJ. Ethics in public health research: privacy and public health at risk: public health confidentiality in the digital age. Am J Public Health. May 2008;98(5):793-801. [CrossRef]
  66. Borghouts J, Eikey E, Mark G, De Leon C, Schueller SM, Schneider M, et al. Barriers to and facilitators of user engagement with digital mental health interventions: systematic review. J Med Internet Res. Mar 24, 2021;23(3):e24387. [FREE Full text] [CrossRef] [Medline]
  67. De Choudhury M. Anorexia on Tumblr: a characterization study. In: Proceedings of the 5th International Conference on Digital Health. 2015. Presented at: DH '15; May 18-20, 2015:43-50; Florence, Italy. URL: https://dl.acm.org/doi/10.1145/2750511.2750515 [CrossRef]
  68. Clark M. How we combat scraping. Meta. URL: https://about.fb.com/news/2021/04/how-we-combat-scraping/ [accessed 2024-01-30]
  69. Graph API. Meta. URL: https://developers.facebook.com/docs/graph-api/ [accessed 2024-01-30]
  70. Ho JC. Assessing the bias of Facebook's graph API. In: Proceedings of the 30th ACM Conference on Hypertext and Social Media. 2019. Presented at: HT '19; September 17-20, 2019:271-272; Hof, Germany. URL: https://dl.acm.org/doi/10.1145/3342220.3344923 [CrossRef]
  71. Rieder B. Studying Facebook via data extraction: the Netvizz application. In: Proceedings of the 5th Annual ACM Web Science Conference. 2013. Presented at: WebSci '13; May 2-4, 2013:346-355; Paris, France. URL: https://dl.acm.org/doi/10.1145/2464464.2464475 [CrossRef]
  72. National mental health survey of India, 2015-16. National Institute of Mental Health and Neuro Sciences. URL: https://www.who.int/docs/default-source/searo/india/health-topic-pdf/summary.pdf [accessed 2024-02-17]
  73. Bhuiyan AK, Sakib N, Pakpour AH, Griffiths MD, Mamun MA. COVID-19-related suicides in Bangladesh due to lockdown and economic factors: case study evidence from media reports. Int J Ment Health Addict. May 15, 2021;19(6):2110-2115. [FREE Full text] [CrossRef] [Medline]
  74. Patel V, Ramasundarahettige C, Vijayakumar L, Thakur JS, Gajalakshmi V, Gururaj G, et al. Million Death Study Collaborators. Suicide mortality in India: a nationally representative survey. Lancet. Jun 23, 2012;379(9834):2343-2351. [FREE Full text] [CrossRef] [Medline]
  75. Vornholt P, De Choudhury M. Understanding the role of social media-based mental health support among college students: survey and semistructured interviews. JMIR Ment Health. Jul 12, 2021;8(7):e24512. [FREE Full text] [CrossRef] [Medline]
  76. De Choudhury M, Counts S, Horvitz E, Hoff A. Characterizing and predicting postpartum depression from shared Facebook data. In: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. 2014. Presented at: CSCW '14; February 15-19, 2014:626-638; Baltimore, MD. URL: https://dl.acm.org/doi/10.1145/2531602.2531675 [CrossRef]
  77. India population (2024). Worldometer. URL: https://www.worldometers.info/world-population/india-population/ [accessed 2024-02-17]
  78. Bangladesh population (2024). Worldometer. URL: https://www.worldometers.info/world-population/bangladesh-population/ [accessed 2024-02-17]
  79. Tejaswini V, Babu KS, Sahoo B. Depression detection from social media text analysis using natural language processing techniques and hybrid deep learning model. ACM Trans Asian Low Resour Lang Inf Process. Jan 15, 2024;23(1):1-20. [CrossRef]
  80. Bathina KC, Ten Thij M, Lorenzo-Luaces L, Rutter LA, Bollen J. Individuals with depression express more distorted thinking on social media. Nat Hum Behav. Apr 11, 2021;5(4):458-466. [CrossRef] [Medline]
  81. Taquet P, Romo L, Cottencin O, Ortiz D, Hautekeete M. Video game addiction: cognitive, emotional, and behavioral determinants for CBT treatment. J Thér Comport Cogn. Sep 2017;27(3):118-128. [CrossRef]
  82. von der Heiden JM, Braun B, Müller KW, Egloff B. The association between video gaming and psychological functioning. Front Psychol. Jul 26, 2019;10:1731. [FREE Full text] [CrossRef] [Medline]
  83. Park M, Deajeon GD, Cha C, Cha M. Depressive moods of users portrayed in Twitter. In: Proceedings of the 2012 Conference on ACM SIGKDD Workshop on Health Informatics. 2012. Presented at: HI-KDD ’12; August 12, 2012:1-8; Beijing, China. URL: https://nyuscholars.nyu.edu/ws/portalfiles/portal/134720119/depressive_moods_kdd.pdf
  84. De Choudhury M, Counts S, Horvitz E. Predicting postpartum changes in emotion and behavior via social media. In: Proceedings of the 2013 SIGCHI Conference on Human Factors in Computing Systems. 2013. Presented at: CHI '13; April 27-May 2, 2013:3267-3276; Paris, France. [CrossRef]
  85. Moon KC, Van Meter AR, Kirschenbaum MA, Ali A, Kane JM, Birnbaum ML. Internet search activity of young people with mood disorders who are hospitalized for suicidal thoughts and behaviors: qualitative study of google search activity. JMIR Ment Health. Oct 22, 2021;8(10):e28262. [FREE Full text] [CrossRef] [Medline]
  86. Koyanagi A, DeVylder JE, Stubbs B, Carvalho AF, Veronese N, Haro JM, et al. Depression, sleep problems, and perceived stress among informal caregivers in 58 low-, middle-, and high-income countries: a cross-sectional analysis of community-based surveys. J Psychiatr Res. Jan 2018;96:115-123. [CrossRef] [Medline]
  87. Kim J, Aryee LM, Bang H, Prajogo S, Choi YK, Hoch JS, et al. Effectiveness of digital mental health tools to reduce depressive and anxiety symptoms in low- and middle-income countries: systematic review and meta-analysis. JMIR Ment Health. Mar 20, 2023;10:e43066. [FREE Full text] [CrossRef] [Medline]
  88. Naaman M, Zhang A, Brody S, Lotan G. On the study of diurnal urban routines on Twitter. Proc Int AAAI Conf Web Soc Media. Aug 03, 2021;6(1):258-265. [CrossRef]
  89. Chen Z, Lu X, Ai W, Li H, Mei Q, Liu X. Through a gender lens: learning usage patterns of emojis from large-scale android users. In: Proceedings of the 2018 World Wide Web Conference. 2018. Presented at: WWW '18; April 23-27, 2018:763-772; Lyon, France. URL: https://dl.acm.org/doi/10.1145/3178876.3186157 [CrossRef]
  90. Chandra Guntuku S, Li M, Tay L, Ungar LH. Studying cultural differences in emoji usage across the East and the West. Proc Int AAAI Conf Web Soc Media. Jul 06, 2019;13:226-235. [CrossRef]
  91. Kejriwal M, Wang Q, Li H, Wang L. An empirical study of emoji usage on Twitter in linguistic and national contexts. Online Soc Netw Media. Jul 2021;24:100149. [CrossRef]
  92. Kimura M, Katsurai M. Automatic construction of an emoji sentiment lexicon. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2017. Presented at: ASONAM '17; July 31-August 3, 2017:1033-1036; Sydney, Australia. URL: https://dl.acm.org/doi/10.1145/3110025.3110139 [CrossRef]
  93. Meghanathan N. A computationally lightweight and localized centrality metric in lieu of betweenness centrality for complex network analysis. Vietnam J Comput Sci. Jun 17, 2016;4(1):23-38. [CrossRef]
  94. Ge J. Emoji sequence use in enacting personal identity. In: Companion Proceedings of the 2019 World Wide Web Conference. 2019. Presented at: WWW '19; May 13-17, 2019:426-438; San Francisco, CA. URL: https://dl.acm.org/doi/10.1145/3308560.3316545 [CrossRef]
  95. Tadesse MM, Lin H, Xu B, Yang L. Detection of depression-related posts in reddit social media forum. IEEE Access. 2019;7:44883-44893. [CrossRef]
  96. Andalibi N, Ozturk P, Forte A. Sensitive self-disclosures, responses, and social support on Instagram: the case of #Depression. In: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 2017. Presented at: CSCW '17; February 25-March 1, 2017:1485-1500; Portland, OR. URL: https://dl.acm.org/doi/10.1145/2998181.2998243 [CrossRef]
  97. Thornton L, Batterham PJ, Fassnacht DB, Kay-Lambkin F, Calear AL, Hunt S. Recruiting for health, medical or psychosocial research using Facebook: systematic review. Internet Interv. May 2016;4:72-81. [FREE Full text] [CrossRef] [Medline]
  98. Tourangeau R, Yan T. Sensitive questions in surveys. Psychol Bull. Sep 2007;133(5):859-883. [CrossRef] [Medline]
  99. Caliskan A, Bryson JJ, Narayanan A. Semantics derived automatically from language corpora contain human-like biases. Science. Apr 14, 2017;356(6334):183-186. [FREE Full text] [CrossRef] [Medline]
  100. Zirikly A, Resnik P, Uzuner Ö, Hollingshead K. CLPsych 2019 shared task: predicting the degree of suicide risk in reddit posts. In: Proceedings of the 6th Workshop on Computational Linguistics and Clinical Psychology. 2019. Presented at: CLPsych '19; June 6, 2019:24-33; Minneapolis, MN. URL: https://aclanthology.org/W19-3003.pdf [CrossRef]
  101. Hoermann S, McCabe KL, Milne DN, Calvo RA. Application of synchronous text-based dialogue systems in mental health interventions: systematic review. J Med Internet Res. Jul 21, 2017;19(8):e267. [FREE Full text] [CrossRef] [Medline]


API: application programming interface
LIWC: Linguistic Inquiry and Word Count
LMIC: low- or middle-income country
MDD: major depressive disorder
MI: mutual information
ML: machine learning
NLP: natural language processing
PHQ-9: 9-item Patient Health Questionnaire
PVP: problem video game playing
SM: social media
WGT: weekly gaming time


Edited by A Mavragani; submitted 15.12.23; peer-reviewed by M Elbattah, N Yahagi; comments to author 28.01.24; revised version received 17.02.24; accepted 29.02.24; published 07.10.24.

Copyright

©Manoshi Das Turjo, Khushboo Suchit Mundada, Nuzhat Jabeen Haque, Nova Ahmed. Originally published in JMIR Research Protocols (https://www.researchprotocols.org), 07.10.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Research Protocols, is properly cited. The complete bibliographic information, a link to the original publication on https://www.researchprotocols.org, as well as this copyright and license information must be included.