Analyzing social media data: A mixed-methods framework combining computational and qualitative text analysis

  • Published: 02 April 2019
  • Volume 51, pages 1766–1781 (2019)

  • Matthew Andreotta (1, 2),
  • Robertus Nugroho (2, 3),
  • Mark J. Hurlstone (1),
  • Fabio Boschetti (4),
  • Simon Farrell (1),
  • Iain Walker (5) &
  • Cecile Paris (2)

To qualitative researchers, social media offers a novel opportunity to harvest a massive and diverse range of content without the need for intrusive or intensive data collection procedures. However, performing a qualitative analysis across a massive social media data set is cumbersome and impractical. Instead, researchers often extract a subset of content to analyze, but a framework to facilitate this process is currently lacking. We present a four-phased framework for improving this extraction process, which blends the capacities of data science techniques to compress large data sets into smaller spaces, with the capabilities of qualitative analysis to address research questions. We demonstrate this framework by investigating the topics of Australian Twitter commentary on climate change, using quantitative (non-negative matrix inter-joint factorization; topic alignment) and qualitative (thematic analysis) techniques. Our approach is useful for researchers seeking to perform qualitative analyses of social media, or researchers wanting to supplement their quantitative work with a qualitative analysis of broader social context and meaning.

Social scientists use qualitative modes of inquiry to explore the detailed descriptions of the world that people see and experience (Pistrang & Barker, 2012). To collect the voices of people, researchers can elicit textual descriptions of the world through interview or survey methodologies. However, with the popularity of the Internet and social media technologies, new avenues for data collection are possible. Social media platforms allow users to create content (e.g., Weinberg & Pehlivan, 2011) and interact with other users (e.g., Correa, Hinsley, & de Zúñiga, 2011; Kietzmann, Hermkens, McCarthy, & Silvestre, 2010), in settings where “Anyone can say Anything about Any topic” (AAA slogan, Allemang & Hendler, 2011, p. 6). Combined with the high rate of content production, social media platforms can offer researchers massive, diverse, and dynamic data sets (Yin & Kaynak, 2015; Gudivada et al., 2015). With technologies increasingly capable of harvesting, storing, processing, and analyzing this data, researchers can now explore data sets that would be infeasible to collect through more traditional qualitative methods.

Many social media platforms can be considered textual corpora, willingly and spontaneously authored by millions of users. Researchers can compile a corpus using automated tools and conduct qualitative inquiries of content or focused analyses of specific users (Marwick, 2014). In this paper, we outline some of the opportunities and challenges of applying qualitative textual analyses to the big data of social media. Specifically, we present a conceptual and pragmatic justification for combining qualitative textual analyses with data science text-mining tools. This process allows us to both embrace and cope with the volume and diversity of commentary on social media. We then demonstrate this approach in a case study investigating Australian commentary on climate change, using content from the social media platform Twitter.

Opportunities and challenges for qualitative researchers using social media data

Through social media, qualitative researchers gain access to a massive and diverse range of individuals and the content they generate. Researchers can identify voices that may not otherwise be heard through more traditional approaches, such as semi-structured interviews and Internet surveys with open-ended questions. This can be done through diagnostic queries that capture the activity of specific people, places, events, times, or topics. Diagnostic queries may specify geotagged content, the time of content creation, the textual content of user activity, and the online profiles of users. For example, Freelon et al. (2018) identified the Twitter activity of three separate communities (‘Black Twitter’, ‘Asian-American Twitter’, ‘Feminist Twitter’) through the use of hashtags Footnote 1 in tweets from 2015 to 2016. A similar process can be used to capture specific events or moments (Procter et al., 2013; Denef et al., 2013), places (Lewis et al., 2013), and specific topics (Hoppe, 2009; Sharma et al., 2017).

Collecting social media data may be more scalable than traditional approaches. Once equipped with the resources to access and process data, researchers can potentially scale up data harvesting with little additional effort. This differs from interviews and surveys, where collecting data can require an effortful and time-consuming contribution from participants and researchers.

Social media analyses may also be more ecologically valid than traditional approaches. Unlike approaches where responses from participants are elicited in artificial social contexts (e.g., Internet surveys, laboratory-based interviews), social media data emerges from real-world social environments encompassing a large and diverse range of people, without any prompting from researchers. Thus, in comparison with traditional methodologies (Onwuegbuzie & Leech, 2007; Lietz & Zayas, 2010; McKechnie, 2008), participant behavior is relatively, if not entirely, unconstrained by the behaviors of researchers.

These opportunities also come with challenges, which arise from the following attributes of social media (Parker et al., 2011). Firstly, social media can be interactive: its content involves the interactions of users with other users (e.g., conversations), or even with external websites (e.g., links to news websites). The ill-defined boundaries of user interaction complicate the choice of units of analysis for a qualitative study. For example, conversations can be lengthy, involve multiple users, and lack a clear structure or end-point. Interactivity thus blurs the boundaries between users, their content, and external content (Herring, 2009; Parker et al., 2011). Secondly, content can be ephemeral and dynamic. Users, and the content they post, are transient (Parker et al., 2011; Boyd & Crawford, 2012; Weinberg & Pehlivan, 2011). This feature arises from the diversity of users, the dynamic socio-cultural context surrounding platform use, and the freedom users have to create, distribute, display, and dispose of their content (Marwick & Boyd, 2011). Lastly, social media content is massive in volume. The accumulated postings of users can amount to a large quantity of data, and because content is diverse and dynamic, postings may be largely unrelated to one another and may accumulate over a short period of time. Researchers hoping to harness the opportunities of social media data sets must therefore develop strategies for coping with these challenges.

A framework integrating computational and qualitative text analyses

Our framework, a mixed-method approach blending the capabilities of data science techniques with the capacities of qualitative analysis, is shown in Fig. 1. We overcome the challenges of social media data by automating some aspects of data collection and consolidation, so that the qualitative researcher is left with a manageable volume of data to synthesize and interpret. Broadly, our framework consists of the following four phases: (1) harvest social media data and compile a corpus, (2) use data science techniques to compress the corpus along a dimension of relevance, (3) extract a subset of data from the most relevant spaces of the corpus, and (4) perform a qualitative analysis on this subset of data.

Fig. 1 Schematic overview of the four-phased framework

Phase 1: Harvest social media data and compile a corpus

Researchers can use automated tools to query records of social media data, extract this data, and compile it into a corpus. Researchers may query for content posted in a particular time frame (Procter et al., 2013), content containing specified terms (Sharma et al., 2017), content posted by users meeting particular characteristics (Denef et al., 2013; Lewis et al., 2013), and content pertaining to a specified location (Hoppe, 2009).

Phase 2: Use data science techniques to compress the corpus along a dimension of relevance

Although researchers may be interested in examining the entire data set, it is often more practical to focus on a subsample of data (McKenna et al., 2017). Specifically, we advocate dividing the corpus along a dimension of relevance, and sampling from spaces that are more likely to be useful for addressing the research questions under consideration. By relevance, we refer to an attribute of content that is both useful for addressing the research questions and usable for the planned qualitative analysis.

To organize the corpus along a dimension of relevance, researchers can use automated, computational algorithms. This process provides both formal and informal advantages for the subsequent qualitative analysis. Formally, algorithms can assist researchers in privileging an aspect of the corpus most relevant for the current inquiry. For example, topic modeling clusters massive content into semantic topics, a process that would be infeasible using human coders alone. A plethora of techniques exist for separating social media corpora on the basis of useful aspects, such as sentiment (e.g., Agarwal, Xie, Vovsha, Rambow, & Passonneau, 2010; Paris, Christensen, Batterham, & O’Dea, 2015; Pak & Paroubek, 2011) and influence (Weng et al., 2010).

Algorithms also offer an informal advantage for qualitative analysis. As mentioned, it is often infeasible for analysts to explore large data sets using qualitative techniques. Computational models of content can allow researchers to consider meaning at the corpus level when interpreting individual data points or relationships within a subset of data. For example, in an inspection of 2.6 million tweets, Procter et al. (2013) used the output of an information flow analysis to derive rudimentary codes for inspecting individual tweets. Thus, algorithmic output can form a meaningful scaffold for qualitative analysis by providing analysts with summaries of potentially disjunct and multifaceted data (due to the interactive, ephemeral, and dynamic attributes of social media).

Phase 3: Extract a subset of data from the most relevant spaces of the corpus

Once the corpus is organized on the basis of relevance, researchers can extract the data most relevant for answering their research questions, in an amount manageable for qualitative analysis. For example, if the most relevant space of the corpus is too large for qualitative analysis, the researcher may choose to randomly sample from that space. If the most relevant space is too small, the researcher may revisit Phase 2 and adopt a more lenient criterion of relevance.

Phase 4: Perform a qualitative analysis on this subset of data

The final phase involves performing the qualitative analysis to address the research question. As discussed above, researchers may draw on the computational models as a preliminary guide to the data.

Contextualizing the framework within previous qualitative social media studies

The proposed framework generalizes a number of previous approaches (Collins & Nerlich, 2015; McKenna et al., 2017) and individual studies (e.g., Lewis et al., 2013; Newman, 2016), in particular that of Marwick (2014). In Marwick’s general description of qualitative analysis of social media textual corpora, researchers: (1) harvest and compile a corpus, (2) extract a subset of the corpus, and (3) perform a qualitative analysis on the subset. As shown in Fig. 1, our framework differs in that we introduce formal considerations of relevance and use quantitative techniques to inform the extraction of a subset of data. Although researchers sometimes identify a subset of data most relevant to answering their research question, they seldom deploy data science techniques to identify it. Instead, researchers typically depend on cruder measures to isolate relevant data. For example, researchers have used the number of repostings of user content to quantify influence and recognition (e.g., Newman, 2016).

The steps in the framework may not be obvious without a concrete example. Next, we demonstrate our framework by applying it to Australian commentary regarding climate change on Twitter.

Application Example: Australian Commentary regarding Climate Change on Twitter

Social media platform of interest.

We chose to explore user commentary on climate change on Twitter. Twitter activity contains information about the textual content generated by users (i.e., the content of tweets), interactions between users, and the time of content creation (Veltri & Atanasova, 2017). This allows us to examine the content of user communication, taking into account the temporal and social contexts of their behavior. Twitter data is also relatively easy for researchers to access: many tweets reside in the public domain and are accessible through free APIs.

The characteristics of Twitter’s platform are also favorable for data analysis. An established literature describes computational techniques and considerations for interpreting Twitter data. We used the approaches and findings from other empirical investigations to inform our approach. For example, we drew on past literature to inform the process of identifying which tweets were related to climate change.

Public discussion on climate change

Climate change is one of the greatest challenges facing humanity (Schneider, 2011). Steps to prevent and mitigate the damaging consequences of climate change require changes at political, societal, and individual levels (Lorenzoni & Pidgeon, 2006). Insights into public commentary can inform decision making and the communication of climate policy and science.

Traditionally, public perceptions are investigated through survey designs and qualitative work (Lorenzoni & Pidgeon, 2006). Inquiries into social media allow researchers to explore a large and diverse range of climate change-related dialogue (Auer et al., 2014). Yet existing inquiries into Twitter activity are few in number and typically constrained to specific events related to climate change, such as the release of the Fifth Assessment Report by the Intergovernmental Panel on Climate Change (Newman, 2016; O’Neill et al., 2015; Pearce et al., 2014) and the 2015 United Nations Climate Change Conference, held in Paris (Pathak et al., 2017).

When longer time scales are explored, most researchers rely heavily on computational methods to derive topics of commentary. For example, Kirilenko and Stepchenkova (2014) examined the topics of climate change tweets posted in 2012, as indicated by the most prevalent hashtags. Although hashtags can mark the topics of tweets, they are a crude measure: tweets without hashtags are omitted from analysis, and not all topics are indicated via hashtags (e.g., Nugroho, Yang, Zhao, Paris, & Nepal, 2017). In a more sophisticated approach, Veltri and Atanasova (2017) examined the co-occurrence of terms using hierarchical clustering techniques to map the semantic space of climate change tweet content from the year 2013. They identified four themes: (1) “calls for action and increasing awareness”, (2) “discussions about the consequences of climate change”, (3) “policy debate about climate change and energy”, and (4) “local events associated with climate change” (p. 729).

Our research builds on the existing literature in two ways. Firstly, we explore a new data set: Australian tweets over the year 2016. Secondly, in comparison to existing research on Twitter data spanning long time periods, we use qualitative techniques to provide a more nuanced understanding of the topics of climate change commentary. By applying our mixed-methods framework, we address our research question: what are the common topics of Australians’ tweets about climate change?

Outline of approach

We employed our four-phased framework as shown in Fig. 2. Firstly, we harvested climate change tweets posted in Australia in 2016 and compiled a corpus (phase 1). We then utilized a topic modeling technique (Nugroho et al., 2017) to organize the diverse content of the corpus into a number of topics. We were interested in topics that commonly appeared throughout the time period of data collection, and less interested in more transitory topics. To identify enduring topics, we used a topic alignment algorithm (Chuang et al., 2015) to group similar topics occurring repeatedly throughout 2016 (phase 2). This process allowed us to identify the topics most relevant to our research question. From each of these, we extracted a manageable subset of data (phase 3). We then performed a qualitative thematic analysis (see Braun & Clarke, 2006) on this subset of data to inductively derive themes and answer our research question (phase 4). Footnote 2

Fig. 2 Flowchart of application of a four-phased framework for conducting qualitative analyses using data science techniques. We were most interested in topics that frequently occurred throughout the period of data collection. To identify these, we organized the corpus chronologically and divided it into batches of content. Using computational techniques (shown in blue), we uncovered topics in each batch and identified similar topics that repeatedly occurred across batches. When identifying topics in each batch, we generated three alternative representations of topics (5, 10, and 20 topics in each batch, shown in yellow). In stages highlighted in green, we determined the quality of these representations, ultimately selecting the five-topics-per-batch solution.

Phase 1: Compiling a corpus

To search Australians’ Twitter data, we used CSIRO’s Emergency Situation Awareness (ESA) platform (CSIRO, 2018). The platform was originally built to detect, track, and report on unexpected incidents related to crisis situations (e.g., fires, floods; see Cameron, Power, Robinson, & Yin, 2012). To do so, the ESA platform harvests tweets based on a location search that covers most of Australia and New Zealand.

The ESA platform archives the harvested tweets, which may be used for other CSIRO research projects. From this archive, we retrieved tweets satisfying three criteria: (1) tweets must be associated with an Australian location, (2) tweets must be harvested from the year 2016, and (3) the content of tweets must be related to climate change. We tested the viability of different markers of climate change tweets used in previous empirical work (Jang & Hart, 2015; Newman, 2016; Holmberg & Hellsten, 2016; O’Neill et al., 2015; Pearce et al., 2014; Sisco et al., 2017; Swain, 2017; Williams et al., 2015) by informally inspecting the content of tweets matching each marker. Ultimately, we employed five terms (or combinations of terms) reliably associated with climate change: (1) “climate” AND “change”; (2) “#climatechange”; (3) “#climate”; (4) “global” AND “warming”; and (5) “#globalwarming”. This yielded a corpus of 201,506 tweets.
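The three retrieval criteria can be expressed as a single filter. The sketch below is a minimal illustration, assuming a hypothetical `Tweet` record with `text`, `created_at`, and `country` fields; the ESA archive's actual schema and matching rules are not described here. We also assume case-insensitive matching on whole words and hashtags, with “AND” meaning that both words must appear somewhere in the tweet.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# The five climate change markers; a marker matches when all of its tokens appear.
MARKERS = [
    {"climate", "change"},
    {"#climatechange"},
    {"#climate"},
    {"global", "warming"},
    {"#globalwarming"},
]

TOKEN_RE = re.compile(r"#?\w+")  # words, optionally prefixed by '#'


@dataclass
class Tweet:
    """Hypothetical record layout, not the ESA platform's schema."""
    text: str
    created_at: datetime
    country: str


def is_climate_tweet(text: str) -> bool:
    """True if the text contains any marker (case-insensitive, whole tokens only)."""
    tokens = {t.lower() for t in TOKEN_RE.findall(text)}
    return any(marker <= tokens for marker in MARKERS)


def meets_criteria(tweet: Tweet) -> bool:
    """Apply the three criteria: Australian location, posted in 2016, climate change content."""
    return (
        tweet.country == "AU"
        and tweet.created_at.year == 2016
        and is_climate_tweet(tweet.text)
    )
```

Under this reading, “#climate” matches only the exact hashtag, so a tweet tagged only with “#climateaction” would not be retrieved.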

Phase 2: Using data science techniques to compress the corpus along a dimension of relevance

The next step was to organize the collection of tweets into distinct topics. A topic is an abstract representation of semantically related words and concepts. Each tweet belongs to a topic, and each topic may be represented as a list of keywords (i.e., prominent words of tweets belonging to the topic).

A vast literature surrounds the computational derivation of topics within textual corpora, and specifically within Twitter corpora (Ramage et al., 2010; Nugroho et al., 2017; Fang et al., 2016a; Chuang et al., 2014). Popular methods for deriving topics include probabilistic latent semantic analysis (Hofmann, 1999), non-negative matrix factorization (Lee & Seung, 2000), and latent Dirichlet allocation (Blei et al., 2003). These approaches use patterns of co-occurrence of terms within documents to derive topics, and they work best on long documents. Tweets, however, are short, so only a few unique terms may co-occur between tweets. Consequently, approaches that rely on patterns of term co-occurrence suffer from data sparsity in the Twitter environment. Moreover, these approaches ignore valuable social and temporal information (Nugroho et al., 2017). For example, consider a tweet t₁ and its reply t₂. The reply feature of Twitter allows users to react to tweets and enter conversations. Therefore, t₁ and t₂ are likely related in topic by virtue of the reply interaction.

To address sparsity concerns, we adopt the non-negative matrix inter-joint factorization (NMijF) of Nugroho et al. (2017). This process uses both tweet content (i.e., the patterns of co-occurrence of terms amongst tweets) and the socio-temporal relationships between tweets (i.e., similarities in the users mentioned in tweets, whether a tweet is a reply to another tweet, and whether tweets are posted at a similar time) to derive topics (see Supplementary Material). The NMijF method has been demonstrated to outperform other topic modeling techniques on Twitter data (Nugroho et al., 2017).

Dividing the corpus into batches

Deriving many topics across a data set of hundreds of thousands of tweets is prohibitively expensive in computational terms. Therefore, we divided the corpus into smaller batches and derived the topics of each batch. To preserve the temporal relationships amongst tweets (e.g., the timestamps of the tweets), the batches were organized chronologically. The data was partitioned into 41 disjoint batches (40 batches of 5000 tweets and one batch of 1506 tweets).
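The chronological batching can be sketched as a sort followed by fixed-size slicing. This is an illustration only: the `created_at` field name is an assumption, and the sketch leaves the final, shorter batch as it is, which reproduces the reported split of 201,506 tweets into 40 batches of 5000 and one of 1506.

```python
def chronological_batches(tweets, batch_size=5000, key=lambda tweet: tweet["created_at"]):
    """Sort tweets by timestamp and cut them into consecutive, disjoint batches.

    The last batch holds whatever remains (it may be smaller than batch_size).
    `created_at` is a hypothetical field name.
    """
    ordered = sorted(tweets, key=key)
    return [ordered[start:start + batch_size] for start in range(0, len(ordered), batch_size)]


# Worked arithmetic for the corpus size reported in the text.
n_tweets = 201_506
n_full, remainder = divmod(n_tweets, 5000)     # 40 full batches, 1506 tweets left over
n_batches = n_full + (1 if remainder else 0)   # 41 batches in total
```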

Generating topical representations for each batch

Following standard topic modeling practice, we removed features from each tweet that may compromise the quality of the topic derivation process. These features include emoticons, punctuation, terms with fewer than three characters, stop-words (for the list of stop-words, see MySQL, 2018), and phrases used to harvest the data (e.g., “#climatechange”). Footnote 3 Following this, the terms remaining in tweets were stemmed using the Natural Language Toolkit for Python (Bird et al., 2009). All stemmed terms were then tokenized for processing.
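A minimal sketch of this cleaning step follows. Several details are assumptions rather than the authors' code: the stop-word set is a tiny illustrative subset of the MySQL list, harvest phrases are removed as whole phrases before tokenizing, and hashtags become plain words once their “#” is discarded as punctuation. The stemmer is passed in as a function; NLTK's `PorterStemmer().stem` is one option, though the text does not say which NLTK stemmer was used. Because tokenizing keeps only runs of letters and digits, punctuation, emoji, and emoticon symbols are dropped in the same pass, and stray letters from emoticons such as “:D” fall under the three-character minimum.

```python
import re

# Tiny illustrative subset; the study used the full MySQL stop-word list.
STOP_WORDS = {"the", "and", "for", "this", "that", "with", "are", "was", "about", "from"}

# Phrases used to harvest the corpus in Phase 1, longest first so that
# "#climatechange" is tried before "#climate".
HARVEST_PHRASES = sorted(
    ["climate change", "#climatechange", "#climate", "global warming", "#globalwarming"],
    key=len,
    reverse=True,
)
# Match a phrase only when it is not glued to other word characters,
# so "#climateaction" is left intact.
HARVEST_RE = re.compile(
    r"(?<![\w#])(?:" + "|".join(re.escape(p) for p in HARVEST_PHRASES) + r")(?!\w)"
)


def preprocess(text, stem=lambda word: word):
    """Return the cleaned, stemmed terms of one tweet."""
    text = HARVEST_RE.sub(" ", text.lower())
    # Keeping only letter/digit runs drops punctuation, emoji, and emoticon symbols.
    words = re.findall(r"[a-z0-9]+", text)
    # Drop short terms (fewer than three characters) and stop-words, then stem.
    return [stem(w) for w in words if len(w) >= 3 and w not in STOP_WORDS]
```

For example, `preprocess("Climate change is wrecking the Great Barrier Reef :( #climatechange")` keeps only “wrecking”, “great”, “barrier”, and “reef”.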

The NMijF topic derivation process requires three parameters (see Supplementary Material for more details). We set two of these parameters to the values recommended by Nugroho et al. (2017), based on empirical analysis. The final parameter, the number of topics derived from each batch (k), is difficult to estimate a priori and must be chosen with some care. If k is too small, the keywords and tweets belonging to a topic may be difficult to conceptualize as a single, coherent, and meaningful topic. If k is too large, the keywords and tweets belonging to a topic may be too specific and obscure. To determine a reasonable value of k, we ran the NMijF process on each batch with three different values of the parameter: 5, 10, and 20 topics per batch. Across the 41 batches, this process generated three different representations of the corpus: 205, 410, and 820 topics. For each of these representations, each tweet was classified into one (and only one) topic. We represented each topic as a list of the ten keywords most prevalent within the tweets of that topic.
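Describing each topic by its ten most prevalent keywords can be sketched as below, given each tweet's cleaned terms and its single topic assignment. We assume “prevalent” means the number of a topic's tweets that contain the term, and we break ties alphabetically so the output is deterministic; the text specifies neither choice. The sketch does not reproduce NMijF itself, only the summarization of its output.

```python
from collections import Counter, defaultdict


def topic_keywords(tokenized_tweets, assignments, n_keywords=10):
    """Map each topic to its n_keywords most prevalent terms.

    tokenized_tweets: list of term lists, one per tweet.
    assignments: topic id per tweet (each tweet belongs to exactly one topic).
    """
    if len(tokenized_tweets) != len(assignments):
        raise ValueError("need exactly one topic assignment per tweet")
    counts = defaultdict(Counter)
    for terms, topic in zip(tokenized_tweets, assignments):
        counts[topic].update(set(terms))  # count each term at most once per tweet
    return {
        topic: [term for term, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n_keywords]]
        for topic, c in counts.items()
    }


# Number of topics in each representation: 41 batches times k topics per batch.
REPRESENTATION_SIZES = {k: 41 * k for k in (5, 10, 20)}  # {5: 205, 10: 410, 20: 820}
```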

Assessing the quality of topical representations

To select a topical representation for further analysis, we inspected the quality of each. Initially, we considered the use of a completely automatic process to assess or produce high-quality topic derivations. However, our attempts to use completely automated techniques on tweets with a known topic structure failed to produce correct or reasonable solutions. Thus, we assessed quality using human assessment (see Table 1). The first stage involved inspecting each topical representation of the corpus (205, 410, and 820 topics) and manually flagging any topics that were clearly problematic. Specifically, we examined each topical representation to determine whether topics represented as separate were in fact distinguishable from one another. We discovered that the 820 topic representation (20 topics per batch) contained many closely related topics.

To quantify the distinctiveness between topics, we compared each topic to every other topic in the same batch in an automated process. If two topics shared three or more (of ten) keywords, these topics were deemed similar. We adopted this threshold from existing topic modeling work (Fang et al., 2016a, b) and verified it through an informal inspection, which showed that pairs of topics below this threshold were less similar than those at or above it. Using this threshold, the 820 topic representation was identified as less distinctive than the other representations. Of the 41 batches, nine contained at least two similar topics in the 820 topic representation (cf. zero batches for the 205 topic representation and two batches for the 410 topic representation). As a result, we chose to exclude the 820 topic representation from further analysis.
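The automated distinctiveness check reduces to asking, for each batch, whether any two of its topics share at least three of their ten keywords. A minimal sketch, assuming each batch is given as a list of ten-keyword lists (the function names are illustrative):

```python
from itertools import combinations


def shared_keywords(a, b):
    """Number of keywords two topics have in common."""
    return len(set(a) & set(b))


def batches_with_similar_topics(batches, min_shared=3):
    """Count batches in which at least one pair of topics shares min_shared or more keywords.

    batches: list of batches; each batch is a list of topics; each topic is a keyword list.
    """
    return sum(
        any(shared_keywords(a, b) >= min_shared for a, b in combinations(topics, 2))
        for topics in batches
    )
```

Under these assumptions, this is the count the text reports as nine, two, and zero batches for the 820, 410, and 205 topic representations, respectively.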

The second stage of quality assessment involved inspecting the quality of individual topics. To achieve this, we adopted the pairwise topic preference task outlined by Fang et al. (2016a, b). In this task, raters were shown pairs of similar topics (each represented as ten keywords), one from the 205 topic representation and the other from the 410 topic representation. To assist in their interpretation of topics, raters could also view three tweets belonging to each topic. For each pair of topics, raters indicated which topic they believed was superior, on the basis of coherency, meaning, interpretability, and the related tweets (see Table 1). Through aggregating responses, a relative measure of quality could be derived.

Initially, members of the research team assessed 24 pairs of topics. Results from the task did not indicate a marked preference for either topical representation. To confirm this impression more objectively, we recruited participants from the Australian community as raters. We used Qualtrics—an online survey platform and recruitment service—to recruit 154 Australian participants, matched with the general Australian population on age and gender. Each participant completed judgments on 12 pairs of similar topics (see Supplementary Material for further information).

Participants generally preferred the 410 topic representation over the 205 topic representation (M = 6.45 of 12 judgments, SD = 1.87). Of 154 participants, 35 were classified as indifferent (selected both topic representations an equal number of times), 74 preferred the 410 topic representation (i.e., selected the 410 topic representation more often than the 205 topic representation), and 45 preferred the 205 topic representation (i.e., selected the 205 topic representation more often than the 410 topic representation). We conducted binomial tests to determine whether the proportion of participants of each of these three types differed reliably from chance (1/3 ≈ 0.33). The proportion of indifferent participants (0.23) was reliably lower than chance (p = 0.005), whereas the proportion of participants preferring the 205 topic solution (0.29) did not differ reliably from chance (p = 0.305). Critically, the proportion of participants preferring the 410 topic solution (0.48) was reliably higher than expected by chance (p < 0.001). Overall, this pattern indicates a participant preference for the 410 topic representation over the 205 topic representation.
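The participant classification and the binomial tests can be reproduced in a few lines. This sketch uses an exact two-sided test that sums the probabilities of all outcomes no more likely than the observed count (the convention of R's `binom.test`); the text does not state which two-sided convention was used, so p values under other conventions may differ slightly. Exact fractions avoid rounding problems when comparing probabilities. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

CHANCE = Fraction(1, 3)  # three participant types, equally likely by chance


def classify(n_410_choices, n_judgments=12):
    """Label a participant by how often they chose the 410 topic representation."""
    n_205_choices = n_judgments - n_410_choices
    if n_410_choices == n_205_choices:
        return "indifferent"
    return "prefer_410" if n_410_choices > n_205_choices else "prefer_205"


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k, n, p=CHANCE):
    """Exact two-sided binomial test: total probability of outcomes at most as likely as k."""
    observed = binom_pmf(k, n, p)
    total = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1)) if q <= observed)
    return float(total)


counts = {"indifferent": 35, "prefer_205": 45, "prefer_410": 74}
n_participants = sum(counts.values())  # 154
p_values = {label: binom_test_two_sided(k, n_participants) for label, k in counts.items()}
```

With the reported counts, the tests reproduce the qualitative pattern above: indifference reliably below chance, the 205 preference indistinguishable from chance, and the 410 preference reliably above chance.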

In summary, no topical representation was unequivocally superior. On a batch level, the 410 topic representation contained more batches of non-distinct topic solutions than the 205 topic representation, indicating that the 205 topic representation contained topics which were more distinct. In contrast, on the level of individual topics, the 410 topic representation was preferred by human raters. We use this information, in conjunction with the utility of corresponding aligned topics (see below), to decide which representation is most suitable for our research purposes.

Grouping similar topics repeated in different batches

We were most interested in topics that occurred throughout the year (i.e., in multiple batches) to identify the most stable components of climate change commentary (phase 3). We grouped similar topics from different batches using a topical alignment algorithm (see Chuang et al., 2015). This process requires a similarity metric and a similarity threshold. The similarity metric represents the similarity between two topics, which we specified as the proportion of shared keywords (from 0, no keywords shared, to 1, all ten keywords shared). The similarity threshold is a value below which two topics are deemed dissimilar. As above, we set the threshold to 0.3 (three of ten keywords shared); if two topics shared two or fewer keywords, the topics could not justifiably be classified as similar. To delineate important topics, groups of topics, and other concepts, we have provided a glossary of terms in Table 2.

The topic alignment algorithm is initialized by assigning each topic to its own group. The alignment algorithm iteratively merges the two most similar groups, where the similarity between groups is the maximum similarity between a topic belonging to one group and another topic belonging to the other. Only topics from different groups (by definition, topics from the same group are already grouped as similar) and different batches (by definition, topics from the same batch cannot be similar) can be grouped. This process continues, merging similar groups until no compatible groups remain. We found our initial implementation generated groups of largely dissimilar topics. To address this, we introduced an additional constraint—groups could only be merged if the mean similarity between pairs of topics (each belonging to the two groups in question) was greater than the similarity threshold. This process produced groups of similar topics. Functionally, this allowed us to detect topics repeated throughout the year.
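The alignment procedure described above can be sketched as greedy agglomerative merging. This is our reading of the text, not the implementation of Chuang et al. (2015): group similarity is the best match between any two of their topics, a group may never hold two topics from the same batch, and a merge must also pass the mean-similarity constraint. We treat both comparisons as “at least the threshold”, which is an assumption (the text says the mean must be greater than the threshold), and we use exact fractions so that a similarity of exactly 0.3 is not lost to floating-point rounding. The search is quadratic in the number of groups per merge, which is manageable for a few hundred topics.

```python
from fractions import Fraction
from itertools import product

THRESHOLD = Fraction(3, 10)  # three of ten keywords shared


def similarity(a, b):
    """Proportion of shared keywords between two equal-length keyword lists (0 to 1)."""
    return Fraction(len(set(a) & set(b)), len(a))


def align_topics(topics, threshold=THRESHOLD):
    """Group similar topics across batches.

    topics: list of (batch_id, keywords) pairs.
    Returns groups as lists of indices into `topics`.
    """
    groups = [[i] for i in range(len(topics))]  # each topic starts in its own group
    while True:
        best = None
        for gi in range(len(groups)):
            for gj in range(gi + 1, len(groups)):
                a, b = groups[gi], groups[gj]
                # A group may not contain two topics from the same batch.
                if {topics[i][0] for i in a} & {topics[j][0] for j in b}:
                    continue
                sims = [similarity(topics[i][1], topics[j][1]) for i, j in product(a, b)]
                link = max(sims)              # group similarity: the most similar pair
                mean = sum(sims) / len(sims)  # additional constraint over all cross pairs
                if link >= threshold and mean >= threshold and (best is None or link > best[0]):
                    best = (link, gi, gj)
        if best is None:
            return groups  # no compatible groups remain
        _, gi, gj = best
        groups[gi].extend(groups.pop(gj))  # gj > gi, so popping leaves index gi valid
```

In a small example, a topic sharing three keywords with one member of an existing group but none with the other is kept apart, because the mean cross-group similarity (0.15) falls below the threshold.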

We ran the topical alignment algorithm across both the 205 and 410 topic representations. For the 205 and 410 topic representations, respectively, 22.47% and 31.60% of tweets were not associated with topics that aligned with others. This exemplifies the ephemeral and dynamic attributes of Twitter activity: over time, the content of tweets shifts, with some topics appearing only once throughout the year (i.e., in only one batch). In contrast, we identified 42 groups (69.77% of topics) and 101 groups (62.93% of topics) of related topics for the 205 and 410 topic representations, respectively, occurring across different time periods (i.e., in more than one batch). Thus, both representations contained transient topics (isolated to one batch) and recurrent topics (present in more than one batch, belonging to a group of two or more topics).

Identifying topics most relevant for answering our research question

For the subsequent qualitative analyses, we were primarily interested in topics prevalent throughout the corpus. We operationalized prevalent topic groupings as any grouping of topics that spanned three or more batches. On this basis, 22 (57.50% of tweets) and 36 (35.14% of tweets) groupings of topics were identified as prevalent for the 205 and 410 topic representations, respectively. As an example, consider the prevalent topic groupings from the 205 topic representation, shown in Table 3. Ten topics are united by commentary on the Great Barrier Reef (Group 2), indicating that this facet of climate change commentary was prevalent throughout the year. In contrast, some topics rarely occurred, such as a topic concerning a climate change comic (indicated by the keywords “xkcd” and “comic”), which occurred once in the 205 topic representation and twice in the 410 topic representation. Although such topics are meaningful and interesting, they are transient aspects of climate change commentary and less relevant to our research question. In sum, topic modeling and grouping algorithms allowed us to collate massive amounts of information and to identify the components of the corpus most relevant to our qualitative inquiry.
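The "three or more batches" rule can be expressed as a simple filter. This is a sketch only. It assumes topics are indexed consistently across the hypothetical inputs `groups`, `topic_batches` (the batch of each topic), and `tweet_counts` (the number of tweets assigned to each topic).

```python
def prevalent_groups(groups: list[list[int]], topic_batches: list[int],
                     tweet_counts: list[int],
                     min_batches: int = 3) -> tuple[list[list[int]], float]:
    """Keep groups spanning at least `min_batches` distinct batches.

    Returns the prevalent groups and the percentage of all tweets they cover.
    """
    prevalent = [g for g in groups
                 if len({topic_batches[t] for t in g}) >= min_batches]
    covered = sum(tweet_counts[t] for g in prevalent for t in g)
    return prevalent, 100 * covered / sum(tweet_counts)
```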

Selecting the most favorable topical representation

At this stage, we have two complete and coherent representations of the corpus topics, and indications of which topics are most relevant to our research question. Although some evidence indicated that the 410 topic representation contained topics of higher quality, the 205 topic representation was more parsimonious at the level of both topics and groups of topics. Thus, we selected the 205 topic representation for further analysis.

Phase 3. Extract a subset of data

Extracting a subset of data from the selected topical representation.

Before qualitative analysis, researchers must extract a subset of data of manageable size. For this process, we considered only the content of prevalent topic groupings, shown in Table 3. From each of the 22 prevalent topic groupings, we randomly sampled ten tweets, a number chosen as a trade-off between comprehensiveness and feasibility. This reduced our data space for qualitative analysis from 201,423 tweets to 220.
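The per-group sampling step amounts to stratified random sampling without replacement. The sketch below uses Python's standard `random` module. The dictionary layout and the fixed seed are our assumptions (the source does not report a seed). The seed is included so that the same subset can be re-drawn.

```python
import random


def sample_per_group(tweets_by_group: dict[str, list[str]], k: int = 10,
                     seed: int = 0) -> dict[str, list[str]]:
    """Draw up to k tweets, without replacement, from each prevalent topic grouping."""
    rng = random.Random(seed)  # hypothetical fixed seed so the subset can be re-drawn exactly
    return {group: rng.sample(tweets, min(k, len(tweets)))
            for group, tweets in tweets_by_group.items()}
```

With 22 groupings and `k = 10`, the returned subset contains 220 tweets, provided every grouping holds at least ten tweets.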

Phase 4. Perform qualitative analysis

Perform thematic analysis.

In the final phase of our analysis, we performed a qualitative thematic analysis (TA; Braun & Clarke, 2006) on the subset of tweets sampled in phase 3. This analysis generated distinct themes, each of which answers our research question: what are the common topics of Australians’ tweets about climate change? As such, the themes generated through TA are topics. However, unlike the topics derived from the preceding computational approaches, these themes are informed by the human coder’s interpretation of content and are oriented towards our specific research question. This allows the incorporation of important diagnostic information, including the broader socio-political context of discussed events or terms, and an understanding (albeit sometimes ambiguous) of the underlying latent meaning of tweets.

We selected TA as the approach allows for flexibility in assumptions and philosophical approaches to qualitative inquiries. Moreover, the approach is used to emphasize similarities and differences between units of analysis (i.e., between tweets) and is therefore useful for generating topics. However, TA is typically applied to lengthy interview transcripts or responses to open survey questions, rather than small units of analysis produced through Twitter activity. To ease the application of TA to small units of analysis, we modified the typical TA process (shown in Table  4 ) as follows.

Firstly, when performing phases 1 and 2 of TA, we initially read through each prevalent topic grouping’s tweets sequentially. By doing this, we took advantage of the relative homogeneity of content within topics. That is, tweets sharing the same topic will be more similar in content than tweets belonging to separate topics. When reading ambiguous tweets, we could use the tweet’s topic (and other related topics from the same group) to aid comprehension. Through the scaffold of topic representations, we facilitated the process of interpreting the data, generating initial codes, and deriving themes.

Secondly, the prevalent topic groupings were used to create initial codes and search for themes (TA phases 2 and 3). For example, the groups of topics indicate content on climate change action (group 1), the Great Barrier Reef (group 2), climate change deniers (group 3), and extreme weather (group 5). The keywords characterizing these topics were used as initial codes (e.g., “action”, “Great Barrier Reef”, “Paris Agreement”, “denial”). In sum, the algorithmic output provided us with an initial set of codes and an understanding of the topic structure that can indicate important features of the corpus.

A member of the research team performed this augmented TA to generate themes. A second rater outside of the research team applied the generated themes to the data, and inter-rater agreement was assessed. Following this, the two raters reached a consensus on the theme of each tweet.

Through TA, we inductively generated five distinct themes. We assigned each tweet to one (and only one) theme. A degree of ambiguity is involved in designating themes for tweets, and seven tweets were too ambiguous to subsume into our thematic framework. The remaining 213 tweets were assigned to one of five themes shown in Table  5 .

In an initial application of the coding scheme, the two raters agreed on 161 (73.18%) of 220 tweets. Inter-rater reliability was satisfactory, Cohen’s κ = 0.648, p < 0.05. An assessment of agreement for each theme is presented in Table 5. The proportion of agreement is the proportion of all observations on which the two coders agreed either that (1) a tweet belonged to the theme, or (2) a tweet did not belong to the theme. The proportion of specific agreement is the conditional probability that a randomly selected rater will assign the theme to a tweet, given that the other rater did (see Supplementary Material for more information). Theme 3, theme 5, and the N/A categorization had lower levels of agreement than the remaining themes, possibly because tweets belonging to themes 3 and 5 often make references to content relevant to other themes.
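The agreement statistics above can be illustrated with the following sketch. It treats the N/A categorization as one more label and uses the standard two-rater Cohen's κ. It computes specific agreement with the usual formula 2a / (2a + b + c), where a counts tweets both raters assigned to the theme and b + c counts tweets only one rater assigned to it. The toy labels are hypothetical, and the significance test reported above is not reproduced.

```python
from collections import Counter


def cohens_kappa(rater1: list[str], rater2: list[str]) -> float:
    """Chance-corrected agreement between two raters' theme labels."""
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Expected agreement if both raters labeled independently at their own base rates.
    p_expected = sum(counts1[label] * counts2[label] for label in counts1) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


def theme_agreement(rater1: list[str], rater2: list[str], theme: str) -> tuple[float, float]:
    """Return (proportion of agreement, proportion of specific agreement) for one theme."""
    n = len(rater1)
    both = sum(a == theme and b == theme for a, b in zip(rater1, rater2))
    neither = sum(a != theme and b != theme for a, b in zip(rater1, rater2))
    one_only = n - both - neither
    proportion = (both + neither) / n
    # Specific (positive) agreement: 2a / (2a + b + c); undefined if nobody used the theme.
    specific = 2 * both / (2 * both + one_only) if both or one_only else float("nan")
    return proportion, specific


# Hypothetical labels for four tweets.
r1 = ["Action", "Action", "Deniers", "Deniers"]
r2 = ["Action", "Deniers", "Deniers", "Deniers"]
print(cohens_kappa(r1, r2))  # 0.5
```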

Theme 1. Climate change action

The theme occurring most often was climate change action, whereby tweets related to coping with, preparing for, or preventing climate change. Tweets comment on the action (and inaction) of politicians, political parties, and international cooperation between governments and, to a lesser degree, of industry, media, and the public. The theme encapsulated commentary on: prioritizing climate change action (“Let’s start working together for real solutions on climate change”) (Footnote 4); relevant strategies and policies to provide such action (“#OurOcean is absorbing the majority of #climatechange heat. We need #marinereserves to help build resilience.”); and the undertaking (“Labor will take action on climate change, cut pollution, secure investment & jobs in a growing renewables industry”) or disregarding (“act on Paris not just sign”) of action.

Often, users were critical of current or anticipated action (or inaction) on climate change, criticizing approaches by politicians and governments as ineffective (“Malcolm Turnbull will never have a credible climate change policy”) (Footnote 5) and undesirable (“Govt: how can we solve this vexed problem of climate change? Helpful bystander: u could not allow a gigantic coal mine. Govt: but srsly how?”). Predominantly, users characterized the government as unjustifiably paralyzed (“If a foreign country did half the damage to our country as #climatechange we would declare war.”) and as lacking leadership on climate change (“an election that leaves Australia with no leadership on #climatechange - the issue of our time!”).

Theme 2. Consequences of climate change

Users commented on the consequences and risks attributed to climate change. This theme may be further divided into commentary on: physical systems, such as changes in climate, weather, sea ice, and ocean currents (“Australia experiencing more extreme fire weather, hotter days as climate changes”); biological systems, such as marine life (particularly the Great Barrier Reef) and biodiversity (“Reefs of the future could look like this if we continue to ignore #climatechange”); human systems (“You and your friends will die of old age & I’m going to die from climate change”); and other miscellaneous consequences (“The reality is, no matter who you supported, or who wins, climate change is going to destroy everything you love”). Users specified a wide range of risks and impacts on human systems, such as health, cultural diversity, and insurance. Generally, the consequences of climate change were perceived as negative.

Theme 3. Conversations on climate change

Some commentary centered on discussions of climate change communication, debates, art, media, and podcasts. Frequently, these pertained to debates between politicians (“not so gripping from No Principles Malcolm. Not one mention of climate change in his pitch.”) and television panel discussions (“Yes let’s all debate whether climate change is happening... #qanda”) (Footnote 6). Users condemned the climate change discussions of the federal government (“Turnbull gov echoes Stalinist Russia? Australia scrubbed from UN climate change report after government intervention”), those skeptical of climate change (“Trouble is climate change deniers use weather info to muddy debate. Careful????????????????”), and the media (“Will politicians & MSM hacks ever work out that they cannot spin our way out of the #climatechange crisis?”). The term “climate change” itself was critiqued, both by users skeptical of the legitimacy of climate change (“Weren’t we supposed to call it ‘climate change’ now? Are we back to ‘global warming’ again? What happened? Apart from summer?”) and by users seeking action (“Maybe governments will actually listen if we stop saying “extreme weather” & “climate change” & just say the atmosphere is being radicalized”).

Theme 4. Climate change deniers

The fourth theme involved commentary on individuals or groups who were perceived to deny climate change. Generally, these were politicians and associated political parties, such as Malcolm Roberts (a climate change skeptic, elected as an Australian Senator in 2016), Malcolm Turnbull, and Donald Trump. Commentary focused on the beliefs and legitimacy of those who deny the science of climate change (“One Nation’s Malcolm Roberts is in denial about the facts of climate change”) or support the denial of climate change science (“Meanwhile in Australia... Malcolm Roberts, funded by climate change skeptic global groups loses the plot when nobody believes his findings”). Some users advocated attempts to change the beliefs of those who deny climate change science (“We have a president-elect who doesn’t believe in climate change. Millions of people are going to have to say: Mr. Trump, you are dead wrong”), whereas others advocated disengaging from conversation entirely (“You know I just don’t see any point engaging with climate change deniers like Roberts. Ignore him”). In comparison to other themes, commentary revolved around individuals and their beliefs rather than the phenomenon of climate change itself.

Theme 5. The legitimacy of climate change and climate science

Using our four-phased framework, we aimed to identify and qualitatively inspect the most enduring aspects of climate change commentary from Australian posts on Twitter in 2016. We achieved this by using computational techniques to model 205 topics of the corpus, and to identify and group similar topics that repeatedly occurred throughout the year. From the most relevant topic groupings, we extracted a subsample of tweets and identified five themes with a thematic analysis: climate change action, consequences of climate change, conversations on climate change, climate change deniers, and the legitimacy of climate change and climate science. Overall, we demonstrated how a mixed-methods approach that blends qualitative analyses with data science methods can be used to explore social media data.

Our workflow draws on the advantages of both quantitative and qualitative techniques. Without quantitative techniques, it would be impossible to derive topics that apply to the entire corpus. The derived topics are a preliminary map for understanding the corpus, serving as a scaffold upon which we could derive meaningful themes contextualized within the wider socio-political context of Australia in 2016. By incorporating quantitatively derived topics into the qualitative process, we attempted to construct themes that would generalize to a larger, relevant component of the corpus. The robustness of these themes is corroborated by their association with computationally derived topics that repeatedly occurred throughout the year (i.e., prevalent topic groupings). Moreover, four of the five themes have been observed in existing data science analyses of Twitter climate change commentary. Within the literature, the themes of climate change action and consequences of climate change are common (Newman, 2016; O’Neill et al., 2015; Pathak et al., 2017; Pearce, 2014; Jang & Hart, 2015; Veltri & Atanasova, 2017). The themes of the legitimacy of climate change and climate science (Jang & Hart, 2015; Newman, 2016; O’Neill et al., 2015; Pearce, 2014) and climate change deniers (Pathak et al., 2017) have also been observed. The replication of these themes supports the validity of our findings.

One of the five themes, conversations on climate change, has not been explicitly identified in existing data science analyses of tweets on climate change. Although they did not explicitly identify the theme, Kirilenko and Stepchenkova (2014) found that hashtags related to public conversations (e.g., “#qanda”, “#Debates”) were used frequently throughout 2012. Consistent with the literature, few (if any) topics in our 205 topic solution could be construed as relating solely to the theme of “conversation”. However, as we progressed through the phases of the framework, the theme became increasingly apparent. By the grouping stage, we had identified a collection of topics unified by a keyword relating to debate, and the subsequent thematic analysis clearly discerned this theme. The derivation of a theme undetected by previous data science studies lends credence to the conclusion of Guetterman et al. (2018) that supplementing a quantitative approach with a qualitative technique can generate more themes than a quantitative approach alone.

The uniqueness of a conversational theme may be accounted for by three potentially contributing factors. Firstly, tweets related to conversations on climate change often contained material pertinent to other themes. This overlap may hinder the ability of computational techniques to cluster these tweets uniquely, and undermine the ability of humans to reach agreement when coding content for this theme (indicated by the relatively low proportion of specific agreement in our thematic analysis). Secondly, a conversational theme may only be relevant in election years. Unlike other studies spanning long time periods (Jang & Hart, 2015; Veltri & Atanasova, 2017), Kirilenko and Stepchenkova (2014) and our study harvested data from US presidential election years (2012 and 2016, respectively). Moreover, an Australian federal election occurred in our year of observation. National elections and the associated political debates may generate more discussion, and more criticism, of conversations on climate change. Thirdly, the emergence of a conversational theme may be attributable to the Australian panel discussion television program Q & A. The program regularly hosts politicians and other public figures to discuss political issues. Viewers are encouraged to participate by publishing tweets using the hashtag “#qanda”, perhaps prompting viewers to generate uniquely tagged content not observed in other countries. Importantly, in 2016, Q & A featured a debate on climate change between science communicator Professor Brian Cox and Senator Malcolm Roberts, a prominent climate science skeptic.

Although our four-phased framework capitalizes on both quantitative and qualitative techniques, it still has limitations. Namely, the sparse content relationships between data points (in our case, tweets) can jeopardize the quality and reproducibility of algorithmic results (e.g., Chuang et al., 2015 ). Moreover, computational techniques can require large computing resources. To a degree, our application mitigated these limitations. We adopted a topic modeling algorithm which uses additional dimensions of tweets (social and temporal) to address the influence of term-to-term sparsity (Nugroho et al., 2017 ). To circumvent concerns of computing resources, we partitioned the corpus into batches, modeled the topics in each batch, and grouped similar topics together using another computational technique (Chuang et al., 2015 ).

As a demonstration of our four-phased framework, our application is limited to a single example. For data collection, we were able to draw from the procedures of existing studies which had successfully used keywords to identify climate change tweets. Without an existing literature, identifying diagnostic terms can be difficult. Nevertheless, this demonstration of our four-phased framework exemplifies some of the critical decisions analysts must make when utilizing a mixed-method approach to social media data.

Both qualitative and quantitative researchers can benefit from our four-phased framework. For qualitative researchers, we provide a novel vehicle for addressing their research questions. The volume and diversity of social media content may overwhelm both the researcher and their method. Computational techniques make this scale and diversity manageable, allowing researchers to obtain a large volume of data and extract from it a relevant sample for qualitative analysis. Additionally, computational techniques can help researchers explore and comprehend the nature of their data. For the quantitative researcher, our four-phased framework provides a strategy for formally documenting qualitative interpretations. When applying algorithms, analysts must ultimately make qualitative assessments of the quality and meaning of output. In comparison to the mathematical machinery underpinning these techniques, the qualitative interpretations of algorithmic output are not well documented. As these qualitative judgments are inseparable from data science, researchers should strive to formalize and document their decisions; our framework provides one means of achieving this goal.

Through the application of our four-phased framework, we contribute to an emerging literature on public perceptions of climate change by providing an in-depth examination of the structure of Australian social media discourse. This insight is useful for communicators and policy makers hoping to understand and engage the Australian online public. Our findings indicate that a wide variety of messages and sentiment are present within Australian commentary on climate change. A positive aspect of the commentary is that many users want action on climate change. The time is ripe, it would seem, for communicators to discuss Australia’s policy response to climate change: the public are listening, and they want to be involved in the discussion. Consistent with this, we find some users discussing conversations about climate change as a topic. Yet in some quarters there is still skepticism about the legitimacy of climate change and climate science, and so there remains a pressing need to implement strategies to persuade members of the Australian public of the reality and urgency of the climate change problem. At the same time, our analyses suggest that climate communicators must counter the belief, sometimes expressed in our second theme on climate change consequences, that it is already too late to solve the climate problem. Members of the public need to be aware of the gravity of the climate change problem, but they also need powerful, self-efficacy-promoting messages that convince them that we still have time to solve the problem and that their individual actions matter.

1. On Twitter, users may precede a phrase with a hashtag (#). This allows users to signify and search for tweets related to a specific theme.

2. The analysis of this study was preregistered on the Open Science Framework: https://osf.io/mb8kh/ . See the Supplementary Material for a discussion of discrepancies. Analysis scripts and interim results from computational techniques can be found at: https://github.com/AndreottaM/TopicAlignment .

3. Eighty-three tweets were rendered empty and discarded from the corpus.

4. The contents of tweets are reported verbatim. Sensitive information is redacted.

5. Malcolm Turnbull was the Prime Minister of Australia during 2016.

6. “#qanda” is a hashtag used to refer to Q & A, an Australian panel discussion television program.

7. The Commonwealth Scientific and Industrial Research Organisation (CSIRO) is the national scientific research agency of Australia.


A qualitative study on negative experiences of social media use and harm reduction strategies among youths in a multi-ethnic Asian society

Ellaisha Samari

1 Research Division, Institute of Mental Health, Singapore, Singapore

Sherilyn Chang

Esmond Seow

Yi Chian Chua

2 Department of Psychosis, Institute of Mental Health, Singapore, Singapore

Mythily Subramaniam

3 Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore

Rob M. van Dam

4 Departments of Exercise and Nutrition Sciences and Epidemiology, Milken Institute School of Public Health, The George Washington University, Washington, DC, United States of America

Swapna Verma

Janhavi Ajit Vaingankar

This study aimed to expand and inform the emerging body of research on the negative experiences of social media use among youths and how youths deal with them, in an Asian setting, using a qualitative approach.

Data were collected using 11 focus group discussions (FGDs) and 25 semi-structured interviews (SIs) among youths aged 15 to 24 years residing in Singapore who were recruited via purposive sampling. Data were analysed using thematic analysis.

The salient negative effects mentioned by participants include the development of negative reactions and feelings from upward comparisons with others (e.g., others’ achievements and lifestyle), receiving hurtful comments, exposure to controversial content (e.g., political events and social movements), as well as the perpetuation of negative feelings, behaviours, and sentiments (e.g., rumination, unhealthy eating behaviour, and self-harm). Participants also described strategies which they have employed or deemed to be useful in mitigating the negative effects of social media use. These include filtering content and users, taking breaks from social media, cognitive reframing, and self-affirmation, where they identify and change stress-inducing patterns of thinking by setting realistic social, physical, and lifestyle expectations for themselves, and focusing on self-development.

The current results highlight that while youths experience negative effects of social media use, they have high media literacy and have employed strategies that appear to mitigate the negative effects of social media use. The findings can inform various stakeholders involved in helping youths navigate the harms of social media use or provide directions for intervention studies aimed at reducing the harms of social media use.


The marked rise in social media use today exemplifies the evolution of the digital landscape. Social media platforms such as Facebook and YouTube continue to dominate the online scene, with an estimated 2.9 and 2.1 billion monthly active users, respectively [1, 2]. Other platforms developed later, such as Instagram and TikTok, have since gained traction, especially among younger audiences, with 120 and 100 million active users, respectively [3–5]. Using social media is arguably a norm of growing up in the digital age, especially among youths [6]. This is reflected in research conducted in 2019 among young people in the US, in which 85% of teenagers reported using YouTube, 72% Instagram, and 69% Snapchat [7].

Social media platforms have drastically changed the way people socialize, share information, present themselves, perceive others, and work [8]. Social media has become an influential and integral element of today’s interaction and communication, readily available and easily accessible through multiple devices such as smartphones, computers, and tablets. Furthermore, persistent cues through notifications and variable reward mechanisms (e.g., social validation of posts through likes and comments, infinite scroll) encourage greater social media use. Naturally, this phenomenon has sparked interest among researchers in the experiences of social media use, particularly among youths, given the new social dynamics [9] and the complex transition to adulthood, during which young people experience significant developmental and psychosocial shifts, including identity exploration [10, 11]. It is also a period in which youths experience intensified peer relationships, seek romantic relationships, and engage with potential or current partners [10], and they may be motivated to gain attention from their peers or to observe peers’ self-presentations.

Arguably, the harms or benefits of social media use depend on how and why people use it, and who uses it. Social media sites provide opportunities for youths to develop and maintain social relationships [12], cultivate a sense of belongingness, present themselves to others [13], keep up to date [14], and even learn about sexual health and identity [15]. Some studies have reported associations between social media use and psychological well-being, a state of wellness in which an individual feels good and functions well, based on positive relationships with others, environmental mastery, self-acceptance, personal growth, autonomy, a sense of purpose in life, and positive emotions such as happiness and contentment [16, 17]. Specifically, judicious use of social media has been associated with several positive psychosocial outcomes, such as higher friendship quality [16] and social support [17]. On the other hand, social media use has also been associated with negative experiences such as stress, social isolation, cyberbullying, and mental health issues including depression, anxiety, poor body image, and disordered eating [18], whereas other studies have found no substantive links to mental health issues [19–22]. For example, a longitudinal study by Heffer et al. [20] found that social media use did not predict future depressive symptoms; rather, adolescent girls who experienced depressive symptoms tended to use more social media over time, and not vice versa. In another longitudinal study, Coyne et al. [19] found that increased time spent on social media was not associated with increased depression or anxiety at the individual level across adolescent development.

The literature on social media use has primarily focused on the association between social media use and users’ well-being, while how young people manage or reduce the harms of its use has received less attention. That said, youths perceived spending less time on social media sites, or using social media mindfully, as helping to mitigate its negative experiences [23, 24]. In addition, high levels of confidence and media literacy, and an appreciation of individual differences, appeared to mitigate the potential negative effects of social media exposure on body image among adolescent girls [25]. These studies highlight some perceived effective means of mitigating negative experiences of social media use. Nonetheless, given the dearth of research on harm reduction strategies, the current study aimed to contribute to the emerging evidence base on youths’ negative experiences of social media use, and the ways they reduce its harms, in Asia. Identifying harm reduction strategies is critical because it offers insights into how young people mitigate these negative effects, which can inform the development and implementation of interventions aimed at improving health outcomes. A qualitative approach allows a deeper understanding of the psychological mechanisms and processes (i.e., how and under what circumstances) behind youths’ negative experiences and the ways they deal with them. Thus, we used a qualitative approach to explore the broad experiences of young people in Singapore with social media use, to answer the following questions: 1) What are the negative experiences of social media use? 2) What strategies do youths employ to reduce these harms?

A purposive sampling design was used to obtain a study sample of young people aged 15–24 years, with approximately equal proportions of men and women, of the age groups 15–19 and 20–24 years, and of the three main ethnic groups in Singapore (Chinese, Malay and Indian). Initially, referrals were sought from colleagues and acquaintances, who were provided with the study brochures. Subsequently, participants were also given the study brochures to disseminate to others, initiating snowball recruitment. Efforts were also made to include young people who had experienced psychological distress, school drop-out or risky behaviours (e.g., substance use, gang involvement, and incarceration) to ensure greater diversity in the study sample; referrals for these participants were sought from community-based youth welfare organisations that provide services to clients with these experiences. Written informed consent was obtained from all participants before they participated in the study. For participants aged below 21 years, parental consent was also obtained. Ethical approval was obtained from the National Healthcare Group’s Domain Specific Review Board (DSRB No. 2020/0228).

Data were collected through a combination of semi-structured interviews (SIs) and focus group discussions (FGDs), conducted in English either online via videoconferencing on the Zoom platform (11 FGDs, 21 SIs) or in person (4 SIs), between May 2020 and November 2020. Almost all SIs and FGDs were conducted by the lead female researcher [JAV], who has extensive experience in conducting qualitative studies; the remainder were conducted by other study team members [ES1, SC, ES2, and YCC] who had also received training in qualitative research and had prior experience conducting qualitative interviews. Data from this study originate from a larger study that examined youths’ interpretation of positive mental health and its associated pathways. As part of that study, participants were asked questions relating to social media use (see Table 1) and their mental health, including their own or their peers’ pleasant or unpleasant experiences/incidents on social media, the feelings arising from those experiences/incidents, and how these influenced their or their peers’ mental health. Interviews ceased once data saturation was reached, i.e., when no new information was observed. The present study explored data gathered on the harms of social media use and the strategies youths employ, or perceive to be useful, against these harms.

• Tell me about your experience of using social media.
• What are some of the social media platforms that you often use?
• What do you or your friends usually do on these platforms? How does it make you/your friends feel?
• What about your friends? Any pleasant or unpleasant experiences/incidents that you can recall in relation to social media? How does it make them feel?
• What are some of the strategies you use when engaging with social media?
• You mentioned that social media can affect [experience]. How can you or one overcome this? What are some of the strategies you can use to maintain positive mental health?
• Activity (only for FGDs): “When I use social media, I feel…”
• When did you/your friends feel [experience/emotion]? Can you/someone describe any such incident or experience? How did it influence your/their mental health?

All FGDs and SIs were audio-recorded and transcribed verbatim. Data management and coding were conducted in NVivo 11. Data were analysed using thematic analysis as described by Braun and Clarke [26]. Following an inductive approach, five study team members [ES1, SC, ES2, YCC, JAV] used open coding [27] to identify preliminary codes on the harms of social media use, and on strategies to avoid those harms, from the first six transcripts. Through thorough, iterative discussions within the study team, codes were grouped into higher-order concepts (themes and sub-themes) based on their common properties, forming an initial codebook. Regular discussions were held to review and refine the codebook and to build consensus on its final version, which served as the framework for coding the remaining transcripts [28]. A high level of consensus was reached during the framework development process.

Data for the present study comprise 36 data units: 11 FGDs and 25 SIs. A total of 95 young people (51 women and 44 men; mean age = 20.1 years) participated. Participants were of Chinese (n = 32), Malay (n = 27), Indian (n = 28) and other ethnicities (n = 8), such as Filipino or Burmese. Among them, eight participants had a history of psychological distress, school drop-out or risky behaviour (e.g., substance use, gang involvement and incarceration). Table 2 displays the participants’ sociodemographic backgrounds.

Characteristic | Focus group discussions (n = 11; 70 participants) | Semi-structured interviews (n = 25) | Total
Age in years, mean (SD) | 19.8 (2.5) | 21.0 (2.6) | 20.1 (2.5)

1 Institute of Technical Education

Social media use

All participants reported having used social media. Their primary purposes were sharing information, gathering information, connecting with others, maintaining relationships, and entertainment, across platforms including Instagram, Facebook, TikTok, Reddit, Twitter, and YouTube. The following results are presented as themes capturing the negative experiences and impacts of social media use as experienced and perceived by participants or their peers, as well as the mitigations against these harms that participants practised or perceived as useful.

Negative experiences of social media use

Theme 1: Experience of negative emotions and behaviour from upward comparisons with others. A common theme in the interviews was the development of negative reactions from upward comparisons with others. Comparisons with others’ achievements (e.g., school- or work-related), material possessions, and experiences (e.g., travels) were discussed more saliently than comparisons of physical appearance, which female participants mentioned noticeably more often than male participants. Participants noted that comparisons were usually made with peers, social media influencers, and celebrities, with narratives about comparisons with peers and influencers being more pronounced. The resulting negative reactions included negative feelings (e.g., inferiority, insecurity, self-consciousness, hurt, and loneliness), negative behaviours (e.g., self-loathing, starving, engaging in unhealthy competition and overly intense workouts), and negative body image (e.g., idealizing skinny body types).

“Social media is a platform where there are many many different beauty standards, and I can’t help but tend to compare myself with them. I think over the past one year or so, I tried many, many, many ways to lose weight because I felt that I wasn’t good enough to the point that like I lost too much weight which wasn’t good for myself.”–FGD03

“When you’re looking at like, maybe the Instastory of like, some influencer or like, and stuff like that, or people who are like, on holidays and whatnot, then kind of make you feel as though like, you are inadequate or you are missing out or you’re not doing something right, that kind of puts on unnecessary trigger or stress on you.”–SI10

Furthermore, comparisons with others can result in the erosion of individuality when users feel compelled to follow the trend or be like other users just to fit in. This desire to belong can inadvertently fuel feelings of insecurity.

“… it’s very natural to then feel, "Okay. We must do this as well because that’s what a lot of our generation people are trying to do." We’re trying to follow other people. We’re trying to follow to be like someone else. And along that process, we either worsen the insecurities we already have or we don’t feel confident about ourselves. We don’t feel good about ourselves. We don’t want to be ourselves.”–FGD10

Theme 2: Experience of negative emotions from receiving hurtful remarks. Some participants recounted being emotionally affected by hurtful remarks directed at them, or having witnessed their friends suffer from such insults. These remarks stemmed from various sources, including disagreement with or disapproval of the content (e.g., activities, opinions, or comments) they had shared, and came from both known and unknown contacts. Furthermore, the anonymity afforded by fake social media accounts allowed harsh, unrestrained criticism without consequences for the perpetrator. Following such experiences, some removed the content they had shared or deactivated their accounts.

“One of my friends, she posted her point of view about this situation and other things. Then there’s this another person who just created another fake account and went on to talk shit about her and attack her saying that "Oh, you shouldn’t have this point of view. Why are you acting this way," all that stuff. Then it sort of affected her mental health… she was really affected by it that she had to deactivate her account for a couple of weeks until she got back online again.”–SI14

“I removed them [family] but they somehow managed to stalk me still. They started talking about me–that I’m sharing my personal life in social media and disgracing the family. So, they caught me and I had to remove all of the videos because it was really very stressful for me. That’s one of the bad experiences.”–SI21

One participant also recounted how easy it had been to become a bully on social media:

“…talking to people online through the screen is so much easier. So, it also increases cyberbullying. Yeah, it’s so easy to cyberbully someone because it’s just like… and then just send. Yeah, I’m sorry to say this, but I was a bully also. Yeah, it felt very bad, but I’m not that person anymore.”–SI07

Theme 3: Experience of negative emotions from exposure to controversial content. Some participants described experiencing, or seeing their peers experience, negative feelings from exposure to controversial content. While a broad range of content was discussed, the most prominent concerned global political events (e.g., the repression of Uyghurs), natural or man-made disasters (e.g., the Beirut explosion), social issues (e.g., misogynistic content and animal cruelty), and social movements (e.g., Black Lives Matter). Consuming this content left them feeling disturbed, drained, helpless, overwhelmed, and pessimistic.

“I see a lot of my friends online, they might overdo it and it might like start to drain them (be)cause they tend to watch a lot of such disturbing videos and- so I think last week, I was speaking to someone and they were saying they feel very upset, and I asked them why. They were like they were watching a lot of animal cruelty videos, and it really affect them a lot.”–SI08

“…I feel like certain situations are full of injustice, for example, the Uyghur situation or how people criticize Muslims a lot in this world. And it just frustrates me because a lot of, yeah, misunderstandings are due to them not wanting to educate themselves. Like the comments are so ignorant and it’s just so frustrating.”–FGD07

Theme 4: Perpetuation of negative emotions, behaviours, and sentiments. Participants commonly mentioned that social media use perpetuates negativity. First, social media platforms enable rumination through posts or stories. Second, negative sentiments are sometimes echoed and amplified by like-minded communities. Third, social media can perpetuate negative behaviours such as self-harm, unhealthy eating and unhealthy coping styles when users follow self-harm content or accounts and emulate them.

“If they’re feeling down right, instead of looking at something positive, they end up going on Reddit, and they look at–they enter those communities that are all the same people, and it becomes an echo chamber of negative things, and it gets worse because it’s just the same kind of people.”–FGD08

“It’s very, very sad because you see that their (friends with eating disorders) whole lives are just engrossed and obsessed with numbers, calories, food, exercise, all these things. And it’s really very, very, very sad and disheartening to see that social media has such a huge role in worsening these things.”–SI16

“I work mostly with youths at risk, and the kind of content that you follow online is always more towards the emo genre. Sometimes, it does have the elements of suicide or taking a life, or like self-harm, and all those various aspects, and that became their way of dealing with stress, either they learn to take the action from there, or it just comes ingrained, and they do something similar towards the destructive and that’s maladaptive.”–FGD08

Mitigation of negative effects of social media use

Theme 5: Filtering content and users. Most participants described curating content or being selective about the users they follow as a means of reducing any negative effects of social media use on their mental health. Strategies included choosing to follow only positive content from the start, filtering the accounts they follow (e.g., unfollowing accounts that trigger negative feelings), muting selected users’ posts or stories, and scrolling past negative content.

“I don’t really feel that way [negative feelings] when I use social media because, yeah, why would I want to make myself feel so bad? Yeah, I tend to go for the more positive kind of posts instead of those that kind of puts you down.”–FGD01

“Let’s say you have a favourite YouTuber who is so pretty. She’s a beauty vlogger. And you love watching her videos. But every time you watch her videos, you feel so self-conscious about your own face. I think it’s important to make that hard sacrifice to cut it out, to stop watching that YouTuber.”–SI13

Theme 6: Taking breaks from social media. Participants also highlighted the importance of recognising the negative impacts of social media consumption on themselves (e.g., from social comparisons or from absorbing overwhelming content) and of managing the time they spend on social media to reduce these impacts. They also described different ways of taking breaks from social media and the benefits of doing so. Breaks could be temporary or permanent (e.g., deactivating social media accounts for good), and their benefits included being able to focus on other tasks at hand (e.g., studying or schoolwork) and improved mental health.

“…I think what we don’t realize is that these (social media) are outlets that feed on your energy every single day. So, if it gets to a point where you can’t function or perform well because of it, then you need to give yourself a break from it.”–FGD07

“So, for one period, I deleted all my social media like kind of a social media cleanse, and I think a lot of youth do that as well, as they are growing a little bit older, because social media is very superficial, in a sense.”–FGD02

Theme 7: Cognitive reframing and self-affirmation. This theme describes users’ conscious efforts to identify and change stress-inducing patterns of thinking by setting realistic social, physical, and lifestyle expectations for themselves and not being swayed by unrealistic portrayals on social media. Participants also showed awareness of the superficial nature of social media content, which does not necessarily reflect real-life situations or experiences in full. In addition, participants described focusing on self-development and self-affirmation as safeguards against the pitfalls of social comparison on social media.

“… we need to have a stronger sense of reality and understand that social media is really not everything. Like what [participant] said, there’s a backstage and there’s a front stage that you want to portray, and no one wants to show their dirty laundry on social media. They want everything to be good.”–FGD04

“So, you see these people’s successes… but then I just try to remind myself that, "Okay. I’ll be happy for them. Yours will come in time," instead of feeling like, "Oh, why isn’t mine here?"”–SI03

“I don’t regularly tell positive affirmations to myself, but maybe once in a while, I will just look in the mirror and say positive things about myself, and it actually makes me feel better immediately, and I think that has an impact on my mood.”–FGD08

The findings of this study add to the emerging body of research on negative experiences of social media use, and on how young people deal with them, drawing on a sample of youths in an Asian society. While youths reported experiencing, or witnessing their peers experience, significant negative effects of social media use, it is promising that they were aware of these effects and attempted to avoid or reduce them.

Youths’ narratives suggest a predisposition towards upward social comparisons, particularly with peers and influencers on social media sites. These comparisons tend to have negative effects, most of which are reflected in prior studies, including negative feelings (e.g., feelings of inadequacy and lowered self-esteem [29]), an unhealthy mindset (e.g., negative body image [30]), and unhealthy behaviours (e.g., disordered eating behaviours [31]) [32, 33]. Scholars suggest that the fundamental and universal desire to compare oneself with others serves a variety of functions, such as evaluating the self [34], fulfilling affiliation needs [35], and being inspired [36]. Given the social functions of social media sites and the detailed information they provide about others, it may be natural for people to engage in social comparisons, whether consciously or unconsciously [37]. Furthermore, unlike real-life situations, social media sites allow people to present an optimized or idealized version of themselves and their experiences [38, 39]. Repeated exposure to such ‘enhanced’ profiles may therefore widen the perceived gap between oneself and others and amplify feelings of inadequacy. This interpretation is consistent with the findings of Chou and Edge [40], who examined the impact of Facebook use on people’s perceptions of others’ lives and found that people who spent more time on Facebook tended to perceive other users as having better lives than their own.

Qualitative studies of social comparison on social media among young people in Western populations have mostly focused on the relationship between social media use and physical or bodily appearance [25, 41, 42], or have found that participants raise appearance-related social comparisons when discussing the role of social media in mental health [43]. These studies often highlighted the negative impacts of such comparisons. For example, in a UK-based study, Easton et al. [42] examined young adults’ (aged 18–25) experiences with ‘fitspiration’ (a blend of “fitness” and “inspiration”) on social media. They found that comparisons with others’ perceived fitness (content on healthy lifestyle habits relating to exercise and diet) can harm psychological health. Minor negative effects included frustration at the deceptive nature of posts and jealousy towards unattainable body appearances, while more troubling effects included negative feelings towards one’s own body and unhealthy eating habits. These findings are reflected in the narratives of a few participants in this study, particularly female participants, who described engaging in upward comparisons of physical appearance and resonated with such experiences. A finding unique to this study, however, is upward comparison with others’ achievements (e.g., academics, employment), material possessions and lifestyles. Research on cultural differences in social comparison suggests that people living in countries with more collectivistic, rather than individualistic, cultures are more likely to engage in social comparisons [44], and that Eastern cultures may be more concerned with one’s relative social standing [45]. It therefore seems unsurprising that youths in an Asian society are inclined to make upward comparisons with peers’ achievements, such as having a good social network, occupation, and education, which contribute to a person’s social standing.

Participants highlighted cognitive reframing as a strategy to reduce the harms of social comparison on social media. For participants, cognitive reframing entailed identifying negative patterns of self-evaluation and reinterpreting how they viewed social media content, for example by reminding themselves of the superficial nature of social media portrayals and by setting more realistic expectations for themselves. Research has linked low self-esteem to unrealistic standards for self-evaluation [46], and negative self-evaluations can arise as the discrepancy between one’s ideal and real self-image grows [47]. In the context of social media, for example, exposure to ‘fitspiration’ content, which tends to involve images and messages praising thinness and high fitness levels [48, 49], can increase body dissatisfaction if these ideals are internalized but not attained [39]. Cognitive reframing strategies mentioned by youths, such as setting realistic expectations for body image or achievement, can therefore buffer against negative self-evaluations when users compare themselves with unattainable body images or others’ achievements on these platforms.

Narratives of negative feelings arising from exposure to controversial content deserve attention. Specifically, youths mentioned feeling drained and helpless from witnessing cruelty or disasters that they are unable to help improve. Over the years, social media has emerged as a source of news [50], providing many opportunities for exposure to news, whether incidentally or deliberately, through content shared by others within users’ social networks [51] or through the official accounts of news broadcasters that users follow. Media writers have argued that news in the digital age has become increasingly visual, using images from various sources, and is written to convey excitement and danger and to be fear-laden [52]. Importantly, user-generated images of important world events are frequently captured on smartphones by witnesses to these events, allowing audiences to view them in real time [53]. Such images allow news broadcasters to display more intense and shocking visuals that may not have been available or approved in earlier times. It is therefore unsurprising that a negative effect on mental well-being was observed with exposure to such raw and intense visuals and commentary.

However, it is arguably reassuring that participants were aware of the control they have over social media’s influence on their lives and had used proactive self-control strategies to regulate their social media use, such as limiting the content they see or keeping off social media. Participants recognized the relief, and the benefit to their mental well-being, of stepping away from stress-inducing content or disabling their social media accounts, which incidentally highlights the pervasiveness of social media in their lives. Indeed, some studies have shown that taking a break from social media positively affects subjective well-being [54–56]. It is also worth noting that, for participants, keeping off social media appeared to involve internal negotiation and struggle, suggesting that they were resisting a compulsion to use it.

The negative experiences and harm minimisation strategies reported in this study align well with some emerging initiatives to protect young people from the harms of social media use. One example is #Chatsafe, which was developed to guide and educate young people on communicating safely about suicide on social media [57, 58]. The social media campaign was found to be effective in improving young people’s capacity to intervene against suicide online and their perceived internet self-efficacy and safety when communicating about suicide on social media [58]. A similar initiative based on the #Chatsafe guideline was also implemented in Singapore [59]: the guideline was adapted to the local context as the #PauseBeforeYouPost campaign, which educates young people on how to hold safe conversations about mental health issues and engage safely with those at risk of suicide. In addition, a #Chatsafe training curriculum is being developed for youths and caregivers to give them the skills and knowledge to engage positively with suicide-related online content and to support those around them who are in distress. The effectiveness of this campaign could inspire future interventions that target other needs of youths, such as those reflected in the narratives of this sample, to mitigate negative experiences and outcomes of social media use. For example, users could be taught, and reminded of, proper etiquette when communicating with others on social media to avoid making hurtful remarks. These campaigns could also address self-esteem, which can mediate the effects of upward comparisons on well-being.

Efforts could also extend beyond social media itself, for example through programmes in schools. Educators could teach youths how to manage content and conversations on social media and encourage them to diversify the content they consume by suggesting accounts that nurture their intellectual passions or interests and resilience and that build their self-esteem. As the narratives of adolescents in Burnette et al.’s [25] study illustrate, a supportive school environment that communicates social media-related messages effectively and runs programmes on accepting differences in body image can contribute to high media literacy and confidence among students.

Strengths and limitations

This study has several strengths. Data were gathered through both FGDs and SIs and included deviant cases (young people with experiences of psychological distress, school drop-out or risky behaviours), allowing for the generation of broad and rich qualitative data. FGDs are generally more dynamic and allow participants to discuss and expand on their pre-existing ideas in light of points raised by other participants, which may not emerge in in-depth interviews [60]. SIs, on the other hand, allow greater insight into individuals through detailed discussion of topics [60]. However, the findings must be considered in the context of several limitations. The topic, the negative impact of social media use, may be sensitive or controversial for some participants, who may therefore have limited what they shared owing to social desirability bias [61]. In addition, we did not examine whether experiences of social media use differ across individuals with different patterns of use or from different sociodemographic and sociocultural backgrounds, factors that may account for differences in experiences of negative effects and, in turn, in the adoption and success of different coping mechanisms. Future research should explore this topic further, including the interaction between an individual’s socioecological environment and their experiences with social media use. Future research could also investigate differences in the strategies adopted by Asian and Western youth populations.

Results from the present study indicate that social media can influence youths’ lives today. While social media can enhance learning, connection and communication [62], the salience of its negative effects on users’ mental well-being underscores the need to actively monitor these harms and to explore effective ways of steering users away from them. The current results offer a preliminary portrait of the salient negative effects of social media use in a multi-ethnic Asian youth population. They also indicate that while youths experience negative effects of social media use, they have high media literacy and employ strategies that appear to mitigate these effects.


The authors are immensely grateful to all the participants and CHAT Hub, Human Hearts, The Green House Community, and Touch Community for kindly circulating information on our study to their clients.

Funding Statement

This study was supported by the Singapore Ministry of Health’s National Medical Research Council under the Centre Grant Programme (NMRC/CG/004/2013). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability


