Brain drain, the emigration of highly skilled workers, is both a pressing policy issue and a cultural frame for examining how societies negotiate belonging, loss, and opportunity. Using data from X (formerly known as Twitter) across English, Spanish, and Italian debates, this study examines engagement dynamics through three dimensions: visibility, sentiment shifts, and thematic divergence. In the present context, we define visibility as the weighted exposure of replies and likes that are standardized within each language. The dataset includes approximately 3100 English, 4500 Spanish, and 250 Italian tweet–reply pairs collected between 2020 and 2024. The findings show that visibility consistently drives engagement but carries distinct cultural meanings. The discourse in English illustrates coherence: engagement is amplified when replies remain thematically aligned, reinforcing bonding capital and collective narratives. The Spanish discourse illustrates plurality: visible divergent replies broaden participation, reflecting bridging capital while introducing volatility. The Italian discourse illustrates optimism: positive sentiment shifts sustain institutional trust with divergence. For policymakers, these results highlight the following specific pathways to tailor crisis communication: reinforce coherence in English debates, embrace plurality in Spanish contexts, and sustain optimism in Italian discourse.
Brain drain, the emigration of highly skilled workers, remains both a critical policy challenge and a lens that reveals how societies negotiate opportunity, belonging, and loss. In an era when debates on migration increasingly unfold on digital platforms such as X (formerly Twitter), social media functions not merely as a communication tool but as a contested arena where narratives are shaped, amplified, and resisted. Understanding these dynamics requires moving beyond descriptive accounts of online discourse to frameworks that capture the scale and nature of the structural, affective, and thematic patterns at play.
Language plays a pivotal role in this process. It functions simultaneously as a structural filter that shapes the participants of digital public spheres and a cultural framing device that conditions how migration narratives are articulated and received (Benson, 2016; Dubois & Blank, 2018; Hale et al., 2022; Estevez et al., 2025). Linguistic segmentation is therefore more than a methodological choice: it is a theoretical necessity for understanding how publics process crises through distinct cultural logics. The examination of English, Spanish, and Italian brain drain debates highlights three contrasting contexts: post-Brexit labor uncertainty, Latin American migration pressures, and Italian institutional inertia, each embedded within different linguistic–cultural frames.
To analyze these dynamics, this study integrates social capital theory with computational methods. Social capital distinguishes between bonding capital, which reinforces cohesion through thematic alignment and shared sentiment, and bridging capital, which expands networks by incorporating divergent perspectives (Coleman, 1988; Putnam, 2000). Retweeting is not merely amplification but a relational signal of trust, reciprocity, or contestation within linguistic communities. We operationalize three pillars of retweet behavior to capture these dynamics: visibility, measuring the structural reach of tweets and replies; sentiment shift, capturing the affective modulation between tweets and replies; and divergence, which indicates whether conversations remain thematically coherent or diverge into new directions. Through regression modeling and divergence metrics, computational approaches allow us to move beyond anecdotal contrasts and systematically reveal how publics process migration debates in different languages.
The results show that visibility consistently drives engagement but operates differently across linguistic communities. English retweet dynamics penalize divergence while privileging thematic coherence and emotional restraint, which is consistent with bonding capital. By contrast, Spanish dynamics selectively reward visible divergent replies, reflecting bridging capital and the cultural value of expressive plurality. Italian dynamics place less weight on divergence and amplify engagement through positive sentiment shifts, which sustain institutional trust and optimism. Interaction effects across models underscore how structural signals (visibility), affective modulations (sentiment), and thematic alignment (divergence) are interpreted through distinct cultural filters.
These findings contribute to theory by demonstrating how language structures both participation and the legitimacy of expression in the discourse on migration, linking digital publics to broader debates on transnational communication. The implications for crisis management are clear. English publics consolidate around coherent, professionalized narratives that support coordinated messaging but risk homogenization. Spanish publics thrive on plurality, where the selective amplification of divergent voices enhances inclusivity while raising the costs of emotional labor. Italian publics sustain engagement through institutional optimism, highlighting the importance of affective framing in maintaining trust. Together, these patterns show that effective crisis communication cannot rely on a one-size-fits-all strategy: it must adapt to the cultural logics embedded in language, leveraging visibility, sentiment, and divergence differently to sustain credibility and engagement in multilingual public spheres. To anchor this analysis further, Table 1 introduces a three-level categorization (individual, collective, and organizational) that defines how the discourse on brain drain manifests across linguistic publics, with multilingual examples illustrating the spectrum from personal experience to societal framing.
Category definitions with multilingual examples.
Research on migration has long emphasized that mobility is not merely an economic or demographic process but also a communicative one (Hynek et al., 2025). Narratives of departure, opportunity, and loss are socially constructed and politically contested across multiple arenas (Docquier & Rapoport, 2012; Benson, 2016; Porfírio et al., 2024). With the rise of social media, the discourse on migration has increasingly unfolded within the digital public sphere, where affect, visibility, and participation are shaped by platform affordances (Dubois & Blank, 2018; Hale et al., 2022). X, in particular, functions as a transnational arena where narratives of “brain drain” circulate, interact, and compete for legitimacy across cultural and linguistic contexts.
Linguistic segmentation as a structural filter and cultural frame
Language is more than a neutral medium; it structures participation and frames interpretation. Linguistic boundaries act as a structural filter, defining who can participate in discourse communities and shaping the flow of information across them. Language also serves as a cultural framing device, influencing how narratives are articulated, the affective registers deemed legitimate, and the forms of authenticity that are rewarded (Benson, 2016; Dubois & Blank, 2018). Comparative work in cross-linguistic social media analysis shows that English, Spanish, and Italian discourses often emphasize policy coherence, expressive plurality, and institutional trust and optimism, respectively (Ibáñez-Sánchez et al., 2022; Kim et al., 2025). Without segmentation by language, these cultural logics are flattened, obscuring the core dynamics that shape migration narratives in global digital arenas.
Social capital and the three pillars of retweet behavior
To interpret cross-linguistic engagement patterns, this study draws on social capital theory (Bourdieu, 1986; Coleman, 1988; Putnam, 2000). Social capital distinguishes between bonding capital, which reinforces cohesion and trust within communities, and bridging capital, which expands networks by incorporating divergent perspectives. Retweeting is not only an act of amplification but also a relational signal of endorsement, contestation, or belonging that is embedded in cultural norms.
We operationalize this framework through three pillars of retweet behavior. Visibility reflects the structural distribution of attention and shows how tweets and replies extend their reach across different linguistic publics. Sentiment shift captures the affective dimension of engagement and measures how the emotional tone changes between original posts and replies. Divergence assesses whether conversations maintain thematic coherence (bonding capital) or depart into new directions (bridging capital). By embedding these pillars within the social capital lens, we link computational measures of visibility, affect, and thematic distance with broader cultural logics of trust, reciprocity, and authenticity.
Descriptive insights on multilingual retweet patterns
The descriptive analysis provides the empirical backbone for this study. Fig. 1a presents the most frequent terms by category (collective, organizational, individual), while Fig. 1b maps seven Latent Dirichlet Allocation (LDA) topics that structure cross-language debates. Table 2 quantifies engagement at the category level (tweets, replies, likes, sentiment, visibility, divergence), and Fig. 2 displays sentiment distributions across tweets and replies. Figs. 3a–c compare visibility, sentiment shifts, and divergence by category and topic for each language, highlighting whether publics concentrate their engagement on original tweets versus replies, whether thematic divergence is suppressed or amplified, and whether sentiment shifts stabilize or destabilize the discourse. These descriptives lead to the formation of the research questions, which are subsequently tested through regression models.
Distribution of tweets and replies by category across languages.
Note. Metrics include average likes, average replies, sentiment scores, sentiment shifts, tweet visibility, and topic divergence. Visibility measures are standardized within language.
Comparative visibility, sentiment shift, and divergence patterns in English.
Note. Visibility and sentiment indices (Tweet Visibility, Reply Visibility Score, Sentiment Shift, and Intensity Shift) are standardized within language. Topic Divergence is a binary indicator and not standardized.
Engagement in English is concentrated in collective topics (21 tweets; 2041 replies), which also attract the highest likes per tweet (1758; Table 2). Fig. 1a highlights terms such as “UK,” “brain,” and “people,” while Fig. 1b shows causes and patriotism as dominant themes. Fig. 2 indicates neutral-to-positive replies, and Fig. 3a confirms strong tweet visibility but weaker reply visibility, minimal sentiment shift, and low divergence. These patterns suggest a logic of coherence, where engagement clusters around visible, thematically aligned content.
RQ1 (English – Coherence): To what extent does tweet visibility reinforce coherent, collective engagement in English debates?
Spanish: plurality via reply visibility and divergence
The debates in Spanish have the highest organizational volume (93 tweets; 160 replies) and also show that individual topics generate the most traction (616 likes per tweet; Table 2). Fig. 1a presents institutional and personal terms such as fuga and trabajo, while Fig. 1b emphasizes personal narratives and economy-related themes. Fig. 2 reveals broader, often more negative sentiment distributions, and Fig. 3b shows strong reply visibility coupled with higher divergence and more pronounced sentiment shifts. This indicates a logic of plurality, where diverse voices gain traction through reply visibility, although volatility emerges when negative sentiment spreads.
Comparative visibility, sentiment shift, and divergence patterns in Spanish.
Note. Visibility and sentiment indices (Tweet Visibility, Reply Visibility Score, Sentiment Shift, and Intensity Shift) are standardized within language. Topic Divergence is a binary indicator and not standardized.
RQ2 (Spanish – Plurality): How do reply visibility and thematic divergence support plurality in Spanish debates, and how is this affected by shifts in sentiment?
Italian: optimism through positive sentiment shifts
Italian engagement is centered around organizational topics (41 tweets; 38 replies; Table 2), with consistently positive sentiment (Fig. 2). Fig. 1a highlights lavoro and fuga, while Fig. 1b anchors debate in government, education, and opportunities. Fig. 3c shows modest tweet visibility, steady reply visibility, consistently positive sentiment shifts, and low divergence. This reflects a logic of optimism, where a positive tone reinforces organizational discourse even in the absence of high visibility.
Comparative visibility, sentiment shift, and divergence patterns in Italian.
Note. Visibility and sentiment indices (Tweet Visibility, Reply Visibility Score, Sentiment Shift, and Intensity Shift) are standardized within language. Topic Divergence is a binary indicator and not standardized.
RQ3 (Italian – Optimism): How do positive sentiment shifts sustain organizational optimism in Italian debates, even when overall visibility is lower?
Cross-linguistic integration
Taken together, Figs. 1 and 2 and Table 2 reveal three distinct logics: English publics rely on tweet visibility to preserve coherence, Spanish publics leverage reply visibility and divergence to amplify plurality, and Italian publics sustain optimism through positive sentiment shifts despite modest visibility. These descriptive insights establish the foundation for the regression analysis, where the interaction of visibility, sentiment, and divergence is tested to produce differentiated outcomes across linguistic publics.
Data and methods
Data collection
We collected multilingual X data on the brain drain discourse in English, Spanish, and Italian between January 2020 and December 2024. This timeframe was chosen because migration debates intensified during this period due to post-Brexit concerns about labor, the Covid-19 pandemic, and the European energy crisis, all of which heightened anxieties over the emigration of skilled professionals (Segev, 2023; Cau et al., 2024). Simultaneously, changes to X’s algorithm reshaped the patterns of visibility and engagement and rendered this window especially relevant for studying digital discourse (Brady et al., 2021; Rathje et al., 2021).
Data were retrieved using the Apify API with language-specific keywords and hashtags: “brain drain” and #braindrain for English, “fuga de cerebros” and #fugadecerebros for Spanish, and “fuga di cervelli” and #fugadicervelli for Italian. Multilingual or code-switched tweets were included if the focal language was dominant and contained the relevant term; otherwise, they were excluded. After filtering off-topic content, the dataset comprised 79 English tweets (3191 replies), 122 Italian tweets (390 replies), and 181 Spanish tweets (4508 replies). Mixed-language or off-topic posts, primarily those referencing unrelated economic or political debates, were excluded, accounting for approximately 7 % of the total corpus. The final dataset includes 3132 English, 4490 Spanish, and 253 Italian tweet–reply pairs, reflecting expected activity asymmetries across linguistic communities. The smaller Italian subset led to thinner cells for some interaction terms, which we interpret with caution.
Following best practices in multilingual computational social science (Charquero-Ballester et al., 2024), metadata (likes, retweets, replies, timestamps, author IDs) were preserved for traceability. A de-identified replication package containing the code and minimal data samples used in this study is publicly available through the Open Science Framework (OSF) (https://doi.org/10.17605/OSF.IO/S6VZ2). Manual topic validation was conducted by a single coder to ensure thematic accuracy in the tweet–reply pair classification. To verify reliability, a 10 % subset (approximately 500 tweet–reply pairs, proportionally sampled across English, Spanish, and Italian) was independently re-coded by a second verifier. The two coders reached 85 % agreement at the topic-label level, with disagreements resolved through discussion. Although full manual double-coding was not feasible due to scale, this subset validation confirmed the consistent application of thematic labels across languages. Table 3 details the operationalization of key variables, clarifying how the structural, affective, and thematic dimensions of retweet behavior were measured across languages.
Operationalization of structural, affective, and thematic variables.
Note. Example when Tweet sentiment = +5 and Reply sentiment = –5: sentiment shift = –10 (directional change toward negativity); absolute sentiment shift = 10 (magnitude of emotional swing); intensity shift = 0 (emotional strength unchanged, equally strong but opposite polarity).
Tweets and replies were manually annotated into three categories: collective, organizational, and individual, and further mapped to seven topics: brain drain causes, government-related, education and opportunities, personal narratives, patriotism, mockery, and economy-related. These categories were derived inductively after closely reading the corpus and aligned with migration communication research (Nguyen, 2006; Tomić & Taylor, 2018; Vega-Muñoz et al., 2021; Chen et al., 2022; Rémy, 2023; Gjerazi, 2024). Items not fitting any category were labeled “N/A” and excluded. Table 4 shows how the seven topics map onto the existing literature and themes observed in our dataset, ensuring transparency in topic definition and interpretation. The data collection process is documented in Appendix A1.
Topic characteristics mapped to prior literature and empirical findings.
| Topics | Scholars’ Findings on Recurring Themes | Reference | Our Findings |
|---|---|---|---|
| Brain Drain Causes | Economic instability; limited job opportunities; political turmoil; political insecurity | Tomić, C. H. & Taylor, K., 2018; Ajoseh, S. et al., 2024 | Loss of skilled, talented professionals; concern over national loss; intellectual migration |
| Government-related | Inadequate government policies; lack of support for research development; bureaucratic hurdles; government initiatives to attract expatriates | Vega-Muñoz, A. et al., 2021 | Systemic inefficiencies; critiques of specific government officials or private actors; complaints about corruption, lack of support, and regulatory failures contributing to brain drain |
| Education & Opportunities | Disparities in educational standards between countries; limited opportunities for career advancement | Tomić, C. H. & Taylor, K., 2018; Ajoseh, S. et al., 2024 | Mention of personal dreams for jobs; access to education and career growth opportunities; better opportunities abroad; many unmet expectations in home countries |
| Personal Narratives | Challenges and successes of emigrants; emotional and social implications of leaving one’s homeland; feeling of loss; impact on local communities | Gjerazi, B., 2024; Rémy, F.C., 2023 | Personal stories on why people left their home countries; considerable disappointment toward countries (both home and host); emigration as a form of protest |
| Patriotism | Wish to return home if conditions improve; patriotism influences migration decisions; trade-off between seeking better opportunities and the sense of loyalty to one’s country | Nguyen, C. H., 2006 | Nostalgia for home countries and calls for more opportunities to return and build a better version of the home country; used to inspire hope or guilt |
| Mocking | Criticism of government policies and societal conditions; coping mechanism for frustrations about systemic issues | Gjerazi, B., 2024 | Mocking of specific government officials; mocking between users for their ideas; sometimes derailing from serious topics with performative commentary |
| Economy-related | Remittances from abroad can boost the home country’s economy; brain drain hinders economic development and innovation in the home country | Oosterik, S., 2016; Vega-Muñoz, A. et al., 2021 | Salaries and overall economic instability; limited financial opportunities, especially after Covid-19; local economic conditions are push factors causing brain drain |
Note. Columns align prior scholarship on recurring brain-drain themes with thematic patterns identified in this dataset.
Each dataset was analyzed in its original language to preserve semantic nuance, consistent with Santoveña-Casal et al.’s (2021) recommendations on language-specific lexicon accuracy (Charquero-Ballester et al., 2024). Preprocessing involved lowercasing, tokenization, and stopword removal using language-specific resources: SnowballC for English and Spanish (Bouchet-Valat, 2020) and a curated Italian stopword list from the Genediazjr stopwords GitHub repository (Genediazjr, 2017). While the Snowball stopword lists in R were sufficiently precise for English and Spanish (Table 5), the Italian list proved less reliable. To address this, we employed the curated GitHub list, ensuring culturally appropriate filtering of high-frequency function words while retaining semantically relevant terms (Carlino et al., 2020; Ruiu & Regnedda, 2022). Additional resources were integrated for sentiment analysis, including the AFINN lexicon (Nielsen, 2011), Bing lexicon (Hu & Liu, 2004), Sentix for Italian (Polignano et al., 2019), and the Emoji Sentiment Ranking dataset (Novak et al., 2015). Platform artifacts (t.co, https, amp) were removed; care was taken to avoid eliminating semantically relevant migration terms (Duong & Nguyen-Thi, 2021; Karamouzas et al., 2022).
Stopword resources used for English, Spanish, and Italian preprocessing.
Note. Examples illustrate language-specific stopword lists applied during text preprocessing.
Sentiment polarity was computed using validated lexicons for each language: the AFINN lexicon for English, a community-adapted AFINN (lexico_afinn.csv) for Spanish, and Sentix for Italian (rescaled to the AFINN range of –5 to +5). Emoji polarity scores (Novak et al., 2015) were integrated to capture multimodal expression, and the sentiment score of each message was calculated as the mean of text- and emoji-based polarity. To adapt sentiment scoring to the migration discourse, the Spanish AFINN lexicon was extended to include migration-specific terms such as migración, exilio, fuga, retorno, talento, expulsión, refugiado, diáspora, frontera, xenofobia, asilo, destierro, barreras, nación, and país. These words frequently appear in migration narratives and often carry negative polarity associated with loss, displacement, or constraint. Validation against the baseline lexicon produced a small negative correlation (r = –0.07, 95% CI [–0.10, –0.04], n = 4507). This modest inverse association reflects the introduction of systematically negative migration terms (such as migración, fuga, expulsión, and refugiado), which moved previously neutral or mildly positive messages toward more negative valence (Appendix Fig. A1). We interpret this divergence not as noise or inconsistency but as evidence that the extended lexicon captures domain-relevant emotional cues that the generic lexicon misses, such as the anxiety, loss, or structural critique embedded in “fuga de cerebros,” improving sensitivity to the affective language surrounding the brain-drain debate.
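The scoring arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function names, the assumed source range of –1 to +1 for the lexicon being rescaled, and the example values are all assumptions introduced here.

```python
def rescale(score, src_min, src_max, dst_min=-5.0, dst_max=5.0):
    """Linearly map a score from [src_min, src_max] onto [dst_min, dst_max]."""
    return dst_min + (score - src_min) * (dst_max - dst_min) / (src_max - src_min)


def message_sentiment(text_polarity, emoji_polarity):
    """Mean of text-based and emoji-based polarity for one message."""
    return (text_polarity + emoji_polarity) / 2.0


# Hypothetical lexicon entry scored on an assumed [-1, 1] scale,
# mapped onto the AFINN range of -5 to +5.
italian_word = rescale(0.5, src_min=-1.0, src_max=1.0)

# Hypothetical message with text polarity 2.0 and emoji polarity 4.0.
combined = message_sentiment(text_polarity=2.0, emoji_polarity=4.0)
```

How messages without emoji were handled is not stated in the text, so the sketch deliberately leaves that case out.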
From these scores, three relational measures were derived. Sentiment shift was defined as the reply’s sentiment minus the tweet’s sentiment (Reply – Tweet), capturing the directional change in tone. Absolute sentiment shift was defined as the magnitude of this change; it ignores direction and measures how far removed the reply is from the emotion of the original post. Intensity shift measured changes in emotional strength by comparing absolute sentiment scores for tweets and replies, indicating whether replies became more or less emotionally intense, regardless of polarity.
For example, if a tweet scored +5 and its reply scored –5, the sentiment shift would be –10 (a negative swing), the absolute sentiment shift would be 10 (a large emotional gap), and the intensity shift would be 0 (equal strength, opposite polarity). This distinction allowed us to capture the direction (sentiment shift), distance (absolute sentiment shift), and strength (intensity shift) of emotional modulation. A 10 % manually coded subsample confirmed strong agreement with the automated measures (≥0.78).
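The three relational measures can be written out as a short Python sketch that reproduces the worked example above. The function and key names are illustrative assumptions; only the formulas come from the text.

```python
def sentiment_measures(tweet_score, reply_score):
    """Return the three relational measures for one tweet-reply pair."""
    shift = reply_score - tweet_score  # direction: Reply - Tweet
    return {
        "sentiment_shift": shift,
        "absolute_sentiment_shift": abs(shift),  # distance, ignoring direction
        "intensity_shift": abs(reply_score) - abs(tweet_score),  # change in strength
    }


# Worked example from the text: tweet = +5, reply = -5.
example = sentiment_measures(tweet_score=5, reply_score=-5)
```

Keeping the three quantities separate makes it possible for a pair to show a large directional swing while its emotional strength stays unchanged, as in this example.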
Topic modeling and divergence
Latent Dirichlet Allocation (LDA) was applied separately to each language’s tweet–reply corpus, using Gibbs sampling with a fixed random seed (1234) for reproducibility. The number of topics was set to seven, based on an inductive reading of the corpus and alignment with prior scholarship (Table 4). LDA produces a probability distribution of the topics for each text; for interpretability, each tweet or reply was assigned to its dominant topic (highest probability). Topic divergence was then coded as 1 when the dominant topic of a reply differed from that of its parent tweet, and 0 otherwise.
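The binary coding rule can be illustrated with a minimal Python sketch. It assumes the per-text topic probabilities have already been obtained from a fitted LDA model; the function names and the two seven-topic distributions are hypothetical values chosen for illustration.

```python
def dominant_topic(topic_probs):
    """Index of the topic with the highest probability."""
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])


def topic_divergence(tweet_probs, reply_probs):
    """1 if the reply's dominant topic differs from the tweet's, else 0."""
    return int(dominant_topic(tweet_probs) != dominant_topic(reply_probs))


# Hypothetical seven-topic distributions for one tweet-reply pair.
tweet_probs = [0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.10]
reply_probs = [0.05, 0.10, 0.05, 0.55, 0.05, 0.10, 0.10]
diverged = topic_divergence(tweet_probs, reply_probs)
```

Because only the top-ranked topic is compared, a reply whose probability mass moves substantially but whose dominant topic stays the same is still coded 0.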
To complement this categorical measure, we also computed thematic divergence, a continuous measure of semantic drift. Tweets and replies were vectorized with TF–IDF, and divergence was defined as 1 – cosine similarity between the tweet and reply. This dual strategy ensured that divergence was captured as discrete topical shifts and more subtle semantic departures.
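A minimal, dependency-free Python sketch of this continuous measure follows. The text does not specify the TF–IDF weighting variant or the corpus on which it was fitted, so the smoothed inverse document frequency used here, the function names, and the toy token lists are assumptions for illustration.

```python
import math
from collections import Counter


def fit_idf(tokenized_docs):
    """Smoothed inverse document frequency per term (an assumed variant)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen at fit time are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def thematic_divergence(tweet_tokens, reply_tokens, idf):
    """1 - cosine similarity of the TF-IDF vectors of a tweet and its reply."""
    return 1.0 - cosine_similarity(tfidf_vector(tweet_tokens, idf),
                                   tfidf_vector(reply_tokens, idf))


# Toy, already-preprocessed token lists (hypothetical).
corpus = [["fuga", "cerebros", "talento"],
          ["salarios", "bajos"],
          ["fuga", "talento", "joven"]]
idf = fit_idf(corpus)
same = thematic_divergence(corpus[0], corpus[0], idf)      # identical texts
disjoint = thematic_divergence(corpus[0], corpus[1], idf)  # no shared terms
partial = thematic_divergence(corpus[0], corpus[2], idf)   # some overlap
```

Identical texts give a divergence near 0 and texts with no shared terms give 1, with partial overlap falling in between.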
To capture the cross-linguistic variation in engagement, we operationalized three swing measures derived from tweet–reply pairs: emotional swing (absolute sentiment shift, |Reply – Tweet|), thematic swing (thematic divergence, 1 – cosine similarity of TF–IDF vectors), and strength swing (intensity shift, |Reply Sentiment| – |Tweet Sentiment|), which were later compared across languages and visualized in Figs. 3a–c.
Visibility metrics
Following diffusion models (Brady et al., 2021), visibility was operationalized as a weighted combination of likes and replies. For each language, tweet visibility was estimated by partial least squares regression coefficients that predicted retweet counts from likes and replies. Reply visibility was computed analogously, based on likes and replies to replies. A total retweet impact measure summed retweets of original tweets and their replies. This calibration ensured that visibility reflected actual engagement pathways within each language community.
$$\text{Tweet Visibility} = \beta_1 \cdot \text{Likes}_{\text{tweet}} + \beta_2 \cdot \text{Replies}_{\text{tweet}}$$
$$\text{Reply Visibility} = \beta_1' \cdot \text{Likes}_{\text{reply}} + \beta_2' \cdot \text{Replies}_{\text{reply}}$$
where $\beta_1$, $\beta_2$, $\beta_1'$, and $\beta_2'$ are language-specific regression weights estimated via ordinary least squares (OLS) (see Appendix Table A1). Both indices were standardized within each language to ensure comparability across linguistic communities. This design avoids spurious cross-linguistic contrasts that may arise from different engagement baselines. Appendix Table A2 reports the estimated weights for each language, indicating how engagement components contribute to the Latent Visibility Index in a language-specific manner. Standardized scores were used for all subsequent regression and interaction models, and figures were labeled “standardized within language” to avoid misinterpretation.
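The weighted combination and the within-language standardization can be sketched as follows. The weights, language codes, and engagement counts are hypothetical placeholders (the estimated weights are reported in the appendix tables, not here), and the use of the population standard deviation for the z-scores is an assumption.

```python
from statistics import mean, pstdev


def visibility_index(likes, replies, beta_likes, beta_replies):
    """Weighted combination of likes and replies."""
    return beta_likes * likes + beta_replies * replies


def standardize_within_language(rows):
    """Add a z-score of 'visibility' computed separately for each language."""
    by_lang = {}
    for row in rows:
        by_lang.setdefault(row["lang"], []).append(row["visibility"])
    stats = {lang: (mean(vals), pstdev(vals)) for lang, vals in by_lang.items()}
    out = []
    for row in rows:
        mu, sd = stats[row["lang"]]
        z = (row["visibility"] - mu) / sd if sd > 0 else 0.0
        out.append({**row, "visibility_z": z})
    return out


# Hypothetical language-specific weights: (beta_likes, beta_replies).
WEIGHTS = {"en": (0.5, 2.0), "es": (1.0, 2.0)}

# Hypothetical posts: (language, likes, replies).
posts = [("en", 20, 0), ("en", 20, 5), ("en", 40, 5), ("es", 2, 0), ("es", 2, 1)]
rows = [{"lang": lang, "visibility": visibility_index(likes, replies, *WEIGHTS[lang])}
        for lang, likes, replies in posts]
scored = standardize_within_language(rows)
```

Standardizing inside each language means a score of 1 denotes one standard deviation above that language's own average, so differences in baseline engagement between languages do not enter the comparison.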
Statistical analyses
The descriptive evidence in Table 2 and Figs. 1–3 already suggests systematic cross-linguistic contrasts. English activity was concentrated on collective topics, generating the highest likes per tweet and reflecting collective coherence. Spanish activity was more dispersed and reflected plurality, with organizational tweets being the most frequent, but individual topics attracting stronger engagement. Italian activity centered on organizational themes, with modest visibility, reflecting institutional optimism. Figs. 1a–b confirmed these tendencies: English clustered around causes and patriotism, Spanish spanned personal, economic, and institutional themes, and Italian emphasized government and education. Fig. 2 highlighted distinct sentiment distributions: English replies clustered neutral-to-positive, Spanish replies spread more widely and were often negative, and Italian replies were consistently positive. Finally, Figs. 3a–c confirmed differences in engagement: English publics displayed strong tweet visibility but low reply visibility and divergence, Spanish publics displayed strong reply visibility and higher divergence yet coherent threads, and Italian publics showed modest visibility with consistently positive sentiment shifts.
To test whether these descriptive contrasts were statistically robust, we compared three continuous measures across languages: absolute sentiment shift (emotional swing), thematic divergence (thematic swing), and intensity shift (strength swing), using Levene’s tests, one-way ANOVA, and Tukey HSD post-hoc comparisons (α = 0.05). Levene’s tests were highly significant (p < .001) for all three measures, reflecting unequal variances across the languages.
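For readers unfamiliar with these tests, the following Python sketch shows how the one-way ANOVA F statistic is computed and how a Levene-type statistic can be obtained by applying the same computation to absolute deviations from each group's center. The toy groups are hypothetical, the choice of the group mean as the center is an assumption (the text does not say which variant was used), and p-values and Tukey HSD comparisons, which require reference distributions from a statistics library, are omitted.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def levene_statistic(groups):
    """Levene-type statistic: ANOVA F on absolute deviations from group means."""
    abs_dev = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_anova_f(abs_dev)


# Hypothetical swing scores for three language groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(groups)
w_stat, _, _ = levene_statistic(groups)
```

In this toy example the group means differ while the spreads are identical, so the ANOVA F is large and the Levene-type statistic is essentially zero.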
For absolute sentiment shift (emotional swing), ANOVA showed strong differences across languages, F(2, 15,754) = 4103, p < .001. Tukey comparisons revealed that English replies produced significantly larger swings than both Italian (p < .001) and Spanish (p < .001), while Spanish–Italian differences were negligible (p = .76). This confirms that English publics engage through sharper emotional swings, often amplifying or reversing the polarity of the original tweet.
For thematic divergence (thematic swing), ANOVA also indicated significant differences, F(2, 15,754) = 64.79, p < .001. Post-hoc comparisons showed that Spanish replies were significantly more convergent (lower divergence) than both English (p < .001) and Italian (p < .001), while English–Italian differences were not significant (p = .36). This suggests that Spanish publics, while branching into diverse topical categories, sustain low thematic swings by maintaining coherence after threads are established.
For intensity shift (strength swing), the ANOVA showed moderate but significant differences, F(2, 15,754) = 42.55, p < .001. English replies showed greater changes in emotional strength compared to Italian (p = .024) and Spanish (p < .001), while Spanish–Italian differences were minimal (p = .75). This indicates that English publics are more likely to alter the strength of emotional expression, either intensifying or dampening the tone, whereas Spanish and Italian publics maintain more stable emotional magnitudes.
Together, these results show that engagement dynamics vary by language along three distinct axes of swing. English publics diverge most in emotional and strength swings, reflecting engagement through the emotional modulation of coherent collective narratives. Spanish publics minimize both emotional and strength swings, while also showing low thematic swings, reflecting plurality across topics but coherence within threads. Italian publics sustain optimism with low emotional and strength swings, but moderate thematic swings, reflecting replies that remain consistently positive even when the topics shift slightly.
These statistical contrasts validate the descriptive insights from Table 2 and Figs. 1–3, and establish the empirical foundation for the regression analyses (Tables 6 and 7), which examine how visibility, sentiment, and divergence jointly shape engagement outcomes across languages.
Regression models predicting total retweet impact (thematic divergence specification).
Note. All continuous predictors (Visibility, Sentiment Shift, and Thematic Divergence) are standardized within language. Estimates are unstandardized coefficients.
Regression models predicting total retweet impact (topic divergence specification).
| Panel A: English | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| (Intercept) | 40.83 | 46.31 | 18.61 | −46.41 |
| | (117.68) | (130.29) | (101.62) | (132.07) |
| Tweet Visibility | 1.00*** | 1.00*** | 0.80*** | 0.63*** |
| | (0.02) | (0.02) | (0.04) | (0.02) |
| Reply Visibility | 0.78 | 0.47 | 7.83*** | 9.22 |
| | (2.49) | (2.57) | (2.55) | (43.06) |
| Sentiment Shift | 6.58 | 6.04 | −28.80 | 96.61 |
| | (6.08) | (17.78) | (16.63) | (67.25) |
| Topic Divergence | −50.39* | −22.52 | −47.68 | 133.77 |
| | (118.93) | (122.52) | (104.39) | (133.77) |
| Categories Individual level | −95.72 ⁎⁎ | 88.52 | ||
| (78.34) | (15.07) | |||
| Categories Collective level | −14.21 | 13.07 | ||
| (78.23) | (67.25) | |||
| Tweet Visibility × Sentiment Shift | 0.14 ⁎⁎⁎ | |||
| (0.03) | ||||
| Reply Visibility × Topic Divergence | 0.21 * | |||
| (43.26) | ||||
| Tweet Visibility × Topic Divergence | −0.35 ⁎⁎ | |||
| (0.06) | ||||
| R² | 0.97 | 0.97 | 0.98 | 0.98 |
| Adj. R² | 0.97 | 0.97 | 0.98 | 0.98 |
| Num. obs. | 78 | 78 | 78 | 78 |
| Note. All variables are standardized within language. Topic Divergence is a binary indicator derived from dominant Latent Dirichlet Allocation (LDA) topics. Estimates are unstandardized coefficients. *** p < 0.001; ** p < 0.01; * p < 0.05 | ||||
| Panel B: Spanish | ||||
| Model 1 | Model 2 | Model 3 | Model 4 | |
| (Intercept) | 56.72 | 35.94 | 56.18 | 7.02 |
| (66.48) | (110.38) | (66.52) | (115.26) | |
| Tweet Visibility | 0.98 ⁎⁎⁎ | 0.98 ⁎⁎⁎ | 0.99⁎⁎⁎ | 1.84* |
| (0.07) | (0.07) | (0.07) | (0.82) | |
| Reply Visibility | −1.30 | −1.29 | −1.36 | −8.43 |
| (0.95) | (0.96) | (0.96) | (6.03) | |
| Sentiment Shift | −0.60 | −0.88 | −5.29 | |
| (11.57) | (11.69) | (12.29) | (36.55) | |
| Topic Divergence | 23.91 | 28.48 | 25.20 | 88.42 |
| (68.68) | (69.66) | (68.65) | (88.89) | |
| Categories Individual level | 6.47 | 23.48 | ||
| (91.44) | (88.78) | |||
| Categories Collective level | 23.49 | 66.50 | ||
| (88.78) | (66.50) | |||
| Tweet Visibility × Sentiment Shift | 0.04 | |||
| (0.04) | ||||
| Reply Visibility × Topic Divergence | 13.65* | |||
| (6.50) | ||||
| Tweet Visibility × Topic Divergence | −0.85 | |||
| (0.82) | ||||
| R² | 0.60 | 0.60 | 0.60 | 0.62 |
| Adj. R² | 0.59 | 0.58 | 0.59 | 0.62 |
| Num. obs. | 181 | 181 | 181 | 181 |
| Note. All variables are standardized within language. Topic Divergence is a binary indicator derived from dominant Latent Dirichlet Allocation (LDA) topics. Estimates are unstandardized coefficients. *** p < 0.001; ** p < 0.01; * p < 0.05 | ||||
| Panel C: Italian | ||||
| Model 1 | Model 2 | Model 3 | Model 4 | |
| (Intercept) | 0.68 | 0.94 | 0.94 | −2.69 |
| (1.28) | (1.75) | (1.27) | (2.53) | |
| Tweet Visibility | 1.00 ⁎⁎⁎ | 1.00 ⁎⁎⁎ | 1.21 ⁎⁎⁎ | 2.30⁎⁎⁎ |
| (0.02) | (0.02) | (0.08) | (0.36) | |
| Reply Visibility | −0.56 | −0.64 | −0.89 | −1.46 |
| (2.49) | (2.54) | (2.44) | (23.85) | |
| Sentiment Shift | −0.18 | −0.16 | 0.69 | −0.43 |
| (0.50) | (0.51) | (0.59) | ||
| Topic Divergence | 1.35 * | 0.41 | 0.24 | |
| (0.44) | (0.44) | (0.24) | (0.59) | |
| Categories Individual level | −0.13 | −2.69 | ||
| (1.39) | (0.59) | |||
| Categories Collective level | −0.43 | 0.47 | ||
| (1.31) | (0.24) | |||
| Tweet Visibility × Sentiment Shift | −0.37 * | |||
| (0.14) | ||||
| Reply Visibility × Topic Divergence | 0.76 | |||
| (23.95) | ||||
| Tweet Visibility × Topic Divergence | −1.30 | |||
| (0.36) | ||||
| R² | 0.96 | 0.96 | 0.96 | 0.97 |
| Adj. R² | 0.96 | 0.96 | 0.96 | 0.67 |
| Num. obs. | 121 | 121 | 121 | 121 |
Note. All variables are standardized within language. Topic Divergence is a binary indicator derived from dominant Latent Dirichlet Allocation (LDA) topics. Estimates are unstandardized coefficients.
The baseline descriptives (Table 2; Figs. 1–3) reveal distinct cultural orientations in the brain drain discourse. English activity clustered in collective topics such as the causes of brain drain and patriotism, producing the highest likes per tweet and emphasizing coherent societal narratives. Spanish activity reflected plurality and was broadest in organizational tweets, but gained the most traction from individual stories. Italian discourse centered on organizational themes such as government and education, but overall engagement was modest, consistent with institutional optimism. Sentiment distributions (Fig. 2) reinforced these contrasts: English replies were clustered around a neutral-to-positive tone, Spanish replies were more dispersed and often negative, and Italian replies were consistently positive. Fig. 3 confirms these dynamics structurally, showing that English engagement hinged on tweet visibility, Spanish on reply visibility and divergence, and Italian on positive sentiment shifts.
Continuous “swing” measures further validated these descriptive patterns. Emotional swings, measured as absolute sentiment shift, were the largest in English, underscoring affective volatility. Thematic swings, measured as cosine-based divergence, were lowest in Spanish, suggesting greater topical alignment despite plurality. Strength swings, measured as intensity shifts in emotional magnitude, were modest overall but slightly greater in English than in Spanish or Italian. Taken together, these descriptive and continuous indicators highlight three distinct logics: English publics prioritize coherence, Spanish publics sustain plurality, and Italian publics anchor engagement in optimism.
Regression results: thematic divergence as the main specification
Table 6 shows the main regression models predicting total retweet impact, where divergence is operationalized continuously as the thematic distance between tweet and reply. Across all languages, tweet visibility was the strongest positive predictor of diffusion, though the role of other factors diverged by language. In English (Panel A), engagement was driven primarily by tweet visibility (β ≈ 1.0, p < .001), with reply visibility also contributing positively. The interaction models revealed that divergence was penalized when replies were highly visible (Reply Visibility × Thematic Divergence = –62.8, p = .008), while sentiment shifts were tolerable only when buffered by high tweet visibility (Tweet Visibility × Sentiment Shift = 0.12, p < .001). In Spanish (Panel B), both tweet and reply visibility predicted diffusion, but with different dynamics. Divergent replies reduced engagement (–322, p = .041), yet this penalty was significantly cushioned when reply visibility was high (Reply Visibility × Thematic Divergence = +8.58, p < .001). Simultaneously, large negative sentiment shifts destabilized engagement (–864, p < .001), suggesting that plurality is structurally tolerated but emotionally fragile. In Italian (Panel C), sentiment shift was the dominant predictor (β ≈ 17, p < .001), visibility played a weaker role, and thematic divergence was non-significant, underscoring that affective tone rather than structural coherence sustains engagement.
All models were tested for multicollinearity and heteroskedasticity. Variance Inflation Factors (VIFs) across all specifications remained below 3.0 (maximum = 2.75), indicating no multicollinearity concerns. To ensure robustness, each model was re-estimated using heteroskedasticity-consistent (HC3) standard errors. The coefficients and significance levels remained stable, confirming that the main results were not sensitive to heteroskedasticity.
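To make the HC3 correction concrete, the sketch below compares classical and HC3 standard errors for the slope of a one-predictor regression. It is an illustration under simplifying assumptions: the study's models have several predictors, the source does not state which software produced its estimates, and the function name `simple_ols_hc3` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import fmean

def simple_ols_hc3(x, y):
    """Fit y = a + b*x by OLS; return (slope, classical_se, hc3_se)."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Classical SE assumes one common error variance for all observations.
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    classical_se = sqrt(sigma2 / sxx)
    # HC3 rescales each squared residual by 1 / (1 - leverage)^2.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    meat = sum(
        (xi - x_bar) ** 2 * e ** 2 / (1 - h) ** 2
        for xi, e, h in zip(x, resid, leverage)
    )
    hc3_se = sqrt(meat) / sxx
    return slope, classical_se, hc3_se

# Toy data whose noise grows with x (invented for illustration only).
visibility = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
impact = [-1.4, -1.1, -0.3, 0.1, 0.2, 1.6, 0.9, 3.1]
slope, se, se_hc3 = simple_ols_hc3(visibility, impact)
print(f"slope={slope:.3f} classical_se={se:.3f} hc3_se={se_hc3:.3f}")
```

When the two standard errors are close, as the text reports for the main models, inference does not hinge on the constant-variance assumption.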
The visibility coefficients remained large and significant across languages. In the main specification (Table 6), a one-standard-deviation increase in tweet visibility predicted roughly a one-unit increase in total retweet impact for English and Spanish, and a slightly stronger 1.1-unit increase for Italian. Reply visibility also showed consistent positive effects, suggesting that conversational amplification—when replies attract attention—reinforces overall diffusion.
The robustness specification (Appendix Table A3) yielded nearly identical estimates, with minor increases in standard errors under HC3 correction. These results indicate that the visibility indices are statistically stable and substantively interpretable: greater visibility reliably translates into higher engagement, even when accounting for sentiment and thematic divergence.
Interaction effects
Figs. 4a–c present the interaction effects, showing how visibility fundamentally conditions the interpretation of sentiment and thematic divergence across languages.
Interaction effects of visibility, sentiment shift, and divergence across languages.
(4a) English: Tweet Visibility × Sentiment Shift. (4b) English: Reply Visibility × Thematic Divergence. (4c) Spanish: Reply Visibility × Thematic Divergence.
Note. Panels (4a–4c) depict regression-derived interaction effects between visibility and sentiment/thematic divergence for English and Spanish. All variables are standardized within language. Lines represent predicted retweet impact for high (Q3) and low (Q1) visibility. High reply visibility boosts diffusion for coherent English threads but loses advantage when replies diverge, whereas in Spanish discussions, visibility penalties diminish as conversations diversify.
In English (Fig. 4a), sentiment shifts suppressed engagement under low visibility, yet were buffered once the tweets gained prominence, suggesting that publics tolerate emotional fluctuation insofar as it remains anchored in structurally visible posts. Visibility therefore functions as a stabilizing filter: emotional volatility is tolerated when expressed through highly visible messages yet discouraged in peripheral or low-visibility contexts. Similarly, as shown in Fig. 4b, highly visible replies were penalized when they drifted thematically (β = –62.8, p = .008), indicating that visibility amplifies the reward for coherence while dampening the diffusion of off-topic engagement. In the English context, visibility thus consolidates emotional and thematic order—reinforcing coherence as a normative ideal for public discourse.
In Spanish (Fig. 4c), reply visibility initially reduced engagement (β = –7.54, p < .001), yet this penalty was progressively mitigated as thematic divergence increased (β = +8.58, p < .001). The parallel upward slopes in Fig. 4c indicate that heterogeneous discussions regained engagement even under high visibility, effectively transforming divergence from a liability into a participatory resource. In contrast to the English pattern—where visibility rewards coherence—Spanish publics appear to use visibility to normalize diversity, interpreting multiple viewpoints as a sign of authentic deliberation rather than fragmentation. However, large negative sentiment shifts continued to destabilize engagement, underscoring the fragile equilibrium between structural inclusivity and affective volatility.
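The cushioning effect described for Spanish follows from how an interaction term works: in a model with a Divergence term and a Reply Visibility × Divergence term, the marginal effect of divergence is the divergence coefficient plus the interaction coefficient times reply visibility. The sketch below applies that identity to the two Spanish coefficients reported in the text. The function names are hypothetical, and reply visibility is assumed to be expressed in whatever units the fitted model used.

```python
def divergence_effect(b_divergence, b_interaction, reply_visibility):
    """Marginal effect of divergence at a given reply-visibility level."""
    return b_divergence + b_interaction * reply_visibility

def break_even_visibility(b_divergence, b_interaction):
    """Reply-visibility level at which the divergence effect changes sign."""
    return -b_divergence / b_interaction

# Spanish coefficients as reported in the text (main specification).
B_DIVERGENCE = -322.0
B_INTERACTION = 8.58

for level in (0.0, 10.0, 20.0, 40.0):
    effect = divergence_effect(B_DIVERGENCE, B_INTERACTION, level)
    print(f"reply visibility {level:5.1f} -> divergence effect {effect:8.1f}")

print(f"break-even at {break_even_visibility(B_DIVERGENCE, B_INTERACTION):.1f}")
```

Because the interaction coefficient is positive, the penalty shrinks linearly as reply visibility rises, which is the pattern the Spanish panel is described as showing.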
Italian publics showed a distinct pattern: thematic divergence effects were statistically non-significant, whereas sentiment optimism consistently amplified engagement regardless of visibility. This suggests that Italian engagement dynamics rely more on affective positivity than on structural visibility or thematic alignment. Rather than moderating emotional volatility or reinforcing thematic coherence, Italian users appear to reward positive tone itself as the principal signal of conversational value.
Taken together, these results reveal three differentiated visibility logics. English publics use visibility to consolidate coherence, stabilizing attention around emotionally and thematically consistent narratives. Spanish publics use visibility to legitimize divergence, transforming heterogeneity into a socially recognized form of engagement. Italian publics, by contrast, ground engagement in affective positivity rather than in structural visibility or thematic order.
These cross-linguistic patterns demonstrate that visibility does more than amplify exposure—it functions as a culturally adaptive amplifier, translating platform affordances into locally meaningful forms of participation. The interplay between sentiment, divergence, and visibility thus reflects not only communicative strategies but also underlying cultural preferences for coherence, plurality, or optimism, which will be discussed in the following section.
Complementary robustness check: topic divergence
To ensure that these findings are not artifacts of how divergence is defined, Table 7 re-estimates the models using Topic Divergence, a binary indicator of whether replies switch dominant topics relative to the tweet. This specification complements the continuous semantic measure in Table 6 by capturing sharper categorical breaks in the conversation. The results broadly converge with the main specification while adding nuance. In English, topic divergence reduced baseline amplification (β = –50.39, p < .05) yet recovered when divergent replies themselves gained visibility, whereas high tweet visibility intensified penalties. Spanish models again showed that reply visibility legitimized divergence (β = 13.65, p < .05), reproducing the contingency observed with thematic divergence, although topic switching was penalized in already high-visibility threads. The Italian results remained stable, with topic divergence showing small or non-significant effects, whereas sentiment alignment continued to dominate engagement. We estimated topic divergence (binary) via within-language LDA and correlated it with thematic divergence (1 − cosine similarity). The correlations were positive but small in English (r = 0.12) and Spanish (r = 0.05), and modest in Italian (r = 0.31), indicating that topic switching (categorical drift) and semantic distance (continuous drift) capture related yet non-identical aspects of conversational change. Results were robust to nonparametric (Spearman) estimates and reported with 95 % CIs in Table A3.
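The binary Topic Divergence indicator can be illustrated as follows. The sketch assumes that each tweet and reply already has a vector of topic weights from a fitted LDA model (the fitting step is not shown) and flags a pair as divergent when the highest-weight topic differs. The function names and the seven-element toy vectors are hypothetical.

```python
def dominant_topic(weights):
    """Index of the topic with the largest weight."""
    return max(range(len(weights)), key=weights.__getitem__)

def topic_divergence(tweet_topics, reply_topics):
    """1 if the reply's dominant topic differs from the tweet's, else 0."""
    return int(dominant_topic(tweet_topics) != dominant_topic(reply_topics))

# Toy topic weights over seven topics (invented for illustration only).
tweet = [0.05, 0.60, 0.10, 0.05, 0.10, 0.05, 0.05]
on_topic_reply = [0.10, 0.50, 0.15, 0.05, 0.10, 0.05, 0.05]
off_topic_reply = [0.05, 0.10, 0.05, 0.65, 0.05, 0.05, 0.05]

print(topic_divergence(tweet, on_topic_reply))
print(topic_divergence(tweet, off_topic_reply))
```

A reply can shift its wording considerably without changing its dominant topic, which is one reason a binary indicator of this kind would correlate only weakly with a continuous cosine-based distance.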
Taken together, the robustness check confirms that visibility remains the central driver across languages; sentiment shift operates differently by cultural context; and divergence matters only under conditions of visibility. Thematic divergence provides a continuous, language-agnostic measure of semantic drift, whereas topic divergence highlights sharper conversational ruptures. The use of both operationalizations strengthens the credibility of the results and clarifies why Spanish and English publics exhibit shifting penalties and buffers depending on the definition of divergence, whereas Italian publics remain consistently governed by affective framing.
Discussion
This study examined how multilingual publics engage with brain drain debates on X, focusing on English, Spanish, and Italian discourse. By analyzing visibility, sentiment, and divergence, we identified not only surface-level differences in activity, but also deeper cultural logics that shape how online publics respond to migration crises. The main specification, based on thematic divergence, showed clear cross-linguistic contrasts, whereas the robustness check with topic divergence confirmed these dynamics under an alternative definition of conversational drift. Together, these models provide convergent evidence that visibility, sentiment, and divergence interact differently across linguistic communities, reinforcing that cultural logics of engagement are not artifacts of measurement choice but robustly observed features of online discourse.
The results highlight three orientations. English publics emphasize coherence: emotional variation is tolerated only when attached to highly visible tweets, whereas divergence, whether measured thematically or topically, is penalized once it becomes visible. Spanish publics value diversity of voice: divergent perspectives tend to reduce engagement when hidden, yet become legitimate and even rewarding when reply visibility is high. This result holds under both divergence measures. Simultaneously, negative emotions destabilize plurality, underscoring the fragility of affective dynamics. Italian publics rely on optimism: positive sentiment shifts consistently amplify engagement, and divergence plays a small role regardless of how it is defined. These findings underscore that bonding, bridging, and affective capital operate differently across linguistic communities, shaping how the migration discourse is interpreted and diffused.
Theoretically, this contributes to research on social capital by showing how bonding and bridging mechanisms take on language-specific forms, and by introducing affective capital as a complementary lens. In English debates, visibility consolidates coherence, reinforces bonding capital by privileging thematic alignment, and penalizes drift. In Spanish debates, reply visibility cushions divergence and aligns with bridging capital by granting legitimacy to diverse perspectives, even as negativity threatens stability. Italian debates point to affective capital, where optimism provides continuity regardless of structural cues, consistent with recent findings on the role of institutional trust in shaping Southern European migration narratives (Patel & Kim, 2025). By using both semantic (thematic) and categorical (topic) measures of divergence, the study demonstrates that these theoretical insights are not tied to a single operationalization but hold across complementary definitions.
Methodologically, this study demonstrates the value of combining multiple computational approaches for comparative migration research. The integration of sentiment scoring, swing measures, and interaction modeling allowed us to capture dynamics that are difficult to detect through manual coding alone. By using both thematic and topic divergence, we detected both subtle semantic drifts and sharper categorical breaks, showing that the interpretation of divergence itself is conditioned by visibility. This dual operationalization directly addresses concerns about definitional clarity and strengthens the reliability of cross-linguistic comparisons.
Finally, the findings carry direct implications for migration communication and policy. English-speaking publics appear most responsive to coherent messaging that leverages structural visibility to stabilize discourse. Spanish-speaking publics thrive on plurality, suggesting that communicators should design strategies that encourage diverse voices, while preparing for volatility when negative sentiment emerges. Italian-speaking publics respond most strongly to optimism, highlighting the importance of positive framing in sustaining institutional trust. These differentiated logics show that digital communication on migration cannot adopt a one-size-fits-all approach. Instead, strategies must adapt to the cultural frames embedded in each language, leveraging visibility, sentiment, and divergence differently to build credibility and engagement in multilingual contexts. Table 8 summarizes the study’s practical implications by linking research insights to the empirical results, interpretive meaning, and policy and communication actions. Importantly, the synthesis draws on evidence from both the main specification (Table 6, Thematic Divergence) and the complementary robustness check (Table 7, Topic Divergence), ensuring that the recommended strategies reflect findings that are robust across alternative operationalizations of divergence.
Policy implications of multilingual interaction effects.
| Insights | Empirical Result | Interpretive Meaning | Policy / Communication Action |
|---|---|---|---|
| How does visibility shape engagement? | English: Tweet visibility buffers emotional swings (Fig. 4a); Spanish: Reply visibility cushions divergence (Fig. 4c); Italian: Sentiment shifts dominate over visibility. | Visibility functions differently across publics: coherence under English, plurality under Spanish, optimism under Italian. | English: Prioritize making coherent policy narratives highly visible (e.g., government accounts amplifying unified Brexit migration updates). Spanish: Ensure diverse replies are surfaced (e.g., community organizations retweeting personal migration stories). Italian: Use visibility less for reach, more to showcase positive tone (e.g., ministries amplifying stories of opportunity programs). |
| How do sentiment shifts affect diffusion? | English: Large emotional swings penalized unless buffered by visibility; Spanish: Negative shifts destabilize engagement; Italian: Positive sentiment shifts amplify retweets (Table 6). | Affect is culturally filtered: emotional restraint in English, volatility in Spanish, optimism in Italian. | English: Avoid overly emotional framings unless backed by authoritative accounts. Spanish: Anticipate volatility; provide empathetic framing when negative emotions spike. Italian: Lean on affective optimism (e.g., highlighting education/job initiatives). |
| How is thematic divergence interpreted? | English: Divergent replies penalized when visible (Fig. 4b); Spanish: Divergence reframed as legitimate under reply visibility; Italian: Divergence effects nonsignificant. | Divergence = coherence violation in English, inclusivity marker in Spanish, neutral in Italian. | English: Keep messaging thematically consistent; manage reply threads to reduce off-topic drift (e.g., official FAQs responding directly). Spanish: Use divergent voices to broaden legitimacy (e.g., retweeting varied migrant experiences). Italian: Focus on optimism regardless of divergence (e.g., framing reforms positively). |
Fig. 5 presents a conceptual map of cross-linguistic engagement dynamics across three axes of conversational swing: emotional swing (absolute sentiment shift, measuring polarity contrasts between tweets and replies), thematic swing (thematic divergence, capturing topical drift between tweets and replies), and strength swing (intensity shift, reflecting changes in the magnitude of sentiment regardless of polarity). English publics exhibit high emotional and strength swings, reflecting sharp polarity contrasts and intensity modulation, but moderate thematic swings, consistent with collective coherence reinforced through emotional modulation. Spanish publics show low emotional and strength swings and the lowest thematic swing, reflecting expressive plurality across topics but coherence within reply threads. Italian publics show low emotional and strength swings but moderate thematic swings, reflecting consistent optimism in tone even when the topical focus shifts. Together, these axes illustrate how different cultural logics shape the discourse on brain drain on X: English through the emotional modulation of collective narratives, Spanish through plural yet coherent branching, and Italian through optimistic stability.
Cross-linguistic engagement swings (emotional, thematic, strength).
Note. Emotional swing is measured as the absolute sentiment shift, i.e., the magnitude of change in polarity between a tweet and its reply (|Reply Sentiment – Tweet Sentiment|). Larger values indicate greater emotional contrast. Thematic swing is measured as thematic divergence, i.e., the degree of topical drift between tweet and reply computed as (1 – cosine similarity) of their TF–IDF vectors. Larger values indicate replies departing further from the original tweet’s content. Strength swing is measured as intensity shift, i.e., the change in absolute sentiment magnitude (|Reply Sentiment| – |Tweet Sentiment|). Positive values reflect replies becoming more emotionally intense than the tweet; negative values indicate dampening. All measures are standardized within language.
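The three swing definitions in the note can be sketched in a few lines. The TF–IDF weighting below (raw term frequency times a smoothed inverse document frequency) and the whitespace tokenizer are assumptions, since the note does not specify the exact scheme; the function names, example sentences, and sentiment scores are hypothetical, and the within-language standardization step is omitted.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF dicts: raw term frequency x smoothed IDF (assumed scheme)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in tokenized]

def cosine_similarity(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def swings(tweet_sent, reply_sent, tweet_vec, reply_vec):
    """Emotional, thematic, and strength swing for one tweet-reply pair."""
    return {
        "emotional": abs(reply_sent - tweet_sent),          # |reply - tweet|
        "thematic": 1 - cosine_similarity(tweet_vec, reply_vec),
        "strength": abs(reply_sent) - abs(tweet_sent),      # |reply| - |tweet|
    }

# Invented example pair; sentiment scores are placeholders on a -1..1 scale.
tweet_text = "young researchers keep leaving for better funding abroad"
reply_text = "funding abroad is better so researchers will keep leaving"
tweet_vec, reply_vec = tfidf_vectors([tweet_text, reply_text])
print(swings(-0.4, -0.7, tweet_vec, reply_vec))
```

In this example the reply is more negative than the tweet, so both the emotional swing and the strength swing are positive, while shared vocabulary keeps the thematic swing well below 1.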
This study shows that multilingual publics interpret and engage with the debate on migration through distinct cultural logics. English publics favor coherence, Spanish publics sustain plurality, and Italian publics rely on optimism. Because these patterns are amenable to computational measurement, we were able to trace how visibility, sentiment, and divergence interact across thousands of tweets and replies. By integrating sentiment analysis, thematic divergence, and interaction modeling, this study demonstrates how computational approaches can uncover the dynamics of coherence, plurality, and optimism that are otherwise difficult to observe.
Simultaneously, the study’s limitations must be acknowledged. The reliance on X’s API may have introduced a sampling bias, as the platform’s public data stream is not fully representative of all users or tweets, and patterns of activity differ substantially across linguistic communities. This unevenness may not only affect the relative weight of English, Spanish, and Italian debates in our dataset but also the dynamics of visibility captured in our models. Moreover, while our analysis focused on visibility, sentiment, and divergence, other powerful drivers of migration debates, such as explicit references to violence or insecurity, were not modeled directly, even though they remain central to the lived experience of mobility. Sentiment scoring relied on language-specific lexicons, which were rescaled and validated but are ultimately unable to fully capture the cultural nuance of irony, sarcasm, or context-dependent meanings. Thematic divergence was measured using TF–IDF cosine similarity and topic modeling, both robust methods but limited compared to newer embedding-based models that can better capture semantic subtleties. In addition, our analysis was conducted at the comment level rather than the user level, which prevented us from connecting findings to network measures of centrality, bridging, or influence; the same user rarely appears across threads, limiting actor-level inferences. While this study focused on comment–reply dynamics rather than user-level centrality or modularity, future research could extend our findings by mapping influential users, their bridging roles, and structural positions in multilingual debates. Such analyses would complement our focus on discourse-level mechanisms by linking narrative dynamics to the architecture of online networks. Finally, the design remains observational, identifying associations rather than causal effects.
In summary, technology can only get us so far: computational tools are powerful for revealing structural and affective dynamics in digital publics, but they cannot fully resolve questions of context, causality, or interpretation without complementary approaches.
Future research can address these limitations by expanding the linguistic and platform scope of analysis, incorporating multimodal environments such as TikTok or YouTube, and extending beyond European languages to capture a wider range of cultural frames. Advances in transformer-based sentiment models and multilingual embeddings could further improve semantic accuracy, while longitudinal and experimental designs would help clarify causal mechanisms; for example, testing whether visibility interventions or framing strategies stabilize or destabilize discourse. Integrating network-level analysis would also allow scholars to connect content patterns to actor influence, bridging dynamics, and eigenvector centrality. Importantly, linking online discourse more directly to offline drivers of migration, such as violence or economic insecurity, would further ground computational analysis in lived realities.
Analyses were conducted on publicly accessible posts from X using the Academic Research API, in full compliance with the platform’s developer policy. All analyses were performed at the tweet–reply level; no user identifiers, handles, or tweet IDs were stored or analyzed. The data were de-identified, and all scripts operated on synthetic examples to demonstrate workflow reproducibility. A minimal replication package (Park, 2025), including preprocessing, sentiment, divergence, and modeling code together with synthetic tweet–reply examples, is hosted on the OSF at https://doi.org/10.17605/OSF.IO/S6VZ2. This archive enables full pipeline reproduction without distributing original user content and maintains both ethical and legal compliance with data-sharing norms.
Despite these limitations, this study demonstrates the value of computational social science for migration research. By combining theory-driven categories with computational measures, we show how digital publics negotiate coherence, plurality, and optimism across languages.
The implication for policymakers and communicators is clear: multilingual communication strategies must adapt to the cultural logics embedded in each language instead of assuming uniform digital publics. Table 8 summarizes these implications by connecting the research questions to the empirical results, theoretical meaning, and policy recommendations. Simultaneously, our results highlight both the promise and limits of computational analysis. Technology can only get us so far: while large-scale data and automated measures illuminate structural and affective dynamics that would otherwise remain hidden, a deeper understanding of migration discourse requires the combination of these tools with contextual, qualitative, and causal approaches.
CRediT authorship contribution statement
Jimi Park: Writing – review & editing, Writing – original draft, Validation, Supervision, Software, Resources, Methodology, Investigation, Formal analysis, Conceptualization. Nicole Pitassi: Writing – review & editing, Writing – original draft, Visualization, Data curation.
None.
Note. The number of topics was set to seven, based on both inductive reading of the corpus and alignment with prior scholarship (Table 4).
Cross-linguistic visibility weights and mechanisms.
Note. Weights (β coefficients) are derived from language-specific OLS visibility models fitted separately for tweets and replies: tweet_retweets ≈ β₁ × tweet_replies + β₂ × tweet_likes; reply_retweets ≈ β₃ × reply_likes + β₄ × reply_replies. All indices were z-standardized within language before modeling to ensure comparability of direction and relative magnitude, but not absolute scale, across languages.
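As a rough sketch of how such weights could be obtained, the example below z-standardizes toy tweet counts, fits the two-predictor least-squares model for tweets, and combines the predictors into a visibility index. Several points are assumptions rather than details confirmed by the note: the outcome is standardized along with the predictors (which makes the intercept zero, so it is omitted), the index is the weighted sum of the standardized predictors, and all names and numbers are hypothetical.

```python
from statistics import fmean, pstdev

def zscore(values):
    """Standardize to mean 0 and (population) SD 1."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def fit_two_predictor_ols(y, x1, x2):
    """Least-squares weights for y ~ b1*x1 + b2*x2 (no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12  # zero only if x1 and x2 are collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2

# Toy counts for one language (invented for illustration only).
tweet_replies = [3, 10, 1, 25, 7, 0, 14]
tweet_likes = [40, 90, 5, 300, 60, 2, 150]
tweet_retweets = [12, 35, 1, 120, 20, 0, 55]

z_replies, z_likes, z_retweets = map(zscore, (tweet_replies, tweet_likes, tweet_retweets))
beta_1, beta_2 = fit_two_predictor_ols(z_retweets, z_replies, z_likes)
tweet_visibility = [beta_1 * r + beta_2 * l for r, l in zip(z_replies, z_likes)]
print(round(beta_1, 3), round(beta_2, 3))
```

Repeating the same steps separately for each language, and again for replies with their own likes and reply counts, would yield language-specific weights of the kind the note describes.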
Correlation between topic and thematic divergence.
| Language | n Pairs | r (Pearson) | 95 % CI | Spearman ρ |
|---|---|---|---|---|
| English | 3132 | 0.12 | [0.084, 0.156] | 0.12 |
| Spanish | 4490 | 0.05 | [0.018, 0.080] | 0.03 |
| Italian | 253 | 0.31 | [0.194, 0.429] | 0.32 |
Note. All correlations computed on complete tweet–reply pairs; bootstrapped 95 % percentile confidence intervals (R = 1000) shown in brackets. The small-to-moderate positive associations suggest conceptual alignment yet non-redundancy between topic-based and thematic divergence measures.
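The statistics in this table (Pearson r, a percentile bootstrap interval with R = 1000 resamples, and Spearman ρ) can be sketched with the standard library alone. The data below are simulated rather than taken from the study, the function names are hypothetical, and the handling of degenerate resamples is an assumption added for safety.

```python
import random
from statistics import fmean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def bootstrap_percentile_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson r, resampling pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples in which one variable is constant.
        if len(set(xs)) > 1 and len(set(ys)) > 1:
            draws.append(pearson(xs, ys))
    draws.sort()
    lower = draws[int(len(draws) * alpha / 2)]
    upper = draws[int(len(draws) * (1 - alpha / 2)) - 1]
    return lower, upper

# Simulated pairs: binary topic switch and a loosely related continuous distance.
sim = random.Random(42)
topic_switch = [sim.randint(0, 1) for _ in range(200)]
thematic = [0.6 + 0.1 * t + sim.gauss(0, 0.15) for t in topic_switch]
r = pearson(topic_switch, thematic)
low, high = bootstrap_percentile_ci(topic_switch, thematic)
print(f"r={r:.2f} CI=[{low:.2f}, {high:.2f}] rho={spearman(topic_switch, thematic):.2f}")
```

Pairs are resampled jointly so that each bootstrap draw preserves the link between a tweet–reply pair's two divergence scores.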
Fig. A1, Tables A1, A2, A3, A4, A5.
Multicollinearity diagnostics across models and languages.
Note. All VIF and GVIF values < 3.0, indicating no multicollinearity. Categorical predictors evaluated using the generalized VIF adjusted for degrees of freedom.
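For continuous predictors, variance inflation factors can be read off the diagonal of the inverse of the predictors' correlation matrix. The sketch below implements that identity with a small Gauss-Jordan inversion. It does not cover the generalized VIF used for categorical predictors, and the function names and toy columns are hypothetical.

```python
from statistics import fmean

def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(columns))]

# Toy standardized predictors (invented for illustration only).
tweet_visibility = [0.2, 1.5, -0.7, 0.9, -1.2, 0.4, -0.3, -0.8]
reply_visibility = [0.1, 1.1, -0.2, 0.3, -1.0, 0.9, -0.6, -0.6]
sentiment_shift = [-0.5, 0.3, 0.8, -1.1, 0.2, 0.6, -0.9, 0.6]
print([round(v, 2) for v in vif([tweet_visibility, reply_visibility, sentiment_shift])])
```

With two predictors this reduces to 1 / (1 − r²), so a pairwise correlation of 0.8 already produces a VIF of about 2.8, close to the 3.0 threshold used in the note.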
HC3-robust regression estimates across models and languages.
Estimated coefficients (with HC3 heteroskedasticity-robust standard errors) for Models 3–6 by language. Predictors include tweet-level and reply-level visibility indices, sentiment shift, and interaction terms. All key predictors retain significance and expected direction, confirming robustness to heteroskedasticity.
Note. Across all languages, coefficients for Tweet Visibility remain stable (∼1.0) and highly significant. Reply Visibility and Sentiment Shift vary by language, with Italian showing a significant positive effect for sentiment. Interaction terms were generally nonsignificant, indicating additive rather than multiplicative effects. Across languages, a one-unit increase in standardized tweet visibility corresponds to roughly a one-unit increase in retweet impact, underscoring visibility as the dominant predictor. Spanish engagement also scales with visibility but shows more variance in reply effects. Italian sentiment coefficients are positive, supporting the role of affective resonance in smaller networks.


























