International Journal of Clinical and Health Psychology
Vol. 25, Issue 2 (April - June 2025)
Should we worry about how we measure worry? Insights from an updated version of the Italian Penn State Worry Questionnaire
Gioia Bottesi a,b (corresponding author: gioia.bottesi@unipd.it), Andrea Spoto a,b
a Department of General Psychology, University of Padova, Via Venezia 8, Padova 35131, Italy
b Centro di Ateneo dei Servizi Clinici Universitari Psicologici (SCUP), University of Padova, Via Niccolò Tommaseo 47a, Padova 35131, Italy
Abstract
Background

Research indicates a rise in self-reported worry, highlighting the need for updated psychometric tools. The Penn State Worry Questionnaire (PSWQ) assesses excessive worry, and there is debate over whether only its 11 positively worded items should be used. This study aimed to clarify the factor structure and psychometric properties of the PSWQ and to explore worry features in two distinct Italian community samples recruited in the 2010s and 2020s.

Methods

The 2020s sample included 674 participants (44.5 % female; Mage = 29.44 ± 13.20), while the 2010s sample comprised 411 individuals (61.6 % female; Mage = 36.64 ± 13.73). Methods from Classical Test Theory (CTT) were used to compare alternative PSWQ factor structures, assess the best-fitting model’s reliability and validity, and evaluate measurement invariance (MI) across sexes in the 2020s sample. Item Response Theory (IRT) was applied to refine and confirm the best-fitting factor structure and to compare item and individual locations across samples.

Results

The 11-item one-factor model was the best fit and it showed excellent reliability and concurrent validity. MI across sexes was supported. IRT analyses suggested that items were slightly more difficult for the 2010s sample.

Conclusions

The PSWQ-11 is a valid and reliable tool for assessing worry in the Italian community. The findings suggest that societal issues as well as socio-demographic characteristics may contribute to shaping differences in worry features across diverse historical contexts.

Keywords:
Worry
Penn State Worry Questionnaire
Classical Test Theory
Item Response Theory
Community samples
Introduction

Worry is a cognitive process characterized by the anticipation of future threats, manifesting through “a chain of thoughts and images, negatively affect-laden and relatively uncontrollable; it represents an attempt to engage in mental problem solving on an issue whose outcome is uncertain but contains the possibility of one or more negative outcomes” (Borkovec et al., 1983, p. 10). Worrying is an inherent facet of the human experience, and cognitive theories propose that worry may play a protective role, enabling individuals to prepare for and cope with upcoming challenges (Behar et al., 2009). However, when worry escalates to disproportionate or uncontrollable levels, reaching a pathological state, it can exert detrimental effects on physical and psychological health. Pathological worry constitutes a central feature of Generalized Anxiety Disorder (GAD) (American Psychiatric Association [APA], 2022), and it manifests as a transdiagnostic construct observable across various psychopathologies, including anxiety and depressive disorders (e.g., Ehring & Watkins, 2008; McEvoy et al., 2013; Warren et al., 2021).

Both research and clinical evidence point towards an upward trend in self-reported worry in recent years (e.g., Carleton et al., 2019; Davey et al., 2022). Societal changes, economic pressures, and the widespread impact of digital technology have all been identified as potential contributors to increased levels of worry and anxiety (Davey, 2018). Additionally, some recent global events, such as the COVID-19 pandemic, extreme weather events, and geopolitical conflicts, may have further augmented normative levels of worry (Freeston et al., 2020; Limone et al., 2022; Taylor, 2020). Given these premises, employing robust psychometric measures that adequately capture this construct and incorporate updated normative data is paramount.

The Penn State Worry Questionnaire (PSWQ) is an extensively used self-report questionnaire designed to measure excessive and uncontrollable worry (Meyer et al., 1990). Due to its established validity and reliability, it is frequently utilized as a screening instrument for pathological worry (Porta-Casteràs et al., 2020). It comprises 11 positively worded items and 5 negatively worded items, and it was initially conceived as a unifactorial measure of worry. However, subsequent investigations into its factor structure have produced inconsistent findings.

Fresco et al. (2002) found that the 11 positively worded items of the PSWQ were associated with a "worry engagement" factor, while the 5 negatively worded items were linked to an "absence of worry" factor. Nevertheless, critics of the two-factor solution argue that such studies may overlook the method variance, or method effects, that are common in measures combining positively and negatively worded items (e.g., Brown, 2003). Hazlett-Stevens et al. (2004) later examined an alternative model introducing a “method factor” associated with negative wording. They found that a model wherein all PSWQ items loaded onto a single worry factor and all negatively worded items also loaded onto a negative wording method factor (i.e., the method-factor model) showed a fit equivalent to that of the two-factor model. This finding received support in various studies, and the method-factor model was retained as a simpler explanation for assessing worry based on its presence/absence (Hazlett-Stevens et al., 2004; Jeon et al., 2017; Lim et al., 2008; Liu et al., 2022; Zhong et al., 2009). Among them, a recent study involving American undergraduate students, Dutch high school students, and American adults with psychopathology also found that positively worded items, as opposed to negatively worded ones, showed concurrent validity with measures and diagnoses of anxiety and depression symptoms (Liu et al., 2022).

To date, only a few studies, all on Spanish-speaking samples, have examined an 11-item version of the PSWQ (Carbonell-Bártoli & Tume-Zapata, 2022; Padros-Blazquez et al., 2018; Ruiz et al., 2018), which excludes the 5 negatively worded items. These studies yielded promising results, concluding that this abbreviated version is valid and reliable. However, findings were mainly based on undergraduate samples and/or did not compare the 11-item one-factor model with existing competing models.

The current study

This study was designed to address existing gaps in the literature on the PSWQ. To this end, we focused on the Italian version of the PSWQ, for which the available information is outdated and a rigorous validation is lacking (Meloni & Gana, 2001; Morani et al., 1999). Indeed, Morani et al. (1999) administered the 16 PSWQ items to 388 individuals from the general population, aged 18–86 years. They only reported descriptive statistics (means, standard deviations, and ranges) by age and sex, along with the overall Cronbach’s alpha value (0.85). No formal test of the factor structure or of the main psychometric properties of the measure was conducted. Subsequently, Meloni and Gana (2001) examined the factor structure of the PSWQ in a sample of 142 participants (mainly undergraduates) using both principal component analysis and confirmatory factor analysis. They also assessed external validity using a measure of self-actualization but did not explore concurrent validity with anxiety and depression symptom measures. The preliminary nature of the study prevented drawing definitive conclusions about the factor structure and the psychometric properties of the questionnaire, although the authors suggested that the PSWQ could be enhanced by eliminating the negatively worded items (Meloni & Gana, 2001).

In our study, we aimed to overcome the limitations identified in previous literature by integrating methods from two psychometric theories: Classical Test Theory (CTT; e.g., Lord & Novick, 1968; Novick, 1966) and Item Response Theory (IRT; e.g., Lord, 1980). Most studies on the PSWQ to date have relied exclusively on CTT-related methods, such as exploratory and confirmatory factor analysis (CFA) and classical model fit comparisons. In contrast, only a few studies have explored the application of IRT, and these have been limited to very specific aims and narrowly defined populations (e.g., Schroder et al., 2019; Wu et al., 2013). However, as mentioned earlier, recent literature increasingly questions the existence of the two traditionally accepted factors, prompting consideration of IRT's applicability in PSWQ research as a tool for refining the instrument and providing updated norms.

Our first objective was to enhance understanding of the PSWQ's factor structure and psychometric characteristics using a recent, large Italian community sample. Secondly, we aimed to investigate whether distinct community samples recruited across different time periods, likely shaped by distinct societal and economic conditions, may exhibit differences in worry features. To our knowledge, this is the first study examining these issues through the joint application of CTT and IRT methods.

Data analysis strategy and study hypotheses

The first aim was addressed using a sample recruited in 2021 and 2022 (i.e., “2020s sample”). Specifically:

  • We initially applied CTT to compare the original 16-item one-factor model (Meyer et al., 1990), the 16-item two-factor model (Fresco et al., 2002), the 16-item method-factor model (Hazlett-Stevens et al., 2004), and the 11-item one-factor model (Carbonell-Bártoli & Tume-Zapata, 2022; Padros-Blazquez et al., 2018; Ruiz et al., 2018). The model that best fitted the data was then analyzed using IRT methods to verify the overall model performance and item-specific parameters. Several authors have suggested that enhancing the PSWQ might involve retaining only its 11 positively worded items (see Liu et al., 2022 for details). Thus, we expected the 11-item one-factor model to emerge as the best-fitting model compared to the other solutions and that the model's adequate fit to the data would be confirmed through IRT analysis (H1a).

  • We assessed concurrent validity by examining the correlations between the PSWQ and the depression, anxiety, and stress scales of the Depression Anxiety Stress Scales-21 (DASS-21; Bottesi et al., 2015). Consistent with the literature (e.g., Liu et al., 2022; Ruiz et al., 2018), we hypothesized strong positive associations between the PSWQ and DASS-21 scores (H1b).

  • We investigated Measurement Invariance (MI) across sexes. MI of the PSWQ across sexes has been supported, although it has only been examined in a few studies (Carbonell-Bártoli & Tume-Zapata, 2022; Ruiz et al., 2018). Thus, we hypothesized that the Italian version of the PSWQ would also be invariant across females and males (H1c).

  • We calculated descriptive statistics and percentiles by sex to provide up-to-date Italian normative data. Based on the literature (e.g., Bottesi et al., 2018; Gould & Edelstein, 2010), we expected women to show higher levels of worry than men (H1d).

To address the second aim, we used an additional Italian community sample recruited between 2012 and 2018 (“2010s sample”). We conducted the same IRT analyses on the PSWQ scores from the 2010s sample as those performed on the 2020s sample and then compared item characteristics and individual location parameters across the two groups. As this is the first study to explore this topic, we did not formulate specific hypotheses. Nonetheless, we anticipated potential differences in worry-related features among samples recruited across distinct historical contexts, based on the premise that societal, cultural, and environmental factors can shape the experience of worry (H2). This expectation was informed by prior research on non-clinical populations in the US and UK, which has documented a rise in reported worry levels over the past two decades (Carleton et al., 2019; Davey et al., 2022).

Materials and methods

Participants and procedure

The 2020s sample initially comprised 938 individuals from the Italian community. They were recruited through snowball sampling and invited to complete an online survey focusing on personality traits and psychological features between May 2020 and March 2022. We collected data via a SurveyMonkey link, which was distributed across social networks (e.g., Facebook, Instagram, and Telegram). Participants provided informed consent before completing a socio-demographic schedule and the questionnaires. The survey took approximately 50 min to complete, and no rewards were offered for participation. We retained only participants who did not enter the study during the nationwide lockdown due to the COVID-19 outbreak; thus, only data collected between November 2021 and March 2022 were used (N = 812). Subsequently, we excluded participants who accessed the survey link but did not complete any part of the study, either the demographic questions or the questionnaires (N = 128). The final sample consisted of 674 participants with complete data, of whom 44.5 % were female; ages ranged from 18 to 65 years (M = 29.44; SD = 13.20). The mean education level was 13.92 years (SD = 2.69, range = 5–25). Regarding relationship status, 74 % of participants were single, 22.8 % were married or cohabitating, 2.7 % were separated or divorced, and 0.5 % were widowed. Concerning occupation, 48.2 % were employed, while the remaining were not (they were students, housekeepers, retired, or unemployed).

The 2010s sample was drawn from a dataset of individuals recruited from the Italian community between 2012 and 2018, with some of the data having been previously published in research by Bottesi et al. (2017, 2019). This sample consisted of 411 individuals (61.6 % female) aged between 18 and 65 years (M = 36.64, SD=13.73) and with a mean education level of 14.65 years (SD=3.39; range 5–30). Relationship status was 53.3 % single, 41.4 % married/cohabitating, 4.9 % separated/divorced, and 0.4 % widowed. Regarding occupation, 56 % were employed, while the remaining were not. The 2010s sample included a higher proportion of females (χ²(1)=29.69, p<.001), was older (F(1,1083)=73.73, p<.001), had a higher level of education (F(1,1081)=15.43, p<.001), was more likely to be in a stable romantic relationship (χ²(3)=49.44, p<.001), and had a lower rate of unemployment (χ²(1)=22.49, p<.001) compared to the 2020s sample. Thus, the two samples significantly differed in their socio-demographic composition.

This research was conducted in accordance with the Declaration of Helsinki and it was approved by the Ethics Committee of Psychological Sciences of the University of Padova (Italy).

Measures

The Penn State Worry Questionnaire (PSWQ; Meyer et al., 1990), a 16-item inventory designed to assess the tendency to worry excessively and uncontrollably. It measures trait worry by asking individuals to rate the extent to which each item is typical of them on a 5-point Likert scale (1 = “Not at all typical of me”, 5 = “Very typical of me”). Internal consistency and test-retest reliability of the original version were good in non-clinical and clinical samples (Meyer et al., 1990).

The Depression Anxiety Stress Scales-21 (DASS-21; Lovibond & Lovibond, 1995), a 21-item measure evaluating depression, anxiety, and stress symptoms over the previous week on a 4-point Likert scale (0 = “Did not apply to me at all”, 3 = “Applied to me very much, or most of the time”). A total score as well as three subscale scores (i.e., Anxiety, Depression, Stress) can be computed. The total and subscale scores of the Italian version showed an excellent internal consistency in both clinical and non-clinical samples (Bottesi et al., 2015).

Data analysis

All analyses were conducted using R 4.4.0 statistical software. Specifically, model comparisons and MI were performed with the lavaan package, initial data screening and scale reliability were carried out using the psych package, and IRT analysis was conducted with the mirt package. Additionally, Jamovi version 2.3.28 was used, particularly for the snowIRT module. The scripts and data are available on OSF: https://doi.org/10.17605/OSF.IO/YDGPH.
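By way of illustration, a minimal R setup along these lines might look as follows; the packages are those named above, while the data file name and item column names (PSWQ_1 to PSWQ_16) are assumptions for illustration only (the actual scripts are available at the OSF link).

```r
# Minimal setup sketch (packages as named in the paper; file and column names are hypothetical)
library(lavaan)  # model comparison and measurement invariance
library(psych)   # data screening and scale reliability
library(mirt)    # Rating Scale Model (IRT)

dat <- read.csv("pswq_2020s.csv")        # hypothetical export of the 2020s sample
pswq_items <- paste0("PSWQ_", 1:16)      # assumed item column names
```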

CTT methods

Methods from CTT were adopted to analyze data of the 2020s sample. Specifically, they were used to: (i) compare the alternative factor structures of the PSWQ identified in the literature, (ii) assess the reliability and validity of the best-fitting model, (iii) evaluate MI across sexes for the best-fitting model.

We performed model comparisons by estimating the fit of the different models and comparing the following fit indices: χ2, χ2/df, Tucker-Lewis Index (TLI), Comparative Fit Index (CFI), Root Mean Square Error of Approximation (RMSEA), and Standardized Root Mean Square Residual (SRMR). Acceptance criteria were based on current literature guidelines: RMSEA < 0.08; SRMR < 0.08; TLI and CFI > 0.95 (Hu & Bentler, 1999). Given the data distribution (see the Preliminary descriptive analyses section), all models were fitted using Diagonal Weighted Least Squares (DWLS) as the estimator. The reliability of the resulting scale was estimated by means of McDonald’s ω, while concurrent validity was analyzed by means of the Spearman ρ coefficient between the PSWQ and DASS-21 scores.
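A sketch of how such a model comparison could be specified in lavaan is given below, building on the hypothetical data frame `dat` introduced above; treating the method factor as orthogonal to the worry factor is an assumption of this sketch, not a detail reported in the text.

```r
# Sketch of the four competing CFA models (item names as assumed in the setup sketch)
library(lavaan)
pos <- paste0("PSWQ_", c(2, 4, 5, 6, 7, 9, 12, 13, 14, 15, 16))  # positively worded items
neg <- paste0("PSWQ_", c(1, 3, 8, 10, 11))                        # negatively worded items

m1 <- paste("worry =~", paste(c(pos, neg), collapse = " + "))     # Model 1: 16-item one-factor
m2 <- paste0("engage =~ ", paste(pos, collapse = " + "), "\n",    # Model 2: 16-item two-factor
             "absence =~ ", paste(neg, collapse = " + "))
m3 <- paste0("worry =~ ", paste(c(pos, neg), collapse = " + "), "\n",  # Model 3: method factor
             "method =~ ", paste(neg, collapse = " + "), "\n",
             "worry ~~ 0*method")                                 # orthogonal method factor (assumed)
m4 <- paste("worry =~", paste(pos, collapse = " + "))             # Model 4: 11-item one-factor

fits <- lapply(list(m1, m2, m3, m4), cfa, data = dat,
               estimator = "DWLS", ordered = TRUE)
sapply(fits, fitMeasures,
       fit.measures = c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))
```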

MI across sexes (300 males vs. 364 females) was tested by imposing increasing restrictions on the model and evaluating the changes in fit indices. To this aim, a multi-group CFA was performed to assess sex invariance at the configural, metric, and scalar levels. Configural invariance examines whether the unconstrained model holds equally across sexes. Metric invariance implies that the magnitudes of the factor loadings are equivalent across the two sexes. Scalar invariance implies equality of both factor loadings and item intercepts between the two groups. We considered ΔCFI ≥ −0.005 together with ΔRMSEA ≤ 0.010 or ΔSRMR ≤ 0.005 as evidence of invariance (Chen, 2007). Moreover, a chi-square (χ²) difference test was used to test invariance or partial invariance by comparing each constrained model with the less constrained one.
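A corresponding sketch of the multi-group CFA is shown below, again assuming the hypothetical `dat`, the Model 4 syntax `m4` from the previous sketch, and a `sex` grouping variable; with ordered indicators, constraining thresholds is a common way to operationalize the scalar level, whereas the paper describes this step in terms of intercepts.

```r
# Sketch of invariance testing across sexes for the 11-item one-factor model
configural <- cfa(m4, data = dat, group = "sex",
                  estimator = "DWLS", ordered = TRUE)
metric     <- cfa(m4, data = dat, group = "sex",
                  estimator = "DWLS", ordered = TRUE,
                  group.equal = "loadings")
scalar     <- cfa(m4, data = dat, group = "sex",
                  estimator = "DWLS", ordered = TRUE,
                  group.equal = c("loadings", "thresholds"))  # 'thresholds' for ordered items

# Chi-square difference tests between nested models
lavTestLRT(configural, metric, scalar)

# Changes in CFI/RMSEA/SRMR across levels of invariance
sapply(list(configural, metric, scalar), fitMeasures,
       fit.measures = c("cfi", "rmsea", "srmr"))
```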

IRT methods

IRT methods were used to: (i) refine and further confirm the best-fitting solution for the factor structure of the PSWQ that emerged from the CTT analyses, and (ii) compare item and person locations between the 2010s and 2020s samples.

Due to the nature of the PSWQ items, which are based on a 5-point Likert scale, we chose to use the Rating Scale Model (RSM; Andrich, 1978) on both the 2020s and 2010s samples. The RSM assumes that the probability of an individual selecting a given score on an item is a logistic function of the person’s location parameter and of the difficulty of the item steps (Andrich, 1978). In the RSM, item category thresholds are constrained to be equal for all items. Naturally, items are still allowed to vary in difficulty (i.e., different item location parameters are permitted; Masters & Wright, 1984). Importantly, each item contributes an equal weight of information to the scale, sharing the same discrimination parameter (i.e., slope; Bacci et al., 2014). Model fit was assessed using traditional indices such as RMSEA, TLI, and CFI. Additionally, consistent with the IRT literature, item infit and outfit indices were used to analyze item-specific behavior. The former is crucial for identifying items that elicit unexpected responses from respondents whose estimated ability closely matches the item’s difficulty. Outfit, on the other hand, detects unexpected responses to items that are far from a respondent’s ability level. Both indices are expected to approximate 1, with values below 0.5 indicating overfit (responses more predictable than the model expects) and values above 1.5 suggesting underfit (excessive unexpected variability). Finally, person reliability was computed. This index refers to the precision and stability of the measure in estimating individuals’ ability parameters (e.g., Embretson & Reise, 2000).
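The sketch below illustrates how an RSM analysis of this kind could be run with the mirt package mentioned above (the paper also used the snowIRT module in Jamovi); the object names and the use of the C2 limited-information statistic are assumptions of the sketch, not details taken from the original scripts.

```r
# Sketch of the RSM analysis with mirt (item set = the 11 positively worded items)
library(mirt)
pswq11 <- paste0("PSWQ_", c(2, 4, 5, 6, 7, 9, 12, 13, 14, 15, 16))

mod <- mirt(dat[pswq11], model = 1, itemtype = "rsm")  # Rating Scale Model

M2(mod, type = "C2")                   # limited-information fit: RMSEA, SRMSR, TLI, CFI
coef(mod, simplify = TRUE)             # item locations and shared category thresholds
itemfit(mod, fit_stats = "infit")      # infit and outfit mean squares per item

# Person reliability from EAP ability estimates and their standard errors
fs <- fscores(mod, full.scores.SE = TRUE)
empirical_rxx(fs)
```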

Results

Preliminary descriptive analyses

As a preliminary step, an exploration of the 2020s sample data was conducted to examine descriptive statistics and distributions for each of the 16 PSWQ items (see Supplementary Materials, Table S1). The Shapiro-Wilk test highlighted that none of the items was normally distributed. After reverse-scoring the negatively worded items, all scale values were observed for every item, and average item scores ranged between 1.85 (item PSWQ_10) and 3.25 (item PSWQ_6).
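A minimal sketch of this screening step is shown below; the reverse-worded items (1, 3, 8, 10, and 11) are inferred here as the complement of the 11 positively worded items listed in Table 2, and the data frame and column names remain hypothetical.

```r
# Sketch of the preliminary item screening (continuing from the setup sketch)
rev_items <- paste0("PSWQ_", c(1, 3, 8, 10, 11))
dat[rev_items] <- 6 - dat[rev_items]           # invert 1-5 Likert responses of reverse-worded items

psych::describe(dat[pswq_items])               # per-item means, SDs, skew, kurtosis, ranges
sapply(dat[pswq_items], function(x) shapiro.test(x)$p.value)  # Shapiro-Wilk normality tests
```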

Model comparison

Four different models for the factor structure of the PSWQ were compared: the 16-item one-factor model (Model 1), the 16-item two-factor model (Model 2), the 16-item method-factor model (Model 3), and the 11-item one-factor model (Model 4). All fit indices are displayed in Table 1.

Table 1.

Fit indices of the four competing models tested on the 2020s sample.

  χ2  df  p-value  CFI  TLI  RMSEA  SRMR 
Model 1  952.460  104  <0.001  .988  .986  .110  .081 
Model 2  437.660  103  <0.001  .995  .995  .069  .051 
Model 3  357.939  99  <0.001  .996  .996  .069  .045 
Model 4  201.860  44  <0.001  .997  .997  .069  .038 

Note. Model 1 = the original 16-item one-factor model; Model 2 = the 16-item two-factor model; Model 3 = the 16-item method-factor model; Model 4 = the 11-item one-factor model. CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root Mean Square Residual.

The inspection of fit indices suggested that the best model was Model 4. All factor loadings were strong and significant, ranging between 0.695 for item PSWQ_9 and 0.914 for item PSWQ_5. Table 2 reports the standardized factor loadings.

Table 2.

Standardized factor loadings and RSM IRT item parameters of the 11 items of the PSWQ included in Model 4 (2020s sample).

Item  Std. Loading  IRT Difficulty  infit  outfit 
PSWQ_2  .764  −1.515  1.035  1.001 
PSWQ_4  .879  −2.055  .647  .632 
PSWQ_5  .914  −1.972  .643  .607 
PSWQ_6  .781  −2.659  1.010  .978 
PSWQ_7  .877  −1.491  .773  .722 
PSWQ_9  .695  −1.399  1.371  1.375 
PSWQ_12  .756  −2.293  1.058  1.116 
PSWQ_13  .834  −1.473  .996  .925 
PSWQ_14  .874  −1.257  .775  .723 
PSWQ_15  .901  −0.866  .743  .632 
PSWQ_16  .736  −2.219  1.046  1.046 
IRT analysis

The RSM fit indices of the 11-item one-factor model revealed a mixed fit to the data (RMSEA = 0.100; SRMR = 0.084; TLI = 0.972; CFI = 0.957). Since in the RSM item category thresholds are constrained to be equal for all items, there is only one set of estimates for these parameters, whose values were 0.839, 1.503, 3.017, and 3.822 for the first through fourth thresholds, respectively. Item difficulty parameters are displayed in the third column of Table 2. The lowest value was observed for item PSWQ_6, which turned out to be the “least difficult” item, whereas the highest value was observed for item PSWQ_15, the “most difficult” one. Person parameters ranged between −3.389 and 4.214. Overall, the items were centered on a rather low level of difficulty.

Moreover, we analyzed infit and outfit indices for all the items and checked the residuals of the model. With respect to the former, all parameters fell within the 0.5–1.5 range, supporting the adequacy of each item and of the model as a whole. Results are reported in the last two columns of Table 2. Only item PSWQ_9 showed slightly high values (in any case below the mentioned threshold). Residual analysis was conducted using Q3 statistics (Chen et al., 2013) to further evaluate model fit. The Q3 values were generally low and centered around zero (Mean = –0.094; Min = –0.277; Max = 0.246), with no evidence of systematic local dependence among item pairs. These results fall well within commonly accepted thresholds (e.g., Christensen et al., 2017), supporting the assumption of local independence. Finally, we computed the person reliability of the scale, which was excellent (0.910).
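A short sketch of the Q3 check, continuing from the RSM object `mod` fitted in the earlier sketch, might look as follows.

```r
# Sketch of the local-independence check via Yen's Q3 residual correlations
q3 <- residuals(mod, type = "Q3")       # matrix of pairwise residual correlations
q3_low <- q3[lower.tri(q3)]             # unique item pairs
c(mean = mean(q3_low), min = min(q3_low), max = max(q3_low))
```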

Reliability and concurrent validity

The McDonald’s ω for the 11-item PSWQ was 0.943 (95 % C.I.: 0.936–0.949), indicating very strong reliability. All items were strongly related to one another, as shown by the McDonald’s ω values obtained when dropping one item at a time and by the item-rest correlations (Table 3). All item-rest correlations were well above 0.600, and eliminating any single item always worsened the overall McDonald’s ω, indicating the solid reliability of the unidimensional solution proposed by this model.

Table 3.

McDonald’s ω if item dropped and item-rest correlation of the PSWQ-11 (2020s sample).

Item  McDonald’s ω if item dropped  Item-rest correlation 
PSWQ_2  .939  .701 
PSWQ_4  .935  .807 
PSWQ_5  .933  .840 
PSWQ_6  .939  .700 
PSWQ_7  .934  .810 
PSWQ_9  .942  .625 
PSWQ_12  .939  .691 
PSWQ_13  .937  .761 
PSWQ_14  .935  .795 
PSWQ_15  .935  .803 
PSWQ_16  .940  .683 

With respect to concurrent validity, all correlations between the total score of the 11-item PSWQ and the DASS-21 subscales were significant and moderate-to-strong in size: DASS-21 Depression: ρ=0.548, p < 0.01; DASS-21 Anxiety: ρ=0.510, p < 0.01; DASS-21 Stress: ρ=0.624, p < 0.01.
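The following sketch illustrates how reliability and concurrent-validity figures of this kind could be obtained in R; the DASS-21 subscale column names are hypothetical.

```r
# Sketch of the reliability and concurrent-validity analyses for the PSWQ-11
pswq11 <- paste0("PSWQ_", c(2, 4, 5, 6, 7, 9, 12, 13, 14, 15, 16))

psych::omega(dat[pswq11], nfactors = 1)   # McDonald's omega (total) for a one-factor solution
psych::alpha(dat[pswq11])$item.stats      # 'r.drop' column = item-rest correlations

total <- rowSums(dat[pswq11])             # PSWQ-11 total score
cor.test(total, dat$DASS_Depression, method = "spearman")  # hypothetical column names
cor.test(total, dat$DASS_Anxiety,    method = "spearman")
cor.test(total, dat$DASS_Stress,     method = "spearman")
```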

Measurement invariance

Results on configural, metric and scalar MI across sexes for the 11-item one-factor model are reported in Table 4.

Table 4.

Measurement Invariance between males (N=300) and females (N=364) in the 2020s sample.

Model  χ2  df  CFI  RMSEA  SRMR  Model comparison  Δχ2  Δdf  p  ΔCFI  ΔRMSEA  ΔSRMR 
Configural  206.968  88  .998  .063  .041               
Metric  223.306  98  .998  .062  .043  Metric vs Configural  16.338  10  .09  <0.001  −0.001  .002 
Scalar  291.419  130  .997  .061  .041  Scalar vs Metric  68.113  32  <0.05  −0.001  −0.001  −0.002 

Note. CFI = Comparative Fit Index; RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root Mean Square Residual.

While configural and metric invariance were clearly confirmed, results for scalar invariance were slightly more ambiguous. In particular, the difference between the χ2 of the model with the loadings set equal across groups and that of the model in which both loadings and intercepts were set equal across groups was significant, indicating a worse fit of the latter model. Nonetheless, all the other differences between the relevant fit indices (namely ΔCFI, ΔRMSEA, and ΔSRMR) were within the tolerance limits, supporting scalar invariance. This, together with the well-known reduced reliability of the χ2 statistic with large samples (Chen, 2007; Cheung & Rensvold, 2002; Kline, 2015; Little, 2013), led us to conclude that scalar invariance between the two sexes can be considered supported.

Scale descriptives and norms

Table 5 illustrates the main descriptive indices as well as the percentile distribution, separated by sex, of the total score of the 11-item PSWQ. The average score obtained by females was significantly higher than that of males (t(672) = −8.283; p < .01; Cohen’s d = −0.643). Consequently, the tentative screening threshold for males (95th percentile) is 47, while the same threshold for females is fixed at a score of 51.

Table 5.

Descriptive indices, decile distribution and 95th percentile separated by sex of the PSWQ-11 (2020s sample).

  Sex
  Males  Females 
Mean (SD)  26.163 (10.397)  32.960 (10.821) 
Min-Max (Range)  11–55 (44)  11–55 (44) 
10th Percentile  13.000  19.000 
20th Percentile  16.000  24.000 
30th Percentile  19.000  26.000 
40th Percentile  22.000  29.000 
50th Percentile  25.000  32.000 
60th Percentile  27.000  36.000 
70th Percentile  31.000  39.000 
80th Percentile  35.000  43.000 
90th Percentile  41.000  48.000 
95th Percentile  46.350  51.000 
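For illustration, the sex-stratified percentiles and the mean comparison reported above could be computed along the following lines; the `sex` variable coding and the column names are assumptions of the sketch.

```r
# Sketch of the sex-stratified norms and mean comparison on the PSWQ-11 total score
pswq11 <- paste0("PSWQ_", c(2, 4, 5, 6, 7, 9, 12, 13, 14, 15, 16))
total  <- rowSums(dat[pswq11])

tapply(total, dat$sex, quantile,
       probs = c(seq(0.1, 0.9, by = 0.1), 0.95))   # deciles and 95th percentile by sex

t.test(total ~ dat$sex, var.equal = TRUE)           # Student's t-test on the total score

# Cohen's d based on the pooled standard deviation
m <- tapply(total, dat$sex, mean)
s <- tapply(total, dat$sex, sd)
n <- table(dat$sex)
diff(m) / sqrt(((n[1] - 1) * s[1]^2 + (n[2] - 1) * s[2]^2) / (sum(n) - 2))
```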
IRT analysis on the 2010s sample

The RSM IRT analysis indicated that the fit of the 11-item one-factor model to the 2010s sample data was slightly better than that observed for the 2020s sample (RMSEA = 0.081; SRMR = 0.073; TLI = 0.976; CFI = 0.963). The range of item difficulty was similar to that observed in the 2020s sample, although both the minimum and maximum values were higher in the 2010s sample. Specifically, the lowest value was again observed for item PSWQ_6, with a value of −2.365, while the highest value was still for item PSWQ_15, at −0.124. Overall, items appeared to be slightly more “difficult” for the 2010s sample (see Supplementary Materials, Table S2).

Infit and outfit measures were all in the desired range, with no exceptions. Some changes in the order of item difficulty occurred. In particular, while the extreme items (namely PSWQ_6 and PSWQ_15) remained the same in both samples and the order of the first five items was essentially stable across samples (PSWQ_6, PSWQ_16, PSWQ_12, PSWQ_4, and PSWQ_5), items PSWQ_2 and PSWQ_7 turned out to be more difficult than PSWQ_13, PSWQ_9, and PSWQ_14 in the 2010s sample, whereas the opposite pattern was observed in the more recent data collection. Person reliability was very good in this sample as well (0.881).

Discussion

In this study, we aimed to achieve two objectives: first, to clarify the factor structure and psychometric properties of the PSWQ; and second, to explore whether distinct community samples recruited across different time periods may be characterized by differences in worry features. To this end, we utilized the Italian version of the PSWQ and employed methods from both CTT and IRT.

Factor structure and psychometric properties of the PSWQ

The factor structure and psychometric properties of the PSWQ were tested on the 2020s sample which, to our knowledge, represents the most up-to-date normative Italian sample for this questionnaire. Consistent with expectations, the 11-item one-factor model emerged as the best-fitting solution compared to the other models reported in the literature, although the 16-item method-factor model demonstrated similarly good fit indices (H1a). This result contributes to the ongoing debate questioning the validity of the two factors identified by Fresco et al. (2002), namely “worry engagement” and “absence of worry”, and aligns with recent literature advocating for a streamlined version of the PSWQ retaining only the 11 positively worded items (i.e., the PSWQ-11) (Carbonell-Bártoli & Tume-Zapata, 2022; Padrós-Blázquez et al., 2018; Ruiz et al., 2018). Some studies have pointed out the conceptual ambiguity and potential method effects introduced by the negatively worded items (Hazlett-Stevens et al., 2004; Liu et al., 2022; Meloni & Gana, 2001), thus supporting their removal on both theoretical and methodological grounds. Moreover, from the perspective of parsimony, we believe that the PSWQ-11 offers a cleaner factorial structure, reduces the likelihood of response biases (e.g., acquiescence or confusion due to reverse wording), and facilitates a more straightforward interpretation of scores. Furthermore, the practical utility of the 11-item version is particularly important in both research and clinical contexts. Shorter instruments are less burdensome for respondents, improve completion rates, and are especially advantageous in time-constrained settings such as primary care, mental health screenings, or large-scale surveys. The streamlined format also enhances the feasibility of repeated assessments over time, which is critical in longitudinal designs or treatment monitoring (Porta-Casteràs et al., 2020). IRT analyses further supported the acceptability of the 11-item one-factor model. Additionally, the inspection of item parameters indicated that, overall, the items are centered on a relatively low level of difficulty. This suggests that these items are better suited to discriminating among individuals with lower levels of worry than among those with more severe presentations. In our view, this could be a desirable feature of this questionnaire, since it is frequently used to detect pathological worry in screening procedures (Porta-Casteràs et al., 2020). Lastly, the analysis of infit and outfit indices for all items confirmed the overall acceptability of fit for both the individual items and the model as a whole. The only exception was item PSWQ_9 (‘As soon as I finish one task, I start to worry about everything else I have to do’), which showed a slight misfit according to the infit and outfit indices. Future research may consider rephrasing this item, as psychometric standards generally discourage the use of items containing more than one sentence (e.g., Embretson, 1999; Henson & Douglas, 2005), an issue which might, at least in part, explain this result.

Both McDonald’s ω and IRT person reliability indices showed excellent values, providing additional support for the strong reliability of this unidimensional model. Concurrent validity was also confirmed, as the total score of the PSWQ-11 exhibited significant, moderate-to-strong correlations with the DASS-21 subscales (H1b). These findings not only mirror previous studies demonstrating strong associations with anxiety and depression measures (Liu et al., 2022; Ruiz et al., 2018) but are also in line with literature suggesting the transdiagnostic nature of worry (Ehring & Watkins, 2008; McEvoy et al., 2013; Warren et al., 2021).

Consistent with our hypothesis and previous research (Carbonell-Bártoli & Tume-Zapata, 2022; Ruiz et al., 2018), we found that the 11-item one-factor model is invariant across sexes. This ensures that mean differences in the PSWQ-11 total scores are unbiased between males and females, meaning that observed differences accurately reflect true variability in worry levels between groups. Thus, the use of the total score for making valid inferences across sexes, at least in the Italian version of the scale, is supported. When comparing the mean scores of females and males, we observed that females scored significantly higher than males, corroborating both our expectations and the prevailing consensus in the literature, which indicates that women tend to worry more than men (Bottesi et al., 2018; Gould & Edelstein, 2010) (H1d).

Worry features in the 2010s and 2020s

Beyond enabling us to refine and confirm the best-fitting factor structure of the PSWQ as identified through CTT, IRT also provided valuable insights into the construct of worry and its features over the past decade by examining item characteristics and individual location parameters across our two community samples.

First, the RSM IRT analysis revealed that the 11-item one-factor model also had a good fit in the 2010s sample, indicating that the PSWQ-11 is a reliable tool for measuring worry across different decades. With respect to item difficulty, in both the 2010s and the 2020s samples the lowest value was observed for item PSWQ_6 (‘When I am under pressure, I worry a lot’), indicating that this item is more likely to receive higher scores and is thus less clinically relevant. Conversely, the highest value was observed for item PSWQ_15 (‘I worry all the time’), making it the most clinically relevant, as it tends to receive lower scores in the general population. This finding aligns with established cognitive models of worry (see Behar et al., 2009; Borkovec et al., 1983), which posit that worry is a common experience (i.e., it is quite likely that most people report worrying under pressure) that becomes pathological when it is perceived as excessive and persistent (i.e., it is unlikely that most people report worrying constantly).

Interestingly, the items appeared to be slightly more difficult for the 2010s sample than for the 2020s one. Moreover, we noted some shifts in the order of item difficulty between the two samples: specifically, in the 2010s sample, PSWQ_2 (‘My worries overwhelm me’) and PSWQ_7 (‘I am always worrying about something’) were more difficult than PSWQ_13 (‘I notice that I have been worrying about things’), PSWQ_9 (‘As soon as I finish one task, I start to worry about everything else I have to do’), and PSWQ_14 (‘Once I start worrying, I cannot stop’). The opposite trend was observed in the 2020s sample. Overall, these results may point to an increase in worry within the general population over time. Indeed, the fact that the items are generally less difficult in the 2020s tentatively implies that people today may be more prone to experiencing worry. Additionally, specific characteristics of worry that were considered more clinically relevant in the 2010s sample, particularly those related to its intensity and pervasiveness, may be viewed as easier to experience, or even more normative, in the current decade (H2). These findings may align with previous research indicating increasing levels of worry over the past two decades, a trend that has been linked to global crises, rapid technological advancement, and the constant influx of information (Carleton et al., 2019; Davey et al., 2022). However, the observed differences between the 2010s and 2020s samples may also be attributable to demographic variations in sample composition, rather than, or in addition to, genuine societal changes. Notably, the 2020s sample was younger and included fewer employed individuals than the 2010s sample, and previous research has demonstrated that certain sociodemographic characteristics, such as younger age and unemployment, are consistently associated with higher levels of worry across different historical contexts (e.g., Hassen & Adane, 2021; Schwartz & Melech, 2000).

Limitations

There are several limitations in the present study. First, due to its cross-sectional design, we were unable to address either the test-retest reliability or the longitudinal MI of the PSWQ-11 structure in the 2020s sample. Future research should examine this issue to gain a more precise and comprehensive understanding of the questionnaire's factor stability over time. Moreover, this sample was recruited using snowball sampling through social media platforms — an approach that may limit the generalizability of the findings due to potential selection biases (e.g., a higher likelihood of including younger or more technologically savvy individuals). Second, we recognize that comparing item characteristics and individual location parameters between two distinct samples does not allow for definitive conclusions regarding the changes in worry over time. This further underscores the need for longitudinal studies. Third, while our samples covered a broad and representative age range (18–65 years), their composition did not allow for separate analyses by age group. Moreover, we only included community individuals. Future research should address these limitations to assess the known-group validity of the PSWQ-11 and to establish clearer cut-off scores. The normative data we reported, expressed as percentiles, do not equate to clinical significance. Indeed, a score at the 95th percentile in a community sample may indicate subclinical levels of pathological worry rather than a pathological condition. While such information is not diagnostic and requires validation in clinical populations, it remains valuable for preliminary screening and identifying individuals who may benefit from further assessment.

Conclusions

Overall, the results of our study contribute to the existing literature on the measurement of worry through the PSWQ, highlighting that this shortened 11-item version may serve as a valid and reliable screening tool for assessing this construct. Furthermore, the comparison between our 2010s and 2020s samples in terms of IRT-based item and person parameters suggested that contextual and demographic factors might influence worry features. As a whole, we believe that our findings underscore the importance of using robust psychometric measures that incorporate updated normative data to help clinicians and researchers interpret questionnaire scores more effectively. This issue assumes crucial relevance in clinical practice, where the use of psychometrically sound assessment tools is critical not only during the screening and assessment phases, but also in evaluating clinically significant changes after treatment (APA, 2020).

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

None.

References
[American Psychiatric Association 2022]
American Psychiatric Association. (2022). Diagnostic and statistical manual of mental disorders, 5th ed., Text Revision. Washington, DC.
[American Psychological Association 2020]
American Psychological Association. (2020). APA guidelines for psychological assessment and evaluation. Washington, DC.
[Andrich, 1978]
D. Andrich.
Application of a psychometric rating model to ordered categories which are scored with successive integers.
Applied Psychological Measurement, 2 (1978), pp. 581-594
[Bacci, Bartolucci and Gnaldi, 2014]
S. Bacci, F. Bartolucci, M. Gnaldi.
A class of multidimensional latent class IRT models for ordinal polytomous item responses.
Communications in Statistics: Theory and Methods, 43 (2014), pp. 787-800
[Behar et al., 2009]
E. Behar, I.D. DiMarco, E.B. Hekler, J. Mohlman, A.M. Staples.
Current theoretical models of generalized anxiety disorder (GAD): Conceptual review and treatment implications.
Journal of Anxiety Disorders, 23 (2009), pp. 1011-1023
[Borkovec, Robinson, Pruzinsky and DePree, 1983]
T.D. Borkovec, E. Robinson, T. Pruzinsky, J.A. DePree.
Preliminary exploration of worry: Some characteristics and processes.
Behavior Therapy, 21 (1983), pp. 9-16
[Bottesi et al., 2015]
G. Bottesi, M. Ghisi, G. Altoè, E. Conforti, G. Melli, C. Sica.
The Italian version of the depression anxiety stress scales-21: Factor structure and psychometric properties on community and clinical samples.
Comprehensive Psychiatry, 60 (2015), pp. 170-181
[Bottesi, Ghisi, Sica, & Freeston, 2017]
G. Bottesi, M. Ghisi, C. Sica, M.H. Freeston.
Intolerance of uncertainty, not just right experiences, and compulsive checking: Test of a moderated mediation model on a non-clinical sample.
Comprehensive Psychiatry, 73 (2017), pp. 111-119
[Bottesi, Martignon, Cerea and Ghisi, 2018]
G. Bottesi, A. Martignon, S. Cerea, M. Ghisi.
Worry and associated cognitive features in Italian university students: Does gender make a difference?.
Personality and Individual Differences, 126 (2018), pp. 38-43
[Bottesi, Noventa, Freeston, & Ghisi, 2019]
G. Bottesi, S. Noventa, M.H. Freeston, M. Ghisi.
Seeking certainty about Intolerance of Uncertainty: addressing old and new issues through the Intolerance of Uncertainty Scale-Revised.
[Brown, 2003]
T.A. Brown.
Confirmatory factor analysis of the Penn State worry questionnaire: Multiple factors or method effects?.
Behavior Research and Therapy, 41 (2003), pp. 1411-1426
[Carbonell-Bártoli & Tume-Zapata, 2022]
E.L. Carbonell-Bártoli, V.N. Tume-Zapata.
Cuestionario de Preocupación de Pensilvania (PSWQ 11): Evidencias Psicométricas en una muestra de adultos de lima metropolitana.
PsiqueMag, 11 (2022), pp. 31-41
[Carleton, Desgagné, Krakauer and Hong, 2019]
R.N. Carleton, G. Desgagné, R. Krakauer, R.Y. Hong.
Increasing intolerance of uncertainty over time: The potential influence of increasing connectivity.
Cognitive Behaviour Therapy, 48 (2019), pp. 121-136
[Chen, 2007]
F.F. Chen.
Sensitivity of goodness of fit indexes to lack of measurement invariance.
Structural Equation Modeling, 14 (2007), pp. 464-504
[Chen, Yao and Thissen, 2013]
W.H. Chen, J. Yao, D. Thissen.
A general method for assessing the fit of item response models.
Psychometrika, 78 (2013), pp. 26-55
[Christensen, Makransky and Horton, 2017]
K.B. Christensen, G. Makransky, M. Horton.
Critical values for Yen’s Q3: Identification of local dependence in the Rasch model using residual correlations.
Applied Psychological Measurement, 41 (2017), pp. 178-194
[Davey, 2018]
G.C.L Davey.
The anxiety epidemic.
Robinson, (2018),
[Davey, Meeten and Field, 2022]
G.C. Davey, F. Meeten, A.P. Field.
What’s worrying our students? Increasing worry levels over two decades and a new measure of student worry frequency and domains.
Cognitive Therapy and Research, (2022), pp. 1-14
[Ehring and Watkins, 2008]
T. Ehring, E.R. Watkins.
Repetitive negative thinking as a transdiagnostic process.
International Journal of Cognitive Therapy, 1 (2008), pp. 192-205
[Embretson, 1999]
S.E. Embretson.
Generating items during testing: Psychometric issues and models.
Psychometrika, 64 (1999), pp. 407-433
[Embretson and Reise, 2000]
S.E. Embretson, S.P. Reise.
Item response theory for psychologists.
Lawrence Erlbaum Associates, (2000),
[Freeston et al., 2020]
M. Freeston, A. Tiplady, L. Mawn, G. Bottesi, S. Thwaites.
Towards a model of uncertainty distress in the context of Coronavirus (COVID-19).
Cognitive Behaviour Therapy, 13 (2020), pp. e31
[Fresco, Heimberg, Mennin and Turk, 2002]
D.M. Fresco, R.G. Heimberg, D.S. Mennin, C.L. Turk.
Confirmatory factor analysis of the Penn State worry questionnaire.
Behavior Research and Therapy, 40 (2002), pp. 313-323
[Gould and Edelstein, 2010]
C.E. Gould, B.A. Edelstein.
Worry, emotion control, and anxiety control in older and young adults.
Journal of Anxiety Disorders, 24 (2010), pp. 759-766
[Hassen and Adane, 2021]
S. Hassen, M. Adane.
Facemask-wearing behavior to prevent COVID-19 and associated factors among public and private bank workers in Ethiopia.
[Hazlett-Stevens, Ullman and Craske, 2004]
H. Hazlett-Stevens, J.B. Ullman, M.G. Craske.
Factor structure of the Penn State worry questionnaire: Examination of a method factor.
Assessment, 11 (2004), pp. 361-370
[Henson and Douglas, 2005]
R. Henson, J. Douglas.
Test construction for cognitive diagnosis.
Applied Psychological Measurement, 29 (2005), pp. 262-277
[Cheung and Rensvold, 2002]
G.W. Cheung, R.B. Rensvold.
Evaluating goodness-of-fit indexes for testing measurement invariance.
Structural Equation Modeling, 9 (2002), pp. 233-255
[Hu and Bentler, 1999]
L.T. Hu, P.M. Bentler.
Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives.
Structural Equation Modeling, 6 (1999), pp. 1-55
[Jeon, Kim, Kim and Roh, 2017]
J.W. Jeon, D. Kim, E. Kim, S. Roh.
Exploratory and confirmatory factor analysis of the Korean version of the Penn State worry questionnaire.
[Kline, 2015]
R.B. Kline.
Principles and practice of structural equation modeling.
4th ed., Guilford Press, (2015),
[Lim, Kim, Lee and Kwon, 2008]
Y.J. Lim, Y.H. Kim, E.H. Lee, S.M. Kwon.
The Penn State Worry Questionnaire: Psychometric properties of the Korean version.
Depression and Anxiety, 25 (2008), pp. E97-E103
[Limone, Toto and Messina, 2022]
P. Limone, G.A. Toto, G. Messina.
Impact of the COVID-19 pandemic and the Russia-Ukraine war on stress and anxiety in students: A systematic review.
Frontiers in Psychiatry, 13 (2022),
[Little, 2013]
T.D. Little.
Longitudinal structural equation modeling.
Guilford Press, (2013),
[Liu, Nijmeh and Warren, 2022]
K. Liu, J.S. Nijmeh, S.L. Warren.
Factor structure, measurement invariance, and concurrent validity of the Penn State Worry Questionnaire across development, psychopathology, and culture.
Assessment, 29 (2022), pp. 909-924
[Lord, 1980]
F.M. Lord.
Applications of item response theory to practical testing problems.
L. Erlbaum Associates, (1980),
[Lord and Novick, 1968]
F.M. Lord, M.R. Novick.
Statistical theories of mental test scores.
Addison-Wesley, (1968),
[Lovibond and Lovibond, 1995]
S.H. Lovibond, P.F. Lovibond.
Manual for the depression anxiety stress scales.
2nd ed., Psychology Foundation, (1995),
[Masters and Wright, 1984]
G.N. Masters, B.D. Wright.
The essential process in a family of measurement models.
Psychometrika, 49 (1984), pp. 529-544
[McEvoy, Watson, Watkins and Nathan, 2013]
P.M. McEvoy, H. Watson, E.R. Watkins, P. Nathan.
The relationship between worry, rumination, and comorbidity: Evidence for repetitive negative thinking as a transdiagnostic construct.
Journal of Affective Disorders, 151 (2013), pp. 313-320
[Meloni and Gana, 2001]
F. Meloni, K. Gana.
Wording effects in the Italian version of the Penn State Worry Questionnaire.
Clinical Psychology and Psychotherapy, 8 (2001), pp. 282-287
[Meyer, Miller, Metzger and Borkovec, 1990]
T.J. Meyer, M.L. Miller, R.L. Metzger, T.D. Borkovec.
Development and validation of the Penn State worry questionnaire.
Behavior Research and Therapy, 28 (1990), pp. 487-495
[Morani, Pricci and Sanavio, 1999]
S. Morani, D. Pricci, E. Sanavio.
Penn State worry questionnaire e worry domains questionnaire. Presentazione delle versioni italiane ed analisi della fedeltà.
Psicoterapia Cognitiva e Comportamentale, 5 (1999), pp. 13-34
[Novick, 1966]
M.R. Novick.
The axioms and principal results of classical test theory.
Mathematical Psychology, 3 (1966), pp. 1-18
[Padros-Blazquez, Gonzalez-Betanzos, Martinez-Medina and Wagner, 2018]
F. Padros-Blazquez, F. Gonzalez-Betanzos, M.P. Martinez-Medina, F. Wagner.
Psychometric characteristics of the original and brief version of the Penn State Worry Questionnaire (PSWQ) in Mexican samples.
Actas Españolas de Psiquiatría, 46 (2018), pp. 117-124
[Porta-Casteràs et al., 2020]
D. Porta-Casteràs, M.A. Fullana, D. Tinoco, I. Martínez-Zalacaín, J. Pujol, D.J. Palao, N. Cardoner.
Prefrontal-amygdala connectivity in trait anxiety and generalized anxiety disorder: Testing the boundaries between healthy and pathological worries.
Journal of Affective Disorders, 267 (2020), pp. 211-219
[Ruiz, Monroy-Cifuentes and Suárez-Falcón, 2018]
F.J. Ruiz, A. Monroy-Cifuentes, J.C. Suárez-Falcón.
Penn State worry questionnaire-11 validity in Colombia and factorial equivalence across gender and nonclinical and clinical samples.
Anales de Psicología, 34 (2018), pp. 451-457
[Schroder, Clark and Moser, 2019]
H.S. Schroder, D.A. Clark, J.S. Moser.
Screening for problematic worry in adults with a single item from the Penn State worry questionnaire.
Assessment, 26 (2019), pp. 336-346
[Schwartz and Melech, 2000]
S.H. Schwartz, G. Melech.
National differences in micro and macro worry: Social, economic, and cultural explanations.
Culture and subjective well-being, pp. 219-256
[Taylor, 2020]
S. Taylor.
Anxiety disorders, climate change, and the challenges ahead: Introduction to the special issue.
Journal of Anxiety Disorders, 76 (2020),
[Warren, Heller and Miller, 2021]
S.L. Warren, W. Heller, G.A. Miller.
The structure of executive dysfunction in depression and anxiety.
Journal of Affective Disorders, 279 (2021), pp. 208-216
[Wu et al., 2013]
S.M. Wu, T.A. Schuler, M.C. Edwards, H.C. Yang, B.M. Brothers.
Factor analytic and item response theory evaluation of the Penn State worry Questionnaire in women with cancer.
Quality of Life Research, 22 (2013), pp. 1441-1449
[Zhong, Wang, Li and Liu, 2009]
J. Zhong, C. Wang, J. Li, J. Liu.
Penn State Worry Questionnaire: Structure and psychometric properties of the Chinese version.
Journal of Zhejiang University Science B, 10 (2009),