An overview of TED

The following overview and analysis are a living version of the analysis conducted in TED’s introductory paper. They will be updated as new data are included and may thus deviate from the published results. On this page, you can find an overview of the included data, a brief meta-analysis of the truth effect within TED, and additional models estimating variability in the truth effect at the subject, statement, or experiment level.

In the current version of TED, we included 56 studies from 27 publications, spanning 12002 participants contributing 778741 trials. A complete list of the included publications can be found in the Table “overview of studies included in TED”.

Sample sizes ranged from 29 to 949 participants. On average, studies included 218.04 participants (\(\mu_{age} =\) 33.17, \(\sigma_{age} =\) 7.22). An overview of the rating scale usage for truth judgments and the use of a filler task across all included studies can be found in the figure below.

Overview of Study-related variables in TED

On average, studies employed 62.90 (\(SD =\) 39.97) statements per participant in the judgment session, and in 88.64 % of procedure settings exactly 50 % of statements were repeated. Of 88 judgment phases, 75.00 % were conducted on the same day as the exposure phase. When both phases took place on the same day, the average delay between exposure and judgment was 3.77 minutes. When the judgment phase took place at least one day after the exposure phase, the average delay was 7.45 days. An overview of additional variables pertaining to the procedure of the included studies can be found in the figure below.

Overview of Procedure-related variables in TED

Detailed information on the statements presented is available for 53 out of 56 studies. Data on the accuracy of a statement is available for 359113 (52.41 %) of trials, the exact statement text is available for 306387 (44.72 %) of trials, and response times are available for 111077 (16.21 %) of trials.

Overview of studies included in TED
publication_id study_id procedure_id n_participants student_sample truth_rating_steps repetition_time n_statements
1 1 1 186 NA 2 5 56
2 2 2 138 NA 2 0 36
2 2 3 138 NA 2 10080 36
3 3 4 103 0 6 0 120
3 4 5 99 1 2 0 200
3 5 6 68 1 6 0 200
3 6 7 89 1 6 2880 200
4 7 8 380 0 2 0 40
5 8 9 283 0 6 0 40
5 9 10 271 1 6 0 40
5 10 11 200 0 6 0 40
5 11 12 299 0 6 0 40
5 12 13 291 0 6 0 40
6 13 14 113 1 6 1 40
6 13 15 113 1 6 1 40
6 13 16 113 1 6 1 40
6 13 17 113 1 6 1 40
6 13 18 113 1 6 1 40
6 14 19 430 0 6 1 40
6 14 20 430 0 6 1 40
6 14 21 430 0 6 1 40
6 14 22 430 0 6 1 40
6 14 23 430 0 6 1 40
7 15 24 371 0 11 10080 12
7 16 25 939 0 11 1 12
7 16 26 939 0 11 10080 12
7 17 27 408 0 11 10080 12
8 18 28 503 0 2 0 80
9 19 29 82 1 6 4 120
9 20 30 68 1 6 4 120
10 21 31 507 0 7 0 32
10 21 32 507 0 7 1440 32
10 21 33 507 0 7 10080 32
10 21 34 507 0 7 43200 32
11 22 35 220 1 6 5760 72
11 22 36 220 1 6 5760 72
11 23 37 282 0 6 20 72
11 23 38 282 0 6 20 72
11 23 39 282 0 6 20 72
11 24 40 405 0 6 20 72
11 24 41 405 0 6 20 72
12 25 42 240 0 101 0 16
13 26 44 60 0 6 5 80
14 27 45 526 0 6 0 105
15 28 47 54 1 6 10 88
16 29 48 139 0 5 10 20
17 30 49 267 0 5 10 20
18 32 53 66 1 6 10 88
18 32 54 66 1 6 10 88
19 33 55 65 1 6 10080 88
19 33 56 65 1 6 10080 88
19 33 57 65 1 6 10080 88
19 33 58 65 1 6 10080 88
19 34 59 202 0 6 0 80
19 34 60 202 0 6 0 80
20 35 61 73 0 6 3 56
20 36 62 79 1 2 0 56
20 36 63 79 1 2 20160 56
21 37 64 91 1 6 2 60
21 38 65 64 1 6 0 60
21 39 66 80 1 6 0 54
22 40 69 70 1 2 5 80
22 40 70 70 1 2 5 80
22 41 73 149 1 2 5 120
22 41 74 149 1 2 5 120
22 42 75 98 1 2 0 32
22 42 76 98 1 2 0 32
22 42 77 98 1 2 0 32
22 42 78 98 1 2 0 32
23 43 79 64 1 6 10080 84
23 43 80 64 1 6 10080 84
23 44 81 64 1 6 10080 84
23 45 82 65 1 2 10080 80
24 46 83 89 1 6 5760 72
25 47 84 409 0 6 2 28
25 48 85 949 0 4 1 24
25 48 86 949 0 4 1 24
25 49 87 940 0 4 2 16
25 49 88 940 0 4 2 16
25 49 89 940 0 4 10080 24
25 49 90 940 0 4 10080 24
26 50 91 29 1 2 9 120
26 51 92 41 1 101 9 120
26 52 93 42 1 101 9 120
26 53 94 37 1 101 9 80
27 54 95 132 0 6 0 56
27 55 96 102 0 6 0 48
27 56 97 104 1 6 1 56

Meta-Analysis

The following provides an illustrative meta-analysis of effect sizes derived from the TED Truth Effect database. It is based on trial-level data and demonstrates how a meta-analysis could be conducted. This example is not a definitive guide, nor does TED represent a comprehensive or random sample of all studies, since it only includes studies with openly available trial-level data.

Here, we included only studies with a heterogeneous presentation criterion (“between-items criterion”; Dechene et al., 2010). Effect sizes were calculated using Hedges’ g, derived as follows:

  1. For each subject within a study, the repeated and new average responses were calculated.
  2. Hedges’ g was computed per study using the effsize::cohen.d() function with paired = TRUE and hedges.correction = TRUE.
  3. Variances of the effect sizes were extracted to serve as input for the meta-analysis.
  4. The meta-analysis accounts for multiple entries per publication (as some publications have multiple studies).
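
The first two steps can be sketched on simulated data using base R only. This is a simplified illustration, not the computation performed by effsize::cohen.d(): we assume a standardized mean of the paired differences and the common small-sample correction factor J, and the toy data, variable names, and effect size are hypothetical. The exact paired formula in effsize may differ.

```r
# Toy trial-level data (hypothetical): 20 subjects, 10 new and 10 repeated trials each
set.seed(1)
toy <- expand.grid(subject = 1:20, trial = 1:10, repeated = c(0, 1))
toy$response <- rnorm(nrow(toy), mean = 3.5 + 0.4 * toy$repeated, sd = 1)

# Step 1: mean response per subject and repetition status
cell_means <- aggregate(response ~ subject + repeated, data = toy, FUN = mean)
cell_means <- cell_means[order(cell_means$repeated, cell_means$subject), ]
new_means <- cell_means$response[cell_means$repeated == 0]
rep_means <- cell_means$response[cell_means$repeated == 1]

# Step 2 (simplified assumption): standardized mean of the paired differences,
# shrunk by the small-sample correction factor J
diffs <- rep_means - new_means
n <- length(diffs)
d_z <- mean(diffs) / sd(diffs)
J <- 1 - 3 / (4 * (n - 1) - 1)
g <- J * d_z
```

Because J is always slightly below 1, the corrected estimate g is a little smaller in magnitude than d_z, and the correction matters most for small samples.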

Small deviations from the effect sizes originally reported in the individual studies may exist. However, we applied no additional exclusion criteria and, during encoding, tried to exclude all subjects who were excluded in the original studies.

First, we access the database and retrieve trial-level data:

library(acdcquery)
library(dplyr)

# Replace with your local path
conn <- connect_to_db("path/to/ted.db")

# `arguments` holds the query filters and must be defined before this call
analysis_data <- query_db(
    conn,
    arguments,
    target_vars = c("default", "study_id", "publication_id", "authors", "conducted"),
    target_table = "observation_table"
  ) %>% 
  # Keep test-phase trials with a known repetition status and response
  filter(phase == "test") %>% 
  filter(!is.na(repeated), !is.na(response)) 

# Here we only use data where the test phase has both 
# repeated and new statements: count the distinct repetition
# conditions per subject and keep subjects with exactly two
has_complete_data <- analysis_data %>% 
  count(procedure_id, subject, repeated) %>% 
  count(procedure_id, subject) %>% 
  mutate(
    has_complete_data = ifelse(n == 2, 1, 0)
  )

analysis_data <- analysis_data %>% 
  left_join(has_complete_data, by = c("procedure_id", "subject")) %>% 
  filter(has_complete_data == 1) 
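
The completeness filter can be hard to read at first glance, so here is a minimal base R sketch of the same idea on a hypothetical toy data set. It assumes that "complete" means a subject contributed both new and repeated trials within a procedure; the toy values are invented for illustration.

```r
# Hypothetical toy data: subject 2 only saw new statements in the test phase
toy <- data.frame(
  procedure_id = 1,
  subject = c(1, 1, 1, 2, 2, 3, 3),
  repeated = c(0, 1, 1, 0, 0, 1, 0)
)

# Number of distinct repetition conditions per procedure and subject
n_conditions <- aggregate(
  repeated ~ procedure_id + subject,
  data = toy,
  FUN = function(x) length(unique(x))
)
names(n_conditions)[3] <- "n_conditions"

# Keep only subjects who contributed both repeated and new trials
complete <- n_conditions[n_conditions$n_conditions == 2,
                         c("procedure_id", "subject")]
toy_complete <- merge(toy, complete, by = c("procedure_id", "subject"))
```

In this toy example, subjects 1 and 3 are retained and subject 2 is dropped.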

Then we compute effect sizes per study using cohen.d.

library(tidyr)
library(purrr)

# `publications_overview` (publication-level metadata) must be loaded beforehand
eff_data <- analysis_data %>% 
  left_join(publications_overview, by = "publication_id") %>% 
  # Mean response per subject and repetition status
  group_by(publication_id, authors, conducted, study_id, repeated, subject) %>% 
  summarize(
    mean_resp = mean(response, na.rm = TRUE),
    .groups = "drop"
  ) %>% 
  mutate(repeated = factor(
    ifelse(repeated > 0, "yes", "no"),
    levels = c("yes", "no"))
    ) %>% 
  # One row per subject with columns `yes` (repeated) and `no` (new)
  pivot_wider(names_from = repeated, values_from = mean_resp) %>% 
  group_by(publication_id, authors, conducted, study_id) %>% 
  nest() %>% 
  # Paired Hedges' g per study
  mutate(effsize = map(
    data, 
    ~effsize::cohen.d(
      .$yes, 
      .$no, 
      hedges.correction = TRUE,
      paired = TRUE
      )
    )
  ) %>% 
  mutate(
    estimate = map_dbl(effsize, ~{.$estimate}),
    var = map_dbl(effsize, ~{.$var})
  )

Multi-Level Meta-Analysis

To account for non-independence of effect sizes within publications contributing multiple studies, we fitted a three-level meta-analytic model using rma.mv():

Level 1: Sampling variance of individual effect sizes

Level 2: Heterogeneity between studies within the same publication

Level 3: Heterogeneity between publications
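
As a sketch, these three levels correspond to the standard three-level random-effects model, written here for the effect size \(g_{ij}\) of study \(j\) in publication \(i\). The notation is ours and is chosen to match the labels sigma^2.1 and sigma^2.2 in the model output; \(v_{ij}\) denotes the known sampling variance of each effect size.

\[g_{ij} = \mu + u_i + w_{ij} + e_{ij}\]
\[u_i \sim Normal(0, \sigma^2_{1}), \quad w_{ij} \sim Normal(0, \sigma^2_{2}), \quad e_{ij} \sim Normal(0, v_{ij})\]

Here \(\mu\) is the pooled effect, \(u_i\) captures heterogeneity between publications (Level 3), \(w_{ij}\) captures heterogeneity between studies within a publication (Level 2), and \(e_{ij}\) is the sampling error (Level 1).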

library(metafor)

eff_data <- eff_data %>% 
  mutate(label = paste0(authors, " (", conducted, ")"))

res_mv <- rma.mv(yi = estimate,
                 V = var,
                 random = ~ 1 | publication_id/study_id,  # studies nested within publications
                 slab = label,
                 data = eff_data,
                 method = "REML")
summary(res_mv)

Multivariate Meta-Analysis Model (k = 49; method: REML)

  logLik  Deviance       AIC       BIC      AICc   
-17.2796   34.5593   40.5593   46.1729   41.1047   

Variance Components:

            estim    sqrt  nlvls  fixed                   factor 
sigma^2.1  0.0847  0.2911     26     no           publication_id 
sigma^2.2  0.0498  0.2231     49     no  publication_id/study_id 

Test for Heterogeneity:
Q(df = 48) = 1190.4241, p-val < .0001

Model Results:

estimate      se     zval    pval   ci.lb   ci.ub      
  0.7022  0.0702  10.0035  <.0001  0.5647  0.8398  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results reveal a large effect of repetition (\(g =\) 0.70; 95 % CI = [0.56, 0.84]). This is considerably larger than the effect size of around \(d = 0.49\) that Dechene et al. (2010) reported for the between-items criterion. Notably, this analysis is based only on publications with openly available data. The results thus rest on a smaller and potentially biased sample of k = 26 publications.
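
As a quick check, the test statistic and confidence interval in the model output can be approximately reproduced from the rounded estimate and standard error. The sketch below assumes a Wald-type interval based on the normal distribution; small discrepancies in the last digit stem from rounding of the inputs.

```r
# Rounded values taken from the model output above
estimate <- 0.7022
se <- 0.0702

# Wald z statistic and 95 % confidence interval (normal approximation)
zval <- estimate / se
z_crit <- qnorm(0.975)
ci <- estimate + c(-1, 1) * z_crit * se

round(c(zval = zval, ci.lb = ci[1], ci.ub = ci[2]), 2)
```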

Variance Decomposition

We can calculate variance proportions and I² values to quantify the contributions of sampling error, within-publication heterogeneity, and between-publication heterogeneity. This reveals substantial variance both within a publication and between publications, supporting the use of this multi-level approach.

# var.comp() is provided by the dmetar package
library(dmetar)

i2 <- var.comp(res_mv)
i2$plot
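
To make the decomposition more transparent, the following sketch computes the variance shares by hand. It assumes the commonly used approach of comparing the two heterogeneity variances with a "typical" sampling variance; the function name i2_three_level is hypothetical, and the sampling variances v_toy are simulated placeholders, so the resulting percentages are illustrative only and are not the values for TED.

```r
# Share of total variance attributable to each level (in percent).
# v: sampling variances of the effect sizes
i2_three_level <- function(v, sigma2_between, sigma2_within) {
  w <- 1 / v
  k <- length(v)
  # "Typical" sampling variance across the k effect sizes
  v_typical <- (k - 1) * sum(w) / (sum(w)^2 - sum(w^2))
  total <- v_typical + sigma2_between + sigma2_within
  100 * c(
    sampling = v_typical,
    within_publication = sigma2_within,
    between_publication = sigma2_between
  ) / total
}

# Simulated placeholder sampling variances (hypothetical, not from TED)
set.seed(1)
v_toy <- runif(49, min = 0.005, max = 0.03)

# Variance components taken from the model output above
shares <- i2_three_level(v_toy, sigma2_between = 0.0847, sigma2_within = 0.0498)

# Share of the heterogeneity variance located between publications
share_between <- 0.0847 / (0.0847 + 0.0498)
```

The last line uses only the two estimated variance components and therefore does not depend on the simulated sampling variances.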

Forest Plot

Finally, a forest plot shows the individual study effect sizes, their confidence intervals, and the overall estimate from the multi-level meta-analysis model.

# forest plot for multi-level model
forest(res_mv, 
       slab = paste0(eff_data$authors, " (", eff_data$conducted, ") | Study: ", eff_data$study_id),
       xlab = "Effect size (Hedges' g)",
       refline = 0,
       cex = 0.5
      ) 

Summary

This workflow demonstrates how the TED Truth Effect database can be used to:

  • Compute study-level effect sizes from trial-level data

  • Fit multi-level meta-analytic models to account for clustering

  • Explore variance components and heterogeneity

  • Visualize results using a forest plot

It is intended as a tutorial example and not a definitive meta-analysis.

Hierarchical Bayesian Model

To illustrate the benefits of our large collection of trial-level data, we fitted Bayesian multilevel models predicting truth judgments, with repetition as a fixed effect and random intercepts and slopes at the subject, statement, and procedure levels.

We grouped data at the level of the procedure_table, as this table contains detailed information about each experimental setup (e.g., proportion of repeated items, presence of warnings, number of sessions) beyond what is available in the broader study_table. Each entry in the study_table corresponds to at least one entry in the procedure_table, but a single study may include several procedures that differ in these settings. For example, the same study may have multiple judgment sessions, modify the percentage of repeated stimuli, or warn some participants about the truth effect. These different procedures will then also receive different procedure identifiers, but the same study identifier.

Thus, the procedure identifier (procedure_id) uniquely captures both the study context and its specific experimental conditions. This modeling approach allows us to estimate the variance in the truth effect at three levels simultaneously: (1) variance due to common experimental manipulations and study settings (procedure level), (2) variance due to individual statements (statement level), and (3) variance due to individual differences (subject level).

We analyzed the dichotomous and Likert-type response formats separately due to differences in their scale characteristics. Dichotomous responses (e.g., true/false) require logistic models, whereas Likert-type responses (e.g., 1–5 ratings) allow for linear models. All responses were maximum-normalized to the range 0 to 1, with one representing the maximum possible response indicating a “true” judgment. The repetition status was centered to aid model estimation: a new statement was coded -0.5 and a repeated statement 0.5.
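
The recoding can be illustrated with a short base R sketch. Note the assumption: we use min-max rescaling so that the lowest scale point maps to 0 and the highest to 1, which may differ from the exact normalization applied in TED. The helper rescale_response and the example values are hypothetical.

```r
# Rescale a response to the range 0 to 1 (assumed min-max rescaling)
rescale_response <- function(response, scale_min, scale_max) {
  (response - scale_min) / (scale_max - scale_min)
}

# Example: ratings on a 6-point scale
likert <- c(1, 3, 6)
likert_rescaled <- rescale_response(likert, scale_min = 1, scale_max = 6)

# Centered repetition status: new = -0.5, repeated = 0.5
repeated <- c(0, 1, 1, 0)
repeated_centered <- repeated - 0.5
```

With this coding, the intercept reflects the average judgment across new and repeated statements, and the repetition coefficient reflects the difference between the two.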

We ran all models using 4 chains with 3000 iterations per chain, 1000 of which were discarded as warmup samples, leading to a total of 8000 posterior samples. There were no divergent transitions, no \(\hat{R} > 1.05\), and visual inspection confirmed that the chains mixed well. We used weakly informative priors for the intercept, fixed effect, and standard deviations in all models.

\[\text{Intercept} \sim \text{Normal}(0.5, 0.5)\]
\[b \sim \text{Normal}(0, 1)\]
\[\sigma \sim \text{Gamma}(1, 4)\]
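
As a sketch, the model structure described above can be written as follows for trial \(n\) of subject \(s[n]\), statement \(t[n]\), and procedure \(p[n]\), with \(x_n \in \{-0.5, 0.5\}\) denoting the centered repetition status. The notation is ours, \(\beta_0\) denotes the intercept, \(\sigma_e\) is a residual standard deviation we introduce for the linear case, and correlations between random intercepts and slopes are omitted for brevity.

\[\text{logit}\,\Pr(y_n = 1) = \eta_n \quad \text{(dichotomous)}, \qquad y_n \sim \text{Normal}(\eta_n, \sigma_e) \quad \text{(Likert-type)}\]
\[\eta_n = (\beta_0 + u_{0,s[n]} + v_{0,t[n]} + w_{0,p[n]}) + (b + u_{1,s[n]} + v_{1,t[n]} + w_{1,p[n]}) \, x_n\]
\[u_{1,s} \sim \text{Normal}(0, \sigma_{subject}), \quad v_{1,t} \sim \text{Normal}(0, \sigma_{statement}), \quad w_{1,p} \sim \text{Normal}(0, \sigma_{procedure})\]

The standard deviations of the random slopes, \(\sigma_{subject}\), \(\sigma_{statement}\), and \(\sigma_{procedure}\), quantify how much the truth effect varies at each level. The random intercepts follow analogous distributions.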

Dichotomous Truth Judgments

The analysis was based on 112399 trials nested within 1576 subjects, 997 statements, and 14 procedures.

The table below provides a summary of parameter estimates. As expected, the model indicated a significant fixed effect of repetition (\(OR =\) 1.79, \(95\% \ CrI =\) [1.51, 2.12]). Notably, the standard deviation of the random slope of repetition was highest at the subject level (\(\sigma =\) 0.72, \(95\% \ CrI =\) [0.68, 0.77]), followed by the procedure level (\(\sigma =\) 0.28, \(95\% \ CrI =\) [0.18, 0.44]), and the statement level (\(\sigma =\) 0.13, \(95\% \ CrI =\) [0.03, 0.19]).

Variance in the truth effect at different levels
Effect Grouping Parameter Estimate l_95_CrI u_95_CrI
fixed Intercept 0.31 0.17 0.44
fixed repeated 0.58 0.41 0.75
random procedure Intercept (sd) 0.20 0.11 0.32
random procedure repeated (sd) 0.28 0.18 0.44
random statement Intercept (sd) 0.88 0.84 0.93
random statement repeated (sd) 0.13 0.03 0.19
random subject Intercept (sd) 0.69 0.66 0.72
random subject repeated (sd) 0.72 0.68 0.77

Note. N = 112399; N Procedure = 14; N Subjects = 1576; N Statements = 997; l_95_CrI refers to the lower boundary of the 95% credible interval, u_95_CrI refers to the upper boundary
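
The odds ratio reported in the text follows from exponentiating the fixed effect of repetition in the table above, which is on the log-odds scale. The sketch below also illustrates what the subject-level standard deviation implies, under the assumption that subject-specific slopes are normally distributed around the fixed effect.

```r
# Fixed effect of repetition and its 95 % CrI on the log-odds scale (from the table)
log_odds <- c(estimate = 0.58, lower = 0.41, upper = 0.75)

# Exponentiate to obtain the odds ratio and its interval
odds_ratio <- round(exp(log_odds), 2)

# Implied central 95 % range of subject-specific odds ratios,
# assuming normally distributed random slopes with sd = 0.72
sd_subject <- 0.72
subject_range <- exp(0.58 + c(-1, 1) * qnorm(0.975) * sd_subject)
```

Under this assumption, the range of subject-specific odds ratios includes values below one, that is, some subjects are expected to show no truth effect or a reversed one.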

 

Scale Truth Judgments

The analysis was based on 572775 trials nested within 8309 subjects, 2872 statements, and 65 procedures.

The table below provides a summary of parameter estimates. As expected, the model indicated a significant fixed effect of repetition (\(b =\) 0.08, \(95\% \ CrI =\) [0.07, 0.10]). Again, the standard deviation of the random slope of repetition was highest at the subject level (\(\sigma =\) 0.10, \(95\% \ CrI =\) [0.10, 0.10]), followed by the procedure level (\(\sigma =\) 0.07, \(95\% \ CrI =\) [0.05, 0.08]), and the statement level (\(\sigma =\) 0.03, \(95\% \ CrI =\) [0.02, 0.03]).

Variance in the truth effect at different levels
Effect Grouping Parameter Estimate l_95_CrI u_95_CrI
fixed Intercept 0.54 0.52 0.56
fixed repeated 0.08 0.07 0.10
random procedure Intercept (sd) 0.07 0.06 0.09
random procedure repeated (sd) 0.07 0.05 0.08
random statement Intercept (sd) 0.11 0.11 0.12
random statement repeated (sd) 0.03 0.02 0.03
random subject Intercept (sd) 0.10 0.09 0.10
random subject repeated (sd) 0.10 0.10 0.10

Note. N = 587999; N Procedure = 66; N Subjects = 8397; N Statements = 2872; l_95_CrI refers to the lower boundary of the 95% credible interval, u_95_CrI refers to the upper boundary

 

To further explore how the temporal delay between the exposure and judgment phases influences inter-individual variability in the repetition effect, we included an interaction between subject and temporal delay (same day vs. different day) in the random effect structure. The model then estimates two standard deviations for the random effect of repetition at the subject level, one per delay condition. We can then investigate whether the subject-level standard deviation of the repetition effect differs depending on the temporal delay.

The table below provides a summary of parameter estimates. The standard deviation of the random slope of repetition at the subject level for a same-day judgment phase was \(\sigma_0 =\) 0.11 (\(95\% \ CrI =\) [0.10, 0.11]). The standard deviation of the random slope for a later-day judgment phase was \(\sigma_1 =\) 0.08 (\(95\% \ CrI =\) [0.08, 0.09]). The difference between these two standard deviations deviated substantially from zero (\(\sigma_0 - \sigma_1 =\) 0.02, \(95\% \ CrI =\) [0.02, 0.02]).

Variance in the truth effect at different levels
Effect Grouping Parameter Estimate l_95_CrI u_95_CrI
fixed Intercept 0.54 0.52 0.56
fixed repeated 0.08 0.07 0.10
random procedure Intercept (sd) 0.07 0.06 0.09
random procedure repeated (sd) 0.07 0.06 0.08
random statement Intercept (sd) 0.11 0.11 0.12
random statement repeated (sd) 0.03 0.02 0.03
random subject (same day) Intercept (sd) 0.10 0.10 0.10
random subject (same day) repeated (sd) 0.11 0.10 0.11
random subject (later) Intercept (sd) 0.08 0.08 0.08
random subject (later) repeated (sd) 0.08 0.08 0.09

Note. N = 587999; N Procedure = 66; N Subjects = 9592; N Statements = 2872; l_95_CrI refers to the lower boundary of the 95% credible interval, u_95_CrI refers to the upper boundary
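
A difference between two standard deviations is typically summarized by computing it for every posterior draw and then taking quantiles of the result. The sketch below illustrates this with simulated draws; the draws, their means, and their spread are hypothetical placeholders and not the actual posterior from our model.

```r
# Simulated stand-ins for 8000 posterior draws of the two subject-level SDs
# (hypothetical values chosen only for illustration)
set.seed(1)
n_draws <- 8000
sd_same_day <- rnorm(n_draws, mean = 0.105, sd = 0.002)
sd_later <- rnorm(n_draws, mean = 0.085, sd = 0.002)

# Difference per draw, then posterior median and 95 % credible interval
sd_diff <- sd_same_day - sd_later
summary_diff <- c(
  median = median(sd_diff),
  quantile(sd_diff, probs = c(0.025, 0.975))
)
```

Working draw by draw propagates the uncertainty in both standard deviations into the interval for their difference.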

 

Variance in the truth effect at different levels