Simpson’s paradox: Hoermann et al explain why T4-T3 therapy trials are faulty

Why have clinical trials of T4-T3 combination therapy failed to show significant benefit?

Hoermann et al explain that the trials and their meta-analyses share a fundamental problem with research methodology and a fundamental misunderstanding of thyroid hormone relationships, and that together these result in statistical errors.


Hoermann, R., Midgley, J. E. M., Larisch, R., & Dietrich, J. W. (2018). Lessons from Randomised Clinical Trials for Triiodothyronine Treatment of Hypothyroidism: Have They Achieved Their Objectives? Journal of Thyroid Research, Article ID 3239197.

Simpson’s Paradox

As explained by Wikipedia, “Simpson’s paradox, or the Yule–Simpson effect, is a phenomenon in probability and statistics, in which a trend appears in several different groups of data but disappears or reverses when these groups are combined.”

Pace~svwiki 29 August 2017, Creative Commons Share-alike 4.0, from Wikimedia Commons

In the GIF image shown, “This visualization of Simpson’s paradox on data resembling real-world variability indicates that risk of misjudgment of true relationship can indeed be hard to spot.”

An alternate static image shows Simpson’s Paradox, a similar problem with averaging (dotted line) across two disparate groups of data (blue and red):
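The same averaging problem can be reproduced with a few numbers. The Python sketch below is only an illustration: the "blue" and "red" points are invented, not taken from the images or from any study, and `slope` is a hypothetical helper that fits an ordinary least-squares line.

```python
from statistics import mean


def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Invented data: within each group, y falls as x rises.
blue = [(1, 6), (2, 5), (3, 4), (4, 3)]
red = [(6, 11), (7, 10), (8, 9), (9, 8)]

blue_slope = slope(blue)          # trend inside the blue group
red_slope = slope(red)            # trend inside the red group
pooled_slope = slope(blue + red)  # trend after combining both groups

print("blue:", blue_slope)
print("red:", red_slope)
print("pooled:", round(pooled_slope, 2))
```

With these numbers, both groups slope downward while the combined line slopes upward, which is the reversal that the Wikipedia definition describes.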


What does it mean for T3 therapy trials?

Hoermann et al summarize it in their article’s abstract:

“Given the high individuality expressed by thyroid hormones, their interrelationships, and shifted comfort zones, the response to LT4 treatment produces a statistical amalgamation bias (Simpson’s paradox), which has a key influence on interpretation.”

Below, we interpret their more in-depth explanation in the body of the article bit by bit, in quotations A, B, C, and D.

A. Trial design

“As for trial design, amalgamation bias (Simpson’s paradox [39]) arises when including heterogeneous study groups of patients who have different disease aetiologies, settle at different homeostatic equilibria, and display heterogeneous responses to treatment.”

What does this mean?

It means that too many studies group together patients whose hypothyroidism has different causes, such as autoimmune gland destruction, partial thyroidectomy, or total thyroidectomy. Hoermann et al have shown elsewhere that these patient groups differ in their response to L-T4 monotherapy. In each group, the thyroid axis reaches equilibrium at its own particular dose and produces a chronic T3 deficit of a different size. Patients with less thyroid gland tissue are generally the poorer converters of T4 into T3, because thyroid tissue not only secretes T3 but also converts T4 into T3 (the TSH-T3 shunt). People with less thyroid tissue therefore have a more severe T3 deficiency.

B. Meta-analyses of disparate T3 trials

“Limited data are presently available on intraclass correlations (ICCs) and components of variance although this approach is essential for the interpretation of cluster-based studies [34, 35, 39, 40].”

What does this mean?

It means that when scientists interpret data in a “meta-analysis” that groups together varied T4-T3 clinical trials, they have to consider that each trial is unique and included different clusters of data:

  • Each involved different populations
  • Each used different ratios of T4:T3 and
  • Each used a different dose timing per day (1 dose vs. 2 doses per day).

These groups of data differ from one another mathematically in their response to therapy, and averaging them together obscures the important differences. The numbered citations refer to four articles titled:

  •  “When averages hide individual differences in clinical trials,”
  •  “Are systematic reviews and meta-analyses still useful research? We are not sure,”
  •  “Simpson’s Paradox in Real Life,” and
  •  “The disaggregation of within-person and between-person effects in longitudinal models of change.”
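The intraclass correlation (ICC) mentioned in the quotation can be illustrated with a short Python sketch. The three "trials" and their scores are invented for illustration and do not come from Hoermann et al or from any real study. The function `icc_oneway` is a hypothetical helper that applies the usual one-way ANOVA estimate, ICC(1), and assumes every cluster has the same size.

```python
from statistics import mean


def icc_oneway(clusters):
    """ICC(1) from a one-way ANOVA; assumes equal-sized clusters."""
    groups = list(clusters.values())
    n = len(groups)      # number of clusters
    k = len(groups[0])   # observations per cluster
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch only handles equal cluster sizes")

    grand_mean = mean([v for g in groups for v in g])
    group_means = [mean(g) for g in groups]

    # Between-cluster and within-cluster mean squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented response scores from three hypothetical trials.
trials = {
    "trial_a": [1.0, 2.0, 3.0],
    "trial_b": [4.0, 5.0, 6.0],
    "trial_c": [7.0, 8.0, 9.0],
}

pooled_mean = mean([v for scores in trials.values() for v in scores])
icc = icc_oneway(trials)

print("pooled mean:", pooled_mean)
print("ICC(1):", round(icc, 3))
```

In this made-up example the pooled mean sits in the middle of the data, yet the ICC is close to 1, which signals that most of the variation lies between the trials and not within them. A single average would hide exactly that.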

C. Individual set-points within the thyroid reference ranges

“In addition, thyroid hormones are known for their high degree of individuality, with a low ratio of the intraindividual to interindividual variation of approximately 0.5 (low individuality index) [41]. For a person, their perceived “comfort zone” of response, like the intraindividual reference range, is also narrower than for the entire group, as evidenced by substantial intraclass correlation (0.3-0.5) between FT3 and patient complaints during follow-up of patients with thyroid carcinoma [11].”

What does this mean?

Andersen et al (2002) discovered that each individual’s TSH, T4 and T3 levels (without thyroid medication) differ greatly from the statistical average. The individual’s comfort zone for TSH, for T4, and for T3 fits within a range that is about 50% narrower than the statistical reference range for the population at large. Larisch et al (2018) discovered that optimal Free T3 values fit within an even narrower band in the upper part of the reference range: patients’ hypothyroid symptoms were alleviated in the upper 30-50% of the statistical reference range.
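The ratio described in the quotation can be sketched in Python. This is a simplified version that compares standard deviations directly; published indices of individuality are usually built from coefficients of variation and may also account for laboratory measurement error. The repeated measurements below are invented, in arbitrary units, and were chosen so that the ratio comes out near the 0.5 quoted above.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Invented repeated measurements (arbitrary units), three per person.
measurements = {
    "person_1": [1.5, 2.0, 2.5],
    "person_2": [2.5, 3.0, 3.5],
    "person_3": [3.5, 4.0, 4.5],
}

# Within-person spread: average each person's sample variance, then take the root.
within_sd = sqrt(mean([variance(values) for values in measurements.values()]))

# Between-person spread: standard deviation of the personal averages.
between_sd = stdev([mean(values) for values in measurements.values()])

# Simplified individuality ratio (within-person / between-person variation).
individuality_ratio = within_sd / between_sd

print("within-person SD:", within_sd)
print("between-person SD:", between_sd)
print("ratio:", individuality_ratio)
```

A ratio well below 1 means that each person varies around their own set-point far less than people differ from one another, so a population reference range says little about what is normal for one individual.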

D. Relationships between TSH, Free T4 and Free T3 in each individual

“Consequently, multiple measures obtained from each subject are nested within that subject. Classical (single‐level) methods cause issues related to the disaggregation of within-person and between-person effects over time, flattening the secondary level, and destroying the interrelationships within it. This is even further exacerbated by the fact that these two levels of influence can operate simultaneously and even in opposite directions [36, 39, 40]. This situation requires a multileveled instead of averaged statistical approach [42].”

What does this mean?

Each individual patient’s TSH, Free T4 and Free T3 values are a 3-part relationship that is unique to them as an individual.

If you separate their TSH from their T4 and T3, you flatten the differences among patients and destroy the 3-part relationships that exist in biology between these three values.

The problem is made worse by the fact that TSH, Free T4 and Free T3 can move in opposite, unexpected directions within an individual. Research shows, for example, that when Free T4 is high, TSH falls, but T3 unexpectedly falls as well, which does not happen in people with healthy thyroid glands who are not on therapy.

Therefore, you need to have a multi-level statistical approach. It does not make biological sense to average all patients’ TSH, and to separate this average from the average Free T4 and the average Free T3.
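The idea of separating within-person from between-person effects can be sketched in Python using person-mean centering. This is not the authors' analysis and not a full multilevel model, which would normally be fitted with dedicated mixed-effects software. The three "people", their paired measurements, and the helper `slope` are all invented for illustration.

```python
from statistics import mean


def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    mx = mean([x for x, _ in points])
    my = mean([y for _, y in points])
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den


# Invented repeated (x, y) measurements, nested within three hypothetical people.
people = {
    "a": [(1, 4), (2, 3), (3, 2)],
    "b": [(4, 7), (5, 6), (6, 5)],
    "c": [(7, 10), (8, 9), (9, 8)],
}

person_means = []  # one (mean x, mean y) point per person: the between-person level
centered = []      # each measurement minus that person's own means: the within-person level
for points in people.values():
    mx = mean([x for x, _ in points])
    my = mean([y for _, y in points])
    person_means.append((mx, my))
    centered.extend((x - mx, y - my) for x, y in points)

between_slope = slope(person_means)
within_slope = slope(centered)

# Classical single-level analysis: ignore who each measurement belongs to.
single_level_slope = slope([p for points in people.values() for p in points])

print("between-person slope:", between_slope)
print("within-person slope:", within_slope)
print("single-level slope:", single_level_slope)
```

In this made-up data the two levels point in opposite directions, and the single-level slope reflects neither of them faithfully. That is the flattening the quotation warns about.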

Our challenge

Read more about our campaign’s challenges to thyroid researchers:  Challenges: Research


