In this review, I outline the strengths and weaknesses of Hoang et al’s clinical trial of levothyroxine (LT4) and desiccated thyroid extract (here called DTE, also known as NDT, natural desiccated thyroid).
- Courage to conduct the first trial of DTE/NDT after decades of scientific neglect due to pharmaceutical prejudice
- Large sample size
- Diversity of patients’ age and type of hypothyroid diagnosis
- Wide range of pharmaceutical doses
- Large number of biomarkers measured and clinical surveys performed
The most important strength is its courage to study a neglected and slandered thyroid pharmaceutical.
When physicians were persuaded to switch their patients’ prescriptions away from desiccated thyroid and toward levothyroxine monotherapy in the 1970s and 1980s, they did not do so on the basis of high-quality randomized clinical trials.
Instead, historical claims about the superiority of LT4 monotherapy were made on the basis of prejudiced trials designed to attack the therapeutic dominance of DTE, such as Jackson and Cobb’s 1978 trial “Why does anyone still use desiccated thyroid USP?”, which engaged in methods biased against desiccated thyroid. The question was valid at a time when pharmaceutical controls were not in place yet, but the solution should have been to regulate its potency and source more carefully, rather than poison science with a pharma war that could only limit treatment options and harm patients who didn’t fare well on LT4 monotherapy.
This pharmaceutical has not only been attacked for its T4:T3 dose ratio, but also for its porcine origin, as if that makes it unreliable or unsafe. Heparin, a blood thinner used in hospitals, is also of porcine origin, and it overcame issues related to its safety.
DTE has also been attacked for the transient rise in Total and Free T3 levels in blood after a dose. Critics fail to acknowledge that this excess T3 is quickly metabolized by D1 and D3 enzymes in tissues beyond the hypothalamus and pituitary, and that T3 is cleared more quickly during its post-dose peak. Many patients can tolerate T3 dosing peaks without any extra-pituitary signs or symptoms of thyrotoxicosis, and yet fearmongering about fluctuations persists.
Hoang’s team conducted the FIRST double-blind clinical trial that made an effort to remove unfair bias against this hormone preparation.
It is a “Randomized, double blind cross-over study.”
In 2014, the American Thyroid Association (ATA) published updated Guidelines for the Treatment of Hypothyroidism that stated,
It is a strength to be pioneers traveling into new research territory.
At the same time, this statement is also a shocking admission of the ATA’s four decades of medical and scientific neglect. Their prejudice meant they failed to gather valid evidence regarding the supposed “superiority” of LT4 monotherapy over desiccated thyroid.
The strengths of Hoang’s trial included many noted by the ATA Guideline authors.
The sample size of 70 was twice the size needed for one of the outcome measures they set, the TSQ-36 score (thyroid symptom questionnaire), according to the ClinCalc Sample Size Calculator.
The age range was 18 to 65. There were 53 female and 17 male patients, which is acceptable since the vast majority of hypothyroid patients are female.
A strength was the diversity of patients, with many etiologies of primary hypothyroidism. Exactly 50% were autoimmune hypothyroid patients, the most common cause of hypothyroidism (in larger populations, mainly Hashimoto’s, and a smaller cohort with Atrophic Thyroiditis). Many other studies focus only on one type of patient, such as those with total thyroidectomies, because patient diversity can undermine generalizations.
This study also included a wide range of pharmaceutical doses, although doses of both drugs were limited at the lower and upper ends of the range by the chosen TSH target.
- LT4 dose range 75–225 μg/d
- DTE dose range 43–172 mg/d dosed 1x/day
Patients were randomly allocated to try LT4 first or DTE first. They spent 16 weeks on one therapy, and then at “crossover,” spent 16 weeks on the other therapy.
The study included measures of cardiovascular health (heart rate and blood pressure), weight, cholesterol, triglycerides, and SHBG (sex hormone binding globulin).
To measure psychological outcomes, patients took no fewer than eight (8) clinical surveys.
The conclusion was rather underwhelming, but we should all celebrate that they achieved near equivalency on average:
Although it was based on diverse thyroid patients and was a controlled trial, it had core design flaws.
- Failure to target biomarkers of tissue T3 sufficiency
- Failure to optimize pituitary and thyroid biochemistry
- Failure to account for patients’ thyroid gland health status and autoimmunity
- Question the assumption that we are all “average” thyroid patients.
- Question the assumption of TSH-based euthyroidism.
- Do better comparative clinical trials that identify categories of good, intermediate, and poor response by metabolic indexes (such as hormone ratios) and etiology.
Some of the ATA’s critiques of Hoang’s study that they considered weaknesses (in Jonklaas et al, 2014) were based on fear, opinion, and ignorance in the place of scientific evidence.
For example, the ATA decried Hoang’s failure to measure the brief “excursion” of T3 concentration above reference 3 hours after a larger DTE dose.
Yet the ATA authors admitted that “The clinical consequences of such serum T3 excursions are unknown.” Then why complain about the unknown? Did these T3 excursions harm pregnant women or young children prior to the 1970s when many switched prescriptions to levothyroxine?
The double standard in the ATA’s review is revealed by the fact that the ATA guidelines minimize fear regarding T3 levels below reference by claiming ignorance:
“the significance of perturbations in serum triiodothyronine [T3] concentrations within the reference range or of mildly low serum triiodothyronine concentrations is unknown.”
So, if T3 is chronically LOW but its consequences are unknown, that’s okay.
But if T3 is transiently HIGH, and its consequences are unknown, that’s time for panic and fear?
There is a lot that is known.
DTE was the only therapy option prior to 1948. DTE’s T3 excursions above reference were known to leading thyroid scientists like Utiger, who dismissed them as inconsequential to health based on his clinical experience with desiccated thyroid therapy. He also observed that TSH failed to normalize in some patients with high T3 peaks while dosing T3. He acknowledged that TSH did not respond to LT3 dosing in the same way as it responds to endogenous T3 (produced within the body).
Pharmaceutical prejudice, once entrenched, ensured LT4 monotherapy was acceptable even for patients with chronically low T3, despite the known and mounting evidence of high mortality rates from low T3 levels in critical illness.
- Ataoglu: Low T3 in critical illness is deadly, and adding high T4 is worse.
- Low T3 effects on the cardiovascular system
- Low T3 thyroid hormone, insomnia, and sleep apnea
- Low-normal FT3 increases Alzheimer’s Disease risk
- Review: Chronic fatigue syndrome and Low T3
- Dosing by the TSH can cause Low T3
In contrast to the ATA’s 2014 skewed opinions based on the supposedly unknown, I identify weaknesses in Hoang’s study that are based on known scientific evidence from other published research.
Unlike the ATA’s 2014 authors, Thyroid Patients Canada does not take sides in a foolish pharma battle between DTE and LT4 that will only limit options for patients who might not be able to tolerate one or the other.
1. Failure to target biomarkers of tissue T3 sufficiency
The most significant design flaw was the failure to optimize any health outcome measure other than TSH, which was targeted between 0.5 and 3.0 μIU/mL, within the laboratory’s reference range of 0.27–4.20 μIU/mL.
A wiser research group admitted this was a core flaw of their own comparative study, which had measured far more biomarkers than Hoang’s.
Celi et al (2010, 2011) made TSH the only target of their comparative trial of LT4 monotherapy with LT3 monotherapy. In their third and final publication on this clinical trial, they stated that defining euthyroidism by TSH alone was a serious flaw:
What Yavuz, Celi and team mean by “generalized euthyroidism” is the full set of measures of T3 signaling or “hormonal action” in T3 receptors throughout various organs and tissues. They found a distinct disjunction between TSH and other measures of T3 sufficiency.
As a result of this flaw, Hoang’s study failed to optimize patients’ health in BOTH the LT4 monotherapy arm and the DTE arm of the study.
Hoang’s study reveals pharmaceutical equivalency on the basis of mere “pituitary euthyroidism,” which had mildly positive outcomes for DTE.
Health outcome biomarkers
Average total cholesterol was slightly lower on DTE, but the difference was not statistically significant, and the averages in both arms came close to exceeding the laboratory reference of <200 mg/dL.
|DTE: 190.87 ± 34.70||LT4: 195.68 ± 35.19|
The only statistically significant health improvement on DTE was weight loss (-3 lbs. on average).
|DTE: 172.87 ± 36.37||LT4: 175.73 ± 37.68|
However, failure to achieve mathematical statistical significance for a group does not preclude clinical significance to individuals.
Where is the data showing the degree of relative change in individuals, which can be different from the absolute change in averages for the group?
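To illustrate the distinction, here is a minimal sketch using entirely hypothetical weights for six imaginary patients. These numbers are not data from Hoang's study, and the variable names are my own. It shows how a small change in the group mean can coexist with a large change in one individual.

```python
from statistics import mean

# HYPOTHETICAL weights in lbs for six imaginary patients; NOT data from Hoang et al.
weight_on_lt4 = [150.0, 180.0, 200.0, 165.0, 190.0, 170.0]
weight_on_dte = [138.0, 181.0, 201.0, 166.0, 189.0, 171.0]

# Absolute change in the group average (DTE minus LT4).
group_change = mean(weight_on_dte) - mean(weight_on_lt4)

# Relative change for each individual, as a percentage of their weight on LT4.
individual_pct = [
    100.0 * (dte - lt4) / lt4
    for lt4, dte in zip(weight_on_lt4, weight_on_dte)
]

print(f"Group mean change: {group_change:+.1f} lbs")
for number, pct in enumerate(individual_pct, start=1):
    print(f"Patient {number}: {pct:+.1f}%")
```

In this made-up cohort the group mean falls by only 1.5 lbs, yet the first patient loses 8% of body weight while the others change by less than 1%. A table of means alone would hide that individual.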
Symptoms and quality of life were only mildly improved on DTE, again based on mere statistical means, not individual gains or losses.
|9.78 ± 4.33||10.97 ± 4.89|
|127.81 ± 13.06||125.65 ± 13.27|
2. Failure to optimize pituitary and thyroid biochemistry
A second core flaw was that this trial, while “normalizing” TSH to a target range of 0.5 to 3.0, optimized neither the TSH, nor the Free T4, nor the Total T3.
These are two different medications. One has a significantly greater T3 content and lower T4 content than the other.
Changing from LT4 to DTE will reduce TSH to different degrees in the same individual given their different T3 content, because T3, when dosed, has different physiological effects than T3 appearing gradually in blood 24/7.
Changing medications will also yield significantly different T3:T4 ratios in blood in the same individual.
- LT4 mono yields an unnaturally low T3:T4 ratio (higher Free T4, lower Total T3)
- The low ratio is more extreme in poor converters of T4 to T3.
- DTE yields an unnaturally high T3:T4 ratio (lower FT4, higher TT3)
- The high ratio is more extreme in good T4-T3 converters, or in poor absorbers of oral T4 who have less difficulty absorbing oral T3.
Therefore, dosing to a TSH range will fail to optimize individuals’ response according to their individual response to two different pharmaceuticals.
Even considering the TSH alone, they permitted the TSH to rise in DTE therapy.
|DTE: 1.67 ± 0.77||LT4: 1.30 ± 0.63|
The researchers reported that the TSH range filled the width of their target:
- “0.56 to 3.0 μIU/mL for the DTE period”
- “0.51 to 3.0 μIU/mL for the L-T4 period”
The reference range of their TSH assay was 0.27-4.20.
However, there is no health outcome rationale for targeting a narrower TSH range of 0.5 to 3.0 in a study that compares thyroid therapy modalities.
It was the arbitrary decision of the researchers to narrow the TSH range.
If the aim was to have a higher probability of optimizing therapy by narrowing the target, the rationale does not make sense.
The upper limit of their target TSH range was too high for optimization, given that 85% of the untreated healthy population has a TSH lower than 2.5 μIU/mL (Hamilton et al, 2008), and patients on LT4 therapy have higher rates of depression with a TSH over 2.5 (Talaei et al, 2017).
The lower limit of their target TSH range was too high for optimization, given that patients with little to no thyroid tissue often require TSH levels below population reference range to achieve markers of tissue euthyroidism:
- Biomarkers such as heart rate, body weight, bone turnover, etc. (Ito et al, 2017)
- Pre-thyroidectomy or pre-RAI T3 or Free T3 levels (Ito et al, 2012, 2017)
- Disappearance of hypothyroid symptoms without appearance of hyperthyroid symptoms (Larisch et al, 2018; Ito et al, 2019)
Yet unfairly, in Hoang’s study, instead of permitting TSH to be as low as it is in the healthy population, they raised the lower permissible target TSH from 0.27 to 0.5.
The unfair prevention of any TSH levels below 0.5 would have been more harmful to health outcomes in the DTE arm, especially those taking higher doses of DTE.
Dosing with T3 hormone is clinically proven to suppress the TSH more powerfully when both levels are within reference range (Celi et al, 2010, 2011, Yavuz 2013). A higher-normal T3 level is required during T3 monotherapy to achieve the same target TSH.
And why? Because the pituitary and hypothalamus are not equipped with enzymes to transform transiently excess T3 to T2 and other non-T3 metabolites at the same rate that other bodily tissues can. TSH does not speak for the heart, liver or kidney or the D3-enzyme-expressing brain.
Free T4 and Total T3 results
The FT4 data appear to have been reported in error.
There is a clear inconsistency between the FT4 by immunoassay on DTE and the FT4 by direct dialysis:
| Test method | DTE therapy | LT4 therapy |
| --- | --- | --- |
| FT4 by immunoassay (0.89–1.76 ng/dL) | 0.85 ± 0.16 | 1.36 ± 0.27 |
| FT4 by direct dialysis (0.8–2.7 ng/dL) | 1.21 ± 0.35 | 2.09 ± 0.63 |
By forbidding a low-normal TSH between 0.27 and 0.5, this experiment forced the average FT4 on DTE therapy to fall slightly below reference range.
Hoang and team did not explain why direct dialysis was performed as well, and which test is more accurate. However, most clinical trials and most laboratories will use the immunoassay.
Given the low average FT4 by immunoassay, it is questionable whether patients had enough Free T3 to compensate:
|Total T3 (60–181 ng/dL)||DTE: 138.96 ± 47.26||LT4: 89.13 ± 19.48|
Mean *Total* T3 sat at the following position within the reference range (0% = lower limit, 100% = upper limit):
- 65% of reference (DTE)
- 24% of reference (LT4).
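The percentages above appear to express where each mean sits within the reference range, with 0% at the lower limit and 100% at the upper limit. Here is a minimal sketch of that arithmetic, assuming this is the formula intended. The function name `percent_of_range` is my own, and the inputs are the means and reference ranges quoted in this section.

```python
def percent_of_range(value, low, high):
    """Position of value within [low, high]: 0 = lower limit, 100 = upper limit."""
    return 100.0 * (value - low) / (high - low)

# Total T3, reference range 60-181 ng/dL (means as quoted above).
TT3_LOW, TT3_HIGH = 60.0, 181.0
tt3_dte = percent_of_range(138.96, TT3_LOW, TT3_HIGH)
tt3_lt4 = percent_of_range(89.13, TT3_LOW, TT3_HIGH)

# FT4 by immunoassay, reference range 0.89-1.76 ng/dL (means as quoted above).
FT4_LOW, FT4_HIGH = 0.89, 1.76
ft4_dte = percent_of_range(0.85, FT4_LOW, FT4_HIGH)
ft4_lt4 = percent_of_range(1.36, FT4_LOW, FT4_HIGH)

print(f"Total T3 on DTE: {tt3_dte:.0f}% of range")
print(f"Total T3 on LT4: {tt3_lt4:.0f}% of range")
print(f"FT4 on DTE: {ft4_dte:.0f}% of range")
print(f"FT4 on LT4: {ft4_lt4:.0f}% of range")
```

A negative result, as for the mean FT4 by immunoassay on DTE, means the value lies below the lower reference limit.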
NOTE that Free T3 was not measured in this study.
This strange choice to measure both Free and Total T4 but to measure only Total T3 was unexplained and is not metabolically justifiable.
The “free hormone hypothesis” supports the use of FT4 rather than Total T4. The same rationale supports measuring the fraction of T3 that is capable of entering cells.
Total T3 cannot reflect the FT3 available given patients’ diverse levels of estrogen, albumin and use of blood thinners.
There is no good reason for avoiding the FT3 test given that FT3 and FT4 tests have been almost equal in quality and precision since they were examined in 2011 by an international committee, and the main issue that remains is manufacturers’ decision to calibrate their tests to the international standard (Thienpont et al, 2011).
Because of their choice to ignore FT3 measurement, we cannot assess FT3:FT4 ratios. If Free T3 and Free T4 had both been measured, patients’ ratios while on LT4 could have been used to divide them into tertiles by this estimate of global T4-T3 conversion efficiency as Midgley et al, 2015 have done.
If they were in doubt about the usefulness or accuracy of FT3, they should have measured both Free and Total T3 to determine which one is more statistically and clinically significant.
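As a rough illustration of the tertile approach, here is a sketch that groups patients by FT3:FT4 ratio. The patient values are hypothetical (not from Hoang or Midgley), both hormones are assumed to be in pmol/L, and the labels "poor", "intermediate" and "good" are my own shorthand. It is not a reproduction of Midgley et al's method.

```python
from statistics import quantiles

# HYPOTHETICAL (patient id, FT3 pmol/L, FT4 pmol/L) values on LT4 monotherapy.
patients = [
    ("P1", 3.6, 24.0), ("P2", 4.0, 20.0), ("P3", 4.5, 18.0),
    ("P4", 3.8, 21.0), ("P5", 5.0, 17.0), ("P6", 4.2, 19.0),
    ("P7", 3.5, 25.0), ("P8", 4.8, 16.0), ("P9", 4.4, 20.0),
]

# FT3:FT4 ratio as a crude estimate of global T4-T3 conversion efficiency.
ratios = {pid: ft3 / ft4 for pid, ft3, ft4 in patients}

# Two cut points split the cohort into three equal-sized groups (tertiles).
low_cut, high_cut = quantiles(list(ratios.values()), n=3)

def conversion_group(ratio):
    """Assign an illustrative label based on the tertile cut points."""
    if ratio <= low_cut:
        return "poor"
    if ratio <= high_cut:
        return "intermediate"
    return "good"

groups = {pid: conversion_group(r) for pid, r in ratios.items()}
for pid in sorted(groups):
    print(pid, f"{ratios[pid]:.3f}", groups[pid])
```

Outcomes on each therapy could then be reported per group rather than only as one cohort-wide mean.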
Contrast with data from Midgley et al, 2015
3. Failure to account for patients’ thyroid gland health
The next design flaw was to fail to take into account patients’ diversity of thyroid status, choosing instead to average all of the results into a single cohort.
Thyroid gland status has a major impact on a thyroid patient’s dosage and T4-T3 conversion efficiency as estimated by FT3:FT4 ratios while on LT4 monotherapy.
According to Midgley et al, 2015, among patients with the least amount of functional thyroid tissue (total thyroidectomy) are found the poorest T4-T3 converters, and “poor converters” are also to be found in autoimmune thyroid disease.
Among the 70 patients studied by Hoang et al, 2013,
- 50% had “autoimmune” hypothyroidism of varying degrees,
- 20% had “idiopathic” hypo (unknown cause),
- 14.3% Post-RAI,
- 11.43% Post-surgical (could have been subtotal thyroidectomy),
- 4.3% Post-radiation.
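As a quick arithmetic check, the percentages above can be converted back into approximate head counts. This sketch assumes each percentage was rounded from a whole number of the 70 patients; the counts are derived here, not quoted from the study.

```python
TOTAL_PATIENTS = 70  # sample size stated in this review

# Percentages as listed above.
etiology_pct = {
    "autoimmune": 50.0,
    "idiopathic": 20.0,
    "post-RAI": 14.3,
    "post-surgical": 11.43,
    "post-radiation": 4.3,
}

# Approximate head count per etiology (assumption: nearest whole patient).
etiology_count = {
    name: round(pct / 100.0 * TOTAL_PATIENTS)
    for name, pct in etiology_pct.items()
}

for name, count in etiology_count.items():
    print(f"{name}: about {count} patients")
print("Total:", sum(etiology_count.values()))
```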
As a result of this design flaw, thyroid disease etiology was not taken into account as a variable in optimizing a thyroid patient’s dosage to health outcomes.
This could have easily been done by the authors of the research study.
How should we read and use Hoang’s study outcomes and conversion table? How can we move forward? I have 5 recommendations.
These are the implications of the strengths and flaws in Hoang et al’s 2013 study.
2. Question the assumption that we are all like “average” thyroid patients.
Generalizations based on means (average) are not suited to achieve euthyroidism in patients who have little to no remaining functional thyroid tissue and who therefore differ in their T4-T3 conversion ability:
- Patients after a total thyroidectomy
- Patients with fully atrophied thyroid glands due to atrophic thyroiditis
- Patients with fully fibrosed thyroid glands due to Hashimoto’s thyroiditis
3. Question the assumption of TSH-based euthyroidism.
In this enlightened era, research evidence should trump tradition and opinion.
We have now had many clinical studies using multiple biochemical markers of tissue T3 status that go well beyond TSH. Yet even such rigorous studies seem to be afraid of letting go of opinion as guide and judge.
They still tend to idolize TSH as the only guide to titration and the final judge of euthyroidism.
Only the bravest of evidence-rich thyroid researchers such as Celi, Yavuz, Ito, Hoermann, Larisch, and Midgley have dared to question the validity of TSH—mere pituitary euthyroidism—as a sign of the entire body’s euthyroid status.
These scientists would without question agree that a TSH of 0.5 to 3.0 is not euthyroid for all thyroid patients on LT4 monotherapy, especially patients who lack thyroid tissue and who have thyroid metabolic handicaps.
Larisch’s study admitted that in about 1/3 of the patients, the mean FT3 level found in healthy controls could not be achieved by LT4 dose escalation.
Such patients are often not truly euthyroid until FT3 levels are high enough. However, when this goal is achieved, the higher levels of FT4, the inactive hormone, will lower TSH.
The fall of TSH is a pituitary-specific response. It occurs in many such patients before Free T3 levels achieve true clinical euthyroidism for the individual. The FT3 needed to remove hypothyroid symptoms appears to be around mid-reference range in LT4 monotherapy (Larisch et al, 2018; Hoermann et al, 2019).
Larisch and team suggest these may be patients who fare better on therapies including T3 hormone.
Larisch recommends that the TSH reference boundary must not be idolized at the expense of health outcomes:
“In any event, a highly individualized approach is required, in view of a considerable biochemical and symptomatic variation in the response to LT4 treatment displayed in this study. Patients and doctors may have to experiment with dosing for optimum symptom control, and cannot simply rely on a TSH target within the reference range of a healthy population.”
(As for low-TSH risk studies, there is still no conclusive evidence that low TSH alone can “cause” osteoporosis or other diseases. Most association studies do not correlate health outcomes with FT3:FT4 ratios or FT3 tertiles.)
In Hoang’s study, TSH was the idol, and it prevented both therapies from achieving more than modest success “on average.”
4. Design better comparative clinical trials.
The purpose of clinical trials should not be to pursue an average level of “superiority,” since all this does is play a petty game of thyroid pharmaceutical prejudice that has closed minds since the 1980s.
Certain pharmaceuticals and combinations are “superior” or less effective for an individual patient. To the individual patient, it is no consolation that the pharmaceutical that fails to resolve their disorder has achieved superiority or was non-inferior to the standard therapy in a clinical trial.
The goal should be to support excellence in thyroid therapy for the sake of individuals’ health, not to discover which therapy is better at achieving target outcomes “on average.” No physician treats averages; they treat individuals.
A focus on averages and central distributions ignores the health and well-being of statistical outliers.
Given the diversity of metabolic response to thyroid therapy, identifying several ranges or categories of response should be the goal of analysis.
Trials should employ risk-stratification categories that assess the efficiency of an individual’s metabolism, such as
- FT3:FT4 ratio quintiles
- the SPINA-Thyr TSH-Index, an indicator of healthy pituitary TSH secretion in response to both T3 and T4.
When dividing patients into quantiles, one should not presume a linear risk model, since this is not a trial of a substance foreign to the human body whose concentration is 0 pmol/L in healthy controls. One must discover whether a better fit is a U-shaped curve in which the concentrations surrounding the population mean have the highest likelihood of good health outcomes.
Trials like these are deeply flawed in their methodology and their premises, in ways that the ATA’s own review (Jonklaas et al, 2014) neglects to mention. If medicine truly cares about the euthyroid status of diverse individuals on any and all thyroid therapy modalities, it should fix the mistakes and learn more.
4. Fix the old “US Pharmacopeia” tables using Hoang’s study.
Here is the old version and Hoang’s 2013 recommended update.
Desiccated thyroid product monographs in the United States and Canada still include these tables 7 years after Hoang’s study was published with a correction.
Why? Because they are still in the USP standards.
Someone must submit a revision! Fix them.
How could the USP’s old tables be based on better evidence than the only double-blind clinical trial ever conducted on real thyroid patients using current preparations of desiccated thyroid and brand-name levothyroxine?
The old tables give a ratio that will lead to underdose when transitioning from LT4 to DTE, and overdose when transitioning from DTE to LT4.
A diverse population of hypothyroid patients needs all three pharmaceutical options.
Patients who struggle with LT4 monotherapy and synthetic T4-T3 combinations are left with the third safe, valid option that was historically the only option: desiccated thyroid.
5. Make conversion tables with ranges.
Thyroid hormone homeostasis is about ranges, not fixed ratios.
There is no such thing as a single euthyroid TSH level, such as 1.5 mU/L, for all individuals. There is a range.
There is no such thing as a single optimal Free T3 or Free T4 level, such as 50% of reference range, even in untreated, healthy individuals. There is a range.
Therefore, why must we give the impression that there is a single dose of desiccated thyroid (DTE) that is metabolically equivalent to a single dose of levothyroxine (LT4)?
We have fallen into a trap of reductionism and oversimplification that will cost time, money and health in the long run.
The older tables had more clinically accurate conversion estimates given as ranges.
- Annotated bibliography of desiccated thyroid therapy articles
- Shomon: Why endocrinologists may oppose desiccated thyroid therapy
- 2014 ATA Therapy guidelines: 3. Desiccated thyroid
- Trials of T3, desiccated thyroid and thyroxine in 1958