How We Score

Enhanced Health AI scores every supplement, food, habit, and exercise using a transparent, step-by-step process aligned with the PRISMA 2020 guidelines for systematic reviews. We automatically pull studies from medical research databases and score them using clear, consistent rules. Here's exactly how it works.

1. Data Collection

Where the evidence comes from

We automatically collect evidence from multiple high-quality sources:

  • PubMed / MEDLINE — We automatically search PubMed (the U.S. National Library of Medicine) for studies on each supplement and topic. Each study gets a unique ID so nothing is counted twice. Note: we cast a wide net — not every study has been read in full by a human.
  • NIH Office of Dietary Supplements (ODS) — Safety info, known drug interactions, warnings for specific groups (like pregnant women or older adults), and official safe upper dose limits.
  • Community Data (Supplementary) — Amazon reviews, Reddit discussions, GNC reviews — scanned for common keywords about real-world results people report. This counts for only 10% of the final score since it's not formal clinical data.
  • Social Media / Expert Channels (Exploratory) — Insights from qualified experts (registered dietitians, PhDs, MDs) used as background context only. Not treated as clinical evidence.

All data is refreshed periodically and versioned. Each claim is linked to its original source with full citations.

Search Strategy (PRISMA Item 7)

We query the NCBI E-utilities API (PubMed/MEDLINE) using structured search templates for each item. Typical queries include:

  • <item> AND (randomized controlled trial[pt] OR meta-analysis[pt]) AND humans[mh]
  • <item> AND (safety OR adverse effects OR drug interactions)
  • <item> AND (cohort study OR observational study) AND humans[mh]

Up to 200 records are retrieved per query type per item (500 unique PMIDs max per item). Results include study summaries fetched via esummary in batches of 200. An optional NCBI API key increases the rate limit from 3 to 10 requests per second. No date restrictions are applied — all indexed records since PubMed inception are eligible.

2. Study Eligibility Criteria

What gets included and what gets excluded (PRISMA Items 5 & 8)

Inclusion Criteria

  • Indexed in PubMed / MEDLINE
  • Published in a peer-reviewed journal
  • Investigates a supplement, food, diet, exercise, or habit with a measurable health outcome
  • Any study design accepted (meta-analysis, RCT, cohort, case-control, case report, animal, in vitro)
  • Any publication date (no date restrictions)
  • Any language indexed by PubMed (English titles required for pattern-matching pass)

Exclusion Criteria

  • Duplicate records (same PMID across queries)
  • Retracted publications (flagged by PubMed)
  • Editorials, letters, or commentaries without original data
  • Conference abstracts without full-text data (when identifiable)
  • Studies with no health-related outcome data
  • Non-PubMed-indexed sources (used as supplementary context only)

Selection is performed algorithmically. All retrieved PubMed records are processed through the classification pipeline. PMID-based deduplication ensures each study is counted exactly once.
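A minimal sketch of that selection pass (illustrative Python; the `Record` fields are simplified stand-ins for the metadata our pipeline stores, and the publication-type strings are standard PubMed tags):

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """Minimal stand-in for a fetched PubMed record."""
    pmid: str
    pub_types: list[str] = field(default_factory=list)
    has_outcome_data: bool = True

def select_studies(records: list[Record]) -> list[Record]:
    """Apply PMID-based deduplication, then the exclusion rules above."""
    seen: set[str] = set()
    included = []
    for rec in records:
        if rec.pmid in seen:                          # duplicate across overlapping queries
            continue
        seen.add(rec.pmid)
        if "Retracted Publication" in rec.pub_types:  # flagged by PubMed
            continue
        if {"Editorial", "Letter", "Comment"} & set(rec.pub_types):
            continue                                  # no original data
        if not rec.has_outcome_data:                  # no health-related outcome
            continue
        included.append(rec)
    return included
```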

3. Study Selection Flow

Adapted PRISMA flow diagram (Item 16a)

The following flow summarizes how studies move through our pipeline from initial database search to final quality scoring:

Identification

~655,000 records identified through PubMed database searches across 472 items (supplements, foods, diets, exercises, habits)

Deduplication

Duplicate records removed via PMID-based matching across overlapping query results

Screening

Records screened by PubType tags and title pattern matching to classify study design (meta-analysis, RCT, cohort, case-control, animal, in vitro, etc.)

Eligibility

Studies assessed for eligibility using automated classification; retracted publications and records without health outcome data excluded

Included

~34,000 studies individually classified, quality-scored, and assigned to 472 evidence-ranked items in the final database

Note: Unlike traditional systematic reviews where reviewers manually screen each record, our pipeline uses automated classification. This enables coverage of hundreds of thousands of records but means individual studies have not been read in full by a human reviewer.

4. Study Classification

Every study is sorted by how strong its research design is

Each PubMed study is classified by research type using two passes:

Pass 1: PubType Tags

PubMed labels each study with official tags describing its research type. We use these to sort studies into categories like meta-analysis, randomized controlled trial (RCT), observational study, lab study, animal study, and more.

Pass 2: Title Pattern Matching

For studies missing clear tags, we scan their titles for keywords that reveal the research type — words like “randomized”, “double-blind”, “meta-analysis”, “systematic review”, “cohort”, “lab study”, etc.
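The two passes can be sketched as follows (illustrative Python; the tag map and title patterns shown are a small subset, and the category names are simplified):

```python
import re

# Pass 1: official PubMed publication-type tags mapped to a design category.
PUBTYPE_MAP = {
    "Meta-Analysis": "meta-analysis",
    "Systematic Review": "meta-analysis",
    "Randomized Controlled Trial": "rct",
    "Observational Study": "observational",
    "Case Reports": "case-report",
}

# Pass 2: title keywords used when tags are missing (patterns illustrative).
TITLE_PATTERNS = [
    (r"meta.?analysis|systematic review", "meta-analysis"),
    (r"randomi[sz]ed|double.?blind", "rct"),
    (r"\bcohort\b|observational", "observational"),
    (r"in vitro|cell line", "in-vitro"),
]

def classify(pub_types: list[str], title: str) -> str:
    for tag, design in PUBTYPE_MAP.items():    # Pass 1: PubType tags
        if tag in pub_types:
            return design
    for pattern, design in TITLE_PATTERNS:     # Pass 2: title pattern matching
        if re.search(pattern, title, re.IGNORECASE):
            return design
    return "unclassified"
```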

5. Evidence Strength Levels

Based on the Oxford Centre for Evidence-Based Medicine (OCEBM) — the gold standard for ranking research

Each research type gets rated from Level 1 (strongest) to Level 5 (weakest), which sets its starting quality score:

Level | Study Design | Base Score
1a | Systematic Reviews of RCTs / Meta-Analyses | 9-10
1b | Individual Randomized Controlled Trials | 7-8.5
2b | Cohort Studies / Controlled Trials | 5-7
3b | Case-Control Studies | 3-5
4 | Case Reports / Animal Studies / In Vitro | 1-3
5 | Expert Opinion / Narrative Reviews | 1-2

6. Individual Study Quality & Risk of Bias (0-10)

Quality scoring with study-level risk of bias indicators (PRISMA Items 11 & 18)

Within each evidence level, studies earn bonus points for extra quality signals:

  • Double-blind design: Studies where neither patients nor researchers knew who got the real treatment get a +1 bonus
  • Placebo control: Detection of “placebo-controlled” adds +0.5
  • Number of participants: We detect how many people were in the study; larger studies get a +0.5 bonus
  • Duration indicator: Longer studies (≥12 weeks) score higher for chronic health interventions
  • Journal reputation: Studies published in top medical journals (like NEJM, Lancet, JAMA) get a small bonus. Note: a famous journal doesn't automatically make a study better — this is a minor factor in the overall score.

score = starting level + double-blind bonus + placebo bonus + sample-size bonus + study-length bonus + journal bonus
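As a sketch (illustrative Python; the OCEBM table gives score ranges, so taking the low end of each range as the starting level is an assumption here, as are the exact duration and journal bonus weights):

```python
# Starting levels: the low end of each published OCEBM base-score range (assumed).
BASE_SCORE = {"1a": 9.0, "1b": 7.0, "2b": 5.0, "3b": 3.0, "4": 1.0, "5": 1.0}

def study_score(level: str, *, double_blind: bool = False,
                placebo: bool = False, large_sample: bool = False,
                long_duration: bool = False, top_journal: bool = False) -> float:
    """score = starting level + the bonuses described above.
    The +0.5 duration and +0.25 journal weights are illustrative assumptions."""
    score = BASE_SCORE[level]
    score += 1.0 if double_blind else 0.0    # +1 double-blind bonus
    score += 0.5 if placebo else 0.0         # +0.5 placebo-controlled
    score += 0.5 if large_sample else 0.0    # +0.5 larger sample size
    score += 0.5 if long_duration else 0.0   # >=12-week studies score higher
    score += 0.25 if top_journal else 0.0    # minor journal-reputation factor
    return min(score, 10.0)                  # clamp to the 0-10 scale
```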

Risk of Bias Assessment

These quality metrics serve as automated proxy indicators for study-level risk of bias — analogous to domains assessed by formal tools such as the Cochrane RoB 2 (for RCTs) and ROBINS-I (for non-randomized studies). Specifically: blinding status addresses performance and detection bias; placebo control addresses expectation bias; sample size addresses imprecision; and study duration addresses adequacy of follow-up. This is a simplified, scalable adaptation — not a full manual risk of bias assessment for each individual study.

7. Per-Supplement Aggregation

How individual scores combine into a quality grade

All studies for a given supplement are aggregated into composite metrics:

  • Avg: Average Quality, the weighted average of all study scores
  • D: Depth Bonus, more studies = higher confidence (with diminishing returns)
  • H%: Human Evidence Ratio, what % of studies were done in real people (vs. animals or lab dishes)

final score = average quality (60%) + depth bonus (25%) + human-study ratio (15%)
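A sketch of this aggregation (illustrative Python; the formula above fixes only the 60/25/15 weights, so the log-based diminishing-returns curve for the depth bonus is our assumption):

```python
import math

def aggregate(study_scores: list[float], human_flags: list[bool]) -> float:
    """Combine per-study scores into one 0-10 item score.
    Weights follow the 60/25/15 formula; the depth curve is assumed."""
    avg_quality = sum(study_scores) / len(study_scores)
    # Diminishing returns: each doubling of the study count adds less.
    depth_bonus = min(10.0, 2.0 * math.log2(1 + len(study_scores)))
    human_ratio = 10.0 * sum(human_flags) / len(human_flags)
    return 0.60 * avg_quality + 0.25 * depth_bonus + 0.15 * human_ratio
```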

Heterogeneity Handling (PRISMA Item 13e)

When studies for the same item yield conflicting quality scores, the weighted average naturally dampens outliers. The depth bonus rewards items with more replicated evidence — requiring consistent findings across multiple studies to achieve top scores. The human-evidence ratio ensures items supported only by animal or in-vitro data do not receive disproportionately high rankings. We do not perform formal statistical heterogeneity tests (I², Cochran Q) because our synthesis produces quality rankings rather than pooled effect estimates.

8–9. Grading Criteria: Evidence Strength & Demonstrated Efficacy

Objective criteria for every grade — aligned with PRISMA 2020 & GRADE (PRISMA Items 15 & 22)

Every item in our database receives both a letter grade (A+ through F) and an evidence tier (Strong through Exploratory). These are not subjective opinions — they are determined by objective, measurable criteria based on two dimensions:

1. Evidence Strength

How rigorous is the research? This evaluates study design quality (e.g., randomized controlled trials vs. observational studies), sample sizes, blinding, risk of bias, and how many independent teams have replicated the findings.

2. Demonstrated Efficacy

Does it actually work? This evaluates the direction and size of measured effects — whether studies consistently show a meaningful benefit, a small benefit, mixed results, or no benefit at all.

In line with the PRISMA 2020 statement and the GRADE framework, our grading system explicitly separates research quality from the size of the observed effect. Both must be strong for a top grade.

Grade A+: Gold Standard (Score ≥ 8.5)

Evidence Strength Required

  • Multiple systematic reviews or meta-analyses of randomized controlled trials (OCEBM Level 1)
  • Combined sample sizes typically in the thousands
  • Results independently replicated by different research groups
  • Low risk of bias across the majority of included studies (double-blinded, placebo-controlled)

Demonstrated Efficacy Required

  • Consistently positive direction of effect across studies
  • Moderate-to-large effect sizes (clinically meaningful, not just statistically significant)
  • Narrow confidence intervals indicating precise estimates

The strongest evidence possible. Multiple large, high-quality clinical trials all agree this works, and independent teams have confirmed it. You can be highly confident in this recommendation.

Grade A: Well-Studied (Score 7.5 – 8.4)

Evidence Strength Required

  • Multiple well-designed RCTs or at least one strong meta-analysis (OCEBM Level 1–2)
  • Adequate sample sizes (typically 100+ participants per study)
  • Mostly double-blinded with active or placebo controls
  • Consistent results across different populations

Demonstrated Efficacy Required

  • Clear positive direction in the majority of trials
  • At least small-to-moderate effect sizes
  • Statistically significant results (p < 0.05) in most studies

Strong evidence from multiple good clinical trials. The research clearly supports a benefit, though not quite at the gold-standard level. High confidence.

Grade B+: Promising (Score 6.5 – 7.4)

Evidence Strength Required

  • At least one or two RCTs, possibly supplemented by strong cohort studies (OCEBM Level 2–3)
  • Sample sizes are adequate but not large (50–300 participants)
  • Some studies may have design limitations (e.g., single-blinded, short duration)

Demonstrated Efficacy Required

  • Positive results in most studies, but some inconsistency
  • Effect sizes are at least small and clinically relevant
  • May need more replication in broader populations

Good evidence from controlled trials. Most studies show a benefit, but either the trials are smaller, fewer in number, or tested in limited groups. Moderate-to-good confidence.

Grade B: Promising (Score 5.5 – 6.4)

Evidence Strength Required

  • One or two RCTs and/or multiple controlled observational studies (OCEBM Level 2–3)
  • Some design limitations are present (open-label, modest sample sizes)

Demonstrated Efficacy Required

  • Positive trend in results, though some studies show mixed outcomes
  • Effect sizes may be small or variable across studies

Adequate evidence. A few controlled studies support a benefit, but the picture isn't fully clear yet. Moderate confidence — reasonable to try, but more research would help.

Grade C+: Emerging (Score 4.5 – 5.4)

Evidence Strength Required

  • A few controlled trials or strong observational studies (OCEBM Level 3–4)
  • Studies may be small, short, or have significant design limitations

Demonstrated Efficacy Required

  • Some positive results, but inconsistent across studies
  • Benefits may be limited to specific subgroups or outcomes

Limited but encouraging evidence. A few studies suggest a benefit, but the research needs more confirmation. Worth considering, but don't expect certainty.

Grade C: Emerging (Score 3.5 – 4.4)

Evidence Strength Required

  • Mostly observational data or very small trials (OCEBM Level 3–4)
  • Higher risk of bias (no blinding, no placebo control, self-reported outcomes)

Demonstrated Efficacy Required

  • Trend toward benefit, but evidence is too thin to be confident
  • Effects may be hard to separate from placebo

Limited evidence. There's a reasonable scientific basis for why it might help, but the studies done so far are too small or too few to draw firm conclusions.

Grade D: Early Research (Score 2.0 – 3.4)

Evidence Strength Required

  • Mainly small pilot studies, case reports, or animal/in-vitro research (OCEBM Level 4–5)
  • Very few human studies, if any

Demonstrated Efficacy Required

  • Some positive signals in early-stage research
  • No well-powered controlled trial has confirmed a benefit in humans

Preliminary evidence only. Research is in early stages — mostly lab, animal, or very small human studies. The science is plausible but unproven in real-world use.

Grade F: Low (Score < 2.0)

Evidence Strength Required

  • Minimal published research — mostly anecdotal, traditional, or in-vitro only (OCEBM Level 5)
  • No controlled human studies available

Demonstrated Efficacy Required

  • No demonstrated efficacy in humans
  • Any positive claims are based on theory, animal data, or historical use

Insufficient evidence. There is no meaningful clinical evidence that this works in people. Listed for completeness, but we cannot recommend it with any confidence.

How this aligns with established standards: Our six-tier framework mirrors the certainty-of-evidence assessment recommended by the GRADE Working Group (High → Very Low) and used in Cochrane systematic reviews. The letter grades (A+ to F) provide an intuitive shorthand: grades A+ and A correspond to GRADE "High" (Gold Standard / Well-Studied); B+/B to "Moderate" (Promising); C+/C to "Low" (Emerging); and D/F to "Very Low" (Early Research / Exploratory). Every tier requires both rigorous study design and a consistent, measurable benefit — top grades are not awarded based on study quality alone if efficacy is weak, nor based on large effects if the studies are poorly designed.
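The score-to-grade mapping itself reduces to a simple threshold lookup (Python sketch using the cutoffs published above):

```python
# Cutoffs exactly as published in the grading criteria above.
GRADE_BANDS = [
    (8.5, "A+"), (7.5, "A"), (6.5, "B+"), (5.5, "B"),
    (4.5, "C+"), (3.5, "C"), (2.0, "D"),
]

def letter_grade(score: float) -> str:
    """Return the letter grade for a 0-10 aggregate score."""
    for cutoff, grade in GRADE_BANDS:
        if score >= cutoff:
            return grade
    return "F"   # score < 2.0
```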

9. Personalization Scoring

How your profile affects recommendations

Evidence grades are universal, but your final 1-10 personalized rating integrates your unique profile:

Goal Alignment

How well the item's studied benefits match your stated health goals. Primary goals weigh 2× more.

Age & Sex Modifiers

Adjusts scores based on age- and sex-specific research (e.g., extra calcium for women after menopause, CoQ10 for adults 65+).

Safety Filtering

Your medications, health conditions, allergies, and pregnancy status automatically remove unsafe items. Our 200+ interaction database flags or blocks risky combinations.

Community Signal (Supplementary)

Real-world feedback from Amazon, Reddit, and GNC reviews provides a small bonus signal (10% of score). This is everyday-user data, not clinical research, and is used only as a tiebreaker.

Commitment Matching

Your commitment level adjusts protocol scope (supplement count, habit intensity) but does not alter evidence quality scores.

Weight-Based Dosing

For weight-sensitive supplements (creatine, protein, vitamin D), doses are calculated per body weight and BMI where published guidelines exist.

rating = evidence tier range × (research quality 60% + community feedback 10% + trending momentum 20% + baseline 10% − safety-concern penalty)
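One reading of this formula as code (illustrative Python; normalizing each signal to 0–1 and scaling by the top of the item's evidence-tier range are our assumptions, not the exact implementation):

```python
def personalized_rating(tier_max: float, research: float, community: float,
                        trending: float, safety_penalty: float = 0.0) -> float:
    """Apply the weighted formula above; all inputs assumed normalized to 0-1,
    tier_max is the upper bound of the item's evidence-tier score range."""
    weighted = (0.60 * research      # research quality 60%
                + 0.10 * community   # community feedback 10%
                + 0.20 * trending    # trending momentum 20%
                + 0.10)              # baseline 10%
    weighted -= safety_penalty       # safety-concern penalty
    return round(max(0.0, min(1.0, weighted)) * tier_max, 1)
```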

10. AI Plan Generation

How we use AI to create your personalized regimen

After algorithmic scoring, the top-ranked items pass through our AI recommendation framework:

  1. Evidence Profile — Each item gets a detailed profile with its quality grade, how many studies back it (and what kind), how it works, dosage info, and safety notes.
  2. Interaction Check — We cross-check 200+ known drug-supplement and supplement-supplement interactions and flag anything that needs attention.
  3. Smart Timing — 35+ absorption and timing rules help determine when to take each supplement (morning, evening, with food, etc.).
  4. Body-Weight Dosing — For 26+ supplements, doses are calculated based on your body weight so the amount actually fits you.
  5. AI Plan Builder — Our AI pulls everything together into a clear, personalized daily plan with timing, helpful combinations, and safety warnings.
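Step 2, the interaction check, can be sketched as a pairwise table lookup (illustrative Python; the two sample entries stand in for the 200+ real interaction records, and the flag/block actions are simplified):

```python
# Toy excerpt of an interaction table; real entries number 200+.
INTERACTIONS = {
    frozenset({"warfarin", "vitamin k"}): "block",  # well-known anticoagulant interaction
    frozenset({"calcium", "iron"}): "flag",          # compete for absorption
}

def check_regimen(items: list[str], medications: list[str]) -> dict:
    """Cross-check every supplement/medication pair against the table."""
    result = {"block": [], "flag": []}
    everything = [s.lower() for s in items + medications]
    for i, a in enumerate(everything):
        for b in everything[i + 1:]:
            action = INTERACTIONS.get(frozenset({a, b}))
            if action:
                result[action].append((a, b))
    return result
```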

11. Acknowledged Limitations

What this system cannot do

  • Automated study collection — Studies are pulled automatically from PubMed, not individually read by a human the way formal academic reviews are done. Study counts reflect how many records our system found, not how many were deeply analyzed one by one.
  • Rule-based classification — Study design classification relies on MeSH tags and regex pattern matching, not manual assessment. Misclassification is possible, particularly for studies with ambiguous titles or missing metadata.
  • Not a formal clinical review — We rank studies by research type (using the OCEBM hierarchy), but we don't perform the full GRADE assessment used in academic clinical guidelines — that process evaluates potential bias, inconsistency, and precision at a deeper level than our automated system can.
  • Consumer reviews have limits — Amazon, Reddit, and GNC reviews count for only 10%. Online reviews can be skewed by fake reviews, self-selection (people with strong opinions post more), and the lack of scientific controls.
  • Journal name isn't everything — We give a small bonus for top-tier journals, but a well-known journal doesn't guarantee an individual study is high quality (as noted by the Cochrane Handbook).
  • Results may vary by person — Dose recommendations come from published trials that may not represent everyone. Your response can differ based on genetics, existing health conditions, and other medications you take.
  • Not clinical decision-making — This system generates educational health information, not clinical recommendations. All outputs should be reviewed with a qualified healthcare provider before implementation.
  • Evidence currency — Database updates occur periodically but may not capture the most recent publications. Check PubMed for the latest evidence on any topic.
  • Publication bias (PRISMA Item 14) — Like all PubMed-based reviews, our approach may be affected by publication bias: studies with positive or significant results are more likely to be published and indexed than null or negative findings. We do not currently perform funnel plot analysis or Egger's test to quantify this risk, but we partially mitigate it by including all study designs (not only RCTs) and by incorporating community-reported adverse effects as a supplementary signal.
  • No formal sensitivity analysis (PRISMA Item 13f) — We do not currently remove individual studies to test whether a single outlier drives the aggregate score. The weighted-average formula with a depth bonus naturally tempers the influence of any one study, but formal leave-one-out sensitivity analysis is not performed.
  • No pooled effect estimates — Our system produces evidence quality rankings, not pooled statistical effect sizes (e.g., weighted mean difference or odds ratio). This means traditional meta-analytic statistics (forest plots, I² heterogeneity) are outside our current scope.

12. Transparency & Conflict of Interest Disclosure

Our commitments and disclosures

  • We do not accept payment from supplement companies, food brands, or any commercial entity to influence rankings or recommendations.
  • We do not sell affiliate links or receive commissions from supplement purchases.
  • We do not fabricate evidence. Every claim links back to a PubMed PMID or named source.
  • This system does not replace medical advice. It is an informational tool — always consult a qualified healthcare provider before making health decisions.
  • We openly acknowledge evidence gaps. Items with limited research receive low grades and cautionary language.
  • Our scoring formulas are disclosed on this page. We update the database periodically and version all data.
  • The developer (Nicholas Householder, MD) has no financial conflicts of interest related to any supplement, food, or lifestyle product scored by this platform.
  • Protocol access (PRISMA Item 24) — Our systematic review protocol, data ingestion playbook, and step-specific evidence pipeline documentation are maintained in the project repository. This methodology page serves as the primary public-facing protocol document. No prospective registry (e.g., PROSPERO) has been filed because this is a continuously updated technology platform, not a single point-in-time review.
  • Data and code availability (PRISMA Item 27) — The source code, scoring algorithms, data collection scripts, and evidence cache files are maintained in a version-controlled GitHub repository. The scored study database (titles, PMIDs, quality scores, classifications) is available for inspection.
  • Funding (PRISMA Item 25) — This project receives no external funding from any supplement manufacturer, food company, pharmaceutical firm, or health product vendor. Development is self-funded by the developer.

13. PRISMA 2020 Alignment Statement

How this methodology maps to the 27-item PRISMA 2020 checklist

This methodology is designed in alignment with the PRISMA 2020 statement (Page et al., BMJ 2021;372:n71), the international standard for reporting systematic reviews and meta-analyses. As a continuously updated technology platform rather than a single published review, we have adapted the PRISMA framework as follows:

PRISMA Domain | Status | Where Addressed
Objectives (3-4) | ✓ | Section 1 header & intro
Eligibility criteria (5) | ✓ | Section 2
Information sources (6) | ✓ | Section 1
Search strategy (7) | ✓ | Section 1 (search strategy box)
Selection process (8) | ✓ | Sections 2-3
Data collection (9-10) | ✓ | Sections 1, 4
Risk of bias (11, 18) | Adapted | Section 6 (proxy indicators)
Synthesis methods (13a-d) | ✓ | Sections 7, 8–9
Heterogeneity (13e, 20c) | Adapted | Section 7 (heterogeneity box)
Sensitivity analysis (13f) | Noted | Section 11 (limitations)
Reporting bias (14, 21) | Noted | Section 11 (limitations)
Certainty assessment (15, 22) | ✓ | Sections 8–9 (grading criteria)
Study selection flow (16a) | ✓ | Section 3 (PRISMA flow)
Limitations (23b-c) | ✓ | Section 11
Registration/protocol (24) | Noted | Section 12
Funding/support (25) | ✓ | Section 12
Competing interests (26) | ✓ | Section 12
Data availability (27) | ✓ | Section 12

✓ = fully addressed  |  Adapted = adapted for automated platform  |  Noted = explicitly acknowledged as a limitation. Items 1-2 (title/abstract), 12 (effect measures), 17 (study characteristics), 19-20 (individual study results & synthesis results) are addressed through the searchable evidence database rather than this methodology page.