How We Score

Enhanced Health AI scores every supplement, food, habit, and exercise using a transparent, step-by-step process aligned with the PRISMA 2020 guidelines for systematic reviews. We automatically pull studies from medical research databases and score them using clear, consistent rules. Here's exactly how it works.

1. Data Collection

Where the evidence comes from

We automatically collect evidence from multiple high-quality sources:

  • PubMed / MEDLINE — We automatically search PubMed (the U.S. National Library of Medicine) for studies on each supplement and topic. Each study gets a unique ID so nothing is counted twice. Note: we cast a wide net — not every study has been read in full by a human.
  • NIH Office of Dietary Supplements (ODS) — Safety info, known drug interactions, warnings for specific groups (like pregnant women or older adults), and official safe upper dose limits.
  • Community Data (Supplementary) — Amazon reviews, Reddit discussions, GNC reviews — scanned for common keywords about real-world results people report. This counts for only 10% of the final score since it's not formal clinical data.
  • Social Media / Expert Channels (Exploratory) — Insights from qualified experts (registered dietitians, PhDs, MDs) used as background context only. Not treated as clinical evidence.

All data is refreshed periodically and versioned. Each claim is linked to its original source with full citations.

Search Strategy (PRISMA Item 7)

We query the NCBI E-utilities API (PubMed/MEDLINE) using structured search templates for each item. Typical queries include:

  • <item> AND (randomized controlled trial[pt] OR meta-analysis[pt]) AND humans[mh]
  • <item> AND (safety OR adverse effects OR drug interactions)
  • <item> AND (cohort study OR observational study) AND humans[mh]

Up to 200 records are retrieved per query type per item (500 unique PMIDs max per item). Results include study summaries fetched via esummary in batches of 200. An optional NCBI API key increases the rate limit from 3 to 10 requests per second. No date restrictions are applied — all indexed records since PubMed inception are eligible.

2. Study Eligibility Criteria

What gets included and what gets excluded (PRISMA Items 5 & 8)

Inclusion Criteria

  • Indexed in PubMed / MEDLINE
  • Published in a peer-reviewed journal
  • Investigates a supplement, food, diet, exercise, or habit with a measurable health outcome
  • Any study design accepted (meta-analysis, RCT, cohort, case-control, case report, animal, in vitro)
  • Any publication date (no date restrictions)
  • Any language indexed by PubMed (English titles required for pattern-matching pass)

Exclusion Criteria

  • Duplicate records (same PMID across queries)
  • Retracted publications (flagged by PubMed)
  • Editorials, letters, or commentaries without original data
  • Conference abstracts without full-text data (when identifiable)
  • Studies with no health-related outcome data
  • Non-PubMed-indexed sources (used as supplementary context only)

Selection is performed algorithmically. All retrieved PubMed records are processed through the classification pipeline. PMID-based deduplication ensures each study is counted exactly once.
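A minimal sketch of that selection pass (illustrative Python; the `Record` fields are simplified stand-ins for the metadata our pipeline stores, and the publication-type strings are standard PubMed tags):

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """Minimal stand-in for a fetched PubMed record."""
    pmid: str
    pub_types: list[str] = field(default_factory=list)
    has_outcome_data: bool = True

def select_studies(records: list[Record]) -> list[Record]:
    """Apply PMID-based deduplication, then the exclusion rules above."""
    seen: set[str] = set()
    included = []
    for rec in records:
        if rec.pmid in seen:                          # duplicate across overlapping queries
            continue
        seen.add(rec.pmid)
        if "Retracted Publication" in rec.pub_types:  # flagged by PubMed
            continue
        if {"Editorial", "Letter", "Comment"} & set(rec.pub_types):
            continue                                  # no original data
        if not rec.has_outcome_data:                  # no health-related outcome
            continue
        included.append(rec)
    return included
```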

3. Study Selection Flow

Adapted PRISMA flow diagram (Item 16a)

The following flow summarizes how studies move through our pipeline from initial database search to final quality scoring:

Identification

~655,000 records identified through PubMed database searches across 472 items (supplements, foods, diets, exercises, habits)

Deduplication

Duplicate records removed via PMID-based matching across overlapping query results

Screening

Records screened by PubType tags and title pattern matching to classify study design (meta-analysis, RCT, cohort, case-control, animal, in vitro, etc.)

Eligibility

Studies assessed for eligibility using automated classification; retracted publications and records without health outcome data excluded

Included

~34,000 studies individually classified, quality-scored, and assigned to 472 evidence-ranked items in the final database

Note: Unlike traditional systematic reviews where reviewers manually screen each record, our pipeline uses automated classification. This enables coverage of hundreds of thousands of records but means individual studies have not been read in full by a human reviewer.

4. Study Classification

Every study is sorted by how strong its research design is

Each PubMed study is classified by research type using two passes:

Pass 1: PubType Tags

PubMed labels each study with official tags describing its research type. We use these to sort studies into categories like meta-analysis, randomized controlled trial (RCT), observational study, lab study, animal study, and more.

Pass 2: Title Pattern Matching

For studies missing clear tags, we scan their titles for keywords that reveal the research type — words like “randomized”, “double-blind”, “meta-analysis”, “systematic review”, “cohort”, “lab study”, etc.
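The two passes can be sketched as follows (illustrative Python; the tag map and title patterns shown are a small subset, and the category names are simplified):

```python
import re

# Pass 1: official PubMed publication-type tags mapped to a design category.
PUBTYPE_MAP = {
    "Meta-Analysis": "meta-analysis",
    "Systematic Review": "meta-analysis",
    "Randomized Controlled Trial": "rct",
    "Observational Study": "observational",
    "Case Reports": "case-report",
}

# Pass 2: title keywords used when tags are missing (patterns illustrative).
TITLE_PATTERNS = [
    (r"meta.?analysis|systematic review", "meta-analysis"),
    (r"randomi[sz]ed|double.?blind", "rct"),
    (r"\bcohort\b|observational", "observational"),
    (r"in vitro|cell line", "in-vitro"),
]

def classify(pub_types: list[str], title: str) -> str:
    for tag, design in PUBTYPE_MAP.items():    # Pass 1: PubType tags
        if tag in pub_types:
            return design
    for pattern, design in TITLE_PATTERNS:     # Pass 2: title pattern matching
        if re.search(pattern, title, re.IGNORECASE):
            return design
    return "unclassified"
```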

5. Evidence Strength Levels

Based on the Oxford Centre for Evidence-Based Medicine (OCEBM) — the gold standard for ranking research

Each research type gets rated from Level 1 (strongest) to Level 5 (weakest), which sets its starting quality score:

Level | Study Design | Base Score
1a | Systematic Reviews of RCTs / Meta-Analyses | 9-10
1b | Individual Randomized Controlled Trials | 7-8.5
2b | Cohort Studies / Controlled Trials | 5-7
3b | Case-Control Studies | 3-5
4 | Case Reports / Animal Studies / In Vitro | 1-3
5 | Expert Opinion / Narrative Reviews | 1-2

6. Individual Study Quality & Risk of Bias (0-10)

Quality scoring with study-level risk of bias indicators (PRISMA Items 11 & 18)

Within each evidence level, studies earn bonus points for extra quality signals:

  • Double-blind design: Studies where neither patients nor researchers knew who got the real treatment get a +1 bonus
  • Placebo control: Detection of “placebo-controlled” adds +0.5
  • Number of participants: We detect how many people were in the study; larger studies get a +0.5 bonus
  • Duration indicator: Longer studies (≥12 weeks) score higher for chronic health interventions
  • Journal reputation: Studies published in top medical journals (like NEJM, Lancet, JAMA) get a small bonus. Note: a famous journal doesn't automatically make a study better — this is a minor factor in the overall score.

score = starting level + double-blind bonus + placebo bonus + sample-size bonus + study-length bonus + journal bonus
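As a sketch (illustrative Python; the OCEBM table gives score ranges, so taking the low end of each range as the starting level is an assumption here, as are the exact duration and journal bonus weights):

```python
# Starting levels: the low end of each published OCEBM base-score range (assumed).
BASE_SCORE = {"1a": 9.0, "1b": 7.0, "2b": 5.0, "3b": 3.0, "4": 1.0, "5": 1.0}

def study_score(level: str, *, double_blind: bool = False,
                placebo: bool = False, large_sample: bool = False,
                long_duration: bool = False, top_journal: bool = False) -> float:
    """score = starting level + the bonuses described above.
    The +0.5 duration and +0.25 journal weights are illustrative assumptions."""
    score = BASE_SCORE[level]
    score += 1.0 if double_blind else 0.0    # +1 double-blind bonus
    score += 0.5 if placebo else 0.0         # +0.5 placebo-controlled
    score += 0.5 if large_sample else 0.0    # +0.5 larger sample size
    score += 0.5 if long_duration else 0.0   # >=12-week studies score higher
    score += 0.25 if top_journal else 0.0    # minor journal-reputation factor
    return min(score, 10.0)                  # clamp to the 0-10 scale
```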

Risk of Bias Assessment

These quality metrics serve as automated proxy indicators for study-level risk of bias — analogous to domains assessed by formal tools such as the Cochrane RoB 2 (for RCTs) and ROBINS-I (for non-randomized studies). Specifically: blinding status addresses performance and detection bias; placebo control addresses expectation bias; sample size addresses imprecision; and study duration addresses adequacy of follow-up. This is a simplified, scalable adaptation — not a full manual risk of bias assessment for each individual study.

7. Per-Supplement Aggregation

How individual scores combine into a quality grade

All studies for a given supplement are aggregated into composite metrics:

  • Avg: Average Quality, the weighted average of all study scores
  • D: Depth Bonus, more studies = higher confidence (with diminishing returns)
  • H%: Human Evidence Ratio, what % of studies were done in real people (vs. animals or lab dishes)

final score = average quality (60%) + depth bonus (25%) + human-study ratio (15%)
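A sketch of this aggregation (illustrative Python; the formula above fixes only the 60/25/15 weights, so the log-based diminishing-returns curve for the depth bonus is our assumption):

```python
import math

def aggregate(study_scores: list[float], human_flags: list[bool]) -> float:
    """Combine per-study scores into one 0-10 item score.
    Weights follow the 60/25/15 formula; the depth curve is assumed."""
    avg_quality = sum(study_scores) / len(study_scores)
    # Diminishing returns: each doubling of the study count adds less.
    depth_bonus = min(10.0, 2.0 * math.log2(1 + len(study_scores)))
    human_ratio = 10.0 * sum(human_flags) / len(human_flags)
    return 0.60 * avg_quality + 0.25 * depth_bonus + 0.15 * human_ratio
```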

Heterogeneity Handling (PRISMA Item 13e)

When studies for the same item yield conflicting quality scores, the weighted average naturally dampens outliers. The depth bonus rewards items with more replicated evidence — requiring consistent findings across multiple studies to achieve top scores. The human-evidence ratio ensures items supported only by animal or in-vitro data do not receive disproportionately high rankings. We do not perform formal statistical heterogeneity tests (I², Cochran Q) because our synthesis produces quality rankings rather than pooled effect estimates.

8–9. Grading Criteria: Evidence Strength & Demonstrated Efficacy

Objective criteria for every grade — aligned with PRISMA 2020 & GRADE (PRISMA Items 15 & 22)

Every item in our database receives both a letter grade (A+ through F) and an evidence tier (Strong through Exploratory). These are not subjective opinions — they are determined by objective, measurable criteria based on two dimensions:

1. Evidence Strength

How rigorous is the research? This evaluates study design quality (e.g., randomized controlled trials vs. observational studies), sample sizes, blinding, risk of bias, and how many independent teams have replicated the findings.

2. Demonstrated Efficacy

Does it actually work? This evaluates the direction and size of measured effects — whether studies consistently show a meaningful benefit, a small benefit, mixed results, or no benefit at all.

In line with the PRISMA 2020 statement and the GRADE framework, our grading system explicitly separates research quality from the size of the observed effect. Both must be strong for a top grade.

Grade A+: Gold Standard (Score ≥ 8.5)

Evidence Strength Required

  • Multiple systematic reviews or meta-analyses of randomized controlled trials (OCEBM Level 1)
  • Combined sample sizes typically in the thousands
  • Results independently replicated by different research groups
  • Low risk of bias across the majority of included studies (double-blinded, placebo-controlled)

Demonstrated Efficacy Required

  • Consistently positive direction of effect across studies
  • Moderate-to-large effect sizes (clinically meaningful, not just statistically significant)
  • Narrow confidence intervals indicating precise estimates

The strongest evidence possible. Multiple large, high-quality clinical trials all agree this works, and independent teams have confirmed it. You can be highly confident in this recommendation.

Grade A: Well-Studied (Score 7.5 – 8.4)

Evidence Strength Required

  • Multiple well-designed RCTs or at least one strong meta-analysis (OCEBM Level 1–2)
  • Adequate sample sizes (typically 100+ participants per study)
  • Mostly double-blinded with active or placebo controls
  • Consistent results across different populations

Demonstrated Efficacy Required

  • Clear positive direction in the majority of trials
  • At least small-to-moderate effect sizes
  • Statistically significant results (p < 0.05) in most studies

Strong evidence from multiple good clinical trials. The research clearly supports a benefit, though not quite at the gold-standard level. High confidence.

Grade B+: Promising (Score 6.5 – 7.4)

Evidence Strength Required

  • At least one or two RCTs, possibly supplemented by strong cohort studies (OCEBM Level 2–3)
  • Sample sizes are adequate but not large (50–300 participants)
  • Some studies may have design limitations (e.g., single-blinded, short duration)

Demonstrated Efficacy Required

  • Positive results in most studies, but some inconsistency
  • Effect sizes are at least small and clinically relevant
  • May need more replication in broader populations

Good evidence from controlled trials. Most studies show a benefit, but either the trials are smaller, fewer in number, or tested in limited groups. Moderate-to-good confidence.

Grade B: Promising (Score 5.5 – 6.4)

Evidence Strength Required

  • One or two RCTs and/or multiple controlled observational studies (OCEBM Level 2–3)
  • Some design limitations are present (open-label, modest sample sizes)

Demonstrated Efficacy Required

  • Positive trend in results, though some studies show mixed outcomes
  • Effect sizes may be small or variable across studies

Adequate evidence. A few controlled studies support a benefit, but the picture isn't fully clear yet. Moderate confidence — reasonable to try, but more research would help.

Grade C+: Emerging (Score 4.5 – 5.4)

Evidence Strength Required

  • A few controlled trials or strong observational studies (OCEBM Level 3–4)
  • Studies may be small, short, or have significant design limitations

Demonstrated Efficacy Required

  • Some positive results, but inconsistent across studies
  • Benefits may be limited to specific subgroups or outcomes

Limited but encouraging evidence. A few studies suggest a benefit, but the research needs more confirmation. Worth considering, but don't expect certainty.

Grade C: Emerging (Score 3.5 – 4.4)

Evidence Strength Required

  • Mostly observational data or very small trials (OCEBM Level 3–4)
  • Higher risk of bias (no blinding, no placebo control, self-reported outcomes)

Demonstrated Efficacy Required

  • Trend toward benefit, but evidence is too thin to be confident
  • Effects may be hard to separate from placebo

Limited evidence. There's a reasonable scientific basis for why it might help, but the studies done so far are too small or too few to draw firm conclusions.

Grade D: Early Research (Score 2.0 – 3.4)

Evidence Strength Required

  • Mainly small pilot studies, case reports, or animal/in-vitro research (OCEBM Level 4–5)
  • Very few human studies, if any

Demonstrated Efficacy Required

  • Some positive signals in early-stage research
  • No well-powered controlled trial has confirmed a benefit in humans

Preliminary evidence only. Research is in early stages — mostly lab, animal, or very small human studies. The science is plausible but unproven in real-world use.

Grade F: Low (Score < 2.0)

Evidence Strength Required

  • Minimal published research — mostly anecdotal, traditional, or in-vitro only (OCEBM Level 5)
  • No controlled human studies available

Demonstrated Efficacy Required

  • No demonstrated efficacy in humans
  • Any positive claims are based on theory, animal data, or historical use

Insufficient evidence. There is no meaningful clinical evidence that this works in people. Listed for completeness, but we cannot recommend it with any confidence.

How this aligns with established standards: Our six-tier framework mirrors the certainty-of-evidence assessment recommended by the GRADE Working Group (High → Very Low) and used in Cochrane systematic reviews. The letter grades (A+ to F) provide an intuitive shorthand: grades A+ and A correspond to GRADE "High" (Gold Standard / Well-Studied); B+/B to "Moderate" (Promising); C+/C to "Low" (Emerging); and D/F to "Very Low" (Early Research / Exploratory). Every tier requires both rigorous study design and a consistent, measurable benefit — top grades are not awarded based on study quality alone if efficacy is weak, nor based on large effects if the studies are poorly designed.
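The score-to-grade mapping itself reduces to a simple threshold lookup (Python sketch using the cutoffs published above):

```python
# Cutoffs exactly as published in the grading criteria above.
GRADE_BANDS = [
    (8.5, "A+"), (7.5, "A"), (6.5, "B+"), (5.5, "B"),
    (4.5, "C+"), (3.5, "C"), (2.0, "D"),
]

def letter_grade(score: float) -> str:
    """Return the letter grade for a 0-10 aggregate score."""
    for cutoff, grade in GRADE_BANDS:
        if score >= cutoff:
            return grade
    return "F"   # score < 2.0
```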

9. Personalization Scoring

How your profile affects recommendations

Evidence grades are universal, but your final 1-10 personalized rating integrates your unique profile:

Goal Alignment

How well the item's studied benefits match your stated health goals. Primary goals weigh 2× more.

Age & Sex Modifiers

Adjusts scores based on age- and sex-specific research (e.g., extra calcium for women after menopause, CoQ10 for adults 65+).

Safety Filtering

Your medications, health conditions, allergies, and pregnancy status automatically remove unsafe items. Our 200+ interaction database flags or blocks risky combinations.

Community Signal (Supplementary)

Real-world feedback from Amazon, Reddit, and GNC reviews provides a small bonus signal (10% of score). This is everyday-user data, not clinical research, and is used only as a tiebreaker.

Commitment Matching

Your commitment level adjusts protocol scope (supplement count, habit intensity) but does not alter evidence quality scores.

Weight-Based Dosing

For weight-sensitive supplements (creatine, protein, vitamin D), doses are calculated per body weight and BMI where published guidelines exist.

rating = evidence tier range × (research quality 60% + community feedback 10% + trending momentum 20% + baseline 10% − safety-concern penalty)
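One reading of this formula as code (illustrative Python; normalizing each signal to 0–1 and scaling by the top of the item's evidence-tier range are our assumptions, not the exact implementation):

```python
def personalized_rating(tier_max: float, research: float, community: float,
                        trending: float, safety_penalty: float = 0.0) -> float:
    """Apply the weighted formula above; all inputs assumed normalized to 0-1,
    tier_max is the upper bound of the item's evidence-tier score range."""
    weighted = (0.60 * research      # research quality 60%
                + 0.10 * community   # community feedback 10%
                + 0.20 * trending    # trending momentum 20%
                + 0.10)              # baseline 10%
    weighted -= safety_penalty       # safety-concern penalty
    return round(max(0.0, min(1.0, weighted)) * tier_max, 1)
```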

10. AI Plan Generation

How we use AI to create your personalized regimen

After algorithmic scoring, the top-ranked items pass through our AI recommendation framework:

  1. Evidence Profile — Each item gets a detailed profile with its quality grade, how many studies back it (and what kind), how it works, dosage info, and safety notes.
  2. Interaction Check — We cross-check 200+ known drug-supplement and supplement-supplement interactions and flag anything that needs attention.
  3. Smart Timing — 35+ absorption and timing rules help determine when to take each supplement (morning, evening, with food, etc.).
  4. Body-Weight Dosing — For 26+ supplements, doses are calculated based on your body weight so the amount actually fits you.
  5. AI Plan Builder — Our AI pulls everything together into a clear, personalized daily plan with timing, helpful combinations, and safety warnings.
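Step 2, the interaction check, can be sketched as a pairwise table lookup (illustrative Python; the two sample entries stand in for the 200+ real interaction records, and the flag/block actions are simplified):

```python
# Toy excerpt of an interaction table; real entries number 200+.
INTERACTIONS = {
    frozenset({"warfarin", "vitamin k"}): "block",  # well-known anticoagulant interaction
    frozenset({"calcium", "iron"}): "flag",          # compete for absorption
}

def check_regimen(items: list[str], medications: list[str]) -> dict:
    """Cross-check every supplement/medication pair against the table."""
    result = {"block": [], "flag": []}
    everything = [s.lower() for s in items + medications]
    for i, a in enumerate(everything):
        for b in everything[i + 1:]:
            action = INTERACTIONS.get(frozenset({a, b}))
            if action:
                result[action].append((a, b))
    return result
```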

11. Acknowledged Limitations

What this system cannot do

  • Automated study collection — Studies are pulled automatically from PubMed, not individually read by a human the way formal academic reviews are done. Study counts reflect how many records our system found, not how many were deeply analyzed one by one.
  • Rule-based classification — Study design classification relies on MeSH tags and regex pattern matching, not manual assessment. Misclassification is possible, particularly for studies with ambiguous titles or missing metadata.
  • Not a formal clinical review — We rank studies by research type (using the OCEBM hierarchy), but we don't perform the full GRADE assessment used in academic clinical guidelines — that process evaluates potential bias, inconsistency, and precision at a deeper level than our automated system can.
  • Consumer reviews have limits — Amazon, Reddit, and GNC reviews count for only 10%. Online reviews can be skewed by fake reviews, self-selection (people with strong opinions post more), and the lack of scientific controls.
  • Journal name isn't everything — We give a small bonus for top-tier journals, but a well-known journal doesn't guarantee an individual study is high quality (as noted by the Cochrane Handbook).
  • Results may vary by person — Dose recommendations come from published trials that may not represent everyone. Your response can differ based on genetics, existing health conditions, and other medications you take.
  • Not clinical decision-making — This system generates educational health information, not clinical recommendations. All outputs should be reviewed with a qualified healthcare provider before implementation.
  • Evidence currency — Database updates occur periodically but may not capture the most recent publications. Check PubMed for the latest evidence on any topic.
  • Publication bias (PRISMA Item 14) — Like all PubMed-based reviews, our approach may be affected by publication bias: studies with positive or significant results are more likely to be published and indexed than null or negative findings. We do not currently perform funnel plot analysis or Egger's test to quantify this risk, but we partially mitigate it by including all study designs (not only RCTs) and by incorporating community-reported adverse effects as a supplementary signal.
  • No formal sensitivity analysis (PRISMA Item 13f) — We do not currently remove individual studies to test whether a single outlier drives the aggregate score. The weighted-average formula with a depth bonus naturally tempers the influence of any one study, but formal leave-one-out sensitivity analysis is not performed.
  • No pooled effect estimates — Our system produces evidence quality rankings, not pooled statistical effect sizes (e.g., weighted mean difference or odds ratio). This means traditional meta-analytic statistics (forest plots, I² heterogeneity) are outside our current scope.

12. Transparency & Conflict of Interest Disclosure

Our commitments and disclosures

  • We do not accept payment from supplement companies, food brands, or any commercial entity to influence rankings or recommendations.
  • We do not sell affiliate links or receive commissions from supplement purchases.
  • We do not fabricate evidence. Every claim links back to a PubMed PMID or named source.
  • This system does not replace medical advice. It is an informational tool — always consult a qualified healthcare provider before making health decisions.
  • We openly acknowledge evidence gaps. Items with limited research receive low grades and cautionary language.
  • Our scoring formulas are disclosed on this page. We update the database periodically and version all data.
  • The developer (Nicholas Householder, MD) has no financial conflicts of interest related to any supplement, food, or lifestyle product scored by this platform.
  • Protocol access (PRISMA Item 24) — Our systematic review protocol, data ingestion playbook, and step-specific evidence pipeline documentation are maintained in the project repository. This methodology page serves as the primary public-facing protocol document. No prospective registry (e.g., PROSPERO) has been filed because this is a continuously updated technology platform, not a single point-in-time review.
  • Data and code availability (PRISMA Item 27) — The source code, scoring algorithms, data collection scripts, and evidence cache files are maintained in a version-controlled GitHub repository. The scored study database (titles, PMIDs, quality scores, classifications) is available for inspection.
  • Funding (PRISMA Item 25) — This project receives no external funding from any supplement manufacturer, food company, pharmaceutical firm, or health product vendor. Development is self-funded by the developer.

13. PRISMA 2020 Alignment Statement

How this methodology maps to the 27-item PRISMA 2020 checklist

This methodology is designed in alignment with the PRISMA 2020 statement (Page et al., BMJ 2021;372:n71), the international standard for reporting systematic reviews and meta-analyses. As a continuously updated technology platform rather than a single published review, we have adapted the PRISMA framework as follows:

PRISMA Domain | Status | Where Addressed
Objectives (3-4) | ✓ | Section 1 header & intro
Eligibility criteria (5) | ✓ | Section 2
Information sources (6) | ✓ | Section 1
Search strategy (7) | ✓ | Section 1 (search strategy box)
Selection process (8) | ✓ | Sections 2-3
Data collection (9-10) | ✓ | Sections 1, 4
Risk of bias (11, 18) | Adapted | Section 6 (proxy indicators)
Synthesis methods (13a-d) | ✓ | Sections 7, 8–9
Heterogeneity (13e, 20c) | Adapted | Section 7 (heterogeneity box)
Sensitivity analysis (13f) | Noted | Section 11 (limitations)
Reporting bias (14, 21) | Noted | Section 11 (limitations)
Certainty assessment (15, 22) | ✓ | Sections 8–9 (grading criteria)
Study selection flow (16a) | ✓ | Section 3 (PRISMA flow)
Limitations (23b-c) | ✓ | Section 11
Registration/protocol (24) | Noted | Section 12
Funding/support (25) | ✓ | Section 12
Competing interests (26) | ✓ | Section 12
Data availability (27) | ✓ | Section 12

✓ = fully addressed  |  Adapted = adapted for automated platform  |  Noted = explicitly acknowledged as a limitation. Items 1-2 (title/abstract), 12 (effect measures), 17 (study characteristics), 19-20 (individual study results & synthesis results) are addressed through the searchable evidence database rather than this methodology page.