Zero-burden Risk Assessment for Test-free Screening &

Predictive Prognosis of Complex Diseases

Ishanu Chattopadhyay, PhD

Assistant Professor of Internal Medicine

Institute of Biomedical Informatics

University of Kentucky

The Laboratory for Zero Knowledge Discovery

mathematics

computer science

social science

medicine

Complex systems

AI/ML learning theory and applications

Implications of AI for the Future of Society

Dmytro Onishchenko, staff

Nathan Russel, PhD student

Postdoc TBD

Student Researcher TBD

collaborators

Alex Leow, Psychiatry, UIC

Anna Podolanczuk, Pulmonary Care, Weill Cornell

Gary Hunninghake, Pulmonary C, Harvard

Robert Gibbons, Biostatistics

Daniel Rubins, Anesthesia and Critical Care

Peter Smith, Pediatrics

Michael Msall, Pediatrics

Fernando Martinez, Pulmonary Critical Care, Weill Cornell

James Mastrianni, Neurology

James Evans, Sociology

Erika Claud, Pediatrics

Aaron Esser-Kahn, Molecular Engineering

David Llewellyn, University of Exeter

Kenneth Rockwood, Dalhousie University

Andrew Limper, Mayo Clinic

Publications

Impact

Nature Medicine

Nature Human Behaviour

Nature Communications

Science Advances

(3)

PNAS

JAMA

JAHA

JACC

"test-free" screening?

Autism
Idiopathic Pulmonary Fibrosis
Alzheimer's Disease and related dementia
Suicidality, PTSD
Perioperative Cardiac Event
Aggressive Melanoma
Uterine Cancer
Pancreatic Cancer

non-existent biomarkers

expensive, time-consuming diagnostic tests

Lack of Universal Screening at the point of care

Early diagnosis is difficult, late or missed diagnosis costs lives

We lack universal screening for most diseases

Known Co-morbidities of PF

Are there more? Subtle footprints in the medical history that are more heterogeneous?

Aim 1: Map AP Patient Journeys to Identify Risk Patterns in Acute and Recurrent Episodes.

Acute Pancreatitis

(with UK PRIME)

Aim 2: Model Transitions from AP to Type 3c Diabetes for Early Intervention.

Aim 3: Predict ICU Admission in AP Patients Based on Disease Severity Indicators.

Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records

Flag patients before they (or doctors) suspect

Primary Care

Pulmonologist

Zero-burden Co-morbid Risk Score (ZCoR)

Referral

Prognosis at Point-of-Diagnosis

Optimizing Management

Patient Journey

Continuous Risk Monitoring

Early Diagnosis

Universal Screening

Interstitial Lung Disease / Pulmonary Fibrosis

>50 years old

more men than women

Rare disease

~5 in 10,000

Post-Dx

Survival

~4 years

At least one misdiagnosis

~55%

Two or more misdiagnoses

38%

Initially attributed to age-related symptoms:

72%

[Figure: timeline with markers for ZCoR screening and current clinical Dx; interval labels of ~4 yrs each, including current survival ~4 yrs]

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

n=~3M

AUC~90%

Likelihood ratio ~30
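To give a feel for what a likelihood ratio of about 30 means for a rare disease, the sketch below applies Bayes' rule in odds form. It assumes the ~5 in 10,000 figure quoted earlier can stand in for the pretest probability; the function and variable names are illustrative and are not taken from the study.

```python
def post_test_probability(pretest_prob, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pretest odds * LR."""
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must be strictly between 0 and 1")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Assumption: prevalence (~5 in 10,000) is used as the pretest probability.
prevalence = 5 / 10_000
lr_positive = 30.0  # "Likelihood ratio ~30" from the slide

flagged_risk = post_test_probability(prevalence, lr_positive)
print(f"risk after a positive flag: {flagged_risk:.4f}")
```

Under these assumptions a flagged patient's risk is roughly 1.5%, close to 30 times the baseline, which is the sense in which a screen can enrich a referral population even when the disease is rare.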

Conventional AI/ML attempts to model the physician

AI in IPF Research

Co-morbidity patterns
No data demands
Use whatever data is already on patient file

ICD administrative codes

IPF

ILD

target codes appear

Past medical history

No target codes appear

case

control

2yrs

prediction

Truven MarketScan (IBM)
Commercial Claims & Encounters Database
2003-2018

>100M patients visible

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

2,053,277 patients included in study

University of Chicago Medical Center 
2012-2021

68,658 patients

Random sample from OptumLabs Data Warehouse, courtesy of Mayo Clinic

861,280 patients

2,983,215 patients

Data: Onishchenko et al., Nat. Med. 2022


patient A

patient B

patient C

Beyond "risk factors" to personalized risk patterns

Clinical Trial Cohort Selection

Current screen failure rate ~50-60%

Screen failure rate with ZCoR: ~20%

cohort size: 2000

initial cohort size: 5000

initial cohort size with ZCoR: 2500

Cost per patient for confirmatory tests: ~7k USD

Savings: ~20M USD
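The cohort numbers above can be checked with a few lines of arithmetic. This is a sketch that assumes the 60% end of the current screen failure range and treats the ~7k USD confirmatory cost as applying to every screened candidate; the function name is hypothetical.

```python
import math


def patients_to_screen(target_enrolled, screen_failure_pct):
    """Candidates who must undergo confirmatory testing to enroll the target."""
    pass_pct = 100 - screen_failure_pct
    return math.ceil(target_enrolled * 100 / pass_pct)


target_cohort = 2000
cost_per_patient_usd = 7_000  # confirmatory tests, from the slide

baseline = patients_to_screen(target_cohort, 60)   # current failure rate (upper end)
with_zcor = patients_to_screen(target_cohort, 20)  # failure rate with ZCoR
savings_usd = (baseline - with_zcor) * cost_per_patient_usd

print(baseline, with_zcor, savings_usd)
```

With these inputs the initial cohort drops from 5000 to 2500 and the difference comes to 17.5M USD, in line with the rounded ~20M USD figure.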

Up to 4-year "signal" resolution

decreases risk

increases risk

Patient Journey: Tracking Risk over time

n=3,294,608
average age: 57 years 2 months

Predicting Acute Pancreatitis

Autism

1 in 59

MCHAT/F

Alzheimer's Disease and Related Dementia

State of the art with EHR:

~67% AUC*

ZCoR: ~87%

Preempting ADRD accurately up to a decade into the future

ZeD Lab: Predictive Screening from Comorbidity Footprints

CELL Reports

Condition	ZCoR	Competition
Autism	>83%	"obvious"
Alzheimer's Disease	~90%	60-70%
Idiopathic Pulmonary Fibrosis	~90%	NA
MACE	~80%	~70%
Bipolar Disorder	~85%	NA
CKD	~85%	NA
Rare Cancers (Bladder, Uterus)	~75-80%	Low
Suicidality (with CAT-SS)	98% PPV	Low

Odds ratios combined via ML

Data

cases

control

\vdots

odds ratios for all ICD codes

ML Model

odds-based risk estimator

\rho(X) = \zeta\left( \bigcup_i \left\{ \mathcal{O}(x_i) \right\} \right)

minimize generalization error by constraining model capacity

Combining ~1000 features with constrained capacity models
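A minimal sketch of the odds-ratio step follows. It computes a per-code odds ratio O(x_i) from case and control histories with a 0.5 continuity correction, then combines them with a deliberately simple placeholder for zeta. The placeholder (a logistic squashing of the mean log odds ratio) is an assumption made for illustration and is not the capacity-constrained ML model used in ZCoR; the codes "A", "B", "C" are toy tokens.

```python
import math
from collections import Counter


def odds_ratios(case_histories, control_histories):
    """Per-code odds ratio, with a 0.5 (Haldane-Anscombe) correction per cell."""
    n_case, n_ctrl = len(case_histories), len(control_histories)
    case_counts = Counter(c for h in case_histories for c in set(h))
    ctrl_counts = Counter(c for h in control_histories for c in set(h))
    ratios = {}
    for code in set(case_counts) | set(ctrl_counts):
        a = case_counts[code] + 0.5            # cases with the code
        b = n_case - case_counts[code] + 0.5   # cases without it
        c = ctrl_counts[code] + 0.5            # controls with the code
        d = n_ctrl - ctrl_counts[code] + 0.5   # controls without it
        ratios[code] = (a * d) / (b * c)
    return ratios


def zeta_placeholder(patient_codes, ratios):
    """Stand-in for zeta (assumption): logistic of the mean log odds ratio."""
    logs = [math.log(ratios[c]) for c in set(patient_codes) if c in ratios]
    if not logs:
        return 0.5  # no informative codes on file
    return 1.0 / (1.0 + math.exp(-sum(logs) / len(logs)))


# Toy data: each history is the list of codes on one patient's file.
cases = [["A", "B"], ["A"], ["A", "C"]]
controls = [["B"], ["C"], ["B", "C"]]

ratios = odds_ratios(cases, controls)
score_a = zeta_placeholder(["A"], ratios)
score_b = zeta_placeholder(["B"], ratios)
print(ratios["A"], round(score_a, 2), round(score_b, 2))
```

In practice the combining step would be a trained model over on the order of a thousand such features; the placeholder only shows where the odds ratios enter.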

Cloud Deployment

Theoretical formulation

Multi-cohort validation

Launch User-Accessible Platform

3 years

2 years

[
    {
        "patient_id": "P000038",
        "sex": "F",
        "birth_date": "01-01-2006",
        "DX_record": [
            {"date": "07-31-2006", "code": "Z38.00"},
            {"date": "08-07-2006", "code": "P59.9"},
            {"date": "08-29-2016", "code": "J01.90"},
            {"date": "09-10-2016", "code": "J01.90"},
            {"date": "11-14-2016", "code": "J01.91"}
        ],
        "RX_record": [
            {"date": "10-29-2011", "code": "rxLDA017"},
            {"date": "05-16-2015", "code": "rxIDG004"},
            {"date": "08-08-2015", "code": "rxIDG004"},
            {"date": "06-04-2016", "code": "rxIDD013"}
        ],
        "PROC_record": [
            {"date": "02-05-2007", "code": "90723"},
            {"date": "11-05-2007", "code": "J1100"}
        ]
    }
]

{
  "predictions": [
    {
      "error_code": "",
      "patient_id": "P000012",
      "predicted_risk": 0.005794344620009157,
      "probability": 0.8253881317184486
    }
  ],
  "target": "TARGET"
}

Data In

Data Out
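The sketch below checks a record shaped like the "Data In" sample and reads a response shaped like the "Data Out" sample. The date format is inferred from the sample values (month first), the helper names are hypothetical, and no endpoint or client library is assumed; how the payload is actually submitted is outside this sketch.

```python
import json
from datetime import datetime

DATE_FORMAT = "%m-%d-%Y"  # assumption, inferred from values such as "07-31-2006"
RECORD_KEYS = ("DX_record", "RX_record", "PROC_record")


def validate_patient(patient):
    """Return a list of problems found in one input record (empty if none)."""
    problems = []
    for key in ("patient_id", "sex", "birth_date"):
        if key not in patient:
            problems.append(f"missing field: {key}")
    for key in RECORD_KEYS:
        for entry in patient.get(key, []):
            try:
                datetime.strptime(entry["date"], DATE_FORMAT)
            except (KeyError, ValueError):
                problems.append(f"bad date in {key}: {entry!r}")
            if not entry.get("code"):
                problems.append(f"missing code in {key}: {entry!r}")
    return problems


def usable_predictions(response_text):
    """Index predictions by patient_id, skipping any with a non-empty error_code."""
    response = json.loads(response_text)
    return {p["patient_id"]: p
            for p in response["predictions"] if not p["error_code"]}


patient = {
    "patient_id": "P000038", "sex": "F", "birth_date": "01-01-2006",
    "DX_record": [{"date": "07-31-2006", "code": "Z38.00"}],
    "RX_record": [{"date": "10-29-2011", "code": "rxLDA017"}],
    "PROC_record": [{"date": "02-05-2007", "code": "90723"}],
}
response_text = json.dumps({
    "predictions": [{"error_code": "", "patient_id": "P000012",
                     "predicted_risk": 0.005794344620009157,
                     "probability": 0.8253881317184486}],
    "target": "TARGET",
})

print(validate_patient(patient))
print(usable_predictions(response_text)["P000012"]["predicted_risk"])
```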

Cohort Selection and Risk Analysis Testbed

https://paraknowledge.ai/zcor-testbed/

https://paraknowledge.ai/zcor-demo/

Misleading Diagnosis of Idiopathic Pulmonary Fibrosis: A Clinical Concern
Javier Ramos-Rossy, MD, Onix Cantres-Fonseca, MD, Ginger Arzon-Nieves, Yomayra Otero-Dominguez, MD, Stella Baez-Corujo, MD, and William Rodríguez-Cintrón, MD

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6248220/

Questions.

ishanu_ch@uky.edu

target codes appear

Past medical history

No target codes appear

case

control

2yrs

IPF drugs prescribed

Signature of IPF diagnostic sequence

pirfenidone or nintedanib

age > 50 years
at least two IPF target codes identified at least 1 month apart
chest CT procedure (ICD-9-CM 87.41 and Current Procedural Terminology, 4th Edition, codes 71250, 71260 and 71270) before the first diagnostic claim for IPF
no claims for alternative ILD codes occurring on or after the first IPF claim
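The four criteria above can be expressed as a small filter over a patient's claims. This is a sketch under several assumptions that are not in the source: the IPF and alternative-ILD code sets are illustrative placeholders, "1 month" is read as 30 days, and age is evaluated at the first IPF claim. Only the chest CT codes are taken from the criteria as stated.

```python
from datetime import date, timedelta

IPF_CODES = {"516.31", "J84.112"}   # placeholder target codes (assumption)
ALT_ILD_CODES = {"515", "J84.10"}   # placeholder alternative ILD codes (assumption)
CHEST_CT_CODES = {"87.41", "71250", "71260", "71270"}  # from the criteria above
MIN_GAP = timedelta(days=30)        # "1 month" read as 30 days (assumption)


def meets_ipf_case_definition(birth_date, claims):
    """claims: list of (date, code) pairs covering diagnoses and procedures."""
    ipf_dates = sorted(d for d, c in claims if c in IPF_CODES)
    if not ipf_dates:
        return False
    first_ipf = ipf_dates[0]
    # Age > 50 years, evaluated at the first IPF claim (assumption).
    if (first_ipf - birth_date).days / 365.25 <= 50:
        return False
    # At least two IPF codes at least one month apart.
    if ipf_dates[-1] - first_ipf < MIN_GAP:
        return False
    # Chest CT strictly before the first IPF claim.
    if not any(c in CHEST_CT_CODES and d < first_ipf for d, c in claims):
        return False
    # No alternative ILD code on or after the first IPF claim.
    if any(c in ALT_ILD_CODES and d >= first_ipf for d, c in claims):
        return False
    return True


claims = [
    (date(2010, 1, 10), "71250"),
    (date(2010, 3, 1), "J84.112"),
    (date(2010, 5, 1), "J84.112"),
]
print(meets_ipf_case_definition(date(1950, 1, 1), claims))
```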

ICD Codes can be noisy

"cases" are not always true IPF

PFSAs

from code sequences

Model control and case cohorts separately

Given a new test case, compute the likelihood of the sample arising from the case models vs. the control models

sequence likelihood defect

Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. "Sequence likelihood divergence for fast time series comparison." Knowledge and Information Systems 65, no. 7 (2023): 3079-3098.
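The case-versus-control comparison can be sketched as follows. As a loud simplification, a first-order Markov chain with add-one smoothing stands in for an inferred PFSA, and the score is a plain log-likelihood difference; it is not claimed to match the sequence likelihood defect as defined in the cited paper. All names and the toy sequences are illustrative.

```python
import math

ALPHABET = (0, 1, 2)


def fit_markov(sequences, alpha=1.0):
    """First-order transition probabilities with additive smoothing.

    Simplified stand-in for a PFSA (assumption): the state is the last symbol.
    """
    counts = {s: {t: alpha for t in ALPHABET} for s in ALPHABET}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {s: {t: counts[s][t] / sum(counts[s].values()) for t in ALPHABET}
            for s in ALPHABET}


def log_likelihood(seq, model):
    """Log-probability of the observed transitions under the model."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))


def likelihood_gap(seq, case_model, control_model):
    """Positive when the sequence is better explained by the case model."""
    return log_likelihood(seq, case_model) - log_likelihood(seq, control_model)


# Toy training streams over the trinary alphabet.
case_model = fit_markov([[0, 1, 1, 1, 0, 1, 1], [1, 1, 0, 1, 1, 1]])
control_model = fit_markov([[0, 0, 0, 2, 0, 0], [0, 0, 0, 0, 2, 0]])

gap_case_like = likelihood_gap([0, 1, 1, 1], case_model, control_model)
gap_control_like = likelihood_gap([0, 0, 0, 0], case_model, control_model)
print(gap_case_like > 0, gap_control_like < 0)
```

Modeling the two cohorts separately means a new patient is scored by which model explains the history better, rather than by a single discriminative boundary.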

Off-the-shelf AI does not suffice

How?

Odds ratios combined via ML

Data

cases

control

\vdots

odds ratios for all ICD codes

ML Model

odds-based risk estimator

0: \textrm{healthy}\\ 1: \textrm{infections}\\ 2: \textrm{other}

Probabilistic Finite State

Map health history to trinary streams
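One way to picture the mapping is shown below, as a sketch under stated assumptions: time is binned by week, a week with any infection code is encoded as 1, a week with only other codes as 2, and a week with no codes as 0. The weekly bin, the tie-break, and the crude infection test (ICD-10 chapter I prefixes only) are all assumptions made for illustration.

```python
from datetime import date


def is_infection(code):
    """Crude placeholder (assumption): ICD-10 chapter I, A00-B99, only."""
    return code[:1] in ("A", "B")


def trinary_stream(events, start, n_weeks):
    """events: list of (date, code). Returns one symbol per week.

    0: healthy (no codes), 1: infections, 2: other.
    """
    stream = [0] * n_weeks
    for when, code in events:
        week = (when - start).days // 7
        if not 0 <= week < n_weeks:
            continue  # outside the observation window
        if is_infection(code):
            stream[week] = 1  # infection takes precedence (assumption)
        elif stream[week] == 0:
            stream[week] = 2
    return stream


events = [
    (date(2016, 1, 5), "A09"),      # week 0, infection
    (date(2016, 1, 20), "J01.90"),  # week 2, other
    (date(2016, 1, 21), "B34.9"),   # week 2, infection overrides
    (date(2016, 1, 28), "M54.5"),   # week 3, other
]
stream = trinary_stream(events, start=date(2016, 1, 4), n_weeks=4)
print(stream)
```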

Chattopadhyay, Ishanu, and Hod Lipson. "Abductive learning of quantized stochastic processes with probabilistic finite automata." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, no. 1984 (2013): 20110543.

Longitudinal stochastic patterns