Ishanu Chattopadhyay, PhD
Assistant Professor of Biomedical Informatics & Computer Science
University of Kentucky
Data inference boundaries & limitations
Alignment validation
Complex phenomena
Adaptation to model obsolescence
Precise validation protocols to assess process drift triggering re-calibration/training
Built-in flexibility for changing contexts and non-ergodicity
Scales from thousands to millions of variables; intrinsic reflexivity
Component LSM predictors enforce statistical significance of splits in recursive partitioning, ensuring precise uncertainty quantification
*Hothorn, Torsten, Kurt Hornik, and Achim Zeileis. "Unbiased recursive partitioning: A conditional inference framework." Journal of Computational and Graphical Statistics 15, no. 3 (2006): 651-674.
emergent macro-structure
Component predictor (Conditional Inference Tree*)
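The significance-gated split that a component predictor relies on can be sketched as follows. This is a minimal illustration, not the implementation of Hothorn et al. or of the LSM: it assumes a Monte Carlo permutation test on a chi-square statistic with a Bonferroni correction, and the function names (`chi2_stat`, `permutation_p_value`, `select_split`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def chi2_stat(xs, ys):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            stat += (cxy[(a, b)] - expected) ** 2 / expected
    return stat


def permutation_p_value(xs, ys, n_perm=500, seed=0):
    """Monte Carlo p-value for the hypothesis that xs and ys are independent."""
    rng = random.Random(seed)
    observed = chi2_stat(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi2_stat(xs, shuffled) >= observed - 1e-12:
            hits += 1
    return (1 + hits) / (1 + n_perm)


def select_split(candidates, response, alpha=0.05):
    """Return the most significant candidate variable, or None to stop splitting."""
    p_values = {name: permutation_p_value(values, response)
                for name, values in candidates.items()}
    best = min(p_values, key=p_values.get)
    # Bonferroni correction over the candidate variables (an assumption of this sketch)
    if p_values[best] * len(p_values) < alpha:
        return best
    return None


# Synthetic toy data, not GSS responses; the variable names are labels only.
response = ["favor", "oppose"] * 20
candidates = {
    "owngun": ["no", "yes"] * 20,  # perfectly associated with the response
    "grass": ["legal", "legal", "not legal", "not legal"] * 10,  # independent of it
}
print(select_split(candidates, response))
```

A node is split only when some candidate passes the corrected test; otherwise recursion stops, which is what keeps spurious splits out of the tree.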
Example: Influenza A HA protein
Recursive LSM forest
Revealing Emergent Cross-talk
GSS 2018 dataset
\(\checkmark\) 1. How will proposer form and maintain a computationally tractable LSM tree structure given, as proposed, hundreds to thousands of observable variables?
Sample | reliten | gunlaw | abany | ... | grass |
---|---|---|---|---|---|
Person 1 | | | | | |
Person 2 | | | | | |
... | | | | | |
Person m | | | | | |
observables
samples
Distributions over alphabet \(\Sigma^i\)
Individual Predictor (CIT)
cross-talk
Tension between predicted and observed distribution drives change
Example
GSS topic: There should be more gun control
\(\psi^i\)
strongly agree | agree | neutral | disagree | strongly disagree |
\(\phi\) estimates \(\psi\)
Examples: GSS, ANES, WVS, ESS, Eurobarometer, Afrobarometer, Asian Barometer, etc.
group
individual
estimate is always a non-empty non-degenerate distribution
missing observation
where \(D_{JS}(P\vert \vert Q)\) is the Jensen-Shannon divergence.
This bound connects ``closeness'' of samples to the odds of perturbing from one to the other, bridging geometry to dynamics
(Sanov's Theorem, Pinsker's Inequality)
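The quantities named here can be computed directly for two response distributions. The sketch below evaluates the Jensen-Shannon divergence and checks Pinsker's inequality (total variation is at most \(\sqrt{D_{KL}/2}\), in nats). It does not reproduce the bound from the slide; the distributions `psi` and `psi_prime` are hypothetical values chosen for illustration.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q):
    """Jensen-Shannon divergence: average KL divergence to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def total_variation(p, q):
    """Total variation distance between two distributions on the same alphabet."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical distributions over
# (strongly agree, agree, neutral, disagree, strongly disagree)
psi = [0.50, 0.30, 0.10, 0.05, 0.05]
psi_prime = [0.05, 0.05, 0.10, 0.30, 0.50]

print("D_JS =", js(psi, psi_prime))
print("Pinsker holds:",
      total_variation(psi, psi_prime) <= math.sqrt(kl(psi, psi_prime) / 2))
```

The Jensen-Shannon divergence is symmetric and bounded above by \(\ln 2\), which makes it a convenient basis for a distance between samples.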
\(\psi\)
\(\psi'\)
\(\theta\)
"spatial average": average of all plausible worldviews or states
* Sizemore, Nicholas, Kaitlyn Oliphant, Ruolin Zheng, Camilia R. Martin, Erika C. Claud, and Ishanu Chattopadhyay. "A digital twin of the infant microbiome to predict neurodevelopmental deficits." Science Advances 10, no. 15 (2024): eadj0400. https://www.science.org/doi/full/10.1126/sciadv.adj0400
persistence probability
Central to Model Drift Quantification
Start with opinion vector with all entries missing
This is a standard Physics construct, quantifying curvature of the underlying latent geometry
Easily computable in LSM framework!
Apply \(\phi^i\)
Random variable quantifying dispersion around the spatial average of worldviews
const. scaling as \(N^2\)
Sample predicted distributions
perturbed state within \(\epsilon\) of \(\psi\)
Variable | Masked | Reconstructed |
---|---|---|
spkcom | allowed | allowed |
colcom | not fired | not fired |
spkmil | allowed | allowed |
colmil | allowed | not allowed |
libmil | not remove | not remove |
libhomo | not remove | not remove |
reliten | strong | no religion |
pray | once a day | once a day |
bible | inspired word | word of god |
abhlth | yes | yes |
abpoor | no | no |
pillok | agree | agree |
intmil | very interested | very interested |
abpoorw | always wrong | not wrong at all |
godchnge | believe now, always have | believe now, always have |
prayfreq | several times a week | several times a week |
religcon | strong disagree | disagree |
religint | disagree | disagree |
Variable | Masked | Reconstructed |
---|---|---|
spkcom | allowed | allowed |
colcom | not fired | not fired |
libmil | not remove | not remove |
libhomo | not remove | not remove |
gunlaw | favor | favor |
reliten | no religion | no religion |
prayer | approve | approve |
bible | book of fables | inspired word |
abnomore | yes | yes |
abhlth | yes | yes |
abpoor | yes | yes |
abany | yes | yes |
owngun | no | no |
intmil | moderately interested | moderately interested |
abpoorw | not wrong at all | not wrong at all |
godchnge | believe now, didn't used to | believe now, always have |
prayfreq | several times a week | several times a week |
2018 GSS individual samples
Definition
Sample neighborhood to impute missing data
2018 GSS out-of-sample reconstruction
post-reconstruction error ratio (%)
LSM sampling: sampling the \(\epsilon\)-neighborhood of a state or worldview allows reconstruction of censored opinions
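The sampling loop can be sketched as follows, assuming each variable has a predictor that returns a distribution over its alphabet given the known entries. The `predict` function is a toy stand-in based on weighted frequencies, not a conditional inference tree, and the three-variable `data` set is synthetic; all names are hypothetical.

```python
import random


def predict(i, state, data, alphabet, smooth=0.5):
    """Toy stand-in for phi^i: a distribution over alphabet[i] given known entries."""
    weights = {a: smooth for a in alphabet[i]}  # smoothing keeps every outcome possible
    for row in data:
        agree = sum(1 for j, v in state.items()
                    if j != i and v is not None and row[j] == v)
        weights[row[i]] += 2.0 ** agree  # rows matching more known answers weigh more
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def impute(state, data, alphabet, rng):
    """Fill each missing entry (None) by sampling its predicted distribution."""
    filled = dict(state)
    for i, v in state.items():
        if v is None:
            dist = predict(i, filled, data, alphabet)
            symbols = list(dist)
            filled[i] = rng.choices(symbols, weights=[dist[s] for s in symbols])[0]
    return filled


# Synthetic records, not GSS responses.
alphabet = {"gunlaw": ["favor", "oppose"], "owngun": ["yes", "no"],
            "grass": ["legal", "not legal"]}
data = [
    {"gunlaw": "favor", "owngun": "no", "grass": "legal"},
    {"gunlaw": "favor", "owngun": "no", "grass": "not legal"},
    {"gunlaw": "oppose", "owngun": "yes", "grass": "not legal"},
    {"gunlaw": "oppose", "owngun": "yes", "grass": "legal"},
]

rng = random.Random(0)
partial = {"gunlaw": "favor", "owngun": None, "grass": None}
print(impute(partial, data, alphabet, rng))

null_state = {name: None for name in alphabet}  # all observations missing
print(impute(null_state, data, alphabet, rng))
```

Because of the smoothing term, every predicted distribution stays non-degenerate even when all observations are missing, so sampling can start from the null state.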
examples
Predictive ability of LSM quantified as ability to reconstruct censored out-of-sample opinions**
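As a rough illustration of scoring a reconstruction, the sketch below computes the percentage of censored answers that the reconstruction gets wrong, using the first Masked/Reconstructed example table from this deck. The plotted "post-reconstruction error ratio" may be defined differently (for instance relative to a baseline); the plain mismatch percentage and the name `mismatch_rate` are assumptions of this example.

```python
def mismatch_rate(pairs):
    """Percentage of variables whose reconstruction differs from the masked answer."""
    wrong = sum(1 for masked, reconstructed in pairs.values()
                if masked != reconstructed)
    return 100.0 * wrong / len(pairs)


# (masked, reconstructed) pairs copied from the first example table
first_example = {
    "spkcom": ("allowed", "allowed"),
    "colcom": ("not fired", "not fired"),
    "spkmil": ("allowed", "allowed"),
    "colmil": ("allowed", "not allowed"),
    "libmil": ("not remove", "not remove"),
    "libhomo": ("not remove", "not remove"),
    "reliten": ("strong", "no religion"),
    "pray": ("once a day", "once a day"),
    "bible": ("inspired word", "word of god"),
    "abhlth": ("yes", "yes"),
    "abpoor": ("no", "no"),
    "pillok": ("agree", "agree"),
    "intmil": ("very interested", "very interested"),
    "abpoorw": ("always wrong", "not wrong at all"),
    "godchnge": ("believe now, always have", "believe now, always have"),
    "prayfreq": ("several times a week", "several times a week"),
    "religcon": ("strong disagree", "disagree"),
    "religint": ("disagree", "disagree"),
}

# 5 of the 18 variables differ in this example
print(f"{mismatch_rate(first_example):.1f}%")
```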
Null state (all missing observations)
Valid perturbations/ simulations
LSM sampling allows simulating opinion perturbations
Both individuals and groups may be modeled as digital twins\(^\dag\)
2018 GSS
Polar separation over time
2016 Presidential Election Vote Prediction
2004
Variable | Conservative pole | Liberal pole |
---|---|---|
abany | no | yes |
abdefctw | always wrong | not wrong at all |
abdefect | no | yes |
abhlth | no | yes |
abnomore | no | yes |
abpoor | no | yes |
abpoorw | always wrong | not wrong at all |
abrape | no | yes |
absingle | no | yes |
bible | inspired word | book of fables |
colcom | fired | not fired |
colmil | not fired | not allowed |
comfort | strongly agree | strongly disagree |
conlabor | hardly any | a great deal |
godchnge | believe now, always have | don't believe now, never have |
grass | not legal | legal |
gunlaw | oppose | favor |
intmil | very interested | not at all interested |
libcom | remove | not remove |
libmil | not remove | remove |
maboygrl | true | false |
owngun | yes | no |
pillok | agree | strongly agree |
pilloky | strongly disagree | strongly agree |
polabuse | no | yes |
pray | several times a day | never |
prayer | disapprove | approve |
prayfreq | several times a day | never |
religcon | strongly disagree | strongly agree |
religint | strongly disagree | strongly agree |
reliten | strong | no religion |
rowngun | yes | no |
shotgun | yes | no |
spkcom | not allowed | allowed |
spkmil | allowed | not allowed |
taxrich | about right | much too low |
conservative pole
liberal pole
Clustering LSM distance \(\theta(x,y)\) between out-of-sample individuals
conservative
liberal
poles:
partial states aligning with extreme opposing worldviews
Predict 2016 votes using ideology index
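One plausible way to turn the two poles into a scalar index is to compare a respondent's distance to each pole. The deck does not define the ideology index, so the construction below is an assumption: it replaces the LSM distance \(\theta\) with a simple mismatch fraction, uses five pole entries copied from the pole table, and the function names are hypothetical.

```python
# Five entries copied from the pole table (conservative vs. liberal pole)
CONSERVATIVE_POLE = {"abany": "no", "grass": "not legal", "gunlaw": "oppose",
                     "owngun": "yes", "reliten": "strong"}
LIBERAL_POLE = {"abany": "yes", "grass": "legal", "gunlaw": "favor",
                "owngun": "no", "reliten": "no religion"}


def mismatch_distance(x, pole):
    """Fraction of answered pole variables on which x disagrees with the pole.

    A crude stand-in for the LSM distance theta(x, y).
    """
    shared = [k for k in pole if x.get(k) is not None]
    if not shared:
        return 0.5  # no information: treat as equidistant
    return sum(x[k] != pole[k] for k in shared) / len(shared)


def ideology_index(x):
    """Ranges from -1 (at the conservative pole) to +1 (at the liberal pole)."""
    return mismatch_distance(x, CONSERVATIVE_POLE) - mismatch_distance(x, LIBERAL_POLE)


# Hypothetical respondent with one missing answer
respondent = {"abany": "yes", "grass": "legal", "gunlaw": "favor",
              "owngun": "yes", "reliten": None}
print(ideology_index(respondent))
```

A vote predictor could then threshold or regress on this index; with the LSM, \(\theta\) would take the place of `mismatch_distance`.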
Emergent global structure
Define Lagrangian*
Via the Euler-Lagrange Equations\(^\dag\):
Over-damped Gradient flow Equation*
where \(-g^{km}\) is the inverse metric tensor
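A small numerical sketch of an over-damped gradient flow, assuming the standard form \(\dot{x}^k = -g^{km}\,\partial_m V\). The toy double-well potential, the identity metric, and the explicit Euler step are illustrative choices, not taken from the slide.

```python
def grad_V(x):
    """Gradient of the toy potential V(x) = (x0^2 - 1)^2 + x1^2."""
    return [4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]]


def gradient_flow(x0, g_inv, dt=0.01, steps=2000):
    """Explicit Euler integration of dx^k/dt = -g^{km} dV/dx^m (sum over m)."""
    x = list(x0)
    for _ in range(steps):
        dV = grad_V(x)
        x = [x[k] - dt * sum(g_inv[k][m] * dV[m] for m in range(len(x)))
             for k in range(len(x))]
    return x


identity = [[1.0, 0.0], [0.0, 1.0]]  # flat metric, assumed for simplicity
print(gradient_flow([0.3, 0.5], identity))   # settles in the right-hand well
print(gradient_flow([-0.3, 0.5], identity))  # settles in the left-hand well
```

States that start on either side of the barrier relax into the nearest minimum of the potential, which is the behavior the local potential field is meant to capture.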
kinetic energy
state collapse
strongly agree
agree
neutral
disagree
strongly disagree
\(X_i\)
potential energy
* Einstein notation used
\(^\dag\) Goldstein, Herbert, et al. Classical Mechanics. 3rd ed., Pearson, 2002.
Principle of stationary action
Local potential field eqn
Stable
(captured by local extrema)
Free to move locally towards extrema
Why propaganda works so well
* Bail, Christopher A., et al. "Exposure to opposing views on social media can increase political polarization." PNAS 115, no. 37 (September 2018): 9216–9221. DOI: 10.1073/pnas.1804840115
GSS 2018 individuals and neighborhoods
Influenza C : strains and their neighborhoods
Even random perturbations will tend to move individuals towards local extrema, increasing polarization
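The claim can be illustrated with a toy simulation, assuming a one-dimensional double-well potential \(V(x) = (x^2 - 1)^2\) and small Gaussian perturbations. All agents start at the undecided midpoint; the potential, noise level, and step size are assumptions of this sketch, not values from the LSM.

```python
import math
import random


def simulate(n_agents=200, sigma=0.2, dt=0.01, steps=2000, seed=0):
    """Noisy over-damped dynamics: dx = -V'(x) dt + sigma * sqrt(dt) * N(0, 1)."""
    rng = random.Random(seed)
    xs = [0.0] * n_agents  # everyone starts undecided, on top of the barrier
    for _ in range(steps):
        xs = [x - dt * 4.0 * x * (x * x - 1.0)
              + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
              for x in xs]
    return xs


def polarized_fraction(xs, cutoff=0.5):
    """Share of agents that have left the middle ground |x| <= cutoff."""
    return sum(1 for x in xs if abs(x) > cutoff) / len(xs)


final = simulate()
print("polarized fraction:", polarized_fraction(final))
print("left well:", sum(x < 0 for x in final), "right well:", sum(x > 0 for x in final))
```

The noise has no preferred direction, yet it is enough to tip each agent off the unstable midpoint, after which the potential carries it to one of the two extremes.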
*
Observation: This lineage (Mississippi lineage) has been extinct since 2022/23
stable lineage
The LSM tells the latent opinion "space-time" how to curve, the curved "space-time" tells opinions how to change.
Given the LSM and dynamical considerations, local potential fields can be computed; these fields reveal future evolution
The No-cheating Theorem: Generative models cannot cheat on complexity
Kolmogorov Complexity
Optimal Generative Model
compressed data representation
compressed model representation
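Kolmogorov complexity is not computable, but the length of a compressed representation gives a computable upper bound (up to an additive constant). The sketch below uses `zlib` as such a proxy on synthetic byte strings; it illustrates the notion of a compressed data representation only and is not the construction used in the theorem.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data: a proxy upper bound on complexity."""
    return len(zlib.compress(data, 9))


# Highly regular data versus pseudo-random data of the same length (both synthetic)
structured = b"agree,disagree;" * 500
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(structured)))

print("raw length:       ", len(structured))
print("structured length:", compressed_length(structured))
print("noise length:     ", compressed_length(noise))
```

Regular data compresses to a small fraction of its raw size, while data with no exploitable structure does not compress at all.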
Theorem
Conservation Law arising from the continuous symmetry of typicality*
Saturation relation:
Data Sufficiency Statistic \(\mu_0\)
We need LSM-sampling to calculate this
*Noether's Theorem
For every continuous symmetry of the action of a physical system, there exists a corresponding conserved quantity.
How much more data do we need?
Data saturation
Data deficient
Needed
Current
Empirical Validation
Do new samples (survey respondents) still conform to the model?
GSS Model drift
ergodic projection (all missing values)
A random belief state (with possibly missing entries)
random variable
normal variate
Assess whether \(\zeta\) is stationary: if not, new samples no longer conform to the model
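A minimal sketch of such a check, assuming \(\zeta\) has been computed for a sequence of incoming samples. The deck does not specify the stationarity test, so this example substitutes a simple two-sample z-test on the means of the first and second halves of the series; the function names, the threshold of 3, and the synthetic series are all assumptions.

```python
import math
import statistics


def drift_z_score(zeta):
    """z-score for a shift in mean between the first and second half of the series."""
    half = len(zeta) // 2
    first, second = zeta[:half], zeta[half:]
    se = math.sqrt(statistics.variance(first) / len(first)
                   + statistics.variance(second) / len(second))
    return (statistics.mean(second) - statistics.mean(first)) / se


def has_drifted(zeta, threshold=3.0):
    """Flag drift when the mean shift exceeds the (assumed) z-score threshold."""
    return abs(drift_z_score(zeta)) > threshold


# Synthetic series: one with a constant mean, one whose mean shifts halfway through
stable = [(-1.0) ** i for i in range(200)]
shifted = stable[:100] + [v + 2.0 for v in stable[100:]]

print("stable drifted? ", has_drifted(stable))
print("shifted drifted?", has_drifted(shifted))
```

A flagged series would be the trigger for re-calibration or re-training; a dedicated stationarity test could replace the z-test without changing the surrounding logic.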
Example for GSS LSM inferred for year 2000
\(\checkmark\) 4. Address whether your approach makes assumptions regarding ergodicity, and if so, how these assumptions affect the model's applicability to non-ergodic systems.
No Convergence
(~50% belief mismatch between pairs)
2018 GSS survey belief vectors simulated via LSM sampling
When applied to Social Modeling and Opinion Dynamics
Belief about topic \(i\) is expected to align with beliefs about other topics \(\psi^{-i}\).
Deviations are exponentially improbable \(\Rightarrow \) people/groups seek internal coherence.
Theory Link:
Cognitive consistency theory – Abelson et al. (1968)
Constraint satisfaction in beliefs – Read & Marcus-Newhall (1993)
Beliefs evolve to minimize tension between actual state and “expected” state.
Reflexive gradient flow — system reduces internal contradiction.
Theory Link:
Cognitive Dissonance Theory – Festinger (1957)
Homeostatic belief adjustment – Gawronski & Strack (2004)
Observing a belief changes it and affects all conditionals.
Direct encoding of feedback loops central to human systems.
Theory Link:
Reflexivity in social systems – Giddens (1984), Soros (1994)
Theory of mind / mutual modeling – Premack & Woodruff (1978)
Validation of Social Theory Questions:
Exploratory: Belief systems react measurably to exogenous events and shocks.
Exploratory: Cross-dependencies between beliefs have observable effects on societal resilience.
Is Polarization an Inevitable Attractor?
Social Identity Theory vs. Belief Proximity
A General Framework for modeling Complex Systems
Genomic database: Missing heritability problem
Personalized Clinical Digital Twin, Virtual Patients
Any structured interview, PTSD fabrication
Assess symptom data and co-pathologies
Predict future mutations; which animal strain is closest to jumping to humans
Mental health diagnosis
Microbiome Analysis**
Algorithmic lie detector
Viral emergence
Teomims
Opinion Dynamics
Darkome
Generative model of complex microbial ecosystems, and their impact on health and disease
Data requirements
Limitation | Mitigation / Response |
---|---|
Conventional time series is currently out-of-scope | Focus on cross-sectional interdependencies and belief geometry; time handled via drift |
LSMs model statistical interdependence, not causal mechanisms | Use perturbation-based simulations to infer plausible influence pathways |
Limited by observed belief variables | Integrate multiple surveys; use latent proxies and test sensitivity of digital twins |
Social theory connections and interpretability may be challenging | Anchor dynamics with theory-driven constructs (e.g., ToM, cognitive dissonance) |
LSMs for complex systems
**preliminary study published (https://www.science.org/doi/10.1126/sciadv.adj0400)