Ishanu Chattopadhyay
ML | Data Science Biomedical Informatics | Social Science | Assistant Professor
PI: Ishanu Chattopadhyay, PhD
Assistant Professor of Biomedical Informatics & Computer Science
University of Kentucky
DARPA-EA-25-02-05-MAGICS-PA-025
HR0011-26-3-E016
Dmytro Onishchenko
Staff Scientist + PhD Student:
Zhuoqun Li
Postdoctoral Associate:
| Estimated costs | USD |
|---|---|
| Labor cost | 157,227.86 |
| Other direct costs | 9,993.00 |
| Total (direct + indirect costs, 12 months) | 257,520.12 |
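The cost table can be cross-checked with a few lines of arithmetic. The sketch below uses only the three figures in the table; the indirect cost and the indirect rate it prints are back-calculated, not stated in the slides, and it assumes indirects are charged on all direct costs.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures copied from the cost table (USD).
labor = Decimal("157227.86")
other_direct = Decimal("9993.00")
total = Decimal("257520.12")

# Direct costs are the sum of the two direct line items.
direct = labor + other_direct

# Assumption: everything in the total beyond direct costs is indirect cost.
indirect = total - direct

# Implied indirect rate, assuming it is applied to all direct costs.
implied_rate = (indirect / direct).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

print(f"Direct costs:           {direct}")
print(f"Implied indirect costs: {indirect}")
print(f"Implied indirect rate:  {implied_rate}")
```

Using `Decimal` rather than floats keeps the cent-level figures exact, which matters when reconciling a budget line against a stated total.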
Gantt Chart*
*Milestone definitions in next slide
Dataset Acquisition (10 survey datasets)
LSM inference
LSM predictive ability validation
LSM model drift sensing validation
LSM data sufficiency tracking validation
LSM mediated social theory analysis
| Milestone # | Milestone Title / Detailed Description | Exit Criteria / Deliverable | Due |
|---|---|---|---|
| 1 | Kickoff Meeting: A briefing on the technical plan for the effort to include milestone schedule and path to accomplish the objectives of the agreement. | Government acceptance / Kickoff meeting briefing slides | Month 1 after award start |
| 2 | Validation plan: Detailed validation plan, including description, acquisition plan, and justification for the ground truth data, and description of the metrics and benchmarks to be used to measure performance. | Government acceptance / Technical report as described. | Month 1 |
| 3 | Milestone Title: Dataset Acquisition and LSM Inference. Technical goal: a) Dataset acquisition (10 social survey datasets acquired: GSS, ANES, CES, Eurobarometer, etc.). b) Infer LSM models for each dataset using 50% random samples, with multiple LSMs trained on different random splits of each dataset. | Government acceptance / Technical report detailing figure/code/data/etc. and all underlying materials generated in support of milestone, regardless of success | Month 2 |
| 4 | Milestone Title: Masked sample reconstruction. Technical goal: Validate LSM predictive accuracy via censored-sample reconstruction on out-of-sample data from each dataset. Demonstrate a statistically significant reduction of LSM distance post-reconstruction relative to post-masking. Target: Reconstruction error metric improved by at least 50% over 1) random imputation and 2) median imputation. | Government acceptance / Technical report detailing figure/code/data/etc. and all underlying materials generated in support of milestone, regardless of success | Month 4 |
| 5 | Milestone Title: Model drift sensing validation. Technical goal: Demonstrate that the LSM framework can reliably sense when the underlying model drifts. Assess whether the model drift statistic is stationary for samples drawn from the same survey wave of our datasets, and reliably indicates non-stationary drift for samples from different survey waves. Target: Model drift statistic must have statistical significance at the 5% level for survey waves 5 years apart for at least GSS, CES and Eurobarometer. Deliverables are detailed documentation on all 10 datasets. | Government acceptance / Technical report detailing figure/code/data/etc. and all underlying materials generated in support of milestone, regardless of success | Month 6 |
| 6 | Milestone Title: Data sufficiency assessment capability. Technical goal: Use the conservation of complexity principle to show that the LSM framework can sense data deficiency and sufficiency. | Government acceptance / Technical report detailing figure/code/data/etc. and all underlying materials generated in support of milestone, regardless of success. Analysis results on all 10 datasets | Month 8 |
| 7 | Milestone Title: Social Theory and Competing Hypotheses Adjudication. Technical goal: a) Social theory hypothesis assessment: polarization is an inevitable attractor. b) Investigate the competing hypotheses of whether socio-economic identity or belief proximity and latent opinion-space geometry is more predictive of specific opinion / belief outcomes. | Government acceptance / Technical report detailing figure/code/data/etc. and all underlying materials generated in support of milestone, regardless of success | Month 10 |
| 8 | Final milestone meeting and report (one month prior to award end date): The final briefing and final report should summarize all work completed on the project, highlighting accomplishments, lessons learned, unexpected outcomes, and challenges requiring further research. Technical artifact delivery (software release, evaluation results, source code, models, etc.) | Government acceptance / Technical report as described. For software: GitHub repository with deployable code complete with example notebooks | Month 11 |
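The Milestone 4 target (at least 50% improvement over random and median imputation) can be made concrete with a small baseline harness. This is a minimal sketch under stated assumptions: the function names are hypothetical, the survey matrix is synthetic, and mean absolute error on masked entries stands in for the LSM distance metric, which the slides do not define. The LSM reconstruction itself is not implemented here; only the two baselines and the target check are.

```python
import numpy as np

def mask_entries(X, frac, rng):
    """Boolean mask marking a random fraction of entries as censored."""
    return rng.random(X.shape) < frac

def random_impute(X, mask, rng):
    """Fill each masked entry with a draw from that item's unmasked responses."""
    X_hat = X.astype(float).copy()
    for j in range(X.shape[1]):
        observed = X[~mask[:, j], j]
        n_missing = int(mask[:, j].sum())
        X_hat[mask[:, j], j] = rng.choice(observed, size=n_missing)
    return X_hat

def median_impute(X, mask):
    """Fill each masked entry with the median of that item's unmasked responses."""
    X_hat = X.astype(float).copy()
    for j in range(X.shape[1]):
        X_hat[mask[:, j], j] = np.median(X[~mask[:, j], j])
    return X_hat

def masked_mae(X, X_hat, mask):
    """Mean absolute error on masked entries (stand-in for the LSM metric)."""
    return float(np.abs(X[mask] - X_hat[mask]).mean())

def relative_improvement(model_err, baseline_err):
    """Fractional error reduction of a model relative to a baseline."""
    return 1.0 - model_err / baseline_err

def meets_target(model_err, baseline_errs, threshold=0.5):
    """True if the model beats every baseline by at least `threshold`."""
    return all(relative_improvement(model_err, b) >= threshold for b in baseline_errs)

# Synthetic 5-point Likert responses: 500 respondents, 20 items (illustrative only).
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(500, 20))
mask = mask_entries(X, 0.2, rng)

err_random = masked_mae(X, random_impute(X, mask, rng), mask)
err_median = masked_mae(X, median_impute(X, mask), mask)
print(f"random imputation error: {err_random:.3f}")
print(f"median imputation error: {err_median:.3f}")
```

A reconstruction error from an actual model would be passed to `meets_target` together with `[err_random, err_median]`; in practice the baselines should be computed on the same held-out respondents and the same mask as the model.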
A General Framework for modeling Complex Systems with Psycho-social Application
Survey Datasets (Public or available at nominal cost)
| Survey | Waves / Years | Avg Participants / Wave | Avg Questions / Wave | Participants (approx) | Data Source / Link |
|---|---|---|---|---|---|
| General Social Survey (GSS) | ~33 (1972–2024) | ~3,000 | ~1,500 | ~99,000 | NORC GSS Data Explorer |
| ANES | ~25 (election-year) | ~3,100 | ~1,000 | ~77,500 | ANES Data Portal |
| Cooperative Election Study (CES) | ~18 (2006–2024) | ~50,000 | ~200 | ~900,000 | CES Portal |
| Eurobarometer | ~100 (1973–2024, biannual) | ~30,000 | ~100 | ~3,000,000 | European Commission Archive |
| World Values Survey (WVS) | 7 waves (1981–2020) | ~2,000 / country | ~250 | ~1,120,000 | WVS Website |
| European Social Survey (ESS) | 10 waves (2002–2022) | ~2,500 / country | ~250 | ~750,000 | ESS Website |
| Latinobarómetro | ~25 waves (1995–2024) | ~18,000 | ~110 | ~450,000 | Latinobarómetro Archive |
| Afrobarometer | 6 rounds (1999–2022) | ~1,800 / country | ~120 | ~220,000 | Afrobarometer Archive |
| Arab Barometer | 5 waves (2006–2022) | ~1,800 / country | ~130 | ~135,000 | Arab Barometer Site |
| Asian Barometer | 4 waves (2001–2022) | ~1,500 / country | ~120 | ~108,000 | Asian Barometer Network |
\(\checkmark\) Exploration of Dataset Access Protocols Complete
| Dataset | Access model | License / use constraints (typical for research use) |
|---|---|---|
| General Social Survey (GSS) | Open public download | Free for research use; citation required; no redistribution of modified datasets |
| American National Election Studies (ANES) | Public-use + restricted-use tiers | Public-use data freely available; restricted-use data requires application and secure handling |
| Cooperative Election Study (CES) | Public download (common content) | Free for academic research; team modules may have additional citation or use constraints |
| Eurobarometer | Registration-based access (GESIS) | Free for non-commercial research; user registration required; citation and compliance with GESIS terms |
| World Values Survey (WVS) | Registration-based download | Free for non-commercial research; attribution required; redistribution restricted |
| European Social Survey (ESS) | Registration-based download | Free for non-commercial research; strict citation and documentation compliance |
| Latinobarómetro | Controlled public access | Use subject to project terms; citation required; redistribution limitations apply |
| Afrobarometer | Public download | Free for research and policy use; attribution required; redistribution limited |
| Arab Barometer | Form-based access | Free for non-commercial research; short request form; citation required |
| Asian Barometer | Application-based access | Explicit permission required; usage and redistribution restrictions apply |
Ease-of-access legend: easy, less easy, least easy
World Values Survey (WVS) is global
\(\checkmark\) Overlapping survey datasets
| Survey | Region | Design | Missingness / data issues | Why relevant to MAGICS and digital twin construction |
|---|---|---|---|---|
| General Social Survey (GSS) | United States | Repeated cross-sections with a stable core and rotating topical modules | Item nonresponse varies by topic; skip patterns common; structured missingness from module rotation | Long-horizon US belief drift with controlled module churn; strong testbed for latent reconstruction under partial observability |
| American National Election Studies (ANES) | United States | Election-year time series; some panel components depending on study | Complex skip logic; panel attrition where applicable; block-missingness across batteries | Links belief geometry to electoral cycles; supports cross-sectional vs panel consistency checks |
| Cooperative Election Study (CES) | United States | Large-N annual/biannual cross-sections with common content plus team modules | Strong module-induced missingness; very high N offsets sparsity | Stress-tests scalability and conditional belief inference under extreme module sparsity |
| Eurobarometer | Europe (multi-country) | Repeated cross-sections across multiple survey series (Standard/Special/Flash) | Cross-country harmonization issues; wording drift; topic-specific wave gaps | Ideal for cross-national latent-geometry comparisons and robustness to instrument drift |
| World Values Survey (WVS) | Global (multi-country) | Multi-year waves; repeated cross-sections with uneven country participation | Country-wave coverage gaps; partial item overlap; translation effects | Enables global worldview geometry and invariance-aware modeling across cultures |
| European Social Survey (ESS) | Europe (multi-country) | Biennial rounds; repeated cross-sections with rotating modules | High data quality; structured missingness from module rotation; variable country participation | Gold-standard benchmark for calibration, validation, and longitudinal stability |
| Latinobarómetro | Latin America (multi-country) | Annual/near-annual repeated cross-sections | Variable country-year coverage; evolving batteries; skip-pattern sparsity | Tests transferability to non-US/EU contexts and regime-sensitive belief dynamics |
| Afrobarometer | Africa (multi-country) | Multi-year rounds; repeated cross-sections | Uneven round participation; battery variation; structured round-level missingness | Robustness tests under irregular sampling and heterogeneous governance contexts |
| Arab Barometer | Middle East & North Africa | Wave-based repeated cross-sections | Coverage gaps driven by field conditions; variable item sets | Evaluates model stability under volatile sampling and political contexts |
| Asian Barometer | Asia (multi-country) | Wave/round-based repeated cross-sections | Heterogeneous item availability; access-driven release variation | Strong test of cross-cultural generalization and measurement invariance |
\(\checkmark\) Diverse observation contexts
A General Framework for modeling Complex Systems
Genomic database: Missing heritability problem
Personalized Clinical Digital Twin, Virtual Patients
Any structured interview, PTSD fabrication
Assess symptom data and co-pathologies
Predict future mutations; which animal strain is closest to jumping to humans
Mental health diagnosis
Microbiome Analysis**
Algorithmic lie detector
Viral emergence
Teomims
Opinion Dynamics
Darkome
Generative model of complex microbial ecosystems, and their impact on health and disease
Data requirements
| Limitation | Mitigation / Response |
|---|---|
| Conventional time series is currently out-of-scope | Focus on cross-sectional interdependencies and belief geometry; time handled via drift |
| LSMs model statistical interdependence, not causal mechanisms | Use perturbation-based simulations to infer plausible influence pathways |
| Limited by observed belief variables | Integrate multiple surveys; use latent proxies and test sensitivity of digital twins |
| Social theory connections and interpretability may be challenging | Anchor dynamics with theory-driven constructs (e.g., ToM, cognitive dissonance) |
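The first row defers time to a drift statistic, and Milestone 5 asks that drift between survey waves be significant at the 5% level. One way to frame such a test is a two-sample permutation test, sketched below. The statistic used here (mean total variation distance between per-item response distributions) is a stand-in assumption, not the LSM drift statistic, and the function names and synthetic waves are illustrative only.

```python
import numpy as np

def item_tv_distance(A, B, levels):
    """Mean total variation distance between per-item response distributions."""
    dists = []
    for j in range(A.shape[1]):
        pa = np.array([(A[:, j] == v).mean() for v in levels])
        pb = np.array([(B[:, j] == v).mean() for v in levels])
        dists.append(0.5 * np.abs(pa - pb).sum())
    return float(np.mean(dists))

def permutation_pvalue(A, B, levels, n_perm=200, seed=0):
    """Permutation p-value for the null that A and B come from the same population."""
    rng = np.random.default_rng(seed)
    observed = item_tv_distance(A, B, levels)
    pooled = np.vstack([A, B])
    n_a = A.shape[0]
    exceed = 0
    for _ in range(n_perm):
        # Shuffle respondents across the two groups and recompute the statistic.
        idx = rng.permutation(pooled.shape[0])
        stat = item_tv_distance(pooled[idx[:n_a]], pooled[idx[n_a:]], levels)
        exceed += stat >= observed
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)

# Synthetic waves: two from the same response distribution, one shifted.
levels = np.arange(1, 6)
rng = np.random.default_rng(1)
wave_a = rng.choice(levels, size=(400, 10))
wave_a2 = rng.choice(levels, size=(400, 10))
wave_b = rng.choice(levels, size=(400, 10), p=[0.10, 0.15, 0.20, 0.25, 0.30])

p_same = permutation_pvalue(wave_a, wave_a2, levels)
p_drift = permutation_pvalue(wave_a, wave_b, levels)
print(f"same-wave p-value:  {p_same:.3f}")
print(f"cross-wave p-value: {p_drift:.3f}")
```

The permutation framing needs no distributional assumptions about the statistic, so the same harness would apply if `item_tv_distance` were swapped for a model-based drift measure.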
LSMs for complex systems
**preliminary study published (https://www.science.org/doi/10.1126/sciadv.adj0400)
By Ishanu Chattopadhyay
DARPA-EA-25-02-05-MAGICS-PA-025 University of Kentucky Kickoff