Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence

  • Jan-Niklas Eckardt
  • Waldemar Hahn
  • Christoph Röllig
  • Sebastian Stasik
  • Uwe Platzbecker
  • Carsten Müller-Tidow
  • Hubert Serve
  • Claudia D Baldus
  • Christoph Schliemann
  • Kerstin Schäfer-Eckart (Co-author)
  • Maher Hanoun
  • Martin Kaufmann
  • Andreas Burchert
  • Christian Thiede
  • Johannes Schetelig
  • Martin Sedlmayr
  • Martin Bornhäuser
  • Markus Wolfien
  • Jan Moritz Middeke
  • Technical University of Dresden
  • Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig
  • Masaryk University and University Hospital Brno
  • Department of Neuroradiology, University Hospital Heidelberg, Heidelberg, Germany; PD Dr. Markus Möhlenbruch
  • University Hospital Münster
  • Robert Bosch Hospital
  • Philipps-Universität Marburg

Research output: Contribution to journal › Original Article › peer-review

45 Citations (Web of Science)

Abstract

Clinical research relies on high-quality patient data; however, obtaining big data sets is costly, and access to existing data is often hindered by privacy and regulatory concerns. Synthetic data generation holds the promise of effectively bypassing these boundaries, allowing for simplified data accessibility and the prospect of synthetic control cohorts. We employed two different methodologies of generative artificial intelligence, CTAB-GAN+ and normalizing flows (NFlow), to synthesize patient data derived from 1606 patients with acute myeloid leukemia, a heterogeneous hematological malignancy, who were treated within four multicenter clinical trials. Both generative models accurately captured distributions of demographic, laboratory, molecular, and cytogenetic variables, as well as patient outcomes, yielding high performance scores regarding fidelity and usability of both synthetic cohorts (n = 1606 each). Survival analysis demonstrated close resemblance of survival curves between original and synthetic cohorts. Inter-variable relationships were preserved in univariable outcome analysis, enabling explorative analysis in our synthetic data. Additionally, training sample privacy is safeguarded, mitigating possible patient re-identification, which we quantified using Hamming distances. We provide not only a proof-of-concept for synthetic data generation in multimodal clinical data for rare diseases, but also full public access to synthetic data sets to foster further research.
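
The abstract mentions Hamming distances as the measure used to quantify re-identification risk. As a minimal sketch of that general idea (not the authors' implementation), the snippet below computes, for each synthetic record, the Hamming distance to its closest original record. The function names, the variable layout, and the toy records are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Record = Sequence[object]


def hamming_distance(a: Record, b: Record) -> int:
    """Count the positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of variables")
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_original(
    synthetic: Sequence[Record], original: Sequence[Record]
) -> List[int]:
    """For each synthetic record, return the distance to its nearest original record."""
    return [min(hamming_distance(s, o) for o in original) for s in synthetic]


# Made-up categorical records (assumed layout: sex, marker A, marker B, risk group).
original = [
    ("f", "mut", "wt", "adverse"),
    ("m", "wt", "wt", "favorable"),
]
synthetic = [
    ("f", "mut", "wt", "adverse"),       # exact copy of an original record
    ("m", "mut", "wt", "intermediate"),  # differs from every original record
]

distances = min_distance_to_original(synthetic, original)
print(distances)
```

In this kind of check, a distance of 0 flags a synthetic record that exactly reproduces a training record, while larger minimum distances suggest the record is less likely to be a copy. Continuous variables would need to be discretized first, which is an assumption of this sketch rather than something the abstract specifies.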

Original language: English
Article number: 76
Pages (from-to): 76
Number of pages: 11
Journal: NPJ digital medicine
Volume: 7
Issue number: 1
Publication status: Published - 20 Mar 2024
