Collaborative Data Science for Healthcare

This course is currently archived on edX. Certificate enrollment is closed.

About This Course

Research has traditionally been viewed as a purely academic undertaking, especially in limited-resource healthcare systems. Clinical trials, the hallmark of medical research, are expensive to perform and take place primarily in countries that can afford them. Around the world, the blood pressure thresholds for hypertension and the blood sugar targets for patients with diabetes are based on research performed in a handful of countries. There is an implicit assumption that the findings of studies carried out in the US and other Western countries remain valid for, and generalize to, patients around the world.

This course was created by members of MIT Critical Data, a global consortium of healthcare practitioners, computer scientists, and engineers from academia, industry, and government that seeks to place data and research at the front and center of healthcare operations.

Big data is proliferating in diverse forms within the healthcare field, not only because of the adoption of electronic health records but also because of the growing use of wireless technologies for ambulatory monitoring. The world is abuzz with applications of data science in almost every field: commerce, transportation, banking, and, more recently, healthcare. These breakthroughs are due to rediscovered algorithms, powerful computers to run them, and, most importantly, the availability of bigger and better data to train the algorithms. This course provides an introductory survey of data science tools in healthcare through several hands-on workshops and exercises.

Who this course is aimed at

The most daunting global health issues right now are the result of interconnected crises. In this course, we highlight the importance of a multidisciplinary approach to health data science. It is intended for front-line clinicians and public health practitioners, as well as computer scientists, engineers, and social scientists, whose goal is to better understand health and disease using digital data captured in the process of care.

We highly recommend that this course be taken as part of a team consisting of clinicians and computer scientists or engineers. Learners from the healthcare sector are likely to find the programming aspects difficult, while computer scientists and engineers may not be familiar with the clinical context of the exercises and workshops.

The MIT Critical Data team would like to acknowledge the contribution of the following members: Aldo Arevalo, Alistair Johnson, Alon Dagan, Amber Nigam, Amelie Mathusek, Andre Silva, Chaitanya Shivade, Christopher Cosgriff, Christina Chen, Daniel Ebner, Daniel Gruhl, Eric Yamga, Grigorich Schleifer, Haroun Chahed, Jesse Raffa, Jonathan Riesner, Joy Tzung-yu Wu, Kimiko Huang, Lawerence Baker, Marta Fernandes, Mathew Samuel, Philipp Klocke, Pragati Jaiswal, Ryan Kindle, Shrey Lakhotia, Tom Pollard, Yueh-Hsun Chuang, Ziyi Hou.

Experience with R, Python, and/or SQL is required unless the course is taken as part of a team that includes computer scientists.

What you'll learn

  • Principles of data science as applied to health

  • Analysis of electronic health records

  • Artificial intelligence and machine learning in healthcare

Section 1 provides a general perspective on digital health data: their potential, and the challenges of using them for research, retrospective analyses, and modeling. Section 2 focuses on the Medical Information Mart for Intensive Care (MIMIC) database, curated by the Laboratory for Computational Physiology at MIT. Learners will have the opportunity to develop their analytical skills by following a research project from the definition of a clinical question to the assessment of the analysis’s robustness. The last section is a collection of workshops on applications of data science in healthcare.

Course Staff

Louis Agha-Mir-Salim

Louis Agha-Mir-Salim is a junior doctor in the Department of Nephrology at Klinikum Kassel, Germany. Passionate about improving clinicians’ experience with healthcare IT and about leveraging data to improve care, he is a former visiting student at the MIT Laboratory for Computational Physiology, where he contributed to several research projects on the analysis of electronic health records. Before graduating from medical school at the University of Southampton, Louis completed a BSc in Medical Sciences with Management at Imperial College London while working for two digital health start-ups. Wanting to foster cross-disciplinary collaboration between data scientists and clinicians, he has served as a faculty member at numerous MIT Critical Data events worldwide.

Leo Anthony Celi

As clinical research director and principal research scientist at the MIT Laboratory for Computational Physiology (LCP), and as a practicing intensive care unit (ICU) physician at the Beth Israel Deaconess Medical Center (BIDMC), Leo Anthony Celi brings together clinicians and data scientists to support research using data routinely collected in the process of care. His group built and maintains the publicly available Medical Information Mart for Intensive Care (MIMIC) database and the Philips-MIT eICU Collaborative Research Database, which together have more than 15,000 users from around the world. The MIMIC-III paper has been cited more than 1700 times since 2016. In addition, Leo is one of the course directors for HST.936 (Global Health Informatics to Improve Quality of Care) and HST.953 (Collaborative Data Science in Medicine), both at MIT. He is an editor of the textbook for each course, both released under an open access license. "Secondary Analysis of Electronic Health Records" has been downloaded more than 500,000 times and has been translated into Mandarin, Spanish, and Korean. Leo has spoken in more than 35 countries across 6 continents about the value of data and learning in health systems. His publications have been cited more than 6000 times since 2015.

Marie-Laure Charpignon

Marie Charpignon is a PhD student in the Interdisciplinary Doctoral Program in Statistics at the MIT Institute for Data, Systems, and Society (IDSS). Her research focuses on causal inference methods for drug repurposing for Alzheimer's disease using electronic health record data from the US and UK, in collaboration with MGH’s Biomedical Informatics, Harvard Medical School’s Systems Pharmacology Laboratory, and Imperial College London. Since taking HST.953 and HST.956, Marie has had the opportunity to work on various health data science projects, honing her skills in data integration and harmonization. These include understanding the drivers of excess deaths from COVID-19, examining the effect of news media on disease outbreaks, and optimizing definitions of ventilator-associated conditions, among others. She is a veteran mentor at numerous health datathons organized by MIT Critical Data, including those held in Sao Paulo, Milan, Seoul, Boston, Singapore, New York, and Orlando.

  • Length: 12 weeks