<div class="xblock xblock-public_view xblock-public_view-vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@54bcf83ddb33474094164db0b32df1ee" data-init="VerticalStudentView" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="vertical" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Introduction</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@fbf29ca3c23648e190316fc28a1ea143">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@fbf29ca3c23648e190316fc28a1ea143" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<h3>Introduction</h3>
<p>Pre-processing aims to assess and improve the quality of data to allow for reliable statistical analysis.</p>
<p>The general steps taken to pre-process data are:</p>
<ul>
<li>Data cleaning - This step deals with missing data, noise, outliers, and duplicate or incorrect records while minimizing the introduction of bias into the database.</li>
<li>Data integration - Extracted raw data can come from heterogeneous sources or be stored in separate datasets. This step combines the various raw datasets into a single dataset that contains all the information required for the planned statistical analyses.</li>
<li>Data transformation - This step converts and/or rescales variables that are stored in a variety of formats or units in the raw data into the formats or units best suited to the statistical methods the researcher intends to use.</li>
<li>Data reduction - After the dataset has been integrated and transformed, this step removes redundant records and variables and reorganizes the data in an efficient, "tidy" manner for analysis.</li>
</ul>
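<p>The four steps above can be sketched in plain Python on a few toy records. This is a minimal illustration under stated assumptions, not the course's own implementation: the field names (<code>patient_id</code>, <code>temp</code>, <code>unit</code>, <code>age</code>), the values, and the helper functions are all hypothetical, and real projects would more typically use a dataframe library or SQL.</p>

```python
# Toy illustration of cleaning, integration, transformation, and reduction.
# All field names and values are hypothetical.

def clean(records):
    """Cleaning: drop records whose temperature is missing."""
    return [r for r in records if r.get("temp") is not None]

def integrate(vitals, demographics):
    """Integration: join two sources on the shared patient_id key."""
    by_id = {d["patient_id"]: d for d in demographics}
    return [{**v, **by_id[v["patient_id"]]} for v in vitals if v["patient_id"] in by_id]

def transform(records):
    """Transformation: express every temperature in degrees Celsius."""
    out = []
    for r in records:
        r = dict(r)  # copy so the input records are not modified
        if r["unit"] == "F":
            r["temp"] = round((r["temp"] - 32) * 5 / 9, 1)
            r["unit"] = "C"
        out.append(r)
    return out

def reduce_records(records, keep=("patient_id", "age", "temp")):
    """Reduction: keep only the needed variables and drop duplicate rows."""
    seen, out = set(), []
    for r in records:
        row = tuple(r[k] for k in keep)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(keep, row)))
    return out

vitals = [
    {"patient_id": 1, "temp": 98.6, "unit": "F"},
    {"patient_id": 1, "temp": 37.0, "unit": "C"},  # same reading stored in Celsius
    {"patient_id": 2, "temp": None, "unit": "C"},  # missing value
    {"patient_id": 3, "temp": 38.2, "unit": "C"},
]
demographics = [{"patient_id": 1, "age": 64}, {"patient_id": 3, "age": 71}]

result = reduce_records(transform(integrate(clean(vitals), demographics)))
print(result)
```

<p>In this toy data, patient 1's Fahrenheit reading only becomes a duplicate of the Celsius reading after transformation, which is one reason reduction is applied last.</p>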
</div>
</div>
<div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@40be499b29514755923ea7e1aa4cb9ed">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@40be499b29514755923ea7e1aa4cb9ed" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<h3>Learning objectives</h3>
<ul>
<li>Understand the requirements for a “clean” database that is “tidy” and ready for use in statistical analysis.</li>
<li>Understand the steps of cleaning raw data and of integrating, transforming, reducing, and reshaping data.</li>
<li>Be able to apply basic techniques for dealing with common problems in raw data, including missing data, inconsistent data, and data from multiple sources.</li>
</ul>
</div>
</div>
<div class="vert vert-2" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@b281eb50b1884430870c9d099a604c54">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@b281eb50b1884430870c9d099a604c54" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<h3>Credits</h3>
<p><a href="https://www.springer.com/gp/book/9783319437408" target="_blank">Book</a> chapter: Brian Malley, Daniele Ramazzotti and Joy Tzung-yu Wu</p>
<p>EdX content: Marta Fernandes, Miguel Armengol and Jesse Raffa</p>
<p>Videos: The first video in this unit is presented by Jesse Raffa; subsequent videos are presented by Marta Fernandes and Miguel Armengol.</p>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@ae84f4e5a67844cc939ff9166d1aa76f" data-init="VerticalStudentView" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="vertical" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Real World Data</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@773ba1e3e5134c31ba86b0cdc702fc03">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@773ba1e3e5134c31ba86b0cdc702fc03" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>Real-world data are usually "messy": they can be incomplete (e.g., missing values), noisy (e.g., random error or outlier values that deviate from the expected baseline), and inconsistent (e.g., a recorded patient age that is incompatible with admission to a neonatal intensive care unit).</p>
<p>We present the following video about real-world clinical data and how it differs from an already preprocessed dataset.</p>
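<p>The neonatal intensive care unit (NICU) example of inconsistency can be turned into a simple rule-based check. This is a minimal sketch: the field names (<code>admission_id</code>, <code>service</code>, <code>age_days</code>) and the 365-day cutoff are hypothetical choices for illustration, not a clinical standard or a MIMIC schema.</p>

```python
# Rule-based consistency check (sketch).
# Field names and the age cutoff are hypothetical.
MAX_NICU_AGE_DAYS = 365  # illustrative assumption, not a clinical standard

def find_inconsistent(admissions):
    """Return the ids of NICU admissions whose recorded age exceeds the cutoff."""
    return [
        a["admission_id"]
        for a in admissions
        if a["service"] == "NICU" and a["age_days"] > MAX_NICU_AGE_DAYS
    ]

admissions = [
    {"admission_id": "A1", "service": "NICU", "age_days": 3},
    {"admission_id": "A2", "service": "NICU", "age_days": 23000},  # adult age in a neonatal unit
    {"admission_id": "A3", "service": "MICU", "age_days": 23000},  # adult age in an adult unit: consistent
]
print(find_inconsistent(admissions))  # ['A2']
```

<p>Flagged records would then be reviewed or corrected rather than silently dropped, since either the age or the service field could be the wrong one.</p>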
</div>
</div>
<div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@video+block@01c92ca25cd2431ab799f816015869c4">
<div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@video+block@01c92ca25cd2431ab799f816015869c4" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="video" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Video"}
</script>
<h3 class="hd hd-2">Real-world Data</h3>
<div
id="video_01c92ca25cd2431ab799f816015869c4"
class="video closed"
data-metadata='{"saveStateUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@01c92ca25cd2431ab799f816015869c4/handler/xmodule_handler/save_user_state", "lmsRootURL": "https://openlearninglibrary.mit.edu", "publishCompletionUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@01c92ca25cd2431ab799f816015869c4/handler/publish_completion", "streams": "1.00:kvpB0M6In_o", "duration": 0.0, "recordedYoutubeIsAvailable": true, "transcriptAvailableTranslationsUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@01c92ca25cd2431ab799f816015869c4/handler/transcript/available_translations", "captionDataDir": null, "ytApiUrl": "https://www.youtube.com/iframe_api", "speed": null, "end": 0.0, "completionPercentage": 0.95, "autoAdvance": false, "transcriptLanguage": "en", "prioritizeHls": false, "autohideHtml5": false, "ytTestTimeout": 1500, "transcriptLanguages": {"en": "English"}, "savedVideoPosition": 0.0, "sources": [], "completionEnabled": false, "saveStateEnabled": false, "generalSpeed": 1.0, "autoplay": false, "poster": null, "showCaptions": "true", "transcriptTranslationUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@01c92ca25cd2431ab799f816015869c4/handler/transcript/translation/__lang__", "ytMetadataEndpoint": "", "start": 0.0}'
data-bumper-metadata='null'
data-autoadvance-enabled="False"
data-poster='null'
tabindex="-1"
>
<div class="focus_grabber first"></div>
<div class="tc-wrapper">
<div class="video-wrapper">
<span tabindex="0" class="spinner" aria-hidden="false" aria-label="Loading video player"></span>
<span tabindex="-1" class="btn-play fa fa-youtube-play fa-2x is-hidden" aria-hidden="true" aria-label="Play video"></span>
<div class="video-player-pre"></div>
<div class="video-player">
<div id="01c92ca25cd2431ab799f816015869c4"></div>
<h4 class="hd hd-4 video-error is-hidden">No playable video sources found.</h4>
<h4 class="hd hd-4 video-hls-error is-hidden">
Your browser does not support this video format. Try using a different browser.
</h4>
</div>
<div class="video-player-post"></div>
<div class="closed-captions"></div>
<div class="video-controls is-hidden">
<div>
<div class="vcr"><div class="vidtime">0:00 / 0:00</div></div>
<div class="secondary-controls"></div>
</div>
</div>
</div>
</div>
<div class="focus_grabber last"></div>
<h3 class="hd hd-4 downloads-heading sr" id="video-download-transcripts_01c92ca25cd2431ab799f816015869c4">Downloads and transcripts</h3>
<div class="wrapper-downloads" role="region" aria-labelledby="video-download-transcripts_01c92ca25cd2431ab799f816015869c4">
<div class="wrapper-download-transcripts">
<h4 class="hd hd-5">Transcripts</h4>
<ul class="list-download-transcripts">
<li class="transcript-option">
<a class="btn btn-link" href="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@01c92ca25cd2431ab799f816015869c4/handler/transcript/download" data-value="srt">Download SubRip (.srt) file</a>
</li>
<li class="transcript-option">
<a class="btn btn-link" href="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@01c92ca25cd2431ab799f816015869c4/handler/transcript/download" data-value="txt">Download Text (.txt) file</a>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@77119e65ae3e451c9db61e2b7258a8d4" data-init="VerticalStudentView" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="vertical" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Missing Data</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@5a1903f8fc53407585ef2cbfb3569990">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@5a1903f8fc53407585ef2cbfb3569990" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>Missing data can be due to a variety of reasons, which include:</p>
<ul>
<li>Technical issues with biomonitors</li>
<li>Human error in data entry</li>
<li>Inconsistent collection of some clinical variables, since Electronic Health Record (EHR) data are collected for clinical rather than research purposes</li>
</ul>
<p>Possible ways to deal with missing data:</p>
<ul>
<li>Ignore the record. This method removes missing data effectively, but it has two problems. First, it can lead to small sample sizes if a large proportion of the records have missing values. Second, if the missingness is correlated with a feature, removing those records can bias the dataset.</li>
<li>Determine and fill in the missing value manually. This approach is the most accurate, but it is also time-consuming and often infeasible for a large dataset with many missing values.</li>
<li>Use an expected value. The missing values can be filled in with predicted values (e.g., the mean of the available data or the output of a prediction model). Note that this approach may introduce bias into the data, because the inserted values may be wrong. It is also useful as a check: comparing its results with those obtained by ignoring missing records helps assess their validity.</li>
</ul>
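<p>The first and third options can be compared on a short series. This sketch uses hypothetical heart-rate values, with <code>None</code> marking a missing measurement, and uses the mean of the observed values as the "expected value"; other prediction methods would replace <code>impute_mean</code>.</p>

```python
from statistics import mean, pstdev

# Hypothetical heart-rate values (bpm); None marks a missing measurement.
heart_rate = [72, None, 88, 95, None, 101]

def drop_missing(values):
    """'Ignore the record': keep only the observed values."""
    return [v for v in values if v is not None]

def impute_mean(values):
    """'Use an expected value': replace each missing value with the mean of the observed ones."""
    fill = mean(drop_missing(values))
    return [fill if v is None else v for v in values]

complete = drop_missing(heart_rate)
imputed = impute_mean(heart_rate)

print(len(complete), mean(complete), pstdev(complete))
print(len(imputed), mean(imputed), pstdev(imputed))
```

<p>Mean imputation restores the sample size and leaves the mean unchanged, but it shrinks the spread (the standard deviation drops), which is one way it can bias later analyses.</p>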
<p>In the first video, we present an example of missing data analysis through data visualization. The frequency of data acquisition is also an issue, which we address in the second video, because the data may form unevenly sampled time series. Heart rate is usually acquired more frequently than other vital signs. The frequency of vital-sign acquisition may also vary from hospital to hospital, as in the multi-center eICU database.</p>
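<p>Before visualizing missingness, it helps to compute the fraction of missing values per variable, which is the quantity such a plot usually shows. This is a minimal sketch on hypothetical rows; the variable names are illustrative and <code>None</code> is assumed to mark a missing value.</p>

```python
def missing_fraction(rows, variables):
    """Return, for each variable, the fraction of rows in which it is missing (None or absent)."""
    n = len(rows)
    return {var: sum(1 for r in rows if r.get(var) is None) / n for var in variables}

# Hypothetical hourly rows for one patient.
rows = [
    {"heart_rate": 80, "temperature": None, "spo2": 97},
    {"heart_rate": 82, "temperature": 37.1, "spo2": None},
    {"heart_rate": 85, "temperature": None, "spo2": 96},
    {"heart_rate": 79, "temperature": None, "spo2": 98},
]
fractions = missing_fraction(rows, ["heart_rate", "temperature", "spo2"])

# Crude text "bar chart", sorted from least to most missing.
for var, frac in sorted(fractions.items(), key=lambda kv: kv[1]):
    print(f"{var:12s} {'#' * round(frac * 20):20s} {frac:.0%}")
```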
</div>
</div>
<div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@video+block@badf83add7444834bb4619d50fa1276c">
<div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@video+block@badf83add7444834bb4619d50fa1276c" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="video" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Video"}
</script>
<h3 class="hd hd-2">Missing data analysis</h3>
<div
id="video_badf83add7444834bb4619d50fa1276c"
class="video closed"
data-metadata='{"saveStateUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@badf83add7444834bb4619d50fa1276c/handler/xmodule_handler/save_user_state", "lmsRootURL": "https://openlearninglibrary.mit.edu", "publishCompletionUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@badf83add7444834bb4619d50fa1276c/handler/publish_completion", "streams": "1.00:nkwM5Lp9zko", "duration": 0.0, "recordedYoutubeIsAvailable": true, "transcriptAvailableTranslationsUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@badf83add7444834bb4619d50fa1276c/handler/transcript/available_translations", "captionDataDir": null, "ytApiUrl": "https://www.youtube.com/iframe_api", "speed": null, "end": 0.0, "completionPercentage": 0.95, "autoAdvance": false, "transcriptLanguage": "en", "prioritizeHls": false, "autohideHtml5": false, "ytTestTimeout": 1500, "transcriptLanguages": {"en": "English"}, "savedVideoPosition": 0.0, "sources": [], "completionEnabled": false, "saveStateEnabled": false, "generalSpeed": 1.0, "autoplay": false, "poster": null, "showCaptions": "true", "transcriptTranslationUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@badf83add7444834bb4619d50fa1276c/handler/transcript/translation/__lang__", "ytMetadataEndpoint": "", "start": 0.0}'
data-bumper-metadata='null'
data-autoadvance-enabled="False"
data-poster='null'
tabindex="-1"
>
<div class="focus_grabber first"></div>
<div class="tc-wrapper">
<div class="video-wrapper">
<span tabindex="0" class="spinner" aria-hidden="false" aria-label="Loading video player"></span>
<span tabindex="-1" class="btn-play fa fa-youtube-play fa-2x is-hidden" aria-hidden="true" aria-label="Play video"></span>
<div class="video-player-pre"></div>
<div class="video-player">
<div id="badf83add7444834bb4619d50fa1276c"></div>
<h4 class="hd hd-4 video-error is-hidden">No playable video sources found.</h4>
<h4 class="hd hd-4 video-hls-error is-hidden">
Your browser does not support this video format. Try using a different browser.
</h4>
</div>
<div class="video-player-post"></div>
<div class="closed-captions"></div>
<div class="video-controls is-hidden">
<div>
<div class="vcr"><div class="vidtime">0:00 / 0:00</div></div>
<div class="secondary-controls"></div>
</div>
</div>
</div>
</div>
<div class="focus_grabber last"></div>
<h3 class="hd hd-4 downloads-heading sr" id="video-download-transcripts_badf83add7444834bb4619d50fa1276c">Downloads and transcripts</h3>
<div class="wrapper-downloads" role="region" aria-labelledby="video-download-transcripts_badf83add7444834bb4619d50fa1276c">
<div class="wrapper-download-transcripts">
<h4 class="hd hd-5">Transcripts</h4>
<ul class="list-download-transcripts">
<li class="transcript-option">
<a class="btn btn-link" href="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@badf83add7444834bb4619d50fa1276c/handler/transcript/download" data-value="srt">Download SubRip (.srt) file</a>
</li>
<li class="transcript-option">
<a class="btn btn-link" href="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@badf83add7444834bb4619d50fa1276c/handler/transcript/download" data-value="txt">Download Text (.txt) file</a>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@a35bbfdeed6b4af7987044cee492f960" data-init="VerticalStudentView" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="vertical" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Frequency of Data Acquisition</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@video+block@0920ed816d324773ad3b170d12a1376e">
<div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@video+block@0920ed816d324773ad3b170d12a1376e" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="video" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Video"}
</script>
<h3 class="hd hd-2">Frequency of data acquisition</h3>
<div
id="video_0920ed816d324773ad3b170d12a1376e"
class="video closed"
data-metadata='{"saveStateUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0920ed816d324773ad3b170d12a1376e/handler/xmodule_handler/save_user_state", "lmsRootURL": "https://openlearninglibrary.mit.edu", "publishCompletionUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0920ed816d324773ad3b170d12a1376e/handler/publish_completion", "streams": "1.00:UVUD_iGJDkc", "duration": 0.0, "recordedYoutubeIsAvailable": true, "transcriptAvailableTranslationsUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0920ed816d324773ad3b170d12a1376e/handler/transcript/available_translations", "captionDataDir": null, "ytApiUrl": "https://www.youtube.com/iframe_api", "speed": null, "end": 0.0, "completionPercentage": 0.95, "autoAdvance": false, "transcriptLanguage": "en", "prioritizeHls": false, "autohideHtml5": false, "ytTestTimeout": 1500, "transcriptLanguages": {"en": "English"}, "savedVideoPosition": 0.0, "sources": [], "completionEnabled": false, "saveStateEnabled": false, "generalSpeed": 1.0, "autoplay": false, "poster": null, "showCaptions": "true", "transcriptTranslationUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0920ed816d324773ad3b170d12a1376e/handler/transcript/translation/__lang__", "ytMetadataEndpoint": "", "start": 0.0}'
data-bumper-metadata='null'
data-autoadvance-enabled="False"
data-poster='null'
tabindex="-1"
>
<div class="focus_grabber first"></div>
<div class="tc-wrapper">
<div class="video-wrapper">
<span tabindex="0" class="spinner" aria-hidden="false" aria-label="Loading video player"></span>
<span tabindex="-1" class="btn-play fa fa-youtube-play fa-2x is-hidden" aria-hidden="true" aria-label="Play video"></span>
<div class="video-player-pre"></div>
<div class="video-player">
<div id="0920ed816d324773ad3b170d12a1376e"></div>
<h4 class="hd hd-4 video-error is-hidden">No playable video sources found.</h4>
<h4 class="hd hd-4 video-hls-error is-hidden">
Your browser does not support this video format. Try using a different browser.
</h4>
</div>
<div class="video-player-post"></div>
<div class="closed-captions"></div>
<div class="video-controls is-hidden">
<div>
<div class="vcr"><div class="vidtime">0:00 / 0:00</div></div>
<div class="secondary-controls"></div>
</div>
</div>
</div>
</div>
<div class="focus_grabber last"></div>
<h3 class="hd hd-4 downloads-heading sr" id="video-download-transcripts_0920ed816d324773ad3b170d12a1376e">Downloads and transcripts</h3>
<div class="wrapper-downloads" role="region" aria-labelledby="video-download-transcripts_0920ed816d324773ad3b170d12a1376e">
<div class="wrapper-download-transcripts">
<h4 class="hd hd-5">Transcripts</h4>
<ul class="list-download-transcripts">
<li class="transcript-option">
<a class="btn btn-link" href="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0920ed816d324773ad3b170d12a1376e/handler/transcript/download" data-value="srt">Download SubRip (.srt) file</a>
</li>
<li class="transcript-option">
<a class="btn btn-link" href="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0920ed816d324773ad3b170d12a1376e/handler/transcript/download" data-value="txt">Download Text (.txt) file</a>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
<div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@a16adc2453124f8097ab824c6064c677">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@a16adc2453124f8097ab824c6064c677" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>Fig. 1 shows the number of patient samples in the first 24 hours of ICU admission for the ten most frequent sampling times of vital signs.</p>
<p>We observe that most variables are collected hourly and that temperature is the variable with the fewest measurements. Therefore, when assessing missing values, using an hourly frequency should result in fewer missing values than a finer time resolution would.</p>
<p>Fig. 1 - Number of samples of patients for the top 10 sampling times in the first 24 hours of ICU admission.</p>
<p><img src="/assets/courseware/v1/244c5c97a94cebec6bb2ba62af34adfb/asset-v1:MITx+HST.953x+3T2020+type@asset+block/newplot_2_.png" width="700" height="450" /></p>
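<p>The effect of the chosen time resolution on missingness can be illustrated by binning unevenly sampled measurements. This sketch assumes hypothetical data given as (minutes since ICU admission, value) pairs, averages samples within each bin, and marks empty bins as missing; the sampling patterns are invented to mimic an hourly vital sign and a less frequently measured temperature.</p>

```python
from collections import defaultdict
from statistics import mean

def resample(samples, bin_minutes=60, window_minutes=24 * 60):
    """Average (minute, value) samples into fixed-width time bins.

    Bins that contain no sample are returned as None, i.e. as missing values.
    """
    n_bins = window_minutes // bin_minutes
    buckets = defaultdict(list)
    for minute, value in samples:
        b = int(minute // bin_minutes)
        if 0 <= b < n_bins:  # ignore samples outside the observation window
            buckets[b].append(value)
    return [mean(buckets[b]) if b in buckets else None for b in range(n_bins)]

def count_missing(binned):
    return sum(v is None for v in binned)

# Hypothetical first-24h data, in minutes since ICU admission.
heart_rate = [(60 * h + 5, 80 + h % 3) for h in range(24)]  # one reading per hour
temperature = [(240 * k + 10, 37.0) for k in range(6)]      # one reading every 4 hours

for width in (30, 60):
    print(width, count_missing(resample(heart_rate, width)), count_missing(resample(temperature, width)))
# 30 24 42
# 60 0 18
```

<p>With 30-minute bins, half of the heart-rate bins are empty even though the signal was recorded every hour; hourly bins remove those artificial gaps, while temperature remains sparse at either resolution.</p>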
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@e25c6a62896640b0ab7a985215b92c1d" data-init="VerticalStudentView" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="vertical" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Noisy and Inconsistent Data</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@48ba08a3cd7b4e8a8cd3ef51544ef93f">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@48ba08a3cd7b4e8a8cd3ef51544ef93f" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>Noisy data can be due to:</p>
<ul>
<li>Faults or technological limitations of instruments during data gathering</li>
<li>Human error in data entry</li>
</ul>
<p>Possible ways to deal with noisy data, for example by detecting and excluding outliers, include:</p>
<ul>
<li>Binning methods - Binning methods smooth sorted data values by considering their "neighborhood", that is, the values around them</li>
<li>Clustering - Outliers may be detected by clustering, that is, by grouping a set of values so that values in the same group (i.e., the same cluster) are more similar to each other than to those in other groups; values that fall outside all clusters may be considered outliers</li>
<li>Machine learning - One of the classical machine learning methods is regression analysis, in which data are fitted to a specified (often linear) function; values that deviate strongly from the fitted function may be noise</li>
<li>Visualization - Outliers can be assessed directly by plotting the distribution of variables, e.g., in a box plot</li>
</ul>
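<p>Binning can be made concrete with equal-frequency (equal-depth) binning and smoothing by bin means, one common variant of the binning methods listed above. This is a minimal sketch on hypothetical heart-rate values; other variants smooth by bin medians or bin boundaries instead.</p>

```python
from statistics import mean

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace each value by its bin mean.

    The result is returned in sorted order, not in the original order of the input.
    """
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

# Hypothetical heart-rate values (bpm).
hr = [95, 70, 84, 130, 72, 80, 90, 74, 82]
print(smooth_by_bin_means(hr, 3))
```

<p>The extreme value 130 is pulled toward its neighbors (its bin mean is 105), which dampens noise but also hides genuinely abnormal readings, so binning is best suited to small random errors.</p>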
<p>Box plots graphically depict groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles. The spacing between the different parts of the box indicates the degree of dispersion (spread) and skewness in the data, and outliers may be plotted as individual points. The ends of the whiskers can represent several alternative values, such as the minimum and maximum of all the data or, as in Fig. 1, the upper and lower fences defined below.</p>
<p>In the box plot in Fig. 1 below, we observe that:</p>
<ul>
<li>The maximum and minimum heart rate values are 300 and 0 bpm, respectively.</li>
<li>The median heart rate of the population corresponds to the second quartile (q2) and equals 88 bpm.</li>
<li>The upper and lower quartiles correspond to q3 (104 bpm) and q1 (73 bpm), respectively.</li>
<li>The upper and lower fences correspond to 150 and 27 bpm, respectively.</li>
</ul>
<p>The interquartile range (IQR = q3 - q1) measures the spread of the middle 50% of the values, where the bulk of the data lie. The upper (q3 + 1.5 × IQR) and lower (q1 - 1.5 × IQR) fences separate outliers from the bulk of the data.</p>
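<p>Applying the fence formula to the quartiles read from Fig. 1 (q1 = 73 bpm, q3 = 104 bpm) gives IQR = 31 bpm, a lower fence of 73 - 46.5 = 26.5 bpm and an upper fence of 104 + 46.5 = 150.5 bpm, consistent with the roughly 27 and 150 bpm shown in the plot (many plotting tools draw each whisker at the most extreme data point inside the fence). The sketch below computes the fences and flags outliers using Python's <code>statistics.quantiles</code>; its default quartile method may differ slightly from the one used to draw Fig. 1, and the sample values are hypothetical.</p>

```python
from statistics import quantiles

def fences_from_quartiles(q1, q3, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Return the values lying outside the fences computed from the data's own quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method; other tools may differ slightly
    lower, upper = fences_from_quartiles(q1, q3, k)
    return [v for v in values if v < lower or v > upper]

# Quartiles read from Fig. 1.
print(fences_from_quartiles(73, 104))  # (26.5, 150.5)

# Hypothetical heart-rate sample (bpm) containing two extreme values.
print(flag_outliers([60, 70, 75, 80, 85, 90, 95, 300, 0]))  # [300, 0]
```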
<p>Fig. 1 - Distribution of heart rate values within the physiological ranges for the clinical dataset MIMIC-III (demo version).</p>
<p><img src="/assets/courseware/v1/57f902dcda924418a4215f6117c58b67/asset-v1:MITx+HST.953x+3T2020+type@asset+block/HR.jpg" alt="HR_boxplot" width="864" height="464" /></p>
<p>In the following video, we show an example of visualizing data to identify possible noise such as outliers. We can draw a box plot of a patient's vital signs from an evenly sampled time series and check for outliers, which usually appear as points beyond the whiskers, far from the median. Note that a patient can have abnormal values (outside the established physiological ranges) that are nonetheless possible and should not be treated as outliers; outliers in this sense are values that are physiologically impossible in a human being.</p>
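<p>The distinction between abnormal but possible values and impossible ones can be encoded as two ranges. In this sketch the normal range is the commonly quoted adult resting heart rate of 60 to 100 bpm, while the plausibility limits of 20 to 250 bpm are a hypothetical choice for illustration only, not clinical guidance; appropriate limits depend on the population and variable.</p>

```python
# Illustrative limits only (assumptions, not clinical guidance).
NORMAL_RANGE = (60, 100)     # typical adult resting heart rate, bpm
PLAUSIBLE_RANGE = (20, 250)  # hypothetical limits beyond which a value is treated as an error

def classify(hr):
    """Label a heart-rate value as 'normal', 'abnormal' (possible), or 'implausible'."""
    lo, hi = PLAUSIBLE_RANGE
    if not lo <= hr <= hi:
        return "implausible"
    nlo, nhi = NORMAL_RANGE
    if nlo <= hr <= nhi:
        return "normal"
    return "abnormal"

series = [88, 92, 45, 130, 0, 400]
print([classify(v) for v in series])
# ['normal', 'normal', 'abnormal', 'abnormal', 'implausible', 'implausible']

# Keep abnormal values (they may be real); drop only implausible ones.
cleaned = [v for v in series if classify(v) != "implausible"]
print(cleaned)  # [88, 92, 45, 130]
```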
</div>
</div>
<div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@video+block@0084317ed96e4ccc89f917bbe120b67a">
<div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@video+block@0084317ed96e4ccc89f917bbe120b67a" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="video" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Video"}
</script>
<h3 class="hd hd-2">Noisy Data</h3>
<div
id="video_0084317ed96e4ccc89f917bbe120b67a"
class="video closed"
data-metadata='{"saveStateUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0084317ed96e4ccc89f917bbe120b67a/handler/xmodule_handler/save_user_state", "lmsRootURL": "https://openlearninglibrary.mit.edu", "publishCompletionUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0084317ed96e4ccc89f917bbe120b67a/handler/publish_completion", "streams": "1.00:wmqy7WFMPM4", "duration": 0.0, "recordedYoutubeIsAvailable": true, "transcriptAvailableTranslationsUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0084317ed96e4ccc89f917bbe120b67a/handler/transcript/available_translations", "captionDataDir": null, "ytApiUrl": "https://www.youtube.com/iframe_api", "speed": null, "end": 0.0, "completionPercentage": 0.95, "autoAdvance": false, "transcriptLanguage": "en", "prioritizeHls": false, "autohideHtml5": false, "ytTestTimeout": 1500, "transcriptLanguages": {"en": "English"}, "savedVideoPosition": 0.0, "sources": [], "completionEnabled": false, "saveStateEnabled": false, "generalSpeed": 1.0, "autoplay": false, "poster": null, "showCaptions": "true", "transcriptTranslationUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0084317ed96e4ccc89f917bbe120b67a/handler/transcript/translation/__lang__", "ytMetadataEndpoint": "", "start": 0.0}'
data-bumper-metadata='null'
data-autoadvance-enabled="False"
data-poster='null'
tabindex="-1"
>
<div class="focus_grabber first"></div>
<div class="tc-wrapper">
<div class="video-wrapper">
<span tabindex="0" class="spinner" aria-hidden="false" aria-label="Loading video player"></span>
<span tabindex="-1" class="btn-play fa fa-youtube-play fa-2x is-hidden" aria-hidden="true" aria-label="Play video"></span>
<div class="video-player-pre"></div>
<div class="video-player">
<div id="0084317ed96e4ccc89f917bbe120b67a"></div>
<h4 class="hd hd-4 video-error is-hidden">No playable video sources found.</h4>
<h4 class="hd hd-4 video-hls-error is-hidden">
Your browser does not support this video format. Try using a different browser.
</h4>
</div>
<div class="video-player-post"></div>
<div class="closed-captions"></div>
<div class="video-controls is-hidden">
<div>
<div class="vcr"><div class="vidtime">0:00 / 0:00</div></div>
<div class="secondary-controls"></div>
</div>
</div>
</div>
</div>
<div class="focus_grabber last"></div>
<h3 class="hd hd-4 downloads-heading sr" id="video-download-transcripts_0084317ed96e4ccc89f917bbe120b67a">Downloads and transcripts</h3>
<div class="wrapper-downloads" role="region" aria-labelledby="video-download-transcripts_0084317ed96e4ccc89f917bbe120b67a">
<div class="wrapper-download-transcripts">
<h4 class="hd hd-5">Transcripts</h4>
<ul class="list-download-transcripts">
<li class="transcript-option">
<a class="btn btn-link" href="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0084317ed96e4ccc89f917bbe120b67a/handler/transcript/download" data-value="srt">Download SubRip (.srt) file</a>
</li>
<li class="transcript-option">
<a class="btn btn-link" href="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@0084317ed96e4ccc89f917bbe120b67a/handler/transcript/download" data-value="txt">Download Text (.txt) file</a>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
<div class="vert vert-2" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@645f038fd81d4bc3bc43039d810c9953">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@645f038fd81d4bc3bc43039d810c9953" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<h3>Distribution of Vital Signs</h3>
<p>The plot in Fig. 2 shows the distribution of vital-sign values for a single patient during the first 24 hours of ICU admission.</p>
<p>Pulse oximetry (SpO2) values range from 60% to 100%. A value of 60% is abnormal and indicates that the patient was in a critical condition at that time. Temperature values near 100 might look like outliers if the unit is assumed to be degrees Celsius. However, these values are recorded in degrees Fahrenheit (100 °F is about 37.8 °C), so they are within the physiological range.</p>
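<p>The two cases above call for different handling. A low SpO2 is a genuine abnormal value to keep and review. A temperature near 100 is a unit mismatch to harmonize. The sketch below shows both, under stated assumptions. The cut-off of 50 used to guess that a temperature is in Fahrenheit is a hypothetical heuristic. The 90% SpO2 review threshold is also hypothetical. Where the unit of measurement is stored alongside the value, it should be preferred over a range-based guess.</p>

```python
# Hypothetical cut-off: a body temperature above 50 is implausible in degrees
# Celsius but typical in degrees Fahrenheit (roughly 95-105 °F).
FAHRENHEIT_CUTOFF = 50.0


def to_celsius(temp):
    """Return a body temperature in degrees Celsius.

    Values above FAHRENHEIT_CUTOFF are assumed to be Fahrenheit and converted;
    values at or below it are assumed to be Celsius already.
    """
    if temp > FAHRENHEIT_CUTOFF:
        return round((temp - 32.0) * 5.0 / 9.0, 1)
    return temp


def is_low_spo2(spo2, threshold=90.0):
    """Flag an SpO2 reading (%) below a hypothetical review threshold.

    A flagged value is abnormal, not necessarily erroneous: it is kept and
    reviewed rather than deleted as an outlier.
    """
    return spo2 < threshold


temps = [36.8, 37.2, 99.5, 98.6]
print([to_celsius(t) for t in temps])          # [36.8, 37.2, 37.5, 37.0]
print([is_low_spo2(v) for v in [60, 95, 100]])  # [True, False, False]
```

<p>Converting rather than deleting the Fahrenheit readings keeps every measurement in the dataset. The SpO2 flag only marks values for clinical review.</p>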
<p>Fig. 2 - Distribution of a patient's vital signs in the first 24h of ICU admission.</p>
<p><img src="/assets/courseware/v1/600960e3a2760961daacc69b822f6043/asset-v1:MITx+HST.953x+3T2020+type@asset+block/newplot_3_.png" width="700" height="450" /></p>
</div>
</div>
<div class="vert vert-3" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@113f1c690aa34718ad3ea17420602616">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@113f1c690aa34718ad3ea17420602616" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<h3>Inconsistent data</h3>
<p>Inconsistent data in the EHR can arise from:</p>
<ul>
<li>Variation in how different staff and clinicians enter data; a single hospital may have thousands of staff.</li>
<li>Multiple automated interfaces with the EHR, from telemetry monitors to the hospital laboratory.</li>
</ul>
<p>Correcting these inconsistencies often requires understanding how the data of interest were captured in the clinical setting and where they are stored in the EHR database.</p>
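<p>One common kind of inconsistency is the same measurement appearing under different labels. Different interfaces or staff entries might name it in different ways. A minimal sketch of harmonizing such labels follows. The mapping dictionary and the label variants in it are hypothetical. A real mapping would be built by reviewing the labels actually present in the database, with clinical input.</p>

```python
# Hypothetical mapping from label variants seen in raw EHR data to one
# canonical variable name (illustrative only, not taken from any database).
CANONICAL_LABELS = {
    "hr": "heart_rate",
    "heart rate": "heart_rate",
    "heartrate": "heart_rate",
    "spo2": "spo2",
    "o2 sat": "spo2",
    "o2 saturation": "spo2",
}


def canonicalize(label):
    """Normalize case and whitespace, then map to a canonical name.

    Returns None for labels not in the mapping so they can be reviewed
    instead of being silently merged or dropped.
    """
    key = " ".join(label.strip().lower().split())
    return CANONICAL_LABELS.get(key)


raw_labels = ["HR", "Heart  Rate", " SpO2 ", "O2 Sat", "Resp Rate"]
print([canonicalize(x) for x in raw_labels])
# ['heart_rate', 'heart_rate', 'spo2', 'spo2', None]
```

<p>Returning <code>None</code> for unknown labels is a deliberate choice. Unmapped entries surface for review instead of being guessed at.</p>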
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@a7c7006d3f7d427da271bb6d9dbefe8c" data-init="VerticalStudentView" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="vertical" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Data Integration and Transformation</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@fcdaa276328b40eca98bb05da61ee3e2">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@fcdaa276328b40eca98bb05da61ee3e2" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<h3>Data Integration</h3>
<p>Data integration is the process of combining data from various sources (e.g., databases) into a single, consistent dataset. Most of the issues to consider during integration stem from differing standards among the sources. For example, the same variable may be referred to by different IDs in two or more sources.</p>
<p>An example of data integration in the MIMIC database:</p>
<p>A patient may have laboratory values taken in the ER before being admitted to the ICU. To build a complete dataset, the patient’s full set of lab values (including those not associated with the ICU stay identifier) must be integrated with the record of that ICU admission, without duplicating or missing any records. Values shared between datasets (in this example, a hospital stay identifier or a timestamp) allow this to be done accurately.</p>
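<p>The sketch below illustrates this kind of integration on toy records. It links labs to an ICU stay through a shared hospital admission identifier and a time window, and keeps exact duplicates only once. The field names (<code>hadm_id</code>, <code>icustay_id</code>, <code>intime</code>, <code>outtime</code>, <code>charttime</code>) are only loosely modeled on MIMIC-style identifiers. The records and the 24-hour lookback before ICU admission are hypothetical assumptions, not the course's method.</p>

```python
from datetime import datetime, timedelta


def integrate_labs(icu_stays, labs, lookback=timedelta(hours=24)):
    """Attach lab results to ICU stays using the shared hospital admission ID.

    A lab is linked to a stay when both share `hadm_id` and the lab was
    charted between `lookback` before ICU admission (e.g. in the ER) and ICU
    discharge. Exact duplicate records are kept only once.
    """
    merged = []
    seen = set()
    for stay in icu_stays:
        start = stay["intime"] - lookback
        for lab in labs:
            if lab["hadm_id"] != stay["hadm_id"]:
                continue  # different hospital admission
            if not (start <= lab["charttime"] <= stay["outtime"]):
                continue  # outside the window of interest
            key = (stay["icustay_id"], lab["charttime"], lab["test"], lab["value"])
            if key in seen:
                continue  # duplicate record
            seen.add(key)
            merged.append({"icustay_id": stay["icustay_id"], **lab})
    return sorted(merged, key=lambda r: (r["icustay_id"], r["charttime"]))


icu_stays = [{"hadm_id": 1, "icustay_id": 10,
              "intime": datetime(2150, 1, 1, 12), "outtime": datetime(2150, 1, 3, 12)}]
labs = [
    {"hadm_id": 1, "charttime": datetime(2150, 1, 1, 8), "test": "creatinine", "value": 1.4},   # ER, before ICU
    {"hadm_id": 1, "charttime": datetime(2150, 1, 1, 14), "test": "creatinine", "value": 1.6},  # in ICU
    {"hadm_id": 1, "charttime": datetime(2150, 1, 1, 14), "test": "creatinine", "value": 1.6},  # duplicate
    {"hadm_id": 2, "charttime": datetime(2150, 1, 1, 9), "test": "creatinine", "value": 0.9},   # other admission
]
for row in integrate_labs(icu_stays, labs):
    print(row["charttime"], row["test"], row["value"])
```

<p>The upper bound at <code>outtime</code> matters when one hospital admission contains several ICU stays. Without it, labs from a later stay would also be attached to the earlier one.</p>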
</div>
</div>
<div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@b0e0742639774b23a04e6e69666ecb6c">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@b0e0742639774b23a04e6e69666ecb6c" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<h3>Data Transformation</h3>
<p>The aim of data transformation is to convert data values into a format, scale, or unit that is more suitable for subsequent statistical analysis.</p>
<p>Common approaches to transforming data include:</p>
<ul>
<li>Normalization - the values of a numerical variable are rescaled to fall within a specified range, such as 0-1.</li>
<li>Aggregation - two or more values of the same attribute are combined into one value.</li>
<li>Generalization - similar to aggregation, but low-level attributes are transformed into higher-level ones. For example, in the analysis of chronic kidney disease (CKD) patients, one could replace a continuous numerical variable such as the patient's creatinine level with a variable for CKD stage as defined by accepted guidelines. These guidelines stage CKD by glomerular filtration rate, which is usually estimated from serum creatinine.</li>
</ul>
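<p>A minimal sketch of two of these approaches follows. The first function applies min-max normalization to the range 0-1. The second generalizes a continuous estimated GFR (eGFR) into the KDIGO GFR categories G1-G5. The sketch assumes eGFR has already been computed from creatinine by an estimating equation, which is not shown. A GFR category on its own is not a CKD diagnosis. Staging also requires evidence of chronicity and, for G1-G2, markers of kidney damage.</p>

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale numbers linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a variable with a single distinct value")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def gfr_category(egfr):
    """Generalize a continuous eGFR (mL/min/1.73 m^2) into a KDIGO GFR category."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


print(min_max_normalize([2.0, 4.0, 6.0]))           # [0.0, 0.5, 1.0]
print([gfr_category(x) for x in [95, 50, 20, 10]])  # ['G1', 'G3a', 'G4', 'G5']
```

<p>Min-max scaling is sensitive to extreme values, because a single outlier sets the minimum or maximum. For that reason it is usually applied after outliers have been assessed during cleaning.</p>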
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@5f63a78255f74a94a5a3e07027f148fc" data-init="VerticalStudentView" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="vertical" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Data Reduction</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@6c3634ddd72f480a80cd58daa1f2cdaa">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@6c3634ddd72f480a80cd58daa1f2cdaa" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>The final step of data pre-processing is data reduction, i.e., the process of reducing the input data by means of a more effective representation of the dataset without compromising the integrity of the original data. The objective of this step is to provide a version of the dataset on which the subsequent statistical analysis will be more effective.</p>
<p>Consider, for example, using blood pressure as a variable in an analysis. ICU patients generally have their systolic and diastolic blood pressure monitored continuously, which produces hundreds of data points for each of possibly thousands of study patients. Depending on the study aims, it may be useful to reduce these to a single new variable, such as the average mean arterial pressure (MAP) during the first day of ICU admission.</p>
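<p>The reduction described above can be sketched as follows. The sketch assumes only systolic (SBP) and diastolic (DBP) readings are available. It estimates MAP with the common bedside approximation MAP ≈ DBP + (SBP − DBP)/3 and averages the estimates over the first 24 hours. Where MAP is recorded directly, for example from an arterial line, that value would normally be preferred. The function names and readings are hypothetical.</p>

```python
from datetime import datetime, timedelta
from statistics import mean


def estimated_map(sbp, dbp):
    """Estimate mean arterial pressure (mmHg) as DBP + (SBP - DBP) / 3."""
    return dbp + (sbp - dbp) / 3.0


def first_day_mean_map(readings, icu_intime):
    """Reduce (time, SBP, DBP) readings to one value: the mean estimated MAP
    over the first 24 hours after ICU admission (None if no readings)."""
    end = icu_intime + timedelta(hours=24)
    maps = [estimated_map(sbp, dbp)
            for t, sbp, dbp in readings
            if icu_intime <= t < end]
    return mean(maps) if maps else None


intime = datetime(2150, 1, 1, 12)
readings = [
    (intime + timedelta(hours=1), 120, 60),   # MAP 80
    (intime + timedelta(hours=10), 90, 60),   # MAP 70
    (intime + timedelta(hours=30), 150, 90),  # after day 1, ignored
]
print(first_day_mean_map(readings, intime))  # 75.0
```

<p>Hundreds of readings per patient collapse into one number per patient. Returning <code>None</code> when no readings fall in the window keeps the missing value explicit for later handling.</p>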
<p>Lastly, to organize the dataset more effectively, one should also reshape its rows and columns so that it conforms to the following three rules of a "tidy" dataset:</p>
<ol>
<li>Each variable forms a column.</li>
<li>Each observation forms a row.</li>
<li>Each value has its own cell.</li>
</ol>
<p>Fig. 1 - A dataset that conforms to the three rules of a "tidy" dataset. <em>N</em> is the total number of patients and <em>n</em> the total number of variables.</p>
<p><img src="/assets/courseware/v1/1a7d1c96da49440b93f9a2f23f5a8adf/asset-v1:MITx+HST.953x+3T2020+type@asset+block/Table_.jpg" alt="Table" width="722" height="322" /></p>
<p>In the example in Fig. 1, each row holds the information for one patient. However, a dataset may contain several rows per patient, corresponding to different values of the same variable (e.g., a time series of heart rate measurements). Aggregation techniques can be applied to such time series to obtain a single value per variable per patient. Alternatively, depending on the study objectives and design, the time series itself may be the subject of analysis.</p>
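<p>The sketch below reshapes "long" data, with one row per measurement, into the tidy layout of Fig. 1: one row per patient and one column per variable. Repeated values are aggregated into one per cell. The median is used here as an assumed choice because it is less affected by isolated spikes than the mean. The variable names and values are hypothetical.</p>

```python
from collections import defaultdict
from statistics import median


def to_tidy(long_rows, agg=median):
    """Reshape (patient_id, variable, value) rows into one row per patient
    and one column per variable, aggregating repeated values with `agg`.
    Variables never measured for a patient become None (explicitly missing)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for patient_id, variable, value in long_rows:
        grouped[patient_id][variable].append(value)
    variables = sorted({variable for _, variable, _ in long_rows})
    table = []
    for patient_id in sorted(grouped):
        row = {"patient_id": patient_id}
        for variable in variables:
            values = grouped[patient_id].get(variable)
            row[variable] = agg(values) if values else None
        table.append(row)
    return table


long_rows = [
    ("p1", "heart_rate", 88), ("p1", "heart_rate", 92), ("p1", "heart_rate", 120),
    ("p1", "temperature_c", 37.0),
    ("p2", "heart_rate", 70), ("p2", "heart_rate", 74),
]
for row in to_tidy(long_rows):
    print(row)
```

<p>Passing a different <code>agg</code> function, such as <code>statistics.mean</code> or <code>max</code>, changes the summary without changing the reshaping logic.</p>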
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@5314c49bc25c4a2aadf8ccb1cdaae0cd" data-init="VerticalStudentView" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="vertical" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Conclusion</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@58a7fd7df8374b1884736a5f00e389ae">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@58a7fd7df8374b1884736a5f00e389ae" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>Data pre-processing is an important step in preparing raw data for statistical analysis. Several distinct steps are involved in pre-processing raw data as described in this chapter: cleaning, integration, transformation, and reduction. In the case of EHR data, such as that in the MIMIC database, pre-processing often requires some understanding of the clinical context under which data were entered in order to guide these pre-processing choices. The objective of all the steps is to arrive at a "clean" and "tidy" dataset suitable for effective statistical analyses while avoiding the inadvertent introduction of bias into the data.</p>
</div>
</div>
<div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@261098268dec4c42bdb2070d40de7427">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@261098268dec4c42bdb2070d40de7427" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<h3>Key Takeaways</h3>
<p>Data pre-processing:</p>
<ul>
<li>Is an essential step in data science.</li>
<li>Takes a significant amount of effort and time.</li>
<li>Consists of transforming the raw data into a more useful format.</li>
<li>Involves the assessment of missing data and outliers.</li>
<li>Involves data integration, transformation, and reduction.</li>
</ul>
<p>The more attention given to this step, the better the chances of gaining good insights from the data.</p>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@b85cf55e86ed441fa5c223a67e50ec94" data-init="VerticalStudentView" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="vertical" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Data Pre-processing Workshop</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@e3fc31ca99df47cf964b3f46581ce6eb">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@e3fc31ca99df47cf964b3f46581ce6eb" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="html" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<h3>Introduction</h3>
<p>After reading this chapter and working through the exercises in the R markdown file on <a href="https://github.com/criticaldata/hst953-edx/tree/master/2.04.%20Data%20Preprocessing" target="_blank">GitHub</a>, you should be able to answer the multiple-choice questions below.</p>
<p><strong>Important:</strong><br />Please change the URL in line 93 of the R markdown file "DataPreprocessing.rmd" to https://github.com/criticaldata/hst953-edx/blob/master/2.04.%20Data%20Preprocessing/Elixhauser.sql.</p>
</div>
</div>
<div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@85dc2ae9b7b54c32ae3dd7195652cd88">
<div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@85dc2ae9b7b54c32ae3dd7195652cd88" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="problem" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="True" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Problem"}
</script>
<div id="problem_85dc2ae9b7b54c32ae3dd7195652cd88" class="problems-wrapper" role="group"
aria-labelledby="85dc2ae9b7b54c32ae3dd7195652cd88-problem-title"
data-problem-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@85dc2ae9b7b54c32ae3dd7195652cd88" data-url="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@problem+block@85dc2ae9b7b54c32ae3dd7195652cd88/handler/xmodule_handler"
data-problem-score="0"
data-problem-total-possible="1"
data-attempts-used="0"
data-content="
<h3 class="hd hd-3 problem-header" id="85dc2ae9b7b54c32ae3dd7195652cd88-problem-title" aria-describedby="block-v1:MITx+HST.953x+3T2020+type@problem+block@85dc2ae9b7b54c32ae3dd7195652cd88-problem-progress" tabindex="-1">
Question 1
</h3>
<div class="problem-progress" id="block-v1:MITx+HST.953x+3T2020+type@problem+block@85dc2ae9b7b54c32ae3dd7195652cd88-problem-progress"></div>
<div class="problem">
<div>
<div class="wrapper-problem-response" tabindex="-1" aria-label="Question 1" role="group"><div class="choicegroup capa_inputtype" id="inputtype_85dc2ae9b7b54c32ae3dd7195652cd88_2_1">
<fieldset aria-describedby="status_85dc2ae9b7b54c32ae3dd7195652cd88_2_1">
<legend id="85dc2ae9b7b54c32ae3dd7195652cd88_2_1-legend" class="response-fieldset-legend field-group-hd">Suppose you have a clinical dataset of N patients with n clinical features, in which 20% of the patients have missing values. What is the best approach to dealing with these missing values?</legend>
<div class="field">
<input type="radio" name="input_85dc2ae9b7b54c32ae3dd7195652cd88_2_1" id="input_85dc2ae9b7b54c32ae3dd7195652cd88_2_1_choice_0" class="field-input input-radio" value="choice_0"/><label id="85dc2ae9b7b54c32ae3dd7195652cd88_2_1-choice_0-label" for="input_85dc2ae9b7b54c32ae3dd7195652cd88_2_1_choice_0" class="response-label field-label label-inline" aria-describedby="status_85dc2ae9b7b54c32ae3dd7195652cd88_2_1"> Exclude the variables which contain missing values from the dataset.
</label>
</div>
<div class="field">
<input type="radio" name="input_85dc2ae9b7b54c32ae3dd7195652cd88_2_1" id="input_85dc2ae9b7b54c32ae3dd7195652cd88_2_1_choice_1" class="field-input input-radio" value="choice_1"/><label id="85dc2ae9b7b54c32ae3dd7195652cd88_2_1-choice_1-label" for="input_85dc2ae9b7b54c32ae3dd7195652cd88_2_1_choice_1" class="response-label field-label label-inline" aria-describedby="status_85dc2ae9b7b54c32ae3dd7195652cd88_2_1"> Impute the missing data, e.g., using the population mean or linear regression.
</label>
</div>
<div class="field">
<input type="radio" name="input_85dc2ae9b7b54c32ae3dd7195652cd88_2_1" id="input_85dc2ae9b7b54c32ae3dd7195652cd88_2_1_choice_2" class="field-input input-radio" value="choice_2"/><label id="85dc2ae9b7b54c32ae3dd7195652cd88_2_1-choice_2-label" for="input_85dc2ae9b7b54c32ae3dd7195652cd88_2_1_choice_2" class="response-label field-label label-inline" aria-describedby="status_85dc2ae9b7b54c32ae3dd7195652cd88_2_1"> Ignore these patients by excluding them from the dataset.
</label>
</div>
<span id="answer_85dc2ae9b7b54c32ae3dd7195652cd88_2_1"/>
</fieldset>
<div class="indicator-container">
<span class="status unanswered" id="status_85dc2ae9b7b54c32ae3dd7195652cd88_2_1" data-tooltip="Not yet answered.">
<span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/>
</span>
</div>
</div></div>
</div>
<div class="action">
<input type="hidden" name="problem_id" value="Question 1" />
<div class="submit-attempt-container">
<button type="button" class="submit btn-brand" data-submitting="Submitting" data-value="Submit" data-should-enable-submit-button="True" aria-describedby="submission_feedback_85dc2ae9b7b54c32ae3dd7195652cd88" >
<span class="submit-label">Submit</span>
</button>
<div class="submission-feedback" id="submission_feedback_85dc2ae9b7b54c32ae3dd7195652cd88">
<span class="sr">Some problems have options such as save, reset, hints, or show answer. These options follow the Submit button.</span>
</div>
</div>
<div class="problem-action-buttons-wrapper">
</div>
</div>
<div class="notification warning notification-gentle-alert
is-hidden"
tabindex="-1">
<span class="icon fa fa-exclamation-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="85dc2ae9b7b54c32ae3dd7195652cd88-problem-title">
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification warning notification-save
is-hidden"
tabindex="-1">
<span class="icon fa fa-save" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="85dc2ae9b7b54c32ae3dd7195652cd88-problem-title">None
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification general notification-show-answer
is-hidden"
tabindex="-1">
<span class="icon fa fa-info-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="85dc2ae9b7b54c32ae3dd7195652cd88-problem-title">Answers are displayed within the problem
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
</div>
"
data-graded="False">
<p class="loading-spinner">
<i class="fa fa-spinner fa-pulse fa-2x fa-fw"></i>
<span class="sr">Loading…</span>
</p>
</div>
</div>
</div>
<div class="vert vert-2" data-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@5dfe5580aff04955a1a3fa22e820b466">
<div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@5dfe5580aff04955a1a3fa22e820b466" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="problem" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="True" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Problem"}
</script>
<div id="problem_5dfe5580aff04955a1a3fa22e820b466" class="problems-wrapper" role="group"
aria-labelledby="5dfe5580aff04955a1a3fa22e820b466-problem-title"
data-problem-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@5dfe5580aff04955a1a3fa22e820b466" data-url="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@problem+block@5dfe5580aff04955a1a3fa22e820b466/handler/xmodule_handler"
data-problem-score="0"
data-problem-total-possible="1"
data-attempts-used="0"
data-content="
<h3 class="hd hd-3 problem-header" id="5dfe5580aff04955a1a3fa22e820b466-problem-title" aria-describedby="block-v1:MITx+HST.953x+3T2020+type@problem+block@5dfe5580aff04955a1a3fa22e820b466-problem-progress" tabindex="-1">
Question 2
</h3>
<div class="problem-progress" id="block-v1:MITx+HST.953x+3T2020+type@problem+block@5dfe5580aff04955a1a3fa22e820b466-problem-progress"></div>
<div class="problem">
<div>
<div class="wrapper-problem-response" tabindex="-1" aria-label="Question 1" role="group"><div class="choicegroup capa_inputtype" id="inputtype_5dfe5580aff04955a1a3fa22e820b466_2_1">
<fieldset aria-describedby="status_5dfe5580aff04955a1a3fa22e820b466_2_1">
<legend id="5dfe5580aff04955a1a3fa22e820b466_2_1-legend" class="response-fieldset-legend field-group-hd">Suppose you have a clinical dataset of N patients with n features, in which 5% of the patients have values outside the physiological range for a particular feature. What is the best approach to dealing with these abnormal values?</legend>
<div class="field">
<input type="radio" name="input_5dfe5580aff04955a1a3fa22e820b466_2_1" id="input_5dfe5580aff04955a1a3fa22e820b466_2_1_choice_0" class="field-input input-radio" value="choice_0"/><label id="5dfe5580aff04955a1a3fa22e820b466_2_1-choice_0-label" for="input_5dfe5580aff04955a1a3fa22e820b466_2_1_choice_0" class="response-label field-label label-inline" aria-describedby="status_5dfe5580aff04955a1a3fa22e820b466_2_1"> Exclude these patients, since 5% is not a significant proportion of the dataset.
</label>
</div>
<div class="field">
<input type="radio" name="input_5dfe5580aff04955a1a3fa22e820b466_2_1" id="input_5dfe5580aff04955a1a3fa22e820b466_2_1_choice_1" class="field-input input-radio" value="choice_1"/><label id="5dfe5580aff04955a1a3fa22e820b466_2_1-choice_1-label" for="input_5dfe5580aff04955a1a3fa22e820b466_2_1_choice_1" class="response-label field-label label-inline" aria-describedby="status_5dfe5580aff04955a1a3fa22e820b466_2_1"> Exclude the feature which contains these abnormal values.
</label>
</div>
<div class="field">
<input type="radio" name="input_5dfe5580aff04955a1a3fa22e820b466_2_1" id="input_5dfe5580aff04955a1a3fa22e820b466_2_1_choice_2" class="field-input input-radio" value="choice_2"/><label id="5dfe5580aff04955a1a3fa22e820b466_2_1-choice_2-label" for="input_5dfe5580aff04955a1a3fa22e820b466_2_1_choice_2" class="response-label field-label label-inline" aria-describedby="status_5dfe5580aff04955a1a3fa22e820b466_2_1"> Assess whether these values are genuinely abnormal or are outliers (e.g., caused by a sensor measurement error). If they are outliers, correct them, e.g., by replacing them with values fitted using linear regression.
</label>
</div>
<span id="answer_5dfe5580aff04955a1a3fa22e820b466_2_1"/>
</fieldset>
<div class="indicator-container">
<span class="status unanswered" id="status_5dfe5580aff04955a1a3fa22e820b466_2_1" data-tooltip="Not yet answered.">
<span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/>
</span>
</div>
</div></div>
</div>
<div class="action">
<input type="hidden" name="problem_id" value="Question 2" />
<div class="submit-attempt-container">
<button type="button" class="submit btn-brand" data-submitting="Submitting" data-value="Submit" data-should-enable-submit-button="True" aria-describedby="submission_feedback_5dfe5580aff04955a1a3fa22e820b466" >
<span class="submit-label">Submit</span>
</button>
<div class="submission-feedback" id="submission_feedback_5dfe5580aff04955a1a3fa22e820b466">
<span class="sr">Some problems have options such as save, reset, hints, or show answer. These options follow the Submit button.</span>
</div>
</div>
<div class="problem-action-buttons-wrapper">
</div>
</div>
<div class="notification warning notification-gentle-alert
is-hidden"
tabindex="-1">
<span class="icon fa fa-exclamation-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="5dfe5580aff04955a1a3fa22e820b466-problem-title">
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification warning notification-save
is-hidden"
tabindex="-1">
<span class="icon fa fa-save" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="5dfe5580aff04955a1a3fa22e820b466-problem-title">None
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification general notification-show-answer
is-hidden"
tabindex="-1">
<span class="icon fa fa-info-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="5dfe5580aff04955a1a3fa22e820b466-problem-title">Answers are displayed within the problem
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
</div>
"
data-graded="False">
<p class="loading-spinner">
<i class="fa fa-spinner fa-pulse fa-2x fa-fw"></i>
<span class="sr">Loading…</span>
</p>
</div>
</div>
</div>
<div class="vert vert-3" data-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@b826c43b3fe040eabe999a5e24312456">
<div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@b826c43b3fe040eabe999a5e24312456" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="problem" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="True" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Problem"}
</script>
<div id="problem_b826c43b3fe040eabe999a5e24312456" class="problems-wrapper" role="group"
aria-labelledby="b826c43b3fe040eabe999a5e24312456-problem-title"
data-problem-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@b826c43b3fe040eabe999a5e24312456" data-url="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@problem+block@b826c43b3fe040eabe999a5e24312456/handler/xmodule_handler"
data-problem-score="0"
data-problem-total-possible="1"
data-attempts-used="0"
data-content="
<h3 class="hd hd-3 problem-header" id="b826c43b3fe040eabe999a5e24312456-problem-title" aria-describedby="block-v1:MITx+HST.953x+3T2020+type@problem+block@b826c43b3fe040eabe999a5e24312456-problem-progress" tabindex="-1">
Question 3
</h3>
<div class="problem-progress" id="block-v1:MITx+HST.953x+3T2020+type@problem+block@b826c43b3fe040eabe999a5e24312456-problem-progress"></div>
<div class="problem">
<div>
<div class="wrapper-problem-response" tabindex="-1" aria-label="Question 1" role="group"><p>Select the correct sentence:</p>
<div class="choicegroup capa_inputtype" id="inputtype_b826c43b3fe040eabe999a5e24312456_2_1">
<fieldset aria-describedby="status_b826c43b3fe040eabe999a5e24312456_2_1">
<div class="field">
<input type="radio" name="input_b826c43b3fe040eabe999a5e24312456_2_1" id="input_b826c43b3fe040eabe999a5e24312456_2_1_choice_0" class="field-input input-radio" value="choice_0"/><label id="b826c43b3fe040eabe999a5e24312456_2_1-choice_0-label" for="input_b826c43b3fe040eabe999a5e24312456_2_1_choice_0" class="response-label field-label label-inline" aria-describedby="status_b826c43b3fe040eabe999a5e24312456_2_1"> Data transformation involves the steps of data normalization, data aggregation and data reduction.
</label>
</div>
<div class="field">
<input type="radio" name="input_b826c43b3fe040eabe999a5e24312456_2_1" id="input_b826c43b3fe040eabe999a5e24312456_2_1_choice_1" class="field-input input-radio" value="choice_1"/><label id="b826c43b3fe040eabe999a5e24312456_2_1-choice_1-label" for="input_b826c43b3fe040eabe999a5e24312456_2_1_choice_1" class="response-label field-label label-inline" aria-describedby="status_b826c43b3fe040eabe999a5e24312456_2_1"> Data transformation can involve the steps of data normalization, data aggregation and data generalization.
</label>
</div>
<div class="field">
<input type="radio" name="input_b826c43b3fe040eabe999a5e24312456_2_1" id="input_b826c43b3fe040eabe999a5e24312456_2_1_choice_2" class="field-input input-radio" value="choice_2"/><label id="b826c43b3fe040eabe999a5e24312456_2_1-choice_2-label" for="input_b826c43b3fe040eabe999a5e24312456_2_1_choice_2" class="response-label field-label label-inline" aria-describedby="status_b826c43b3fe040eabe999a5e24312456_2_1"> Data normalization is a possible approach to use in the data integration step.
</label>
</div>
<span id="answer_b826c43b3fe040eabe999a5e24312456_2_1"/>
</fieldset>
<div class="indicator-container">
<span class="status unanswered" id="status_b826c43b3fe040eabe999a5e24312456_2_1" data-tooltip="Not yet answered.">
<span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/>
</span>
</div>
</div></div>
</div>
<div class="action">
<input type="hidden" name="problem_id" value="Question 3" />
<div class="submit-attempt-container">
<button type="button" class="submit btn-brand" data-submitting="Submitting" data-value="Submit" data-should-enable-submit-button="True" aria-describedby="submission_feedback_b826c43b3fe040eabe999a5e24312456" >
<span class="submit-label">Submit</span>
</button>
<div class="submission-feedback" id="submission_feedback_b826c43b3fe040eabe999a5e24312456">
<span class="sr">Some problems have options such as save, reset, hints, or show answer. These options follow the Submit button.</span>
</div>
</div>
<div class="problem-action-buttons-wrapper">
</div>
</div>
<div class="notification warning notification-gentle-alert
is-hidden"
tabindex="-1">
<span class="icon fa fa-exclamation-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="b826c43b3fe040eabe999a5e24312456-problem-title">
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification warning notification-save
is-hidden"
tabindex="-1">
<span class="icon fa fa-save" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="b826c43b3fe040eabe999a5e24312456-problem-title">
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification general notification-show-answer
is-hidden"
tabindex="-1">
<span class="icon fa fa-info-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="b826c43b3fe040eabe999a5e24312456-problem-title">Answers are displayed within the problem
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
</div>
"
data-graded="False">
<p class="loading-spinner">
<i class="fa fa-spinner fa-pulse fa-2x fa-fw"></i>
<span class="sr">Loading…</span>
</p>
</div>
</div>
</div>
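<p>The transformation steps named in the question above (normalization, aggregation and generalization) can be illustrated with a small, standard-library-only Python sketch. It is not the course's own code: the record fields (<code>patient_id</code>, <code>age</code>, <code>heart_rate</code>), the helper names and the age cut-offs are hypothetical. The sketch rescales values to [0, 1] with min-max normalization, averages repeated measurements per patient, and replaces exact ages with broader age bands.</p>

```python
# Sketch of three data-transformation steps on hypothetical patient records.
# Field names, helper names and age cut-offs are illustrative assumptions.


def min_max_normalize(values):
    """Normalization: rescale numeric values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread, so map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def aggregate_mean(records, key, field):
    """Aggregation: summarize all rows sharing the same key by their mean."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record[field])
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}


def generalize_age(age):
    """Generalization: replace an exact age with a broader age band."""
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "older adult"


measurements = [
    {"patient_id": 1, "age": 34, "heart_rate": 72},
    {"patient_id": 1, "age": 34, "heart_rate": 80},
    {"patient_id": 2, "age": 71, "heart_rate": 90},
    {"patient_id": 3, "age": 12, "heart_rate": 100},
]

mean_hr = aggregate_mean(measurements, "patient_id", "heart_rate")
print(mean_hr)  # {1: 76.0, 2: 90.0, 3: 100.0}
print(min_max_normalize(list(mean_hr.values())))
print([generalize_age(r["age"]) for r in measurements])
# ['adult', 'adult', 'older adult', 'child']
```

<p>Aggregation reduces the number of rows (one per patient here), while normalization and generalization change how each value is expressed without changing the row count.</p>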
<div class="vert vert-4" data-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@b5250ed7fcfe4bdfbe6ba8d3d7f359be">
<div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@b5250ed7fcfe4bdfbe6ba8d3d7f359be" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="problem" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="True" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Problem"}
</script>
<div id="problem_b5250ed7fcfe4bdfbe6ba8d3d7f359be" class="problems-wrapper" role="group"
aria-labelledby="b5250ed7fcfe4bdfbe6ba8d3d7f359be-problem-title"
data-problem-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@b5250ed7fcfe4bdfbe6ba8d3d7f359be" data-url="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@problem+block@b5250ed7fcfe4bdfbe6ba8d3d7f359be/handler/xmodule_handler"
data-problem-score="0"
data-problem-total-possible="1"
data-attempts-used="0"
data-content="
<h3 class="hd hd-3 problem-header" id="b5250ed7fcfe4bdfbe6ba8d3d7f359be-problem-title" aria-describedby="block-v1:MITx+HST.953x+3T2020+type@problem+block@b5250ed7fcfe4bdfbe6ba8d3d7f359be-problem-progress" tabindex="-1">
Question 4
</h3>
<div class="problem-progress" id="block-v1:MITx+HST.953x+3T2020+type@problem+block@b5250ed7fcfe4bdfbe6ba8d3d7f359be-problem-progress"></div>
<div class="problem">
<div>
<div class="wrapper-problem-response" tabindex="-1" aria-label="Question 1" role="group"><p>Suppose you have a clinical dataset that contains categorical variables and you are in the data transformation step. Select the correct option:</p>
<div class="choicegroup capa_inputtype" id="inputtype_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1">
<fieldset aria-describedby="status_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1">
<div class="field">
<input type="radio" name="input_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1" id="input_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1_choice_0" class="field-input input-radio" value="choice_0"/><label id="b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1-choice_0-label" for="input_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1_choice_0" class="response-label field-label label-inline" aria-describedby="status_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1"> Data normalization can be performed to scale these variables.
</label>
</div>
<div class="field">
<input type="radio" name="input_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1" id="input_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1_choice_1" class="field-input input-radio" value="choice_1"/><label id="b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1-choice_1-label" for="input_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1_choice_1" class="response-label field-label label-inline" aria-describedby="status_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1"> Data aggregation can be performed on one or more of these variables to merge two or more categories into one.
</label>
</div>
<div class="field">
<input type="radio" name="input_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1" id="input_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1_choice_2" class="field-input input-radio" value="choice_2"/><label id="b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1-choice_2-label" for="input_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1_choice_2" class="response-label field-label label-inline" aria-describedby="status_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1"> These variables can be converted into numerical variables and normalized within a specific interval, e.g., [0, 1].
</label>
</div>
<span id="answer_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1"/>
</fieldset>
<div class="indicator-container">
<span class="status unanswered" id="status_b5250ed7fcfe4bdfbe6ba8d3d7f359be_2_1" data-tooltip="Not yet answered.">
<span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/>
</span>
</div>
</div></div>
</div>
<div class="action">
<input type="hidden" name="problem_id" value="Question 4" />
<div class="submit-attempt-container">
<button type="button" class="submit btn-brand" data-submitting="Submitting" data-value="Submit" data-should-enable-submit-button="True" aria-describedby="submission_feedback_b5250ed7fcfe4bdfbe6ba8d3d7f359be" >
<span class="submit-label">Submit</span>
</button>
<div class="submission-feedback" id="submission_feedback_b5250ed7fcfe4bdfbe6ba8d3d7f359be">
<span class="sr">Some problems have options such as save, reset, hints, or show answer. These options follow the Submit button.</span>
</div>
</div>
<div class="problem-action-buttons-wrapper">
</div>
</div>
<div class="notification warning notification-gentle-alert
is-hidden"
tabindex="-1">
<span class="icon fa fa-exclamation-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="b5250ed7fcfe4bdfbe6ba8d3d7f359be-problem-title">
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification warning notification-save
is-hidden"
tabindex="-1">
<span class="icon fa fa-save" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="b5250ed7fcfe4bdfbe6ba8d3d7f359be-problem-title">
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification general notification-show-answer
is-hidden"
tabindex="-1">
<span class="icon fa fa-info-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="b5250ed7fcfe4bdfbe6ba8d3d7f359be-problem-title">Answers are displayed within the problem
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
</div>
"
data-graded="False">
<p class="loading-spinner">
<i class="fa fa-spinner fa-pulse fa-2x fa-fw"></i>
<span class="sr">Loading…</span>
</p>
</div>
</div>
</div>
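<p>For categorical variables, the operations mentioned in the question above can be sketched in a few lines of standard-library Python: collapsing fine-grained categories into broader groups (aggregation), and replacing ordered categories with ranks rescaled to [0, 1]. The category labels, mappings and function names below are hypothetical examples, not course data.</p>

```python
# Hypothetical categorical variables from a clinical dataset; the labels and
# mappings are illustrative assumptions.

# 1) Aggregation: collapse several fine-grained categories into one group.
SMOKING_GROUPS = {
    "current daily": "current",
    "current occasional": "current",
    "former": "former",
    "never": "never",
}


def collapse(values, mapping):
    """Map each category onto its broader group label."""
    return [mapping[v] for v in values]


# 2) Ordinal encoding: only meaningful when the categories have a natural order.
SEVERITY_ORDER = ["mild", "moderate", "severe"]


def ordinal_to_unit_interval(values, order):
    """Replace ordered categories with their ranks, rescaled to [0, 1]."""
    rank = {label: i for i, label in enumerate(order)}
    top = max(len(order) - 1, 1)  # guard against a single-category order
    return [rank[v] / top for v in values]


smoking = ["current daily", "never", "current occasional", "former"]
severity = ["mild", "severe", "moderate"]

print(collapse(smoking, SMOKING_GROUPS))
# ['current', 'never', 'current', 'former']
print(ordinal_to_unit_interval(severity, SEVERITY_ORDER))
# [0.0, 1.0, 0.5]
```

<p>Rank-based encoding assumes the categories are ordered. For nominal categories (e.g., blood type), numeric codes would impose an order that does not exist, and one indicator (0/1) column per category is a common alternative.</p>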
<div class="vert vert-5" data-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@dd4057c5a1964d4c92d6b1f1dfc0c3ea">
<div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@dd4057c5a1964d4c92d6b1f1dfc0c3ea" data-init="XBlockToXModuleShim" data-graded="False" data-request-token="0110582843bc11ef8f100e08775edbcd" data-block-type="problem" data-runtime-version="1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-has-score="True" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Problem"}
</script>
<div id="problem_dd4057c5a1964d4c92d6b1f1dfc0c3ea" class="problems-wrapper" role="group"
aria-labelledby="dd4057c5a1964d4c92d6b1f1dfc0c3ea-problem-title"
data-problem-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@dd4057c5a1964d4c92d6b1f1dfc0c3ea" data-url="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@problem+block@dd4057c5a1964d4c92d6b1f1dfc0c3ea/handler/xmodule_handler"
data-problem-score="0"
data-problem-total-possible="1"
data-attempts-used="0"
data-content="
<h3 class="hd hd-3 problem-header" id="dd4057c5a1964d4c92d6b1f1dfc0c3ea-problem-title" aria-describedby="block-v1:MITx+HST.953x+3T2020+type@problem+block@dd4057c5a1964d4c92d6b1f1dfc0c3ea-problem-progress" tabindex="-1">
Question 5
</h3>
<div class="problem-progress" id="block-v1:MITx+HST.953x+3T2020+type@problem+block@dd4057c5a1964d4c92d6b1f1dfc0c3ea-problem-progress"></div>
<div class="problem">
<div>
<div class="wrapper-problem-response" tabindex="-1" aria-label="Question 1" role="group"><div class="choicegroup capa_inputtype" id="inputtype_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1">
<fieldset aria-describedby="status_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1">
<legend id="dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1-legend" class="response-fieldset-legend field-group-hd">Keeping in mind the three rules of a tidy dataset, and treating each patient as an object in the dataset, select the correct option:</legend>
<div class="field">
<input type="radio" name="input_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1" id="input_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1_choice_0" class="field-input input-radio" value="choice_0"/><label id="dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1-choice_0-label" for="input_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1_choice_0" class="response-label field-label label-inline" aria-describedby="status_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1"> A patient can correspond to one row and multiple columns, with a single value in each cell.
</label>
</div>
<div class="field">
<input type="radio" name="input_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1" id="input_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1_choice_1" class="field-input input-radio" value="choice_1"/><label id="dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1-choice_1-label" for="input_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1_choice_1" class="response-label field-label label-inline" aria-describedby="status_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1"> A patient can correspond to one row and one column, with several values in each cell.
</label>
</div>
<div class="field">
<input type="radio" name="input_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1" id="input_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1_choice_2" class="field-input input-radio" value="choice_2"/><label id="dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1-choice_2-label" for="input_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1_choice_2" class="response-label field-label label-inline" aria-describedby="status_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1"> A patient can correspond to multiple rows and multiple columns, with several values in each cell.
</label>
</div>
<span id="answer_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1"/>
</fieldset>
<div class="indicator-container">
<span class="status unanswered" id="status_dd4057c5a1964d4c92d6b1f1dfc0c3ea_2_1" data-tooltip="Not yet answered.">
<span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/>
</span>
</div>
</div></div>
</div>
<div class="action">
<input type="hidden" name="problem_id" value="Question 5" />
<div class="submit-attempt-container">
<button type="button" class="submit btn-brand" data-submitting="Submitting" data-value="Submit" data-should-enable-submit-button="True" aria-describedby="submission_feedback_dd4057c5a1964d4c92d6b1f1dfc0c3ea" >
<span class="submit-label">Submit</span>
</button>
<div class="submission-feedback" id="submission_feedback_dd4057c5a1964d4c92d6b1f1dfc0c3ea">
<span class="sr">Some problems have options such as save, reset, hints, or show answer. These options follow the Submit button.</span>
</div>
</div>
<div class="problem-action-buttons-wrapper">
</div>
</div>
<div class="notification warning notification-gentle-alert
is-hidden"
tabindex="-1">
<span class="icon fa fa-exclamation-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="dd4057c5a1964d4c92d6b1f1dfc0c3ea-problem-title">
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification warning notification-save
is-hidden"
tabindex="-1">
<span class="icon fa fa-save" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="dd4057c5a1964d4c92d6b1f1dfc0c3ea-problem-title">
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification general notification-show-answer
is-hidden"
tabindex="-1">
<span class="icon fa fa-info-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="dd4057c5a1964d4c92d6b1f1dfc0c3ea-problem-title">Answers are displayed within the problem
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
</div>
"
data-graded="False">
<p class="loading-spinner">
<i class="fa fa-spinner fa-pulse fa-2x fa-fw"></i>
<span class="sr">Loading…</span>
</p>
</div>
</div>
</div>
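<p>One way to picture the tidy-data rules (each variable in its own column, each observation in its own row, each value in its own cell) is to reshape a messy patient table so that each patient occupies exactly one row. This standard-library sketch assumes a hypothetical table in which blood pressure is stored as a compound "systolic/diastolic" string and several diagnoses share one cell; the column names and helper functions are illustrative only.</p>

```python
# Minimal sketch: reshape a hypothetical messy patient table into tidy form,
# with one row per patient and a single value in every cell.

messy = [
    {"patient_id": 1, "bp": "120/80", "diagnoses": "sepsis;pneumonia"},
    {"patient_id": 2, "bp": "135/90", "diagnoses": "asthma"},
]


def split_blood_pressure(rows):
    """Split the compound 'bp' cell into separate systolic/diastolic columns."""
    out = []
    for row in rows:
        systolic, diastolic = (int(x) for x in row["bp"].split("/"))
        new_row = {k: v for k, v in row.items() if k != "bp"}
        new_row["systolic_bp"] = systolic
        new_row["diastolic_bp"] = diastolic
        out.append(new_row)
    return out


def diagnoses_to_columns(rows, sep=";"):
    """Replace a multi-valued 'diagnoses' cell with one 0/1 column per diagnosis."""
    all_dx = sorted({d for r in rows for d in r["diagnoses"].split(sep)})
    out = []
    for row in rows:
        present = set(row["diagnoses"].split(sep))
        new_row = {k: v for k, v in row.items() if k != "diagnoses"}
        for dx in all_dx:
            new_row[f"dx_{dx}"] = int(dx in present)
        out.append(new_row)
    return out


tidy = diagnoses_to_columns(split_blood_pressure(messy))
for row in tidy:
    print(row)
```

<p>Indicator columns keep one row per patient. An equally tidy alternative is a separate table with one (patient_id, diagnosis) row per diagnosis, in line with the principle that each type of observational unit gets its own table.</p>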
</div>
</div>
© All Rights Reserved