Collaborative Data Science for Healthcare

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@7138d9fa9e824a13a190ca72ffc895cd" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Introduction</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@32acaf370e6d463287f3ec8cb55c5c9a"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@32acaf370e6d463287f3ec8cb55c5c9a" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3>Introduction</h3> <p>This chapter is the continuation of the Data Analysis Introduction. 
It focuses exclusively on linear regression, one of the most widely used modeling techniques in health data analysis.</p> </div> </div> <div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@b7d73103bf834bad9422d0bb516caa32"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@b7d73103bf834bad9422d0bb516caa32" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3>Learning objectives</h3> <ol style="margin-top: 0pt; margin-bottom: 0pt;"> <li>Identify data types and define study objectives in order to choose an appropriate analysis technique.</li> <li>Carry out one of the most common and simplest data analysis methods for health data.</li> <li>Present and interpret the results from linear regression models.</li> </ol> <h3>Prerequisites</h3> <p>We have prepared interactive material so that you can run the exercises as you read the sections. However, there are some prerequisites.</p> <p>Basic programming knowledge is required for the exercises in this section. The exercises were developed in R, so it is important to have R installed.</p> <p>We also recommend installing RStudio. 
If you don't have RStudio installed on your computer, please <a href="https://www.rstudio.com/products/rstudio/download/">download it</a> and follow the installation instructions.</p> <p>The main purpose of this course is not to teach you how to program, so the material is designed in such a way that you don't need to write code yourself. However, you will need to learn to program if you intend to work in health data analysis or data science at a more advanced level. If you want to learn more about basic programming, please refer to other courses available through the edX platform.</p> </div> </div> <div class="vert vert-2" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@b377c1ab69d048069cff3c7a4816e4bb"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@b377c1ab69d048069cff3c7a4816e4bb" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3>Credits</h3> <p>Aldo Arévalo, Marie Charpignon, Mathew Samuel, Philips Samuel.</p> <p>Textbook chapter: Jesse D. Raffa, Marzyeh Ghassemi, Tristan Naumann, Mengling Feng and Douglas Hsu</p> <p>Dataset used in the hands-on example: Hsu DJ, Feng M, Kothari R, Zhou H, Chen KP, Celi LA. <em><a href="https://www.ncbi.nlm.nih.gov/pubmed/26270005">The association between indwelling arterial catheters and mortality in hemodynamically stable patients with respiratory failure: A propensity score analysis.</a></em> Chest, 148(6):1470–1476, Aug. 
2015.</p> <p>Videos: The first video in this unit is presented by Ned McCague, subsequent videos are presented by Jesse Raffa</p> </div> </div> </div> </div>

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@90af421d030f4c90b2f9726930cf9c60" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Defining Study Objectives</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@144835d498f24b5891da15a58cea4a24"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@144835d498f24b5891da15a58cea4a24" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3>Defining Study Objectives</h3> <p>Identifying a study objective is an important aspect of planning data analysis for health data. 
The study objective should clearly identify the study population, the outcome of interest, the covariate(s) of interest, the relevant time points for the study, and what you would like to do with these items.</p> <p>An example of a clearly stated study objective would be:</p> <p>"To estimate the reduction in 28-day mortality associated with vasopressor use during the first three days from admission to the ICU in MIMIC-III".</p> <p>An example of a vague and difficult-to-execute study objective would be:</p> <p>"To predict mortality in ICU patients"</p> <p>The former provides a much clearer path for the data scientist to perform the necessary analysis because it identifies the target population (patients admitted to the ICU in MIMIC-III), the outcome (28-day mortality), the covariate of interest (vasopressor use in the first three days of an ICU admission) and the time span (28 days for the outcome, within the first three days for the covariate). Remember, the less complex, the better.</p> <p>To reinforce what has been discussed in previous units, it is important to first differentiate between <strong>outcomes</strong> and <strong>covariates</strong>. <strong>Outcomes,</strong> also referred to as response or dependent variables, are what the study aims to investigate or predict. In the previous example, the outcome is 28-day mortality. <strong>Covariates</strong> are the variables whose effect on the outcome you would like to study, or that you believe may have some effect on the outcome.</p> </div> </div> </div> </div>

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@a6450e5ab9c0418192070fb2876b742c" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Defining Study Objectives: Exercise</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@f56805ab9e5c4613b019c7a95bdfabb6"> <div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-runtime-class="LmsRuntime" data-block-type="problem" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@f56805ab9e5c4613b019c7a95bdfabb6" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="True" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "Problem"} </script> <div id="problem_f56805ab9e5c4613b019c7a95bdfabb6" class="problems-wrapper" role="group" aria-labelledby="f56805ab9e5c4613b019c7a95bdfabb6-problem-title" data-problem-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@f56805ab9e5c4613b019c7a95bdfabb6" data-url="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@problem+block@f56805ab9e5c4613b019c7a95bdfabb6/handler/xmodule_handler" data-problem-score="0" data-problem-total-possible="1" data-attempts-used="0" data-graded="False"> <h3 class="hd hd-3 problem-header" id="f56805ab9e5c4613b019c7a95bdfabb6-problem-title">Exercise</h3> <p>Among the sentences given below, select the most concise study objective(s). There can be more than one correct answer.</p> <ul> <li>Identification and segmentation of patients admitted to the ICU in MIMIC-III.</li> <li>Determination of ongoing gastro-intestinal hemorrhage for adult patients (&gt;18 years old) during the first 24 hours of being admitted to the ICU in MIMIC-III.</li> <li>Prediction of complications in critically-ill patients.</li> <li>Determining which patients are at risk of developing sepsis or septic shock.</li> <li>To estimate the absolute weight loss after 1 year and 2 years on adult patients that had sleeve gastrectomy in the BOLD database.</li> </ul> </div> </div> </div> </div> </div>

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@a4408588eb1d44b59698a4e8988a4cd4" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Introduction to Data Analysis: Linear Regression</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@ea059dfe655a46ef987989cad17bf5aa"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@ea059dfe655a46ef987989cad17bf5aa" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3>Introduction to Linear Regression</h3> <p>To go through the examples and exercises, we will describe a case study to explore data analysis approaches in health data. The case study data originates from a study examining the effect of indwelling arterial catheters (IAC) on 28-day mortality in the intensive care unit (ICU) in patients who were mechanically ventilated during the first day of ICU admission. The data comes from MIMIC-II v2.6.</p> <p>The MIMIC-II database (version 2.4) is described in: M. Saeed, M. Villarroel, A.T. Reisner, G. Clifford, L. Lehman, G.B. Moody, T. Heldt, T.H. Kyaw, B.E. Moody, R.G. Mark. 
<em><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3124312/">Multiparameter intelligent monitoring in intensive care II (MIMIC-II): A public-access ICU database</a></em>. Critical Care Medicine 39(5):952-960 (2011 May)</p> <p>At this point, you are ready to do data analysis (the data extraction and cleaning have already been completed), and we will be using a comma-separated (.csv) file generated after this process, which you can load directly from <em>PhysioNet</em>.</p> <p>We will now move on to import the provided <code>*.rmd</code> file for this chapter in RStudio. The file will help you run the section exercises on your own.</p> </div> </div> </div> </div>

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@344385f5ff1a45299a00495898ea6bce" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Importing and Running the *.rmd file</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@229a86ae2f134deab2d3dd5c4bdeb67a"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@229a86ae2f134deab2d3dd5c4bdeb67a" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3>Importing and Running the *.rmd file</h3> <p>The R Markdown file can be found <a href="/assets/courseware/v1/f976638d9efaf524dadc0438ba0ff3ef/asset-v1:MITx+HST.953x+3T2020+type@asset+block/2.08_Linear_Regression.Rmd">here</a>.</p> <p>First, let's import the dataset into RStudio:</p> <pre style="text-align: center;">url <- "https://archive.physionet.org/physiobank/database/mimic2-iaccd/full_cohort_data.csv"</pre> <pre style="text-align: center;">dat <- read.csv(url)</pre> <p>Alternatively, the dataset can be downloaded <a href="/assets/courseware/v1/e70cebeaf5bf7b9fdbb5584bf99e6562/asset-v1:MITx+HST.953x+3T2020+type@asset+block/full_cohort_data.csv">here</a>.</p> <p>After loading the data, you should have imported a 
data frame with 1776 observations (rows) and 46 variables (columns). The variable names in the header of this file can be listed using the <span style="font-family: terminal, monaco; color: #3366ff;">names</span> function in R as follows:</p> <pre style="text-align: center;">names(dat)</pre> <p>The primary focus of the study was on the effect that IAC placement (<span style="font-family: terminal, monaco;">aline_flg</span>) has on 28-day mortality (<span style="font-family: terminal, monaco;">day_28_flg</span>). After we have covered the basics, we will identify a research objective and an appropriate analysis technique, and execute an abbreviated analysis to illustrate how to use these techniques to address real scientific questions. Before we do this, we will introduce four powerful data analysis methods frequently used in the analysis of health data. We will use examples from the case study dataset to introduce these concepts and will return to the question of the effect of IAC on mortality toward the end of this chapter.</p> <p>Before moving on to the exercises, please watch the video that follows this section.</p> </div> </div> </div> </div>
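<p>As a quick sanity check after the import described above, the sketch below inspects a data frame's dimensions and variable names and cross-tabulates the exposure against the outcome. To stay runnable without a network connection it uses a tiny stand-in data frame, <code>dat_demo</code>, whose values are invented for illustration; only the column names <code>aline_flg</code> and <code>day_28_flg</code> come from the case study. The same calls should work on the real <code>dat</code>.</p>

```r
# Stand-in for the real import. The values are hypothetical; only the
# column names aline_flg and day_28_flg come from the case study.
dat_demo <- data.frame(
  aline_flg  = c(1, 0, 1, 0, 1, 0),
  day_28_flg = c(0, 0, 1, 0, 0, 1)
)

# The same checks you would run on `dat` after read.csv(url):
dim(dat_demo)    # number of rows (observations) and columns (variables)
names(dat_demo)  # variable names
str(dat_demo)    # type and first few values of each variable

# Cross-tabulate the exposure (IAC placement) against the outcome
# (28-day mortality) to see how many patients fall in each cell.
tab <- table(aline = dat_demo$aline_flg, died_28d = dat_demo$day_28_flg)
tab
```

<p>On the full dataset, <code>dim(dat)</code> should report the 1776 rows and 46 columns mentioned above.</p>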

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@e6389dad9d6a474c89bede7d0f98f065" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Introduction to Regression Analysis: Part I</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@video+block@a1f6a76f9087433d990f64d2fccedde8"> <div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-runtime-class="LmsRuntime" data-block-type="video" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@video+block@a1f6a76f9087433d990f64d2fccedde8" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "Video"} </script> <h3 class="hd hd-2">Regression Overview</h3> <p>[Video: Regression Overview (YouTube ID ilpYSbddG2A)]</p> </div> </div> </div> </div>

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@db95d9854aac4e73b5931600d6a6d4c8" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Introduction to Regression Analysis: Part II</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@video+block@e5a516a3305140c7aebaecca710d0232"> <div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-runtime-class="LmsRuntime" data-block-type="video" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@video+block@e5a516a3305140c7aebaecca710d0232" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "Video"} </script> <h3 class="hd hd-2">Regression Overview</h3> <p>[Video: Regression Overview (YouTube ID YmEsSCMjBOg)]</p> </div> </div> </div> </div>

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@06a6b381550e41e79a74710241e9c761" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Initial Analysis</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@f6bf8765894c4e1888046cab6f48d784"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@f6bf8765894c4e1888046cab6f48d784" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3 style="text-align: justify;"></h3> <p style="text-align: justify;">Please go back to the *.rmd file to follow the explanation of this example.</p> <p style="text-align: justify;">In the simplest scenario, we try to relate one <strong>continuous outcome</strong>, $y$, to a <strong>single continuous covariate</strong>, $x$, by trying to find values for $\beta_{0}$ and $\beta_{1}$ so that the following equation:</p> <p>\[y = \beta_{0} + \beta_{1}x \]</p> <p style="text-align: justify;">fits the data "optimally". The optimal values are $\hat\beta_{0}$ and $\hat\beta_{1}$ to distinguish them from the true values of $\beta_{0}$ and $\beta_{1}$, which are often unknown before. 
Fitting the data "optimally" means minimizing the squared distance between the fitted line and each observed data point, summed over all data points. This quantity is known as the sum of squared errors, or as the mean squared error <span style="font-size: 1em;">when divided by the number of observations.</span></p> <p style="text-align: justify;">It is always a good idea to visualize the data when you can, because this lets you check whether the subsequent analysis agrees with what you can see with your own eyes. In this case, a scatter plot can be produced using the plot function (check the *.rmd file for this section):</p> <pre style="text-align: center;">plot(dat$pco2_first,dat$tco2_first,xlab="PCO2", ylab="TCO2",pch=19,xlim=c(0,175))</pre> <p style="text-align: justify;">This command produces a scatterplot as in your *.rmd file. In the figure, we see a scatterplot of TCO2 (y: outcome) levels versus PCO2 (x: covariate) levels. We can clearly see that as PCO2 levels increase, the TCO2 levels also increase. This suggests that we may be able to fit a linear regression model that predicts TCO2 from PCO2. 
</p> </div> </div> <div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@4b6c784ba0ff422ab205c254d3119009"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@4b6c784ba0ff422ab205c254d3119009" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3 style="text-align: justify;"></h3> <p style="text-align: justify;">Finding the line of best fit for the scatterplot in R is relatively straightforward:</p> <pre style="text-align: center;">co2.lm <- lm(tco2_first~pco2_first,data=dat)</pre> <p style="text-align: justify;">Dissecting this command from left to right: the <span style="font-family: terminal, monaco;">co2.lm <-</span> part assigns the right part of the command to a new variable or object called <span style="font-family: terminal, monaco;">co2.lm</span> which contains information relevant to our linear regression model. The right side of this command runs the lm function in R . <span style="color: #3366ff; font-family: terminal, monaco;">lm </span>is a powerful function in R that fits linear models. As with any command in R , you can find additional helpful information by running <span style="color: #3366ff; font-family: terminal, monaco;">?lm</span> from the R command prompt. The basic <span style="color: #3366ff; font-family: terminal, monaco;">lm</span> command has two parts. The first is the formula which has the general syntax "outcome ~ covariates". 
Here, our outcome variable is called <span style="font-family: terminal, monaco;">tco2_first,</span> and we are just fitting one covariate, <span style="font-family: terminal, monaco;">pco2_first</span> , so our formula is <span style="font-family: terminal, monaco;">tco2_first ~ pco2_first</span> . The second argument is separated by a comma and is specifying the data frame to use. In our case, the data frame is called dat , so we pass <span style="font-family: terminal, monaco;"><span style="color: #3366ff;">data =</span> dat</span> , noting that both <span style="font-family: terminal, monaco;">tco2_first</span> and <span style="font-family: terminal, monaco;">pco2_first</span> are columns in the dataframe <span style="font-family: terminal, monaco; color: #3366ff;">dat</span>. The overall procedure of specifying a model formula (<span style="font-family: terminal, monaco;">tco2_first ~ pco2_first</span>), a data frame (<span style="font-family: terminal, monaco;"><span style="color: #3366ff;">data =</span> dat</span>) and passing it an appropriate R function (<span style="color: #3366ff; font-family: terminal, monaco;">lm</span>) will be used throughout this chapter and is the foundation for many types of statistical modeling in R .</p> <p style="text-align: justify;"></p> </div> </div> <div class="vert vert-2" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@57083b00ef0b4c01acfbdfe49896bd7f"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@57083b00ef0b4c01acfbdfe49896bd7f" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <p 
style="text-align: justify;">We would like to see some information about the model we just fit, and a good way of doing this is often to run the summary command on the object we created:</p> <pre style="line-height: 1.2; text-align: center;">summary(co2.lm)</pre> <p style="text-align: justify;">What can you read in your console? The output displays information about the <span style="font-family: terminal, monaco; color: #3366ff;">lm</span> object we created in the previous command (follow your *.rmd file)<span style="font-size: 1em;">.</span></p> <p style="text-align: justify;">The first part recalls the model we fit, which is useful when we have fit many models and are trying to compare them. The second part lists some summary information about what are called <strong>residuals</strong>. Next are the coefficient estimates; these are $\hat\beta_{0}$, labeled (Intercept), and $\hat\beta_{1}$, labeled <span style="font-family: terminal, monaco;">pco2_first</span>, the parameters of the best-fit line that we are trying to estimate. This output is telling us that the best-fit equation for the data is:</p> <pre style="line-height: 1.2; text-align: center;">tco2_first = 16.21 + 0.189×pco2_first</pre> <p style="text-align: justify;">These two quantities have important interpretations. The estimated intercept ($\hat\beta_{0}$) tells us what TCO2 level we would predict for an individual with a PCO2 level of 0. This is the mathematical interpretation, and often this quantity has limited practical use. The estimated slope ($\hat\beta_{1}$), on the other hand, can be interpreted as how much the predicted value of TCO2 changes for every unit increase in PCO2. In this case, we estimate that TCO2 goes up about 0.189 mmol/L (or mM) for every 1 mmHg increase in PCO2. Each coefficient estimate has a corresponding Std. Error (standard error). This is a measure of how certain we are about the estimate. 
If the standard error is large relative to the coefficient, then we are less certain about our estimate. Many things can affect the standard error, including the study sample size. The next column in this table is the t value, which is simply the coefficient estimate divided by the standard error. This is followed by $Pr(>|t|)$, which is also known as the <em>p</em>-value. The last two quantities are relevant to an area of statistics called hypothesis testing, which we will cover briefly next.</p> </div> </div> </div> </div>
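<p style="text-align: justify;">The relationship between the columns of the coefficient table can be checked directly. The following is a minimal sketch on simulated data, not the course data set: the data frame <span style="font-family: terminal, monaco;">sim</span>, its columns <span style="font-family: terminal, monaco;">pco2</span> and <span style="font-family: terminal, monaco;">tco2</span>, the helper <span style="font-family: terminal, monaco;">sse</span>, and the simulated coefficient values are all assumptions made for illustration.</p>

```r
# Simulated stand-in for the course data; all numbers here are illustrative only.
set.seed(1)
n <- 200
pco2 <- runif(n, 20, 100)
tco2 <- 16 + 0.19 * pco2 + rnorm(n, sd = 3)
sim <- data.frame(pco2 = pco2, tco2 = tco2)

fit <- lm(tco2 ~ pco2, data = sim)

# Coefficient table with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

# The t value column is the estimate divided by its standard error.
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]
same_t <- isTRUE(all.equal(t_by_hand, coefs[, "t value"]))

# Least squares: the fitted coefficients give a smaller sum of squared errors
# than a line whose intercept has been shifted away from the estimate.
sse <- function(b0, b1) sum((sim$tco2 - (b0 + b1 * sim$pco2))^2)
b <- coef(fit)
fitted_is_better <- sse(b[1], b[2]) < sse(b[1] + 0.5, b[2])
```

<p style="text-align: justify;">Both <span style="font-family: terminal, monaco;">same_t</span> and <span style="font-family: terminal, monaco;">fitted_is_better</span> should be <span style="font-family: terminal, monaco;">TRUE</span>: the first because of how the table is constructed, the second because the least squares estimates minimize the sum of squared errors.</p>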

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@fa4d156e531b4f908c40b1c8632aadd8" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Selecting a Model</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@9c9d787b60384ed38d5db62f52f4e156"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@9c9d787b60384ed38d5db62f52f4e156" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3 style="text-align: justify;"></h3> <p style="text-align: justify;">Model selection techniques are techniques related to selecting the best model from a list (perhaps a rather large list) of candidate models. We will cover some basics here, as more complicated techniques will be covered in a later chapter. In the simplest case, we have two models, and we want to know which one we should use.</p> <p style="text-align: justify;">We will begin by examining if the relationship between TCO2 and PCO2 is more complicated than the model we fit in the previous section. 
If you recall, we fit a model where we considered a linear <span style="font-family: terminal, monaco;">pco2_first</span> term:</p> <p style="text-align: center;">tco2_first = $\beta_{0} + \beta_{1} *$pco2_first</p> <p style="text-align: justify;">One may wonder if including a quadratic term would fit the data better, that is, whether <span style="font-family: terminal, monaco;">tco2_first</span> = $\beta_{0}+\beta_{1}*$<span style="font-family: terminal, monaco;">pco2_first</span>$+\beta_{2}*$<span style="font-family: terminal, monaco;">pco2_first</span>$^2$ is a better model. Adding a quadratic term (or any other function) is quite easy using a wide range of functions in Python or R.</p> <p style="text-align: justify;">In the next section, we are going to learn how to add this quadratic term.</p> </div> </div> <div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@8c515d481cf24f8a96739d5a6d0d1247"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@8c515d481cf24f8a96739d5a6d0d1247" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3 style="text-align: justify;"></h3> <p style="text-align: justify;">One way to evaluate a quadratic term is by testing the null hypothesis: $\beta_{2}=0$. We do this by fitting the above model, and looking at the output. Adding a quadratic term (or any other function) is quite easy using the lm function. It is best practice to enclose any of these functions in the <span style="font-family: terminal, monaco;">I()</span> function to make sure they get evaluated as intended. 
The <span style="font-family: terminal, monaco;">I()</span> forces the formula to evaluate what is passed into it as is, which is necessary because the <span style="font-family: terminal, monaco;">^</span> operator has a different use in formulas in R (see <span style="font-family: terminal, monaco;">?formula</span> for further details). We fit this model and run the summary function on it:</p> <pre style="text-align: center;">co2.quad.lm <- lm(tco2_first~pco2_first + I(pco2_first^2),data = dat)</pre> <pre style="text-align: center;">summary(co2.quad.lm)$coef</pre> <p>You should be able to read on your console or in your *.rmd file the information about estimates, standard errors and t-values.</p> <p>You will note that we have abbreviated the output from the summary function by appending <span style="font-family: terminal, monaco;">$coef</span> to the summary function. This tells R that we would like information about the coefficients only.</p> <p>Looking first at the estimates, we see the best fit line is estimated as:</p> <p style="text-align: center;">tco2_first = 16.09 + 0.19 × pco2_first + 0.00004 × pco2_first^2</p> <p>Now we are going to add best-fit lines to our scatterplots.</p> </div> </div> <div class="vert vert-2" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@75ea03ee65cb46cc88ff88a06654237b"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@75ea03ee65cb46cc88ff88a06654237b" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3 style="text-align: justify;"></h3> <p>We can add both best fit lines to the scatter plots 
using the <span style="font-family: terminal, monaco; color: #3366ff;">abline</span> function as seen in the *.rmd file. The red (linear term only) and blue (linear and quadratic terms) fits are nearly identical. This corresponds with the relatively small coefficient estimate for the <span style="font-family: terminal, monaco;">I(pco2_first^2)</span> term. The <em>p</em>-value for this coefficient is about 0.86, and at the 0.05 significance level we would likely conclude that a quadratic term is not necessary in our model to fit the data, as the linear term only model fits the data nearly as well.</p> <p style="text-align: center;"><img height="301" width="421" src="/assets/courseware/v1/cb3cfaa974c54edc8a06a262d647d093/asset-v1:MITx+HST.953x+3T2020+type@asset+block/Rplot02.png" alt="Regression fits PCO2 on TCO2 with gender (black female; red male; solid no interaction; dotted with interaction). Note Both axes are cropped for illustration purposes" /></p> </div> </div> </div> </div>
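<p style="text-align: justify;">The comparison of the linear and quadratic fits can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the course's own code: the data frame <span style="font-family: terminal, monaco;">sim</span>, the grid of PCO2 values, and the simulated (truly linear) relationship are hypothetical. Because <span style="font-family: terminal, monaco;">abline</span> draws straight lines only, this sketch draws both fits with <span style="font-family: terminal, monaco;">predict</span> and <span style="font-family: terminal, monaco;">lines</span>, which is one way to display a curved fit.</p>

```r
# Simulated data in which the true relationship is linear by construction.
set.seed(2)
n <- 300
pco2 <- runif(n, 20, 120)
tco2 <- 16 + 0.19 * pco2 + rnorm(n, sd = 3)
sim <- data.frame(pco2 = pco2, tco2 = tco2)

lin_fit  <- lm(tco2 ~ pco2, data = sim)
quad_fit <- lm(tco2 ~ pco2 + I(pco2^2), data = sim)

# p-value for the quadratic coefficient (tests the null hypothesis beta_2 = 0)
p_quad <- summary(quad_fit)$coefficients["I(pco2^2)", "Pr(>|t|)"]

# Predicted values from both fits on an evenly spaced grid of PCO2 values
grid <- data.frame(pco2 = seq(20, 120, length.out = 100))
pred_lin  <- predict(lin_fit,  newdata = grid)
pred_quad <- predict(quad_fit, newdata = grid)

# Scatterplot with the linear fit in red and the quadratic fit in blue
plot(sim$pco2, sim$tco2, xlab = "PCO2", ylab = "TCO2", pch = 19)
lines(grid$pco2, pred_lin,  col = "red")
lines(grid$pco2, pred_quad, col = "blue", lty = 2)
```

<p style="text-align: justify;">Comparing <span style="font-family: terminal, monaco;">p_quad</span> with the chosen significance level is then the test of whether the quadratic term is needed.</p>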

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@30b52fd941e34e008d4a0153f05144da" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Hypothesis Testing</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@a5e4b2f4a46243fbbb64b7284d37a317"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@a5e4b2f4a46243fbbb64b7284d37a317" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3></h3> <p style="text-align: justify;"><em>Hypothesis testing</em> in statistics is fundamentally about evaluating two competing hypotheses. The null hypothesis is set up as a strawman (a sham argument set up to be defeated) and is the hypothesis you would like to provide evidence against.</p> <p style="text-align: justify;">This is almost always $\beta_{k}=0$, and it is often written as $H_{0}:\beta_{k}=0$. The alternative hypothesis is commonly assumed to be $\beta_{k}\neq 0$ and will often be written as $H_{A}:\beta_{k}\neq 0$. A statistical significance level, $\alpha$, should be established before any analysis is performed. This value is known as the Type I error rate and is the probability that we falsely conclude that the coefficient is non-zero when the coefficient is actually zero. 
It is commonly set at 0.05.</p> <p style="text-align: justify;">After specifying the hypotheses and the Type I error, hypotheses can be tested by computing a <em>p</em>-value. <em>P</em>-values are the probability of observing data as or more extreme than what was seen, assuming the null hypothesis is true. The null hypothesis is $\beta_{k}=0$. So when would observing <span style="color: #313131; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif;">data as or more extreme than what was seen </span><span style="font-size: 1em;">be unlikely? It is probably unlikely when we estimate $\beta_{k}$ to be rather large. However, how large is large enough? This would likely depend on how certain we are about the estimate of $\beta_{k}$ (</span>$\hat \beta_{k}$)<span style="font-size: 1em;">. If we were very certain, $\hat \beta_{k}$ likely would not have to be very large, but if we are less certain, then we might not think it to be unlikely for even very large values of $\hat \beta_{k}$. A </span><em style="font-size: 1em;">p</em><span style="font-size: 1em;">-value balances both of these aspects and computes a single number. 
We reject the null hypothesis when the </span><em style="font-size: 1em;">p-</em><span style="font-size: 1em;">value is smaller than the significance level, $\alpha$.</span></p> <p style="text-align: justify;"></p> </div> </div> <div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@cc1501d6925f4ddf9e842cf844ae1f6d"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@cc1501d6925f4ddf9e842cf844ae1f6d" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <p>Returning to the fit model of the previous scatterplot, the <em>p</em>-values for both coefficients are tiny (below $2\times10^{-16}$), so we would reject both null hypotheses, concluding that neither coefficient is likely zero. The intercept being zero, $\beta_{0}=0$, would imply that the best-fit line goes through the origin [the ($x,y$) point ($0,0$)], and we reject this hypothesis. The slope being zero would mean that the best-fit line is a flat horizontal line that does not increase as PCO2 increases. Clearly, there is a relationship between TCO2 and PCO2, so we would also reject this hypothesis.</p> <p style="text-align: justify;">In summary, we would conclude that we need both an intercept and a slope in the model. The next obvious question would be, could the relationship be more complicated than a straight line? We will examine this next.</p> </div> </div> </div> </div>
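<p style="text-align: justify;">To make the link between the t value, the <em>p</em>-value, and the decision rule concrete, here is a minimal sketch on simulated data. The variables <span style="font-family: terminal, monaco;">x</span> and <span style="font-family: terminal, monaco;">y</span> and the simulated slope are hypothetical. The sketch assumes the usual two-sided test based on the t distribution with the model's residual degrees of freedom, which is what the <span style="font-family: terminal, monaco;">Pr(>|t|)</span> column reports for <span style="font-family: terminal, monaco;">lm</span> fits.</p>

```r
# Simulated data with a non-zero slope; values are illustrative only.
set.seed(3)
n <- 100
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

fit <- lm(y ~ x)
coefs <- summary(fit)$coefficients

# Two-sided p-value for H0: beta_1 = 0, computed from the t value by hand:
# the probability of a t statistic at least this far from zero in either tail.
t_val <- coefs["x", "t value"]
p_by_hand <- 2 * pt(-abs(t_val), df = df.residual(fit))

# Agrees with the p-value reported in the summary table.
same_p <- isTRUE(all.equal(p_by_hand, coefs["x", "Pr(>|t|)"]))

# Decision rule: reject the null hypothesis when p < alpha.
alpha <- 0.05
reject_null <- p_by_hand < alpha
```

<p style="text-align: justify;">A large standard error shrinks the t value toward zero and so raises the <em>p</em>-value, which is how the <em>p</em>-value balances the size of the estimate against our certainty about it.</p>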

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@51305e9d78eb420c963543b857c73752" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Exercise I</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@20e77cf0fea6478b8c0b3661d4c221b4"> <div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-runtime-class="LmsRuntime" data-block-type="problem" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@20e77cf0fea6478b8c0b3661d4c221b4" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="True" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "Problem"} </script> <div id="problem_20e77cf0fea6478b8c0b3661d4c221b4" class="problems-wrapper" role="group" aria-labelledby="20e77cf0fea6478b8c0b3661d4c221b4-problem-title" data-problem-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@20e77cf0fea6478b8c0b3661d4c221b4" data-url="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@problem+block@20e77cf0fea6478b8c0b3661d4c221b4/handler/xmodule_handler" data-problem-score="0" data-problem-total-possible="1" data-attempts-used="0" data-content=" <h3 class="hd hd-3 problem-header" id="20e77cf0fea6478b8c0b3661d4c221b4-problem-title" aria-describedby="block-v1:MITx+HST.953x+3T2020+type@problem+block@20e77cf0fea6478b8c0b3661d4c221b4-problem-progress" tabindex="-1"> Exercise I </h3> <div class="problem-progress" 
id="block-v1:MITx+HST.953x+3T2020+type@problem+block@20e77cf0fea6478b8c0b3661d4c221b4-problem-progress"></div> <div class="problem"> <div> <div class="wrapper-problem-response" tabindex="-1" aria-label="Question 1" role="group"><div class="choicegroup capa_inputtype" id="inputtype_20e77cf0fea6478b8c0b3661d4c221b4_2_1"> <fieldset aria-describedby="status_20e77cf0fea6478b8c0b3661d4c221b4_2_1"> <legend id="20e77cf0fea6478b8c0b3661d4c221b4_2_1-legend" class="response-fieldset-legend field-group-hd">If you selected a significance level of \[\alpha = 0.05\], which of the following p-values would suggest that the null hypothesis should be rejected?</legend> <div class="field"> <input type="radio" name="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1" id="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1_choice_0" class="field-input input-radio" value="choice_0"/><label id="20e77cf0fea6478b8c0b3661d4c221b4_2_1-choice_0-label" for="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1_choice_0" class="response-label field-label label-inline" aria-describedby="status_20e77cf0fea6478b8c0b3661d4c221b4_2_1"> 0.1 </label> </div> <div class="field"> <input type="radio" name="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1" id="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1_choice_1" class="field-input input-radio" value="choice_1"/><label id="20e77cf0fea6478b8c0b3661d4c221b4_2_1-choice_1-label" for="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1_choice_1" class="response-label field-label label-inline" aria-describedby="status_20e77cf0fea6478b8c0b3661d4c221b4_2_1"> less than 0.01 </label> </div> <div class="field"> <input type="radio" name="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1" id="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1_choice_2" class="field-input input-radio" value="choice_2"/><label id="20e77cf0fea6478b8c0b3661d4c221b4_2_1-choice_2-label" for="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1_choice_2" class="response-label field-label label-inline" 
aria-describedby="status_20e77cf0fea6478b8c0b3661d4c221b4_2_1"> 0.05 </label> </div> <div class="field"> <input type="radio" name="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1" id="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1_choice_3" class="field-input input-radio" value="choice_3"/><label id="20e77cf0fea6478b8c0b3661d4c221b4_2_1-choice_3-label" for="input_20e77cf0fea6478b8c0b3661d4c221b4_2_1_choice_3" class="response-label field-label label-inline" aria-describedby="status_20e77cf0fea6478b8c0b3661d4c221b4_2_1"> less than 0.05 </label> </div> <span id="answer_20e77cf0fea6478b8c0b3661d4c221b4_2_1"/> </fieldset> <div class="indicator-container"> <span class="status unanswered" id="status_20e77cf0fea6478b8c0b3661d4c221b4_2_1" data-tooltip="Not yet answered."> <span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/> </span> </div> </div></div> </div> <div class="action"> <input type="hidden" name="problem_id" value="Exercise I" /> <div class="submit-attempt-container"> <button type="button" class="submit btn-brand" data-submitting="Submitting" data-value="Submit" data-should-enable-submit-button="True" aria-describedby="submission_feedback_20e77cf0fea6478b8c0b3661d4c221b4" > <span class="submit-label">Submit</span> </button> <div class="submission-feedback" id="submission_feedback_20e77cf0fea6478b8c0b3661d4c221b4"> <span class="sr">Some problems have options such as save, reset, hints, or show answer. 
These options follow the Submit button.</span> </div> </div> <div class="problem-action-buttons-wrapper"> </div> </div> <div class="notification warning notification-gentle-alert is-hidden" tabindex="-1"> <span class="icon fa fa-exclamation-circle" aria-hidden="true"></span> <span class="notification-message" aria-describedby="20e77cf0fea6478b8c0b3661d4c221b4-problem-title"> </span> <div class="notification-btn-wrapper"> <button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button> </div> </div> <div class="notification warning notification-save is-hidden" tabindex="-1"> <span class="icon fa fa-save" aria-hidden="true"></span> <span class="notification-message" aria-describedby="20e77cf0fea6478b8c0b3661d4c221b4-problem-title">None </span> <div class="notification-btn-wrapper"> <button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button> </div> </div> <div class="notification general notification-show-answer is-hidden" tabindex="-1"> <span class="icon fa fa-info-circle" aria-hidden="true"></span> <span class="notification-message" aria-describedby="20e77cf0fea6478b8c0b3661d4c221b4-problem-title">Answers are displayed within the problem </span> <div class="notification-btn-wrapper"> <button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button> </div> </div> </div> " data-graded="False"> <p class="loading-spinner"> <i class="fa fa-spinner fa-pulse fa-2x fa-fw"></i> <span class="sr">Loading…</span> </p> </div> </div> </div> </div> </div>

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@15f82d066e0b49459c8a572d65519c3d" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Statistical Interactions and Testing Nested Models - Part I</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@637e20b5f828459897c82b7ce544dc65"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@637e20b5f828459897c82b7ce544dc65" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3></h3> <p style="text-align: justify;">When we include other variables, we may wonder if the same straight line is true for all patients. For example, could the relationship between PCO2 and TCO2 be different among men and women? One option would be to split the data by gender and fit a separate regression to each subset. A more efficient way to accomplish this is by fitting both genders in a single model and including gender as a covariate.</p> <p>For example, we may fit:</p> <p style="text-align: center;">tco2_first = $\beta_{0}+\beta_{1}$*pco2_first + $\beta_{2}$*gender_num</p> <p>The variable gender_num takes on the values of 0 for women and 1 for men. 
For <strong>men</strong> the model is:</p> <p style="text-align: center;">tco2_first = ($\beta_{0} + \beta_{2})_{intercept} + \beta_1*$pco2_first</p> <p>and in <strong>women</strong>:</p> <p style="text-align: center;">tco2_first = $\beta_0 + \beta_1*$pco2_first</p> <p>As you can see, these models have the same slope but different intercepts (the distance between the intercepts is $\beta_2$). In other words, the lines of best fit for men and women will be parallel and will be separated by a vertical distance of $\beta_2$ for all values of pco2_first. This isn’t exactly what we would like, as the slopes may also be different. To allow for this, we will be discussing the idea of an interaction between two variables.</p> </div> </div> <div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@2b31be8a22bf44249ba4f7e0d4be7479"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@2b31be8a22bf44249ba4f7e0d4be7479" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3></h3> <p>An interaction is essentially the product of two covariates. 
In this case, which we will call the interaction model, we would be fitting:</p> <p style="text-align: center;">tco2_first = $\beta_0 + \beta_1$*pco2_first$+\beta_2*$gender_num$+\beta_3*$gender_num*pco2_first$_{(interaction term)}$</p> <p style="text-align: justify;">Again, separating the cases for <strong>men</strong>:</p> <p style="text-align: center;">tco2_first $ = (\beta_0 + \beta_2)_{intercept} + (\beta_1+\beta_3)_{slope}*$pco2_first,</p> <p style="text-align: justify;">and <strong>women</strong>:</p> <p style="text-align: center;">tco2_first = $ (\beta_0)_{intercept} + (\beta_1)_{slope}*$pco2_first</p> <p>Now men and women have different intercepts and slopes.</p> <p style="text-align: justify;">Fitting these models in R is relatively straightforward. Although not absolutely required in this particular circumstance, it is wise to make sure that R handles data types in the correct way by ensuring our variables are of the right class. In this particular case, men are coded as 1 and women as 0 (a discrete binary covariate), but R thinks this is numeric (continuous) data:</p> <pre style="text-align: center;">class(dat$gender_num)</pre> <p>You should read this in the console and on the chunk of the *.rmd: <span style="font-family: terminal, monaco;">[1] "integer".</span></p> <p style="text-align: justify;">Leaving this unaltered will not affect the analysis in this instance, but it can become problematic when dealing with other types of data, such as categorical data with several categories (e.g., ethnicity). Also, by setting the data to the right type, the output that R generates can also be more informative. 
We can set the gender_num variable to the class factor by using the as.factor function:</p> <pre style="text-align: center;">dat$gender_num <- as.factor(dat$gender_num)</pre> <p style="text-align: justify;">Here we have just overwritten the old variable in the dat data frame with a new copy which is of class factor, as we can confirm with:</p> <pre style="text-align: center;">class(dat$gender_num)</pre> <p style="text-align: justify;">Now that we have the gender variable correctly encoded, we can fit the models we discussed above. </p> </div> </div> </div> </div>
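<p style="text-align: justify;">The interaction model equations above can be illustrated with a short sketch. It uses simulated data, not the course data set: the data frame <span style="font-family: terminal, monaco;">sim</span> and the simulated values are hypothetical, while the column names and the 0 = women, 1 = men coding follow the text. It recovers the intercept and slope for each gender from the fitted coefficients, and checks that the men's line matches a regression fit to men alone.</p>

```r
# Simulated data using the same column names and coding as the text.
set.seed(4)
n <- 200
sim <- data.frame(
  pco2_first = runif(n, 20, 100),
  gender_num = rbinom(n, 1, 0.5)   # 0 = women, 1 = men
)
sim$tco2_first <- 16 + 0.19 * sim$pco2_first + rnorm(n, sd = 3)

class_before <- class(sim$gender_num)        # "integer"
sim$gender_num <- as.factor(sim$gender_num)
class_after <- class(sim$gender_num)         # "factor"

# Interaction model: main effects plus the pco2_first-by-gender product term
fit <- lm(tco2_first ~ pco2_first * gender_num, data = sim)
b <- coef(fit)
# names(b): "(Intercept)" "pco2_first" "gender_num1" "pco2_first:gender_num1"

# Women (baseline group): intercept beta_0, slope beta_1
women <- c(intercept = b[["(Intercept)"]],
           slope     = b[["pco2_first"]])

# Men: intercept beta_0 + beta_2, slope beta_1 + beta_3
men <- c(intercept = b[["(Intercept)"]] + b[["gender_num1"]],
         slope     = b[["pco2_first"]] + b[["pco2_first:gender_num1"]])

# The men's line from the interaction model equals a fit on men only.
men_only <- lm(tco2_first ~ pco2_first, data = subset(sim, gender_num == "1"))
same_line <- isTRUE(all.equal(unname(coef(men_only)), unname(men)))
```

<p style="text-align: justify;">Once <span style="font-family: terminal, monaco;">gender_num</span> is a factor, R labels its coefficient <span style="font-family: terminal, monaco;">gender_num1</span>, indicating that it applies to the level coded 1 relative to the baseline level 0.</p>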

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@5f3600f5d1f04023a117b6b942783c1b" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Statistical Interactions and Testing Nested Models - Part II</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@58ba63df30324e109165785157da66ab"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@58ba63df30324e109165785157da66ab" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3></h3> <p style="text-align: justify;">First, let's fit the model with gender as a covariate, but no interaction. We can do this by simply adding the variable <span style="font-family: terminal, monaco;">gender_num</span> to the formula we used for our <span style="font-family: terminal, monaco;">co2.lm</span> model fit:</p> <pre style="text-align: center;">co2.gender.lm <- lm(tco2_first~pco2_first+gender_num,data = dat)</pre> <pre style="text-align: center;">summary(co2.gender.lm)$coef</pre> <p style="text-align: justify;">After that, you should be able to check the information that contains the estimates, standard errors, and t-values for the intercept and the different terms in the equation. 
See the *.rmd file.</p> <p style="text-align: justify;">This output is very similar to what we had before, but now there’s a gender_num term as well. The 1 that appears after gender_num in the first column tells us which group this coefficient applies to (subjects with <span style="font-family: terminal, monaco;">gender_num</span> equal to 1, i.e., men). The coefficient is always relative to the baseline group, which in this case is women. The estimate is negative, meaning that the line fit for males will be below the line for females.</p> <p style="text-align: justify;">Plotting it should render the following figure:</p> <p style="text-align: center;"><img height="301" width="421" src="/assets/courseware/v1/ff7e3d36e7d7169d6f040ef2cf305ca2/asset-v1:MITx+HST.953x+3T2020+type@asset+block/Rplot03.png" alt="Regression fits" /></p> <p style="text-align: justify;">We see that the lines are parallel, but almost indistinguishable. In fact, this plot has been cropped in order to see any difference at all. According to the estimate in the summary output above, the difference between the two lines is −0.182 mmol/L, which is quite small, so perhaps this isn’t too surprising. 
We can also see in the above summary output that the p-value is about 0.42, so we would likely not reject the null hypothesis that the true value of the gender_num coefficient is zero.</p> </div> </div> <div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@306ba0d826b64448a1a33e9061ad6b20"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@306ba0d826b64448a1a33e9061ad6b20" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3></h3> <p style="text-align: justify;">Now let's move on to the model with an interaction between <span style="font-family: terminal, monaco;">pco2_first</span> and <span style="font-family: terminal, monaco;">gender_num</span>. To add an interaction between two variables, use the * operator within a model formula. By default, R will add all of the main effects (variables contained in the interaction) to the model as well, so simply adding <span style="font-family: terminal, monaco;">pco2_first*gender_num</span> will add effects for <span style="font-family: terminal, monaco;">pco2_first</span> and <span style="font-family: terminal, monaco;">gender_num</span> in addition to the interaction between them to the model fit.</p> <pre style="text-align: center;">co2.gender.interaction.lm <- lm(tco2_first ~ pco2_first * gender_num, data = dat)</pre> <p>Let's get the summary of this model:</p> <pre style="text-align: center;">summary(co2.gender.interaction.lm)$coef
</pre> <p style="text-align: left;">You should be able to read the output in the *.rmd file or on your console.</p> <p style="text-align: justify;">The estimated coefficients are $\beta_0,\beta_1,\beta_2,$ and $\beta_3$, respectively, and we can determine the best fit lines for <strong>men</strong>:</p> <p style="text-align: center;">tco2_first = (15.85 + 0.81) + (0.20 − 0.023) * pco2_first = 16.67 + 0.18 * pco2_first,</p> <p>and for <strong>women</strong>:</p> <p style="text-align: center;">tco2_first = 15.85 + 0.20 * pco2_first</p> <p>Based on this, the men’s intercept should be higher, but their slope should not be as steep as the women’s. Let’s check this by adding the new model fits as dotted lines, along with a legend, to the previous figure.</p> <p>In the next section, we are going to add the line fits.</p> </div> </div> </div> </div>
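<p>As a sketch of where the two lines come from, write $y$ for <span style="font-family: terminal, monaco;">tco2_first</span>, $x$ for <span style="font-family: terminal, monaco;">pco2_first</span>, and $g$ for <span style="font-family: terminal, monaco;">gender_num</span> (0 for women, 1 for men). This shorthand is introduced here only for illustration. The interaction model is then:</p> <p>\[ y = \beta_0 + \beta_1 x + \beta_2 g + \beta_3 (x \cdot g) \]</p> <p>Setting $g=0$ gives the line for women, and setting $g=1$ gives the line for men:</p> <p>\[ \text{women: } y = \beta_0 + \beta_1 x, \qquad \text{men: } y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x \]</p> <p>The two lines differ by $\beta_2 + \beta_3 x$, so they cross where this difference is zero. Using the rounded estimates quoted above ($\beta_2 \approx 0.81$, $\beta_3 \approx -0.023$):</p> <p>\[ x^{*} = -\frac{\beta_2}{\beta_3} \approx \frac{0.81}{0.023} \approx 35 \]</p> <p>Because this uses rounded coefficients, the crossing point is approximate.</p>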

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@e30f45f0eba4437f8cd8ecd066e048c3" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Statistical Interactions and Testing Nested Models - Part III</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@bee3938cb4ae447cae437770af58ca24"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@bee3938cb4ae447cae437770af58ca24" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3></h3> <p>Follow the code in the *.rmd file to add the line fits. You can copy and paste it into your console if you prefer to render the plot yourself. You should be able to render the following plot:</p> <p><img height="420" width="860" src="/assets/courseware/v1/0e552eed2126679add0835b9eee1485b/asset-v1:MITx+HST.953x+3T2020+type@asset+block/Rplot04.png" alt="Plot with legends" /></p> <p style="text-align: justify;">We can see that the fits from the interaction model are a little different from those of the model without the interaction. The biggest difference is that the dotted lines are no longer parallel. This has some serious implications, particularly when it comes to interpreting our results. 
First note that the estimated coefficient for the gender_num variable is now positive. This means that at <span style="font-family: terminal, monaco;">pco2_first</span> = 0 , men (red) have higher <span style="font-family: terminal, monaco;">tco2_first</span> levels than women (black). If you recall from the previous model fit, women had higher levels of <span style="font-family: terminal, monaco;">tco2_first</span> at all levels of <span style="font-family: terminal, monaco;">pco2_first</span> . At some point around <span style="font-family: terminal, monaco;">pco2_first</span> = 35 this changes, and women (black) have higher <span style="font-family: terminal, monaco;">tco2_first</span> levels than men (red). This means that the effect of gender_num may vary as you change the level of <span style="font-family: terminal, monaco;">pco2_first</span> , and this is why interactions are often referred to as effect modification in the epidemiological literature. The effect need not change signs (i.e., the lines do not need to cross) over the observed range of values for an interaction to be present.</p> <p>The question remains, is the variable gender_num important? We looked at this briefly when we examined the t-value column in the no-interaction model, which included gender_num. 
What if we wanted to test (simultaneously) the null hypotheses $\beta_2=0$ and $\beta_3=0$?</p> </div> </div> <div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@3053a53d9db6498ab58e5bf98fa6f4e1"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@3053a53d9db6498ab58e5bf98fa6f4e1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3></h3> <p style="text-align: justify;">There is a useful test known as the F-test that can help us to simultaneously test the null hypotheses $\beta_2=0$ and $\beta_3=0$. The F-test applies only to nested models: the larger model must contain each covariate that is used in the smaller model, and the smaller model cannot contain covariates that are not in the larger model.</p> <p style="text-align: justify;">The interaction model and the model with gender are nested models since all the covariates in the model with gender are also in the larger interaction model. An example of non-nested models would be the quadratic model and the interaction model: the smaller (quadratic) model has a term (<span style="font-family: terminal, monaco;">pco2_first^2</span>) that is not in the larger (interaction) model. 
An F-test would not be appropriate for this latter case.</p> <p>To perform an F-test, first fit the two models you wish to consider, and then run the <span style="font-family: terminal, monaco; color: #3366ff;">anova</span> command, passing it the two model objects.</p> <pre style="text-align: center;">anova(co2.lm, co2.gender.interaction.lm)</pre> <p>You should be able to read an Analysis of Variance Table either in your console or in the *.rmd file. As you can see, the anova command first lists the models it is considering.</p> <p>Much of the rest of the information is beyond the scope of this chapter, but we will highlight the reported F-test <em>p</em>-value ($Pr(\gt F)$), which in this case is 0.2515. In nested models, the null hypothesis is that all coefficients in the larger model but not in the smaller model are zero. In the case we are testing, our null hypotheses are $\beta_2=0$ and $\beta_3=0$. Since the <em>p</em>-value exceeds the typically used significance level ($\alpha=0.05$), we would not reject the null hypothesis, and we would conclude that the smaller model likely explains the data just as well as the larger model. If these were the only models we were considering, we would use the smaller model as our final model and report the final model in our results. We will now discuss what exactly you should report and how you can interpret the results.</p> </div> </div> </div> </div>
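<p>The nested-model comparison can be sketched on simulated data. Everything below (the data frame <span style="font-family: terminal, monaco;">sim</span>, the model names <span style="font-family: terminal, monaco;">small.lm</span> and <span style="font-family: terminal, monaco;">large.lm</span>, and the simulated coefficients) is an assumption made for illustration. It is not the course data, so the printed values will differ from those in the *.rmd file:</p>

```r
# Simulated stand-in data: gender has no true effect on the outcome here
set.seed(1)
n <- 200
sim <- data.frame(
  pco2_first = runif(n, min = 20, max = 80),
  gender_num = factor(rbinom(n, size = 1, prob = 0.5))
)
sim$tco2_first <- 16 + 0.2 * sim$pco2_first + rnorm(n, sd = 4)

# Smaller model: linear term only
small.lm <- lm(tco2_first ~ pco2_first, data = sim)

# Larger model: adds gender (beta_2) and the interaction (beta_3)
large.lm <- lm(tco2_first ~ pco2_first * gender_num, data = sim)

# F-test of H0: beta_2 = 0 and beta_3 = 0
ftest <- anova(small.lm, large.lm)
print(ftest)

# Extract the number of coefficients tested and the p-value (second row)
ftest$Df[2]
ftest$`Pr(>F)`[2]
```

<p>The <span style="font-family: terminal, monaco;">Df</span> entry in the second row is 2, matching the two coefficients set to zero under the null hypothesis. Note that both models must be fit to the same observations for the comparison to be valid.</p>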

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@37ea30c6a0a54ee28b2b36c075694cff" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Reporting and Interpreting</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@595fce51d4f34e6c82531a9a42939af1"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@595fce51d4f34e6c82531a9a42939af1" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3></h3> <p>Before presenting the results, you should briefly describe how you obtained them. It is a good idea to report the following: whether you transformed the outcome or any covariates in any way (e.g., by taking the logarithm), what covariates you considered, and how you chose the covariates that were in the model you reported.</p> <p>In our example above, we did not transform the outcome (TCO2); we considered PCO2 both as a linear and quadratic term; and we considered gender on its own and as an interaction term with PCO2. We first evaluated whether a quadratic term should be included in the model by using a t-test, after which we considered a model with gender and a gender-PCO2 interaction, and performed model selection with an F-test. 
Our final model involved only a linear PCO2 term and an intercept.</p> <p>When reporting your results, it's a good idea to report three aspects for each covariate.</p> </div> </div> <div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@e0456cb6a78245e1b302e98452fcdba8"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@e0456cb6a78245e1b302e98452fcdba8" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3></h3> <p>Firstly, you should always report the coefficient estimate. The coefficient estimate allows the reader to assess the magnitude of the effect. There are many circumstances where a result may be statistically significant but practically meaningless.</p> <p>Secondly, alongside your estimate, you should always report some measure of uncertainty or precision. For linear regression, the standard error (the Std. Error column in the R output) can be reported. We will cover another method called a confidence interval later on in this section.</p> <p>Lastly, reporting a p-value for each of the coefficients is also a good idea. An example of an appropriate presentation of our final model would be something similar to: TCO2 increased 0.18 (SE: 0.008, <em>p</em>-value <0.001) units per unit increase of PCO2. You will note we reported p-value <0.001, when in fact it is smaller than this. It is common to report very small <em>p</em>-values as <0.001 or <=0.0001 instead of using a large number of decimal places. 
While sometimes it is simply reported whether <em>p</em> ≤ 0.05 or not (i.e., whether the result is statistically significant), this practice should be avoided.</p> <p>Often, it’s a good idea to also discuss how well the overall model fits. There are several ways to accomplish this, but reporting a unit-less quantity known as $R^2$ (pronounced r-squared) is often done. Looking back to the output R provided for our chosen final model, we can find the value of $R^2$ for this model under <span style="font-family: terminal, monaco;">Multiple R-squared : 0.2647</span>. This quantity is a proportion (a number between 0 and 1) and describes how much of the total variability in the data is explained by the model. An $R^2$ of 1 indicates a perfect fit, while an $R^2$ of 0 indicates that the model explains none of the variability in the data. What exactly constitutes a ‘good’ $R^2$ depends on the subject matter and how it will be used.</p> </div> </div> </div> </div>
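<p>A short sketch of where the reported quantities live in a fitted model object, again on simulated data. The data frame <span style="font-family: terminal, monaco;">sim</span> and the model <span style="font-family: terminal, monaco;">fit</span> are hypothetical stand-ins, so the numbers will not match the course output:</p>

```r
# Simulated stand-in data for a linear-only model
set.seed(2)
sim <- data.frame(pco2_first = runif(150, min = 20, max = 80))
sim$tco2_first <- 16 + 0.2 * sim$pco2_first + rnorm(150, sd = 4)

fit <- lm(tco2_first ~ pco2_first, data = sim)
s <- summary(fit)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
s$coefficients

# R-squared as reported under "Multiple R-squared"
s$r.squared

# The same quantity computed by hand: 1 - (residual SS / total SS)
rss <- sum(residuals(fit)^2)
tss <- sum((sim$tco2_first - mean(sim$tco2_first))^2)
1 - rss / tss
```

<p>The hand computation makes the interpretation concrete: $R^2$ is the share of the outcome's total variability that is not left in the residuals.</p>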

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@9d2cc50332f94b2da10153f324f9bccc" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Exercise II</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@c1c52f7b32844c0e8cb9efef6d8a12fc"> <div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-runtime-class="LmsRuntime" data-block-type="problem" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@c1c52f7b32844c0e8cb9efef6d8a12fc" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="True" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "Problem"} </script> <div id="problem_c1c52f7b32844c0e8cb9efef6d8a12fc" class="problems-wrapper" role="group" aria-labelledby="c1c52f7b32844c0e8cb9efef6d8a12fc-problem-title" data-problem-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@c1c52f7b32844c0e8cb9efef6d8a12fc" data-url="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@problem+block@c1c52f7b32844c0e8cb9efef6d8a12fc/handler/xmodule_handler" data-problem-score="0.0" data-problem-total-possible="1.0" data-attempts-used="0" data-content=" <h3 class="hd hd-3 problem-header" id="c1c52f7b32844c0e8cb9efef6d8a12fc-problem-title" aria-describedby="block-v1:MITx+HST.953x+3T2020+type@problem+block@c1c52f7b32844c0e8cb9efef6d8a12fc-problem-progress" tabindex="-1"> Exercise II </h3> <div class="problem-progress" 
id="block-v1:MITx+HST.953x+3T2020+type@problem+block@c1c52f7b32844c0e8cb9efef6d8a12fc-problem-progress"></div> <div class="problem"> <div> <div class="wrapper-problem-response" tabindex="-1" aria-label="Question 1" role="group"><p>Answer the following question.</p> <div class="choicegroup capa_inputtype" id="inputtype_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1"> <fieldset aria-describedby="status_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1"> <legend id="c1c52f7b32844c0e8cb9efef6d8a12fc_2_1-legend" class="response-fieldset-legend field-group-hd">There are many performance metrics that you can estimate to assess how well your linear regression model fits your data. Select the metrics that would be useful for this task.</legend> <div class="field"> <input type="checkbox" name="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1[]" id="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_0" class="field-input input-checkbox" value="choice_0"/><label id="c1c52f7b32844c0e8cb9efef6d8a12fc_2_1-choice_0-label" for="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_0" class="response-label field-label label-inline" aria-describedby="status_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1"> R-squared \[R^{2}\]. </label> </div> <div class="field"> <input type="checkbox" name="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1[]" id="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_1" class="field-input input-checkbox" value="choice_1"/><label id="c1c52f7b32844c0e8cb9efef6d8a12fc_2_1-choice_1-label" for="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_1" class="response-label field-label label-inline" aria-describedby="status_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1"> Area under the receiver operating characteristic curve (AUROC). 
</label> </div> <div class="field"> <input type="checkbox" name="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1[]" id="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_2" class="field-input input-checkbox" value="choice_2"/><label id="c1c52f7b32844c0e8cb9efef6d8a12fc_2_1-choice_2-label" for="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_2" class="response-label field-label label-inline" aria-describedby="status_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1"> Root Mean Squared Error (RMSE). </label> </div> <div class="field"> <input type="checkbox" name="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1[]" id="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_3" class="field-input input-checkbox" value="choice_3"/><label id="c1c52f7b32844c0e8cb9efef6d8a12fc_2_1-choice_3-label" for="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_3" class="response-label field-label label-inline" aria-describedby="status_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1"> Sensitivity. </label> </div> <div class="field"> <input type="checkbox" name="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1[]" id="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_4" class="field-input input-checkbox" value="choice_4"/><label id="c1c52f7b32844c0e8cb9efef6d8a12fc_2_1-choice_4-label" for="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_4" class="response-label field-label label-inline" aria-describedby="status_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1"> Mean Squared Error (MSE). </label> </div> <div class="field"> <input type="checkbox" name="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1[]" id="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_5" class="field-input input-checkbox" value="choice_5"/><label id="c1c52f7b32844c0e8cb9efef6d8a12fc_2_1-choice_5-label" for="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_5" class="response-label field-label label-inline" aria-describedby="status_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1"> Accuracy. 
</label> </div> <div class="field"> <input type="checkbox" name="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1[]" id="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_6" class="field-input input-checkbox" value="choice_6"/><label id="c1c52f7b32844c0e8cb9efef6d8a12fc_2_1-choice_6-label" for="input_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1_choice_6" class="response-label field-label label-inline" aria-describedby="status_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1"> <i>p</i>-value. </label> </div> <span id="answer_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1"/> </fieldset> <div class="indicator-container"> <span class="status unanswered" id="status_c1c52f7b32844c0e8cb9efef6d8a12fc_2_1" data-tooltip="Not yet answered."> <span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/> </span> </div> </div></div> </div> <div class="action"> <input type="hidden" name="problem_id" value="Exercise II" /> <div class="submit-attempt-container"> <button type="button" class="submit btn-brand" data-submitting="Submitting" data-value="Submit" data-should-enable-submit-button="True" aria-describedby="submission_feedback_c1c52f7b32844c0e8cb9efef6d8a12fc" > <span class="submit-label">Submit</span> </button> <div class="submission-feedback" id="submission_feedback_c1c52f7b32844c0e8cb9efef6d8a12fc"> <span class="sr">Some problems have options such as save, reset, hints, or show answer. 
These options follow the Submit button.</span> </div> </div> <div class="problem-action-buttons-wrapper"> <span class="problem-action-button-wrapper"> <button type="button" class="save problem-action-btn btn-default btn-small" data-value="Save"> <span class="icon fa fa-floppy-o" aria-hidden="true"></span> <span aria-hidden="true">Save</span> <span class="sr">Save your answer</span> </button> </span> </div> </div> <div class="notification warning notification-gentle-alert is-hidden" tabindex="-1"> <span class="icon fa fa-exclamation-circle" aria-hidden="true"></span> <span class="notification-message" aria-describedby="c1c52f7b32844c0e8cb9efef6d8a12fc-problem-title"> </span> <div class="notification-btn-wrapper"> <button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button> </div> </div> <div class="notification warning notification-save is-hidden" tabindex="-1"> <span class="icon fa fa-save" aria-hidden="true"></span> <span class="notification-message" aria-describedby="c1c52f7b32844c0e8cb9efef6d8a12fc-problem-title">None </span> <div class="notification-btn-wrapper"> <button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button> </div> </div> <div class="notification general notification-show-answer is-hidden" tabindex="-1"> <span class="icon fa fa-info-circle" aria-hidden="true"></span> <span class="notification-message" aria-describedby="c1c52f7b32844c0e8cb9efef6d8a12fc-problem-title">Answers are displayed within the problem </span> <div class="notification-btn-wrapper"> <button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button> </div> </div> </div> " data-graded="False"> <p class="loading-spinner"> <i class="fa fa-spinner fa-pulse fa-2x fa-fw"></i> <span class="sr">Loading…</span> </p> </div> </div> </div> </div> </div>

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@2576d377e3d64e169a3db85362389361" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Confidence and Prediction Intervals - Part I</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@c151438b5787432b966bd41949c0c738"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@c151438b5787432b966bd41949c0c738" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3></h3> <p>As mentioned above, one method to quantify the uncertainty around coefficient estimates is by reporting the standard error. Another commonly used method is to report a confidence interval, most commonly a 95% confidence interval. A 95% confidence interval for $\beta$ is an interval for which if the data were collected repeatedly, about 95% of the intervals would contain the <em>true value</em> of the parameter, $\beta$, assuming the modeling assumptions are correct.</p> <p>To get 95% confidence intervals of coefficients, R has a <span style="color: #3366ff; font-family: terminal, monaco;">confint</span> function, which you pass an <span style="color: #3366ff; font-family: terminal, monaco;">lm</span> object to. 
It then outputs the 2.5% and 97.5% limits of the confidence interval for each coefficient:</p> <pre style="text-align: center;">confint(co2.lm)</pre> <p style="text-align: justify;">In your console, the 95% confidence intervals <span style="font-size: 1em;">should be displayed. For </span><span style="font-family: terminal, monaco;">pco2_first</span><span style="font-family: terminal, monaco;">, it </span><span style="font-size: 1em;">is about 0.17 to 0.20, which may be slightly more informative than reporting the standard error. Often, people will look at whether the confidence interval includes zero (no effect). Since it does not, and since the interval is in fact quite narrow and not very close to zero, this provides some additional evidence of its importance. There is a well-known link between hypothesis testing and confidence intervals, but we will not get into those details here.</span></p> <p style="text-align: justify;"></p> <p style="text-align: justify;"></p> </div> </div> <div class="vert vert-1" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@0e360d4a84814410a613d6fcadc9c0b5"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@0e360d4a84814410a613d6fcadc9c0b5" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <p>When plotting the data with the model fit, as done before in this unit, it is a good idea to include some sort of assessment of uncertainty as well. To do this in R, we will first create a data frame with the PCO2 levels at which we would like to predict. 
In this case, we would like to predict the outcome (TCO2) over the range of observed covariate (PCO2) values. We do this by creating a data frame, where the variable names in the data frame must match the covariates used in the model. In our case, we have only one covariate (<span style="font-family: terminal, monaco;">pco2_first</span>), and we predict the outcome over the range of covariate values we observed, as determined by the min and max functions.</p> <pre style="text-align: center;">grid.pred <- data.frame(pco2_first = seq.int(from = min(dat$pco2_first, na.rm = TRUE), to = max(dat$pco2_first, na.rm = TRUE)))</pre> <p>Then, by using the predict function, we can predict TCO2 levels at these PCO2 values. We pass three arguments to the predict function: the model we have constructed (in this case, using <span style="font-family: terminal, monaco; color: #3366ff;">lm</span>), <span style="font-family: terminal, monaco;">newdata</span>, and <span style="font-family: terminal, monaco;">interval</span>. The <span style="font-family: terminal, monaco;">newdata</span> argument allows you to pass any data frame with the same covariates as the model fit, which is why we created <span style="font-family: terminal, monaco;">grid.pred</span> above. Lastly, the <span style="font-family: terminal, monaco;">interval</span> argument is optional and allows for the inclusion of confidence or prediction intervals. We want to illustrate a prediction interval that incorporates both the uncertainty about the model coefficients and the uncertainty generated by the data generating process, so we will pass <span style="font-family: terminal, monaco;">interval = "prediction"</span>. 
To do that, write the following lines in RStudio:</p> <pre style="text-align: center;">preds<-predict(co2.lm,newdata = grid.pred,interval = "prediction")</pre> <p style="text-align: justify;">To print the first two rows of our predictions:</p> <pre style="text-align: center;">preds[1:2,]</pre> <p>This will render some examples of our predictions, <span style="font-family: terminal, monaco;">preds</span> , which are the model’s predictions for PCO2 at 8 and 9. We can see that our predictions (<span style="font-family: terminal, monaco;">fit</span>) are about 0.18 apart, which makes sense given our estimate of the slope (0.18). We also see that our 95% prediction intervals are very wide, spanning about 9 (<span style="font-family: terminal, monaco;">lwr</span>) to 26 (<span style="font-family: terminal, monaco;">upr</span>). This indicates that, despite coming up with a model which is very statistically significant, we still have a lot of uncertainty about the predictions generated from such a model.</p> <p>In the next section, we are going to analyze the plot of our model.</p> </div> </div> </div> </div>
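<p>To illustrate the difference between the two interval types, here is a sketch on simulated data. The objects <span style="font-family: terminal, monaco;">sim</span>, <span style="font-family: terminal, monaco;">fit</span>, <span style="font-family: terminal, monaco;">grid</span>, <span style="font-family: terminal, monaco;">conf.int</span>, and <span style="font-family: terminal, monaco;">pred.int</span> are hypothetical names used only here, and the numbers will differ from those obtained with the course data:</p>

```r
# Simulated stand-in data and a linear-only fit
set.seed(3)
sim <- data.frame(pco2_first = runif(150, min = 20, max = 80))
sim$tco2_first <- 16 + 0.2 * sim$pco2_first + rnorm(150, sd = 4)
fit <- lm(tco2_first ~ pco2_first, data = sim)

# Covariate values at which to predict; the column name must match the model
grid <- data.frame(pco2_first = c(30, 50, 70))

# Confidence interval: uncertainty about the fitted mean only
conf.int <- predict(fit, newdata = grid, interval = "confidence")

# Prediction interval: adds the variability of an individual observation
pred.int <- predict(fit, newdata = grid, interval = "prediction")

# Both share the same point predictions in the "fit" column
cbind(grid, fit = conf.int[, "fit"])

# Interval widths: the prediction interval is the wider of the two
conf.int[, "upr"] - conf.int[, "lwr"]
pred.int[, "upr"] - pred.int[, "lwr"]
```

<p>The point predictions are identical in both cases; only the interval limits change. This is why a model can have a precisely estimated slope and still give wide prediction intervals for individual patients.</p>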

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@3f88b133dcd04a118b5daff7924e06e9" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Confidence and Prediction Intervals - Part II</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@05c00c79b455414ebdaaab2ff516db48"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@05c00c79b455414ebdaaab2ff516db48" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3></h3> <p>It is a good idea to capture this quality when plotting how well your model fits by adding the interval lines as dotted lines. 
Let’s plot our final model fit, <span style="font-family: terminal, monaco;">co2.lm</span> , along with the scatterplot and prediction interval with the following code:</p> <pre style="text-align: center;">plot(dat$pco2_first,dat$tco2_first,xlab = "PCO2",ylab = "TCO2",pch=19,xlim = c(0,175))</pre> <pre style="text-align: center;">co2.lm <- lm(tco2_first ~ pco2_first, data = dat)</pre> <pre style="text-align: center;">abline(co2.lm,col="red",lwd=2)</pre> <pre style="text-align: center;">lines(grid.pred$pco2_first,preds[,2],lty=3)</pre> <pre style="text-align: center;">lines(grid.pred$pco2_first,preds[,3],lty=3)</pre> <p style="text-align: center;"><span face="terminal, monaco" style="font-family: terminal, monaco;"><img height="374" width="563" src="/assets/courseware/v1/7dc284ca724718e70232b4ee3ab88365/asset-v1:MITx+HST.953x+3T2020+type@asset+block/Rplot05.png" alt="Scatterplot of PCO2 (x-axis) and TCO2 (y-axis) along with linear regression estimates from the linear only model (co2.lm). The dotted line represents 95 % prediction intervals for the model" /></span></p> <p style="text-align: justify;">In the scatterplot of PCO2 (x-axis) and TCO2 (y-axis), along with linear regression estimates from the linear-only model (co2.lm), the 95% prediction intervals for the model are represented as dotted lines.</p> <p style="text-align: justify;"></p> </div> </div> </div> </div>

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@b9ec4cbad5704343beca3f51a8bcc8e2" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Exercise III</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@3b621d406da449cdb2b041ca7ac12cc4"> <div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-runtime-class="LmsRuntime" data-block-type="problem" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@3b621d406da449cdb2b041ca7ac12cc4" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="True" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "Problem"} </script> <div id="problem_3b621d406da449cdb2b041ca7ac12cc4" class="problems-wrapper" role="group" aria-labelledby="3b621d406da449cdb2b041ca7ac12cc4-problem-title" data-problem-id="block-v1:MITx+HST.953x+3T2020+type@problem+block@3b621d406da449cdb2b041ca7ac12cc4" data-url="/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@problem+block@3b621d406da449cdb2b041ca7ac12cc4/handler/xmodule_handler" data-problem-score="0.0" data-problem-total-possible="3.0" data-attempts-used="0" data-content=" <h3 class="hd hd-3 problem-header" id="3b621d406da449cdb2b041ca7ac12cc4-problem-title" aria-describedby="block-v1:MITx+HST.953x+3T2020+type@problem+block@3b621d406da449cdb2b041ca7ac12cc4-problem-progress" tabindex="-1"> Exercise III </h3> <div class="problem-progress" 
id="block-v1:MITx+HST.953x+3T2020+type@problem+block@3b621d406da449cdb2b041ca7ac12cc4-problem-progress"></div> <div class="problem"> <div> <p>A linear regression model that uses $n$ features $x_{1}$, $x_{2}$, $x_{3}$, ..., $x_{n}$ to predict an output $\hat{y}$ can be summarized by the following equation:</p> <p>\[\hat{y} = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + ... + \beta_{n}x_{n}\]</p> <p>For each description below, select the matching term from the equation:</p> <div class="wrapper-problem-response" tabindex="-1" aria-label="Question 1" role="group"><div class="inputtype option-input "> <label class="problem-group-label" for="input_3b621d406da449cdb2b041ca7ac12cc4_2_1" id="label_3b621d406da449cdb2b041ca7ac12cc4_2_1">The predicted value.</label> <select name="input_3b621d406da449cdb2b041ca7ac12cc4_2_1" id="input_3b621d406da449cdb2b041ca7ac12cc4_2_1" aria-describedby="status_3b621d406da449cdb2b041ca7ac12cc4_2_1"> <option value="option_3b621d406da449cdb2b041ca7ac12cc4_2_1_dummy_default">Select an option</option> <option value="\[\beta_{0}\]"> \[\beta_{0}\]</option> <option value="\[\hat{y}\]"> \[\hat{y}\]</option> <option value="\[\beta_{1}, \beta_{2}, ..., \beta_{n} \]"> \[\beta_{1}, \beta_{2}, ..., \beta_{n} \]</option> </select> <div class="indicator-container"> <span class="status unanswered" id="status_3b621d406da449cdb2b041ca7ac12cc4_2_1" data-tooltip="Not yet answered."> <span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/> </span> </div> <p class="answer" id="answer_3b621d406da449cdb2b041ca7ac12cc4_2_1"/> </div></div> <div class="wrapper-problem-response" tabindex="-1" aria-label="Question 2" role="group"><div class="inputtype option-input "> <label class="problem-group-label" for="input_3b621d406da449cdb2b041ca7ac12cc4_3_1" id="label_3b621d406da449cdb2b041ca7ac12cc4_3_1">The bias term or intercept.</label> <select name="input_3b621d406da449cdb2b041ca7ac12cc4_3_1" id="input_3b621d406da449cdb2b041ca7ac12cc4_3_1" 
aria-describedby="status_3b621d406da449cdb2b041ca7ac12cc4_3_1"> <option value="option_3b621d406da449cdb2b041ca7ac12cc4_3_1_dummy_default">Select an option</option> <option value="\[\beta_{1}, \beta_{2}, ..., \beta_{n} \]"> \[\beta_{1}, \beta_{2}, ..., \beta_{n} \]</option> <option value="\[\beta_{0}\]"> \[\beta_{0}\]</option> <option value="\[\hat{y}\]"> \[\hat{y}\]</option> </select> <div class="indicator-container"> <span class="status unanswered" id="status_3b621d406da449cdb2b041ca7ac12cc4_3_1" data-tooltip="Not yet answered."> <span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/> </span> </div> <p class="answer" id="answer_3b621d406da449cdb2b041ca7ac12cc4_3_1"/> </div></div> <div class="wrapper-problem-response" tabindex="-1" aria-label="Question 3" role="group"><div class="inputtype option-input "> <label class="problem-group-label" for="input_3b621d406da449cdb2b041ca7ac12cc4_4_1" id="label_3b621d406da449cdb2b041ca7ac12cc4_4_1">The feature weights or feature coefficients.</label> <select name="input_3b621d406da449cdb2b041ca7ac12cc4_4_1" id="input_3b621d406da449cdb2b041ca7ac12cc4_4_1" aria-describedby="status_3b621d406da449cdb2b041ca7ac12cc4_4_1"> <option value="option_3b621d406da449cdb2b041ca7ac12cc4_4_1_dummy_default">Select an option</option> <option value="\[\beta_{0}\]"> \[\beta_{0}\]</option> <option value="\[\beta_{1}, \beta_{2}, ..., \beta_{n} \]"> \[\beta_{1}, \beta_{2}, ..., \beta_{n} \]</option> <option value="\[\hat{y}\]"> \[\hat{y}\]</option> </select> <div class="indicator-container"> <span class="status unanswered" id="status_3b621d406da449cdb2b041ca7ac12cc4_4_1" data-tooltip="Not yet answered."> <span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/> </span> </div> <p class="answer" id="answer_3b621d406da449cdb2b041ca7ac12cc4_4_1"/> </div></div> </div> </div> " data-graded="False"> </div> </div> </div> </div> </div>

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@adeb291bdb4445d3a3e1755424f9a902" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Linear Regression: Reinforcement Material (optional)</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@video+block@d45677d90222495593205a622369dfba"> <div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-runtime-class="LmsRuntime" data-block-type="video" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@video+block@d45677d90222495593205a622369dfba" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "Video"} </script> <h3 class="hd hd-2">Reinforcement Material (optional)</h3> <div id="video_d45677d90222495593205a622369dfba" class="video closed" data-metadata='{"speed": null, "ytMetadataEndpoint": "", "recordedYoutubeIsAvailable": true, "captionDataDir": null, "autoAdvance": false, "poster": null, "ytTestTimeout": 1500, "prioritizeHls": false, "autoplay": false, "ytApiUrl": "https://www.youtube.com/iframe_api", "completionPercentage": 0.95, "savedVideoPosition": 0.0, "transcriptAvailableTranslationsUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@d45677d90222495593205a622369dfba/handler/transcript/available_translations", "transcriptLanguage": "en", "transcriptTranslationUrl": 
"/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@d45677d90222495593205a622369dfba/handler/transcript/translation/__lang__", "saveStateEnabled": false, "duration": 0.0, "showCaptions": "true", "completionEnabled": false, "streams": "1.00:ooYn5nT-Fho", "start": 0.0, "saveStateUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@d45677d90222495593205a622369dfba/handler/xmodule_handler/save_user_state", "autohideHtml5": false, "sources": [], "lmsRootURL": "https://openlearninglibrary.mit.edu", "end": 0.0, "transcriptLanguages": {"en": "English"}, "generalSpeed": 1.0, "publishCompletionUrl": "/courses/course-v1:MITx+HST.953x+3T2020/xblock/block-v1:MITx+HST.953x+3T2020+type@video+block@d45677d90222495593205a622369dfba/handler/publish_completion"}' data-bumper-metadata='null' data-autoadvance-enabled="False" data-poster='null' tabindex="-1" > </div> </div> </div> </div> </div>

<div class="xblock xblock-public_view xblock-public_view-vertical" data-runtime-class="LmsRuntime" data-block-type="vertical" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@vertical+block@494912d64c30474485fded74cbeb1f08" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="VerticalStudentView" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <h2 class="hd hd-2 unit-title">Take Home Messages</h2> <div class="vert-mod"> <div class="vert vert-0" data-id="block-v1:MITx+HST.953x+3T2020+type@html+block@9fda4c322511442bac9bbd158395f64a"> <div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-runtime-class="LmsRuntime" data-block-type="html" data-usage-id="block-v1:MITx+HST.953x+3T2020+type@html+block@9fda4c322511442bac9bbd158395f64a" data-course-id="course-v1:MITx+HST.953x+3T2020" data-init="XBlockToXModuleShim" data-has-score="False" data-graded="False" data-request-token="064965005f2c11f0bb1012a532d12aaf" data-runtime-version="1"> <script type="json/xblock-args" class="xblock-json-init-args"> {"xmodule-type": "HTMLModule"} </script> <h3>Summary</h3> <p>To wrap up, linear regression is an extremely powerful tool for analyzing continuous outcomes. Even so, there are several points to keep in mind when performing this type of analysis:</p> <ol> <li>Hypothesis tests and the confidence and prediction intervals rely on the modeling assumptions.</li> <li>Outliers can be problematic when fitting models.</li> <li>Be concerned about missing data.</li> <li>Assess potential multicollinearity. Collinearity can occur when two or more covariates are highly correlated, for instance if blood pressure were measured simultaneously on the left and right arms and both measurements were used as covariates in the model.</li> <li>Check whether the outcomes are dependent on one another (for example, repeated measurements on the same patient).</li> </ol> <p>These concerns should not discourage you from using linear regression. 
It is extremely powerful, simple and reasonably robust to some of the problems discussed above, depending on the situation.</p> </div> </div> </div> </div>
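<p>Several of the checks listed in the summary can be sketched in a few lines of base R. The example below is a minimal illustration on simulated data, not part of the course exercises: the data frame <span style="font-family: terminal, monaco;">sim</span> and all variable names in it are hypothetical, and the variance inflation factor is computed by hand from its definition rather than with an add-on package.</p>
<pre>
# Simulated data (hypothetical): two nearly identical blood pressure readings
set.seed(2)
n <- 100
bp_left  <- rnorm(n, mean = 120, sd = 15)
bp_right <- bp_left + rnorm(n, sd = 2)   # almost a copy of bp_left
age      <- rnorm(n, mean = 60, sd = 10)
y        <- 5 + 0.1 * bp_left + 0.05 * age + rnorm(n)
sim <- data.frame(y, bp_left, bp_right, age)
sim$age[c(3, 17)] <- NA                  # introduce two missing values

# Missing data: count NAs per column (lm() drops incomplete rows by default)
colSums(is.na(sim))

fit <- lm(y ~ bp_left + bp_right + age, data = sim)

# Collinearity: variance inflation factor for bp_left,
# VIF = 1 / (1 - R^2), with R^2 from regressing bp_left on the other covariates
r2_bp_left  <- summary(lm(bp_left ~ bp_right + age, data = sim))$r.squared
vif_bp_left <- 1 / (1 - r2_bp_left)
vif_bp_left

# Modeling assumptions and outliers: standard diagnostic plots for an lm fit
par(mfrow = c(2, 2))
plot(fit)
</pre>
<p>A large variance inflation factor (a common rule of thumb is a value above 10) suggests that the covariate is nearly a linear combination of the others, as is the case for the two blood pressure readings here.</p>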