<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="VerticalStudentView" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@classifier_notes" data-graded="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Notes – Chapter 2: Linear classifiers</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@html+block@classifier_notes_top">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-block-type="html" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="XBlockToXModuleShim" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@html+block@classifier_notes_top" data-graded="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>
You can sequence through the Linear Classifier lecture video and note segments (go to Next page). </p><p>
You can also (or alternatively) download the <a href="/assets/courseware/v1/9d904854b4ae0878cfdcedcdceabf937/asset-v1:MITx+6.036+1T2019+type@asset+block/notes_chapter_Linear_classifiers.pdf" target="_blank">Chapter 2: Linear classifiers</a> notes as a PDF file. </p>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="VerticalStudentView" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@MIT6036L01h_vert" data-graded="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Lecture: Linear classifiers</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01h">
<div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-block-type="video" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="XBlockToXModuleShim" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01h" data-graded="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Video"}
</script>
<h3 class="hd hd-2">Lecture: Linear classifiers</h3>
<div
id="video_MIT6036L01h"
class="video closed"
data-metadata='{"saveStateEnabled": false, "autoplay": false, "publishCompletionUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01h/handler/publish_completion", "start": 0.0, "prioritizeHls": false, "saveStateUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01h/handler/xmodule_handler/save_user_state", "recordedYoutubeIsAvailable": true, "streams": "1.00:yOKDzd73KgM", "autoAdvance": false, "transcriptAvailableTranslationsUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01h/handler/transcript/available_translations", "captionDataDir": null, "ytMetadataEndpoint": "", "showCaptions": "true", "lmsRootURL": "https://openlearninglibrary.mit.edu", "transcriptTranslationUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01h/handler/transcript/translation/__lang__", "ytApiUrl": "https://www.youtube.com/iframe_api", "transcriptLanguages": {"en": "English"}, "speed": null, "autohideHtml5": false, "generalSpeed": 1.0, "transcriptLanguage": "en", "savedVideoPosition": 0.0, "poster": null, "sources": [], "duration": 0.0, "end": 0.0, "completionEnabled": false, "completionPercentage": 0.95, "ytTestTimeout": 1500}'
data-bumper-metadata='null'
data-autoadvance-enabled="False"
data-poster='null'
tabindex="-1"
>
<div class="focus_grabber first"></div>
<div class="tc-wrapper">
<div class="video-wrapper">
<span tabindex="0" class="spinner" aria-hidden="false" aria-label="Loading video player"></span>
<span tabindex="-1" class="btn-play fa fa-youtube-play fa-2x is-hidden" aria-hidden="true" aria-label="Play video"></span>
<div class="video-player-pre"></div>
<div class="video-player">
<div id="MIT6036L01h"></div>
<h4 class="hd hd-4 video-error is-hidden">No playable video sources found.</h4>
<h4 class="hd hd-4 video-hls-error is-hidden">
Your browser does not support this video format. Try using a different browser.
</h4>
</div>
<div class="video-player-post"></div>
<div class="closed-captions"></div>
<div class="video-controls is-hidden">
<div>
<div class="vcr"><div class="vidtime">0:00 / 0:00</div></div>
<div class="secondary-controls"></div>
</div>
</div>
</div>
</div>
<div class="focus_grabber last"></div>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="VerticalStudentView" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@linear_classifiers_classification_vert" data-graded="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Classification</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@html+block@linear_classifiers_classification">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-block-type="html" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="XBlockToXModuleShim" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@html+block@linear_classifiers_classification" data-graded="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>
A binary <em>classifier</em> is <span options="" class="marginote"><span class="marginote_desc" style="display:none">Actually, general classifiers can have a range which is any discrete set, but we'll work with this specific case for a while.</span><span>a mapping from [mathjaxinline]\mathbb {R}^ d \rightarrow \{ -1, +1\}[/mathjaxinline].</span></span> We'll often use the letter [mathjaxinline]h[/mathjaxinline] (for hypothesis) to stand for a classifier, so the classification process looks like: </p><table id="a0000000007" class="equation" width="100%" cellspacing="0" cellpadding="7" style="table-layout:auto"><tr><td class="equation" style="width:80%; border:none">[mathjax]x \rightarrow \boxed {h} \rightarrow y \; \; .[/mathjax]</td><td class="eqnnum" style="width:20%; border:none"> </td></tr></table><p>
Real life rarely gives us vectors of real numbers; the [mathjaxinline]x[/mathjaxinline] we really want to classify is usually something like a song, image, or person. In that case, we'll have to define a function [mathjaxinline]\varphi (x)[/mathjaxinline], whose range is [mathjaxinline]\mathbb {R}^ d[/mathjaxinline], where [mathjaxinline]\varphi[/mathjaxinline] represents <em>features</em> of [mathjaxinline]x[/mathjaxinline], like a person's height or the amount of bass in a song, and then let [mathjaxinline]h: \varphi (x) \rightarrow \{ -1, +1\}[/mathjaxinline]. In much of the following, we'll omit explicit mention of [mathjaxinline]\varphi[/mathjaxinline] and assume that the [mathjaxinline]x^{(i)}[/mathjaxinline] are in [mathjaxinline]\mathbb {R}^ d[/mathjaxinline], but you should always have in mind that some additional process was almost surely required to go from the actual input examples to their feature representation. </p><p>
In <em>supervised learning</em> we are given a training data set of the form </p><table id="a0000000008" class="equation" width="100%" cellspacing="0" cellpadding="7" style="table-layout:auto"><tr><td class="equation" style="width:80%; border:none">[mathjax]{\cal D}_ n = \left\{ \left(x^{(1)}, y^{(1)}\right), \dots , \left(x^{(n)}, y^{(n)}\right)\right\} \; \; .[/mathjax]</td><td class="eqnnum" style="width:20%; border:none"> </td></tr></table><p>
We will assume that each [mathjaxinline]x^{(i)}[/mathjaxinline] is a [mathjaxinline]d \times 1[/mathjaxinline] <em>column vector</em>. The intended meaning of this data is that, when given an input [mathjaxinline]x^{(i)}[/mathjaxinline], the learned hypothesis should generate output [mathjaxinline]y^{(i)}[/mathjaxinline]. </p><p>
What makes a classifier useful? That it works well on <em>new</em> data; that is, that it makes good predictions on <span options="" class="marginote"><span class="marginote_desc" style="display:none">My favorite analogy is to problem sets. We evaluate a student's ability to <em>generalize</em> by putting questions on the exam that were not on the homework (training set).</span><span>examples it hasn't seen.</span></span> But we don't know exactly what data this classifier might be tested on when we use it in the real world. So, we have to <em>assume</em> a connection between the training data and testing data; typically, they are drawn independently from the same probability distribution. </p><p>
Given a training set [mathjaxinline]{\cal D}_ n[/mathjaxinline] and a classifier [mathjaxinline]h[/mathjaxinline], we can define the <em>training error</em> of [mathjaxinline]h[/mathjaxinline] to be </p><table id="a0000000009" cellpadding="7" width="100%" cellspacing="0" class="eqnarray" style="table-layout:auto"><tr id="a0000000010"><td style="width:40%; border:none"> </td><td style="vertical-align:middle; text-align:right; border:none">
[mathjaxinline]\displaystyle \mathcal{E}_ n(h) = \frac{1}{n}\sum _{i = 1}^{n}\begin{cases} 1 & h(x^{(i)}) \ne y^{(i)} \\ 0 & \text {otherwise}\end{cases} \; \; .[/mathjaxinline]
</td><td style="width:40%; border:none"> </td><td style="width:20%; border:none" class="eqnnum"> </td></tr></table><p>
For now, we will try to find a classifier with small training error (later, with some added criteria) and hope it <em>generalizes well</em> to new data, and has a small <em>test error</em> </p><table id="a0000000011" cellpadding="7" width="100%" cellspacing="0" class="eqnarray" style="table-layout:auto"><tr id="a0000000012"><td style="width:40%; border:none"> </td><td style="vertical-align:middle; text-align:right; border:none">
[mathjaxinline]\displaystyle \mathcal{E}(h) = \frac{1}{n'}\sum _{i = n + 1}^{n + n'}\begin{cases} 1 & h(x^{(i)}) \ne y^{(i)} \\ 0 & \text {otherwise}\end{cases}[/mathjaxinline]
</td><td style="width:40%; border:none"> </td><td style="width:20%; border:none" class="eqnnum"> </td></tr></table><p>
on [mathjaxinline]n'[/mathjaxinline] new examples that were not used in the process of finding the classifier. </p><p>
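To make these definitions concrete, here is a small numpy sketch (our own illustration, not from the notes) that computes the error rate of a hypothesis on a data set; the helper name <code>error_rate</code> is hypothetical:

```python
import numpy as np

def error_rate(h, X, y):
    """Fraction of examples that h misclassifies.

    X is d x n, with each column an example x^(i); y is a length-n
    array of +1/-1 labels; h maps a d-vector to +1 or -1.
    """
    predictions = np.array([h(X[:, i]) for i in range(X.shape[1])])
    return np.mean(predictions != y)

# A trivial hypothesis that always predicts +1, scored on four examples
# with labels (+1, +1, +1, -1): it misclassifies exactly one of them.
h = lambda x: +1
X = np.array([[1.0, 2.0, -1.0, 0.5],
              [0.0, 1.0,  3.0, -2.0]])  # d = 2, n = 4
y = np.array([+1, +1, +1, -1])
print(error_rate(h, X, y))  # 0.25
```

The same function computes the test error [mathjaxinline]\mathcal{E}(h)[/mathjaxinline]: just pass the [mathjaxinline]n'[/mathjaxinline] held-out examples instead of the training set.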
<br/></p><p>
<br/></p><p><a href="/assets/courseware/v1/dc2ca4ef11bd7be3faec0efc56f46cca/asset-v1:MITx+6.036+1T2019+type@asset+block/notes_chapter_Linear_classifers.pdf" target="_blank">Download this chapter as a PDF file</a></p><script src="/assets/courseware/v1/1ab2c06aefab58693cfc9c10394b7503/asset-v1:MITx+6.036+1T2019+type@asset+block/marginotes.js" type="text/javascript"/><span><br/><span style="color:gray;font-size:10pt"><center>This page was last updated on Friday May 24, 2019; 02:28:04 PM (revision 4f166135)</center></span></span>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="VerticalStudentView" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@linear_classifiers_learning_algorithm_vert" data-graded="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Learning algorithm</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@html+block@linear_classifiers_learning_algorithm">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-block-type="html" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="XBlockToXModuleShim" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@html+block@linear_classifiers_learning_algorithm" data-graded="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>
A <em>hypothesis class</em> [mathjaxinline]{\cal H}[/mathjaxinline] is a set (finite or infinite) of possible classifiers, each of which represents a mapping from [mathjaxinline]\mathbb {R}^ d \rightarrow \{ -1, +1\}[/mathjaxinline]. </p><p>
A <em>learning algorithm</em> is a procedure that takes a data set [mathjaxinline]{\cal D}_ n[/mathjaxinline] as input and returns an element [mathjaxinline]h[/mathjaxinline] of [mathjaxinline]{\cal H}[/mathjaxinline]; it looks like </p><table id="a0000000013" class="equation" width="100%" cellspacing="0" cellpadding="7" style="table-layout:auto"><tr><td class="equation" style="width:80%; border:none">[mathjax]{\cal D}_ n \longrightarrow \boxed {\text {learning alg (${\cal H}$)}} \longrightarrow h[/mathjax]</td><td class="eqnnum" style="width:20%; border:none"> </td></tr></table><p>
We will find that the choice of [mathjaxinline]{\cal H}[/mathjaxinline] can have a big impact on the test error of the [mathjaxinline]h[/mathjaxinline] that results from this process. One way to get an [mathjaxinline]h[/mathjaxinline] that generalizes well is to restrict the size, or “expressiveness” of [mathjaxinline]{\cal H}[/mathjaxinline]. </p><p>
<br/></p><p>
<br/></p><p><a href="/assets/courseware/v1/dc2ca4ef11bd7be3faec0efc56f46cca/asset-v1:MITx+6.036+1T2019+type@asset+block/notes_chapter_Linear_classifers.pdf" target="_blank">Download this chapter as a PDF file</a></p>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="VerticalStudentView" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@linear_classifiers_linear_classifiers_vert" data-graded="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Linear classifiers</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@html+block@linear_classifiers_linear_classifiers">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-block-type="html" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="XBlockToXModuleShim" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@html+block@linear_classifiers_linear_classifiers" data-graded="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>
We'll start with the hypothesis class of <em>linear classifiers</em>. They are (relatively) easy to understand, simple in a mathematical sense, powerful on their own, and the basis for many other more sophisticated methods. </p><p>
A linear classifier in [mathjaxinline]d[/mathjaxinline] dimensions is defined by a vector of parameters [mathjaxinline]\theta \in \mathbb {R}^ d[/mathjaxinline] and scalar [mathjaxinline]\theta _0 \in \mathbb {R}[/mathjaxinline]. So, the hypothesis class [mathjaxinline]{\cal H}[/mathjaxinline] of linear classifiers in [mathjaxinline]d[/mathjaxinline] dimensions is the <em>set</em> of all vectors in [mathjaxinline]\mathbb {R}^{d+1}[/mathjaxinline]. We'll assume that [mathjaxinline]\theta[/mathjaxinline] is a [mathjaxinline]d \times 1[/mathjaxinline] column vector. </p><p>
Given particular values for [mathjaxinline]\theta[/mathjaxinline] and [mathjaxinline]\theta _0[/mathjaxinline], the <span options="" class="marginote"><span class="marginote_desc" style="display:none">Let's be careful about dimensions. We have assumed that [mathjaxinline]x[/mathjaxinline] and [mathjaxinline]\theta[/mathjaxinline] are both [mathjaxinline]d \times 1[/mathjaxinline] column vectors. So [mathjaxinline]\theta ^ T x[/mathjaxinline] is [mathjaxinline]1 \times 1[/mathjaxinline], which in math (but not necessarily numpy) is the same as a scalar.</span><span>classifier is defined by</span></span> </p><table id="a0000000014" cellpadding="7" width="100%" cellspacing="0" class="eqnarray" style="table-layout:auto"><tr id="a0000000015"><td style="width:40%; border:none"> </td><td style="vertical-align:middle; text-align:right; border:none">
[mathjaxinline]\displaystyle h(x; \theta , \theta _0) = \text {sign}(\theta ^ T x + \theta _0) = \begin{cases} +1 & \text {if $\theta ^ Tx + \theta _0 > 0$} \\ -1 & \text {otherwise}\end{cases} \; \; .[/mathjaxinline]
</td><td style="width:40%; border:none"> </td><td style="width:20%; border:none" class="eqnnum"> </td></tr></table><p>
Remember that we can think of [mathjaxinline]\theta , \theta _0[/mathjaxinline] as specifying a hyperplane. It divides [mathjaxinline]\mathbb {R}^ d[/mathjaxinline], the space our [mathjaxinline]x^{(i)}[/mathjaxinline] points live in, into two half-spaces. The one that is on the same side as the normal vector is the <em>positive</em> half-space, and we classify all points in that space as positive. The half-space on the other side is <em>negative</em> and all points in it are classified as negative. </p><p><div style="border-radius:10px;padding:5px;border-style:solid;background-color:rgba(0,255,0,0.03);" class="examplebox"><p><b class="bf">Example:</b> Let [mathjaxinline]h[/mathjaxinline] be the linear classifier defined by [mathjaxinline]\theta = \begin{bmatrix} -1 \\ 1.5 \end{bmatrix}, \theta _0 = 3[/mathjaxinline]. </p><p>
The diagram below shows several points classified by [mathjaxinline]h[/mathjaxinline]. In particular, let [mathjaxinline]x^{(1)} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}[/mathjaxinline] and [mathjaxinline]x^{(2)} = \begin{bmatrix} 4 \\ -1 \end{bmatrix}[/mathjaxinline]. </p><table id="a0000000016" cellpadding="7" width="100%" cellspacing="0" class="eqnarray" style="table-layout:auto"><tr id="a0000000017"><td style="width:40%; border:none"> </td><td style="vertical-align:middle; text-align:right; border:none">
[mathjaxinline]\displaystyle h(x^{(1)}; \theta , \theta _0)[/mathjaxinline]
</td><td style="vertical-align:middle; text-align:left; border:none">
[mathjaxinline]\displaystyle = \text {sign}\left(\begin{bmatrix} -1 & 1.5 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} + 3\right) = \text {sign}(3) = +1[/mathjaxinline]
</td><td style="width:40%; border:none"> </td><td style="width:20%; border:none" class="eqnnum"> </td></tr><tr id="a0000000018"><td style="width:40%; border:none"> </td><td style="vertical-align:middle; text-align:right; border:none">
[mathjaxinline]\displaystyle h(x^{(2)}; \theta , \theta _0)[/mathjaxinline]
</td><td style="vertical-align:middle; text-align:left; border:none">
[mathjaxinline]\displaystyle = \text {sign}\left(\begin{bmatrix} -1 & 1.5 \end{bmatrix}\begin{bmatrix} 4 \\ -1 \end{bmatrix} + 3\right) = \text {sign}(-2.5) = -1[/mathjaxinline]
</td><td style="width:40%; border:none"> </td><td style="width:20%; border:none" class="eqnnum"> </td></tr></table><p>
Thus, [mathjaxinline]x^{(1)}[/mathjaxinline] and [mathjaxinline]x^{(2)}[/mathjaxinline] are given positive and negative classifications, respectively. </p><p><img src="/assets/courseware/v1/ffbb222e7c1b940253f07913aeca9dfb/asset-v1:MITx+6.036+1T2019+type@asset+block/images_linear_classifiers_linear_classifiers_tikzpicture_1-crop.png" width="618"/></p></div></p><p>
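The worked example above can be checked directly in numpy (our own sketch; <code>h</code> implements the sign rule from the definition, and <code>.item()</code> extracts the scalar from the [mathjaxinline]1 \times 1[/mathjaxinline] product discussed in the margin note):

```python
import numpy as np

def h(x, theta, theta_0):
    # sign(theta^T x + theta_0): theta.T @ x is a 1 x 1 array in numpy,
    # so .item() is used to pull out the underlying scalar.
    return 1 if (theta.T @ x).item() + theta_0 > 0 else -1

theta = np.array([[-1.0], [1.5]])   # d x 1 column vector
theta_0 = 3.0
x1 = np.array([[3.0], [2.0]])
x2 = np.array([[4.0], [-1.0]])
print(h(x1, theta, theta_0))  # 1, since theta^T x1 + theta_0 = 3 > 0
print(h(x2, theta, theta_0))  # -1, since theta^T x2 + theta_0 = -2.5
```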
<br/> <br/><span style="color:#FF0000"><b class="bf">Study Question:</b></span> <span style="color:#0000FF">What is the green vector normal to the hyperplane? Specify it as a column vector.</span> <br/> <br/> <br/><span style="color:#FF0000"><b class="bf">Study Question:</b></span> <span style="color:#0000FF"> What change would you have to make to [mathjaxinline]\theta , \theta _0[/mathjaxinline] if you wanted to have the separating hyperplane in the same place, but to classify all the points labeled '+' in the diagram as negative and all the points labeled '-' in the diagram as positive?</span> <br/></p><p>
<br/></p><p>
<br/></p><p><a href="/assets/courseware/v1/dc2ca4ef11bd7be3faec0efc56f46cca/asset-v1:MITx+6.036+1T2019+type@asset+block/notes_chapter_Linear_classifers.pdf" target="_blank">Download this chapter as a PDF file</a></p><script src="/assets/courseware/v1/1ab2c06aefab58693cfc9c10394b7503/asset-v1:MITx+6.036+1T2019+type@asset+block/marginotes.js" type="text/javascript"/>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="VerticalStudentView" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@MIT6036L01j_vert" data-graded="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Lecture: The random linear classifier algorithm</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01j">
<div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-block-type="video" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="XBlockToXModuleShim" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01j" data-graded="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Video"}
</script>
<h3 class="hd hd-2">Lecture: The random linear classifier algorithm</h3>
<div
id="video_MIT6036L01j"
class="video closed"
data-metadata='{"saveStateEnabled": false, "autoplay": false, "publishCompletionUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01j/handler/publish_completion", "start": 0.0, "prioritizeHls": false, "saveStateUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01j/handler/xmodule_handler/save_user_state", "recordedYoutubeIsAvailable": true, "streams": "1.00:Mm314LUrNN8", "autoAdvance": false, "transcriptAvailableTranslationsUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01j/handler/transcript/available_translations", "captionDataDir": null, "ytMetadataEndpoint": "", "showCaptions": "true", "lmsRootURL": "https://openlearninglibrary.mit.edu", "transcriptTranslationUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L01j/handler/transcript/translation/__lang__", "ytApiUrl": "https://www.youtube.com/iframe_api", "transcriptLanguages": {"en": "English"}, "speed": null, "autohideHtml5": false, "generalSpeed": 1.0, "transcriptLanguage": "en", "savedVideoPosition": 0.0, "poster": null, "sources": [], "duration": 0.0, "end": 0.0, "completionEnabled": false, "completionPercentage": 0.95, "ytTestTimeout": 1500}'
data-bumper-metadata='null'
data-autoadvance-enabled="False"
data-poster='null'
tabindex="-1"
>
<div class="focus_grabber first"></div>
<div class="tc-wrapper">
<div class="video-wrapper">
<span tabindex="0" class="spinner" aria-hidden="false" aria-label="Loading video player"></span>
<span tabindex="-1" class="btn-play fa fa-youtube-play fa-2x is-hidden" aria-hidden="true" aria-label="Play video"></span>
<div class="video-player-pre"></div>
<div class="video-player">
<div id="MIT6036L01j"></div>
<h4 class="hd hd-4 video-error is-hidden">No playable video sources found.</h4>
<h4 class="hd hd-4 video-hls-error is-hidden">
Your browser does not support this video format. Try using a different browser.
</h4>
</div>
<div class="video-player-post"></div>
<div class="closed-captions"></div>
<div class="video-controls is-hidden">
<div>
<div class="vcr"><div class="vidtime">0:00 / 0:00</div></div>
<div class="secondary-controls"></div>
</div>
</div>
</div>
</div>
<div class="focus_grabber last"></div>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="VerticalStudentView" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@linear_classifiers_learning_linear_classifiers_vert" data-graded="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Learning linear classifiers</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@html+block@linear_classifiers_learning_linear_classifiers">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-block-type="html" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="XBlockToXModuleShim" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@html+block@linear_classifiers_learning_linear_classifiers" data-graded="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>
Now, given a data set and the hypothesis class of linear classifiers, our objective will be to find the linear classifier with the smallest possible training error. </p><p><em>This is a well-formed optimization problem. But it's not computationally easy!</em></p><p>
We'll start by considering a very <span options="" class="marginote"><span class="marginote_desc" style="display:none">It's a good idea to think of the “stupidest possible" solution to a problem, before trying to get clever. Here's a fairly (but not completely) stupid algorithm.</span><span>simple learning algorithm. </span></span> The idea is to generate [mathjaxinline]k[/mathjaxinline] possible hypotheses by generating their parameter vectors at random. Then, we can evaluate the training-set error on each of the hypotheses and return the hypothesis that has the lowest training error (breaking ties arbitrarily). </p><p><img src="/assets/courseware/v1/a07a6224f2c522696ec3c5e584fa0cc7/asset-v1:MITx+6.036+1T2019+type@asset+block/images_linear_classifiers_learning_linear_classifiers_codebox_1-crop.png" width="506"/></p><p><span options="" class="marginote"><span class="marginote_desc" style="display:none">This might be new notation: [mathjaxinline]{\rm arg}\min _{x} f(x)[/mathjaxinline] means the value of [mathjaxinline]x[/mathjaxinline] for which [mathjaxinline]f(x)[/mathjaxinline] is the smallest. Sometimes we write [mathjaxinline]{\rm arg}\min _{x \in {\cal X}} f(x)[/mathjaxinline] when we want to explicitly specify the set [mathjaxinline]{\cal X}[/mathjaxinline] of values of [mathjaxinline]x[/mathjaxinline] over which we want to minimize.</span><span>A note about notation.</span></span></p><p>
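The Random-Linear-Classifier pseudocode above might be realized in numpy as follows (a sketch under our own assumptions: the notes do not specify the sampling distribution for [mathjaxinline]\theta , \theta _0[/mathjaxinline], so standard normals are used here):

```python
import numpy as np

def random_linear_classifier(X, y, k, seed=0):
    """Sample k random hypotheses (theta, theta_0) and return the one
    with the lowest training error, breaking ties by first occurrence.

    X is d x n (columns are examples); y is a length-n array of +1/-1.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    best, best_err = None, np.inf
    for _ in range(k):
        theta = rng.standard_normal((d, 1))       # random parameter vector
        theta_0 = rng.standard_normal()           # random offset
        preds = np.where((theta.T @ X).flatten() + theta_0 > 0, 1, -1)
        err = np.mean(preds != y)                 # training error E_n
        if err < best_err:                        # running argmin over the k draws
            best, best_err = (theta, theta_0), err
    return best
```

Note that the training error of the returned hypothesis can only stay the same or decrease as [mathjaxinline]k[/mathjaxinline] grows, which is a useful starting point for the study questions below.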
<br/> <br/><span style="color:#FF0000"><b class="bf">Study Question:</b></span> <span style="color:#0000FF"> What do you think happens to [mathjaxinline]\mathcal{E}_ n(h)[/mathjaxinline], where [mathjaxinline]h[/mathjaxinline] is the hypothesis returned by Random-Linear-Classifier, as [mathjaxinline]k[/mathjaxinline] is increased? </span> <br/></p><p>
<br/> <br/><span style="color:#FF0000"><b class="bf">Study Question:</b></span> <span style="color:#0000FF"> What properties of [mathjaxinline]{\cal D}_ n[/mathjaxinline] do you think will have an effect on [mathjaxinline]\mathcal{E}_ n(h)[/mathjaxinline]? </span> <br/> <br/></p><p>
<br/></p><p><a href="/assets/courseware/v1/dc2ca4ef11bd7be3faec0efc56f46cca/asset-v1:MITx+6.036+1T2019+type@asset+block/notes_chapter_Linear_classifers.pdf" target="_blank">Download this chapter as a PDF file</a></p><script src="/assets/courseware/v1/1ab2c06aefab58693cfc9c10394b7503/asset-v1:MITx+6.036+1T2019+type@asset+block/marginotes.js" type="text/javascript"/>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="VerticalStudentView" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@linear_classifiers_evaluation_vert" data-graded="False" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Evaluating a learning algorithm</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@html+block@linear_classifiers_evaluation">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-block-type="html" data-has-score="False" data-runtime-version="1" data-course-id="course-v1:MITx+6.036+1T2019" data-init="XBlockToXModuleShim" data-request-token="6ba7e3a603e511f099b402fa081815af" data-usage-id="block-v1:MITx+6.036+1T2019+type@html+block@linear_classifiers_evaluation" data-graded="False" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>
How should we evaluate the performance of a <em>classifier</em> [mathjaxinline]h[/mathjaxinline]? The best method is to measure <em>test error</em> on data that was not used to train it. </p><p>
How should we evaluate the performance of a <em>learning algorithm</em>? This is trickier. There are many potential sources of variability in the possible result of computing test error on a learned hypothesis [mathjaxinline]h[/mathjaxinline]: </p><ul class="itemize"><li><p>
Which particular <em>training examples</em> occurred in [mathjaxinline]{\cal D}_ n[/mathjaxinline] </p></li><li><p>
Which particular <em>testing examples</em> occurred in [mathjaxinline]{\cal D}_{n'}[/mathjaxinline] </p></li><li><p>
Randomization inside the learning <em>algorithm</em> itself </p></li></ul><p>
Generally, we would like to execute the following process multiple times: </p><ul class="itemize"><li><p>
Train on a new training set </p></li><li><p>
Evaluate resulting [mathjaxinline]h[/mathjaxinline] on a testing set <em>that does not overlap the training set</em> </p></li></ul><p>
Doing this multiple times controls for possible poor choices of training set or unfortunate randomization inside the algorithm itself. </p><p>
One concern is that we might need a lot of data to do this, and in many applications data is expensive or difficult to acquire. We can re-use data with <em>cross-validation</em> (though re-using data makes theoretical analysis harder). <br/></p><p><img src="/assets/courseware/v1/eb9d5e71939ffc5fdd7b627f962e22e9/asset-v1:MITx+6.036+1T2019+type@asset+block/images_linear_classifiers_evaluating_a_learning_algorithm_codebox_1-crop.png" width="641"/></p><p>
It's very important to understand that cross-validation neither delivers nor evaluates a single particular hypothesis [mathjaxinline]h[/mathjaxinline]. It evaluates the <em>algorithm</em> that produces hypotheses. </p><p>
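The train/evaluate loop above can be sketched in numpy (our own illustration; the names <code>cross_validate</code> and <code>learn</code> are hypothetical, and <code>learn(X_train, y_train)</code> stands for any learning algorithm that returns a hypothesis [mathjaxinline]h[/mathjaxinline]):

```python
import numpy as np

def cross_validate(learn, X, y, k=5, seed=0):
    """Average held-out error of the hypotheses produced by `learn`
    across k folds; this scores the ALGORITHM, not any single h.

    X is d x n (columns are examples); y is a length-n array of labels.
    """
    n = X.shape[1]
    idx = np.random.default_rng(seed).permutation(n)  # shuffle once
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        h = learn(X[:, train_idx], y[train_idx])      # train on k-1 folds
        preds = np.array([h(X[:, t]) for t in test_idx])
        errs.append(np.mean(preds != y[test_idx]))    # error on held-out fold
    return float(np.mean(errs))
```

Each example is held out exactly once, so every evaluation uses data disjoint from that round's training set, matching the non-overlap requirement above.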
<br/></p><p>
<br/></p><p><a href="/assets/courseware/v1/dc2ca4ef11bd7be3faec0efc56f46cca/asset-v1:MITx+6.036+1T2019+type@asset+block/notes_chapter_Linear_classifers.pdf" target="_blank">Download this chapter as a PDF file</a></p>
</div>
</div>
</div>
</div>
© All Rights Reserved