<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-graded="False" data-init="VerticalStudentView" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@feature_representation_notes" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Notes – Chapter 4: Feature representation</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@html+block@feature_representation_notes_top">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-block-type="html" data-graded="False" data-init="XBlockToXModuleShim" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@html+block@feature_representation_notes_top" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>
You can sequence through the Feature representation lecture video and note segments (go to Next page). </p><p>
You can also (or alternatively) download the <a href="/assets/courseware/v1/b5ca509c17bab346cc6252ca41a1aac7/asset-v1:MITx+6.036+1T2019+type@asset+block/notes_chapter_Feature_representation.pdf" target="_blank">Chapter 4: Feature representation</a> notes as a PDF file. </p>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-graded="False" data-init="VerticalStudentView" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@MIT6036L02g_vert" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Lecture: Feature representation - transforming through-origin to not-through-origin</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02g">
<div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-block-type="video" data-graded="False" data-init="XBlockToXModuleShim" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02g" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Video"}
</script>
<h3 class="hd hd-2">Lecture: Feature representation - transforming through-origin to not-through-origin</h3>
<div
id="video_MIT6036L02g"
class="video closed"
data-metadata='{"streams": "1.00:5TW1A1ToaXg", "savedVideoPosition": 0.0, "poster": null, "captionDataDir": null, "transcriptTranslationUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02g/handler/transcript/translation/__lang__", "lmsRootURL": "https://openlearninglibrary.mit.edu", "saveStateUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02g/handler/xmodule_handler/save_user_state", "ytTestTimeout": 1500, "autoplay": false, "ytApiUrl": "https://www.youtube.com/iframe_api", "completionEnabled": false, "showCaptions": "true", "transcriptLanguages": {"en": "English"}, "end": 0.0, "start": 0.0, "saveStateEnabled": false, "prioritizeHls": false, "generalSpeed": 1.0, "recordedYoutubeIsAvailable": true, "sources": [], "publishCompletionUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02g/handler/publish_completion", "speed": null, "duration": 0.0, "transcriptLanguage": "en", "ytMetadataEndpoint": "", "autoAdvance": false, "transcriptAvailableTranslationsUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02g/handler/transcript/available_translations", "autohideHtml5": false, "completionPercentage": 0.95}'
data-bumper-metadata='null'
data-autoadvance-enabled="False"
data-poster='null'
tabindex="-1"
>
<div class="focus_grabber first"></div>
<div class="tc-wrapper">
<div class="video-wrapper">
<span tabindex="0" class="spinner" aria-hidden="false" aria-label="Loading video player"></span>
<span tabindex="-1" class="btn-play fa fa-youtube-play fa-2x is-hidden" aria-hidden="true" aria-label="Play video"></span>
<div class="video-player-pre"></div>
<div class="video-player">
<div id="MIT6036L02g"></div>
<h4 class="hd hd-4 video-error is-hidden">No playable video sources found.</h4>
<h4 class="hd hd-4 video-hls-error is-hidden">
Your browser does not support this video format. Try using a different browser.
</h4>
</div>
<div class="video-player-post"></div>
<div class="closed-captions"></div>
<div class="video-controls is-hidden">
<div>
<div class="vcr"><div class="vidtime">0:00 / 0:00</div></div>
<div class="secondary-controls"></div>
</div>
</div>
</div>
</div>
<div class="focus_grabber last"></div>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-graded="False" data-init="VerticalStudentView" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@feature_representation_top_vert" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Linear classifiers and the XOR dataset</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@html+block@feature_representation_top">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-block-type="html" data-graded="False" data-init="XBlockToXModuleShim" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@html+block@feature_representation_top" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>
Linear classifiers are easy to work with and analyze, but they are a very restricted class of hypotheses: if we have to make a complex distinction in a low-dimensional space, they are unhelpful. </p><p>
Our favorite illustrative example is the “exclusive or” (<i class="sc">xor</i>) data set, the <em>drosophila</em> of machine-learning data sets (<i>D. melanogaster</i> is a species of fruit fly, used as a simple system in which to study genetics since 1910): </p><p><div style="border-radius:10px;padding:5px;border-style:solid;background-color:rgba(0,255,0,0.03);" class="examplebox"><center><p><img src="/assets/courseware/v1/7113a168fd1cb279b0a1548c7e16c08c/asset-v1:MITx+6.036+1T2019+type@asset+block/images_feature_representation_top_tikzpicture_1-crop.png" width="158"/></p></center></div></p><p>
There is no linear separator for this two-dimensional dataset! But we have a trick available: take a low-dimensional data set and move it, using a non-linear transformation, into a higher-dimensional space, and look for a linear separator there. Let's look at an example data set that starts in 1-D: </p><p><div style="border-radius:10px;padding:5px;border-style:solid;background-color:rgba(0,255,0,0.03);" class="examplebox"><center><p><img src="/assets/courseware/v1/083f20fe82de4d0b45c86e695e4d142c/asset-v1:MITx+6.036+1T2019+type@asset+block/images_feature_representation_top_tikzpicture_2-crop.png" width="410"/></p></center></div></p><p>
These points are not linearly separable (what's a linear separator for data in 1-D? A point!), but consider the transformation [mathjaxinline]\phi (x) = [x,x^2][/mathjaxinline]. Putting the data in [mathjaxinline]\phi[/mathjaxinline] space, we see that it is now separable. There are lots of possible separators; we have just shown one of them here. </p><p><div style="border-radius:10px;padding:5px;border-style:solid;background-color:rgba(0,255,0,0.03);" class="examplebox"><center><p><img src="/assets/courseware/v1/bbc807b76c87fd03a606af741140257c/asset-v1:MITx+6.036+1T2019+type@asset+block/images_feature_representation_top_tikzpicture_3-crop.png" width="410"/></p></center></div></p>
<p> A linear separator in [mathjaxinline]\phi[/mathjaxinline] space is a nonlinear separator in the original space! Let's see how this plays out in our simple example. Consider the separator [mathjaxinline]x^2 - 1 = 0[/mathjaxinline], which labels the half-plane [mathjaxinline]x^2 -1 > 0[/mathjaxinline] as positive. What separator does it correspond to in the original 1-D space? We have to ask: which [mathjaxinline]x[/mathjaxinline] values have the property that [mathjaxinline]x^2 - 1 = 0[/mathjaxinline]? The answer is [mathjaxinline]+1[/mathjaxinline] and [mathjaxinline]-1[/mathjaxinline], so those two points constitute our separator, back in the original space. And we can use the same reasoning to find the region of 1-D space that is labeled positive by this separator. </p><p><div style="border-radius:10px;padding:5px;border-style:solid;background-color:rgba(0,255,0,0.03);" class="examplebox"><center><p><img src="/assets/courseware/v1/2df7785d97c451a36a9ff9b7683f1e2a/asset-v1:MITx+6.036+1T2019+type@asset+block/images_feature_representation_top_tikzpicture_4-crop.png" width="412"/></p></center></div></p>
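<p>To make this concrete, here is a minimal sketch in Python: the feature map [mathjaxinline]\phi (x) = [x,x^2][/mathjaxinline] and the separator [mathjaxinline]x^2 - 1 = 0[/mathjaxinline] are from the example above, while the sample points are made up for illustration.</p>
<pre><code>import numpy as np

def phi(x):
    """Map a 1-D point into 2-D feature space: x -> (x, x^2)."""
    return np.array([x, x**2])

# The separator x^2 - 1 = 0, written as theta . phi(x) + theta_0 = 0.
theta = np.array([0.0, 1.0])   # weights on (x, x^2)
theta_0 = -1.0

def predict(x):
    """+1 if phi(x) is on the positive side of the separator, else -1."""
    return int(np.sign(theta @ phi(x) + theta_0))

for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(x, predict(x))   # negative exactly on the interval (-1, 1)
</code></pre>
<p>A linear rule in [mathjaxinline]\phi[/mathjaxinline] space thus carves the original line into the interval [mathjaxinline](-1, 1)[/mathjaxinline] and its complement, a decision region that no single threshold on [mathjaxinline]x[/mathjaxinline] could produce.</p>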
<p> This is a very general and widely useful strategy. It's the basis for <em>kernel methods</em>, a powerful technique that we unfortunately won't get to in this class, and it can be seen as a motivation for multi-layer neural networks. </p><p>
There are many different ways to construct [mathjaxinline]\phi[/mathjaxinline]. Some are relatively systematic and domain independent; we'll look at the <em>polynomial basis</em> in the next section as an example of that. Others are directly related to the semantics (meaning) of the original features, and we construct them deliberately with our domain in mind; we'll explore that strategy in the section on hand-constructing features for real domains. </p>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-graded="False" data-init="VerticalStudentView" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@MIT6036L02h_vert" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Lecture: Feature representation - polynomial basis</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02h">
<div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-block-type="video" data-graded="False" data-init="XBlockToXModuleShim" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02h" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Video"}
</script>
<h3 class="hd hd-2">Lecture: Feature representation - polynomial basis</h3>
<div
id="video_MIT6036L02h"
class="video closed"
data-metadata='{"streams": "1.00:RwOx668Jt9E", "savedVideoPosition": 0.0, "poster": null, "captionDataDir": null, "transcriptTranslationUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02h/handler/transcript/translation/__lang__", "lmsRootURL": "https://openlearninglibrary.mit.edu", "saveStateUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02h/handler/xmodule_handler/save_user_state", "ytTestTimeout": 1500, "autoplay": false, "ytApiUrl": "https://www.youtube.com/iframe_api", "completionEnabled": false, "showCaptions": "true", "transcriptLanguages": {"en": "English"}, "end": 0.0, "start": 0.0, "saveStateEnabled": false, "prioritizeHls": false, "generalSpeed": 1.0, "recordedYoutubeIsAvailable": true, "sources": [], "publishCompletionUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02h/handler/publish_completion", "speed": null, "duration": 0.0, "transcriptLanguage": "en", "ytMetadataEndpoint": "", "autoAdvance": false, "transcriptAvailableTranslationsUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02h/handler/transcript/available_translations", "autohideHtml5": false, "completionPercentage": 0.95}'
data-bumper-metadata='null'
data-autoadvance-enabled="False"
data-poster='null'
tabindex="-1"
>
<div class="focus_grabber first"></div>
<div class="tc-wrapper">
<div class="video-wrapper">
<span tabindex="0" class="spinner" aria-hidden="false" aria-label="Loading video player"></span>
<span tabindex="-1" class="btn-play fa fa-youtube-play fa-2x is-hidden" aria-hidden="true" aria-label="Play video"></span>
<div class="video-player-pre"></div>
<div class="video-player">
<div id="MIT6036L02h"></div>
<h4 class="hd hd-4 video-error is-hidden">No playable video sources found.</h4>
<h4 class="hd hd-4 video-hls-error is-hidden">
Your browser does not support this video format. Try using a different browser.
</h4>
</div>
<div class="video-player-post"></div>
<div class="closed-captions"></div>
<div class="video-controls is-hidden">
<div>
<div class="vcr"><div class="vidtime">0:00 / 0:00</div></div>
<div class="secondary-controls"></div>
</div>
</div>
</div>
</div>
<div class="focus_grabber last"></div>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-graded="False" data-init="VerticalStudentView" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@feature_representation_polynomial_basis_vert" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Polynomial basis</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@html+block@feature_representation_polynomial_basis">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-block-type="html" data-graded="False" data-init="XBlockToXModuleShim" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@html+block@feature_representation_polynomial_basis" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>
If the features in your problem are already naturally numerical, one systematic strategy for constructing a new feature space is to use a <em>polynomial basis</em>. The idea is that, if you are using the [mathjaxinline]k[/mathjaxinline]th-order basis (where [mathjaxinline]k[/mathjaxinline] is a positive integer), you include a feature for every possible product of at most [mathjaxinline]k[/mathjaxinline] (not necessarily distinct) dimensions of your original input. </p><p>
Here is a table illustrating the [mathjaxinline]k[/mathjaxinline]th-order polynomial basis for different values of [mathjaxinline]k[/mathjaxinline]: </p><table cellspacing="0" class="tabular" style="table-layout:auto"><tr><td style="text-align:center; border:none">
Order </td><td style="text-align:center; border:none">
[mathjaxinline]d=1[/mathjaxinline] </td><td style="text-align:center; border:none">
in general </td></tr><tr><td style="border-top-style:solid; border-top-color:black; border-top-width:1px; text-align:center; border:none">
0 </td><td style="border-top-style:solid; border-top-color:black; border-top-width:1px; text-align:center; border:none">
[mathjaxinline][1][/mathjaxinline] </td><td style="border-top-style:solid; border-top-color:black; border-top-width:1px; text-align:center; border:none">
[mathjaxinline][1][/mathjaxinline]</td></tr><tr><td style="text-align:center; border:none">
1 </td><td style="text-align:center; border:none">
[mathjaxinline][1,x][/mathjaxinline] </td><td style="text-align:center; border:none">
[mathjaxinline][1,x_1, \ldots , x_ d][/mathjaxinline]</td></tr><tr><td style="text-align:center; border:none">
2 </td><td style="text-align:center; border:none">
[mathjaxinline][1,x,x^2][/mathjaxinline] </td><td style="text-align:center; border:none">
[mathjaxinline][1,x_1, \ldots , x_ d, x_1^2, x_1x_2, \ldots ][/mathjaxinline]</td></tr><tr><td style="text-align:center; border:none">
3 </td><td style="text-align:center; border:none">
[mathjaxinline][1,x,x^2,x^3][/mathjaxinline] </td><td style="text-align:center; border:none">
[mathjaxinline][1,x_1, \ldots , x_ d, x_1^2, x_1x_2, \ldots , x_1x_2x_3, \ldots ][/mathjaxinline] </td></tr></table>
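<p>As a concrete illustration, here is a minimal sketch of generating such a basis in Python (the function name <code>polynomial_basis</code> and the use of <code>itertools</code> are our own choices, not from the notes):</p>
<pre><code>import itertools
import numpy as np

def polynomial_basis(x, k):
    """All products of at most k (not necessarily distinct) entries of x,
    including the constant feature 1 (the empty product)."""
    x = np.asarray(x, dtype=float)
    feats = []
    for order in range(k + 1):
        for idx in itertools.combinations_with_replacement(range(len(x)), order):
            feats.append(np.prod(x[list(idx)]))
    return np.array(feats)

print(polynomial_basis([3.0], 2))       # [1, x, x^2] -> [1. 3. 9.]
print(polynomial_basis([2.0, 5.0], 2))  # [1, x1, x2, x1^2, x1*x2, x2^2]
</code></pre>
<p>For [mathjaxinline]d = 2, k = 2[/mathjaxinline] this produces exactly the six features used for <i class="sc">xor</i> below.</p>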
<p> So, what if we try to solve the <i class="sc">xor</i> problem using a polynomial basis as the feature transformation? We can just take our two-dimensional data and transform it into a higher-dimensional data set by applying [mathjaxinline]\phi[/mathjaxinline]. Now we have a classification problem as usual, and we can use the perceptron algorithm to solve it. </p><p>
Let's try it for [mathjaxinline]k = 2[/mathjaxinline] on our <i class="sc">xor</i> problem. The feature transformation is </p><table id="a0000000002" class="equation" width="100%" cellspacing="0" cellpadding="7" style="table-layout:auto"><tr><td class="equation" style="width:80%; border:none">[mathjax]\phi ((x_1, x_2)) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2)\; \; .[/mathjax]</td><td class="eqnnum" style="width:20%; border:none"> </td></tr></table><p>
<br/> <br/><span style="color:#FF0000"><b class="bf">Study Question:</b></span> <span style="color:#0000FF"> If we use perceptron to train a classifier after performing this feature transformation, would we lose any expressive power if we let [mathjaxinline]\theta _0 = 0[/mathjaxinline] (i.e. trained without offset instead of with offset)?</span> <br/>After 4 iterations, perceptron finds a separator with coefficients [mathjaxinline]\theta = (0, 0, 0, 0, 4, 0)[/mathjaxinline] and [mathjaxinline]\theta _0 = 0[/mathjaxinline]. This corresponds to </p><table id="a0000000003" class="equation" width="100%" cellspacing="0" cellpadding="7" style="table-layout:auto"><tr><td class="equation" style="width:80%; border:none">[mathjax]0 + 0 x_1 + 0 x_2 + 0 x_1^2 + 4 x_1 x_2 + 0x_2^2 + 0 = 0[/mathjax]</td><td class="eqnnum" style="width:20%; border:none"> </td></tr></table><p>
and is plotted below, with the gray shaded region classified as negative and the white region classified as positive: <div style="border-radius:10px;padding:5px;border-style:solid;background-color:rgba(0,255,0,0.03);" class="examplebox"><center><img src="/assets/courseware/v1/4288137b7bd7f6b7a431b5b6c9f90b85/asset-v1:MITx+6.036+1T2019+type@asset+block/images_feature_representation_1.png" width="400" style="scale:0.3"/></center></div> <br/> <br/><span style="color:#FF0000"><b class="bf">Study Question:</b></span> <span style="color:#0000FF"> Be sure you understand why this high-dimensional hyperplane is a separator, and how it corresponds to the figure.</span> <br/></p>
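<p>Here is a minimal sketch of this run; it is a sketch under stated assumptions: the placement of the <i class="sc">xor</i> points at [mathjaxinline](\pm 1, \pm 1)[/mathjaxinline] with same-sign points labeled positive, and the visiting order, are our choices, and a different ordering can yield a different separator.</p>
<pre><code>import numpy as np

def phi(x1, x2):
    """Second-order polynomial basis of a 2-D input."""
    return np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2])

# Assumed XOR layout: same-sign points positive, mixed-sign points negative.
X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
y = [+1, +1, -1, -1]

theta = np.zeros(6)          # train without a separate offset (theta_0 = 0)
mistakes = 0
for _ in range(100):         # passes over the data
    clean_pass = True
    for (x1, x2), label in zip(X, y):
        if label * (theta @ phi(x1, x2)) <= 0:   # mistake: perceptron update
            theta += label * phi(x1, x2)
            mistakes += 1
            clean_pass = False
    if clean_pass:
        break

print(theta, mistakes)   # -> [0. 0. 0. 0. 4. 0.] after 4 mistakes
</code></pre>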
<p> For fun, we show some more plots below. Here is the result of running perceptron on <i class="sc">xor</i>, but where the data are placed differently in the plane. After 65 mistakes (!) it arrives at the coefficients [mathjaxinline]\theta = ( 1, -1, -1, -5, 11, -5)[/mathjaxinline], [mathjaxinline]\theta _0 = 1[/mathjaxinline], which generates this separator (the jaggedness in the plotted separator is an artifact of a lazy plotting strategy; the true curves are smooth): <div style="border-radius:10px;padding:5px;border-style:solid;background-color:rgba(0,255,0,0.03);" class="examplebox"><center><img src="/assets/courseware/v1/4b61c604452f9f0d62c39ec28345ce8e/asset-v1:MITx+6.036+1T2019+type@asset+block/images_feature_representation_2.png" width="400" style="scale:0.3"/></center></div> <br/> <br/><span style="color:#FF0000"><b class="bf">Study Question:</b></span> <span style="color:#0000FF">It takes many more iterations to solve this version. Apply your knowledge of the convergence properties of the perceptron to understand why.</span> <br/></p><p>
Here is a harder data set. After 200 iterations, we could not separate it with a second- or third-order basis representation. Shown below are the results after 200 iterations for bases of order 2, 3, 4, and 5. <div style="border-radius:10px;padding:5px;border-style:solid;background-color:rgba(0,255,0,0.03);" class="examplebox"><center><img src="/assets/courseware/v1/5ecc31b1e67647cd012007c78c303be7/asset-v1:MITx+6.036+1T2019+type@asset+block/images_feature_representation_3.png" width="400"/><img src="/assets/courseware/v1/2fa58ddc29ad247b6937521bd7d0cce1/asset-v1:MITx+6.036+1T2019+type@asset+block/images_feature_representation_4.png" width="400"/><br/><img src="/assets/courseware/v1/722b0abd100d03c738617979ddce8a62/asset-v1:MITx+6.036+1T2019+type@asset+block/images_feature_representation_5.png" width="400"/><img src="/assets/courseware/v1/ee0ef00782fd3da6f009fee4baf10dd5/asset-v1:MITx+6.036+1T2019+type@asset+block/images_feature_representation_6.png" width="400"/></center></div> </p>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-graded="False" data-init="VerticalStudentView" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@MIT6036L02j_vert" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Lecture: Example of perceptron algorithm with polynomial basis transformations</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02j">
<div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-block-type="video" data-graded="False" data-init="XBlockToXModuleShim" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02j" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Video"}
</script>
<h3 class="hd hd-2">Lecture: Example of perceptron algorithm with polynomial basis transformations</h3>
<div
id="video_MIT6036L02j"
class="video closed"
data-metadata='{"streams": "1.00:KXJ9sUsKXP4", "savedVideoPosition": 0.0, "poster": null, "captionDataDir": null, "transcriptTranslationUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02j/handler/transcript/translation/__lang__", "lmsRootURL": "https://openlearninglibrary.mit.edu", "saveStateUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02j/handler/xmodule_handler/save_user_state", "ytTestTimeout": 1500, "autoplay": false, "ytApiUrl": "https://www.youtube.com/iframe_api", "completionEnabled": false, "showCaptions": "true", "transcriptLanguages": {"en": "English"}, "end": 0.0, "start": 0.0, "saveStateEnabled": false, "prioritizeHls": false, "generalSpeed": 1.0, "recordedYoutubeIsAvailable": true, "sources": [], "publishCompletionUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02j/handler/publish_completion", "speed": null, "duration": 0.0, "transcriptLanguage": "en", "ytMetadataEndpoint": "", "autoAdvance": false, "transcriptAvailableTranslationsUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02j/handler/transcript/available_translations", "autohideHtml5": false, "completionPercentage": 0.95}'
data-bumper-metadata='null'
data-autoadvance-enabled="False"
data-poster='null'
tabindex="-1"
>
<div class="focus_grabber first"></div>
<div class="tc-wrapper">
<div class="video-wrapper">
<span tabindex="0" class="spinner" aria-hidden="false" aria-label="Loading video player"></span>
<span tabindex="-1" class="btn-play fa fa-youtube-play fa-2x is-hidden" aria-hidden="true" aria-label="Play video"></span>
<div class="video-player-pre"></div>
<div class="video-player">
<div id="MIT6036L02j"></div>
<h4 class="hd hd-4 video-error is-hidden">No playable video sources found.</h4>
<h4 class="hd hd-4 video-hls-error is-hidden">
Your browser does not support this video format. Try using a different browser.
</h4>
</div>
<div class="video-player-post"></div>
<div class="closed-captions"></div>
<div class="video-controls is-hidden">
<div>
<div class="vcr"><div class="vidtime">0:00 / 0:00</div></div>
<div class="secondary-controls"></div>
</div>
</div>
</div>
</div>
<div class="focus_grabber last"></div>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-graded="False" data-init="VerticalStudentView" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@MIT6036L02k_vert" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Lecture: Feature representation strategies for dealing with varied data</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02k">
<div class="xblock xblock-public_view xblock-public_view-video xmodule_display xmodule_VideoBlock" data-block-type="video" data-graded="False" data-init="XBlockToXModuleShim" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02k" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Video"}
</script>
<h3 class="hd hd-2">Lecture: Feature representation strategies for dealing with varied data</h3>
<div
id="video_MIT6036L02k"
class="video closed"
data-metadata='{"streams": "1.00:kC6mFgyeAtQ", "savedVideoPosition": 0.0, "poster": null, "captionDataDir": null, "transcriptTranslationUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02k/handler/transcript/translation/__lang__", "lmsRootURL": "https://openlearninglibrary.mit.edu", "saveStateUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02k/handler/xmodule_handler/save_user_state", "ytTestTimeout": 1500, "autoplay": false, "ytApiUrl": "https://www.youtube.com/iframe_api", "completionEnabled": false, "showCaptions": "true", "transcriptLanguages": {"en": "English"}, "end": 0.0, "start": 0.0, "saveStateEnabled": false, "prioritizeHls": false, "generalSpeed": 1.0, "recordedYoutubeIsAvailable": true, "sources": [], "publishCompletionUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02k/handler/publish_completion", "speed": null, "duration": 0.0, "transcriptLanguage": "en", "ytMetadataEndpoint": "", "autoAdvance": false, "transcriptAvailableTranslationsUrl": "/courses/course-v1:MITx+6.036+1T2019/xblock/block-v1:MITx+6.036+1T2019+type@video+block@MIT6036L02k/handler/transcript/available_translations", "autohideHtml5": false, "completionPercentage": 0.95}'
data-bumper-metadata='null'
data-autoadvance-enabled="False"
data-poster='null'
tabindex="-1"
>
<div class="focus_grabber first"></div>
<div class="tc-wrapper">
<div class="video-wrapper">
<span tabindex="0" class="spinner" aria-hidden="false" aria-label="Loading video player"></span>
<span tabindex="-1" class="btn-play fa fa-youtube-play fa-2x is-hidden" aria-hidden="true" aria-label="Play video"></span>
<div class="video-player-pre"></div>
<div class="video-player">
<div id="MIT6036L02k"></div>
<h4 class="hd hd-4 video-error is-hidden">No playable video sources found.</h4>
<h4 class="hd hd-4 video-hls-error is-hidden">
Your browser does not support this video format. Try using a different browser.
</h4>
</div>
<div class="video-player-post"></div>
<div class="closed-captions"></div>
<div class="video-controls is-hidden">
<div>
<div class="vcr"><div class="vidtime">0:00 / 0:00</div></div>
<div class="secondary-controls"></div>
</div>
</div>
</div>
</div>
<div class="focus_grabber last"></div>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-block-type="vertical" data-graded="False" data-init="VerticalStudentView" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@vertical+block@feature_representation_hand-constructing_features_for_real_domains_vert" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<h2 class="hd hd-2 unit-title">Hand-constructing features for real domains</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.036+1T2019+type@html+block@feature_representation_hand-constructing_features_for_real_domains">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-block-type="html" data-graded="False" data-init="XBlockToXModuleShim" data-runtime-version="1" data-has-score="False" data-usage-id="block-v1:MITx+6.036+1T2019+type@html+block@feature_representation_hand-constructing_features_for_real_domains" data-course-id="course-v1:MITx+6.036+1T2019" data-request-token="d8e156bc05d111f095120e84ffb47eb3" data-runtime-class="LmsRuntime">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<p>
In many machine-learning applications, we are given descriptions of the inputs with many different types of attributes, including numbers, words, and discrete features. An important factor in the success of an ML application is the way the features are chosen to be encoded by the human who is framing the learning problem. </p><p><h3>Discrete features</h3> Getting a good encoding of discrete features is particularly important. You want to create “opportunities” for the ML system to find the underlying regularities. Although there are machine-learning methods that have special mechanisms for handling discrete inputs, all the methods we consider in this class assume the input vectors [mathjaxinline]x[/mathjaxinline] are in [mathjaxinline]\mathbb {R}^ d[/mathjaxinline]. So, we have to figure out some reasonable strategies for turning discrete values into (vectors of) real numbers. </p><p>
We'll start by listing some encoding strategies, and then work through some examples. Let's assume we have some feature in our raw data that can take on one of [mathjaxinline]k[/mathjaxinline] discrete values. </p><ul class="itemize"><li><p><b class="bf">Numeric</b> Assign each of these values a number, say [mathjaxinline]1.0/k, 2.0/k, \ldots , 1.0[/mathjaxinline]. We might want to then do some further processing, as described in the section on numeric values below. This is a sensible strategy <em>only</em> when the discrete values really do signify some sort of numeric quantity, so that these numerical values are meaningful. </p></li><li><p><b class="bf">Thermometer code</b> If your discrete values have a natural ordering from [mathjaxinline]1, \ldots , k[/mathjaxinline], but not a natural mapping into real numbers, a good strategy is to use a vector of [mathjaxinline]k[/mathjaxinline] binary variables, where we convert discrete input value [mathjaxinline]0 < j \leq k[/mathjaxinline] into a vector in which the first [mathjaxinline]j[/mathjaxinline] values are [mathjaxinline]1.0[/mathjaxinline] and the rest are [mathjaxinline]0.0[/mathjaxinline]. This does not necessarily imply anything about the spacing or numerical quantities of the inputs, but does convey something about ordering (see the sketch after this list). </p></li><li><p><b class="bf">Factored code</b> If your discrete values can sensibly be decomposed into two parts (say the “make” and “model” of a car), then it's best to treat those as two separate features, and choose an appropriate encoding for each one from this list. </p></li><li><p><b class="bf">One-hot code</b> If there is no obvious numeric, ordering, or factorial structure, then the best strategy is to use a vector of length [mathjaxinline]k[/mathjaxinline], where we convert discrete input value [mathjaxinline]0 < j \leq k[/mathjaxinline] into a vector in which all values are [mathjaxinline]0.0[/mathjaxinline], except for the [mathjaxinline]j[/mathjaxinline]th, which is [mathjaxinline]1.0[/mathjaxinline] (see the sketch after this list). </p></li><li><p><b class="bf">Binary code</b> It might be tempting for the computer scientists among us to use some binary code, which would let us represent [mathjaxinline]k[/mathjaxinline] values using a vector of length [mathjaxinline]\log _2 k[/mathjaxinline]. <em>This is a bad idea!</em> Decoding a binary code takes a lot of work, and by encoding your inputs this way, you'd be forcing your system to <em>learn</em> the decoding algorithm. </p></li></ul>
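<p>Here is a minimal sketch of the thermometer and one-hot codes from the list above (the function names are ours; [mathjaxinline]j[/mathjaxinline] is 1-indexed, matching [mathjaxinline]0 < j \leq k[/mathjaxinline]):</p>
<pre><code>import numpy as np

def one_hot(j, k):
    """Value j of k: all zeros except a 1.0 in position j."""
    v = np.zeros(k)
    v[j - 1] = 1.0
    return v

def thermometer(j, k):
    """Value j of k ordered values: the first j entries are 1.0."""
    v = np.zeros(k)
    v[:j] = 1.0
    return v

print(one_hot(3, 5))       # [0. 0. 1. 0. 0.]
print(thermometer(3, 5))   # [1. 1. 1. 0. 0.]
</code></pre>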
<p> As an example, imagine that we want to encode blood types, which are drawn from the set [mathjaxinline]\{ A+, A-, B+, B-, AB+, AB-, O+, O-\}[/mathjaxinline]. There is no obvious linear numeric scaling or even ordering to this set. But there is a reasonable <em>factoring</em>, into two features: [mathjaxinline]\{ A, B, AB, O\}[/mathjaxinline] and [mathjaxinline]\{ +, -\}[/mathjaxinline]. And, in fact, we can reasonably factor the first group into [mathjaxinline]\{ A, {\rm not}A\}[/mathjaxinline], [mathjaxinline]\{ B, {\rm not}B\}[/mathjaxinline] (it is sensible, according to Wikipedia, to treat [mathjaxinline]O[/mathjaxinline] as having neither feature [mathjaxinline]A[/mathjaxinline] nor feature [mathjaxinline]B[/mathjaxinline]). So, here are two plausible encodings of the whole set: </p><ul class="itemize"><li><p>
Use a 6-D vector, with two dimensions to encode each of the factors using a one-hot encoding. </p></li><li><p>
Use a 3-D vector, with one dimension for each factor, encoding its presence as [mathjaxinline]1.0[/mathjaxinline] and absence as [mathjaxinline]-1.0[/mathjaxinline] (this is sometimes better than [mathjaxinline]0.0[/mathjaxinline]). In this case, [mathjaxinline]AB+[/mathjaxinline] would be [mathjaxinline](1.0, 1.0, 1.0)[/mathjaxinline] and [mathjaxinline]O-[/mathjaxinline] would be [mathjaxinline](-1.0, -1.0, -1.0)[/mathjaxinline], as sketched below. </p></li></ul>
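<p>A minimal sketch of the second (3-D, [mathjaxinline]\pm 1.0[/mathjaxinline]) encoding; the explicit factor table is our own illustration:</p>
<pre><code>import numpy as np

# Factor each blood type into (has A, has B, Rh positive).
FACTORS = {
    "A+":  (True,  False, True),   "A-":  (True,  False, False),
    "B+":  (False, True,  True),   "B-":  (False, True,  False),
    "AB+": (True,  True,  True),   "AB-": (True,  True,  False),
    "O+":  (False, False, True),   "O-":  (False, False, False),
}

def encode(blood_type):
    """Presence of each factor as +1.0, absence as -1.0."""
    return np.array([1.0 if present else -1.0 for present in FACTORS[blood_type]])

print(encode("AB+"))   # [ 1.  1.  1.]
print(encode("O-"))    # [-1. -1. -1.]
</code></pre>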
<br/> <br/><span style="color:#FF0000"><b class="bf">Study Question:</b></span> <span style="color:#0000FF">How would you encode [mathjaxinline]A+[/mathjaxinline] in both of these approaches?</span> <br/></p><p><h3>Text</h3> The problem of taking a text (such as a tweet or a product review, or even this document!) and encoding it as an input for a machine-learning algorithm is interesting and complicated. Much later in the class, we'll study sequential input models, where, rather than having to encode a text as a fixed-length feature vector, we feed it into a hypothesis word by word (or even character by character!). </p><p>
There are some simpler encodings that work well for basic applications. One of them is the <em>bag of words</em> (<i class="sc">bow</i>) model. The idea is to let [mathjaxinline]d[/mathjaxinline] be the number of words in our vocabulary (computed either from the training set or from some other body of text or dictionary). We then make a binary vector (with values [mathjaxinline]1.0[/mathjaxinline] and [mathjaxinline]0.0[/mathjaxinline]) of length [mathjaxinline]d[/mathjaxinline], where element [mathjaxinline]j[/mathjaxinline] has value [mathjaxinline]1.0[/mathjaxinline] if word [mathjaxinline]j[/mathjaxinline] occurs in the document, and [mathjaxinline]0.0[/mathjaxinline] otherwise. </p>
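<p>A minimal bag-of-words sketch (the five-word vocabulary and the whitespace tokenizer are toy assumptions for illustration):</p>
<pre><code>import numpy as np

# Toy vocabulary; in practice, computed from the training set or a dictionary.
vocabulary = ["the", "movie", "was", "great", "terrible"]

def bag_of_words(document):
    """Binary vector: entry j is 1.0 iff vocabulary word j occurs in the document."""
    words = set(document.lower().split())
    return np.array([1.0 if w in words else 0.0 for w in vocabulary])

print(bag_of_words("The movie was truly great"))   # -> [1. 1. 1. 1. 0.]
</code></pre>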
<p><h3>Numeric values</h3></p><p> If some feature is already encoded as a numeric value (heart rate, stock price, distance, etc.), then you should generally keep it as a numeric value. An exception might be a situation in which you know there are natural “breakpoints” in the semantics: for example, when encoding someone's age in the US, you might make an explicit distinction between under and over 18 (or 21), depending on what kind of thing you are trying to predict. It might make sense to divide into discrete bins (possibly spacing them closer together for the very young) and to use a one-hot encoding in some sorts of medical situations in which we don't expect a linear (or even monotonic) relationship between age and some physiological features. </p><p>
If you choose to leave a feature as numeric, it is typically useful to <em>scale</em> it, so that it tends to be in the range [mathjaxinline][-1, +1][/mathjaxinline]. Without this transformation, if one feature has much larger values than another, it will take the learning algorithm a lot of work to find parameters that treat them on an equal basis. So, we might perform the transformation [mathjaxinline]\phi (x) = \dfrac {x - \overline{x}}{\sigma }[/mathjaxinline], where [mathjaxinline]\overline{x}[/mathjaxinline] is the average of the [mathjaxinline]x^{(i)}[/mathjaxinline] and [mathjaxinline]\sigma[/mathjaxinline] is the standard deviation of the [mathjaxinline]x^{(i)}[/mathjaxinline]. The resulting feature values will have mean [mathjaxinline]0[/mathjaxinline] and standard deviation [mathjaxinline]1[/mathjaxinline]. This transformation is sometimes called <em>standardizing</em> a variable (such standardized variables are often known as “z-scores,” for example, in the social sciences). </p>
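<p>A minimal standardizing sketch (the heart-rate values are made up; note that [mathjaxinline]\overline{x}[/mathjaxinline] and [mathjaxinline]\sigma[/mathjaxinline] would normally be computed from the training data and then reused on new data):</p>
<pre><code>import numpy as np

def standardize(x):
    """phi(x) = (x - mean) / std, applied to a whole feature column."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

heart_rates = [62, 71, 55, 90, 68]    # made-up numeric feature values
z = standardize(heart_rates)
print(np.allclose([z.mean(), z.std()], [0.0, 1.0]))   # -> True
</code></pre>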
<p> Then, of course, you might apply a higher-order polynomial-basis transformation to one or more groups of numeric features. <br/> <br/><span style="color:#FF0000"><b class="bf">Study Question:</b></span> <span style="color:#0000FF"> Percy Eptron has a domain with 4 numeric input features, [mathjaxinline](x_1, \ldots , x_4)[/mathjaxinline]. He decides to use a representation of the form <table id="a0000000004" class="equation" width="100%" cellspacing="0" cellpadding="7" style="table-layout:auto"><tr><td class="equation" style="width:80%; border:none">[mathjax]\phi (x) = {\rm PolyBasis}((x_1, x_2), 3) ^\frown {\rm PolyBasis}((x_3, x_4), 3)[/mathjax]</td><td class="eqnnum" style="width:20%; border:none"> </td></tr></table> where [mathjaxinline]a^\frown b[/mathjaxinline] means the vector [mathjaxinline]a[/mathjaxinline] concatenated with the vector [mathjaxinline]b[/mathjaxinline]. What is the dimension of Percy's representation? Under what assumptions about the original features is this a reasonable choice?</span> <br/></p>