Screenshots - Week 1
Uploaded by Caio Santos
Machine Learning Honor Code

We strongly encourage students to form study groups and discuss the lecture videos (including in-video questions). We also encourage you to get together with friends to watch the videos together as a group. However, the answers that you submit for the review questions should be your own work. For the programming exercises, you are welcome to discuss them with other students, and to discuss specific algorithms, properties of algorithms, and so on; we ask only that you not look at any source code written by a different student, nor show your solution code to other students.

Guidelines for Posting Code in Discussion Forums

Scenario 1: Code to delete

Learner question/comment: "Here is the code I have so far, but it fails the grader. Please help me fix it."

Why delete? If a simple fix is provided by a student, a quick copy-and-paste with a small edit will provide credit without individual effort.

Learner question: A student substitutes words for the math operators, but includes the variable names (or substitutes the equivalent Greek letters, θ for "theta", etc.). This student also provides a sentence-by-sentence, line-by-line description of exactly what their code implements: "The first line of my script has the equation 'hypothesis equals theta times X', but I get the following error message..."

Why delete? This should be deleted. "Spelling out" the code in English is the same as using the regular code.

Scenario 2: Code not to delete

Learner question: How do I subset a matrix to eliminate the intercept?

Mentor response: This would probably be okay, especially if the person posting makes an effort not to use familiar variable names, or uses a context which has nothing to do with the contexts in the assignments. It is clearly OK to show examples of Octave code to demonstrate a technique, even if the technique itself is directly applicable to a programming problem at hand.
As long as what is typed cannot be "cut and pasted" into the program at hand. E.g., how do I set column 1 of a matrix to zero? Try this in your Octave work area:

>> A = magic(3)
>> A(:,1) = 0

The above is always acceptable (in my understanding). Demonstrating techniques and learning the language/syntax are important forum activities.

"A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."

Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?

→ ● Classifying emails as spam or not spam.
  ○ Watching you label emails as spam or not spam.
  ○ The number (or fraction) of emails correctly classified as spam/not spam.
  ○ None of the above; this is not a machine learning problem.

What is Machine Learning?

Two definitions of Machine Learning are offered. Arthur Samuel described it as "the field of study that gives computers the ability to learn without being explicitly programmed." This is an older, informal definition.

Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Example: playing checkers.
E = the experience of playing many games of checkers.
T = the task of playing checkers.
P = the probability that the program will win the next game.
In general, any machine learning problem can be assigned to one of two broad classifications: supervised learning and unsupervised learning.

How to Use Discussion Forums

Upvoting Posts

When you enter the discussion forum for your course you will see an Upvote button under each post. We encourage you to upvote posts you find thoughtful, interesting, or helpful. This is the best way to ensure that quality posts will be seen by other learners in the course. Upvoting will also increase the likelihood that important questions get addressed and answered.

Report Abuse

Coursera's Code of Conduct prohibits:

- Bullying or threatening other users
- Posting spam or promotional content
- Posting mature content
- Posting assignment solutions (or other violations of the Honor Code)

Please report any posts that infringe upon copyright or are abusive, offensive, or that otherwise violate Coursera's Honor Code by using the "Report this" option found under the menu arrow to the right of each post.

Following

If you find a particular thread interesting, click the Follow button under the original post of that thread page. When you follow a post, you will receive an email notification any time a new post is made.

Improving Your Posts

Course discussion forums are your chance to interact with thousands of like-minded individuals around the world. Getting their attention is one way to do well in this course. In any social interaction, certain rules of etiquette are expected and contribute to more enjoyable and productive communication. The following are tips for interacting in this course via the forums, adapted from guidelines originally compiled by AHA! and Chuq Von Rospach & Gene Spafford:

1. Stay on topic in existing forums and threads. Off-topic posts make it hard for other learners to find information they need. Post in the most appropriate forum for your topic, and do not post the same thing in multiple forums.
2. Use the filters at the top of the forum page (Latest, Top, and Unanswered) to find active, interesting content.
3. Upvote posts that are helpful and interesting.
4. Be civil. If you disagree, explain your position with respect and refrain from any and all personal attacks.
5. Stay on topic. In particular, don't change the subject in the middle of an existing thread; just start a new topic.
6. Make sure you're understood, even by non-native English speakers. Try to write full sentences, and avoid text-message abbreviations or slang. Be careful when you use humor and sarcasm, as these messages are easy to misinterpret.
7. If asking a question, provide as much information as possible: what you've already considered, what you've already read, etc.
8. Cite appropriate references when using someone else's ideas, thoughts, or words.
9. Do not use a forum to promote your product, service, or business.
10. Conclude posts by inviting other learners to extend the discussion. For example, you could say "I would love to understand what others think."
11. Do not post personal information about other posters in the forum.
12. Report spammers.

For more details, refer to Coursera's Forum Code of Conduct. These tips and tools for interacting in this course via the forums were adapted from guidelines originally by The University of Illinois.

You're running a company, and you want to develop learning algorithms to address each of two problems.

Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.
Problem 2: You'd like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.

Should you treat these as classification or as regression problems?

  ○ Treat both as classification problems.
  ○ Treat problem 1 as a classification problem, problem 2 as a regression problem.
→ ● Treat problem 1 as a regression problem, problem 2 as a classification problem.
  ○ Treat both as regression problems.

Supervised Learning

In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.
Supervised learning problems are categorized into "regression" and "classification" problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

Example 1: Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem. We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.

Example 2:
(a) Regression: Given a picture of a person, we have to predict their age on the basis of the given picture.
(b) Classification: Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.

Cocktail party problem algorithm:

[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');

[Source: Sam Roweis, Yair Weiss & Eero Simoncelli]

Unsupervised Learning

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables. We can derive this structure by clustering the data based on relationships among the variables in the data. With unsupervised learning there is no feedback based on the prediction results.

Example:

Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
Non-clustering: The "Cocktail Party Algorithm" allows you to find structure in a chaotic environment (identifying individual voices and music from a mesh of sounds at a cocktail party).

Model Representation

Training set of housing prices (Portland, OR):

Size in feet² (x) | Price ($) in 1000's (y)
2104              | 460
1416              | 232
1534              | 315
852               | 178

Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable

To establish notation for future use, we'll use x⁽ⁱ⁾ to denote the "input" variables (living area in this example), also called input features, and y⁽ⁱ⁾ to denote the "output" or target variable that we are trying to predict (price). A pair (x⁽ⁱ⁾, y⁽ⁱ⁾) is called a training example, and the dataset that we'll be using to learn (a list of m training examples (x⁽ⁱ⁾, y⁽ⁱ⁾); i = 1, ..., m) is called a training set. Note that the superscript "(i)" in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use X to denote the space of input values, and Y to denote the space of output values. In this example, X = Y = ℝ.

To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a "good" predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis. Seen pictorially, the process is therefore like this:

[Figure: training set → learning algorithm → hypothesis h, which maps an input x to a predicted output y]

When the target variable that we're trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.

Cost Function

We can measure the accuracy of our hypothesis function by using a cost function.
This takes an average difference (actually a fancier version of an average) of all the results of the hypothesis with inputs from the x's and the actual outputs y's:

J(θ₀, θ₁) = 1/(2m) · Σ_{i=1}^{m} (ŷ⁽ⁱ⁾ - y⁽ⁱ⁾)² = 1/(2m) · Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) - y⁽ⁱ⁾)²

To break it apart, it is (1/2) x̄, where x̄ is the mean of the squares of h_θ(x⁽ⁱ⁾) - y⁽ⁱ⁾, or the difference between the predicted value and the actual value.

This function is otherwise called the "squared error function" or "mean squared error". The mean is halved (1/2) as a convenience for the computation of the gradient descent, as the derivative term of the square function will cancel out the 1/2 term.

The following summarizes what the cost function does:

Hypothesis: h_θ(x) = θ₀ + θ₁x
Parameters: θ₀, θ₁
Cost function: J(θ₀, θ₁) = 1/(2m) · Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) - y⁽ⁱ⁾)²
Goal: minimize J(θ₀, θ₁) over θ₀, θ₁

Idea: choose θ₀, θ₁ so that h_θ(x) is close to y for our training examples (x, y).

Cost Function - Intuition I

If we try to think of it in visual terms, our training data set is scattered on the x-y plane. We are trying to make a straight line (defined by h_θ(x)) which passes through these scattered data points. Our objective is to get the best possible line. The best possible line will be such that the average squared vertical distances of the scattered points from the line will be the least. Ideally, the line should pass through all the points of our training data set. In such a case, the value of J(θ₀, θ₁) will be 0. The following example shows the ideal situation where we have a cost function of 0.

[Figure: left, h_θ(x) with θ₁ = 1 passing through the data points (1,1), (2,2), (3,3); right, the corresponding point J(1) = 0]

When θ₁ = 1, we get a slope of 1 which goes through every single data point in our model. Conversely, when θ₁ = 0.5, we see the vertical distance from our fit to the data points increase.

[Figure: h_θ(x) with θ₁ = 0.5 and the increased vertical distances from the line to the data points]
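The numbers in this intuition can be checked with a short sketch. Below is a minimal Python version of the cost computation (illustrative only; the course itself uses Octave), for the example training points (1,1), (2,2), (3,3) and the hypothesis h_θ(x) = θ₁x:

```python
# Cost J(theta1) for the one-parameter hypothesis h(x) = theta1 * x,
# on the example training set (1,1), (2,2), (3,3).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

def cost(theta1):
    m = len(xs)
    # J = 1/(2m) * sum of squared prediction errors
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(cost(1.0))  # 0.0: the line passes through every point
print(cost(0.5))  # ~0.58: the vertical distances have grown
```

Evaluating `cost` over a range of θ₁ values reproduces the bowl-shaped J(θ₁) curve discussed next.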
This increases our cost function to 0.58. Plotting several other points yields the following graph:

[Figure: J(θ₁) plotted as a function of the parameter θ₁]

Thus, as a goal, we should try to minimize the cost function. In this case, θ₁ = 1 is our global minimum.

Cost Function - Intuition II

A contour plot is a graph that contains many contour lines. A contour line of a two-variable function has a constant value at all points of the same line. An example of such a graph is the one to the right below.

[Figure: left, h_θ(x) (for fixed θ₀, θ₁, this is a function of x); right, contour plot of J(θ₀, θ₁) (a function of the parameters θ₀, θ₁)]

Taking any color and going along the 'circle', one would expect to get the same value of the cost function. For example, the three green points found on the green line above have the same value for J(θ₀, θ₁) and as a result they are found along the same line. The circled x displays the value of the cost function for the graph on the left when θ₀ = 800 and θ₁ = -0.15. Taking another h(x) and plotting its contour plot, one gets the following graphs:

[Figure: left, h_θ(x); right, contour plot of J(θ₀, θ₁)]

When θ₀ = 360 and θ₁ = 0, the value of J(θ₀, θ₁) in the contour plot gets closer to the center, thus reducing the cost function error. Now giving our hypothesis function a slightly positive slope results in a better fit of the data.

[Figure: contour plot of J(θ₀, θ₁) with the minimizing point near the center of the inner-most circle]

The graph above minimizes the cost function as much as possible, and consequently the results for θ₁ and θ₀ tend to be around 0.12 and 250 respectively.
Plotting those values on our graph to the right seems to put our point in the center of the inner-most 'circle'.

Gradient Descent

Gradient descent algorithm:

repeat until convergence {
    θⱼ := θⱼ - α · ∂/∂θⱼ J(θ₀, θ₁)    (for j = 0 and j = 1)
}

Correct: simultaneous update

temp0 := θ₀ - α · ∂/∂θ₀ J(θ₀, θ₁)
temp1 := θ₁ - α · ∂/∂θ₁ J(θ₀, θ₁)
θ₀ := temp0
θ₁ := temp1

We should perform this repeatedly until convergence.

So we have our hypothesis function and we have a way of measuring how well it fits into the data. Now we need to estimate the parameters in the hypothesis function. That's where gradient descent comes in.

Imagine that we graph our hypothesis function based on its fields θ₀ and θ₁ (actually we are graphing the cost function as a function of the parameter estimates). We are not graphing x and y itself, but the parameter range of our hypothesis function and the cost resulting from selecting a particular set of parameters.

We put θ₀ on the x axis and θ₁ on the y axis, with the cost function on the vertical z axis. The points on our graph will be the result of the cost function using our hypothesis with those specific theta parameters. The graph below depicts such a setup.

We will know that we have succeeded when our cost function is at the very bottom of the pits in our graph, i.e. when its value is the minimum. The red arrows show the minimum points in the graph.

The way we do this is by taking the derivative (the tangential line to a function) of our cost function. The slope of the tangent is the derivative at that point and it will give us a direction to move towards.
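The simultaneous update above can be sketched in code. Here is a minimal Python illustration (the course exercises use Octave; the data and learning rate here are made up), where both temporaries are computed before either parameter changes:

```python
# Gradient descent for h(x) = theta0 + theta1 * x on toy data (y = x exactly).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
theta0, theta1 = 0.0, 0.0
alpha = 0.1          # learning rate, chosen arbitrarily for this sketch
m = len(xs)

for _ in range(1000):  # "repeat until convergence" (a fixed count here)
    errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    temp0 = theta0 - alpha * sum(errs) / m
    temp1 = theta1 - alpha * sum(e * x for e, x in zip(errs, xs)) / m
    theta0, theta1 = temp0, temp1   # update both only after computing both

print(round(theta0, 3), round(theta1, 3))  # approaches 0.0 and 1.0
```

Updating θ₀ in place before computing temp1 would feed the new θ₀ into θ₁'s derivative, which is exactly the incorrect implementation warned about below.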
We make steps down the cost function in the direction with the steepest descent. The size of each step is determined by the parameter α, which is called the learning rate.

For example, the distance between each 'star' in the graph above represents a step determined by our parameter α. A smaller α would result in a smaller step, and a larger α results in a larger step. The direction in which the step is taken is determined by the partial derivative of J(θ₀, θ₁). Depending on where one starts on the graph, one could end up at different points. The image above shows two different starting points that end up in two different places.

The gradient descent algorithm is:

repeat until convergence {
    θⱼ := θⱼ - α · ∂/∂θⱼ J(θ₀, θ₁)
}

where j = 0, 1 represents the feature index number.

At each iteration, one should simultaneously update the parameters θ₁, θ₂, ..., θₙ. Updating a specific parameter prior to calculating another one on the same iteration would yield a wrong implementation.

Correct: simultaneous update

temp0 := θ₀ - α · ∂/∂θ₀ J(θ₀, θ₁)
temp1 := θ₁ - α · ∂/∂θ₁ J(θ₀, θ₁)
θ₀ := temp0
θ₁ := temp1

Gradient descent can converge to a local minimum, even with the learning rate α fixed. As we approach a local minimum, gradient descent will automatically take smaller steps. So, there is no need to decrease α over time. By definition, a local minimum is where the derivative equals zero.

Matrices and Vectors

Matrices are 2-dimensional arrays:

[a, b, c;
 d, e, f;
 g, h, i;
 j, k, l]

The above matrix has four rows and three columns, so it is a 4 x 3 matrix.

A vector is a matrix with one column and many rows:

[w;
 x;
 y;
 z]

So vectors are a subset of matrices. The above vector is a 4 x 1 matrix.

Notation and terms:

- Aᵢⱼ refers to the element in the ith row and jth column of matrix A.
- A vector with 'n' rows is referred to as an 'n'-dimensional vector.
- vᵢ refers to the element in the ith row of the vector.
- In general, all our vectors and matrices will be 1-indexed. Note that for some programming languages, the arrays are 0-indexed.
- Matrices are usually denoted by uppercase names while vectors are lowercase.
- "Scalar" means that an object is a single value, not a vector or matrix.
- ℝ refers to the set of scalar real numbers.
- ℝⁿ refers to the set of n-dimensional vectors of real numbers.

Run the cell below to get familiar with the commands in Octave/Matlab. Feel free to create matrices and vectors and try out different things.

% The ; denotes we are going back to a new row.
A = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12]

% Initialize a vector
v = [1; 2; 3]

% Get the dimension of the matrix A where m = rows and n = columns
[m, n] = size(A)

% You could also store it this way
dim_A = size(A)

% Get the dimension of the vector v
dim_v = size(v)

% Now let's index into the 2nd row 3rd column of matrix A
A_23 = A(2, 3)

Gradient Descent For Linear Regression

Note: at 6:15 in the video, "h(x) = -900 - 0.1x" should be "h(x) = 900 - 0.1x".

When specifically applied to the case of linear regression, a new form of the gradient descent equation can be derived. We can substitute our actual cost function and our actual hypothesis function and modify the equation to:

repeat until convergence {
    θ₀ := θ₀ - α · (1/m) Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) - y⁽ⁱ⁾)
    θ₁ := θ₁ - α · (1/m) Σ_{i=1}^{m} ((h_θ(x⁽ⁱ⁾) - y⁽ⁱ⁾) · x⁽ⁱ⁾)
}

where m is the size of the training set, θ₀ a constant that will be changing simultaneously with θ₁, and x⁽ⁱ⁾, y⁽ⁱ⁾ are values of the given training set (data).

Note that we have separated out the two cases for θⱼ into separate equations for θ₀ and θ₁, and that for θ₁ we are multiplying x⁽ⁱ⁾ at the end due to the derivative. The following is a derivation of ∂/∂θⱼ J(θ) for a single example:

∂/∂θⱼ J(θ) = ∂/∂θⱼ (1/2)(h_θ(x) - y)²
           = 2 · (1/2)(h_θ(x) - y) · ∂/∂θⱼ (h_θ(x) - y)
           = (h_θ(x) - y) · ∂/∂θⱼ (Σ_{i=0}^{n} θᵢxᵢ - y)
           = (h_θ(x) - y) · xⱼ

The point of all this is that if we start with a guess for our hypothesis and then repeatedly apply these gradient descent equations, our hypothesis will become more and more accurate.

So, this is simply gradient descent on the original cost function J. This method looks at every example in the entire training set on every step, and is called batch gradient descent. Note that, while gradient descent can be susceptible to local minima in general, the optimization problem we have posed here for linear regression has only one global optimum, and no other local optima; thus gradient descent always converges (assuming the learning rate α is not too large) to the global minimum. Indeed, J is a convex
quadratic function. Here is an example of gradient descent as it is run to minimize a quadratic function.

[Figure: contours of a quadratic function, with the trajectory taken by gradient descent overlaid]

The ellipses shown above are the contours of a quadratic function. Also shown is the trajectory taken by gradient descent, which was initialized at (48, 30). The x's in the figure (joined by straight lines) mark the successive values of θ that gradient descent went through as it converged to its minimum.
Addition and Scalar Multiplication

Addition and subtraction are element-wise, so you simply add or subtract each corresponding element:

[a, b; c, d] + [w, x; y, z] = [a+w, b+x; c+y, d+z]

Subtracting matrices:

[a, b; c, d] - [w, x; y, z] = [a-w, b-x; c-y, d-z]

To add or subtract two matrices, their dimensions must be the same.

In scalar multiplication, we simply multiply every element by the scalar value:

[a, b; c, d] * x = [a*x, b*x; c*x, d*x]

In scalar division, we simply divide every element by the scalar value:

[a, b; c, d] / x = [a/x, b/x; c/x, d/x]

Experiment below with the Octave/Matlab commands for matrix addition and scalar multiplication. Feel free to try out different commands. Try to write out your answers for each command before running the cell below.

% Initialize matrix A and B
A = [1, 2, 4; 5, 3, 2]
B = [1, 3, 4; 1, 1, 1]

% Initialize constant s
s = 2

% See how element-wise addition works
add_AB = A + B

% See how element-wise subtraction works
sub_AB = A - B

% See how scalar multiplication works
mult_As = A * s

% Divide A by s
div_As = A / s

% What happens if we have a Matrix + scalar?
add_As = A + s

Matrix-Vector Multiplication

We map the column of the vector onto each row of the matrix, multiplying each element and summing the result:

[a, b; c, d; e, f] * [x; y] = [a*x + b*y; c*x + d*y; e*x + f*y]

The result is a vector. The number of columns of the matrix must equal the number of rows of the vector. An m x n matrix multiplied by an n x 1 vector results in an m x 1 vector.

Below is an example of a matrix-vector multiplication. Make sure you understand how the multiplication works.
Feel free to try different matrix-vector multiplications.

% Initialize matrix A
A = [1, 2, 3; 4, 5, 6; 7, 8, 9]

% Initialize vector v
v = [1; 1; 1]

% Multiply A * v
Av = A * v

Matrix-Matrix Multiplication

We multiply two matrices by breaking it into several vector multiplications and concatenating the result:

[a, b; c, d; e, f] * [w, x; y, z] = [a*w + b*y, a*x + b*z; c*w + d*y, c*x + d*z; e*w + f*y, e*x + f*z]

An m x n matrix multiplied by an n x o matrix results in an m x o matrix. In the above example, a 3 x 2 matrix times a 2 x 2 matrix resulted in a 3 x 2 matrix.

To multiply two matrices, the number of columns of the first matrix must equal the number of rows of the second matrix. For example:

% Initialize a 3 by 2 matrix
A = [1, 2; 3, 4; 5, 6]

% Initialize a 2 by 1 matrix
B = [1; 2]

% We expect a resulting matrix of (3 by 2) * (2 by 1) = (3 by 1)
mult_AB = A * B

Matrix Multiplication Properties

- Matrices are not commutative: A*B ≠ B*A
- Matrices are associative: (A*B)*C = A*(B*C)

The identity matrix, when multiplied by any matrix of the same dimensions, results in the original matrix. It's just like multiplying numbers by 1. The identity matrix simply has 1's on the diagonal (upper left to lower right diagonal) and 0's elsewhere:

[1, 0, 0;
 0, 1, 0;
 0, 0, 1]

When multiplying the identity matrix after some matrix (A*I), the square identity matrix's dimension should match the other matrix's columns. When multiplying the identity matrix before some other matrix (I*A), the square identity matrix's dimension should match the other matrix's rows.

% Initialize random matrices A and B
A = [1, 2; 4, 5]
B = [1, 1; 0, 2]

% Initialize a 2 by 2 identity matrix
I = eye(2)

% The above notation is the same as I = [1, 0; 0, 1]

% What happens when we multiply I*A?
IA = I * A

% How about A*I?
AI = A * I

% Compute A*B
AB = A * B

% Is it equal to B*A?
BA = B * A

% Note that IA = AI but AB != BA

Inverse and Transpose

The inverse of a matrix A is denoted A⁻¹. Multiplying a matrix by its inverse results in the identity matrix. A non-square matrix does not have an inverse matrix.
We can compute inverses of matrices in Octave with the pinv(A) function and in Matlab with the inv(A) function. Matrices that don't have an inverse are singular or degenerate.

The transposition of a matrix is like rotating the matrix 90° in the clockwise direction and then reversing it. We can compute the transposition of matrices in Matlab with the transpose(A) function or A':

A = [a, b; c, d; e, f]

Aᵀ = [a, c, e; b, d, f]

In other words: Aᵢⱼ = (Aᵀ)ⱼᵢ

% Initialize matrix A
A = [1, 2, 0; 0, 5, 6; 7, 0, 9]

% Transpose A
A_trans = A'

% Take the inverse of A
A_inv = inv(A)

% What is A^(-1) * A?
A_invA = inv(A) * A
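As a cross-check outside Octave/Matlab, the same inverse and transpose operations are available in Python's NumPy (an illustrative aside, not part of the course material), using the same matrix as the cell above:

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 5., 6.],
              [7., 0., 9.]])

A_trans = A.T                 # transpose, like A' in Octave
A_inv = np.linalg.inv(A)      # inverse; np.linalg.pinv also works, like Octave's pinv

# A^(-1) * A should give the identity matrix (up to floating-point error)
print(np.round(A_inv @ A))
```

As in Octave, inverting a singular (degenerate) matrix raises an error with `inv`, while the pseudo-inverse `pinv` still returns a result.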
