Machine Learning Flashcards
What is a feature vector?
An ordered list of features (usually numeric) chosen from the data for ‘a particular task at hand’, e.g. [height, weight, age] for one person.
What are 3 areas of ML?
Classification: Assigns items to named, predefined groups
Clustering: Groups similar items without predefined names
Regression: Fits a line or curve to predict a continuous value
The first two look similar, but the techniques behind them are quite different.
What are some of the classifiers?
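k-nearest neighbor: Labels a point by the majority class among the k training points closest to it in feature space (N-dimensional for N features).
Support vector machine (SVM): Finds a boundary (a line in 2D, a hyperplane in general) that splits the classes with the widest possible margin.
Decision tree: Classifies by a sequence of tests on features, e.g. handwriting recognition by the hollows and circles in a character. (Boosted decision trees add new trees one after another, each trained to correct the errors of the trees before it.)
Logistic regression: Takes a weighted sum of the features, e.g. w1 * (feature one) + w2 * (feature two), and maps it to a probability between 0 and 1. Useful when there are few features and high precision is not critical.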
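As an illustration of the k-nearest neighbor idea above, here is a minimal sketch in plain Python. The function name knn_predict, the toy training points, and the choice of Euclidean distance are assumptions made for this example and are not part of the original cards.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs (hypothetical format).
    """
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Keep the labels of the k closest points and take a majority vote.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy 2-feature data set (made up for illustration).
train = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.5), "small"),
    ((8.0, 8.0), "large"), ((9.0, 7.5), "large"), ((8.5, 9.0), "large"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (8.2, 8.1)))
```

With N features, each feature vector simply has N entries; the distance computation stays the same.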
What are supervised and unsupervised learning?
Teaching the computer how to do something vs. letting it figure something out from the data on its own.
In supervised learning, the right answers (labels) are given; classification and regression belong here. In unsupervised learning, the algorithm tries to make sense of unlabeled data; clustering belongs here (e.g. social network analysis, market segmentation, news clustering).
What is the cost function in linear regression?
In linear regression we want to fit a straight line to the data, say y = Theta0 + Theta1 * x.
The cost function is a function that needs to be ‘minimized’ to find the best estimates of Theta0 and Theta1 and so draw the line.
The squared-error function is popular: J(Theta0, Theta1) = (1/(2m)) * Sum((estimate - actual)^2), where m is the number of training examples. Plot this for various Theta values and find the minimum of the curve.
The gradient descent algorithm for linear regression repeatedly picks the next values of the Thetas that reduce the cost function, and thereby reaches the minimum.
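The squared-error cost and the gradient descent update can be sketched in a few lines of plain Python. The function names, the learning rate alpha, the step count, and the toy data are assumptions chosen for this example; the cards themselves do not specify them.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost: (1/(2m)) * sum((estimate - actual)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient_descent(xs, ys, alpha=0.05, steps=5000):
    """Fit y = theta0 + theta1 * x by batch gradient descent.

    alpha (learning rate) and steps are illustrative values, not tuned.
    """
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        # Error of the current line on every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to each Theta.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both Thetas together, stepping downhill on the cost.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Toy data generated from y = 1 + 2x (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1, cost(theta0, theta1, xs, ys))
```

If alpha is too large the updates can overshoot and the cost grows instead of shrinking; if it is too small, many more steps are needed to get near the minimum.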