6: Machine Learning 1 Flashcards
What is machine learning?
- Understanding the world by learning from the data
- Not so much interested in cause and mechanisms
- Interested in classification and predictions.
What are some example uses of machine learning?
- Spam detection
- Community detection
- Sorting news
- Text translation
- Face recognition
- Optical character recognition (OCR)
- Suggested content and ads
What is unsupervised learning?
Techniques where the machine is NOT given labels, or corresponding outputs.
- The machine will detect patterns from the data with no example to rely on.
The dataset containing the data to learn from is called an unlabelled dataset.
What is supervised learning?
Techniques where the machine is given inputs and corresponding outputs to learn from.
- The machine will try to adjust parameters to make the best prediction of the output when given an input.
The dataset containing inputs and corresponding outputs is called a labelled dataset.
What is reinforcement learning?
The machine learns through trial and error.
- The method includes a feedback loop with rewards. While attempting trials, the machine tries to maximise the rewards.
What is a good algorithm?
An algorithm capable of making the correct prediction.
What is the goal of machine learning and how is this tested?
- Try to identify patterns and make predictions from data.
- Does not matter if input causes output, as long as the input is enough to predict the output.
- Algorithm is trained on a TRAINING DATASET.
- Then, the accuracy of the model can be tested with a TEST DATASET.
What kind of model fit do you not want?
You do not want a model that is fitted to the random variations (noise) in your data.
What is underfitting and overfitting of a model?
Under: Not enough parameters to correctly predict Y (for example, the model is linear when it should not be).
Over: Too many parameters. The model touches every data point, so the residuals on the training data are small, but it has fitted the noise and predicts new data poorly.
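The contrast can be sketched with polynomial fits of increasing degree. This is a minimal illustration, assuming NumPy and made-up data (a noisy sine curve); the degrees 1, 3 and 9 are arbitrary choices, not part of the original card. The degree-9 fit passes through all 10 training points, so its training error is close to zero, while its error on unseen points is typically larger than that of the moderate fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a sine curve plus random noise
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)

# Unseen points placed halfway between the training points
x_test = (x_train[:-1] + x_train[1:]) / 2
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, x_test.size)


def fit_errors(degree):
    """Fit a polynomial on the training data; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse


# Degree 1: too few parameters (underfit)
# Degree 3: moderate
# Degree 9: as many parameters as data points (overfit)
errors = {degree: fit_errors(degree) for degree in (1, 3, 9)}

for degree, (train_mse, test_mse) in errors.items():
    print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```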
What are the three components the machine learns with?
- A decision process: a recipe of calculations/steps that takes in the data and returns a “guess” at the kind of pattern the algorithm is looking for.
- An error function: a method of measuring how good the guess was by comparing it to known examples (when available). It quantifies how bad a miss was.
- An updating or optimisation process: the algorithm looks at the miss and updates how the decision process reaches its final decision, so that next time the miss is smaller.
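The three components can be sketched for a straight-line model. This is only an illustration, assuming NumPy, mean squared error as the error function and gradient descent as the updating process; the data, the function names and the learning rate `lr` are hypothetical choices.

```python
import numpy as np

# Hypothetical data roughly following y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])


def predict(x, w, b):
    """Decision process: turn an input into a guess."""
    return w * x + b


def error(y_pred, y_true):
    """Error function: mean squared error between guesses and known outputs."""
    return np.mean((y_pred - y_true) ** 2)


def update(x, y, w, b, lr=0.05):
    """Updating process: one gradient descent step on the mean squared error."""
    residual = predict(x, w, b) - y
    grad_w = 2 * np.mean(residual * x)
    grad_b = 2 * np.mean(residual)
    return w - lr * grad_w, b - lr * grad_b


w, b = 0.0, 0.0
initial_error = error(predict(x, w, b), y)
for _ in range(2000):
    w, b = update(x, y, w, b)
final_error = error(predict(x, w, b), y)

print(f"w = {w:.2f}, b = {b:.2f}")
print(f"error before: {initial_error:.3f}, after: {final_error:.3f}")
```

Each pass through the loop makes a guess, measures the miss and adjusts the parameters `w` and `b` so that the next miss is smaller.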
What are some challenges of machine learning?
- Biases and discrimination. If data are biased, predictions are biased.
- Privacy issues: data can be memorised and exploited by machines.
- Legitimacy and accountability: who is responsible in case of failure or an unintended outcome?
What is an example of a model within unsupervised learning?
K-means clustering
What is the intuition behind K-means clustering?
- Observations belonging to the same groups must share the same characteristics.
If we know there are K groups (clusters) in our dataset, we can try to group the observations so that the distance between observations:
- within a group is the smallest possible
- between groups is the largest possible.
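One common way to write this intuition down is the within-cluster sum of squares. The symbols below are assumptions introduced for illustration: $C_k$ is the set of observations in cluster $k$, $\mu_k$ is its centroid and $x_i$ is an observation.

```latex
\min_{C_1, \dots, C_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

For a fixed dataset, the total sum of squares equals the within-cluster part plus the between-cluster part, so making the within-cluster distances as small as possible also makes the between-cluster separation as large as possible.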
How do we estimate in K-means clustering?
- Choose number of clusters (K).
- Randomly pick K observations from the dataset: these are the initial centres of the clusters (centroids).
- Assign each observation in the dataset to the closest centroid (measured by Euclidean distance).
- Update the centroids: each new centroid is the mean of the data points in its cluster.
- Repeat the assignment and update steps (steps 3 and 4) until observations stop changing clusters or the maximum number of iterations is reached.
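The steps above can be sketched in a few lines. This is a minimal version, assuming NumPy; the function name `k_means`, its parameters and the toy data are hypothetical. It also assumes that an empty cluster simply keeps its previous centroid, which the card does not specify.

```python
import numpy as np


def k_means(X, k, max_iter=100, seed=0):
    """Cluster the rows of X into k groups; return (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct observations as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 3: assign each observation to the closest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Stop when no observation changes cluster
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its cluster
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Hypothetical toy data: two groups of three points each
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
labels, centroids = k_means(X, k=2)
print(labels)
print(centroids)
```

Because the starting centroids are random, different seeds can give different cluster numbering and, on harder data, different final clusters.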
How do we assess accuracy, sensitivity and specificity?
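By training the model on a training dataset and comparing its predictions on a test dataset with the known outputs. This is typically done by randomly splitting the original dataset into two groups: 70% to train the model and 30% to test it.
Create a confusion matrix to calculate accuracy, sensitivity and specificity.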
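A minimal sketch of both steps, assuming NumPy and a binary outcome coded as 1 (positive) and 0 (negative). The function names, the labels and the predictions are made up for illustration. The formulas are the standard definitions: accuracy = (TP + TN) / total, sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).

```python
import numpy as np


def split_indices(n, train_fraction=0.7, seed=0):
    """Randomly split the row indices 0..n-1 into a training and a test part."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return shuffled[:n_train], shuffled[n_train:]


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for labels coded as 1 (positive) and 0 (negative)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity and specificity from the confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # share of true positives that were detected
    specificity = tn / (tn + fp)  # share of true negatives that were detected
    return accuracy, sensitivity, specificity


# 70/30 split of a hypothetical dataset with 10 observations
train_idx, test_idx = split_indices(10)

# Hypothetical known outputs and model predictions on a test dataset
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy, sensitivity, specificity = metrics(tp, tn, fp, fn)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```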