AWS Machine Learning Foundations Course - Lesson 2 Flashcards
What does log loss seek to do
Calculate how uncertain your model is about the predictions it is generating - how likely a model thinks the predictions being generated are to be correct
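As a rough illustration of the card above, binary log loss can be sketched in plain Python. The function name `log_loss`, the `eps` clipping value, and the sample probabilities are assumptions made up for this example, not part of the course material.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary log loss; y_true holds 0/1 labels, y_prob holds P(label == 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) is never evaluated
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical predictions: the first model is more certain (and correct) than the second.
confident = log_loss([1, 0, 1], [0.9, 0.1, 0.8])
unsure = log_loss([1, 0, 1], [0.6, 0.4, 0.5])
print(confident < unsure)  # the more certain, correct model gets the lower loss
```

Predictions that are both correct and confident produce a lower log loss than correct but hesitant ones.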
How would you define hyperparameter
Settings on the model which are not changed during training, but can affect how quickly or how reliably the model trains
What does a FFNN do
Structures neurons in a series of layers, with each neuron in a layer holding weights to all neurons in the previous layer
How would you define discrete
Refers to an outcome that takes on only a finite number of values, such as the days of the week
How would you define data vectorization
Process that converts non-numeric data into a numerical format, so that it can be used by a machine learning model
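One common way to vectorize categorical text values is one-hot encoding. The sketch below is only one possible approach; the helper name `one_hot` and the color values are hypothetical.

```python
def one_hot(values):
    """Map each distinct value to a 0/1 vector with a single 1 in its own column."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        vectors.append(row)
    return categories, vectors

# Hypothetical non-numeric feature column.
categories, vectors = one_hot(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(vectors)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```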
How would you define clustering
Helps to determine if there are any naturally occurring groups in the data
How would you define accuracy
The fraction of predictions a model gets right
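A minimal sketch of that fraction, assuming two equal-length lists of labels (the function name `accuracy` and the sample labels are made up for illustration):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)

# Three of the four hypothetical predictions are right.
print(accuracy(["spam", "ham", "spam", "ham"], ["spam", "spam", "spam", "ham"]))  # 0.75
```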
What does CNN represent
Nested filters over grid-organized data
What are CNNs most used for
Processing images
How would you define a continuous label
Does not have a discrete set of possible values; in theory, the label could take any value
How would you define a categorical label
Has a discrete set of possible values
What does bag of words do
Counts how many times each word appears in a document (or across a corpus of documents) and then transforms that information into a dataset
How would you define bag of words
A technique used to extract features from text
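The two bag-of-words cards above can be sketched with the standard library. This assumes naive lowercase whitespace tokenization; the function name `bag_of_words` and the two sample documents are hypothetical.

```python
from collections import Counter

def bag_of_words(documents):
    """Return a sorted vocabulary and one row of word counts per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows

vocab, rows = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(rows)   # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each row is a numeric feature vector for one document, which is the "dataset" the card refers to.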
How would you define loss function
A measurement of how close the model is to its goal
What are some aspects of your data that can ultimately be the largest factor in how well you can expect your model to perform
- Outliers
- Missing or incomplete values
- Data that needs to be transformed or preprocessed so it's in the correct format to be used by the model
What is the fundamental question to ask for data collection
Does the data I have collected match the machine learning task and problem I have defined
What are the four aspects of working with data
- Data collection
- Data inspection
- Summary Statistics
- Data Visualization
If your data is categorical, what task would you be working with
Classification
If your data is numerical, what task would you be working with
Regression
What is a deep learning model composed of
Collections of neurons connected together by weights
What is a tree based model
A model that learns to categorize or regress by building an extremely large structure of nested if/else blocks, splitting the world into different regions at each block
What does training determine in tree based models
Where splits happen and what value is assigned at each leaf region
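To make "where splits happen and what value is assigned at each leaf" concrete, here is a sketch of the smallest possible tree: a single split (often called a stump) fitted for regression by minimizing squared error. The names `fit_stump` and `predict` and the data are assumptions for illustration; real tree libraries use more elaborate procedures.

```python
def fit_stump(xs, ys):
    """Pick the split threshold and two leaf values that minimize squared error.

    Assumes xs contains at least two distinct values.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split is possible between equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        left_value = sum(left) / len(left)     # leaf value = mean of its region
        right_value = sum(right) / len(right)
        error = (sum((y - left_value) ** 2 for y in left)
                 + sum((y - right_value) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]

def predict(stump, x):
    """The trained model is literally an if/else block."""
    threshold, left_value, right_value = stump
    if x <= threshold:
        return left_value
    else:
        return right_value

stump = fit_stump([1, 2, 3, 10, 11, 12], [5, 5, 5, 20, 20, 20])
print(stump)              # (6.5, 5.0, 20.0)
print(predict(stump, 2))  # 5.0
```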
What are linear models good for
Giving a baseline against which to compare more complex models
What is a linear model
A model that describes the relationship between a set of input numbers and a set of output numbers through a linear function
What is the end-to-end training process
- Feed the training data into the model
- Compute the loss function on the results
- Update the model parameters in a direction that reduces the loss
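The three steps above can be sketched as a small gradient-descent loop for a one-feature linear model. This assumes mean squared error as the loss; the function name `train_linear`, the learning rate, the epoch count, and the data are illustrative choices, not values from the course.

```python
def train_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by repeating: predict, compute loss, update parameters."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # 1. Feed the training data into the model.
        preds = [w * x + b for x in xs]
        # 2. Compute the loss function (mean squared error) on the results.
        losses.append(sum((p - y) ** 2 for p, y in zip(preds, ys)) / n)
        # 3. Update the parameters in the direction that reduces the loss.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses

# Hypothetical data generated from y = 2x + 1.
w, b, losses = train_linear([0, 1, 2, 3], [1, 3, 5, 7])
print(round(w, 3), round(b, 3))  # 2.0 1.0
```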
What type of algorithm would you use to segment your customers into multiple groups
If you don’t know how to define the groups, then you can use a clustering algorithm (unsupervised learning) to segment your customers into clusters of similar customers
If you know what groups you would like to have, then you can feed many examples of each group to a classification algorithm (supervised learning) and it will classify all your customers into these groups
What type of machine learning algorithm would you use to allow a robot to walk in various unknown terrains?
reinforcement learning
Do you want the RMS (root mean square) to be high or low
Low
How would you define root mean square (RMS)
Roughly the average error across the test dataset; in general, as the model improves, the RMS value gets lower
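A minimal sketch of the calculation, assuming paired lists of true and predicted values (the name `rms_error` and the numbers are made up for this example):

```python
import math

def rms_error(y_true, y_pred):
    """Square each error, average the squares, then take the square root."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

worse = rms_error([3, 5, 7], [2, 5, 9])   # errors of 1, 0, and 2
better = rms_error([3, 5, 7], [3, 5, 8])  # errors of 0, 0, and 1
print(better < worse)  # True: the closer predictions give the lower RMS
```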
Would you frame the problem of spam detection as a supervised learning problem or an unsupervised learning problem
Spam detection is a typical supervised learning problem: the algorithm is fed many emails along with their labels (spam or not spam)
How would you define impute
Refers to the different statistical tools that can be used to fill in missing values in the dataset
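Mean imputation is one of the simplest such tools. The sketch below assumes missing entries are represented as `None` and that at least one value is present; the name `impute_mean` and the sample column are hypothetical.

```python
def impute_mean(values):
    """Replace each None with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

print(impute_mean([10, None, 30, None, 20]))  # [10, 20.0, 30, 20.0, 20]
```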
How would you define classification
The process of using machine learning to identify different cases based on patterns found in data (example: spam or not spam)
What is a common cluster finding model
K-means
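As a rough sketch of how K-means finds groups, here is a one-dimensional version: assign each point to its nearest center, move each center to the mean of its points, and repeat. The function name `k_means_1d`, the starting centers, and the data are assumptions for illustration; library implementations handle multiple dimensions and smarter initialization.

```python
def k_means_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-centering."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it in place if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```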