AWS Machine Learning Foundations Course - Lesson 2 Flashcards
What does log loss seek to do
Calculate how uncertain your model is about the predictions it is generating - how likely a model thinks the predictions being generated are to be correct
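To make the idea concrete, here is a minimal sketch of binary log loss in plain Python. The function name `binary_log_loss`, the sample labels and probabilities, and the clipping constant `eps` are all assumptions chosen for illustration, not something taken from the course.

```python
import math

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so log(0) never occurs (eps is an assumed constant).
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical true labels and predicted probabilities of class 1.
confident = binary_log_loss([1, 0, 1], [0.9, 0.1, 0.8])
unsure = binary_log_loss([1, 0, 1], [0.6, 0.4, 0.5])
```

Both sets of predictions would round to the correct class, but the less certain one receives the higher loss.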
How would you define a hyperparameter
A setting on the model that is not changed during training but can affect how quickly or how reliably the model trains
What does an FFNN (feed forward neural network) do
Structures neurons in a series of layers, with each neuron in a layer holding weights for its connections to all neurons in the previous layer
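The layered structure can be sketched as a forward pass in plain Python. This is only an illustration: the helper `dense_layer`, the sigmoid activation, and every weight and bias value are hypothetical.

```python
import math

def dense_layer(inputs, weights, biases):
    """Compute one layer: each neuron has one weight per neuron in the previous layer."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid activation (assumed)
    return outputs

# Hypothetical network: 2 inputs, a hidden layer of 3 neurons, 1 output neuron.
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.5, 0.2]]
output_b = [0.05]

hidden = dense_layer([1.0, 2.0], hidden_w, hidden_b)
prediction = dense_layer(hidden, output_w, output_b)
```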
How would you define discrete
Refers to an outcome that takes on only a finite number of values, like the days of the week
How would you define data vectorization
Process that converts non-numeric data into a numerical format, so that it can be used by a machine learning model
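One common way to vectorize categorical values is one-hot encoding. The sketch below uses it purely as an example; the function name `one_hot` and the sample colors are hypothetical.

```python
def one_hot(values):
    """Map each distinct value to a 0/1 vector with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        vectors.append(row)
    return categories, vectors

# Hypothetical non-numeric column.
categories, vectors = one_hot(["red", "green", "red", "blue"])
# categories -> ['blue', 'green', 'red']
# vectors    -> [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```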
How would you define clustering
Helps to determine if there are any naturally occurring groups in the data
How would you define accuracy
The fraction of predictions a model gets right
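As a quick illustration, accuracy can be computed by counting matching predictions. The function name and the sample labels below are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)

# Hypothetical labels: three of the four predictions are right.
score = accuracy(["cat", "dog", "dog", "cat"], ["cat", "dog", "cat", "cat"])
# score -> 0.75
```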
What does a CNN (convolutional neural network) represent
Nested filters over grid-organized data
What are CNNs most used for
Processing images
How would you define a continuous label
Does not have a discrete set of possible values; in theory, the label could be anything
How would you define a categorical label
Has a discrete set of possible values
What does bag of words do
Counts how many times each word appears in a document (or across a corpus of documents) and then transforms those counts into a dataset
How would you define bag of words
A technique used to extract features from text
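A minimal bag-of-words sketch in plain Python might look like the following. It assumes whitespace tokenization and lowercasing, which real pipelines often replace with something more careful; the function name and the two sample documents are hypothetical.

```python
from collections import Counter

def bag_of_words(documents):
    """Return a sorted vocabulary and one row of word counts per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows

# Hypothetical two-document corpus.
vocabulary, rows = bag_of_words(["the cat sat", "the cat saw the dog"])
# vocabulary -> ['cat', 'dog', 'sat', 'saw', 'the']
# rows       -> [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each row is a numeric feature vector for one document, with word order discarded.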
How would you define loss function
A measurement of how close the model is to its goal
What data quality issues, which can ultimately be the largest factor in how well you can expect your model to perform, should you look for when inspecting data
- Outliers
- Missing or incomplete values
- Data that needs to be transformed or preprocessed so it's in the correct format to be used by the model
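The first two checks in the list above can be sketched with the standard library. This is an assumption-laden illustration: missing values are represented as `None`, outliers are flagged with a simple z-score rule, and the threshold of 2.0 and the sample column are made up.

```python
import statistics

def inspect_column(values, z_threshold=2.0):
    """Count missing entries and flag values far from the mean (z-score rule)."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    stdev = statistics.stdev(present)
    outliers = [
        v for v in present
        if stdev > 0 and abs(v - mean) / stdev > z_threshold
    ]
    return {"missing": len(values) - len(present), "outliers": outliers}

# Hypothetical column with two missing values and one suspicious reading.
report = inspect_column([21, 23, None, 22, 24, 20, 95, None, 22])
# report -> {'missing': 2, 'outliers': [95]}
```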
What is the fundamental question to ask for data collection
Does the data I have collected match the machine learning task and problem I have defined
What are the four aspects of working with data
- Data collection
- Data inspection
- Summary Statistics
- Data Visualization
If your label is categorical, what task would you be working with
Classification
If your label is numerical (continuous), what task would you be working with
Regression
What is a deep learning model composed of
Collections of neurons connected together by weights
What is a tree-based model
A model that learns to categorize or regress by building an extremely large structure of nested if/else blocks, splitting the world into different regions at each block
What does training determine in tree-based models
Where splits happen and what value is assigned at each leaf region
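A trained tree can be pictured as hand-written nested if/else blocks. In the sketch below, the feature names, split thresholds, and leaf values are invented for illustration; in a real model, training would choose them.

```python
def predict_price(size_sqm, has_garden):
    """A tiny hypothetical regression tree written as nested if/else blocks."""
    if size_sqm < 80:            # split learned during training (made up here)
        if has_garden:
            return 210_000       # value assigned to this leaf region
        return 180_000
    else:
        if size_sqm < 150:
            return 320_000
        return 500_000

small_with_garden = predict_price(60, True)
large_house = predict_price(200, False)
```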
What are linear models good for
Giving a baseline against which to compare more complex models
What is a linear model
A model that describes the relationship between a set of input numbers and a set of output numbers through a linear function
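In its simplest form, such a linear function is a weighted sum of the inputs plus a bias. The sketch below uses hypothetical weights, bias, and inputs.

```python
def linear_model(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias term."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical values: 0.5 * 2.0 + (-1.0) * 3.0 + 4.0
prediction = linear_model([2.0, 3.0], [0.5, -1.0], 4.0)
# prediction -> 2.0
```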