Chapter 1: Machine Learning for Predictive Data Analytics Flashcards
What is the job of data analytics?
Extracting insights from data
What is predictive data analytics?
The art of building and using models that make predictions based on patterns extracted from historical data
What are the applications of predictive data analytics?
Price prediction (businesses), dosage prediction (doctors), risk assessment (organizations), propensity modeling (predicting how likely individuals are to take different actions), diagnosis (doctors, engineers, and scientists), and document classification
What is a prediction in data analytics? How is it different from the everyday usage?
In data analytics, a prediction is the assignment of a value to any unknown variable. In everyday usage, prediction has a temporal aspect: we predict what will happen in the future.
What two things do all of these applications have in common?
In each case, (1) a model is used to make a prediction that helps someone make a decision, and (2) that model is trained to make predictions based on a set of historical examples (machine learning is used to train these models).
What is machine learning?
Machine learning is an automated process that extracts patterns from data.
What is supervised machine learning used for?
We use supervised machine learning to build the models used in predictive data analytics applications
- The training data has labels (classes or events) that provide feedback during learning
How do supervised machine learning methods work?
They automatically learn a model of the relationship between a set of descriptive features and a target feature based on a set of historical examples (instances).
What is each row of a dataset called?
training instance
What is the overall dataset called?
training dataset
When is a model consistent?
When it makes a correct prediction for every instance in the training dataset, i.e., there is no instance for which its prediction is wrong
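A minimal sketch of checking consistency, assuming a hypothetical toy dataset of three binary descriptive features (f1, f2, f3) and a binary target. The dataset, `model`, and `is_consistent` are illustrative names, not from the source.

```python
# Hypothetical toy training dataset: each training instance pairs a tuple of
# descriptive-feature values (f1, f2, f3) with a target-feature value.
training_dataset = [
    ((0, 0, 0), 0),
    ((0, 1, 1), 1),
    ((1, 0, 1), 0),
    ((1, 1, 0), 1),
]

def model(features):
    """Hypothetical model: predict the target as the value of feature f2."""
    return features[1]

def is_consistent(model, dataset):
    """True if the model predicts the correct target for every training instance."""
    return all(model(features) == target for features, target in dataset)

print(is_consistent(model, training_dataset))  # True
# A model that copies f1 instead gets some instances wrong, so it is not consistent.
print(is_consistent(lambda features: features[0], training_dataset))  # False
```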
What do machine learning algorithms do?
They automate the process of learning a model that captures the relationship between the descriptive features and the target feature in a dataset.
Why is searching for consistent models not enough to learn useful prediction models?
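- Large datasets are likely to contain noise, and a consistent model would have to fit that noise as well
- The training set represents only a small sample of the possible set of instances in the domain, so many different models can be consistent with it while disagreeing on instances it does not contain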
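One concrete way noise defeats the search for consistent models: a model maps descriptive-feature values to a single prediction, so if a noisy label gives two instances identical descriptive features but different targets, no model can be consistent with both. The sketch below detects that case; the data and the `has_conflicting_labels` helper are hypothetical.

```python
def has_conflicting_labels(dataset):
    """Return True if two instances share descriptive-feature values but differ in target.

    If this returns True, no model can be consistent with the dataset.
    """
    seen = {}
    for features, target in dataset:
        if features in seen and seen[features] != target:
            return True
        seen[features] = target
    return False

# Hypothetical data: the last instance repeats the first one's descriptive
# features with a different target (e.g., a mislabeled instance).
noisy_dataset = [
    ((0, 0, 0), 0),
    ((0, 1, 1), 1),
    ((1, 0, 1), 0),
    ((0, 0, 0), 1),
]

print(has_conflicting_labels(noisy_dataset))  # True
```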
What is an ill-posed problem?
An ill-posed problem is a problem for which a unique solution cannot be determined using only the information that is available
Why is machine learning an ill-posed problem?
Because the training dataset is only a sample, many different models are consistent with it, so a single consistent model cannot be singled out from the training data alone
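To see the ill-posedness concretely, the sketch below assumes three binary descriptive features and a binary target, treats every possible lookup table from inputs to targets as a candidate model, and counts how many agree with a small hypothetical training set. With 3 features there are 2**3 = 8 possible inputs and 2**8 = 256 possible models; the 5 training instances fix 5 of the 8 outputs, leaving 2**3 = 8 models that are all consistent.

```python
from itertools import product

# Hypothetical training data over three binary descriptive features (f1, f2, f3):
# maps each observed feature-value tuple to its target value.
training_data = {
    (0, 0, 0): 0,
    (0, 1, 1): 1,
    (1, 0, 1): 0,
    (1, 1, 0): 1,
    (0, 0, 1): 0,
}

# All 2**3 = 8 possible combinations of descriptive-feature values.
all_inputs = list(product([0, 1], repeat=3))

# Treat a model as a full lookup table assigning a target to each of the 8 inputs,
# giving 2**8 = 256 possible models.
all_models = [
    dict(zip(all_inputs, outputs))
    for outputs in product([0, 1], repeat=len(all_inputs))
]

# Keep only the models that predict every training instance correctly.
consistent_models = [
    m for m in all_models
    if all(m[x] == y for x, y in training_data.items())
]

print(len(all_models))         # 256
print(len(consistent_models))  # 8
```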
What is generalization?
- The ability of a model to make correct predictions for queries that are not present in the training data
- A prediction model that makes the correct predictions for these queries captures the underlying relationship between the descriptive and target features and is said to generalize well
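The sketch below illustrates why generalization, not consistency, is the real test. Two hypothetical models (`model_a`, `model_b`, invented for this example) both predict every training instance correctly, yet they disagree on a query that is absent from the training data, so at most one of them can be right about it.

```python
# Hypothetical training data: (f1, f2, f3) -> target.
training_data = {
    (0, 0, 0): 0,
    (0, 1, 1): 1,
    (1, 0, 1): 0,
    (1, 1, 0): 1,
    (0, 0, 1): 0,
}

def model_a(f1, f2, f3):
    # Predict the target as the value of f2.
    return f2

def model_b(f1, f2, f3):
    # Predict f2, except output 0 when f1 and f3 are both 1.
    return int(f2 == 1 and not (f1 == 1 and f3 == 1))

for model in (model_a, model_b):
    # Both models are consistent with every training instance.
    assert all(model(*x) == y for x, y in training_data.items())

query = (1, 1, 1)  # not present in the training data
print(model_a(*query), model_b(*query))  # 1 0
```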
What is the goal of machine learning?
Finding the predictive model that generalizes best
What is inductive bias?
A set of assumptions that defines the model selection criteria of a machine learning algorithm
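A minimal sketch of one illustrative form of inductive bias, assumed here to be a restriction bias: the algorithm only considers models that copy or negate a single descriptive feature (6 candidates) rather than every possible lookup table. The model family, the data, and the `make_model` helper are hypothetical choices for this example. Under this assumption only one candidate fits the training data, so the prediction for an unseen query is no longer left open.

```python
from itertools import product

# Hypothetical training data: (f1, f2, f3) -> target.
training_data = {
    (0, 0, 0): 0,
    (0, 1, 1): 1,
    (1, 0, 1): 0,
    (1, 1, 0): 1,
    (0, 0, 1): 0,
}

def make_model(index, negate):
    """Build a model that returns one feature's value, optionally negated."""
    def model(features):
        value = features[index]
        return 1 - value if negate else value
    return model

# Assumed restriction bias: the candidate set is limited to these 6 models.
candidates = {
    f"{'not ' if negate else ''}f{index + 1}": make_model(index, negate)
    for index, negate in product(range(3), [False, True])
}

consistent = [
    name for name, model in candidates.items()
    if all(model(x) == y for x, y in training_data.items())
]
print(consistent)  # ['f2']

# With a single remaining model, the prediction for an unseen query is determined.
print(candidates["f2"]((1, 1, 1)))  # 1
```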