Chapter 1: Machine Learning for Predictive Analysis Flashcards
What is the job of data analytics?
Extracting insights from data
What is predictive data analytics?
The art of building and using models that make predictions based on patterns extracted from historical data
What are the applications of predictive data analysis?
price prediction (businesses), dosage prediction (doctors), risk assessment (organizations), propensity modeling (predicting the likelihood or propensity of individuals to take different actions), diagnosis (doctors, engineers and scientists), document classification
What is a prediction in data analytics? How is it different from the everyday usage?
In data analytics, a prediction is the assignment of a value to any unknown variable. In everyday usage, prediction has a temporal aspect: we predict what will happen in the future
What two things are common in all the application examples?
In each case, a model is used to make a prediction that helps make a decision, and that model is trained to make predictions based on a set of historical examples (machine learning is used to train these models)
What is machine learning?
Machine learning is an automated process that extracts patterns from data.
What is supervised machine learning used for?
We use supervised machine learning to build the models used in predictive data analytics applications
- The training examples have labels (classes or events) that provide feedback while learning
How do they work?
They automatically learn a model of the relationship between a set of descriptive features and a target feature based on a set of historical examples (instances)
What is each row of a dataset called?
training instance
What is the overall dataset called?
training dataset
When is a model consistent?
When the model makes a correct prediction for every instance in the dataset
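A minimal sketch of this consistency check. The dataset, the feature names (bedrooms, garden), and rule_model are hypothetical and chosen only for illustration:

```python
# Hypothetical training dataset: each training instance pairs
# descriptive features with a target feature value.
training_dataset = [
    ({"bedrooms": 1, "garden": False}, "low"),
    ({"bedrooms": 3, "garden": False}, "medium"),
    ({"bedrooms": 4, "garden": True}, "high"),
]


def rule_model(features):
    """A hand-written candidate model (an assumption for illustration)."""
    if features["bedrooms"] >= 4:
        return "high"
    if features["bedrooms"] >= 2:
        return "medium"
    return "low"


def is_consistent(model, dataset):
    """True when the model predicts the correct target for every instance."""
    return all(model(features) == target for features, target in dataset)


print(is_consistent(rule_model, training_dataset))
```

A model that always answers "low" would fail the same check, because it gets two of the three instances wrong.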
What do machine learning algorithms do?
Automate the process of learning a model that captures the relationship between the descriptive features and the target feature in a dataset.
Why is searching for consistent models not enough to learn useful prediction models?
- When dealing with large databases there will likely be noise
- The training set represents only a small sample of the possible set of instances in the domain.
What is an ill-posed problem?
An ill-posed problem is a problem for which a unique solution cannot be determined using only the information that is available
Why is machine learning an ill-posed problem?
Many different models can be consistent with the training dataset, so a single best model cannot be determined from the sample training dataset alone
What is generalization?
- The ability to make predictions for queries that are not present in the data
- A prediction model that makes the correct predictions for these queries captures the underlying relationship between the descriptive and target features and is said to generalize well
What is the goal of machine learning?
Finding the predictive model that generalizes best
What is inductive bias?
A set of assumptions that defines the model selection criteria of a machine learning algorithm
What are the types of inductive bias?
- Restriction bias
- Preference bias
What is restriction bias?
- It constrains the set of models that the algorithm will consider during the learning process
- Similar to choosing your go-to study method
- Tells us what our model is able to represent
What is preference bias?
- It guides the learning algorithm to prefer certain models over others
- Choosing our convergence/satisfaction mechanism
- I like group study but prefer to be the leader rather than the weakest link
- Algorithm’s belief about what makes a good hypothesis
What are examples of restriction bias?
- In multivariable linear regression with gradient descent we only consider models that produce predictions based on a linear combination of the descriptive features
- In Iterative Dichotomizer 3 we only consider tree-like prediction models where each branch encodes a sequence of checks on individual descriptive features
What are examples of preference bias?
- In MLR with GD we linearly combine the descriptive features using only weights that were found through our gradient descent approach
- In ID3 we are preferring shallower (less complex) trees over larger/deeper trees
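As a rough illustration of both biases in linear regression with gradient descent, the sketch below uses a single descriptive feature for brevity. The restriction bias shows up in the model form (only w0 + w1 * x is ever considered), and the preference bias in the search (gradient descent on squared error decides which weights are chosen). The function name fit_linear, the data, the learning rate, and the epoch count are all assumptions made for this example:

```python
def fit_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w0 + w1 * x by batch gradient descent on mean squared error."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error of the current linear model on each instance.
        errors = [(w0 + w1 * x) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to each weight.
        grad_w0 = 2 * sum(errors) / n
        grad_w1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        # Step against the gradient.
        w0 -= learning_rate * grad_w0
        w1 -= learning_rate * grad_w1
    return w0, w1


# Hypothetical noise-free data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w0, w1 = fit_linear(xs, ys)
print(round(w0, 3), round(w1, 3))
```

No matter what the data look like, this learner can only ever return a straight line, which is exactly what the restriction bias means.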
Why is inductive bias necessary for learning beyond the dataset?
Without it we could only perform memorization of our training dataset without generalization capacity
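A small sketch of what pure memorization looks like, assuming a hypothetical lookup-table model over made-up feature values (age band, income band). It answers queries it has seen and has nothing to say about any other query:

```python
# Hypothetical training dataset stored as a lookup table:
# (age band, income band) -> decision.
training_dataset = {
    ("young", "low"): "reject",
    ("middle", "high"): "accept",
}


def memorizing_model(query):
    """Return the stored target for a seen query, or None for an unseen one."""
    return training_dataset.get(query)


seen = memorizing_model(("middle", "high"))
unseen = memorizing_model(("young", "high"))
print(seen, unseen)
```

The model is perfectly consistent with its training data, yet it cannot generalize because it makes no assumptions beyond that data.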
What is model induction?
The creation of models from data
What is the difference between classification problem and regression problem?
- Classification problem has the target as a category
- Regression problem has the target as a number
What is another name for dataset?
- Table or relation of a database (the form is the same)
- Worksheet of a spreadsheet
- Array in math
What is an instance?
- Row, tuple, record of a database table
- Case in statistics
- Object of a class in programming
- Datapoint or vector in math
What is an independent variable?
- It is the attribute supplied as input
- Also known as explanatory variable, inputs, predictors
- Features are the table’s columns
What is a dependent variable?
The target variable whose values are to be predicted.
Also known as class, label, or output
What are some confusing facts about independent and dependent variables?
- Independent variables may not be independent of each other or of anything else
- Dependent variables do not always depend on all the independent variables
Facts about the target variable
- Sometimes it is considered to be included in the set of features, sometimes it is not
- The target variable is not used to predict itself
- Prior values may be helpful to predict future values and may be included as input features
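One way the last point might look in practice is a lag feature, where the previous value of the target becomes a descriptive feature for the next instance. The series values and the feature name previous_value are hypothetical:

```python
# Hypothetical sequence of target values observed over time.
series = [10, 12, 13, 15, 18]

# Each instance uses the prior target value as its only descriptive feature
# and the current value as the target. The current value never predicts itself.
instances = [
    ({"previous_value": series[i - 1]}, series[i])
    for i in range(1, len(series))
]

for features, target in instances:
    print(features, target)
```

The first observation has no prior value, so it yields no instance and the dataset has one row fewer than the series.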
What is the process of building a model (or training your classifier) from historical data?
Induction, learning, training or generalization
When does the real value of machine learning become apparent?
When we want to build prediction models from large datasets with multiple features
How do you know the number of possible prediction models?
- There are three binary descriptive features, so there are 2^3 = 8 possible combinations of descriptive feature values
- For each combination of descriptive feature values there are 3 possible target feature values
- There are 3^8 = 6,561 possible prediction models
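The count can be reproduced with a short sketch, assuming three binary descriptive features and a target with three levels (the setting that gives 2^3 combinations and 3^8 models). The variable names are illustrative:

```python
from itertools import product

n_binary_features = 3  # assumed: each descriptive feature is True/False
target_levels = 3      # assumed: the target feature has three possible values

# Every distinct combination of descriptive feature values.
combinations = list(product([False, True], repeat=n_binary_features))

# A prediction model assigns one target level to each combination,
# so the number of models is target_levels ** number_of_combinations.
n_models = target_levels ** len(combinations)

print(len(combinations), n_models)
```

Adding a fourth binary feature would double the combinations to 16 and raise the count to 3^16 = 43,046,721, which shows how quickly the search space grows.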
What is the ability to memorize a training dataset?
Consistency
What does Occam’s Razor say about simplicity?
All things being equal, the simplest explanation tends to be the right one (upper bound)
What does Albert Einstein say about simplicity?
Everything should be made as simple as possible but not simpler (lower bound)
What are the sources of information that guide machine learning algorithms?
- Training data
- Inductive bias of the algorithm
What can go wrong with machine learning?
- Inappropriate inductive bias which leads to mistakes
What does no free lunch mean?
If an algorithm does well on a certain class of problems, then it necessarily pays for that with degraded performance on the set of all remaining problems
What happens if we choose the wrong inductive bias?
- Underfitting (the prediction model is too simplistic to capture the underlying relationship)
- Overfitting (the prediction model is so complex that it becomes too sensitive to noise in the data; it memorizes)
What is a Goldilocks model?
- A model that is just right and strikes a good balance between overfitting and underfitting
- It is found by using algorithms with appropriate inductive biases
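A toy comparison of underfitting, overfitting, and a better-balanced model, under stated assumptions: the data come from y = 2x with alternating +1/-1 noise on the training targets, and the three models (a mean predictor, a least-squares line, and a 1-nearest-neighbour memorizer) are stand-ins chosen for this sketch, not methods named in the cards:

```python
def mse(model, xs, ys):
    """Mean squared error of a model on a set of instances."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: true relationship y = 2x, noisy training targets.
train_x = [0, 1, 2, 3, 4, 5]
noise = [1, -1, 1, -1, 1, -1]
train_y = [2 * x + e for x, e in zip(train_x, noise)]
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)


def mean_model(x):
    """Too simple: ignores x and always predicts the training mean."""
    return mean_y


# Closed-form least-squares line through the training data.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
    (x - mean_x) ** 2 for x in train_x
)
intercept = mean_y - slope * mean_x


def linear_model(x):
    """Matches the true form of the relationship."""
    return intercept + slope * x


def nearest_neighbour_model(x):
    """Too flexible: returns the noisy target of the closest training instance."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


for name, model in [
    ("mean", mean_model),
    ("linear", linear_model),
    ("1-nearest-neighbour", nearest_neighbour_model),
]:
    print(name, round(mse(model, train_x, train_y), 3), round(mse(model, test_x, test_y), 3))
```

The memorizer reaches zero error on the training data but does worse than the line on the unseen queries, while the mean predictor is poor on both.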
What is CRISP-DM?
Cross Industry Standard Process for Data Mining is a data mining process model that describes commonly used approaches that data mining experts use to tackle problems
What are the phases?
- Business Understanding: defining customers’ needs, understanding project objectives
- Data Understanding: data collection and familiarization
- Data Preparation: constructing the final dataset from raw data
- Modeling: selecting machine learning techniques relevant to the problem and calibrating their parameters to optimal values
- Evaluation: collecting outcomes, comparing the obtained model with business objectives
- Deployment: putting the model into production, organizing and presenting the knowledge gained in a way that the customer can use
List other data life cycle models
- SEMMA: Sample, Explore, Modify, Model, Assess
- Data Mining and Knowledge Discovery from Data (KDD): mostly used in the real world
What is supervised machine learning based on?
The assumption that the data does not change over time. Supervised learning algorithms create models that distinguish between the classes present in the dataset they are induced from