Quiz 1 Flashcards
tasks of classification and prediction as well as pattern discovery
predictive analytics
examine data where the category (class) is unknown or will occur in the future, with the goal of predicting it
classification
predicting the value of a numerical variable (e.g., that it will go up 5%)
prediction
designed to find general association patterns between items in large databases; the patterns apply to the whole population
association rules/affinity analysis
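A minimal sketch of the two measures most often used to score an association rule "if A then C". It assumes hypothetical toy basket data: support is the share of all baskets that contain the items, and confidence is the share of baskets containing A that also contain C. The card itself does not name these measures, so treat this as an illustration rather than a full rule-mining algorithm such as Apriori.

```python
# Hypothetical toy transactions: each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)

def support(itemset, baskets):
    """Share of all baskets that contain the whole itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets containing the antecedent that also contain the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)

print(support({"bread", "milk"}, transactions))       # 0.6
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
```

Because the counts run over every transaction, the resulting rules describe the whole population rather than any single customer.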
uses individual users' preferences and tastes, given their historic measurable behaviors indicative of preference, to make recommendations for each user
collaborative filtering
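One common variant is user-based collaborative filtering. The sketch below assumes hypothetical ratings and uses cosine similarity over co-rated items without mean-centering (one of several possible choices). It estimates a user's rating for an unseen item as a similarity-weighted average of other users' ratings for that item.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "ann":  {"A": 5, "B": 3, "C": 4},
    "bob":  {"A": 4, "B": 3, "C": 5, "D": 4},
    "cara": {"A": 1, "B": 5, "D": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item, data):
    """Similarity-weighted average of other users' ratings for item (None if nobody rated it)."""
    num = den = 0.0
    for other, their in data.items():
        if other == user or item not in their:
            continue
        sim = cosine(data[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

print(round(predict("ann", "D", ratings), 2))  # 3.19
```

Ann's tastes are closer to Bob's than to Cara's, so her predicted rating for D leans toward Bob's 4 rather than Cara's 2.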
analytical methods used in predictive analytics (PA)
classification, prediction, collaborative filtering
process of consolidating a large number of records (rows/cases) into a smaller set
data reduction
methods of reducing the number of cases
clustering
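k-means is one widely used clustering method, although the card does not name a specific one. This standard-library sketch assumes 2-D numeric records and a naive initialization that uses the first k records as starting centroids. It shows six hypothetical records being consolidated into two cluster summaries.

```python
import math

def kmeans(points, k, iters=10):
    """Plain k-means on 2-D points; the first k points serve as starting centroids."""
    centroids = list(points[:k])
    for _ in range(iters):
        # Assignment step: put each record in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids, clusters

# Six hypothetical records that form two obvious groups.
records = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(records, k=2)
print([len(c) for c in clusters])  # [3, 3]
```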
reducing the number of variables (columns) in a dataset
dimension reduction
what is one of the earliest stages of engaging with data?
exploring it
exploration of data by creating charts and dashboards
data visualization/visual analytics
used in classification and prediction; require data in which the value of the outcome of interest (e.g., yes/no) is known
supervised learning algorithms
data from which a classification or prediction algorithm learns the relationship between the predictor variables and the outcome variable
training data
data to which the algorithm is applied to assess how well it does and to fix any issues
validation data
data used to evaluate the performance of the final chosen model
test data
used when there is no outcome variable to predict or classify, so there are no records with a known outcome to learn from
unsupervised learning algorithms
methods are trained on a set of training data and then their performance is evaluated on a separate set of validation data
data partitioning
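A minimal sketch of a random three-way partition into training, validation, and test data. The 60/20/20 fractions and the seed are hypothetical defaults, not values taken from these cards.

```python
import random

def partition(records, train_frac=0.6, valid_frac=0.2, seed=1):
    """Randomly split records into training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))  # 60 20 20
```

Each record lands in exactly one partition, so the model is never evaluated on records it was trained on.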
predicted values for the training data, i.e., for the records on which the model was fit
fitted values
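To illustrate fitted values, here is a sketch that uses a simple least-squares line as a stand-in model on hypothetical training records. The fitted values are the model's predictions for those same records. A prediction for a new record is not a fitted value.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical training records.
train_x = [1, 2, 3, 4]
train_y = [2, 4, 5, 8]
b0, b1 = fit_line(train_x, train_y)

# Fitted values: predictions for the same records the model was fit on.
fitted = [b0 + b1 * x for x in train_x]
residuals = [y - f for y, f in zip(train_y, fitted)]

# A prediction for a new record (x = 5) is not a fitted value.
new_prediction = b0 + b1 * 5
```

Here the line is roughly y = 1.9x, so the fitted values are about 1.9, 3.8, 5.7, and 7.6.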
what does SEMMA stand for?
sample, explore, modify, model, assess
nominal categorical variables that have been decomposed into a series of binary variables (yes/no)
dummy variables
including all of the dummy variables (one per category, e.g., all four for a four-category variable) in an algorithm
one-hot encoding
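A minimal sketch that decomposes a hypothetical four-category "region" variable into 0/1 dummy variables. The helper name `dummies` and its `drop_first` flag are illustrative, not a library API. Keeping all four columns is one-hot encoding. Dropping one turns that category into a reference level.

```python
def dummies(values, drop_first=False):
    """Decompose a nominal categorical column into 0/1 dummy columns."""
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]  # the dropped category becomes the reference level
    rows = [[int(v == c) for c in categories] for v in values]
    return categories, rows

region = ["north", "south", "east", "west", "south"]

cols, one_hot = dummies(region)                    # all four dummies (one-hot encoding)
print(cols)        # ['east', 'north', 'south', 'west']
print(one_hot[0])  # [0, 1, 0, 0]  ("north")

cols3, reduced = dummies(region, drop_first=True)  # three dummies; "east" is all zeros
print(reduced[2])  # [0, 0, 0]
```

Dropping one dummy is common in linear and logistic regression with an intercept. There, including all of them makes the dummies perfectly collinear with the intercept. Other algorithms work with the full one-hot set.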
knowledge of the particular application being considered, not the algorithm
domain knowledge
two ways to standardize data
- z-score: subtract the mean from each value and divide by the standard deviation
- min-max: subtract the minimum and divide by the range (max minus min), which rescales values to 0 to 1
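Both standardizations can be sketched with the standard library. The income values are hypothetical, and using the sample standard deviation (`statistics.stdev`) rather than the population version (`statistics.pstdev`) is an assumption.

```python
import statistics

def z_score(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption; pstdev is the population version)
    return [(v - mean) / sd for v in values]

def min_max(values):
    """Subtract the minimum and divide by the range, rescaling to 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20, 40, 60, 80, 100]                 # hypothetical values
print(min_max(incomes))                         # [0.0, 0.25, 0.5, 0.75, 1.0]
print([round(z, 2) for z in z_score(incomes)])  # [-1.26, -0.63, 0.0, 0.63, 1.26]
```

Min-max scaling divides by zero if every value is identical, so constant columns need special handling.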
entire set of operations that was performed on a dataset
workflow
maintenance triggered by advance warnings that are predictive of failure
proactive preventive maintenance
steps in machine learning process
- develop an understanding of the purpose of the ML project
- obtain the dataset
- explore, clean, and preprocess the data
- reduce the data dimensions if necessary
- determine the ML task (e.g., classification, prediction, clustering)
- partition the data (for supervised tasks)
- choose the ML techniques to use
- use the algorithms to perform the task
- interpret the results
- deploy the model
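Several of these steps can be traced in a toy run. The sketch assumes a hypothetical dataset with one numeric predictor and a known binary outcome, and uses a 1-nearest-neighbour classifier as the chosen technique. It covers obtaining the data, partitioning, running the algorithm, and interpreting the validation results. It does not cover deployment.

```python
import random

def predict_1nn(train, x):
    """1-nearest-neighbour classifier on a single numeric predictor."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(train, records):
    """Share of records whose outcome the model predicts correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in records)
    return hits / len(records)

# Obtain the dataset (hypothetical): predictor x, binary outcome known for every record.
data = [(x, int(x >= 50)) for x in range(100)]

# Partition the data (supervised task): 60 training records, 40 validation records.
random.Random(42).shuffle(data)
train, valid = data[:60], data[60:]

# Use the chosen technique (1-NN) and interpret the results on the validation data.
val_acc = accuracy(train, valid)
print(f"validation accuracy: {val_acc:.2f}")
```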