Final Flashcards
(46 cards)
What is probability?
How likely it is that an event will occur
What is conditional probability?
The probability that A occurs given that B has already occurred
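A minimal sketch of this definition, using P(A | B) = P(A and B) / P(B). The two-dice sample space and the events `is_sum_8` and `first_is_even` are hypothetical examples, not part of the original cards.

```python
from fractions import Fraction

# Hypothetical sample space: all 36 equally likely outcomes of two fair dice.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def prob(event):
    """P(event) = number of favourable outcomes / total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

def is_sum_8(o):
    return o[0] + o[1] == 8

def first_is_even(o):
    return o[0] % 2 == 0

p_a = prob(is_sum_8)                              # 5/36
p_a_given_b = cond_prob(is_sum_8, first_is_even)  # 1/6
print(p_a, p_a_given_b)
```

Knowing that the first die is even raises the probability of a sum of 8 from 5/36 to 1/6.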
What is an unsupervised technique?
A technique that finds relationships or groupings among data points without a labeled target attribute
What is support?
How frequently the item (or itemset) occurs in the dataset
What is confidence?
How often a rule is found to be true: of the transactions that contain A, the fraction that also contain B
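Support and confidence can be sketched on a tiny dataset. The `transactions` list and the helper names `support` and `confidence` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent):
    """For the rule A -> B: support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)

sup_bread = support({"bread"})                        # 4/5
sup_bread_milk = support({"bread", "milk"})           # 3/5
conf_bread_milk = confidence({"bread"}, {"milk"})     # 3/4
print(sup_bread, sup_bread_milk, conf_bread_milk)
```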
How do support and confidence thresholds work?
“- select minimum acceptable values for support and confidence
- find association rules with support and confidence at or above the chosen thresholds
- itemsets whose support meets the minimum threshold are called frequent”
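The thresholding steps above can be sketched for one-item-to-one-item rules. The data and the values of `MIN_SUPPORT` and `MIN_CONFIDENCE` are assumptions picked for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical market-basket data and thresholds.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
MIN_SUPPORT = Fraction(2, 5)
MIN_CONFIDENCE = Fraction(7, 10)

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))

items = sorted(set().union(*transactions))
rules = []
for a, b in permutations(items, 2):
    sup = support({a, b})
    if sup < MIN_SUPPORT:
        continue  # the pair is not frequent, so no rule is built from it
    conf = sup / support({a})
    if conf >= MIN_CONFIDENCE:
        rules.append((a, b, sup, conf))

for a, b, sup, conf in rules:
    print(f"{a} -> {b}: support={sup}, confidence={conf}")
```

With these assumed thresholds, the rule bread -> butter is dropped because its confidence is only 1/2, while butter -> bread is kept with confidence 1.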
Why should you use association rules?
“- simple data model
- understandable and actionable rules”
What is the Apriori technique?
“- reduces the number of calculations by pruning candidate bundles
- if a bundle is frequent, then all of its subsets are frequent
- if a bundle is infrequent, then all of its supersets are infrequent”
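A simplified sketch of how the pruning rule above saves work: a candidate bundle of size k is only counted if every one of its (k-1)-item subsets was already found frequent. The data and `MIN_COUNT` are hypothetical, and this is an illustration of the idea rather than an optimized implementation.

```python
from itertools import combinations

# Hypothetical market-basket data and minimum support count.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
MIN_COUNT = 2

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(min_count):
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if count(frozenset([i])) >= min_count]
    frequent = list(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: skip any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Only the surviving candidates are counted against the data.
        level = [c for c in candidates if count(c) >= min_count]
        frequent.extend(level)
        k += 1
    return frequent

frequent = apriori(MIN_COUNT)
print(sorted(sorted(f) for f in frequent))
```

Here the bundle {bread, butter, milk} is never counted, because its subset {butter, milk} is infrequent.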
What is lift?
“- confidence / expected confidence
- the ratio of the actual probability that a transaction contains both item A and item B to the probability that would be expected if A and B were independent”
What is a supervised method?
A way to describe the relationship between input attributes and a target attribute
What is regression?
estimating the relationship between variables
What is correlation?
The strength and direction of the linear relationship between two variables
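A short sketch of the Pearson correlation coefficient, one common measure of linear relationship. The data is made up; the second series shows that a strong but non-linear relationship can still give a correlation of zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data.
x = [1, 2, 3, 4, 5]
linear = [2 * v + 1 for v in x]     # perfectly linear in x
curved = [(v - 3) ** 2 for v in x]  # depends on x, but not linearly

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(r_linear, r_curved)
```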
What are some output types for data mining techniques?
“- regression
- classification
- ordinal”
What is a regression analysis?
An analysis whose output is numeric and falls within a continuous range
What is a classification analysis?
An analysis whose output is a factor (category) or a binary value such as yes or no
What is an ordinal technique?
Classification with ordered output categories
What technique would you use for grouping things by similarity?
clustering
What technique is used to determine the relationship between input and output variables?
regression
What technique would you use to assign labels to data based on characteristics?
Classification
What technique would you use to determine if there was a relationship between variables in the data?
association rules
What technique would you use to find structure in a temporal data set?
time series analysis
What is a parametric model?
Makes an assumption about the functional form or shape of the data and then estimates the parameters of that function
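As an illustration, a least-squares line fit is a parametric model: it assumes the form y = slope × x + intercept and only estimates the two parameters. The data and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the assumed form y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical noisy measurements that roughly follow a straight line.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # the fitted function extends to new inputs
print(slope, intercept, prediction)
```

Once the two parameters are estimated, the training data is no longer needed to make predictions.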
What is a nonparametric model?
Does not make an explicit assumption about the form of the function
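One example of a nonparametric approach is k-nearest-neighbours regression, sketched below. It assumes no functional form; each prediction is an average of the closest training points. The data, the function name `knn_predict`, and the choice k = 2 are assumptions for illustration.

```python
def knn_predict(x_new, xs, ys, k=2):
    """Average the targets of the k training points closest to x_new."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical training data.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

# The two nearest points to 2.4 are x=2 and x=3, so the result is their mean target.
pred = knn_predict(2.4, xs, ys)
print(pred)
```

Unlike a fitted line, this model has to keep the training data around to make predictions.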
What is model stability?
The process of finding a model that gives accurate predictions for the whole population and not just for individual samples
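One common way to check this is k-fold cross-validation, sketched below as an assumption about how stability might be assessed: the model is refit on different subsets of the data and scored on the held-out points each time. Similar errors across folds suggest the model is not tied to one particular sample. The data and helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    return slope, my - slope * mx

def k_fold_mse(xs, ys, k=5):
    """Held-out mean squared error for each of k folds."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))
        pairs = list(enumerate(zip(xs, ys)))
        train = [p for i, p in pairs if i not in test_idx]
        test = [p for i, p in pairs if i in test_idx]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        errors = [(y - (slope * x + intercept)) ** 2 for x, y in test]
        scores.append(sum(errors) / len(errors))
    return scores

# Hypothetical data: a straight line with alternating +/-0.5 noise.
xs = list(range(1, 11))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]

fold_mse = k_fold_mse(xs, ys)
spread = max(fold_mse) - min(fold_mse)  # a small spread suggests a stable model
print(fold_mse, spread)
```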