Final Flashcards
What is probability?
How likely it is that an event will occur
What is conditional probability?
The probability that A occurs given that B has already occurred
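A small worked example may help make the definition concrete. The sketch below assumes a single roll of a fair six-sided die. The events A and B and the helper names `prob` and `conditional` are illustrative choices, not part of the original cards. It uses the standard identity P(A | B) = P(A and B) / P(B).

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # event A: the roll is even
B = {4, 5, 6}  # event B: the roll is greater than 3


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))


def conditional(a, b):
    """P(a | b) = P(a and b) / P(b)."""
    return prob(a & b) / prob(b)


p_a = prob(A)                    # 1/2
p_a_given_b = conditional(A, B)  # 2/3
print(p_a, p_a_given_b)
```

Knowing that B occurred raises the probability of A from 1/2 to 2/3, because two of the three outcomes in B are even.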
What is an unsupervised technique?
A technique that finds relationships or groupings among data points without using a target attribute
What is support?
How frequently the item (or set of items) occurs in the dataset
What is confidence?
How often a rule is found to be true
How do support and confidence thresholds work?
“- select minimum acceptable values for support and confidence
- find association rules with support and confidence above the chosen thresholds
- items with high support are called frequent”
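The threshold steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the five transactions, the item names, and the threshold values `MIN_SUPPORT` and `MIN_CONFIDENCE` are made up, and only rules with one item on each side are considered.

```python
from itertools import permutations

# Hypothetical market-basket data, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

MIN_SUPPORT = 0.3      # assumed threshold
MIN_CONFIDENCE = 0.7   # assumed threshold


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


items = sorted(set().union(*transactions))
rules = []
for a, b in permutations(items, 2):
    # Keep the rule a -> b only if it clears both thresholds.
    if support({a, b}) >= MIN_SUPPORT and confidence({a}, {b}) >= MIN_CONFIDENCE:
        rules.append((a, b))

for a, b in rules:
    print(f"{a} -> {b}")
```

With this data, bread -> butter is dropped because its confidence is only 0.5, while butter -> bread is kept because every basket with butter also has bread.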
Why should you use association rules?
“- simple data model
- understandable and actionable rules”
What is the apriori technique?
“- reduces the number of calculations
- if a bundle is frequent, then all of its subsets are frequent
- if a bundle is infrequent, then all of its supersets are infrequent”
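The pruning idea above can be sketched as a level-by-level search. This is a minimal illustration rather than a full implementation: the transactions, the `MIN_SUPPORT` value, and the function names are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical market-basket data, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
MIN_SUPPORT = 0.3  # assumed threshold


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori():
    """Return all frequent itemsets, growing candidates one item at a time."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = list(level)
    k = 2
    while level:
        previous = set(level)
        # Candidates of size k are unions of two frequent itemsets of size k - 1.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: if any (k - 1)-subset is infrequent, the candidate cannot be
        # frequent, so its support never needs to be counted.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c) >= MIN_SUPPORT]
        frequent.extend(level)
        k += 1
    return frequent


frequent = apriori()
for itemset in frequent:
    print(sorted(itemset))
```

Here {butter, milk} is infrequent, so the candidate {bread, butter, milk} is discarded without scanning the transactions for it. That skipped scan is where the savings come from.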
What is lift?
“- confidence / expected confidence
- the ratio of the actual probability that a transaction contains both item A and item B to the probability that A and B would occur together if they were independent”
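As a rough illustration of the ratio described above, the sketch below computes lift as support(A and B) / (support(A) × support(B)). The transactions and item names are invented for the example, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

# Hypothetical market-basket data, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset):
    """Exact fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def lift(a, b):
    """Observed co-occurrence divided by the co-occurrence expected under independence."""
    return support(a | b) / (support(a) * support(b))


lift_bread_milk = lift({"bread"}, {"milk"})      # 15/16
lift_butter_bread = lift({"butter"}, {"bread"})  # 5/4
print(lift_bread_milk, lift_butter_bread)
```

A lift above 1 means the items appear together more often than independence would predict, a lift below 1 means less often, and a lift of exactly 1 is what independence would give.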
What is a supervised method?
A way to describe the relationship between input attributes and a target attribute
What is regression?
Estimating the relationship between variables
What is correlation?
The strength of the linear relationship between two variables
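One common way to measure this is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand. The function name `pearson_r` and the sample data are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
y_linear = [2, 4, 6, 8, 10]    # exactly linear, increasing
y_reversed = [10, 8, 6, 4, 2]  # exactly linear, decreasing
y_squared = [1, 4, 9, 16, 25]  # increasing but curved

r_linear = pearson_r(x, y_linear)
r_reversed = pearson_r(x, y_reversed)
r_squared = pearson_r(x, y_squared)
print(r_linear, r_reversed, r_squared)
```

The curved series is still strongly correlated with x, but its coefficient falls short of 1 because correlation only captures the linear part of the relationship.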
What are some output types for data mining techniques?
“- regression
- classification
- ordinal”
What is a regression analysis?
An analysis whose output is numerical, taking values over a range
What is a classification analysis?
An analysis whose output is a factor or binary value, like yes or no
What is an ordinal technique?
Classification with ordered output
What technique would you use for grouping things by similarity?
clustering
What technique is used to determine the relationship between input and output variables?
regression