Final Flashcards by gab mart

What is probability?

A measure of how likely an event is to occur, expressed as a number between 0 and 1

What is conditional probability?

The probability that A occurs given that B has already occurred
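
As a small sketch of the definition, P(A | B) = P(A and B) / P(B). The die-roll events below are an illustrative assumption and are not part of the original card.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
outcomes = range(1, 7)
A = {n for n in outcomes if n % 2 == 0}  # event A: the roll is even
B = {n for n in outcomes if n > 3}       # event B: the roll is greater than 3

p_b = Fraction(len(B), 6)
p_a_and_b = Fraction(len(A & B), 6)

# Conditional probability: restrict attention to outcomes where B happened.
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 2/3
```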

What is an unsupervised technique?

A technique that finds structure, such as groupings or relationships among data points, without using a labeled target attribute

What is support?

How frequently an item (or set of items) occurs in the dataset: the fraction of transactions that contain it

What is confidence?

How often a rule is found to be true: of the transactions that contain A, the fraction that also contain B

How do support and confidence thresholds work?

- Select minimum acceptable values for support and confidence
- Find association rules with support and confidence above the chosen thresholds
- Items with high support are called frequent
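
A minimal sketch of applying the two thresholds to one candidate rule. The transactions, item names, and threshold values are made up for illustration, and the comparison treats "at or above the threshold" as passing, which is an assumption.

```python
# Hypothetical market-basket data; item names and thresholds are invented.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Support of the whole rule divided by support of the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.6

# Candidate rule: bread -> milk
rule_support = support({"bread", "milk"})          # 2 of 4 transactions
rule_confidence = confidence({"bread"}, {"milk"})  # 0.5 / 0.75
keep_rule = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
print(rule_support, round(rule_confidence, 3), keep_rule)
```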

Why should you use association rules?

- Simple data model
- Understandable and actionable rules

What is the Apriori technique?

- Reduces the number of calculations
- If a bundle is frequent, then all of its subsets are frequent
- If a bundle is infrequent, then all of its supersets are infrequent
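
The pruning idea can be sketched for one level of the search: two-item candidates are built only from single items that are already frequent. The data and the `MIN_SUPPORT` value are hypothetical, and this is a simplified illustration rather than a full Apriori implementation.

```python
from itertools import combinations

# Hypothetical transactions and threshold, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "jam"},
    {"bread", "milk", "butter"},
    {"milk"},
]
MIN_SUPPORT = 0.5

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))
frequent_1 = [frozenset([i]) for i in items if support({i}) >= MIN_SUPPORT]

# "jam" is infrequent, so no superset of it can be frequent: pairs that
# include "jam" are never generated, which saves support calculations.
candidates_2 = [a | b for a, b in combinations(frequent_1, 2)]
frequent_2 = [c for c in candidates_2 if support(c) >= MIN_SUPPORT]

print(sorted(sorted(s) for s in frequent_2))
```

Here three candidate pairs are counted instead of six, because every pair containing the infrequent item is skipped.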

What is lift?

- Confidence / expected confidence
- The ratio of the actual probability that a transaction contains both items A and B to the probability that A and B would occur together if they were independent
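
A short sketch of the ratio, using invented transactions. A lift above 1 suggests the two items appear together more often than independence would predict; a lift near 1 suggests no association.

```python
# Hypothetical transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(a, b):
    """P(A and B) / (P(A) * P(B)), the same as confidence(A -> B) / P(B)."""
    return support(a | b) / (support(a) * support(b))

value = lift({"bread"}, {"butter"})  # 0.5 / (0.75 * 0.5)
print(round(value, 3))
```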

What is a supervised method?

A way to describe the relationship between input attributes and a target attribute

What is regression?

Estimating the relationship between input variables and a numerical output variable

What is correlation?

The strength (and direction) of the linear relationship between two variables
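
One common measure is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand on made-up numbers; the helper name `pearson_r` is an assumption for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
increasing = pearson_r(xs, [2, 4, 6, 8, 10])  # perfectly linear, rising
decreasing = pearson_r(xs, [10, 8, 6, 4, 2])  # perfectly linear, falling
print(increasing, decreasing)
```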

What are some output types for data mining techniques?

- Regression
- Classification
- Ordinal

What is a regression analysis?

Predicts a numerical output over a continuous range

What is a classification analysis?

Predicts a factor (categorical) or binary output, such as yes or no

What is an ordinal technique?

Classification with an ordered output, such as low, medium, or high

What technique would you use for grouping things by similarity?

clustering

What technique is used to determine the relationship between input and output variables?

regression

What technique would you use to assign labels to data based on characteristics?

Classification

What technique would you use to determine if there was a relationship between variables in the data?

association rules

What technique would you use to find structure in a temporal data set?

time series

What is a parametric model?

Makes an assumption about the form or shape of the function underlying the data, and then estimates the parameters of that function
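
Simple linear regression is one example: the assumed form is y = intercept + slope * x, and fitting means estimating those two parameters. The data points below are invented, and least squares is used here as one common estimation method.

```python
# Hypothetical data that happens to lie exactly on a straight line.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Least-squares estimates for the two parameters of the assumed line.
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x

print(slope, intercept)  # 2.0 1.0
```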

What is a non-parametric model?

Does not make an explicit assumption about the form of the function

What is model stability?

The process of finding a model that gives accurate predictions for the whole population and not just for individual samples

What is overfitting?

A modeling error where the model fits the training data set too closely, so it generalizes poorly to new data

What is cross validation?

Assessing how well a model's results will generalize to an independent data set, by training on one part of the data and testing on another

What are posterior probabilities?

The statistical probability that a hypothesis is true, calculated in the light of relevant observations
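
Bayes' rule gives one way to compute a posterior from a prior and the observed evidence. The three input probabilities below are invented purely for illustration.

```python
# Hypothetical numbers, chosen only to illustrate Bayes' rule.
prior = 0.01                   # P(H): belief in the hypothesis before evidence
p_evidence_given_h = 0.90      # P(E | H)
p_evidence_given_not_h = 0.05  # P(E | not H)

# Total probability of seeing the evidence at all.
p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)

# Posterior: P(H | E) = P(E | H) * P(H) / P(E)
posterior = p_evidence_given_h * prior / p_evidence
print(round(posterior, 3))
```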

What is sensitivity?

The true positive rate: the proportion of positives that are correctly identified

What is specificity?

The true negative rate: the proportion of negatives that are correctly identified as such
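
Sensitivity and specificity can both be computed from the four confusion-matrix counts. The labels below are a made-up example in which 1 means positive and 0 means negative.

```python
# Hypothetical labels: 1 = positive, 0 = negative.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # true positives
fn = sum(a == 1 and p == 0 for a, p in pairs)  # false negatives
tn = sum(a == 0 and p == 0 for a, p in pairs)  # true negatives
fp = sum(a == 0 and p == 1 for a, p in pairs)  # false positives

sensitivity = tp / (tp + fn)  # share of actual positives that were caught
specificity = tn / (tn + fp)  # share of actual negatives that were caught
print(sensitivity, round(specificity, 3))
```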

What is discriminant analysis?

Used to separate groups from each other

What are decision trees?

They allow you to develop classification systems that predict or classify current and future observations based on a set of decision rules. They divide a large collection of records into successively smaller sets of records by applying binary rules.

What are the benefits of decision trees?

- The input data can be continuous or discrete
- No underlying assumption is needed about the relationship between the independent and dependent variables
- Suited for classification and regression
- Easy to interpret

Why perform cluster analysis?

find patterns in data

What are challenges with cluster analysis?

- How do we define similar?
- How do we handle outliers?

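How do we define similarity?

With a distance measure that satisfies properties such as:
- Symmetry: the distance from a to b equals the distance from b to a
- Triangle inequality: the distance from a to c is no more than the distance from a to b plus the distance from b to c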

What is Euclidean distance?

The straight-line distance between two points; in clustering, typically the distance between a centroid and an individual data point
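
A minimal sketch of the calculation: the square root of the summed squared differences across coordinates. The centroid and point values are made up.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

centroid = (1.0, 2.0)  # hypothetical cluster centroid
point = (4.0, 6.0)     # hypothetical data point

distance = euclidean(centroid, point)
print(distance)  # 5.0
```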

What is hierarchical clustering?

Determines clusters based on some arbitrary maximum distance that one cluster object can be from another cluster object

What is centroid-based clustering?

Each data point belongs to the cluster whose centroid is nearest to it
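
The sketch below shows only the assignment step, where each point joins its nearest centroid by Euclidean distance. The centroid names and coordinates are hypothetical, and a full algorithm such as k-means would also recompute the centroids and repeat.

```python
import math

# Hypothetical centroids and data points.
centroids = {"c1": (0.0, 0.0), "c2": (10.0, 10.0)}
points = [(1.0, 2.0), (9.0, 8.0), (4.0, 4.0)]

# Assign each point to the centroid with the smallest distance to it.
assignments = {
    p: min(centroids, key=lambda name: math.dist(p, centroids[name]))
    for p in points
}

for p, name in assignments.items():
    print(p, "->", name)
```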

What is confidence?

how certain you are that your results are accurate

What is lift?

How well the model is performing compared with a baseline, such as selecting at random

What is inference vs prediction?

- Inference is used when we want to understand the relationships between variables
- Prediction is used when we want to estimate the output for new observations

CRISP-DM cycle

- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment

Which of the following metrics measures a model's ability to correctly identify positive values? (Select all that apply.)

- Sensitivity
- Recall
- True positive rate

What is a rule about association rules?

D. A large confidence in an association rule will typically result in a higher lift when the support of the consequent is low

Which of the following are true of Parametric Models? Select all that apply.

A. Inferences can usually be made from a smaller number of predictors than with non-parametric models
B. They are often simpler than non-parametric models
D. They are usually less prone to overfitting than non-parametric models

Describe the Hold-Out approach to Cross Validation. Why is it performed, and why is it necessary?

You randomly select part of the data to hold out as a test set and keep the rest as a training set. Once you train the model, you validate it against the test set. You cross validate by repeatedly taking subsets to become training sets and test sets. It is performed to estimate the model's accuracy, and it tells you how well the model will generalize to future observations.
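
A minimal sketch of a single hold-out split using only the standard library. The function name `hold_out_split`, the 25% test fraction, and the fixed seed are assumptions for illustration; the rows stand in for real records.

```python
import random

def hold_out_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction as the test set."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

rows = list(range(20))  # hypothetical records
train, test = hold_out_split(rows)

# The model would be fitted on `train` only and then scored on `test`.
print(len(train), len(test))  # 15 5
```

Repeating the split with different seeds, and averaging the test scores, is one way to carry out the repeated subsetting described above.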

(46 cards)