final exam Flashcards
The classification trees classification algorithm:
- estimates how likely a data point is to be a member of one group or another depending on what group the data points nearest to it are in
- uses a tree-like structure to illustrate the choices available for each possible decision and its estimated outcome, showing them as separate branches of the tree
- predicts the probability that an instance is a member of a certain class, basing the technique on Bayes' theorem
- utilizes an equation based on ordinary least squares regression that can predict the probability of the possible categorical outcomes
2
The naive Bayes classification algorithm:
- estimates how likely a data point is to be a member of one group or another depending on what group the data points nearest to it are in
- uses a tree-like structure to illustrate the choices available for each possible decision and its estimated outcome, showing them as separate branches of the tree
- predicts the probability that an instance is a member of a certain class, basing the technique on Bayes' theorem
- utilizes an equation based on ordinary least squares regression that can predict the probability of the possible categorical outcomes
3
The kNN classification algorithm:
- estimates how likely a data point is to be a member of one group or another depending on what group the data points nearest to it are in
- uses a tree-like structure to illustrate the choices available for each possible decision and its estimated outcome, showing them as separate branches of the tree
- predicts the probability that an instance is a member of a certain class, basing the technique on Bayes' theorem
- utilizes an equation based on ordinary least squares regression that can predict the probability of the possible categorical outcomes
1
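A minimal sketch of the kNN idea in the first option: classify a point by majority vote among its k nearest training points. The data and distance choice (Euclidean) here are illustrative, not from any particular tool.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two small clusters
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B")]
print(knn_predict(train, (0.15, 0.15), k=3))  # → A
```

Note that no assumption is made about the structure of the data, which is why kNN counts as a data-driven algorithm.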
Classification algorithms that do not use assumptions about the structure of the data are ___ algorithms
data-driven
A good use of a classification algorithm would be:
- estimating the net profit for dishwashers for a major manufacturer
- identifying the seasonal sales for wood stoves over the last 3 years
- forecasting sales for a new product
- upselling or cross-selling to customers through an online store when a customer makes a purchase
4
In a CART model, classification rules are extracted from
the decision tree
The kNN technique is what type of technique?
a classification technique
In setting up the kNN model:
- the user allows XLMiner to select the optimal value of k
- the optimal k is set by the user at 10
- the data is normalized in order to take into account the categorical variables
- it is necessary to set an optimal value for k
1
Below are the 8 actual values of the target variable in the training partition:
(0,0,0,1,1,1,1,1)
What is the entropy of the target variable?
-5/8 log2(5/8)-3/8 log2(3/8)
5/8 log2(5/8)-3/8 log2(3/8)
-3/8 log2(3/8)+5/8 log2(3/8)
-5/8 log2(3/8)+log2(5/8)
1
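The first option can be checked numerically. A minimal sketch of the Shannon entropy computation (toy helper, not tied to any specific tool):

```python
import math

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

# Five 1s and three 0s: -5/8*log2(5/8) - 3/8*log2(3/8)
print(round(entropy([0, 0, 0, 1, 1, 1, 1, 1]), 4))  # → 0.9544
```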
Classification problems are distinguished from estimation problems in that
- classification problems require the output attribute to be numerical
- classification problems require the output attribute to be categorical
- classification problems do not allow an output attribute
- classification problems are designed to predict future outcomes
2
Which statement is true about the decision tree attribute selection process?
- a categorical attribute may appear in a tree node several times but a numeric attribute may appear at most once
- a numeric attribute may appear in several tree nodes but a categorical attribute may appear at most once
- both numeric and categorical attributes may appear in several tree nodes
- numeric and categorical attributes may appear in at most one tree node
2
What is the ensemble enhancement that is a method of creating pseudo-data from the data in an original data set?
- partitioning
- overfitting
- sampling
- bagging
bagging
What is the ensemble enhancement that is an iterative technique that adjusts the weight of any record based upon the last classification?
- bootstrapping
- boosting
- sampling
- bagging
boosting
What is the most often used ensemble enhancement?
bagging
What are the 3 most popular methods for creating ensembles?
- sampling, summarizing, random forest
- bagging, boosting, random forest
- bagging, boosting, clustering
- overfitting, clustering, sampling
2
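The bagging idea from the cards above can be sketched in a few lines: train a base model on each bootstrap (with-replacement) sample, then take a majority vote. The base learner here is a made-up 1-D "stump" (threshold at the midpoint of the class means), chosen only to keep the sketch self-contained.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a toy 1-D classifier: threshold at the midpoint of the class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    t = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 0 if x < t else 1

def bagged_predict(data, x, n_models=25, seed=0):
    """Bagging: majority vote over stumps trained on bootstrap samples."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]   # sample WITH replacement
        if len({y for _, y in sample}) < 2:         # skip one-class resamples
            continue
        votes.append(train_stump(sample)(x))
    return Counter(votes).most_common(1)[0][0]

data = [(0.1, 0), (0.3, 0), (0.2, 0), (0.9, 1), (1.1, 1), (0.8, 1)]
print(bagged_predict(data, 0.25))  # → 0
```

Averaging many such weakly correlated base models is what reduces the error of the ensemble.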
What is one benefit of using an ensemble model?
- it better establishes the relationship between one dependent variable and multiple independent variables
- it strengthens the relationship between the multiple independent variables
- it reduces the number of errors that result
- it is more efficient at adding and removing predictors
3
What is the most common use of clustering algorithms?
- to minimize variance and bias error
- to segment customers
- to determine how effectively the model can reorder the data set
- to validate the data set
2
In a logit model, p/(1-p) represents:
the odds of success
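A quick worked example of the odds, with an illustrative probability chosen here (p = 0.75):

```python
import math

p = 0.75              # probability of success (illustrative value)
odds = p / (1 - p)    # odds of success: 3-to-1 in favour
logit = math.log(odds)  # the log-odds, which logistic regression models linearly
print(odds)  # → 3.0
```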
In a naive Bayes model it is necessary that:
- all attributes are categorical
- the data is partitioned into 3 parts (training, validation, scoring)
- cutoff values are set to less than .75
- the target variable is continuous
1 (e.g. gender, blood type); can never have continuous variables
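A minimal sketch of naive Bayes on categorical attributes, as the card requires: pick the class maximizing P(class) times the product of P(attribute | class). The weather-style toy data is invented for illustration, and no smoothing is applied.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Fit naive Bayes on categorical attributes: count priors and
    per-attribute conditional frequencies."""
    prior = Counter(labels)
    cond = defaultdict(Counter)          # keyed by (attribute index, class)
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
    return prior, cond

def predict_nb(model, row):
    """Return the class maximizing P(class) * prod_i P(attr_i | class)."""
    prior, cond = model
    n = sum(prior.values())
    best, best_p = None, -1.0
    for y, count in prior.items():
        p = count / n
        for i, v in enumerate(row):
            p *= cond[(i, y)][v] / count
        if p > best_p:
            best, best_p = y, p
    return best

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rain", "mild")))  # → yes
```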
Generally, an ensemble method works better if the individual base models have _____
Assume each individual base model has accuracy greater than 50%.
- less correlation among predictors
- high correlation among predictors
- correlation does not have any impact on ensemble output
- none of the above
1
A dendrogram is used with which analytics algorithm?
- text mining
- clustering
- ensemble models
- all of the above
clustering
What is a bootstrap?
- a procedure that allows data scientists to reduce the dimensions of the training data set
- one of many classification-type algorithms
- a procedure for aggregating many attributes into a few attributes
- a procedure based on repeatedly and systematically sampling with replacement from the data
4
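The bootstrap in option 4 is just a few lines of stdlib Python; the data here is a toy list, and the fixed seed is only for reproducibility:

```python
import random

def bootstrap_sample(data, rng):
    """Draw a pseudo-data set: sample len(data) items WITH replacement."""
    return [rng.choice(data) for _ in data]

rng = random.Random(42)
data = [1, 2, 3, 4, 5]
sample = bootstrap_sample(data, rng)
print(len(sample) == len(data))   # → True: same size as the original
print(set(sample) <= set(data))   # → True: only original values appear
```

Because sampling is with replacement, some values typically repeat while others are left out, which is what makes each sample a distinct pseudo-data set.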
What is clustering?
- an ensemble algorithm for improving the accuracy of classification models
- a set of nested algorithms whose purpose is to choose weak learners
- the process of grouping the data into classes or clusters so that objects within a cluster have high similarity in comparison to one another
- none of the above
3
Which of the following are not types of clustering?
- k means
- hierarchical
- agglomerative
- splitting
4
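A minimal sketch of k-means (Lloyd's algorithm), one of the clustering types named above, on invented 1-D data: repeatedly assign each point to its nearest center, then move each center to its cluster's mean.

```python
def kmeans_1d(points, centers, iters=10):
    """Lloyd's algorithm on 1-D data: assign points to the nearest center,
    then move each center to the mean of its cluster."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
final = sorted(kmeans_1d(points, centers=[0.0, 10.0]))
print([round(c, 6) for c in final])  # → [1.0, 9.0]
```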
A major part of text mining is to
- reduce the dimensions of the data
- generalize the use of modifiers
- screen the articles from the data set
- reduce the word count of the text actually used
1
Semantic processing seeks to
- extract meaning
- group individual terms into bins
- eliminate "extra" or unnecessary terms from an analysis
- uncover undefined words or terms in a set of textual data
1
What is the process of extracting token words from a block of text after performing cleanup procedures?
tokenization
What would normalized text look like?
- all duplicate words are removed
- all stop words removed
- all spelling errors corrected
- all text is converted to lower case
4
What would the result be if you were asked to apply stemming to these terms: agreed, agrees, agreeable, agreeing?
all terms would change to agree
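The tokenize → normalize → stem pipeline from the last few cards can be sketched with a deliberately tiny rule set. The suffix rules below are invented just to reproduce the "agree" example; a real stemmer (e.g. Porter's) has far more rules and conditions.

```python
import re

# Tiny illustrative rewrite rules, applied first-match-wins
RULES = [("eed", "ee"), ("ing", ""), ("able", ""), ("s", "")]

def stem(word):
    """Apply the first matching suffix rule; otherwise leave the word alone."""
    for suffix, repl in RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + repl
    return word

def tokenize(text):
    """Normalize to lower case, split on non-letters, stem each token."""
    return [stem(t) for t in re.findall(r"[a-z]+", text.lower())]

print(tokenize("Agreed, agrees, agreeable, agreeing"))
# → ['agree', 'agree', 'agree', 'agree']
```

Collapsing all four variants onto one token is exactly the dimension reduction text mining needs: four columns in a term matrix become one.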
What type of standard diagnostic is used for text mining algorithms?
lift charts and confusion matrices
A model that goes beyond a bag-of-words analysis and assigns and defines consumer sentiment for words would be a ___ model
NLP
Which of the following are other procedures that could be used to reduce the text dimensions to prepare for analysis?
- numbers and items that appear to be monetary values are removed
- words of more than 20 letters in length are removed
- headers and page numbers are removed
- duplicates of all words are removed
1,2,3
What is entity extraction?
identifying a group of words as a single item
The words extracted from a block of text after the cleanup procedures have been performed are
tokens
Latent semantic indexing:
- uses SVD to identify patterns in the relationships between terms and concepts
- reduces the dimensions of the text by treating all versions of the same (or a very similar) concept identically
- collates the most common words and phrases and identifies them as keywords
- identifies a group of words as a single item
1,3
What is a method for clearing away clutter in raw text documents and extracting useful characteristics to serve as attributes?
dimension reduction
What algorithm takes a large number of words and compresses them into a much smaller number of linear combinations?
SVD
Which of the following best describe target leakage?
- it is difficult to detect and harder to eliminate
- it is the difference between the expected prediction of a model and the correct value that is targeted
- it allows algorithms to make predictions that are too good to be true
- it is the introduction of information about the text mining target that should not legitimately be available to the algorithm
1,3,4
The process of collecting data from websites is
web scraping
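A minimal, offline sketch of the parsing half of web scraping using only the stdlib. In a real scraper the HTML would come from an HTTP request (e.g. via `urllib.request`); a literal string keeps this runnable without a network.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag encountered in the HTML stream."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

html = '<p>See <a href="/a.html">one</a> and <a href="/b.html">two</a>.</p>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # → ['/a.html', '/b.html']
```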