Final Exam Flashcards
The classification tree algorithm:
- estimates how likely a data point is to be a member of one group or another depending on which group the data points nearest to it belong to
- uses a tree-like structure to illustrate the choices available for each possible decision and its estimated outcome, showing them as separate branches of the tree
- predicts the probability that an instance is a member of a certain class, basing the technique on Bayes' theorem
- utilizes an equation based on ordinary least squares regression that can predict the probabilities of the possible categorical outcomes
2
The naive Bayes classification algorithm:
- estimates how likely a data point is to be a member of one group or another depending on which group the data points nearest to it belong to
- uses a tree-like structure to illustrate the choices available for each possible decision and its estimated outcome, showing them as separate branches of the tree
- predicts the probability that an instance is a member of a certain class, basing the technique on Bayes' theorem
- utilizes an equation based on ordinary least squares regression that can predict the probabilities of the possible categorical outcomes
3
The KNN classification algorithm:
- estimates how likely a data point is to be a member of one group or another depending on which group the data points nearest to it belong to
- uses a tree-like structure to illustrate the choices available for each possible decision and its estimated outcome, showing them as separate branches of the tree
- predicts the probability that an instance is a member of a certain class, basing the technique on Bayes' theorem
- utilizes an equation based on ordinary least squares regression that can predict the probabilities of the possible categorical outcomes
1
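A minimal sketch of the KNN idea, assuming scikit-learn and toy data invented for illustration:

from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two numeric attributes, two classes (0 and 1).
X_train = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [4.8, 5.1]]
y_train = [0, 0, 1, 1]

# k = 3: class membership is decided by the 3 nearest training points.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print(knn.predict([[1.1, 1.0]]))        # nearest neighbors are mostly class 0
print(knn.predict_proba([[1.1, 1.0]]))  # estimated membership probabilities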
Classification algorithms that do not use assumptions about the structure of the data are ___ algorithms.
data-driven
A good use of a classification algorithm would be:
- estimating the net profit for dishwashers for a major manufacturer
- identifying the seasonal sales for wood stoves over the last 3 years
- forecasting sales for a new product
- upselling or cross-selling to customers through an online store when a customer makes a purchase
4
In a CART model, classification rules are extracted from:
the decision tree
The KNN technique is what type of technique?
a classification technique
In setting up the KNN model:
- the user allows XLMiner to select the optimal value of k
- the optimal k is set by the user at 10
- the data is normalized in order to take into account the categorical variables
- it is necessary to set an optimal value for k
1
Below are the 8 actual values of the target variable in the training partition:
(0, 0, 0, 1, 1, 1, 1, 1)
What is the entropy of the target variable?
- -(5/8) log2(5/8) - (3/8) log2(3/8)
- (5/8) log2(5/8) - (3/8) log2(3/8)
- -(3/8) log2(3/8) + (5/8) log2(3/8)
- -(5/8) log2(3/8) + log2(5/8)
1
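A quick check of the correct option (option 1), sketched in Python using the card's own target vector:

from math import log2

values = (0, 0, 0, 1, 1, 1, 1, 1)
p1 = sum(values) / len(values)           # 5/8
p0 = 1 - p1                              # 3/8
entropy = -p1 * log2(p1) - p0 * log2(p0)
print(round(entropy, 4))                 # ~0.9544 bits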
Classification problems are distinguished from estimation problems in that:
- classification problems require the output attribute to be numerical
- classification problems require the output attribute to be categorical
- classification problems do not allow an output attribute
- classification problems are designed to predict future outcomes
2
Which statement is true about the decision tree attribute selection process?
- a categorical attribute may appear in a tree node several times, but a numeric attribute may appear at most once
- a numeric attribute may appear in several tree nodes, but a categorical attribute may appear at most once
- both numeric and categorical attributes may appear in several tree nodes
- numeric and categorical attributes may appear in at most one tree node
2
What is the ensemble enhancement that is a method of creating pseudo-data from the data in an original data set? (partitioning, overfitting, sampling, bagging)
bagging
What is the ensemble enhancement that is an iterative technique that adjusts the weight of any record based upon the last classification? (bootstrapping, boosting, sampling, bagging)
boosting
What is the most often used ensemble enhancement?
bagging
What are the 3 most popular methods for creating ensembles?
- sampling, summarizing, random forest
- bagging, boosting, random forest
- bagging, boosting, clustering
- overfitting, clustering, sampling
2
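A minimal sketch of the three popular ensemble methods, assuming scikit-learn and a synthetic dataset invented for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)

X, y = make_classification(n_samples=200, random_state=0)

bagging = BaggingClassifier(n_estimators=50, random_state=0)      # resampled records
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)    # reweighted records
forest = RandomForestClassifier(n_estimators=50, random_state=0)  # bagging plus sampled attributes

for model in (bagging, boosting, forest):
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))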
What is one benefit of using an ensemble model?
- it better establishes the relationship between one dependent variable and multiple independent variables
- it strengthens the relationship between the multiple independent variables
- it reduces the number of errors that result
- it is more efficient at adding and removing predictors
3
What is the most common use of clustering algorithms?
- to minimize variance and bias error
- to segment customers
- to determine how effectively the model can reorder the data set
- to validate the data set
2
In a logit model, p/(1 - p) represents:
the odds of success
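A one-line illustration of the odds, with an assumed probability of .75:

p = 0.75
odds = p / (1 - p)
print(odds)  # 3.0 -> success is three times as likely as failure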
In a naive Bayes model it is necessary:
- that all attributes are categorical
- to partition the data into 3 parts (training, validation, scoring)
- to set cutoff values to less than .75
- to have a continuous target variable
1 (e.g., gender, blood type); the attributes can never be continuous variables
Generally, an ensemble method works better if the individual base models have _____.
(Assume each individual base model has accuracy greater than 50%.)
- less correlation among predictors
- high correlation among predictors
- correlation does not have any impact on ensemble output
- none of the above
1
A dendrogram is used with which analytics algorithm? (text mining, clustering, ensemble models, all of the above)
clustering
What is a bootstrap?
- a procedure that allows data scientists to reduce the dimensions of the training data set
- one of many classification-type algorithms
- a procedure for aggregating many attributes into a few attributes
- a procedure based on repeatedly and systematically sampling with replacement from the data
4
What is clustering?
- an ensemble algorithm for improving the accuracy of classification models
- could be thought of as a set of nested algorithms whose purpose is to choose weak learners
- the process of grouping the data into classes or clusters so that objects within a cluster have high similarity in comparison to one another
- none of the above
3
Which of the following is not a type of clustering?
- k-means
- hierarchical
- agglomerative
- splitting
4
A major part of text mining is to:
- reduce the dimensions of the data
- generalize the use of modifiers
- screen the articles from the data set
- reduce the word count of the text actually used
1
Semantic processing seeks to:
- extract meaning
- group individual terms into bins
- eliminate "extra" or unnecessary terms from an analysis
- uncover undefined words or terms in a set of textual data
1
What is the process of extracting token words from a block of text after performing cleanup procedures?
tokenization
What would normalized text look like?
- all duplicate words are removed
- all stop words are removed
- all spelling errors are corrected
- all text is converted to lower case
4
What would the result be if you were asked to apply stemming to these terms: agreed, agrees, agreeable, agreeing?
all terms would change to agree
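A toy suffix-stripping stemmer for these four terms (illustrative only; real systems use algorithms such as the Porter stemmer, and the suffix list here is an assumption chosen for this example):

def stem(word):
    # Strip the longest matching suffix; a real stemmer has many more rules.
    for suffix in ("able", "ing", "s", "d"):
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word

for term in ("agreed", "agrees", "agreeable", "agreeing"):
    print(term, "->", stem(term))  # every term maps to the stem "agree"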
What type of standard diagnostic is used for text mining algorithms?
lift charts and confusion matrices
A model that goes beyond a bag of words analysis and assigns and defines consumer sentiment to words would be a ___ model
NLP
Which of the following are other procedures that could be used to reduce the text dimensions to prepare for analysis?
- numbers and items that appear to be monetary values are removed
- words of more than 20 letters in length are removed
- headers and page numbers are removed
- duplicates of all words are removed
1,2,3
What is entity extraction?
identifying a group of words as a single item
The words extracted from a block of text after the cleanup procedures have been performed are:
tokens
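A minimal sketch of cleanup plus tokenization in Python (the stop-word list is a small assumed sample):

import re

STOP_WORDS = {"the", "is", "a", "and", "for", "with", "at"}

def tokenize(text):
    text = text.lower()                  # normalize case
    words = re.findall(r"[a-z]+", text)  # strip punctuation and numbers
    return [w for w in words if w not in STOP_WORDS]

print(tokenize("The erupting geyser is a fun experiment for kids!"))
# ['erupting', 'geyser', 'fun', 'experiment', 'kids']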
Latent semantic indexing:
- uses SVD to identify patterns in the relationships between terms and concepts
- reduces the dimensions of the text by treating all versions of the same (or a very similar) concept identically
- collates the most common words and phrases and identifies them as keywords
- identifies a group of words as a single item
1,3
What is a method for clearing away clutter in raw text documents and extracting useful characteristics to serve as attributes?
dimension reduction
What algorithm takes a large number of words and compresses them into a much smaller number of linear combinations?
SVD
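A minimal sketch of SVD compressing a term-document matrix, assuming NumPy and toy counts invented for illustration:

import numpy as np

# Rows = terms, columns = documents (raw term counts).
A = np.array([[2, 0, 1],
              [1, 1, 0],
              [0, 3, 1],
              [0, 1, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                         # keep only the 2 strongest linear combinations
A_reduced = U[:, :k] * s[:k]  # each term as a point in a 2-D "concept" space
print(A_reduced.round(2))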
Which of the following best describe target leakage?
- it is difficult to detect and harder to eliminate
- it is the difference between the expected prediction of a model and the correct value that is targeted
- it allows algorithms to make predictions that are too good to be true
- it is the introduction of information about the text mining target that should not legitimately be available to the algorithm
1,3,4
the process of collecting data from websites is
web scraping
What is the goal of text analytics?
to reduce the dimensions of the ___ text to manageable attributes that can be used in data mining algorithms
unstructured
What is the most difficult decision to make when considering the use of text as data?
- to know the attributes of the data
- the format of the data
- to define the problem you are trying to solve
- the data mining algorithm that will be used
3
What looks at unprocessed text as a collection of words without regard to grammar?
bag of words
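A minimal bag-of-words sketch: word counts only, with grammar and word order ignored (the sentence is an assumed example):

from collections import Counter

text = "the cat sat on the mat and the cat slept"
bag = Counter(text.split())
print(bag.most_common(3))  # [('the', 3), ('cat', 2), ...]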
Which of the following describe stemming?
- the stem would be the token remembered and used in place of all other forms of the word
- stemming reduces words to their stem
- a stem word needs to be a real word, not a made-up word
- it reduces the dimensions of the text by treating all versions of the same concept identically
1,2,4
What uses SVD to identify patterns in the relationships between terms and concepts?
latent semantic indexing
“the erupting guyser with diet coke and memtos is a fun experiment for kids working at home”
What is the final step in dimension reduction?
- correct the spelling
- eliminate the stop words
- identify associations
- perform stemming
1
“the erupting guyser with diet coke and memtos is a fun experiment for kids working at home”
What is the 2nd step in dimension reduction?
- correct the spelling
- eliminate the stop words
- identify associations
- perform stemming
4
What two main problems is the random forest methodology used to address?
- settings where the number of attributes is much smaller than the number of records
- assessing and ranking attributes with respect to their ability to predict the classification
- constructing classification rules for a learning problem
- yielding a classification given the attributes for current observations
2,3
What clustering algorithm is an iterative process using some best criterion in multiple passes to see if it can improve the clusters? (k-means, hierarchical method)
k-means
What are the two basic types of clustering algorithms?
k-means and hierarchical
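A minimal sketch of both types, assuming scikit-learn and toy points invented for illustration:

from sklearn.cluster import AgglomerativeClustering, KMeans

X = [[1, 1], [1.2, 0.8], [5, 5], [5.1, 4.9], [9, 1], [8.8, 1.2]]

# k-means: iterative reassignment of points around k centroids.
print(KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X))
# Hierarchical (agglomerative): repeatedly merge the closest clusters.
print(AgglomerativeClustering(n_clusters=3).fit_predict(X))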
Which of the following would be correct about the random forest algorithm?
- it is limited to a random subset of attributes at each stage of the algorithm
- it is based on applying bagging to a decision tree algorithm that also samples attributes in addition to records
- it produces more accurate predictions than a simple CART
- it is a collection of one CART tree that is independent when constructed
1,2,3
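A minimal random forest sketch, assuming scikit-learn and synthetic data invented for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

forest = RandomForestClassifier(
    n_estimators=100,     # many independently grown CART-style trees
    max_features="sqrt",  # random subset of attributes at each split
    random_state=1,
)
forest.fit(X, y)
print(forest.feature_importances_.round(2))  # rank attributes by predictive value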
What is the most common form of unsupervised learning?
clustering
What is a decision stump? It is a decision tree with _____ root nodes
1
Which of the following correctly compare the relationship between a traditional CART model and a random forest?
- overfitting can be a problem with both CART and random forest
- random forest produces more accurate predictions than a simple CART
- random forest samples records like CART, but it also samples the attributes
- CART and random forest have the same number of steps
1,2,3
What is the k-means clustering algorithm?
an iterative process using some best criterion in multiple passes to see if it can improve the clusters
If the correlation coefficient is .983, it indicates that:
there appears to be a strong positive linear association between x and y
What collates common words and identifies them as keywords?
latent semantic indexing
What are the 3 steps in dimension reduction?
- eliminate stop words
- perform stemming
- correct spelling errors
What reduces a phrase to its basic identity?
phrase reduction
what is the process of extracting “token” words from a block of text after performing cleanup?
tokenization
What takes a large number of words and compresses them into a small number of linear combinations?
singular value decomposition (SVD)
error due to the difference between the expected prediction of our model and the correct value we're trying to predict
bias
error due to the variability of a model's prediction for a given point
variance
What ensemble technique adjusts the weight of any given record based upon the last classification?
boosting
What is a method of creating pseudo-data from data in an original set?
bagging; bootstrap
What are the 4 types of classification models?
KNN, classification trees (CART, decision tree, regression tree), naive Bayes, logit
The naive Bayes classification technique could best be described by:
- it is comparable in performance to decision trees
- it is fast and accurate
- it uses statistical classifiers
- it uses the same algorithm as the regression decision tree
1,2,3
How do data mining trees look?
like an upside-down tree, with leaves at the bottom and the root on top
A good classification tree will make the best split first, followed by decision rules that are made up with:
- successively larger and larger numbers of training records
- successively smaller and smaller numbers of training records
2
The most important distinction between a logit and ordinary regression is that the dependent variable is:
categorical, not continuous
classification is ___ learning
supervised
What is a mathematical concept that measures the uncertainty associated with random variables?
information entropy
What does the k in KNN refer to?
- the number of nearest neighbors used in determining a category correctly
- the number of characteristics of the unknown used in determining a category correctly
- the number of unknowns in a category
- the number of attributes nearest the unknown
1
What classification technique predicts numeric quantities?
regression tree (the CART variant for numeric targets)
What are the steps in the data mining process?
Sample, Explore, Modify, Model, Assess
What are the 4 characteristics of data mining?
- volume: large sizes of data, because with analytics we can use unstructured data; the size of the data set
- velocity: the rate at which we expect the data to arrive has to be fast, and algorithms must be fast in order to be useful
- variety: unstructured and structured data, not just data from Excel sheets; the types of data available
- value: a lot of the data in the past was not useful; it is the job of the data scientist to determine what is useful data and what is not; the valuation of data
What is the common form of prediction in data mining?
a classification tool
What is the primary goal of data mining?
- to model the noise in the data
- to overfit the model so there is a low misclassification rate
- to have accuracy and fit as characteristics of the model
- to do a good job of representing our known data set
3,4
Which of the following correctly define a data warehouse?
- a firm's central repository of integrated historical data
- a location where data is stored
- the memory of the firm
- collective information on every aspect of what has happened in the past
1,3,4
Which of the following describe a data mart?
- collective information on every aspect of what has happened in the past for a company
- holds information that is specialized and has been grouped or chosen specifically
- a firm's central repository of integrated historical data
- a subset of a data warehouse
2,4
What does data mining refer to?
- the physical tools used to access data and make predictions
- knowledge gained from mass data
- patterns in a mass of data
- tools that are used in the large-scale or big data arena
4
Which of the following are true regarding data mining and business forecasting?
- in data mining you simultaneously search for different patterns in parallel, but in business forecasting you search for set patterns
- for business forecasting, the expectation is that the data will contain some level of variation, whereas in data mining patterns are not pre-specified
- in data mining you are searching for seasonal variability, but in business forecasting you are searching for trend patterns only
1,2,3
Which of the following correctly contrasts data mining with database management?
- queries are well defined in database management but less structured in data mining
- data mining is more forward looking, whereas database management is more past focused
- a query in database management would be "find all customers in Atlanta"; in data mining it would be "group all customers with similar buying habits"
- database management is extracting useful information from large, unstructured databases, whereas data mining is extracting specialized or grouped data
1,2,3
Classification tools distinguish between:
- data concepts and objects
- data objects and classes
- data classes or fields
- data classes or concepts
4
Data mining terms and their forecasting equivalents:
- target = dependent variable
- algorithm = forecasting model
- feature/attribute = explanatory variable
- record = observation
- score = forecast
What are the reasons for sampling or partitioning in data mining?
- it is common practice in database management to set aside a portion of the data
- partitioning and testing for accuracy are standard practice in analytics
- it has its roots in the "holdout" and "holdback" samples used for standard forecasting models
- in most cases, the entire data set is not needed to build a model
2,3,4
What is the process called of transforming text into numbers?
datafication
A bag-of-words analysis looks at what?
unprocessed text as a collection of words without regard to grammar
The frequency of any single word is inversely proportional to its rank in the frequency table. Which law is this?
Zipf's law
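A quick numeric illustration with hypothetical word counts: if frequency is inversely proportional to rank, then rank times frequency should stay roughly constant:

counts = [120, 62, 39, 31, 24]  # assumed top-5 word frequencies
for rank, freq in enumerate(counts, start=1):
    print(rank, freq, rank * freq)  # the products stay in a similar range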
What is the most common mistake in text mining?
target leakage: the introduction of information that should not be available to the algorithm
In the Universal Bank data in this chapter, only 10% of the records represented customers who had taken out a personal loan (the target variable). If we were to score a new customer based upon the attributes we used in the algorithm, we would be accurate in the prediction about 90% of the time if we always scored the individual as "not accepting a loan," because that is indeed what most customers have done in the past. Why not accept being right 90% of the time with this very simple decision rule?
Because with data mining we have access to the information and tools that can help us do better than predicting correctly 90% of the time. In this scenario, we could look at our lift chart, find the customers with the highest probability of accepting a personal loan, and market to them, in order to have a better chance of finding people who will accept the loan.
Data has the characteristic of being nonrivalrous. Explain its importance.
Nonrivalry is the characteristic that one person's use of the good to create value does not diminish the value another can extract from the data.
This characteristic matters because many researchers and data scientists can use the same data: every time the data set is used, it can be used to obtain different results. Every researcher can use a data set for a different purpose and reach different conclusions.
The lift chart and the confusion matrix are both standard diagnostic tools used to evaluate a data mining algorithm. Don't the two measures display the same information? Explain the difference between the two measures.
Both the confusion matrix and the lift chart provide information about model performance, but they display that information in different ways.
Confusion matrix: shows classification performance. There is a confusion matrix for both the validation data and the training data. Most often, the results from the validation data are most relevant, since they show how the model performed on unseen data: the validation confusion matrix shows classification performance on data that was not used to build the model. It gives counts of the correct classifications and the misclassifications.
Lift chart: the standard for accuracy in data mining. These charts help determine how effectively the model can reorder the data set, placing the individuals with the highest probability of success on top and those with the lowest probability of success on the bottom. By looking at the chart, you can determine how well your model is doing compared to a naive model.
In short: the confusion matrix shows what you got right and wrong; the lift chart shows, for each percentage of the data, how much you got right.
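A minimal confusion matrix sketch, assuming scikit-learn and toy labels invented for illustration:

from sklearn.metrics import confusion_matrix

actual    = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
predicted = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]

# Rows = actual class, columns = predicted class.
print(confusion_matrix(actual, predicted))
# [[5 1]
#  [1 3]]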
Why do we need to sample the data?
Because we need to see how the model works on the current data set and how it will work in the real world.
The risk in ignoring this step is creating bias: if a data scientist uses the same data to both build and test the model, and that model is overfit, then most likely the results will also be overfit.
Structured vs. unstructured data
Structured data: data that has a predefined data model.
Unstructured data: data that does not have a predefined data model.
Unstructured data is the more prevalent form of data because it comes in many different forms to which we are exposed daily.
An Excel spreadsheet: structured
A thousand text files: unstructured
A thousand video images: unstructured
A thousand audio files: unstructured
Some data mining algorithms work so well that they have a tendency to overfit the data. What does this mean, and what difficulties does overlooking it cause for the data scientist?
Overfitting: when we put too many attributes (or try to account for too many patterns) in a model, including some unrelated to the target.
If data scientists overfit their data, they will incorrectly explain some variation in the data that is nothing more than chance variation. In other words, they will have mislabeled the noise in the data as part of the "true signal."
To find if the coefficients are statistically significant:
|t-stat| > 2 indicates statistical significance
What does R² say?
the model explains ___% of the variation in the data
How do you monitor autocorrelation?
if the Durbin-Watson statistic is between 1.5 and 2.5, you have no first-order serial correlation
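A minimal sketch of the check, assuming statsmodels and residuals invented for illustration:

import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
residuals = rng.normal(size=100)  # independent residuals for this example

dw = durbin_watson(residuals)
print(round(dw, 2))  # between 1.5 and 2.5 -> no first-order serial correlation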