Stampen Flashcards

1
Q

K-nearest neighbors (KNN) requires three things

A

Set of stored records
Distance metric to compute distance
Value of k, the number of nearest neighbors to use
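
The three ingredients above map directly onto a few lines of code. This is a minimal sketch, not a reference implementation: the function name `knn_predict`, the toy records, and the choice of Euclidean distance are all assumptions made for illustration.

```python
from collections import Counter
import math


def knn_predict(records, query, k):
    """Classify `query` by majority vote among its k nearest stored records."""
    # records: list of (feature_vector, label) pairs, i.e. the stored records
    # math.dist: Euclidean distance, an assumed choice of distance metric
    nearest = sorted(records, key=lambda r: math.dist(r[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups
records = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((5.0, 5.0), "b"),
    ((5.5, 4.5), "b"),
]
prediction = knn_predict(records, (1.1, 1.0), k=3)
```

Every prediction sorts all stored records by distance, which is why KNN gets slow on large datasets.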

2
Q

Advantages of KNN

A

Simple method
Lack of parametric assumptions
Good performance with large training sets

3
Q

Disadvantages of KNN

A

Can be time consuming (distances to all stored records must be computed)
Can prohibit real-time prediction on large datasets
Curse of dimensionality

4
Q

Advantages of naive Bayes

A

Computationally efficient

Good classification performance, even with many predictors

5
Q

Disadvantages of naive Bayes

A

Requires a large number of records to obtain good results

Independence assumption may not hold for some variables

Only handles categorical data
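
A small sketch can make these points concrete. The code below is an illustrative categorical naive Bayes, assuming hypothetical helper names (`train_nb`, `predict_nb`) and a made-up four-row dataset; it uses no smoothing, so a value never seen with a class drives that class's score to zero, which is one reason many records are needed.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count classes and per-class feature values from categorical rows."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, class)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def predict_nb(model, row):
    """Return the highest-scoring class and the unnormalized class scores."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # Independence assumption: multiply per-feature conditionals
            score *= value_counts[(i, label)][value] / n
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
label, scores = predict_nb(model, ("rainy", "mild"))
```

Here "rainy" never occurs with class "no", so that class scores exactly zero regardless of the other feature.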

6
Q

Snowflake characteristics

A

Less restricted
Ability to store aggregations
Smaller data volumes

However,

Not easily understood by end users
High number of joins
Not a predictable framework

7
Q

Dashboard conforms to three levels

A

Perception of the elements in the environment
Comprehension of the current situation
Projection of future status

8
Q

Pros of decision trees

A

Easily understood

Easy to generate rules

9
Q

Cons of decision trees

A

May suffer from overfitting
Does not handle correlated features well
Can become quite large (needs pruning)
Does not handle streaming data easily

10
Q

When to stop expanding decision tree nodes

A

Stop expanding when all records belong to the same class
Stop expanding when all the records have similar attribute values
Stop if expansion does not improve impurity measures
Watch out for overfitting; look for the knee point
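
The first and third stopping rules can be sketched in code. This is an assumption-laden illustration: it uses Gini impurity as the impurity measure, the names `gini`, `should_stop` and `min_gain` are hypothetical, and both child nodes are assumed to be non-empty. The "similar attribute values" rule is not covered here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def should_stop(parent, left, right, min_gain=0.0):
    """Decide whether splitting `parent` into `left` and `right` is pointless."""
    # Rule 1: all records already belong to the same class
    if len(set(parent)) == 1:
        return True
    # Rule 3: the split does not reduce the weighted impurity
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted <= min_gain


pure_node = should_stop(["a", "a"], ["a"], ["a"])
useful_split = should_stop(["a", "a", "b", "b"], ["a", "a"], ["b", "b"])
useless_split = should_stop(["a", "a", "b", "b"], ["a", "b"], ["a", "b"])
```

Raising `min_gain` above zero stops growth earlier, which is one simple guard against overfitting.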

11
Q

4 ways to validate clusters

A
Cluster interpretability
Cluster stability (doesn't change when rows altered)
Cluster separation (intra- and inter-cluster distances are well divided)
Number of clusters (useful number)
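
Cluster separation can be checked numerically. The sketch below compares the mean distance within clusters to the mean distance between clusters; the function name `separation_ratio`, the Euclidean metric, and the toy points are assumptions for illustration, and the code expects at least one within-cluster pair and one between-cluster pair.

```python
import math
from itertools import combinations


def separation_ratio(points, labels):
    """Mean intra-cluster distance divided by mean inter-cluster distance.

    Values well below 1 suggest compact, well-separated clusters.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        # Same label: within-cluster pair; different label: between-cluster pair
        (intra if lp == lq else inter).append(math.dist(p, q))
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Hypothetical toy data: two tight groups that are far apart
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = separation_ratio(points, [0, 0, 1, 1])  # sensible assignment
bad = separation_ratio(points, [0, 1, 0, 1])   # clusters mixed up
```

A sensible assignment gives a much lower ratio than a mixed-up one on the same points.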
12
Q

Advantages of hierarchical clustering

A

Does not require specifying the number of clusters
Purely data driven
Dendrograms are easy to understand and interpret

13
Q

Limitations of hierarchical clustering

A

Requires the computation and storage of an n x n distance matrix
Makes only one pass through the data (early merges are never revisited)
Tends to have low stability
Results depend on the chosen distance metric
Sensitive to outliers
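
Several of these limitations are visible in even a tiny implementation. The sketch below is a naive agglomerative clustering with single linkage and Euclidean distance (both assumed choices); the function name `single_linkage` and the toy points are hypothetical.

```python
import math


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    n = len(points)
    # Full n x n distance matrix: O(n^2) time and memory
    dist = [[math.dist(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(n)]  # start with one cluster per record
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # A merge is final: it is never undone later (one pass through the data)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy data: two small groups plus one distant outlier
points = [(0, 0), (0, 1), (5, 5), (5, 6), (50, 50)]
two = single_linkage(points, 2)
three = single_linkage(points, 3)
```

With two clusters requested, the outlier ends up alone and the two real groups are forced together, which illustrates the sensitivity to outliers.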

14
Q

Dimension hierarchy characteristics

A

Time related
Unbalanced
Multiple branches of different types
Conforming dimensions
