R Flashcards

1
Q

How do you start working with data?

A

Data Description, Data Segmentation, Data Correlation, Data Classification

2
Q

How do you summarise categorical variables?

A

By Counts or By Percentage
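
A minimal sketch of both summaries in base R. The `risk` vector is made-up illustration data, not from the original cards.

```r
# Hypothetical example data: a small ordered categorical variable.
risk <- factor(c("low", "high", "medium", "low", "low", "medium"),
               levels = c("low", "medium", "high"))

counts <- table(risk)                    # summary by counts
percentages <- 100 * prop.table(counts)  # summary by percentage

print(counts)
print(round(percentages, 1))
```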

3
Q

What two types of categorical variables exist? Give an example of each.

A

Ordered (low/medium/high)

Unordered (male/female)

4
Q

What is the motivation behind data description?

A

To find the shape of the data set and to summarise its columns and the relationships between them: what the column headers are, and what type of variable each column holds.
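
One possible first look at a data set in base R. The `customers` data frame and its columns are hypothetical stand-ins for whatever data you have loaded.

```r
# Hypothetical stand-in data frame; replace with your own data set.
customers <- data.frame(
  id     = 1:4,
  gender = factor(c("female", "male", "female", "male")),
  income = c(42000, 38500, 51000, 47250)
)

dim(customers)            # shape: number of rows and columns
names(customers)          # column headers
sapply(customers, class)  # variable type of each column
str(customers)            # compact overview of the structure
summary(customers)        # per-column summaries
```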

5
Q

Describe what data segmentation means.

A

Splitting the data into subgroups, because there are too many subjects in the data to look at all at once.

6
Q

What is the ABC method of analysis?

A

Items are ranked and grouped into three classes by their share of the total: A: 80%, B: 15%, C: 5%
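
A rough sketch of an ABC classification in base R. It assumes the classes are cut on cumulative share of value (A up to 80%, B up to 95%, C the rest); the `sales` figures are invented for illustration.

```r
# Hypothetical sales values per item (illustration only).
sales <- c(p1 = 450, p2 = 250, p3 = 120, p4 = 90, p5 = 50, p6 = 40)

# Rank items from largest to smallest and accumulate their share of the total.
sorted <- sort(sales, decreasing = TRUE)
cum_share <- cumsum(unname(sorted)) / sum(sorted)

# Assumed cut-offs: A up to 80% of cumulative value, B up to 95%, C the rest.
abc <- cut(cum_share, breaks = c(0, 0.80, 0.95, 1), labels = c("A", "B", "C"))

print(data.frame(item = names(sorted), value = unname(sorted),
                 cum_share = cum_share, class = abc))
```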

7
Q

How do you visualise data after you segment it?

A

Pareto Plot, Pie Chart, Bar Graph
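
As an illustration, a Pareto plot can be approximated in base R with sorted bars plus a cumulative-percentage line. The `sales` values are made up, and this is only one way to draw it.

```r
# Hypothetical segment totals (illustration only).
sales <- c(p1 = 450, p2 = 250, p3 = 120, p4 = 90, p5 = 50, p6 = 40)
sorted <- sort(sales, decreasing = TRUE)
cum_pct <- 100 * cumsum(sorted) / sum(sorted)

pdf(NULL)  # null graphics device so the sketch also runs non-interactively;
           # remove this line (and dev.off) to see the plot on screen
mids <- barplot(sorted, ylim = c(0, sum(sorted)), ylab = "Value",
                main = "Pareto plot (sketch)")
# Overlay the cumulative percentage, rescaled to the value axis.
lines(mids, cum_pct / 100 * sum(sorted), type = "b")
axis(4, at = sum(sorted) * seq(0, 1, 0.25),
     labels = paste0(seq(0, 100, 25), "%"))
invisible(dev.off())
```

`pie(sorted)` and `barplot(sorted)` on the same vector give the other two chart types.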

8
Q

What is data correlation?

A

Finding out the relationships between the variables

9
Q

Is more “scatter” around the line indicative of higher or lower correlation?

A

Lower. A perfect positive correlation would put every point exactly on the line.
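
A quick simulated check of this, using invented data: the same underlying line with a little and a lot of random scatter.

```r
set.seed(1)  # fixed seed so the sketch is reproducible
x <- seq(1, 100)
tight <- 2 * x + rnorm(100, sd = 5)    # little scatter around the line
noisy <- 2 * x + rnorm(100, sd = 100)  # much more scatter around the line

cor(x, 2 * x)  # no scatter at all: perfect positive correlation
cor(x, tight)  # close to 1
cor(x, noisy)  # noticeably lower
```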

10
Q

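Explain this line of code: plot(creditdata[, c(5, 9:14)])

A

It plots columns 5 and 9 to 14 of creditdata, using all rows (the row index before the comma is left empty). If creditdata is a data frame, plot() on several numeric columns draws a scatterplot matrix of every pair of columns.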
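
The same indexing can be tried on a built-in data set. Here `mtcars` is only a stand-in, since creditdata itself is not available, and the column numbers are chosen arbitrarily.

```r
# mtcars is a built-in data set used here as a stand-in for creditdata.
subset_cols <- mtcars[, c(1, 3:6)]  # all rows; columns 1 and 3 to 6
names(subset_cols)
dim(subset_cols)

pdf(NULL)  # null graphics device so the sketch also runs non-interactively;
           # remove this line (and dev.off) to see the plot on screen
plot(subset_cols)  # scatterplot matrix of the selected columns
invisible(dev.off())
```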

11
Q

What are decision trees?

A

Tree-structured models that help you make decisions by predicting a response, mainly a categorical one.

12
Q

What is the purpose of decision trees?

A

They classify records by assigning each one to its most likely class or category. They can also estimate the probability of an event.

13
Q

Are decision trees linear or non-linear?

A

Non-linear

14
Q

What algorithm do decision trees use?

A

A recursive greedy algorithm. Start with an empty tree and decide which feature to split on. For each branch, if there is nothing more to split, make a prediction; otherwise check whether it is time to stop, and if not, split again.
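
A toy version of this recursion, written from scratch for categorical features only. It is a sketch of the idea, not how any particular R package builds trees; the function names and the `toy` data are hypothetical.

```r
# Fraction of mistakes if every branch of `feature` predicts its majority class.
split_error <- function(feature, label) {
  mistakes <- sapply(split(label, feature),
                     function(y) length(y) - max(table(y)))
  sum(mistakes) / length(label)
}

# Most common class in a set of labels.
majority <- function(label) names(which.max(table(label)))

# Recursive greedy builder; `target` names the label column in `data`.
build_tree <- function(data, target, features) {
  label <- data[[target]]
  # Stop: all records agree on one class, or no features are left.
  if (length(unique(label)) == 1 || length(features) == 0) {
    return(list(prediction = majority(label)))
  }
  # Greedy step: choose the feature with the lowest classification error.
  errors <- sapply(features, function(f) split_error(data[[f]], label))
  best <- features[which.min(errors)]
  remaining <- setdiff(features, best)
  # Recurse into each branch of the chosen feature.
  children <- lapply(split(data, data[[best]], drop = TRUE),
                     build_tree, target = target, features = remaining)
  list(split_on = best, children = children)
}

# Hypothetical toy data (illustration only).
toy <- data.frame(
  credit = c("good", "good", "bad", "bad", "good", "bad"),
  term   = c("short", "short", "long", "long", "long", "short"),
  label  = c("safe", "safe", "risky", "risky", "safe", "risky")
)
tree_fit <- build_tree(toy, target = "label", features = c("credit", "term"))
str(tree_fit)
```

The split is greedy because each step takes the best feature for the current node only and never revisits that choice.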

15
Q

How do you decide which feature to split on?

A

Pick the feature whose split has the lowest classification error (number of mistakes / total number of records).
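
A small worked comparison, assuming each branch predicts its majority class. The helper name and the toy vectors are hypothetical.

```r
# Classification error of a split: mistakes / total, where each branch
# of the feature predicts its majority class.
split_error <- function(feature, label) {
  branches <- split(label, feature)
  mistakes <- sapply(branches, function(y) length(y) - max(table(y)))
  sum(mistakes) / length(label)
}

# Hypothetical toy data (illustration only).
credit <- c("good", "good", "bad", "bad", "good", "bad")
term   <- c("short", "short", "long", "long", "long", "short")
label  <- c("safe", "safe", "risky", "risky", "safe", "risky")

errors <- c(credit = split_error(credit, label),
            term   = split_error(term, label))
print(errors)
names(which.min(errors))  # feature with the lowest classification error
```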

16
Q

How do you decide when to stop?

A

Stop when all the data in a branch agrees on one class, or when there are no more features to split on.

17
Q

How do you split continuous variables?

A

Take the midpoints between consecutive sorted values as candidate thresholds, then use the threshold with the lowest classification error.
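
A sketch of that threshold search on invented data, assuming each side of the threshold predicts its majority class.

```r
# Hypothetical toy data: a continuous feature and a class label.
income <- c(20, 25, 30, 45, 50, 60)
label  <- c("risky", "risky", "risky", "safe", "safe", "safe")

# Candidate thresholds: midpoints between consecutive sorted unique values.
v <- sort(unique(income))
thresholds <- (head(v, -1) + tail(v, -1)) / 2

# Error when each side of threshold t predicts its majority class.
threshold_error <- function(t) {
  left  <- label[income <  t]
  right <- label[income >= t]
  mistakes <- (length(left) - max(table(left))) +
    (length(right) - max(table(right)))
  mistakes / length(label)
}

errors <- sapply(thresholds, threshold_error)
best <- thresholds[which.min(errors)]
print(data.frame(threshold = thresholds, error = errors))
best
```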

18
Q

how to code a decision tree in R

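A

install.packages("tree")   # one-time install of the tree package
library(tree)              # load the package
credit <- tree(...)        # fit the tree (formula and data go here)
summary(credit)            # variables used, terminal nodes, error rate
plot(credit)               # draw the tree structure
text(credit, pretty = 0)   # label the splits; pretty = 0 keeps factor level names unabbreviated
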
19
Q

How do you summarise continuous data?

A

Use the mean and standard deviation. For roughly normal data, most observations (about 95%) lie within 2 standard deviations of the mean.
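
A quick check of the 2-standard-deviation rule on simulated data. The `height` values are randomly generated for illustration, so the rule holds here because the data really is normal.

```r
set.seed(42)  # fixed seed so the sketch is reproducible
height <- rnorm(1000, mean = 170, sd = 10)  # hypothetical, roughly normal data

m <- mean(height)
s <- sd(height)

# Share of observations within 2 standard deviations of the mean.
within_2sd <- mean(abs(height - m) <= 2 * s)
c(mean = m, sd = s, within_2sd = within_2sd)
```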

20
Q

What does a 95% confidence interval mean?

A

Under a normal distribution, about 95% of values lie within 1.96 standard deviations of the mean. A 95% confidence interval for a mean is therefore the estimate plus or minus 1.96 standard errors.
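
Where the 1.96 comes from, plus a normal-approximation interval for a sample mean. The sample `x` is made up, and for small samples a t-based interval would usually be preferred.

```r
z <- qnorm(0.975)                # about 1.96
coverage <- pnorm(z) - pnorm(-z) # probability within z sd of the mean: 0.95

# Hypothetical sample (illustration only).
x <- c(12.1, 9.8, 11.4, 10.6, 10.9, 11.7, 9.5, 10.2)
se <- sd(x) / sqrt(length(x))    # standard error of the mean
ci <- mean(x) + c(-1, 1) * z * se
ci
```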

21
Q

What would be useful to summarise categorical data?

A

Pareto plots, percentages, bar charts, pie charts, ABC classifications