R Flashcards
How do you start working with data?
Data Description, Data Segmentation, Data Correlation, Data Classification
How do you summarise categorical variables?
By Counts or By Percentage
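A minimal sketch of both summaries in base R. The `risk` vector is invented for illustration; `table()` gives the counts and `prop.table()` converts them to proportions.

```r
# Hypothetical example data: one categorical variable with three levels
risk <- c("low", "high", "medium", "low", "low", "medium", "high", "low")

counts <- table(risk)                     # summary by counts
percentages <- 100 * prop.table(counts)   # summary by percentage

print(counts)
print(round(percentages, 1))
```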
What two types of categorical variables exist? Give an example of each.
Ordered (low/medium/high)
Unordered (male/female)
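In R, both kinds are usually stored as factors. The sketch below uses made-up values and shows one way to declare an ordered and an unordered factor.

```r
# Ordered categorical variable: the levels have a natural ranking
risk <- factor(c("low", "high", "medium", "low"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)

# Unordered categorical variable: the levels are just labels
gender <- factor(c("male", "female", "female", "male"))

is.ordered(risk)     # TRUE
is.ordered(gender)   # FALSE
risk < "high"        # comparisons are only defined for ordered factors
```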
What is the motivation behind data description?
To find the shape of the data set, and to summarise the columns and relationships. Find out what the column headers are, and what type of variables are within the columns
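A short sketch of a first look at a data set. The built-in `iris` data stands in for your own data frame; that choice is an assumption made only so the example runs.

```r
# iris ships with R and is used here purely as a stand-in data set
data(iris)

dim(iris)       # shape: number of rows and columns
names(iris)     # column headers
str(iris)       # type of each column (numeric, factor, ...)
summary(iris)   # per-column summary: quartiles for numeric, counts for factors
head(iris)      # first few records
```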
Describe what data segmentation means
Splitting the data into subgroups (segments) so that each can be analysed separately, because the full data set has too many subjects to examine at once
What is the ABC method of analysis?
A way of segmenting items into three classes ranked by their share of the total. A: about 80%, B: about 15%, C: about 5%
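One possible way to assign ABC classes in base R. The sales figures are invented, and the cut-offs at 80% and 95% of the cumulative total are an assumption derived from the 80/15/5 split on the card.

```r
# Hypothetical sales per product (values invented for illustration)
sales <- c(P1 = 500, P2 = 280, P3 = 100, P4 = 50,
           P5 = 30,  P6 = 20,  P7 = 12,  P8 = 8)

sorted <- sort(sales, decreasing = TRUE)
cum_share <- cumsum(sorted) / sum(sorted)   # cumulative share of the total

# Assumed cut-offs: A up to 80% of the cumulative total, B up to 95%, C the rest
abc_class <- cut(cum_share, breaks = c(0, 0.80, 0.95, 1),
                 labels = c("A", "B", "C"))

data.frame(sales = sorted, cum_share = round(cum_share, 3), class = abc_class)
```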
How do you visualise data after you segment it?
Pareto Plot, Pie Chart, Bar Graph
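A rough base-R sketch of a Pareto plot: bars sorted from largest to smallest with a cumulative-percentage line on top. The complaint counts are hypothetical, and dedicated packages may offer ready-made versions.

```r
# Hypothetical counts per segment (invented for illustration)
counts <- c(late = 45, damaged = 25, wrong_item = 15, billing = 10, other = 5)

sorted <- sort(counts, decreasing = TRUE)
total <- sum(sorted)
cum_pct <- 100 * cumsum(sorted) / total

# Bars in descending order; barplot() returns the bar midpoints
mids <- barplot(sorted, ylim = c(0, total), ylab = "Count",
                main = "Pareto plot (sketch)")

# Cumulative percentage, rescaled to the count axis, drawn over the bars
lines(mids, cum_pct / 100 * total, type = "b")
axis(4, at = seq(0, total, length.out = 5),
     labels = paste0(seq(0, 100, 25), "%"))
```

The same `sorted` vector can be passed to `pie()` for a pie chart, or to `barplot()` on its own for a bar graph.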
What is data correlation?
Finding out the relationships between the variables
Is more “scatter” around the line indicative of higher or lower correlation?
Lower. A perfect positive correlation would see all dots on the line
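A small simulation to illustrate the point. The data are randomly generated (the seed and noise levels are arbitrary choices), and `cor()` computes the Pearson correlation.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible
x <- 1:100

y_tight <- 2 * x + rnorm(100, sd = 5)     # little scatter around the line
y_loose <- 2 * x + rnorm(100, sd = 100)   # a lot of scatter around the line

cor(x, y_tight)   # close to 1
cor(x, y_loose)   # noticeably lower
```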
plot(creditdata[,c(5,9:14)]). Explain this line of code
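Plots the selected columns of creditdata against each other. The empty row index keeps all rows, and c(5, 9:14) selects column 5 and columns 9 to 14. For a data frame with several columns, plot() draws a scatterplot matrix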
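The `creditdata` object is not available here, so the sketch below uses a randomly generated data frame named `demo` (a hypothetical stand-in with 14 columns) to show the same indexing and plotting pattern.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical stand-in for creditdata: 30 rows, columns V1 ... V14
demo <- as.data.frame(matrix(rnorm(30 * 14), ncol = 14))

# Empty row index = all rows; c(5, 9:14) = column 5 plus columns 9 to 14
subset_cols <- demo[, c(5, 9:14)]
names(subset_cols)

# For a multi-column data frame, plot() draws a scatterplot matrix
plot(subset_cols)
```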
What are decision trees?
Models that reach a decision by following a sequence of splits on the input variables, mainly used for categorical responses
What is the purpose of decision trees?
They classify records by assigning each one to its most likely class or category. They can also estimate the probability of an event.
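A minimal sketch of both uses, assuming the `rpart` package (a recommended package that normally ships with R) and the built-in `iris` data as a stand-in. The flashcards do not say which package or data set the course uses.

```r
library(rpart)

# Fit a classification tree; iris is only a stand-in data set
fit <- rpart(Species ~ ., data = iris, method = "class")

# Assign each record to its most likely class
pred_class <- predict(fit, newdata = iris, type = "class")

# Estimated probability of each class for every record
pred_prob <- predict(fit, newdata = iris, type = "prob")

head(pred_prob)
table(predicted = pred_class, actual = iris$Species)
```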
Are decision trees linear or nonlinear?
Nonlinear
What algorithm do decision trees use?
A recursive greedy algorithm. Start with an empty tree and decide which feature to split on. For each resulting subset, if there is nothing more to split, make a prediction. Otherwise, split again, and think about when to stop.
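The steps above can be sketched as a small recursive function in base R. This is an illustration only: the function names, the toy `loans` data, the depth limit, and the use of classification error as the split criterion are all assumptions, and it handles categorical features only.

```r
# Most common class among the labels in a node
majority <- function(y) names(which.max(table(y)))

# Classification error after splitting on one categorical feature
split_error <- function(data, feature, target) {
  groups <- split(data[[target]], data[[feature]], drop = TRUE)
  mistakes <- sum(sapply(groups, function(y) length(y) - max(table(y))))
  mistakes / nrow(data)
}

build_tree <- function(data, features, target, depth = 0, max_depth = 3) {
  y <- data[[target]]
  # Stop when the node is pure, no features remain, or the depth limit is hit
  if (length(unique(y)) == 1 || length(features) == 0 || depth >= max_depth) {
    return(list(leaf = TRUE, prediction = majority(y)))
  }
  # Greedy step: pick the feature with the lowest error right now
  errors <- sapply(features, split_error, data = data, target = target)
  best <- features[which.min(errors)]
  # Recursive step: build a subtree for each value of the chosen feature
  parts <- split(data, data[[best]], drop = TRUE)
  children <- lapply(parts, build_tree, features = setdiff(features, best),
                     target = target, depth = depth + 1, max_depth = max_depth)
  list(leaf = FALSE, feature = best, children = children)
}

# Follow the splits for one record until a leaf is reached
predict_one <- function(tree, row) {
  while (!tree$leaf) {
    tree <- tree$children[[as.character(row[[tree$feature]])]]
  }
  tree$prediction
}

# Hypothetical toy data (invented for illustration)
loans <- data.frame(
  credit = c("good", "good", "bad", "bad", "good", "bad"),
  term   = c("short", "long", "short", "long", "long", "short"),
  safe   = c("yes", "yes", "no", "no", "yes", "no")
)

tree <- build_tree(loans, c("credit", "term"), "safe")
tree$feature
predict_one(tree, loans[1, ])
```

The `max_depth` argument is one simple answer to "when to stop"; other stopping rules, such as a minimum node size, are equally plausible.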
How do you decide which feature to split on?
Choose the feature whose split gives the lowest classification error (number of mistakes / total number of records)
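A worked example of the error calculation on invented data. Each branch of a split predicts its majority class, so the mistakes are the records in the minority. The `classification_error` helper and the `loans` table are hypothetical.

```r
# Hypothetical records (invented for illustration)
loans <- data.frame(
  credit = c("good", "good", "good", "bad", "bad", "bad", "good", "bad"),
  term   = c("short", "long", "short", "long", "short", "long", "long", "short"),
  safe   = c("yes", "yes", "yes", "no", "no", "yes", "no", "no")
)

# Number of mistakes / total, when each branch predicts its majority class
classification_error <- function(feature, target) {
  tab <- table(feature, target)
  mistakes <- sum(rowSums(tab) - apply(tab, 1, max))
  mistakes / sum(tab)
}

errors <- c(credit = classification_error(loans$credit, loans$safe),
            term   = classification_error(loans$term, loans$safe))

errors                     # credit: 2/8 = 0.25, term: 4/8 = 0.5
names(which.min(errors))   # "credit" is the better split
```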