QDA Flashcards

1
Q

3 ways to numerically summarise a categorical variable

A
  1. Frequencies or counts
  2. Relative frequencies
  3. Cumulative relative frequencies
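
A minimal Python sketch of the three summaries, using only the standard library. The `eye_colour` data is made up for illustration, and alphabetical order is an arbitrary choice for the cumulative column.

```python
from collections import Counter

# Hypothetical sample of a categorical variable.
eye_colour = ["brown", "blue", "brown", "green", "blue", "brown", "brown", "blue"]

counts = Counter(eye_colour)                       # 1. frequencies (counts)
n = len(eye_colour)
rel_freq = {k: v / n for k, v in counts.items()}   # 2. relative frequencies

# 3. cumulative relative frequencies, accumulated in alphabetical category order
cum_rel_freq = {}
running = 0.0
for category in sorted(counts):
    running += rel_freq[category]
    cum_rel_freq[category] = running

for category in sorted(counts):
    print(category, counts[category], rel_freq[category], cum_rel_freq[category])
```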
2
Q

How can categorical variables be summarised visually?

A

Bar and pie charts

3
Q

What are bar charts for, and what do the x and y axes show?

A

They represent the frequency of each category. The y axis shows the frequencies and the x axis shows the categories.

4
Q

What are pie charts for?

A

Representing the relative frequency of each category as a slice of the pie, so each slice is proportional to that category's share of the total

5
Q

When describing the contents of a numerical variable we can look at different aspects of its distribution such as:

A

Measures of location such as the mean
Measures of spread and variability
Extreme values
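
A rough Python sketch of these three aspects, using the standard `statistics` module. The `values` list is made up, and flagging points beyond 1.5 x IQR from the quartiles is one common convention for extreme values, assumed here rather than taken from the cards.

```python
import statistics

# Hypothetical numerical variable; 30 is deliberately extreme.
values = [2, 4, 4, 5, 6, 7, 30]

# Measures of location
mean = statistics.mean(values)
median = statistics.median(values)

# Measures of spread and variability
sd = statistics.stdev(values)
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Extreme values: points outside the 1.5 * IQR fences (assumed convention)
extremes = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(mean, median, sd, iqr, extremes)
```

The mean is pulled well above the median by the single extreme value, which is why both are worth reporting.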

6
Q

When is a t-test (t.test) used?

A

To compare means when the observations are independent and the errors are approximately normally distributed. The test statistic is calculated from the sample means.
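
To make "calculated from the means" concrete, here is a hedged Python sketch of the two-sample Welch t statistic and its degrees of freedom. It is not the code behind R's t.test; `welch_t` and the two groups are hypothetical, and turning t into a p-value would additionally need the t distribution, which is left out.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se2 = va / na + vb / nb                      # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical measurements for two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.4, 4.0, 4.8, 4.2]
t, df = welch_t(group_a, group_b)
print(t, df)
```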

7
Q

What is the Wilcoxon rank sum test?
How does it work?

A

A non-parametric alternative to the t-test, used when we cannot assume a normal distribution.

It pools all measurements from both groups into one column, assigns a rank to each value, and compares the sum of the ranks in each group.
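
The pooling-and-ranking step can be sketched in Python as follows. The function names and data are hypothetical; tied values receive the average of their ranks, and only the statistic is computed (no p-value).

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_statistic(x, y):
    """Return (sum of ranks of x in the pooled sample, W = rank sum - n_x(n_x+1)/2)."""
    ranks = average_ranks(list(x) + list(y))   # pool both groups into one column
    rank_sum_x = sum(ranks[:len(x)])
    w = rank_sum_x - len(x) * (len(x) + 1) / 2
    return rank_sum_x, w

# Hypothetical samples.
x = [1.1, 2.3, 3.0]
y = [2.9, 4.5, 5.2, 6.1]
print(rank_sum_statistic(x, y))
```

A small W here means the values of x mostly sit below the values of y in the pooled ranking.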

8
Q

What does a scatter plot do?
What should you look for, and how do you interpret it?

A

It displays two numerical variables of interest, with the independent variable on the x axis and the dependent variable on the y axis.

Look for the direction of the relation (positive or negative), its form (linear, quadratic or exponential), how strong and clear it is, and any outliers.

9
Q

Two main types of analysis

A

Descriptive - Describing data using numerical or graphical summaries

Inferential - Using sample data to draw conclusions about a larger population

10
Q

What are the main data types?

A

Categorical - Attributes observed for each sampling unit. Categories may be unordered (nominal) or ordered (ordinal); binary is the special case with two categories

Numerical - Numerical values on a discrete or continuous scale

11
Q

What is a confidence interval?

A

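A range of plausible values for a population parameter such as the mean or a proportion, calculated from the sample. If the exercise was repeated many times, about 95% of the 95% confidence intervals built this way would contain the true value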
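
As a hedged illustration, the sketch below builds an interval for a mean using the normal approximation (mean plus or minus z times the standard error). `mean_ci` and the sample are hypothetical, and for small samples a t-based interval would normally be used instead.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se

# Hypothetical sample.
sample = [12.1, 11.8, 12.6, 12.0, 11.5, 12.9, 12.3, 11.9, 12.4, 12.2]
low, high = mean_ci(sample)
wide = mean_ci(sample, level=0.99)
print(low, high, wide)
```

Raising the confidence level widens the interval, as comparing `wide` with `(low, high)` shows.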

12
Q

P value rule

A

P value <= α: Reject the null (significant)
P value > α: Fail to reject the null (not significant)

(With the usual significance level α = 0.05, the P value must be at most 0.05 for a difference to be significant)
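
The rule is simple enough to write as a one-line function. This is only a sketch; `decide` is a hypothetical helper, and 0.05 is used as the default significance level.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.03))   # below alpha
print(decide(0.20))   # above alpha
```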

13
Q

What does it mean to test a null hypothesis?

A

The null hypothesis (H0) is the default claim that you are trying to disprove, for example that the mean has a specific value. It is tested against an alternative hypothesis (H1).

H0: μ = μ0
H1: μ ≠ μ0

14
Q

What are the type 1 and type 2 error probabilities?

A

α = P(type 1 error) = P(reject H0 | H0 is true)

β = P(type 2 error) = P(fail to reject H0 | H0 is false)
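
A sketch of how the two probabilities relate, assuming a one-sided z-test of H0: mean = mu0 against H1: mean > mu0 with a known standard deviation. The function name and all numbers are hypothetical. The type 1 error rate is fixed in advance, and the type 2 error rate then depends on the true mean and the sample size.

```python
import math
from statistics import NormalDist

def type2_error(mu0, mu1, sigma, n, alpha=0.05):
    """Type 2 error rate of a one-sided z-test when the true mean is mu1."""
    se = sigma / math.sqrt(n)
    # Reject H0 when the sample mean exceeds this critical value.
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # P(fail to reject H0 | true mean is mu1)
    return NormalDist(mu1, se).cdf(critical)

print(type2_error(100, 105, 15, 25))    # smaller sample, larger type 2 error
print(type2_error(100, 105, 15, 100))   # larger sample, smaller type 2 error
```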

15
Q

How to check for normality graphically

A

Quantile-quantile (Q-Q) plot
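
The points behind such a plot can be computed without any plotting library. In this sketch `qq_points` is a hypothetical helper, and the plotting positions (i - 0.5) / n are one common choice, assumed here. If the data are roughly normal, the returned pairs lie close to a straight line.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pair each sorted observation with the matching quantile of a fitted normal."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    probs = [(i - 0.5) / n for i in range(1, n + 1)]   # assumed plotting positions
    theoretical = [fitted.inv_cdf(p) for p in probs]
    return list(zip(theoretical, sorted(sample)))

# Hypothetical sample.
sample = [4.8, 5.1, 5.0, 4.6, 5.4, 5.2, 4.9, 5.3]
points = qq_points(sample)
for theoretical_q, observed in points:
    print(round(theoretical_q, 3), observed)
```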

16
Q

Numerical tests for normality

A

Kolmogorov-Smirnov (K-S) test
Shapiro-Wilk test

17
Q

How do you test for variance?
Give the definition of each test

A

ANOVA (analysis of variance) is the main test, as it is used to determine whether there is a statistically significant difference between the means of two or more categorical groups by comparing the variance between groups with the variance within groups

Fisher's F test, which compares two variances by dividing the larger variance by the smaller variance
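
A small Python sketch of the variance ratio used in the F test. `variance_ratio` and the data are hypothetical, and the p-value (which needs the F distribution) is left out.

```python
import statistics

def variance_ratio(a, b):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    va, vb = statistics.variance(a), statistics.variance(b)
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1

# Hypothetical samples: b is twice as spread out as a.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
print(variance_ratio(a, b))
```

A ratio close to 1 suggests similar variances; the further above 1, the stronger the evidence that they differ.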

18
Q

What is a prop test (prop.test)?

A

A test about a population proportion based on a sample, which also gives a confidence interval for that proportion

It can test whether a single proportion equals a given value, or whether the proportions in several groups are the same
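
For a single proportion, the idea can be sketched with a plain z-test and a normal-approximation interval. This is an assumption-laden illustration, not the calculation R's prop.test performs, so results may differ slightly from R's output. `proportion_z_test` and the counts are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0, plus a 95% normal-approximation interval."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)             # standard error under H0
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error at the estimate
    z_crit = NormalDist().inv_cdf(0.975)
    ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)
    return p_hat, z, p_value, ci

# Hypothetical data: 60 successes in 100 trials, tested against p0 = 0.5.
p_hat, z, p_value, ci = proportion_z_test(60, 100, 0.5)
print(p_hat, z, p_value, ci)
```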

19
Q

What can we use for hypothesis testing?

A

t.test, which tests a hypothesis about the mean of a population using a sample

20
Q

What is a correlation test used for?
Provide an example

A

Used to measure the strength and direction of the linear relationship between two numerical variables, often as a preliminary step before linear regression

E.g. speed vs distance
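
A minimal sketch of the Pearson correlation coefficient for a speed vs distance example. The numbers are invented and `pearson_r` is a hypothetical helper; a full correlation test would also report a p-value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements.
speed = [10, 20, 30, 40, 50]
distance = [12, 25, 33, 48, 60]
r = pearson_r(speed, distance)
print(r)
```

Values of r near +1 or -1 indicate a strong linear relation, and values near 0 a weak one.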

21
Q

What type of variables does linear regression use?

A

Two numeric variables, typically continuous: one independent and one dependent
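
A sketch of simple linear regression by least squares, with one numeric independent variable x and one numeric dependent variable y. `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = fit_line(x, y)
print(intercept, slope)
```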

22
Q

What is Pearson's chi-squared (χ²) test?

A

Used to discover whether there is a relationship between two categorical variables

23
Q

What is the difference between a one way ANOVA and two way?

A

One-way ANOVA is a parametric test used to determine whether there are any significant differences between the means of two or more independent groups

Two-way ANOVA tests the effect of two categorical independent variables on a continuous dependent variable
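
The one-way case can be sketched by hand: the F statistic is the between-group mean square divided by the within-group mean square. `one_way_anova_f` and the groups are hypothetical, and no p-value is computed.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical measurements for three independent groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(groups))
```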

24
Q

How to graphically show the spread of a continuous variable across the levels of a categorical variable?

A

Box plots

25
Q

Name the 4 diagnostic plots used to check lm models

A

Residuals vs fitted
Normal Q-Q plot
Scale-location
Residuals vs leverage

26
Q

What to look for in a residuals vs fitted diagnostic plot

A

The points should look randomly scattered around zero with no clear pattern; otherwise the plot suggests issues with the model assumptions

27
Q

What to look for in a QQ plot

A

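The plotted points should fall close to a straight line; clear departures from it, especially at the ends, suggest the residuals are not normally distributed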

28
Q

How to analyse models in lm

A

Discuss the coefficients
Linear relations
Whether the ratio of SSR to SSE is significant (the F-test)
R² value
Outliers
Unwanted patterns in the residuals that require a transformation
Check whether the residuals fit the assumption of homoscedasticity
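
Several of these quantities can be computed by hand for a simple one-predictor model. The sketch below is illustrative only: `regression_summary` and the data are hypothetical, SSR is taken to mean the regression sum of squares and SSE the residual (error) sum of squares, and the F statistic is left without a p-value.

```python
def regression_summary(x, y):
    """Fit y = intercept + slope * x and return the pieces used to assess the fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]
    sse = sum(r ** 2 for r in residuals)            # unexplained (error) variation
    ssr = sum((f - my) ** 2 for f in fitted)        # variation explained by the regression
    sst = sum((b - my) ** 2 for b in y)             # total variation
    return {
        "intercept": intercept,
        "slope": slope,
        "residuals": residuals,
        "r_squared": 1 - sse / sst,
        "f_stat": (ssr / 1) / (sse / (n - 2)),      # one predictor, n - 2 residual df
    }

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
summary = regression_summary(x, y)
print(summary)
```

Plotting `residuals` against the fitted values is what the residuals vs fitted diagnostic shows; a perfect fit would make SSE zero and the F statistic undefined in this sketch.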

29
Q

What different types of models are there?

A

Linear regression
Multiple regression
ANOVA and ANCOVA
Logistic regression

30
Q

When do you use ANOVA?
When do you use one-way and multi-way?

A

When all explanatory variables are categorical and the dependent variable is continuous

One-way is used when there is one factor (one categorical independent variable)

Multi-way is used when there is more than one categorical independent variable

31
Q

Different types of transformation techniques for models

A

Take the log of the dependent variable
Square the independent variable
Take the reciprocal (1/x) of the independent variable
Join (merge) categories
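
Each technique amounts to a one-line change to the data before refitting the model. A sketch with invented values; the category mapping is purely hypothetical.

```python
import math

# Hypothetical dependent (y) and independent (x) variables.
y = [1.0, 10.0, 100.0]
x = [1.0, 2.0, 4.0]

log_y = [math.log(v) for v in y]       # log of the dependent variable
x_squared = [v ** 2 for v in x]        # square of the independent variable
x_reciprocal = [1 / v for v in x]      # 1 / the independent variable

# Joining categories: merge sparse levels of a categorical variable.
merge = {"small": "small", "medium": "other", "large": "other"}
size = ["small", "large", "medium", "small"]
size_joined = [merge[s] for s in size]

print(log_y, x_squared, x_reciprocal, size_joined)
```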

32
Q

How to graphically represent a fully numeric dataset
What does it do?

A

Through plot() on the whole dataset (or pairs()), which draws a scatter plot matrix

It plots every pair of numeric variables against each other all at once

33
Q

When do you use a logistic regression?

A

When the dependent variable is binary. The explanatory variables can be categorical or numerical

34
Q

What do you use a chi-squared test for?

A

When you want to find out if two categorical variables are independent

If any of the expected frequencies are less than 5, use Fisher's exact test instead
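
A sketch of the test statistic and the expected-frequency check for a contingency table. `chi_squared` and the tables are hypothetical, no continuity correction is applied, and the p-value (which needs the chi-squared distribution) is omitted.

```python
def chi_squared(table):
    """Return (statistic, df, expected counts, whether any expected count is below 5)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    small_expected = any(e < 5 for row in expected for e in row)
    return stat, df, expected, small_expected

# Hypothetical 2x2 tables of counts.
print(chi_squared([[10, 20], [20, 10]]))   # expected counts are all 15
print(chi_squared([[1, 2], [3, 4]]))       # small expected counts: consider Fisher's exact test
```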