QDA Flashcards by Tio Adesola

3 ways to numerically summarise a categorical variable

Frequencies or counts
Relative frequencies
Cumulative relative frequencies
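
As a rough illustration of these three summaries, here is a minimal Python sketch. The `colours` sample is made up for the example and is not part of the original cards.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (illustrative only).
colours = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]

counts = Counter(colours)          # frequencies (counts)
n = sum(counts.values())

summary = []
running = 0.0
# most_common() orders categories by decreasing count.
for category, count in counts.most_common():
    rel = count / n                # relative frequency
    running += rel                 # cumulative relative frequency
    summary.append((category, count, rel, running))

for category, count, rel, cum in summary:
    print(f"{category:<6} {count:>3} {rel:.3f} {cum:.3f}")
```

The last cumulative relative frequency should always come out as 1.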

How can categorical variables be summarised visually?

Bar and pie charts

What are bar charts for, and what are the y and x axes?

Representing the frequency of each category. The y axis shows the frequencies and the x axis shows the categories

What are pie charts for?

Representing each category as a slice of the pie, with the slice size proportional to the category's relative frequency

When describing the contents of a numerical variable we can look at different aspects of its distribution such as:

Measures of location such as the mean
Measures of spread and variability
Extreme values
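
A small sketch of these three aspects using Python's `statistics` module. The `values` sample and the 1.5 x IQR rule for flagging extreme values are assumptions made for the example, not something the cards specify.

```python
from statistics import mean, median, stdev, quantiles

# Hypothetical numerical variable (illustrative only).
values = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Measures of location
centre_mean = mean(values)
centre_median = median(values)

# Measures of spread and variability
spread_sd = stdev(values)
q1, q2, q3 = quantiles(values, n=4)   # quartiles (default "exclusive" method)
iqr = q3 - q1

# Extreme values: assumed rule of thumb, more than 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(centre_mean, centre_median, spread_sd, iqr, outliers)
```

Note how the single large value pulls the mean well above the median.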

When is a t.test used?

When comparing means: the observations are independent and the errors are normally distributed. The test statistic is calculated from the sample mean(s)

What is the Wilcoxon rank sum test?
How does it work?

Non-parametric alternative to a t.test. (Used when we cannot assume a normal distribution)

Pools all measurements from both groups into one column, assigns a rank to each value, and compares the sum of the ranks between the groups
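
The ranking step can be sketched as follows. This is only the rank-sum part under assumed data (`group_a`, `group_b` are made up); it does not compute a p value, and tied values are given the average of their ranks, which is a common convention.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sums(group_a, group_b):
    """Pool both groups into one column, rank, and sum the ranks per group."""
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    return sum(ranks[:len(group_a)]), sum(ranks[len(group_a):])


# Hypothetical measurements (illustrative only).
group_a = [1.1, 2.3, 2.9, 3.5]
group_b = [2.9, 4.2, 5.0, 6.1]
w_a, w_b = rank_sums(group_a, group_b)
print(w_a, w_b)
```

If the two groups came from the same distribution, the rank sums would be expected to be similar once group sizes are taken into account.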

What does a scatter plot do?
What to look for and how to interpret

Displays two numerical variables of interest along the x axis (independent) and y axis (dependent)

Look for the direction of the relation (positive or negative), its form (linear, quadratic or exponential), how strong and clear it is, and any outliers

Two main types of analysis

Descriptive - Describing data using numerical or graphical summaries

Inferential - Using sample data to make a conclusion on larger populations

What are the main data types?

Categorical - Attributes observed for each sampling unit, e.g. binary categories. Can be nominal or ordinal

Numerical - A numerical value on a discrete or continuous scale

What is a confidence interval?

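A range of plausible values for the population mean/proportion, calculated from the sample. If the exercise was repeated many times, about 95% of the 95% intervals would contain the true value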
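
A minimal sketch of an interval for a mean. It assumes a normal approximation (z quantile) rather than the t quantile a small sample would normally call for, and the `sample` values are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 when level = 0.95
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical sample (illustrative only).
sample = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0]
low, high = mean_confidence_interval(sample)
low99, high99 = mean_confidence_interval(sample, level=0.99)
print(low, high)
```

A higher confidence level gives a wider interval for the same data.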

P value rule

P value <= α = Reject the null (significant)
P value > α = Fail to reject the null (not significant)

(With the usual α = 0.05, the P value should be at most 0.05 for any difference to be significant)

What does it mean to test a null hypothesis?

The null hypothesis is the default claim you’re trying to find evidence against

For example, testing that the mean has a specific value against an alternative hypothesis that it does not:

H0: μ = μ0
H1: μ ≠ μ0
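
A sketch of testing these hypotheses for one sample. The t statistic is standard; the p value here uses a normal approximation to the t distribution, which is an assumption made to keep the example in the standard library (a real t.test would use the t distribution). The `sample` and `mu0` values are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_test(sample, mu0, alpha=0.05):
    """Test H0: mean == mu0 against H1: mean != mu0 (two-sided)."""
    n = len(sample)
    t_stat = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    # Assumption: normal approximation to the t distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    reject = p_value <= alpha          # P value rule: reject H0 if p <= alpha
    return t_stat, p_value, reject


# Hypothetical sample (illustrative only).
sample = [5.2, 4.8, 5.5, 5.1, 4.9, 5.3, 5.0, 5.4, 5.2, 4.6]
t_stat, p_value, reject = one_sample_test(sample, mu0=5.0)
print(t_stat, p_value, reject)
```

Here the sample mean is close to the hypothesised value relative to its standard error, so H0 is not rejected.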

What are the type 1 and 2 error probabilities?

α = P(type 1 error) = P(reject H0 | H0 is true)

β = P(type 2 error) = P(fail to reject H0 | H0 is false)
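
One way to see what the type 1 error probability means is a small simulation: generate many samples for which H0 really is true and count how often a test at level 0.05 rejects. The setup below (a two-sided z test with known standard deviation 1, sample size 20, 2000 repetitions, fixed seed) is entirely an assumption for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def type1_error_rate(n=20, alpha=0.05, repetitions=2000, seed=1):
    """Share of samples drawn under a true H0 (mu = 0) that are rejected."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # H0 is true here
        z = (sum(sample) / n) * sqrt(n)                    # sigma = 1 is known
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value <= alpha:
            rejections += 1                                # a type 1 error
    return rejections / repetitions


rate = type1_error_rate()
print(rate)
```

The printed share should land near the chosen level of 0.05, up to simulation noise.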

How to test for normality graphically

Quantile-quantile plot (Q-Q plot)

Numerical tests for normality

Kolmogorov-Smirnov (K-S) test
Shapiro-Wilk test

How do you test for variance?
Give the definition of each test

ANOVA (analysis of variance) is used to determine whether there is a statistically significant difference between the means of two or more categorical groups. It does this by comparing the variance between the groups with the variance within the groups

Fisher's F test compares the variances of two samples by dividing the larger variance by the smaller variance
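
The variance ratio can be sketched like this. Only the F statistic is computed; turning it into a p value needs the F distribution, which is left out. The samples `a` and `b` are made up.

```python
from statistics import variance


def f_ratio(sample_a, sample_b):
    """Larger sample variance divided by the smaller one (always >= 1)."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical samples (illustrative only).
a = [2, 4, 4, 4, 5, 5, 7, 9]
b = [1, 2, 3, 4, 5, 6, 7, 8]
f_stat = f_ratio(a, b)
print(f_stat)
```

A ratio close to 1 suggests similar variances; how far from 1 counts as significant depends on the two sample sizes.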

What is a prop test?

A test for a population proportion based on a sample, which also gives a confidence interval for that proportion

It can also test whether the proportions in several groups are the same

What can we use for hypothesis testing?

A t.test, e.g. to test the mean of a sample from a population against a hypothesised value

What is a correlation test used for?
Provide an example

Used to measure the strength and direction of the relationship between two numerical variables, as a pre-step to linear regression

E.g. speed vs distance
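
A minimal sketch of the Pearson correlation coefficient for the speed vs distance example. The numbers are invented for illustration, and only the coefficient is computed, not the significance test.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical speeds and stopping distances (illustrative only).
speed = [10, 20, 30, 40, 50]
distance = [8, 22, 45, 80, 120]
r = pearson_r(speed, distance)
print(r)
```

Values near +1 or -1 indicate a strong linear relation; values near 0 indicate little linear relation.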

What type of variables does linear regression use?

Two continuous numeric variables: one independent and one dependent

What is Pearson's chi-squared test (χ²)?

Used to discover whether there is a relationship between two categorical variables
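
A sketch of how the chi-squared statistic is built from a contingency table: expected counts come from the row and column totals, and the statistic sums (observed - expected)² / expected over all cells. The `table` counts are made up, no continuity correction is applied, and the p value (from the chi-squared distribution) is not computed.

```python
def chi_squared(table):
    """Return (statistic, degrees of freedom, expected counts) for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Hypothetical 2 x 2 table of counts for two categorical variables.
table = [[30, 10],
         [20, 40]]
statistic, dof, expected = chi_squared(table)
all_expected_at_least_5 = min(min(row) for row in expected) >= 5
print(statistic, dof, all_expected_at_least_5)
```

Checking the smallest expected count is a quick way to see whether the usual rule of thumb for the test holds.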

What is the difference between a one way ANOVA and two way?

One way ANOVA is a parametric test used to determine whether there are any significant differences between the means of two or more independent groups (the levels of one categorical variable)

Two way ANOVA tests the effect of two categorical independent variables on a dependent variable

How to graphically show the variance of a continuous variable across the levels of a categorical variable?

Box plots

Name 4 diagnostic plots to test lm models

Residuals vs fitted
QQ plot
Scale location
Residuals vs leverage

What to look for in a residuals vs fitted diagnostic plot

The points should look randomly scattered with no pattern, otherwise it suggests issues with the model assumptions

What to look for in a QQ plot

The plotted values should lie close to a straight line

How to analyse models in lm

Discuss coefficients
Linear relations
Significant SSR and SSE ratio
R2 value
Outliers
Unwanted patterns in residuals requiring transformation
Check if they fit the assumption of homoscedasticity
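
Several of these quantities can be computed by hand for a simple linear regression. The sketch below is a plain least-squares fit, not the output of lm. It assumes SSR means the regression sum of squares and SSE the error (residual) sum of squares, since the cards do not define them, and the `x`, `y` data are made up.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x with basic diagnostics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]
    sse = sum(r ** 2 for r in residuals)               # error sum of squares
    ssr = sum((f - mean_y) ** 2 for f in fitted)       # regression sum of squares
    return {
        "slope": slope,
        "intercept": intercept,
        "fitted": fitted,
        "residuals": residuals,
        "sse": sse,
        "ssr": ssr,
        "r_squared": ssr / (ssr + sse),
    }


# Hypothetical data (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
model = fit_line(x, y)
print(model["slope"], model["intercept"], model["r_squared"])
```

Plotting `residuals` against `fitted` gives the residuals vs fitted diagnostic; a pattern there would suggest a transformation is needed.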

What different types of models are there?

Linear regression
Multiple regression
ANOVA and ANCOVA
Logistic regression

When do you use ANOVA?
When do you use one way and multi-way?

When all explanatory variables are categorical
One way is used when there is one factor or categorical independent variable
Multi-way is used when there’s more than one categorical independent variable

Different types of transformation techniques for models

Log of the dependent variable
Square of the independent variable
1/ the independent variable (reciprocal)
Joining categories

How to graphically represent a fully numeric dataset?
What does it do?

Through plot()
It plots all the numeric variables against each other at once, one scatter plot per pair of variables

When do you use a logistic regression?

When the dependent variable is binary. The explanatory variables can be categorical or numerical

What do you use a chi squared test for?

When you want to find out if two categorical variables are independent
If the expected frequencies of the categorical variables are less than 5 then use Fisher's exact test

(34 cards)