New Flashcards
What methods should you use to summarise ordinal categorical data
Median
Interquartile range
What is the purpose of a pie chart
To show frequencies/proportions/percentages
What is purposive sampling
Sampling when the researcher uses their expertise to choose a sample that is most useful for the purpose of the research
How many and what kind of variable would you use with a means plot
One scale (aka continuous) variable and two categorical variables
What is root cause analysis
It is a method used to solve problems by first identifying the root cause of the problem.
What methods should you use to summarise continuous normally distributed data
Mean
Standard deviation
As a rule of thumb, what range should a kurtosis (or skewness) score fall within for the data to be treated as approximately normal
Between -1 and +1
How do the standard error and the margin of error relate
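The margin of error is the standard error multiplied by a critical value (eg 1.96 for 95% confidence), so as the standard error increases, the margin of error also increases.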
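A minimal sketch of the relationship for a sample mean, assuming a normal approximation with a critical value of 1.96 (95% confidence). The function name and the sample values are made up for illustration; for small samples a t critical value would usually be more appropriate.

```python
import math
import statistics

def margin_of_error(sample, z=1.96):
    """Margin of error for a sample mean: critical value times standard error."""
    # Standard error of the mean = sample standard deviation / sqrt(n)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return z * standard_error

# Illustrative data only
sample = [4.1, 5.0, 5.5, 6.2, 4.8, 5.9, 5.3, 4.6]
moe = margin_of_error(sample)

# Doubling the spread doubles the standard error, and so the margin of error
wide_sample = [2 * x for x in sample]
wide_moe = margin_of_error(wide_sample)
```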
What overall method of test would you use when working with a skewed continuous dependent variable
Non-parametric test
What specific test would you use when comparing three or more measurements on the same subject when the data is not normally distributed
Friedman test
What is stratified sampling
The population is divided into subpopulations (strata) with key differences eg gender, age, and a random sample is then taken from each stratum
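A rough sketch of stratified sampling, assuming proportional allocation (the same fraction drawn from every stratum). The function name, the record layout and the 20% fraction are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, rng):
    """Group records into strata by key, then randomly sample each stratum."""
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        # Take at least one member so every stratum is represented
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Illustrative population: 10 records labelled "M" and 20 labelled "F"
population = [{"id": i, "gender": "F" if i % 3 else "M"} for i in range(30)]
sample = stratified_sample(
    population, key=lambda r: r["gender"], fraction=0.2, rng=random.Random(1)
)
```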
What is the purpose of a means plot
Looks at the combined effect of two categorical variables on the mean of one scale variable
What methods are used to determine outliers
Standard deviation / z-score
Interquartile range
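Both methods can be sketched as follows. The thresholds (3 standard deviations, 1.5 times the IQR) are common conventions rather than fixed rules, and the function names and data are invented for illustration.

```python
import statistics

def zscore_outliers(data, threshold=3.0):
    """Values whose z-score is larger in magnitude than the threshold."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > threshold]

def iqr_outliers(data, k=1.5):
    """Values more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Illustrative data with one extreme value
data = [10, 12, 11, 13, 12, 11, 10, 14, 12, 95]
flagged_by_iqr = iqr_outliers(data)
flagged_by_z = zscore_outliers(data, threshold=2.5)
```

In a small sample a single extreme value inflates the standard deviation, so the z-score method can miss it at the usual threshold of 3; that is why a lower threshold is used in this example.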
Generally, when can ordinal data be analysed with parametric tests
When there are 7 or more categories and the data is approximately normally distributed
Why is mean imputation considered bad
It ignores correlations between features. It also artificially reduces the variance of the data and introduces bias, which lowers the accuracy of the model and produces confidence intervals that are too narrow.
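A small sketch of the variance problem, using made-up values where None marks a missing entry. The helper name is hypothetical.

```python
import statistics

def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    present = [v for v in values if v is not None]
    fill = statistics.fmean(present)
    return [fill if v is None else v for v in values]

# Illustrative data only
values = [2.0, 4.0, None, 8.0, None, 10.0]
observed = [v for v in values if v is not None]
imputed = mean_impute(values)

# Every filled-in value sits exactly at the mean, so the spread shrinks
variance_observed = statistics.variance(observed)
variance_imputed = statistics.variance(imputed)
```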
What specific test would you use when comparing the averages of three or more independent groups when the data is normally distributed
One way ANOVA
What is the meaning of covariance
Covariance measures how two random variables vary together. A positive value means they tend to increase together, a negative value means one tends to decrease as the other increases, and a value near zero suggests no linear relationship.
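A minimal sketch of the sample covariance calculation (dividing by n - 1). The function name and data are illustrative.

```python
def sample_covariance(x, y):
    """Sample covariance: average product of paired deviations from the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Illustrative data: y rises with x, y_reversed falls as x rises
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
y_reversed = [8, 6, 4, 2]

cov_positive = sample_covariance(x, y)
cov_negative = sample_covariance(x, y_reversed)
```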
What is observational data
Observational data is data obtained from observational studies, where variables are observed without intervention to see if there is any correlation between them
How do you find degrees of freedom
The number of independent values (eg the sample size) minus the number of parameters estimated, eg n - 1 for a single sample
What overall method of test would you use when working with a normally distributed continuous dependent variable
Parametric test
What is selection bias
Selection bias occurs when individuals or groups are selected in a way that is not random, so the sample does not represent the population
What are ordinal variables
Categorical variables with an obvious order
Eg most - least likely
What are continuous scale variables
Variables that can take any value within a range
Eg height
What is the purpose of a scatter graph
Shows the relationship between two variables and helps detect outliers
What is the purpose of a histogram
To show the distribution of results
What specific test would you use when comparing three or more measurements on the same subject when the data is normally distributed
Repeated measures ANOVA
What is the relationship between the confidence level and the significance level in statistics?
The significance level is the probability of rejecting the null hypothesis when it is actually true.
The confidence level is the proportion of confidence intervals, over repeated sampling, that would contain the true population value.
The two are related by the following formula:
Significance level = 1 − Confidence level
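As a sketch of how the formula is used in practice, the significance level derived from a confidence level gives the critical value for a two-sided interval. This assumes a standard normal distribution; the function name is hypothetical.

```python
from statistics import NormalDist

def two_sided_critical_z(confidence_level):
    """Critical z value for a two-sided interval at the given confidence level."""
    significance_level = 1 - confidence_level
    # Split the significance level evenly between the two tails
    return NormalDist().inv_cdf(1 - significance_level / 2)

z_95 = two_sided_critical_z(0.95)  # roughly 1.96
z_99 = two_sided_critical_z(0.99)
```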
How many and what kind of variable would you use with a scatter graph
Two scale (aka continuous) variables
When would it be better to use the median than the mean to study data
When there are a lot of outliers that can positively or negatively skew data
What is survivorship bias
Survivorship bias is a sample selection flaw that occurs when a dataset only considers the ‘surviving’ or existing observations and fails to consider those that have already ceased to exist.
What methods should you use to summarise nominal categorical data
Mode
What are the two types of scale variables
Continuous
Discrete
What are 5 ways of handling missing data
Winsorizing the data
Prediction of missing values
Deletion of rows with missing data
Mean imputation
Median imputation
What are the two main types of categorical variables
Ordinal
Nominal
What is the central limit theorem
The central limit theorem states that the sampling distribution of the mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution
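A small simulation can illustrate the idea. This sketch assumes a right-skewed exponential population with mean 1, a fixed seed, and arbitrary sample sizes of 2 and 50; all names are made up for the example.

```python
import random
import statistics

def sample_means(sample_size, n_samples, rng):
    """Means of repeated samples drawn from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

def skewness(values):
    """Average cubed z-score: near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(0)
means_small = sample_means(sample_size=2, n_samples=2000, rng=rng)
means_large = sample_means(sample_size=50, n_samples=2000, rng=rng)

# Larger samples give means that are less skewed and less spread out
skew_small = skewness(means_small)
skew_large = skewness(means_large)
spread_large = statistics.stdev(means_large)
```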
What are right skewed distributions
A right-skewed distribution is one where the right tail is longer than the left one. Here, typically, mean > median > mode.
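A quick check of that ordering on a made-up right-skewed dataset, where a few large values pull the mean upwards:

```python
import statistics

# Illustrative data: most values are small, with a long right tail
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
```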
What specific test would be used for assessing the relationship between two categorical variables when the data is not normally distributed
Chi-Squared test
What kind of summarising statistics would you get from a pie chart
Class percentages