Statistics Flashcards

1
Q

What is descriptive statistics?

A
  1. Summarises and describes data
  2. Is concerned with measures of central tendency and measures of dispersion

2
Q

What is a variable?

A

An attribute that has two or more divisions, characteristics or categories and that can be measured or observed

3
Q

What is a constant?

A

An attribute that does not change

4
Q

Levels of measurement of variables

A
  1. Nominal - city of birth
  2. Ordinal - pain scale
  3. Interval - test scores
  4. Ratio - age, height, weight
5
Q

What are the methods of summarising data?

A
  1. Tables
  2. Graphs
  3. Charts
6
Q

What are the measures of central tendency?

A
  1. Mean
  2. Median - middle-ranking number
  3. Mode - the most frequently occurring number
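
As a rough illustration of the three measures above, the following sketch uses Python's standard `statistics` module on a small made-up data set (the `scores` list is hypothetical, not taken from the cards).

```python
import statistics

# Hypothetical data, chosen only for illustration
scores = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(scores)      # sum of values divided by their count (30 / 6)
median_value = statistics.median(scores)  # middle-ranking value; here the average of 3 and 5
mode_value = statistics.mode(scores)      # most frequently occurring value (3 appears twice)

print(mean_value, median_value, mode_value)
```

With an even number of observations there is no single middle value, so the median is taken as the average of the two middle-ranking numbers.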
7
Q

What are measures of dispersion?

A
  1. Percentiles
  2. Range
  3. Variance/standard deviation
8
Q

What is range?

A

The difference between the largest and smallest value.

9
Q

What are percentiles?

A

Percentiles are numbers that divide a distribution or area of a histogram into 100 parts of equal area

10
Q

What is variance?

A

The variance is the average of the squared deviations of the observations from their mean.

11
Q

What is standard deviation?

A

The SD is the square root of the variance.
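
A minimal sketch of range, variance and standard deviation, assuming a small made-up data set (`values` is hypothetical). It follows the "average of squared deviations" definition, which is the population variance; the sample variance, which divides by n - 1 instead of n, is shown for comparison.

```python
import math
import statistics

# Hypothetical data with mean 5
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)  # largest minus smallest value

mean_value = sum(values) / len(values)
# Population variance: average of the squared deviations from the mean
population_variance = sum((x - mean_value) ** 2 for x in values) / len(values)
# Standard deviation: square root of the variance
population_sd = math.sqrt(population_variance)

# Sample variance divides by n - 1 rather than n
sample_variance = statistics.variance(values)

print(value_range, population_variance, population_sd, sample_variance)
```

The population figures match `statistics.pvariance` and `statistics.pstdev`; which divisor is appropriate depends on whether the data are the whole population or a sample from it.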

12
Q

What is normal distribution?

A
  1. Symmetrical, unimodal, bell-shaped distribution of values
  2. The peak occurs at the mean value
  3. The median, mode and mean all coincide at the same point
  4. In the standard normal distribution, the mean is 0 and the standard deviation is 1
  5. Normal distribution is a good descriptor of many kinds of real data
  6. It is a good approximation of results that occur by chance
  7. Many statistical procedures are based on normal distributions
13
Q

What are the different shapes of frequency distributions?

A
  1. Bell-shaped distribution of values
  2. Asymmetric distribution of values (skewed to the left or right)
  3. Kurtosis
14
Q

What is kurtosis?

A

Kurtosis is a statistical measure that defines how heavily the tails of a distribution differ from the tails of a normal distribution. In other words, kurtosis identifies whether the tails of a given distribution contain extreme values.

15
Q

What is probability?

A

A value defined to be between 0 and 1.

Measures ‘how likely’ it is that an event occurs.

16
Q

What is a binomial distribution?

A

Counts the number of ‘successes’ in a fixed number of independent trials.

- Only two possible outcomes: “success” and “failure”
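
The idea of counting successes can be sketched with the binomial probability formula. The helper name `binomial_pmf` and the example of 10 trials with success probability 0.5 are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 trials, success probability 0.5
n_trials, p_success = 10, 0.5
probabilities = [binomial_pmf(k, n_trials, p_success) for k in range(n_trials + 1)]

print(probabilities[5])    # probability of exactly 5 successes
print(sum(probabilities))  # probabilities over all possible counts sum to 1
```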

17
Q

What is the Poisson distribution?

A

Counts the number of events occurring in a fixed time period

  • events occur at a constant average rate
  • events occur independently of the time since the last event
  • approximates the binomial distribution when N is large and π is small
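
The approximation in the last point can be checked numerically. This is only a sketch: the helper names and the choice of N = 1000 and π = 0.003 (giving an average of about 3 events) are assumptions for illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k events when the average number of events is `rate`."""
    return exp(-rate) * rate**k / factorial(k)

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical setting: many trials (N large), small success probability (pi small)
n, p = 1000, 0.003
rate = n * p  # the matching Poisson average

# Largest difference between the two distributions over counts 0..10
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, rate)) for k in range(11))
print(max_gap)
```

A small `max_gap` indicates that the two sets of probabilities nearly coincide for these parameter values.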
18
Q

What is a continuous random variable?

A

A continuous random variable is a random variable that can take any of an infinite number of values within a range.

Continuous data is described by a probability density function, a smooth curve. The area under the curve between two points on the horizontal axis gives the probability of an observation falling between those points.

The probabilities are associated with intervals rather than single points.

19
Q

Steps in hypothesis testing

A
  1. Identify the research question
  2. Specify the null (H0) and alternative (Ha) hypotheses
  3. Select the appropriate test statistic
  4. Collect data
  5. Perform required calculations
  6. Evaluate findings and report
  7. Develop appropriate interpretations of the conclusions
20
Q

What is the null hypothesis?

A

The null hypothesis always states that there is no difference between groups or between treatments, or that one factor does not depend on the other.
We want to find evidence for the opposite.

21
Q

What is the alternative hypothesis?

A

The hypothesis that there is a difference or a dependence. This is what we want to find evidence for.

22
Q

What are the consequences of the null hypothesis and alternative hypothesis?

A

If we can determine that the results of an experiment are unlikely to have occurred by sampling error, we are inclined to reject the null hypothesis.

If the results are likely to have occurred by sampling error, we are inclined not to reject the null hypothesis

23
Q

Directional vs non-directional hypothesis

A
  1. Non-directional hypothesis - only looks for a difference
  2. Directional hypothesis - looks at the direction of the difference

24
Q

What is a p-value?

A

The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic as extreme as or more extreme than the one determined by the sample data.

The decision about whether there is enough evidence to reject the null hypothesis is made by comparing the p-value to the value of α (the level of significance of the test).

Common values for α are 0.05 or 0.01
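
The comparison of a p-value with the significance level can be sketched as follows. This assumes a test statistic that follows a standard normal distribution under the null hypothesis (a z-test) and a two-sided alternative; the observed value 2.1 is made up.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability, under a standard normal null distribution, of a statistic
    at least as far from zero as the observed z (in either direction)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05      # chosen level of significance
z_observed = 2.1  # hypothetical observed test statistic

p_value = two_sided_p_value(z_observed)
reject_null = p_value < alpha  # reject H0 only if the p-value is below alpha

print(p_value, reject_null)
```

Other tests use other reference distributions (for example the t distribution), but the decision rule of comparing the p-value with the significance level is the same.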

25
Q

Type I errors

A

Incorrectly rejecting a true null hypothesis.

  • False positive
  • Level of significance (α) is the probability of making a Type I error (usually set at 0.05)
26
Q

Type II errors

A

Failing to reject the null hypothesis when it should be rejected

  • False negative
  • β, the probability of a Type II error, is usually set at 0.2 or 0.1
27
Q

What is power?

A

The probability that the null hypothesis is rejected when the null hypothesis is false.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What factors does power depend on?

A
  1. The true deviation from H0
  2. The choice of α
  3. The variance of the test statistic
  4. The sample size
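
How these factors move power can be sketched for one simple case. The function below assumes a one-sided z-test with known standard deviation; the name `z_test_power` and all numbers are illustrative assumptions, not a general power formula.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a one-sided z-test: probability of rejecting H0 when the true
    mean exceeds the null value by `effect`, with known standard deviation `sd`
    and sample size `n`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # critical value set by alpha
    # The standard error sd / sqrt(n) shrinks as n grows, which raises power
    return 1 - std_normal.cdf(z_crit - effect * sqrt(n) / sd)

print(z_test_power(effect=0.5, sd=1.0, n=25))               # baseline
print(z_test_power(effect=0.5, sd=1.0, n=50))               # larger sample
print(z_test_power(effect=0.8, sd=1.0, n=25))               # larger true deviation from H0
print(z_test_power(effect=0.5, sd=1.0, n=25, alpha=0.01))   # stricter alpha
```

When the true deviation is zero, the rejection probability equals alpha, which is the Type I error rate.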
29
Q

Explain levels of significance

A

The α level determines the critical value, Tcrit, of the test statistic T.
If the observed value of T is greater than Tcrit (in absolute value for a two-tailed test), it is considered significant and we can therefore reject H0. The most frequently used α level is 0.05

30
Q

What is the relationship between α and β?

A

With the same sample size, if you decrease your chance of making a Type I error, you increase the chance of making a Type II error, and vice versa.

31
Q

What does MCID stand for?

A

Minimal Clinically Important Difference is the smallest treatment effect that is of clinical significance.

32
Q

What is the p-value?

A

The p-value (the observed significance level) is the probability that a statistical result at least as extreme as the one observed would occur, assuming that the null hypothesis is true.

33
Q

What is the relationship of the p-value with hypothesis testing?

A

The p-value can be the basis for deciding whether or not to reject the null hypothesis.
The decision about whether there is enough evidence to reject the null hypothesis is made by comparing the p-value to α, the level of significance of the test.

34
Q

What is the central limit theorem?

A

The central limit theorem states that the distribution of the means of samples taken from a distribution of any shape (with finite variance) tends towards a normal distribution as the sample size increases.
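
A small simulation can make this concrete. The sketch below assumes a strongly right-skewed exponential population with mean 1 and standard deviation 1; the sample size of 30, the 2000 repetitions and the fixed seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed exponential population (mean 1, SD 1)."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

# Distribution of the mean across many repeated samples of size 30
sample_means = [sample_mean(30) for _ in range(2000)]

centre = statistics.mean(sample_means)   # expected to be close to the population mean, 1
spread = statistics.stdev(sample_means)  # expected to be close to 1 / sqrt(30)

print(centre, spread)
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the individual draws are skewed.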

35
Q

What are the key assumptions of t-tests?

A
  1. Normality
  2. Homogeneity of variance
  3. Independence of observations

When these assumptions are violated - other tests are used.

  • Non-parametric tests
  • Transformations
  • Hierarchical or clustered models
36
Q

What are confidence intervals?

A

The confidence interval (CI) is a range of values that’s likely to include a population value with a certain degree of confidence.
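
As a rough sketch, a confidence interval for a population mean can be computed from summary statistics. This version assumes a large sample, so the normal distribution is used for the multiplier (small samples would use the t distribution instead); the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a mean."""
    # Multiplier that leaves (1 - confidence) / 2 in each tail, about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sample_sd / sqrt(n)  # multiplier times the standard error
    return sample_mean - margin, sample_mean + margin

# Hypothetical summary statistics: mean 50, SD 10, n = 100
lower, upper = mean_confidence_interval(50.0, 10.0, 100)
print(lower, upper)
```

A higher confidence level or a smaller sample widens the interval.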

37
Q

When would you use a two-tailed test?

A

A two-tailed test is appropriate if you want to determine if there is any difference between the groups you are comparing. For instance, if you want to see if Group A scored higher or lower than Group B, then you would want to use a two-tailed test. This is because a two-tailed test uses both the positive and negative tails of the distribution. In other words, it tests for the possibility of positive or negative differences.

38
Q

When would you use a one-tail test?

A

A one-tailed test is appropriate if you only want to determine if there is a difference between groups in a specific direction. So, if you are only interested in determining if Group A scored higher than Group B, and you are completely uninterested in the possibility of Group A scoring lower than Group B, then you may want to use a one-tailed test.

39
Q

What is the advantage of using a one-tail test over a two-tail test?

A

The main advantage of using a one-tailed test is that it has more statistical power than a two-tailed test at the same significance (alpha) level. In other words, your results are more likely to be significant for a one-tailed test if there truly is a difference between the groups in the direction that you have predicted. This is because only one tail of the distribution is used for the test.

40
Q

Which type of test should you use: a one-tailed test or a two-tailed test?

A

When in doubt, it is almost always more appropriate to use a two-tailed test. A one-tailed test is only justified if you have a specific prediction about the direction of the difference (e.g., Group A scoring higher than Group B), and you are completely uninterested in the possibility that the opposite outcome could be true (e.g., Group A scoring lower than Group B)

41
Q

Test statistic equation

A

T = (observed value - expected value) / (standard error)

A test statistic is a number calculated by a statistical test. It describes how far your observed data is from the null hypothesis of no relationship between variables or no difference among sample groups.
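
One concrete instance of this equation is the one-sample t statistic, sketched below. The function name, the data and the null value of 5 are assumptions for illustration; the standard error here is the sample standard deviation divided by the square root of n.

```python
from math import sqrt
import statistics

def one_sample_t(sample, expected_mean):
    """(observed mean - expected mean) / standard error of the mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / sqrt(n)
    return (statistics.mean(sample) - expected_mean) / standard_error

# Hypothetical data; the null hypothesis says the population mean is 5
t_value = one_sample_t([4, 6, 8, 10, 12], 5)
print(t_value)
```

The larger the statistic in absolute value, the further the observed mean lies from the value stated by the null hypothesis, measured in standard errors.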

42
Q

T values for a two-tailed t-test

A

t_{n-1}(1 - α/2) or t_{n-1}(α/2)

43
Q

What is ANOVA?

A

Analysis of Variance (ANOVA)

44
Q

ANOVA vs t-test

A

A t-test compares means for 1 independent variable with two levels, for example comparing mean income by education level (secondary vs tertiary).

ANOVA compares means for 1 independent variable with more than two levels, for example primary, secondary and tertiary.

45
Q

Why use ANOVA?

A

If we wanted to test whether three groups are different using t-tests, we would need to perform 3 different pairwise comparisons, each associated with an α of 0.05.

However, this approach increases the probability of making a type 1 error.
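
The inflation of the Type I error rate can be sketched numerically. This assumes the three comparisons are independent, which pairwise tests on shared groups are not exactly, so the result is only an approximation.

```python
from math import comb

alpha = 0.05
n_groups = 3
n_comparisons = comb(n_groups, 2)  # number of pairwise t-tests among 3 groups

# Probability of at least one false positive across all comparisons,
# assuming (as a simplification) that the comparisons are independent
familywise_error = 1 - (1 - alpha) ** n_comparisons

print(n_comparisons, familywise_error)
```

Under this assumption the chance of at least one Type I error is roughly 14%, well above the nominal 5% of a single test.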

46
Q

Basic principles behind ANOVA

A

Analysis of variance, or ANOVA, is a statistical method that separates observed variance data into different components to use for additional tests. A one-way ANOVA is used for three or more groups of data, to gain information about the relationship between the dependent and independent variables.

47
Q

Basic assumptions of ANOVA

A
  1. Dependent variable must be measured in interval or ratio scale
  2. The variance is the same within each group
  3. Residuals are normally distributed
  4. Independence of observations
48
Q

What are residuals?

A

Differences between observed and fitted values.
Used for checking model assumptions;
- normality
- homogeneity of variances

49
Q

Tests of normality

A
  1. Shapiro-Wilk and Kolmogorov-Smirnov statistics
    - test the null hypothesis that the residuals are normally distributed
    - a significant result implies that there is evidence that the residuals are not normally distributed
50
Q

Homogeneity of variance

A

Levene's test

  • tests equality of variances of the dependent variable across all level combinations of the factors
  • a significant result implies that there is evidence of heterogeneity of variances
51
Q

Multiple comparisons tests

A

Once you have determined that significant differences exist among the means, post hoc range tests and their pair-wise multiple comparisons can determine which means are significantly different from which other means.

52
Q

What is Kruskal-Wallis?

A

The Kruskal–Wallis test (1952) is a nonparametric approach to the one-way ANOVA. The procedure is used to compare three or more groups on a dependent variable that is measured on at least an ordinal level.

53
Q

When is the Kruskal-Wallis test used?

A
  1. Samples are not normally distributed
  2. Groups do not have equal variances
  3. Data consists of ranks - ordinal
54
Q

Analysing Kruskal-Wallis results

A

The null hypothesis is rejected when H is large

55
Q

What is a two-way univariate ANOVA?

A

The two-way ANOVA compares the mean differences between groups that have been split on two independent variables (called factors). The primary purpose of a two-way ANOVA is to understand if there is an interaction between the two independent variables on the dependent variable.

Efficient in that several questions are addressed within one study.

56
Q

What is main effect?

A

An effect attributable to a single factor (IV)

57
Q

What is interaction?

A

Tests whether the effect of one factor is the same at each level of the other factor.

58
Q

What is repeated measures ANOVA?

A

The repeated measures ANOVA compares means across one or more variables that are based on repeated observations under different treatments/conditions.

59
Q

What is F-ratio?

A

The F ratio is the ratio of two mean square values. If the null hypothesis is true, you expect F to have a value close to 1.0 most of the time. A large F ratio means that the variation among group means is more than you’d expect to see by chance.
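
For a one-way ANOVA, the two mean squares are the between-group and within-group mean squares. The sketch below computes their ratio by hand; the function name and the three small groups are made up for illustration.

```python
def one_way_f_ratio(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical data: three groups of three observations
f_ratio = one_way_f_ratio([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_ratio)
```

Here the group means (2, 3 and 6) differ far more than the spread within each group would suggest, so the ratio is well above 1.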

60
Q

What is ANCOVA?

A

ANCOVA removes any effect of covariates, which are variables you don’t want to study. For example, you might want to study how different levels of teaching skills affect student performance in math, but it may not be possible to randomly assign students to classrooms.

ANCOVA can then be used as a means to eliminate unwanted variance on the dependent variable. This allows the researcher to increase test sensitivity. Adding reliable and necessary variables to these models typically reduces the error term.

ANCOVA reduces within-group error variances and eliminates confounds.

61
Q

What are the assumptions for using ANCOVA?

A
  1. Samples are independent
  2. Errors are normally distributed
  3. Variances are homogeneous
  4. Relationship of the DV to the covariate must be linear within each treatment group
  5. Regression coefficients are homogeneous
    - slopes of the regression lines for each of the groups considered separately are all approximately the same
  6. Treatment has no effect on the covariate
62
Q

Measures of association

A
  1. Correlation
  2. Regression

63
Q

What is correlation?

A

The association between two variables

64
Q

What is regression?

A

Regression analysis is a statistical method that helps us to analyze and understand the relationship between two or more variables of interest.

65
Q

What are measures of association?

A

Measures of association quantify how changes in the values of the dependent variable relate to changes in the values of the independent variable.

66
Q

Correlation vs. regression

A

Correlation describes the magnitude and direction of the linear relationship between two continuous variables.

Regression describes the process of predicting one variable from the other variable.

67
Q

Linear correlation

A

When there is a strong linear relationship, the correlation will be close to +1 or -1, depending on whether the gradient of the line of best fit is positive or negative.

When there is hardly any linear relationship between the two variables, the correlation will be close to zero.

Points can have perfect association but the correlation can be zero if the association is not linear.

68
Q

What is spearman rank correlation?

A

Alternative measure of association that is more robust, less likely to be influenced by a small number of outliers

Tests for monotonic relationships between variables measured on at least an ordinal scale

Calculated by first ranking X and Y, then using the Pearson correlation coefficient formula on the ranks.
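
The rank-then-correlate calculation can be sketched as follows. The helper names are assumptions, tied values are given the average of their ranks (a common convention), and the data are made up so that the relationship is monotonic but not linear.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: y always increases with x, but the last value is an outlier
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(pearson(x, y), spearman(x, y))
```

Because the ranks of `x` and `y` agree exactly, the Spearman coefficient is 1 even though the outlier pulls the Pearson coefficient below 1.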

69
Q

What is statistical inference?

A

The purpose of statistical inference is to estimate the sample-to-sample variation, or uncertainty, in a result.