Stats Flashcards

1
Q

Continuous variable

A

Can take on any value within a given range.
An infinite number of possible values, limited only by our ability to measure them.

2
Q

Discrete variable

A

Can only take on certain distinct values within a certain range.
The scale is still meaningful.

3
Q

Ranked variable

A

A categorical variable in which the categories imply some order or relative position.
Numerical values are usually assigned.

4
Q

Categorical variable

A

One in which the “value” taken by the variable is a non-numerical category or class.

5
Q

Dot plot

A

Like a bar graph, but each bar is a stack of dots.
One dot per data point.

6
Q

Frequency table

A

Divide the number line into intervals.
Count the number of data points within each interval - frequency.
Relative frequency is the proportion of data points in each interval (frequency / total number of data points).
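
A minimal Python sketch of the steps above. The data, the start value of 20 and the interval width of 10 are illustrative assumptions; intervals are treated as half-open, [lower, upper), and values outside them are ignored.

```python
def frequency_table(values, start, width, n_intervals):
    """Count values in the half-open intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_intervals
    for v in values:
        i = int((v - start) // width)      # index of the interval containing v
        if 0 <= i < n_intervals:
            counts[i] += 1
    total = len(values)
    rows = []
    for i, c in enumerate(counts):
        lower = start + i * width
        rows.append((lower, lower + width, c, c / total))  # relative frequency = c / total
    return rows


data = [21, 22, 23, 24, 25, 25, 27, 30, 33, 33, 34, 35, 36, 41,
        42, 43, 44, 45, 45, 45, 46, 46, 47, 51, 52, 53, 53]

for lower, upper, freq, rel in frequency_table(data, start=20, width=10, n_intervals=4):
    print(f"[{lower}, {upper})  frequency = {freq}  relative frequency = {rel:.2f}")
```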

7
Q

Guidelines for forming class intervals

(3)

A
  1. Use intervals of equal length with midpoints at convenient round numbers.
  2. For a small data set, use a small number of intervals.
  3. For a large data set, use more intervals.
8
Q

Stem and leaf

A

e.g.:
2 1234557
3 033456
4 1234555667
5 1233

stem = tens digit
leaf = units digits of the values that share that tens digit, listed in ascending order (so the first row is 21, 22, 23, 24, 25, 25, 27)
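
A short Python sketch that rebuilds the rows above from unsorted data, assuming non-negative whole numbers with the tens digit as the stem; the data list is made up to match the example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):           # sorting first keeps stems and leaves in order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


data = [27, 21, 25, 33, 30, 24, 22, 36, 25, 23, 34, 33, 35,
        41, 45, 42, 53, 44, 45, 46, 43, 45, 47, 46, 51, 53, 52]
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "".join(str(leaf) for leaf in leaves))
# 2 1234557
# 3 033456
# 4 1234555667
# 5 1233
```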

9
Q

Summary statistics

A

Summary statistics describe two properties of a set of measurements: the central or typical value, and the spread about that value.

10
Q

Mean

A

Average
Sum of data / number of data
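
A tiny Python check of the formula, using made-up data; `statistics.mean` is the standard-library equivalent.

```python
from statistics import mean


def arithmetic_mean(values):
    # Sum of the data divided by the number of data points
    return sum(values) / len(values)


data = [4, 8, 6, 5, 3, 7]
print(arithmetic_mean(data))   # 5.5
print(mean(data))              # 5.5
```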

11
Q

Median

A

The middle value when the data are ordered from smallest to largest. With an even number of values, it is the mean of the two middle values.
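
A Python sketch of the odd and even cases (made-up data), compared against the standard library's `statistics.median`.

```python
from statistics import median


def median_of(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2    # even n: mean of the two middle values


print(median_of([7, 1, 3]))       # 3
print(median_of([7, 1, 3, 10]))   # 5.0
```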

12
Q

Mode

A

Most common value in the data set
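
A Python sketch, with made-up data, that returns every most-common value, since a data set can have more than one mode; `statistics.multimode` (Python 3.8+) does the same.

```python
from collections import Counter
from statistics import multimode


def modes(values):
    counts = Counter(values)
    top = max(counts.values())                         # highest frequency
    return [v for v, c in counts.items() if c == top]  # all values with that frequency


data = [2, 3, 3, 5, 7, 7, 9]
print(modes(data))       # [3, 7]  (two modes: the data are bimodal)
print(multimode(data))   # [3, 7]
```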

13
Q

Interquartile range

A

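The ordered data are split into 4 equal-sized groups (quarters) by the quartiles Q1, Q2 (the median) and Q3.
IQR = Q3 - Q1: the spread of the middle 50% of the data.
Quartiles are like medians, but for quarters.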
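
A Python sketch using the standard library's `statistics.quantiles`, which returns the three quartile cut points. Textbooks and software use several slightly different quartile conventions, so hand-worked values may differ a little from this default ('exclusive') method; the data are illustrative.

```python
from statistics import quantiles


def iqr(values):
    # quantiles(..., n=4) returns the cut points Q1, Q2 (the median) and Q3
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(quantiles(data, n=4))   # [3.0, 6.0, 9.0]
print(iqr(data))              # 6.0
```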

14
Q

Box and Whisker plot

A

The box runs from Q1 to Q3 (so its length is the interquartile range), with a line at the median.
Whiskers extend to the furthest points that aren’t outliers.
Outliers are points more than 1.5 x IQR below Q1 or above Q3, and are shown as dots.
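
A Python sketch of the numbers a box plot draws, assuming the 1.5 x IQR rule above and the default `statistics.quantiles` quartile convention; the data are made up, with 30 as a deliberate outlier.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Numbers drawn by a box-and-whisker plot, using the 1.5 x IQR outlier rule."""
    q1, med, q3 = quantiles(values, n=4)   # default 'exclusive' quartile method
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "box": (q1, med, q3),                     # Q1, median, Q3
        "whiskers": (min(inside), max(inside)),   # furthest points that aren't outliers
        "outliers": outliers,                     # drawn as individual dots
    }


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]
print(box_plot_summary(data))
# {'box': (3.0, 6.0, 9.0), 'whiskers': (1, 10), 'outliers': [30]}
```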

15
Q

Standard deviation

A

Measure of spread around the mean.
1. Calculate the mean
2. Calculate the difference between each value and the mean
3. Square the differences
4. Sum the squares
5. Divide by n - 1
6. Take the square root
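
A Python sketch that follows the six steps literally, checked against `statistics.stdev` (which also divides by n - 1); the data are illustrative.

```python
import math
import statistics


def sample_sd(values):
    n = len(values)
    m = sum(values) / n                 # 1. calculate the mean
    diffs = [x - m for x in values]     # 2. difference between each value and the mean
    squares = [d ** 2 for d in diffs]   # 3. square the differences
    total = sum(squares)                # 4. sum the squares
    variance = total / (n - 1)          # 5. divide by n - 1
    return math.sqrt(variance)          # 6. take the square root


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))          # about 2.138
print(statistics.stdev(data))   # standard-library sample SD, same value
```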

16
Q

Sample variance

A

Measure of spread around the mean, equal to the standard deviation squared (so it is in squared units).
1. Calculate the mean
2. Calculate the difference between each value and the mean
3. Square the differences
4. Sum the squares
5. Divide by n - 1
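
A Python sketch of the same calculation (illustrative data), showing that the sample variance is the standard deviation squared and how it differs from the population variance, which divides by n.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
m = sum(data) / n                          # mean = 5.0
sum_sq = sum((x - m) ** 2 for x in data)   # sum of squared differences = 32.0

sample_var = sum_sq / (n - 1)              # divide by n - 1: about 4.571
print(sample_var)
print(statistics.variance(data))           # standard library, also divides by n - 1
print(statistics.stdev(data) ** 2)         # variance is the SD squared (up to rounding)
print(statistics.pvariance(data))          # population variance divides by n: 32 / 8 = 4
```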

17
Q

Z scores

A

Shows how many standard deviations a value lies above (positive z) or below (negative z) the mean.
z = (data - mean)/std
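
A Python sketch of the formula with hypothetical values (mean 50, SD 10); `statistics.NormalDist.zscore` (Python 3.9+) gives the same result.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    # Number of standard deviations x lies above (+) or below (-) the mean
    return (x - mean) / sd


print(z_score(65, mean=50, sd=10))             # 1.5
print(z_score(35, mean=50, sd=10))             # -1.5
print(NormalDist(mu=50, sigma=10).zscore(65))  # 1.5
```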

18
Q

Bernoulli trial

(3)

A
  1. Result of each trial is a success or failure
  2. Probability p of success is the same in every trial
  3. Trials are independent.
19
Q

Binomial random variable

A

x = number of successes
n = number of repeated Bernoulli trials
p = probability of success

P(X = x) = nCx p^x (1-p)^(n-x), where nCx is the binomial coefficient ("n choose x")
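
A Python sketch of the binomial probability above using `math.comb` (Python 3.8+); the coin-toss numbers are just an example.

```python
from math import comb


def binomial_pmf(x, n, p):
    # P(X = x) = nCx * p^x * (1 - p)^(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# e.g. probability of exactly 3 heads in 5 fair coin tosses
print(binomial_pmf(3, 5, 0.5))   # 0.3125
```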

20
Q

Finding binomial coefficient

A

$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
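
A Python sketch of the factorial formula, checked against `math.comb` (Python 3.8+).

```python
from math import comb, factorial


def binomial_coefficient(n, k):
    # n! / (k! (n - k)!); integer division is exact because the result is a whole number
    return factorial(n) // (factorial(k) * factorial(n - k))


print(binomial_coefficient(5, 2))   # 10
print(comb(5, 2))                   # 10
```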

21
Q

Normal/Gaussian distribution

(4)

A
  1. Symmetrical about the mean
  2. Bell-shaped
  3. Mean, median and mode are the same
  4. The two tails never touch the horizontal axis
22
Q

Mean in binomial distribution

A

mean = np

23
Q

Variance in binomial distribution

A

variance = np(1-p)
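
A Python check, with assumed values n = 10 and p = 0.3, that the mean and variance computed directly from the binomial probabilities agree with np and np(1 - p).

```python
from math import comb

n, p = 10, 0.3   # assumed example values

# P(X = x) for x = 0..n
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(mean, n * p)                 # both about 3.0
print(variance, n * p * (1 - p))   # both about 2.1
```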

24
Q

Null hypothesis

A

What we assume to be true unless the data give evidence against it.
e.g. there is no difference between the groups

25
Q

Alternative hypothesis

A

What we are testing for.
e.g. there is a difference between the groups

26
Q

Type I error

A

Incorrect rejection of a true null hypothesis (a false positive)

27
Q

Type II error

A

Failing to reject a null hypothesis that is actually false (a false negative)

28
Q

Chi-squared test

A

Tests whether the observed counts in different categories fit the counts expected under the null hypothesis.
Compare the test statistic with the critical value from a table.
Degrees of freedom = number of categories - 1

29
Q

Chi squared equation

A

Chi squared = sum (O-E)^2 / E

O = observed frequency
E = expected frequency
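
A Python sketch of the statistic for a goodness-of-fit test. The die-rolling data are hypothetical; every expected count is 20, so the limits on expected numbers are met.

```python
def chi_squared(observed, expected):
    # Sum over categories of (O - E)^2 / E
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 120 rolls of a die; H0 is that the die is fair,
# so each face is expected 120 / 6 = 20 times.
observed = [25, 17, 15, 23, 24, 16]
expected = [20] * 6

stat = chi_squared(observed, expected)
df = len(observed) - 1
print(stat, df)   # about 5.0 with 5 degrees of freedom
```

For 5 degrees of freedom, the 5% critical value in standard chi-squared tables is about 11.07, so a statistic of 5.0 would not lead to rejecting the null hypothesis here.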

30
Q

Limits on expected numbers - Chi squared

(3)

A
  • No expected category should be less than 1
  • No more than a fifth of the expected values should be less than 5.
  • These limits apply to the expected values only; the observed values can be anything.
31
Q

What to do if expected numbers don’t meet these limits

A
  • Collect larger samples
  • Amalgamate categories
32
Q

Regression Analysis

A

Fits a straight line to a scatterplot.
* x is the independent variable
* y is the dependent variable.

33
Q

SSE

A

SSE = Sum of squared differences between the actual y values and the y values predicted by the regression line

Sum of Squares Error
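
A Python sketch of SSE for an assumed fitted line, y = 2x + 1, with made-up data.

```python
def sse(actual, predicted):
    # Sum of squared differences between actual and predicted y values
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))


x = [1, 2, 3, 4]
y = [3.5, 4.5, 7.5, 8.5]
y_hat = [2 * xi + 1 for xi in x]   # hypothetical line: [3, 5, 7, 9]
print(sse(y, y_hat))               # 1.0
```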

34
Q

Finding regression line equation

A

Gradient: $m = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
Intercept: the line passes through the means, so $c = \bar{y} - m\bar{x}$

35
Q

Regression line always goes through…

A

The means of x and y
$(\bar{x}, \bar{y})$

36
Q

$\sum (x - \bar{x}) = \sum (y - \bar{y}) =$

A

0

37
Q

ANOVA

A

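Analysis of variance.
Splits the variation in y into two parts: the difference between each predicted value and the mean (regression) and the difference between each actual value and its predicted value (error). Each set of differences is squared and summed to give SSR and SSE; together they make the total, SST = SSR + SSE, and R^2 = SSR / SST.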
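
A Python sketch of the decomposition with illustrative data: it fits the least-squares line, computes SSR, SSE and SST, and confirms SSR + SSE = SST before taking R^2 = SSR / SST.

```python
def anova_r_squared(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    m = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
         / sum((a - x_bar) ** 2 for a in x))
    c = y_bar - m * x_bar
    y_hat = [m * a + c for a in x]                      # predicted values

    ssr = sum((p - y_bar) ** 2 for p in y_hat)          # predicted vs mean (regression)
    sse = sum((b - p) ** 2 for b, p in zip(y, y_hat))   # actual vs predicted (error)
    sst = sum((b - y_bar) ** 2 for b in y)              # actual vs mean (total)
    return ssr, sse, sst, ssr / sst


x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.3, 7.9, 10.1]
ssr, sse, sst, r2 = anova_r_squared(x, y)
print(ssr + sse, sst)   # equal, up to floating-point rounding
print(r2)               # about 0.998
```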

38
Q

SEM

A

Standard error of the mean
SD divided by square root of sample size.
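
A Python sketch using the standard library's sample standard deviation; the data are illustrative.

```python
import math
import statistics


def sem(values):
    # Standard error of the mean: sample SD divided by the square root of n
    return statistics.stdev(values) / math.sqrt(len(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sem(data))   # about 0.756
```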

39
Q

Correlation coefficient

A

Square root of R^2 from the ANOVA card, given the same sign as the slope of the regression line.

40
Q

Non-parametric tests

(5)

A
  • Spearman’s rank correlation
  • Mann-Whitney test
  • Wilcoxon paired sample test
  • Kruskal-Wallis
  • Friedman
40
Q

Non-parametric tests

Definition

A

For when data are not normally distributed: these tests do not assume a normal distribution (they typically work with ranks).

41
Q

Spearman’s rank test

(4)

A
  • Measures the strength of association between two variables
  • Non-parametric
  • Use when the variables are not normally distributed, or the data are ordinal.
  • Gives a correlation coefficient, r_s, between -1 and +1
42
Q

Mann-Whitney test

A
  • Non-parametric equivalent to the unpaired t-test.
  • Tests for significant differences between the medians of two independent groups.
  • Uses ranking
  • Compares the calculated U with a table value of U
  • If the calculated U is equal to or lower than the table value, we reject Ho
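
A Python sketch of U using the common rank-sum formula U1 = R1 - n1(n1 + 1)/2, with U taken as the smaller of U1 and U2 = n1 n2 - U1; tied values share their average rank. The two groups are made-up data.

```python
def mann_whitney_u(a, b):
    """U = min(U1, U2) from the rank sum of group a; ties share their average rank."""
    combined = sorted(a + b)
    positions = {}
    for pos, v in enumerate(combined, start=1):
        positions.setdefault(v, []).append(pos)
    avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}

    n1, n2 = len(a), len(b)
    r1 = sum(avg_rank[v] for v in a)   # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)


group_a = [12, 15, 11, 18, 14]
group_b = [22, 19, 25, 17, 21]
print(mann_whitney_u(group_a, group_b))   # 1.0
```

The calculated U would then be compared with the table value for n1 = n2 = 5 at the chosen significance level.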
43
Q

Wilcoxon paired sample

A
  • non-parametric equivalent to the paired t-test
  • Tests for significant differences between medians of two paired observations.
  • Compares the test statistic with a table of critical values
  • If the test statistic is equal to or smaller than the table value, we reject Ho
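
A Python sketch of the signed-rank statistic under common textbook conventions (assumed here): zero differences are dropped, tied absolute differences share their average rank, and the test statistic T is the smaller of the positive and negative rank sums. The before/after data are made up.

```python
def wilcoxon_t(before, after):
    """Signed-rank statistic T: the smaller of the positive and negative rank sums."""
    diffs = [y - x for x, y in zip(before, after) if y != x]   # drop zero differences
    abs_sorted = sorted(abs(d) for d in diffs)
    positions = {}
    for pos, v in enumerate(abs_sorted, start=1):
        positions.setdefault(v, []).append(pos)
    avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}   # ties share the average rank

    w_plus = sum(avg_rank[abs(d)] for d in diffs if d > 0)
    w_minus = sum(avg_rank[abs(d)] for d in diffs if d < 0)
    return min(w_plus, w_minus)


before = [10, 12, 9, 14, 11, 13, 8]
after = [13, 15, 9, 17, 10, 16, 12]
print(wilcoxon_t(before, after))   # 1.0 (6 non-zero differences)
```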
44
Q

Kruskal-Wallis

A
  • non-parametric one way analysis of variance
  • Alternative to one-way ANOVA
  • Detect differences in the medians between 3 or more treatments of different subjects.
  • Extension of Mann-Whitney for more groups
  • Sample size doesn’t need to be the same.
  • Test statistic (H) is compared with the chi-squared distribution.
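
A Python sketch of H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), where N is the total sample size and R_i and n_i are the rank sum and size of group i. It omits the correction for ties, and the groups (deliberately of different sizes) are made up.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H without the tie correction; tied values share their average rank."""
    combined = sorted(v for g in groups for v in g)
    positions = {}
    for pos, v in enumerate(combined, start=1):
        positions.setdefault(v, []).append(pos)
    avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}

    n_total = len(combined)
    # sum over groups of R_i^2 / n_i
    rank_term = sum(sum(avg_rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 * rank_term / (n_total * (n_total + 1)) - 3 * (n_total + 1)


a = [1, 2, 3]
b = [4, 5, 6, 7]
c = [8, 9]
h = kruskal_wallis_h(a, b, c)
df = 3 - 1      # number of groups - 1
print(h, df)    # about 7.0 with 2 degrees of freedom
```

H is then compared with the chi-squared distribution with (number of groups - 1) degrees of freedom.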
45
Q

Friedman’s

A
  • non-parametric two way analysis of variance
  • non-parametric alternative to two-way ANOVA
  • detect difference in the medians between 3 or more treatments of the same subjects
  • sample sizes must be the same (every subject receives every treatment)
  • Gives a test statistic that is compared with the chi-squared table
46
Q

Negative skew

A

mean < median < mode

47
Q

Positive skew

A

Mode < median < mean

48
Q

t-test

A
  • unpaired
  • assumes equal variances
49
Q

One sample t-test

A

Is there a difference between the group’s mean and a known population mean?
Is the mean what it should be?

50
Q

Two sample t-test

A

Are the means of two independent groups the same?

51
Q

Paired samples t-test

A

Is there a difference between the means of the same subjects measured twice (e.g. at two points in time)?
The order of conditions can be counterbalanced to control extraneous variables such as order effects.

52
Q

Shapiro-Wilk test

A

Checks whether the data are normally distributed

53
Q

ANOVA of multiple groups

A

Compares the means of 3 or more groups.
Tells us if there is a difference, but doesn’t tell us where.
If the result is significant, run a post-hoc test such as Tukey’s HSD to find which groups differ.

54
Q

Welch test

A

Student’s t-test assumes equal variances; if the variances are not equal, use Welch’s t-test instead.

55
Q

Bonferroni correction

A

When multiple tests are done, the chance of making at least one Type I error increases.
Bonferroni-corrected significance level = original significance level / number of tests
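
A Python sketch with four hypothetical p-values. The family-wise error formula, 1 - (1 - alpha)^m, assumes the tests are independent; it shows why the chance of a Type I error grows with the number of tests.

```python
def bonferroni_alpha(alpha, n_tests):
    # Corrected significance level used for each individual test
    return alpha / n_tests


def familywise_error(alpha, n_tests):
    # Chance of at least one Type I error across n independent tests
    return 1 - (1 - alpha) ** n_tests


p_values = [0.004, 0.020, 0.030, 0.300]                  # hypothetical results of 4 tests
alpha_corrected = bonferroni_alpha(0.05, len(p_values))  # 0.0125
significant = [p < alpha_corrected for p in p_values]
print(significant)                 # [True, False, False, False]
print(familywise_error(0.05, 4))   # about 0.185 without correction
```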