Week 19 Flashcards
What are three types of values?
Categorical - value comes from one of n non-numeric categories, e.g. favourite colour
Numerical - a measured or counted quantity, e.g. height of student
Ordinal - one of n ordered categories, e.g. number of stars rating
What are mean, median and mode known as?
Measures of central tendency
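A minimal sketch of the three measures using Python's standard `statistics` module. The data set is hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample data (heights in cm), for illustration only.
heights_cm = [150, 160, 160, 170, 185]

mean = statistics.mean(heights_cm)      # arithmetic average
median = statistics.median(heights_cm)  # middle value of the sorted data
mode = statistics.mode(heights_cm)      # most frequent value

print(mean, median, mode)
```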
What are descriptive statistics?
Describe/summarize the data, e.g. compute an average, without extrapolating beyond the data
What are measures of variability?
Measure how spread out the data is around the mean
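As a rough illustration, common measures of variability (range, variance, standard deviation) can be computed with the standard `statistics` module. The data set is made up for the example.

```python
import statistics

# Hypothetical data, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(scores) - min(scores)        # largest minus smallest value
pop_variance = statistics.pvariance(scores)   # mean squared deviation from the mean
pop_stddev = statistics.pstdev(scores)        # square root of the population variance
sample_stddev = statistics.stdev(scores)      # divides by N - 1 instead of N

print(data_range, pop_variance, pop_stddev, sample_stddev)
```

The sample version divides by N - 1, so it comes out slightly larger than the population version on the same data.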
What are inferential statistics?
Statistics which make conclusions that go beyond the sample data
What are the two interpretations of probability?
Relative frequency: how often something happens on average
Degree of belief: subjective opinion of some individual regarding how certain an event is to occur (used when the experiment is not really repeatable)
What is the sample space and event?
Sample/outcome space: Set of all possible outcomes (e.g. {1,2,3,4,5,6} for rolling a die)
Event: subset of the sample space (e.g. event E of getting a value less than 4: E={1,2,3})
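The die example can be sketched with Python sets. This assumes a fair die, so every outcome is equally likely and the probability of an event is its size divided by the size of the sample space.

```python
from fractions import Fraction

# Sample space for rolling one die.
sample_space = {1, 2, 3, 4, 5, 6}

# Event E: the roll is less than 4.
event = {outcome for outcome in sample_space if outcome < 4}

# Assumption: a fair die, so all outcomes are equally likely.
probability = Fraction(len(event), len(sample_space))

print(sorted(event), probability)
```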
How is a random variable used with probability?
Assigns a number to each outcome in the sample space, so events can be described by its values. E.g. experiment where 3 coins are tossed:
Y = number of heads
Range is 0-3
Y=0 corresponds to {TTT}
What are the two types of random variables?
Discrete: Takes countable values, e.g. number of heads
Continuous: takes any real value in a range, e.g. 1.534
What is a discrete probability distribution?
P(X=x) gives the probability of each possible value x; the probabilities sum to 1
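Reusing the three-coin example from the random variable card, a sketch of the discrete distribution of Y = number of heads. It assumes fair coins, so all 8 outcomes are equally likely; the function name `number_of_heads` is just an illustrative choice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of tossing 3 fair coins, e.g. "HTT".
outcomes = ["".join(tosses) for tosses in product("HT", repeat=3)]

def number_of_heads(outcome):
    """Random variable Y: maps an outcome to its number of heads."""
    return outcome.count("H")

# Count how many outcomes map to each value of Y, then normalise.
counts = Counter(number_of_heads(outcome) for outcome in outcomes)
pmf = {y: Fraction(count, len(outcomes)) for y, count in sorted(counts.items())}

for y, probability in pmf.items():
    print(f"P(Y={y}) = {probability}")
```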
What is a continuous probability distribution?
Defined by a probability density function: the area under it over a range gives the probability that X falls in that range (the probability of any single exact value is 0)
With normal distributions,
__% of the data is within the first standard deviation from the mean
__% is within two stddev
__% is within three stddev
68%
95%
99.7%
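These percentages can be checked with `statistics.NormalDist` from the standard library. The mean and stddev below are hypothetical; the result is the same for any normal distribution.

```python
from statistics import NormalDist

# Hypothetical normal distribution: heights with mean 170 cm, stddev 10 cm.
heights = NormalDist(mu=170, sigma=10)

def fraction_within(dist, k):
    """Probability that a value lies within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(f"within {k} stddev: {fraction_within(heights, k):.1%}")
```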
What is a standard normal distribution?
When mean = 0 and stddev = 1
How can a normally distributed X be converted to a Z (standard normal) distribution?
Z = (X - mean) / stddev
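A small sketch of the conversion, with made-up numbers. The helper `to_z` is a hypothetical name; the check at the end shows that a value has the same cumulative probability before and after standardising.

```python
from statistics import NormalDist

def to_z(x, mean, stddev):
    """Standardise x: how many standard deviations it lies from the mean."""
    return (x - mean) / stddev

# Hypothetical distribution: mean 170, stddev 10.
heights = NormalDist(mu=170, sigma=10)

z = to_z(185, heights.mean, heights.stdev)

# Same cumulative probability under X and under the standard normal Z.
p_original = heights.cdf(185)
p_standard = NormalDist().cdf(z)

print(z, p_original, p_standard)
```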
What is the difference between population and sample?
Population is a universe of individuals you’re interested in (e.g. all people in Colchester, infinite number of coin flips)
Sample is a subset of a population that should be representative (e.g. 100 coin flips, 100 people from Colchester)
What is the difference between the true mean and s.d. and the sample mean and s.d.?
True = computed over the whole population
Sample = computed over a sample, as an estimate of the true value
What is the primary concern with using sample statistics?
Variability - hard to get a representative sample
What is the sampling distribution?
The distribution of a sample statistic (e.g. the mean) obtained by taking a very large number of samples of size N and plotting the statistic from each.
The random variable is then the sample statistic, not the individual values
What is the standard error?
The standard deviation of the sampling distribution
Aka, the uncertainty of the sample means: if I take different samples, how much do the means vary?
What happens to the standard error as the sample size increases?
Standard error decreases
How can you approximate the standard error from sample standard deviation?
s / sqrt(N)
where s = sample stddev, N = sample size (number of observations in the sample)
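A sketch of the approximation with hypothetical data. The helper name `standard_error` is an illustrative choice. The last two lines show why the standard error shrinks as the sample grows: with the same s, quadrupling N halves it.

```python
import math
import statistics

def standard_error(s, n):
    """Approximate standard error of the mean: s / sqrt(N)."""
    return s / math.sqrt(n)

# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
s = statistics.stdev(sample)          # sample standard deviation
se = standard_error(s, len(sample))

# Same s, larger N: the standard error gets smaller.
se_n25 = standard_error(s, 25)
se_n100 = standard_error(s, 100)

print(se, se_n25, se_n100)
```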
What does the central limit theorem say?
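As N becomes large, the sampling distribution of the mean can be approximated by a normal distribution, whatever the shape of the population distribution
Rule of thumb: a sample size of ~30+ is usually enough for the approximation to hold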
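A rough simulation of the idea, under assumptions chosen for the example: the population is exponential with rate 1 (clearly not normal, with mean 1 and stddev 1), each sample has size 30, and 5000 samples are drawn with a fixed seed. The sample means should centre on the population mean, with a spread close to stddev / sqrt(N).

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
N = 30                   # size of each sample
NUM_SAMPLES = 5000       # how many samples to draw

def sample_mean(n):
    """Mean of one sample of size n from an exponential(1) population."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

sample_means = [sample_mean(N) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)   # should be near 1.0
observed_se = statistics.stdev(sample_means)     # spread of the sample means
predicted_se = 1.0 / math.sqrt(N)                # population stddev / sqrt(N)

print(mean_of_means, observed_se, predicted_se)
```

Plotting a histogram of `sample_means` would show the bell shape even though the population itself is heavily skewed.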
What are the implications of the central limit theorem?
Get one sample
Compute sample mean
Get probability of the sample mean under the sampling distribution
- Can get this probability without doing the sampling many times
What is the Z test?
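Collect a sample of size N, compute the sample mean and standard error
z = (sample mean - mean under the null hypothesis) / standard error
Reject the null hypothesis if z < -1.96 or z > 1.96
This is a two-tailed test at the 5% level: 95% of a standard normal distribution lies within 1.96 standard deviations of the mean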
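The steps above can be sketched as a small function. The name `z_test`, its parameters and the data are hypothetical, and the standard error is approximated from the sample standard deviation as s / sqrt(N).

```python
import math
import statistics

def z_test(sample, null_mean, critical=1.96):
    """Return (z, reject) for a two-tailed z test against null_mean."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    return z, abs(z) > critical

# Hypothetical sample of 36 measurements with mean 100.
sample = [98, 102] * 18

z_same, reject_same = z_test(sample, null_mean=100)
z_diff, reject_diff = z_test(sample, null_mean=101)

print(z_same, reject_same)
print(z_diff, reject_diff)
```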
What is the p value?
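The probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one observed. Reject the null hypothesis when p falls below the chosen significance level (e.g. 0.05)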
Look up the p value for the test statistic in a table, e.g. for a two-tailed z test, z = ±1.96 gives p = 0.05
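Instead of a printed table, a two-tailed p value for a z statistic can be computed from the standard normal distribution. A minimal sketch; `two_tailed_p` is a hypothetical helper name.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Probability of a z value at least this far from 0, in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_at_cutoff = two_tailed_p(1.96)   # close to 0.05
p_at_zero = two_tailed_p(0.0)      # no evidence against the null

print(p_at_cutoff, p_at_zero)
```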
What is a common use of the z test?
Check whether a sample mean differs significantly from a known population mean
What is a paired t-test?
Compares two population means using two samples in which each observation in one sample is paired with an observation in the other
e.g. student’s scores before and after a module or course
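A sketch of the paired t statistic for a before/after example. The scores are invented. The statistic is the mean of the paired differences divided by its standard error, with N - 1 degrees of freedom; turning t into a p value needs the t distribution (a table, or a library such as SciPy), which is left out here.

```python
import math
import statistics

# Hypothetical scores for the same 5 students before and after a module.
before = [60, 65, 70, 55, 80]
after = [65, 70, 72, 60, 83]

# Work on the per-student differences, not on the two samples separately.
diffs = [a - b for b, a in zip(before, after)]

mean_diff = statistics.mean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))

t = mean_diff / se_diff
degrees_of_freedom = len(diffs) - 1

print(diffs, t, degrees_of_freedom)
```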
Between-group testing is a form of ___________ testing
Non-paired
What is involved in between-group testing?
An experiment with two or more groups that each have different testing conditions
E.g. control group and test group
Repeated-measures design is also known as ________
and is a form of ________ testing
Within-group design
Paired testing
What is repeated-measures testing?
Using the same subjects in every condition of the experiment
i.e. each testing condition is applied to each subject at some point (as in a longitudinal study)
What are the advantages and disadvantages of between groups design?
Advantages: multiple variables can be tested at the same time
Can save time
Disadvantages: potential scale can be impractical due to limited resources
selection of subjects may not be representative