Week 19 Flashcards
What are the three types of values?
Categorical - value comes from one of n unordered, non-numeric categories, e.g. favourite colour
Numerical - value is a measured or counted number, e.g. height of student
Ordinal - value comes from one of n categories with a natural order, e.g. star rating (1 to 5 stars)
What are mean, median and mode known as?
Measures of central tendency
What are descriptive statistics?
Statistics that describe or summarize the data you have, e.g. computing an average, without drawing conclusions beyond it (no extrapolation)
What are measures of variability?
Measure how spread out the data is, e.g. range, variance and standard deviation (spread around the mean)
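A minimal sketch using Python's standard statistics module to compute the measures of central tendency and two measures of variability. The list of heights is hypothetical data chosen only for illustration.

```python
import statistics

# Hypothetical sample data: heights of students in cm
heights = [160, 165, 165, 170, 172, 180, 190]

# Measures of central tendency
mean = statistics.mean(heights)
median = statistics.median(heights)
mode = statistics.mode(heights)

# Measures of variability
data_range = max(heights) - min(heights)
sample_sd = statistics.stdev(heights)  # sample s.d. (divides by N - 1)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} sample s.d.={sample_sd:.2f}")
```

All of these only describe the data at hand, so they are descriptive statistics.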
What are inferential statistics?
Statistics that draw conclusions about a population going beyond the sample data, e.g. estimating a population mean from a sample
What are the two interpretations of probability?
Relative frequency: the proportion of times an event occurs over many repeated trials (its long-run frequency)
Degree of belief: an individual's subjective opinion of how likely an event is to occur (used when the experiment cannot really be repeated)
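The relative frequency interpretation can be illustrated with a quick simulation, assuming a fair six-sided die. The function name relative_frequency is hypothetical; the seed only makes the run reproducible.

```python
import random

def relative_frequency(trials: int, seed: int = 0) -> float:
    """Estimate P(rolling a 6) as the proportion of trials that come up 6."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
# As the number of trials grows, the proportion settles near 1/6 (about 0.167)
```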
What is the sample space and event?
Sample/outcome space: Set of all possible outcomes (e.g. {1,2,3,4,5,6} for rolling a die)
Event: a subset of the sample space (e.g. event E of getting a value less than 4: E={1,2,3})
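The die example can be written directly with Python sets. This sketch assumes a fair die, so every outcome is equally likely and P(E) is the number of outcomes in E divided by the size of the sample space.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

# Event E: getting a value less than 4 (a subset of the sample space)
E = {x for x in sample_space if x < 4}

# With equally likely outcomes, P(E) = |E| / |sample space|
p_E = Fraction(len(E), len(sample_space))
print(E, p_E)  # {1, 2, 3} 1/2
```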
How is a random variable used with probability?
Assigns a numerical value to each outcome in the sample space. E.g. experiment where 3 coins are tossed:
Y = number of heads
Possible values of Y are 0, 1, 2, 3
Y=0 corresponds to {TTT}
What are the two types of random variables?
Discrete: Takes countable values, e.g. number of heads
Continuous: can take any real value in an interval, e.g. 1.534
What is a discrete probability distribution?
P(X=x) gives the probability of each possible value x; these probabilities sum to 1
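A sketch of the three-coin example as a discrete probability distribution, assuming fair coins so all 8 outcomes are equally likely. The random variable Y is written as a plain function that maps each outcome to its number of heads.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 2^3 = 8 equally likely outcomes of tossing 3 fair coins
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# Random variable Y maps each outcome to its number of heads
def Y(outcome: str) -> int:
    return outcome.count("H")

# Discrete distribution: P(Y = y) = (outcomes with Y = y) / (all outcomes)
counts = Counter(Y(o) for o in outcomes)
pmf = {y: Fraction(c, len(outcomes)) for y, c in sorted(counts.items())}

for y, p in pmf.items():
    print(f"P(Y={y}) = {p}")
# Expected: P(Y=0)=1/8, P(Y=1)=3/8, P(Y=2)=3/8, P(Y=3)=1/8
```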
What is a continuous probability distribution?
Defined by a probability density function (PDF); the probability that X lies in a range is the area under the PDF over that range (so P(X = x) = 0 for any single value)
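To show that a range probability is an area under the PDF, this sketch integrates a normal PDF numerically (midpoint rule) and compares it with the CDF from Python's statistics.NormalDist. The Normal(170, 10) height distribution and the helper prob_between are hypothetical.

```python
from statistics import NormalDist

# Hypothetical continuous variable: height in cm, assumed Normal(mean=170, sd=10)
X = NormalDist(mu=170, sigma=10)

def prob_between(a: float, b: float, steps: int = 10_000) -> float:
    """P(a <= X <= b) as the area under the PDF, via the midpoint rule."""
    width = (b - a) / steps
    return sum(X.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

area = prob_between(160, 180)
exact = X.cdf(180) - X.cdf(160)  # same probability from the CDF
print(round(area, 4), round(exact, 4))  # 0.6827 0.6827
```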
With normal distributions,
__% of the data is within one standard deviation of the mean
__% is within two stddev
__% is within three stddev
68%
95%
99.7%
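The 68/95/99.7 figures can be checked against the standard normal CDF. This sketch uses statistics.NormalDist(); the function name within_k_sd is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1

def within_k_sd(k: float) -> float:
    """Probability of falling within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {within_k_sd(k):.1%}")
# within 1 sd: 68.3%, within 2 sd: 95.4%, within 3 sd: 99.7%
```

The rounded flashcard values (68%, 95%, 99.7%) match these to the precision usually quoted.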
What is a standard normal distribution?
A normal distribution with mean = 0 and stddev = 1
How can a normally distributed variable X be converted to a standard normal (Z) distribution?
Z = (X - mean) / stddev
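A sketch of standardising a value, assuming hypothetical exam scores distributed Normal(mean=60, sd=8). It also checks that the cumulative probability is the same whether computed on X or on the converted Z value.

```python
from statistics import NormalDist

# Hypothetical: exam scores assumed Normal(mean=60, sd=8)
mean, sd = 60, 8

def to_z(x: float) -> float:
    """Standardise x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd

z = to_z(76)  # 2.0
# The probability is the same whether computed on X or on Z
p_x = NormalDist(mean, sd).cdf(76)
p_z = NormalDist(0, 1).cdf(z)
print(z, round(p_x, 4), round(p_z, 4))  # 2.0 0.9772 0.9772
```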
What is the difference between population and sample?
Population is a universe of individuals you're interested in (e.g. all people in Colchester, an infinite number of coin flips)
Sample is a subset of a population that should be representative (e.g. 100 coin flips, 100 people from Colchester)
What is the difference between the true mean and true s.d. and the sample mean and sample s.d.?
True = computed over the whole population (usually unknown)
Sample = computed from the sample, and used to estimate the true values
What is the primary concern with using sample statistics?
Sampling variability: different samples give different statistics, and it is hard to get a truly representative sample
What is the sampling distribution?
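The distribution of a sample statistic (e.g. the sample mean), obtained by taking a very large number of samples of size N and plotting the statistic computed from each.
The random variable is then the sample statistic, not the individual data values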
What is the standard error?
The standard deviation of the sampling distribution
i.e. the uncertainty of the sample mean: if you take different samples, how much do their means vary?
What happens to the standard error as the sample size increases?
Standard error decreases (in proportion to 1 / sqrt(N))
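A simulation sketch of the sampling distribution: draw many samples of size n, take each sample's mean, and measure the standard deviation of those means (the standard error). The exponential population, the repetition count and the helper name sampling_distribution are all hypothetical choices for illustration.

```python
import random
import statistics

rng = random.Random(42)  # seeded for reproducibility

# Hypothetical population: 100,000 values from an exponential distribution (mean 1, sd 1)
population = [rng.expovariate(1.0) for _ in range(100_000)]

def sampling_distribution(n: int, reps: int = 2_000) -> list[float]:
    """Sample means from `reps` random samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

for n in (5, 20, 80):
    means = sampling_distribution(n)
    # Standard error = standard deviation of the sampling distribution
    print(n, round(statistics.stdev(means), 3))
# The standard error roughly halves each time n is multiplied by 4
```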
How can you approximate the standard error from sample standard deviation?
s / sqrt(N)
where s = sample stddev, N = sample size (number of observations in the one sample)
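A minimal sketch of the s / sqrt(N) estimate from a single sample. The sample values and the function name standard_error are hypothetical.

```python
import math
import statistics

def standard_error(sample: list[float]) -> float:
    """Estimate the standard error of the mean as s / sqrt(N)."""
    s = statistics.stdev(sample)  # sample standard deviation (N - 1 denominator)
    return s / math.sqrt(len(sample))  # N = number of observations in the sample

# Hypothetical sample of 4 observations: mean 4, s = sqrt(8/3)
sample = [2.0, 4.0, 4.0, 6.0]
print(standard_error(sample))  # about 0.816
```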
What does the central limit theorem say?
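As the sample size N becomes large, the sampling distribution of the sample mean can be approximated by a normal distribution, whatever the shape of the population distribution (given finite variance)
Rule of thumb: a sample size of about 30 or more is usually enough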
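A rough check of the central limit theorem, assuming a clearly non-normal (exponential, mean 1, sd 1) population and samples of size 30. If the sampling distribution of the mean is close to normal, about 95% of sample means should fall within 1.96 standard errors of the true mean. The seed and repetition count are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(7)  # seeded for reproducibility

# Hypothetical skewed population: exponential with mean 1 and sd 1 (clearly not normal)
def sample_mean(n: int) -> float:
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n, reps = 30, 10_000
means = [sample_mean(n) for _ in range(reps)]

# Under the normal approximation, about 95% of sample means lie within 1.96 SE of the true mean
se = 1.0 / math.sqrt(n)  # true population sd is 1 for this exponential
share = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps
print(f"{share:.3f}")  # close to 0.95
```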
What are the implications of the central limit theorem?
Get one sample
Compute sample mean
Get probability of the sample mean under the sampling distribution
- You can get this probability without repeating the sampling many times
What is the Z test?
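Collect a sample of N observations, compute the sample mean and standard error
z = (sample mean - hypothesized population mean) / standard error
Reject the null hypothesis if z < -1.96 or z > 1.96
±1.96 standard errors around the mean covers 95% of a normal sampling distribution, so this is a two-tailed test at the 5% significance level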
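A sketch of a two-tailed one-sample z test following the steps on this card. It assumes the sample is large enough for the normal approximation and estimates the standard error as s / sqrt(N). The function name z_test, its alpha parameter and the sample data are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample: list[float], mu0: float, alpha: float = 0.05) -> tuple[float, bool]:
    """Two-tailed one-sample z test of H0: population mean == mu0.

    Returns (z, reject_null). Uses s / sqrt(N) as the standard error.
    """
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample s.d.
    se = s / math.sqrt(n)
    z = (mean - mu0) / se
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z, abs(z) > critical

# Hypothetical sample: 36 observations with mean 11
sample = [10.0, 12.0] * 18
print(z_test(sample, mu0=10.0))  # z is about 5.9, so H0 is rejected
print(z_test(sample, mu0=10.8))  # z is about 1.2, so H0 is not rejected
```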