Statistics for Dummies Flashcards
What % constitutes a high response rate to a survey?
Response rate is the number of respondents divided by the number of surveys sent
70%
———-
Some statisticians will settle for nothing less.
Rarely does it get that high, but the lower the response rate, the less credible the results.
A 20% (say) rate could easily mean that more people in the population feel differently than the respondents.
In the research statistics process, there are 6 main steps. Starting with:
1) what is the question to be answered?
2) design a study
3) determine the group of people to be studied
…
What are the next 3?
4) collect the data
5) organise, summarise and analyse the data
6) draw conclusions from the summaries and graphs to answer the question
What is a ‘Parameter’?
A number that summarises a whole population.
Usually we only have a statistic from a sample, which is then said to ‘estimate’ the parameter.
What is a ‘Statistic’?
A number that summarises data from a sample.
By contrast, data from a whole population is a ‘census’, and a summary number computed from a census is called a ‘parameter’ because it refers to the population.
What are ‘Categorical Data’?
Sometimes categories are recorded using numbers, like 1 for male and 2 for female, but the numbers don’t have any numerical meaning
What are ‘Numerical Data’?
Also referred to as Quantitative or measurement data.
Examples: Height, Weight, IQ, Blood pressure
What is a ‘standard score’?
Like +2 or -1
What is the ‘central limit theorem’?
The ‘crown jewel’ of all statistics
What are Z-Values?
The distribution is then called a ‘standard normal distribution’ or ‘Z-Distribution’
What is a ‘Standard Normal Distribution’?
It is the Z-Distribution
Useful for determining percentiles, and what data fall between two values
What is a ‘Z Distribution’?
It is the ‘Standard Normal Distribution’
Useful for determining percentiles, and what data fall between two values
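The two uses named on these cards (percentiles, and the share of data between two values) can be sketched in a few lines. This is only an illustration, assuming Python's standard-library `statistics.NormalDist`; the Z-values 1.0, 2.0 and -1.0 and the 90th percentile are made-up inputs, not taken from the cards.

```python
from statistics import NormalDist

# The standard normal (Z) distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Percentile of a Z-value: the share of the distribution at or below it.
print(round(z.cdf(1.0) * 100, 1))          # 84.1

# Share of the data falling between two Z-values.
print(round(z.cdf(2.0) - z.cdf(-1.0), 4))  # 0.8186

# Going the other way: the Z-value that sits at the 90th percentile.
print(round(z.inv_cdf(0.90), 2))           # 1.28
```

So a Z-value of +1 sits at roughly the 84th percentile, and about 82% of the distribution lies between Z = -1 and Z = +2.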
a single sentence that describes an ‘Experiment’
And often their environment.
The purpose is to pinpoint a cause-and-effect relationship between two variables (drug vs health or placebo vs health)
What is a ‘blind experiment’?
Where bias is controlled because the subjects do not know if they are in the control group or the treatment group
What is a ‘double blind experiment’?
Where bias is controlled on the part of both patients and researchers because none of them know who is in the control group and who is in the treatment group
What is the purpose of sample statistics?
To produce an ‘estimate’ of a population parameter
Explain a 10% Probability vs. ‘9 to 1 odds’
If a horse has a 1 in 10 chance of winning, its probability of winning is 1/10 = 0.10 = 10%
Bookies instead set the chance of losing against the chance of winning: 9/10 vs 1/10. The 10s cancel out, leaving 9/1, or ‘9 to 1’ odds against
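The conversion between a probability and ‘odds against’ can be sketched as below. This is a minimal illustration, not a bookmaker's actual pricing method; the function names `odds_against` and `probability_from_odds_against` are made up for the example.

```python
from fractions import Fraction

def odds_against(p_win: Fraction) -> Fraction:
    """Odds against an event: P(it does not happen) / P(it happens)."""
    if not 0 < p_win < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return (1 - p_win) / p_win

def probability_from_odds_against(against: int, in_favour: int) -> Fraction:
    """Turn 'against to in_favour' odds back into a probability of winning."""
    return Fraction(in_favour, against + in_favour)

p = Fraction(1, 10)                        # the horse's 10% chance
odds = odds_against(p)                     # (9/10) / (1/10) = 9/1
print(f"{odds.numerator} to {odds.denominator}")   # 9 to 1
print(probability_from_odds_against(9, 1))         # 1/10
```

Exact fractions are used here so that the 10s really do cancel, instead of leaving floating-point leftovers.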
Describe the ‘law of averages’
In the long term, results will average out to their expected value. In the short term no one knows what will happen.
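A small simulation can make the long-term versus short-term point concrete. The sketch below assumes a fair coin (expected share of heads 0.5) and a fixed random seed so the run is repeatable; the function name `share_of_heads` is made up for the example.

```python
import random

def share_of_heads(n_flips: int, seed: int = 0) -> float:
    """Flip a simulated fair coin n_flips times and return the share of heads."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# Few flips can land far from 0.5; many flips settle close to it.
for n in (10, 100, 10_000, 1_000_000):
    print(n, share_of_heads(n))
```

With only 10 flips the share of heads can be well away from one half, while with a million flips it should sit very close to it.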
In hypothesis testing, a ‘statistically significant result’ is related to chance in what way?
it means a result with a small probability of happening by chance
What is the range for a p-value?
Between 0 and 1
——
a small p-value indicates strong evidence AGAINST the null hypothesis
What does a small p-value indicate?
strong evidence AGAINST the null hypothesis
———
p-values are between 0 and 1
On a distribution graph or a histogram, if a long tail of data is on the right, which way is the data skewed?
skewed to the right
On a distribution graph or a histogram, if a long tail of data is on the left, which way is the data skewed?
skewed to the left
Does a histogram deal with numerical or categorical data?
Numerical
You need to decide your own groups to put the numbers into
What does the empirical rule state?
In a normally distributed data set, about 68% of values are within 1 SD of the mean, about 95% within 2 SDs and about 99.7% within 3 SDs
——-
By Chebyshev’s inequality, even in non-normal data sets at least about 89% of values lie within 3 SDs of the mean
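Both sets of figures can be checked numerically. The sketch below uses the standard-library error function for the normal case and the Chebyshev bound 1 - 1/k^2 for the general case; the function names are made up for the example.

```python
import math

def normal_within(k: float) -> float:
    """Share of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

def chebyshev_lower_bound(k: float) -> float:
    """Minimum share of ANY distribution within k SDs (only informative for k > 1)."""
    return 1 - 1 / k**2

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4), round(chebyshev_lower_bound(k), 4))
# 1 0.6827 0.0
# 2 0.9545 0.75
# 3 0.9973 0.8889
```

Note that the Chebyshev figures are guaranteed minimums, so a real data set can have more than 89% of its values within 3 SDs, but never less.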
How do you calculate a percentile? E.g. the 90th percentile of n values?
Order the data ascending. Multiply n by the percentile wanted, k, written as a decimal. Round the result UP to the nearest whole number. The value at that position in the ordered data is the kth percentile
——
E.g. the data are 1,2,3,4,5,6,7,8,9,10,11, so n = 11
k sought is 90%
11 * 0.9 = 9.9
Round up = 10
So the 10th value, which is 10, is at the 90th percentile
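The sort, multiply and round-up steps can be sketched as a short function. This follows only what the card states; the function name `percentile_round_up` is made up. When n times k is already a whole number, some textbook versions of the method average that value with the next one. The card does not cover that case, so this sketch simply takes the value at that position.

```python
import math

def percentile_round_up(values, k):
    """k-th percentile (0 < k <= 100) by the sort, multiply, round-up method."""
    ordered = sorted(values)                       # step 1: order ascending
    position = math.ceil(len(ordered) * k / 100)   # steps 2-3: n * k, rounded up
    return ordered[position - 1]                   # positions count from 1

data = list(range(1, 12))              # 1, 2, ..., 11, so n = 11
print(percentile_round_up(data, 90))   # 10
```

Other percentile definitions exist (for example, ones that interpolate between neighbouring values), so software may report slightly different answers for the same data.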
Is probability calculation effective in predicting short term behaviour?
No, it is effective when predicting long term behaviour
If only two outcomes are possible, what are the chances of one of them occurring?
It’s not necessarily 50/50.
The chances are based on the probability of the particular event, not on the fact that there are only two outcomes (otherwise every penalty goal attempt would be 50/50)
Any probability is a number between what and what..?
0 means impossible
1 means certain
All of the probabilities for all possible outcomes must add up to..?
1
So the probability that an outcome does NOT happen is 1 minus the probability that the outcome DOES happen
In probability, what is an event a combination of..?
One or more outcomes
The probability of an event is equal to the sum of the probabilities of the individual outcomes that make up the event
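A short sketch of these probability rules, using one roll of a fair six-sided die as an assumed example (the die and the function name `event_probability` are not from the cards):

```python
from fractions import Fraction

# Assumed example: a fair die, so each of the six outcomes has probability 1/6.
outcomes = {face: Fraction(1, 6) for face in range(1, 7)}

def event_probability(event, outcomes):
    """Sum the probabilities of the individual outcomes that make up the event."""
    return sum(outcomes[o] for o in event)

# All of the probabilities for all possible outcomes add up to 1.
print(sum(outcomes.values()))            # 1

# The event 'roll a 5 or a 6' is a combination of two outcomes.
p_high = event_probability({5, 6}, outcomes)
print(p_high)                            # 1/3

# The probability that it does NOT happen is 1 minus the probability that it does.
print(1 - p_high)                        # 2/3
```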
What is a ‘confidence interval’?
A statistic, plus or minus a margin of error
How can you roughly work out the sample size needed to achieve a particular desired margin of error?
1 / √n * 100 gives the + or - error as a percentage, so * 2 gives the full width of the interval.
In reverse, pick the full width you want, say 5%. Then 5% / 2 = 2.5%, and 2.5 / 100 = 0.025
(1/0.025)^2 = 1600 is the rough estimate of the sample size needed for a +/- 2.5% margin
Presumably assuming a random sample with no bias. The shortcut is for a survey percentage at about 95% confidence
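The rough rule can be run in both directions with a few lines of code. This is a sketch of the rule of thumb only, not an exact sample-size calculation; the function names are made up, and the plus-or-minus error is given in percentage points.

```python
import math

def rough_plus_minus_percent(n: int) -> float:
    """Approximate + or - error, in percentage points, for a sample of size n."""
    return 100 / math.sqrt(n)

def rough_sample_size(plus_minus_percent: float) -> int:
    """Approximate sample size for a desired + or - error in percentage points."""
    raw = (100 / plus_minus_percent) ** 2
    # Round away tiny floating-point noise before rounding up to a whole person.
    return math.ceil(round(raw, 6))

print(rough_sample_size(2.5))            # 1600
print(rough_plus_minus_percent(1600))    # 2.5
```

Because the error shrinks with the square root of n, halving the plus-or-minus error needs about four times the sample: `rough_sample_size(1.25)` gives 6400.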