Test 1 Flashcards
Null Hypothesis
no difference between groups
Alternative Hypothesis
there is a difference between groups; the explanation you accept if the null is rejected
Scale Variable
- ratio: interval scale with a true zero point; numbers can be compared as multiples of each other
example: years
- interval: meaningful numbers with equally sized intervals but no true zero point
example: temperature
Nominal Variable
- categories that are differentiated by name only, with no inherent order
example: male and female
Ordinal Variable
- information is ranked, placed in order of its position on a scale
- sometimes originally scale data that has been converted to ordinal, like small / medium / large
- not quantitative
Continuous Data
- can take any value within a range; precision depends on the ability to measure
- ratio, ordinal and interval may be continuous or discrete
example: height of plants
Discrete Data
- integers
- nominal falls here
example: number of children, leaves
Models
- a model is the explanation of an observed pattern, whether in thought, words, or math
- series of statements that explain why observations have occurred
- verbal models are non-mathematical, but can be quantified with math
- empirical models are math equations used to predict
- theoretical models study the processes themselves
Sample
- collection of observations
- the number is called the sample size
- measured characteristics of the sample are called statistics
- characteristics of the whole population are called parameters
Statistical Inference
-inferring about a whole population from a sample
Simple Random sampling
- basic method of collecting a sample
- randomness has to be built in deliberately
- e.g., have a machine (random number generator) pick which units you sample
Stratified Sample
- used where you know the population is HETEROgeneous
- divide it into homogeneous subgroups (strata) with no overlap, then sample within each
Systematic sampling
- establish groups and take the nth group every time; assumes the population is HOMOgeneous
Clustered groups
- mix of stratified and systematic
- establish clusters known to be HOMOgeneous within, so systematic samples can be taken from them
What is random sampling?
- underlying assumption of essentially all inferential stats
- all possible measures in the population must have an equal chance of being chosen
BIASED
- when certain measures are more likely to appear in the sample than others
- can be ignored if the bias is known to have no effect
Significant Figures
- the number of figures you report tells the reader your range of error
rule 1: keep as many figures as possible during calculations to reduce rounding error
rule 2: for a mean, report one more decimal place than the data (the last one is uncertain). example: saying 2 cm implies about one cm of error, i.e., the true value lies roughly between 1.5 and 2.49 cm
DON’T overstate your accuracy. Naughty.
Frequency Distribution
- number of observations per category
- can be graphed to check visually that the distribution is roughly normal before moving forward
- understand underlying shape and identify outliers
Measure of central tendency
- a statistic that describes where the middle of the sample is concentrated
- mean and median usually
Arithmetic mean
- mean or average for a population
- informs us of the central point of the sample, which hopefully matches the population's
Weighted arithmetic mean
- allows you to weight sample values by their frequency
example: averaging grades across class periods when most students are in the middle section, so that section is weighted more heavily than the others (see the sketch below)
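A minimal Python sketch of the weighted mean idea, using made-up grades and class sizes:

```python
# Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)
# Hypothetical numbers: mean grade of each class period, weighted by class size.
section_means = [72.0, 85.0, 78.0]   # mean grade per period (made-up)
section_sizes = [12, 48, 15]         # students per period (made-up weights)

weighted = sum(w * x for w, x in zip(section_sizes, section_means)) / sum(section_sizes)
print(round(weighted, 1))  # 81.5 -- pulled toward the big middle section (plain mean is ~78.3)
```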
Probability
- likelihood of a given event, expressed either as a relative frequency or from knowledge of the given system
- if you repeat infinite number of times, what percentage will turn out this way?
Relative Frequency
- frequency of the event of interest / total number of events
- ranges 0 to 1 (0 to 100%)
- may be done with or without replacement
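A tiny sketch, with made-up die rolls, of computing a relative frequency:

```python
# Relative frequency = count of the event of interest / total number of events
from collections import Counter

rolls = [3, 1, 6, 6, 2, 6, 4, 5, 6, 3]   # made-up sample of die rolls
rel_freq_six = Counter(rolls)[6] / len(rolls)
print(rel_freq_six)                      # 0.4 -- always between 0 and 1
```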
Permutation
- arrangement in a specific sequence
- linear: n!   circular: n!/(2n)
- taken X at a time: n!/(n-X)!
- number of combinations: n!/(X!(n-X)!)
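The formulas on this card, checked with Python's math module (n and X here are arbitrary example values):

```python
import math

n, X = 5, 3
linear   = math.factorial(n)                                   # n! = 120
circular = math.factorial(n) // (2 * n)                        # n!/(2n) = 12
perms_X  = math.factorial(n) // math.factorial(n - X)          # n!/(n-X)! = 60
combos_X = math.factorial(n) // (math.factorial(X) * math.factorial(n - X))  # n!/(X!(n-X)!) = 10
print(linear, circular, perms_X, combos_X)
# math.perm(n, X) and math.comb(n, X) give the last two directly (Python 3.8+)
```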
Mutually exclusive events
- don’t overlap. ex. You can’t be on time and late at once
- ADDITION rule: P(A or B) = P(A) + P(B)
Not mutually exclusive events
- do intersect
- ADDITION rule minus the shared elements: P(A or B) = P(A) + P(B) - P(A and B)
- when both can happen at the same time, MULTIPLY: P(A and B) = P(A) x P(B|A)
ex. what is the probability of drawing two kings from a deck of cards?
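The two-kings example worked out (drawing without replacement, so the second draw is conditional on the first):

```python
from fractions import Fraction

# P(two kings) = P(first is a king) * P(second is a king | first was a king)
p_two_kings = Fraction(4, 52) * Fraction(3, 51)
print(p_two_kings, float(p_two_kings))   # 1/221, about 0.0045
```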
Type I error
If you reject Ho and Ho is true
Say there is a difference and there is not.
alpha error
false positive
Type II error
If you don’t reject Ho and Ho is false.
Say there is no difference and there is one.
beta error
false negative
Statistical Power
the probability of correctly rejecting the null when it is false, given the sampled population (1 - β)
What affects error?
- larger sample size increases power
- smaller variance increases power
- type I and II error rates trade off against each other
- a larger difference between means increases power
- a one-tailed test has more power than a two-tailed one
(see the power sketch below)
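A rough Monte Carlo sketch of power (1 - β) for a two-sample t-test; the sample size, SD, and true difference are made-up settings you can vary to see the effects listed above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, sd, true_diff = 0.05, 20, 2.0, 1.5   # made-up settings
n_sims = 5000

rejections = 0
for _ in range(n_sims):
    a = rng.normal(0.0, sd, n)            # group centered at 0
    b = rng.normal(true_diff, sd, n)      # group with a real difference
    res = stats.ttest_ind(a, b)
    rejections += res.pvalue < alpha

print(rejections / n_sims)   # estimated power; rises with larger n, larger true_diff, or smaller sd
```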
Confidence limits
- the limits of where we can expect 95 % of means from repeated samples to fall between
- a measure of how precise the estimate is
Symmetry
- skewness is a measure of asymmetry
- can be positive or negative
- a distribution can be symmetrical and still not normal
Kurtosis
- heaviness of the tails
- normal = 0
- positive = long (heavy) tails
- negative = short (light) tails
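A quick check of both cards using scipy (the data here are simulated, not from the course):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
symmetric = rng.normal(size=10_000)        # roughly normal
skewed = rng.exponential(size=10_000)      # long right tail

print(stats.skew(symmetric), stats.kurtosis(symmetric))   # both near 0
print(stats.skew(skewed), stats.kurtosis(skewed))         # positive skew, positive (heavy-tailed) kurtosis
```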
Deduction
theory -> hypothesis -> observe -> confirm
Induction
observe -> pattern -> hypothesis -> theory
Karl Popper
Advocated for the scientific method!
Popperian falsification
trying to falsify a hypothesis rather than confirm it
Ronald Fisher
invented the ANOVA
promoted testing hypotheses against the null hypothesis
Neyman and Pearson
said you need both a null and an alternative hypothesis
Modern Hypothesis Testing steps
- specify Ho, Ha, and appropriate test stat
- specify significance level
- collect data by one or more random samples from the population
- calculate the value of the test statistic to test whether Ho is true
- if the probability of that value, or one more extreme, is less than the specified significance level, then conclude that Ho is false and reject it
- if the probability is greater than or equal to the significance level, conclude there is no evidence Ho is false and accept it (see the worked example below)
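A sketch of these steps with hypothetical data, using a two-sample t-test as the test statistic:

```python
import numpy as np
from scipy import stats

alpha = 0.05                                  # step 2: significance level
rng = np.random.default_rng(42)
group_a = rng.normal(10.0, 2.0, 25)           # step 3: hypothetical random samples
group_b = rng.normal(11.5, 2.0, 25)

res = stats.ttest_ind(group_a, group_b)       # step 4: test statistic and its probability
if res.pvalue < alpha:                        # step 5: reject Ho
    print(f"p = {res.pvalue:.4f} < {alpha}: reject Ho")
else:                                         # step 6: no evidence Ho is false
    print(f"p = {res.pvalue:.4f} >= {alpha}: do not reject Ho")
```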
Hypothesis must be:
- specific
- testable
- produce predictions
1sigma
68.27 % of measurements
2sigma
95.44 % of measurements
2.5sigma
98.76 % of measurements
3sigma
99.73 % of measurements
50 % of measurements
0.67sigma
95% of measurements
1.96sigma
99.9 % of measurements
3.29sigma
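These sigma/percentage pairs can be verified from the normal distribution:

```python
from scipy.stats import norm

for z in (0.67, 1.0, 1.96, 2.0, 2.5, 3.0, 3.29):
    pct = (norm.cdf(z) - norm.cdf(-z)) * 100   # % of measurements within ±z sigma
    print(f"{z:>4} sigma: {pct:.2f} %")
# 0.67 -> ~50 %, 1 -> 68.27 %, 1.96 -> 95 %, 2 -> 95.44 %, 2.5 -> 98.76 %, 3 -> 99.73 %, 3.29 -> ~99.9 %
```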
variance
- subtract the mean from each value, square, and sum the results (the sum of squares), then divide by n - 1
- often use the square root, which is the standard deviation (STDEV)
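The variance calculation written out step by step on a made-up sample:

```python
import math

data = [4.0, 7.0, 6.0, 5.0, 8.0]                 # made-up sample
mean = sum(data) / len(data)
ss = sum((x - mean) ** 2 for x in data)          # sum of squares
variance = ss / (len(data) - 1)                  # divide by n - 1
stdev = math.sqrt(variance)                      # standard deviation
print(variance, stdev)                           # 2.5, ~1.58
```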
Standard error of mean
- STDEV/sqrt(n)
- error bars on histogram
- variation in sample mean
- if high, more sampling is needed; another sample might produce a different mean, so there is not much confidence that the mean represents the population
- if low, the sample mean is probably close to the parametric (population) value
Coefficient of variation (CV)
(SD/mean)*100
- unitless comparison
- a proportion independent of the units of measurement
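SEM and CV computed on the same made-up sample as above:

```python
import math
import statistics

data = [4.0, 7.0, 6.0, 5.0, 8.0]             # same made-up sample
sd = statistics.stdev(data)                  # sample standard deviation
sem = sd / math.sqrt(len(data))              # standard error of the mean
cv = (sd / statistics.mean(data)) * 100      # coefficient of variation, unitless %
print(sem, cv)                               # ~0.71, ~26.4 %
```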
Interquartile distance
- the distance between the first quartile (25th percentile) and the third quartile (75th percentile)
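The quartiles and interquartile distance for a made-up sample:

```python
import numpy as np

data = [2, 4, 5, 5, 6, 7, 8, 9, 12, 15]      # made-up sample
q1, q3 = np.percentile(data, [25, 75])       # first and third quartiles
print(q1, q3, q3 - q1)                       # interquartile distance = Q3 - Q1
```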
Degrees of Freedom
n - 1
- the number of values in the final calculation that are free to vary
- depends on how many parameters are already set
example: the mean is fixed when calculating STDEV, so we subtract one