Stats and experimental design Flashcards
Define correlational studies
Observing what naturally goes on in the world without directly interfering with it
Define experimental studies
One or more variables are systematically manipulated to see their effect (alone or in combination) on an outcome variable. Statements can be made about cause and effect.
Define longitudinal studies
This term implies that data come from the same people measured repeatedly at different age points. When different people represent each age point, the study is cross-sectional instead.
What are the 4 rules about “cause”?
- Cause is the producer of an effect, while an effect is produced by a cause.
- The cause can be a person, object, situation, or event that can result in something, while an effect is the result of the actions of the person or the outcome of some chain of events that have happened.
- The cause will in a way explain the reason why the effect happened in the first place.
- The cause naturally precedes an effect, while the effect will always follow it.
What is a simple hypothesis?
There is a relationship between two variables: one is the independent variable (the cause) and the other is the dependent variable (the effect).
What is a complex hypothesis?
A relationship between variables exists, and there is more than one independent or dependent variable.
What is an empirical hypothesis?
The theory is put to the test using actual observation and experiment.
What is a null hypothesis?
No relationship between dependent and independent variables
What is an alternative hypothesis?
The opposite of a null hypothesis. When there are multiple hypotheses, the one that is most workable and most efficient is selected.
What is a logical hypothesis?
When the hypothesis is verified logically. The four canons are: agreement, disagreement, difference and residue.
What is a statistical hypothesis?
When the hypothesis can be verified statistically. It is regarded as statistical regardless of whether it is logical or illogical.
Continuous data - interval variable
The difference between 1 and 2 is equivalent to the difference between 99 and 100
Continuous data - ratio variable
The same as an interval variable, but the ratios of scores on the scale must also make sense
Categorical data - binary variable
There are only two categories e.g. dead or alive/sink or swim/0 or 1
Categorical data - Nominal variable
There are more than two categories e.g. omnivore, vegetarian, vegan, or fruitarian.
Categorical data - Ordinal variable
The same as a nominal variable but the categories have a logical order, e.g. whether people got a fail, a pass, a merit or a distinction in their exam.
Quantitative data
Measurable, objective measures expressed as numeric values
Qualitative data
Pain indices; subjective assessments of “condition”; Morphotypes
Measurement error
The discrepancy between the actual value we’re trying to measure, and the number we use to represent that value
e.g. You weigh 75 kg. You stand on your bathroom scales and they say 80 kg. The measurement error is 5 kg.
Index scatterplot
A simple plot of data points against their index (their order in the data set).
Can be used to detect “runs” or can be plotted in a sorted form.
Frequency distribution/histogram
A bar graph where all the bars touch. Values of observations are on the x-axis, with a bar showing how many times each value occurred in the data set.
More continuous than a bar graph
Dotplot
A simple stacking of dots at each measure: like a histogram, but with dots that don’t touch rather than bars that touch
Boxplot
Displays the median and the interquartile range (the box), with whiskers for the spread of the remaining data. The mean is not shown on a standard boxplot.
Calculate the Z-score
Z = (X - Xbar) / s
where X = the observed value, Xbar = the sample mean and s = the standard deviation
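A minimal Python sketch of the Z-score calculation, assuming s is the sample standard deviation (n - 1 denominator). The function name z_scores and the data are made up for illustration.

```python
import statistics

def z_scores(values):
    """Return (x - mean) / s for every x, with s the sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD, uses n - 1 in the denominator
    return [(x - mean) / s for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
z = z_scores(data)
print(z)
```

A value equal to the mean gets Z = 0, and the Z-scores of a sample always have mean 0 and standard deviation 1.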
Define kurtosis and what types there are
Kurtosis = the 'heaviness' of the tails
Leptokurtic = heavy tails (thin peaked line)
Mesokurtic = middle
Platykurtic = light tails (wide flat line)
What is skew and how would a positive and a negative skew be described?
Skew is the asymmetry of the distribution. A positive skew has data bunched at the low values with the tail pointing to the high values; a negative skew is the reverse.
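The direction of skew can be checked numerically. This is a rough sketch using the moment-based skewness m3 / m2^1.5 (one common definition, with no small-sample correction); the function name and data are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_tailed = [1, 1, 2, 2, 3, 10]        # bunched low, tail towards high values
left_tailed = [-x for x in right_tailed]  # mirror image, tail towards low values

print(skewness(right_tailed))  # positive value = positive skew
print(skewness(left_tailed))   # negative value = negative skew
```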
How to calculate the degrees of freedom
Degrees of freedom is the sample size (n) minus the number of parameters estimated from the data
(usually df = n - 1)
Calculate variance
variance = sum of squares/degrees of freedom
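As a worked sketch, the variance can be built up from the sum of squares and the degrees of freedom. It assumes df = n - 1 because one parameter (the mean) is estimated from the data; the function name and data are illustrative only.

```python
def sample_variance(values):
    """Variance = sum of squares / degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    df = n - 1  # one parameter (the mean) was estimated from the data
    return sum_of_squares / df

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean = 5, sum of squares = 32
print(sample_variance(data))     # 32 / 7
```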
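What is Student’s t-distribution and how does it relate to parameter estimation?
Student’s t-distribution is used in place of the normal distribution when the population standard deviation has to be estimated from the sample. It has heavier tails than the normal and approaches it as the degrees of freedom increase. Student’s t test is used for comparing two sample means, where the samples are independent, the variances are constant, and errors are normally distributed. The test statistic is the number of standard errors by which the two sample means are separated.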
Describe a Type I Error
Occurs when we believe that there is a genuine effect in our population when, in fact, there isn’t. The probability is the α-level (usually .05)
Describe a Type II Error
Occurs when we believe that there is no effect in the population when, in reality, there is. The probability is the β-level (often .2)
What are true positives and true negatives?
If it is said that there is an effect in the population and there in fact is, this is a true positive.
If it is said that there is no effect in the population and there in fact is none, this is a true negative.
Relative Risk Ratio equation
(a/(a+b)) / (c/(c+d))
Odds Ratio equation
(a/b) / (c/d)
use when any of the values are below 30
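A short sketch of both ratios for a 2x2 table. It assumes the usual layout, which the formulas imply but the card does not state: a = group 1 with the event, b = group 1 without, c = group 2 with the event, d = group 2 without. The counts are invented.

```python
def relative_risk(a, b, c, d):
    """Risk in group 1 divided by risk in group 2."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds in group 1 divided by odds in group 2."""
    return (a / b) / (c / d)

# Hypothetical counts: 10 of 100 in group 1 and 5 of 100 in group 2 had the event
a, b, c, d = 10, 90, 5, 95
rr = relative_risk(a, b, c, d)  # 0.10 / 0.05
odds = odds_ratio(a, b, c, d)   # (10/90) / (5/95)
print(rr, odds)
```

Both values are above 1 here, so the event is more likely in group 1.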
Odds Ratio value meanings
OR < 1 = event less likely in group 1
OR = 1 = event equally as likely in both groups
OR > 1 = event more likely in group 1
Relative Risk Ratio value meanings
RR = 0 = none of the cases in group 1 had the event occur while x number of cases in group 2 had the event occur
RR = 1 = neutral result: the chance of an event occurring for one group is the same for an event occurring for the other group
Sum of squares (SS)
SUM OF (Xi - Xbar)^2
Variance
SUM OF (Xi - Xbar)^2 / (n - 1)
Covariance
( (X1 - Xbar)(Y1 - Ybar) + (X2 - Xbar)(Y2 - Ybar) + ... + (Xn - Xbar)(Yn - Ybar) ) / (n - 1)
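The covariance sum can be sketched in Python as follows. The function name and the paired data are made up for illustration, and the n - 1 denominator follows the formula above.

```python
def covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross_products / (n - 1)

xs = [1, 2, 3, 4, 5]   # hypothetical paired observations
ys = [2, 4, 6, 8, 10]
print(covariance(xs, ys))  # cross-products sum to 20, so 20 / 4
```

A positive covariance means the two variables tend to deviate from their means in the same direction.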
Standard Deviation (SD/S)
sqrt ( SUM OF (Xi - Xbar)^2 / (n - 1) )
Standard Error (SE)
s / sqrt(n)
Confidence Interval (CI)
Xbar +/- t ( s/sqrt(n) )
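A minimal sketch of the standard error and confidence interval. It assumes t is the two-tailed critical value for n - 1 degrees of freedom, supplied by hand from a t table (2.776 for a 95% interval with df = 4); the function name and data are hypothetical.

```python
import math
import statistics

def confidence_interval(values, t_crit):
    """Return (lower, upper, se) for Xbar +/- t * (s / sqrt(n))."""
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD, n - 1 denominator
    se = s / math.sqrt(n)         # standard error of the mean
    margin = t_crit * se
    return x_bar - margin, x_bar + margin, se

data = [4, 5, 6, 7, 8]  # hypothetical sample, n = 5 so df = 4
T_CRIT_95_DF4 = 2.776   # assumption: value read from a t table
lower, upper, se = confidence_interval(data, T_CRIT_95_DF4)
print(lower, upper, se)
```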
Chi-square test
SUM OF ( (O - E)^2 / E )
where O = observed frequency and E = expected frequency
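The chi-square statistic can be sketched directly from observed and expected frequencies. The function name and counts are invented for illustration; this computes only the statistic, not a p-value.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [30, 20]  # hypothetical counts in two categories
expected = [25, 25]  # expected counts if both categories are equally likely
print(chi_square(observed, expected))  # 25/25 + 25/25
```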
Mean Square (MS)
sum of squares (ss) / degrees of freedom (df)
F-value (F)
mean square of row 1 / mean square of row 2
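A rough sketch of the F-value for a one-way ANOVA. It assumes row 1 of the ANOVA table is the between-groups (treatment) row and row 2 is the within-groups (error) row, which the card does not state. The function name and data are hypothetical.

```python
def one_way_f(groups):
    """F = mean square between groups / mean square within groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Sums of squares for each row of the ANOVA table
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    # Mean square = sum of squares / degrees of freedom
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(all_values) - len(groups))
    return ms_between / ms_within

groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]  # hypothetical data, three groups
print(one_way_f(groups))  # MS between = 42 / 2, MS within = 6 / 6
```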