Statistics 1 Flashcards
What is the definition of a population?
Every member that has the selected characteristics and shares a common property in a specific region
What is the definition of a sample?
A representative sub-set of a given population, unrelated and chosen at random
What is the difference between the response (dependent) variable and the explanatory (independent) variable?
The response (dependent) variable is the one of interest in an experiment; it changes in response to another factor, the explanatory (independent) variable.
What are the two sub-sets of qualitative data?
Nominal Data - categorical information that lacks inherent order or ranking
Ordinal Data - information with order or ranking, differences between values are not quantifiable e.g. survey responses or educational levels
What are the two sub-sets of quantitative data?
Discontinuous (discrete) - obtained by counting, so values are integers
Continuous - (Most used) obtained by measurement e.g. height, BMI
What type of data is:
Number of carbon atoms in a molecule
Discontinuous quantitative
What type of data is:
Mass of a chemical compound weighed on a balance
Continuous quantitative
What type of data is:
Absorbance measured using a spectrophotometer
Continuous quantitative
What type of data is:
Gender of students in a class
Nominal Qualitative
What type of data is:
Educational levels of students in a class
Ordinal Qualitative
Define Accuracy
Closeness of measurements to the true value
Define precision
Closeness of repeated measurements to each other
Define Data Set
Collection of information based on an experiment or research question, collected in terms of observations and variables, ready to be processed, analyzed, distributed or shared.
Define descriptive statistics
Summarize a set of data values in terms of center and spread
What does average show?
The general tendency of the data
For which distribution of data does the mean represent the true centre?
Normally distributed data
Define variance
Average squared deviation from the mean
Define Standard Deviation
Variability or spread of the data from the mean of the sample
Define Standard Error
Estimated deviation of the sample mean from the population mean (SD divided by the square root of the sample size); it is used to calculate confidence intervals
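The three spread measures above can be sketched with Python's standard library. The readings are made-up values for illustration, and the sample formulas (dividing by n - 1) are assumed, since the cards do not say which form is intended.

```python
import math
import statistics

# Hypothetical sample: five repeated readings (illustrative values only).
sample = [2.0, 4.0, 4.0, 5.0, 5.0]

mean = statistics.mean(sample)
variance = statistics.variance(sample)  # sample variance: squared deviations / (n - 1)
sd = statistics.stdev(sample)           # square root of the sample variance
se = sd / math.sqrt(len(sample))        # standard error of the mean

print(f"mean={mean}, variance={variance}, sd={sd:.4f}, se={se:.4f}")
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by n instead, which matches "average squared deviation" literally when the data are the whole population.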
What is the Confidence Interval?
A range of values, calculated from the sample, that is expected to contain the true population value; with 95% confidence, about 95 of 100 repeated samples would give an interval that contains it
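As a rough illustration, a confidence interval for a mean can be sketched as mean ± z × standard error. The function name and the data below are hypothetical, and the normal approximation is an assumption; for small samples a t critical value would usually be preferred.

```python
import math
import statistics

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * se, mean + z * se

low, high = mean_confidence_interval([2.0, 4.0, 4.0, 5.0, 5.0])
print(f"95% CI: {low:.3f} to {high:.3f}")
```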
Give the basic principles of the coefficient of variation (CoV)
- Larger the number the larger the spread
- Normally expressed as a percentage of the mean
- Useful for comparisons of 2 data sets in different units
Give the formula for the coefficient of variation (CoV)
CoV = (SD/mean)*100
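A minimal sketch of the formula, using two made-up data sets in different units to show why CoV is handy for comparisons. The function name and the values are illustrative, and the sample SD is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """CoV as a percentage: (sample SD / mean) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

heights_cm = [150.0, 160.0, 170.0]  # hypothetical heights
masses_kg = [50.0, 60.0, 70.0]      # hypothetical masses

# Both sets have SD = 10, but relative to their means the masses vary more.
print(coefficient_of_variation(heights_cm))
print(coefficient_of_variation(masses_kg))
```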
Define H0
The null hypothesis - there is no correlation/difference/association
Define H1
The alternative hypothesis - there is a correlation/difference/association
H1 and H0 are mutually exclusive
What is the P value?
The probability (chance) of obtaining results at least as extreme as those observed if the null hypothesis is true. 0.05 (5%) is the conventional statistical cut-off for rejecting H0.
What is the true cutoff for the P value?
0.05/number of predictor variables tested (the Bonferroni correction for multiple comparisons)
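Dividing the cut-off by the number of tests is commonly called the Bonferroni correction. A small sketch with made-up P values (the function name is illustrative):

```python
def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance cut-off after dividing alpha by the number of tests."""
    return alpha / n_tests

# Hypothetical P values, one per predictor variable tested.
p_values = [0.001, 0.02, 0.04]

cutoff = bonferroni_cutoff(0.05, len(p_values))
significant = [p < cutoff for p in p_values]

# All three are below 0.05, but only the first is below the corrected cut-off.
print(cutoff, significant)
```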
Why is it best to use 2-tailed tests rather than 1-tailed?
A hypothesis can be either one-sided or two-sided. A 2-tailed test checks for statistical significance in both directions; if you only test in one direction you may miss an effect in the other direction!
What is the odds ratio?
A value indicating the strength of the relationship between 2 variables in data. It compares the relative odds of the occurrence of the outcome of interest (cancer vs no cancer), given exposure to the variable of interest (age)
What does the Odds Ratio mean in relation to 1?
- OR = 1 variable does not affect the odds of the outcome
- OR > 1 variable associated with higher odds of an outcome (increases the risk of the response variable)
- OR < 1 variable associated with lower odds of an outcome (decreases the risk of the response variable)
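To make the interpretation concrete, here is a sketch that computes an odds ratio from a 2x2 table. The counts and the function name are made up for illustration.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds of the outcome among the exposed divided by the odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_controls
    odds_unexposed = unexposed_cases / unexposed_controls
    return odds_exposed / odds_unexposed

# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed people have the outcome.
example_or = odds_ratio(20, 80, 10, 90)

# A value above 1 means exposure is associated with higher odds of the outcome.
print(example_or)
```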
What is the Z score?
Natural log of the odds ratio / standard error of the log odds ratio
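In practice this ratio is usually computed on the log scale (a Wald test). The sketch below assumes that form and the standard large-sample standard error sqrt(1/a + 1/b + 1/c + 1/d); the counts and the function name are illustrative.

```python
import math
from statistics import NormalDist

def log_odds_ratio_z(a, b, c, d):
    """Z score for a 2x2 table: ln(OR) divided by the standard error of ln(OR).

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or / se

z = log_odds_ratio_z(20, 80, 10, 90)

# Two-sided P value from the standard normal distribution.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z={z:.3f}, p={p_two_sided:.3f}")
```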
What are statistics tests used for?
To test how likely the observed data would be if the null hypothesis were true, and so decide whether to reject it
When would you use a z-test?
When the sample size is large (n ≥ 30) and/or the population variance is known
When would you use a t-test?
When the sample size is small (n<30) and/or the population variance is unknown
When would you use Chi-squared?
Goodness of fit - examine whether the observed results are in line with the expected values (categorical data)
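The goodness-of-fit statistic is the sum of (observed - expected)² / expected over the categories. A sketch with made-up counts (the function name is illustrative):

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories expected to be equally common.
observed = [18, 22, 20]
expected = [20, 20, 20]

chi2 = chi_squared_statistic(observed, expected)
print(chi2)
```

The statistic is then compared with a chi-squared distribution with (number of categories - 1) degrees of freedom; for 2 degrees of freedom the 0.05 critical value is about 5.99.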
When would you use Fisher Exact?
Test of association - gauge if there is a significant difference between the proportions of the categories in two group variables; used in place of Chi-squared when sample sizes are small
When would you use F-test?
Compare variances of 2 samples or the ratio of variances between multiple groups
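For two samples, the F statistic is simply the ratio of the two sample variances. A sketch with made-up data (the function name is illustrative):

```python
import statistics

def f_statistic(sample_a, sample_b):
    """Ratio of the sample variance of sample_a to that of sample_b."""
    return statistics.variance(sample_a) / statistics.variance(sample_b)

# Hypothetical samples: the first is more spread out than the second.
f_value = f_statistic([1.0, 3.0, 5.0], [2.0, 3.0, 4.0])
print(f_value)
```

A value near 1 suggests similar variances; significance is judged against the F distribution with (n1 - 1, n2 - 1) degrees of freedom.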
When would you use ANOVA?
Uses F-tests to statistically test the equality of means across 3 or more groups of quantitative variables
When would you use Wilcoxon Rank?
Compare two groups (rank-sum test for independent samples, signed-rank test for paired samples) - used when data is not normally distributed, as a non-parametric alternative to the t-test
What does the result of a t-statistic mean?
The higher the absolute value, the lower the chance that the two sample means are from the same population
The higher the absolute value of t, the more likely the two sample means are to be different.
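To see why a larger t points to different means, here is a sketch of the two-sample t statistic: the difference in means divided by its standard error. Welch's unequal-variance form is assumed, since the cards do not specify one, and the data and function name are made up.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Difference in sample means divided by its standard error (Welch's form)."""
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    se = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return mean_diff / se

# Hypothetical samples with clearly separated means and equal spread.
t_far = welch_t([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
# Samples with nearly identical means give a t close to zero.
t_near = welch_t([5.0, 6.0, 7.0], [5.5, 6.0, 6.5])
print(t_far, t_near)
```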
What is a Type I error?
False positive
Occurs if you reject H0 when it is actually true, e.g. due to data bias
What is a Type II error?
False negative
Occurs when you fail to reject the null hypothesis when it is actually false, e.g. due to a lack of power