Me, Myself and I Flashcards
what does a double mean in stats
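a numeric data type (double-precision floating point, as used in software such as R) - it stores continuous variables, i.e. numbers that can take decimal values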
what is a histogram
a plot of the distribution of numerical data - values are grouped into bins and the height of each bar shows how many values fall in that bin
what is the median
the middle value - the 50% value
what is the mean
the average - the sum of the collection of numbers divided by the number of numbers
what is the mode
the value that occurs most frequently in a set of data
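A minimal Python sketch of the median, mean and mode, using the standard library statistics module. The data set is made up for illustration.

```python
import statistics

# Made-up data set, used only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of the values divided by how many there are
median = statistics.median(values)  # middle value; with an even count, the average of the two middle values
mode = statistics.mode(values)      # the value that occurs most often

print(mean, median, mode)  # 5 4.0 3
```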
what is the first quartile
the middle number between the smallest number and the median of the data set - the 25% value
what is the second quartile
the median value
what is the third quartile
middle number between the median and the maximum value - the 75% value
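The three quartiles can be sketched in Python with statistics.quantiles. The data set is made up, and the choice of method="inclusive" is an assumption: different quartile conventions can give slightly different values on the same data.

```python
import statistics

# Made-up data set, already sorted for readability.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 cuts the data into four equal parts and returns the three cut points.
# method="inclusive" treats the data as the whole population; the default
# "exclusive" method can return slightly different quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)  # 3.0 5.0 7.0
```

The second quartile equals the median of the same data.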
what is a population
all members of a defined group
what is a sample
a small subset of the population
what is a point estimate
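a single value calculated from a sample that serves as the best estimate we have of a population value, e.g. the sample average as an estimate of the population average - more accurate with a larger subset or a subset closer to the total population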
give an example of a 2 sided hypothesis
the ratio of males to females is not 1:1 (the null hypothesis being a 1:1 ratio) - the direction of the difference does not matter
give an example of a 1 sided hypothesis
there are more females than males
what is a one sided hypothesis
A one-sided hypothesis claims that a parameter is either larger or smaller than the value given by the null hypothesis.
what is a two sided hypothesis
A two-sided hypothesis claims that a parameter is simply not equal to the value given by the null hypothesis – the direction does not matter.
what is chi squared
a test in which the number of observations in each class is compared with the number expected under the null hypothesis
what does a very small p value indicate
either that the null hypothesis is incorrect or that there is some sort of bias in the sample - we are inclined to believe that our data diverge from the null hypothesis (an alternative hypothesis may explain our data)
what is the p value
the probability that we would have observed a deviation at least as large as the one in our data if the null hypothesis were correct
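A worked sketch of chi squared and its p value for the 1:1 male to female example, in plain Python. The counts are made up. The p value shortcut with math.erfc only holds for 1 degree of freedom (two classes); other cases need a chi squared table or a statistics package.

```python
import math

# Made-up counts; the null hypothesis is a 1:1 ratio of males to females.
observed = {"male": 40, "female": 60}

total = sum(observed.values())
# Under a 1:1 ratio each class is expected to hold half of the observations.
expected = {group: total / len(observed) for group in observed}

# Chi squared: sum over classes of (observed - expected)^2 / expected.
chi_squared = sum(
    (observed[group] - expected[group]) ** 2 / expected[group]
    for group in observed
)

# With 1 degree of freedom, chi squared is the square of a standard normal
# value, so the p value is the two-tailed normal probability.
p_value = math.erfc(math.sqrt(chi_squared / 2))

print(chi_squared, round(p_value, 4))  # 4.0 0.0455
```

Here p is just under 0.05, so at the usual threshold the data would be taken to diverge from a 1:1 ratio.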
what are type one errors
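false positives - rejecting a null hypothesis that is actually true, e.g. getting p = 0.049 by chance when there is no real effect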
what are type two errors
false negatives - failing to reject a null hypothesis that is actually false, e.g. getting p = 0.051 when there is a real effect
describe the different parts of a box plot
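line in the middle of the box - the median
bottom edge of the box - lower quartile
top edge of the box - upper quartile
whiskers - extend to the most extreme values within 1.5x the interquartile range of the box (so to the max and min values when there are no outliers)
dots outwith the whiskers - outliers, i.e. values more than 1.5x the interquartile range beyond the box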
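The numbers behind a box plot can be sketched in Python as below. The data set is made up, and the 1.5x interquartile range rule and the "inclusive" quartile method are assumptions; plotting software may calculate quartiles slightly differently.

```python
import statistics

# Made-up data set with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

# Box: lower quartile, median and upper quartile.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range, the height of the box

# Whiskers may reach at most 1.5 x IQR beyond the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme values that are still inside the fences.
inside = [v for v in data if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything beyond the fences is drawn as a separate dot.
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(q1, median, q3)             # 3.0 5.0 7.0
print(whisker_low, whisker_high)  # 1 8
print(outliers)                   # [30]
```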
what is an incidental finding
an observation made in the data that we were not intending to find
what is a t test
determines if the mean of one group is statistically different to the mean of another group
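A sketch of the t statistic for two groups, using the Welch (unequal variance) form as an assumption. The groups are made up. The Python standard library has no t distribution, so the sketch stops at the statistic; turning it into a p value needs t tables or a statistics package.

```python
import math
import statistics

# Made-up measurements for two groups.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]

def welch_t(a, b):
    """Difference between the group means divided by its standard error."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

t_statistic = welch_t(group_a, group_b)
print(t_statistic)  # -2.0
```

The further the t statistic is from 0, the stronger the evidence that the two means differ.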
what is the null hypothesis
default expectation that there is no connection between variables (or that there is no difference between them)
what is the 95% confidence interval
If sampling were repeated many times, 95% of the confidence intervals calculated this way would contain the population mean. It does not mean that 95% of the population distribution lies within the interval.
so we are 95% confident that the population mean lies within the defined range
the range within which we are reasonably confident the population average is located
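A minimal sketch of a 95% confidence interval for a mean, assuming the normal approximation (critical value about 1.96). The sample is made up. For a sample this small a t distribution would give a slightly wider interval, so treat the numbers as illustrative only.

```python
import math
import statistics

# Made-up sample of measurements.
sample = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 5.1, 4.7, 5.5]

n = len(sample)
sample_mean = statistics.mean(sample)                    # the point estimate
standard_error = statistics.stdev(sample) / math.sqrt(n)

# Critical value that leaves 2.5% in each tail of a normal distribution.
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(round(lower, 2), round(upper, 2))
```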
what is a regression model
describes the relationship between a dependent variable and one or more independent variables - it shows an association and does not by itself prove a cause and effect relationship
what is a line of best fit
one line which explains the data best
all the points are as close to the line as possible - minimizing the sum of the squared residuals
what are residuals
the difference between the observed values and those predicted by the regression line of best fit
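A sketch of a least squares line of best fit and its residuals in plain Python. The x and y values are made up, and the helper name fit_line is hypothetical.

```python
import statistics

# Made-up data: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(xs, ys):
    """Return (gradient, intercept) of the least squares line."""
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_mean) ** 2 for xi in xs)
    gradient = s_xy / s_xx
    intercept = y_mean - gradient * x_mean
    return gradient, intercept

gradient, intercept = fit_line(x, y)

# Residual = observed value minus the value predicted by the line.
predicted = [intercept + gradient * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(round(gradient, 2), round(intercept, 2))  # 1.99 0.05
print([round(r, 2) for r in residuals])
```

For a least squares line with an intercept, the residuals sum to zero (up to rounding).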
what is multiple r squared
varies from 0 - 1 - the proportion of the variation in the dependent variable that is explained by the model
what is adjusted r squared
the same as multiple r squared except it is adjusted for the number of independent variables in the model - usually lower than multiple r squared
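Both values can be sketched from observed and predicted values. The numbers are made up (the predictions are assumed to come from a fitted line with one independent variable), and the function names are hypothetical.

```python
import statistics

# Made-up observed values and the values predicted by a fitted line.
observed = [2.1, 3.9, 6.2, 7.8, 10.1]
predicted = [2.04, 4.03, 6.02, 8.01, 10.0]

def r_squared(obs, pred):
    """1 - (residual sum of squares / total sum of squares)."""
    obs_mean = statistics.mean(obs)
    ss_residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_total = sum((o - obs_mean) ** 2 for o in obs)
    return 1 - ss_residual / ss_total

def adjusted_r_squared(r2, n, num_predictors):
    """Penalise r squared for the number of independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - num_predictors - 1)

r2 = r_squared(observed, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(observed), num_predictors=1)

print(round(r2, 4), round(adj_r2, 4))  # 0.9973 0.9964
```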
which statistical test provides the gradient and y intercept of data
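linear regression - the fitted line gives the gradient and y intercept (a t test on each of these then checks whether it differs from zero)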
what is a proptable
a table that shows proportions rather than raw counts (e.g. prop.table in R)
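The same idea sketched in Python: count each category, then divide by the total. The observations are made up.

```python
from collections import Counter

# Made-up categorical observations.
observations = ["female", "male", "female", "female", "male"]

counts = Counter(observations)  # raw numbers per category
total = sum(counts.values())

# Proportion = count in the category divided by the total count.
proportions = {category: count / total for category, count in counts.items()}

print(dict(counts))  # {'female': 3, 'male': 2}
print(proportions)   # {'female': 0.6, 'male': 0.4}
```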
if 95% confidence intervals overlap what does this mean
there may not be a significant difference between the 2 categories - the overlap gives insight into the robustness of any differences
explain the difference between 95% confidence intervals in large and small samples
large sample - interval is narrow
small sample - interval is wide
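A short sketch of why this happens, assuming the normal approximation: the half width of the interval is about 1.96 x standard deviation / sqrt(n), so it shrinks as the sample grows. The standard deviation and sample sizes are made up, and ci_half_width is a hypothetical helper.

```python
import math

def ci_half_width(sd, n, z=1.96):
    """Approximate half width of a 95% confidence interval for a mean."""
    return z * sd / math.sqrt(n)

# Same spread in the data, different sample sizes.
small_sample = ci_half_width(sd=10, n=25)    # 1.96 * 10 / 5
large_sample = ci_half_width(sd=10, n=2500)  # 1.96 * 10 / 50

print(round(small_sample, 3), round(large_sample, 3))  # 3.92 0.392
```

A sample 100 times larger gives an interval 10 times narrower.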
what is a multivariate model
uses multiple variables to forecast possible outcomes
what is a fisher test
an exact alternative to chi squared, used when the sample size is small - when there are fewer than 5 expected observations in a category
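A sketch of a two sided Fisher exact test for a 2x2 table in plain Python. The counts are made up and the function name is hypothetical. It adds up the exact probability of every table that is no more likely than the observed one, keeping the row and column totals fixed.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two sided p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of a table whose top-left cell is x.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), comb(n, col1))

    # Range of top-left values that keep every cell non-negative.
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)

    observed = table_probability(a)
    p = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed
    )
    return float(p)

# Made-up small table: every cell holds fewer than 5 observations.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```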
what kind of condition is colour blindness
X linked recessive - males more likely to have it because they don’t have masking by second X chromosome like females do
what is multiple testing
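carrying out many statistical tests on the same data (e.g. measuring one variable against several others) - the more tests we run, the more likely at least one is a false positive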
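A sketch of why many comparisons are a problem: each comparison is its own test, and each has a chance of a false positive. It assumes the tests are independent and that every null hypothesis is true; the function name is hypothetical.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests gives p < alpha by chance."""
    return 1 - (1 - alpha) ** num_tests

for num_tests in (1, 5, 20):
    print(num_tests, round(chance_of_false_positive(num_tests), 3))
# 1 0.05
# 5 0.226
# 20 0.642
```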
what is cherry picking
selecting only the data or results that make it look like there is an association, rather than using proper experimental design and reporting all the results
what is a cross sectional study
observational study that analyses data from a representative subset at a specific point in time
uses different samples for successive observations
what is a longitudinal study
involves repeated observations of the same variables over longer periods of time (can be a short timeframe too as long as observations are repeated)
uses same sample for successive observations
what does a boxplot show you
a visual representation of the quartiles within numeric data.