Me, Myself and I Flashcards

1
Q

what does a double mean in stats

A

a double-precision numeric (decimal) value - in software such as R, continuous variables are usually stored as doubles

2
Q

what is a histogram

A

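a chart showing the distribution of numerical data: the values are grouped into intervals (bins) and the height of each bar shows how many values fall into that bin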
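The binning step behind a histogram can be sketched in Python. This is a minimal illustration that assumes fixed-width bins starting at a chosen value. `histogram_counts` is a hypothetical helper, not a library function, and a real histogram would normally be drawn with a plotting library.

```python
def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each bin [left, left + bin_width)."""
    counts = {}
    for x in values:
        # Left edge of the bin that this value belongs to
        left = start + ((x - start) // bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


data = [1, 2, 2, 3, 7, 8, 8, 9, 12]
for left, count in histogram_counts(data, bin_width=5).items():
    # Text-only "bars": one # per value in the bin
    print(f"[{left}, {left + 5}) {'#' * count}")
```

Empty bins are simply left out of the dictionary here. A plotted histogram would show them as zero-height bars.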

3
Q

what is the median

A

the middle value of the ordered data - the 50% value

4
Q

what is the mean

A

the average - the sum of all the values divided by the number of values

5
Q

what is the mode

A

the value that occurs most frequently in a set of data
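The three averages on the cards above can all be computed with Python's standard `statistics` module. The sketch below uses `multimode`, which returns every most-frequent value, because a data set can have more than one mode. The example data are made up for illustration.

```python
from statistics import mean, median, multimode


def summarise(values):
    """Mean, median and mode(s) of a list of numbers."""
    return {
        "mean": mean(values),        # sum of the values / number of values
        "median": median(values),    # middle value of the sorted data (average of the two middle values if n is even)
        "modes": multimode(values),  # most frequent value(s); there may be more than one
    }


print(summarise([2, 3, 3, 5, 7, 10]))
```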

6
Q

what is the first quartile

A

the middle number between the smallest number and the median of the data set - the 25% value

7
Q

what is the second quartile

A

the median value

8
Q

what is the third quartile

A

middle number between the median and the maximum value - the 75% value
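The quartile definitions on the cards above ("middle number between the smallest number and the median", and so on) match the median-of-halves method. Below is a minimal sketch of that method, assuming at least two values. Software often uses other conventions (for example, interpolation), so tools such as `statistics.quantiles` or R's `quantile()` can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method (needs len >= 2)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median
    upper = data[half + n % 2:]    # values above the median (skips the middle value when n is odd)
    return median(lower), median(data), median(upper)


print(quartiles([1, 3, 5, 7, 9, 11, 13]))
```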

9
Q

what is a population

A

all members of a defined group

10
Q

what is a sample

A

a subset of the population (usually much smaller than it) that is studied in place of the whole population

11
Q

what is a point estimate

A

the single best estimate we have of a population value, e.g. the sample average as an estimate of the population mean - it is more accurate with a larger sample or a sample that is more representative of the total population
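The link between population, sample and point estimate can be illustrated with a simulation. This sketch assumes a hypothetical population of 100,000 normally distributed heights. It draws random samples of increasing size and measures how far each sample average (the point estimate) is from the true population mean. In any single run the errors need not shrink steadily, but they become smaller on average as the sample grows.

```python
import random
from statistics import fmean


def point_estimate_errors(sample_sizes=(10, 100, 1_000, 10_000), seed=42):
    rng = random.Random(seed)
    # Hypothetical population: 100,000 simulated heights in cm
    population = [rng.gauss(170, 10) for _ in range(100_000)]
    true_mean = fmean(population)  # the population parameter we want to estimate
    errors = {}
    for n in sample_sizes:
        sample = rng.sample(population, n)  # random sample without replacement
        estimate = fmean(sample)            # point estimate: the sample average
        errors[n] = abs(estimate - true_mean)
    return errors


for n, err in point_estimate_errors().items():
    print(f"n = {n:>6}: error = {err:.3f} cm")
```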

12
Q

give an example of a 2 sided hypothesis

A

the ratio of males to females is not 1:1 (the null hypothesis being that there is a 1:1 ratio)

13
Q

give an example of a 1 sided hypothesis

A

there are more females than males

14
Q

what is a one sided hypothesis

A

A one-sided hypothesis claims that a parameter is either larger or smaller than the value given by the null hypothesis.

15
Q

what is a two sided hypothesis

A

A two-sided hypothesis claims that a parameter is simply not equal to the value given by the null hypothesis – the direction does not matter.
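The sex-ratio examples above can be tested with an exact binomial test. This sketch assumes a hypothetical sample of 60 females out of 100 and a null hypothesis of a 1:1 ratio (p = 0.5). The one-sided p-value only counts results at least as extreme in the "more females" direction. The two-sided p-value counts extreme results in either direction, so here it is twice as large. The helper names are illustrative, not from a library.

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p(k, n, p=0.5):
    """P(X >= k): alternative hypothesis 'more females than males'."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


def two_sided_p(k, n, p=0.5):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    observed = binomial_pmf(k, n, p)
    total = sum(
        binomial_pmf(i, n, p)
        for i in range(n + 1)
        if binomial_pmf(i, n, p) <= observed * (1 + 1e-7)  # small tolerance for rounding
    )
    return min(1.0, total)


print(one_sided_p(60, 100), two_sided_p(60, 100))
```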

16
Q

what is chi squared

A

a test in which the number of observations in each class is compared with the number expected under the null hypothesis
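The chi squared statistic is the sum of (observed - expected)² / expected over all classes. This sketch applies it to a hypothetical sample of 60 females and 40 males, compared with the 50:50 split expected under a 1:1 null hypothesis. With two classes there is 1 degree of freedom, and in that case only the upper-tail p-value equals `erfc(sqrt(statistic / 2))`. The p-value helper is therefore limited to 1 degree of freedom.

```python
from math import erfc, sqrt


def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(statistic):
    """Upper-tail p-value of a chi squared statistic with 1 degree of freedom."""
    return erfc(sqrt(statistic / 2))


stat = chi_squared_statistic([60, 40], [50, 50])
print(stat, p_value_df1(stat))
```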

17
Q

what does a very small p value indicate

A

either that the null hypothesis is incorrect or that there is some sort of bias in the sample - we are inclined to believe that our data diverge from the null hypothesis (an alternative hypothesis may explain our data)

18
Q

what is the p value

A

the probability of observing a deviation at least as large as the one we observed if the null hypothesis were correct

19
Q

what are type one errors

A

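false positives - rejecting a null hypothesis that is actually true, e.g. treating p = 0.049 as significant at the 0.05 level when there is no real effect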

20
Q

what are type two errors

A

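false negatives - failing to reject a null hypothesis that is actually false, e.g. getting p = 0.051 and concluding there is no effect when there really is one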

21
Q

describe the different parts of a box plot

A

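line in the middle of the box - the median
lower edge of the box - lower quartile
upper edge of the box - upper quartile
whiskers - extend to the most extreme values lying within 1.5x the interquartile range of the box (or to the max and min values if nothing lies beyond that)
dots outwith the whiskers - outliers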
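The numbers behind a box plot can be sketched with Python's `statistics.quantiles`. The sketch assumes the common 1.5 x IQR whisker rule, and `method="inclusive"` is only one of several quartile conventions (plotting tools may use slightly different quartiles). The example data are made up, with one deliberate outlier.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, whisker ends and outliers, using 1.5 x IQR whiskers."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "lower_quartile": q1,
        "median": q2,
        "upper_quartile": q3,
        "whisker_low": min(inside),   # most extreme value still within the fences
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


print(box_plot_summary([2, 4, 4, 5, 6, 7, 8, 30]))
```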

22
Q

what is an incidental finding

A

an observation made in the data that we were not intending to find

23
Q

what is a t test

A

determines whether the mean of one group is statistically different from the mean of another group
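A two-sample t test compares the difference between the group means with the variability within the groups. This sketch computes Welch's version of the t statistic, which does not assume equal variances, together with its degrees of freedom, using the standard library only. Turning t into a p-value needs the t distribution, which the standard library lacks; `scipy.stats.ttest_ind(a, b, equal_var=False)` is one option for that. The two groups are made-up numbers.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [6.2, 6.8, 6.5, 7.1, 6.9]
print(welch_t(group_a, group_b))
```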

24
Q

what is the null hypothesis

A

the default expectation that there is no connection between variables (or that there is no difference between them)

25
Q

what is the 95% confidence interval

A

the range within which we are reasonably confident the population average lies - strictly, if we repeated the sampling many times, 95% of the intervals calculated this way would contain the true population mean (it does not mean that 95% of the population's values lie within the interval)

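The "repeated sampling" meaning of a 95% confidence interval can be checked by simulation. This sketch assumes a hypothetical normal population with a known standard deviation (so the simple z interval, mean ± 1.96 x sigma / sqrt(n), applies). It draws many samples and counts how often the interval contains the true mean. With an unknown standard deviation and a small sample, a t critical value would be used instead.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu=10.0, sigma=2.0, n=50, trials=2000, seed=1):
    """Fraction of 95% z intervals (sigma assumed known) that contain the true mean."""
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = fmean(sample)
        half_width = z * sigma / sqrt(n)
        if m - half_width <= mu <= m + half_width:
            hits += 1
    return hits / trials


print(coverage())  # close to 0.95
```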
26
Q

what is a regression model

A

describes the relationship between a dependent variable and one or more independent variables - on its own it shows an association, not a cause and effect relationship

27
Q

what is a line of best fit

A

the one line that best explains the data | the points are as close to the line as possible, usually by minimizing the sum of the squared residuals (least squares)

28
Q

what are residuals

A

the difference between the observed values and those predicted by the regression line of best fit

29
Q

what is multiple r squared

A

varies from 0 - 1 - the proportion of the variation in the dependent variable that is explained by the model (e.g. 0.8 means 80% of the variation is explained)

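30
Q

what is adjusted r squared

A

the same as multiple r squared except it is adjusted for the number of predictor variables in the model, so adding variables that don't help is penalized - it is never higher than, and usually lower than, multiple r squared
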
31
Q

which statistical test provides the gradient and y intercept of data

A

linear regression (e.g. R's lm()) - its output also includes a t test for each coefficient

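The regression cards above (line of best fit, residuals, multiple and adjusted r squared, gradient and intercept) fit together in one least-squares calculation. This is a minimal sketch for a single predictor; the data points are made up, and `fit_line` is a hypothetical helper rather than a library API. Adjusted r squared uses 1 - (1 - R²)(n - 1)/(n - p - 1) with p = 1 predictor, so it needs more than 2 points.

```python
from statistics import fmean


def fit_line(x, y):
    """Least-squares line of best fit plus residuals, R^2 and adjusted R^2."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx              # gradient
    intercept = my - slope * mx    # y intercept
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r**2 for r in residuals)     # variation left unexplained
    ss_tot = sum((yi - my) ** 2 for yi in y)  # total variation in y
    r2 = 1 - ss_res / ss_tot
    p = 1                                     # number of predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return slope, intercept, residuals, r2, adj_r2


print(fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```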
32
Q

what is a proptable

A

a table (e.g. from R's prop.table()) that shows proportions rather than raw counts

33
Q

if 95% confidence intervals overlap what does this mean

A

there may not be a significant difference between the 2 categories - the amount of overlap gives insight into how robust any difference is

34
Q

explain the difference between 95% confidence intervals in large and small samples

A

large sample - the interval is narrow | small sample - the interval is wide (the width shrinks roughly in proportion to 1/sqrt(n))

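How the width of a 95% confidence interval depends on sample size can be seen from the formula width = 2 x 1.96 x sd / sqrt(n). This sketch assumes a normal-approximation interval and a fixed standard deviation of 2.0 (a made-up value). Multiplying the sample size by 100 makes the interval 10 times narrower.

```python
from math import sqrt
from statistics import NormalDist


def ci95_width(sd, n):
    """Width of a normal-approximation 95% confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    return 2 * z * sd / sqrt(n)


for n in (10, 100, 1000):
    print(n, round(ci95_width(2.0, n), 3))
```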
35
Q

what is a multivariate model

A

a model that uses multiple variables to forecast possible outcomes

36
Q

what is a fisher test

A

Fisher's exact test - used instead of chi squared when the sample size is small, typically when the expected count in any cell of the table is less than 5

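Fisher's exact test avoids the chi squared approximation by working out the exact probability of every 2x2 table with the same row and column totals (a hypergeometric distribution). This is a minimal two-sided sketch. It assumes the common rule of summing all tables that are no more likely than the observed one. `fisher_exact_2x2` is a hypothetical helper, and the counts are made up.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability of x in the top-left cell with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table at least as unlikely as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))


print(fisher_exact_2x2(1, 9, 11, 3))
```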
37
Q

what kind of condition is colour blindness

A

X-linked recessive - males are more likely to have it because, unlike females, they don't have a second X chromosome that can mask the faulty allele

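38
Q

what is multiple testing

A

carrying out many statistical tests on the same data (e.g. measuring one variable against several others) - the more tests you run, the more likely you are to get false positives (type one errors) by chance
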
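The danger of multiple testing can be quantified. If every null hypothesis is true and the m tests are independent, the chance of at least one false positive at level alpha is 1 - (1 - alpha)^m. This sketch shows that probability and the Bonferroni correction (testing each hypothesis at alpha / m), one common way to compensate. The independence assumption applies only to the first formula.

```python
def family_wise_error_rate(alpha, m):
    """Chance of at least one false positive in m independent tests when all nulls are true."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the family-wise error rate at or below alpha."""
    return alpha / m


for m in (1, 5, 20):
    print(m, round(family_wise_error_rate(0.05, m), 3), bonferroni_threshold(0.05, m))
```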
39
Q

what is cherry picking

A

selectively presenting the results that suggest an association while ignoring those that don't, rather than using proper experimental design

40
Q

what is a cross sectional study

A

an observational study that analyses data from a representative subset at a specific point in time - uses different samples for successive observations

41
Q

what is a longitudinal study

A

involves repeated observations of the same variables over a period of time, usually a long one (a short timeframe also works as long as observations are repeated) - uses the same sample for successive observations

42
Q

what does a boxplot show you

A

a visual representation of the quartiles within numeric data