Me, Myself and I Flashcards

1
Q

what does a double mean in stats

A

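a numeric data type (a double-precision floating-point number) used in statistical software such as R - it can hold decimal values, so it is typically used to store a continuous variable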

2
Q

what is a histogram

A

a plot of the distribution of numerical data - the values are grouped into bins and the height of each bar shows how many values fall in that bin

3
Q

what is the median

A

the middle value - the 50% value

4
Q

what is the mean

A

the average - the sum of the collection of numbers divided by the number of numbers

5
Q

what is the mode

A

the value that occurs most frequently in a set of data
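
A minimal sketch of the mean, median and mode using Python's standard `statistics` module. The `values` list is made-up data chosen only for illustration.

```python
import statistics

values = [2, 3, 3, 5, 7, 8, 14]  # made-up data

mean = statistics.mean(values)      # sum of the values divided by how many there are
median = statistics.median(values)  # middle value once the data are sorted
mode = statistics.mode(values)      # value that occurs most often

print(mean, median, mode)
```

The single large value (14) pulls the mean above the median, which is why the median is often preferred for skewed data.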

6
Q

what is the first quartile

A

the middle value between the smallest value and the median of the data set - the 25% value

7
Q

what is the second quartile

A

the median value

8
Q

what is the third quartile

A

middle number between the median and the maximum value - the 75% value
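
A small sketch of the three quartiles using `statistics.quantiles` from the Python standard library. The data are made up, and `method="inclusive"` is one of several conventions; other software may give slightly different quartile values for the same data.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data

# n=4 splits the data into four equal parts and returns the three cut points.
# method="inclusive" treats the data as including its own minimum and maximum.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(q1, q2, q3)
```

The second cut point is the same number that `statistics.median(values)` returns.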

9
Q

what is a population

A

all members of a defined group

10
Q

what is a sample

A

a small subset of the population

11
Q

what is a point estimate

A

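a single value calculated from a sample that serves as our best estimate of a population parameter - e.g. the sample average as an estimate of the population average - it becomes more precise as the sample gets larger and closer to the total population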

12
Q

give an example of a 2 sided hypothesis

A

the ratio of males to females is not 1:1 (it could be higher or lower) - a 1:1 ratio itself would be the null hypothesis

13
Q

give an example of a 1 sided hypothesis

A

there are more females than males

14
Q

what is a one sided hypothesis

A

A one-sided hypothesis claims that a parameter is either larger or smaller than the value given by the null hypothesis.

15
Q

what is a two sided hypothesis

A

A two-sided hypothesis claims that a parameter is simply not equal to the value given by the null hypothesis – the direction does not matter.
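
To illustrate the difference, here is a rough sketch that turns a z statistic into a one-sided and a two-sided p-value with `statistics.NormalDist`. The helper name `p_values_from_z` and the value z = 1.8 are assumptions made for illustration, and the one-sided value assumes the deviation is in the hypothesised direction.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """Return (one-sided, two-sided) p-values for a z statistic."""
    # Area in one tail of the standard normal beyond |z|
    one_tail = 1 - NormalDist().cdf(abs(z))
    # The two-sided test counts deviations in both directions
    return one_tail, 2 * one_tail

one_sided, two_sided = p_values_from_z(1.8)  # z = 1.8 is a made-up value
print(round(one_sided, 4), round(two_sided, 4))
```

With this made-up z value the one-sided p-value falls below 0.05 while the two-sided p-value does not, which is why the choice of hypothesis should be made before looking at the data.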

16
Q

what is chi squared

A

a test in which the number of observations in each class is compared to the number expected under the null hypothesis
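
A minimal sketch of the chi-squared statistic for a 1:1 sex ratio. The counts are hypothetical, and the p-value line uses a closed form that only holds for 1 degree of freedom (two classes); a statistics library would normally be used for the general case.

```python
import math

def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [60, 40]  # hypothetical counts: 60 females, 40 males
expected = [50, 50]  # counts expected under a 1:1 null hypothesis

stat = chi_squared(observed, expected)

# With two classes there is 1 degree of freedom, and the upper-tail
# probability of a chi-squared(1) variable equals erfc(sqrt(stat / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

print(stat, round(p_value, 4))
```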

17
Q

what does a very small p value indicate

A

either that the null hypothesis is incorrect or that there is some sort of bias in the sample - we are inclined to believe that our data diverge from the null hypothesis (an alternative hypothesis may explain our data)

18
Q

what is the p value

A

the probability that we would have observed a deviation at least as large as the one in our data if the null hypothesis were correct

19
Q

what are type one errors

A

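false positives - rejecting the null hypothesis when it is actually true, e.g. getting p = 0.049 by chance when there is no real effect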

20
Q

what are type two errors

A

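false negatives - failing to reject the null hypothesis when it is actually false, e.g. getting p = 0.051 when there is a real effect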

21
Q

describe the different parts of a box plot

A

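line in the middle of the box - the median
bottom edge of the box - lower quartile
top edge of the box - upper quartile
whiskers - extend to the most extreme values within 1.5x the interquartile range of the box, or to the max and min values if there are no outliers
dots outside the whiskers - outliers (values more than 1.5x the interquartile range beyond the box)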
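
The numbers behind a box plot can be sketched as below. The data are made up, and the quartile convention (`method="inclusive"`) is an assumption; plotting libraries may compute quartiles slightly differently.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # made-up data with one extreme value

q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# Points beyond these fences are drawn as individual dots
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme values that are still inside the fences
inside = [v for v in values if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```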

22
Q

what is an incidental finding

A

an observation made in the data that we were not intending to find

23
Q

what is a t test

A

determines if the mean of one group is statistically different to the mean of another group
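
A rough sketch of the t statistic for two independent groups, using the Welch form (which does not assume equal variances). The function name `welch_t` and both data sets are made up for illustration. Turning the statistic into a p-value needs the t distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind`) would normally be used for that.

```python
import math
import statistics

def welch_t(a, b):
    """Difference in group means divided by its standard error."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

group_a = [5.1, 4.9, 5.6, 5.8, 6.0]  # made-up measurements
group_b = [4.2, 4.8, 4.4, 5.0, 4.6]  # made-up measurements

t_stat = welch_t(group_a, group_b)
print(round(t_stat, 2))
```

The further the statistic is from 0, the less compatible the data are with the two groups having the same mean.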

24
Q

what is the null hypothesis

A

the default expectation that there is no connection between variables (or that there is no difference between groups)

25
Q

what is the 95% confidence interval

A

the range within which we are reasonably confident the population average is located
if we repeated the sampling many times, about 95% of the intervals calculated this way would contain the true population mean
it does not mean that 95% of the individual values in the population lie within the interval
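
A minimal sketch of a 95% confidence interval for a mean. It uses the normal critical value (about 1.96), which is an approximation; for small samples a t critical value would give a somewhat wider interval. The function name `mean_ci_95` and the data are made up for illustration.

```python
import math
import statistics

def mean_ci_95(sample):
    """Approximate 95% confidence interval for the population mean."""
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    centre = statistics.mean(sample)
    # Standard error shrinks with the square root of the sample size
    half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return centre - half_width, centre + half_width

small_sample = [4, 6, 5, 7, 3]    # made-up data, n = 5
large_sample = small_sample * 20  # same values repeated, n = 100

small_low, small_high = mean_ci_95(small_sample)
large_low, large_high = mean_ci_95(large_sample)
print(small_high - small_low, large_high - large_low)
```

Both intervals are centred on the same sample mean, but the one from the larger sample is much narrower.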

26
Q

what is a regression model

A

describes the relationship between a dependent variable and one or more independent variables - it shows association and allows prediction, but does not by itself prove a cause and effect relationship

27
Q

what is a line of best fit

A

the one straight line which explains the data best

all the points are as close to the line as possible - it minimizes the sum of the squared residuals

28
Q

what are residuals

A

the difference between the observed values and those predicted by the regression line of best fit
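
A short sketch of fitting a least-squares line and calculating its residuals by hand. The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

xs = [1, 2, 3, 4, 5]               # made-up independent variable
ys = [2.1, 3.9, 6.2, 7.8, 10.1]    # made-up dependent variable

gradient, intercept = fit_line(xs, ys)

# Residual = observed value minus the value predicted by the line
residuals = [y - (intercept + gradient * x) for x, y in zip(xs, ys)]
print(gradient, intercept, residuals)
```

For a least-squares line with an intercept, the residuals always sum to zero (up to rounding), so it is their squares that are minimized.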

29
Q

what is multiple r squared

A

varies from 0 to 1 - tells you the proportion of the variation in the dependent variable that is explained by the model

30
Q

what is adjusted r squared

A

the same as multiple r squared except it is penalised for the number of independent variables in the model, so it is usually lower than multiple r squared
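
Both quantities can be sketched with the usual formulas: R squared = 1 - SS_residual / SS_total, and adjusted R squared = 1 - (1 - R squared)(n - 1) / (n - p - 1), where n is the number of observations and p the number of independent variables. The function names and the observed and predicted values below are made up for illustration.

```python
def r_squared(observed, predicted):
    """Proportion of the variation in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_residual / ss_total

def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalise R squared for the number of independent variables used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

observed = [3, 5, 7, 9]           # made-up observations
predicted = [2.8, 5.4, 6.6, 9.2]  # made-up predictions from a one-variable model

r2 = r_squared(observed, predicted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(observed), n_predictors=1)
print(round(r2, 3), round(adj_r2, 3))
```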

31
Q

which statistical test provides the gradient and y intercept of data

A

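linear regression - the gradient and y intercept are the coefficients of the fitted line (the t tests reported alongside them check whether each coefficient differs from zero)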

32
Q

what is a proptable

A

a table that shows proportions rather than raw counts (for example, the output of `prop.table` in R)
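
A tiny sketch of turning raw counts into proportions, using `collections.Counter`. The observations are made up for illustration.

```python
from collections import Counter

observations = ["female", "male", "female", "female", "male"]  # made-up data

counts = Counter(observations)  # raw counts per category
total = sum(counts.values())

# Each proportion is the category count divided by the total count
proportions = {category: n / total for category, n in counts.items()}
print(proportions)
```

The proportions always add up to 1, which makes groups of different sizes easier to compare.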

33
Q

if 95% confidence intervals overlap what does this mean

A

there may not be a significant difference between the 2 categories - the amount of overlap gives insight into how robust any difference is

34
Q

explain the difference between 95% confidence intervals in large and small samples

A

large sample - interval is narrow

small sample - interval is wide

35
Q

what is a multivariate model

A

uses multiple variables to forecast possible outcomes

36
Q

what is a fisher test

A

Fisher's exact test - used instead of chi squared when the sample size is small, commonly when fewer than 5 observations are expected in a category
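
A sketch of a two-sided Fisher's exact test for a 2x2 table, built from the hypergeometric distribution with `math.comb`. The function name `fisher_exact_two_sided` and the example table are assumptions for illustration; the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell when all margins are fixed
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every table that is as unlikely as, or less likely than, the observed one
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

p_value = fisher_exact_two_sided(3, 1, 1, 3)  # made-up 2x2 table of counts
print(round(p_value, 4))
```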

37
Q

what kind of condition is colour blindness

A

X linked recessive - males more likely to have it because they don’t have masking by second X chromosome like females do

38
Q

what is multiple testing

A

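running many statistical tests on the same data, e.g. measuring one variable against several others - the more tests are run, the more likely it is that at least one gives a false positive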
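
A small sketch of why multiple testing matters. Assuming the tests are independent and every null hypothesis is true, the chance of at least one false positive is 1 - (1 - alpha)^m for m tests. The function names are made up for illustration; the Bonferroni correction shown is one simple way to compensate.

```python
def family_wise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Stricter per-test threshold that keeps the overall error rate near alpha."""
    return alpha / n_tests

rate_20_tests = family_wise_error_rate(0.05, 20)
per_test_threshold = bonferroni_threshold(0.05, 20)
print(round(rate_20_tests, 3), per_test_threshold)
```

With 20 tests at the usual 0.05 threshold, a false positive somewhere is more likely than not.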

39
Q

what is cherry picking

A

selectively reporting the data or results that make it look like there is an association, while ignoring those that do not, rather than using proper experimental design

40
Q

what is a cross sectional study

A

observational study that analyses data from a representative subset at a specific point in time
uses different samples for successive observations

41
Q

what is a longitudinal study

A

involves repeated observations of the same variables over longer periods of time (can be a short timeframe too as long as observations are repeated)
uses same sample for successive observations

42
Q

what does a boxplot show you

A

a visual representation of the quartiles within numeric data.