Statistics Flashcards

1
Q

random error

A

lecture: can be conceptualised as sampling variability

notes: it can’t be avoided completely. The only way would be to test the whole population, which is usually impossible, but you can minimise it by increasing the sample size
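
A rough sketch of the point about sample size, assuming a made-up population (mean 50, SD 10) and a hypothetical helper `spread_of_sample_means`. It draws many random samples and measures how much the sample means scatter from one sample to the next:

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 values with mean 50 and SD 10
population = [rng.gauss(50, 10) for _ in range(100_000)]

def spread_of_sample_means(n, repeats=300):
    """SD of the sample means across many random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)    # small samples
spread_large = spread_of_sample_means(1000)  # large samples
print(spread_small, spread_large)
```

The spread for the large samples should come out much smaller, which is the random error shrinking as the sample size grows. It never reaches zero unless the whole population is measured.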

2
Q

bias (systematic error)

A

a difference between the observed value and true value due to all causes except sampling variability

3
Q

random sample

A

each member of the population has an equal chance of being chosen

4
Q

properties of a good sample

A

representative by structure
random
representative by number of cases

5
Q

how to select a sample

A
  1. select a sampling method
  2. define target population
  3. determine sample size
6
Q

high power

A

large sample size
little scatter

7
Q

low power

A

small sample size
scatter is large
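
A minimal simulation of how sample size and scatter drive power. Everything here is an assumption for illustration: a true mean shift of 1, a two-sided z-test with known SD at the 5% level, and the hypothetical helper `estimated_power`. Power is estimated as the fraction of simulated studies that detect the shift:

```python
import math
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)  # two-sided 5% cut-off

def estimated_power(n, sd, true_shift=1.0, trials=2000):
    """Fraction of simulated studies in which a z-test detects the shift."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_shift, sd) for _ in range(n)]
        z = statistics.fmean(sample) / (sd / math.sqrt(n))
        if abs(z) > Z_CRIT:
            hits += 1
    return hits / trials

power_low = estimated_power(n=5, sd=3)              # small sample, large scatter
power_big_sample = estimated_power(n=100, sd=3)     # larger sample
power_low_scatter = estimated_power(n=5, sd=0.5)    # less scatter
print(power_low, power_big_sample, power_low_scatter)
```

Either increasing the sample size or reducing the scatter should raise the estimated power well above the first case.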

8
Q

definition of paired data

A

when 2 or more measurements are made on the same observational unit

9
Q

descriptive statistics

A

organising and summarising the data: tables (frequency distributions and relative frequency distributions), histograms, pie charts etc.

measures of central tendency (mean, median, mode)

central tendency describes location and variation describes SPREAD (red book lec 2)

measures of variability (range, variance, standard deviation)

10
Q

inferential statistics

A

using the sample that you worked with to make a general conclusion

uses probability to determine how confident we can be that the conclusions we derive are correct

11
Q

what are the measures of variation in descriptive statistics

A

interquartile range (IQR)
variance
SD
range

12
Q

mean

A

it’s the balance point

can be heavily affected by outliers, so outliers can make the mean a bad measure of central tendency

13
Q

median

A

it’s the middle value when the values are ranked in order
it’s the point that divides a distribution into 2 equal halves

it’s largely unaffected by outliers, because it depends on the rank order of the values and not on how extreme they are
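
A small made-up example of how one outlier moves the mean but barely moves the median (the numbers are invented for illustration):

```python
import statistics

values = [2, 3, 4, 5, 6]
with_outlier = values + [100]  # one extreme value added

mean_before = statistics.mean(values)            # 4
median_before = statistics.median(values)        # 4
mean_after = statistics.mean(with_outlier)       # 20
median_after = statistics.median(with_outlier)   # 4.5

print(mean_before, mean_after)
print(median_before, median_after)
```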

14
Q

if you have normally distributed (symmetric) data, how does this affect the central tendency

A

mean, median and mode will all be the same

15
Q

what happens in skewed data to the central tendency

A

the mean lies further towards the skew (the long tail) than the median does (because, remember, the mean is affected by outliers)

in skewed data the median and mean both lie further towards the skew than the mode

16
Q

mode

A

the most common data point. it’s possible to have more than 1. if all values are unique there is no mode

17
Q

SD

A

takes into account all individual deviations

the larger the SD, the greater the variation around the mean

google: a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range.
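
A short worked sketch of how every individual deviation feeds into the SD, using invented values. It computes the sample SD step by step (dividing by n - 1) and compares it with the standard library's `statistics.stdev`:

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data

mean = sum(values) / len(values)
# Squared deviation of each value from the mean
squared_devs = [(x - mean) ** 2 for x in values]
# Sample variance divides by n - 1; population variance would divide by n
sample_var = sum(squared_devs) / (len(values) - 1)
sample_sd = math.sqrt(sample_var)

print(sample_sd, statistics.stdev(values))
```

If all values were identical, every deviation would be zero and so would the SD.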

18
Q

when is SD zero

A

only when all values are the same

19
Q

is SD affected by outliers

A

YES

20
Q

can range be affected by outliers

A

yes

21
Q

interquartile range (IQR)

A

a measure of variability based on dividing the data set into quartiles

rank the data set first; the IQR is the distance between the first and third quartiles (Q3 - Q1)
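
A minimal sketch with invented data, using `statistics.quantiles` from the Python standard library. Note that several quartile conventions exist, so other tools may give slightly different cut points. It also shows that one extreme value changes the range but not the quartile-based spread:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # made-up, already ranked

# n=4 splits the ranked data into quartiles and returns the 3 cut points
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Same data with the largest value replaced by an extreme outlier
data_outlier = data[:-1] + [1000]
q1_o, _, q3_o = statistics.quantiles(data_outlier, n=4)
iqr_outlier = q3_o - q1_o

range_plain = max(data) - min(data)
range_outlier = max(data_outlier) - min(data_outlier)

print(iqr, iqr_outlier)            # quartile-based spread
print(range_plain, range_outlier)  # range reacts to the outlier
```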

22
Q

what is best to use in symmetrical data

A

mean + SD

23
Q

what is best to use in skewed data

A

median and IQR (as they are not affected by outliers)

24
Q

3 sigma rule - NB

A

ONLY WORKS for normally distributed data

68% of the data lie within 1 SD of the mean, above and below

95% of the data lie within 2 SD of the mean, above and below

99.7% of the data lie within 3 SD of the mean, above and below
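
The percentages can be checked against the standard normal distribution. A quick sketch using `statistics.NormalDist` from the Python standard library (the helper name `share_within` is just for illustration):

```python
import statistics

standard_normal = statistics.NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The rounded figures on the card (68, 95, 99.7) are the usual shorthand for these values.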

25
Q

central limit theorem

A
  1. create a population with a known distribution that is not normal
  2. randomly select many samples of equal size from that population
  3. tabulate the means of these samples and graph the frequency distribution

It states that if your samples are large enough, the distribution of the means will approximate a normal distribution, even if the population is not normal or ‘gaussian’
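
The three steps can be sketched as a simulation. The assumptions here are all for illustration: an exponential (strongly right-skewed) population, samples of size 50, and a hypothetical `skewness` helper used as a rough check of symmetry in place of a graph:

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def skewness(xs):
    """Rough skewness: about 0 for symmetric data, positive for a right tail."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

# 1. a population with a known, non-normal distribution
population = [rng.expovariate(1.0) for _ in range(50_000)]

# 2. many random samples of equal size
samples = [rng.sample(population, 50) for _ in range(2000)]

# 3. the means of those samples
sample_means = [statistics.fmean(s) for s in samples]

skew_population = skewness(population)
skew_means = skewness(sample_means)
print(skew_population, skew_means)
```

The sample means should be far less skewed than the population, and their average should sit close to the population mean.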

26
Q

what can be the cause of outliers

A

invalid data entry
biological diversity
random chance
experimental error
skewed distribution

27
Q

tests used to eliminate outliers from results

A

Chauvenet’s criterion
Grubbs’ test
Peirce’s criterion

28
Q

confidence interval for the population mean: what do we assume

A
  1. normally distributed
  2. random representative sample
  3. independent observations
29
Q

what does the confidence interval depend on

A
  1. sample mean (the estimate of your population mean)
  2. SD
  3. sample size
  4. degree of confidence
30
Q

what is the purpose of a confidence interval for the mean

mistakes

A

it gives a range of values around the sample mean within which the true population mean is expected to lie
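
A minimal sketch of the calculation, assuming invented measurements and a hypothetical helper `mean_confidence_interval`. As a simplification it uses a normal (z) critical value; for small samples a t critical value is normally used instead, which would make the interval a little wider:

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the population mean: mean +/- z * SD / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    # z critical value for the chosen degree of confidence (normal approximation)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

sample = [4.9, 5.1, 5.3, 4.8, 5.0, 5.2, 4.7, 5.4, 5.0, 5.1]  # made-up data

low95, high95 = mean_confidence_interval(sample, 0.95)
low99, high99 = mean_confidence_interval(sample, 0.99)
print(low95, high95)
print(low99, high99)
```

The interval uses the sample mean, the SD, the sample size and the degree of confidence. Asking for more confidence widens the interval.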

31
Q

heal

A
32
Q

regression analysis models

A

statistical models - describe the relationship between 2 variables
deterministic - hypothesize exact relationships
probabilistic - hypothesize 2 components:

  1. deterministic
  2. random error
33
Q

types of regression models

A

simple (1 explanatory variable): divides into linear and non-linear

multiple (2+ explanatory variables): divides into linear and non-linear

34
Q

regression modelling

A
  1. determine the problem
  2. specify model
  3. collect data
  4. do descriptive data analysis
  5. estimate unknown parameter
  6. evaluate model
  7. use the model for prediction

(remember model is 25 )

35
Q

definition of regression analysis

A

Regression analysis is helpful in assessing specific forms of relationships between variables. The ultimate objective is to predict or estimate the value of 1 variable corresponding to a given value of another variable.
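
A small sketch of simple linear regression by least squares, with invented data and a hypothetical `predict` helper. It estimates the slope and intercept, then predicts y for a new x:

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up responses

mean_x = statistics.fmean(x)
mean_y = statistics.fmean(y)

# Least-squares estimates of the unknown parameters
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

def predict(new_x):
    """Estimated value of y for a given value of x."""
    return intercept + slope * new_x

print(slope, intercept, predict(6))
```

The fitted line is the deterministic part of the model. The gaps between each observed y and the line are the random error part.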

36
Q

advantages of mean

A
  1. very sensitive measure
  2. can be combined with the means of other groups to give the overall mean
  3. considers all the available information
37
Q

disadvantages of mean

A
  1. affected by outliers
  2. can only be used on interval or ratio data
  3. is only a good summary if you have a roughly normal (symmetric) distribution
38
Q

advantages of median

A
  1. unaffected by outliers
  2. can be used with non-numerical (ordinal, ranked) data
39
Q

disadvantages of median

A
  1. only takes into account order - value is ignored
40
Q

disadvantage of mode

A

is a terminal statistic - a given subgroup could make this measure unrepresentative

41
Q

advantages of mode

A
  1. quick and easy
  2. unaffected by outliers
  3. can be used at any level of measurement
42
Q

assumptions about bivariate data

A
  1. For each value of X there is a normally distributed subpopulation of Y values.
  2. For each value of Y there is a normally distributed subpopulation of X values.
  3. The joint distribution of X and Y is a normal distribution called the bivariate normal distribution.
  4. The subpopulations of Y values all have the same variance.
  5. The subpopulations of X values all have the same variance.