statistics Flashcards by Maryam Garkuwa

random error

lecture: can be conceptualised as sampling variability

notes: it cannot be avoided completely; the only way to remove it would be to test the whole population, which is usually impossible, but you can minimise it by increasing the sample size
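A rough simulation of the point about sample size. The population (10,000 values drawn from a normal distribution with mean 100 and SD 15) and the helper name `spread_of_sample_means` are made up for illustration only.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, n_samples=500, seed=0):
    """Return the SD of the means of repeated random samples of one size."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, sample_size))
             for _ in range(n_samples)]
    return statistics.stdev(means)

# Hypothetical population, invented for this sketch.
_rng = random.Random(42)
population = [_rng.gauss(100, 15) for _ in range(10_000)]

small = spread_of_sample_means(population, sample_size=10)
large = spread_of_sample_means(population, sample_size=400)
print(f"n=10:  sample means vary with SD {small:.2f}")
print(f"n=400: sample means vary with SD {large:.2f}")
```

The sample means scatter far less around the population mean when each sample is larger, which is what "minimise random error by increasing sample size" means in practice.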

bias (systematic error)

a difference between the observed value and true value due to all causes except sampling variability

random sample

each member of the population has an equal chance of being chosen

properties of a good sample

representative by structure
random
representative by number of cases

how to select a sample

define target population
select a sampling method
determine sample size

high power

large sample size
little scatter

low power

small sample size
scatter is large
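A minimal simulation sketch of how sample size and scatter drive power. Assumptions for illustration only: two groups, a true difference of 5 units, a two-sided z-test at the 5% level with the SD treated as known, and the hypothetical helper name `estimated_power`.

```python
import math
import random
import statistics

def estimated_power(n_per_group, true_difference, sd, n_trials=1000, seed=1):
    """Fraction of simulated two-group studies that detect a real difference.

    Assumption: two-sided z-test at the 5% level, population SD known.
    """
    rng = random.Random(seed)
    standard_error = sd * math.sqrt(2 / n_per_group)
    detected = 0
    for _ in range(n_trials):
        control = [rng.gauss(0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(true_difference, sd) for _ in range(n_per_group)]
        z = (statistics.fmean(treated) - statistics.fmean(control)) / standard_error
        if abs(z) > 1.96:
            detected += 1
    return detected / n_trials

small_sample = estimated_power(n_per_group=10, true_difference=5, sd=10)
large_sample = estimated_power(n_per_group=100, true_difference=5, sd=10)
large_scatter = estimated_power(n_per_group=100, true_difference=5, sd=20)
print(small_sample, large_sample, large_scatter)
```

The large-sample, low-scatter setting detects the difference most often; shrinking the sample or doubling the scatter lowers the estimated power.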

definition of paired data

when 2 or more measurements are made on the same observational unit

descriptive statistics

organising and summarising the data using tables (frequency distributions and relative frequency distributions), histograms, pie charts, etc.

measures of central tendency (mean, median, mode)

central tendency describes location and variation describes spread (red book lec 2)

measures of variability (range, variance, standard deviation)

inferential statistics

using the sample that you worked with to make a general conclusion about the population

uses probability to determine how confident we can be that the conclusions we derive are correct

what are the measures of variation in descriptive statistics

interquartile range (IQR)
variance
SD
range

mean

it’s the balance point

can be heavily affected by outliers, so outliers can make the mean a bad measure of central tendency

median

it’s the middle value when the variables are ranked in order
it's the point that divides a distribution into 2 equal halves

it is largely unaffected by outliers, because it depends only on the order of the values
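A small check of how one outlier moves the mean but barely moves the median. The numbers are invented for illustration.

```python
import statistics

values = [4, 5, 5, 6, 7]
with_outlier = values + [60]  # one extreme value added

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)      # 5.4 -> 14.5
print("median:", median_before, "->", median_after)  # 5 -> 5.5
```

The mean nearly triples while the median shifts by half a unit, which is why the median is the safer summary when outliers are present.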

if you have normally distributed (symmetric) data, how does this affect the central tendency

the mean, median and mode will all be the same

what happens in skewed data to the central tendency

the mean lies further towards the tail of the skew than the median does (because, remember, the mean is affected by outliers)

in skewed data the median and mean both lie further towards the tail than the mode
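A tiny right-skewed data set (made up for illustration) showing the ordering mode, then median, then mean, as you move towards the tail.

```python
import statistics

# Hypothetical right-skewed data: most values are small, a few are large.
skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode = statistics.mode(skewed)
median = statistics.median(skewed)
mean = statistics.mean(skewed)

print("mode:", mode, "median:", median, "mean:", mean)
```

For a left-skewed data set the order would be reversed, with the mean pulled furthest towards the low values.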

mode

the most common data point. it's possible to have more than 1. if all values are unique there is no mode
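A short sketch of finding the mode(s). The helper `modes` is hypothetical and follows the convention on this card that all-unique data has no mode (Python's own `statistics.multimode` would instead return every value in that case).

```python
from collections import Counter

def modes(values):
    """Return the most common value(s), or [] if every value is unique."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # all values unique: no mode, by the card's convention
    return [value for value, count in counts.items() if count == highest]

print(modes([5, 5, 1]))        # one mode
print(modes([1, 2, 2, 3, 3]))  # two modes (bimodal)
print(modes([1, 2, 3]))        # no mode
```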

standard deviation (SD)

takes into account all individual deviations

the larger the SD, the greater the variation around the mean

google: a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range.
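A minimal sketch of how the SD is built from the individual deviations. It uses the sample formula (dividing by n - 1), which is an assumption here since the card does not say which version the lecture used; the data sets are invented.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation computed from each value's deviation from the mean."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))

tight = [9, 10, 11]      # values close to the mean
wide = [0, 10, 20]       # same mean, values spread out
identical = [7, 7, 7]    # no variation at all

print(sample_sd(tight), sample_sd(wide), sample_sd(identical))
# Cross-check against the standard library implementation.
print(statistics.stdev(wide))
```

Both data sets with variation have a mean of 10, but the more spread-out one has the larger SD, and the SD drops to zero only when every value is identical.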

when is SD zero

only when all values are the same

is SD affected by outliers

YES

can range be affected by outliers

yes

interquartile range (IQR)

a measure of variability based on dividing the data set into quartiles

rank the data set first, then take the difference between the third and first quartiles (Q3 - Q1)
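A short sketch comparing the interquartile range with the plain range when the largest value is replaced by an outlier. The data are invented, and `statistics.quantiles` is used with its default method; other quartile conventions can give slightly different cut points.

```python
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)  # quantiles() sorts the data itself
    return q3 - q1

def value_range(values):
    return max(values) - min(values)

data = [2, 4, 4, 5, 6, 7, 8, 9, 10]
with_outlier = data[:-1] + [100]  # largest value replaced by an extreme one

print("IQR:  ", iqr(data), "->", iqr(with_outlier))
print("range:", value_range(data), "->", value_range(with_outlier))
```

The range jumps with the outlier while the interquartile range stays the same, since the quartiles only depend on the middle of the ranked data.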

what is best to use for symmetrical data

mean + SD

what is best to use in skewed data

median and IQR (as they are not affected by outliers)

3 sigma rule - NB

ONLY WORKS for normally distributed data

68% of the data lie within 1 SD of the mean (above and below)

95% of the data lie within 2 SD of the mean (above and below)

99.7% of the data lie within 3 SD of the mean (above and below)
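The percentages in the rule can be checked against the standard normal distribution. A minimal sketch using Python's `statistics.NormalDist`; the helper name `coverage_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def coverage_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.1%}")
```

The exact figures are roughly 68.3%, 95.4% and 99.7%, which the rule rounds for easy recall.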

central limit theorem

1. create a population with a known distribution that is not normal
2. randomly select many samples of equal size from that population
3. tabulate the means of these samples and graph the frequency distribution

it states that if your samples are large enough, the distribution of the means will approximate a normal distribution, even if the population itself is not normal (Gaussian)
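The three steps can be sketched as a simulation. Assumptions for illustration only: an exponential population (an arbitrary right-skewed choice), samples of 50, and skewness as a rough yardstick of how far a distribution is from symmetric.

```python
import random
import statistics

def skewness(values):
    """Skewness: about 0 for a symmetric distribution, positive for a right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((x - mean) / sd) ** 3 for x in values])

rng = random.Random(0)

# Step 1: a population that is clearly not normal (right-skewed).
population = [rng.expovariate(1.0) for _ in range(20_000)]

# Step 2: many random samples of equal size.
sample_size = 50
samples = [rng.sample(population, sample_size) for _ in range(2_000)]

# Step 3: tabulate the sample means.
sample_means = [statistics.fmean(sample) for sample in samples]

population_skew = skewness(population)
means_skew = skewness(sample_means)
print(f"skewness of population:   {population_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
```

The population is strongly skewed, but the distribution of sample means is much closer to symmetric, and it gets closer still as the sample size grows.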

what can be the cause of outliers

invalid data entry
biological diversity
random chance
experimental error
skewed distribution

tests used to eliminate outliers from results

Chauvenet's criterion
Grubbs' test
Peirce's criterion
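A minimal sketch of one of these tests, Chauvenet's criterion, assuming normally distributed data: a value is flagged when the expected number of values at least that far from the mean, in a sample of this size, is below 0.5. The measurements and the helper name are invented for illustration.

```python
import statistics
from statistics import NormalDist

def chauvenet_outliers(values):
    """Return values flagged by Chauvenet's criterion (assumes normal data)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    standard_normal = NormalDist()
    flagged = []
    for x in values:
        z = abs(x - mean) / sd
        # Two-sided probability of a deviation at least this large.
        tail_probability = 2 * (1 - standard_normal.cdf(z))
        if n * tail_probability < 0.5:
            flagged.append(x)
    return flagged

measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 15.0]
print(chauvenet_outliers(measurements))
```

Only the value 15.0 is flagged here. A flagged value is a candidate for checking (data entry, experimental error), not automatic proof that it should be discarded.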

confidence interval for population means: what do we assume

1. the data are normally distributed
2. random, representative sample
3. independent observations

what does a confidence interval depend on

1. sample mean (the estimate of your population mean)
2. SD
3. sample size
4. degree of confidence

what is the purpose of a confidence interval for the mean

it gives a range of values around the sample mean where the true population mean is expected to be located
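A rough sketch of a confidence interval for the mean, built from the sample mean, SD, sample size and degree of confidence. Assumption: it uses the normal (z) multiplier, which is only a reasonable approximation for larger samples; small samples would normally use a t multiplier instead. The data and helper name are invented.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(values, confidence=0.95):
    """Approximate (z-based) confidence interval for the population mean."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical sample of 40 measurements.
values = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3] * 5

low95, high95 = mean_confidence_interval(values, confidence=0.95)
low99, high99 = mean_confidence_interval(values, confidence=0.99)
print(f"95% CI: {low95:.3f} to {high95:.3f}")
print(f"99% CI: {low99:.3f} to {high99:.3f}")
```

Asking for more confidence widens the interval; a larger sample or a smaller SD narrows it.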

regression analysis models

statistical models - describe the relationship between 2 variables
deterministic - hypothesize exact relationships
probabilistic - hypothesize 2 components: 1. deterministic 2. random error

types of regression models

simple (1 explanatory variable) - divides into linear and non-linear
multiple (2+ explanatory variables) - divides into linear and non-linear

regression modelling

1. determine the problem
2. specify the model
3. collect data
4. do descriptive data analysis
5. estimate unknown parameters
6. evaluate the model
7. use the model for prediction
(remember model is 25)

definition of regression analysis

regression analysis (RA) is helpful in assessing specific forms of relationships between variables; the ultimate objective is to predict or estimate the value of 1 variable corresponding to a given value of another variable.
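A minimal sketch of simple linear regression (1 explanatory variable) fitted by ordinary least squares, then used for prediction. The data points and the function names `fit_line` and `predict` are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Estimate y for a given x using the fitted line."""
    return intercept + slope * x

# Hypothetical observations: y is roughly 2x plus some random error.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predicted y at x=6: {predict(6, slope, intercept):.2f}")
```

The fitted line is the deterministic component of the model; the gaps between the observed points and the line are the random error component.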

advantages of mean

1. very sensitive measure
2. can be combined with the means of other groups to give the overall mean
3. considers all the available information
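A small sketch of the point about combining means: group means can be merged into the overall mean if each is weighted by its group size. The groups and the helper name `combined_mean` are invented for illustration.

```python
import math
import statistics

def combined_mean(group_means, group_sizes):
    """Overall mean from group means, weighting each by its group size."""
    total = sum(group_sizes)
    return sum(m * n for m, n in zip(group_means, group_sizes)) / total

group_a = [2, 4, 6]   # mean 4, n = 3
group_b = [10, 20]    # mean 15, n = 2

overall = combined_mean(
    [statistics.fmean(group_a), statistics.fmean(group_b)],
    [len(group_a), len(group_b)],
)
print(overall)                              # 8.4
print(statistics.fmean(group_a + group_b))  # same value from the pooled data
```

The same shortcut does not work for medians or modes, which need the original values.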

disadvantages of mean

1. affected by outliers
2. can only be used on interval or ratio data
3. is only a good summary when the distribution is roughly symmetric (e.g. normal)

advantages of median

1. unaffected by outliers
2. can be used with non-numerical (ordinal) data

disadvantages of median

1. only takes into account order - the actual values are ignored

disadvantage of mode

is a terminal statistic - a given subgroup could make this measure unrepresentative

advantages of mode

1. quick and easy
2. unaffected by outliers
3. can be used at any level of measurement

assumptions about bivariate data

1. For each value of X there is a normally distributed subpopulation of Y values.
2. For each value of Y there is a normally distributed subpopulation of X values.
3. The joint distribution of X and Y is a normal distribution called the bivariate normal distribution.
4. The subpopulations of Y values all have the same variance.
5. The subpopulations of X values all have the same variance.

statistics Flashcards

(43 cards)