DATA DESCRIPTION Flashcards
2 types of statistics
DESCRIPTIVE: describe study population
INFERENTIAL: use what we know to infer what we don’t know
3 key factors in designing research
- type of variables
- level of measurements
- extraneous + confounding variables
research design model (6)
- current knowledge
- choose hypothesis to test
- design experiment
- do experiment
- statistical analysis
- interpret + report
5 factors involved in good experimental research design
1) sample size and type of sample
2) accurate variables to reduce error
3) valid measuring instrument
4) practical experiment?
5) cost
why is it important to use research design
- smooth operation
- efficiency
- blueprint for planning
- reduce errors
- reliability
what makes good research design? (3)
1) reliability
2) replication
3) validity
4 types of validity
measurement
internal
external
ecological
Type of variable (3)
CONTINUOUS - temp (figure on a scale)
DISCRETE - no. of symptoms
CATEGORICAL - ethnicity, gender
measurement variables (type of scale) (4)
INTERVAL
RATIO
NOMINAL
ORDINAL
interval scale
order of magnitude
equal intervals on scale
ratio scale
order of magnitude
equal intervals
absolute zero point
nominal scale
attributes only named
e.g: gender - male female
ethnicity - white, black, asian
ordinal scale
attributes only ordered
e.g: 1st, 2nd, 3rd
difference between EXTRANEOUS variables
and
CONFOUNDING variables
EXTRANEOUS: may affect other variables but is not accounted for in the study
CONFOUNDING: type of extraneous variable that directly affects the variables being studied
formula for the position of the median
(n + 1) / 2
gives the position in the ordered data, not the value itself
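A minimal sketch of the position rule, assuming the data are sorted first and that a half position (even n) means averaging the two neighbouring values. The function name median_by_position is made up for this example.

```python
import math

def median_by_position(values):
    """Find the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2          # 1-based position in the ordered data
    lower = ordered[math.floor(position) - 1]  # value at or just below the position
    upper = ordered[math.ceil(position) - 1]   # same value when n is odd
    return (lower + upper) / 2

odd_example = median_by_position([7, 3, 9, 1, 5])       # n = 5, position 3
even_example = median_by_position([7, 3, 9, 1, 5, 11])  # n = 6, position 3.5
```

With five values the median is the 3rd ordered value; with six it is halfway between the 3rd and 4th.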
what does data look like when it's:
1) + skewed
2) normally distributed
3) - skewed
1) bulk of data to the left, long tail to the right
2) equal on both sides
3) bulk of data to the right, long tail to the left
what is a factor
a categorical variable that splits the data into groups
e.g: two categories: undergrad v postgrad
to compare their median, mode etc
MAKING DECISION
if both variables are categorical use…
a contingency table
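A contingency table is just a count of each combination of the two categories. A minimal sketch, assuming two equal-length lists of labels; the data and the names level and smoker are invented for illustration.

```python
from collections import Counter

def contingency_table(first, second):
    """Count how often each pair of categories occurs together."""
    return Counter(zip(first, second))

# Hypothetical data: study level and smoking status for five people
level = ["undergrad", "undergrad", "postgrad", "postgrad", "undergrad"]
smoker = ["yes", "no", "no", "no", "no"]

table = contingency_table(level, smoker)
```

Each key is a (row category, column category) pair and each value is a cell count; a combination that never occurs counts as 0.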
MAKING DECISION
if you have one categorical variable and one continuous use…
compare means/medians
or
collapse and use contingency tables
what type of data is
1) mean
2) Median
best with
1) normal
2) skewed
how to calculate a percentile value
(percentile / 100) x (n + 1)
n = number of observations
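The formula gives a position in the ordered data rather than the value itself. A minimal sketch, assuming linear interpolation when the position falls between two observations and clamping to the ends of the data (both are assumptions, not part of the card).

```python
def percentile_value(values, percentile):
    """Value at the (percentile / 100) x (n + 1) position of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = percentile / 100 * (n + 1)   # 1-based position
    position = min(max(position, 1), n)     # assumption: clamp to the data range
    whole = int(position)
    fraction = position - whole
    if whole == n:                          # already at the largest value
        return float(ordered[-1])
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)

data = [2, 4, 6, 8, 10, 12, 14]             # n = 7
lower_quartile = percentile_value(data, 25)  # position 2
median = percentile_value(data, 50)          # position 4
upper_quartile = percentile_value(data, 75)  # position 6
```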
what is RANGE
difference between highest and lowest value
what is INTERQUARTILE RANGE
difference between upper and lower quartile
what is STANDARD DEVIATION
measures average deviation from mean
what is VARIANCE
standard deviation squared
what are the upper and lower fences
values below the lower fence or above the upper fence are outliers
how to calculate upper and lower fence
Lower fence:
LQ - (1.5 X IQR)
Upper fence:
UQ + (1.5 X IQR)
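A minimal sketch of the fence calculation, using the usual rule that the lower fence subtracts 1.5 x IQR from the LQ and the upper fence adds 1.5 x IQR to the UQ. It assumes quartiles from Python's statistics.quantiles (default "exclusive" method, which uses n + 1 positions); the data are invented.

```python
import statistics

def fences(values):
    """Return (lower fence, upper fence) from the quartiles."""
    lq, _median, uq = statistics.quantiles(values, n=4)
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr

data = [2, 4, 6, 8, 10, 12, 40]   # hypothetical data with one extreme value
lower_fence, upper_fence = fences(data)

# Anything outside the fences is flagged as an outlier
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

Here LQ = 4 and UQ = 12, so IQR = 8 and the fences sit at -8 and 24; only 40 falls outside.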
elements of a box plot: (top to bottom)
5
1) biggest observation below UF
2) UQ
3) Median
4) LQ
5) smallest observation above lower fence
UQ - LQ = IQR
What does SD show
large SD and small SD
spread of data
LARGE SD: data more spread out
SMALL SD: data closer to mean
equation for standard deviation
square root of:
sum of (each observation - mean) squared
divided by
(no. of observations - 1)
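A minimal sketch of the sample SD, assuming the usual formula: the square root of the sum of squared deviations from the mean divided by (n - 1). The variance is the same quantity before taking the square root. The data are invented.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean = 5, sum of squared deviations = 32
sd = sample_sd(data)
variance = sd ** 2                # SD squared, i.e. 32 / 7
```

The result should agree with statistics.stdev(data), which uses the same n - 1 denominator.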
difference between categorical v continuous data
CATEGORICAL
data falls into groups, group counts add to a whole
e.g: BMI categories
CONTINUOUS
data measured on a scale, can take any value
ratio/interval e.g: height
what scale is categorical data measured on
nominal
ordinal
what scale is continuous data measured on
ratio
interval
graphs for categorical data (3)
1) bar chart
2) stacked bar chart
3) pie chart
graphs for continuous data (6)
1) stem and leaf plot
2) histogram
3) box plot
4) bar chart w error bars
5) scatterplots
6) line graph for time series data
when should a scatterplot be used
2 continuous variables
what is an adjusted v non adjusted axis
unadjusted = start from zero
adjusted = start from e.g: 40 as that’s the lowest figure
calculate standard error
SD divided by square root of number of observations
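A minimal sketch, assuming the usual formula SE = SD / √n with the sample SD (n - 1 denominator). The data are invented.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error(data)
```

Because n sits under a square root, quadrupling the sample size only halves the SE.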
when should
1) SD
2) SE
be used
1) describe data you have
2) show how confident you are in estimate of the mean
what does a histogram show
distribution of data
puts into categories e.g: age 1-5, 5-10
x axis = categories
y = frequency in each category
bars touch if continuous
how does a stem and leaf diagram work
stem : all but last digit
leaf : last digit
e.g: 43, 46, 47, 53, 54, 62
4| 3 6 7
5| 3 4
6| 2
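A minimal sketch of building the plot, assuming non-negative whole numbers so that the leaf is simply the last digit. The function name stem_and_leaf is made up for this example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by stem (all but last digit) and list their leaves (last digit)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # e.g. 43 -> stem 4, leaf 3
        stems[stem].append(leaf)
    return [
        f"{stem}| {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]

lines = stem_and_leaf([43, 46, 47, 53, 54, 62])
```

Printing each entry of lines reproduces the three rows in the card above.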
adding or subtracting a constant number to each value in data when scaling
1) __________ SD
2) __________ mean
1) doesn’t change
2) changes mean by amount added or subtracted
when multiplying or dividing each value by a constant
1) SD __________
2) mean _________
1 and 2) both are multiplied or divided by that same constant
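A quick check of both scaling rules on invented weights: adding a constant shifts the mean but leaves the SD alone, while multiplying by a constant multiplies both. The 2.2 factor is an approximate kg-to-lbs conversion used only for illustration.

```python
import statistics

weights_kg = [50.0, 60.0, 70.0, 80.0]       # hypothetical data

shifted = [w + 5 for w in weights_kg]       # add a constant to every value
scaled = [w * 2.2 for w in weights_kg]      # multiply every value by a constant

original_mean = statistics.mean(weights_kg)
original_sd = statistics.stdev(weights_kg)

shifted_mean = statistics.mean(shifted)     # original mean + 5
shifted_sd = statistics.stdev(shifted)      # unchanged
scaled_mean = statistics.mean(scaled)       # original mean x 2.2
scaled_sd = statistics.stdev(scaled)        # original SD x 2.2
```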
when should
SCALING
and
STANDARDISATION
be used
SCALE: to put values in the same units, e.g. one person's weight in lbs, another's in kg
STANDARDISE: to compare against the right reference group, e.g. a boy and a girl both weigh 10kg at 26 months
standardise using gender
what does z score show
number of SDs an observation is from the mean
+ Z score =
- Z score =
+ = observation is above the mean
- = observation is below the mean
what does it mean if the Z score is zero?
observation equals the mean
Z score equation
(observation - mean) / SD
1) mean of Z score =
2) SD of z score =
only when ….
1) 0
2) 1
working with whole data set they were collected from
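A minimal sketch, assuming the usual definition z = (observation - mean) / SD, with the mean and SD taken from the same data set the observations came from. That is the condition under which the z scores themselves have mean 0 and SD 1. The data are invented.

```python
import statistics

def z_scores(values):
    """Convert each observation to the number of SDs it lies from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean = 5
z = z_scores(data)

z_mean = statistics.mean(z)       # 0, up to rounding error
z_sd = statistics.stdev(z)        # 1, up to rounding error
```

Observations of 5 get z = 0, values below 5 get a negative z and values above 5 a positive z.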
in normal distribution curve
what is the % from -1SD to +1SD
68.2%
imagine a normal distribution curve split into 8 ‘columns’ (1 SD wide, plus the two tails beyond 3 SD),
name the % of each column going up then down
0.13%
2.15%
13.6%
34.1%
34.1%
13.6%
2.15%
0.13%
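These percentages can be checked against the standard normal curve. A minimal sketch using Python's statistics.NormalDist (mean 0, SD 1 by default); the helper name area_between is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, SD 1

def area_between(z_low, z_high):
    """Proportion of a normal distribution lying between two z scores."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

within_one_sd = area_between(-1, 1)

# The six 1-SD-wide columns from -3 SD to +3 SD, left to right
bands = [area_between(z, z + 1) for z in range(-3, 3)]

# Each tail beyond 3 SD
lower_tail = standard_normal.cdf(-3)
```

Multiplying by 100 and rounding gives values that match the card to about one decimal place.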
on a ‘NORMAL DISTRIBUTION TABLE’ what does each column mean
along the left side: z score to the first decimal place
along the top: second decimal place of the z score
e.g: 0.66
0.6 along left side
0.06 along top
when can you use a normal distribution table
e.g Q: what proportion of data lies between the mean and a z score of 0.66
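The table lookup for 0.66 can be reproduced in code. A minimal sketch, assuming the table gives the area between the mean and z (some printed tables give the area to the left of z instead, which is 0.5 larger).

```python
from statistics import NormalDist

z = 0.66

# How the z score splits across the table margins
row = int(z * 10) / 10        # 0.6, read down the left side
column = round(z - row, 2)    # 0.06, read along the top

# Area between the mean (z = 0) and z
proportion_mean_to_z = NormalDist().cdf(z) - 0.5
```

The result is about 0.2454, so roughly 24.5% of the data lie between the mean and 0.66 SD above it.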
what’s the difference between a:
SAMPLE
and
POPULATION
SAMPLE: selection from population
POPULATION: whole, large group, everyone who fits the criteria
theory of sampling (3)
1) STATISTICAL ESTIMATION
point/interval estimate
2) TESTING HYPOTHESIS
accept/reject null
3) STATISTICAL INFERENCES
general population statement
limitations of sampling (5)
- less accurate
- changing of units
- misleading conclusion
- need special knowledge
- is sampling possible?
probability sampling methods (4)
1) simple random sampling
2) stratified sampling
3) systematic sampling
4) multistage sampling
non probability sampling methods (4)
1) deliberate sampling
2) convenience sampling
3) snowball sampling
4) quota sampling
PROBABILITY sampling methods:
+ and -
+:
detailed info of pop
measure precisely, unbiased
-:
require skill + expertise
time to plan
cost
simple random sampling
characteristic
everyone has equal chance of being chosen
random number generator
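A minimal sketch of simple random sampling without replacement, assuming the whole population is available as a list. The seed argument is only there to make the example repeatable.

```python
import random

def simple_random_sample(population, size, seed=None):
    """Pick `size` members so that every member has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, size)   # sampling without replacement

population = list(range(1, 101))          # hypothetical IDs 1..100
sample = simple_random_sample(population, 10, seed=1)
```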
stratified sampling
what are strata and what should they have
population split into strata (similar groups)
strata needs homogeneity
same ratio in each strata
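A minimal sketch of stratified sampling, assuming "same ratio" means the same sampling fraction is taken from every stratum (proportional allocation). The strata names and sizes are invented.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Randomly sample the same fraction from each stratum."""
    rng = random.Random(seed)
    chosen = {}
    for name, members in strata.items():
        size = round(len(members) * fraction)   # same ratio in every stratum
        chosen[name] = rng.sample(members, size)
    return chosen

# Hypothetical population: 60 undergrads and 40 postgrads
strata = {
    "undergrad": [f"U{i}" for i in range(60)],
    "postgrad": [f"P{i}" for i in range(40)],
}
chosen = stratified_sample(strata, 0.1, seed=1)
```

A 10% fraction gives 6 undergrads and 4 postgrads, so the sample keeps the 60:40 split of the population.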
systematic sampling
+ and -
order population, e.g: every 5th person
+: simple
smaller variance if the population is ordered
- : estimate error
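A minimal sketch of systematic sampling, assuming the population is already in order and the starting point is chosen at random within the first interval (the random start is an assumption, not stated in the card).

```python
import random

def systematic_sample(population, step, seed=None):
    """Take every `step`-th member after a random start within the first interval."""
    rng = random.Random(seed)
    start = rng.randrange(step)        # random start: 0 .. step - 1
    return population[start::step]

population = list(range(1, 51))        # hypothetical ordered list of 50 people
sample = systematic_sample(population, 5, seed=1)
```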
summarise multistage sampling
e.g:
1) randomly select region
2) randomly select school in region
3) randomly select children in school
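The three stages can be sketched as nested random selections. This assumes a hypothetical nested dictionary of regions, schools and children; all names and sizes are invented.

```python
import random

def multistage_sample(regions, n_regions, n_schools, n_children, seed=None):
    """Randomly pick regions, then schools within them, then children within those."""
    rng = random.Random(seed)
    chosen = []
    for region in rng.sample(sorted(regions), n_regions):            # stage 1
        schools = regions[region]
        for school in rng.sample(sorted(schools), n_schools):        # stage 2
            chosen.extend(rng.sample(schools[school], n_children))   # stage 3
    return chosen

# Hypothetical data: 3 regions x 4 schools x 20 children
regions = {
    f"region{r}": {
        f"school{r}-{s}": [f"child{r}-{s}-{c}" for c in range(20)]
        for s in range(4)
    }
    for r in range(3)
}
children = multistage_sample(regions, n_regions=2, n_schools=2, n_children=5, seed=1)
```

Only the selected schools need a list of children, which is why a complete population list is not required.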
multistage sampling + and -
+:
complete pop list not needed
only need info on selected sample
cheaper if geographically defined
-:
larger errors
NON PROBABILITY SAMPLING + and -
+:
include important units
practical
representative of importance
-:
risk of bias
not reliable
convenience sampling
when to use (3)
use when:
- no clear population
- sampling not clear
- complete list of source not available
snowball sampling
contact few people in target group
get more people contacts from these
quota sampling
non random
select categories then quota e.g: 40% men 60% women
actively look for people to fit this
bias
cheaper
factors that affect reliability of sample (5)
size of sample
representativeness
homogeneity
unbiased
parallel sampling - another sample for test
3 errors in samples
1) SAMPLING VARIABILITY - diff samples from same pop have diff SD + mean
2) SAMPLING ERROR - mean of sample different to mean of pop
3) NON SAMPLING ERROR - error when asking / recording results
SE formula
SD / √ number in sample
when to use SE instead of SD
when using sample means
to determine precision
The Central Limit theorem (3)
1) sample means will have a ‘normal distribution’ (if samples are large enough)
2) mean of sample means = mean of population
3) SD of sample means = SE (population SD / √ number in sample)
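A small simulation can illustrate the theorem. This is a sketch under assumptions: a made-up skewed population (exponential), samples of 30 drawn with replacement, and a fixed seed so the run is repeatable. It checks that the spread of the sample means is close to population SD / √n.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical skewed population (clearly not normally distributed)
population = [rng.expovariate(1.0) for _ in range(10_000)]
sample_size = 30

# Draw many samples and record the mean of each one
sample_means = [
    statistics.fmean(rng.choices(population, k=sample_size))
    for _ in range(2_000)
]

population_mean = statistics.fmean(population)
population_sd = statistics.pstdev(population)

mean_of_means = statistics.fmean(sample_means)     # close to population_mean
sd_of_means = statistics.stdev(sample_means)       # close to expected_se
expected_se = population_sd / sample_size ** 0.5   # SD / square root of n
```

A histogram of sample_means would look roughly bell-shaped even though the population itself is skewed.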