introduction Flashcards
a variable is
a characteristic that describes an aspect of participants in research, e.g. gender, blood pressure
two main types of variable
categorical
quantitative
categorical data
usually descriptive labels rather than numerical measurements
- participants are classified into categories
two types of categorical variable
nominal and ordinal
nominal variables
unordered labelled characteristics
- include binary variables (just two categories)
- observations can be assigned a code in the form of a number where the numbers are simply labels
- can count but not order
example of a nominal variable
blood group: A, B, AB, O
ordinal variable
- small set of ordered categories
- categories might be labels or numbers
- observations can be ranked
e.g. house numbers and swimming level
example of ordinal variable
disease severity: none, mild, moderate, severe
categorical data are recorded as
numbers which represent specific categories e.g.
Gender: (1) male, (2) female
quantitative variables
values have quantitative meaning: the higher the number, the more there is of the concept
e.g. a higher number for age means you’re older
quantitative variables are also known as
continuous
definition of a distribution
the different values that occur for a given variable and the frequency with which each value occurs
categorical data can be described using
- frequency tables
- bar charts
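A frequency table just counts how often each category occurs. A minimal sketch using `collections.Counter`; the blood-group data are hypothetical, and the row of `#` characters is only a crude stand-in for a bar chart.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percentage) rows, most common category first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

# Hypothetical blood-group data, for illustration only
blood_groups = ["A", "O", "B", "O", "A", "O", "AB", "A", "O", "B"]

for category, count, pct in frequency_table(blood_groups):
    # One '#' per observation gives a rough text bar chart
    print(f"{category:>2}  {count:>2}  {pct:5.1f}%  {'#' * count}")
```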
how to describe quantitative data
average, variation, symmetry
average
the value that characterises the middle of the distribution
variation
spread or dispersion: how far apart the values are from each other
symmetry
for each person with a score below the average, is there a corresponding person with a score the same distance above the average?
types of average
mean, median, mode
mean
sum of the scores divided by the number of scores
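The definition translates directly into code. A minimal sketch; the function name `mean` and the scores are illustrative.

```python
def mean(scores):
    """Sum of the scores divided by the number of scores."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)

print(mean([4, 8, 6, 5, 7]))  # 6.0
```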
median
rank the scores in order; the median is the middle value, which divides the data into two equal halves
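A minimal sketch of the median. The card does not say what happens when the number of scores is even; the code assumes the usual convention of averaging the two middle values.

```python
def median(scores):
    """Middle value of the ranked scores (mean of the two middle values if n is even)."""
    ranked = sorted(scores)
    n = len(ranked)
    if n == 0:
        raise ValueError("median requires at least one score")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median([7, 1, 5, 3, 9]))      # 5
print(median([7, 1, 5, 3, 9, 11]))  # 6.0
```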
mode
the most frequently occurring score
when is the mode not useful?
in quantitative data: there may be more than one mode, each value in the study might appear only once, or the mode could be an unusually low or high value
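Python's standard library function `statistics.multimode` returns every most-frequent value, which makes the problems above easy to see. The scores are hypothetical.

```python
from statistics import multimode

print(multimode([2, 3, 3, 5, 7]))   # [3]: a single mode
print(multimode([2, 2, 3, 5, 5]))   # [2, 5]: two modes
print(multimode([12, 15, 18, 21]))  # [12, 15, 18, 21]: every value appears once
```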
disadvantages of the range
- sensitive to unusually extreme values (outliers)
- dependent on sample size: as the sample size gets larger, the range cannot get smaller, but it can get larger
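The range is the largest score minus the smallest. A short sketch with hypothetical scores showing both disadvantages: one outlier inflates the range, and adding observations can only keep it the same or widen it. The name `value_range` is illustrative (it avoids shadowing Python's built-in `range`).

```python
def value_range(scores):
    """Range = largest score minus smallest score."""
    return max(scores) - min(scores)

scores = [12, 14, 15, 15, 16, 18]
print(value_range(scores))         # 6
print(value_range(scores + [45]))  # 33: a single outlier inflates the range

# Adding observations can never make the range smaller
grown = scores + [13, 17]
assert value_range(grown) >= value_range(scores)
```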
symmetry of quantitative variable
examples of symmetrically distributed data:
• 1,2,3,4,5
• -2,0,2
• 3,3,10,17,17
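The symmetry question can be checked mechanically: collect the distances below the mean and the distances above it, and see whether they pair up. A sketch that tests exact symmetry about the mean (real data are rarely exactly symmetric, so this is only for illustration); `is_symmetric` and `tol` are hypothetical names.

```python
def is_symmetric(values, tol=1e-9):
    """True if every distance below the mean is matched by the same distance above it."""
    m = sum(values) / len(values)
    below = sorted(m - v for v in values if v < m)
    above = sorted(v - m for v in values if v > m)
    return len(below) == len(above) and all(
        abs(a - b) <= tol for a, b in zip(below, above)
    )

print(is_symmetric([1, 2, 3, 4, 5]))      # True
print(is_symmetric([-2, 0, 2]))           # True
print(is_symmetric([3, 3, 10, 17, 17]))   # True
print(is_symmetric([1, 2, 6]))            # False
```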
graphical summary of quantitative data
- histograms
- dotplots
- box and whisker plot
what is the definition of standard deviation
the spread of data around the mean
- roughly, the average distance between the scores and the mean
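Strictly, the standard deviation squares each distance from the mean before averaging and then takes the square root. A minimal sketch of the sample standard deviation, assuming the common n - 1 divisor (which is what `statistics.stdev` uses); the scores are hypothetical.

```python
import math
import statistics

def sample_sd(scores):
    """Sample standard deviation (divides the sum of squared deviations by n - 1)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    m = sum(scores) / n
    squared_devs = sum((x - m) ** 2 for x in scores)  # squared distances from the mean
    return math.sqrt(squared_devs / (n - 1))

scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(scores))         # about 2.14
print(statistics.stdev(scores))  # same value from the standard library
```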
standard error
The standard error is the estimated standard deviation or measure of variability in the sampling distribution of a statistic. A low standard error means there is relatively less spread in the sampling distribution. The standard error indicates the likely accuracy of the sample mean as compared with the population mean.
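For the sample mean, the estimated standard error is the standard deviation divided by the square root of the sample size. A sketch under that formula; the function names are illustrative.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

def standard_error_from_scores(scores):
    """Estimate the SE of the mean from raw scores, using the sample SD."""
    return standard_error(statistics.stdev(scores), len(scores))

print(standard_error(10, 25))   # 2.0
print(standard_error(10, 100))  # 1.0: four times the sample size halves the SE
```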
the lower the standard deviation
the more closely the data cluster around the mean
inter-quartile range (IQR)
the IQR spans the middle 50% of scores, i.e. the range between the lower and upper quartiles
lower quartile
the value below which 1/4 of scores lie
1/4(n+1)th score (25th percentile)
upper quartile
the value above which 1/4 of scores lie
3/4(n+1)th score (75th percentile)
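The (n+1) positions above can land between two ranked scores (e.g. the 2.25th score). A sketch assuming linear interpolation between the neighbouring scores; `quartile_at` is a hypothetical helper. Python's `statistics.quantiles` uses the same (n+1)-based positions by default (`method='exclusive'`); other software may use a different convention and give slightly different quartiles.

```python
import math
import statistics

def quartile_at(scores, p):
    """Score at ranked position p*(n+1) (1-based), interpolating between neighbours."""
    ranked = sorted(scores)
    n = len(ranked)
    pos = p * (n + 1)
    lower = math.floor(pos)
    frac = pos - lower
    if lower < 1:
        return ranked[0]
    if lower >= n:
        return ranked[-1]
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])

scores = [1, 2, 3, 4, 5, 6, 7, 8]
q1 = quartile_at(scores, 0.25)  # position 2.25
q3 = quartile_at(scores, 0.75)  # position 6.75
print(q1, q3, q3 - q1)          # 2.25 6.75 4.5 (the last value is the IQR)
print(statistics.quantiles(scores, n=4))  # cross-check: [2.25, 4.5, 6.75]
```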
histogram
a graph where the heights of the rectangular bars/bins indicate the relative frequency with which values in specific ranges occur
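Drawing a histogram starts with counting how many values fall in each bin. A minimal sketch with equal-width bins; the ages, bin start and bin width are hypothetical choices, and values outside the bins are simply ignored.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in each bin [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:  # values outside the chosen bins are dropped
            counts[k] += 1
    return counts

# Hypothetical ages, for illustration only
ages = [21, 23, 25, 31, 34, 35, 36, 38, 42, 44, 47, 55]
counts = histogram_counts(ages, start=20, width=10, n_bins=4)
for k, c in enumerate(counts):
    lo = 20 + 10 * k
    # Bar length = count; the percentage is the relative frequency
    print(f"{lo}-{lo + 9}: {'#' * c} ({c / len(ages):.0%})")
```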
for histograms to show a clear shape, at least…
50 observations are required
symmetrical histogram
bell curve
positively skewed histogram
skewed to the right (long tail towards the higher values)
negatively skewed histogram
skewed to the left (long tail towards the lower values)
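The direction of skew can also be measured numerically. A sketch using the moment coefficient of skewness (mean cubed deviation divided by the mean squared deviation to the power 1.5): a positive value means the long tail points towards higher values, a negative value towards lower values. Statistical packages often report an adjusted version of this coefficient, so their numbers may differ slightly; the data are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # mean squared deviation
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    m3 = sum((x - m) ** 3 for x in values) / n  # mean cubed deviation
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 12]  # long tail towards high values
left_tail = [-x for x in right_tail]    # mirror image: long tail towards low values
print(skewness(right_tail) > 0)   # True: positive skew
print(skewness(left_tail) < 0)    # True: negative skew
print(skewness([1, 2, 3, 4, 5]))  # 0.0: symmetric
```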
bimodal histogram
will have two peaks
uniform histogram
rectangle shape
dot plot
is like a histogram that is turned back to front and flipped on its side
in a dot plot each dot represents
one observation
the length of the rows of dots in a dot plot indicates
how common the value is
when are dot plots especially useful?
when plotting distributions for small sample sizes
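For a small sample a dot plot can even be printed as text: one row per value, one dot per observation, so longer rows mean more common values. A sketch assuming integer data so that every value between the minimum and maximum gets its own row; `text_dot_plot` is a hypothetical helper and the scores are made up.

```python
from collections import Counter

def text_dot_plot(values):
    """Return a text dot plot: one row per integer value, one 'o' per observation."""
    counts = Counter(values)
    rows = []
    for value in range(min(values), max(values) + 1):
        rows.append(f"{value:>3} | {'o' * counts[value]}")  # missing values get an empty row
    return "\n".join(rows)

# Hypothetical small sample of scores, for illustration only
scores = [3, 5, 4, 4, 6, 3, 4, 8]
print(text_dot_plot(scores))
```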
a box and whisker plot shows
- median
- lower quartile
- upper quartile
- range that contains most values
- outliers
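The numbers behind a box and whisker plot can be computed directly. The cards do not say how outliers are identified; this sketch assumes the common convention of flagging values more than 1.5 × IQR beyond the quartiles, with the whiskers ending at the most extreme remaining values. Quartiles come from `statistics.quantiles` (default `'exclusive'` method); the scores are hypothetical.

```python
import statistics

def box_plot_summary(scores):
    """Quartiles, median, whisker ends and outliers (assumes the 1.5 x IQR rule)."""
    q1, med, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in scores if low_fence <= x <= high_fence]
    outliers = sorted(x for x in scores if x < low_fence or x > high_fence)
    return {
        "lower_quartile": q1,
        "median": med,
        "upper_quartile": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

scores = [2, 3, 4, 4, 5, 5, 6, 7, 30]
print(box_plot_summary(scores))
```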
outliers
extreme observations which are either low or high values
in a box and whisker plot, what would indicate a positive skew?
the upper part of the box (median to upper quartile) is longer than the lower part, and the upper whisker is longer than the lower whisker
what sort of summary statistics should be used for symmetric data (bell curve)
summary statistics that make use of all the data (e.g. mean and standard deviation)
what sort of summary statistics should be used for asymmetric data (positive or negatively skewed)
summary statistics that are insensitive to extreme values (e.g. median and IQR)
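A quick comparison on hypothetical data shows why the choice matters: for symmetric data the mean and median agree, but a single extreme value drags the mean towards the tail while the median and IQR barely notice it. A sketch using Python's `statistics` module; `describe` is an illustrative name.

```python
import statistics

def describe(scores):
    """Return mean/SD and median/IQR side by side for comparison."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return {
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),
        "median": statistics.median(scores),
        "iqr": q3 - q1,
    }

symmetric = [4, 5, 6, 6, 7, 8]
skewed = [2, 3, 3, 4, 4, 5, 40]  # one extreme high value

print(describe(symmetric))  # mean and median are both 6
print(describe(skewed))     # mean is pulled up to about 8.7; median stays at 4
```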