lec 5 variables and scales Flashcards
what is a variable
a condition/ characteristic that can change or have different values
defining characteristics
- attribute that describes a person/ place/ thing
- value can vary between different entities
qualitative vs quantitative variables
qualitative: values that are names/ labels
quantitative: numeric variables that measure quantity
discrete vs continuous variables
continuous: a variable that can take any value between its minimum and maximum values
discrete: a variable that can take only certain separate values between its minimum and maximum (e.g. whole-number counts)
univariate vs bivariate data
Univariate data: when a study consists of only one variable
Bivariate data: when a study examines the relationship between 2 variables
what is a nominal scale
- lowest statistical measurement level
- this scale is given to items that are divided into categories without any order or structure
- e.g.
- gender
- eye colour
- blood type
what is the ordinal scale
consists of variables that have an inherent order to the relationship among the different categories
- a ranking of responses that may have different meanings among individuals
- shows the overall order of the categories but not the relative distance between them, as the distances are not equal
- properties of ordinal scale:
- 1) Identity: the quality being measured
- 2) Magnitude: the amount of the quality being measured, which lets the categories be ranked
what is the interval scale
variables that have constant, equal distances between values, but the zero point is arbitrary
properties:
- identity
- magnitude
- equal distance: the difference between adjacent points is the same all along the scale
e.g. IQ score, pain scale w/ no,
what is a ratio scale
top level of measurement, with all the properties of an abstract number system plus an absolute zero
properties
- identity
- magnitude
- equal distance
- absolute zero: allows statements about how many times greater one case is than another
- allows use of all mathematical operations
- e.g.
- weight
- pulse rate
- respiratory rate
what is a measure of central tendency / central location
a single value that attempts to describe a set of data by identifying the central position within the data set
- mean
- median
- mode
describe the mean
- most familiar measure of central tendency
- acts as a model of the data set: it is often not one of the observed values, but it is the single value that gives the least error when predicting any one value (it minimises the sum of squared deviations)
- used with discrete and continuous data, most often the latter
- sum of all the values divided by the number of values in the data set
- includes every value of the data set
- the only measure of central tendency for which the sum of the deviations of each value from the mean is 0
- sample mean = x̄ (x-bar)
- population mean = µ
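The formula and the zero-sum property above can be checked with a short Python sketch. The scores are made up for illustration; only the calculation comes from the card.

```python
# Hypothetical scores, chosen only for illustration.
scores = [4, 8, 15, 16, 23, 42]

# Mean = sum of all the values divided by the number of values.
x_bar = sum(scores) / len(scores)

# Deviation of each value from the mean (positive above, negative below).
deviations = [x - x_bar for x in scores]

print(x_bar)            # 18.0
print(sum(deviations))  # 0.0
```

Note that 18.0 is not one of the observed scores, which matches the point that the mean is often not a value in the data set.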
what is the main disadvantage of the mean
very susceptible to outliers (values that are unusually small or large compared with the rest of the data set)
mean can be skewed by these values
if so the median is a better measure of central tendency
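A small sketch of the outlier effect described above, using Python's standard `statistics` module. The values are hypothetical; the point is only how far each measure moves when one extreme value is added.

```python
from statistics import mean, median

# Hypothetical values, then the same values plus one extreme outlier.
typical = [50, 52, 54, 56, 58]
with_outlier = typical + [500]

print(mean(typical), median(typical))  # 54 54

# The mean is dragged far towards the outlier; the median barely moves.
print(round(mean(with_outlier), 1), median(with_outlier))  # 128.3 55.0
```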
when not to use the mean and use the median instead
presence of outliers
skewed distribution: the mean moves away from the centre, but the median stays central and is least influenced
- in a normal distribution: mean = median = mode
what is the median
the middle score for a set of data that has been arranged in order of magnitude
- least affected by outliers and skewed data
- order the values and find the middle one; if there is an even number of values, take the mean of the two middle values
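The ordering-and-middle procedure above could be written by hand roughly as follows. `median_by_hand` is a hypothetical helper name, and the sketch assumes a non-empty list of numbers.

```python
def median_by_hand(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: the single middle value
        return ordered[mid]
    # even count: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_hand([7, 1, 5]))     # 5
print(median_by_hand([7, 1, 5, 3]))  # 4.0
```

In practice `statistics.median` from the standard library does the same job.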
what is the mode
most frequent score in the data set.
- highest bar on histogram
- used for categorical data when the most popular option is sought after
problems with the mode
- non-unique: a data set can have more than one mode
- causes problems when 2 values are equally popular
- even more problematic for continuous data, as finding an exact mode is unlikely => mode is rarely used with continuous data
- if the mode is far from the rest of the data in the set, it gives a misleading picture of the centre
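The first three problems can be seen with `statistics.multimode` (Python 3.8+), which returns every value tied for most frequent. Both data sets below are invented for illustration.

```python
from statistics import multimode

# Categorical data where two options are equally popular: no unique mode.
blood_types = ["O", "A", "A", "O", "B", "AB"]
print(multimode(blood_types))  # ['O', 'A']

# Continuous measurements rarely repeat exactly, so every value is
# "a mode" and the result says nothing about the centre.
heights_m = [1.62, 1.75, 1.68, 1.81, 1.59]
print(len(multimode(heights_m)) == len(heights_m))  # True
```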
which measures of central tendency are best used in normal and skewed distributions
normal distribution
- mean or median can be used, but the mean is ideal because:
- it has the least amount of error, since it includes all values in the data set
- any change in the scores affects the value of the mean, but not necessarily the mode or median
skewed distribution
- the mean is dragged in the direction of the skew so the median is best
- the greater the skew, the greater the difference between mean and median
data sets found to be NON-NORMAL by normality tests
- the median is preferred over the mean as a rule of thumb, unless there's no real difference between the median and the mean
match the variable to the type of central tendency preferred
nominal= Mode
Ordinal=Median
interval/ratio(non skew)= Mean
interval/ratio skew=Median
what is a measure of spread / measure of dispersion
describes the variability in a sample/ population
- used alongside measures of central tendency to give a description of the overall data
what is the purpose of measuring a data spread
- shows how well the central tendency represents the data
- a large spread suggests large differences between individual scores, and vice versa for a small spread
- measures of spread include:
- range
- quartiles
- absolute deviation
- standard deviation
what is the range
the difference between the highest and lowest scores in a data set and is the simplest measure of spread
- range = max value - min value
- sets the boundaries for scores
- useful for measuring critically high or low thresholds
- detects errors when inputting data
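A minimal sketch of the range, also showing how it can flag a data-entry error. The pulse rates are hypothetical, and the value 710 stands in for an assumed typing slip.

```python
# Hypothetical pulse rates (beats per minute); 710 is an assumed typing slip.
pulse_rates = [72, 68, 80, 75, 710, 64]

lowest, highest = min(pulse_rates), max(pulse_rates)
data_range = highest - lowest  # range = max value - min value

print(lowest, highest, data_range)  # 64 710 646
```

An implausibly large range like this one is a prompt to recheck the highest and lowest entries.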
what are quartile and interquartile ranges
quartiles: break the ordered data into quarters
even number of scores: each quartile is the mean of the 2 scores either side of the quarter positions in the data set
odd number of scores: the values at the 25th, 50th and 75th percentile positions are the quartiles
Q2 = median
benefits of quartiles and what is interquartile range
- less affected by outliers and skewed data, like the median, so are the best choice for measuring the spread of these data sets
- interquartile range = the difference between Q3 and Q1, which shows the range of the middle half of the distribution of scores
- Q3 - Q1 = interquartile range
- semi-interquartile range: half the interquartile range = (Q3 - Q1) / 2
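The quartiles, the interquartile range (Q3 - Q1) and the semi-interquartile range (half of Q3 - Q1) can be sketched with `statistics.quantiles` (Python 3.8+). The scores are invented, and the function's default "exclusive" method is only one of several quartile conventions, so other software may give slightly different cut points.

```python
from statistics import median, quantiles

# Hypothetical scores: 11 values, so the quartiles land on actual scores.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 13]

# n=4 splits the ordered data into quarters and returns Q1, Q2, Q3.
q1, q2, q3 = quantiles(scores, n=4)

iqr = q3 - q1        # interquartile range: spread of the middle half
semi_iqr = iqr / 2   # semi-interquartile range

print(q1, q2, q3)        # 4.0 7.0 10.0
print(iqr, semi_iqr)     # 6.0 3.0
print(q2 == median(scores))  # True: Q2 is the median
```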
Drawback of quartiles
they don't take into account every score in the data set
what are the absolute deviation / variance / standard deviation
how to calculate the absolute and mean absolute deviation
shows the amount of deviation/variation that occurs around the mean score
total variability: the sum of the deviations of each score, divided by the number of scores
the choice of absolute deviation, variance and standard deviation depends on the type of statistic
- easiest way to calculate a deviation = individual score minus mean score
- values above the mean are positive and values below are negative
- the total variability would be 0 because the positive and negative deviations cancel, so the signs are ignored and only absolute values are used (absolute deviation); their sum divided by the number of scores gives the mean absolute deviation
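The steps above, worked through on a made-up data set: signed deviations cancel to zero, so the absolute values are averaged instead.

```python
# Hypothetical scores, chosen only for illustration.
scores = [2, 4, 6, 8, 10]
mean_score = sum(scores) / len(scores)            # 6.0

# Signed deviations: individual score minus mean score.
deviations = [x - mean_score for x in scores]     # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(sum(deviations))                            # 0.0, positives and negatives cancel

# Ignore the signs, then divide by the number of scores.
absolute_deviations = [abs(d) for d in deviations]
mean_absolute_deviation = sum(absolute_deviations) / len(scores)
print(mean_absolute_deviation)                    # 2.4
```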
how to calc variance
achieves positive values of the deviations from the mean by squaring them
addition of the squared deviations gives the sum of squares
the sum of squares is divided by n, the number of scores (n - 1 when the variance is estimated from a sample)
- if the values in the data are spread out from the mean then the variance is a large number
- if the values are closer to the mean then the variance is small
- problems with variance
- squaring gives more weight to extreme scores, so variance is susceptible to outliers
- the units of variance are squared, so they differ from the units of the data set and can't be directly related to data set values
- calculating the standard deviation (the square root of the variance) solves the units problem
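A sketch of the variance calculation described above, dividing the sum of squares by n (the population form). The scores and their unit (kg) are assumptions for illustration. Taking the square root returns the result to the original units.

```python
import math

# Hypothetical weights in kg, chosen only for illustration.
scores = [2, 4, 6, 8, 10]
n = len(scores)
mean_score = sum(scores) / n                      # 6.0

# Squaring makes every deviation positive.
squared_deviations = [(x - mean_score) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)          # 40.0

variance = sum_of_squares / n                     # 8.0, in kg squared
standard_deviation = math.sqrt(variance)          # back in kg

print(variance)                      # 8.0
print(round(standard_deviation, 3))  # 2.828
```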
what is the standard deviation
a measure of the spread of scores w/in a data set
sample SDs differ from population SDs in their calculation
when to calculate the pop SD
- if data on the entire population are available
- if the sample is all you're interested in and you don't want to generalize your results
when to use the sample SD: if you have sample data and wish to generalize to population
NB: the sample SD is not the standard deviation of the sample itself but an estimate of the population SD based on sample data
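The difference in calculation can be seen with the standard library: `statistics.pstdev` divides the sum of squares by n, while `statistics.stdev` divides by n - 1. The weights below are hypothetical.

```python
from statistics import pstdev, stdev

# Hypothetical weights in kg, chosen only for illustration.
weights = [61, 64, 67, 70, 73]

population_sd = pstdev(weights)  # treats the 5 values as the whole population
sample_sd = stdev(weights)       # estimates the population SD from a sample

print(round(population_sd, 3))   # 4.243
print(round(sample_sd, 3))       # 4.743
```

The sample SD is always the larger of the two for the same data, because dividing by n - 1 compensates for a sample tending to understate the population's spread.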
which type of data should be used to calculate SD
- SD is used along with the mean to summarize continuous data, NOT CATEGORICAL DATA
- only appropriate if the data are normally distributed / not skewed
define the EMPIRICAL RULE
for a normal distribution, nearly all of the data will fall within three standard deviations of the mean
what are the three parts of the empirical rule
- about 68% of data falls within 1 SD of the mean: µ ± 1×SD
- about 95% falls within 2 SDs: µ ± 2×SD
- about 99.7% falls within 3 SDs: µ ± 3×SD
aka the 3 sigma rule
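The three percentages can be recovered from the normal distribution itself. This sketch uses `statistics.NormalDist` (Python 3.8+) with a standard normal curve (mean 0, SD 1); the `shares` dictionary is just an illustrative name.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Share of the distribution lying within k SDs either side of the mean.
shares = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in shares.items():
    print(k, round(share * 100, 1))  # roughly 68.3, 95.4 and 99.7 percent
```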
when is the 3 sigma rule used
used to give a rough estimate of what the data would look like if the entire population were surveyed, for when the right data are difficult/impossible to get
applies to a random variable that follows the normal distribution (bell curve)
doesn't apply to non-normal distributions, but Chebyshev's theorem can be used for those
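For comparison, Chebyshev's theorem guarantees that at least 1 - 1/k² of the values lie within k standard deviations of the mean, for any distribution with a finite variance and any k greater than 1. A minimal sketch follows; `chebyshev_min_share` is a hypothetical helper name.

```python
def chebyshev_min_share(k):
    """Minimum share of values within k SDs of the mean (requires k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2


print(chebyshev_min_share(2))            # 0.75
print(round(chebyshev_min_share(3), 3))  # 0.889
```

These are lower bounds, so they are weaker than the 95% and 99.7% figures that hold when the data are actually normal.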