Measurements and Descriptive Stats Flashcards
Define ‘population’
- every member with selected characteristic
- e.g. humans born in UK
Define ‘sample’
- subset of a given population, selected to represent that population
- members are unrelated (independent of one another)
- chosen at random
Define ‘variable’
- any characteristic or property that can take one of a range of values
Define ‘parameter’
- a numerical constant in any particular instance (e.g. the mean of a given population), though its value may differ between instances
Define ‘data’
- refers to items of information
- singular = datum, or data value
Name the 3 types of data
- quantitative
- ranked
- qualitative
Define ‘quantitative data’
- characteristics whose differing states can be described by ‘real’ numbers
Define ‘ranked data’
- ordinal scale, ranked in order of magnitude
- e.g. order of birth of children in a family
Define ‘qualitative data’
- categorical; not measured against numerical scale nor ranked
- non numerical and descriptive
Name the 3 types of quantitative data
- continuous
- discontinuous
- derived data
Define ‘continuous data’
- obtained by measurement
- usually measured against numerical scale
- recorded to a chosen number of significant figures or decimal places
Define ‘discontinuous data’
- obtained by counting
- data must be whole numbers
- e.g. number of colonies on Petri dish
Define ‘derived data’
- calculated from direct measurements
- e.g. ratios, percentages, rates etc.
Name 4 types of measurement scales
- nominal
- ordinal
- interval
- ratio
What is a nominal scale?
- classifies objects into categories based on descriptive characteristic
- only scale suitable for qualitative data
What statistics are used with a nominal scale?
- only those based on frequency of counts made: contingency tables, frequency distributions etc.
- Chi-squared test
What is an ordinal scale?
- classifies by rank
- used with ranked data
What statistics are used with ordinal scales?
- non-parametric methods, sign tests
- Mann-Whitney U-test
What is an interval scale?
- numbers on equal-unit scale are related to arbitrary zero point
- used for quantitative data
What statistics are used with interval scales?
- almost all types of test; t-test, analysis of variance (ANOVA) etc.
What is a ratio scale?
- similar to interval scale, except that the zero point now represents an absence of that character (i.e. it is an absolute zero)
What statistics are used with ratio scales?
- almost all types of test; t-test, ANOVA etc.
Define ‘accuracy’
- closeness of measurements to true value
Define ‘precision’
- closeness of repeated measurements to each other
Define ‘bias’
- consistent non-random divergence from accuracy
- can be subjective, personal, or from incorrectly calibrated instruments
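A minimal sketch of how the definitions of accuracy, precision and bias above might be checked on repeated readings. The true value and the readings are hypothetical, and using the standard deviation as the measure of precision is an assumption made here for illustration.

```python
import statistics

# Hypothetical example: five repeated readings of a quantity
# whose true value is assumed to be known.
true_value = 10.0
readings = [10.4, 10.5, 10.6, 10.5, 10.5]

# Accuracy / bias: how far the readings sit, on average, from the true value.
bias_estimate = statistics.mean(readings) - true_value

# Precision: how close the repeated readings are to each other
# (a smaller spread means higher precision).
spread = statistics.stdev(readings)

print(round(bias_estimate, 2))  # 0.5
print(round(spread, 3))         # 0.071
```

Here the readings are precise (tightly clustered) but not accurate: they all diverge from the true value in the same direction, which is the pattern expected from bias.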
Define ‘mean’
- average value of data
- obtained from sum of all data values divided by number of observations
Advantages and disadvantages of the mean
Advantages
- good measure of centre of symmetrical frequency distributions
- uses all of the numerical values of sample, therefore incorporates all information content of data
Disadvantages
- value of mean is greatly affected by presence of outliers (values much smaller or bigger than most data)
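As an illustration of the definition and of the disadvantage above, here is a small sketch with made-up data. The helper name `mean` and the sample values are hypothetical.

```python
def mean(values):
    """Sum of all data values divided by the number of observations."""
    return sum(values) / len(values)

sample = [4, 5, 5, 6, 5]       # hypothetical data
with_outlier = sample + [50]   # same data plus one outlier

print(mean(sample))        # 5.0
print(mean(with_outlier))  # 12.5
```

A single outlier moves the mean from 5.0 to 12.5, well away from the main body of the data.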
Define ‘median’
- mid-point of observations when ranked in increasing order
- represents location of main body of data better than mean when distribution is asymmetric or when there are outliers in sample
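A short sketch of the median as defined above, using hypothetical data that contains one outlier. Averaging the two middle values for an even number of observations is the usual convention and is assumed here.

```python
def median(values):
    """Mid-point of the observations when ranked in increasing order."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # odd n: the middle value
    return (ranked[mid - 1] + ranked[mid]) / 2   # even n: mean of the two middle values

sample = [4, 5, 5, 6, 5, 50]     # hypothetical data with one outlier

print(median(sample))            # 5.0
print(sum(sample) / len(sample)) # 12.5 (the mean, for comparison)
```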
Define ‘mode’
- most common value in sample
- provides rapidly and easily found estimate of sample location, unaffected by outliers
- however, it is affected by chance variation in the shape of the sample's distribution and may lie distant from the obvious centre of the distribution
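A minimal sketch of finding the mode by counting, with hypothetical data. Returning every tied value as a list is a design choice made here, not something the flashcard specifies.

```python
from collections import Counter

def modes(values):
    """Return the most common value(s) in the sample, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 3, 3, 4, 9, 9, 9]))  # [9]
print(modes([2, 3, 3, 4, 9, 9]))     # [3, 9]
```

In the first sample the mode (9) lies far from the centre of the data, and removing a single observation changes the answer, which shows how sensitive the mode is to chance variation.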
Define ‘range’
- difference between largest and smallest data values in sample
Advantages and disadvantages of range
Advantages
- easy to determine
Disadvantages
- greatly affected by outliers, which makes it a poor measure of dispersion
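The range and its sensitivity to outliers can be sketched as follows. The function name `data_range` and the data are hypothetical.

```python
def data_range(values):
    """Difference between the largest and smallest data values."""
    return max(values) - min(values)

sample = [4, 5, 5, 6, 5]       # hypothetical data
with_outlier = sample + [50]   # same data plus one outlier

print(data_range(sample))        # 2
print(data_range(with_outlier))  # 46
```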
Steps in calculating a semi-interquartile range for a data set
- rank observations in ascending order
- find values of 1st and 3rd quartiles
- subtract value of 1st quartile from 3rd quartile
- halve this number
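The steps above can be sketched as below. Several conventions exist for locating quartiles; this sketch assumes linear interpolation between ranked values, so other methods may give slightly different results. The helper names and data are hypothetical.

```python
def quartile(ranked, fraction):
    """Value at a given fraction (0 to 1) along ranked data, by linear interpolation."""
    position = fraction * (len(ranked) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ranked) - 1)
    weight = position - lower
    return ranked[lower] * (1 - weight) + ranked[upper] * weight

def semi_interquartile_range(values):
    ranked = sorted(values)        # rank observations in ascending order
    q1 = quartile(ranked, 0.25)    # 1st quartile
    q3 = quartile(ranked, 0.75)    # 3rd quartile
    return (q3 - q1) / 2           # subtract Q1 from Q3, then halve

data = [13, 2, 4, 8, 4, 5, 12, 7, 10]   # hypothetical data
print(semi_interquartile_range(data))    # 3.0
```

For these nine values the quartiles fall exactly on ranked observations (Q1 = 4, Q3 = 10), so no interpolation is needed.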
Advantages and disadvantages of semi-interquartile range
Advantages
- appropriate measure of dispersion when the median is the appropriate statistic to describe location
Disadvantages
- for data grouped in classes, it can only be estimated
- takes no account of distribution shape at its edges
What is the ‘five-number summary’?
- consists of 3 quartiles and 2 extreme values; commonly presented as box-and-whisker plot
- upper extreme, upper quartile, median, lower quartile, lower extreme
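A possible sketch of the five-number summary using Python's standard `statistics` module. The `method="inclusive"` quartile convention is an assumption; other conventions can give slightly different quartiles. The data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (lower extreme, lower quartile, median, upper quartile, upper extreme)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

data = [13, 2, 4, 8, 4, 5, 12, 7, 10]   # hypothetical data
print(five_number_summary(data))         # (2, 4.0, 7.0, 10.0, 13)
```

These five values are what a box-and-whisker plot draws: the box spans the quartiles, the line inside marks the median, and the whiskers reach the extremes.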
Define ‘sample variance’
- sum of squared deviations of each data value from the mean divided by n - 1 (where n is sample size)
Define ‘standard deviation’
- positive square root of sample variance
- SD = √( Σ(X − mean)² ÷ (n − 1) )
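A worked sketch of the sample variance and standard deviation as defined above, with hypothetical data chosen so that the arithmetic is easy to follow by hand.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

def standard_deviation(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [3, 3, 5, 7, 7]   # hypothetical data: mean = 5

# Squared deviations: 4 + 4 + 0 + 4 + 4 = 16; 16 / (5 - 1) = 4
print(sample_variance(data))     # 4.0
print(standard_deviation(data))  # 2.0
```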
Define ‘coefficient of variation’ (CV or CoV)
- dimensionless measure of dispersion
- expresses SD as a percentage of sample mean
- (SD ÷ mean) x 100
- e.g. mean = 5; SD = 2; CoV = (2÷5) x 100 = 40%
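The calculation above can be sketched as a small function. The function name is hypothetical, and the inputs reproduce the flashcard's own example (mean = 5, SD = 2).

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the sample mean."""
    return 100 * sd / mean

print(coefficient_of_variation(sd=2, mean=5))  # 40.0
```

Because the result is a percentage with no units, it allows dispersion to be compared between samples measured on different scales.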
Define ‘unimodal distribution’
- one peak
- may be symmetrical or asymmetrical
Define ‘bimodal distribution’
- two peaks (two unimodal distributions combined)
- suggests that 2 populations are being sampled
Define ‘polymodal distribution’
- more than two peaks/unimodal distributions
- suggests that more than two populations are being sampled
Define ‘positive skewness’
- longer tail of distribution occurs for higher values of measured variable
Define ‘negative skewness’
- longer tail occurs for lower values
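One way to put a number on the direction of skew is a moment-based skewness coefficient. This formula is not given in the flashcards and is only one of several in use, so it is shown purely as an illustrative assumption, with hypothetical data.

```python
def skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

long_upper_tail = [1, 2, 2, 3, 3, 4, 12]     # hypothetical: tail towards higher values
long_lower_tail = [1, 9, 10, 10, 11, 11, 12] # hypothetical: tail towards lower values

print(skewness(long_upper_tail) > 0)  # True (positive skewness)
print(skewness(long_lower_tail) < 0)  # True (negative skewness)
```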
Define ‘kurtosis’
- name given to pointedness of frequency distribution
Name the 2 types of kurtosis and what they mean
- platykurtic; flattened peak
- leptokurtic; pointed peak
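A rough sketch of how the two types might be told apart numerically. It assumes the common "excess kurtosis" convention, in which a normal distribution scores 0, negative values indicate a flattened (platykurtic) shape and positive values a pointed (leptokurtic) one. That convention and the data are assumptions not stated in the flashcards.

```python
def excess_kurtosis(values):
    """Fourth central moment / (second central moment ** 2), minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # hypothetical: evenly spread values
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]   # hypothetical: most values at the centre

print(excess_kurtosis(flat) < 0)    # True (platykurtic)
print(excess_kurtosis(peaked) > 0)  # True (leptokurtic)
```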