Measurement and Graphical Representations of Data Flashcards
What do descriptive statistics tell us?
What types of variables we have and what their values are in our sample
When we want to study a _________, we ____ what happens in the __________ by studying a ______________
- characteristic
- infer
- population
- representative sample
What is categorical data?
Qualitative data that can be divided into groups, usually based on the limited and fixed number of possible values it can have (e.g. colours)
What are the two main types of categorical data?
Nominal and Ordinal
What is a nominal variable?
Categorical data with no inherent order (e.g. ethnicity). It cannot be measured numerically, but categories may be given numeric codes for ease of analysis
What is an ordinal variable?
Categorical data that can be ranked, although the ranks are not necessarily equally spaced (e.g. socioeconomic status). Numbers may represent an order but have no mathematical meaning.
What is numerical data?
Quantitative data with values that are always expressed in number form
What are the three main types of numerical data?
Interval, discrete, and continuous data
What is an interval variable?
Data which is ordinal but with equidistant and meaningful spaces, usually with 5 or more categories
What is discrete data?
A countable variable that involves a specific, limited number of possible integer values (e.g. number of kids, shoe size). It may have decimals to include halves if logical.
What is continuous data?
A variable that can be measured. It is not fixed and can have an infinite number of possible values in a prespecified interval (e.g. height, weight, age)
What is the least informative type of variable?
Categorical nominal
What is the most informative type of variable?
Numerical continuous
If a variable has four equidistant categories, what type of variable is it?
Categorical ordinal - there is not enough information to approximate the underlying variable
If a variable has five equidistant categories with mathematical meaning, what type of variable is it?
Numerical interval - we can treat it as continuous as we can sufficiently estimate the underlying continuum
If a variable has ordered values and we know that the difference between two values is meaningful, what type of data is this?
Numerical interval
A measure of agreement asks participants to rate their agreement 1) strongly disagree, 2) disagree, 3) agree, 4) strongly agree. What type of variable is this?
Categorical ordinal
A survey on fruit intake asks participants to rate how often they eat fruit per week: 1-2 days per week, 3-4 days, 5-6 days, or every day. What type of variable is this?
Numerical interval
A survey asks participants to rate their agreement with 1) I am not sure, 2) I agree to some extent, 3) Depends on the occasion, 4) I am not informed. What type of variable is this?
Categorical nominal
A survey asks participants to rate quality of communication on a scale where 1=very poor and 10=very good. What type of variable is this?
Numerical interval/continuous
What is a valid percentage?
The percentage of a category among those who responded (missing responses are excluded from the denominator)
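The difference between a percentage and a valid percentage comes down to the denominator. A minimal sketch, assuming a made-up list of responses in which `None` stands for a participant who did not answer (the data and variable names are illustrative only):

```python
from collections import Counter

# Hypothetical responses; None marks a participant who did not answer.
responses = ["yes", "no", "yes", None, "yes", None, "no", "yes"]

# Keep only the participants who actually responded.
answered = [r for r in responses if r is not None]
counts = Counter(answered)

# Percentage: the denominator is the whole sample, including missing responses.
percent = {k: 100 * v / len(responses) for k, v in counts.items()}

# Valid percentage: the denominator is only those who responded.
valid_percent = {k: 100 * v / len(answered) for k, v in counts.items()}

print("Percent:", percent)
print("Valid percent:", valid_percent)
```

With missing responses present, the valid percentages are larger than the plain percentages and sum to 100.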
What are the best descriptive indices for categorical data?
Frequencies and percentages
What are the best types of graphical representation for categorical data?
Bar charts or pie charts
What is the best use of pie charts in research?
For nominal data with 2 or more categories
What goes on the X and Y axis of a bar chart?
X=category
Y=count/frequency
What are the two main measures used to describe numerical data?
Location (central tendency) and dispersion (variability)
What is the variance used for?
To understand how far values are from the mean (the average squared distance)
What does the standard deviation measure?
How spread out a group of numbers is around the mean
If you have the variance, how do you calculate the standard deviation?
Take the square root
If you have the standard deviation, how do you calculate the variance?
Square it
Why do you divide by n-1 when working out the variance or SD when using a sample?
To obtain an unbiased estimate for the population
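The relationship between the variance, the standard deviation, and the n-1 divisor can be checked by hand. A minimal sketch using Python's standard library, with a made-up sample of eight scores (the values are illustrative, not from any real study):

```python
import math
import statistics

# Hypothetical sample of eight scores (illustrative values only).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
mean = sum(sample) / n

# Squared distance of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in sample]

# Dividing by n treats the data as the whole population.
population_variance = sum(squared_deviations) / n

# Dividing by n - 1 gives the unbiased sample estimate of the population variance.
sample_variance = sum(squared_deviations) / (n - 1)

# The standard deviation is the square root of the variance;
# squaring the standard deviation recovers the variance.
sample_sd = math.sqrt(sample_variance)

print(population_variance, sample_variance, sample_sd)

# The standard library's sample versions also divide by n - 1.
print(statistics.variance(sample), statistics.stdev(sample))
```

Because the sample variance divides by a smaller number, it is always a little larger than the version that divides by n.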
What are examples of measures of location and central tendency?
Mean, median, and mode
What are examples of measures of dispersion?
Standard deviation, minimum and maximum, range and IQR
What are the best types of graphical representation for numerical data?
Histograms and box plots
What do the bins represent in a histogram?
Intervals, not values
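To see why bins are intervals rather than single values, here is a rough text histogram. It is only a sketch: the heights and the bin width of 10 are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical heights in centimetres (illustrative values only).
heights_cm = [151, 158, 160, 162, 163, 167, 168, 169, 171, 174, 179, 185]
bin_width = 10  # assumed width; each bin covers an interval of 10 cm

# Map every value to the lower edge of the interval it falls into,
# then count how many values share each interval.
bin_counts = Counter((h // bin_width) * bin_width for h in heights_cm)

# Print one row per interval; the bar length is the frequency.
for lower in sorted(bin_counts):
    upper = lower + bin_width
    print(f"[{lower}, {upper}): {'#' * bin_counts[lower]}")
```

Each row counts every value inside its interval, so 160, 162, and 169 all land in the same bin even though they are different values.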
What can you add to a histogram to help visualise the spread compared to what may be expected?
A normal distribution curve
What is the normal curve?
A bell shaped symmetrical curve around the mean
In a normal distribution, what percentage of values are lower than the mean?
50%
In what type of distribution does the mean equal the median and the mode?
The normal distribution
What is the name for data that is not normal?
Skewed/non-symmetrical
If the median and mode are smaller than the mean, what is the distribution?
Positively skewed
If data is positively skewed, the mean is ________ than the median and the mode.
Greater
If the median and mode are greater than the mean, what is the distribution?
Negatively skewed
If data is negatively skewed, the mean is _______ than the median and the mode.
Smaller
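The ordering of the mean, median, and mode under skew can be illustrated with two small made-up data sets. This is a sketch using Python's `statistics` module; the numbers are invented so that a long tail sits on one side.

```python
import statistics

def centre(values):
    """Return the mean, median, and mode of a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )

# Hypothetical data with a long right tail (a few very large values).
positively_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

# Hypothetical data with a long left tail (a few very small values).
negatively_skewed = [1, 11, 15, 16, 17, 17, 18, 18, 18, 19]

pos_mean, pos_median, pos_mode = centre(positively_skewed)
neg_mean, neg_median, neg_mode = centre(negatively_skewed)

# Positive skew: the tail pulls the mean above the median and the mode.
print(pos_mean > pos_median > pos_mode)

# Negative skew: the tail pulls the mean below the median and the mode.
print(neg_mean < neg_median < neg_mode)
```

In both cases the mean moves towards the tail, while the median and the mode stay closer to where most values sit.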
What are the best descriptive indices to use for normal (symmetrical) numerical data?
Mean and standard deviation
What are the best descriptive indices to use for skewed (non-symmetrical) numerical data?
Median, minimum and maximum, and interquartile range
What is the best measure of dispersion for normal data?
Standard deviation
What is the best measure of dispersion for skewed data?
Minimum and maximum, and interquartile range
What is the best measure of central tendency for normal data?
Mean
What is the best measure of central tendency for skewed data?
Median
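One way to see why the median and IQR are preferred for skewed data is to compare summaries before and after a single extreme value is introduced. A minimal sketch, assuming two invented data sets and a hypothetical helper named `summarise`:

```python
import statistics

def summarise(values):
    """Hypothetical helper: return common descriptive indices for a list."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles Q1, Q2, Q3
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample SD (divides by n - 1)
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,
    }

# Illustrative data: the second list replaces the largest value with an extreme one.
symmetrical = [10, 11, 12, 13, 14, 15, 16]
skewed = [10, 11, 12, 13, 14, 15, 90]

sym_summary = summarise(symmetrical)
skew_summary = summarise(skewed)

for name, summary in (("symmetrical", sym_summary), ("skewed", skew_summary)):
    print(name, summary)
```

In this example the extreme value inflates the mean and the standard deviation, while the median and the IQR stay the same, which is why they describe skewed data more faithfully.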
What are the best descriptive indices for discrete numerical data?
Median, mode, min-max, and IQR
What is the benefit of using a box plot?
To visualise outliers and depict the distribution of data across groups
In a box plot, what percentage of values are greater than Q3?
25%
In a box plot, what does the middle line represent?
The median
What type of distribution would produce a box plot where (Q2 - Q1) < (Q3 - Q2) ?
A positively skewed distribution
What type of distribution would produce a box plot where (Q2 - Q1) = (Q3 - Q2) ?
A symmetrical distribution, such as the normal distribution
What type of distribution would produce a box plot where (Q2 - Q1) > (Q3 - Q2) ?
A negatively skewed distribution
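The box plot comparisons above can be reproduced numerically from the quartiles. A sketch using `statistics.quantiles` with its default method; the data sets and the helper name `box_shape` are assumptions made for illustration, and other quartile methods can give slightly different values.

```python
import math
import statistics

def box_shape(values):
    """Hypothetical helper: classify skew from the two halves of the box."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # Q1, median (Q2), Q3
    lower_half = q2 - q1  # width of the box below the median line
    upper_half = q3 - q2  # width of the box above the median line
    if math.isclose(lower_half, upper_half):
        return "symmetrical"
    if lower_half < upper_half:
        return "positively skewed"
    return "negatively skewed"

# Illustrative data sets (invented values).
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]
left_tailed = [1, 11, 15, 16, 17, 17, 18, 18, 18, 19]
balanced = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(box_shape(right_tailed))
print(box_shape(left_tailed))
print(box_shape(balanced))
```

Equal box halves indicate symmetry around the median; they do not by themselves show that the data follow a normal curve.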
If the mean, median, and mode are all close together, what can we infer from the data?
That it is approximately symmetrical, which is consistent with a normal distribution
What are usually the best descriptive indices for numerical interval data?
Median and range (assuming it is skewed)