STA8170 Flashcards
Data
systematically recorded values (numbers or labels) together with their context
Categorical/qualitative variable
variable that names categories with words or numbers
Context (info required for?) (x6)
who was measured, what was measured, how data was collected, where data was collected, when and why the study was done
Rows in a data table hold…
individual cases, eg respondents, participants, subjects, units, records
Columns in a data table hold…
variables that give info about each individual case
Quantitative variable
an amount or degree, measured in meaningful numbers eg scale
Identifiers
variable that assigns unique value to each individual/case - cannot be analysed
Relational database
large database that links data tables together by matching identifiers
Ordinal variable
categorical variable with ordering of values
Data table
an arrangement of data in which each row represents a case, and each column a variable
Case
individual about whom/which we have data
Record
info about an individual/case in a database
Sample (x2)
representative subset of population
analysed to estimate/learn about the population
Population
the collection of all individuals, items or objects of interest
Nominal variable
variable whose values are only names of categories
Units
quantity or amount used as standard of measurement
Parameter (and greek letter)
any numerical characteristic of a population, eg the population mean μ (mu, said "mew")
Distribution (x2)
description of all the values a variable can take, and how often those values occur
Three important things pictures can do in data analysis?
reveal things not able to be seen in data tables, helping to think about patterns/relationships
show important features in the data
tell others about the data
Area principle (for graphing data)
the area occupied by a part of the graph should correspond to the magnitude of value it represents
Frequency table (x3)
organises the cases by counting how many fall into each category of the variable
rows are category names
also records totals
describes the distribution of a categorical variable
Relative frequency table (x2)
displays percentages, rather than counts, of values in each category
describes the distribution of a categorical variable
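A minimal Python sketch of the two tables above, using only the standard library. The eye colour data and the function names are made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Count how many cases fall in each category."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Express each category count as a percentage of all cases."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}

# Hypothetical data: eye colour recorded for 8 cases.
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "blue"]

counts = frequency_table(eye_colour)
percents = relative_frequency_table(eye_colour)

for category in counts:
    print(f"{category:<6} {counts[category]:>3} {percents[category]:>6.1f}%")
print(f"{'total':<6} {sum(counts.values()):>3}")
```

The percentages always sum to 100, which is a quick check that no case was dropped or counted twice.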
Bar chart (x3)
Display distribution of a categorical variable
Categories on the x-axis, counts on the y-axis
spaces between the bars indicate that freestanding bars can be placed in any order
Relative frequency bar chart
shows the percentage/proportion of values (y) falling under each category (x)
Pie charts are used to display…?
Plus one disadvantage
categorical data
visual comparisons between categories are more difficult than in eg a bar chart
Contingency table
how cases are distributed along each variable, dependent on the other variable
Marginal distribution
the totals displayed (as counts or %) in the bottom row and last column of contingency tables
Conditional distribution
show the distribution of one variable for just those cases that satisfy a condition on another variable
Independent variables in a contingency table are when… (x2)
the distribution of one variable is the same for all categories of another
ie there is no association between them
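The contingency table ideas above (cell counts, marginal totals, conditional distributions, independence) can be sketched in Python as follows. The cases and the helper name are hypothetical, and comparing conditional distributions by eye is only an informal check, not a formal test of independence.

```python
from collections import Counter

# Hypothetical cases: (sex, preferred drink) for 8 respondents.
cases = [
    ("F", "tea"), ("F", "tea"), ("F", "coffee"), ("F", "coffee"),
    ("M", "tea"), ("M", "coffee"), ("M", "coffee"), ("M", "coffee"),
]

# Cell counts of the contingency table.
table = Counter(cases)

# Marginal distributions: the row totals and column totals.
row_totals = Counter(sex for sex, _ in cases)
col_totals = Counter(drink for _, drink in cases)

def conditional_distribution(row):
    """Distribution of drink for just those cases in one sex category."""
    return {
        drink: count / row_totals[row]
        for (sex, drink), count in table.items()
        if sex == row
    }

cond_f = conditional_distribution("F")
cond_m = conditional_distribution("M")
print(cond_f)
print(cond_m)

# If the conditional distributions matched for every row, the variables
# would look independent. Here they differ, which suggests an association.
looks_independent = cond_f == cond_m
```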
Histogram (x3)
Bar chart for quantitative data
Counts (y) grouped into bins (x) that make up the bars
No gaps between bars - or gap indicates no values for that bin
Relative frequency histogram
Use percentage on y-axis instead of counts
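A rough Python sketch of how values are grouped into bins for a histogram and a relative frequency histogram. The data, the bin width of 5 and the function names are assumptions chosen for illustration.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start, start + width), and so on.

    Values that fall outside the bins are ignored in this sketch.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

def to_percentages(counts):
    """Convert bin counts to percentages for a relative frequency histogram."""
    total = sum(counts)
    return [100 * c / total for c in counts]

# Hypothetical data: waiting times in minutes.
waiting_times = [1, 3, 4, 7, 8, 9, 15]
counts = histogram_counts(waiting_times, start=0, width=5, n_bins=4)
percents = to_percentages(counts)

# Text version of the histogram: an empty row is a gap (no values in that bin).
for i, count in enumerate(counts):
    low = i * 5
    print(f"{low:>2}-{low + 5:<2} | " + "#" * count)
```

The empty 10-15 bin shows up as a gap, matching the rule that a gap in a histogram means no values fell in that bin.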
Stem and leaf plot (x3)
Similar to histogram, but shows the individual values
Useful for doing by hand or in Word, for <100 values
Stem values on the vertical axis, leaves across the horizontal
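A small Python sketch of building a stem and leaf plot. It assumes non-negative whole numbers, with tens as stems and units as leaves; the function name and data are made up.

```python
def stem_and_leaf(values):
    """Group values by stem (tens) and list their leaves (units) in order.

    Assumes non-negative whole numbers. Stems with no leaves are kept so
    that gaps in the data stay visible.
    """
    ordered = sorted(values)
    stems = {stem: [] for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1)}
    for v in ordered:
        stems[v // 10].append(v % 10)
    return stems

# Hypothetical data: ages of 7 participants.
ages = [12, 15, 21, 27, 27, 34, 58]
plot = stem_and_leaf(ages)

for stem, leaves in plot.items():
    print(f"{stem} | " + " ".join(str(leaf) for leaf in leaves))
```

Unlike a histogram, every individual value can be read back from the display.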
Dotplots
Like a stem and leaf, but with dots
Can be vertical (like stem plot) or horizontal
Categorical data condition (for deciding on how to display data) (x2)
Data is counts or percentages of individual cases in categories
Categories do not overlap
Quantitative data condition (for deciding on how to display data)
Data are values of a quantitative variable whose units are known
Four components for descriptions of distribution (plus egs)
that mean you should be able to…
shape - symmetry, skew, gaps
outliers
centre - median
spread - range, interquartile range
roughly sketch the distribution
Modes (plus 3 types)
the peaks in distributions
unimodal
bimodal
multimodal
A distribution with no modes is described as…
uniform
Skew (x2)
a distribution with longer tail on one side
skew is named for the side with the longer tail, eg a long right tail is skewed to the right
Median (x3, plus how to find, x2)
the middle value that divides a histogram into two equal areas
appropriate description of centre for skewed distributions or with outliers
always pair with the IQR
if n is odd, median is the middle value
if n is even, median is the average of the two middle values
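The two rules for finding the median can be sketched in Python like this. The function name `find_median` is just an illustrative choice.

```python
def find_median(values):
    """Return the median: the middle value of the sorted data."""
    if not values:
        raise ValueError("median of no values is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n is odd: the single middle value.
        return ordered[mid]
    # n is even: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(find_median([3, 1, 2]))     # odd n
print(find_median([4, 1, 3, 2]))  # even n
```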
Range
difference between min and max values in a distribution
Quartile
the values that divide the ordered data into four groups with equal numbers of cases
Interquartile range (x2)
= upper quartile - lower quartile
the spread of the middle 50% of the data, between the 25th and 75th percentiles
Percentile (plus eg x1)
the value that leaves that percentage of data below it
eg, 25th percentile has 25% of data below it
Five number summaries of distribution include…
minimum q1 median q3 maximum
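A Python sketch of the five number summary and the IQR. Quartile conventions differ between textbooks and software; this sketch assumes one common hand method, where q1 and q3 are the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd), so other tools may give slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, q1, median, q3, maximum).

    Assumption: quartiles are the medians of the lower and upper halves,
    excluding the middle value when n is odd.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical data.
scores = [7, 1, 9, 3, 5, 2, 8, 4, 6]
minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1            # interquartile range
data_range = maximum - minimum
print(minimum, q1, med, q3, maximum, iqr, data_range)
```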
Boxplots (x7)
display of the five number summary
vertical axis from min to max of data
box spanning q1 to q3
horizontal line inside box at the median
‘fences’ at 1.5 IQRs beyond lower and upper quartiles (not displayed, just for working)
whiskers from box to most extreme data values found within the fences
add dots for any values found outside the fences
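The fence, whisker and outlier steps above can be sketched in Python as below. The quartiles are passed in by the caller because quartile conventions vary; the data and the function name are hypothetical.

```python
def boxplot_parts(values, q1, q3):
    """Work out fences, whisker ends and outliers for a boxplot.

    q1 and q3 are supplied by the caller (assumed already calculated).
    """
    iqr = q3 - q1
    # Fences sit 1.5 IQRs beyond the quartiles; they are not drawn.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outside = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "fences": (lower_fence, upper_fence),
        # Whiskers reach the most extreme values still within the fences.
        "whiskers": (min(inside), max(inside)),
        # Values beyond the fences are plotted as individual dots.
        "outliers": outside,
    }

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
parts = boxplot_parts(data, q1=2.5, q3=7.5)
print(parts)
```

Note that the upper whisker stops at 8, the largest value inside the fence, rather than at the fence itself.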