Intro to Data Flashcards
data matrix
table of data
columns: variables
rows: individuals
what are variables and individuals
(the information given in the data)
variables = characteristics recorded about each individual
individuals = observational units (the people, animals, or objects the data describe)
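A data matrix can be sketched in Python as a list of rows, one dictionary per individual, with the dictionary keys as the variables. The names and values below are made up for illustration.

```python
# Made-up data matrix: each row (dict) is one individual (observational unit),
# each key is one variable (a characteristic recorded for every individual).
data = [
    {"name": "Ana", "age": 19, "blood_type": "A"},
    {"name": "Ben", "age": 22, "blood_type": "O"},
    {"name": "Cal", "age": 20, "blood_type": "AB"},
]

variables = list(data[0].keys())   # the columns
n_individuals = len(data)          # the number of rows

# One column = the values of a single variable across all individuals.
ages = [row["age"] for row in data]

print("variables:", variables)
print("individuals:", n_individuals)
print("age column:", ages)
```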
quantitative variable
numerical or measurement variable
ex: age, distance
two types: discrete & continuous
quantitative variable
discrete
can only take separate numerical values, with jumps between them
1, 2, 3, 4 (no values in between)
ex: # of plants in a garden, # of dogs in a house
quantitative variable
continuous
can take on any value in an interval
ex: temperature throughout the day
values can include decimals
categorical variable
qualitative variable
places an individual or item into one of several groups or categories, called levels
examples of categorical variables and their levels
blood type (A, B, AB, O)
gender
two types of categorical variables
nominal and ordinal
categorical variable
nominal
no natural ordering for the categories
ex: dog breed, brand of soda
categorical variable
ordinal
the categories have a logical order
ex: size of soda (small, medium, large), grade level
what graphs are used to display categorical data
bar graph and pie chart
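Both graphs start from the count of individuals in each level. The sketch below computes those counts (for a bar graph) and each level's share of the total (for a pie chart) on made-up blood-type data, and prints a text stand-in for a bar graph.

```python
from collections import Counter

# Made-up categorical data: the blood type of 10 individuals.
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "A", "O", "B"]

# Bar graph: how many individuals fall in each level.
counts = Counter(blood_types)

# Pie chart: each level's share of the whole.
shares = {level: counts[level] / len(blood_types) for level in counts}

# Text version of a bar graph: one '#' per individual.
for level in ["A", "B", "AB", "O"]:
    print(f"{level:>2} | {'#' * counts[level]} {counts[level]} ({shares[level]:.0%})")
```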
box plot qualities (used for quantitative data)
shows the median as a dark line inside the box (horizontal when the plot is vertical)
can’t see the number of modes
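A basic box plot is drawn from the five-number summary (minimum, Q1, median, Q3, maximum). The sketch below computes it with Python's `statistics.quantiles` on made-up data. Quartile conventions differ between textbooks and software, so `method="inclusive"` is an assumption and a hand calculation may give slightly different Q1 and Q3.

```python
from statistics import quantiles

# Made-up quantitative data, sorted for readability.
data = [2, 4, 4, 5, 7, 8, 9, 11, 15]

# Three cut points split the data into quarters: Q1, median, Q3.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

# Basic box plot: box from Q1 to Q3, a line at the median,
# whiskers out to the minimum and maximum.
five_number_summary = (min(data), q1, median, q3, max(data))
print(five_number_summary)  # (2, 4.0, 7.0, 9.0, 15)
```

Data sets with one peak and with two peaks can share the same five numbers, which is why the modes are hidden in a box plot.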
what graphs are used to display quantitative data
dotplots and histograms (box plots also summarize quantitative data)
dotplots qualities
represents each observation in a data set as a single dot above its value on the x-axis (dots stack for repeated values)
works well for displaying the values of a variable in a small data set
NOT good for data with too many different values → you lose the sense of the overall distribution
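A dotplot can be imitated in plain text. The hypothetical helper `text_dotplot` below draws it sideways (values down the left, one `*` per observation) for a made-up small data set.

```python
from collections import Counter

def text_dotplot(values):
    """Return a sideways text dotplot: one '*' per observation, grouped by value."""
    counts = Counter(values)
    rows = [f"{value:>4} | " + "*" * counts[value] for value in sorted(counts)]
    return "\n".join(rows)

# Made-up small data set: number of pets in 10 households.
pets = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]
print(text_dotplot(pets))
```

With many distinct values, as in `text_dotplot([1.02, 1.37, 2.11, 2.58])`, every row gets a single star and the plot says almost nothing about the shape, which is the weakness the card describes.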
histograms qualities
give a good sense of the shape of the distribution
shows the modes
symmetry is visible
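A histogram groups values into equal-width bins and draws one bar per bin. The hypothetical helper `histogram_counts` below is a minimal sketch that assumes fixed bins of the form [start + k*width, start + (k+1)*width) and simply skips values outside the overall range. The scores are made up.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width).

    Values outside [start, start + n_bins*width) are skipped in this sketch.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)  # index of the bin that v falls into
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Made-up exam scores.
scores = [52, 61, 64, 68, 71, 73, 74, 77, 79, 82, 85, 88, 91, 95]
counts = histogram_counts(scores, start=50, width=10, n_bins=5)

# Text bars; the "low + 9" labels assume whole-number scores.
for k, c in enumerate(counts):
    low = 50 + k * 10
    print(f"{low}-{low + 9} | {'#' * c}")
```

Here the tallest bar is the 70-79 bin, so the sketch shows one peak with the bars falling off on both sides.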
distribution
what values the variable takes and how often it takes each one
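A distribution can be written out as a frequency table: each value the variable takes, with its count and proportion. The data below are made up.

```python
from collections import Counter

# Made-up data: number of dogs in each of 8 houses.
dogs = [0, 1, 1, 2, 1, 0, 3, 1]

counts = Counter(dogs)
n = len(dogs)

# Distribution: value -> (how many times it occurs, proportion of all observations).
distribution = {value: (counts[value], counts[value] / n) for value in sorted(counts)}

for value, (count, proportion) in distribution.items():
    print(f"{value} dogs: {count} of {n} houses ({proportion:.1%})")
```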
modes
# of peaks
unimodal, bimodal, multimodal
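Counting peaks can be sketched on a list of histogram bar heights. The rule used here is a simplified assumption, not a standard definition: a peak is a bar (or flat run of equal bars) that is taller than its neighbors on both sides. In practice, tiny bumps caused by noise are usually not counted as modes, so judgment still matters.

```python
def count_peaks(bin_counts):
    """Count peaks in histogram bar heights (simplified, hypothetical rule)."""
    # Collapse runs of equal heights so a flat-topped peak is counted once.
    heights = []
    for c in bin_counts:
        if not heights or c != heights[-1]:
            heights.append(c)
    # Pad with zeros so the first and last bars can also be peaks.
    padded = [0] + heights + [0]
    peaks = 0
    for i in range(1, len(padded) - 1):
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]:
            peaks += 1
    return peaks

def modality(bin_counts):
    """Name the shape by its number of peaks."""
    k = count_peaks(bin_counts)
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal" if k > 2 else "no clear peak"

print(modality([1, 3, 5, 3, 2]))        # one peak
print(modality([2, 6, 2, 1, 5, 7, 3]))  # two peaks
```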
symmetry
symmetric (the left and right sides are roughly mirror images)
skewed to the right (long tail on the right; most values are lower)
skewed to the left (long tail on the left; most values are higher)
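A quick numerical hint about skew, not covered on these cards, is to compare the mean with the median: the long tail tends to pull the mean toward it. This is only a rule of thumb with known exceptions, so the hypothetical `skew_hint` below returns a hedged label rather than a verdict. The data are made up.

```python
from statistics import mean, median

def skew_hint(values):
    """Rule of thumb only: compare the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "possibly skewed right (mean pulled toward a long right tail)"
    if m < med:
        return "possibly skewed left (mean pulled toward a long left tail)"
    return "no skew indicated by mean vs. median"

print(skew_hint([1, 2, 2, 3, 3, 3, 4, 10, 15]))        # mean about 4.8, median 3
print(skew_hint([1, 10, 14, 15, 15, 16, 16, 16, 17]))  # mean about 13.3, median 15
print(skew_hint([1, 2, 3, 4, 5]))                      # mean 3, median 3
```

Always check the histogram too; the picture is the primary evidence for shape.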
outliers
observations that lie outside the overall pattern of the distribution
always consider why they exist (ex: a recording error, or a genuinely unusual individual)
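One common convention for flagging possible outliers, swapped in here because these cards don't name a rule, is the 1.5 × IQR rule: flag values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. The quartile method (`method="inclusive"`) is an assumption, and the data are made up.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (the common 1.5 x IQR rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

data = [2, 4, 4, 5, 7, 8, 9, 11, 40]
print(iqr_outliers(data))  # [40]
```

A flagged value is only a candidate: as the card says, find out why it is there before deciding what to do with it.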
population
the entire group we are interested in learning about
sample
subset of individuals that is often a small fraction of the overall population
parameter
the numerical summary for a characteristic of the population (as a whole)
keyword: “All”
statistic
the numerical summary for a characteristic of a sample
keyword: sample
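The difference between a parameter and a statistic can be simulated. The sketch below invents a population of 1,000 heights (the normal model, mean 170, and SD 8 are arbitrary assumptions), computes the population mean (a parameter), then draws a random sample of 50 and computes its mean (a statistic).

```python
import random
from statistics import mean

# Made-up population: heights (cm) of ALL 1,000 students at a school.
random.seed(1)
population = [random.gauss(170, 8) for _ in range(1000)]

# Parameter: a numerical summary of the whole population.
mu = mean(population)

# Statistic: the same summary computed on a sample
# (50 students chosen at random, without replacement).
sample = random.sample(population, 50)
x_bar = mean(sample)

print(f"parameter (population mean): {mu:.1f}")
print(f"statistic (sample mean):     {x_bar:.1f}")
```

Drawing a different sample usually changes the statistic, while the parameter stays fixed because the population does not change.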
which terms go together
sample and statistic
population and parameter
a statistic describes a sample
a parameter describes a population