chapter 1: picturing distributions with graphs Flashcards
Individuals
Particular things
Population
Set of individuals
Variable
an attribute of an individual.
value (of a variable)
any way that that variable could be exhibited by an individual
True or False: every individual must have only a single value for any given variable
True
Data
numbers with a context, or, values of variables for the individuals in a population
Dataset
The particular data that we are presented with
observation
a member of our dataset
sample
That part of the population from which our observations come
The size of a dataset (or sample)
the number of observations in it
in order to clearly define a statistical problem, we must…
first clearly state the population and the variables that it concerns
exploratory data analysis
when you seek simply to describe datasets
distribution
a description of the values a variable takes and how often it takes them
graph
a visual representation of a distribution
count
the number of observations that fall in a category (the category’s size)
If we want to know the proportion of observations in a dataset that are in a given category…
we divide the count of that category by the sample size
If we want to know the percentage of observations in a dataset that are in a given category…
we multiply the proportion of observations in the dataset that are in that category by 100%
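The count, proportion, and percentage cards above can be worked through in a few lines of Python. This is only a minimal sketch: the list `observations` is made-up illustration data, not from the chapter.

```python
from collections import Counter

# Hypothetical dataset: one categorical value per observation.
observations = ["red", "blue", "red", "green", "red", "blue", "red", "green"]

counts = Counter(observations)   # count: number of observations in each category
sample_size = len(observations)  # size of the dataset

# proportion = count / sample size
proportions = {category: n / sample_size for category, n in counts.items()}

# percentage = proportion * 100%
percentages = {category: p * 100 for category, p in proportions.items()}

for category in counts:
    print(category, counts[category], proportions[category], percentages[category])
```

The proportions of all categories add up to 1, and the percentages add up to 100%, as long as every observation falls in exactly one category.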
two ways we can look at the size of a given category
we might be interested in how it compares either to the sizes of the other categories or to the size of the whole dataset
To compare the sizes of the categories with each other
we use a bar graph
One drawback of bar graphs is
it is difficult to look at proportions of observations using them
To compare the sizes of the categories with the size of the dataset
we calculate the percentage of the dataset that falls into each category
To compare the percentages of the categories not only with each other but also with the dataset as a whole
we use a pie chart
roundoff error
when percentages fail to add up to exactly 100% because each one was rounded, so the total reflects the accumulated rounding errors
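A small sketch of roundoff error, assuming a made-up dataset with three equally sized categories (the names `A`, `B`, `C` are hypothetical). Each exact percentage is 33.33...%, but once each is rounded to a whole percent the total no longer reaches 100.

```python
# Hypothetical counts: three categories with one observation each.
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

# Exact percentages add up to 100 (up to floating-point precision).
exact = {category: 100 * n / total for category, n in counts.items()}

# Rounding each percentage to a whole number loses a little each time.
rounded = {category: round(p) for category, p in exact.items()}

print(rounded)                # each category shows as 33
print(sum(rounded.values()))  # the rounded percentages total 99, not 100
```

The missing 1% is not a mistake in the data; it is the accumulated effect of rounding each category separately.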
when dealing with quantitative data, we must have [blank]
a unit of measurement
If we allow the value of a variable for a single individual to change, we must also allow [blank] for it to do so
time, on a time plot
the long-term upward or downward movement over time on a time plot
trend
The amount of change over time in a time plot can be deceiving, so make sure that you [blank]
check the scale of the variable and where its values begin on the 𝑦-axis
For small data sets, if we want to compare the values that a quantitative variable takes among different individuals in a population, we can use a [blank]
stem plot (stems, made of every digit but the last, run down a column at left; leaves, the final digits, stretch in rows to the right of their stems)
In a stem plot, the final digit of each observation, written to the right of its stem, is called a…
leaf (get it?). Ex. a score of 107 is represented by the 7 to the right of the 10 in the stem
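The stem-and-leaf split can be sketched in Python. This is an illustrative helper, not a standard library function: `stem_plot` and the `scores` list are hypothetical, and the sketch assumes non-negative whole numbers. It also follows the usual convention of listing every stem between the smallest and largest, even one with no leaves.

```python
from collections import defaultdict

def stem_plot(values):
    """Return the rows of a stem plot for non-negative whole numbers."""
    rows = defaultdict(list)
    for value in sorted(values):
        # The leaf is the final digit; the stem is everything before it.
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)

    lines = []
    # Include every stem in the range, even if it has no leaves.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical test scores.
scores = [107, 98, 85, 92, 101, 88, 95, 107, 73]
for line in stem_plot(scores):
    print(line)
```

Here the score 107 appears as a leaf of 7 in the row whose stem is 10, and because the score occurs twice, that row carries two 7s.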
True or false: we NEVER include commas or decimal points in a stem plot
True
For larger quantitative datasets, ranges into which the values of the data fall are called [blank]
bins
When you make quantitative data categorical in bins, you can then make a bar graph called a [blank]
histogram
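Grouping quantitative data into bins can be sketched as follows. The function `bin_counts`, its parameters, and the `values` list are all hypothetical, and the sketch assumes equal-width bins that include their lower edge but not their upper edge.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each equal-width bin.

    Bin i covers start + i*width up to (but not including) start + (i+1)*width.
    Values outside all bins are ignored.
    """
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Hypothetical quantitative data.
values = [12, 15, 21, 22, 29, 30, 34, 38, 41, 47]
counts = bin_counts(values, start=10, width=10, n_bins=4)

# A rough text histogram: one '*' per observation in each bin.
for i, count in enumerate(counts):
    low = 10 + i * 10
    print(f"{low}-{low + 10}: {'*' * count}")
```

Each bar's length is the count of its bin, which is what a histogram draws; choosing a different bin width changes the shape that the graph shows.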
we define statistics as…
the science of learning from data
right-skewed or left-skewed
when one side of a distribution (the right side or the left side) extends much farther from the center than the other