Exam 1 Flashcards
the science of collecting, describing, and analyzing data
statistics
subjects/objects we obtain information about in a data set
cases/units
any characteristic recorded for each case (columns in the data table)
variable
divides the cases into groups, placing each case into exactly one of two or more categories
categorical variable
measures or records a numerical quantity for each case
quantitative variable
helps explain or predict values of other variables
explanatory variable
the variable whose values we want to explain or predict (the outcome of interest)
response variable
what is a lurking or confounding variable?
a third variable that is associated with both the explanatory and response variables but is not accounted for in the study
ex: age of children not considered in the reading level/cavity data
includes individuals or objects of interest
population
subset of the population
sample
n =
sample size
process of using data from a sample to gain information about the population
statistical inference
method of selecting a sample causes sample to differ from the population in some relevant way
sampling bias
each unit of a population has an equal chance of being selected, regardless of the other units chosen for the sample
simple random sample
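A minimal Python sketch of drawing a simple random sample. The population of ID numbers 1 to 100 and the helper name `simple_random_sample` are made up for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # seed is optional, used only for reproducibility
    return rng.sample(population, n)

# Hypothetical population: ID numbers 1 through 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```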
difference between sampling bias and bias?
sampling bias impacts the sample
bias impacts the actual method of data collection
values of one variable tend to be related to the values of another variable
association
how do association and causation relate?
association does NOT imply a cause and effect relationship
changing the value of one variable influences the value of the other variable
causation/causally associated
_____ implies a particular direction, and the relationship holds as an overall trend
causation
a study in which the researcher actively controls one or more of the explanatory variables
experiment
a study in which the researcher does not actively control the value of any variable but simply observes the values as they naturally exist
observational study
what does the word “improve” imply in a study?
causality, which cannot be established by an observational study
a causal relationship can only be determined in what study?
experiment
the value of the explanatory variable for each unit is determined randomly, before the response variable is measured
randomized experiment
randomly assign cases to different treatment groups and then compare results on the response variables
randomized comparative experiment
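A minimal Python sketch of random assignment for a randomized comparative experiment with two groups. The subject labels and the helper name `randomly_assign` are hypothetical.

```python
import random

def randomly_assign(cases, seed=None):
    """Shuffle the cases and split them into two treatment groups."""
    rng = random.Random(seed)
    shuffled = list(cases)      # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical cases: eight subjects.
cases = [f"subject_{i}" for i in range(1, 9)]
treatment, control = randomly_assign(cases, seed=0)
print("treatment:", treatment)
print("control:  ", control)
```

The response variable would then be measured and compared between the two groups.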
each case gets both treatments in random order, and individual differences in the response variable between the 2 treatments are examined
matched pairs experiment
a summary statistic that helps describe a variable
proportion
how to determine a proportion in a category =
number in that category / total number
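A short Python sketch of the proportion calculation (number in that category / total number). The answers below are made-up data, and `proportion` is a hypothetical helper name.

```python
def proportion(values, category):
    """Number of cases in the category divided by the total number of cases."""
    return sum(1 for v in values if v == category) / len(values)

# Hypothetical sample of five answers.
answers = ["agree", "disagree", "agree", "agree", "disagree"]
p_hat = proportion(answers, "agree")  # 3 of the 5 cases agree
print(p_hat)
```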
proportion for a sample is denoted:
p-hat
p-hat =
proportion for a sample
proportion for a population is denoted:
p
p =
proportion for a population
used to show the relationship between 2 categorical variables
two-way table
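One way to tally such a table in Python, as a sketch. The six (gender, answer) cases are invented for illustration.

```python
from collections import Counter

# Hypothetical cases: one (gender, answer) pair per person.
cases = [
    ("male", "agree"), ("male", "disagree"), ("female", "agree"),
    ("female", "agree"), ("male", "agree"), ("female", "disagree"),
]

# Count how many cases fall into each combination of categories.
table = Counter(cases)

# Print one row per gender, one column per answer.
for gender in ("male", "female"):
    row = [table[(gender, answer)] for answer in ("agree", "disagree")]
    print(gender, row)
```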
an observed value that is notably distinct from the other values in a data set
outlier
a numerical average of the data values
mean
mean of a sample is denoted:
x-bar
x-bar =
mean of a sample
mean of a population is denoted:
mu
mu =
mean of a population
the middle entry of an ordered list if the list contains an odd number of entries (the average of the two middle entries if the number is even)
median
median is denoted:
m
m =
median
a statistic that is relatively unaffected by extreme values
resistant statistic
is median resistant to outliers?
yes
is mean resistant to outliers?
no
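A small Python sketch showing why the median is resistant and the mean is not. The data values are made up; adding one extreme value moves the mean a lot and the median very little.

```python
from statistics import mean, median

# Hypothetical data, then the same data with one extreme value added.
data = [2, 3, 4, 5, 6]
with_outlier = data + [100]

print(mean(data), median(data))                  # both are 4
print(mean(with_outlier), median(with_outlier))  # mean jumps to 20, median is 4.5
```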
measures the spread of the data in a sample
standard deviation
the larger the standard deviation, the ____ variability there is in the data and the _____ spread out the data are
more
more
standard deviation of a sample is denoted:
s
s =
standard deviation of a sample
standard deviation of a population is denoted:
σ
σ =
standard deviation of a population
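The cards above do not give a formula for s. The sketch below assumes the usual sample formula, which divides the sum of squared deviations from x-bar by n - 1 and takes the square root. The data and the helper name `sample_sd` are hypothetical.

```python
import math

def sample_sd(values):
    """Sample standard deviation: sqrt( sum((x - x_bar)^2) / (n - 1) )."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical sample with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_sd(data)
print(round(s, 3))
```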
what is the 95% rule?
if a distribution of data is approximately symmetric and bell-shaped, about 95% of the data should fall within 2 standard deviations of the mean
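A quick Python sketch of the 95% rule as an interval, mean ± 2 standard deviations. The mean of 100 and standard deviation of 15 are made-up numbers, and `ninety_five_percent_range` is a hypothetical helper name.

```python
def ninety_five_percent_range(mean, sd):
    """Interval expected to hold roughly 95% of bell-shaped data."""
    return mean - 2 * sd, mean + 2 * sd

# Hypothetical bell-shaped data with mean 100 and standard deviation 15.
low, high = ninety_five_percent_range(100, 15)
print(low, high)
```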
tells how many standard deviations the value is from the mean and is independent of the unit of measurement
z-score
z-score =
(x - x-bar) / s
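A short Python sketch of the z-score calculation: subtract the mean from the value, then divide by the standard deviation. The numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical value 130 from data with mean 100 and standard deviation 15.
z = z_score(130, 100, 15)
print(z)  # 2.0, i.e. two standard deviations above the mean
```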
the value of a quantitative variable which is greater than p percent of the data
percentile
what is the 5 number summary?
q0 = minimum
q1 = first quartile (25%)
q2 = median
q3 = third quartile (75%)
q4 = maximum
range =
maximum - minimum
interquartile range =
q3-q1
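A Python sketch of the five number summary, range, and interquartile range. It assumes one common textbook convention for quartiles (the medians of the lower and upper halves, leaving out the middle value when the count is odd); software packages may use slightly different conventions. The data and the helper name `five_number_summary` are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd so both halves have equal size.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical data set.
data = [1, 3, 5, 7, 9, 11, 13]
minimum, q1, med, q3, maximum = five_number_summary(data)
data_range = maximum - minimum   # range = maximum - minimum
iqr = q3 - q1                    # interquartile range = Q3 - Q1
print((minimum, q1, med, q3, maximum), data_range, iqr)
```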
is range resistant to outliers?
NO
is interquartile range resistant to outliers?
YES
is standard deviation resistant to outliers?
NO
the start of a box in a box plot is at
q1
the end of a box in a box plot is at
q3
the line that divides the box in a box plot is
the median
the lines (whiskers) on a box plot extend
to the most extreme data value that is not an outlier
if the data is skewed left, median _____ mean
median greater than the mean
if the data is symmetric, median _____ mean
equal
if the data is skewed right, median _____ mean
median smaller than the mean
a graph of the relationship between 2 quantitative variables
scatterplot
for a scatterplot, the _____ variable is on the x axis and the _____ variable is on the y axis
explanatory
response
a measure of the strength and direction of linear association between 2 quantitative variables
correlation
correlation of a sample denoted:
r
correlation of a population denoted:
ρ
“rho”
correlations closer to 1 or -1 are _____
stronger
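The cards do not give a formula for r. The sketch below assumes the standard sample correlation: the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The data and the helper name `correlation` are hypothetical.

```python
import math

def correlation(xs, ys):
    """Sample correlation r between two quantitative variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a moderately strong positive linear association.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))

# Points that fall exactly on an increasing line give r = 1.
r_perfect = correlation(xs, [2, 4, 6, 8, 10])
print(r_perfect)
```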
for the linear regression line equation y-hat = b0 + b1 x
what is y-hat?
predicted value of the response variable
for the linear regression line equation y-hat = b0 + b1 x
what is b0?
y-intercept
for the linear regression line equation y-hat = b0 + b1 x
what is b1?
slope
for the linear regression line equation y-hat = b0 + b1 x
add in where response and explanatory variables would be
predicted response = b0 + b1(explanatory)
difference between the observed and predicted values of the response variable
residual
equation for residual:
observed - predicted
y - y-hat
what does a residual represent on a scatterplot?
vertical deviation from line to a data point
line that minimizes the sum of the squared residuals
least squares line
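A Python sketch of fitting a least squares line and computing residuals. It assumes the standard closed-form solution (slope = sum of products of deviations / sum of squared x deviations, intercept = mean of y - slope × mean of x), which the cards do not state. The data and the helper name `least_squares_line` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = least_squares_line(xs, ys)

# Residual = observed - predicted, for each data point.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(round(intercept, 3), round(slope, 3))
print([round(e, 3) for e in residuals])
```

For a least squares line with an intercept, the residuals sum to zero up to rounding error.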
do outliers influence regression line?
YES
data from the Principality of Andorra were used to determine that 98.9% of Andorrans have access to the Internet, the highest rate of any country.
what are the cases in the data from Andorra?
what variable is used?
is it categorical or quantitative?
cases - people in Andorra
variable - internet access
categorical
an online poll conducted on biblegateway.com asked, “how often do you talk about the bible in your normal course of conversation?” over 5000 people answered the question, and 78% of respondents chose the most frequent option: multiple times a week.
can we infer that 78% of people talk about the bible multiple times a week? why or why not?
no
sampling bias: the respondents are visitors to a bible website who chose to answer, so they do not represent all people
state whether the sentence implies no association between the variables, association without implying causation, or association with causation:
studies show that taking a practice exam increases your score on an exam.
association with causation
state whether the sentence implies no association between the variables, association without implying causation, or association with causation:
families with many cars tend to also own many television sets.
association without implying causation
state whether the sentence implies no association between the variables, association without implying causation, or association with causation:
sales are the same even with different levels of spending on advertising.
no association
state whether the sentence implies no association between the variables, association without implying causation, or association with causation:
taking a low-dose aspirin a day reduces the risk of heart attacks.
association with causation
state whether the sentence implies no association between the variables, association without implying causation, or association with causation:
goldfish who live in large ponds are usually larger than goldfish who live in small ponds.
association without implying causation
state whether the sentence implies no association between the variables, association without implying causation, or association with causation:
putting a goldfish into a larger pond will cause it to grow larger.
association with causation
a nationwide US telephone survey conducted by the pew foundation asked 2625 adults ages 18 and older, “some people say there is only one true love for each person. do you agree or disagree?” In addition to finding out the proportion who agree with the statement, the pew foundation also wanted to find out if the proportion who agree is different between males and females, and whether the proportion who agree is different based on level of education (no college, some college, or college degree). the survey participants were selected randomly, by landlines and cell phones.
what are the cases in the survey about one true love?
what are the variables?
are the variables categorical or quantitative?
how many rows and how many columns would the data table have?
cases - 2625 people
variables:
do you agree? - categorical
gender - categorical
level of education - categorical
2625 rows, 3 columns
give the notation for the mean:
for a random sample of 50 seniors from a large high school, the average SAT score was 582 on the math portion of the test.
x-bar = 582
give the notation for the mean:
about 1.67 million students in the class of 2014 took the SAT, and the average score overall on the math portion was 513.
mu = 513
the five number summary for the mammal longevity data in table 2.21 on page 73 is (1, 8, 12, 16, 40). find the range and interquartile range for this dataset.
range: 40-1 = 39
IQR: 16-8 = 8
use the regression line to predict the tip of a bill that is $59.33
tip = -0.292 + 0.182 (bill)
$10.51
use the regression line to predict the tip of a bill that is $9.52
tip = -0.292 + 0.182 (bill)
$1.44
use the regression line to predict the tip of a bill that is $23.70
tip = -0.292 + 0.182 (bill)
$4.02
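The three tip predictions above can be checked with a short Python sketch that plugs each bill into tip = -0.292 + 0.182 (bill). The function name `predict_tip` is a hypothetical helper.

```python
def predict_tip(bill):
    """Predicted tip from the regression line tip = -0.292 + 0.182 * bill."""
    return -0.292 + 0.182 * bill

# Bills from the three cards above.
for bill in (59.33, 9.52, 23.70):
    print(f"bill ${bill:.2f} -> predicted tip ${predict_tip(bill):.2f}")
```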