Exam 1 Flashcards by Sarah Castle

the science of collecting, describing, and analyzing data

statistics

subjects/objects we obtain information about in a data set

cases/units

any characteristic recorded for each case (columns in the data table)

variable

divides the cases into groups, placing each case into exactly one of two or more categories

categorical variable

measures or records a numerical quantity for each case

quantitative variable

helps explain or predict values of other variables

explanatory variable

the variable whose values the explanatory variable helps explain or predict

response variable

what is a lurking or confounding variable?

a third variable, associated with both the explanatory and response variables, that is not accounted for
ex: age of children not considered in the reading level/cavity data

includes all individuals or objects of interest

population

subset of the population

sample

n =

sample size

process of using data from a sample to gain information about the population

statistical inference

method of selecting a sample causes sample to differ from the population in some relevant way

sampling bias

each unit of a population has an equal chance of being selected, regardless of the other units chosen for the sample

simple random sample
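
A minimal sketch of drawing a simple random sample, assuming the population can be listed as ID numbers. The function name `simple_random_sample`, the 100-unit population, and the seed are illustrative assumptions, not part of the deck.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Return n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(list(population), n)


# Hypothetical population: ID numbers 1 through 100.
population_ids = range(1, 101)
sample_ids = simple_random_sample(population_ids, n=10, seed=1)
print(sample_ids)
```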

difference between sampling bias and bias?

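sampling bias comes from how the sample is selected, making it differ from the population
bias more generally comes from how the data are collected (for example, question wording), making the data misrepresent the population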

values of one variable tend to be related to the values of another variable

association

how do association and causation relate?

association does NOT imply a cause and effect relationship

changing the value of one variable influences the value of the other variable

causation/causally associated

_____ implies a particular direction, and the relationship holds as an overall trend

causation

a study in which the researcher actively controls one or more of the explanatory variables

experiment

a study in which the researcher does not actively control the value of any variable but simply observes the values as they naturally exist

observational study

what does the word “improve” imply in a study?

causality, which cannot be established from an observational study

a causal relationship can only be determined in what study?

experiment

the value of the explanatory variable for each unit is determined randomly, before the response variable is measured

randomized experiment

randomly assign cases to different treatment groups and then compare results on the response variables

randomized comparative experiment
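
One way to sketch random assignment for a randomized comparative experiment with two treatment groups. The helper `randomize_groups`, the case labels, and the seed are hypothetical and only for illustration.

```python
import random


def randomize_groups(cases, seed=None):
    """Randomly split cases into two treatment groups of (nearly) equal size."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # chance alone decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical cases; in a real study these would be the experimental units.
cases = [f"case_{i}" for i in range(1, 9)]
group_a, group_b = randomize_groups(cases, seed=7)
print(group_a, group_b)
```

The response variable would then be measured in each group and the results compared.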

each case gets both treatments in random order, and the individual differences in the response variable between the 2 treatments are examined

matched pairs experiment

a summary statistic that helps describe a categorical variable

proportion

how to determine a proportion in a category =

number in that category / total number
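
A short sketch of the proportion formula, assuming the data arrive as a list of category labels. The function name `proportion` and the five answers are made up for illustration.

```python
def proportion(values, category):
    """Number of cases in the category divided by the total number of cases."""
    values = list(values)
    return sum(1 for v in values if v == category) / len(values)


# Hypothetical sample of five survey answers.
answers = ["agree", "disagree", "agree", "agree", "disagree"]
p_hat = proportion(answers, "agree")  # 3 of the 5 answers are "agree"
print(p_hat)
```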

proportion for a sample is denoted:

p-hat

p-hat =

proportion for a sample

proportion for a population is denoted:

p

p =

proportion for a population

used to show the relationship between 2 categorical variables

2 way table
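
A rough sketch of building a 2 way table by counting each combination of two categorical variables. The helper `two_way_table` and the five cases are hypothetical.

```python
from collections import Counter


def two_way_table(row_values, col_values):
    """Count the cases falling in each (row category, column category) cell."""
    return Counter(zip(row_values, col_values))


# Hypothetical data: one entry per case for each variable.
gender = ["M", "F", "F", "M", "F"]
agrees = ["yes", "no", "yes", "yes", "yes"]
table = two_way_table(gender, agrees)
for cell, count in sorted(table.items()):
    print(cell, count)
```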

an observed value that is notably distinct from the other values in a data set

outlier

a numerical average of the data values

mean

mean of a sample is denoted:

x-bar

x-bar =

mean of a sample

mean of a population is denoted:

mu

mu =

mean of a population

the middle entry of an ordered list if the list contains an odd number of entries, or the average of the two middle entries if it contains an even number

median

median is denoted:

m

m =

median

a statistic that is relatively unaffected by extreme values

resistance

is median resistant to outliers?

yes
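
A small illustration of resistance using made-up numbers: replacing one value with an extreme one leaves the median unchanged but pulls the mean far away.

```python
from statistics import mean, median

# Hypothetical data; the second list swaps the largest value for an outlier.
values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The median stays at 3 in both lists; the mean jumps from 3 to 102.
print(mean_before, median_before)
print(mean_after, median_after)
```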

is mean resistant to outliers?

no

measures the spread of the data in a sample

standard deviation

the larger the standard deviation, the ____ variability there is in the data and the _____ spread out the data are

more, more

standard deviation of a sample is denoted:

s

s =

standard deviation of a sample

standard deviation of a population is denoted:

σ

σ =

standard deviation of a population

what is the 95% rule?

if a distribution of data is approximately symmetric and bell-shaped, about 95% of the data should fall within 2 standard deviations of the mean

tells how many standard deviations the value is from the mean and is independent of the unit of measurement

z-score

z-score =

(x - x-bar) / s
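
A minimal sketch of the z-score for a sample, using the sample mean and sample standard deviation. The data are made up, and flagging |z| > 2 as unusual is an assumption that leans on the 95% rule for bell-shaped data.

```python
from statistics import mean, stdev


def z_score(x, data):
    """How many sample standard deviations x lies from the sample mean."""
    return (x - mean(data)) / stdev(data)


# Hypothetical sample; its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_score(9, data)

# Under the 95% rule, values more than 2 standard deviations out are unusual.
unusual = abs(z) > 2
print(round(z, 2), unusual)
```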

the Pth percentile is the value of a quantitative variable that is greater than P percent of the data

percentile

what is the 5 number summary?

q0 = minimum
q1 = first quartile (25%)
q2 = median
q3 = third quartile (75%)
q4 = maximum

range =

maximum - minimum

interquartile range =

q3-q1
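
A sketch of the 5 number summary, range, and interquartile range on made-up data. Quartile conventions differ between textbooks and software; this assumes Python's `statistics.quantiles` with `method="inclusive"`, so hand-computed quartiles may differ slightly.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, q1, median, q3, maximum) for a list of numbers."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


# Hypothetical data set of 7 values.
data = [1, 3, 5, 7, 9, 11, 13]
q0, q1, q2, q3, q4 = five_number_summary(data)

data_range = q4 - q0  # maximum - minimum
iqr = q3 - q1         # third quartile - first quartile
print((q0, q1, q2, q3, q4), data_range, iqr)
```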

is range resistant to outliers?

no

is interquartile range resistant to outliers?

YES

is standard deviation resistant to outliers?

no

the start of a box in a box plot is at

q1

the end of a box in a box plot is at

q3

the line that divides the box in a box plot is

the median

the lines (whiskers) on a box plot extend

from each quartile to the most extreme data value that is not an outlier

if the data is skewed left, median _____ mean

median greater than the mean

if the data is symmetric, median _____ mean

equal

if the data is skewed right, median _____ mean

median smaller than the mean

a graph of the relationship between 2 quantitative variables

scatterplot

for a scatterplot, the _____ variable is on the x axis and the _____ variable is on the y axis

explanatory, response

a measure of the strength and direction of linear association between 2 quantitative variables

correlation
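
A sketch of the sample correlation computed from deviations about the means. The function name `correlation` and the five data pairs are illustrative assumptions.

```python
from math import sqrt
from statistics import mean


def correlation(xs, ys):
    """Strength and direction of linear association, always between -1 and 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data for two quantitative variables.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))
```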

correlation of a sample denoted:

r

correlation of a population denoted:

ρ "rho"

correlations closer to 1 or -1 are _____

stronger

for the linear regression line equation y-hat = b0 + b1 x, what is y-hat?

predicted value

for the linear regression line equation y-hat = b0 + b1 x, what is b0?

y-intercept

for the linear regression line equation y-hat = b0 + b1 x, what is b1?

slope

for the linear regression line equation y-hat = b0 + b1 x, add in where the response and explanatory variables would be

response = b0 + b1(explanatory)

difference between the observed and predicted values of the response variable

residual

equation for residual:

observed - predicted = y - y-hat

what does a residual represent on a scatterplot?

vertical deviation from line to a data point

line that minimizes the sum of the squared residuals

least squares line
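
A sketch of fitting the least squares line and computing residuals with the standard closed-form slope and intercept. The helper names and the data are hypothetical.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar  # the line passes through (x-bar, y-bar)
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Observed minus predicted response for each case: y - y-hat."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical data: explanatory values xs, response values ys.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
res = residuals(xs, ys, b0, b1)
print(b0, b1, res)
```

For a least squares line the residuals sum to zero (up to rounding), which gives a quick sanity check on the fit.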

do outliers influence regression line?

YES

data from the principality of andorra were used to determine that 98.9% of andorrans have access to the Internet, the highest rate of any country. what are the cases in the data from andorra? what variable is used? is it categorical or quantitative?

cases - people in Andorra
variable - internet access
categorical

an online poll conducted on biblegateway.com asked, “how often do you talk about the bible in your normal course of conversation?” over 5000 people answered the question, and 78% of respondents chose the most frequent option: multiple times a week. can we infer that 78% of people talk about the bible multiple times a week? why or why not?

no; polling visitors to a Bible website creates sampling bias, since the respondents are not representative of all people

state whether the sentence implies no association between the variables, association without implying causation, or association with causation: studies show that taking a practice exam increases your score on an exam.

association w/ causation

state whether the sentence implies no association between the variables, association without implying causation, or association with causation: families with many cars tend to also own many television sets.

association without implying causation

state whether the sentence implies no association between the variables, association without implying causation, or association with causation: sales are the same even with different levels of spending on advertising.

no association

state whether the sentence implies no association between the variables, association without implying causation, or association with causation: taking a low-dose aspirin a day reduces the risk of heart attacks.

association with causation

state whether the sentence implies no association between the variables, association without implying causation, or association with causation: goldfish who live in large ponds are usually larger than goldfish who live in small ponds.

association without implying causation

state whether the sentence implies no association between the variables, association without implying causation, or association with causation: putting a goldfish into a larger pond will cause it to grow larger.

association with causation

a nationwide US telephone survey conducted by the pew foundation asked 2625 adults ages 18 and older, “some people say there is only one true love for each person. do you agree or disagree?” In addition to finding out the proportion who agree with the statement, the pew foundation also wanted to find out if the proportion who agree is different between males and females, and whether the proportion who agree is different based on level of education (no college, some college, or college degree). the survey participants were selected randomly, by landlines and cell phones. what are the cases in the survey about one true love? what are the variables? are the variables categorical or quantitative? how many rows and how many columns would the data table have?

cases - the 2625 people surveyed
variables:
do you agree? - categorical
gender - categorical
level of education - categorical
2625 rows, 3 columns

give the notation for the mean: for a random sample of 50 seniors from a large high school, the average SAT score was 582 on the math portion of the test.

x-bar = 582

give the notation for the mean: about 1.67 million students in the class of 2014 took the SAT, and the average score overall on the math portion was 513.

mu = 513

the five number summary for the mammal longevity data in table 2.21 on page 73 is (1, 8, 12, 16, 40). find the range and interquartile range for this dataset.

range: 40-1 = 39
IQR: 16-8 = 8

use the regression line tip = -0.292 + 0.182(bill) to predict the tip on a bill of $59.33

$10.51

use the regression line tip = -0.292 + 0.182(bill) to predict the tip on a bill of $9.52

$1.44

use the regression line tip = -0.292 + 0.182(bill) to predict the tip on a bill of $23.70

$4.02
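
The three tip predictions can be checked with a short sketch. The coefficients come from the regression line on the cards; the function name `predict_tip` is an assumption.

```python
def predict_tip(bill):
    """Predicted tip in dollars from the line tip = -0.292 + 0.182(bill)."""
    return -0.292 + 0.182 * bill


# Bills from the three cards, rounded to the nearest cent.
tips = [round(predict_tip(bill), 2) for bill in (59.33, 9.52, 23.70)]
print(tips)
```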

Exam 1 Flashcards

(98 cards)