AP exam flashcards

1
Q

interpret standard deviations

A
  • standard deviation measures the typical distance of the values from the mean

height of students typically varied by about 3.2 inches from the mean height of 64 inches

2
Q

scope of inference cause and effect

A

cause-and-effect conclusions can only be drawn if subjects were randomly assigned to treatments and we find a statistically significant difference

a difference is statistically significant if it is larger than what would be expected to happen by chance alone

3
Q

generalizing to a larger population

A

we can generalize the results of a study to a larger population if the individuals were randomly selected from that population.

however, sampling variability affects estimates: different random samples of the same size from the same population will produce different estimates
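A minimal sketch of sampling variability, assuming a made-up population of 1,000 heights (the population, the seed, and the sample size of 25 are all hypothetical choices for illustration). Each sample of the same size gives a slightly different estimate of the mean.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up heights in inches (not real data).
rng = random.Random(0)
population = [rng.gauss(64, 3.2) for _ in range(1000)]

def sample_mean(pop, n):
    """Mean of one simple random sample of size n."""
    return statistics.mean(rng.sample(pop, n))

# Five samples of the same size from the same population.
estimates = [sample_mean(population, 25) for _ in range(5)]
print(estimates)
```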

4
Q

replication and control

A

2 out of 4 factors for a good experiment

replication - giving each treatment to enough subjects or units so that any difference in the effect of treatments can be distinguished from chance differences

control - keeping other variables the same for all groups, especially variables that are likely to cause confounding (control helps reduce variability in the response variable)

5
Q

experimental units, factors and levels, treatments

A

experimental units - the objects to which treatments are randomly assigned. when the units are people, they are often called “subjects”

factor - an explanatory variable that is manipulated and may cause a change in the response variable

level - different values of a factor

treatment - a specific condition applied to the experimental units. with more than one factor, each combination of levels is a treatment

6
Q

control groups and blinding

A

other 2 factors that contribute to a good experiment

control group - provides a baseline for comparing the effects of other treatments. A control group is often given an inactive treatment (placebo), an active treatment, or no treatment

blinding - when the subjects don’t know which treatment they are receiving, or the people recording or measuring the response variable don’t know which treatment each subject received. when neither group knows, the experiment is “double-blind”

7
Q

blocking and matched pairs design

A

blocking - before random assignment, divide the experimental units into groups (blocks) that are expected to respond similarly. then randomly assign treatments within each block.

a matched pairs design uses blocks of size 2 or gives both treatments to each subject in random order

8
Q

random assignment and completely randomized designs

A

random assignment - using a chance process to assign treatments, which creates groups of experimental units that are roughly equivalent at the beginning of the experiment

if treatments are assigned to experimental units completely at random (no blocking), the result is a completely randomized design
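A small sketch of a completely randomized design, assuming 20 numbered units and two treatment names chosen only for illustration. The function name and the seed are hypothetical; shuffling and dealing out is one common way to do this, not the only one.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical example: 20 subjects, two treatments, no blocking.
groups = completely_randomized(range(1, 21), ["drug", "placebo"], seed=1)
print(groups)
```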

9
Q

simple random sample

A

a simple random sample (SRS) of size n is chosen so that every group of n individuals in the population has an equal chance of being selected as the sample
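One way to sketch an SRS in code, assuming a population labeled 1 to 100 (a hypothetical example). Python's `random.sample` draws without replacement, so every group of n labels is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals, each group of n equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical population labeled 1..100; seed fixed only for repeatability.
sample = simple_random_sample(range(1, 101), 10, seed=42)
print(sample)
```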

10
Q

bias

A

a statistical study shows bias if it is very likely to underestimate or overestimate the value you want to know

sources of bias - convenience sampling, voluntary response sampling, undercoverage, nonresponse, and response bias

11
Q

using a table of random digits to select a sample

A

label: give every member of the population a label with the same number of digits
randomize: read the digits from left to right in groups of that length, skipping any repeated labels and any labels outside the range used
select: choose the individuals whose labels you find
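The label, randomize, select steps can be sketched as follows. The row of digits is made up for illustration (it is not from a real table), and the function name and the 1 to N labeling are assumptions.

```python
def select_from_digits(digits, population_size, n):
    """Read fixed-width labels left to right, skipping repeats and out-of-range labels."""
    width = len(str(population_size))  # same number of digits for every label
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

# Hypothetical row of random digits; population labeled 01..30, sample of 4.
row = "1922395034057562871396"
picked = select_from_digits(row, 30, 4)
print(picked)  # [19, 22, 5, 13]
```

Reading in pairs gives 19, 22, 39, 50, 34, 05, 75, 62, 87, 13; the labels above 30 are skipped.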

12
Q

choosing a model

A

choose the model whose residual plot has the most random scatter

if more than one model has a randomly scattered residual plot, choose the model with the largest coefficient of determination, r²

13
Q

population, census, sample

A

the population in a statistical study is the entire group of individuals we want information about

a census collects data from every individual in the population

a sample is a subset of individuals from the population from which we collect data

14
Q

experimental vs observational study

A

experimental study - researchers impose treatment(s) upon the experimental units. well designed experiments allow for cause-and-effect conclusions to be made

observational study - researchers observe individuals and measure variables without trying to influence the responses. the results cannot support cause-and-effect conclusions

15
Q

what is a chi square distribution

A

a chi square distribution is defined by a density curve that takes only nonnegative values and is skewed to the right

as df increases the chi square distributions become more variable, less skewed and centered at a larger value (mean = df)

the chi square test statistic measures how different the observed counts are from the expected counts
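A short sketch of the chi-square test statistic, using the standard formula (sum of (observed - expected)² / expected). The counts are hypothetical and chosen so the arithmetic is easy to follow.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories.
observed = [30, 20, 10]
expected = [20, 20, 20]
stat = chi_square_statistic(observed, expected)
print(stat)  # 100/20 + 0/20 + 100/20 = 10.0
```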

16
Q

inference for regression

A

conditions (LINER):
Linear - the association between the variables is linear
Independent - observations are independent; check the 10% condition if sampling without replacement
Normal - responses vary normally around the regression line for all x-values (or n > 30)
Equal SD - the SD of the responses around the regression line is the same for all x-values
Random - data come from a random sample or randomized experiment

17
Q

outlier rule

A

outliers > Q3 + 1.5(IQR)
outliers < Q1 - 1.5(IQR)
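The 1.5 × IQR rule as a small sketch. The quartiles are passed in directly because calculators and software can compute Q1 and Q3 slightly differently; the function names and the example quartiles are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """True if the value falls below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return value < low or value > high

# Hypothetical quartiles: Q1 = 10, Q3 = 20, so IQR = 10.
print(outlier_fences(10, 20))  # (-5.0, 35.0)
print(is_outlier(40, 10, 20))  # True
```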

18
Q

what is a resistant measure

A

a resistant measure is not strongly affected by outliers

resistant measures: median, IQR, Q1, Q3

non-resistant measures: mean, SD, range, correlation, equation of the LSRL

19
Q

Interpret a Z-score

A

“Jessica’s test score was 2.3 standard deviations below the mean”
z = -2.3

20
Q

z - score formula

A

z = (value - mean) / standard deviation
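The z-score formula as a quick sketch. The mean of 64 inches and SD of 3.2 inches come from the standard deviation card; the individual height of 70.4 inches is a made-up value for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations the value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical student who is 70.4 inches tall: (70.4 - 64) / 3.2, about 2.
z = z_score(70.4, mean=64, sd=3.2)
print(round(z, 2))
```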

21
Q

interpret standard deviation of residuals s

A

s measures the size of the typical residual

“The cost of a car typically varies by about $2375 from the price predicted by the LSRL with x = years”

22
Q

residual formula

A

actual - predicted

23
Q

interpreting a residual plot

A
  • if there is no leftover curvature the model used to make the plot is appropriate
  • if there is leftover curvature the model used to make the plot is not appropriate
24
Q

making predictions/extrapolation

A

extrapolation is the use of an LSRL to make predictions outside the interval of x-values used to obtain the line. The further we extrapolate, the less reliable the predictions

25
Q

interpret slope and y intercept

A

slope - “The predicted cost of a car decreases by about $1285 for each additional year”
slope - the change in y when x increases by one unit

y intercept - “The predicted cost of a car is about $23,450 when it is x = 0 years old”
y intercept - the predicted value of y when x is 0
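A sketch that puts the slope and y intercept from the car example into one line, assuming both interpretations describe the same LSRL (predicted cost = 23450 - 1285 × years). The actual price of $22,380 is a hypothetical value used to show a residual.

```python
INTERCEPT = 23450  # predicted cost in dollars when x = 0 years
SLOPE = -1285      # predicted change in cost for each additional year

def predicted_cost(years):
    """Predicted cost from the assumed LSRL: 23450 - 1285 * years."""
    return INTERCEPT + SLOPE * years

def residual(actual_cost, years):
    """Residual = actual - predicted."""
    return actual_cost - predicted_cost(years)

print(predicted_cost(2))   # 20880
print(residual(22380, 2))  # 1500: the car cost $1500 more than predicted
```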

26
Q

interpret a residual

A

“The car cost $1500 more than the price predicted by the LSRL with x = years”

27
Q

working with a power model

A
28
Q

interpret coefficient of determination (r²)

A

r² measures the percent of variability in y that is accounted for by the LSRL of y on x

“48% of variability in the cost of a car is accounted for by the LSRL with x = years”

29
Q

cluster sampling

A

split the population into groups (often based on location) called clusters, randomly select some of the clusters, and include every member of the selected clusters in the sample

30
Q

confounding

A

two variables are associated in such a way that their effects on the response variable cannot be distinguished

31
Q

systematic random sampling

A

select a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and then choosing every kth individual thereafter

k =
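A minimal sketch of systematic random sampling. Here k is simply passed in as a parameter, and the ordered population of 100 labels and the seed are hypothetical.

```python
import random

def systematic_sample(ordered_population, k, seed=None):
    """Pick a random start among the first k individuals, then take every kth one."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, i.e. one of the first k individuals
    return ordered_population[start::k]

# Hypothetical ordered population labeled 1..100, taking every 10th individual.
people = list(range(1, 101))
chosen = systematic_sample(people, k=10, seed=3)
print(chosen)
```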

32
Q

stratified random sampling

A

split the population into homogeneous (similar) groups called strata, based on anticipated response. select an SRS from each stratum and combine the SRSs to form the overall sample
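A sketch of stratified random sampling, assuming three made-up strata (grade levels) and an SRS of the same size from each. The names, the strata, and the equal sample sizes are all hypothetical choices.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical strata: students grouped by grade level.
strata = {
    "freshmen": ["F1", "F2", "F3", "F4", "F5"],
    "juniors": ["J1", "J2", "J3", "J4", "J5"],
    "seniors": ["S1", "S2", "S3", "S4", "S5"],
}
overall = stratified_sample(strata, n_per_stratum=2, seed=7)
print(overall)
```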

33
Q

outliers, high leverage, and influential points in regression

A

high leverage - a point with much larger or much smaller x values than the other points

outlier - a point that does not follow the pattern of the data and has a large residual (actual - predicted)

influential point - a point that, if removed, substantially changes the slope, y-intercept, correlation, r², or standard deviation of the residuals

high leverage points and outliers can both be influential

34
Q

how does shape affect measures of center

A

mean < median (Left Skew)
mean > median (Right Skew)
mean = median (Roughly Symmetric)
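A quick check of how skew pulls the mean away from the median, using three tiny made-up data sets (hypothetical values picked so the numbers are easy to verify by hand).

```python
import statistics

# Hypothetical data sets, one for each shape.
left_skewed = [1, 10, 11, 11, 12]  # long tail toward small values
right_skewed = [1, 2, 2, 3, 12]    # long tail toward large values
symmetric = [1, 2, 3, 4, 5]

for name, data in [("left", left_skewed), ("right", right_skewed), ("symmetric", symmetric)]:
    print(name, statistics.mean(data), statistics.median(data))
# left: mean 9 < median 11; right: mean 4 > median 2; symmetric: mean 3 = median 3
```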

35
Q

association

A

two variables have an association if knowing the value of one variable helps to predict the value of the other variable

36
Q

discrete vs continuous variables

A

a quantitative variable is discrete if its possible values have gaps between them, e.g. (1, 2, 3, 4)

a quantitative variable is continuous if it can take any value in an interval on the number line, with no gaps between possible values, e.g. (1, 1.1, 1.2, 1.3 … 1.7)

37
Q

interpret r

A

the correlation r measures the strength and direction of a linear association between two quantitative variables

r is always between -1 and 1

close to zero = very weak
close to 1 or -1 = strong
exactly 1 or -1 = perfectly straight line

positive r = positive association
negative r = negative association

38
Q

finding boundaries under a normal distribution

A

use invNorm and label inputs

empirical rule
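Outside a calculator, a rough equivalent of invNorm can be sketched with `statistics.NormalDist.inv_cdf` from Python's standard library. The example assumes heights follow a normal model with mean 64 and SD 3.2 (numbers borrowed from the standard deviation card; the normal model itself is an assumption).

```python
from statistics import NormalDist

def inv_norm(area_left, mean=0.0, sd=1.0):
    """Boundary value with the given area to its left under a normal curve."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(area_left)

# Hypothetical question: what height marks the 90th percentile?
boundary = inv_norm(0.90, mean=64, sd=3.2)
print(round(boundary, 2))
```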

39
Q

finding area under a normal distribution

A

use normalcdf
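A rough stand-in for normalcdf using `statistics.NormalDist.cdf`. The function name and argument order are chosen to mirror the calculator inputs and are hypothetical; the mean of 64 and SD of 3.2 assume a normal model for the heights example.

```python
from statistics import NormalDist

def normal_cdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Area within 1 SD of the mean (60.8 to 67.2 inches), about 0.68.
area = normal_cdf(60.8, 67.2, mean=64, sd=3.2)
print(round(area, 4))
```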

40
Q

standard normal distribution

A

the standard normal distribution is the normal distribution with mean 0 and SD 1

41
Q

describing/comparing distributions of quantitative data

A

use SOCV

Shape
Outliers
Center
Variability

42
Q

parameter vs statistic

A

a parameter is always about a population

a statistic is always about a sample

parameters include the population mean, population standard deviation, population proportion

statistics include the sample mean, sample standard deviation, sample proportion

43
Q

marginal, joint, and conditional relative frequency

A

marginal - the values on the edge of the 2-way table

joint - the values that make up the body of the table

conditional - the joint frequency divided by the total for the given condition

ex: the probability that a survey respondent likes basketball the most, given that the respondent is male. 15(males who like basketball)/48(males because that’s the condition)
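The arithmetic in the example, written out as a short sketch (the function name is hypothetical; the counts 15 and 48 are the ones from the card).

```python
def conditional_relative_frequency(joint_count, condition_total):
    """Joint count divided by the total for the given condition."""
    return joint_count / condition_total

# 15 males who like basketball out of 48 males in total.
p = conditional_relative_frequency(15, 48)
print(p)  # 0.3125
```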

44
Q

percentiles

A

the pth percentile of a distribution is the value that has p% of the observations less than or equal to that value

example: a student who scores at the 90th percentile got the same score or a greater score than 90% of the other test takers
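A small sketch of finding a percentile by counting, using the "less than or equal to" definition above. The ten test scores are made up for illustration.

```python
def percentile_of(values, x):
    """Percent of observations less than or equal to x."""
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)

# Hypothetical test scores for ten students.
scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 98]
print(percentile_of(scores, 90))  # 90.0, so a score of 90 is at the 90th percentile
```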

45
Q

describing an association in a scatterplot

A

use DUFS to describe association in a scatterplot

Direction - positive, negative, no association
Unusual features - clusters, other points
Form - linear, nonlinear
Strength - weak, moderate, strong

“There is a moderate, positive, linear association between height and weight for HS students”

46
Q

empirical rule

A

if a distribution of data is approximately normal then,

  • about 68% of the data will be within 1 SD of the mean
  • about 95% of the data will be within 2 SD of the mean
  • about 99.7% of the data will be within 3 SD of the mean
47
Q

transforming data/ effect of changing units

A

adding “a” to every member of a data set adds “a” to the measures of center/position but does not change the measures of variability or shape

multiplying every member of a data set by a positive constant “b” multiplies the measures of center/position by “b” and multiplies most measures of variability by “b”, but does not change shape
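A quick check of both rules on a tiny made-up data set, using the mean as the measure of center and the sample SD as the measure of variability (the data and the constants 10 and 3 are hypothetical).

```python
import math
import statistics

data = [1, 2, 3, 4, 5]             # hypothetical data set
shifted = [x + 10 for x in data]   # add a = 10 to every value
scaled = [x * 3 for x in data]     # multiply every value by b = 3

for name, values in [("original", data), ("shifted", shifted), ("scaled", scaled)]:
    print(name, statistics.mean(values), round(statistics.stdev(values), 3))
# shifting moves the mean by 10 and leaves the SD alone;
# scaling multiplies both the mean and the SD by 3
```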

48
Q

density

A

a density curve models the distribution of a quantitative variable with a curve that is always on or above the horizontal axis and has an area of exactly 1 underneath it

the area under the curve and above any interval of values on the horizontal axis estimates the proportion of all observations that fall in that interval
