EXTRA CREDIT FOR FINAL EXAM Flashcards
What does Variability do?
Makes it easier to detect differences in the treatment outcomes.
Practice Problem A study found correlation r=0.61 between the sex of a worker and his or her income. You conclude that
A. Women earn more than men on the average.
B. Women earn less than men on average.
C. An arithmetic mistake was made; this is not a possible value of r.
D. This is nonsense because r makes no sense here.
D. This is nonsense because r makes no sense here.
Block
group of experimental units or subjects that are similar in ways that are expected to affect the response to the treatments
What is the explanatory variable in the LSRL formula?
x
What is the general equation for a least square regression line
ŷ=a+bx
Controls allows ______?
The experimenter to eliminate confounding effects of lurking variables.
What is a Matched Pairs Design?
A type of block design used when an experiment only has two treatments and participants can be grouped into pairs based on a variable. Within these pairs, participants are randomly assigned to treatments.
What is a two way table?
IDK, There is a row, column. Add all the number of column and row will give the total amount of the study.
What are the four types of quantitative graphs covered in class?
Stem Plot
Histogram
Dot Plot
Ogive
What are the possible measures of spread?
Range
IQR
Variance
Standard Deviation
If a player rolls two dice and gets a sum of 2 or 12 he wins $20. If the person gets a 7 he wins $5. The cost to play the game is $3. Find the expectation of the game.
Expectation=the meanE(x)= 1.165
Define simulation
The imitation of chance behavior based on a model that reflects the experiment.
What is a Convenience Sample?
A poor sampling technique in which the data is obtained in the most convenient way possible, which will not account for any lurking variables in the sample. EX: A teacher asking only the students who will be in his room during the day.
What type of graph is this?
continuous density curve
What is the 5th and final step in designing a simulation?
State your conclusions
Where is b found in a computer printout?
Directly below a
What is IQR?
What is its equation?
When do you find the IQR for spread?
- Inner Quartile Range
- Q3-Q1=IQR
- You use it when the shape of the graph is skewed
Placebo effect
Dummy treatment that has no physical effect
Comparative Experiment
Treatment–ObservationObservation 1–Treatment–Observation 2
Independence
Is P(A) × P(B) = P(A and B)?Is P(B|A) = P(B)?Is P(A|B) = P(A)?If yes then the events are independent.
What type of graph is this?
discrete density curve
What does “ŷ” mean?
Predicted value/outcome
What is the z-score equation?
z = (x - μ) / σx = x-valueμ = meanσ = standard deviation
What is an example of under coverage?
Sample Houses, prisoners or students
What are the symbols for Correla2tion and Coefficient of Determination?
R = Correlation
R2 = Coefficient of Determination
Central is playing Centennial. If Central’s 1st quarterback plays they have a 75% of winning. if they play the backup QB their chances drop to 40%. The chance of the 1 QB playing is 70%. a) Probability of winning the game? b) If central won was was the probability the backup QB played?
a) P(winning)= .645b) P(Backup QB/ Winning)= .186
In the binomial setting, is the number of observations fixed or variable?
The number of observations is fixed in a binomial setting.
What is a Completely Randomized Design?
Participants are randomly assigned to treatments.
How do you find the residual?
y-ŷ
If the graph of your data is skewed, what center and measure of spread should you use?
Center: Median
Measure of Spread: IQR
Why is this an example of matched pairs design?
The participants are matched by type of car.
What is a wording bias?
the wording of questions can cause bias
which of the following is Quantitative?
a. age
b. time
c. gender
d. number of girls in a class
e. number of people in a class
a and e
Practice problem:Sarah got a test score of 77%- The standard deviation of the test was 7%- Sarah’s z-score was .57What is the mean of the class??
73%
what is the difference between geometric cdf and geometric pdf?
cdf= winning on a certain trial pdf= winning on or before a certain trial
95% of the observations fall within how many standard deviations of the mean?
2
Graph has no peak and all bars of data are the same height.
Uniform
Subject
Unit is a person
Graph has one peak and data is on the right.
Skewed Right
numerical outcome of a probability that is determined by chanceex) roll of a die, or flipping a coin
random variable
Describe the four conditions that describe a Binomial setting
- 2 outcomes
- Fixed number of trials
- Each trial is independent.
- Probability of success is the same everytime.
Probability of getting accepted to Yale is 0.7 and the probability of being accepted to Princeton is 0.6. Probability of getting accepted to both is 0.4. a) Probability of getting accepted to one school? b) Not accepted to either school?
a) 0.9b) 0.1
Conditional Probability Formula
P(A|B) = P(AnB)/P(B)
Find the probability of getting a success on the 5th trial with 10% chance of success.
(1-0.1)^4•0.1
Individuals that do not follow the pattern of the data. These include values that are very small or very large.
Outliers
How can a study be biased?
A study is biased if it systematically favors certain outcomes.
What is the equation to find “b”?
b=r(Sy/Sx)
Graph has one peak and the data is on the left side.
Skewed Left
can be measuredex) weight of m&m’s
continuous
What quantitative graph divides its x-values into groups and then adds the previous y-values quantity to its own?
Ogive
What is a z-score?
Standardized variable, the amount of standard deviations away from the mean
What is Multistage Sampling?
A using multiple sampling to create a sample.
What is a Voluntary Response Sample?
A poor sampling technique in which the individuals elect to be a part of the survey. It is a poor sampling technique because the only people who volunteer, will be the people who feel the very strongest on the matter.
what is the area under any density curve?
1
What is the variable name of constant coefficient in the LSRL formula?!
a
can be countedex) number of red m&m’s in a bag
discrete
What is the center?
The middle of the distribution, first located by finding the highest peak. It is measured by mean, median, or mode.
What is mean; its equation?
The average; the sum of the data divided by the number of numbers.
What is the slope in the LSRL formula?
b
Graph has one peak and data is on both sides.
Symmetric
Practice Problem
Which of the folowing would not be a correct interpretation of a correlation of r=-.30?
A. The variances are inversely related.
B. The coefficient of determination is 0.09.
C. 30% of the varitaion between the variables is linear.
D. There exists a weak relationship between the variables.
E. All of the above statements are correct.
C. 30% of the varitaion between the variables is linear.
Is it possible for past successes to influence future successes or failures in the binomial setting?
No, in the binomial setting, all of the trials are independent.
What are the four traits of a geometric distribution?
1) only 2 possible outcomes 2) independent of each other 3) probability of success is always the same 4) variable of interest is the number of trials before success
What are the three types of observational studies?
Cross-Sectional Case-Control Cohort
What is response bias?
behavior of the respondent or interviewer can cause bias
What is a Stratified Random Sample?
A sample in which the population is split into groups known as ‘strata’. The strata are formed based on the individuals’ shared attributes and characteristics. After stratifying, you take an SRS of each strata.
What is the correlation coefficient?!
r
How to interpret R2 (Coefficient of Determination)
Coefficient of Determination : (R2 as a percent) of the change in (Y) can be explained by the change in (X)
Example Problem:
Given that R2 is .99, Y = height, and X= age, describe what it means in the context of the situation.
-99% of the change in height (Y) can be explained by the change in age (X)
Suppose a chocolate company has three factories. 40% of their chocolate is made at factory 1 and 80% is dark chocolate 25% of chocolate is made at factory 2 and 30% is milk chocolate. Rest of the chocolate is made in Factory 3 and 65% is dark chocolate. a)Probability of milk chocolate? b) if you have dark chocolate what the probability it came from factory 2?
a) P(milk chocolate)= 0.2775b) P(dark/2)= .2422
Which type of quantitative graph organizes your one variable data chronologically so that, for example, all of the data in the 20-29 region would be grouped together with a 2 on one side of a line and a list of the changing ones place (2 2 3 4 4 7)?
Stem Plot
What is a Block Design?
This divides the participants into groups called blocks. Participants within each block are then randomly assigned treatments.
What is Confounding?
When the effects of two variables (lurking or explanatory) on a response variable cannot be distinguished. Ex. In a study regarding the relationship between Mr. Hopkin’s happiness level and his student’s grades, the gender ratio of the students and their heights are confounding because neither of them influence Mr. Hopkin’s happiness level; their effects cannot be distinguished.
Case-Control Study
an epidemiological research method in which people who have developed a disease such as cancer are studied alongside people who have not, and the differences and possible causes analysed
Which design is this an example of?
Completely Randomized Design
What does “a” represent in the LSRL equation?
When x=0, that is the value of y (y-intercept).
What is a Population?
Used to measure all possible outcomes in a particular study.
What is standard deviation?
What is its equation?
When do you use standard deviation to find spread?
- The average distance from the mean
- You use it when the shape of the graph is symmetric
What is Causation?
When an increase in “x” causes an increase in “y”.Ex. In a study regarding the relationship between Mr. Hopkin’s happiness level and his student’s grades, an increase in a student’s grades may increase Mr. Hopkin’s happiness level.
What is Conditional Probability
P(A|B) ex: venn diagram or a tree
Conditions of Binomial Distribution
1) Only 2 outcomes: Success or failure2)Probability of success the same for each trial3)Trials are independent 4)There are a fixed number of trials. A random variable, X, will count the number of successes
What is a Quantitative variable?
A numerical variable that represents a measurable quantity.
True or False. A class in a school is a sample.
True
What is the second step in designing a simulation?
State assumptions
What is the mode and when is it used to describe the center of a graph?
- The number that occurs the most
- When the graph’s shape is bimodal
Suppose that 7% of people in town have cancer. Suppose a test can determine if a patient is positive for cancer for 89% of patients with cancer but is also 5% positive for those without cancer. If selected at random what is the probability a person has cancer given a positive test result?
P(positive)=.1088P(Cancer/Positive)= P(Cancer and Positive)/ P(Positive)Answer = .573 or 57.3%
P(A or B)
P(A) + P(B) - P(A and B)
68% of the observations fall within how many standard deviations of the mean?
1
What is an example of response bias?
Being asked by Mr. Williams or a student “Do you drink alcohol?”
What is the third step in designing a simulation?
Assign digits to represent the possible outcomes
What’s the difference between positive and negative z-scores?
A positive standard score is above the mean, while a negative standard score is below the mean.
What is Experimental Design?
A plan of assigning experimental units to treatment conditions.
99.7% of the observations fall within how many standard deviations of the mean?
3
The higher the r^2, the
More of an exact line the data set is.
On a graph, how is a represented?
a is the y-intercept
What is an example of wording bias?
Question: Local referendum to create an new schoolOption 1: This referendum will help young people stay in school by encouraging participation in athletics.Option 2: This referendum will cost property owners $320 in additional taxes every day.
What is range?
What is its equation?
When do you find range?
- The maximum value minus the minimum value in the data set
- Max- Min
- You find the range for boxplots and its five-number summary
Causation allows _____?
The experimenter to make inferences about the relationship between independent variables and dependent variables.
Finding the Probability Under a Normal Distribution (with calculator)
normalcdf(lower bound, upper bound, μ, σ)
What are the things required to interpret correlation?
Form, strength, and direction
Which type of 1-variable quantitative graph does not necessarily need a y-axis label and stacks a dot over it’s x value for each count?
Dot Plot
Which design type is this an example of?
Block Design
What is a Risidual?
The vertical distance from a expected point to the regression line.
Simulation Practice problem: On average, suppose a baseball player hits a home run once in every 10 times at bat, and suppose he gets exactly two “at bats” in every game. Using simulation, estimate the likelihood that the player will hit 2 home runs in a single game.
Solution:Describe the possible outcomes. For this problem, there are two outcomes - the player hits a home run or he doesn’t.Link each outcome to one or more random numbers. Since the player hits a home run in 10% of his at bats, 10% of the random numbers should represent a home run. For this problem, let’s say that the digit “2” represents a home run and any other digit represents a different outcome.Choose a source of random numbers. For this problem, we used Stat Trek’s Random Number Generator to produce a list of 500 two-digit numbers (see below).Choose a random number. The list below shows the random numbers that we generated.Based on the random number, note the “simulated” outcome. In this example, each 2-digit number represents two “at-bats” in a single game. Since the digit “2” represents a home run, the number “22” represents two home runs in a single game. Any other 2-digit number represents a failure to hit consecutive home runs in the game.Repeat steps 4 and 5 multiple times; preferably, until the outcomes show a stable pattern. In this example, the list of random numbers consists of 500 2-digit pairs; i.e., 500 repetitions of steps 4 and 5.Analyze the simulated outcomes and report results. In the list, we found 6 occurrences of “22”, which are highlighted in red in the table. In this simulation, each occurrence of “22” represents a game in which the player hit consecutive home runs.
Which quantitative graph divides its data into Bin and Counts?
Histogram
Practice Problem A copy machine dealer has data on the number x of copy machines at each of 89 customer locations and the number of y service calls in a month at each location. It was given that r=0.86. What percent of the variation in number of service calls is explained by the linear relation between number of service calls and number of machines?
A. 86%
B. 93%
C. 74%
D. None of these
E. Can’t tell from the information given
C. 74% The question is asking for r2 and since r=0.86, you would square 0.86 and get 0.74.
What is the fourth step in designing a simulation?
Simulate many repetitions
What is the coefficient of determination?!
r^2
Experimental Units
The individuals on which an experiment is donePeople=Subjects
Factor
Explanatory variable(Most experiments have multiple factors. When a treatment is formed from these factors, it is often called a level.)
The probability that 0, 1, 2,3 or 4 people will seek treatment for the flu during any given hour at the emergency room is shown in a distribution.What does the random variable count or measure?
Number of people who seek treatment.
What is the overall probability of the sample space.
ie: P(S)
1
Dustin Sucks
What is variance?
What is its equation?
- The square of standard deviation
Independent Events
P(AandB)=P(A)*P(B/A)=P(A)*P(B)
What is a Lurking Variable?
A variable that is not recorded in the study but may still influence the relationship between variables in the study. Ex. In a study regarding the relationship between Mr. Hopkin’s happiness level and his student’s grades, a lurking variable may be whether or not he drank his daily Kickstart.
what does it mean to be in the 80th percentile on the ACT?
the person scored higher than 80% of the people who took the ACT
What are the Three Statistical Design Principles?
- Control of effects of lurking variables2. Randomization–assigning individuals to treatments randomly by using random number table3. Replication
What is a Simple Random Sample?
A sample in which every individual has an equally likely chance of being chosen; every sample of size ‘n’ has an equally likely chance of being chosen. It can be referred to as ‘SRS’.
the area under the curve from negative infinity to a given point
a percentile
Finding the Z-Score (with table)
True or False. When looking at the United States, the city of New York is a population.
False
How do you find the outliers in a set of data?
Low outliers: Q1-(1.5*IQR)
High outliers: Q3+(1.5*IQR)
What are the two types of qualitative graphs covered in class?
Pie Chart
Bar Graph
What is the equation to find “a”?
a=(mean of y values)-(b)(mean of x values)
Practice Problem:
What is the probability that some event occurs with a z-score greater than 0.8 and less than -0.9?
0.3959
Cohort Studies
Analytical study in which a group having one or more similar characteristics (such as habit of smoking or a particular disease) is closely monitored over time simultaneously with another group (whose member do not smoke or are free from the disease). Although more tedious, this method is used where case study approach is not feasible, creates too many statistical problems, or generally produces unreliable results. Also called follow up study.
What is non response?
an individual for the sample does not cooperate, or does not respond
What is the probability that is the probability of success is 25% and you have to find the probability that you get zero successes the first 10 trials?
1-geometric cdf (.25,10) =.0563
What is under coverage?
some group of the population is left out of the process for choosing the sample
What is an example of non response?
phone polls at dinnertime
Practice Problem:Which of the following statements are true?1. a completely randomized design offers no control for lurking variables.2. a randomized block design controls for the placebo effect.3. in a matched pairs design, participants within each pair receive the same treatment. (A) 1 only(B) 2 only(C) 3 only(D) all of the above(E) none of the above
(E), none of the above, is the correct answer. In a completely randomized design, experimental units are randomly assigned to treatment conditions which provides control for lurking variables. To control for the placebo effect, a placebo must be included as a treatment option.In a matched pairs design, experimental units within each pair are assigned to different treatments.
What is a median?
The middle # when the numbers are in numerical order.
Double blind
Control group has no treatment or has been given a placebo.
What is Cluster Sampling?
A sampling technique similar to Stratified Random Sampling in that the population is divided into groups. With Cluster Sampling, you actually just select one whole group instead of performing SRS on each group.
What does “b” represent in the LSRL and interpret this value.
“b” is the slope, and it is how much y changes by per x-value.
What is the value of correlation for operating cost per hour and number of passenger seats in the plane? Interpret the correlation.
r=0.75
There is a strong positive linear relationship between the operating cost per hour and the number of passenger seats on the plane.
What is the range for correlation coefficient?
-1 to 1
How many outcomes are there for each observation in the binomial setting?
There are two outcomes in the binomial setting, determined as “success” or “failure”. (example: either getting a job or not getting a job.)
What type of graph would you make with this information?
Ogive
What is the equation to find the slope (b)?
b=r(Sy/Sx)
What units do z-scores have?
none!
What is a Sample?
A portion of a population
What is Common Response?
When changes in both “x” and “y” are caused by a lurking variable “z”.Ex. In a study regarding the relationship between Mr. Hopkin’s happiness level and his student’s grades, a sudden leak of asbestos in the school can both decrease Mr. Hopkin’s happiness level and keep his students from having good grades.
Finding the Z-Score (with calculator)
invNorm(area, μ, σ)μ= Meanσ= Standard Deviation
The graph has two distinct peaks. The gap in graph is very clear.
Bi-Modal
What is a qualitative variable?
A variable that describes non numerical values.
There’s a 30% chance that when Greg sees his grandmother, she will give him a quarter. If Greg sees his grandmother fifteen times, what are the chances that she won’t give him a quarter at least four times?
1-binomcdf(15, .7, 3) = .9999There’s a 70% that she will not give him a quarter. Since the question is asking how many times won’t she give him a quarter, you have to find the probability of failure.
What is the first step in designing a simulation?
State the problem/ experimental design
Can the probability of success change for each observation?
No, the probability of success must be constant in each observation for the binomial setting.
What is the equation of the LSRL?
ŷ=a+bx
Treatment
The specific experimental condition applied
Where is a found in a computer printout?
Far upper left hand data