Reading Quiz 4 Flashcards
logarithmic (exponential) transformation
if (x, y) display approximately exponential shape then graph of (x, logy) will display approximately linear shape.
steps in logarithmic transformation
- graph original data set
- plot ordered pairs (x, logy). shape should be approximately linear
- find linear regression equation for logy in terms of x. answer calculator gives is of form logy-hat = ax + b. check correlation coefficient and residual plot to verify that equation is fairly good fit for data
- take antilogarithm of both sides of equation to solve for y-hat
power transformation
if ordered pairs (x, y) display approximate power function graph then graph of ordered pairs (logx, logy) will display approximately linear shape
steps in power transformation
- graph original data set
- plot ordered pairs (logx, logy). shape should be approximately linear
- find linear regression equation for logy in terms of logx. answer calculator gives is of form logy-hat = a + b(logx). check correlation coefficient and residual plot
- take antilogarithm of both sides to solve for y-hat
important notes
- when explanatory variable is years, transform date to years since some starting year so values are smaller
- if function resembles power function then reasonable that point (0,0) should lie on graph
- can use any type of logarithm in log transformation
- extrapolation is use of regression line for prediction outside of range of values of explanatory variable x
- interpolation is use of regression line for prediction inside range of values of x (more trustworthy)
important notes cont
- association does not imply causation
- a lurking variable is a variable that has an important effect on relationship among variables but is not included among the variables studied
- a confounding variable is a lurking variable that affects only the response variable but creates a situation where it’s impossible to determine whether the effect on the response variable is caused by the explanatory variable, the confounding variable, or neither
two way table
organizes data about two categorical variables
often used to summarize large amounts of data by grouping outcomes into categories
row variable
labels the rows that run across the table
column variable
labels the columns that run down the table
marginal distributions
row totals and column totals in a two way table give the marginal distributions of the two individual variables
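As a rough illustration, the marginal distributions can be computed from a small two-way table of counts. The table, its labels, and all variable names are made up for this sketch.

```python
# Hypothetical two-way table of counts (made up for illustration).
# Row variable: education level; column variable: age group.
row_labels = ["high school", "some college", "bachelor's"]
col_labels = ["25-34", "35-54", "55+"]
counts = [
    [20, 30, 50],
    [30, 30, 20],
    [50, 40, 30],
]

# Totals in the margins of the table.
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

# Marginal distribution of each variable, as percents of the grand total.
row_marginal = {
    label: 100 * total / grand_total
    for label, total in zip(row_labels, row_totals)
}
col_marginal = {
    label: 100 * total / grand_total
    for label, total in zip(col_labels, col_totals)
}

print(row_marginal)
print(col_marginal)
```

Each marginal distribution describes one variable alone and ignores the other variable entirely.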
conditional distribution
look at only one row or one column at a time
find each entry in column as percent of column total
conditional distribution of row variable for each column in the table
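A short sketch of the column-percent calculation, assuming a made-up table of counts. The function name `conditional_by_column` is hypothetical.

```python
# Hypothetical two-way table of counts (made up for illustration).
# Rows: education level (high school, some college, bachelor's).
# Columns: age group (25-34, 35-54, 55+).
counts = [
    [20, 30, 50],
    [30, 30, 20],
    [50, 40, 30],
]

def conditional_by_column(table):
    """For each column, express every entry as a percent of the column total.

    Returns one list per column: the conditional distribution of the
    row variable for that value of the column variable.
    """
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100 * row[j] / col_totals[j] for row in table]
        for j in range(len(col_totals))
    ]

conditional = conditional_by_column(counts)
for j, dist in enumerate(conditional):
    print(f"column {j}: {dist}")
```

Each conditional distribution sums to 100 percent within its own column, so the columns can be compared even when their totals differ.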
how to describe association between row and column variables when column variable is explanatory
compare conditional distributions of row variable for each value of column variable
how to describe association between row and column variables when row variable is explanatory
compare conditional distributions of column variable for each value of row variable
simpson’s paradox
a comparison between two variables that holds for each individual value of a third variable can be reversed when the data for all values of the third variable are combined
example of effect of lurking variables on observed association
common response
effect of lurking variables can operate through common response if changes in both explanatory and response variables are caused by changes in lurking variable
confounding
cannot distinguish between two variables’ effects on the response variable
best evidence that association is due to causation
comes from experiment in which explanatory variable is directly changed and other influences on response are controlled
- True or False: if we have a curvilinear function, and we want to straighten it out to make a linear function, we can’t do that by multiplying or dividing by constants or adding or subtracting constants (i.e. by using linear transformations).
true
What are the transformations that are most commonly used, other than linear transformation?
A. Positive and negative powers, and logarithms.
- Linear growth is to adding a fixed amount per unit time as exponential growth is to ______ by a fixed amount per unit time.
multiplying
- Suppose we have a function y=ab^x, where a and b are constants and x is the explanatory or independent variable, and y is the response or dependent variable. Is this an example of an exponential function, or a power function?
exponential
- If y is an exponential function of x, plotting what function of y versus x should result in a linear graph?
the log of y vs x
Suppose you do a regression of the log (base 10) of y versus x, and you get a nice linear scatterplot and a high coefficient of determination (r^2) when you do a regression. Now you can use this linear relationship for prediction. Suppose someone (like a test-maker) asks you what the predicted value is of y (not log y) for a given value of x. How would you find it?
A. You’d use your equation to find the predicted value of log y. Then you take the antilog (or 10 to that number) to get the predicted value of y. In other words, you “untransform” the value back to the original scale. (as shown in Example 4.7 on page 274)
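A tiny sketch of the "untransform" step, assuming a fitted line with made-up coefficients (these are not the values from the textbook example). The name `predict_y` is hypothetical.

```python
# Hypothetical fitted line (coefficients made up): log10(y-hat) = 0.2*x + 0.5
slope, intercept = 0.2, 0.5

def predict_y(x):
    """Predict log y from the line, then take the antilog to get y."""
    log_y_hat = slope * x + intercept
    return 10 ** log_y_hat

# At x = 5 the line gives log10(y-hat) = 1.5, so y-hat = 10**1.5.
y_hat = predict_y(5)
print(round(y_hat, 2))
```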
- If a variable grows exponentially, its logarithm grows how?
linearly
- Suppose we have a function y = ax^b, where a and b are constants and x is the explanatory variable and y is the response variable. Is this an example of an exponential function, or a power function?
power function
- To make an exponential function linear, we use the log transformation just with the response variable y. To make a power function linear, we use the log transformation with what?
both the explanatory and the response variable
- If you start with the power function y=ax^p, and take the log of both sides, what result do you end up with?
A. log y=log a +p log x.
- Suppose you have a data set, and its scatterplot is curved. Then you take the log of both explanatory and response variables, and plot them, and you get a line. What do you infer from this?
A. That the original variables were related according to a power function (or power law).
- When you plot the log of y vs. the log of x, do you give any meaningful interpretation to the slope of the line that you get? If so, what is it?
A. According to the equation log y =log a + p log x, the slope of the line is the power to which x is raised in the original power function.
- Suppose you plot the log y vs. the log x and you get a good line, with intercept 2 and slope 3. So log y = 2+3log x. Now you are asked to find the equation for y in terms of x, without logs in it. How do you do this?
A. You take the antilog of both sides (make both sides powers of 10). You get y = 10^(2 + 3 log x) = 10^2 * (10^(log x))^3 = 100x^3. That is, y = 100 times x cubed.
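A quick numerical check of this algebra, as a sketch: the log form and the power form should give the same y for any positive x. The function names and test values are arbitrary choices for illustration.

```python
import math

def y_from_log_form(x):
    """Antilog of log y = 2 + 3 log x (base 10)."""
    return 10 ** (2 + 3 * math.log10(x))

def y_from_power_form(x):
    """The simplified power function y = 100 * x**3."""
    return 100 * x ** 3

# Compare the two forms at a few arbitrary positive x values.
for x in [0.5, 1, 2, 7.3]:
    assert math.isclose(y_from_log_form(x), y_from_power_form(x), rel_tol=1e-9)
    print(x, y_from_power_form(x))
```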
- What type of variables does a two-way table describe the relationship between?
categorical
The row totals and column totals of a two-way table give us the distribution for each variable separately, in our sample. These distributions are called what?
marginal distributions
- Suppose you have three age groups, and you have data on how many individuals got educated to each of 4 different levels. Suppose you calculate, just for one of the age groups, the percent of people in that age group who attained each level. This distribution of percents for one age group is called what?
a conditional distribution
- Do the percents for a conditional distribution add to 100 for each of the different groups for which you calculate them?
yes
- Do the percents for conditional distributions equal the percents for marginal distributions?
not necessarily
- True or False: When describing the relationship between two quantitative variables, the scatterplot and the correlation coefficient are usually the graph and numeric measure of choice; but in describing the relation between two categorical variables, no single graph or numeric measure summarizes the strength of the association. We usually pick and choose among bar charts and pie charts and the reporting of various percents.
true
- There were two AP Statistics teachers. 40% of the 40 students in the first teacher’s classes got 5’s, and 25% of the 40 students in the second teacher’s classes got 5’s. People assumed that the first teacher is better. However, someone then studied the results based on whether or not the students scored above or below a certain cutoff on the SAT, before going into AP Statistics. The first teacher had 80% of students above this cutoff and 20% below. The second teacher had 20% above and 80% below. The first teacher had 50% of the “aboves” get 5’s, and none of the “belows.” The second teacher had 75% of the “aboves” get 5’s, and 12.5% of the “belows.” Now which teacher appears to be better, and why?
A. The second teacher, because a higher fraction of that teacher’s students got 5’s in both groups: those above the cutoff and those below the cutoff.
situation above is whose paradox
simpson’s
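The two-teacher comparison above can be checked with a short sketch. The counts are derived from the percentages in the question (for example, 80% of 40 is 32 students above the cutoff). The dictionary layout and the names `rate`, `by_group`, and `overall` are choices made for this illustration.

```python
# (students, fives) for each teacher and SAT group, derived from the
# percentages in the question: e.g. first teacher, 80% of 40 = 32 "aboves",
# and 50% of those 32 = 16 fives.
results = {
    "first":  {"above": (32, 16), "below": (8, 0)},
    "second": {"above": (8, 6),   "below": (32, 4)},
}

def rate(students, fives):
    """Fraction of students who scored a 5."""
    return fives / students

# Rate of 5's within each SAT group (controls for the lurking variable).
by_group = {
    teacher: {group: rate(*cell) for group, cell in groups.items()}
    for teacher, groups in results.items()
}

# Rate of 5's with the groups combined (ignores the lurking variable).
overall = {
    teacher: rate(
        sum(n for n, _ in groups.values()),
        sum(f for _, f in groups.values()),
    )
    for teacher, groups in results.items()
}

print("overall:", overall)
print("by group:", by_group)
```

The first teacher has the higher combined rate, yet the second teacher has the higher rate within each SAT group. The reversal comes from the first teacher having mostly "above" students.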
- True or False: In Simpson’s paradox, there is a lurking variable (a term you learned in chapter three), which predisposes the results against one of the two groups; controlling for the effects of that lurking variable by looking separately at the subsets formed by the categories of it reveals results in the opposite direction from those obtained when ignoring the lurking variable.
true
- When two variables X and Y are found to correlate with each other, of course two possible explanations for this association are 1) that X causes Y, and 2) (one not diagrammed on page 307) that Y causes X. Please name the other two possible explanations that are good to keep in mind when interpreting findings of associations.
A. Common response (z causes both x and y) and confounding (z, which is associated with x, may cause y).
- Someone finds that the degree of physical fitness in youth (as measured by heart rate recovery from exercise) is correlated with the number of ankle injuries the person has had. But before concluding that we should hurt the ankles of youth in order to make them more fit, a COMMON RESPONSE explanation for the association comes to mind. Can you posit this common response explanation?
A. That both fitness and ankle injuries are associated with more running or more athletic activity – both are responses to this basic causal variable.
- Suppose a researcher studies the effects of a way of teaching children not to be violent. The researcher gives the instruction to all the children in Mrs. Harmony’s classroom, and uses the kids in Mr. Gutsly’s classroom as a comparison group. But then the researcher realizes that Mrs. Harmony has a very different personality and interpersonal style than Mr. Gutsly: she tries to promote kindness and good will, whereas Mr. Gutsly is mainly interested in promoting competitiveness and not being wimps. What would we say about the variables of teacher personality and interpersonal style in this study?
A. That they are CONFOUNDED with the intervention. Thus the effects of these teacher variables can’t be distinguished from the effects of the intervention the study is meant to test.
- What is the strongest type of evidence for causal relations?
A. Well-designed experiments that are meant to control for all lurking variables. (These usually entail randomly assigning individuals to different conditions.)
- What’s the problem with doing a well-designed experiment, for example, to see what the effects of child abuse are?
A. We will never find it ethical to randomly assign children to conditions of child abuse versus nonabuse.
- Is it possible to come to valid causal inferences without doing experiments that randomly assign people to various conditions? Can you give an example of such?
A. Although your text says that “the only fully compelling method” of establishing causality is an experiment, we can and do come to valid causal inferences without randomly assigning people to conditions. The example of smoking and lung cancer is one where the evidence for causation is “overwhelming” despite no study in which people were randomly assigned to smoke or not smoke over many years.
- If a lurking variable can actually reverse the direction of results, do you think it is also possible that a lurking variable could result in lack of an observed association when in fact there is a causal influence?
yes
- Does the fact that lurking variables can obscure influences that are actually present imply that: not only does correlation not imply causation, but lack of correlation does not rule out causation?
yes