Stats 511 Flashcards
Creativity case study: 47 subjects were randomly divided into two groups. Each subject received one questionnaire.
Questionnaires were designed to invoke either intrinsic motivation (ex: pleasure in writing something good) or extrinsic motivation (ex: receiving public recognition for their writing). The subjects then wrote a haiku (this is how creativity was measured).
12 poets then evaluated each haiku and assigned a creativity score from 1 to 40. The average of the judges’ scores for a subject’s haiku is that subject’s creativity score.
Salary Study:
Did Harris Trust and Savings Bank discriminate by paying higher salaries to men than to women between 1969 and 1977? The data set given: starting salaries for 32 males and 61 females. This data set includes all people hired during this time.
List the differences between these studies.
Which one is a sample, and which one is a census (population)?
Which one is a randomized experiment?
Which one is an observational study?
Which one is approximately balanced?
Which one is not balanced?
List other differences.
Sample: The creativity study includes only 47 people (a sample) out of a larger group of individuals.
Census: The bank study includes all people (the population) who were hired during the time period.
Randomized: The creativity study is a randomized experiment because subjects were randomly assigned to groups.
Observational: Groups in the salary study were determined by a characteristic of the subjects (gender), not assigned by the investigator.
Approximately balanced: The group sizes in the creativity study are approximately balanced (both groups have about the same n).
Not balanced: The group sizes in the salary study are not balanced (32 males vs. 61 females).
Other:
- The response in the creativity study (creativity score) attempts to quantify something vague (creativity). Salaries are already numbers, so they are easier to quantify.
- Period of time of the study: a week or so for creativity study vs 8 years for salary study.
- Difference in expense
- Creativity score is subjective
- Need to control the environment for the creativity study.
- Were the poets blinded to the treatment? The salaries were assigned without blinding.
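Random assignment, as used in the creativity study, can be sketched in a few lines. This is only an illustration, not the procedure the original investigators used: the subject IDs, the seed, and the group names `group_a` and `group_b` are assumptions made for the example.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of nearly equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)          # chance alone decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical IDs 1..47 standing in for the 47 subjects.
subject_ids = list(range(1, 48))
group_a, group_b = randomly_assign(subject_ids, seed=511)

print(len(group_a), len(group_b))  # 23 24: approximately balanced
```

Because chance alone decides who lands in which group, no characteristic of the subjects can systematically line up with the treatment.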
What is Scope of Inference?
What can we infer from the study? What does it tell us about the world?
What can we ask to determine the scope of inference? Two separate questions.
- Were subjects randomly assigned to groups?
If subjects WERE randomly assigned, then we can infer causation: the treatment CAUSED any differences we observed in the response.
If the subjects were NOT randomly assigned into groups, we CANNOT infer causation.
In the salary study, we can’t infer that gender CAUSED salary differences. There may be an underlying cause associated with gender (example: education).
- Was it a random sample?
If the sample is a random sample from a larger population, then the results can be inferred to reflect the larger population. Otherwise, results only apply to the subjects observed.
Not easy to get a random sample.
Imagine repeating the random sample many times; eventually you’ll see the whole population.
Inference to Population?
Inferences to populations can be drawn from random sampling studies, but not otherwise. In a random sampling study, units are selected by the investigator from a well-defined population. All units in the population have a chance of being selected, and the investigator employs a chance mechanism (like a lottery) to determine actual selection.
The subjects of the creativity study volunteered their participation. The sample was self-selected, so this was not a random sampling study.
- Causal Conclusions/Inference: Can statistical analysis alone be used to establish causal relationships?
- Confounding variable
Statistical inferences of cause-and-effect relationships can be drawn from randomized experiments, but not from observational studies.
Confounding variable: a variable that is related both to group membership and to the outcome. Its presence makes it hard to establish the outcome as being a direct consequence of group membership.
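A small simulation can make the idea of a confounding variable concrete. Everything here is made up for illustration: the variable `educated`, the probabilities, and the outcome scale are assumptions, not data from either study. The outcome depends only on the confounder, yet the two groups still differ on average.

```python
import random
from statistics import mean

rng = random.Random(0)
rows = []
for _ in range(20000):
    educated = rng.random() < 0.5                          # hypothetical confounder
    in_group_a = rng.random() < (0.8 if educated else 0.2) # membership depends on it
    outcome = (10 if educated else 0) + rng.gauss(0, 1)    # outcome depends ONLY on it
    rows.append((educated, in_group_a, outcome))

def mean_diff(subset):
    """Mean outcome of group A minus mean outcome of group B."""
    a = [y for _, in_a, y in subset if in_a]
    b = [y for _, in_a, y in subset if not in_a]
    return mean(a) - mean(b)

# Comparing groups while ignoring the confounder shows a large difference.
naive_diff = mean_diff(rows)

# Comparing groups within each level of the confounder shows almost none.
within_diffs = [mean_diff([r for r in rows if r[0] == level])
                for level in (True, False)]

print(round(naive_diff, 2), [round(d, 2) for d in within_diffs])
```

The naive comparison suggests group membership matters even though it has no effect at all in this simulation. This is why an observational comparison alone cannot establish causation.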
Do observational studies have value?
Yes
1. Establishing causation is not always the goal.
2. Establishing causation may be done in other ways (example: comparing people exposed to radiation from an atomic blast with those far enough away not to be affected by it).
3. Analysis of observational data may lend evidence toward causal theories and suggest the direction of future research.
The most basic form of random sampling is:
A simple random sample
A simple random sample of size n from a population is a subset of the population
consisting of n members selected in such a way that every subset of size n is afforded
the same chance of being selected.
A typical method assigns each member of the population a computer-generated
random number.
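The random-number method can be sketched as follows, assuming the usual completion of the procedure: sort the members by their random numbers and keep the n with the smallest values. The population labels and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n by ranking random numbers."""
    rng = random.Random(seed)
    # Give every member of the population its own computer-generated random number.
    keyed = [(rng.random(), member) for member in population]
    # Order the members by their random numbers and keep the n smallest.
    keyed.sort(key=lambda pair: pair[0])
    return [member for _, member in keyed[:n]]

# Hypothetical population of 100 labeled members.
population = [f"person_{i}" for i in range(1, 101)]
sample = simple_random_sample(population, n=10, seed=1)
print(sample)
```

Python's standard library function `random.sample(population, n)` draws a sample of the same kind directly. The longer version above is shown only because it mirrors the method described in the card.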
An inference
A statistical inference
An inference is a conclusion that patterns in the data are present in some broader
context.
A statistical inference is an inference justified by a probability model linking the
data to the broader context.
Statistical Inferences Based on Chance Mechanisms
What is the Population?
Unless you have a random sample, the population is conceptual. Imagine repeating the study many times. The population is the collection of subjects over these many repetitions.
Creativity study: Not a random sample but volunteers.
Over many repetitions of the study you’d have a large collection of volunteers. That collection is your conceptual population, and you don’t really know much about it.
Statistical Hypothesis Test Structure
Null Hypothesis:
Test Statistic:
Sampling Distribution: distribution of test statistic over many repetitions of the study.
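For a randomized experiment, the three parts above can be illustrated with a randomization (permutation) sketch. The scores are hypothetical, the test statistic is assumed to be the difference in group means, and the null hypothesis is assumed to be "no treatment effect", so each subject's score would be the same in either group.

```python
import random
from statistics import mean

def randomization_distribution(group_a, group_b, n_reps=5000, seed=None):
    """Approximate the distribution of the difference in means under the null."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_reps):
        rng.shuffle(pooled)  # one "repetition": re-randomize subjects to groups
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return diffs

# Hypothetical scores, NOT the data from the creativity study.
group_a = [12.0, 17.2, 19.1, 20.5, 22.6, 24.0]
group_b = [10.1, 12.9, 14.8, 15.0, 17.4, 18.7]

observed = mean(group_a) - mean(group_b)           # test statistic
diffs = randomization_distribution(group_a, group_b, seed=511)

# Two-sided p-value: share of repetitions at least as extreme as the observed value.
p_value = sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)
print(round(observed, 2), p_value)
```

The p-value here is a Monte Carlo approximation based on 5000 random regroupings, not an exact enumeration of every possible regrouping.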
Measuring Uncertainty in Randomized Experiments
Typical Randomized Experiment (Image)
Standard deviation
Measure of spread: the amount of variation in a set of numbers. A low SD means values tend to be close to the mean. A high SD indicates values are spread out over a wider range.
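A quick sketch of the calculation, assuming the sample standard deviation (dividing by n - 1, the usual convention, which the card does not spell out). The two data sets are hypothetical and were chosen to share a mean of 20.

```python
from math import sqrt
from statistics import mean, stdev

def sample_sd(values):
    """Sample SD: square root of the summed squared deviations over (n - 1)."""
    center = mean(values)
    squared_deviations = [(v - center) ** 2 for v in values]
    return sqrt(sum(squared_deviations) / (len(values) - 1))

tight = [19, 20, 20, 21, 20]   # hypothetical values close to the mean
wide = [5, 12, 20, 28, 35]     # hypothetical values spread far from the mean

for name, values in (("tight", tight), ("wide", wide)):
    # The hand-rolled result should agree with statistics.stdev.
    print(name, mean(values), round(sample_sd(values), 2), round(stdev(values), 2))
```

Both sets have the same mean, but the second has a much larger SD because its values sit farther from that mean.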
Null Hypothesis and Alternative Hypothesis and T-Test
Insert Image