Stats 511 Flashcards

1
Q

Creativity study: 47 subjects were randomly divided into two groups. Each subject received one questionnaire.
The questionnaires were designed to evoke either intrinsic motivation (e.g., the pleasure of writing something good) or extrinsic motivation (e.g., receiving public recognition for their writing). The subjects then wrote a haiku, which is how creativity was measured.
12 poets then evaluated each haiku and assigned a creativity score from 1 to 40. The average of the judges’ scores for a subject’s haiku is that subject’s creativity score.

Salary Study:
Did Harris Trust and Savings Bank discriminate by paying higher salaries to men than to women between 1969 and 1977? The data set contains the starting salaries of 32 males and 61 females, which covers everyone hired during this period.

List the differences between these studies.

Which one is a sample, and which one is a census (population)?

Which one is randomized experiment?
Which one is Observational Study?

Which one is approximately balanced?
Which one is not balanced?

List other differences.

A

Sample: The creativity study includes only 47 people, a sample from a larger group of individuals.

Census: The bank study includes all people (the population) hired during the time period.

Randomized: The creativity study is a randomized experiment; subjects were randomly assigned to groups.

Observational: The salary study is observational; group membership (gender) was a characteristic of the subjects, not something assigned by the investigator.

Approximately balanced: The group sizes in the creativity study are about equal.

Not balanced: The group sizes in the salary (bank) study are unequal (32 males vs. 61 females).

Other:
1. The response in the creativity study (creativity score) attempts to quantify something vague (creativity), whereas salaries are already numbers and so are easier to quantify.
2. Period of time covered: a week or so for the creativity study vs. 8 years for the salary study.
3. Difference in expense.
4. The creativity score is subjective.
5. The environment needs to be controlled in the creativity study.
6. Were the poets blinded to the treatment? The salaries were assigned without blinding.
2
Q

What is Scope of Inference?

A

What can we infer from this study? What does it tell us about the world?

3
Q

What two separate questions can we ask to determine the scope of inference?

A
  1. Were subjects randomly assigned to groups?
    If subjects WERE randomly assigned, then we can infer causation: the treatment CAUSED any differences we observed in the response.

    If the subjects were NOT randomly assigned to groups, we CANNOT infer causation.
    In the salary study, we can’t infer that gender CAUSED the salary differences. There may be an underlying cause associated with gender (example: education).

  2. Was it a random sample?
    If the sample is a random sample from a larger population, then the results can be inferred to reflect that larger population. Otherwise, the results apply only to the subjects observed.
    It is not easy to get a random sample.
    Imagine repeating the random sample many times; eventually you’ll see the whole population.
4
Q

Inference to Population?

A

Inferences to populations can be drawn from random sampling studies, but not otherwise. In a random sampling study, units are selected by the investigator from a well-defined population. All units in the population have a chance of being selected, and the investigator employs a chance mechanism (like a lottery) to determine actual selection.

The subjects of the creativity study volunteered their participation. The sample was self-selected, so it was not a random sampling study.

5
Q
  1. Causal Conclusions/Inference: Can statistical analysis alone be used to establish causal relationships?
  2. Confounding variable
A

Statistical inferences of cause-and-effect relationships can be drawn from randomized
experiments, but not from observational studies.

Confounding variable: a variable that is related both to group membership and to the outcome.
Its presence makes it hard to establish the outcome as being a direct consequence
of group membership.

6
Q

Do observational studies have value?

A

Yes
1. Establishing causation is not always the goal.
2. Causation may be established in other ways (example: comparing people exposed to radiation from an atomic blast with those far enough away not to be affected by it).
3. Analysis of observational data may lend evidence toward causal theories and suggest
the direction of future research.

7
Q

The most basic form of random sampling is:

A

A simple random sample
A simple random sample of size n from a population is a subset of the population
consisting of n members selected in such a way that every subset of size n is afforded
the same chance of being selected.
A typical method assigns each member of the population a computer-generated
random number.
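
The random-number method described above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the function name `simple_random_sample`, the seed, and the population of ID numbers 1 to 93 are all assumptions made for the example. Each member gets a random number, and the n members with the smallest numbers form the sample, so every subset of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n members chosen so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    # Assign each member of the population a computer-generated random number.
    keyed = [(rng.random(), member) for member in population]
    # Keep the n members with the smallest random numbers.
    keyed.sort(key=lambda pair: pair[0])
    return [member for _, member in keyed[:n]]

# Hypothetical population: ID numbers 1..93 (illustration only).
population = list(range(1, 94))
sample = simple_random_sample(population, 10, seed=511)
print(sample)
```

The standard library call `random.sample(population, n)` draws the same kind of sample directly; the longer version is shown only because it mirrors the wording of the card.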

8
Q

An inference
A statistical inference

A

An inference is a conclusion that patterns in the data are present in some broader
context.
A statistical inference is an inference justified by a probability model linking the
data to the broader context.

9
Q

Statistical Inferences Based on Chance Mechanisms

A
10
Q

What is the Population?

A

Unless you have a random sample, the population is conceptual. Imagine repeating the study many times. The population is the collection of subjects over these many repetitions.

Creativity study: Not a random sample but volunteers.
Over many repetitions of the study you’d have a large collection of volunteers. That is your conceptual population, and you don’t really know much about it.

11
Q

Statistical Hypothesis Test Structure

A

Null Hypothesis:
Test Statistic:
Sampling Distribution: distribution of test statistic over many repetitions of the study.
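
For a randomized experiment, "many repetitions of the study" can be imitated by re-shuffling the group labels. The sketch below makes assumptions the card does not state: the null hypothesis is taken to be "no treatment effect", the test statistic is taken to be the difference in group means, and the scores are made-up numbers rather than data from either study. The function name `randomization_distribution` is also hypothetical.

```python
import random
from statistics import mean

def randomization_distribution(group_a, group_b, reps=2000, seed=None):
    """Difference in means over many random re-assignments of subjects to groups."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(reps):
        # Under the assumed null, group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return diffs

# Made-up scores for illustration only.
group_a = [12.0, 15.5, 17.2, 19.1, 20.3, 22.6]
group_b = [10.1, 11.8, 13.4, 16.0, 18.7]

observed = mean(group_a) - mean(group_b)
diffs = randomization_distribution(group_a, group_b, reps=2000, seed=1)

# Share of re-assignments at least as extreme as what was observed (two-sided).
p_two_sided = sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)
print(round(observed, 2), p_two_sided)
```

The list `diffs` approximates the distribution of the test statistic over repetitions of the random assignment; comparing `observed` against it gives an approximate p-value.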

12
Q

Measuring Uncertainty in Randomized Experiments

A
13
Q

Typical Randomized Experiment (Image)

A
14
Q

Standard deviation

A

Measure of spread. Amount of variation of a set of numbers. Low SD means values tend to be close to the mean. High SD indicates values are spread out over wider range.
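
A small Python illustration of low versus high SD, using two made-up data sets (`tight` and `wide`, both assumptions for the example) that share the same mean. `statistics.stdev` computes the sample standard deviation, which divides by n - 1.

```python
from statistics import mean, stdev

# Two made-up data sets with the same mean (10) but different spread.
tight = [9, 10, 10, 11, 10]
wide = [2, 6, 10, 14, 18]

print(mean(tight), mean(wide))
# Values near the mean give a low SD; values spread over a wider range give a high SD.
print(round(stdev(tight), 3), round(stdev(wide), 3))
```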

15
Q

Null Hypothesis and Alternative Hypothesis and T-Test

A

Insert Image

16
Q

Histogram

A

A histogram is ordinarily used to show broad features, not exquisite detail, and the
broad features will be apparent with many choices of interval (bin) width.

17
Q

stem-and-leaf diagram
https://www.youtube.com/watch?v=8WMTdnDLAj4

A

A stem-and-leaf diagram is a cross between a graph and a table. It is used to get a
quick idea of the distribution of a set of measurements with pencil and paper or to
present a set of numbers in a report.
Display 1.10 shows stem-and-leaf diagrams. The digits in each observation are separated into a stem and a leaf. Each number
in a set is represented in the diagram by its leaf on the same line as its stem. All
possible stem values are listed in increasing order from top to bottom, whether or
not there are observations with those stems. At each stem, all corresponding leaves
are listed, in increasing order. Outliers may require a break in the string of stems.
The stem-and-leaf diagrams show the centers, spreads, and shapes of distributions
in the same way histograms do. Their advantages include exact depiction of
each number, ease of determining the median and quartiles of each set, and ease
of construction. Disadvantages include difficulty in comparing distributions when
the numbers of observations in data sets are very different and severe clutter with
large data sets.
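
The construction rules above can be sketched in Python. This is a minimal version under stated assumptions: the values are non-negative integers, the stem is the tens part and the leaf is the ones digit, and the function name `stem_and_leaf` and the data are made up for illustration. It does not handle the break in stems that outliers may require.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return text lines of a stem-and-leaf diagram (stem = tens, leaf = ones digit)."""
    leaves = defaultdict(list)
    # Sorting first means the leaves at each stem are listed in increasing order.
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # List every possible stem from smallest to largest, even those with no leaves.
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{stem:>3} | {''.join(str(d) for d in leaves[stem])}" for stem in stems]

# Made-up measurements for illustration only.
lines = stem_and_leaf([23, 12, 47, 15, 21, 40, 23])
for line in lines:
    print(line)
```

Because each leaf is an actual digit of an observation, the exact numbers can be read back from the diagram, which is the advantage over a histogram noted above.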

18
Q

Box-and-whisker plot

A