Class 4 Spring 🌷 Flashcards

1
Q

Other R Basics

A

Order of operations
Commenting
ggplot

2
Q

More R Review

A

Vectors
Dataframes
Functions
Loops
Libraries

3
Q

Sampling basics

A

Spoonful of soup example
The sample needs to be REPRESENTATIVE
Mud flat transect example
The sample needs to be RANDOM

4
Q

Samples vs. populations

A

Each research question refers to a target population.

Often, it is too expensive (or would take too long, or is impossible) to collect data for every case in a population. Instead, a sample is taken.

A sample represents a subset of the cases and is usually only a small fraction of the population.

The shorthand statistical notation usually differs for populations vs. samples (for example, population size N vs. sample size n).

5
Q

population parameters:

A

the characteristics of the whole population (like the population mean, population standard deviation, or population proportion)

6
Q

point estimates (also called Sample Statistics):

A

the things you calculate from your sample (like sample mean, sample standard deviation, sample proportion) are considered point estimates for the population parameters.
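
The difference between a population parameter and a point estimate can be sketched in a few lines of Python (standard library only). The population here is simulated and entirely hypothetical; the names `mu`, `sigma`, `x_bar`, and `s` are illustrative choices that follow common notation, not something taken from the class materials.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 10,000 individuals.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Population parameters: computed from every case (rarely possible in practice).
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# Point estimates: computed from a simple random sample of 100 cases.
sample = random.sample(population, 100)
x_bar = statistics.fmean(sample)       # sample mean, estimates mu
s = statistics.stdev(sample)           # sample standard deviation, estimates sigma

print(f"population mean {mu:.1f} vs. sample mean {x_bar:.1f}")
print(f"population SD   {sigma:.1f} vs. sample SD   {s:.1f}")
```

The sample values should land close to, but not exactly on, the population values; that gap is why they are called estimates.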

7
Q

Shorthand statistical notation for Population parameters vs. Point estimates/sample statistics

A
8
Q

Anecdotal data

A

Anecdotal evidence is a weak basis for conclusions for two reasons.
First, the “data” only represent one or two cases.
Second, and more importantly, it is unclear whether these individual cases are representative of the population. These might represent unusual or even extraordinary cases.

9
Q

Representative samples

A

most likely to come from random sampling methods (of which there are several)

10
Q

Example 1: Non-representative samples can muddy your results

A

We (as consumers) can easily access ratings for products, sellers, and companies through websites. These ratings are based only on those people who go out of their way to provide a rating.
Q: If 50% of online reviews for a product are negative, do you think this definitely means that 50% of buyers are dissatisfied with the product?

A: From our own anecdotal experiences, we believe people tend to rant more about products that fell below expectations than rave about those that perform as expected. For this reason, we suspect there is a negative bias in product ratings on sites like Amazon. However, since our experiences may not be representative, we also keep an open mind.
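
A rough simulation can show how this kind of negative bias might arise. Every number below is an assumption chosen for illustration (the true dissatisfaction rate and both review probabilities are made up), so treat it as a sketch of the mechanism, not a claim about any real site.

```python
import random

random.seed(1)

N_BUYERS = 100_000
P_DISSATISFIED = 0.20        # assumed true share of unhappy buyers
P_REVIEW_IF_UNHAPPY = 0.40   # assumed: unhappy buyers often leave a review
P_REVIEW_IF_HAPPY = 0.10     # assumed: happy buyers rarely bother

negative_reviews = 0
positive_reviews = 0
for _ in range(N_BUYERS):
    unhappy = random.random() < P_DISSATISFIED
    p_review = P_REVIEW_IF_UNHAPPY if unhappy else P_REVIEW_IF_HAPPY
    if random.random() < p_review:   # only some buyers choose to review
        if unhappy:
            negative_reviews += 1
        else:
            positive_reviews += 1

share_negative = negative_reviews / (negative_reviews + positive_reviews)
print(f"true dissatisfied share:   {P_DISSATISFIED:.0%}")
print(f"negative share of reviews: {share_negative:.0%}")
```

Under these assumptions, roughly half of the reviews come out negative even though only one buyer in five is dissatisfied, because the reviewers selected themselves.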

11
Q

Example 2: Non-representative samples can muddy your results

A

Sometimes, a sample “chooses itself.” This is a non-random (non-probability) sample.
Example: suppose a family doctor located in Santa Monica, CA wants to do some research on the frequency with which various ailments occur among the patients who happen to visit her office over a period of time.
Her “sample members” chose themselves by contacting her clinic for care.

Q: What are the concerns?
A: Can’t assume the sample will be typical (i.e. representative) of patients living in that area.
A: Very useful for her office, less useful for describing patients in the US, or even in California, or even in LA!

12
Q

Random sampling (aka probability sampling) is incredibly important for a representative sample

A

In research or analysis, the researcher starts off with the population in mind.

The researcher then selects a sample that they believe will represent all of the important demographic and other variations (perhaps education level, health level, gender, age group, college year, or others).

For a sample to be representative, each member of the sample must be chosen at random from the population. Each member of the population should have an equal chance of being chosen.

This is not always easy.
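
As a minimal sketch, assuming a complete list of the population already exists, a simple random sample takes one line in Python. The ID list below is a hypothetical sampling frame invented for the example.

```python
import random

random.seed(7)

# Hypothetical sampling frame: an ID number for every member of the population.
population_ids = list(range(1, 501))

# Simple random sample: 25 members drawn without replacement,
# with every ID having the same chance of selection.
srs = random.sample(population_ids, k=25)
print(sorted(srs))
```

The code is the easy part. Building a list that really covers the whole population is usually where the difficulty lies.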

13
Q

Example – problems with non-random sampling

A

If you interview people in downtown LA about their political views by standing on a street corner and talking to passersby, you are unlikely to get a representative sample.
Question: why might that be?
You are most likely to approach (and be successful with) people who are not in a huge hurry, people without headphones on, or people who don’t look super angry.
These people may very well differ in their political opinions from those you avoided (or who wouldn’t talk to you).
Despite your best efforts, you’ve introduced Statistical Systematic Bias into your sample.

14
Q

Example – problems with non-random sampling

A

Identify the flaw(s) in reasoning in the following scenarios. Explain what the individuals in the study should have done differently if they wanted to make such strong conclusions.

1) Students at an elementary school are given a questionnaire that they are asked to return after their parents have completed it.
One of the questions asked is, “Do you find that your work schedule makes it difficult for you to spend time with your kids after school?”
60% of parents responded. Of the parents who replied, 85% said “no”.
Based on these results, the school officials conclude that a great majority of the parents have no difficulty spending time with their kids after school.
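
One way to see the problem is to work out what the numbers can and cannot say about all parents, since 40% never replied. The short Python sketch below uses only the figures from the scenario; the two extreme cases are assumptions used to bound the answer.

```python
response_rate = 0.60     # share of parents who returned the questionnaire
no_among_replies = 0.85  # share of respondents who answered "no"

# Share of ALL parents who are known to have answered "no".
known_no = response_rate * no_among_replies
nonrespondents = 1 - response_rate

lowest = known_no                    # if every non-respondent would say "yes"
highest = known_no + nonrespondents  # if every non-respondent would say "no"
print(f"'no' among all parents: between {lowest:.0%} and {highest:.0%}")
```

The survey alone only pins the share of "no" answers to somewhere between 51% and 91% of all parents. If parents with demanding work schedules were also less likely to fill out the form, the true figure could sit well below 85%.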

15
Q

Example – problems with non-random sampling

A

2) A survey is conducted on a simple random sample of 1,000 women who recently gave birth, asking them about whether or not they smoked during pregnancy.
A follow-up survey asking if the children have respiratory problems is conducted 3 years later. However, only 567 of these women are reached at the same address.
The researcher reports that these 567 women are representative of all mothers.

3) An orthopedist administers a questionnaire to 30 of her patients who do not have any joint problems and finds that 20 of them regularly go running.
She concludes that running decreases the risk of joint problems.

16
Q

Types of sampling

A

Some forms of random sampling/probability sampling (that you’ll need to know for class)
Simple random sample
Stratified random sample
Cluster sampling
Systematic population sampling

Some forms of non-random sampling/non-probability sampling (that you’ll need to know for class)
Convenience Sample
Voluntary response sample
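
The four probability sampling methods listed above can be sketched in Python as follows. The student population, the class years used as strata, and the dorms used as clusters are all hypothetical, and the sample sizes are arbitrary.

```python
import random

random.seed(3)

# Hypothetical population: 200 students, each with a class year (stratum)
# and a dorm (cluster of 20 consecutive IDs). All values are illustrative.
years = ["first", "second", "third", "fourth"]
population = [
    {"id": i, "year": random.choice(years), "dorm": f"dorm_{i // 20}"}
    for i in range(200)
]

# 1) Simple random sample: 20 students, all equally likely to be chosen.
simple = random.sample(population, 20)

# 2) Stratified random sample: 5 students drawn at random within each year.
stratified = []
for year in years:
    stratum = [p for p in population if p["year"] == year]
    stratified.extend(random.sample(stratum, 5))

# 3) Cluster sample: pick 2 dorms at random, then take everyone in them.
dorms = sorted({p["dorm"] for p in population})
chosen_dorms = random.sample(dorms, 2)
cluster = [p for p in population if p["dorm"] in chosen_dorms]

# 4) Systematic sample: random starting point, then every 10th student.
step = 10
start = random.randrange(step)
systematic = population[start::step]

for name, s in [("simple", simple), ("stratified", stratified),
                ("cluster", cluster), ("systematic", systematic)]:
    print(f"{name:<11} n = {len(s)}")
```

A convenience or voluntary response sample has no equivalent random step: whoever is easiest to reach, or whoever volunteers, ends up in the sample.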

17
Q

Random sampling is KEY for reducing systematic bias

A
  • If someone were permitted to pick and choose exactly which people/observations were included in the sample, it is entirely possible that the sample could be skewed by that person’s interests, subconscious biases, laziness, or a whole host of other issues.
  • This introduces Statistical Systematic Bias into a sample.
  • Statistical Systematic Bias is the difference between the sample value that you calculate from your problematic sample and the true population value.
  • It is most commonly due to two problems: (1) measurements being taken on a non-representative sample, and/or (2) incorrect measurements being taken.
  • Usually “systematic bias” is unintentional and accidental! But it is no less of an issue…
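
To make the definition concrete, the sketch below compares the average estimate from random samples with the average estimate from a non-representative sample, each measured against the true population value. The population (weekly work hours) and the rule that only people working under 45 hours can be reached are assumptions invented for illustration.

```python
import random
import statistics

random.seed(11)

# Hypothetical population: weekly work hours for 50,000 people.
population = [random.gauss(40, 8) for _ in range(50_000)]
true_mean = statistics.fmean(population)

# Assumed non-representative frame: only people working under 45 hours
# have time to stop and talk, so only they can end up in the sample.
reachable = [h for h in population if h < 45]

def average_estimate(frame, n=200, repeats=500):
    """Average of many sample means, each from a random draw of n cases."""
    return statistics.fmean(
        statistics.fmean(random.sample(frame, n)) for _ in range(repeats)
    )

# Bias = (typical sample value) - (true population value).
bias_random = average_estimate(population) - true_mean
bias_convenience = average_estimate(reachable) - true_mean

print(f"bias with random sampling:      {bias_random:+.2f} hours")
print(f"bias with convenience sampling: {bias_convenience:+.2f} hours")
```

Random sampling error averages out over repeated samples, while the bias from the restricted frame stays put no matter how many people are interviewed.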