About Data 1 Flashcards

Experiment, data collection, categories

1
Q

What are the rows and columns of the data table called?

A

Rows (observations/cases)

Columns (variables)

2
Q

What are the two sub classes of numerical variables and what do they mean?

A

Continuous - can take any value within a range (infinitely many possible values)

Discrete - can only take distinct, countable values (e.g. counts)

3
Q

What are the two categorical variable sub classes and what do they mean?

A

Ordinal - levels have a natural order

Regular (nominal) - levels have no natural order

4
Q

Associated vs independent variables?

A

Associated - the two variables show some connection with one another

Independent - the two variables show no connection

5
Q

Anecdotal evidence?

A

Evidence based on a personal story rather than systematically collected data, e.g. Grandma says lightning cures cancer because it happened to her

6
Q

Problems with taking a census

A

Some individuals are hard to locate or measure
A full count is complex to carry out
The population changes while the census is being taken

7
Q

“Tasting soup”: exploratory analysis, inference and representative.

A

Exploratory analysis - examining the sample you collected (tasting a spoonful of the soup)
Inference - generalizing from the sample to the whole population (deciding the whole pot needs salt)
Representative - does your sample represent the whole population? (it needs to!)

7
Q

Sampling bias from these: non-response, voluntary response, convenience sample

A

Non-response - if only a small fraction of the randomly sampled people respond, the sample may no longer be representative of the population.
Voluntary response - the people who choose to respond are mostly those with strong opinions, so the sample is not representative.
Convenience sample - people who are more easily accessible are more likely to be in the sample.

8
Q

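Explanatory and response variables

A

When one variable is suspected of affecting another, the suspected cause is the explanatory variable and the affected one is the response variable. The labels only suggest which one is influencing the other; they do not mean the relationship is causal.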

9
Q

Observational study

A

Data are collected in a way that does not interfere with how the data arise; the researcher only observes

10
Q

Experiment

A

Researchers randomly assign subjects to treatments in order to establish causal connections between the explanatory and response variables

11
Q

Confounding variable

A

A variable that is correlated with both the explanatory and the response variables

12
Q

Two types of observational studies?

A

Prospective and retrospective studies

13
Q

prospective study?

A

collects info as events unfold

14
Q

retrospective study?

A

collects info after events have taken place

15
Q

What are the four sampling methods?

A

Simple random sampling
Stratified sampling
Cluster sampling
Multistage sampling

16
Q

Simple random sample

A

Cases are drawn at random so that every case in the population has an equal chance of being selected

17
Q

Stratified sample

A

Divides the population into groups (strata) of similar observations, then takes a simple random sample from each stratum

18
Q

Cluster sample

A

Divides the population into clusters, randomly chooses some of the clusters, then samples every observation in the chosen clusters

19
Q

Multistage sample

A

Divides the population into clusters, randomly chooses some of the clusters, then takes a simple random sample within each chosen cluster
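
The four sampling methods on the cards above can be compared side by side in a rough Python sketch. The population, the `region` field and all function names are made up for illustration. Using the same grouping variable for both strata and clusters is a simplification: in practice strata are chosen to be internally similar, while clusters are usually natural groupings.

```python
import random

# Hypothetical population: 12 cases, each tagged with a made-up region.
population = [
    {"id": i, "region": region}
    for i, region in enumerate(["north"] * 4 + ["south"] * 4 + ["east"] * 4)
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def group_by(cases, key):
    """Split cases into a dict of lists, one list per level of `key`."""
    groups = {}
    for case in cases:
        groups.setdefault(case[key], []).append(case)
    return groups


def simple_random_sample(cases, n):
    """Every case has the same chance of being picked."""
    return rng.sample(cases, n)


def stratified_sample(cases, key, n_per_stratum):
    """Take a random sample from EVERY stratum."""
    picked = []
    for members in group_by(cases, key).values():
        picked.extend(rng.sample(members, n_per_stratum))
    return picked


def cluster_sample(cases, key, n_clusters):
    """Randomly pick some clusters, then keep EVERY case in them."""
    groups = group_by(cases, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [case for name in chosen for case in groups[name]]


def multistage_sample(cases, key, n_clusters, n_per_cluster):
    """Randomly pick some clusters, then randomly sample within each."""
    groups = group_by(cases, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    picked = []
    for name in chosen:
        picked.extend(rng.sample(groups[name], n_per_cluster))
    return picked
```

The only difference between the cluster and multistage functions is the last step: the first keeps whole clusters, the second samples inside them.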

20
Q

Principles of experimental design (4) C R R B

A

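Control - compare the treatment group against a control group
Randomize - randomly assign subjects to treatment groups
Replicate - collect a sufficiently large sample, or repeat the entire study
Block - group subjects by a variable known or suspected to affect the response, then randomize within each block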
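
Blocking and randomizing can be combined in one step: split subjects into blocks first, then assign treatments at random inside each block. Here is a minimal sketch, assuming a made-up list of subjects with a hypothetical `risk` blocking variable (the function name is also invented).

```python
import random


def blocked_assignment(subjects, block_key, seed=0):
    """Randomly assign half of each block to treatment, the rest to control."""
    rng = random.Random(seed)

    # Step 1 (block): group subjects by the blocking variable.
    blocks = {}
    for subject in subjects:
        blocks.setdefault(subject[block_key], []).append(subject)

    # Step 2 (randomize): shuffle within each block, then split it in two.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject["name"]] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject["name"]] = "control"  # the comparison group
    return assignment


# Hypothetical subjects: 4 high-risk and 4 low-risk.
subjects = [
    {"name": f"p{i}", "risk": "high" if i < 4 else "low"} for i in range(8)
]
groups = blocked_assignment(subjects, "risk")
```

Because the split happens inside each block, both treatment and control end up with the same mix of high-risk and low-risk subjects.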

21
Q

Scatter plot

A

Useful for visualizing the relationship between two numerical variables

22
Q

Dot plots and mean

A

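Each observation is a dot along a single number line, so dense regions show up as clusters of dots; the mean can be marked on the plot as the balancing point of the distribution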

23
Q

Sample statistic and point estimate

A

Sample statistic - a value computed from the sample (e.g. the sample mean)

Point estimate - a sample statistic used as a single-number estimate of a population parameter
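
To make the distinction concrete, here is a small sketch with a made-up population (the numbers 1 to 1000). The sample mean is the sample statistic, and it serves as the point estimate of the population mean.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population whose true mean we normally would not know.
population = list(range(1, 1001))
population_mean = statistics.mean(population)  # the population parameter

# Draw a simple random sample and compute a statistic from it.
sample = rng.sample(population, 100)
point_estimate = statistics.mean(sample)  # sample mean used as the estimate

error = point_estimate - population_mean  # differs from sample to sample
```

A different seed gives a different sample and therefore a slightly different point estimate.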

24
Stacked dot plot
Dots with the same value are piled on top of each other, so taller stacks mark more frequent values
25
Histograms
Shows data density and the shape of the distribution. The chosen bin width can change the story: bins that are too wide (e.g. 1-10) hide detail, bins that are too narrow (e.g. 1-2) make the shape hard to see
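The effect of bin width can be seen by counting the same made-up data with two different widths. This is only a sketch; `bin_counts` is a hypothetical helper, not a library function.

```python
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall in each bin [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


data = [1, 2, 2, 3, 3, 3, 4, 8, 9, 9]  # made-up observations

narrow = bin_counts(data, 1)  # one bin per value: lots of detail
wide = bin_counts(data, 5)    # two wide bins: the peak at 3 disappears
```

With width 1 the peak at 3 and the gap between 4 and 8 are visible; with width 5 only two bars remain.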
26
4 types of histogram shapes
Unimodal - one prominent peak | Bimodal - two peaks | Multimodal - three or more peaks | Uniform - no peak, bars of roughly equal height
27
3 types of skew
Right skew: median < mean | Left skew: median > mean | Symmetric: median = mean
28
Variance
Roughly the average squared deviation from the mean: s^2 = sum((x - mean)^2) / (n - 1)
29
Standard deviation
Square root of the variance: s = sqrt(sum((x - mean)^2) / (n - 1)). For roughly bell-shaped data, nearly all observations fall within 3 SDs of the mean
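The variance and standard deviation formulas can be worked through by hand on a small made-up data set and checked against Python's `statistics` module, which also divides by n - 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations

mean = sum(data) / len(data)

# Sum of squared deviations from the mean.
squared_devs = sum((x - mean) ** 2 for x in data)

# Sample variance divides by n - 1, not n.
variance = squared_devs / (len(data) - 1)

# Standard deviation is the square root of the variance.
sd = math.sqrt(variance)

# Cross-check against the standard library.
same_variance = math.isclose(variance, statistics.variance(data))
same_sd = math.isclose(sd, statistics.stdev(data))
```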
30
median
Value that splits the data in half when sorted in ascending order. With an even number of observations, average the two middle values
31
Q1, Q3 and IQR
Q1 = 25th percentile | Median = 50th percentile | Q3 = 75th percentile | IQR (interquartile range) = Q3 - Q1, the range covering the middle 50% of the data
32
box plot
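Represents the middle 50% of the data with a box (Q1 to Q3, with a line at the median). Whiskers reach to the most extreme observations within 1.5 x IQR of the nearest quartile. Points beyond the whiskers are plotted as potential outliers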
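The numbers behind a box plot can be computed directly. This is a sketch on made-up data. Quartile conventions differ between textbooks and software, so the `"inclusive"` method used here is an assumption rather than the one definition.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up observations, one extreme value

# Quartiles (the middle cut point is the median).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # width of the box

# Whiskers may reach at most 1.5 x IQR beyond the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything beyond the fences is drawn as a potential outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

Here the value 30 lies beyond the upper fence, so the upper whisker stops at 9 and 30 is drawn as a separate point.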
33
robustness | median and IQR vs mean and SD
Median and IQR are more robust to skewness and outliers. Describe skewed distributions with the median and IQR, and symmetric distributions with the mean and SD
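A quick way to see robustness is to add one extreme value to a made-up data set and compare how far the mean and the median move.

```python
import statistics

typical = [30, 32, 35, 38, 40]   # made-up, roughly symmetric
with_outlier = typical + [400]   # same data plus one extreme value

# How much each summary changes when the extreme value is added.
mean_shift = statistics.mean(with_outlier) - statistics.mean(typical)
median_shift = statistics.median(with_outlier) - statistics.median(typical)
```

The mean is dragged far toward the extreme value, while the median barely moves.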
34
Graph transformations
log(x): pulls in the extreme values of strongly right-skewed data, which makes the data easier to model, but the transformed values are harder to interpret
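A small sketch of what a log transformation does to made-up, strongly right-skewed values: large gaps on the raw scale become even steps on the log scale.

```python
import math

values = [1, 10, 100, 1000, 10000]  # made-up, each 10 times the previous one

logged = [math.log10(v) for v in values]

# Gaps between neighbouring values before and after the transformation.
raw_gaps = [b - a for a, b in zip(values, values[1:])]
log_gaps = [b - a for a, b in zip(logged, logged[1:])]
```

The raw gaps grow from 9 to 9000, while every gap on the log scale is about 1. The cost is interpretation: a difference of 1 now means "ten times larger".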
35
contingency tables
Summarizes the counts for each combination of levels of two categorical variables
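A contingency table can be built by counting pairs of levels. The sketch below uses made-up survey rows with two hypothetical variables, `smoker` and `exercises`.

```python
from collections import Counter

# Hypothetical survey rows: (smoker, exercises).
rows = [
    ("yes", "no"), ("yes", "no"), ("yes", "yes"),
    ("no", "yes"), ("no", "yes"), ("no", "no"),
]

counts = Counter(rows)  # how often each combination occurs

smoker_levels = sorted({r[0] for r in rows})
exercise_levels = sorted({r[1] for r in rows})

# Rows are smoker levels, columns are exercise levels.
table = {
    s: {e: counts[(s, e)] for e in exercise_levels} for s in smoker_levels
}
row_totals = {s: sum(table[s].values()) for s in smoker_levels}
```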
36
bar plot
common way to show a single categorical variable
37
histogram vs bar plot
Histogram - numerical variable; the x-axis is a number line | Bar plot - categorical variable; the x-axis shows categories
38
Segmented bar and mosaic plot
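Segmented bar plot - a bar plot in which each bar is split into segments by the levels of a second categorical variable (e.g. yes and no), identified by a legend. Mosaic plot - the whole plot area stands for the total number of observations, and each tile's area is proportional to the count in that combination of levels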
39
Pie charts
Don't use if comparing more than 5 items. Colorful but bad.