About Data 1 Flashcards

Experiment, data collection, categories

1
Q

What are the rows and columns of the data table called?

A

Rows (observations/cases)

Columns (variables)
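Here is a minimal Python sketch of this layout. It assumes the data table is stored as a list of dictionaries, one dictionary per row. The variable names and values are invented for illustration.

```python
# Hypothetical data table: each dict is one row (observation/case),
# and each key is one column (variable). Values are invented.
rows = [
    {"name": "A", "age": 21, "major": "Biology"},
    {"name": "B", "age": 19, "major": "History"},
    {"name": "C", "age": 23, "major": "Biology"},
]

n_observations = len(rows)        # number of rows (cases)
variables = list(rows[0].keys())  # column names (variables)

# Pull out one column (variable) across all observations
ages = [row["age"] for row in rows]

print(n_observations, variables, ages)
```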

2
Q

What are the two subclasses of numerical variables and what do they mean?

A

Continuous - can take any value within a range (e.g., height)

Discrete - can take only specific, separated values, often counts (e.g., number of siblings)

3
Q

What are the two subclasses of categorical variables and what do they mean?

A

Ordinal - levels have a natural order (e.g., low, medium, high)

Regular (nominal) - levels have no natural order (e.g., eye color)

4
Q

Associated vs independent variables?

A

Associated - the two variables show some connection (relationship) with each other

Independent - no evident connection between the two variables

5
Q

Anecdotal evidence?

A

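Evidence based on a personal story or an isolated case rather than systematically collected data, e.g., Grandma says lightning cures cancer because it happened to her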

6
Q

Problems with taking a census

A

Some individuals are hard to locate
Costly and complex to carry out
Population changes while the census is being taken

7
Q

“Tasting soup”: exploratory analysis, inference, and representative samples

A

Exploratory analysis - examining the sample you have (tasting a spoonful of the soup)
Inference - generalizing conclusions from the sample to the whole population (deciding the whole pot needs more salt)
Representative - the sample must reflect the whole population (stir the soup before tasting)

7
Q

Sampling bias from these: non-response, voluntary response, convenience sample

A

Non-response - if only a small fraction of the randomly sampled people respond, the sample may no longer be representative of the population.
Voluntary response - the sample consists of people who choose to respond, usually those with strong opinions (not representative)
Convenience sample - individuals who are more easily accessible are more likely to be included in the sample

8
Q

Explanatory and response variables

A

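Explanatory variable - the one suspected of affecting the other
Response variable - the one that might be affected
Labeling them this way suggests which one might influence the other; it does not mean the relationship is causal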

9
Q

Observational study

A

Data are collected in a way that does not directly interfere with how the data arise; researchers only observe

10
Q

Experiment

A

researchers assign treatments to subjects (ideally at random) to establish causal connections between the explanatory and response variables

11
Q

Confounding variable

A

a variable that is correlated with both the explanatory and response variables

12
Q

Two types of observational studies?

A

prospective and retrospective studies

13
Q

prospective study?

A

collects info as events unfold

14
Q

retrospective study?

A

collects info after events have taken place

15
Q

What are the four sampling methods?

A

simple random sampling
stratified sampling
cluster sampling
multistage sampling

16
Q

Simple random sample

A

each case in the population has an equal chance of being included in the sample (like drawing names from a hat)
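Here is a minimal sketch of drawing a simple random sample in Python. It assumes a hypothetical population of 100 numbered cases. `random.Random.sample` picks without replacement, so every case has the same chance of ending up in the sample.

```python
import random

# Hypothetical population of 100 case IDs
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed so the example is reproducible
# sample() draws without replacement; every case is equally likely to be chosen
sample = rng.sample(population, k=10)
print(sample)
```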

17
Q

Stratified sample

A

divides the population into groups (strata) of similar cases, then takes a simple random sample from each stratum
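Here is a sketch of stratified sampling. It assumes a hypothetical population of students stratified by class year. `stratified_sample` is an illustrative helper, not a library function. Each stratum must contain at least `k_per_stratum` cases; otherwise `random.Random.sample` raises a `ValueError`.

```python
import random
from collections import defaultdict

def stratified_sample(cases, stratum_of, k_per_stratum, rng):
    """Split cases into strata, then take a simple random sample from each."""
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    sample = []
    for members in strata.values():
        # simple random sample within this stratum
        sample.extend(rng.sample(members, k_per_stratum))
    return sample

# Hypothetical population: (student_id, class_year), 25 students per year
students = [(i, year) for i, year in enumerate(["first", "second", "third", "fourth"] * 25)]

picked = stratified_sample(students, stratum_of=lambda s: s[1], k_per_stratum=3,
                           rng=random.Random(1))
print(len(picked))  # 4 strata x 3 per stratum
```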

18
Q

Cluster sample

A

divides the population into groups (clusters), then randomly selects some clusters and includes every case in the chosen clusters
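Here is a sketch of cluster sampling using hypothetical village clusters. A few clusters are chosen at random, and every case in those clusters is kept.

```python
import random

# Hypothetical population already divided into clusters (e.g., villages)
clusters = {
    "village_A": ["a1", "a2", "a3"],
    "village_B": ["b1", "b2"],
    "village_C": ["c1", "c2", "c3", "c4"],
    "village_D": ["d1", "d2", "d3"],
}

rng = random.Random(7)
chosen = rng.sample(sorted(clusters), k=2)  # randomly pick 2 clusters

# Cluster sampling keeps EVERY case in each chosen cluster
sample = [case for name in chosen for case in clusters[name]]
print(chosen, sample)
```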

19
Q

Multistage sample

A

divide the population into clusters, randomly choose some clusters, then take a simple random sample within each chosen cluster
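The two stages can be sketched as follows, using hypothetical clusters. First pick clusters at random, then take a simple random sample inside each chosen cluster.

```python
import random

# Hypothetical population divided into clusters (e.g., schools)
clusters = {
    "school_A": ["a1", "a2", "a3", "a4"],
    "school_B": ["b1", "b2", "b3"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2", "d3"],
}

rng = random.Random(3)

# Stage 1: randomly choose some clusters
chosen = rng.sample(sorted(clusters), k=2)

# Stage 2: simple random sample of 2 cases within each chosen cluster
sample = {name: rng.sample(clusters[name], k=2) for name in chosen}
print(sample)
```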

20
Q

Principles of experimental design (4) C R R B

A

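Control - compare the treatment group with a control group
Randomize - randomly assign subjects to treatments to account for variables that cannot be controlled
Replicate - collect a sufficiently large sample (or replicate the whole study)
Block - group subjects into blocks by a variable known to affect the response, then randomize treatments within each block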
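Here is a sketch of randomizing within blocks. It assumes hypothetical subjects blocked by risk level. `block_randomize` is an illustrative helper. Each block is shuffled and split in half, so the treatment and control groups are balanced on the blocking variable while assignment within each block stays random.

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_of, rng):
    """Group subjects into blocks, then randomly split each block
    into 'treatment' and 'control' halves."""
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]   # copy so the input list is untouched
        rng.shuffle(shuffled)   # randomize order within the block
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject] = "control"
    return assignment

# Hypothetical subjects: (id, risk level); risk level is the blocking variable
subjects = [(i, "high" if i < 6 else "low") for i in range(12)]
groups = block_randomize(subjects, block_of=lambda s: s[1], rng=random.Random(0))

for subject, group in sorted(groups.items()):
    print(subject, group)
```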

21
Q

Scatter plot

A

useful for visualizing the relationship between two numerical variables

22
Q

Dot plots and mean

A

each observation is a dot along a single number line; the mean can be marked on the axis (e.g., with a triangle)

23
Q

Sample statistic and point estimate

A

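Sample statistic - a number computed from the sample (e.g., the sample mean)

Point estimate - a sample statistic used to estimate a population parameter (e.g., the sample mean estimates the population mean)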
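Here is a small simulation of the idea. It assumes a made-up population of heights drawn from a normal distribution. The population mean is the parameter, and the mean of a random sample of 50 is the point estimate of it.

```python
import random
import statistics

rng = random.Random(10)  # fixed seed for reproducibility

# Hypothetical population of 10,000 heights (cm). In real studies the
# population is not fully observed; that is why we estimate.
population = [rng.gauss(170, 10) for _ in range(10_000)]

sample = rng.sample(population, k=50)

parameter = statistics.mean(population)   # population mean (the parameter)
point_estimate = statistics.mean(sample)  # sample statistic used as a point estimate

print(f"parameter = {parameter:.1f}, point estimate = {point_estimate:.1f}")
```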

24
Q

Stacked dot plot

A

a dot plot in which dots with the same value are piled on top of each other, which makes the shape of the distribution easier to see

25
Q

Histograms

A

shows data density and the shape of the distribution; the bin (bar) width can change the story the plot tells: wider bins hide detail, narrower bins show more (e.g., bins of width 10 vs width 2)
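Here is a sketch of the counting behind a histogram. It uses made-up scores and a hypothetical helper, `bin_counts`, to show how the same data look coarse with width-10 bins and more detailed with width-2 bins. Empty bins are left out of the result.

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall in each bin [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))  # empty bins are omitted

# Hypothetical data (e.g., quiz scores)
scores = [1, 2, 2, 3, 7, 8, 8, 9, 9, 9, 12, 15, 18, 19, 19]

print(bin_counts(scores, 10))  # {0: 10, 10: 5}
print(bin_counts(scores, 2))   # {0: 1, 2: 3, 6: 1, 8: 5, 12: 1, 14: 1, 18: 3}
```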

26
Q

4 types of histogram shapes

A

Unimodal - one prominent peak (e.g., a normal distribution)
Bimodal - two prominent peaks (humps)
Multimodal - three or more peaks
Uniform - no prominent peak; bars are roughly the same height (flat)

27
Q

3 types of skew

A

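Right skewed: median < mean (the long right tail pulls the mean up)

Left skewed: median > mean (the long left tail pulls the mean down)

Symmetric: median ≈ mean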
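Here is a quick check of these rules using small made-up data sets and Python's `statistics` module. In the right-skewed set the long tail pulls the mean above the median. In the left-skewed set it pulls the mean below the median.

```python
import statistics

# Made-up data sets with different shapes
datasets = {
    "right skewed": [1, 2, 2, 3, 3, 3, 4, 20],       # long tail to the right
    "left skewed": [1, 18, 19, 19, 19, 20, 20, 21],  # long tail to the left
    "symmetric": [1, 2, 3, 4, 5],
}

summary = {}
for name, data in datasets.items():
    summary[name] = (statistics.mean(data), statistics.median(data))
    print(f"{name}: mean = {summary[name][0]}, median = {summary[name][1]}")
```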

28
Q

Variance

A

roughly the average squared deviation from the mean
s^2 = sum of (x - mean)^2 / (n - 1)

29
Q

Standard deviation

A

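s = sqrt( sum of (x - mean)^2 / (n - 1) )
the square root of the variance
for roughly normal data, about 95% of observations fall within 2 SDs of the mean and about 99.7% within 3 SDs (not all data must)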
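Here is a sketch of both formulas in Python. `sample_variance` and `sample_sd` are illustrative helpers, and the data are made up. The results are cross-checked against `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics

def sample_variance(xs):
    """s^2 = sum of (x - mean)^2 / (n - 1); needs at least 2 values."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, mean = 5

print(sample_variance(data), sample_sd(data))
# Cross-check against the standard library (also uses n - 1)
print(statistics.variance(data), statistics.stdev(data))
```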

30
Q

median

A

the value that splits the data in half when sorted in ascending order; if there is an even number of observations, it is the average of the two middle values
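Here is a direct implementation of this definition. `median` is a hypothetical helper; Python's `statistics.median` does the same job.

```python
def median(xs):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 3]))      # odd n: 3
print(median([7, 1, 3, 10]))  # even n: (3 + 7) / 2 = 5.0
```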

31
Q

Q1, Q3 and IQR

A

Q1=25th percentile
median=50th
Q3=75th percentile
IQR = Q3 - Q1, the length of the box that holds the middle 50% of the data
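Here is a sketch that applies Python's `statistics.quantiles` to made-up data. Quartile conventions differ between textbooks and software, so Q1 and Q3 may not match a hand calculation exactly.

```python
import statistics

# Hypothetical data, already sorted for readability
data = [1, 3, 4, 6, 7, 8, 9, 12, 15, 20]

# quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
# Its default method is "exclusive"; other software may use another convention.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```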

32
Q

box plot

A

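represents the middle 50% of the data with a box (Q1 to Q3), with a line at the median. Whiskers extend from the box to the most extreme data points within 1.5 x IQR of the box; points beyond the whiskers are flagged as potential outliers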
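Here is a sketch of the numbers a box plot is built from, using the 1.5 x IQR rule on made-up data. `box_plot_parts` is a hypothetical helper. The quartiles come from `statistics.quantiles`, whose convention may differ slightly from other software.

```python
import statistics

def box_plot_parts(data):
    """Return the box, whisker ends and outliers using the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "box": (q1, median, q3),
        # whiskers stop at the most extreme observations still inside the fences
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

parts = box_plot_parts([1, 3, 4, 6, 7, 8, 9, 12, 15, 40])
print(parts)
```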

33
Q

robustness

median and IQR vs mean and SD

A

median and IQR are robust: extreme observations and skew have little effect on them. Skewed distributions are usually summarized with the median and IQR; symmetric distributions with the mean and SD
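Here is a small demonstration with made-up data. Replacing the largest value with an extreme one moves the mean and SD a lot, while the median and IQR (computed with `statistics.quantiles`) stay the same. `summarize` is an illustrative helper.

```python
import statistics

def summarize(xs):
    """Mean, SD, median and IQR of a list of numbers."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return {
        "mean": statistics.mean(xs),
        "sd": statistics.stdev(xs),
        "median": statistics.median(xs),
        "iqr": q3 - q1,
    }

original = [10, 11, 12, 12, 13, 13, 14, 15, 16, 17]
with_outlier = original[:-1] + [170]  # replace the largest value with an extreme one

summaries = {"original": summarize(original), "with outlier": summarize(with_outlier)}
for label, s in summaries.items():
    print(label, s)
```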

34
Q

Graph transformations

A

log(x) reduces right skew and pulls in outliers (it does not remove them)
easier to model
harder to interpret (results are on the log scale)
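Here is a sketch of the effect of a log transformation on made-up, right-skewed income data. The skewness measure used here (the moment coefficient of skewness) is an assumption chosen for illustration and does not come from these cards. A positive value means a longer right tail, and the value shrinks after taking `log10`.

```python
import math

def skewness(xs):
    """Moment coefficient of skewness (illustrative measure, not from the cards):
    positive means a longer right tail."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data (e.g., incomes in thousands of dollars)
incomes = [20, 25, 30, 35, 40, 50, 60, 80, 150, 900]
logged = [math.log10(x) for x in incomes]

print(round(skewness(incomes), 2), round(skewness(logged), 2))
```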

35
Q

contingency tables

A

summarizes data for two categorical variables; each cell counts the cases with that combination of levels
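Here is a sketch of building a contingency table from raw cases with `collections.Counter`. The variables (group and outcome) and their values are hypothetical.

```python
from collections import Counter

# Hypothetical cases: (group, outcome), two categorical variables
cases = [
    ("treatment", "improved"), ("treatment", "improved"), ("treatment", "no change"),
    ("control", "improved"), ("control", "no change"), ("control", "no change"),
    ("control", "no change"),
]

counts = Counter(cases)  # one count per (row level, column level) pair
rows = sorted({r for r, _ in cases})
cols = sorted({c for _, c in cases})

# Print the table with row totals
print("".ljust(10) + "".join(c.ljust(11) for c in cols) + "total")
for r in rows:
    row_counts = [counts[(r, c)] for c in cols]
    print(r.ljust(10) + "".join(str(n).ljust(11) for n in row_counts) + str(sum(row_counts)))
```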

36
Q

bar plot

A

common way to show a single categorical variable

37
Q

histogram vs bar plot

A

Histogram - numerical variable; the x-axis shows numbers (bins)

Bar plot - categorical variable; the x-axis shows categories

38
Q

Segmented bar and mosaic plot

A

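Segmented bar plot - a bar plot whose bars are divided into segments by a second categorical variable (shown in a legend), e.g., yes and no. Mosaic plot - the whole plot area represents the total number of cases, and each box's area is proportional to the count for that combination (e.g., yes and no)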

39
Q

Pie charts

A

Don’t use when comparing more than 5 items; colorful, but slice sizes are harder to compare than bar heights