About Data 1 Flashcards by Trevor Sebastien

What are the rows and columns of the data table called?

Rows (observations/cases)

Columns (variables)

What are the two subclasses of numerical variables and what do they mean?

Continuous - can take any value within a range (infinitely many possible values)

Discrete - takes only separate, countable values (e.g. counts)

What are the two subclasses of categorical variables and what do they mean?

Ordinal - levels have a natural order

Regular (nominal) - levels have no natural order

Associated vs independent variables?

Associated (dependent) - the two variables show some connection

Independent - no connection

Anecdotal evidence?

Evidence based on a personal story rather than systematic data, e.g. Grandma says lightning cures cancer because it happened to her

Problems with taking a census

Some individuals are hard to locate
Complex to carry out
The population changes while the census is being taken

“Tasting soup”: exploratory analysis, inference and representativeness

Exploratory analysis - looking at the sample you collected (tasting a spoonful of the soup)
Inference - generalizing your conclusion from the sample to the whole population (deciding the whole pot needs salt)
Representative - does your sample represent the whole population? It needs to! (stir the pot before tasting)

Sampling bias from these: non-response, voluntary response, convenience sample

Non-response - if only a small fraction of the randomly sampled people respond, the sample may no longer be representative of the population.
Voluntary response - the sample consists of people who choose to respond, often those with strong opinions (not representative)
Convenience sample - people who are more easily accessible are more likely to be in the sample

Explanatory and response variables

The explanatory variable is the one suspected of influencing the other (the response variable). The labels are only a suggestion of direction; they do not mean the relationship is causal.

Observational study

Data are collected in a way that does not directly interfere with how the data arise; researchers only observe

Experiment

researchers assign treatments to subjects (ideally at random) to establish causal connections between explanatory and response variables

Confounding variable

a variable that is correlated with both the explanatory and response variables

Two types of observational studies?

prospective and retrospective studies

prospective study?

collects info as events unfold

retrospective study?

collects info after events have taken place

What are the four sampling methods?

simple random sampling
stratified sampling
cluster sampling
multistage sampling

Simple random sample

picks cases at random so that every case in the population has an equal chance of being included

Stratified sample

divides the population into groups (strata) of similar observations, then takes a simple random sample from each stratum

Cluster sample

divides the population into many groups (clusters), randomly chooses some clusters, then samples every observation in the chosen clusters

Multistage sample

divides the population into clusters, randomly chooses some clusters, then takes a simple random sample within each chosen cluster
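
The four sampling methods can be sketched with Python's standard `random` module. This is only an illustration under assumptions: the classroom population, the function names and the sample sizes are all made up for the example.

```python
import random

def simple_random_sample(population, n, rng):
    # Every unit has the same chance of being picked.
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    # Group units into strata, then take a simple random sample from EACH stratum.
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose some clusters, then keep EVERY unit in the chosen clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    # Randomly choose some clusters, then take a simple random sample within each.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample

rng = random.Random(0)
# Hypothetical population: 4 classrooms of 5 students each.
clusters = {f"room{r}": [f"room{r}-s{i}" for i in range(5)] for r in range(4)}
population = [s for members in clusters.values() for s in members]

srs = simple_random_sample(population, 6, rng)
strat = stratified_sample(population, lambda s: s.split("-")[0], 2, rng)
clus = cluster_sample(clusters, 2, rng)
multi = multistage_sample(clusters, 2, 3, rng)
```

Here the classrooms double as strata and as clusters purely to keep the example short; in practice strata are chosen to be internally similar, while clusters are usually convenient natural groupings.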

Principles of experimental design (4) C R R B

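Control - compare the treatment group with a control group
Randomize - randomly assign subjects to the treatment groups
Replicate - collect a sufficiently large sample, or repeat the whole study
Block - group subjects by a variable known or suspected to affect the response, then randomize within each block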
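
Randomizing and blocking together can be sketched as follows. The subjects, the "risk" blocking variable and the function name `blocked_assignment` are hypothetical, chosen only to illustrate the idea.

```python
import random

def blocked_assignment(subjects, block_of, rng):
    # Block: split subjects by the blocking variable.
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)
    # Randomize: within each block, shuffle and split in half.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject] = "control"
    return assignment

rng = random.Random(42)
# Hypothetical subjects: (name, risk level), where risk is the blocking variable.
subjects = [(f"s{i}", "low" if i < 4 else "high") for i in range(8)]
assignment = blocked_assignment(subjects, lambda s: s[1], rng)
```

Each risk level ends up with the same number of treated and control subjects, so risk cannot explain a difference between the two groups.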

Scatter plot

useful for visualizing the relationship between two numerical variables

Dot plots and mean

Each observation is a dot along a single number line, so dense regions show up as crowded dots; the mean can be marked on the line as the balance point

Sample statistic and point estimate

Sample statistic - a value computed from the sample data (e.g. the sample mean)

Point estimate - a sample statistic used as a single best guess for a population value (parameter)
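
A small sketch of the idea, assuming a made-up population of exam scores. In real studies the population mean is unknown, which is why the sample mean is used as a point estimate of it.

```python
import random
from statistics import mean

rng = random.Random(1)
# Hypothetical population of 1,000 exam scores (made-up values).
population = [rng.randint(40, 100) for _ in range(1000)]
population_mean = mean(population)   # the parameter, usually unknown in practice

sample = rng.sample(population, 50)  # simple random sample of 50 scores
sample_mean = mean(sample)           # sample statistic, used as the point estimate

# How far the point estimate landed from the true value in this one sample.
error = sample_mean - population_mean
```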

Stacked dot plot

dots with the same value are piled on top of each other, so the height of each stack shows how often that value occurs

Histograms

shows data density and describes the shape of the distribution. Bin width can alter the story: bins that are too wide hide detail, bins that are too narrow add noise (e.g. bins of width 10 vs width 2)
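
To see how bin width changes the picture, here is a minimal sketch that counts values per bin. The data and the helper `bin_counts` are made up for illustration.

```python
def bin_counts(values, width):
    # Map each value to the left edge of its bin, then count values per bin.
    counts = {}
    for v in values:
        left_edge = (v // width) * width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

data = [1, 2, 2, 3, 3, 3, 8, 9, 9, 10]  # made-up values with two groups

wide = bin_counts(data, 10)   # width 10: almost everything lands in one bar
narrow = bin_counts(data, 2)  # width 2: the two groups become visible
```

With width 10 the data look like one lump; with width 2 the gap between the low and high values shows up as empty bins.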

4 types of histogram shapes

Unimodal - one prominent peak (a normal distribution is one example)
Bimodal - two peaks
Multimodal - three or more peaks
Uniform - no peak, roughly one flat line

3 types of skew

Right skew: median < mean | Left skew: median > mean | Symmetric: median = mean
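
The link between skew and the mean/median ordering can be checked on small made-up data sets. This is a rule of thumb that holds in typical cases, not a guarantee.

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 4, 20]       # long tail to the right
left_skewed = [1, 17, 18, 18, 19, 19, 20]   # long tail to the left
symmetric = [1, 2, 3, 4, 5]

# The mean is pulled toward the tail; the median stays near the bulk of the data.
right = (median(right_skewed), mean(right_skewed))   # median 3, mean 5
left = (median(left_skewed), mean(left_skewed))      # median 18, mean 16
sym = (median(symmetric), mean(symmetric))           # median 3, mean 3
```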

Variance

roughly the average squared deviation from the mean: sum of (x - mean)^2 / (n - 1)

Standard deviation

square root of the variance: sqrt( sum of (x - mean)^2 / (n - 1) ). For bell-shaped data, nearly all observations (about 99.7%) fall within 3 SDs of the mean
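
A worked sketch of the sample variance and standard deviation, written out by hand and compared against Python's `statistics` module. The data values are made up.

```python
from math import sqrt
from statistics import stdev, variance

def sample_variance(xs):
    # Sum of squared deviations from the mean, divided by n - 1.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]     # mean is 5, squared deviations sum to 32

s2 = sample_variance(data)          # 32 / 7
s = sqrt(s2)                        # standard deviation = sqrt of variance

# statistics.variance and statistics.stdev also use the n - 1 divisor.
matches_library = abs(s2 - variance(data)) < 1e-12 and abs(s - stdev(data)) < 1e-12
```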

median

value that splits the data in half when sorted in ascending order. If the number of observations is even, average the two middle values

Q1, Q3 and IQR

Q1 = 25th percentile
median = 50th percentile
Q3 = 75th percentile
IQR (interquartile range) = Q3 - Q1, the spread of the middle 50% of the data

box plot

represents the middle 50% with a box, with a line at the median. Whiskers reach to the most extreme points within 1.5 x IQR of the nearest quartile. Outliers are the points outside the whiskers
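
The box plot ingredients can be computed directly. This sketch assumes the "inclusive" quartile method of `statistics.quantiles`; textbooks and software use slightly different quartile conventions, so other tools may give slightly different Q1 and Q3. The data are made up.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # made-up values with one extreme point

q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Whiskers may extend at most 1.5 x IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences.
whisker_low = min(x for x in data if x >= lower_fence)
whisker_high = max(x for x in data if x <= upper_fence)

# Anything beyond the fences is drawn as an individual outlier point.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

For this data the box spans 3 to 7, the whiskers reach 1 and 8, and 30 is plotted as an outlier.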

robustness | median and IQR vs mean and SD

median and IQR are more resistant to skew and outliers. Skewed distributions are best summarized by the median and IQR; symmetric distributions by the mean and SD
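
Robustness can be seen by changing a single value and recomputing the summaries. The data are made up and the `iqr` helper is a hypothetical convenience function using the "inclusive" quartile method.

```python
from statistics import mean, median, quantiles, stdev

def iqr(xs):
    # Interquartile range, Q3 - Q1, using the "inclusive" quartile method.
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    return q3 - q1

clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 140]  # same data, one value replaced

# Median and IQR do not move; mean and SD change a lot.
summary_clean = (median(clean), iqr(clean), mean(clean), stdev(clean))
summary_outlier = (median(with_outlier), iqr(with_outlier),
                   mean(with_outlier), stdev(with_outlier))
```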

Graph transformations

log(x) - pulls in extreme values, so strongly skewed data are easier to model, but the results are harder to interpret
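
A quick sketch of what a log transformation does to strongly right-skewed values. The numbers are made up, and base 10 is an arbitrary choice for readability.

```python
from math import log10
from statistics import mean, median

values = [1, 10, 100, 1000, 100000]     # made-up, strongly right-skewed
logged = [log10(x) for x in values]     # roughly 0, 1, 2, 3, 5

# Before: the largest value dominates, so the mean is far above the median.
gap_before = mean(values) / median(values)
# After: the points are spread evenly and mean and median are close.
gap_after = mean(logged) - median(logged)
```

The extreme value is still in the data after the transformation; it just no longer dominates. The cost is that results are now in log units, which take more care to interpret.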

contingency tables

summarizes data from two categorical variables
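
A contingency table is just a count of every combination of the two variables. A minimal sketch with made-up survey records, using `collections.Counter`:

```python
from collections import Counter

# Made-up records: (group, answer).
records = [
    ("treatment", "yes"), ("treatment", "yes"), ("treatment", "no"),
    ("control", "yes"), ("control", "no"), ("control", "no"),
    ("control", "no"),
]

cell_counts = Counter(records)                           # one count per table cell
row_totals = Counter(group for group, _ in records)      # totals per group
col_totals = Counter(answer for _, answer in records)    # totals per answer

# Row proportions: the share of each answer within its group.
row_proportions = {
    (group, answer): count / row_totals[group]
    for (group, answer), count in cell_counts.items()
}
```

The row proportions are the numbers a segmented bar plot would display for each group.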

bar plot

common way to show a single categorical variable

histogram vs bar plot

histogram - numerical variable; x axis is a number line
bar plot - categorical variable; x axis shows categories

Segmented bar and mosaic plot

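Segmented bar plot - a bar plot in which each bar is split into segments by the levels of a second categorical variable (e.g. yes and no), shown in a legend
Mosaic plot - tiles fill the whole plot area, and each tile's area is proportional to the number of observations in that cell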

Pie charts

Don't use if comparing more than 5 items. Colorful but bad.
