Ch 1 - Intro to Data Flashcards

1
Q

Summary Statistic

A

a single number summarizing a large amount of data

2
Q

Data Matrix

A

A common way to organize data: each row is a unique case (aka unit of observation or observational unit), and each column corresponds to a variable.
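
As a rough illustration of the row/column layout, here is a minimal Python sketch. The athlete records, the variable names, and the `column` helper are all made up for this example.

```python
# Hypothetical data matrix: each dict is one case (row),
# and each key is one variable (column).
data_matrix = [
    {"athlete": "A", "age": 24, "medal": "gold"},
    {"athlete": "B", "age": 31, "medal": "silver"},
    {"athlete": "C", "age": 27, "medal": "bronze"},
]


def column(rows, variable):
    """Collect one variable's value from every case."""
    return [row[variable] for row in rows]


ages = column(data_matrix, "age")      # a numerical variable
medals = column(data_matrix, "medal")  # an ordinal variable
print(ages, medals)
```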

3
Q

Numerical Variable

A

Can take a wide range of numerical values, and it makes sense to add, subtract, or take averages with those values. Can be discrete or continuous.

4
Q

Ordinal Variable

A

a categorical variable that has levels with a natural ordering (gold, silver, bronze)

5
Q

Nominal Variable

A

A categorical variable whose levels have no natural ordering. Binary/dichotomous example: 0=male, 1=female. Example with more than two levels: blood types 0=A, 1=B, 2=AB, 3=O

6
Q

Associated Variables

A

When two variables show some connection with one another. Also called dependent variables. If they are not associated, they are independent variables.

7
Q

Simple Random Sample

A

Each case in the population has an equal chance of being included and there is no implied connection between the cases in the sample.
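
One way to draw such a sample is sketched below using Python's standard `random` module. The population of case IDs, the `simple_random_sample` name, and the seed are assumptions made for the example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct cases; every case is equally likely to be chosen."""
    rng = random.Random(seed)  # the seed is only there for reproducibility
    return rng.sample(population, n)


population = list(range(1, 101))  # hypothetical case IDs 1..100
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```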

8
Q

Convenience Sample

A

Individuals who are easily accessible are more likely to be included in the sample.

9
Q

Observational Studies

A

Data are collected in a way that does not directly interfere with how the data arise. Observational studies can provide evidence of a naturally occurring association between variables, but they cannot by themselves show a causal relationship.

10
Q

Randomized Experiment

A

Individuals are randomly assigned to a group, and the individuals in each group are assigned a treatment.
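
Random assignment could be sketched as follows: shuffle the individuals, then deal them out to the groups in turn. The group labels, participant names, and the `randomly_assign` helper are hypothetical.

```python
import random


def randomly_assign(individuals, groups=("treatment", "control"), seed=None):
    """Shuffle the individuals, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment


individuals = [f"person_{i}" for i in range(8)]  # hypothetical participants
assignment = randomly_assign(individuals, seed=3)
print(assignment)
```

Because chance alone decides who lands in which group, the groups should be similar on average apart from the treatment itself.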

11
Q

Confounding Variable

A

A variable that is correlated with both the explanatory and response variables.

12
Q

Stratified Sampling

A

The population is divided into groups called strata, chosen so that similar cases are grouped together, then a second sampling method is employed within each stratum.
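
A minimal sketch of this idea, assuming the second sampling method is a simple random sample within each stratum. The regions, the case tuples, and the `stratified_sample` helper are invented for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(cases, stratum_of, n_per_stratum, seed=None):
    """Group cases into strata, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    # Assumes every stratum holds at least n_per_stratum cases.
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }


# Hypothetical cases: (region, case ID); region defines the stratum.
cases = [("north", i) for i in range(6)] + [("south", i) for i in range(4)]
picked = stratified_sample(cases, stratum_of=lambda c: c[0], n_per_stratum=2, seed=0)
print(picked)
```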

13
Q

Cluster Sample

A

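Like a two-stage simple random sample. Break the population up into many groups, called clusters. Then sample a fixed number of clusters and collect a simple random sample within each sampled cluster. (Some texts call this multistage sampling and reserve "cluster sample" for including every case in each sampled cluster.)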
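
The two stages described on this card might be sketched like this: first sample clusters, then sample cases within each chosen cluster. The village names, household IDs, and the `cluster_sample` helper are assumptions for the example.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample cases within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Assumes every cluster holds at least n_per_cluster cases.
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


# Hypothetical clusters: village name -> household IDs.
clusters = {
    "village_a": [1, 2, 3, 4, 5],
    "village_b": [6, 7, 8, 9],
    "village_c": [10, 11, 12],
    "village_d": [13, 14, 15, 16],
}
picked = cluster_sample(clusters, n_clusters=2, n_per_cluster=3, seed=0)
print(picked)
```

Unlike stratified sampling, only some of the groups end up in the sample at all.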

14
Q

Scatterplot

A

Provides a case-by-case view of data for two numerical variables

15
Q

Dot Plot

A

One-variable scatterplot

16
Q

Skew

A

When data trail off in one direction, the distribution has a long tail. If the long tail is on the left, the distribution is left skewed; if it is on the right, the distribution is right skewed.

17
Q

Sample Mean

A

Average of all observed values: their sum divided by the number of observations

18
Q

Sample Variance

A

Square all deviations from the sample mean, then take an average, dividing by n-1 rather than n for the sample variance.
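
A small worked example of the sample mean and sample variance, written out by hand and checked against Python's `statistics` module. The five data values are made up.

```python
import statistics


def sample_mean(values):
    """Sum of the observations divided by how many there are."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sample_mean(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


values = [2, 4, 4, 5, 10]  # made-up observations
# Deviations from the mean of 5 are -3, -1, -1, 0, 5; their squares sum to 36.
print(sample_mean(values))      # 5.0
print(sample_variance(values))  # 9.0, i.e. 36 / (5 - 1)
print(statistics.variance(values))  # the standard library agrees
```

The sample standard deviation is the square root of this variance, 3.0 here.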

19
Q

Median

A

50th percentile

20
Q

Interquartile Range (IQR)

A

The length of the box in a box plot = Q3 - Q1, where Q1 and Q3 are the 25th and 75th percentiles

21
Q

Box Plot Whiskers

A

Extend out from the box to the farthest data point that lies within 1.5 * IQR of the box. Points beyond the whiskers are plotted individually as potential outliers.
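
The sketch below computes the quartiles, the IQR, and where the whiskers end for a made-up data set. Quartile conventions differ between textbooks and software; this uses the `"inclusive"` method of `statistics.quantiles`, which is an assumption rather than the course's stated rule. The `box_plot_summary` helper is hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, IQR, whisker ends, and points beyond the whiskers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the farthest data points still inside the fences.
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in values if x < lower_fence or x > upper_fence],
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

Here Q1 = 3 and Q3 = 7, so the IQR is 4 and the upper fence is 7 + 6 = 13. The upper whisker stops at 8, and 30 is flagged as a potential outlier.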

22
Q

Robust Estimates

A

Median and IQR are examples, because extreme observations have little effect on their values. The mean and standard deviation are not robust.
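
To see the difference in practice, the following sketch replaces one value in a made-up data set with an extreme observation and compares the four statistics. The data and the `iqr` helper are assumptions, and the quartiles use the `"inclusive"` convention of `statistics.quantiles`.

```python
import statistics


def iqr(values):
    """Q3 - Q1 using the 'inclusive' quartile convention."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # one extreme observation

for name, stat in [
    ("median", statistics.median),
    ("IQR", iqr),
    ("mean", statistics.mean),
    ("std dev", statistics.stdev),
]:
    print(f"{name:8} {stat(typical):10.2f} {stat(with_extreme):10.2f}")
```

The median and IQR are identical for both lists, while the mean jumps from 5 to 104 and the standard deviation grows by roughly two orders of magnitude.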

23
Q

Contingency Table

A

A table that summarizes counts of data for two categorical variables
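
A contingency table can be sketched by counting pairs of levels. The `group` and `outcome` variables, the case records, and the `contingency_table` helper are hypothetical.

```python
from collections import Counter


def contingency_table(cases, row_var, col_var):
    """Count how many cases fall in each (row level, column level) cell."""
    return Counter((case[row_var], case[col_var]) for case in cases)


# Made-up cases with two categorical variables.
cases = [
    {"group": "treatment", "outcome": "improved"},
    {"group": "treatment", "outcome": "improved"},
    {"group": "treatment", "outcome": "no change"},
    {"group": "control", "outcome": "improved"},
    {"group": "control", "outcome": "no change"},
    {"group": "control", "outcome": "no change"},
]
table = contingency_table(cases, "group", "outcome")
for (group, outcome), count in sorted(table.items()):
    print(f"{group:10} {outcome:10} {count}")
```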

24
Q

(Relative) Frequency Table

A

A table that summarizes counts (or, for a relative frequency table, percentages) of data for one variable
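
For a single variable, the counts and their relative frequencies could be tabulated as in this sketch. The medal data and the `frequency_table` helper are made up.

```python
from collections import Counter


def frequency_table(values):
    """Map each level to (count, proportion of all observations)."""
    counts = Counter(values)
    total = len(values)
    return {level: (count, count / total) for level, count in counts.items()}


medals = ["gold", "silver", "bronze", "bronze",
          "gold", "bronze", "silver", "bronze"]
table = frequency_table(medals)
for level, (count, proportion) in table.items():
    print(f"{level:8} {count:3} {proportion:6.1%}")
```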