Week 1 & 2 Flashcards

1
Q

Why do we need to know data analysis?

A

There is a problem that needs to be solved, and we need data and analytics to act on it properly

2
Q

What is a population?

A

All entities of interest in a study

3
Q

What is a sample?

A

A subset or portion of the population that is randomly chosen

4
Q

What is a dataset?

A

A table of data with one variable per column (listed across the top) and one observation per row (listed down the side)

5
Q

What are some examples of variables?

A

height, gender, income

6
Q

What are some data types?

A

Numeric vs categorical; Ordinal vs nominal

7
Q

What is numeric?

A

A variable on which meaningful arithmetic can be performed

8
Q

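What is categorical?

A

Any variable that is not numeric: its values are category labels, so arithmetic on them is not meaningful (even when the labels are coded as numbers)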

9
Q

What is ordinal?

A

There is a natural ordering of categories

10
Q

What is nominal?

A

No natural ordering

11
Q

What is a binary decision?

A

A 0/1 (yes/no) variable. A categorical variable with n different categories can be represented by n-1 such 0/1 variables
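
One reading of the "n categories, n-1" note is dummy coding: each category except a chosen baseline gets its own 0/1 column. A minimal sketch under that assumption; the data, the `dummy_code` helper, and the `is_` column names are hypothetical, not from the course:

```python
def dummy_code(values, baseline):
    """Encode a categorical column as 0/1 columns, one per non-baseline category."""
    categories = sorted(set(values) - {baseline})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

# Hypothetical data with n = 3 categories, so n - 1 = 2 columns are produced.
colors = ["red", "green", "blue", "green"]
rows = dummy_code(colors, baseline="blue")

for color, row in zip(colors, rows):
    print(color, row)
```

The baseline category ("blue" here) is the row where every 0/1 column is 0.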

12
Q

What is binning or discretizing?

A

Grouping the values of a numeric variable into a small number of discrete categories (ranges rather than specific values)
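
A small sketch of binning, assuming made-up age cut points and labels (neither comes from the course material):

```python
from bisect import bisect_right

# Hypothetical cut points and labels, chosen only for illustration.
EDGES = [18, 35, 65]
LABELS = ["under 18", "18-34", "35-64", "65+"]

def bin_age(age):
    """Return the label of the bin that a numeric age falls into."""
    return LABELS[bisect_right(EDGES, age)]

ages = [12, 18, 34, 35, 70]
binned = [bin_age(a) for a in ages]
print(binned)
```

The numeric variable (age) becomes a categorical one, and because the bins have a natural order the result is ordinal.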

13
Q

What are some more data types?

A

Discrete vs continuous; Cross sectional vs time series

14
Q

What is discrete?

A

Count data (e.g. # of children)

15
Q

What is continuous?

A

A measurement that can take any value within a range, like weight

16
Q

What is cross sectional?

A

Data on a cross section of a population at a FIXED point in time

17
Q

What is time series?

A

Data that are collected over time

18
Q

What is an outlier?

A

An observation that lies outside of the norm (doesn’t mean it’s wrong)

19
Q

What are missing values?

A

The value of a variable is missing for one or more observations

20
Q

What to do with missing values?

A

Ignore the affected observations, fill in the variable's average value, or estimate the missing value
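
The first two options can be sketched as follows, assuming a made-up column where `None` marks a missing value:

```python
from statistics import mean

# Hypothetical data; None marks a missing value.
scores = [50, None, 60, 40, None]

# Option 1: ignore the observations with a missing value.
observed = [x for x in scores if x is not None]

# Option 2: replace each missing value with the average of the observed ones.
avg = mean(observed)
filled = [avg if x is None else x for x in scores]

print(observed)
print(filled)
```

Estimating the missing value from other variables is the third option and is not shown here.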

21
Q

What to do with an outlier?

A

Run the analysis with and without the outlier and report both results
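
A minimal sketch of reporting both ways, assuming made-up sales figures in which 95 is the value treated as the outlier:

```python
from statistics import mean, median

def summarize(data):
    """Return a few summary measures for a list of numbers."""
    return {"n": len(data), "mean": mean(data), "median": median(data)}

# Hypothetical data; 95 is flagged as the outlier for this illustration.
sales = [20, 22, 19, 21, 23, 95]
OUTLIER = 95

with_outlier = summarize(sales)
without_outlier = summarize([x for x in sales if x != OUTLIER])

print("with:   ", with_outlier)
print("without:", without_outlier)
```

In this example the mean moves much more than the median when the outlier is dropped, which is why both versions are worth reporting.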

22
Q

What is the most useful numeric summary measure?

A

Correlation

23
Q

What is the most useful graph?

A

Scatter plot

24
Q

What is the tool to compare numerical variables across two or more subpopulations?

A

Side-by-Side Boxplots

25
Q

Tools to study relationships among numeric variables?

A

Scatterplot, correlation, and covariance

26
Q

What is a scatterplot?

A

A 2D graph that plots paired values of 2 numerical variables, often used to examine their relationship (e.g. temperature and sales)

27
Q

What are correlation and covariance?

A

Measures of the strength and direction of a LINEAR relationship between 2 numerical variables: X & Y
Note:
X & Y should be paired variables
Xi and Yi are the values for observation i
n: Number of observations
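
The card names Xi, Yi, and n but gives no formula. For reference, these are the standard sample definitions, assuming the n-1 divisor (the course may use n instead); the means \bar{X}, \bar{Y} and standard deviations s_X, s_Y are notation added here:

```latex
\[
\operatorname{Cov}(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1},
\qquad
\operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)}{s_X \, s_Y}
\]
```

Covariance depends on the units of X and Y, while correlation is unit-free and always lies between -1 and 1.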

28
Q

What is a perfect positive correlation?

A

A scatterplot with an upward trend in which the points fall exactly on a straight line (Value = 1)

29
Q

What is a perfect negative correlation?

A

A scatterplot with a DOWNWARD trend in which the points fall exactly on a straight line (Value = -1)

30
Q

What is NO CORRELATION, and what is its value?

A

A scatterplot in which the points are spread out with no linear trend. Value = 0
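
The three cases (1, -1, 0) can be checked numerically. A sketch with made-up data; the `correlation` helper is written out from the usual definition rather than taken from the course:

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation of paired lists xs and ys (same length, not constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
up = correlation(x, [2, 4, 6, 8, 10])    # points exactly on a rising line
down = correlation(x, [10, 8, 6, 4, 2])  # points exactly on a falling line
flat = correlation(x, [3, 1, 2, 1, 3])   # no linear trend

print(up, down, flat)
```

A value of 0 only rules out a linear trend; the third data set still has a visible non-linear pattern.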