Organizing, Visualizing, and Describing Data Flashcards

1
Q

Data (Definition)

A

A collection of numbers, characters, words, or text that represents FACTS or INFORMATION but NOT KNOWLEDGE (knowledge develops only when the facts and information are analyzed and interpreted)

2
Q

What are the two main types of Data?

A

Numerical (quantitative) and Categorical (qualitative)

3
Q

What is the definition of Categorical data?

A

Values that describe a quality or characteristic (mutually exclusive labels or groups)

4
Q

What are the two types of Categorical data?

A

Nominal and Ordinal

5
Q

What is the definition of nominal data?

A

Categorical values that have no logical order (e.g. sectors of the economy)

6
Q

What is the definition of ordinal data?

A

Has logical order or rank (note that there is no information in the distance between groups)

7
Q

What is the definition of Numerical data?

A

Values that represent measured or counted quantities

8
Q

What are the two types of Numerical data?

A

Integer/Discrete - limited to a finite or countable number of values (e.g. number of people)
Ratio/Continuous - can take on any value within a range

9
Q

What does NOIR stand for and how does it relate to data types?

A

Nominal
Ordinal
Integer/Discrete
Ratio/Continuous

10
Q

Define variable

A

A particular quality or characteristic that can be measured or categorized (e.g. stock price, height)

11
Q

Define observation

A

The value of a specific variable for one observational unit (e.g. GM's stock price is $53.30; Trish's height is 5’9”)

12
Q

Define cross-sectional data

A

Multiple observations of a particular variable across many observational units at the same point in time (e.g. the stock prices of 60 companies)

13
Q

Define time series

A

Multiple observations of a particular variable for the same observational unit over time: one unit, many observations (e.g. GM’s stock price over the last 60 months)

14
Q

Define panel data set

A

Cross-sectional and time-series data combined: multiple observational units, each observed over multiple time periods

15
Q

Define structured data

A

Highly organized in a pre-defined manner (stock prices, returns, EPS)

16
Q

Define unstructured data

A

Data with no pre-defined organized form (e.g. news, social media posts, company filings, audio/video)

17
Q

Define absolute frequency

A

the actual count of observations per value of the variable

18
Q

Define relative frequency

A

The percentage of observations per value of the variable (the absolute frequency divided by the total number of observations, N)
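
A minimal sketch of both frequency measures, assuming a made-up list of sector labels (the data and variable names are illustrative only, not from the course material):

```python
from collections import Counter

# Hypothetical sample: sector labels for ten stocks (made up for illustration).
sectors = ["Tech", "Energy", "Tech", "Health", "Tech",
           "Energy", "Health", "Tech", "Energy", "Tech"]

absolute = Counter(sectors)   # absolute frequency: actual count per value
n = len(sectors)              # total number of observations (N)

# Relative frequency: absolute frequency divided by N.
relative = {value: count / n for value, count in absolute.items()}

for value in sorted(absolute):
    print(f"{value:<8}{absolute[value]:>3}{relative[value]:>8.0%}")
```

The relative frequencies always sum to 1 (100%), which is a quick sanity check on any frequency table.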

19
Q

How do you create non-overlapping bins?

A

Sort the data in ascending order
Find the range: max - min
Decide on the number of intervals (k)
Calculate the interval width by dividing the range by k (always round up)
Add the interval width to the minimum value to get the end of the first bin, then keep adding it to get each later bin
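
The steps above can be sketched as follows. The observations and the choice of k = 4 are assumptions made up for illustration, and treating each bin as closed on the left and open on the right is one common convention, not the only one:

```python
import math

# Hypothetical observations, made up for illustration.
data = [12, 3, 7, 25, 18, 9, 30, 14, 21, 5]
k = 4  # chosen number of intervals

values = sorted(data)                   # step 1: ascending order
data_range = values[-1] - values[0]     # step 2: max - min
width = math.ceil(data_range / k)       # steps 3-4: range / k, rounded up

# Step 5: bin edges, starting at the minimum and adding the width each time.
edges = [values[0] + i * width for i in range(k + 1)]

# Count observations per bin, treating each bin as [lower, upper).
# The min() keeps the maximum in the last bin when the range divides evenly by k.
counts = [0] * k
for x in values:
    index = min((x - values[0]) // width, k - 1)
    counts[index] += 1

for i in range(k):
    print(f"{edges[i]:>3} to {edges[i + 1]:>3}: {counts[i]}")
```

Rounding the width up guarantees that the k bins together cover the whole range, so no observation falls outside the last bin.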

20
Q

What is a contingency table?

A

A table that summarizes the joint frequencies of two or more categorical variables, which helps reveal patterns between them
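
As an illustration, a two-variable contingency table can be built by counting pairs of labels. The (sector, rating) pairs below are hypothetical:

```python
from collections import Counter

# Hypothetical (sector, rating) pairs, made up for illustration.
observations = [
    ("Tech", "Buy"), ("Tech", "Hold"), ("Energy", "Sell"),
    ("Tech", "Buy"), ("Energy", "Hold"), ("Energy", "Sell"),
    ("Health", "Buy"), ("Health", "Buy"),
]

cell_counts = Counter(observations)   # joint frequency of each pair
rows = sorted({sector for sector, _ in observations})
cols = sorted({rating for _, rating in observations})

# Print the table: one row per sector, one column per rating.
print(f"{'':<8}" + "".join(f"{c:>6}" for c in cols))
for r in rows:
    print(f"{r:<8}" + "".join(f"{cell_counts[(r, c)]:>6}" for c in cols))
```

A Counter returns 0 for pairs that never occur, so empty cells need no special handling.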

21
Q

What does a histogram or frequency polygon show?

A

The distribution of numerical data (the x-axis shows the intervals/values and the y-axis shows the frequency of observations in each)

22
Q

What does a bar chart show?

A

The frequency distribution of categorical data

23
Q

What does a tree map show?

A

A set of coloured rectangles that represent groups, with the area of each rectangle proportional to the value of its group

24
Q

What is a line chart used for?

A

Used to visualize ordered observations
Typically used for time series data
Facilitates showing changes and underlying trends

25
Q

What is a scatter plot used for?

A

Used to visualize the joint variation in two numerical variables

26
Q

What is a heat map?

A

It is a contingency table with color-coded cells

It can also be used to visualize the degree of correlation among different variables

27
Q

Define “measures of central tendency”

A

Measures of central tendency specify where data are centered (arithmetic mean, median, mode, weighted mean, geometric mean, harmonic mean)

28
Q

Define “measures of location”

A

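Measures that describe where data are located in a distribution: the measures of central tendency plus quantiles such as quartiles, quintiles, deciles, and percentiles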

29
Q

Define median

A

Median is the middlemost value of a set of observations.
It is not affected by extreme values (i.e. outliers).
It is useful for describing the central tendency of a non-symmetrical distribution.
If the distribution is perfectly symmetrical, then the mean equals the median.

30
Q

How do you calculate median?

A

Sort the observations first.

For an odd number of observations: the value at position (n+1)/2

For an even number of observations: the average of the values at positions n/2 and (n+2)/2
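
A minimal sketch of the median calculation: sort, then take the middle value when n is odd, or average the two middle values when n is even. The function name and sample values are illustrative:

```python
def median(values):
    """Return the middle value of the sorted observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is index mid in 0-based terms.
        return ordered[mid]
    # Even n: average the values at 1-based positions n/2 and (n + 2)/2.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 9]))      # odd number of observations
print(median([4, 1, 9, 7]))   # even number of observations
```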

31
Q

Define mode

A

The most frequently occurring value in a distribution
A distribution can have one mode, several modes, or no mode; there is no mode when every value occurs equally often.
This is the only measure of central tendency that can be used with nominal data
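
A short sketch using `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value tied for the highest frequency. The sample data are made up:

```python
from statistics import multimode

# Nominal data: the mode is still well defined because it only needs counts.
sectors = ["Tech", "Energy", "Tech", "Health", "Energy", "Tech"]
one_mode = multimode(sectors)

# Two values tie for most frequent, so there are two modes (bimodal).
two_modes = multimode([1, 2, 2, 3, 3])

# Every value occurs once: all values tie, so no single mode stands out.
all_tied = multimode([1, 2, 3])

print(one_mode, two_modes, all_tied)
```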

32
Q

When is geometric mean used?

A

Used with rates of change over time or to compute growth rates

33
Q

Is the arithmetic mean always greater than the geometric mean?

A

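Not quite: it is always greater than or equal to the geometric mean. The two are equal only when every observation is the same.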

34
Q

What is the formula for geometric mean?

A

( ( (1 + R1) x (1 + R2) x ... x (1 + RN) ) ^ (1 / N) ) - 1, where R1 to RN are the N periodic returns
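
A minimal sketch of the formula for any number of periods, assuming returns are given as decimals (0.10 for 10%). The function name and the three sample returns are hypothetical:

```python
import math


def geometric_mean_return(returns):
    """Compound the (1 + R) factors, take the N-th root, subtract 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


# Hypothetical annual returns: +10%, -5%, +20%.
returns = [0.10, -0.05, 0.20]
g = geometric_mean_return(returns)
print(f"{g:.4%}")
```

Compounding the result over the same number of periods reproduces the total growth, which is why this mean suits growth rates.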

35
Q

How are geometric mean and arithmetic mean related?

A

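Xg ≈ Xa - Variance/2 (an approximation: the geometric mean return is roughly the arithmetic mean return minus half the variance of the returns)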
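
The relationship can be checked numerically. This sketch assumes three made-up returns and uses the population variance (`statistics.pvariance`); that variance convention is an assumption here, not something the card specifies:

```python
import math
from statistics import mean, pvariance

# Hypothetical annual returns, made up for illustration.
returns = [0.10, -0.05, 0.20]

arithmetic = mean(returns)
geometric = math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1

# Approximation: geometric mean is about arithmetic mean minus variance / 2.
approximation = arithmetic - pvariance(returns) / 2

print(f"arithmetic    {arithmetic:.4%}")
print(f"geometric     {geometric:.4%}")
print(f"approximation {approximation:.4%}")
```

The more the returns vary, the larger the gap between the arithmetic and geometric means.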

36
Q

What is an advantage of the harmonic mean?

A

It gives much less weight to extremely large values (high outliers) than the arithmetic mean does

37
Q

When is it appropriate to use the harmonic mean?

A

It is appropriate for averaging ratios when the ratios are repeatedly applied to a fixed quantity to yield a variable number of units (e.g. dollar-cost averaging, where a fixed amount buys a varying number of shares)

38
Q

What is the formula for harmonic mean?

A

n / sum of all (1 / Xi)
n = number of observations and
Xi = the specific value for each observation.
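
A minimal sketch of the formula applied to dollar-cost averaging. The prices and the fixed budget per purchase are hypothetical:

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals 1 / Xi."""
    return len(values) / sum(1 / x for x in values)


# Hypothetical share prices on two purchase dates.
prices = [10.0, 20.0]
budget = 1000.0   # same amount invested at each price

# Dollar-cost averaging: a fixed budget buys a varying number of shares.
total_shares = sum(budget / p for p in prices)
total_spent = budget * len(prices)
average_cost = total_spent / total_shares

print(harmonic_mean(prices), average_cost)
```

The average cost per share equals the harmonic mean of the prices and is lower than their arithmetic mean, because more shares are bought when the price is low.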