AFM 112 - Chp 3 Flashcards

1
Q

Define a frequency table

A

number of observations in each class/category of the data

2
Q

define a relative frequency table

A

percentage of observations that fall in each class/category (frequency/total number of observations)
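A minimal Python sketch of both tables, assuming a small made-up list of categories (the `payment_methods` data and the function names are illustrative, not from the course material):

```python
from collections import Counter

# Hypothetical categorical observations, made up for illustration.
payment_methods = ["cash", "credit", "credit", "debit", "cash", "credit"]

def frequency_table(observations):
    """Return {category: number of observations in that category}."""
    return dict(Counter(observations))

def relative_frequency_table(observations):
    """Return {category: frequency / total number of observations}."""
    total = len(observations)
    return {cat: n / total for cat, n in Counter(observations).items()}

freq = frequency_table(payment_methods)
rel_freq = relative_frequency_table(payment_methods)
```

The relative frequencies should always add up to 1 (100%), which is a quick sanity check on the table.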

3
Q

when to use a bar graph vs pie chart

A

bar graph - each class/category is represented by a bar and the height of each bar equals the frequency or relative frequency of the class/category (more specific)

pie chart - each class/category is represented by a slice and the size of each slice is proportional to the relative frequency of the class/category (more general)

4
Q

What do we use to summarize + describe quantitative data

A

measures of central tendency + measures of variability (dispersion)

5
Q

define measures of central tendency

A

capture the tendency of the data to cluster around some central values

6
Q

define measure of variability

A

captures the spread of data

7
Q

what’s the use of a histogram in graphing?

A

to capture the shape of distribution

8
Q

what are the histogram distributions described as? and define them

A
  1. symmetric - looks the same on both sides from the centre
  2. skewed left - majority of the data is on the right but there is a little bit of data on the left
  3. skewed right - majority of the data is on the left but there is a little bit of data on the right
  4. bimodal - 2 spikes
  5. multimodal - multiple spikes

9
Q

how do we use mean + median to predict the shape of distribution

A
  1. if mean = median, distribution = likely symmetric
  2. if mean > median, distribution = likely skewed to the right. if the difference is significant, there are outliers at the upper end (right side)
  3. if mean < median, distribution = likely skewed to the left. if the difference is significant, there are outliers at the lower end (left side)
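The three rules above can be sketched in Python. This is only a rough illustration: the function name `likely_shape` and the optional `tolerance` parameter (how close mean and median must be to count as "equal") are assumptions, not part of the course material.

```python
from statistics import mean, median

def likely_shape(values, tolerance=0.0):
    """Guess the shape of a distribution by comparing mean and median."""
    m, med = mean(values), median(values)
    if abs(m - med) <= tolerance:
        return "likely symmetric"
    if m > med:
        return "likely skewed right"  # large values pull the mean up
    return "likely skewed left"       # small values pull the mean down

# Hypothetical data sets for illustration.
symmetric = likely_shape([1, 2, 3, 4, 5])    # mean 3, median 3
right = likely_shape([1, 2, 3, 4, 50])       # mean 12, median 3
left = likely_shape([-50, 2, 3, 4, 5])       # mean -7.2, median 3
```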
10
Q

define the interquartile range

A

distance between the first and third quartile
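A short Python sketch of the calculation, assuming made-up data. It uses `statistics.quantiles` with `method="inclusive"`, which interpolates between data points in the same way as a spreadsheet's inclusive percentile function; treating that as the intended quartile method is an assumption.

```python
from statistics import quantiles

# Hypothetical observations, made up for illustration.
values = [4, 8, 15, 16, 23, 42]

# n=4 splits the data into quartiles and returns the cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")

# Interquartile range: distance between the first and third quartile.
iqr = q3 - q1
```

Here Q1 is 9.75 and Q3 is 21.25, so the interquartile range is 11.5.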

11
Q

what’s the spreadsheet formula for first and third quartile?

A

1st quartile: =PERCENTILE(E1:E39, 0.25)

3rd quartile: =PERCENTILE(E1:E39, 0.75)

12
Q

define variance

A

average squared deviation from the mean

13
Q

define standard deviation

A

square root of the variance - higher the value, higher the variability
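A hedged Python sketch of variance and standard deviation on made-up data. It divides by n (population version) to match the "average squared deviation" wording; the sample version divides by n - 1 instead, and which one the course expects is an assumption to check.

```python
from math import sqrt
from statistics import variance

# Hypothetical observations, made up for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def population_variance(values):
    """Average squared deviation from the mean (divides by n)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

pop_var = population_variance(data)   # squared deviations sum to 32, 32 / 8 = 4.0
pop_sd = sqrt(pop_var)                # square root of the variance = 2.0

# Sample variance divides by n - 1, so it is slightly larger.
sample_var = variance(data)           # 32 / 7
```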

14
Q

what are the 3 assumptions we can make if the shape of the distribution is bell shaped and symmetric

A
  1. about 68% of the observations will fall within 1 standard deviation of the mean (range = x̄ - 1s to x̄ + 1s)
  2. about 95% of the observations will fall within 2 standard deviations of the mean (range = x̄ - 2s to x̄ + 2s)
  3. about 99.7% of the observations will fall within 3 standard deviations of the mean (range = x̄ - 3s to x̄ + 3s)
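The three ranges can be worked out with a small Python helper. The function name and the example mean of 100 with standard deviation 15 are made up for illustration.

```python
def empirical_rule_ranges(mean, sd):
    """Return the 1, 2 and 3 standard deviation ranges around the mean."""
    shares = {1: "68%", 2: "95%", 3: "99.7%"}
    return {share: (mean - k * sd, mean + k * sd) for k, share in shares.items()}

# Hypothetical bell-shaped data with mean 100 and standard deviation 15.
ranges = empirical_rule_ranges(100, 15)
```

For this example, about 68% of observations are expected between 85 and 115, about 95% between 70 and 130, and about 99.7% between 55 and 145.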
15
Q

What’s the importance of data understanding?

A

structure of the data + data captured in each variable

16
Q

define accuracy/consistency in terms of data quality

A

data set = accurate + consistent if it is as free as possible from intentional/unintentional errors

data values = accurate if they capture what the decision maker would consider as the actual value

data values = consistent if they do not change across occurrences

17
Q

Define timeliness in data quality context

A

data set is timely if it contains information that is time-relevant to the business problem it will be used to address

18
Q

define completeness in terms of data quality

A

data set is complete if all the data points needed to capture a transaction are available

19
Q

define outliers + the importance

A

observations in a data set far away from the bulk of the observations

outliers are important as they may indicate various info - eg. potential fraud

20
Q

what’s the rule of distribution for detecting outliers in a bell shaped distribution

A

any value which is more than 3 standard deviations above or below the mean.
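A rough Python sketch of this rule, assuming made-up data and the sample standard deviation (`statistics.stdev`); the function name is illustrative. Note that in very small data sets no value can be 3 standard deviations away from the mean, so the rule needs a reasonable number of observations.

```python
from statistics import mean, stdev

def three_sd_outliers(values):
    """Return values more than 3 standard deviations above or below the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (an assumption)
    lower, upper = m - 3 * s, m + 3 * s
    return [x for x in values if x < lower or x > upper]

# Hypothetical readings: 20 values near 50 plus one extreme value.
readings = [48, 49, 50, 51, 52] * 4 + [120]
sd_flagged = three_sd_outliers(readings)
```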

21
Q

what’s the rule for detecting outliers in any distribution

A

any value which is above the end of the upper whisker (Q3 + 1.5*IQR) or below the end of the lower whisker (Q1 - 1.5*IQR)
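The whisker rule can be sketched in Python as follows. The data and function name are made up, and the quartiles use the inclusive interpolation method, which is an assumption about how the course computes them.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr   # end of the lower whisker
    upper = q3 + 1.5 * iqr   # end of the upper whisker
    return [x for x in values if x < lower or x > upper]

# Hypothetical transaction amounts with one extreme value.
amounts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
iqr_flagged = iqr_outliers(amounts)
```

For this data Q1 = 3.25 and Q3 = 7.75, so the whiskers end at -3.5 and 14.5 and only 100 is flagged.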

22
Q

what are the 3 count functions in Excel and the differences?

A

COUNT - counts cells that contain numerical data

COUNTA - counts cells that contain any type of data (non-empty cells)

COUNTBLANK - counts cells that are blank
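A loose Python analogue of the three counts, useful for seeing how they relate. The `cells` list is made up, `None` stands in for a blank cell, and the function names are illustrative; spreadsheet edge cases such as formulas that return empty text are not modelled.

```python
# Hypothetical spreadsheet column; None represents a blank cell.
cells = [12, "abc", None, 3.5, "n/a", 7, None]

def count_numeric(column):
    """Like COUNT: cells holding numbers only."""
    return sum(isinstance(c, (int, float)) and not isinstance(c, bool) for c in column)

def count_nonblank(column):
    """Like COUNTA: cells holding any kind of data."""
    return sum(c is not None for c in column)

def count_blank(column):
    """Like COUNTBLANK: cells holding nothing."""
    return sum(c is None for c in column)

n_numeric = count_numeric(cells)
n_nonblank = count_nonblank(cells)
n_blank = count_blank(cells)
```

The non-blank and blank counts together equal the number of cells, so a non-zero blank count is a quick signal of missing values.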

23
Q

What are some issues with data quality?

A

missing values (detected using count functions), erroneous data

24
Q

what’s the function to categorize data?

A

unique function

25
Q

what’s the use and importance of filter function?

A

=FILTER(range, condition1, [condition2, ...]) - returns only the rows of the range that meet the condition(s)

26
Q

sort, filter, unique function

A
27
Q

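cleaning data functions - time based - month function, text function, year function

A

TEXT format codes: "mmm" (abbreviated month name), "mmmm" (full month name), "ddd" (abbreviated day name), "dddd" (full day name)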
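For comparison, Python's `strftime` has equivalent codes for these four formats. This is only an analogue of the spreadsheet behaviour: the date is made up, and the English names assume the default (C) locale.

```python
from datetime import date

# Hypothetical transaction date, made up for illustration.
d = date(2024, 3, 15)

month_short = d.strftime("%b")   # like "mmm"
month_full = d.strftime("%B")    # like "mmmm"
day_short = d.strftime("%a")     # like "ddd"
day_full = d.strftime("%A")      # like "dddd"

# Counterparts of the MONTH and YEAR functions.
month_number = d.month
year_number = d.year
```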

28
Q

What’s the spreadsheet formula to find the interquartile range?

A

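=PERCENTILE(E1:E39, 0.75) - PERCENTILE(E1:E39, 0.25) (3rd quartile minus 1st quartile, using the same example range E1:E39)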

29
Q

vlookup function

A
30
Q

sumif, countif, averageif function

A
31
Q

nested if functions

A
32
Q

round function

A
33
Q

text function

A
34
Q

datedif, today, now function

A
35
Q

proper, lower, upper, trim, concatenate function

A
36
Q

len, search, left, right, substitute function

A