AFM 112 - Chp 3 Flashcards

1
Q

Define a frequency table

A

the number of observations in each class/category of the data

2
Q

define a relative frequency table

A

percentage of observations that fall in each class/category (frequency/total number of observations)
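
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the `payments` list and the `frequency_table` helper are hypothetical, not part of the course material.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(values)
    total = len(values)
    # relative frequency = frequency / total number of observations
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical sample data: payment method for eight transactions
payments = ["cash", "card", "card", "cheque", "card", "cash", "card", "cash"]
table = frequency_table(payments)

for category, (freq, rel) in table.items():
    print(category, freq, rel)
```

The relative frequencies always sum to 1 (100%), which is a quick check that the table is complete.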

3
Q

when to use a bar graph vs pie chart

A

bar graph - each category is represented by a bar, with the height of the bar equal to the frequency or relative frequency of the class/category (more specific)

pie chart - each class/category is represented by a slice, with the size of each slice proportional to the relative frequency of the class/category (more general)

4
Q

What do we use to summarize + describe quantitative data

A

measures of central tendency + measures of variability (dispersion)

5
Q

define measures of central tendency

A

capture the tendency of the data to cluster around some central values

6
Q

define measure of variability

A

captures the spread of data

7
Q

what’s the use of a histogram in graphing?

A

to capture the shape of the distribution

8
Q

what are the histogram distributions described as? and define them

A
  1. symmetric - looks the same on both sides from the centre
  2. skewed left - majority of the data is on the right but there is a little bit of data on the left
  3. skewed right - majority of the data is on the left but there is a little bit of data on the right
  4. bimodal - 2 spikes
  5. multimodal - multiple spikes

9
Q

how do we use mean + median to predict the shape of distribution

A
  1. if mean = median, distribution = likely symmetric
  2. if mean > median, distribution = likely skewed to the right. if the difference is significant, it indicates there are outliers at the upper end (right side)
  3. if mean < median, distribution = likely skewed to the left. if the difference is significant, it indicates there are outliers at the lower end (left side)
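
A minimal Python sketch of the three rules above. The `likely_shape` helper and its `tolerance` cutoff are assumptions made for illustration; the rules themselves do not say how close the mean and median must be to count as equal.

```python
from statistics import mean, median

def likely_shape(data, tolerance=0.05):
    """Guess the shape of a distribution from mean vs. median.

    tolerance is a hypothetical cutoff: a gap smaller than
    tolerance * (max - min) is treated as "mean = median".
    """
    m, med = mean(data), median(data)
    spread = max(data) - min(data)
    if spread == 0 or abs(m - med) <= tolerance * spread:
        return "likely symmetric"
    return "likely skewed right" if m > med else "likely skewed left"

symmetric = likely_shape([1, 2, 3, 4, 5])             # mean 3, median 3
right = likely_shape([1, 2, 2, 3, 3, 4, 40])          # 40 pulls the mean up
left = likely_shape([-40, 1, 2, 2, 3, 3, 4])          # -40 pulls the mean down
print(symmetric, right, left, sep="\n")
```

The single extreme value in each skewed example moves the mean a lot but barely moves the median, which is why a large gap hints at outliers.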
10
Q

define the interquartile range

A

distance between the first and third quartile

11
Q

what’s the spreadsheet formula for first and third quartile?

A

1st quartile: =PERCENTILE(E1:E39, 0.25)

3rd quartile: =PERCENTILE(E1:E39, 0.75)
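
For comparison, a rough Python equivalent using only the standard library. The `data` list is hypothetical and stands in for the spreadsheet range; `method="inclusive"` is chosen on the assumption that it interpolates the same way as the spreadsheet PERCENTILE function.

```python
from statistics import quantiles

# Hypothetical data standing in for the spreadsheet range E1:E39
data = [2, 4, 4, 5, 7, 8, 9, 11, 12]

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Interquartile range = distance between the first and third quartile
iqr = q3 - q1
print(q1, q3, iqr)
```

Subtracting the two quartiles gives the interquartile range in a single step.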

12
Q

define variance

A

average squared deviation from the mean

13
Q

define standard deviation

A

square root of the variance - the higher the value, the higher the variability
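
A short worked example of variance and standard deviation in Python. The `data` values are hypothetical. Note the assumption about the divisor: dividing by n gives the population variance, while many courses and spreadsheet functions divide by n - 1 for a sample, so both are shown.

```python
from math import sqrt
from statistics import mean, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

x_bar = mean(data)
squared_devs = [(x - x_bar) ** 2 for x in data]

pop_var = sum(squared_devs) / len(data)         # divide by n
samp_var = sum(squared_devs) / (len(data) - 1)  # divide by n - 1

pop_sd = sqrt(pop_var)    # standard deviation = square root of variance
samp_sd = sqrt(samp_var)
print(pop_var, pop_sd, samp_var, samp_sd)
```

Here the mean is 5 and the squared deviations sum to 32, so the population variance is 4 and the population standard deviation is 2.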

14
Q

what are the 3 assumptions we can make if the shape of the distribution is bell shaped and symmetric

A
  1. about 68% of the observations will fall within 1 standard deviation of the mean (range = x-1s to x+1s)
  2. about 95% of the observations will fall within 2 standard deviations of the mean (range = x-2s to x+2s)
  3. about 99.7% of the observations will fall within 3 standard deviations of the mean (range = x-3s to x+3s)
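
The three ranges above can be computed directly once the mean and standard deviation are known. A minimal sketch, where the `empirical_rule_ranges` helper and the example values (mean 100, standard deviation 15) are hypothetical:

```python
def empirical_rule_ranges(x_bar, s):
    """Map each share of observations to its (low, high) range."""
    shares = {1: "68%", 2: "95%", 3: "99.7%"}
    # k standard deviations either side of the mean
    return {shares[k]: (x_bar - k * s, x_bar + k * s) for k in (1, 2, 3)}

ranges = empirical_rule_ranges(x_bar=100, s=15)
for share, (low, high) in ranges.items():
    print(share, low, high)
```

With these values, about 95% of observations would be expected between 70 and 130, provided the distribution really is bell shaped and symmetric.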
15
Q

What’s the importance of data understanding?

A

to understand the structure of the data + what data is captured in each variable

16
Q

define accuracy/consistency in terms of data quality

A

data set = accurate + consistent if it is as free as possible from intentional/unintentional errors

data values = accurate if they capture what the decision maker would consider as the actual value

data values = consistent if they do not change across occurrences

17
Q

Define timeliness in data quality context

A

data set = timely if it contains info that is time-relevant to the business problem it will be used to address

18
Q

define completeness in terms of data quality

A

data set is complete if all the data points needed to capture a transaction are available

19
Q

define outliers + the importance

A

observations in a data set far away from the bulk of the observations

outliers are important as they may indicate various info - eg. potential fraud

20
Q

what’s the rule of distribution for detecting outliers in a bell shaped distribution

A

any value that is more than 3 standard deviations above or below the mean

21
Q

what’s the rule for detecting outliers in any distribution

A

any value that is above the end of the upper whisker (Q3 + 1.5*IQR) or below the end of the lower whisker (Q1 - 1.5*IQR)
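
A minimal Python sketch of the whisker rule. The `whisker_outliers` helper and the `amounts` data are hypothetical, and the quartiles use `method="inclusive"` on the assumption that this matches the spreadsheet PERCENTILE calculation.

```python
from statistics import quantiles

def whisker_outliers(data):
    """Return values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # end of the lower whisker
    upper = q3 + 1.5 * iqr  # end of the upper whisker
    return [x for x in data if x < lower or x > upper]

# Hypothetical transaction amounts with one unusually large value
amounts = [2, 4, 4, 5, 7, 8, 9, 11, 12, 60]
outliers = whisker_outliers(amounts)
print(outliers)
```

For this data Q1 is 4.25 and Q3 is 10.5, so the whiskers end at -5.125 and 19.875 and only 60 is flagged.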

22
Q

what are the 3 count functions in excel and the differences?

A

COUNT - counts cells that contain numerical data

COUNTA - counts cells that contain data of any type (non-empty cells)

COUNTBLANK - counts cells that are blank
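
To make the difference concrete, here is a rough Python imitation of the three functions. It is a simplified sketch, not the spreadsheet's exact behaviour: the `cells` list is hypothetical, and an empty cell is represented by `None`.

```python
def count(cells):
    """Like COUNT: cells holding numbers only."""
    return sum(
        1 for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )

def counta(cells):
    """Like COUNTA: any non-empty cell, whatever the data type."""
    return sum(1 for c in cells if c is not None)

def countblank(cells):
    """Like COUNTBLANK: empty cells."""
    return sum(1 for c in cells if c is None)

# Hypothetical column: numbers, text, and two empty cells
cells = [12, "n/a", None, 3.5, "text", 7, None]
print(count(cells), counta(cells), countblank(cells))
```

Comparing COUNTA and COUNTBLANK against the number of rows is a quick way to spot missing values.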

23
Q

What are some issues with data quality?

A

missing values (detected using count functions), erroneous data

24
Q

what’s the function to categorize data?

A

unique function

25
what's the use and importance of filter function?
FILTER(range, condition1, [condition2, ...])
26
sort, filter, unique function
27
cleaning data functions - time based - month function, text function, year function
"mmm", "mmmm", "ddd", "dddd"
28
What's the spreadsheet formula to find the interquartile range?
v
29
vlookup function
30
sumif, countif, averageif function
31
nested if functions
32
round function
33
text function
34
datedif, today, now function
35
proper, lower, upper, trim, concatenate function
36
len, search, left, right, substitute function