Descriptive Statistics Fundamentals Flashcards

1
Q

What is Descriptive Statistics?

A

Descriptive statistics refers to a set of methods used to summarize and describe the main features of a dataset, such as its central tendency, variability, and distribution.

These methods provide an overview of the data and help identify patterns and relationships.

2
Q

Do you want to learn the appropriate statistics to perform different tests?

A

Yes - do you know them?

3
Q

What are the 2 main ways to classify data?

A
  1. Types of data
  2. Measurement levels
4
Q

What are the 2 ‘Types of Data’ that you can have?

A
  1. Categorical
  2. Numerical
5
Q

What is an example of Categorical Data?

A

A. Car brands such as Audi, BMW, and Mercedes
B. Answers to yes/no questions
Examples: “Are you currently enrolled in a university?” “Do you own a car?”

6
Q

What is an example of Numerical Data?

A

Numerical data represents numbers. It has two subsets: discrete (for example, a number of objects) and continuous (for example, a person's weight).

7
Q

Is Numerical Data a subset of Types of Data or of Levels of Measurement?

A

Types of Data

8
Q

Types of Data - view

A
  1. Types of Data
    a. Categorical
    b. Numerical
      i. Discrete
      ii. Continuous
  2. Levels of Measurement
9
Q

What are the ‘3 Types of Data’?

A
  1. Categorical
  2. Numerical - Discrete
  3. Numerical - Continuous
10
Q

What are the two subsets of Numerical Data?

A
  1. Discrete
  2. Continuous
11
Q

What is Discrete Data?

A

Data that can be counted in a finite manner. The possible values are separate, countable steps, usually whole numbers. It is the opposite of continuous data.
Examples:
“How many children do you want?”
Scores on the SAT
Grades at university
Number of objects
Money as bank notes and coins

12
Q

What is Continuous data?

A

Continuous data is ‘infinite’ and impossible to count: it can take on an infinite number of values within a range.
Examples:
Your weight
Height
Area
Distance
Time

13
Q

A variable represents the weight of a person. What type of data does it represent?

A

numerical, continuous

14
Q

A variable represents the gender of a person. What type of data does it represent?

A

Categorical

15
Q

What are the 2 “Levels of Measurement”?

A
  1. Qualitative
  2. Quantitative - represented by numbers
16
Q

What are the two types of Qualitative Data?

A
  1. Nominal
  2. Ordinal
17
Q

What are examples of Nominal Data?

A

Categorical data such as car brands or the four seasons (winter, spring, summer, fall).
The categories are not numbers and cannot be put in a meaningful order.
Dictionary definition of “nominal”: (of a role or status) existing in name only.

18
Q

What are examples of Ordinal Data?

A

Groups and categories that follow a strict order. Data that can be ordered.
Example: a Likert scale
Dictionary definition of “ordinal”: relating to a thing’s position in a series, as in “ordinal position of birth”.

19
Q

What are the two groups of Quantitative Data?

A
  1. Interval
  2. Ratio
20
Q

What is unique about Ratio?

A

Ratio variables have a true zero; interval variables do not.
Most things we observe in the world are ratio variables.

21
Q

What are examples of Ratio variables?

A

Number of objects, distance, price and time

22
Q

What is the most common Interval variable?

A

Temperature - on everyday scales it doesn’t have a true zero.
Celsius and Fahrenheit are interval scales: their zero points are arbitrary, so they have no true zero.

Temperature in kelvins is a ratio variable because 0 K is a true zero.

23
Q

A variable represents the gender of a person. What type of data and level of measurement does it represent?

A

Categorical, Qualitative - Nominal
Gender is a nominal variable. The possible categories cannot be put in any order.

24
Q

A variable represents the weight of a person. What type of data and level of measurement does it represent?

A

Numerical (continuous), Quantitative - Ratio
Weight is a ratio variable: a quantitative measure with a true zero point that signifies the absence of the attribute being measured. For weight, zero means no weight at all.

25
Q

What is the most intuitive way to interpret data?

A

Visualization

26
Q

What are some useful ways to visualize categorical variables?

A

a. Frequency distribution tables
b. Bar Charts
c. Pie Charts
d. Pareto diagrams

27
Q

What is a Frequency Distribution Table?

A

A table with two columns: the category and its corresponding frequency.
Frequency is the number of occurrences of each item.

28
Q

What is Relative Frequency?

A

Relative frequency is each category's frequency expressed as a percentage of the total frequency.
Example: the percentage of cars sold by each brand
All relative frequencies add up to 100%.
It reveals each category's share of the total, e.g. market share. A pie chart is a good representation.

29
Q

What is a Pareto Diagram?

A

A Pareto diagram is a special type of bar chart, where categories are shown in descending order of frequency

30
Q

What does Frequency represent?

A

the number of occurrences of each item

31
Q

What is Cumulative Frequency?

A

Cumulative frequency is the running sum of relative frequencies.

It starts as the frequency of the first item and then adds the second item and so on until it finishes at 100%
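
The frequency, relative frequency, and cumulative frequency cards can be tied together in a few lines of Python. This is only a sketch: the function name `frequency_table` and the car sales data are made up for illustration, and the rows are sorted in descending order of frequency, the order a Pareto diagram uses.

```python
from collections import Counter

def frequency_table(observations):
    """Return rows of (category, frequency, relative %, cumulative %),
    sorted by descending frequency."""
    counts = Counter(observations)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category, freq in counts.most_common():
        relative = 100.0 * freq / total   # share of the total, in percent
        cumulative += relative            # running sum of relative frequencies
        rows.append((category, freq, relative, cumulative))
    return rows

# Hypothetical car sales, invented for this example.
sales = ["Audi"] * 5 + ["BMW"] * 3 + ["Mercedes"] * 2
table = frequency_table(sales)
for category, freq, relative, cumulative in table:
    print(f"{category:<10}{freq:>4}{relative:>8.1f}%{cumulative:>8.1f}%")
```

The last row's cumulative value reaches 100%, which is a quick sanity check on any table built this way.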

32
Q

How do you calculate the interval width for a desired number of intervals?

A

Subtract the smallest number from the largest number, then divide by the number of desired intervals:

interval width = (largest number - smallest number) / number of desired intervals
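
As a rough illustration of the interval-width rule, the sketch below computes the width and the resulting interval boundaries. The helper names `interval_width` and `build_intervals` and the data are assumptions made up for this example.

```python
def interval_width(values, n_intervals):
    """(largest number - smallest number) / number of desired intervals."""
    return (max(values) - min(values)) / n_intervals

def build_intervals(values, n_intervals):
    """Return (start, end) pairs; each interval ends where the next one starts."""
    width = interval_width(values, n_intervals)
    start = min(values)
    return [(start + i * width, start + (i + 1) * width)
            for i in range(n_intervals)]

# Hypothetical observations, invented for this example.
data = [1, 12, 25, 38, 47, 59, 66, 73, 88, 101]
width = interval_width(data, 5)       # (101 - 1) / 5
intervals = build_intervals(data, 5)
print(width)
print(intervals)
```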

33
Q

What is the desired number of intervals?

A

Generally 5 to 20

34
Q

If the frequency of a variable is 20 and the total frequency of all variables is 120, what is its relative frequency?

A

20 / 120 ≈ 0.17 (about 17%)

35
Q

What is the most common graph to represent Numerical Data?

A

The Histogram

36
Q

Why do the bars in a histogram touch?

A

To show continuity between the intervals: each interval ends where the next one starts.

37
Q

True or False - Relative Frequency is made up of percentages?

A

True

38
Q

Can histograms have intervals of unequal width?

A

Yes

39
Q

What are two visualization options to represent relationships between two variables?

A
  1. Cross tables
  2. Scatter Plots
40
Q

What are cross tables? What do they best represent?

A

A table that counts the observations for each combination of categories, with totals for each row and column. They best represent relationships between two categorical variables.

A variation is the side by side bar chart
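
A cross table can be sketched with nothing more than a counter over pairs of categories. The function name `cross_table` and the survey responses are hypothetical, loosely based on the yes/no questions used earlier in the deck.

```python
from collections import Counter

def cross_table(pairs):
    """Count observations per (row category, column category) pair
    and return the table plus row and column totals."""
    cell_counts = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    table = {r: {c: cell_counts[(r, c)] for c in col_labels} for r in row_labels}
    row_totals = {r: sum(table[r].values()) for r in row_labels}
    col_totals = {c: sum(table[r][c] for r in row_labels) for c in col_labels}
    return table, row_totals, col_totals

# Hypothetical survey answers: (university enrollment, car ownership).
responses = [
    ("enrolled", "owns car"), ("enrolled", "no car"), ("enrolled", "no car"),
    ("not enrolled", "owns car"), ("not enrolled", "owns car"),
    ("not enrolled", "no car"),
]
table, row_totals, col_totals = cross_table(responses)
print(table)
print(row_totals, col_totals)
```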

41
Q

What is a scatter plot best used for?

A

A scatter plot is used when representing two ‘numerical’ variables.
- It represents the relationship between the two variables
- It is best used to get the main idea of how the data is distributed

42
Q

What is a definition of an ‘Outlier’?

A

Outliers are data points that go against the logic of the whole dataset

43
Q

What are the 3 measures of Central Tendency?

A

Mean, Median, Mode

44
Q

What are some uses of Central Tendency?

A

They give you an idea of how the data in a given dataset is distributed.

The mean is the arithmetic average of all numbers. It is very useful because it indicates the average value in the dataset. However, the mean can be flawed because outliers might impact it significantly.

The median is the value at the 50th percentile of the distribution. It disregards outliers and shows you what is in the middle of the distribution.

The mode is the value that is observed most frequently in the distribution. This gives you an idea about the value that reoccurs most often in the dataset.
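
To see how an outlier affects each measure, here is a small sketch using Python's standard `statistics` module. The two datasets are invented for illustration; the second is the first plus one extreme value.

```python
import statistics

# Hypothetical dataset, and the same dataset with one outlier added.
typical = [1, 2, 3, 3, 4, 5, 10]
with_outlier = typical + [100]

for name, data in [("typical", typical), ("with outlier", with_outlier)]:
    mean = statistics.mean(data)      # arithmetic average
    median = statistics.median(data)  # middle of the sorted values
    mode = statistics.mode(data)      # most frequent value
    print(f"{name}: mean={mean}, median={median}, mode={mode}")

# The mean jumps from 4 to 16, while the median only moves from 3 to 3.5
# and the mode stays at 3.
```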

45
Q

What is Mean also known as?

A

The simple average

Denoted as mu (µ) for Population and x-bar (x̄) in Sample

46
Q

What is the Median?

A

The middle number in the ordered dataset. With an even number of observations, it is the average of the two middle numbers.

47
Q

What is the Mode?

A

The mode is the value that occurs most often.

It can be used in both categorical and numerical data

48
Q

When calculating Mode in a dataset what happens when no number is represented more than once?

A

We say there is no mode.

49
Q

Which Central Tendency measure is best?

A

The measures should be used together rather than independently. There is no best, but using only one is definitely the worst.

50
Q

What is Skewness?

A

Skewness is the most common way to measure asymmetry.

Skewness indicates whether the data is concentrated on one side

51
Q

What is a Positive or Right Skew?

A

When the mean > median.

Data points are concentrated on the left side.
(The outliers are to the right; there is less data on the right.)

52
Q

What kind of Skew happens when the Mean, Median and Mode are equal?

A

Zero or No Skew

The distribution is symmetrical.

53
Q

What is a Negative or Left Skew?

A

When the Mean < Median

The highest point is defined by the mode.

The outliers are to the left
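
The mean-versus-median rule for skew can be checked numerically. The deck does not give a skewness formula, so the `skewness` function below is an assumption: it uses one common definition, the average of the cubed standardized deviations. The datasets are made up.

```python
import statistics

def skewness(values):
    """Moment-based (population) skewness: one common definition,
    assumed here because the cards do not state a formula."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

# Hypothetical data: most values on the left, one outlier on the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]
# Mirror image: most values on the right, one outlier on the left.
left_skewed = [-v for v in right_skewed]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "skewness sign:", "+" if skewness(data) > 0 else "-")
```

In the right-skewed data the mean (3.5) exceeds the median (3); in the mirrored data the relation flips.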

54
Q

Why is Skew important?

A

Skew tells us where the data is situated.

It is the link between Central Tendency and Probability Theory.

55
Q

What are the 3 main measures of Variability?

A
  1. Variance
  2. Standard Deviation
  3. Coefficient of Variation
56
Q

Do you use the same formulas when working with Population Data vs Sample Data?

A

No - different formulas are used

57
Q

What does Variance measure?

A

Variance measures the dispersion of a set of data points around their mean.

The closer the data points are to the mean, the lower the variance.
The farther the data points are from the mean, the higher the variance.

Variance can never be negative: dispersion is about distance, and distance cannot be negative.

  • Because the differences are squared, the result is in squared units, so it tends to be large and hard to compare.
58
Q

Which is more meaningful, Std Dev or Variance?

A

Std dev will be much more meaningful than variance

59
Q

Are there different formulas for Std Deviation?

A

Yes, one for population data and one for sample data

60
Q

What are the formulas for Standard deviation?

A

Population = square root of the population variance
Sample = square root of the sample variance
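
The cards say the population and sample formulas differ but do not spell them out. The sketch below assumes the standard convention: the sum of squared differences from the mean is divided by n for a population and by n - 1 for a sample. The function names and data are illustrative only.

```python
import math

def variance(values, sample=True):
    """Average squared distance from the mean.
    Assumed convention: divide by n - 1 for a sample, by n for a population."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = sum((v - mean) ** 2 for v in values)
    return squared_diffs / (n - 1 if sample else n)

def std_dev(values, sample=True):
    """Square root of the matching variance, back in the original units."""
    return math.sqrt(variance(values, sample))

data = [1, 2, 3, 4, 5]  # hypothetical observations
print("population variance:", variance(data, sample=False))  # 10 / 5
print("sample variance:    ", variance(data, sample=True))   # 10 / 4
print("population std dev: ", std_dev(data, sample=False))
print("sample std dev:     ", std_dev(data, sample=True))
```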

61
Q

What is the formula for Coefficient of Variation (CV)?

A

standard deviation / mean

62
Q

What is another name for Coefficient of Variation (CV)?

A

relative standard deviation

63
Q

What is the most common measure of variability for a single dataset?

A

standard deviation

64
Q

Why do we need the measure of Coefficient of Variation (CV)?

A

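Comparing the standard deviations of two different datasets is meaningless because they depend on each dataset's units and scale. Comparing coefficients of variation is not, because dividing by the mean removes the units.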

65
Q

Why is Standard Deviation the preferred measure of variability?

A

Because it is directly interpretable. It is given in original units. Variance is given in squared units.

66
Q

Where is Coefficient of Variation (CV) best used?

A

When comparing the variability of two datasets
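
A quick way to see why the coefficient of variation is comparable across datasets is to express the same prices in two units. In this sketch the prices are invented, and the sample standard deviation from Python's `statistics` module is an assumed choice.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sample std dev assumed)."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical prices, expressed in dollars and in cents.
prices_dollars = [10, 20, 30, 40, 50]
prices_cents = [p * 100 for p in prices_dollars]

std_dollars = statistics.stdev(prices_dollars)
std_cents = statistics.stdev(prices_cents)
cv_dollars = coefficient_of_variation(prices_dollars)
cv_cents = coefficient_of_variation(prices_cents)

# The standard deviations differ by a factor of 100,
# but the coefficients of variation match.
print(std_dollars, std_cents)
print(cv_dollars, cv_cents)
```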

67
Q

What are the 3 univariate measures? (one variable)

A
  1. Central Tendency
  2. Asymmetry
  3. Variability
68
Q

What are the two methods to explore the relationship between two variables?

A
  1. Covariance
  2. Linear correlation coefficient
69
Q

What is the main statistic to measure correlation?

A

Covariance - it may be positive, negative, or zero

70
Q

What does the direction of covariance tell us?

A

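> 0: the two variables move together
< 0: the two variables move in opposite directions
= 0: the two variables have no linear relationship (independent variables always have zero covariance, but zero covariance alone does not prove independence)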

71
Q

What does the correlation coefficient do?

A

It adjusts the covariance, so that the relationship between the two variables becomes easy and intuitive to interpret.

72
Q

What is the range of the correlation coefficient?

A

-1 to +1

73
Q

What does Perfect Positive Correlation mean?

A

The entire variability of one variable is explained by the other
Correlation coefficient = 1

74
Q

What does a Correlation coefficient of Zero mean?

A

The two variables have no linear relationship. This is what you see when they are independent and have nothing in common, although a non-linear relationship can also produce a coefficient of zero.

75
Q

What does a Negative Correlation Coefficient mean?

A

The variables move in opposite directions. When one goes up, the other goes down.

76
Q

Is the correlation of (x, y) equal to the correlation of (y, x)?

A

Yes

77
Q

Causality - Correlation does not imply causation

A

It is important to understand the direction of causal relationships

In housing, size causes the price and not vice versa

Causality is an asymmetric relation. (x causes y is different from y causes x)

78
Q

What is the formula for Correlation Coefficient?

A

Cov(x, y) / (Stdev(x) * Stdev(y))
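
The correlation formula, covariance divided by the product of the two standard deviations, can be sketched from scratch as follows. The sample (n - 1) version of covariance is an assumed choice, and the house sizes and prices are invented numbers echoing the housing example above.

```python
import math

def sample_covariance(xs, ys):
    """Average product of deviations from the means (n - 1 denominator assumed)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Cov(x, y) / (Stdev(x) * Stdev(y))."""
    std_x = math.sqrt(sample_covariance(xs, xs))  # variance is Cov(x, x)
    std_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (std_x * std_y)

# Hypothetical house sizes and prices.
size = [50, 60, 70, 80, 90]
price = [150, 180, 200, 250, 270]

cov = sample_covariance(size, price)
r = correlation(size, price)
print("covariance:", cov)             # positive: the variables move together
print("correlation:", round(r, 4))
print("symmetric:", math.isclose(r, correlation(price, size)))
```

Unlike covariance, whose size depends on the units, the coefficient always lands between -1 and +1.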

79
Q

What are the types of data and the levels of measurement of the following variables: Cust ID, Mortgage, Year of sale

A

Variable      Type of Data                             Level of Measurement

Cust ID       Categorical                              Qualitative, Nominal
Mortgage      Categorical                              Qualitative, Nominal
Year of Sale  Numerical, discrete                      Quantitative, Interval
Age           Numerical, discrete (as a whole number)  Quantitative, Ratio
Price         Numerical, continuous                    Quantitative, Ratio
Gender        Categorical                              Qualitative, Nominal
State         Categorical                              Qualitative, Nominal

80
Q

What Excel function is used to calculate Correlation Coefficient?

A

CORREL()

81
Q

What Excel function is used to calculate Covariance?

A

COVARIANCE.S() for sample covariance (COVARIANCE.P() for population covariance)

82
Q

When should you disregard correlations?

A

As a rule of thumb, when the absolute value of the correlation coefficient is below 0.2