Describing and Summarizing Data Flashcards

1
Q

Histogram’s X & Y Axes

A

The x-axis represents bins corresponding to ranges of data; the y-axis indicates the frequency of observations falling into each bin.
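To make the two axes concrete, here is a minimal Python sketch that groups values into equal-width bins and counts the observations in each. The helper `histogram_counts` and the sample scores are hypothetical. Each bin's range is what a histogram draws along the x-axis, and each count is the bar height on the y-axis.

```python
# Hypothetical helper: groups data into equal-width bins and counts each bin.
# Bins are half-open [start, start + width), except the last bin, which also
# includes the maximum value so that no observation is dropped.
def histogram_counts(data, num_bins):
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Which bin (x-axis position) this value falls into.
        i = int((x - lo) / width) if width > 0 else 0
        counts[min(i, num_bins - 1)] += 1  # clamp the maximum into the last bin
    # Pair each bin's range (x-axis) with its frequency (y-axis).
    return [((lo + k * width, lo + (k + 1) * width), counts[k]) for k in range(num_bins)]


scores = [52, 58, 61, 64, 67, 68, 71, 73, 75, 79, 84, 95]
for (start, end), freq in histogram_counts(scores, 4):
    print(f"{start:.2f} to {end:.2f}: {'#' * freq} ({freq})")
```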

2
Q

What is an outlier?

A

An outlier is a value that falls far from the rest of the data.

3
Q

What does skewness measure?

A

Skewness measures the degree of a distribution’s asymmetry.

4
Q

If the right tail is longer, we say it is skewed …

A

to the right, or “right-tailed.”
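As an illustration, the sketch below computes one common skewness measure, the moment (Fisher-Pearson) coefficient: the average cubed deviation from the mean divided by the cubed population standard deviation. The data sets are hypothetical. A longer right tail produces a positive value, and a symmetric set produces zero. Excel's SKEW function uses a sample-adjusted variant of this formula, so its values differ slightly.

```python
# Moment coefficient of skewness (g1): third central moment divided by
# the second central moment raised to the power 1.5.
# Positive -> longer right tail; negative -> longer left tail; 0 -> symmetric.
def skewness(data):
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]  # one large value stretches the right tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive
print(skewness(symmetric))     # 0.0
```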

5
Q

“central tendency”

A

An indication of where the “center” of the data set lies. We usually start by calculating the mean, the most common measure of central tendency.

6
Q

MEAN =

A

the arithmetic “average” of a set of numbers: the sum of the values divided by how many values there are.

=AVERAGE(number 1, [number 2], …)
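For comparison outside Excel, this minimal Python sketch computes the mean by its definition and with the standard-library `statistics.mean`, which plays roughly the same role as =AVERAGE. The data values are hypothetical.

```python
import statistics

data = [4, 8, 15, 16, 23, 42]

# Mean by definition: sum of the values divided by the number of values.
manual_mean = sum(data) / len(data)

# Standard-library equivalent, comparable to Excel's =AVERAGE(...)
library_mean = statistics.mean(data)

print(manual_mean, library_mean)
```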

7
Q

MODE =

A

the value that occurs most frequently in a data set

=MODE.SNGL(number 1, [number 2], …)
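A Python analogue, offered as a sketch with hypothetical data: `statistics.mode` returns the most frequent value, and `collections.Counter` shows the frequency behind it. One difference to keep in mind: if no value repeats, MODE.SNGL returns #N/A, while `statistics.mode` (Python 3.8+) simply returns the first value it encounters.

```python
import statistics
from collections import Counter

data = [3, 7, 7, 2, 9, 7, 3]

# Most frequent value, comparable to Excel's =MODE.SNGL(...)
print(statistics.mode(data))  # 7

# The same answer found by counting frequencies explicitly.
value, frequency = Counter(data).most_common(1)[0]
print(value, frequency)  # 7 3
```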

8
Q

bimodal distribution

A

A distribution is called bimodal if it has two clearly defined peaks (two points with very high frequency). The two peaks may have equal frequency and hence be true modes, or one peak may be a mode and the other peak may simply have a very high (but not the highest) frequency.
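The two cases in this definition can be seen with Python's `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency, similar in spirit to Excel's MODE.MULT. Both data sets below are hypothetical. When the peaks are equal, both are modes; when one peak is slightly lower, only the higher peak counts as a mode, even though the shape is still bimodal.

```python
import statistics

# Hypothetical data with equally high peaks at 2 and 8.
equal_peaks = [1, 2, 2, 2, 3, 5, 7, 8, 8, 8, 9]
# Same shape, but the peak at 8 is slightly lower than the peak at 2.
unequal_peaks = [1, 2, 2, 2, 3, 5, 7, 8, 8, 9]

# multimode returns every value tied for the highest frequency.
print(statistics.multimode(equal_peaks))    # [2, 8]: two true modes
print(statistics.multimode(unequal_peaks))  # [2]: 8 is a peak but not a mode
```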

9
Q

MEDIAN =

A

the middle value of the data set once the values are sorted (for an even number of values, the mean of the two middle values). The median is the 50th percentile of the data set.

=MEDIAN(number 1, [number 2], …)
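The sorting-and-middle rule can be sketched in a few lines of Python. The `median` helper and sample lists are hypothetical; the standard library's `statistics.median`, comparable to =MEDIAN, is printed alongside to show that the two agree for both odd and even counts.

```python
import statistics

def median(data):
    s = sorted(data)          # the median is defined on sorted values
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]         # odd count: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even count: mean of the two middle values


odd = [7, 1, 5, 3, 9]
even = [7, 1, 5, 3]
print(median(odd), statistics.median(odd))    # 5 5
print(median(even), statistics.median(even))  # 4.0 4.0
```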

10
Q

PERCENTILE =

A

The value beneath which a certain percentage of the data lie. For example, someone who scored in the 95th percentile of a test scored equal to or higher than 95% of all people who took that test. We can also say that person scored in the top 5%.

=PERCENTILE.INC(array, k)

array is the range of data for which we want to calculate a given percentile.
k is the percentile value. For example, if we want to know the 95th percentile, k would be 0.95.
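A sketch of how an inclusive percentile can be computed, assuming the linear-interpolation rule that PERCENTILE.INC uses: sort the data, find the zero-based position k × (n - 1), and interpolate between the two neighboring values. The `percentile_inc` helper and the scores are hypothetical.

```python
import math

def percentile_inc(array, k):
    # Excel's PERCENTILE.INC rejects k outside [0, 1] (#NUM!); here we raise instead.
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1")
    s = sorted(array)
    pos = k * (len(s) - 1)           # zero-based position in the sorted data
    lower = math.floor(pos)
    upper = min(lower + 1, len(s) - 1)
    frac = pos - lower               # how far pos lies between the neighbors
    return s[lower] + frac * (s[upper] - s[lower])


scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_inc(scores, 0.95))  # about 97.75
print(percentile_inc(scores, 0.5))   # 77.5, the median
```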

11
Q

VARIABILITY measures

A

How widely dispersed the data are.

12
Q

To gain insight into the spread of the distribution, we calculate …

A

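the variance and the standard deviation. The variance measures how far the data values lie from the MEAN: it is the average of the squared deviations of each value from the mean (for a sample, the sum of squared deviations is divided by n - 1 rather than n).

A small standard deviation means the values are close to the MEAN; a large standard deviation means they are spread far from it.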

The standard deviation is equal to the square root of the variance. If the variance is 9, then the standard deviation must be 3.
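The arithmetic can be checked with a small Python sketch that computes the sample variance from its definition (squared deviations from the mean, divided by n - 1) and then takes the square root. The `sample_variance` helper and the three data points are hypothetical, chosen so that the variance comes out to 9 and the standard deviation to 3, matching the example above.

```python
import math

def sample_variance(data):
    n = len(data)
    mean = sum(data) / n
    # Sum of squared deviations from the mean, divided by n - 1 (sample variance).
    return sum((x - mean) ** 2 for x in data) / (n - 1)


data = [2, 5, 8]           # mean 5; deviations -3, 0, 3
var = sample_variance(data)
sd = math.sqrt(var)        # standard deviation = square root of the variance
print(var, sd)             # 9.0 3.0
```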

13
Q

To calculate the variance or standard deviation of a sample in Excel, we can use the following functions:

A

=VAR.S(number 1, [number 2], …)

=STDEV.S(number 1, [number 2], …)
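For readers working outside Excel, Python's `statistics` module offers comparable functions. This sketch, with hypothetical data, contrasts the sample versions (divide by n - 1, like VAR.S and STDEV.S) with the population versions (divide by n, like VAR.P and STDEV.P). Use the sample versions when the data are a sample drawn from a larger population.

```python
import statistics

sample = [2, 5, 8]

# Sample statistics (divide by n - 1): analogous to =VAR.S(...) and =STDEV.S(...)
print(statistics.variance(sample), statistics.stdev(sample))

# Population statistics (divide by n): analogous to =VAR.P(...) and =STDEV.P(...)
print(statistics.pvariance(sample), statistics.pstdev(sample))
```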

14
Q

Coefficient of Variation =

A

a measure of relative variation: the standard deviation expressed as a proportion of the mean.

To compare the variation in two data sets, especially ones with different means, we calculate a value called the coefficient of variation (CV).

= the standard deviation / the mean.
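A minimal Python sketch of the CV, assuming the sample standard deviation (the Excel equivalent would be =STDEV.S(range)/AVERAGE(range)). The helper name and both data sets are hypothetical: they share the same standard deviation of 3, yet their CVs differ sharply because their means differ.

```python
import statistics

def coefficient_of_variation(data):
    # Sample standard deviation divided by the mean.
    return statistics.stdev(data) / statistics.mean(data)


small_values = [2, 5, 8]     # mean 5, standard deviation 3
large_values = [92, 95, 98]  # mean 95, standard deviation 3

cv_small = coefficient_of_variation(small_values)
cv_large = coefficient_of_variation(large_values)
print(f"CV small: {cv_small:.1%}, CV large: {cv_large:.2%}")  # CV small: 60.0%, CV large: 3.16%
```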

15
Q

To visualize the relationship between two variables, we typically use

A

a scatter plot. One variable is plotted on the horizontal axis (x-axis), and the other is plotted on the vertical axis (y-axis).

16
Q

Correlation coefficient measures …

A

the strength of a linear relationship between two variables.

The correlation coefficient tells us both the strength of the association and its direction.
For example, the sign of the coefficient tells us whether the variables are directly (positive) or inversely (negative) correlated.

17
Q

Excel: Correlation Coefficient

A

=CORREL(array 1, array 2)
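To show what =CORREL computes, here is a from-scratch Python sketch of the Pearson correlation coefficient: the sum of the products of paired deviations from the means, divided by the square root of the product of the two sums of squared deviations. The helper `correl` and the study-hours data are hypothetical.

```python
import math

def correl(xs, ys):
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-movement
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # Raises ZeroDivisionError if either list is constant (zero spread).
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
print(correl(hours, score))             # close to +1: strong direct relationship
print(correl(hours, [10, 8, 6, 4, 2]))  # -1.0: perfect inverse relationship
```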

18
Q

Hidden Variable is …

A

a variable that is correlated with each of two variables that are not fundamentally related to each other, which can make those two variables appear related. EXAMPLE: ice cream sales and snow shovel sales are (inversely) correlated because both depend on the weather.

19
Q

The value of the correlation coefficient ranges between

A

-1 and +1.

20
Q

A correlation coefficient near zero indicates

A

a weak or nonexistent linear relationship.

A correlation coefficient near zero does not mean there is no relationship between the two variables; it indicates only that any relationship that does exist is not linear.
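A small Python sketch makes this concrete. Below, y is completely determined by x (y = x²), yet the correlation coefficient is exactly zero because the relationship is U-shaped rather than linear. The `correl` helper is a hypothetical from-scratch Pearson correlation, and the data are made up for illustration.

```python
import math

def correl(xs, ys):
    # Pearson correlation: co-movement of deviations, scaled by their spreads.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # a perfect relationship, but not a linear one
print(correl(xs, ys))      # 0.0
```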

21
Q

When one of the variables is time, the relationship is known as a …

A

time series

22
Q

Cross-sectional data provides a snapshot of

A

data across multiple groups at a given point in time.