Module 1 Flashcards

1
Q

Before gathering and analyzing data, we should always ______ the question we wish to answer.

A

identify

2
Q

Graphs are very useful for examining a data set, as they often reveal ______ and ______ and help us detect outliers.

A

patterns; trends

3
Q

One useful graph is a _______, a type of bar chart that shows how often values fall into each range of the data.

A

histogram

4
Q

A histogram’s _____ represents _____ corresponding to ranges of data.

A

x-axis; bins

5
Q

A histogram’s _______ indicates the _______ of observations falling into each bin.

A

y-axis; frequency

6
Q

An ______ is a value that falls far from the rest of the data.

A

outlier

7
Q

Graphing ___________ on a scatter plot can reveal relationships between ___________.

A

two variables; two variables (two data sets)

8
Q

Although there may be a relationship between two variables, we cannot conclude that one variable ______ the other. This point is best summarized in the admonition, “_________ does not imply causation.”

A

“causes”; correlation

9
Q

Be alert to the possibility of _________, which may be responsible for patterns we see when graphing or examining relationships between two data sets.

A

hidden variables

10
Q

To summarize a data set numerically, we often use __________, also known as summary statistics.

A

descriptive statistics

11
Q

Three values describe the center, or central tendency, of the data set:

A

  1. mean
  2. median
  3. mode

12
Q

The _____ is equal to the sum of all data points in the set divided by the number of data points.

A

mean

13
Q

The ______ is the middle value of the data set.

A

median - half of the data set’s values lie below the median, and half lie above the median.

14
Q

The _____ is the value that occurs most frequently in the data set.

A

mode - A data set may have multiple modes.
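
As a minimal sketch of the three measures of central tendency, the example below uses Python's standard `statistics` module. The data values are made up for illustration, and Python here only stands in for the Excel functions covered later in the deck.

```python
import statistics

# Made-up data set, used only for illustration.
data = [2, 3, 3, 5, 7, 7, 8]

mean = statistics.mean(data)        # sum of the values divided by their count
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # every value tied for most frequent

print(mean, median, modes)  # 5 5 [3, 7]
```

Both 3 and 7 occur twice, so this data set has two modes.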

15
Q

The ______, _______, and ________ measure the spread of the data.

A

range; variance; standard deviation

16
Q

The standard deviation is equal to the _______ of the variance.

A

square root
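
The measures of spread can be sketched the same way. This example assumes made-up data and uses the sample versions of variance and standard deviation from Python's `statistics` module.

```python
import math
import statistics

# Made-up data set, used only for illustration.
data = [4, 8, 6, 2]

data_range = max(data) - min(data)    # largest value minus smallest value
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # sample standard deviation

# The standard deviation is the square root of the variance.
assert math.isclose(std_dev, math.sqrt(variance))

print(data_range, variance, std_dev)
```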

17
Q

To compare variation in different data sets, we calculate the ___________.

A

coefficient of variation

18
Q

The coefficient of variation measures the size of the ________ relative to the size of the ______.

A

standard deviation; mean

coefficient of variation = standard deviation/mean
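
A short sketch of why the coefficient of variation helps compare data sets: the two made-up lists below have the same standard deviation but different means, so their relative variation differs. The helper name `coefficient_of_variation` is hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Made-up data: both lists have the same spread around different means.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

# Same standard deviation, smaller mean, so the weights vary more
# relative to their typical size.
print(cv_heights, cv_weights)
```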

19
Q

A ________ is the mean of a subset of the data that includes all values satisfying a certain condition.

A

conditional mean

20
Q

60% of the observations are _____ or ______ to the 60th percentile.

A

less than; equal

21
Q

The median is by definition the _____ percentile of a data set.

A

50th

22
Q

We can quantify the strength of a linear relationship between two variables by calculating the _______.

A

correlation coefficient

23
Q

The value of the correlation coefficient ranges between __ and ___.

A

-1; +1

24
Q

A correlation coefficient near ____ indicates a weak or nonexistent linear relationship.

A

zero

25
Q

A correlation coefficient near zero does not mean there is no relationship between the two variables; it indicates only that any relationship that does exist is _____.

A

not linear
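
To make the last few cards concrete, here is a minimal sketch that computes the correlation coefficient by hand for two made-up relationships. The function name `pearson_r` is hypothetical. A perfectly linear relationship gives a coefficient of 1, while a perfect but curved relationship (y = x squared on symmetric x values) gives a coefficient of 0.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)

xs = [-2, -1, 0, 1, 2]
linear_ys = [2 * x + 1 for x in xs]  # exact straight line
curved_ys = [x ** 2 for x in xs]     # exact relationship, but not linear

r_linear = pearson_r(xs, linear_ys)
r_curved = pearson_r(xs, curved_ys)
print(r_linear, r_curved)
```

The curved data are completely determined by x, yet the coefficient is zero because the relationship is not linear.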

26
Q

When one of the variables is time, the relationship is known as a ______.

A

time series

27
Q

_________ provide a snapshot of data across multiple groups at a given point in time.

A

Cross-sectional data

28
Q

What Excel function finds the average?

A

=AVERAGE(number1, [number2], …)

29
Q

What Excel function finds the median?

A

=MEDIAN(number1, [number2], …)

30
Q

What Excel function finds the mode?

A

=MODE.SNGL(number1, [number2], …)

31
Q

What Excel function returns the conditional mean, or average of the cells in a specified range that meet the given criteria?

A

=AVERAGEIF(range, criteria, [average_range])

  • range contains the one or more cells to which we wish to apply the criteria.
  • criteria is the condition that is applied to the range.
  • [average_range] is the range of cells containing the data we wish to average. If it is omitted, the cells in range are averaged.
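
The three arguments can be sketched in Python as follows. This is only an analogy to AVERAGEIF, not how Excel implements it: the function `average_if`, its parameter names, and the data are all hypothetical, and the criterion is passed as a Python function rather than an Excel criteria string.

```python
def average_if(cells, criterion, average_cells=None):
    """Average the values whose matching entry in `cells` meets `criterion`.

    If `average_cells` is omitted, the entries of `cells` themselves are averaged.
    """
    if average_cells is None:
        average_cells = cells
    selected = [value for cell, value in zip(cells, average_cells) if criterion(cell)]
    if not selected:
        raise ValueError("no cells meet the criterion")
    return sum(selected) / len(selected)

# Made-up data, used only for illustration.
regions = ["East", "West", "East", "West", "East"]
sales = [100, 200, 300, 400, 500]

# Conditional mean of sales where the region is "East".
east_mean = average_if(regions, lambda region: region == "East", sales)

# Criterion applied to the same values that are averaged.
large_sale_mean = average_if(sales, lambda amount: amount > 250)

print(east_mean, large_sale_mean)  # 300.0 400.0
```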
32
Q

What Excel function returns the k-th percentile of the values in the specified array? For example, if we want to know the 95th percentile for an array of data, k would be ____.

A

=PERCENTILE.INC(array, k); 0.95
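
A rough sketch of an inclusive percentile calculation is shown below. It assumes linear interpolation between the sorted values at position k × (n − 1), which is the usual description of the inclusive method; the function name `percentile_inc` and the data are hypothetical.

```python
def percentile_inc(values, k):
    """Inclusive k-th percentile (0 <= k <= 1) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1")
    ordered = sorted(values)
    position = k * (len(ordered) - 1)         # fractional index into the sorted data
    lower = int(position)                     # index at or just below the position
    upper = min(lower + 1, len(ordered) - 1)  # next index, capped at the last value
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

data = [1, 2, 3, 4, 5]  # made-up data

median_value = percentile_inc(data, 0.5)  # the 50th percentile is the median
p95 = percentile_inc(data, 0.95)          # k = 0.95 for the 95th percentile
print(median_value, p95)
```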

33
Q

What Excel function finds the variance?

A

=VAR.S(number1, [number2], …)

34
Q

What Excel function finds the standard deviation?

A

=STDEV.S(number1, [number2], …)

35
Q

What Excel function finds the square root?

A

=SQRT(number)

36
Q

What Excel function finds the number of data points?

A

=COUNT(value1, [value2], …)

37
Q

What Excel function finds the minimum number?

A

=MIN(number1, [number2], …)

38
Q

What Excel function finds the maximum number?

A

=MAX(number1, [number2], …)

39
Q

What Excel function finds the sum of all numbers?

A

=SUM(number1, [number2], …)

40
Q

What does the S stand for in VAR.S and STDEV.S?

A

S=Sample
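
The sample versions divide by n − 1, while the population versions (VAR.P and STDEV.P in Excel) divide by n. A minimal Python sketch of the difference, using made-up data and the standard `statistics` module as a stand-in:

```python
import statistics

# Made-up data set, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_variance = statistics.variance(data)       # divides by n - 1, like VAR.S
population_variance = statistics.pvariance(data)  # divides by n, like VAR.P

# The sample variance is slightly larger because of the smaller divisor.
print(sample_variance, population_variance)
```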

41
Q

What Excel function finds the correlation coefficient?

A

=CORREL(array1, array2)