Exam 1 Flashcards

1
Q

Cases

A

The objects described by a set of data, such as companies or customers. A case can be anything that the collected data describe.

2
Q

Variable

A

A characteristic of a case, such as age, sex, or gender. It is a specific attribute of each case/individual.

3
Q

Categorical variables

A

Variables that place each individual into one of several groups or categories.

4
Q

Quantitative variables

A

Variables that take numerical values for which arithmetic operations make sense.

5
Q

Data set

A

The collection of raw statistics and information gathered in a research study.

6
Q

Examples of graphs for categorical variables

A

Bar graphs & pie charts

7
Q

Examples of graphs for quantitative variables

A

Stemplots and Histograms

8
Q

Individual

A

Object described by data.

9
Q

We have a data set where the cases are college students. One of the variables in the data set is “gender.” The values of gender are 1 if the student is male and 2 if the student is female. What type of variable is gender?

A

Categorical (because it doesn’t make sense to do any arithmetic with group 1 and group 2; the numbers are only labels).

10
Q

Units of measurement are an important part of the description of what type of variables?

A

Quantitative

11
Q

The first day of class, the professor collects information on each student to make a data set that will be analyzed throughout the semester. The information asked includes hometown, GPA, number of classes taking, number of siblings, and favorite subject. How many quantitative variables are in this data set?

A

Three: number of classes, GPA, and number of siblings. All of these can be averaged or put through other arithmetic operations.

12
Q

Consider the following data, which describe the amount of time in minutes that students spend studying for a quiz:

10, 11, 11, 12, 12, 14, 15, 18, 19, 20, 22, 24, 39, 40, 41, 44, 46, 50, 52, 52, 53, 55, 70

What numbers make up the leaves of the first stem?

A

0, 1, 1, 2, 2, 4, 5, 8, 9
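
As a rough check on this answer, the sketch below builds a stemplot for the study times in the question, assuming the tens digit is the stem and the ones digit is the leaf. The helper name `stemplot` is made up for illustration.

```python
from collections import defaultdict


def stemplot(values):
    """Map each stem (tens digit) to its sorted list of leaves (ones digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


# Study times in minutes, copied from the question.
times = [10, 11, 11, 12, 12, 14, 15, 18, 19, 20, 22, 24, 39, 40, 41,
         44, 46, 50, 52, 52, 53, 55, 70]

plot = stemplot(times)

# Print every stem in range, including stems that have no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

The first printed row is stem 1, and its leaves match the answer on this card.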

13
Q

classes

A

The equal-width intervals into which a histogram divides the range of the data.

14
Q

Right skewed

A

When the right tail of the distribution (the larger values) extends much farther out than the left tail, with most observations bunched on the left.

15
Q

What is a time series?

A

A data set of measurements of a variable taken at regular intervals over time.

16
Q

When examining a distribution of a quantitative variable, which of the following features do we look for?

A
  • Overall shape, center, and spread
  • Symmetry or skewness
  • Deviations from overall patterns, such as outliers
  • The number of peaks or modes
17
Q

Pie charts are useful for…

A

Seeing what part of the whole each group forms.

18
Q

What is the interquartile range for a set of data?

A

The interquartile range is the difference between the 75th and 25th percentiles of data.

IQR = Q3 - Q1

19
Q

A reporter wishes to portray baseball players as overpaid. Which measure of center should he report as the average salary of major league players?

A

The mean, because a few very high salaries pull the mean upward, above the median.
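
To see the effect, here is a small sketch with made-up salaries (the numbers are hypothetical, not real league data). A couple of very large values move the mean a lot while the median barely notices them.

```python
from statistics import mean, median

# Hypothetical salaries in millions of dollars, invented for illustration.
salaries = [0.7, 0.7, 0.8, 1.0, 1.2, 1.5, 2.0, 25.0, 30.0]

avg = mean(salaries)    # pulled upward by the two largest salaries
mid = median(salaries)  # the middle value, resistant to the extremes

print(f"mean = {avg:.2f}, median = {mid:.2f}")
```

With these numbers the mean is roughly 7 million while the median is 1.2 million, so reporting the mean makes the typical player look far better paid.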

20
Q

If the median is much larger than the mean, this is indicative of what type of stemplot?

A

The stemplot of the data would be skewed left: the long left tail pulls the mean below the median.

21
Q

If the distribution is symmetric, what can be said about the median and mean?

A

They are equal if the distribution is exactly symmetric, and close together if it is roughly symmetric.

22
Q

What does the five number summary consist of for a set of data?

A
  • Minimum
  • Q1
  • Median
  • Q3
  • Maximum
23
Q

Define Histogram

A

A diagram consisting of rectangles whose areas are proportional to the frequencies of a variable and whose widths equal the class intervals.

24
Q

A boxplot is

A

A graph of the five number summary.

The central box spans Q1 to Q3, a line inside the box marks the median, and lines extend from the box out to the smallest and largest observations.

25
Q

What is the 1.5-IQR formula?

A

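An observation is a suspected outlier if it falls more than 1.5 * IQR below Q1 or above Q3, that is, below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.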

26
Q

What is the five number summary of this list of numbers?

175 199 204 234 259 275 299 304 317 345 355 384 549

A

Minimum: 175

Q1: 219

Median: 299

Q3: 350

Maximum: 549
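
The sketch below recomputes this summary, assuming the common textbook convention that Q1 and Q3 are the medians of the observations below and above the median position (other software may use a different quartile rule and give slightly different values). It also applies the 1.5 * IQR check for suspected outliers. The function name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # observations below the median position
    upper = data[half + n % 2:]  # observations above the median position
    return min(data), median(lower), median(data), median(upper), max(data)


data = [175, 199, 204, 234, 259, 275, 299, 304, 317, 345, 355, 384, 549]
low, q1, med, q3, high = five_number_summary(data)

# 1.5 * IQR rule: flag values far below Q1 or far above Q3.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
suspected = [x for x in data if x < lower_fence or x > upper_fence]

print(low, q1, med, q3, high)
print("IQR:", iqr, "fences:", lower_fence, upper_fence, "suspected:", suspected)
```

Here the IQR is 131, so the fences sit at 22.5 and 546.5, and the maximum of 549 is flagged as a suspected outlier.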

27
Q

Density curves do not adequately reveal…

A

Outliers.

28
Q

In a skewed density curve, what is pulled away in the direction of the long tail?

A

Mean

29
Q

The time to complete an exam is approximately Normal with a mean of 70 minutes and a standard deviation of 10 minutes. Using the 68-95-99.7 rule, what percent of students will complete the exam in under an hour?

A

16%. An hour (60 minutes) is one standard deviation below the mean of 70 minutes. About 68% of times fall within one standard deviation of the mean, so the remaining 32% fall outside that range. Because the Normal distribution is symmetric, that 32% splits evenly above and below the range, so about 16% of students finish in under an hour.
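
For comparison, the sketch below works out the rule-of-thumb figure and the exact Normal probability using Python's standard-library `statistics.NormalDist`. The variable names are illustrative only.

```python
from statistics import NormalDist

# Exam times: approximately Normal with mean 70 and standard deviation 10.
exam = NormalDist(mu=70, sigma=10)

# 68-95-99.7 rule: 68% lies within one standard deviation of the mean,
# so half of the remaining 32% lies below 60 minutes.
rule_estimate = (1 - 0.68) / 2

# Exact area under the Normal curve to the left of 60 minutes.
exact = exam.cdf(60)

print(f"rule of thumb: {rule_estimate:.2%}, exact: {exact:.2%}")
```

The exact value is about 15.87%, which the 68-95-99.7 rule rounds to 16%.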

30
Q

What information can we get from Z-scores?

A

How many standard deviations an observation in a set falls from the mean and in which direction.

31
Q

Formula for z-score?

A

z = ((observation value) - (mean)) / (standard deviation)
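
A minimal sketch of the formula in code, reusing the exam-time setting from card 29 (mean 70 minutes, standard deviation 10 minutes). The function name `z_score` and the 85-minute value are assumptions made for the example.

```python
def z_score(observation, mean, standard_deviation):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / standard_deviation


# A 60-minute finish is below the mean, so its z-score is negative.
z_fast = z_score(60, 70, 10)

# A hypothetical 85-minute finish is above the mean, so its z-score is positive.
z_slow = z_score(85, 70, 10)

print(z_fast, z_slow)
```

The sign gives the direction (below or above the mean) and the size gives the distance in standard deviations.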

32
Q

A normal distribution is…

A

Symmetric about the mean

33
Q

A data set lists apartments available for students to rent. Information provided includes the monthly rent per person, whether cable is included free of charge, whether or not pets are allowed, the number of bedrooms, the number of bathrooms, and the distance to the campus. Describe the cases in the data set, give the number of variables, and specify whether each variable is categorical or quantitative.

A

Cases = Apartments for students to rent

Variables (six in total): monthly rent per person (quantitative), cable included (categorical), pets allowed (categorical), number of bedrooms (quantitative), number of bathrooms (quantitative), distance to campus (quantitative).

34
Q

This is the term used to describe how the value of a variable changes from case to case.

A

Distribution.

35
Q

Exploratory data analysis is…

A

The “first impressions” of a data set. Just describing the general features of what is seen.

36
Q

The process of using data to predict aspects of the future is known as…

A

Predictive analytics

37
Q

Bar graphs and pie charts are graphical representations for this type of variable.

A

Categorical

38
Q

How are the percent and proportion related?

A

Percent = proportion * 100

39
Q

What are the types of variables and their values?

A

Reasons are the categorical variables and their values are count.

40
Q

Pie charts naturally use what type of value?

A

Percents

41
Q

Stemplots and histograms are used for…

A

Quantitative variables

42
Q

What is the process of separating each stem into 0-4 and 5-9?

A

Splitting stems

43
Q

The process of removing the last digit or digits before making a stemplot is known as…

A

Trimming

44
Q

This type of graphical display breaks the values of a variable into classes and shows only the percentage or count that falls into each class.

A

Histogram

45
Q

Between stemplots and histograms, which is preferred for small data sets and why?

A

Stemplots, because they show the actual values of the observations.

46
Q

The Minimum and Quartile 1 values make up what “tail”?

A

Left tail.

47
Q

The maximum and quartile 3 values make up what tail?

A

Right tail.

48
Q

When does s = 0?

A

When all observations have the same value, so there is no spread.

49
Q

Is the standard deviation resistant or not resistant?

A

Not resistant. It is built from deviations from the mean, so outliers and strong skewness affect it heavily.

50
Q

x-bar and s are associated with what type of data set (population or sample)?

A

Sample

51
Q

What are µ and σ associated with (population or sample)?

A

Population

52
Q

If standard deviation of a density curve changes, what will change graphically?

A

The spread

53
Q

What is the notation for a Normal distribution with mean µ and standard deviation σ?

A

N(µ, σ)

54
Q

Variables must be measured from the same ____ for a relationship to be measured.

A

Cases.

55
Q

To evaluate or check something against a standard is called:

A

Benchmarking

56
Q

The strength of a relationship between two quantitative variables is measured with…

A

correlation r

57
Q

The ____ is the fraction of the variation in values of y that is explained by the least-squares regression of y on x.

A

Square of the correlation, r^2

58
Q

R^2 formula

A

(var(mean) - var(line))/(var(mean))
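
The sketch below applies this formula to a tiny made-up data set. It fits the least-squares line by hand, then compares the squared variation around the mean of y with the squared variation around the line. Sums of squares are used instead of variances, which gives the same ratio because both would be divided by the same count. The function name and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Fraction of the variation in y explained by the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Least-squares slope and intercept.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Variation around the mean of y versus variation around the fitted line.
    ss_mean = sum((y - y_bar) ** 2 for y in ys)
    ss_line = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return (ss_mean - ss_line) / ss_mean


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = r_squared(xs, ys)
print(f"r^2 = {r2:.2f}")
```

For these five points r^2 comes out to 0.6, meaning there is 60% less variation around the line than around the mean.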

59
Q

“There is 81% less variation around the line than around the mean.”

A

This is saying that r^2 = 0.81: 81% of the variation in y is explained by its linear relationship with x.

60
Q

The line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

A

Least Squares Regression Line

61
Q

Correlations based on averages tend to be _____ than correlations based on individuals.

A

Higher

62
Q

Interest rates for home mortgages have, in general, declined during recent months. With the apparent favorable influence for new-home building, there seems to be a clear relationship between 𝑥=the prevailing mortgage interest rate and 𝑦= the number of new houses being built per month in a Midwestern city over a period of 18 months. A scatterplot of the data collected shows that the linear model is appropriate. The equation of the least-squares regression line is

Number of new houses = 672.89 − 30.65 × Interest rate, with r^2 = 0.49

Which of the following descriptions best represents the value of the slope?

A

When the interest rate increases by one percentage point, the number of new houses built per month is expected to drop by about 30.65.
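
A short sketch of what the slope means, using the regression equation from the question. The interest rates of 5% and 6% are hypothetical inputs chosen only to show a one-point increase, and the function name is made up.

```python
def predicted_new_houses(interest_rate):
    """Predicted new houses per month from the fitted line in the question."""
    return 672.89 - 30.65 * interest_rate


# Compare predictions one percentage point apart (hypothetical rates).
at_five = predicted_new_houses(5.0)
at_six = predicted_new_houses(6.0)
change = at_six - at_five

print(f"{at_five:.2f} -> {at_six:.2f}, change = {change:.2f}")
```

Whatever starting rate is chosen, a one-point increase changes the prediction by the slope, a drop of 30.65 houses per month.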
