4: Understanding And Comparing Distributions Flashcards

1
Q

What is the purpose of the Williams College Center for Environmental Studies (CES)?

A

To monitor forest resources and conditions over the long term.

The CES manages the Hopkins Memorial Forest, which is a 2500-acre preserve.

2
Q

What variables are measured in the Hopkins Memorial Forest?

A

Among other variables, wind speed (the daily minimum, maximum, and average).

Wind speed is recorded in miles per hour by remote sensors.

3
Q

What causes wind?

A

Air flows from areas of high pressure to areas of low pressure.

4
Q

What is often associated with high winds?

A

Low pressure and storms.

5
Q

How can wind speeds vary?

A

They can vary greatly during a day and from day to day.

6
Q

What can modeling wind speed patterns help us understand?

A

Insights about weather that we may not have known.

7
Q

What type of data is examined when comparing two variables, one quantitative and the other categorical?

A

Average wind speeds (quantitative) grouped by time period (categorical).

8
Q

What happens when we partition time into different groups?

A

It increases flexibility and reveals different patterns.

9
Q

What are the different time groupings mentioned for analyzing wind speed?

A
  • Entire year
  • Seasons
  • Months
  • Days
10
Q

What type of distribution does the histogram of daily Average Wind Speed for 2011 suggest?

A

Unimodal and skewed to the right.

11
Q

What does the boxplot below the histogram indicate?

A

Possible outliers that may warrant attention.

12
Q

What is the maximum Average Wind Speed recorded in the 5-number summary?

A

6.73 mph.

13
Q

What is the median Average Wind Speed recorded in the 5-number summary?

A

1.12 mph.

14
Q

Fill in the blank: The third quartile (Q3) of Average Wind Speed is _______.

A

2.28 mph.

15
Q

Fill in the blank: The first quartile (Q1) of Average Wind Speed is _______.

A

0.46 mph.

16
Q

What is the minimum Average Wind Speed recorded in the 5-number summary?

A

0.00 mph.
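
A minimal Python sketch of how a five-number summary like the one in these cards could be computed. The speeds are hypothetical, not the Hopkins Forest measurements. Quartiles come from `statistics.quantiles(..., method="inclusive")`; other software (including the textbook's) may use a slightly different quartile rule and report slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    Quartiles use the "inclusive" interpolation rule of statistics.quantiles;
    other quartile rules can give slightly different Q1 and Q3.
    """
    data = sorted(values)
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]

# Hypothetical daily average wind speeds (mph), NOT the Hopkins Forest data.
speeds = [0.0, 0.2, 0.5, 0.9, 1.1, 1.4, 2.0, 2.6, 6.7]
print(five_number_summary(speeds))  # (0.0, 0.5, 1.1, 2.0, 6.7)
```

As with the 2011 data, the maximum sits much farther above Q3 than the minimum sits below Q1, which is the signature of a right-skewed distribution.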

17
Q

What is the typical (median) daily average wind speed?

A

1.12 mph

This is the median of the daily averages, not their mean.

18
Q

Between which values does the daily average wind speed fall on the middle half of the days?

A

0.46 to 2.28 mph

This range is defined by the quartiles framing the middle half of the data.

19
Q

What is the highest daily average wind speed recorded?

A

6.73 mph

This value indicates a possible outlier in the data.

20
Q

How is the year divided for the analysis of wind speed?

A

Summer (April through September) and Winter (October through March)

This division allows for a seasonal comparison of wind speeds.

21
Q

What is the mean wind speed during summer?

A

0.85 mph

This is the average wind speed calculated for the summer months.

22
Q

What is the mean wind speed during winter?

A

2.17 mph

This average indicates that winter typically has higher wind speeds than summer.

23
Q

What does the distribution of wind speeds in summer look like?

A

Unimodal and skewed to the right

This suggests that most days have low wind speeds with few high values.

24
Q

What does the distribution of wind speeds in winter look like?

A

Less strongly skewed and more nearly uniform

This indicates that winter wind speeds are spread more evenly across a wider range of values.

25
What is the interquartile range (IQR) for winter wind speeds?
1.91 mph
This value represents the variability in wind speeds during the winter months.
26
What is the interquartile range (IQR) for summer wind speeds?
0.79 mph
This indicates that summer wind speeds are less variable than winter wind speeds.
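The seasonal cards split the year into summer (April through September) and winter (October through March) and then compare centers and IQRs. A minimal Python sketch of that grouping, using hypothetical (month, speed) pairs rather than the actual Hopkins Forest data; the `season` and `iqr` helpers are illustrative names, and quartiles use the "inclusive" rule of `statistics.quantiles`.

```python
import statistics

def season(month):
    """Summer = April (4) through September (9); winter = October through March."""
    return "summer" if 4 <= month <= 9 else "winter"

def iqr(values):
    """Interquartile range Q3 - Q1, using the 'inclusive' quartile rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical (month, daily average wind speed in mph) pairs for illustration.
days = [(1, 2.9), (2, 1.2), (3, 3.4), (4, 0.9), (5, 0.4), (6, 0.6),
        (7, 0.3), (8, 1.0), (9, 0.7), (10, 1.8), (11, 3.1), (12, 0.8)]

# Partition the quantitative variable (speed) by the categorical one (season).
groups = {"summer": [], "winter": []}
for month, speed in days:
    groups[season(month)].append(speed)

for name, speeds in groups.items():
    print(name, "mean:", round(statistics.mean(speeds), 2),
          "IQR:", round(iqr(speeds), 2))
```

With these made-up values winter has both the higher mean and the larger IQR, the same qualitative pattern the cards describe for 2011.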
27
True or False: Days with wind speeds above 3 mph are unusual in winter.
False
In winter, days with wind speeds above 3 mph are not unusual, indicating higher wind variability.
28
Fill in the blank: The typical wind speed during summer is less than ______ mph.
1 mph
This highlights the generally calm conditions during summer months.
29
Are some months windier than others?
Yes, some months are windier than others.
30
What is a useful method to compare wind speed distributions across months?
Boxplots.
31
What do boxplots help to visualize?
Overall summary information while hiding details.
32
What can be compared by placing boxplots side by side?
Medians, interquartile ranges (IQRs), central 50% of the data, and overall range.
33
What do boxplots display separately to aid comparisons?
Outliers.
34
In which months do wind speeds tend to decrease?
Summer months.
35
Which months have the strongest and most variable winds?
November through April.
36
Fill in the blank: Boxplots provide an ideal balance of ______ and simplicity.
information
37
True or False: Histograms are the best way to compare 12 months of wind speed data.
False.
38
What does IQR stand for in the context of boxplots?
Interquartile Range.
39
What kind of plots can be difficult to interpret when comparing multiple months?
Histograms.
40
What can side-by-side boxplots reveal when they are arranged in a natural (ordinal) order, such as by month?
General patterns in both the centers and spreads.
41
What type of display can hide features of a distribution, such as gaps or multiple modes?
Boxplots.
42
What is the significance of the central 50% of the data in boxplots?
The box covers the central 50% of the data, from Q1 to Q3; its length is the interquartile range (IQR).
43
List the months where wind speeds are most variable.
  • November
  • December
  • January
  • February
  • March
  • April
44
What is the primary purpose of using boxplots in wind speed analysis?
To compare distributions across different months.
45
What is the purpose of looking into outliers in data?
To correct errors in the data
Errors occur remarkably often in data sets.
46
What are some common reasons for errors in data sets?
Errors can occur due to:
  • Misplaced decimal points
  • Transposed digits
  • Omitted values
47
What is an outlier?
A value that doesn't fit with the rest of the data
Outliers may highlight exceptional cases or illuminate patterns.
48
Why might a student claiming to be 170 inches tall be considered an outlier?
Because 170 inches is about 14 feet, which is unrealistic for a human height
This example illustrates how outliers can be obviously wrong.
49
What does the boxplot rule indicate regarding outliers?
It nominates values more than 1.5 IQRs below Q1 or above Q3 as possible outliers, a rough guideline for highlighting unusual cases
The exact threshold for giving outliers special treatment is subjective.
50
In what scenario might a value be considered an outlier?
When it lies beyond (above or below) the limits suggested by the boxplot rule
Boxplots help visualize potential outliers.
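A minimal Python sketch of the boxplot rule: compute fences 1.5 IQRs below Q1 and above Q3 and nominate anything outside them. The data are hypothetical, quartiles use the "inclusive" rule of `statistics.quantiles`, and `outlier_fences` / `flag_outliers` are illustrative helper names.

```python
import statistics

def outlier_fences(values, k=1.5):
    """Return (lower, upper) fences of the boxplot rule:
    Q1 - k*IQR and Q3 + k*IQR, with k = 1.5 as the usual choice."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread

def flag_outliers(values, k=1.5):
    """Nominate (not delete) values lying beyond either fence."""
    low, high = outlier_fences(values, k)
    return [v for v in values if v < low or v > high]

# Hypothetical daily averages (mph); the 6.7 stands far from the rest.
speeds = [0.0, 0.2, 0.5, 0.9, 1.1, 1.4, 2.0, 2.6, 6.7]
print(outlier_fences(speeds))  # (-1.75, 4.25)
print(flag_outliers(speeds))   # [6.7]
```

The function only nominates candidates; whether a flagged value is an error or an important case still needs a closer look.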
51
What should be done with outliers?
They always deserve attention
Outliers may be the most important values in the data set.
52
Fill in the blank: An outlier may arise for many reasons and can be the most important _______ in the data set.
value
53
True or False: All outliers are errors in the data.
False
Some outliers can illuminate important patterns.
54
What is a common method to identify potential outliers?
Using boxplots
Boxplots visually represent data and highlight outliers.
55
What does it mean if a value is nominated as a potential outlier in a box plot?
It stands out from the rest of the data
This indicates that further investigation may be needed.
56
What is an outlier in the context of data distribution?
An outlier is a data point that lies significantly outside the range of the rest of the data set.
57
Why should outliers be studied?
Many outliers are not wrong; they're just different and can provide valuable insights.
58
What was the significance of the windiest day in February?
It was an outlier for both February and the entire year, leading to four days of subzero temperatures.
59
What event occurred on June 2, 2011, related to an outlier in the wind data?
A rare tornado struck Western Massachusetts.
60
What was the impact of Hurricane Irene on the wind data in August?
Hurricane Irene was an extreme outlier, with the eye passing over the Hopkins Forest.
61
What is the average wind speed recorded during Hurricane Irene?
2.53 mph.
62
What should you do if you find an outlier in your data analysis?
You should report summaries and analyses with and without the outlier for transparency.
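A small Python sketch of reporting summaries with and without a suspected outlier. The values are hypothetical, and `summarize` is an illustrative helper, not a standard function.

```python
import statistics

def summarize(values):
    """Mean, median, and count, so the two versions can be reported side by side."""
    return {"mean": statistics.mean(values),
            "median": statistics.median(values),
            "n": len(values)}

# Hypothetical daily averages (mph); 6.7 plays the role of a storm-day outlier.
speeds = [0.0, 0.2, 0.5, 0.9, 1.1, 1.4, 2.0, 2.6, 6.7]
outlier = 6.7
without = [v for v in speeds if v != outlier]

print("with outlier:   ", summarize(speeds))
print("without outlier:", summarize(without))
```

Here the mean shifts noticeably (about 1.71 versus about 1.09 mph) while the median barely moves (1.1 versus 1.0 mph), which is exactly the kind of difference a reader should be told about.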
63
True or False: You should ignore outliers in your data analysis.
False.
64
Fill in the blank: If you want to exclude an outlier, you must _______.
announce your decision and justify it
65
What are the two things you should never do with outliers?
  • Leave an outlier in place without comment
  • Omit an outlier from the analysis without comment
66
Why is a histogram often better than a boxplot for analyzing outliers?
A histogram provides more detail about how the outlier fits in with the rest of the data.
67
What is a timeplot?
A display of values against time, also known as a time series plot.
68
How are speeds computed in the context of wind speed data?
As the midrange: the average of the highest and lowest speeds seen during the day.
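A short Python sketch contrasting the midrange with the ordinary mean, which is why the word "average" needs context. The readings are hypothetical, not actual sensor output.

```python
import statistics

def midrange(readings):
    """Midrange: the average of the highest and lowest values."""
    return (max(readings) + min(readings)) / 2

# Hypothetical wind-speed readings (mph) from a single day.
readings = [0.0, 0.8, 1.5, 3.2, 2.1, 0.4]
print(midrange(readings))                      # 1.6
print(round(statistics.mean(readings), 2))     # 1.33
```

The midrange uses only the two extreme readings, so a single gust can move it a lot, while the mean uses every reading.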
69
What does the term 'average' imply in data analysis?
It can mean different things depending on the context.
70
What is the purpose of smoothing in timeplots?
To provide a clearer view of underlying trends by reducing point-to-point variation.
71
What are some methods to smooth timeplots?
Using computer algorithms or manual methods like sketching a smooth trace.
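One of the simplest computer smoothers is a centered moving average, which replaces each day's value with the mean of its neighbors. This Python sketch uses that technique (it is not the lowess method named elsewhere in the deck), and the window size and data are hypothetical choices for illustration.

```python
def moving_average(values, window=7):
    """Centered moving average. Near the ends, only the neighbors
    that actually exist are averaged, so the window shrinks there."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        neighborhood = values[lo:hi]
        smoothed.append(sum(neighborhood) / len(neighborhood))
    return smoothed

# Hypothetical daily average wind speeds (mph) in day order.
daily = [1.0, 3.0, 0.5, 2.5, 1.0, 4.0, 0.5]
print(moving_average(daily, window=3))
```

A wider window gives a smoother trace but blurs short-lived features such as a single stormy day.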
72
When do wind speeds typically become more variable and stronger?
During the late fall and winter months.
73
What pattern is observed in wind speeds during summer?
Wind speeds are relatively mild, starting around day 150 (beginning of June).
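To translate a day-of-year number on the timeplot's x-axis into a calendar date, a short Python sketch using the standard `datetime` module (the year 2011 matches the data discussed; the helper name is illustrative):

```python
from datetime import date, timedelta

def day_of_year_to_date(day, year=2011):
    """Convert a 1-based day-of-year number to a calendar date."""
    return date(year, 1, 1) + timedelta(days=day - 1)

print(day_of_year_to_date(150))  # 2011-05-30
print(day_of_year_to_date(152))  # 2011-06-01
```

So day 150 falls at the very end of May, just before June begins.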
74
What is the significance of plotting daily averages?
It helps to identify patterns over time without arbitrary divisions between months.
75
Fill in the blank: A timeplot reflects the pattern seen when plotting the wind speeds by _______.
month.
76
True or False: Timeplots are useful for looking for patterns in data measured over time.
True.
77
What can you observe by looking at wind speed values day by day?
You can identify patterns and variations without grouping data into arbitrary time periods.
78
What does the timeplot version of center analysis involve?
Thinking about how the values vary around the trend.
79
What is the benefit of using a computer for smoothing timeplots?
It can provide more accurate and consistent smoothing than manual methods.
80
What does the term 'daily average' refer to in the context of wind speed data?
The average wind speeds reported for each day.
81
What is indicated by the calm period observed in the timeplot?
Periods of relatively stable wind speeds without significant fluctuations.
82
What method is used to create a smooth trace in statistics?
Lowess
Lowess stands for locally weighted scatterplot smoothing, a technique used to create a smooth curve through a scatterplot.
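A minimal one-pass lowess sketch in Python: at each x value it fits a weighted straight line to the nearest points, using tricube weights, and keeps that line's value. It omits the robustness iterations of full lowess, and `frac` (the share of points in each neighborhood) is an assumed default.

```python
def lowess(x, y, frac=0.5):
    """One-pass LOWESS sketch (no robustness iterations).

    For each x[i], fit a weighted least-squares line to nearby points,
    with tricube weights that are 1 at x[i] and fall to 0 at the
    distance of the k-th nearest point, and keep the fit at x[i].
    """
    n = len(x)
    k = min(n, max(2, round(frac * n)))  # size of each local neighborhood
    fitted = []
    for xi in x:
        # Bandwidth: distance from xi to its k-th nearest x value.
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [(1 - (abs(xj - xi) / h) ** 3) ** 3 if abs(xj - xi) < h else 0.0
             for xj in x]
        sw = sum(w)
        xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
        ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (xi - xbar))
    return fitted

# Hypothetical zig-zag series: the smooth trace damps the day-to-day swings.
days = list(range(12))
speeds = [0.0, 2.0] * 6
print([round(v, 2) for v in lowess(days, speeds)])
```

For real analyses, statsmodels provides a full implementation with robustness iterations as `statsmodels.nonparametric.smoothers_lowess.lowess`.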
83
What is the purpose of a smooth trace in data visualization?
To highlight long-term patterns and help us see them through the local, day-to-day variation.
84
In which types of publications are timeplots often drawn with all points connected?
Financial publications.
85
What does a smooth trace help you see in a time series data?
Patterns.
86
Fill in the blank: The daily average wind speed values can be visualized with a _______.
Smooth trace.
87
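True or False: The daily fluctuations in wind speed are small enough that connecting the points in a timeplot would show the pattern clearly.
False. The day-to-day fluctuations are so large that connecting the points would not help much.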
88
What is the significance of connecting points in a timeplot?
It can illustrate the overall trend but may obscure daily fluctuations.
89
What is shown on the x-axis of the daily average wind speed plot?
Day of Year.
90
What is shown on the y-axis of the daily average wind speed plot?
Average Wind Speed (mph).
91
What is a common challenge when summarizing skewed data?
It can be hard to summarize them simply with a center and spread.
Skewed data often make it difficult to determine the central tendency and variability.
92
What statistical technique can improve the symmetry of data?
Re-expressing the data.
This technique can help with the analysis and interpretation of skewed distributions.
93
In what year did large companies' chief executive officers (CEOs) make, on average, about 42 times what workers earned?
1980.
This statistic reflects the growing income disparity between CEOs and average workers.
94
What should one resist when interpreting statistical models?
The temptation to think beyond the data.
It is important to base conclusions on the data rather than assumptions or trends.
95
True or False: No stock has ever increased in value indefinitely.
True.
This statement highlights the inherent risk and volatility in stock investments.
96
What is a key factor that makes predicting economic and social trends challenging?
Their complex nature and the influence of various unpredictable factors.
Economic indicators like unemployment rates are influenced by numerous social and psychological factors.
97
Fill in the blank: Statistical models often tempt users to think __________ the data.
beyond.
This can lead to misinterpretations and overreaching conclusions.
98
What is a common practice for predicting future weather patterns?
Using recent past values with less weight on older values.
This method helps in creating more accurate forecasts based on trends.
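One standard way to give recent values more weight than older ones is simple exponential smoothing. This Python sketch names that technique plainly; the smoothing constant `alpha = 0.3` and the data are hypothetical choices, not something the cards specify.

```python
def exponential_smoothing(values, alpha=0.3):
    """Simple exponential smoothing: forecast the next value as a weighted
    average of past values. The newest value gets weight alpha, each step
    further back multiplies the weight by (1 - alpha), and the first
    observation (the starting point) keeps the leftover weight."""
    forecast = values[0]  # initialize with the first observation
    for v in values[1:]:
        forecast = alpha * v + (1 - alpha) * forecast
    return forecast

# Hypothetical daily average wind speeds (mph), oldest first.
recent = [0.8, 1.1, 2.9, 3.4, 2.6]
print(round(exponential_smoothing(recent), 2))
```

A larger `alpha` reacts faster to recent changes; a smaller one smooths more heavily.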
99
What is a common misconception about stock price trends?
That a rising stock will continue to go up indefinitely.
This misconception can lead to poor investment decisions.
100
What aspect of data is often difficult to assess when extreme values are present?
Whether the most extreme values are outliers or just part of the data distribution.
This assessment is crucial for accurate data analysis.
101
What was the multiple of CEO compensation to the average worker by 2008?
About 344 times what workers earned
102
What type of graph is used to represent the distribution of CEO compensation in 2010?
Histogram and boxplot
103
What percentage of CEOs received compensation between $0 and $2,500,000?
About a quarter
104
Over what range are the few highest CEO salaries spread in the histogram?
$35,000,000 to $150,000,000
105
The distribution of CEO compensation is skewed to which direction?
Right
106
What are the mean and median total compensation values for CEOs?
Mean: $8,035,770, Median: $4,780,000
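In a right-skewed distribution, a few very large values pull the mean well above the median, which is what the mean and median of CEO pay show. A short Python sketch with hypothetical compensation values (in $ millions, not the real 2010 data):

```python
import statistics

# Hypothetical CEO compensation in $ millions: most modest, a few enormous.
pay = [0.9, 1.8, 2.4, 3.1, 4.0, 4.8, 5.5, 7.2, 9.6, 14.0, 38.0, 120.0]

mean_pay = statistics.mean(pay)
median_pay = statistics.median(pay)
print(round(mean_pay, 2), round(median_pay, 2))
```

The two largest values alone account for most of the gap between the mean and the median.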
107
What is one method to transform skewed data into a more symmetric distribution?
Taking the square root or logarithm of each pay value
108
Fill in the blank: The histogram for CEO compensation leaves much of the area blank because the highest salaries are spread from about $_______ to $_______
35,000,000; 150,000,000
109
True or False: The histogram indicates that the majority of CEOs receive very high compensation.
False
110
What does the boxplot indicate about the compensation of some CEOs?
Some received extraordinarily high pay
111
What is cotinine?
Cotinine is a metabolite of nicotine found in the blood, indicating exposure to tobacco smoke.
It provides a direct measurement of nicotine exposure.
112
What effect does re-expressing data have on skewed distributions?
Re-expressing data can help make skewed data more symmetric.
Common methods include applying logarithms or square roots.
113
What is the purpose of taking logarithms of data?
Taking logarithms can improve the symmetry of distributions and alleviate problems in comparing groups.
Logs are especially useful for right-skewed data that are all positive; the logarithm is undefined for zero or negative values.
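A small Python sketch of how a log re-expression can make right-skewed data more symmetric. The pay values are hypothetical, and `reach_ratio` is an ad hoc illustrative measure (distance from the median up to the maximum divided by distance from the median down to the minimum), not a standard statistic.

```python
import math
import statistics

# Hypothetical CEO compensation in $ millions (right-skewed), NOT real data.
pay = [0.9, 1.8, 2.4, 3.1, 4.0, 4.8, 5.5, 7.2, 9.6, 14.0, 38.0, 120.0]
log_pay = [math.log10(p * 1_000_000) for p in pay]  # log10 of dollars

def reach_ratio(values):
    """Ad hoc asymmetry check: (max - median) / (median - min).
    About 1 for symmetric data; much larger than 1 for right skew."""
    med = statistics.median(values)
    return (max(values) - med) / (med - min(values))

print("raw scale:", round(reach_ratio(pay), 1))
print("log scale:", round(reach_ratio(log_pay), 1))
```

On the raw scale the top value reaches about 27 times farther above the median than the minimum sits below it; after taking logs the two sides are far closer to balanced.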
114
In what scenarios might you use log transformation?
Log transformation is used for variables that are skewed to the right or that have very different spreads across groups.
It is beneficial when comparing groups with differing variances.
115
What does a boxplot indicate about outliers?
A boxplot flags as possible outliers any values more than 1.5 IQRs below Q1 or above Q3.
Outliers are values that differ markedly from the majority of the data.
116
How can the logarithm of a salary be interpreted?
The base-10 logarithm of a salary indicates its order of magnitude: each increase of 1 in the log represents a tenfold increase in salary.
For example, a log10 value of 5 corresponds to a salary of $100,000 (10^5).
117
True or False: Re-expressing data is only useful for normally distributed datasets.
False.
Re-expressing data is particularly useful for skewed datasets.
118
Fill in the blank: The logarithm of a number is roughly one less than the number of _______ needed to write the number.
digits
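A quick Python check of the digit rule for base-10 logarithms: for any whole number n of at least 1, log10(n) is at least one less than its digit count and below the digit count itself. The salaries are hypothetical, and `digit_count` is an illustrative helper.

```python
import math

def digit_count(n):
    """Number of digits needed to write a positive integer n."""
    return len(str(n))

for salary in [9_500, 100_000, 4_780_000, 150_000_000]:
    # log10(salary) is >= digit_count - 1 and < digit_count.
    print(f"{salary:>11,}  digits={digit_count(salary)}  "
          f"log10={math.log10(salary):.2f}")
```

For instance, a 7-digit salary such as $4,780,000 has a log of about 6.68.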
119
What is the main benefit of re-expressing cotinine data from smokers and nonsmokers?
It allows for better comparison of cotinine levels across different groups.
This is crucial for understanding the effects of tobacco exposure.
120
What happens to the distribution of CEO compensation when logarithms are applied?
The distribution becomes more symmetric and easier to analyze for outliers.
This enhances clarity in understanding compensation differences.
121
What is the significance of the median log compensation value?
The median log compensation value provides a typical value for CEO salaries, allowing for comparative analysis.
It reflects the central tendency in a transformed scale.
122
What is the purpose of the boxplot icon?
Selecting it in a STAT PLOT tells the calculator to draw a boxplot of the data
123
How should you set up STAT PLOT's Plot1?
To make a boxplot of the L1 data
124
What happens when you use ZoomStat with both plots turned on?
The display shows the boxplots in parallel
125
What should you set up Plot2 to display?
A boxplot of the L2 data
126
To compare groups with boxplots, what is required?
Enter the data in lists
127
What is the first step to make a boxplot?
Set up a STAT PLOT using the boxplot icon
128
What is the function of ZoomStat in this context?
To display the boxplot for L1
129
What can you use TRACE to see?
The statistics in the 5-number summary
130
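What settings must you specify in the boxplot STAT PLOT, including how outliers are displayed?
Xlist:L1 and Freq:1, plus the Mark the calculator uses to display any outliers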
131
What calculator model is referenced in the text?
TI-83/84 PLUS