S2016Q5 Flashcards
What is designing in the language of statistics?
Setting up a hypothesis or question and deciding how to collect data
What is describing in the language of statistics? (descriptive statistics)
Summarizing data with numbers and graphs.
What is inference in the language of statistics? (inferential statistics)
Decisions and predictions based on the data.
- Estimation
- Hypothesis tests
- Confidence intervals
The typical statistical model assumes what? (model = assumptions)
- Independence of observations
- The same underlying distribution for all observations
- Some sort of systematic structure
(but this is not always the case)
What is the significance level? (rule of thumb)
5% (α = 0.05).
What is random sampling and why is it important?
Giving each subject in the population the same chance of being in the sample, so that the sample is a good reflection of the population.
What does inferential statistics refer to?
Methods of making decisions or predictions about a population, based on data obtained from a sample of that population.
What is the difference between a parameter and a statistic?
Parameter: a numerical summary of the population
Statistic: a numerical summary of the sample taken from the population
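The distinction can be sketched in a few lines of Python. The population below is made up purely for illustration (hypothetical ages), and the sample size of 50 is an arbitrary assumption; the point is only that the parameter is computed from the whole population, while the statistic is computed from a random sample of it.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 1,000 people (made-up values)
population = [random.randint(18, 90) for _ in range(1000)]

# Parameter: a numerical summary of the whole population
population_mean = statistics.mean(population)

# Simple random sample: every subject has the same chance of being selected
sample = random.sample(population, 50)

# Statistic: a numerical summary of the sample
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, and the statistic is used to estimate it.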
When is a variable categorical and when is it quantitative?
- Categorical: if each observation belongs to one of a set of categories, such as “Yes” and “No”
- Ordered (“ordinal”): e.g. exam grades
- Unordered (“nominal”): e.g. male/female, type of business, ZIP codes
- Quantitative: if observations take numerical values that represent different magnitudes of the variable (e.g. age or annual income, but NOT area codes).
What is unordered (nominal) data and what type?
Categorical, e.g. male/female, type of business, ZIP code etc.
What is ordered (ordinal) data and what type?
Categorical, e.g. grades, Likert scales etc.
What is a good graph?
Check colors: www.colorbrewer2.org
Remember to use different line types, colors, and plotting symbols.
Remember that it might be printed in black and white.
What is a discrete variable and what type?
Numerical: takes values in a set of separate numbers (typically integers)
E.g.: 0, 1, 2, 3, … (number of employees, number of companies etc.)
What is a continuous variable?
Numerical: May take any value in an interval
E.g.: income, sales etc.
When is a variable discrete and when is it continuous?
- Discrete: it has separate possible values, such as the integers 0, 1, 2, …, for a variable expressed as “the number of…” (number of companies in a region, employees in a company etc.)
- Continuous: all possible values in an interval
What is the median?
The middle observation
E.g.: 1,1,1,2,2,2,3,3,4,5,6,7,7,8,8
Median = 3
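As a quick check of the example above, a minimal Python sketch: with 15 ordered observations, the middle one is the 8th. The variable names are illustrative only.

```python
import statistics

observations = [1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 7, 7, 8, 8]

ordered = sorted(observations)
n = len(ordered)          # 15 observations
middle = ordered[n // 2]  # index 7, i.e. the 8th ordered observation

print(middle)                           # 3
print(statistics.median(observations))  # 3
```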
When is it called modal category and when is it called mode?
Modal category and mode both refer to the most frequent value in a data set.
Modal category ⇒ the category with the highest frequency
Mode ⇒ the numerical value (quantitative) that occurs most frequently
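A small Python sketch of both cases, using made-up data (hypothetical survey answers and employee counts): `Counter` finds the modal category of a categorical variable, and `statistics.mode` finds the mode of a quantitative one.

```python
from collections import Counter
import statistics

# Categorical variable: hypothetical survey answers
answers = ["Yes", "No", "Yes", "Yes", "No"]
modal_category, count = Counter(answers).most_common(1)[0]
print(modal_category, count)  # Yes 3

# Quantitative variable: hypothetical number of employees per company
employees = [2, 3, 3, 5, 3, 8]
mode_value = statistics.mode(employees)
print(mode_value)  # 3
```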
What are the primary graphical displays for summarizing a categorical variable?
- Pie chart
- Bar graph: the bar graph is usually preferred as it is easier to distinguish between two categories of approximately the same size
- When the bars are ordered by frequency, it is called a Pareto chart (after Vilfredo Pareto)
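
The ordering behind a Pareto chart can be sketched as a text-only bar graph in Python. The business types below are made up for illustration; `Counter.most_common()` returns the categories from highest to lowest frequency, which is the bar order of a Pareto chart.

```python
from collections import Counter

# Hypothetical categorical data: type of business
businesses = ["retail", "IT", "retail", "finance", "IT", "retail"]

# Categories sorted from highest to lowest frequency
pareto_order = Counter(businesses).most_common()

# Text-only bar graph, one bar per category
for category, frequency in pareto_order:
    print(f"{category:<8} {'#' * frequency} ({frequency})")
```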

What are the primary graphical displays for summarizing quantitative variables?
- Dot plot: shows a dot for each observation, placed just above the value on the number line for that observation. Can be useful for small data sets (<50 observations)
- Stem-and-leaf plot: Can be useful for small data sets (<50 observations)
- Histogram: a graph with bars showing the frequencies of a quantitative variable, whereas the term bar graph is used for graphs of a categorical variable.
- Gives more flexibility in defining intervals and is better for large data sets (>50 observations)
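
To illustrate how a histogram groups observations into intervals, here is a minimal Python sketch. The ages, the interval width of 10, and the starting edge of 20 are all assumptions chosen for the example; picking the intervals is a judgment call.

```python
# Hypothetical quantitative data: ages
ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 55]

bin_width = 10  # assumed interval width
start = 20      # assumed lower edge of the first interval

# Count how many observations fall in each interval [lower, lower + bin_width)
counts = {}
for age in ages:
    lower = start + ((age - start) // bin_width) * bin_width
    counts[lower] = counts.get(lower, 0) + 1

# Text-only histogram, one bar per interval
for lower in sorted(counts):
    print(f"[{lower}, {lower + bin_width}): {'#' * counts[lower]}")
```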

What is the “mode” in a frequency table or histogram?
The value (or interval) at the highest point.
What do unimodal and bimodal refer to?
Whether the histogram or frequency table has a single mound or two distinct mounds.

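What do symmetric and skewed shapes refer to?
- Symmetric if the side of the distribution below the central value is a mirror image of the side above it
- Skewed to the left if the left tail is longer than the right
- The mean is typically smaller than the median
- Skewed to the right if the right tail is longer than the left
- The mean is typically larger than the median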
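
A small Python check of the left-skewed case, using made-up data with one low value that forms a long left tail: the mean is pulled toward the tail, while the median is not.

```python
import statistics

# Hypothetical left-skewed data: most values are high, one is far to the left
left_skewed = [1, 7, 8, 8, 9, 9, 10]

mean = statistics.mean(left_skewed)      # 52 / 7, about 7.43
median = statistics.median(left_skewed)  # 8

print(f"mean = {mean:.2f}, median = {median}")
print(mean < median)  # True
```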

What is the “mean” of a distribution of a quantitative variable?
The sum of the observations divided by the number of observations.
(The average / The balance point of the distribution)
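
A minimal Python sketch of the definition, with made-up numbers. It also checks the "balance point" idea: the deviations from the mean sum to zero.

```python
# Hypothetical observations
observations = [2, 4, 9]

# Mean: sum of the observations divided by the number of observations
mean = sum(observations) / len(observations)
print(mean)  # 5.0

# Balance point: deviations below and above the mean cancel out
deviations = [x - mean for x in observations]
print(deviations)       # [-3.0, -1.0, 4.0]
print(sum(deviations))  # 0.0
```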

What is the median?
The median is the middle value of the observations when the observations are ordered from smallest to the largest.
(If you have 20 observations, there is no single middle value, so the median is the average of the 10th and 11th ordered observations.)
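
The even-sized case can be sketched in Python as follows. The 20 observations are made up (the integers 1 to 20) so that the 10th and 11th ordered values are easy to see.

```python
import statistics

# Hypothetical data: 20 observations
observations = list(range(1, 21))

ordered = sorted(observations)
tenth = ordered[9]      # 10th ordered observation (index 9)
eleventh = ordered[10]  # 11th ordered observation (index 10)

median = (tenth + eleventh) / 2
print(median)                           # 10.5
print(statistics.median(observations))  # 10.5
```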
