Econ 2740 Flashcards
Population
: The group of all items of interest to a statistics practitioner (everyone we are interested in learning about). Frequently very large, and can be infinitely large.
a. Examples include: all TV viewers, Canadians, students, and all human beings
b. Typically we do not observe the population because it is too difficult
Parameter
A descriptive measure of the population. In most applications of inferential statistics, the parameter represents the information we need.
a. Ex. Percent of Canadian voters who plan to vote for NDP
Sample:
A set of data drawn from the studied population (a smaller subset of the population)
a. A sample is used because it is cheaper and easier to collect data from than the whole population
Statistic
A descriptive measure of a sample. Statistics are used to make inferences about parameters
a. Ex. Percent of the 1,200 voters polled who plan to vote for NDP
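To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of 100,000 voters, the 20% NDP share, and the seed are all invented for illustration; in practice the parameter is usually unknown, which is the whole reason for sampling.

```python
import random

# Hypothetical population: 1 = plans to vote NDP, 0 = otherwise.
# The 20% share is an invented value, not real polling data.
population = [1] * 20_000 + [0] * 80_000

# Parameter: a descriptive measure of the whole population.
parameter = sum(population) / len(population)

# Statistic: the same measure computed on a random sample of 1,200 voters.
rng = random.Random(2740)  # fixed seed so the run is repeatable
sample = rng.sample(population, 1_200)
statistic = sum(sample) / len(sample)

print(f"parameter (population share): {parameter:.3f}")
print(f"statistic (sample share):     {statistic:.3f}")
```

The statistic will usually be close to the parameter but not equal to it; that gap is what inference has to account for.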
Confidence Level:
The proportion of times that an estimating procedure will lead to correct conclusions
Significance Level
The proportion of times that an estimating procedure will lead to false conclusions
Descriptive statistics
Just describe the sample, without worrying about the population. Includes graphical and numerical methods
Interval Data:
Also known as quantitative or numeric data. They are numbers that have meaning. Examples: age, years of schooling, wage, GDP, foul shot percentage, and exchange rate.
Ordinal Data:
Numbers denote ordered categories and only the order matters. Ex. Highest degree completed: 1 (none), 2 (elementary), 3 (high school), 4 (university).
Nominal Data:
Also known as categorical or qualitative data. Numeric values just denote a name or category. They have no meaning as a number. Examples: sex coded 0 (male), 1 (female), or postal code.
Frequency
: Number of observations falling into a group or category
Relative Frequency
: Proportion of observations falling into a group or category
Cumulative Relative Frequency
: Proportion of observations falling into a group and all previous groups
• Applies only to ordered groups
• Applies to ordinal data, but not to nominal data
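The three frequency measures above can be sketched in a few lines of Python. The ten coded responses below are made-up ordinal data (highest degree completed, coded 1 to 4), used only to show the arithmetic.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical ordinal data: highest degree completed, coded 1-4.
degrees = [1, 2, 2, 3, 3, 3, 3, 4, 4, 4]

counts = Counter(degrees)
categories = sorted(counts)                        # groups in their natural order
frequency = [counts[c] for c in categories]        # observations per group
relative = [f / len(degrees) for f in frequency]   # share of observations per group
cumulative = list(accumulate(relative))            # running total of the shares

for c, f, r, cr in zip(categories, frequency, relative, cumulative):
    print(f"category {c}: frequency={f}, relative={r:.2f}, cumulative={cr:.2f}")
```

The running total only makes sense because the categories can be sorted; with nominal codes the order of accumulation would be arbitrary.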
Histograms
A graphical display of data using bars of different heights
Reverse Causality
When changes in the dependent variable (Y-Variable) cause changes in the independent variable (X-Variable)
• Put another way, the causation goes in the opposite direction to what is expected
• You see a relationship in the scatter diagram, but the interpretation is opposite to what you would think.
Bottom Line:
Because of indirect and reverse causation, care is needed when interpreting relationships between variables
Direct Causation
When changes in the independent variable (X-Variable) cause changes in the dependent variable (Y-Variable).
Indirect Causation:
When changes in the X and Y-Variables are both caused by a third variable (say Z).
• You will observe a relationship in the scatter diagram even if X does not impact Y and Y does not impact X.
• Be careful interpreting such a relationship
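A small simulation can illustrate indirect causation. This is only a sketch under assumed settings: Z is drawn as standard normal noise, and X and Y are each built from Z plus independent noise, so neither affects the other. The seed, sample size, and noise levels are arbitrary choices, and `correlation` is a helper written here for the example.

```python
import math
import random

def correlation(a, b):
    """Sample coefficient of correlation: covariance / (s_a * s_b)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)
    s_a = math.sqrt(sum((p - mean_a) ** 2 for p in a) / (n - 1))
    s_b = math.sqrt(sum((q - mean_b) ** 2 for q in b) / (n - 1))
    return cov / (s_a * s_b)

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 2_000
z = [rng.gauss(0, 1) for _ in range(n)]      # the third variable Z
x = [zi + rng.gauss(0, 0.5) for zi in z]     # X depends only on Z
y = [zi + rng.gauss(0, 0.5) for zi in z]     # Y depends only on Z

r_xy = correlation(x, y)
print(f"correlation between X and Y: {r_xy:.2f}")
```

X and Y come out strongly positively correlated even though the code never lets one influence the other, which is exactly the trap described above.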
Modality
A unimodal histogram is one with a single peak, while a bimodal histogram is one with two peaks.
Measures of Variability
: Measures of central location fail to tell the whole story about a distribution. Measures of variability describe how much the observations are spread out around the mean value.
Sample Standard Deviation:
- To obtain the sample variance, we square the distance of each observation from the mean
- Now we undo the squaring by taking the square root
- The standard deviation is simply the square root of the variance, thus: s = sqrt(s^2), where s^2 is the sample variance
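The steps above can be checked numerically. The sketch below uses eight made-up observations and assumes the usual sample convention of dividing the sum of squared deviations by n - 1; the result is compared against Python's built-in `statistics.stdev`.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

n = len(data)
mean = sum(data) / n

# Sample variance: squared distances from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation: undo the squaring by taking the square root.
std_dev = math.sqrt(variance)

print(f"mean={mean}, variance={variance:.4f}, std_dev={std_dev:.4f}")
print(f"statistics.stdev gives {statistics.stdev(data):.4f}")
```

Because of the square root, the standard deviation is in the same units as the data, which is why it is usually easier to interpret than the variance.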
Covariance (Generally Speaking):
- When two variables tend to move in the same direction (both increase or both decrease), the covariance will be a large positive number
- It is extremely rare that two variables always move in the same direction
- When two variables tend to move in opposite directions, the covariance is a large negative number
- When there is no particular pattern, the covariance is a small number
- However, it is often difficult to determine whether a particular covariance is large or small
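Sample Coefficient of Correlation:
The coefficient of correlation is defined as the covariance divided by the product of the standard deviations of the two variables. It always lies between -1 and +1, which makes it easier to judge than the covariance.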
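As a worked illustration, the sketch below computes the sample covariance and the coefficient of correlation for two short made-up series. It assumes the n - 1 divisor for both the covariance and the variances; the variable names are chosen for this example only.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical values of the first variable
y = [2, 4, 5, 4, 5]  # hypothetical values of the second variable

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: average product of paired deviations (n - 1 divisor).
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Sample standard deviations of each variable.
s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

# Coefficient of correlation: covariance scaled by both standard deviations.
r = cov_xy / (s_x * s_y)

print(f"covariance={cov_xy:.3f}, r={r:.3f}")
```

A covariance of 1.5 is hard to call large or small on its own, while the corresponding r of about 0.77 can be read directly as a fairly strong positive linear relationship.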
Sampling Errors
Differences between population and sample that occur because of the particular observations that happened to be picked for our sample
Nonsampling Errors
Differences between population and sample that occur due to a flaw in the sampling method
Stratified Random Sampling
- Divide the population into two (or more) mutually exclusive groups (strata).
- Randomly sample from each stratum
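The two steps above might look like this in Python. The strata names, their sizes, and the choice to draw 10% from each stratum are assumptions made for the example; other allocation rules are possible.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Step 1: a hypothetical population divided into mutually exclusive strata.
strata = {
    "undergraduate": [f"U{i}" for i in range(800)],
    "graduate": [f"G{i}" for i in range(200)],
}

# Step 2: draw a simple random sample from each stratum
# (assumed rule: one tenth of each stratum).
stratified_sample = {
    name: rng.sample(members, len(members) // 10)
    for name, members in strata.items()
}

for name, drawn in stratified_sample.items():
    print(f"{name}: sampled {len(drawn)} of {len(strata[name])}")
```

Every stratum is guaranteed to appear in the sample, which a single simple random sample of 100 from the pooled population would not guarantee.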
Cluster Sampling
Random sample of groups or clusters of observations.
• E.g. Draw townships, postal codes, or city blocks at random then survey the residents
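For contrast with stratified sampling, here is a rough Python sketch of cluster sampling. The 20 city blocks, the 5 residents per block, and the decision to draw 4 blocks are all invented for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population organised into clusters (city blocks).
blocks = {
    f"block_{b}": [f"resident_{b}_{i}" for i in range(5)]
    for b in range(20)
}

# Draw clusters at random, then survey every resident in the chosen clusters.
chosen_blocks = rng.sample(sorted(blocks), 4)
cluster_sample = [resident for b in chosen_blocks for resident in blocks[b]]

print(f"chosen blocks: {chosen_blocks}")
print(f"residents surveyed: {len(cluster_sample)}")
```

Here the random draw is over groups rather than individuals, so only some groups are represented, whereas stratified sampling takes some individuals from every group.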
- Classical Approach
Based on equally likely events
- Relative Frequency
Based on experimentation or historical data
- Subjective Approach
Based on (subjective) judgment
- Bayesian Approach
Based on combination of subjective assessment with relative frequency
Marginal Probabilities
Computed by adding across rows and down columns of a table of joint probabilities; that is, they are calculated in the margins of the table
Conditional Probability
Used to determine how two events are related. That is, we can determine the probability of one event given the occurrence of another related event.
Written as P(A|B)
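Marginal and conditional probabilities can both be read off a table of joint probabilities. The sketch below uses an invented 2x2 table (events A1/A2 in rows, B1/B2 in columns); `marginal` is a helper defined for this example, not a library function.

```python
# Hypothetical joint probabilities P(A_i and B_j); the four values sum to 1.
joint = {
    ("A1", "B1"): 0.10, ("A1", "B2"): 0.30,
    ("A2", "B1"): 0.20, ("A2", "B2"): 0.40,
}

def marginal(table, index, label):
    """Add up the joint probabilities in one row (index 0) or column (index 1)."""
    return sum(p for events, p in table.items() if events[index] == label)

p_a1 = marginal(joint, 0, "A1")  # adding across the A1 row
p_b1 = marginal(joint, 1, "B1")  # adding down the B1 column

# Conditional probability: P(A1 | B1) = P(A1 and B1) / P(B1).
p_a1_given_b1 = joint[("A1", "B1")] / p_b1

print(f"P(A1)={p_a1:.2f}, P(B1)={p_b1:.2f}, P(A1|B1)={p_a1_given_b1:.3f}")
```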
Independence
• One of the objectives of calculating conditional probability is to determine whether two events are related
• In particular, we would like to know whether they are independent, that is, if the probability of one event is not affected by the occurrence of the other event.
- They are independent if:
P(A|B) = P(A) or P(B|A) = P(B)
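One way to test independence numerically is to compute P(A|B) and compare it with P(A). The helper name `is_independent` and the probabilities passed to it are hypothetical; a small tolerance is used because floating-point division is not exact.

```python
import math

def is_independent(p_a_and_b, p_a, p_b, tol=1e-9):
    """Return True when P(A|B) equals P(A), within a small tolerance."""
    p_a_given_b = p_a_and_b / p_b
    return math.isclose(p_a_given_b, p_a, abs_tol=tol)

# Hypothetical case 1: P(A and B) = 0.12, P(A) = 0.4, P(B) = 0.3,
# so P(A|B) = 0.4, the same as P(A).
print(is_independent(0.12, 0.4, 0.3))

# Hypothetical case 2: P(A and B) = 0.10, so P(A|B) is about 0.333,
# which differs from P(A) = 0.4.
print(is_independent(0.10, 0.4, 0.3))
```

In the first case, knowing that B occurred does not change the probability of A; in the second it lowers it, so the events are related.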
Multiplication Rule
Used to calculate the joint probability of two events. It is based on the formula for conditional probability defined earlier:
P(A|B) = P(A and B) / P(B), so P(A and B) = P(A|B) × P(B)
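Rearranging the conditional probability formula gives the joint probability as P(A and B) = P(A|B) × P(B). A short worked example, with an invented setup: a class of 10 students includes 3 women, and two students are chosen at random without replacement. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical setup: 10 students, 3 of them women, two chosen without replacement.
p_b = Fraction(3, 10)         # P(B): the first student chosen is a woman
p_a_given_b = Fraction(2, 9)  # P(A|B): the second is a woman, given the first was

# Multiplication rule: P(A and B) = P(A|B) * P(B).
p_a_and_b = p_a_given_b * p_b

print(f"P(both chosen students are women) = {p_a_and_b}")
```

Dividing the joint probability by P(B) recovers P(A|B), which shows that the multiplication rule is the conditional probability formula read in the other direction.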