Applied Quantitative Methods Flashcards by Magnus Kjær

Variable

A characteristic or measurement that can be determined for each member of the population; usually denoted by a capital letter such as X or Y.

Numerical variable

Takes on numerical values.

Continuous variable

Measured rather than counted; it can take any value within an interval. Examples: distance, height, GDP in kr., value of cars sold in kr.

Categorical variable

Also known as qualitative data: the observations fall into categories (smoking vs. non-smoking, vote yes/no). Any numbers attached to the categories serve purely as labels (Ronaldo no. 7, Christian Eriksen no. 10).

Population

Collection of persons, things or objects under study.

Sampling

Selecting a subset (or portion) of the population to gain information about the population.

Sample

The data resulting from sampling a population.

Statistic

Number that represents a property of the sample (e.g., sample mean, sample variance, etc.)

Parameter

Numerical characteristic of the whole population (e.g., population mean, population variance, etc.)

Simple Random Sample

A sample of n objects chosen from a population of N objects by a process in which each member of the population has the same probability of being selected (and every possible sample of size n is equally likely).
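A minimal sketch of drawing a simple random sample in Python, assuming the population can be listed as member IDs 1..N (the values of `N`, `n` and the number of repetitions are made up for illustration). `random.sample` draws without replacement with every member equally likely, and the repeated draws check that one member's inclusion frequency is close to n/N.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

N = 100                             # hypothetical population size
n = 10                              # sample size
population = list(range(1, N + 1))  # member IDs 1..N

# One simple random sample: n distinct members, drawn without replacement
sample = random.sample(population, n)
print(sorted(sample))

# Repeat many times and count how often member 1 is selected;
# under simple random sampling its inclusion probability is n / N.
reps = 20_000
hits = sum(1 in random.sample(population, n) for _ in range(reps))
inclusion_freq = hits / reps
print(f"member 1 selected in {inclusion_freq:.3f} of samples (theory: {n / N})")
```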

Sampling Distributions

The population parameter (e.g., mean µ or variance σ²) is a fixed (but unknown) number.
But each sample drawn from a population has its own value of the mean and variance. If you draw many samples and calculate the mean (and variance) of each, the sample means (and variances) vary from sample to sample and can be treated as a random variable with a probability distribution: the sampling distribution.
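A simulation sketch of a sampling distribution, assuming a made-up finite population of 10,000 normally distributed heights. The population mean is a single fixed number, while the means of repeated samples vary; their average is close to µ and their spread is close to σ/√n (the finite-population correction is negligible here because n is tiny relative to N).

```python
import random
import statistics

random.seed(2)  # reproducible illustration

# Hypothetical finite population of N = 10,000 heights in cm
population = [random.gauss(170, 10) for _ in range(10_000)]
mu = statistics.fmean(population)       # population mean: a fixed parameter
sigma = statistics.pstdev(population)   # population standard deviation

n = 25        # size of each sample
reps = 2_000  # number of repeated samples

# Draw many simple random samples and record the mean of each one
sample_means = [statistics.fmean(random.sample(population, n)) for _ in range(reps)]

# The sample means vary from sample to sample; their spread is close to sigma / sqrt(n)
print(f"population mean      : {mu:.2f}")
print(f"mean of sample means : {statistics.fmean(sample_means):.2f}")
print(f"sd of sample means   : {statistics.stdev(sample_means):.2f}")
print(f"sigma / sqrt(n)      : {sigma / n ** 0.5:.2f}")
```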

Law of large numbers

States that, given a random sample of size n from a population, the sample mean X̄ approaches the population mean µ as the sample size n becomes large.
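A small illustration of the law of large numbers, assuming a fair six-sided die as the population (its mean is (1 + 2 + ... + 6)/6 = 3.5). The running sample mean drifts toward 3.5 as n grows.

```python
import random

random.seed(3)  # reproducible illustration

# Fair die: population mean is (1 + 2 + ... + 6) / 6 = 3.5
rolls = [random.randint(1, 6) for _ in range(100_000)]

# Sample mean of the first n rolls for increasing n
for n in (10, 100, 1_000, 10_000, 100_000):
    sample_mean = sum(rolls[:n]) / n
    print(f"n = {n:>7}: sample mean = {sample_mean:.4f}")
```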

Central Limit Theorem

States that the mean of a random sample drawn from a population with any probability distribution (with finite variance σ²) will be approximately normally distributed, with mean µ and variance σ²/n, given a large enough sample size n.
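A simulation sketch of the Central Limit Theorem. The population (exponential with mean 1, so σ = 1), the sample size and the number of repetitions are assumptions chosen for illustration. Although the population is strongly skewed, the standardized sample means (X̄ − µ)/(σ/√n) behave roughly like a standard normal variable: about 95% fall within ±1.96.

```python
import random
import statistics
from statistics import NormalDist

random.seed(4)  # reproducible illustration

# Hypothetical, clearly non-normal population: exponential with mean 1 (so sigma = 1)
mu, sigma = 1.0, 1.0
n = 50          # sample size
reps = 20_000   # number of samples

# Mean of each of the reps samples of size n
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

# Standardize each sample mean: (x_bar - mu) / (sigma / sqrt(n))
z = [(m - mu) / (sigma / n ** 0.5) for m in means]

# If the normal approximation is good, about 95% of z-values fall within ±1.96
z_crit = NormalDist().inv_cdf(0.975)
coverage = sum(abs(v) <= z_crit for v in z) / reps
print(f"share within ±{z_crit:.2f}: {coverage:.3f} (normal approximation: 0.950)")
```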

Acceptance Interval

An interval within which the sample mean has a high probability of falling, given that we know the population mean and variance. If the sample mean falls within that interval, then we can accept the conclusion that the random sample came from the population with the known mean and variance.
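A minimal sketch of a 95% acceptance interval µ ± z·σ/√n, assuming the sample mean is (approximately) normally distributed, e.g. because the population is normal or n is large. The helper name `acceptance_interval` and the numbers are hypothetical.

```python
from statistics import NormalDist

def acceptance_interval(mu, sigma, n, confidence=0.95):
    """Interval mu ± z * sigma / sqrt(n) that contains the sample mean with the
    given probability, assuming the sample mean is (approximately) normal."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z * sigma / n ** 0.5
    return mu - half_width, mu + half_width

# Hypothetical known population: mu = 100, sigma = 15; sample of n = 36
low, high = acceptance_interval(100, 15, 36)
x_bar = 103  # hypothetical observed sample mean
print(f"acceptance interval: ({low:.2f}, {high:.2f})")
print("accept" if low <= x_bar <= high else "reject")
```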

Distribution of sample proportion

Assume we are dealing with a qualitative (categorical) variable.
For example, we investigate a characteristic (e.g. smoker/non-smoker) and record 1 if an individual has this characteristic and 0 otherwise. The (unknown) proportion of ones in the population is denoted P. We have a sample of 0 and 1 values, and the sample proportion p̂ is the share of ones in the sample.
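A minimal sketch with made-up 0/1 data. The sample proportion p̂ is simply the mean of the 0/1 values; it has mean P and variance P(1 − P)/n, so its standard error can be estimated by plugging p̂ in for the unknown P.

```python
import math

# Hypothetical sample: 1 = smoker, 0 = non-smoker
data = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0]

n = len(data)
p_hat = sum(data) / n                    # sample proportion = mean of the 0/1 values
se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat

print(f"n = {n}, p_hat = {p_hat:.3f}, standard error = {se:.4f}")
```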

Chi-Square Distribution

If we can assume that the underlying population distribution is normal, then it can be shown that the sample variance s² and the population variance σ² are related through the chi-square distribution with n − 1 degrees of freedom:
$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$
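A simulation sketch of this relationship, assuming a hypothetical normal population (µ = 10, σ = 2) and samples of size n = 5. A chi-square variable with k degrees of freedom has mean k and variance 2k, so the simulated values of (n − 1)s²/σ² should have mean close to 4 and variance close to 8.

```python
import random
import statistics

random.seed(7)  # reproducible illustration

# Hypothetical normal population and sample size
mu, sigma, n = 10.0, 2.0, 5
reps = 20_000

# For each sample compute (n - 1) * s^2 / sigma^2
chi2_values = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)  # sample variance (divides by n - 1)
    chi2_values.append((n - 1) * s2 / sigma ** 2)

# Chi-square with k = n - 1 degrees of freedom: mean k, variance 2k
print(f"mean     : {statistics.fmean(chi2_values):.2f}  (theory {n - 1})")
print(f"variance : {statistics.variance(chi2_values):.2f}  (theory {2 * (n - 1)})")
```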

Student’s t Distribution

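When the population standard deviation σ is unknown, it is replaced by the sample standard deviation (s):
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
This random variable follows a member of a family of distributions called the Student's t distribution, with n − 1 degrees of freedom.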
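A worked example of the t statistic on a small made-up sample, testing a hypothesized mean µ0 = 5.0 (both the data and µ0 are illustrative). `statistics.stdev` uses n − 1 in the denominator, matching s.

```python
import math
import statistics

# Hypothetical sample and hypothesized population mean
data = [5.1, 4.9, 5.3, 5.0, 5.2]
mu0 = 5.0

n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)              # sample standard deviation (n - 1 in the denominator)
t = (x_bar - mu0) / (s / math.sqrt(n))  # t statistic with n - 1 degrees of freedom

print(f"x_bar = {x_bar:.3f}, s = {s:.4f}, t = {t:.4f}, df = {n - 1}")
```

Here x̄ = 5.1 and s² = 0.10/4 = 0.025, so t = 0.1/√0.005 = √2 ≈ 1.414 with 4 degrees of freedom.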

Sample Size for Population Proportion

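Whatever the outcome, p̂(1 − p̂) cannot be bigger than 0.25 (its maximum, reached when the sample proportion is 0.5).
Thus, using 0.25 gives the largest possible margin of error, ME, for a given sample size, and the sample size that guarantees a margin of error of at most ME is:
$$n = \frac{0.25\,(z_{\alpha/2})^2}{(ME)^2}$$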
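A minimal sketch of the conservative sample-size formula, assuming a 95% confidence level and a margin of error of 3 percentage points (both illustrative choices; the helper name is hypothetical). The result is rounded up so the margin of error is not exceeded.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, confidence=0.95):
    """Conservative sample size using the worst case p_hat * (1 - p_hat) = 0.25."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    n = 0.25 * z ** 2 / margin_of_error ** 2
    return math.ceil(n)  # round up so the margin of error is not exceeded

n_needed = sample_size_for_proportion(0.03)  # ±3 percentage points, 95% confidence
print(n_needed)  # 1068
```

With z ≈ 1.96, n = 0.25 · 3.8415 / 0.0009 ≈ 1067.07, which rounds up to 1068.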

Null hypothesis and alternative hypothesis

We start with a hypothesis about the parameter, called the null hypothesis (H0), which is maintained unless there is strong evidence against it.
If we reject the null hypothesis, then the second hypothesis, called the alternative hypothesis (H1), is accepted.

P-value

Computing the p-value is the most common way to carry out a test of the null hypothesis.
The p-value is the probability of obtaining a value of the test statistic as extreme as, or more extreme than, the value actually obtained, assuming the null hypothesis is true.
Equivalently, the p-value is the smallest significance level at which the null hypothesis can be rejected, given the observed sample statistic.
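A minimal sketch of a two-sided p-value, assuming a z-test of H0: µ = 100 with a known σ = 15, a sample of n = 36 and an observed mean of 103 (all made-up numbers). The p-value is the probability, under H0, of a test statistic at least as far from 0 as the observed one.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal test statistic Z under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical z-test of H0: mu = 100 with known sigma = 15, n = 36, observed x_bar = 103
z = (103 - 100) / (15 / 36 ** 0.5)  # = 1.2
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```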

Significance level

In practice, we must decide at what p-value we are going to reject H0.
The decision is made by choosing in advance a so-called α-level, known as the significance level of the test.
We reject H0 if the p-value is less than or equal to α.
We typically use 5% or 1% significance levels.
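The decision rule can be sketched as a one-line function (the name `reject_h0` and the p-value of 0.03 are hypothetical). The same p-value can lead to rejection at the 5% level but not at the 1% level.

```python
def reject_h0(p_value, alpha=0.05):
    """Decision rule: reject H0 when the p-value is less than or equal to alpha."""
    return p_value <= alpha

p_value = 0.03  # hypothetical p-value from some test
for alpha in (0.05, 0.01):
    decision = "reject H0" if reject_h0(p_value, alpha) else "do not reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```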

Tests of the difference between two population proportions

We consider the situation where we have two qualitative samples and investigate whether a given property is present or not:
The proportion of population 1 that has the property is Px, which is estimated by p̂x based on a sample of size nx.
The proportion of population 2 that has the property is Py, which is estimated by p̂y based on a sample of size ny.
We are interested in the difference Py − Px, which is estimated by d = p̂y − p̂x.
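A minimal sketch of testing H0: Px = Py, assuming the common large-sample z-test with a pooled proportion (the card does not specify the test; the function name and counts are hypothetical). Under H0 both samples share one proportion, estimated by pooling the successes, which gives the standard error of d.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x_x, n_x, x_y, n_y):
    """z test of H0: P_x = P_y using the pooled proportion.

    x_x, x_y: number of sampled members with the property in each sample
    n_x, n_y: sample sizes
    """
    p_x, p_y = x_x / n_x, x_y / n_y
    d = p_y - p_x                                   # estimate of P_y - P_x
    p0 = (x_x + x_y) / (n_x + n_y)                  # pooled proportion under H0
    se = math.sqrt(p0 * (1 - p0) * (1 / n_x + 1 / n_y))
    z = d / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
    return d, z, p_value

# Hypothetical samples: 60 of 200 (population 1) and 80 of 200 (population 2) have the property
d, z, p_value = two_proportion_z_test(60, 200, 80, 200)
print(f"d = {d:.3f}, z = {z:.3f}, p-value = {p_value:.4f}")
```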

Regression

Regressions are typically used to test whether two or more variables are statistically related.
In basic statistics, they are used to explore the relationship between two variables.

Cross-sectional data

We can use numerical variables and also qualitative (or categorical) variables in regression models.
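One common way to include a qualitative variable, sketched here as an assumption with made-up data: code the category as a 0/1 dummy variable and estimate by ordinary least squares. With a single dummy regressor, the intercept equals the mean of the 0-group and the slope equals the difference between the two group means.

```python
# Hypothetical cross-section: smoker status (categorical) and an outcome y
smoker = ["no", "no", "no", "yes", "yes", "yes"]
y = [10.0, 12.0, 14.0, 8.0, 9.0, 10.0]

# Code the category as a dummy variable: 1 = smoker, 0 = non-smoker
d = [1 if s == "yes" else 0 for s in smoker]

# OLS with one regressor: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar
n = len(y)
d_bar, y_bar = sum(d) / n, sum(y) / n
s_xy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
s_xx = sum((di - d_bar) ** 2 for di in d)
slope = s_xy / s_xx
intercept = y_bar - slope * d_bar

print(f"intercept (mean for non-smokers)  = {intercept:.2f}")
print(f"slope (smokers minus non-smokers) = {slope:.2f}")
```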

A regression model

Studies the relationship between two or more variables.

Bivariate model

Studies the relationship between only two variables, e.g., x and y.
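A minimal sketch of estimating the bivariate model y = b0 + b1·x by ordinary least squares, using the textbook formulas b1 = Sxy/Sxx and b0 = ȳ − b1·x̄. The data and the helper name `ols_fit` are made up for illustration.

```python
def ols_fit(x, y):
    """Least-squares fit of y = b0 + b1 * x (bivariate model)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

Here Sxy = 19.9 and Sxx = 10, so b1 = 1.99 and b0 = 6.02 − 1.99 · 3 = 0.05.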

Multiple regression model

Studies the relationship between more than two variables.

Quadratic functions

Are also used quite often in applied economics to capture decreasing or increasing marginal effects.
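For a quadratic model y = β0 + β1x + β2x², the marginal effect is dy/dx = β1 + 2β2x, so it changes with x; with β2 < 0 it decreases and turns negative after x = −β1/(2β2). A minimal sketch with hypothetical coefficients (β1 = 0.5, β2 = −0.01), not estimates from any real data:

```python
def marginal_effect(x, b1, b2):
    """dy/dx for y = b0 + b1*x + b2*x**2."""
    return b1 + 2 * b2 * x

def turning_point(b1, b2):
    """x at which the marginal effect is zero (a maximum if b2 < 0, a minimum if b2 > 0)."""
    return -b1 / (2 * b2)

# Hypothetical coefficients
b1, b2 = 0.5, -0.01
for x in (0, 10, 20, 30):
    print(f"x = {x:>2}: marginal effect = {marginal_effect(x, b1, b2):+.2f}")
print(f"turning point at x = {turning_point(b1, b2):.1f}")
```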

Maximum Likelihood Estimation

The basic idea of Maximum Likelihood Estimation:
▶ The data we see come from some model.
▶ We know the structure of the model, but not its parameters.
The ML principle:
▶ From all the possible values that the parameters can take, choose the values that make the observed data most likely (probable).
▶ These are the Maximum Likelihood Estimates (MLE) of the parameters.
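A minimal sketch of the ML principle, assuming made-up 0/1 data modelled as independent Bernoulli(p) draws. It simply tries a grid of candidate values of p and keeps the one with the highest log-likelihood; in practice a closed form or a numerical optimizer is used instead. For this model the MLE equals the sample proportion.

```python
import math

# Hypothetical 0/1 data (e.g. 1 = smoker); model: each value is Bernoulli(p)
data = [1, 0, 1, 1, 0, 1, 0, 1]
k, n = sum(data), len(data)

def log_likelihood(p):
    """Log-likelihood of p for independent Bernoulli observations."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Try the candidate values p = 0.001, 0.002, ..., 0.999 and keep the most likely one
grid = [i / 1000 for i in range(1, 1000)]
p_mle = max(grid, key=log_likelihood)

print(f"MLE of p = {p_mle:.3f}, sample proportion = {k / n:.3f}")
```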

Print in Python

The print() function outputs information to the console or terminal.

Head in Python

The head() method is used to view the first few rows of a DataFrame or Series.
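A minimal sketch, assuming pandas is installed and using a small made-up DataFrame (the column names `person_id` and `height_cm` are hypothetical). Without an argument, head() returns the first 5 rows.

```python
import pandas as pd

# Hypothetical dataset with 8 rows
df = pd.DataFrame({
    "person_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "height_cm": [160, 172, 168, 181, 175, 158, 190, 166],
})

first_rows = df.head()    # first 5 rows (default n=5)
first_three = df.head(3)  # first 3 rows

print(first_rows)
print(first_three)
```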

Describe in Python

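The describe() method provides an overview of the dataset's numeric and/or categorical features: for numeric columns it reports count, mean, standard deviation, minimum, quartiles and maximum; for categorical columns, count, number of unique values, most frequent value and its frequency. Helps detect outliers and understand the spread of the data. Acts as a quick summary for exploratory data analysis (EDA).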
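A minimal sketch with made-up data, assuming pandas is installed. By default describe() summarizes numeric columns only; `include="all"` adds categorical columns, and calling it on a text column returns count, unique, top and freq.

```python
import pandas as pd

# Hypothetical data: one numeric and one categorical column
df = pd.DataFrame({
    "height_cm": [160, 170, 175, 180, 165],
    "smoker": ["no", "yes", "no", "no", "yes"],
})

numeric_summary = df.describe()               # numeric columns only (default)
full_summary = df.describe(include="all")     # numeric and categorical columns
smoker_summary = df["smoker"].describe()      # count, unique, top, freq

print(numeric_summary)
print(full_summary)
print(smoker_summary)
```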

Legend in Python

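Used to provide labels for the elements in a plot: plt.legend() (or ax.legend()) shows the label given to each plotted element through its label argument.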

Show in Python

plt.show() displays the figure(s) created so far, ensuring that the visualization appears as expected. In some environments, such as scripts or command-line interfaces, plt.show() is necessary to display the plot.

Value_counts in Python

Counts the occurrences of unique values in a Series or DataFrame column. It is a powerful tool for exploratory data analysis (EDA) when working with categorical or numerical data.
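A minimal sketch with made-up categorical data, assuming pandas is installed. The counts are sorted with the most frequent value first; `normalize=True` returns proportions instead of counts.

```python
import pandas as pd

# Hypothetical answers to a yes/no question (with one abstention)
votes = pd.Series(["yes", "no", "yes", "yes", "no", "abstain"])

counts = votes.value_counts()                # frequency of each unique value, most frequent first
shares = votes.value_counts(normalize=True)  # relative frequencies (proportions)

print(counts)
print(shares)
```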

Plt.tight_layout in Python

Automatically adjusts subplot parameters (e.g., spacing, padding). Prevents overlapping of axes titles, labels, or legends. Optimizes the use of available space in the figure.

Plt.figure in Python

Creates a blank figure to hold plots or subplots. Customizes the size (figsize), resolution (dpi) and other properties of the figure. Allows the creation of multiple figures in the same script.

Grid in Python

A grid refers to the lines that divide the plot into sections, helping to make data easier to interpret.
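A combined matplotlib sketch of the plotting cards above, assuming matplotlib is installed; the data, titles and labels are made up. plt.figure creates the figure, legend and grid are applied to a subplot, plt.tight_layout fixes the spacing and plt.show displays the result. The pyplot functions plt.legend() and plt.grid() act on the current axes; the sketch calls the equivalent Axes methods ax.legend() and ax.grid() so each subplot is targeted explicitly.

```python
import matplotlib.pyplot as plt

x = list(range(11))

# plt.figure: a new blank figure; figsize is in inches, dpi sets the resolution
fig = plt.figure(figsize=(8, 4), dpi=100)

# Left subplot: two lines, each with a label for the legend
ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(x, [v ** 2 for v in x], label="x squared")
ax1.plot(x, [10 * v for v in x], label="10x")
ax1.set_xlabel("x")
ax1.set_title("Two functions")
ax1.legend()    # labels the two lines
ax1.grid(True)  # grid lines make values easier to read

# Right subplot: a histogram of some made-up values
ax2 = fig.add_subplot(1, 2, 2)
ax2.hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)
ax2.set_title("Histogram")
ax2.grid(True)

plt.tight_layout()  # adjust spacing so titles and labels do not overlap
plt.show()          # display the figure (needed in scripts and command-line sessions)
```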

(38 cards)