Midterm Review Flashcards

1
Q

Define Simple Linear Regression

A

A dependent variable (ex. Y) is predicted from one independent variable (ex. X) based on a linear relationship

2
Q

Define Regression

A

The relationship (dependency) between 2 variables, i.e. how a dependent variable changes with an independent variable

3
Q

SLR equation

A

y = a + βx (a = intercept, β = slope)

4
Q

Define Residuals

A

The vertical differences between the observed data & the values predicted by the line (observed y minus predicted y)

5
Q

Goal with SSR

A

To find the one line that minimizes the SSR (sum of squared residuals)
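
A minimal R sketch of this goal, assuming made-up numbers (the vectors `x` and `y` and the helper `ssr()` are illustrative, not course data): the line fitted by `lm()` should have a smaller SSR than a line shifted away from it.

```r
# Made-up data for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Hypothetical helper: SSR of the line y = a + b * x
ssr <- function(a, b) sum((y - (a + b * x))^2)

fit   <- lm(y ~ x)                      # least-squares line
a_hat <- coef(fit)[["(Intercept)"]]     # fitted intercept
b_hat <- coef(fit)[["x"]]               # fitted slope

ssr_fit   <- ssr(a_hat, b_hat)          # SSR of the fitted line
ssr_other <- ssr(a_hat + 0.5, b_hat)    # SSR of a line shifted up by 0.5
ssr_fit < ssr_other                     # the fitted line has the smaller SSR
```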

6
Q

Define Stock “Beta”

A

Beta is a risk measure of a stock investment, calculated as the slope coefficient on the market return when the stock's return is regressed on the market return
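
As a rough illustration, beta can be estimated with `lm()` from two return series. The numbers below are made up, and the names `market` and `stock` are assumptions for this sketch only.

```r
# Made-up monthly returns for illustration only
market <- c(0.02, -0.01, 0.03, 0.01, -0.02, 0.04)
stock  <- c(0.03, -0.02, 0.05, 0.01, -0.03, 0.06)

# Regress the stock return on the market return
beta_fit <- lm(stock ~ market)
beta     <- coef(beta_fit)[["market"]]   # slope = beta

# The same slope written as covariance over variance
beta_check <- cov(stock, market) / var(market)

beta > 1   # for these numbers the stock swings more than the market
```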

7
Q

When |β| > 1…

A

Stock is riskier than the market & its returns swing more than market returns (greater volatility)

8
Q

When |β| < 1…

A

Stock is less risky & its returns swing less than market returns

9
Q

Monthly return formula

A

Monthly return = (Current month-end price - Last month-end price)/Last month-end price
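
A short R sketch of the formula, assuming a made-up vector of month-end prices (`prices` is an illustrative name):

```r
# Made-up month-end prices, oldest first
prices <- c(100, 105, 102.9)

# (current month-end price - last month-end price) / last month-end price
monthly_return <- diff(prices) / head(prices, -1)
monthly_return   # 0.05 and -0.02, i.e. +5% then -2%
```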

10
Q

Basic set-up for lm() function

A

regression_analysis_result_name <- lm(Y ~ X, data = dataset_name)
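
One possible concrete use of this set-up, assuming R's built-in `cars` dataset (stopping distance `dist` predicted from `speed`), which is not part of the course material:

```r
# Fit stopping distance (Y) on speed (X) using the built-in cars data
fit <- lm(dist ~ speed, data = cars)

a    <- coef(fit)[["(Intercept)"]]   # intercept a in y = a + βx
beta <- coef(fit)[["speed"]]         # slope β in y = a + βx

summary(fit)   # coefficients, residual summary, R-squared
```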

11
Q

How to calculate SD manually

A
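  1. subtract the mean from each value & square the result: (y-µ)^2
  2. sum all results from step 1 + divide by count (this gives the variance; use count - 1 for a sample)
  3. take the square root of the variance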
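
The manual calculation can be sketched in R as follows, assuming a small made-up vector `y`. Note that R's built-in `sd()` divides by count - 1 (the sample version), so it differs slightly from the divide-by-count version.

```r
y  <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up data
mu <- mean(y)                     # 5

sq_dev    <- (y - mu)^2                  # squared deviations from the mean
variance  <- sum(sq_dev) / length(y)     # sum, then divide by count: 32 / 8 = 4
sd_manual <- sqrt(variance)              # square root of the variance: 2

# Sample version (divide by count - 1), which matches sd(y)
sd_sample <- sqrt(sum(sq_dev) / (length(y) - 1))
```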
12
Q

How to manually calculate Q1 & Q3

A
  1. sort the dataset & split it into 2 halves at the median
  2. find the median of each half (lower half → Q1, upper half → Q3)
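
A minimal sketch of the split-in-halves method in R, assuming made-up data. For an odd count this version leaves the middle value out of both halves, which is one common convention. R's own `quantile()` interpolates by default, so it may return slightly different values.

```r
x <- sort(c(7, 1, 3, 9, 5, 11, 13, 15))   # sorted: 1 3 5 7 9 11 13 15
n <- length(x)

lower <- x[1:floor(n / 2)]            # lower half: 1 3 5 7
upper <- x[(ceiling(n / 2) + 1):n]    # upper half: 9 11 13 15

q1  <- median(lower)   # 4
q3  <- median(upper)   # 12
iqr <- q3 - q1         # 8
```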
13
Q

How to find IQR

A

Q3-Q1

14
Q

How to calculate Lower & Upper Whisker

A

LW = Q1 - 1.5 × IQR
UW = Q3 + 1.5 × IQR
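
A small R sketch of the whisker limits, assuming made-up quartiles and data (all names here are illustrative). Values outside the limits are flagged as outliers.

```r
q1  <- 4
q3  <- 12
iqr <- q3 - q1                       # 8

lower_whisker <- q1 - 1.5 * iqr      # 4 - 12 = -8
upper_whisker <- q3 + 1.5 * iqr      # 12 + 12 = 24

x <- c(1, 3, 5, 7, 9, 11, 13, 15, 30)                  # made-up data
outliers <- x[x < lower_whisker | x > upper_whisker]   # 30
```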

15
Q

How to calculate Extreme Lower & Upper Whisker

A

eLW = Q1 - 3 × IQR
eUW = Q3 + 3 × IQR

16
Q

Difference in using summarize() & mutate()

A

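summarize() collapses the data into summary values (one row per group); mutate() adds a new variable (column) & keeps every row. Must use <- when mutating to save the new variable into a dataset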

17
Q

When do you use $?

A

When you are referring to a specific variable (column) inside a dataset, written as dataset$variable
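
For example, assuming a small made-up data frame called `scores` (a hypothetical name):

```r
# Made-up dataset with two variables
scores <- data.frame(student = c("A", "B", "C"),
                     mark    = c(70, 85, 90))

# $ picks the mark variable out of the scores dataset
avg_mark <- mean(scores$mark)
```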

18
Q

Define Probability Density Curve

A

Density curve visualizes the probability distribution → how probabilities are distributed over the values of a random variable

19
Q

Advantages of Probability Density Curve

A
  • A more refined representation of data
  • Facilitates the probability calculation (even if data is absent)
20
Q

Describe Skewed distribution

A

Mean > Median → Right-skewed
Mean < Median → Left-skewed

21
Q

Density Curve Properties

A
  • A density curve must lie on or above the horizontal axis
  • The area under the density curve is always equal to 1 (100%)
    > The curve can touch the x-axis but can never fall below it
22
Q

Relationship between Probability Density & Probability

A

Probability Density ≠ Probability
- For a continuous variable, the probability of it taking one specific value is not meaningful because it always equals zero

23
Q

Meaning of PD & Probability

A

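Probability = area under the density curve over an interval
Likelihood = density = height of the curve at a single value (a straight vertical line, which has no area)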

24
Q

Representation of Normal Distribution

A

If a random variable follows a normal distribution, it is presented as:
X ~ N (µ, σ)
Mean → center of the curve
Stdev → width (spread) of the curve

25
Q

How to look up a z-score in the standard normal table

A
  1. Find the row that matches the (signed) first 2 digits of the z-score, i.e. the ones digit & first decimal (ex. 1.9 for z = 1.96)
  2. Find the column that matches the third digit of the z-score, i.e. the second decimal (ex. 0.06 for z = 1.96)
  3. Read the probability value in the cell where the row & column meet
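
The table lookup can be cross-checked in R, assuming the table lists cumulative probabilities Pr(Z < z), which is the usual layout and is what `pnorm()` returns:

```r
z <- 1.96        # row 1.9, column 0.06 in the table
p <- pnorm(z)    # Pr(Z < 1.96)
round(p, 4)      # 0.975
```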
26
Q

Define Standardization

A

Transforming a general normal distribution (ex. X ~ N (µ, σ)) to a standard normal distribution (ex. Z ~ N (0, 1)) using z = (x - µ)/σ
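
A brief R sketch of standardization, using the usual formula z = (x - µ)/σ and made-up values for µ, σ and x. The probability is the same whether it is computed on the original scale or on the z scale.

```r
mu    <- 100   # made-up mean
sigma <- 15    # made-up standard deviation
x     <- 130   # value to standardize

z <- (x - mu) / sigma                        # (130 - 100) / 15 = 2

p_general  <- pnorm(x, mean = mu, sd = sigma)   # Pr(X < 130) under N(100, 15)
p_standard <- pnorm(z)                          # Pr(Z < 2) under N(0, 1)
```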

27
Q

If a dataset follows a (general) normal distribution, then

A

68% of the data lies within one standard deviation of the mean
95% of the data lies within two standard deviations of the mean
99.7% of the data lies within three standard deviations of the mean
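
These percentages can be checked against the standard normal curve with `pnorm()`. This is a quick sketch; `k` and `within_k_sd` are illustrative names.

```r
k <- 1:3                                # number of standard deviations
within_k_sd <- pnorm(k) - pnorm(-k)     # Pr(-k < Z < k)
round(within_k_sd, 4)                   # 0.6827 0.9545 0.9973
```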

28
Q

Find Pr (-1 < Z < 1) using R code

A

pnorm(1, mean=0, sd=1) - pnorm(-1, mean=0, sd=1)

29
Q

Checks for Normality

A
  1. Histogram & Density Curve → bell-shaped & symmetric around the mean
  2. Empirical Rule intervals → 68%, 95%, 99.7%
  3. IQR-to-SD ratio ≈ 1.3
  4. Quantile-Quantile (Q-Q) Normality Plot
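
Checks 3 and 4 might look like this in R, assuming simulated data in place of a real dataset (the seed and sample size are arbitrary choices):

```r
set.seed(1)
x <- rnorm(10000)   # simulated data standing in for a real variable

# Check 3: for a normal distribution, IQR / SD is about 1.3
ratio_theory <- qnorm(0.75) - qnorm(0.25)   # IQR of N(0, 1), where SD = 1
ratio_data   <- IQR(x) / sd(x)              # should land close to ratio_theory

# Check 4: run interactively to draw the Q-Q plot; points near the line suggest normality
# qqnorm(x); qqline(x)
```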
30
Q

2 questions related to Population vs Sample

A
  1. Can we make accurate inferences about the population based on a sample of data? (Today’s class)
  2. How confident can we be in these inferences? (After the reading week)
31
Q

Define Parameter

A

A numerical value that describes a specific characteristic of an entire population

32
Q

Define Sample Statistic

A

A numerical value that describes a specific characteristic of a sample of data (ex. the sample mean)

33
Q

What is the meaning behind Statistical Inferences

A

The goal is to make informed inferences about the unknown population parameter based on the sample statistics

34
Q

Define normal distribution

A

A perfectly normal distribution would appear as a smooth, symmetric, bell-shaped curve centered around the mean; a histogram of real data follows this shape only approximately & is not smooth

35
Q

What is the SD of the sample means

A

Standard error = population SD / √n
- ex. ~ 10 times smaller than the population SD when the sample size n = 100
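
A simulation sketch of the standard error, assuming a made-up normal population with SD 10 and samples of size 100 (all values here are illustrative):

```r
set.seed(42)
population_sd <- 10
n <- 100   # sample size

# Draw 5000 samples and keep each sample's mean
sample_means <- replicate(5000, mean(rnorm(n, mean = 50, sd = population_sd)))

se_theory <- population_sd / sqrt(n)   # 10 / 10 = 1
se_sim    <- sd(sample_means)          # SD of the sample means, close to 1
```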

36
Q

Define Central Limit Theorem

A

The distribution of sample means approaches a normal distribution as the sample size increases, whatever the shape of the population
- The spread of the sample means around the actual (population) mean shrinks as the sample size grows
- As the # of samples increases, the histogram of sample means starts to show a smooth bell-shaped curve, more closely adhering to the normal distribution’s characteristics
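
A small simulation sketch of this idea, assuming a right-skewed exponential population (an arbitrary choice for illustration; `draw_means()` and `skew_gap()` are hypothetical helpers). With larger samples, the sample means cluster more tightly and their mean and median move closer together, as expected for a symmetric bell shape.

```r
set.seed(1)

# Hypothetical helper: means of `reps` samples of size n from a skewed population
draw_means <- function(n, reps = 5000) replicate(reps, mean(rexp(n, rate = 1)))

means_small <- draw_means(2)    # sample size 2
means_large <- draw_means(50)   # sample size 50

# Hypothetical helper: mean-median gap as a rough skewness signal
skew_gap <- function(v) abs(mean(v) - median(v))

sd(means_large) < sd(means_small)               # spread shrinks with sample size
skew_gap(means_large) < skew_gap(means_small)   # distribution becomes more symmetric

# Run interactively to compare the shapes:
# hist(means_small); hist(means_large)
```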