DSE1101 Flashcards

Question

What is the formula for variance?

Answer 1

less computational power; gets rid of negative values

Answer 2

1.5 × IQR away from the lower and upper quartiles

Answer 3

outliers are values more than 1.5 times the IQR from the quartiles: either below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
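
A minimal sketch of the 1.5 × IQR rule in R. The data vector is made up for illustration, and `quantile()` uses R's default quartile definition, which may differ slightly from a hand calculation.

```r
# Hypothetical sample; the value 40 is deliberately extreme.
x <- c(2, 4, 5, 6, 7, 8, 9, 40)

q1  <- unname(quantile(x, 0.25))   # lower quartile
q3  <- unname(quantile(x, 0.75))   # upper quartile
iqr <- q3 - q1                     # interquartile range, same as IQR(x)

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers <- x[x < lower_fence | x > upper_fence]
print(outliers)
```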

Answer 4

more than 1.5 times the IQR from the quartiles: either below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.

Answer 5

Median and IQR

Answer 6

natural log transformation

Answer 7

sample space

Answer 8

the uncertainty of a random process.

Answer 9

mutually exclusive and collectively exhaustive results of a random process.

Answer 10

collection of one or more outcomes. It is a subset of the sample space.

Answer 11

lists all possible outcomes and the probabilities with which each of them occurs.

Answer 12

probability that a variable is less than or equal to a particular value, e.g. P(X ≤ 2)

Answer 13

cannot happen at the same time

Answer 14

occurrence of B provides no information about A.

Answer 15

P(B) × P(A|B)

Answer 16

Dependent (need to calculate using the given that…)

Answer 17

P(X = x, Y = y), e.g. P(Rain, Long commute) = P(X = 0, Y = 0) = 0.15

Answer 18

numeric quantity whose value depends on the outcome of a random process. Lowercase letters denote the values of the variable.

Answer 19

Discrete: takes integer values. Continuous: takes real (decimal) values.

Answer 20

extent to which 2 variables move in the same direction

Answer 21

covariance between two variables divided by the product of their standard deviations.
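
Written as a formula, using notation that is assumed here rather than taken from the cards (Cov for covariance, σ_X and σ_Y for the two standard deviations):

```latex
\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad -1 \le \rho_{XY} \le 1
```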

Answer 22

for discrete variables; binary, with only 2 possible outcomes (0 or 1)

Answer 23

X ∼ Bernoulli(p), where p is the probability that the value is 1

Answer 24

N(µ, σ²).

Answer 25

= true value of population parameter - point estimate

Answer 26

the systematic tendency to over or under-estimate the true population parameter.

Answer 27

how much an estimate will tend to vary from one sample to the next.

Answer 28

an estimator of the population mean

Answer 29

sample mean; Ȳ is a random variable

Answer 30

fixed feature of a particular population - usually unknown in real life

Answer 31

quantity that varies from one sample to another; easy to compute, as it is a statistic of a sample drawn by simple random sampling

Answer 32

Asymptotic distribution (use an approximation on the sample): "tending to a distribution"

Answer 33

Law of large numbers; central limit theorem

Answer 34

sample mean approaches population mean as the sample size increases

Answer 35

using sample mean and sample variance to approximate distribution of sample mean

Answer 36

When n is large, the sampling distribution of Ȳ is approximately normal, regardless of the distribution of the underlying population. For a random sample of size n, the sample mean is approximately normally distributed with mean µ and variance σ²/n.
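
A small simulation sketch of this result in R. The exponential population, the sample size, and the number of repetitions are all assumptions chosen for illustration; the exponential distribution with rate 1 is skewed and has mean 1 and variance 1.

```r
set.seed(1)     # arbitrary seed, for reproducibility only
n    <- 50      # size of each random sample (assumed)
reps <- 5000    # number of repeated samples (assumed)

# Draw `reps` samples of size n and keep the mean of each one.
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)   # should be close to mu = 1
var(sample_means)    # should be close to sigma^2 / n = 1 / 50 = 0.02
```

Plotting `hist(sample_means)` should give a roughly bell-shaped histogram even though the underlying population is skewed.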

Answer 37

Student's t distribution with n − 1 degrees of freedom; its tails are heavier than those of the normal distribution; the variance is s²/n

Answer 38

σ² = p(1 − p) = 0.25 (assuming the coin is fair). By the CLT, the sample mean is approximately normally distributed with Var(p̂) = σ²/n = 0.0025. Two-tailed test.

Answer 39

plausible range of values for the population parameter.

Answer 40

point estimate ± 1.96 × standard error. Suppose we take many samples and build a confidence interval from each sample; then about 95% of these intervals would contain the true population parameter.
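
A sketch of a 95% confidence interval for a population mean in R, computed as point estimate ± 1.96 × standard error. The sample is simulated (hypothetical data), and 1.96 is the large-sample normal critical value; for small samples a t critical value would be more appropriate.

```r
set.seed(42)                          # arbitrary seed, for reproducibility only
y <- rnorm(100, mean = 10, sd = 2)    # hypothetical sample of size n = 100

n     <- length(y)
y_bar <- mean(y)                      # point estimate of the population mean
se    <- sd(y) / sqrt(n)              # standard error of the sample mean

# 95% interval: point estimate plus or minus 1.96 standard errors
ci <- c(lower = y_bar - 1.96 * se, upper = y_bar + 1.96 * se)
print(ci)
```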

Answer 41

standard deviation

Answer 42

width of CI

Answer 43

supervised learning

Answer 44

continuous dependent

Answer 45

Yes (credit card default)

Answer 46

estimate, a predicted value

Answer 47

Y = β0 + β1X + ϵ

Answer 48

residual term / error term: the difference between the regression line and the actual observed data

Answer 49

= yi − ŷi = yi − (β̂0 + β̂1xi) = vertical distance from each point to the fitted line

Answer 50

Sum of squared residuals over all observations (the quantity that least squares minimises): the variation in Y that is left unexplained after fitting the regression model.

Answer 51

RSS: 1. Sum all squared residuals, written in terms of b0 and b1. 2. Take the derivatives with respect to b0 and b1 and set them to zero.

Answer 52

(x̄, ȳ). Since b0 = ȳ − b1x̄, substituting x = x̄ into y = b0 + b1x gives y = ȳ − b1x̄ + b1x̄ = ȳ; the b1x̄ terms cancel.
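
The least-squares steps can be written out as follows. This is a sketch of the standard derivation for simple linear regression, using b_0 and b_1 for the estimated coefficients:

```latex
\mathrm{RSS}(b_0, b_1) = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2

\frac{\partial\,\mathrm{RSS}}{\partial b_0}
  = -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x}

\frac{\partial\,\mathrm{RSS}}{\partial b_1}
  = -2 \sum_{i=1}^{n} x_i \left( y_i - b_0 - b_1 x_i \right) = 0
  \;\Longrightarrow\;
  b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
             {\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The first condition is exactly the statement that the fitted line passes through the point of means.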

Answer 53

Minimises the squared deviations from the proposed line (least-squares fit for the regression line)

Answer 54

When x is 0, the value of y expected ON AVERAGE

Answer 55

average change in Y when X increases/decreases by one unit

Answer 56

estimate of the standard deviation of the residual terms; measures the lack of fit of a model to the data

Answer 57

N-2 (scale down)

Answer 58

total variance in Y = part that can be explained by the model (TSS − RSS) + part that cannot be explained (RSS)

Answer 59

measures the goodness of fit: the proportion of the variance in y that can be explained (the larger the R², the better the fit). Formula: (TSS − RSS) / TSS
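
A sketch that computes RSS, TSS, R² and the residual standard error by hand in R and compares them with what `summary()` reports. It uses R's built-in `cars` data set as a stand-in, since the course data is not available here.

```r
# Built-in data set: stopping distance (dist) against speed.
fit <- lm(dist ~ speed, data = cars)

rss <- sum(residuals(fit)^2)                   # unexplained variation
tss <- sum((cars$dist - mean(cars$dist))^2)    # total variation in Y

r_squared <- (tss - rss) / tss                 # proportion explained
rse <- sqrt(rss / (nrow(cars) - 2))            # residual standard error, n - 2 df

# These should agree with the values reported by summary().
all.equal(r_squared, summary(fit)$r.squared)
all.equal(rse, summary(fit)$sigma)
```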

Answer 60

how close the estimated b0 hat and b1 hat are to the true values of b0 and b1

Answer 61

repeated sampling, and see what values you get for b0 and b1

Answer 62

t-test with n − 2 degrees of freedom, where n is the sample size (two are lost because b0 and b1 are estimated): t = (b1 hat − 0) / SE(b1 hat)

Answer 63

1. Relationship between X and Y should be linear. 2. Residuals nearly normal. 3. Residuals have constant variability (homoscedasticity).

Answer 64

Residuals vs Fitted plot: the red line should be roughly horizontal

Answer 65

Normal Q-Q plot: points should lie roughly along the straight diagonal line

Answer 66

(ei − e bar) / SE(e)

Answer 67

Scale-Location plot (you want to have no pattern in the residuals): the red line should be roughly horizontal

Answer 68

Residuals vs Leverage plot: check for outlying values at the upper right or lower right. If they fall outside the Cook's distance lines, they are influential (should remove the points).

Answer 69

Transforming variables (scaling); seeking additional variables to explain Y; using more advanced methods

Answer 70

read.csv("file", header = TRUE)

Answer 71

lm1 <- lm(y_var ~ x_var, data = Advertising)

Answer 72

summary(lm1)$coefficients

Answer 73

When |t| for b1 is greater than 1.96, there is a statistically significant relationship between the variables (at the 5% level).

Answer 74

confint(lm1). By default 95%

Answer 75

confint(lm1, level = 0.90)
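
The regression cards above can be tied together in one short R session. This is a sketch only: it uses the built-in `cars` data set as a stand-in for the course's Advertising data, so the variable names `dist` and `speed` are assumptions for illustration.

```r
# Fit a simple linear regression on a built-in data set.
lm1 <- lm(dist ~ speed, data = cars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- summary(lm1)$coefficients

# t statistic for the slope, testing H0: b1 = 0.
t_slope <- (coefs["speed", "Estimate"] - 0) / coefs["speed", "Std. Error"]
abs(t_slope) > 1.96                   # TRUE suggests a relationship between X and Y

# Confidence intervals for the coefficients.
ci95 <- confint(lm1)                  # 95% by default
ci90 <- confint(lm1, level = 0.90)    # narrower 90% intervals
```

A lower confidence level gives a narrower interval, so the `ci90` range for the slope sits inside the `ci95` range.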

Answer 76

data$column_name

Answer 77

1 sd: 68%; 2 sd: 95%; 3 sd: 99.7%

(104 cards)