Chp2 Stats Flashcards

1
Q

What is a random variable?

A

A variable whose possible values are the outcomes of a random phenomenon.

2
Q

Random variable examples

A

The outcome of a coin toss (heads or tails); the number shown when a die is rolled

3
Q

Two types of R.V.

A

Discrete, Continuous

4
Q

What do we assume about the observed data?

A

It is a random sample drawn from X: each observation xi is independent of the others and identically distributed (i.i.d.) according to X.

5
Q

Discrete

A

Takes on a countable number of possible values

6
Q

Continuous

A

Takes on any value within a given range, so the set of possible values is uncountably infinite

7
Q

Probability Mass Function

A

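For discrete variables: f(x) = P(X = x), the probability that X takes exactly the value x. The probabilities over all possible values sum to 1.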
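
Example (a minimal sketch, assuming a fair six-sided die; the names die_pmf, p_even and expected are illustrative only). It stores a probability mass function as a table of value-to-probability pairs and checks that the probabilities sum to 1.

```python
from fractions import Fraction

# PMF of a fair six-sided die: every face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF sums to exactly 1 over all possible values.
total = sum(die_pmf.values())

# Probability of an event = sum of the PMF over the values in that event.
p_even = sum(p for face, p in die_pmf.items() if face % 2 == 0)

# Expected value = sum of value * probability.
expected = sum(face * p for face, p in die_pmf.items())

print(total, p_even, expected)
```

Fractions are used here only so the sums are exact; floats would work equally well up to rounding.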

8
Q

Probability Density Function

A

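For continuous variables: f(x) is a density, not a probability. The probability that X falls in an interval is the area under f over that interval, and the total area is 1.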

9
Q

Kernel density estimation

A

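A non-parametric technique that estimates the probability density function from data by centering a smooth kernel on each data point and averaging the kernels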
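
Example (a rough sketch, not a reference implementation). It assumes a Gaussian kernel and a hand-picked bandwidth of 0.5; the sample values, the bandwidth, and the function names are all made up for illustration.

```python
from math import exp, pi, sqrt

def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return exp(-0.5 * u * u) / sqrt(2 * pi)

def kde(samples, x, bandwidth):
    """Kernel density estimate at x: average of kernels centered on each sample."""
    n = len(samples)
    return sum(gaussian_kernel((x - s) / bandwidth) for s in samples) / (n * bandwidth)

samples = [1.0, 1.2, 1.9, 2.1, 2.2, 5.0]  # made-up data
bandwidth = 0.5                           # assumed smoothing width

# The estimate is higher where the samples cluster than in the gap between clusters.
dense = kde(samples, 2.0, bandwidth)
sparse = kde(samples, 3.5, bandwidth)

# Like any density, the estimate integrates to about 1 (simple Riemann sum on a grid).
step = 0.01
grid = [i * step for i in range(-500, 1200)]
area = sum(kde(samples, x, bandwidth) for x in grid) * step

print(dense, sparse, area)
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows the individual points more closely.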

10
Q

Measures of central tendency

A

Mean
Median
Mode

11
Q

Mean

A

Average of all data points

12
Q

Robustness

A

The tendency to not be affected by extreme values

13
Q

Is the mean robust?

A

No

14
Q

How to obtain a robust mean

A

Use a trimmed mean: discard a fixed number (or fraction) of the most extreme values at each end, then average what remains
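
Example (a minimal sketch with made-up data; trimmed_mean and its parameter k are hypothetical names, not a library API). It compares the plain mean, the trimmed mean, and the median on data with one extreme value.

```python
from statistics import mean, median

def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must leave at least one value")
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])

data = [2, 3, 3, 4, 5, 100]      # made-up data with one extreme value

plain = mean(data)               # pulled upward by the 100
trimmed = trimmed_mean(data, 1)  # drops the 2 and the 100
middle = median(data)            # barely affected by the 100

print(plain, trimmed, middle)    # 19.5 3.75 3.5
```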

15
Q

Median

A

The middle value when the data points are arranged in order

16
Q

Is the median robust?

A

Yes

17
Q

Mode

A

The most frequently occurring value in the dataset

18
Q

Is mode a useful measure of central tendency?

A

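Not always: the most frequent value can lie far from the center of the data, and a dataset can have several modes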

19
Q

When is robustness important?

A

When your data might contain anomalies or extreme values that could distort the overall analysis

20
Q

Measures of dispersion

A

Variance
Standard Deviation

21
Q

Variance

A

The expected squared deviation of X from its expected (mean) value; a measure of dispersion

22
Q

Sample standard deviation

A

The square root of the sample variance

23
Q

What does standard deviation tell you?

A

Roughly how far a typical data point lies from the mean. Because it is the square root of the variance, it is expressed in the same units as the data rather than in squared units
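
Example (a minimal sketch with made-up data, using Python's statistics module). It shows the population variance, the sample variance (n - 1 denominator), and the sample standard deviation as the square root of the sample variance.

```python
from math import isclose, sqrt
from statistics import pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data; the mean is 5

pop_var = pvariance(data)   # sum of squared deviations / n       = 32 / 8
samp_var = variance(data)   # sum of squared deviations / (n - 1) = 32 / 7
samp_std = stdev(data)      # square root of the sample variance

# The standard deviation is back in the units of the data.
assert isclose(samp_std, sqrt(samp_var))

print(pop_var, samp_var, samp_std)
```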

24
Q

Bivariate/multivariate analysis

A

Considers two or more variables (attributes) together, as opposed to the single variable summarized by variance/std

25
Q

What does bivariate analysis try to understand?

A

The association or dependence between X1 and X2

26
Q

How to calculate mean and variance (first and second moment) in multivariate?

A

Compute them per attribute, exactly as in the univariate case; the result is a vector with one entry per attribute instead of a single value

27
Q

How to get the total variance for multivariate data

A

Sum the individual per-attribute variances (equivalently, the trace of the covariance matrix)

28
Q

Covariance

A

Measure of the association or linear dependence between two variables

29
Q

How to summarize covariance information for n attributes

A

An n × n covariance matrix, where entry (i, j) is the covariance between attributes i and j

30
Q

Main diagonal of the covariance matrix

A

Holds the variance of each attribute, i.e. the covariance of the attribute with itself

31
Q

Is covariance matrix symmetric?

A

Yes
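
Example (a minimal sketch in plain Python with three made-up attributes; covariance and covariance_matrix are illustrative helpers, and the sample n - 1 denominator is an assumption). It builds the covariance matrix and checks the three properties from the cards: variances on the main diagonal, symmetry, and total variance as the sum of the diagonal.

```python
from math import isclose
from statistics import mean, variance

def covariance(x, y):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def covariance_matrix(columns):
    """n x n matrix whose (i, j) entry is the covariance of columns i and j."""
    return [[covariance(ci, cj) for cj in columns] for ci in columns]

columns = [                 # made-up data: 3 attributes, 4 observations each
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 9.0],
    [5.0, 3.0, 4.0, 1.0],
]
cov = covariance_matrix(columns)
n = len(cov)

# Main diagonal holds each attribute's variance.
diagonal_is_variance = all(isclose(cov[i][i], variance(columns[i])) for i in range(n))

# cov(Xi, Xj) == cov(Xj, Xi), so the matrix is symmetric.
is_symmetric = all(cov[i][j] == cov[j][i] for i in range(n) for j in range(n))

# Total variance = sum of the diagonal (the trace).
total_variance = sum(cov[i][i] for i in range(n))

print(diagonal_is_variance, is_symmetric, total_variance)
```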

32
Q

Correlation between two variables

A

The standardized covariance, obtained by dividing the covariance by the product of the two variables' standard deviations

33
Q

Covariance vs. correlation: which is dimensionless, and which is in units obtained by multiplying the two variables?

A

Correlation is dimensionless
Covariance is in units obtained by multiplying the units of the two variables
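
Example (a minimal sketch with made-up height and weight values; the helper names are illustrative). Changing the unit of one variable from meters to centimeters rescales the covariance by 100 but leaves the correlation unchanged, which is what "dimensionless" means in practice.

```python
from math import isclose, sqrt
from statistics import mean

def covariance(x, y):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

height_m = [1.50, 1.60, 1.70, 1.80, 1.90]   # made-up data, meters
weight_kg = [52.0, 58.0, 69.0, 75.0, 88.0]  # made-up data, kilograms
height_cm = [h * 100 for h in height_m]     # same heights in centimeters

cov_m = covariance(height_m, weight_kg)     # units: m * kg
cov_cm = covariance(height_cm, weight_kg)   # units: cm * kg, 100 times larger
corr_m = correlation(height_m, weight_kg)   # no units
corr_cm = correlation(height_cm, weight_kg) # same value as corr_m

print(cov_m, cov_cm, corr_m, corr_cm)
```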

34
Q

Range of covariance

A

(-inf, +inf)

35
Q

Range of correlation

A

[-1, 1]

36
Q

What does a correlation of 1 mean?

A

A perfect positive linear relationship: as one variable increases, the other increases in exact proportion

37
Q

Collinearity

A

Occurs when two variables are so highly correlated that one can be used to predict the other; in the extreme case, one variable is an exact linear function of the other

38
Q

Normal/Gaussian Distribution

A

Parameterized by mean and std
mean = median = mode

39
Q

What happens to a normal/Gaussian distribution as the std decreases?

A

Becomes taller and narrower: the area under the curve stays 1, so less spread means a higher peak
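
Example (a minimal sketch; normal_pdf is written out by hand from the standard Gaussian density formula, and the three std values are arbitrary). It evaluates the height of the curve at the mean for several values of the std. The peak height is 1 / (std * sqrt(2 * pi)), so a smaller std gives a taller, narrower curve.

```python
from math import exp, isclose, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

# Height of the curve at the mean for three arbitrary standard deviations.
peaks = {sigma: normal_pdf(0.0, 0.0, sigma) for sigma in (2.0, 1.0, 0.5)}

for sigma, height in peaks.items():
    print(f"std={sigma}: peak height={height:.4f}")
```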

40
Q

Binomial distribution

A

Parameterized by n (number of trials) and p (probability of success in each trial)
Mean: np
Median: ⌊np⌋ or ⌈np⌉
Variance: np(1-p)
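
Example (a minimal sketch, with n = 10 and p = 0.3 chosen arbitrarily). It computes the binomial probabilities directly from the formula C(n, k) p^k (1-p)^(n-k) and confirms numerically that the mean is np and the variance is np(1-p).

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # arbitrary illustrative parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                          # should be 1
mean = sum(k * pk for k, pk in enumerate(pmf))            # should equal n * p
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))  # should equal n * p * (1 - p)

print(total, mean, var)
```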

41
Q

Power-law distribution

A

Long-tailed distributions; relationships where one quantity varies as a power of another
Hard to define

42
Q

Power law distribution example

A

The area of a square quadruples when its side length is doubled (area = side^2)

43
Q

Visualization is

A

Important

44
Q

XY plots

A

Scatter plots: a bird's-eye view of how your data is distributed

45
Q

Boxplots

A

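Box-and-whisker plots. Show the five-number summary: minimum, first quartile, median, third quartile, maximum
The whiskers extend to the min and max, or to the most extreme points within a fence such as 1.5 × IQR; points beyond the whiskers are drawn individually as outliers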
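
Example (a rough sketch with made-up data). It assumes the common 1.5 × IQR fence convention for flagging outliers and the "inclusive" quartile method of Python's statistics.quantiles; other tools may compute quartiles slightly differently.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]  # made-up data with one extreme value

# Quartiles: the box spans q1 to q3, with a line at the median.
q1, median, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

# Assumed convention: points more than 1.5 * IQR beyond the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers reach the most extreme points that are still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, median, q3, whisker_low, whisker_high, outliers)
```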

46
Q

Short rectangle in box plot means

A

The middle 50% of the data (the interquartile range) is tightly clustered

47
Q

Long whiskers

A

The data outside the middle 50% is widely spread, which suggests a high std and variance

48
Q

Empirical cumulative distribution function

A

The empirical CDF of a dataset X at a value y is the ratio of samples that are less than or equal to y.

49
Q

What is CDF(X, 15) for X = [2, 7, 8, 9, 10, 15, 16, 20]?

A

CDF(X, 15) = 6/8 = 0.75
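
Example (a minimal sketch; ecdf is an illustrative helper name). It counts the samples that are less than or equal to y and divides by the sample size, reproducing the 6/8 from the card.

```python
def ecdf(samples, y):
    """Fraction of samples that are less than or equal to y."""
    return sum(1 for x in samples if x <= y) / len(samples)

X = [2, 7, 8, 9, 10, 15, 16, 20]

at_15 = ecdf(X, 15)     # 2, 7, 8, 9, 10, 15 qualify: 6 of 8
below_min = ecdf(X, 1)  # nothing is <= 1
at_max = ecdf(X, 20)    # everything is <= 20

print(at_15, below_min, at_max)  # 0.75 0.0 1.0
```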

50
Q

CDF-PDF relation

A

The PDF is the derivative of the CDF (equivalently, the CDF is the integral of the PDF)
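
Example (a minimal numerical check, assuming a standard normal distribution; the evaluation point 0.7 and the step size h are arbitrary). The slope of the CDF, estimated with a central difference, matches the PDF at the same point.

```python
from math import erf, exp, isclose, pi, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written in terms of the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

def numerical_derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7  # arbitrary evaluation point
slope = numerical_derivative(normal_cdf, x)
density = normal_pdf(x)

print(slope, density)
```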
