Stats for Data Science Flashcards

1
Q

Descriptive Analytics

A

Leveraging historical data to determine “What” happened.

2
Q

Predictive Analytics

A

Leveraging historical data to determine “What will” happen.

3
Q

Prescriptive Analytics

A

Using the information gained from predictive analytics to determine “What will we do”.

4
Q

Probability

A

A measure, from 0 to 1, of the likelihood that an event will occur in a random experiment.

5
Q

Complement

A

The complement A’ is the event that A does not occur: P(A) + P(A’) = 1, so P(A’) = 1 − P(A).

6
Q

Intersection

A

The set of all elements that are members of both A and B. In general, P(A∩B) = P(A|B)P(B); this reduces to P(A∩B) = P(A)P(B) only when A and B are independent.

7
Q

Union

A

The set of all elements that are members of A, B, or both: P(A∪B) = P(A) + P(B) − P(A∩B).
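A minimal sketch of the complement, intersection, and union rules, assuming one roll of a fair six-sided die as the sample space. The events A and B are illustrative choices, not part of the original cards; probabilities are computed by counting outcomes with `fractions.Fraction` so the results stay exact.

```python
from fractions import Fraction

# Illustrative sample space (assumption): one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # "the roll is even"
B = {4, 5, 6}  # "the roll is greater than 3"


def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(omega))


p_not_a = prob(omega - A)  # complement A' = {1, 3, 5}
p_a_and_b = prob(A & B)    # intersection A∩B = {4, 6}
p_a_or_b = prob(A | B)     # union A∪B = {2, 4, 5, 6}

# The complement rule and the addition rule hold exactly.
assert prob(A) + p_not_a == 1
assert p_a_or_b == prob(A) + prob(B) - p_a_and_b

print(p_not_a, p_a_and_b, p_a_or_b)  # 1/2 1/3 2/3
print(prob(A) * prob(B))             # 1/4, not equal to P(A∩B)
```

Because P(A)P(B) = 1/4 differs from P(A∩B) = 1/3, these two events are not independent, so the product rule does not apply to them.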

8
Q

Conditional Probability

A

P(A|B) is the probability that event A occurs given that event B has occurred: P(A|B) = P(A∩B)/P(B), defined when P(B) > 0.
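A small sketch of conditional probability on a fair die (an illustrative assumption). It computes P(A|B) with the formula and checks it against direct counting inside the reduced sample space B.

```python
from fractions import Fraction

# Illustrative sample space (assumption): one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # "the roll is even"
B = {4, 5, 6}  # "the roll is greater than 3"


def prob(event):
    return Fraction(len(event), len(omega))


def conditional(a, b):
    """P(a | b) = P(a ∩ b) / P(b); undefined when P(b) = 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be greater than 0")
    return prob(a & b) / prob(b)


# Formula result vs. counting only the outcomes where B occurred.
p_formula = conditional(A, B)
p_counting = Fraction(len(A & B), len(B))
print(p_formula, p_counting)  # 2/3 2/3
```

Conditioning on B shrinks the sample space to {4, 5, 6}; two of those three outcomes are even.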

9
Q

Independent Events

A

Two events are independent if the occurrence of one does not affect the probability of occurrence of the other; formally, P(A∩B) = P(A)P(B).

10
Q

Mutually Exclusive Events

A

Two events are mutually exclusive if they cannot both occur at the same time, so P(A∩B) = 0.
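Independence and mutual exclusivity are easy to confuse. This sketch tests both properties on events from a fair die; the events `even`, `low`, and `odd` are hypothetical examples chosen for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (illustrative assumption)


def prob(event):
    return Fraction(len(event), len(omega))


def independent(a, b):
    """Independent when P(A∩B) = P(A)P(B)."""
    return prob(a & b) == prob(a) * prob(b)


def mutually_exclusive(a, b):
    """Mutually exclusive when A and B share no outcomes, so P(A∩B) = 0."""
    return len(a & b) == 0


even, low, odd = {2, 4, 6}, {1, 2}, {1, 3, 5}

print(independent(even, low), mutually_exclusive(even, low))  # True False
print(independent(even, odd), mutually_exclusive(even, odd))  # False True
```

The second line shows that mutually exclusive events with nonzero probabilities are never independent: if one occurs, the other cannot.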

11
Q

Bayes’ Theorem

A

Bayes’ Theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event: P(A|B) = P(B|A)P(A)/P(B).
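A worked sketch of Bayes’ Theorem, with P(B) expanded by the law of total probability. The screening-test numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) are hypothetical, chosen only to make the arithmetic exact.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) = P(B|A)P(A) / P(B), with P(B) = P(B|A)P(A) + P(B|A')P(A')."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical screening test: A = "has the condition", B = "tests positive".
posterior = bayes(Fraction(99, 100), Fraction(1, 100), Fraction(5, 100))
print(posterior)  # 1/6
```

Even with a highly sensitive test, the low prior keeps the probability of having the condition after a positive result at only about 17%.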

12
Q

Mean

A

The arithmetic average of the dataset: the sum of all values divided by the number of values.

13
Q

Median

A

The middle value of an ordered dataset; when the number of values is even, the average of the two middle values.

14
Q

Mode

A

The most frequent value in the dataset.
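A short sketch computing the mean, median, and mode by hand on a hypothetical sample, then cross-checking against Python's standard `statistics` module.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample

# Mean: sum of values divided by their count.
mean = sum(data) / len(data)

# Median: middle of the sorted data; average the two middle values when n is even.
s = sorted(data)
mid = len(s) // 2
median = s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

# Mode: the most frequent value (this sample has a single mode).
mode = max(set(data), key=data.count)

print(mean, median, mode)  # 5.0 4.0 3
assert (mean, median, mode) == (
    statistics.mean(data), statistics.median(data), statistics.mode(data)
)
```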

15
Q

Skewness

A

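A measure of the asymmetry of a distribution about its mean: zero for a symmetric distribution, positive when the right tail is longer, and negative when the left tail is longer.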

16
Q

Kurtosis

A

A measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.
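A minimal sketch of skewness and excess kurtosis using population (divide-by-n) moment estimators. This is one convention among several: statistical packages often apply small-sample bias corrections, so their numbers can differ slightly. The sample lists are hypothetical.

```python
def central_moment(xs, k):
    """k-th central moment using the population (divide-by-n) convention."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """m4 / m2**2 - 3, so a normal distribution scores 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]   # hypothetical symmetric sample
right_skewed = [1, 1, 1, 10]  # hypothetical sample with a long right tail

print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True
print(excess_kurtosis(symmetric))  # about -1.3: lighter tails than a normal
```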

17
Q

Range

A

The difference between the highest and lowest values in the dataset.

18
Q

Interquartile Range

A

The spread of the middle 50% of the data: IQR = Q3 − Q1, where Q1 and Q3 are the first and third quartiles.
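A sketch comparing the range and the IQR on a hypothetical sample, assuming Python 3.8+ for `statistics.quantiles`. Quartile conventions vary between tools; this uses the module's default "exclusive" method, so other software may report slightly different Q1 and Q3.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical sample

data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3].
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
print(data_range, q1, q3, iqr)  # 7 2.25 6.75 4.5

# Replace the largest value with an outlier: the range explodes, the IQR does not.
with_outlier = data[:-1] + [800]
q1_o, _, q3_o = statistics.quantiles(with_outlier, n=4)
print(max(with_outlier) - min(with_outlier), q3_o - q1_o)  # 799 4.5
```

This robustness to extreme values is the usual reason to prefer the IQR over the range.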

19
Q

Variance

A

The average of the squared differences between each value and the mean; it measures how spread out the data are around the mean. (The sample variance divides by n − 1 instead of n.)

20
Q

Standard Deviation

A

The square root of the variance; it describes the typical distance between the data points and the mean, in the same units as the data.
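A sketch computing variance and standard deviation from the squared deviations of a hypothetical sample, in both the population (divide by n) and sample (divide by n − 1) forms. The sample form is what `statistics.variance` and `statistics.stdev` return.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)
mean = sum(data) / n
squared_devs = [(x - mean) ** 2 for x in data]

pop_var = sum(squared_devs) / n           # population variance (divide by n)
sample_var = sum(squared_devs) / (n - 1)  # sample variance (divide by n - 1)
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

print(pop_var, pop_sd)  # 4.0 2.0
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```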

21
Q

Standard Error

A

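The standard deviation of the sampling distribution of a statistic, most often the sample mean; for the mean it is estimated as s/√n, where s is the sample standard deviation.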
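A one-step sketch of the standard error of the mean, s/√n, on a hypothetical sample.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Standard error of the mean: sample standard deviation / sqrt(n).
se = statistics.stdev(sample) / math.sqrt(len(sample))
print(round(se, 4))  # 0.7559
```

Because the standard error shrinks with √n, quadrupling the sample size halves it.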

22
Q

Causality

A

A relationship between two events in which one event causes the other. Correlation alone does not establish causality.

23
Q

Covariance

A

A quantitative measure of the joint variability of two variables: positive when they tend to increase together, negative when one tends to decrease as the other increases.

24
Q

Correlation

A

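Measures the strength and direction of the linear relationship between two variables. It is the normalized version of covariance (covariance divided by the product of the two standard deviations), so it ranges from −1 to 1.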
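A sketch of sample covariance and Pearson correlation written out by hand, so the normalization step is visible. The data are hypothetical perfectly linear series, chosen so the correlations are exactly 1 and −1.

```python
import math


def covariance(xs, ys):
    """Sample covariance: sum of co-deviations divided by n - 1."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # hypothetical: rises with x
y_down = [10, 8, 6, 4, 2]  # hypothetical: falls as x rises

print(covariance(x, y_up))     # 5.0
print(correlation(x, y_up))    # 1.0
print(correlation(x, y_down))  # -1.0
```

Rescaling y (for example, multiplying it by 10) changes the covariance but leaves the correlation unchanged, which is why correlation is the easier of the two to compare across datasets.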

25
Q

Probability Mass Function

A

A function that gives the probability that a discrete random variable is exactly equal to some value.

26
Q

Probability Density Function

A

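A function for continuous random variables where the value at any given sample can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample. The probability of falling within an interval is the area under the curve over that interval.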

27
Q

Cumulative Distribution Function

A

A function that gives the probability that a random variable is less than or equal to a certain value: F(x) = P(X ≤ x).
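A sketch contrasting a PMF, a PDF, and a CDF, assuming Python 3.8+ for `statistics.NormalDist`. The discrete example is a fair die (illustrative); the continuous example is the standard normal distribution.

```python
from fractions import Fraction
from statistics import NormalDist


# Discrete case: fair six-sided die (illustrative assumption).
def die_pmf(k):
    """P(X = k) for a fair die."""
    return Fraction(1, 6) if k in range(1, 7) else Fraction(0)


def die_cdf(x):
    """P(X <= x): sum the PMF over all outcomes up to x."""
    return sum(die_pmf(k) for k in range(1, 7) if k <= x)


# Continuous case: standard normal. The PDF is a density, not a probability;
# probabilities come from differences of the CDF (areas under the PDF).
z = NormalDist(mu=0, sigma=1)

print(die_pmf(3), die_cdf(3))          # 1/6 1/2
print(round(z.pdf(0), 4))              # 0.3989
print(round(z.cdf(1) - z.cdf(-1), 4))  # 0.6827
```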

28
Q

Uniform Distribution

A

Also called a rectangular distribution; a probability distribution in which all outcomes are equally likely.

29
Q

Normal/Gaussian Distribution

A

A bell-shaped distribution that is symmetric about its mean. It is closely tied to the Central Limit Theorem: the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger.

30
Q

Central Limit Theorem

A

The sampling distribution of the sample means approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution (provided its variance is finite).
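A small simulation sketch of the Central Limit Theorem, assuming Python 3.8+ (`statistics.fmean`). The population is Uniform(0, 1), which is flat rather than bell-shaped; the sample size of 30 and the 2000 repetitions are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 30         # size of each sample (illustrative choice)
trials = 2000  # number of samples drawn (illustrative choice)

# Population: Uniform(0, 1), mean 0.5, standard deviation sqrt(1/12) ≈ 0.2887.
sample_means = [
    statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)
]

print(round(statistics.fmean(sample_means), 3))  # close to 0.5
print(round(statistics.stdev(sample_means), 4))  # close to 0.2887 / sqrt(30) ≈ 0.0527
```

A histogram of `sample_means` would look approximately bell-shaped even though the population is flat, and its spread matches the standard error σ/√n.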

31
Q

Exponential Distribution

A

A probability distribution of the time between the events in a Poisson point process.

32
Q

Chi-Squared Distribution

A

The distribution of the sum of the squares of k independent standard normal random variables, where k is the degrees of freedom.

33
Q

Bernoulli Distribution

A

The distribution of a random variable from a single trial with only two possible outcomes: 1 (success) with probability p and 0 (failure) with probability 1 − p.

34
Q

Binomial Distribution

A

The distribution of the number of successes in a sequence of n independent trials, each with only two possible outcomes: 1 (success) with probability p and 0 (failure) with probability 1 − p.

35
Q

Poisson Distribution

A

The distribution that expresses the probability of a given number of events k occurring in a fixed interval of time, if these events occur with a known constant average rate λ and independently of the time since the last event.
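A sketch of the Bernoulli, binomial, and Poisson probability mass functions written directly from their formulas, assuming Python 3.8+ for `math.comb`. The parameter values are illustrative.

```python
import math


def bernoulli_pmf(k, p):
    """P(X = k) for k in {0, 1}: success (1) with probability p."""
    return p if k == 1 else 1 - p


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials, each succeeding with probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(k events in an interval when events occur at average rate lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


print(bernoulli_pmf(1, 0.3))    # 0.3
print(binomial_pmf(2, 4, 0.5))  # 0.375 (6 of 16 equally likely outcomes)
print(round(sum(binomial_pmf(k, 10, 0.3) for k in range(11)), 10))  # 1.0
print(round(poisson_pmf(0, 2.0), 4))  # 0.1353
```

A binomial distribution with n = 1 is exactly a Bernoulli distribution, which `binomial_pmf(1, 1, p) == bernoulli_pmf(1, p)` confirms.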

36
Q

Null Hypothesis

A

A general statement that there is no relationship between two measured phenomena or no association among groups.

37
Q

Alternative Hypothesis

A

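The statement contrary to the null hypothesis: that there is a relationship between the measured phenomena or an association among groups.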

38
Q

Type 1 Error

A

The rejection of a true null hypothesis (a false positive). Its probability is the significance level α.

39
Q

Type 2 Error

A

The non-rejection of a false null hypothesis (a false negative). Its probability is denoted β.

40
Q

P-Value

A

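The probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. When p-value > α, we fail to reject the null hypothesis; when p-value ≤ α, we reject the null hypothesis and conclude that the result is statistically significant.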
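A sketch of a two-sided p-value for a z test statistic and the resulting decision at α = 0.05, assuming Python 3.8+ for `statistics.NormalDist`. The observed statistic z = 2.0 is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """P(|Z| >= |z|) under the null hypothesis, for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
z_observed = 2.0  # hypothetical test statistic
p = two_sided_p_value(z_observed)

print(round(p, 4))  # 0.0455
print("reject H0" if p <= alpha else "fail to reject H0")  # reject H0
```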

41
Q

Critical Value

A

A point on the scale of the test statistic beyond which we reject the null hypothesis; it is derived from the significance level α of the test.

42
Q

Significance Level & Rejection Region

A

The significance level, denoted α, is the probability of rejecting the null hypothesis when it is true. The rejection region, the set of test-statistic values for which the null hypothesis is rejected, depends on α: a smaller α gives a smaller rejection region.

43
Q

Z-Score

A

The distance from the mean to an individual data point, expressed in units of standard deviation: z = (x − μ)/σ. Z-tests are typically used when the population standard deviation is known or the sample size is large.

44
Q

T-Score

A

A t-test is used when the population variance is unknown and the sample size is small (a common rule of thumb is n < 30). The t statistic is t = (x̄ − μ)/(s/√n), where s is the sample standard deviation.
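A sketch contrasting a z-score (population σ known) with a one-sample t statistic (σ estimated by the sample standard deviation s). The population parameters, the sample, and the hypothesized mean `mu0` are all hypothetical.

```python
import math
import statistics

# z-score of a single value, with population mean and standard deviation known.
mu, sigma = 5.0, 2.0  # hypothetical population parameters
x = 9.0
z = (x - mu) / sigma
print(z)  # 2.0

# One-sample t statistic when sigma is unknown: use the sample standard deviation.
sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical small sample
mu0 = 4.0                          # hypothesized population mean under H0
n = len(sample)
t = (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
df = n - 1  # compare t with a t distribution on n - 1 degrees of freedom
print(round(t, 3), df)  # 1.323 7
```

Python's standard library has no t distribution, so turning t into a p-value needs a library such as SciPy (for example `scipy.stats.ttest_1samp`).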

45
Q

Paired Sample

A

Data collected twice from the same group, person, item, or thing, for example before and after a treatment.

46
Q

Independent Sample

A

Two samples drawn from two completely different populations or groups, so the observations in one sample are unrelated to those in the other.

47
Q

One-Way ANOVA

A

Compares the means of two or more independent groups (in practice usually three or more) using only one independent variable.
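A sketch of the one-way ANOVA F statistic computed from its sums of squares. The three groups are hypothetical data for a single factor.

```python
def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)  # df_between = k - 1
    ms_within = ss_within / (n - k)    # df_within = n - k
    return ms_between / ms_within


# Three hypothetical groups, one factor.
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
```

A large F means the group means differ by more than the within-group variation would explain; SciPy's `scipy.stats.f_oneway` returns the same statistic together with a p-value.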

48
Q

Two-Way ANOVA

A

An extension of one-way ANOVA that uses two independent variables to calculate the main effect of each variable and their interaction effect.

49
Q

Chi-Square Goodness-of-Fit Test

A

Determines whether the observed frequencies of one categorical variable fit a hypothesized distribution, that is, whether a sample matches the population.

50
Q

Chi-Square Test for Independence

A

Compares two categorical variables, arranged in a contingency table, to see whether there is a relationship between them.
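A sketch of both chi-square statistics computed by hand. The die-roll counts and the 2×2 contingency table are hypothetical.

```python
def chi_square_gof(observed, expected):
    """Goodness of fit: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_independence(table):
    """Independence: expected count = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: 60 die rolls tested against a fair die (10 per face).
print(round(chi_square_gof([8, 12, 9, 11, 10, 10], [10] * 6), 4))  # 1.0

# Hypothetical 2x2 contingency table of two categorical variables.
stat, df = chi_square_independence([[10, 20], [30, 40]])
print(round(stat, 4), df)  # 0.7937 1
```

Each statistic is compared with a chi-square distribution: k − 1 degrees of freedom for goodness of fit and (r − 1)(c − 1) for independence. SciPy's `scipy.stats.chisquare` and `scipy.stats.chi2_contingency` run the full tests; note that `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its statistic for this table will be smaller.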

51
Q

Linear Regression

A

A linear approach to modeling the relationship between a dependent variable and one independent variable.
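A minimal ordinary-least-squares sketch for simple linear regression, y = b0 + b1·x, using the closed-form slope Sxy/Sxx. The x and y values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx     # slope
    b0 = my - b1 * mx  # intercept: the line passes through (mean x, mean y)
    return b0, b1


x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable
b0, b1 = fit_line(x, y)
print(round(b0, 2), round(b1, 2))  # 2.2 0.6
```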

52
Q

Independent Variable

A

The variable that is controlled or changed in a scientific experiment to test its effects on the dependent variable.

53
Q

Dependent Variable

A

The variable being measured in a scientific experiment.

54
Q

Multiple Linear Regression

A

A linear approach to modeling the relationship between a dependent variable and two or more independent variables.

55
Q

Linear Regression: Step#1

A

Understand the model description, causality and directionality

56
Q

Linear Regression: Step#2

A

Check the data, categorical data, missing data and outliers

57
Q

Linear Regression: Step#3

A

Simple Analysis: check the relationship between the dependent variable and each independent variable, and between pairs of independent variables

58
Q

Linear Regression: Step#4

A

Multiple Linear Regression: check the model and select the correct variables

59
Q

Linear Regression: Step#5

A

Residual Analysis: check that the residuals are normally distributed.

60
Q

Linear Regression: Step#6

A

Interpretation of Regression Output: R-Squared is a statistical measure of fit that indicates how much of the variation in the dependent variable is explained by the independent variables. A higher R-Squared value represents smaller differences between the observed data and the fitted values.

  • P-Value
  • Regression Equation
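A sketch of the residuals and R-squared for a simple least-squares fit, using R² = 1 − SS_res/SS_tot. The data are hypothetical; for these values the fitted regression equation is ŷ = 2.2 + 0.6x.

```python
def r_squared(xs, ys):
    """Fit y = b0 + b1*x by least squares, then R^2 = 1 - SS_res / SS_tot."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # inputs to residual analysis
    ss_res = sum(r ** 2 for r in residuals)  # unexplained variation
    ss_tot = sum((y - my) ** 2 for y in ys)  # total variation around the mean
    return 1 - ss_res / ss_tot, residuals


r2, residuals = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(round(r2, 2))                      # 0.6
print([round(r, 2) for r in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

Here the line explains 60% of the variation in y; the returned residuals are what the residual-analysis step would check for normality.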