Stats for Data Science Flashcards
Descriptive Analytics
Leveraging historical data to determine “What” happened.
Predictive Analytics
Leveraging historical data to determine “What will” happen.
Prescriptive Analytics
Leveraging the insights gained from predictive analytics to determine “What will we do”.
Probability
The measure of the likelihood that an event will occur based on a random experiment.
Complement
P(A) + P(A’) = 1
Intersection
The set of all elements that are members of both A and B. P(A∩B) = P(A)P(B) holds only when A and B are independent events.
Union
P(A∪B) = P(A) + P(B) − P(A∩B). The set of all elements that are in A, in B, or in both.
Conditional Probability
P(A|B) is a measure of the probability of one event occurring with some relationship to one or more other events.
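A quick sketch in Python with hypothetical counts (all numbers invented for illustration):

```python
# Conditional probability from counts: P(A|B) = P(A and B) / P(B)
# Hypothetical example: P(churn | complaint) among 1000 customers.
total = 1000
complained = 200                # customers who filed a complaint (event B)
churned_and_complained = 80     # customers in both events (A and B)

p_b = complained / total
p_a_and_b = churned_and_complained / total
p_a_given_b = p_a_and_b / p_b
print(round(p_a_given_b, 2))  # 0.4
```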
Independent Events
Two events are independent if the occurrence of one does not affect the probability of occurrence of the other.
Mutually Exclusive Events
Two events are mutually exclusive if they cannot both occur at the same time.
Bayes’ Theorem
Bayes’ Theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event.
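A minimal Python sketch of Bayes’ Theorem, using hypothetical disease-screening rates chosen for illustration only:

```python
# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease = 0.01                 # prior, P(A)        (assumed)
p_pos_given_disease = 0.95       # sensitivity, P(B|A) (assumed)
p_pos_given_healthy = 0.05       # false-positive rate (assumed)

# Total probability of a positive test, P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161
```

Even with a sensitive test, a rare condition yields a modest posterior, which is why the prior matters.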
Mean
The average of the dataset.
Median
The middle value of an ordered dataset.
Mode
The most frequent value in the dataset.
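The three central-tendency measures above, computed with Python's standard `statistics` module on a small made-up dataset:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]
print(statistics.mean(data))    # 5
print(statistics.median(data))  # 4.0  (average of the two middle values)
print(statistics.mode(data))    # 3    (appears twice)
```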
Skewness
A measure of the asymmetry of a distribution about its mean.
Kurtosis
A measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.
Range
The difference between the highest and lowest value in the dataset.
Interquartile Range
IQR = Q3−Q1
Variance
The average squared difference of the values from the mean; it measures how spread out the data are relative to the mean.
Standard Deviation
The square root of the variance; it measures the typical distance between each data point and the mean, in the same units as the data.
Standard Error
An estimate of the standard deviation of the sampling distribution.
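Variance, standard deviation, and standard error on a small made-up sample, using the standard library:

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]
n = len(data)

var = statistics.variance(data)   # sample variance (n - 1 denominator)
sd = statistics.stdev(data)       # standard deviation = sqrt(variance)
se = sd / math.sqrt(n)            # standard error of the sample mean

print(var)            # 3.5
print(round(sd, 3))   # 1.871
print(round(se, 3))   # 0.764
```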
Causality
Relationship between two events where one event is affected by the other.
Covariance
A quantitative measure of the joint variability between two variables.
Correlation
Measures the strength and direction of the relationship between two variables and ranges from -1 to 1; the normalized version of covariance.
Probability Mass Function
A function that gives the probability that a discrete random variable is exactly equal to some value.
Probability Density Function
A function for continuous data where the value at any given sample can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample.
Cumulative Distribution Function
A function that gives the probability that a random variable is less than or equal to a certain value.
Uniform Distribution
Also called a rectangular distribution, a probability distribution in which all outcomes are equally likely.
Normal/Gaussian Distribution
The curve of the distribution is bell-shaped and symmetrical. By the Central Limit Theorem, the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger.
Central Limit Theorem
The sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution.
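A small simulation illustrating the theorem: sample means drawn from a clearly non-normal (uniform) population still cluster around the population mean of 0.5:

```python
import random
import statistics

random.seed(0)  # reproducible illustration

# Population: uniform on [0, 1), mean 0.5, not normal.
# The distribution of many sample means should look approximately
# normal and center on 0.5, as the Central Limit Theorem predicts.
sample_means = [
    statistics.mean(random.random() for _ in range(50))
    for _ in range(2000)
]

print(round(statistics.mean(sample_means), 2))  # close to 0.5
```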
Exponential Distribution
A probability distribution of the time between events in a Poisson point process.
Chi-Squared Distribution
The distribution of the sum of squared standard normal deviates.
Bernoulli Distribution
The distribution of a random variable from a single trial with only 2 possible outcomes, namely 1 (success) with probability p and 0 (failure) with probability 1 − p.
Binomial Distribution
The distribution of the number of successes in a sequence of n independent experiments, each with only 2 possible outcomes, namely 1 (success) with probability p and 0 (failure) with probability 1 − p.
Poisson Distribution
The distribution that expresses the probability of a given number of events k occurring in a fixed interval of time if these events occur with a known constant average rate λ and independently of the time since the last event.
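The binomial and Poisson probability mass functions can be written directly from their formulas (example values chosen for illustration):

```python
import math

# Binomial pmf: P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)
def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Poisson pmf: P(X = k) = exp(-lam) * lam**k / k!
def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

print(round(binom_pmf(2, 10, 0.5), 4))   # 0.0439  (2 heads in 10 fair flips)
print(round(poisson_pmf(3, 2.0), 4))     # 0.1804  (3 events when λ = 2)
```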
Null Hypothesis
A general statement that there is no relationship between two measured phenomena or no association among groups.
Alternative Hypothesis
Contrary to the null hypothesis.
Type 1 Error
Rejection of a true null hypothesis (a false positive).
Type 2 Error
Non-rejection of a false null hypothesis (a false negative).
P-Value
The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. When p-value > α, we fail to reject the null hypothesis; when p-value ≤ α, we reject the null hypothesis and conclude that the result is significant.
Critical Value
A point on the scale of the test statistic beyond which we reject the null hypothesis, and, is derived from the level of significance α of the test.
Significance Level & Rejection Region
The significance level, denoted by α, is the probability of rejecting the null hypothesis when it is true. The rejection region depends on the significance level.
Z-Score
Finds the distance from the sample’s mean to an individual data point, expressed in units of standard deviation; used when the sample size is large (n ≥ 30).
T-Score
The test statistic of a t-test, used when the population variance is unknown and the sample size is small (n < 30).
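A one-sample t statistic computed by hand on hypothetical measurements (μ0 is the mean under the null hypothesis):

```python
import math
import statistics

# One-sample t statistic: t = (x̄ - μ0) / (s / √n)
sample = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.0, 5.3]  # invented data
mu0 = 5.0                                           # hypothesized mean

n = len(sample)
t = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
print(round(t, 2))  # 1.72
```

The resulting t would then be compared with the critical value of a t distribution with n − 1 degrees of freedom.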
Paired Sample
Means that we collect data twice from the same group, person, item, or thing.
Independent Sample
Implies that the two samples must have come from two completely different populations.
1-Way ANOVA
Compares the means of two or more independent groups using one independent variable.
2-Way ANOVA
The extension of one-way ANOVA that uses two independent variables to calculate the main effects and the interaction effect.
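The one-way ANOVA F statistic, computed by hand for three hypothetical groups:

```python
import statistics

# One-way ANOVA F statistic: between-group variance / within-group variance
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]   # invented group scores
k = len(groups)                                # number of groups
n = sum(len(g) for g in groups)                # total observations
grand_mean = sum(sum(g) for g in groups) / n

ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2
                for g in groups for x in g)

f = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f)  # 12.0
```

A large F indicates that the group means differ more than the within-group noise would explain.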
Chi-Square Goodness of Fit Test
Determines whether the observed frequencies of one categorical variable in a sample match an expected distribution.
Chi-Square Test for Independence
Compares two categorical variables to determine whether they are related.
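The chi-square statistic for independence, computed by hand on a hypothetical 2×2 contingency table:

```python
# Chi-square statistic: sum of (observed - expected)^2 / expected,
# where expected = row_total * col_total / grand_total.
observed = [[30, 20],   # invented counts, e.g. rows = group,
            [20, 30]]   # columns = preference

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected

print(chi2)  # 4.0
```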
Linear Regression
A linear approach to modeling the relationship between a dependent variable and one independent variable.
Independent Variable
The variable that is controlled in a scientific experiment to test its effect on the dependent variable.
Dependent Variable
The variable being measured in a scientific experiment.
Multiple Linear Regression
A linear approach to modeling the relationship between a dependent variable and two or more independent variables.
Linear Regression: Step#1
Understand the model description, causality and directionality
Linear Regression: Step#2
Check the data, categorical data, missing data and outliers
Linear Regression: Step#3
Simple Analysis: check the relationship between the dependent variable and each independent variable, and between pairs of independent variables.
Linear Regression: Step#4
Multiple Linear Regression: fit the model and check that the correct variables are included.
Linear Regression: Step#5
Residual Analysis: check that the residuals are normally distributed.
Linear Regression: Step#6
Interpretation of Regression Output: R-Squared is a statistical measure of fit that indicates how much of the variation in the dependent variable is explained by the independent variables. A higher R-Squared value represents smaller differences between the observed data and the fitted values. Also interpret the p-value of each coefficient and the resulting regression equation.
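R-Squared can be computed directly from observed and fitted values (all numbers invented for illustration):

```python
# R-squared: R² = 1 - SS_residual / SS_total
observed = [2.0, 4.1, 6.0, 8.2, 9.7]   # invented observed values
fitted = [2.1, 4.0, 6.1, 8.0, 9.8]     # invented model predictions

mean_y = sum(observed) / len(observed)
ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
ss_tot = sum((o - mean_y) ** 2 for o in observed)

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))  # 0.998
```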