Stats for Data Science Flashcards
Descriptive Analytics
Leveraging historical data to answer “What happened?”
Predictive Analytics
Leveraging historical data to answer “What will happen?”
Prescriptive Analytics
Uses the insights gained from predictive analytics to answer “What should we do?”
Probability
A measure of how likely an event is to occur in a random experiment, expressed as a number from 0 (impossible) to 1 (certain).
Complement
The complement A’ is the event that A does not occur: P(A) + P(A’) = 1
Intersection
The set of all outcomes that are in both A and B. In general P(A∩B) = P(A|B)P(B); only when A and B are independent does this reduce to P(A∩B) = P(A)P(B).
Union
The set of all outcomes that are in A, in B, or in both: P(A∪B) = P(A) + P(B) − P(A∩B).
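A minimal Python sketch of the intersection and union rules, assuming a single roll of a fair six-sided die as the example experiment. The events A (even roll) and B (roll greater than 3) and the helper `prob` are illustrative choices, not part of the original cards.

```python
from fractions import Fraction

# Assumed example: sample space for one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # hypothetical event: roll is even
B = {4, 5, 6}   # hypothetical event: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

p_intersection = prob(A & B)   # A ∩ B = {4, 6}
p_union = prob(A | B)          # A ∪ B = {2, 4, 5, 6}

# Addition rule: P(A∪B) = P(A) + P(B) − P(A∩B)
assert p_union == prob(A) + prob(B) - p_intersection
print(p_intersection, p_union)  # 1/3 2/3
```

Here P(A)P(B) = 1/4 while P(A∩B) = 1/3, so multiplying the two probabilities gives the wrong intersection: these events are not independent.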
Conditional Probability
P(A|B) is the probability that A occurs given that B has occurred: P(A|B) = P(A∩B) / P(B), defined when P(B) > 0.
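A small sketch of conditional probability, assuming one roll of a fair six-sided die with hypothetical events A (even) and B (greater than 3). The helper name `conditional` is illustrative; it simply applies P(A|B) = P(A∩B) / P(B).

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # assumed example: one roll of a fair die
A = {2, 4, 6}               # hypothetical event: even
B = {4, 5, 6}               # hypothetical event: greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

def conditional(a, b):
    """P(a | b) = P(a ∩ b) / P(b); undefined when P(b) = 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)

print(conditional(A, B))  # 2/3
```

Knowing the roll is greater than 3 leaves {4, 5, 6}, two of which are even, which matches the 2/3 the code returns.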
Independent Events
Two events are independent if the occurrence of one does not affect the probability of occurrence of the other.
Mutually Exclusive Events
Two events are mutually exclusive if they cannot both occur at the same time.
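The independence and mutual-exclusivity definitions can be checked mechanically. This sketch assumes a fair six-sided die and uses hypothetical helpers: `independent` tests P(A∩B) = P(A)P(B), and `mutually_exclusive` tests for an empty intersection.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # assumed example: one roll of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

def independent(a, b):
    """Hypothetical helper: independent when P(a ∩ b) == P(a) * P(b)."""
    return prob(a & b) == prob(a) * prob(b)

def mutually_exclusive(a, b):
    """Hypothetical helper: mutually exclusive when no outcome is shared."""
    return not (a & b)

even = {2, 4, 6}
low = {1, 2}        # roll is at most 2
odd = {1, 3, 5}

print(independent(even, low))         # True: 1/6 == 1/2 * 1/3
print(mutually_exclusive(even, odd))  # True
print(independent(even, odd))         # False: 0 != 1/2 * 1/2
```

The last line shows a common confusion: mutually exclusive events with nonzero probabilities can never be independent, because P(A∩B) = 0 while P(A)P(B) > 0.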
Bayes’ Theorem
Bayes’ Theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event: P(A|B) = P(B|A)P(A) / P(B).
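A worked sketch of Bayes’ Theorem for a yes/no hypothesis. The screening-test numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical, chosen only to show how a small prior keeps the posterior low; the denominator P(B) is expanded with the law of total probability.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' theorem for a binary hypothesis H.

    prior               -- P(H)
    likelihood          -- P(E | H)
    false_positive_rate -- P(E | not H)
    """
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|H')P(H')
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical screening test: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, a positive result here means only about a 16% chance of the condition, because true positives from the rare condition are outnumbered by false positives from the large healthy group.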
Mean
The arithmetic average: the sum of the values divided by the number of values.
Median
The middle value of an ordered dataset; with an even number of values, the average of the two middle values.
Mode
The most frequent value in the dataset.
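A quick illustration of mean, median, and mode using Python's standard `statistics` module on a small made-up dataset:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # small made-up dataset

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6
median = statistics.median(data)  # even count, so the average of 3 and 5
mode = statistics.mode(data)      # 3 appears most often

print(mean, median, mode)  # 5 4.0 3
```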
Skewness
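A measure of the asymmetry of a distribution about its mean. Positive skew means a longer right tail, negative skew a longer left tail, and a symmetric distribution has zero skew.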
Kurtosis
A measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.
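A sketch of moment-based skewness and excess kurtosis, assuming the population (biased) formulas g1 = m3 / m2^(3/2) and g2 = m4 / m2^2 − 3, where m_k is the k-th central moment. Statistical libraries often also offer bias-corrected variants, which give slightly different numbers on small samples; the datasets below are made up.

```python
def central_moment(data, k):
    """k-th central moment: mean of (x - mean)^k, population form."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness g1 = m3 / m2^(3/2), no small-sample correction."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based excess kurtosis g2 = m4 / m2^2 - 3 (normal distribution -> 0)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))         # 0.0
print(skewness(right_tail) > 0)    # True: long right tail
print(excess_kurtosis(symmetric))  # about -1.3: lighter tails than a normal
```

Subtracting 3 makes the normal distribution the zero point, which matches the card's "relative to a normal distribution" wording.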
Range
The difference between the highest and lowest value in the dataset.
Interquartile Range
The spread of the middle 50% of the data: IQR = Q3 − Q1, where Q1 and Q3 are the first and third quartiles.
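A sketch comparing range and IQR on made-up data with one outlier. Quartile conventions differ between tools; this assumes the standard library's `statistics.quantiles` (Python 3.8+) with `method="inclusive"`.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # made-up data with one outlier

data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(data_range)  # 99  -> dominated by the outlier
print(iqr)         # 4.0 -> spread of the middle 50%, unaffected by the outlier
```

The range depends on only the two most extreme values, while the IQR ignores the top and bottom quarters, which makes it more robust to outliers.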
Variance
The average of the squared differences between each value and the mean; it measures how spread out the data are around the mean. The sample variance divides by n − 1 instead of n.
Standard Deviation
The square root of the variance; it measures spread in the same units as the data.
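A sketch computing variance and standard deviation directly from their definitions on a small made-up dataset, then checking the results against the standard library:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # small made-up dataset with mean 5

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]

pop_variance = sum(squared_diffs) / len(data)           # population: divide by n
sample_variance = sum(squared_diffs) / (len(data) - 1)  # sample: divide by n - 1
pop_std = math.sqrt(pop_variance)                       # back in the data's units

# The standard library agrees (statistics.variance uses n - 1, pvariance uses n).
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(pop_variance, statistics.pvariance(data))

print(pop_variance, pop_std)  # 4.0 2.0
```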
Standard Error
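The standard deviation of a statistic's sampling distribution, usually estimated from a single sample; for the sample mean, SE = s / √n, where s is the sample standard deviation.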
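A sketch of the standard error of the mean, assuming the common estimator s / √n. The simulation draws repeated samples from a normal distribution with an assumed σ = 10 and n = 25 to show that the standard deviation of the sample means lands near σ / √n = 2; the helper name and all numbers are illustrative.

```python
import math
import random
import statistics

def standard_error_of_mean(sample):
    """Estimated SE of the sample mean: s / sqrt(n), with s the sample std dev."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
print(round(standard_error_of_mean(sample), 3))  # 0.756

# Simulation check: the spread of many sample means approaches sigma / sqrt(n).
random.seed(0)
n, sigma = 25, 10.0
means = [statistics.fmean(random.gauss(0, sigma) for _ in range(n)) for _ in range(5000)]
print(statistics.stdev(means), sigma / math.sqrt(n))  # both close to 2.0
```

The simulation makes the definition concrete: the standard error is literally the standard deviation of the statistic across repeated samples, and the formula estimates it from just one sample.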
Causality
A relationship in which a change in one event (the cause) produces a change in another (the effect). Correlation alone does not establish causality.
Covariance
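A measure of how two variables vary together: positive when they tend to increase together, negative when one tends to increase as the other decreases. Its magnitude depends on the variables' units.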
Correlation
Measures the strength and direction of the linear relationship between two variables. It is the covariance normalized by the product of the two standard deviations, so it ranges from −1 to 1.
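A sketch of sample covariance and Pearson correlation written directly from their definitions; the function names and data are illustrative. Rescaling y by 100 changes the covariance but not the correlation, which is what "normalized" means here.

```python
import math

def covariance(x, y):
    """Sample covariance: sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by the product of the std devs."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_scaled = [100 * v for v in y]  # same data expressed in different units

print(covariance(x, y), covariance(x, y_scaled))  # 1.5 150.0 -> depends on units
print(round(correlation(x, y), 3), round(correlation(x, y_scaled), 3))  # 0.775 0.775
```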