EDA (Exploratory Data Analysis): Statistics Concepts Flashcards
What is the Mean
The Mean is the arithmetic average of a set of numbers: the sum of the values divided by how many values there are.
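A minimal sketch of the mean in plain Python. The dataset is made up for illustration. Python's standard library also provides statistics.mean, which gives the same result.

```python
def mean(values):
    # Arithmetic mean: add up every value, then divide by how many values there are.
    if len(values) == 0:
        raise ValueError("mean of an empty sequence is undefined")
    return sum(values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data
print(mean(data))  # 40 / 8 -> 5.0
```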
What is the Median
The Median is the middle value of a set of numbers once they are sorted. In a data set, it is the value with an equal number of values (rows, JSON objects, etc.) on each side of it. If there is an even number of values, the median is the average of the two middle values.
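A minimal sketch of the median in plain Python, covering both the odd and even count cases. The data is hypothetical. The standard library's statistics.median follows the same rule.

```python
def median(values):
    # The median is defined on the ordered values, so sort first.
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value has equally many values on each side.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 4, 4, 5, 5, 7, 9]))  # middle values 4 and 5 -> 4.5
print(median([3, 1, 2]))                 # sorted [1, 2, 3] -> 2
```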
What is the Mode
The Mode is the most frequently occurring value in the dataset or set of numbers. A dataset can have more than one mode if several values tie for the highest count.
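A minimal sketch that counts occurrences with collections.Counter and returns every value tied for the highest count, since a dataset can have several modes. The data is hypothetical. The standard library's statistics.multimode (Python 3.8+) behaves similarly.

```python
from collections import Counter


def modes(values):
    # Count how often each value appears.
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty sequence is undefined")
    top = max(counts.values())
    # Keep every value tied for the highest count, in first-seen order.
    return [value for value, count in counts.items() if count == top]


print(modes([2, 4, 4, 4, 5, 5, 7, 9]))  # [4]
print(modes([1, 1, 2, 2, 3]))           # tie -> [1, 2]
```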
What is Range
The Range is the largest value in a set of numbers minus the smallest value.
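A one-line sketch of the range. The function is named value_range (a hypothetical name) so it does not shadow Python's built-in range.

```python
def value_range(values):
    if len(values) == 0:
        raise ValueError("range of an empty sequence is undefined")
    # Range = largest value minus smallest value.
    return max(values) - min(values)


print(value_range([2, 4, 4, 4, 5, 5, 7, 9]))  # 9 - 2 -> 7
```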
What are the Central Tendencies
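Mean, Median, and Mode. Skew is not a separate measure, but it affects them: in skewed data the mean is pulled toward the long tail more than the median is, so comparing the mean and median hints at the direction of the skew.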
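To see how skew separates the measures of central tendency, here is a small sketch using Python's statistics module on a made-up, right-skewed dataset (one large outlier). The outlier drags the mean far above the median and mode.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one is very large.
skewed = [2, 3, 3, 3, 4, 4, 5, 40]

m = statistics.mean(skewed)      # pulled toward the outlier
med = statistics.median(skewed)  # middle of the sorted values
mo = statistics.mode(skewed)     # most frequent value

print(m, med, mo)  # mean 8, median 3.5, mode 3
```

For right-skewed data like this, mode < median < mean is the typical ordering.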
What is the Variance
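The Variance is the average squared distance of each value from the mean. For a sample (rather than a whole population), the sum of squared distances is usually divided by n - 1 instead of n.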
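A minimal sketch of variance in plain Python, with a flag to switch between the population version (divide by n) and the sample version (divide by n - 1). The data is hypothetical. The standard library's statistics.pvariance and statistics.variance compute the same two quantities.

```python
def variance(values, sample=False):
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values to compute a variance")
    m = sum(values) / n
    # Squared distance of each value from the mean.
    squared_distances = [(x - m) ** 2 for x in values]
    # Population variance divides by n; sample variance divides by n - 1.
    divisor = n - 1 if sample else n
    return sum(squared_distances) / divisor


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))               # 32 / 8 -> 4.0
print(variance(data, sample=True))  # 32 / 7
```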
What is the Standard Deviation
The Standard Deviation is the square root of the variance. Because it is in the same units as the data, it can be read as the typical amount we expect a point to differ from the mean.
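A minimal sketch that takes the square root of the variance. It repeats the variance computation so the block stands alone. The data is hypothetical. The standard library's statistics.pstdev (population) and statistics.stdev (sample) give the same results.

```python
import math


def std_dev(values, sample=False):
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values to compute a standard deviation")
    m = sum(values) / n
    divisor = n - 1 if sample else n
    var = sum((x - m) ** 2 for x in values) / divisor
    # Square root brings the result back to the data's original units.
    return math.sqrt(var)


print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # sqrt(4.0) -> 2.0
```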
What are Correlations
Correlations are methods for measuring the relationship between variables (quantitative or categorical data), or more simply, how things are related: as one variable changes, does the other tend to change with it?
What is a Correlation Coefficient
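A Correlation Coefficient is a single number that puts a value on the strength and direction of a relationship. The most common one, Pearson's r, ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).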
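A minimal sketch of Pearson's r in plain Python: the covariance of the two variables divided by the product of their spreads. The study-hours data is made up for illustration. Python 3.10+ also ships statistics.correlation, which computes Pearson's r by default.

```python
import math


def pearson_r(xs, ys):
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # How x and y move together around their means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean.
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


hours = [1, 2, 3, 4, 5]           # hypothetical hours studied
scores = [52, 55, 61, 68, 70]     # hypothetical test scores
print(round(pearson_r(hours, scores), 3))  # strong positive, close to +1
```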
What is Empirical Probability
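Empirical Probability is the probability we observe from the data: the number of times an event occurred divided by the total number of observations.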
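A minimal sketch of empirical probability: count how often the event happened in the observed data and divide by the number of observations. The die-roll record is hypothetical.

```python
def empirical_probability(outcomes, event):
    if len(outcomes) == 0:
        raise ValueError("need at least one observation")
    # Count observations where the event happened.
    hits = sum(1 for outcome in outcomes if event(outcome))
    return hits / len(outcomes)


rolls = [3, 6, 1, 3, 5, 2, 3, 4, 6, 3]  # hypothetical record of 10 die rolls
print(empirical_probability(rolls, lambda r: r == 3))  # 4 / 10 -> 0.4
```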
What is Theoretical Probability
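Theoretical Probability is the probability an event should have based on reasoning about the possible outcomes (for example, 1/6 for rolling a 3 on a fair die), rather than on observed data. It is more of an ideal or truth out there in the universe that we can't directly see; empirical probability is our estimate of it.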
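A small simulation sketch contrasting the two kinds of probability. It assumes a fair six-sided die, so the theoretical probability of a 3 is 1/6. The simulated rolls (seeded for repeatability) give empirical estimates that tend to get closer to 1/6 as the number of rolls grows.

```python
import random
from fractions import Fraction

# Theoretical: 1 favorable face out of 6 equally likely faces (assumes a fair die).
theoretical = Fraction(1, 6)


def simulate_rolls(n, seed=0):
    # Seeded generator so the example is repeatable.
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n)]


for n in (10, 1_000, 100_000):
    rolls = simulate_rolls(n)
    empirical = rolls.count(3) / n
    print(n, round(empirical, 4), "vs theoretical", round(float(theoretical), 4))
```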
What is the Additive Rule
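This rule states that if two events are mutually exclusive (they cannot both happen on the same occurrence), the probability that either one happens is the sum of their probabilities: P(A or B) = P(A) + P(B). If the events can overlap, the overlap must be subtracted so it is not counted twice: P(A or B) = P(A) + P(B) - P(A and B).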
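A small sketch that checks the additive rule by counting outcomes of one roll of a fair die (assumed equally likely faces), using sets for events and Fraction for exact probabilities. The event names are illustrative.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
faces = {1, 2, 3, 4, 5, 6}


def prob(event):
    # Favorable outcomes divided by all outcomes.
    return Fraction(len(event & faces), len(faces))


one, two, even = {1}, {2}, {2, 4, 6}

# Mutually exclusive events: P(1 or 2) = P(1) + P(2)
print(prob(one | two), prob(one) + prob(two))  # 1/3 1/3

# Overlapping events: subtract P(2 and even) so face 2 is not counted twice.
print(prob(two | even), prob(two) + prob(even) - prob(two & even))  # 1/2 1/2
```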
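What is a GLM (General Linear Model)
These models describe data as a systematic part (the model) plus some degree of error; in the simplest case, they fit a line of best fit to the data.
In statistics this line is usually written as y = b0 + b1x + e (intercept first, plus an error term e) rather than the algebra-class form y = mx + b. Both describe the same line: b0 plays the role of b and b1 the role of m.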
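A minimal sketch of fitting the simplest GLM, a straight line y = b0 + b1x + e, assuming the usual ordinary least-squares criterion (minimize the sum of squared errors). The data is made up. The residuals are the error part that the line does not explain.

```python
def fit_line(xs, ys):
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept: the line passes through (mean x, mean y)
    return b0, b1


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical noisy observations
b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # the error term e
print(round(b0, 2), round(b1, 2))  # intercept 0.05, slope 1.99
```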
What are Confidence Intervals
A Confidence Interval is an estimated range of values (for example, for a population mean) that seem reasonable based on what we've observed. Its center is the sample mean, but we've got some room on either side for our uncertainty (the margin of error).
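A minimal sketch of a 95% confidence interval for a mean: sample mean plus or minus a t critical value times the standard error. To stay dependency-free it hardcodes the t-table value 2.262 for 9 degrees of freedom (a 10-point sample); with SciPy installed, scipy.stats.t.ppf(0.975, df) returns that value for any df. The sample is hypothetical.

```python
import math


def mean_confidence_interval(values, t_critical):
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 values")
    m = sum(values) / n
    # Sample standard deviation (divide by n - 1).
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))
    # Margin of error = critical value * standard error of the mean.
    margin = t_critical * s / math.sqrt(n)
    return m - margin, m + margin


sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
T_95_DF9 = 2.262  # assumption: t-table value for 95% confidence, df = 9
low, high = mean_confidence_interval(sample, T_95_DF9)
print(round(low, 3), round(high, 3))  # interval centered on the sample mean 12.1
```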
What is the T-Distribution
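The T-Distribution is a continuous, symmetric probability distribution that's unimodal (has one peak) and bell-shaped like the normal distribution, but with heavier tails. It's a useful way to represent sampling distributions (such as that of the sample mean) when the population standard deviation has to be estimated from the data. As the degrees of freedom grow, it approaches the normal distribution.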
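A small sketch of the t-distribution's density, compared with the standard normal from Python's statistics.NormalDist. It uses math.lgamma rather than math.gamma so the gamma-function ratio does not overflow for large degrees of freedom. Lower peaks and fatter tails show up at small df, and the curve approaches the normal as df grows.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    # Log form keeps the gamma-function ratio numerically stable.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)


normal = NormalDist()  # standard normal: mean 0, standard deviation 1
for df in (2, 10, 100):
    # Density at the peak (x = 0) and in the tail (x = 3).
    print(df, round(t_pdf(0, df), 4), round(t_pdf(3, df), 5))
print("normal", round(normal.pdf(0), 4), round(normal.pdf(3), 5))
```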