EDA (Exploratory Data Analysis): Statistics Process Flashcards
What is the Mean
The Mean is the arithmetic average of a set of numbers: add all the values together and divide by how many values there are.
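A minimal Python sketch of the mean: sum the values and divide by the count. The list `data` is made-up example data, and the standard-library `statistics.mean` is shown only as a cross-check.

```python
import statistics

# Made-up example data for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean = sum of the values / number of values
manual_mean = sum(data) / len(data)

print(manual_mean)            # 5.0
print(statistics.mean(data))  # same value from the standard library
```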
What is the Median
The Median is the middle value of a set of numbers once they are sorted: in a data set, it is the value with an equal number of values (rows, JSON objects, etc.) on each side of it. When there is an even number of values, the median is the average of the two middle values.
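A small sketch of how the median is found, covering both the odd and even cases. The helper name `median` is hypothetical; `statistics.median` from the standard library does the same job.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```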
What is the Mode
The Mode is the most frequently occurring value in the dataset or set of numbers. A dataset can have more than one mode, and the mode also works for categorical data.
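A quick sketch of finding the mode by counting occurrences, using made-up categorical data. `collections.Counter` does the counting, and `statistics.multimode` returns every value tied for most frequent.

```python
from collections import Counter
import statistics

data = ["red", "blue", "red", "green", "blue", "red"]  # made-up example data

counts = Counter(data)
print(counts.most_common(1))  # [('red', 3)]

# A dataset can have more than one mode; multimode returns all of them
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```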
What is Range
The Range is the largest value in a set of numbers minus the smallest value.
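The range as a one-line calculation on made-up data. Because it only looks at the two most extreme values, a single outlier can change it a lot.

```python
data = [12, 5, 31, 8, 19]  # made-up example data

# Range = largest value - smallest value
data_range = max(data) - min(data)
print(data_range)  # 26
```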
What are the Central Tendencies
Mean, Median, and Mode. In skewed data, the mean is pulled toward the long tail, so the median is often the better summary of the center.
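To see how skew affects the choice of center, here is a sketch with made-up, right-skewed values: one very large value drags the mean up, while the median barely moves.

```python
import statistics

incomes = [30, 32, 35, 38, 40, 41, 45, 300]  # made-up values with one extreme high value

print(statistics.mean(incomes))    # 70.125
print(statistics.median(incomes))  # 39.0
```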
What is the Variance
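The variance is the average squared distance of each value from the mean. For a sample (rather than a whole population), the sum of squared distances is usually divided by n - 1 instead of n.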
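A sketch of the variance computed step by step on made-up data, showing both the population version (divide by n) and the sample version (divide by n - 1). `statistics.pvariance` and `statistics.variance` compute the same two quantities.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]                  # made-up example data
mean = sum(data) / len(data)                     # 5.0
squared_diffs = [(x - mean) ** 2 for x in data]  # squared distance of each point from the mean

population_variance = sum(squared_diffs) / len(data)    # divide by n
sample_variance = sum(squared_diffs) / (len(data) - 1)  # divide by n - 1 (Bessel's correction)

print(population_variance)  # 4.0
print(sample_variance)
```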
What is the Standard Deviation
The Standard Deviation is the square root of the variance. It is roughly the typical amount we expect a point to differ from the mean, and it is in the same units as the data.
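Continuing with the same made-up data, the standard deviation is just the square root of the variance. The standard library's `pstdev` (divide by n) and `stdev` (divide by n - 1) are shown for comparison.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data
mean = sum(data) / len(data)
population_variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation = square root of the variance, back in the data's original units
population_sd = math.sqrt(population_variance)
print(population_sd)  # 2.0

# Standard-library equivalents: pstdev (divide by n) and stdev (divide by n - 1)
print(statistics.pstdev(data))
print(statistics.stdev(data))
```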
What are Correlations
Correlations are the methods we use to measure the relationship between variables, whether quantitative or categorical; more simply, how things are related.
What is a Correlation Coefficient
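A Correlation Coefficient puts a number on the strength and direction of the relationship. For example, Pearson's r for two quantitative variables ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).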
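A minimal sketch of Pearson's r for two quantitative variables. The helper `pearson_r` and the study-hours data are hypothetical, made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of cross-products of deviations divided by the product of the
    square roots of each variable's sum of squared deviations (the n's cancel out)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)

hours_studied = [1, 2, 3, 4, 5]     # made-up data
exam_score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours_studied, exam_score), 3))  # strong positive relationship, close to +1
print(pearson_r([1, 2, 3], [6, 4, 2]))                 # perfect negative line, essentially -1
```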
What is Empirical Probability
Empirical probability is the probability we observe from the data: the number of times an event happened divided by the number of trials.
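A sketch of empirical probability computed from a made-up record of 20 die rolls: count how often the event happened and divide by the number of trials.

```python
from collections import Counter

# Made-up record of 20 observed die rolls
observed_rolls = [3, 6, 1, 6, 2, 5, 6, 4, 3, 6, 1, 2, 6, 5, 4, 3, 6, 2, 1, 5]

counts = Counter(observed_rolls)
# Empirical probability = times the event happened / number of trials
p_six_empirical = counts[6] / len(observed_rolls)
print(p_six_empirical)  # 0.3 for this particular record
```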
What is Theoretical Probability
Theoretical probability is more of an ideal, a truth out there in the universe that we can’t directly see; for example, the 1/6 chance of rolling a six on a perfectly fair die.
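Theoretical probability can be worked out without any data by listing every equally likely outcome. This sketch assumes a perfectly fair six-sided die and uses `fractions.Fraction` to keep the answers exact.

```python
from fractions import Fraction
from itertools import product

# Assumption: a fair six-sided die, so every face is equally likely
faces = range(1, 7)

# Theoretical probability of a six on one roll: favorable outcomes / possible outcomes
p_six = Fraction(sum(1 for f in faces if f == 6), len(faces))
print(p_six)  # 1/6

# Theoretical probability that two dice sum to 7, by listing every equally likely pair
pairs = list(product(faces, repeat=2))
p_sum_7 = Fraction(sum(1 for a, b in pairs if a + b == 7), len(pairs))
print(p_sum_7)  # 1/6
```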
What is the Additive Rule
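The Additive Rule states that if two events are mutually exclusive (a single outcome cannot be in both states at once), then the probability that either one happens is the sum of their individual probabilities: P(A or B) = P(A) + P(B). If the events can overlap, the overlap is subtracted so it isn't counted twice: P(A or B) = P(A) + P(B) - P(A and B).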
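A sketch that checks the additive rule by counting outcomes of one roll of a fair die (an assumption for the example). Events are represented as Python sets of outcomes, and the helper `prob` is hypothetical.

```python
from fractions import Fraction

# Assumption: one roll of a fair six-sided die; each outcome equally likely
outcomes = set(range(1, 7))

def prob(event):
    """Theoretical probability of an event (a set of outcomes)."""
    return Fraction(len(event & outcomes), len(outcomes))

roll_1 = {1}
roll_2 = {2}
even = {2, 4, 6}
at_most_2 = {1, 2}

# Mutually exclusive events: P(A or B) = P(A) + P(B)
print(prob(roll_1 | roll_2), prob(roll_1) + prob(roll_2))  # 1/3 1/3

# Overlapping events: subtract the overlap so it is not counted twice
print(prob(even | at_most_2), prob(even) + prob(at_most_2) - prob(even & at_most_2))  # 2/3 2/3
```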
What is a GLM (General Linear Model)
These models assume the data can be described by a model plus some degree of error; in the simplest case, they fit a line of best fit to the data.
This is usually written with the intercept first, y = b + mx + e, rather than the familiar y = mx + b, where e is the error term.
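A minimal sketch of the simplest linear model, a straight line fitted by ordinary least squares to made-up data. The residuals play the role of the error term; real analyses would typically use a statistics library rather than this hand-rolled fit.

```python
# Made-up data that roughly follows a straight line plus some noise
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Ordinary least squares estimates for y = b + m*x + e
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - m * mean_x

# Residuals are the error term: what the line does not explain
residuals = [y - (b + m * x) for x, y in zip(xs, ys)]
print(f"y = {b:.2f} + {m:.2f}x")  # y = 1.05 + 1.99x
```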
What are Confidence Intervals
A Confidence Interval is an estimated range of values that seem reasonable based on what we’ve observed. Its center is the sample mean (our best single estimate), but we add some room on either side for our uncertainty.
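A sketch of a 95% confidence interval for a mean, using made-up measurements. It assumes a normal approximation (z of about 1.96 from `statistics.NormalDist`); for a sample this small, a t critical value would be more appropriate and would give a slightly wider interval.

```python
import math
import statistics
from statistics import NormalDist

# Made-up sample measurements
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.4, 5.2, 4.7, 5.1, 5.0]

mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Assumption: normal approximation; z is about 1.96 for a 95% interval
z = NormalDist().inv_cdf(0.975)
lower, upper = mean - z * standard_error, mean + z * standard_error
print(f"mean = {mean:.3f}, 95% CI ~ ({lower:.3f}, {upper:.3f})")
```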
What is the T-Distribution
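The T-Distribution is a continuous, symmetric probability distribution that’s unimodal (has one peak); it’s a useful way to represent sampling distributions, such as the distribution of the sample mean when the population standard deviation has to be estimated from the sample. It has heavier tails than the normal distribution and approaches it as the degrees of freedom increase.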
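A sketch comparing the t-distribution's density with the standard normal's, using the textbook t density formula (written with `math.lgamma` so large degrees of freedom don't overflow). The helper `t_pdf` is hypothetical; libraries such as SciPy provide this ready-made.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

normal = NormalDist()  # standard normal: mean 0, sd 1
for df in (1, 5, 30):
    # Density at the center and at x = 3; the t puts more weight in the tails
    print(df, round(t_pdf(0, df), 4), round(t_pdf(3, df), 4))
print("normal", round(normal.pdf(0), 4), round(normal.pdf(3), 4))
```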
What is the Normal Distribution
A normal distribution is a symmetric, bell-shaped curve whose mean, median, and mode are all equal, sitting together at the peak. (Having equal mean, median, and mode is not enough on its own to make a distribution normal.)
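A sketch using the standard library's `statistics.NormalDist` with made-up parameters (mean 100, standard deviation 15). Half the area sits below the mean, and about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of it.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # made-up example: mean 100, standard deviation 15

print(dist.cdf(100))  # 0.5: symmetric, so the mean is also the median

# Share of values within 1, 2, and 3 standard deviations of the mean (the 68-95-99.7 rule)
for k in (1, 2, 3):
    share = dist.cdf(100 + k * 15) - dist.cdf(100 - k * 15)
    print(k, round(share, 4))
```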
What is the Central Limit Theorem
The Central Limit Theorem states that the distribution of sample means of independent, identically distributed random variables (with finite variance) gets closer and closer to a normal distribution as the sample size gets bigger and bigger, even if the original population distribution isn’t normal itself.
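A simulation sketch of the Central Limit Theorem. It assumes an exponential population with mean 1 (strongly right-skewed), draws many samples of size n, and measures the skewness of the resulting sample means, which should shrink toward 0 (the normal's value) as n grows. The helpers `skewness` and `sample_means` are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def skewness(values):
    """Sample skewness: about 0 for a symmetric distribution like the normal."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

def sample_means(sample_size, repetitions=2000):
    # Population: exponential with mean 1, which is strongly right-skewed (not normal)
    return [statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
            for _ in range(repetitions)]

for n in (1, 5, 50):
    means = sample_means(n)
    print(n, round(statistics.fmean(means), 3), round(skewness(means), 3))
```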
What is Null Hypothesis Significance Testing
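A form of the reductio ad absurdum argument, which tries to discredit an idea by assuming the idea is true and then showing that this assumption leads to something contradictory. In NHST, we assume the null hypothesis is true and check whether our data would be very unlikely under that assumption; if so, we reject the null hypothesis.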
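A sketch of the NHST logic on a hypothetical coin-flip example (16 heads in 20 flips, significance level 0.05, both assumptions). It assumes the coin is fair, computes how likely 16 or more heads would be under that assumption with an exact one-sided binomial calculation, and rejects the assumption if that probability is small.

```python
from math import comb

# Hypothetical example: we flip a coin 20 times and see 16 heads
n_flips, heads = 20, 16
alpha = 0.05  # chosen significance level (an assumption for this example)

# Step 1: assume the null hypothesis is true (the coin is fair, P(heads) = 0.5)
p_null = 0.5

# Step 2: how likely is a result at least this extreme under that assumption?
# (one-sided here: 16 or more heads)
p_value = sum(comb(n_flips, k) * p_null**k * (1 - p_null)**(n_flips - k)
              for k in range(heads, n_flips + 1))

# Step 3: if that is very unlikely, the assumption looks "absurd" and we reject it
print(round(p_value, 4), "reject H0" if p_value < alpha else "fail to reject H0")  # 0.0059 reject H0
```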
What is a P-value
In probability terms, the p-value is the probability of getting a sample as or more extreme than ours, given that the null hypothesis is true.
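One way to see the definition directly is an exact permutation test, a swapped-in technique chosen here because it needs no distribution tables. Under the null hypothesis that group labels don't matter, it enumerates every relabeling of two made-up groups and counts how often the difference in means is as or more extreme than the observed one.

```python
from itertools import combinations
from statistics import fmean

# Made-up scores for two small groups
group_a = [12, 15, 14, 16]
group_b = [9, 11, 10, 13]

observed = fmean(group_a) - fmean(group_b)
pooled = group_a + group_b

# Null hypothesis: labels don't matter, so every split of the pooled data is equally likely.
# Count splits whose difference is as or more extreme than observed (two-sided).
extreme = total = 0
for idx in combinations(range(len(pooled)), len(group_a)):
    a = [pooled[i] for i in idx]
    b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if abs(fmean(a) - fmean(b)) >= abs(observed):
        extreme += 1

p_value = extreme / total
print(observed, p_value)
```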
What is a Critical Value
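A Critical Value is the value of the test statistic that marks the boundary of the extreme (rejection) region; if our test statistic falls beyond it, we reject the null hypothesis at the chosen significance level.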
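A sketch of finding a critical value for a two-sided z-test at an assumed significance level of 0.05, using the inverse CDF of the standard normal from `statistics.NormalDist`. The test statistic value is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05  # significance level (assumed for this example)

# Two-sided z-test: put alpha/2 in each tail of the standard normal distribution
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_critical, 3))  # 1.96

test_statistic = 2.4  # hypothetical value computed from some sample
print("reject H0" if abs(test_statistic) > z_critical else "fail to reject H0")
```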
What is a Test Statistic
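A Test Statistic is a number computed from our sample that quantifies how far the data are from our expectations or theories (the null hypothesis); for example, how many standard errors the sample mean lies from a hypothesized mean.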
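A sketch of one common test statistic, the one-sample t statistic, on made-up data tested against a hypothetical claimed mean of 50. The helper `one_sample_t` is hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """How many standard errors the sample mean sits from the hypothesized mean mu0."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.fmean(sample) - mu0) / standard_error

# Made-up data: does the average differ from a claimed value of 50?
sample = [52, 48, 55, 51, 53, 54, 50, 49]
print(round(one_sample_t(sample, mu0=50), 3))  # 1.732
```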
What are Boxplots
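A Boxplot is a visualization built from the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. The box spans Q1 to Q3 (the interquartile range) with a line at the median, the whiskers extend toward the extremes, and points far outside are often drawn individually as outliers.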
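A sketch that computes the numbers a boxplot is drawn from, using made-up data and `statistics.quantiles` (its default "exclusive" method; other tools may compute quartiles slightly differently). The 1.5 × IQR outlier rule is a common convention (Tukey's), assumed here; actual plotting would need a library such as matplotlib.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up data with one suspiciously large value

q1, median, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1
print("five-number summary:", min(data), q1, median, q3, max(data))

# Common convention (Tukey's): points beyond 1.5 * IQR from the box are drawn as outliers
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]
print("outliers:", outliers)  # outliers: [30]
```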
What are Stem and Leaf plots
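A Stem and Leaf plot is a text-based plot that shows the shape of a distribution while keeping the actual values: each number is split into a stem (for example, the tens digit) and a leaf (the last digit), and the leaves are listed beside their stem, like a sideways histogram.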
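A sketch of building a stem and leaf plot as text for non-negative whole numbers, assuming stem = tens digit and leaf = ones digit. The helper `stem_and_leaf` and the scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Text stem-and-leaf plot for non-negative integers: stem = tens, leaf = ones digit."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # include empty stems so gaps stay visible
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

scores = [42, 47, 51, 53, 53, 58, 62, 69, 71, 88]  # made-up data
for line in stem_and_leaf(scores):
    print(line)
```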