Week 4 Flashcards
Data
recorded values of qualitative or quantitative observations.
Population
the collection of all subjects of interest.
Sample
a subset of the population of interest.
Parameter
a characteristic of a population.
Statistic
a characteristic of a sample.
Levels of Measurement
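qualitative [nominal (categories that cannot be put in any order) & ordinal (categories that can be ordered)] & quantitative [interval (equally spaced values with no true zero, so they can run from -infinity to infinity, e.g. temperature in °C) & ratio (equally spaced values with a true zero, so they run from 0 to infinity, e.g. height or weight)]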
Measure of Central Tendency
Mean (average of data points), Median (middle of data points) and Mode (most recurring data point)
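Example: a minimal sketch of the three measures using Python's standard `statistics` module. The `scores` list is made-up data, assumed only for illustration.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
scores = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(scores)      # sum of the values divided by their count
median_value = statistics.median(scores)  # average of the two middle values (even count)
mode_value = statistics.mode(scores)      # value that occurs most often

print(mean_value, median_value, mode_value)
```

For this data the mean is 5, the median is 4 (halfway between 3 and 5), and the mode is 3.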
Measure of Position
Min, Max, Median, Quartiles, Percentiles.
Measure of Dispersion
Range, variance, standard deviation.
Measures of Relationship
Covariance, Correlation, Regression, Trend, Forecast.
Measures of Asymmetry
Skewness and Kurtosis.
Statistics
the science of collecting, summarizing, and drawing valid conclusions from data. It involves selecting models to validate hypotheses and test assumptions, determining the relationships between variables, assessing data trends and trajectories, identifying patterns and groupings, and detecting mistakes and outliers.
Uniform Distribution
distribution (continuous or discrete) whose data points lie within a range and all have equal probability of appearing.
Binomial Distribution
discrete probability distribution, with parameters n and p, of the number of successes in a sequence of n independent experiments, each with a Boolean-valued outcome: success (with probability p) or failure (with probability q = 1 - p).
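Example: a minimal sketch of the binomial probability of exactly k successes, C(n, k) * p^k * q^(n - k). The function name `binomial_pmf` and the coin-flip numbers are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    q = 1 - p  # probability of failure on a single trial
    return comb(n, k) * p**k * q**(n - k)

# Hypothetical example: 10 flips of a fair coin.
n, p = 10, 0.5
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(binomial_pmf(5, n, p))  # chance of exactly 5 heads
print(sum(probabilities))     # all outcomes together should sum to about 1
```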
Poisson distribution
discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.
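Example: a minimal sketch of the Poisson probability of k events, rate^k * e^(-rate) / k!, where `rate` is the mean number of events per interval. The function name and the rate of 3 are assumptions made for illustration.

```python
from math import exp, factorial

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when the mean count per interval is `rate`."""
    return rate**k * exp(-rate) / factorial(k)

# Hypothetical example: on average 3 events per interval.
rate = 3.0
for k in range(6):
    print(k, poisson_pmf(k, rate))
```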
Normal distribution
continuous, bell-shaped probability distribution defined by its mean and standard deviation. Its importance stems from the fact that the sum or mean of a large enough sample of independent random variables is approximately normally distributed, even when the distribution of the individual variables is unknown (CLT).
Central Limit Theorem
no matter the underlying distribution of the dataset (provided its variance is finite), the sampling distribution of the mean approximates a normal distribution as the sample size n grows. The mean of the sampling distribution equals the mean of the original distribution, and its variance is n times smaller (the original variance divided by n).
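Example: a small simulation sketch of the theorem. It draws repeated samples from a uniform population on [0, 1) (mean 0.5, variance 1/12) and checks the mean and variance of the sample means. The sample size, number of samples, and seed are arbitrary assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample (assumed value)
num_samples = 5000  # number of repeated samples (assumed value)

# Population: uniform on [0, 1), which has mean 0.5 and variance 1/12.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)
expected_variance = (1 / 12) / n  # population variance divided by n

print(mean_of_means, variance_of_means, expected_variance)
```

The mean of the sample means should land close to 0.5 and their variance close to (1/12)/n, even though the population itself is flat rather than bell-shaped.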
Hypothesis Testing
the testing of a hypothesis (an idea that can be tested: a supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation).
ANOVA (Analysis of Variance)
a collection of statistical models and their associated estimation procedures used to analyze the difference among means. Based on the law of total variance, ANOVA provides a statistical test of whether two or more population means are equal.
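Example: a minimal sketch of the one-way ANOVA F statistic, which compares the variability between group means with the variability within groups. The function name `one_way_anova_f` and the three groups are assumptions made for illustration; turning F into a p-value needs an F table or a statistics library, which is not shown here.

```python
def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: between-group over within-group mean square."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of each value around its own group mean.
    ss_within = sum(
        (x - sum(group) / len(group)) ** 2
        for group in groups
        for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic = one_way_anova_f(groups)
print(f_statistic)
```

A large F suggests the group means differ by more than the spread within groups would explain.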
Chi-Squared Analysis
a statistical hypothesis test that is valid to perform when the test statistic is chi-squared distributed under the null hypothesis. Used to determine whether there is a statistically significant difference between the expected frequencies and the observed frequencies in one or more categories of a contingency table.
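Example: a minimal sketch of the chi-squared statistic for a goodness-of-fit check, the sum of (observed - expected)^2 / expected over all categories. The die-roll counts are made-up data, assumed only for illustration.

```python
# Hypothetical counts from 120 rolls of a six-sided die.
observed = [18, 22, 20, 25, 15, 20]
# A fair die would be expected to show each face 120 / 6 = 20 times.
expected = [20] * 6

chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

print(chi_squared, degrees_of_freedom)
```

The statistic is then compared with a chi-squared critical value for the given degrees of freedom (about 11.07 for 5 degrees of freedom at the 0.05 level); a smaller statistic means the observed counts are consistent with the expected ones.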
Standardization
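the transformation of a variable so that it has mean 0 and standard deviation 1, done by subtracting the mean and dividing by the standard deviation. Applied to a normally distributed variable, it gives the standard normal distribution N(0,1).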
Z score
the standard score calculated by subtracting the population mean from an individual raw score and dividing the difference by the population standard deviation.
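Example: a minimal sketch of the z score, z = (x - mean) / standard deviation, applied to every value of a small data set. The function name `z_score` and the `heights` list are assumptions made for illustration, and the list is treated as the whole population.

```python
import statistics

def z_score(x: float, population_mean: float, population_sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - population_mean) / population_sd

# Hypothetical population of heights in cm.
heights = [150, 160, 170, 180, 190]
mu = statistics.fmean(heights)      # population mean
sigma = statistics.pstdev(heights)  # population standard deviation

z_scores = [z_score(h, mu, sigma) for h in heights]
print(z_scores)
```

After the transformation the z scores have mean 0 and standard deviation 1, whatever the original units were.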
Arithmetic mean, Median, Mode
average of data points, middle value of the sorted data points, and data point that appears most frequently.
Range, Average Deviation, Variance
difference between the maximum and minimum data point, average absolute distance of the data points from the mean, and average squared distance of the data points from the mean (the square of the standard deviation).
Standard deviation
number that indicates how much data points typically deviate from the mean, in the same units as the data; the square root of the variance.
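Example: a minimal sketch of range, average deviation, variance, and standard deviation on one small data set. The `data` list is made up, and the population formulas (dividing by N) are an assumption; `statistics.variance` and `statistics.stdev` give the sample versions (dividing by N - 1).

```python
import statistics

# Hypothetical data set, treated here as a whole population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)
mean_value = statistics.fmean(data)

# Average absolute distance from the mean.
average_deviation = sum(abs(x - mean_value) for x in data) / len(data)

variance = statistics.pvariance(data)  # average squared distance from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print(data_range, average_deviation, variance, std_dev)
```

For this data the range is 7, the average deviation is 1.5, the variance is 4, and the standard deviation is 2.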
Covariance
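a measure of the joint variability of two variables; it is positive when they tend to move together and negative when they tend to move in opposite directions.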
Correlation
a standardized measure of covariance: the covariance of two variables divided by the product of their standard deviations. It ranges from -1 to 1.
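Example: a minimal sketch of covariance and correlation written out by hand. The function names and the `hours`/`scores` lists are assumptions made for illustration, and the population formula (dividing by N) is used.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: average product of the paired deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical paired data with a perfect linear relationship.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

print(covariance(hours, scores))   # depends on the units of both variables
print(correlation(hours, scores))  # unit-free, between -1 and 1
```

Covariance changes if the units change, while correlation stays the same, which is why correlation is easier to compare across data sets.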
Skewness
a measure of asymmetry that indicates whether the observations in a dataset are concentrated on one side.
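Example: a minimal sketch of one common skewness formula, the third central moment divided by the second central moment raised to the power 1.5. The function name and both data sets are assumptions made for illustration; other skewness formulas with sample-size corrections also exist.

```python
def skewness(data):
    """Moment-based skewness: 0 for symmetric data, positive when the long tail is on the right."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]      # hypothetical, evenly spread around 3
right_skewed = [1, 2, 2, 3, 12]  # hypothetical, one large value pulls the tail right

print(skewness(symmetric))
print(skewness(right_skewed))
```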
Probability Sampling
each element of the population has a known, nonzero chance of being selected for the sample. Ex. Simple, Stratified, Cluster, and Systematic random sampling.
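Example: a minimal sketch of three of these methods using Python's standard `random` module. The population of 100 ID numbers, the sample size of 10, the two strata, and the seed are all assumptions made for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: ID numbers 1 to 100.
population = list(range(1, 101))

# Simple random sampling: every element is equally likely to be picked.
simple = random.sample(population, 10)

# Systematic sampling: random starting point, then every step-th element.
step = len(population) // 10
start = random.randrange(step)
systematic = population[start::step]

# Stratified sampling: split into groups (strata), then sample within each group.
strata = {"low": population[:50], "high": population[50:]}
stratified = [
    unit
    for group in strata.values()
    for unit in random.sample(group, 5)
]

print(simple, systematic, stratified, sep="\n")
```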
Non Probability Sampling
the practice of sampling without the assurance that every element has a known chance of being selected. Ex. Convenience, Voluntary, Snowball, Quota, and Purposive sampling.
Bias
a systematic tendency for a subset of a population to misrepresent the overall population.