EDA Flashcards
-the science and the art which deals with interpreting data from facts and information.
-a science which deals with the methods of gathering, presenting, analyzing, and interpreting data.
Statistics
summarizes and organizes characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population.
Descriptive statistics
is a collection of responses or observations from a sample or entire population.
Data set
Makes inferences and draws conclusions about a population based on sample data. Examples: hypothesis testing, confidence intervals, regression analysis, ANOVA (analysis of variance), chi-square tests, t-tests, etc.
Inferential Statistics
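A confidence interval is one of the inferential tools listed above. The sketch below estimates a 95% interval for a population mean from a small sample, assuming the normal approximation is acceptable (for small samples a t distribution is usually preferred). The function name `mean_confidence_interval` and the sample values are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumption: normal approximation (z critical value), not a t distribution.
    """
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / sqrt(n)                     # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical sample of 10 measurements
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The interval is centered on the sample mean (13.5 here) and widens as the sample becomes more variable or smaller.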
There are three main types of descriptive statistics:
Distribution, Central Tendency, Variability
One of the main approaches to measuring central tendency; it involves calculating central tendency measures from grouped or categorized data. Grouping data involves creating intervals or classes to organize the data into ranges.
Grouped central tendency
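With grouped data the individual values are no longer available, so the usual approach is to treat each class by its midpoint. The sketch below computes a grouped mean (sum of midpoint × frequency over total frequency) and a grouped median using the common interpolation formula L + ((n/2 − CF) / f) × h. The frequency table and function names are hypothetical.

```python
# Hypothetical frequency table: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]

def grouped_mean(classes):
    """Estimate the mean by weighting each class midpoint by its frequency."""
    total = sum(f for _, _, f in classes)
    return sum(((lo + hi) / 2) * f for lo, hi, f in classes) / total

def grouped_median(classes):
    """Interpolate the median inside the class that contains the n/2-th value."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # CF: frequency of all classes before the current one
    for lo, hi, f in classes:
        if cumulative + f >= n / 2:
            return lo + (n / 2 - cumulative) / f * (hi - lo)
        cumulative += f

print(grouped_mean(classes))    # 620 / 30
print(grouped_median(classes))  # 20 + (15 - 13) / 12 * 10
```

Both results are estimates: they assume values are spread evenly within each class.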
One of the main approaches to measuring central tendency; it refers to calculating central tendency measures directly from the individual data points without any prior grouping or categorization.
Ungrouped central tendency
the sum of all values divided by the total number of values.
Mean
the middle number in an ordered dataset (for an even number of values, the mean of the two middle numbers).
Median
the most frequent value
mode
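For ungrouped data, Python's standard `statistics` module computes the three measures above directly from the raw values. The scores below are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical ungrouped scores
scores = [4, 8, 6, 5, 3, 8, 9, 8, 3]

avg = mean(scores)          # sum of all values / number of values
manual_avg = sum(scores) / len(scores)
mid = median(scores)        # middle value once the data is sorted
most_common = mode(scores)  # most frequent value

print(avg, manual_avg, mid, most_common)
```

Sorted, the scores are 3, 3, 4, 5, 6, 8, 8, 8, 9, so the median is the 5th value (6) and the mode is 8.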
Measures of variability give you a sense of how spread out the response values are. True or False?
True
Values that divide your ordered data into quarters; three quartiles (1st, 2nd, 3rd) split the data into four equal parts.
Quartile
are the cut-off points below which a certain fraction of a sample falls.
Quantiles
values that divide sorted data into ten equal parts (1st decile, 2nd decile, up to 9th decile).
Deciles
a value below which a certain percentage of scores fall.
Percentile
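Quartiles, deciles, and percentiles are all quantiles with different numbers of groups. The sketch below uses `statistics.quantiles`, which returns the n − 1 cut points that split the data into n groups, with its default `"exclusive"` method. Other tools and textbooks use other interpolation rules, so exact values can differ slightly. The data is hypothetical.

```python
from statistics import median, quantiles

data = list(range(1, 21))  # hypothetical data: 1, 2, ..., 20

quartiles = quantiles(data, n=4)     # 3 cut points -> 4 groups
deciles = quantiles(data, n=10)      # 9 cut points -> 10 groups
percentiles = quantiles(data, n=100) # 99 cut points -> 100 groups
p90 = percentiles[89]                # 90th percentile

# Share of values that fall below the 90th percentile
share_below = sum(1 for v in data if v < p90) / len(data)

print("quartiles:", quartiles)
print("deciles:", deciles)
print("90th percentile:", p90, "share below:", share_below)
```

The middle quartile always matches the median, which is why Q2 and the median name the same value.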
is also referred to as spread, scatter, or dispersion. It is most commonly measured with the range, interquartile range, standard deviation, and variance.
Variability
the difference between the highest and lowest values
range
the range of the middle half of a distribution (Q3 minus Q1)
Interquartile range
roughly, the average distance from the mean (formally, the square root of the variance)
standard deviation
average of squared distances from the mean
variance
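The four measures of variability can be computed with the `statistics` module. Population variance divides by n, matching "average of squared distances"; the sample versions divide by n − 1. The sketch also computes the mean absolute deviation to show that the standard deviation is not literally the average distance from the mean. The data is hypothetical, and the IQR uses the default `"exclusive"` quantile method.

```python
from statistics import mean, pstdev, pvariance, quantiles, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

data_range = max(data) - min(data)   # highest minus lowest value
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1                        # spread of the middle half

m = mean(data)
pop_var = pvariance(data)    # mean of squared distances from the mean (divide by n)
pop_sd = pstdev(data)        # square root of the population variance
sample_var = variance(data)  # divides by n - 1, used when data is a sample
sample_sd = stdev(data)
mad = mean(abs(x - m) for x in data)  # literal average distance from the mean

print(data_range, iqr, pop_var, pop_sd, sample_var, sample_sd, mad)
```

Here the population standard deviation is 2 while the mean absolute deviation is 1.5: squaring gives larger deviations more weight.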
is the “middle” value in the first half of the rank-ordered data set.
Q1
is the median value in the set
Q2
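The "median of each half" definition of Q1 can be sketched directly. This version uses one common convention, excluding the overall median from both halves when the count is odd; other conventions include it, and `statistics.quantiles` interpolates differently, so results may not match every textbook. The function name `quartiles_by_halves` is hypothetical.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole set, upper half.

    Convention (an assumption): for an odd count, the middle value is left
    out of both halves. Requires at least two values.
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + (n % 2):]  # skip the middle value when n is odd
    return median(lower), median(s), median(upper)

print(quartiles_by_halves([1, 3, 4, 6, 7, 8, 9]))     # odd count
print(quartiles_by_halves([1, 2, 3, 4, 5, 6, 7, 8]))  # even count
```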