EDA Flashcards
-the science and art of interpreting data drawn from facts and information.
-a science that deals with the methods of gathering, presenting, analyzing, and interpreting data.
Statistics
summarizes and organizes
characteristics of a data set. A data set is a
collection of responses or observations from a
sample or entire population.
Descriptive statistics
is a
collection of responses or observations from a
sample or entire population.
Data set
Make inferences and draw conclusions about a population based on sample data. Examples: Hypothesis testing, confidence
intervals, regression analysis, ANOVA (analysis of variance), chi-square tests, t-tests, etc.
Inferential Statistics
There are three main types of descriptive statistics:
Distribution, Central Tendency, Variability
One of the main approaches to measuring central tendency; it involves calculating central tendency measures from grouped or categorized data. Grouping data involves creating intervals or classes to organize the data into ranges.
Grouped central tendency
One of the main approaches to measuring central tendency; it refers to calculating central tendency measures directly from the individual data points, without any prior grouping or categorization.
Ungrouped central tendency
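The two approaches can be compared on the same numbers. This is a minimal sketch: the scores and class intervals are invented for illustration, and the grouped mean uses the usual textbook approximation (frequency times class midpoint, summed, divided by total frequency), which these cards do not state explicitly.

```python
# Hypothetical raw scores (ungrouped data), invented for illustration.
scores = [52, 55, 61, 63, 67, 70, 74, 78, 81, 89]

# Ungrouped mean: computed directly from the individual data points.
ungrouped_mean = sum(scores) / len(scores)

# Grouped data: hypothetical classes given as (lower limit, upper limit).
classes = [(50, 59), (60, 69), (70, 79), (80, 89)]

# Tally how many scores fall inside each class.
frequencies = [
    sum(1 for x in scores if low <= x <= high) for low, high in classes
]

# Grouped mean (assumed textbook formula): sum(f_i * M_i) / sum(f_i),
# where M_i is the midpoint of each class.
class_marks = [(low + high) / 2 for low, high in classes]
grouped_mean = (
    sum(f * m for f, m in zip(frequencies, class_marks)) / sum(frequencies)
)

print(ungrouped_mean, grouped_mean)
```

The grouped result is only an approximation of the ungrouped one, because every value in a class is treated as if it sat at the class midpoint.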
the sum of all values divided by the
total number of values.
Mean
the middle number in an ordered
dataset.
Median
the most frequent value
mode
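A quick way to check the three measures is Python's standard `statistics` module. The data values below are made up for illustration.

```python
import statistics

data = [4, 8, 6, 5, 3, 8, 8]  # hypothetical sample

mean_value = statistics.mean(data)      # sum of all values / number of values
median_value = statistics.median(data)  # middle value of the ordered data
mode_value = statistics.mode(data)      # most frequent value

print(mean_value, median_value, mode_value)
```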
Measures of variability give you a sense of how spread out the response values are. True or False?
True
Values that divide your data into quarters (e.g., 1st, 2nd, 3rd, and 4th quarters)
Quartiles
are the cut-off points for given fractions of a sample
Quantiles
sort data into ten equal parts (ex. 1st decile, 2nd decile, up to 10th decile)
Deciles
a number where a certain percentage of scores fall below that number
Percentile
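The percentile idea can be sketched as a small function that reports what percentage of scores fall below a given number. The function name `percentile_rank` and the scores are hypothetical, and this uses the strict "below" wording of the card; some textbooks count ties differently.

```python
def percentile_rank(scores, value):
    """Percentage of scores that fall strictly below `value`."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)

# Hypothetical test scores, invented for illustration.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank_of_85 = percentile_rank(scores, 85)
print(rank_of_85)
```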
is also referred to as spread, scatter, or dispersion. It is most often measured with the range, interquartile range, standard deviation, and variance.
Variability
the difference between the highest and lowest values
range
the range of the middle half of a
distribution
Interquartile range
roughly the average distance of values from the mean (the square root of the variance)
standard deviation
average of squared distances from the mean
variance
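The range, variance, and standard deviation can be computed with the standard `statistics` module. This is a sketch on made-up data. Note the assumption about which divisor to use: the "average of squared distances" on the card matches the population version (divide by N), while the sample version divides by n - 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Population versions: squared distances from the mean, divided by N.
pop_variance = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Sample versions: same sum of squared distances, divided by n - 1.
sample_variance = statistics.variance(data)
sample_std = statistics.stdev(data)

print(data_range, pop_variance, pop_std, sample_variance, sample_std)
```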
is the “middle” value in the first half of the rank-ordered data set.
Q1
is the median value in the set
Q2
is the “middle” value in the second half of the rank-ordered data set.
Q3
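The Q1, Q2, and Q3 cards describe a "median of halves" rule, which can be sketched as follows. The helper name `quartiles` and the data are hypothetical. Other quartile conventions exist (for example, interpolation-based ones) and may give slightly different values.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]          # first half of the ordered data
    upper_half = ordered[half + n % 2:]  # skips the median itself when n is odd
    return (
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
    )

# Hypothetical data set with an even number of values.
q1, q2, q3 = quartiles([7, 15, 36, 39, 40, 41, 42, 43, 47, 49])

# Interquartile range: the range of the middle half of the data.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```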
Standard deviation (s) is the average amount of variability in your data set. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is. True or False?
True
Variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread out the data, the larger the variance is in relation to the mean. True or False?
True
is a statistical measure of the relative dispersion of data points around the mean, computed as the standard deviation divided by the mean.
Coefficient of Variation (relative standard deviation)
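A minimal sketch of the coefficient of variation, assuming the common definition (standard deviation divided by the mean, expressed as a percentage) and the sample standard deviation. The function name and both data sets are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100

# Two hypothetical data sets; the second is the first scaled by 10.
cv_a = coefficient_of_variation([10, 12, 11, 13, 14])
cv_b = coefficient_of_variation([100, 120, 110, 130, 140])

print(round(cv_a, 2), round(cv_b, 2))
```

Because the CV is relative to the mean, rescaling the data leaves it unchanged, which is why it is useful for comparing the spread of data sets measured in different units.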
the number of occurrences (tally)
frequency (fi)
the highest value within the class
Upper class limit (UCL)
the lowest value within the class
Lower class limit (LCL)
the upper class limit + 0.5 (when values are recorded as whole numbers)
Upper class boundary (UCB)
the lower class limit - 0.5 (when values are recorded as whole numbers)
Lower class boundary (LCB)
the average value of the class limits
class mark (Mi)
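The class limit, class boundary, and class mark cards can be tied together in one small table. This sketch assumes whole-number data (so the boundary offset is 0.5) and uses invented class limits.

```python
# Hypothetical classes given as (LCL, UCL) pairs of whole numbers.
classes = [(10, 19), (20, 29), (30, 39)]

table = []
for lcl, ucl in classes:
    table.append({
        "LCL": lcl,
        "UCL": ucl,
        "LCB": lcl - 0.5,               # lower class boundary
        "UCB": ucl + 0.5,               # upper class boundary
        "class_mark": (lcl + ucl) / 2,  # average of the class limits
    })

for row in table:
    print(row)
```

With this offset, the upper boundary of one class equals the lower boundary of the next, so the classes leave no gaps.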
It tells the proportion of the total number of observations associated with each category.
Relative Frequency Distribution
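A relative frequency distribution can be sketched by dividing each category's count by the total number of observations. The survey responses below are made up for illustration.

```python
from collections import Counter

# Hypothetical survey responses.
responses = ["yes", "no", "yes", "yes", "no", "maybe", "yes", "no"]

counts = Counter(responses)   # frequency of each category
total = sum(counts.values())  # total number of observations

# Relative frequency: each category's share of the total.
relative_freq = {category: count / total for category, count in counts.items()}

print(relative_freq)
```

The relative frequencies always add up to 1 (or 100 percent).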
It is the running total of frequencies: the frequency of a class plus all the frequencies before it in a frequency distribution. Add the first value to the next value, then add that sum to the next value, and so on until the last. The last cumulative frequency equals the total sum of all frequencies.
Cumulative Frequency Distribution (F)
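The running-total procedure can be sketched with `itertools.accumulate` from the standard library. The class frequencies are hypothetical.

```python
from itertools import accumulate

frequencies = [2, 3, 3, 2]  # hypothetical class frequencies (fi)

# Each entry is the current frequency plus all frequencies before it.
cumulative = list(accumulate(frequencies))

print(cumulative)
```

The last entry matches the total of all frequencies, as the card states.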
are individual facts, statistics, or items of information, often numeric.
Data
Types of Data
Quantitative and Qualitative
Data that can be measured with numbers, such as duration or speed
Quantitative
non-numerical data that is categorical, such as yes/no responses or eye color
Qualitative
whole numbers that can’t be broken down, such as a number of items
Discrete
numbers that can be broken down into fractions or decimals, such as height or weight
continuous
types of continuous
Interval and Ratio
numbers with known, equal differences between values but no true zero point, such as clock time
Interval
Numbers that have measurable intervals and a true zero, so both differences and ratios can be determined, such as height or weight
Ratio
Data used for naming variables, such as hair color
Nominal
Types of Qualitative
Nominal and Ordinal
Data used to describe the order of values, such as 1=happy, 2=neutral, 3=unhappy
Ordinal
is the graphical representation of information and data.
Data visualization
Give 8 types of data visualization charts
Line Chart, Area Chart, Bar Chart, Gantt Chart, Histogram, Scatter Plot, Pie Chart, Map Chart
Types of Bar Graph/Chart
Horizontal, Grouped, Vertical
display trends over time
Line Chart
a line chart with areas below the lines filled with colors
Area Chart
display trends with multiple variables
Bar Chart
shows activities (tasks or events) displayed against time.
Gantt Chart
display the shape and spread of continuous dataset samples
Histogram
show correlation in a dataset
Scatter Plot
show the contribution of each data point to the whole dataset
Pie Chart
show data with location as variable
Map Chart