EDA Flashcards by Sassy Ryrybom

-the science and art of interpreting data from facts and information.
-a science which deals with the methods of gathering, presenting, analyzing, and interpreting data.

Statistics

summarizes and organizes the characteristics of a data set. A data set is a collection of responses or observations from a sample or an entire population.

Descriptive statistics

is a collection of responses or observations from a sample or an entire population.

Data set

Make inferences and draw conclusions about a population based on sample data. Examples: hypothesis testing, confidence intervals, regression analysis, ANOVA (analysis of variance), chi-square tests, t-tests, etc.

Inferential Statistics

There are three main types of descriptive statistics:

Distribution, Central Tendency, Variability

One of the main approaches to measuring central tendency: it calculates central tendency measures from grouped or categorized data. Grouping data means creating intervals (classes) that organize the data into ranges.

Grouped central tendency
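A minimal Python sketch of grouped central tendency, assuming a hypothetical frequency table with whole-number class limits and equal class widths. The mean weights each class mark (LCL + UCL) / 2 by its frequency fi. The median uses the common textbook interpolation formula median = LCB + ((n/2 - CF) / f) * h, where LCB is the lower class boundary of the median class, CF the cumulative frequency of the classes before it, f its frequency, and h the class width. The names `grouped_mean` and `grouped_median` are illustrative only.

```python
# Hypothetical grouped data: (lower class limit, upper class limit, frequency fi)
classes = [(10, 19, 3), (20, 29, 5), (30, 39, 8), (40, 49, 4)]

def grouped_mean(classes):
    """Estimate the mean from grouped data using class marks."""
    total_f = sum(f for _, _, f in classes)
    # Class mark Mi = (LCL + UCL) / 2, weighted by the class frequency fi
    weighted = sum(f * (lcl + ucl) / 2 for lcl, ucl, f in classes)
    return weighted / total_f

def grouped_median(classes):
    """Estimate the median with median = LCB + ((n/2 - CF) / f) * h."""
    n = sum(f for _, _, f in classes)
    cf = 0  # cumulative frequency of the classes before the current one
    for lcl, ucl, f in classes:
        if cf + f >= n / 2:           # first class reaching n/2 is the median class
            lcb = lcl - 0.5           # lower class boundary (whole-number data assumed)
            h = (ucl + 0.5) - lcb     # class width measured between boundaries
            return lcb + (n / 2 - cf) / f * h
        cf += f

print(grouped_mean(classes))    # 31.0
print(grouped_median(classes))  # 32.0
```

Both results are estimates: grouping discards the individual values, so they can differ from the mean and median of the raw data.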

One of the main approaches to measuring central tendency: it calculates central tendency measures directly from the individual data points, without any prior grouping or categorization.

Ungrouped central tendency

the sum of all values divided by the
total number of values.

Mean

the middle value of an ordered data set (for an even number of values, the average of the two middle values).

Median

the most frequent value

Mode
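The three ungrouped measures of central tendency can be computed directly from raw values. This is a minimal sketch on a hypothetical sample; the mean and median are written out by hand to mirror the card definitions, then cross-checked against Python's standard `statistics` module.

```python
import statistics

scores = [4, 8, 6, 5, 3, 8, 8]  # hypothetical sample

# Mean: sum of all values divided by the number of values
mean = sum(scores) / len(scores)

# Median: middle value of the sorted data (average of the two middle values if n is even)
ordered = sorted(scores)
n = len(ordered)
mid = n // 2
median = ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

# Mode: the most frequent value
mode = statistics.mode(scores)

# Cross-check the hand-written versions against the standard library
assert mean == statistics.mean(scores)
assert median == statistics.median(scores)

print(mean, median, mode)  # 6.0 6 8
```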

Measures of variability give you a sense of how spread out the response values are. True or False?

True

Values that divide your data into four quarters (Q1, Q2, and Q3 mark off the 1st, 2nd, 3rd, and 4th quarters)

Quartile

are the cut-off points below which a certain fraction of a sample falls

Quantiles

nine values that divide sorted data into ten equal parts (ex. 1st decile, 2nd decile, up to the 9th decile)

Deciles

a value below which a given percentage of scores fall (ex. 90% of scores fall below the 90th percentile)

Percentile
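Quartiles, deciles, and percentiles are all quantiles with a different number of parts. A minimal sketch using Python's `statistics.quantiles` (Python 3.8+), which returns n - 1 cut points for n equal parts. Note that it uses its own interpolation rule (the default `method='exclusive'`), so results may differ slightly from hand methods taught in class. The helper `percent_below` is hypothetical and simply applies the card's percentile idea: the share of scores that fall below a value.

```python
import statistics

data = list(range(1, 21))  # hypothetical sample: 1, 2, ..., 20

quartiles = statistics.quantiles(data, n=4)      # 3 cut points -> 4 quarters
deciles = statistics.quantiles(data, n=10)       # 9 cut points -> 10 equal parts
percentiles = statistics.quantiles(data, n=100)  # 99 cut points -> 100 parts

def percent_below(data, value):
    """Percentage of scores that fall strictly below `value`."""
    return 100 * sum(1 for x in data if x < value) / len(data)

print(quartiles)                       # [5.25, 10.5, 15.75]
print(len(deciles), len(percentiles))  # 9 99
print(percent_below(data, 15))         # 70.0
```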

is also referred to as spread, scatter, or dispersion. It is most commonly measured with the range, interquartile range, standard deviation, and variance.

Variability

the difference between the highest and lowest values

Range

the range of the middle half of a distribution (Q3 - Q1)

Interquartile range
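A minimal sketch of the range and the interquartile range on a hypothetical sample. The quartiles here come from `statistics.quantiles` with its default interpolation rule; other quartile conventions (such as taking medians of the lower and upper halves) can give a somewhat different IQR for the same data.

```python
import statistics

data = [7, 1, 3, 9, 5, 11, 13, 15]  # hypothetical sample

# Range: highest value minus lowest value
data_range = max(data) - min(data)

# IQR: spread of the middle half of the data, Q3 - Q1
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range, iqr)  # 14 9.0
```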

the square root of the variance; roughly, the typical distance of values from the mean

Standard deviation

average of squared distances from the mean (for a sample, the sum of squared distances is divided by n - 1 instead of n)

Variance
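Variance and standard deviation follow directly from the squared deviations from the mean. This minimal sketch on a hypothetical data set shows both the population form (divide by N) and the sample form (divide by n - 1), then checks them against the standard `statistics` functions.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

mean = sum(data) / len(data)
squared_devs = [(x - mean) ** 2 for x in data]  # squared distance of each value from the mean

# Population variance: average of the squared deviations (divide by N)
pop_variance = sum(squared_devs) / len(data)
# Sample variance: divide by n - 1 when the data are a sample
sample_variance = sum(squared_devs) / (len(data) - 1)

# Standard deviation: square root of the variance
pop_sd = math.sqrt(pop_variance)
sample_sd = math.sqrt(sample_variance)

assert math.isclose(pop_variance, statistics.pvariance(data))
assert math.isclose(sample_sd, statistics.stdev(data))

print(pop_variance, pop_sd)  # 4.0 2.0
```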

is the “middle” value in the first half of the rank-ordered data set.

First quartile (Q1)

is the median value of the whole data set.

Second quartile (Q2)

is the “middle” value in the second half of the rank-ordered data set.

Third quartile (Q3)
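The three quartile cards describe the "median of halves" method. A minimal sketch, assuming the convention that the overall median is left out of both halves when the number of values is odd (some textbooks include it instead, which shifts Q1 and Q3 slightly). The function name `quartiles_by_halves` is hypothetical, and it needs at least two values.

```python
import statistics

def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) as medians of the lower half, the whole set, and the upper half."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]         # first half (excludes the middle value when n is odd)
    upper = ordered[(n + 1) // 2 :]   # second half (excludes the middle value when n is odd)
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

print(quartiles_by_halves([7, 1, 3, 9, 5, 11, 13, 15]))               # (4.0, 8.0, 12.0)
print(quartiles_by_halves([6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36]))  # (15, 40, 43)
```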

Standard deviation (s) is the average amount of variability in your data set. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is. True or False?

True

Variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread the data, the larger the variance is in relation to the mean. True or False?

True

is a statistical measure of the relative dispersion of data points around the mean: the standard deviation divided by the mean, often expressed as a percentage.

Coefficient of Variation (relative standard deviation)
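Because the coefficient of variation divides the standard deviation by the mean, it lets you compare spread between data sets with different units or scales. A minimal sketch on a hypothetical data set; the function name and the `sample` flag (choosing the sample or population standard deviation) are illustrative assumptions, and the formula only makes sense for a nonzero, typically positive, mean.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

def coefficient_of_variation(data, sample=True):
    """CV = standard deviation / mean, expressed as a percentage."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    return 100 * sd / mean

print(coefficient_of_variation(data, sample=False))  # 40.0  (2.0 / 5 * 100)
print(coefficient_of_variation(data))                # sample version, slightly larger
```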

the number of occurrences (tally)

Frequency (fi)

the highest value within the class

Upper class limit (UCL)

the lowest value within the class

Lower class limit (LCL)

the highest value within the class + 0.5 (for whole-number data; in general, add half of the measurement unit)

Upper class boundary (UCB)

the lowest value within the class - 0.5 (for whole-number data; in general, subtract half of the measurement unit)

Lower class boundary (LCB)

the average of the lower and upper class limits: (LCL + UCL) / 2

class mark (Mi)
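Class boundaries and class marks can be derived from the class limits alone. A minimal sketch, assuming a measurement unit of 1 (whole-number data), so each boundary sits 0.5 outside its limit; `class_summary` and the `unit` parameter are hypothetical names.

```python
def class_summary(lcl, ucl, unit=1):
    """Return (LCB, UCB, class mark) for one class, given its class limits."""
    lcb = lcl - unit / 2    # lower class boundary
    ucb = ucl + unit / 2    # upper class boundary
    mark = (lcl + ucl) / 2  # class mark Mi: average of the class limits
    return lcb, ucb, mark

# Hypothetical frequency table: (LCL, UCL, fi)
table = [(10, 19, 3), (20, 29, 5), (30, 39, 8), (40, 49, 4)]
for lcl, ucl, f in table:
    lcb, ucb, mark = class_summary(lcl, ucl)
    print(f"{lcl}-{ucl}: LCB={lcb}, UCB={ucb}, Mi={mark}, fi={f}")
```

The first line printed is `10-19: LCB=9.5, UCB=19.5, Mi=14.5, fi=3`. Notice that each UCB equals the next class's LCB, which is why boundaries (not limits) are used as bar edges in a histogram.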

It tells the proportion of the total number of observations associated with each category (the class frequency divided by the total number of observations).

Relative Frequency Distribution

It is the running total of the frequencies: each class's cumulative frequency is its own frequency plus the frequencies of all the classes before it in the frequency distribution. Add the first value to the next value, then add that sum to the next value, and so on until the last class. The last cumulative frequency equals the total sum of all frequencies.

Cumulative Frequency Distribution (F)
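Relative and cumulative frequencies can both be computed from the frequency column. A minimal sketch on a hypothetical table: `itertools.accumulate` produces exactly the running sum described above, and the last cumulative frequency equals the total.

```python
from itertools import accumulate

table = [("10-19", 3), ("20-29", 5), ("30-39", 8), ("40-49", 4)]  # hypothetical (class, fi)

freqs = [f for _, f in table]
total = sum(freqs)
relative = [f / total for f in freqs]  # proportion of all observations in each class
cumulative = list(accumulate(freqs))   # running total: 3, 3+5, 3+5+8, ...

for (label, f), rf, cf in zip(table, relative, cumulative):
    print(f"{label}: f={f}, rf={rf:.2f}, F={cf}")

assert cumulative[-1] == total  # the last cumulative frequency is the total
```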

are individual facts, statistics, or items of information, often numeric.

Data

Types of Data

Quantitative and Qualitative

Data that can be measured with numbers, such as duration or speed

Quantitative

non-numerical data that is categorical, such as yes/no responses or eye color

Qualitative

whole numbers that can’t be broken down, such as the number of items

Discrete

numbers that can be broken down into smaller units (fractions or decimals), such as height or weight

Continuous

Types of Continuous

Interval and Ratio

numbers with known, equal differences between values but no true zero, such as temperature in °C or time of day

Interval

Numbers that have measurable, equal intervals and a true zero, so differences and ratios can be determined, such as height or weight

Ratio

Data used for naming or labeling categories with no inherent order, such as hair color

Nominal

Types of Qualitative

Nominal and Ordinal

Categorical data whose values have a meaningful order, such as 1 = happy, 2 = neutral, 3 = unhappy

Ordinal

is the graphical representation of information and data.

Data visualization

Give 8 types of data visualization charts

Line chart, Area Chart, Bar Chart, Gantt Chart, Histogram, Scatter Plot, Pie Chart, Map Chart

Types of Bar Graph/Chart

Horizontal, Grouped, Vertical

display trends over time

Line Chart

a line chart with areas below the lines filled with colors

Area Chart

compare values across categories, including multiple variables

Bar Chart

shows activities (tasks or events) displayed against time

Gantt Chart

display the shape and spread of a continuous data sample

Histogram

show the correlation (relationship) between two variables in a dataset

Scatter Plot

show the contribution of each data point or category to the whole dataset

Pie Chart

show data with location as a variable

Map Chart
