EDA Flashcards

1
Q

- The science and art of interpreting data from facts and information.
- A science that deals with the methods of gathering, presenting, analyzing, and interpreting data.

A

Statistics

2
Q

Summarizes and organizes the characteristics of a data set. A data set is a collection of responses or observations from a sample or an entire population.

A

Descriptive statistics

3
Q

A collection of responses or observations from a sample or an entire population.

A

Data set

4
Q

Makes inferences and draws conclusions about a population based on sample data. Examples: hypothesis testing, confidence intervals, regression analysis, ANOVA (analysis of variance), chi-square tests, and t-tests.

A

Inferential Statistics

5
Q

What are the three main types of descriptive statistics?

A

Distribution, Central Tendency, Variability

6
Q

One of the main approaches to measuring central tendency: calculating central tendency measures from grouped or categorized data. Grouping data involves creating intervals or classes to organize the data into ranges.

A

Grouped central tendency
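
As a rough illustration of the grouped approach, the sketch below estimates a mean from classes and frequencies using the common formula (sum of frequency times class mark) divided by (sum of frequencies). The class limits, the frequencies, and the name `grouped_mean` are made up for this example and are not part of the deck.

```python
# Hypothetical grouped data: (lower class limit, upper class limit, frequency)
classes = [(10, 19, 3), (20, 29, 5), (30, 39, 2)]


def grouped_mean(rows):
    """Approximate the mean of grouped data from class marks and frequencies."""
    total_frequency = sum(freq for _, _, freq in rows)
    # The class mark (midpoint of the class limits) stands in for every value in the class.
    weighted_sum = sum(((low + high) / 2) * freq for low, high, freq in rows)
    return weighted_sum / total_frequency


print(grouped_mean(classes))
```

Because each class mark stands in for all values in its class, the result is an estimate and may differ slightly from the mean of the raw, ungrouped values.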

7
Q

One of the main approaches to measuring central tendency: calculating central tendency measures directly from the individual data points, without any prior grouping or categorization.

A

Ungrouped central tendency

8
Q

the sum of all values divided by the
total number of values.

A

Mean

9
Q

the middle number in an ordered
dataset.

A

Median

10
Q

the most frequent value

A

mode
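
The three measures of central tendency above can be checked on ungrouped data with Python's standard `statistics` module. This is only a small sketch; the `scores` list is a hypothetical sample.

```python
from statistics import mean, median, multimode

scores = [2, 4, 4, 5, 7, 8]  # hypothetical ungrouped sample

avg = mean(scores)         # sum of all values divided by the number of values
mid = median(scores)       # middle of the sorted data (average of the two middle values here)
modes = multimode(scores)  # list of the most frequent value(s)

print(avg, mid, modes)
```

`multimode` returns a list because a data set can have more than one mode.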

11
Q

Measures of variability give you a sense of how spread out the response values are. True or False?

A

True

12
Q

Values that divide your data into quarters (e.g., 1st, 2nd, 3rd, and 4th quarters)

A

Quartile

13
Q

The cutoff point for a certain fraction of a sample

A

Quantiles

14
Q

Values that sort data into ten equal parts (e.g., 1st decile, 2nd decile, up to the 10th decile)

A

Deciles

15
Q

a number where a certain percentage of scores fall below that number

A

Percentile

16
Q

Also referred to as spread, scatter, or dispersion. It is most commonly measured with the range, interquartile range, standard deviation, and variance.

A

Variability

17
Q

the difference between the highest and lowest values

A

range

18
Q

the range of the middle half of a
distribution

A

Interquartile range

19
Q

roughly the average distance of values from the mean (the square root of the variance)

A

standard deviation

20
Q

average of squared distances from the mean

A

variance

21
Q

is the “middle” value in the first half of the rank-ordered data set.

A

Q1

22
Q

is the median value in the set

A

Q2

23
Q

is the “middle” value in the second half of the rank-ordered data set.

A

Q3
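
The Q1, Q2, and Q3 definitions above can be sketched with the "median of each half" method. This is one convention among several: here the overall median is left out of both halves when the number of values is odd, and other textbooks or libraries may give slightly different quartiles. The function name `quartiles` and the sample data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method (needs at least 2 values)."""
    data = sorted(values)  # rank-order the data first
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle value so it belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # hypothetical sample
q1, q2, q3 = quartiles(data)
iqr = q3 - q1  # interquartile range: the range of the middle half

print(q1, q2, q3, iqr)
```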

24
Q

Standard deviation (s) is the average amount of variability in your data set. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is. True or False?

A

True

25
Q

Variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread out the data, the larger the variance is in relation to the mean. True or False?

A

True

26
Q

A statistical measure of the relative dispersion of data points around the mean, calculated as the standard deviation divided by the mean.

A

Coefficient of Variation (relative standard deviation)
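
A short sketch of variance, standard deviation, and the coefficient of variation, using Python's standard `statistics` module. The data values are hypothetical. Whether to divide by n (population) or n - 1 (sample) is a choice the cards do not spell out, so both are shown, and the coefficient of variation here uses the population figures purely for round numbers.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

mu = mean(data)
pop_var = pvariance(data)   # average of squared deviations (divides by n)
pop_sd = pstdev(data)       # square root of the population variance
sample_var = variance(data) # sample variance (divides by n - 1)
sample_sd = stdev(data)     # sample standard deviation, often written s

cv = pop_sd / mu            # coefficient of variation (relative standard deviation)

print(mu, pop_var, pop_sd, cv)
```

The coefficient of variation is unitless, so it is often multiplied by 100 and reported as a percentage.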

27
Q

the number of occurrences (tally)

A

frequency (fi)

28
Q

the highest value within the class

A

Upper class limit (UCL)

29
Q

the lowest value within the class

A

Lower class limit (LCL)

30
Q

the highest value within the class + 0.5

A

Upper class boundary (UCB)

31
Q

the lowest value within the class - 0.5

A

Lower class boundary (LCB)

32
Q

the average value of the class limits

A

class mark (Mi)
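
The class limit, class boundary, and class mark cards can be tied together in a small sketch. It assumes whole-number data, which is what makes the 0.5 adjustment appropriate; the function name `class_summary` and the 10 to 19 class are hypothetical.

```python
def class_summary(lower_limit, upper_limit):
    """Boundaries and class mark for one class of whole-number data."""
    return {
        "LCB": lower_limit - 0.5,                       # lower class boundary
        "UCB": upper_limit + 0.5,                       # upper class boundary
        "class_mark": (lower_limit + upper_limit) / 2,  # average of the class limits
    }


summary = class_summary(10, 19)  # hypothetical class 10-19
print(summary)
```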

33
Q

It tells the proportion of the total number of observations associated with each category.

A

Relative Frequency Distribution

34
Q

It is the sum of a class's frequency and all the frequencies before it in a frequency distribution. Add the first value to the next value, then add that sum to the next value, and so on until the last. The last cumulative frequency will be the total of all frequencies.

A

Cumulative Frequency Distribution (F)
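
Relative and cumulative frequencies can be computed from a frequency column as in the sketch below. The `frequencies` list is hypothetical; `itertools.accumulate` performs the running addition described on the card.

```python
from itertools import accumulate

frequencies = [2, 5, 8, 4, 1]  # hypothetical class frequencies (fi)

total = sum(frequencies)
# Relative frequency: each frequency as a proportion of the total.
relative = [f / total for f in frequencies]
# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

print(relative)
print(cumulative)
```

The last cumulative frequency equals the total number of observations, and the relative frequencies add up to 1.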

35
Q

are individual facts, statistics, or items of information, often numeric.

A

Data

36
Q

Types of Data

A

Quantitative and Qualitative

37
Q

Data that can be measured with numbers, such as duration or speed

A

Quantitative

38
Q

non-numerical data that is categorical, such as yes/no responses or eye color

A

Qualitative

39
Q

whole numbers that can’t be broken down, such as a number of items

A

Discrete

40
Q

numbers that can be broken down into fractions or decimals, such as height or weight

A

continuous

41
Q

types of continuous

A

Interval and Ratio

42
Q

numbers with known, equal differences between values but no true zero, such as time of day

A

Interval

43
Q

Numbers that have measurable intervals and a true zero, so both differences and ratios can be determined, such as height or weight

A

Ratio

44
Q

Data used for naming or labeling categories with no inherent order, such as hair color

A

Nominal

45
Q

Types of Qualitative

A

Nominal and Ordinal

46
Q

Data used to describe the order of values, such as 1=happy, 2=neutral, 3=unhappy

A

Ordinal

47
Q

is the graphical representation of information and data.

A

Data visualization

48
Q

Give 8 types of data visualization charts

A

Line chart, Area Chart, Bar Chart, Gantt Chart, Histogram, Scatter Plot, Pie Chart, Map Chart

49
Q

Types of Bar Graph/Chart

A

Horizontal, Grouped, Vertical

50
Q

displays trends over time

A

Line Chart

51
Q

a line chart with areas below the lines filled with colors

A

Area Chart

52
Q

display trends with multiple variables

A

Bar Chart

53
Q

shows activities (tasks or events) displayed against time

A

Gantt Chart

54
Q

display the shape and spread of continuous dataset samples

A

Histogram

55
Q

show correlation in a dataset

A

Scatter Plot

56
Q

shows the contribution of each data point to the whole dataset

A

Pie Chart

57
Q

show data with location as variable

A

Map Chart