EDA Flashcards

1
Q

- The science and art of interpreting data drawn from facts and information.
- A science that deals with the methods of gathering, presenting, analyzing, and interpreting data.

A

Statistics

2
Q

Summarizes and organizes the characteristics of a data set. A data set is a collection of responses or observations from a sample or an entire population.

A

Descriptive statistics

3
Q

A collection of responses or observations from a sample or an entire population.

A

Data set

4
Q

Makes inferences and draws conclusions about a population based on sample data. Examples: hypothesis testing, confidence intervals, regression analysis, ANOVA (analysis of variance), chi-square tests, t-tests, etc.

A

Inferential Statistics

5
Q

What are the three main types of descriptive statistics?

A

Distribution, Central Tendency, Variability

6
Q

One of the main approaches to measuring central tendency. It involves calculating central tendency measures from grouped or categorized data. Grouping data means creating intervals or classes that organize the data into ranges.

A

Grouped central tendency

7
Q

One of the main approaches to measuring central tendency. It involves calculating central tendency measures directly from the individual data points, without any prior grouping or categorization.

A

Ungrouped central tendency

8
Q

The sum of all values divided by the total number of values.

A

Mean

9
Q

The middle number in an ordered data set.

A

Median

10
Q

The most frequent value in a data set.

A

Mode
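
As a rough illustration of the mean, median, and mode cards, here is a minimal sketch using Python's standard `statistics` module. The data set `scores` is made up for the example and is not from the deck.

```python
import statistics

# Hypothetical ungrouped data set, invented for illustration.
scores = [2, 3, 3, 5, 7, 10]

# Mean: the sum of all values divided by the number of values.
mean = statistics.mean(scores)

# Median: the middle of the ordered data. With an even count,
# the two middle values are averaged.
median = statistics.median(scores)

# Mode: the most frequent value.
mode = statistics.mode(scores)

print(mean, median, mode)
```

Because the values are used one by one without grouping, this is an ungrouped calculation.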

11
Q

Measures of variability give you a sense of how spread out the response values are. True or False?

A

True

12
Q

Values that divide your data into quarters (e.g., 1st, 2nd, 3rd, and 4th quarters)

A

Quartile

13
Q

Cutoff points that mark off a given fraction of a sample

A

Quantiles

14
Q

Values that split sorted data into ten equal parts (e.g., 1st decile, 2nd decile, up to the 10th decile)

A

Deciles

15
Q

A number below which a certain percentage of scores fall

A

Percentile
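
Quartiles, deciles, and percentiles are all quantiles with a different number of parts. The sketch below assumes Python's standard `statistics.quantiles` function with its default ("exclusive") method and a made-up data set. Other textbooks and tools use slightly different conventions, so cut points can differ on small samples.

```python
import statistics

# Hypothetical ordered data set, invented for illustration.
data = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]

# n parts need n - 1 cut points.
quartiles = statistics.quantiles(data, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print(quartiles)
print(len(deciles), len(percentiles))
```

The second quartile is the same value as the median of the data.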

16
Q

Also referred to as spread, scatter, or dispersion. It is most commonly measured with the range, interquartile range, standard deviation, and variance.

A

Variability

17
Q

The difference between the highest and lowest values

A

Range

18
Q

The range of the middle half of a distribution

A

Interquartile range

19
Q

The average distance of values from the mean (formally, the square root of the variance)

A

Standard deviation

20
Q

The average of squared distances from the mean

A

Variance
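
The four measures of variability on the preceding cards can be sketched with Python's standard `statistics` module. The data set is made up for the example. Note one assumption: "average of squared distances" matches the population variance (divide by n), while the sample variance divides by n - 1 instead.

```python
import statistics

# Hypothetical data set, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value.
value_range = max(data) - min(data)

# Interquartile range: Q3 - Q1, the spread of the middle half.
# Uses the default "exclusive" quantile method of the statistics module.
q1, _q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Population variance: the average of squared distances from the mean.
pop_variance = statistics.pvariance(data)

# Population standard deviation: the square root of that variance.
pop_std = statistics.pstdev(data)

# Sample variance divides by n - 1 rather than n.
sample_variance = statistics.variance(data)

print(value_range, iqr, pop_variance, pop_std, sample_variance)
```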

21
Q

The “middle” value in the first half of the rank-ordered data set.

A

Q1

22
Q

The median value of the data set

A

Q2

23
Q

The “middle” value in the second half of the rank-ordered data set.

A

Q3

24
Q

Standard deviation (s) is the average amount of variability in your data set. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is. True or False?

A

True

25
Variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread out the data, the larger the variance is in relation to the mean. True or False?
True
26
A statistical measure of the dispersion of data points around the mean, expressed as the ratio of the standard deviation to the mean.
Coefficient of Variation (relative standard deviation)
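
As a minimal sketch of the coefficient of variation (standard deviation divided by the mean), assuming the sample standard deviation and a made-up data set. The function name `coefficient_of_variation` is hypothetical and not from the deck.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        # The ratio is undefined when the mean is zero.
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


# Hypothetical heights in centimetres, invented for illustration.
heights_cm = [150, 160, 170, 180, 190]
cv = coefficient_of_variation(heights_cm)

# Changing the unit (cm to mm) rescales the mean and the standard
# deviation by the same factor, so the ratio stays the same.
heights_mm = [h * 10 for h in heights_cm]
cv_mm = coefficient_of_variation(heights_mm)

print(round(cv, 4), round(cv_mm, 4))
```

Because it is a ratio, the coefficient of variation has no unit, which is why it is also called the relative standard deviation.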
27
the number of occurrences (tally)
frequency (fi)
28
the highest value within the class
Upper class limit (UCL)
29
the lowest value within the class
Lower class limit (LCL)
30
the highest value within the class + 0.5 (for data recorded in whole units)
Upper class boundary (UCB)
31
the lowest value within the class - 0.5 (for data recorded in whole units)
Lower class boundary (LCB)
32
the average value of the class limits
class mark (Mi)
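
The class limit, class boundary, and class mark cards can be tied together in a short sketch. The classes and frequencies below are made up, and the 0.5 adjustment assumes data recorded in whole units. The grouped mean shown is the usual sum of (frequency × class mark) divided by the total frequency.

```python
# Hypothetical grouped data as (lower class limit, upper class limit, frequency).
classes = [(10, 19, 3), (20, 29, 5), (30, 39, 2)]

rows = []
for lcl, ucl, freq in classes:
    lcb = lcl - 0.5              # lower class boundary
    ucb = ucl + 0.5              # upper class boundary
    mark = (lcl + ucl) / 2       # class mark: average of the class limits
    rows.append((lcb, ucb, mark, freq))

# Grouped mean: each class mark weighted by its frequency.
total = sum(freq for _lcb, _ucb, _mark, freq in rows)
grouped_mean = sum(mark * freq for _lcb, _ucb, mark, freq in rows) / total

for row in rows:
    print(row)
print(grouped_mean)
```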
33
It tells the proportion of the total number of observations associated with each category.
Relative Frequency Distribution
34
It is the running total of a class's frequency and all the frequencies before it in a frequency distribution. Add the first value to the next value, then add that sum to the next value, and so on to the last. The last cumulative frequency equals the total of all frequencies.
Cumulative Frequency Distribution (F)
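
Relative and cumulative frequencies can be sketched from a list of class frequencies. The frequencies below are made up, and `itertools.accumulate` is one convenient way to build the running total.

```python
from itertools import accumulate

# Hypothetical class frequencies (fi), invented for illustration.
frequencies = [3, 5, 2]
total = sum(frequencies)

# Relative frequency: each frequency as a proportion of the total.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

print(relative)
print(cumulative)
```

The last cumulative frequency matches the total number of observations, and the relative frequencies add up to 1.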
35
Individual facts, statistics, or items of information, often numeric.
Data
36
Types of Data
Quantitative and Qualitative
37
Data that can be measured with numbers, such as duration or speed
Quantitative
38
non-numerical data that is categorical, such as yes/no responses or eye color
Qualitative
39
whole numbers that can’t be broken down, such as a number of items
Discrete
40
Numbers that can be broken down into finer values (fractions or decimals), such as height or weight
Continuous
41
Types of Continuous
Interval and Ratio
42
Numbers with known, equal differences between values but no true zero point, such as time of day
Interval
43
Numbers with measurable intervals and a true zero, so both differences and ratios can be determined, such as height or weight
Ratio
44
Data used for naming or labeling categories with no inherent order, such as hair color
Nominal
45
Types of Qualitative
Nominal and Ordinal
46
Data used to describe the order of values, such as 1=happy, 2=neutral, 3=unhappy
Ordinal
47
The graphical representation of information and data.
Data visualization
48
Give 8 types of data visualization charts
Line chart, Area Chart, Bar Chart, Gantt Chart, Histogram, Scatter Plot, Pie Chart, Map Chart
49
Types of Bar Graph/Chart
Horizontal, Grouped, Vertical
50
Displays trends over time
Line Chart
51
a line chart with areas below the lines filled with colors
Area Chart
52
Displays trends with multiple variables
Bar Chart
53
Shows activities (tasks or events) displayed against time.
Gantt Chart
54
Displays the shape and spread of a continuous data sample
Histogram
55
Shows the correlation between variables in a data set
Scatter Plot
56
Shows the contribution of each data point to the whole data set
Pie Chart
57
Shows data with location as a variable
Map Chart