Visualizations and Exploratory Analysis Flashcards

1
Q

Think about a database. What is an attribute (or dimension, feature, variable)?

A

An attribute is a data field, representing a characteristic or a feature of a data object.

e.g., customer_ID, name, address

It is essentially what you see in the columns of a table.

2
Q

In a database, the rows are data objects and the columns are the objects' attributes. True or false?

A

Very true.

3
Q

Give some examples of the following attribute types:

a. Nominal
b. Binary
c. Ordinal
d. Numeric (quantity, interval, ratio)

A

Nominal = categories, states, or “names of things”
e.g., hair color, marital status, occupation

Binary

  • nominal attribute with only two states (0 and 1)
  • symmetric binary = both outcomes are equally important (e.g. gender)
  • asymmetric binary = outcomes are not equally important (e.g. medical test, positive or negative)

Ordinal = values have a meaningful order but the magnitude between successive values is unknown
e.g., small, medium, large

Numeric

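  • quantity: a measurable amount, expressed as an integer or a real number
  • interval: measured on a scale of equal-sized units, with no true zero point (e.g., temperature in °C, calendar dates)
  • ratio: has a true zero point, so values can be compared as multiples of one another (e.g., length, monetary quantities)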
4
Q

What is the difference between discrete and continuous attributes?

A

Discrete - has only a finite or countably infinite set of values (e.g., zip codes, profession)

Continuous - takes real-number values (e.g., temperature, height, weight). In theory the values can be unbounded and arbitrarily precise; in practice they are measured and stored with finite range and precision.

5
Q

Mean = ?

A

The arithmetic average: the sum of all values divided by the number of values.

6
Q

Median = ?

A

The middle value of the sorted data if the number of values is odd; otherwise the average of the two middle values.

7
Q

Mode = ?

A

Value that occurs most frequently in the data
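
The three measures of central tendency above can be sketched with Python's standard `statistics` module. The sample list is made up for illustration and is not taken from the flashcards.

```python
import statistics

# Hypothetical sample data with an even number of values.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # average of the two middle values here
mode_value = statistics.mode(values)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

Because the list has six values, the median is the average of the third and fourth values once the data is sorted.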

8
Q

In a bell-shaped distribution, mean = median = mode. True or false?

A

True, provided the distribution is symmetric and unimodal (e.g., the normal distribution).

9
Q

What does a box-plot graph display?

A

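The five-number summary: minimum, Q1 (first quartile), median, Q3 (third quartile), maximum. Outliers are often drawn as individual points beyond the whiskers.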
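
A minimal sketch of computing the five values a box plot is built from. The function name `five_number_summary` and the sample data are hypothetical, and the choice of the "inclusive" quartile method is an assumption: different tools compute quartiles slightly differently.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    # method="inclusive" is an assumed convention, not the only valid one.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(summary)
```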

10
Q

What is an outlier?

A

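A point that lies beyond a specified outlier threshold. A common rule flags values more than 1.5 × IQR below Q1 or above Q3. Outliers are often, though not always, easy to spot in graphs such as box plots and scatter plots.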
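
One way to make the "outlier threshold" concrete is the widely used 1.5 × IQR rule, sketched below. This rule is an assumption brought in for illustration (the card does not name a specific threshold), and `iqr_outliers` and the sample data are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # Assumed convention: "inclusive" quartiles; other methods shift the fences.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                 # interquartile range
    lower_fence = q1 - k * iqr    # anything below this is flagged
    upper_fence = q3 + k * iqr    # anything above this is flagged
    return [v for v in values if v < lower_fence or v > upper_fence]

outliers = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(outliers)
```

Here Q1 = 3 and Q3 = 7, so the fences sit at -3 and 13, and only the value 100 falls outside them.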

11
Q

Do you know the properties of a normal distribution curve?

A

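Key properties: the curve is symmetric and bell-shaped around the mean; mean = median = mode; it is fully determined by its mean and standard deviation; about 68% of values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.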
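
Two well-known properties of the normal distribution, the equality of mean, median, and mode and the 68-95-99.7 rule, can be checked numerically with `statistics.NormalDist`. The helper `share_within` is a hypothetical name used only for this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Symmetry means mean, median, and mode coincide.
print(standard_normal.mean, standard_normal.median, standard_normal.mode)

# Roughly 68%, 95%, and 99.7% for k = 1, 2, 3.
coverage = {k: round(share_within(k), 4) for k in (1, 2, 3)}
print(coverage)
```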

12
Q

What is a histogram?

A

A graphical display of tabulated frequencies, shown as bars: each bar's height gives the number of values that fall in that bin.
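
The "tabulated frequencies" behind a histogram can be sketched without any plotting library. The function `histogram_counts`, the fixed bin width of 5, and the sample data are all assumptions made for illustration.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Map each bin's starting value to the number of values falling in it."""
    # A value v belongs to the bin starting at floor(v / bin_width) * bin_width.
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

counts = histogram_counts([1, 2, 2, 7, 8, 12, 13, 14, 19], bin_width=5)

# Text rendering: one bar per bin, one '#' per value in the bin.
for start, count in counts.items():
    print(f"[{start}, {start + 5}): {'#' * count}")
```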

13
Q

What is a quantile plot?

A

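A plot that displays all of the data for a single attribute, with each sorted value plotted against its quantile (the approximate fraction of the data at or below it), allowing the user to assess both overall behavior and unusual occurrences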
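
A quantile plot for one attribute pairs each sorted value with the approximate fraction of the data at or below it. The sketch below assumes the common convention f_i = (i - 0.5) / N for the i-th smallest of N values; `quantile_plot_points` and the sample data are hypothetical.

```python
def quantile_plot_points(values):
    """Return (f, x) pairs: each sorted value x with its quantile fraction f."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed convention: the i-th smallest value (1-based) gets f = (i - 0.5) / n.
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]

points = quantile_plot_points([40, 10, 30, 20])
for f, x in points:
    print(f, x)
```

Plotting f on the horizontal axis and x on the vertical axis gives the quantile plot; every data point appears, so unusual values stay visible.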

14
Q

Why is a scatter plot useful?

A

It provides a first look at bivariate data, revealing clusters of points, outliers, and possible relationships between the two variables.
