Exploratory Data Analysis and Data Visualization Flashcards

1
Q

What are data sets made up of?

A

Data objects
e.g. a medical dataset: patients, treatments, medicines

Database rows correspond to data objects
Database columns correspond to attributes

(Data objects are also called samples, examples, instances, data points, or objects)

2
Q

What are attributes in a dataset?

A

A data field representing a characteristic or feature of a data object
e.g. customer_ID

Attributes are the columns of the dataset

3
Q

What different attribute types do we know?

A
  1. Nominal: Categories, states, or names of things
    e.g. hair color = {black, blond, red, etc.}
  2. Binary: Nominal attributes with only 2 states (0 and 1)
    e.g. gender, medical test result
  3. Ordinal: Values have a meaningful order (ranking), but the magnitude between successive values is not known
    e.g. size = {small, medium, large} or army ranks
4
Q

What is the difference between discrete and continuous attributes?

A
  1. Discrete attributes
    - have only a finite or countably infinite set of values (e.g. zip codes)
    - binary attributes are a special case of discrete attributes
  2. Continuous attributes
    - have real numbers as attribute values
      e.g. height, temperature, weight
    - are usually represented as floating-point variables
5
Q

What is the mean?

A

The average value of a dataset

Calculated as the sum of all values divided by the number of values
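
The calculation can be sketched in a few lines of Python. The sample values are made up for illustration, and `statistics.mean` from the standard library is shown only as a cross-check:

```python
import statistics

# Illustrative values, not taken from any real dataset
values = [2, 4, 4, 5, 10]

# Sum of all values divided by the number of values
mean_by_hand = sum(values) / len(values)

# The standard library computes the same quantity
mean_stdlib = statistics.mean(values)

print(mean_by_hand, mean_stdlib)
```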

6
Q

What is the median?

A

The middle value of the sorted data when the number of values is odd

The average of the two middle values when the number of values is even
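
A small sketch of both cases, using made-up values and the standard library's `statistics.median`, which sorts the data internally:

```python
import statistics

# Illustrative values; the input does not need to be sorted beforehand
odd_values = [7, 1, 5]       # sorted: 1, 5, 7
even_values = [7, 1, 5, 3]   # sorted: 1, 3, 5, 7

# Odd count: the single middle value
median_odd = statistics.median(odd_values)

# Even count: the average of the two middle values, (3 + 5) / 2
median_even = statistics.median(even_values)

print(median_odd, median_even)
```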

7
Q

What is the mode?

A

The value that occurs most frequently (a dataset can have more than one mode)
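
As a rough illustration with made-up values: `statistics.mode` returns a single most frequent value, while `statistics.multimode` (Python 3.8+) returns all values tied for the highest frequency. The mode also works for nominal data such as color names:

```python
import statistics

# Illustrative nominal data: "red" occurs three times
colors = ["red", "black", "red", "blond", "black", "red"]
most_common = statistics.mode(colors)

# Illustrative data with a tie: 1 and 2 both occur twice
tied = [1, 1, 2, 2, 3]
all_modes = statistics.multimode(tied)

print(most_common, all_modes)
```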

8
Q

What is a Boxplot?

A
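  1. The ends of the box mark the first and third quartiles
  2. The median is marked by a line inside the box
  3. The whiskers extend to the smallest and largest values that are not outliers; outliers are plotted as individual points beyond the whiskers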
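
The numbers behind a boxplot can be sketched as follows. This assumes the common 1.5 × IQR convention for separating whiskers from outliers, which the card itself does not specify, and it uses the "inclusive" quartile method of `statistics.quantiles`; other tools may compute quartiles slightly differently. The values are made up:

```python
import statistics

# Illustrative values with one obvious extreme point
values = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])

# Quartiles: the ends of the box (q1, q3) and the median line
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Assumed convention: points more than 1.5 * IQR beyond the box are outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme values that are still inside the fences
inside = [v for v in values if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Everything outside the fences is drawn as an individual point
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```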
9
Q

What does a Histogram display?

A

A graphical display of tabulated frequencies, shown as bars
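
To make "tabulated frequencies" concrete, here is a minimal sketch that counts made-up values into equal-width bins and prints one text bar per bin. The bin width of 5 is an arbitrary choice for the example:

```python
from collections import Counter

# Illustrative values and an arbitrary bin width
values = [1, 2, 2, 3, 5, 6, 6, 7, 9, 10]
bin_width = 5

# Integer division maps each value to the index of its bin:
# bin 0 covers [0, 5), bin 1 covers [5, 10), bin 2 covers [10, 15)
counts = Counter(v // bin_width for v in values)

# One bar per bin; the bar length is the frequency in that bin
for bin_index in sorted(counts):
    low = bin_index * bin_width
    high = low + bin_width
    print(f"[{low:2d}, {high:2d}) {'#' * counts[bin_index]}")
```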

10
Q

What is the difference between bar chart and histogram?

A

In a histogram, the area of each bar denotes the value, not its height as in a bar chart. The distinction matters when the bins are not of equal width.
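
A small numeric sketch of why area matters, using two hypothetical bins of unequal width that contain the same number of values. The bar height is the count divided by the bin width, so the wider bin gets a lower bar while both areas equal the count:

```python
# Hypothetical bins as (left edge, right edge, count)
bins = [(0, 10, 20), (10, 30, 20)]

# Height = count per unit of bin width
heights = [count / (right - left) for left, right, count in bins]

# Area = height * width, which recovers the count in each bin
areas = [h * (right - left) for h, (left, right, _) in zip(heights, bins)]

print(heights, areas)
```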

11
Q

What is a scatter plot good for?

A

It shows how two attributes relate to each other and reveals clusters of points and outliers

12
Q

What interface do you use in python to visualize data?

A

The pyplot interface of the matplotlib library (matplotlib.pyplot)
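
A minimal sketch of how pyplot might be used to draw a histogram, a boxplot, and a scatter plot side by side. The data, the output file name `eda_overview.png`, and the choice of the non-interactive "Agg" backend are assumptions made for this example:

```python
import matplotlib
matplotlib.use("Agg")  # assumed: render to a file, no display needed
import matplotlib.pyplot as plt

# Illustrative data, not taken from any real dataset
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 20]

# One row with three plots
fig, (ax_hist, ax_box, ax_scatter) = plt.subplots(1, 3, figsize=(12, 4))

ax_hist.hist(values, bins=5)   # frequencies per bin, shown as bars
ax_hist.set_title("Histogram")

ax_box.boxplot(values)         # box, median line, whiskers, outlier points
ax_box.set_title("Boxplot")

ax_scatter.scatter(x, y)       # one point per (x, y) pair
ax_scatter.set_title("Scatter plot")

fig.tight_layout()
fig.savefig("eda_overview.png")  # hypothetical output file name
```

In an interactive session, `plt.show()` could be used instead of saving to a file.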
