Visualization Flashcards
Categorical Variables
Categorical variables take values that are labels or categories rather than numerical quantities
Categorical variables can be classified as either nominal or ordinal
Nominal
- No logical order
- Examples include eye color, dog breed, nationality, etc.
Ordinal
- Has an inherent order
- Examples include shirt size, education level, letter grade, etc.
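One practical consequence of the nominal/ordinal distinction is sorting: ordinal values should follow their category order, not alphabetical order. The sketch below is only an illustration; the size list `SIZE_ORDER` and the helper `sort_ordinal` are hypothetical names, not part of any library.

```python
# Assumed category order for an ordinal variable (illustrative only).
SIZE_ORDER = ["S", "M", "L", "XL"]

def sort_ordinal(values, order):
    """Sort values by their position in an explicit category order."""
    rank = {category: i for i, category in enumerate(order)}
    return sorted(values, key=rank.__getitem__)

shirts = ["L", "S", "XL", "M", "S"]            # made-up sample data
sorted_shirts = sort_ordinal(shirts, SIZE_ORDER)
alphabetical = sorted(shirts)                  # ignores the ordinal meaning
```

A nominal variable such as eye color has no such order list, so any sort order for it is arbitrary.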
Numerical Variables
Continuous
- Can take on any value within a range
- Includes decimal or fractional values
Discrete
- Only defined for whole numbers
- Does not make sense for fractional values
Making a bar chart in plotly
fig = px.bar(data, x="Language", y="Users", orientation="v", height=350, width=450)
You can make a horizontal bar chart by setting orientation="h" and swapping x and y, so that the numeric column is on x and the categories are on y
Making a pie chart in plotly
fig = px.pie(data, names="Race/Ethnicity", values="Population Percentage", height=400, width=500)
Common Categorical Graph Types
Vertical Bar Charts
Horizontal Bar Charts
Pie Charts
Common Numerical Graph Types
Line Graphs
Scatterplots
Histograms
Boxplots
Making a scatterplot in plotly
fig = px.scatter(data, x="Height", y="Weight")
Line Graph
Line graphs are similar to scatterplots, but have some key differences
While scatterplots often have multiple y-values for each x-value, line graphs have exactly one
Line graphs are most often used when the x-axis is a time period with a constant interval
Most importantly, line graphs show a trend.
Histogram
Bars: Show the distribution of a quantitative variable
Bar width: Can be uneven
Bar order: Contiguous, with no gaps between adjacent bins
Frequency: Represented by the area of each bar
Given a list of numerical values, a histogram displays the distribution of these values by grouping them in bins and showing the frequency of values within each bin.
Relative Frequency Histogram
Displays the bins along the x-axis and the relative frequency (the percentage of the data contained in each bin) on the y-axis
Formulas for the height and area of a bar in a relative frequency histogram:
Height = (% of data in the bin) / (bin width), i.e. % of data per x-axis unit
Area = height × bin width = % of data in the bin
Total area of all bars = 100%
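The height and area formulas can be checked with a small sketch that uses uneven bins. The function name `relative_frequency_bars`, the bin edges, and the sample values are all assumptions made for illustration. Bins are treated as half-open intervals [left, right), and values outside the edges are ignored.

```python
from bisect import bisect_right

def relative_frequency_bars(values, edges):
    """Return (left, right, area_pct, height_pct_per_unit) for each bin.

    Bins are [edges[i], edges[i + 1]); values outside the edges are skipped,
    so percentages are relative to the values that fall inside some bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1   # index of the bin containing v
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the given bin edges")
    bars = []
    for i, count in enumerate(counts):
        width = edges[i + 1] - edges[i]
        area = 100 * count / total       # % of data in the bin
        height = area / width            # % of data per x-axis unit
        bars.append((edges[i], edges[i + 1], area, height))
    return bars

values = [1, 2, 2, 3, 4, 6, 7, 9, 12, 18]    # made-up sample data
edges = [0, 5, 10, 20]                       # uneven bin widths: 5, 5, 10
bars = relative_frequency_bars(values, edges)
total_area = sum(area for _, _, area, _ in bars)
```

Here the widest bin holds 20% of the data but has a height of only 2% per unit, which is why frequency is read from the area of a bar and not from its height.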
Binning
Binning is the process of grouping data into “bins” of a given size.
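A minimal sketch of fixed-size binning, assuming bins start at `start` and each covers `bin_size` units. The helper `bin_counts` and the sample ages are hypothetical.

```python
import math
from collections import Counter

def bin_counts(values, bin_size, start=0):
    """Count how many values fall in each fixed-size bin.

    Each bin is [left, left + bin_size); the result maps left edge -> count.
    """
    counts = Counter()
    for v in values:
        index = math.floor((v - start) / bin_size)  # which bin v belongs to
        left = start + index * bin_size             # left edge of that bin
        counts[left] += 1
    return dict(sorted(counts.items()))

ages = [3, 7, 12, 15, 18, 21, 29, 30]    # made-up sample data
counts = bin_counts(ages, bin_size=10)
```

Changing `bin_size` changes how coarse the grouping is: smaller bins show more detail, larger bins give a smoother summary.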