Data Visualization Flashcards
What is matplotlib?
A Python library for data visualization. It is one of the oldest and most widely used Python plotting libraries.
How do you import matplotlib's pyplot module?
import matplotlib.pyplot as plt
Function used to create line graphs
plt.plot()
Function used to create bar graphs
plt.bar()
Function used to create a scatter plot
plt.scatter()
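A minimal sketch putting the three functions side by side, assuming made-up monthly sales data (the variable names and output file name are illustrative only). plt.sca() makes each subplot the current one so the plt.* calls from the cards draw into it.

```python
import matplotlib.pyplot as plt

# Hypothetical example data
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 145]
ad_spend = [10, 12, 15, 14]

fig, (ax_line, ax_bar, ax_scatter) = plt.subplots(1, 3, figsize=(12, 4))

# Line graph: values that change along an ordered axis such as time
plt.sca(ax_line)
plt.plot(months, sales, marker="o")
plt.title("Line: sales per month")

# Bar graph: compares one value across categories
plt.sca(ax_bar)
plt.bar(months, sales)
plt.title("Bar: sales per month")

# Scatter plot: relationship between two numerical variables
plt.sca(ax_scatter)
plt.scatter(ad_spend, sales)
plt.xlabel("Ad spend")
plt.ylabel("Sales")
plt.title("Scatter: ad spend vs sales")

plt.tight_layout()
# Save to disk; in a notebook or an interactive session, plt.show() displays it instead
fig.savefig("line_bar_scatter.png")
```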
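What are histograms?
A histogram shows the distribution of a numerical variable by splitting its range into intervals (bins) and drawing a bar for the number of values in each. Creating histograms is a common first step when exploring data, because they give a general idea of what the data looks like.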
How do you create a histogram?
Call the pandas hist() method on a DataFrame column:
df['column'].hist()
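A minimal runnable sketch of the histogram card, assuming a hypothetical DataFrame with a numeric "price" column. Passing ax= to hist() draws into a fresh figure instead of whatever figure happens to be current.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 500 house prices drawn from a normal distribution
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"price": rng.normal(loc=300_000, scale=50_000, size=500)})

fig, ax = plt.subplots()
# Series.hist() draws a matplotlib histogram (10 bins by default) and returns the Axes
df["price"].hist(ax=ax)
ax.set_xlabel("Price")
ax.set_ylabel("Number of houses")

fig.savefig("price_hist.png")
```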
How do you remove scientific notation from the axis labels?
df['column'].hist()
plt.ticklabel_format(useOffset=False, style='plain')
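A sketch of when this fix matters, assuming hypothetical values in the millions: with numbers this large, matplotlib's default formatter typically labels the axis with a shared multiplier such as 1e7 instead of full numbers. ticklabel_format(style='plain') switches that off.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: large values that would otherwise be labelled with "1e7"
rng = np.random.default_rng(seed=1)
df = pd.DataFrame({"population": rng.uniform(1_000_000, 30_000_000, size=200)})

fig, ax = plt.subplots()
df["population"].hist(ax=ax)

# Drop the offset/exponent text and print full numbers on the tick labels.
# ticklabel_format only works on axes that use the default ScalarFormatter.
plt.ticklabel_format(useOffset=False, style="plain")

fig.savefig("population_hist.png")
```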
How do you fix overlapping tick labels after removing scientific notation?
df['column'].hist()
plt.xticks(rotation=45)
plt.ticklabel_format(useOffset=False, style='plain')
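A fuller sketch of the same fix, assuming hypothetical revenue values in the tens of millions. plt.tight_layout() is an addition not in the card: it adjusts the figure margins so the rotated labels are not cut off.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: long numbers once scientific notation is turned off
rng = np.random.default_rng(seed=2)
df = pd.DataFrame({"revenue": rng.uniform(10_000_000, 90_000_000, size=300)})

fig, ax = plt.subplots()
df["revenue"].hist(ax=ax)

plt.xticks(rotation=45)  # tilt the x tick labels so long numbers do not collide
plt.ticklabel_format(useOffset=False, style="plain")  # full numbers, no "1e7"
plt.tight_layout()  # widen the margins so the rotated labels stay inside the figure

fig.savefig("revenue_hist.png")
```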
What are bins in a histogram?
Bins are the intervals that the range of the data is divided into. Each bin is drawn as a bar whose height is the number of data points that fall within that interval.
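A small worked example of binning, assuming toy data. np.histogram returns the counts and edges that the histogram bars are built from, which makes the "bar height = number of points in the bin" idea concrete.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

values = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]  # hypothetical toy data

# Split the range [1, 4] into 4 equal-width bins and count the points in each.
# Every bin is half-open [left, right) except the last, which includes 4.
counts, edges = np.histogram(values, bins=4)
# edges  -> 1.0, 1.75, 2.5, 3.25, 4.0 (five edges bound four bins)
# counts -> 1, 2, 3, 4

# The same counts become the bar heights when pandas/matplotlib draw the histogram
fig, ax = plt.subplots()
pd.Series(values).hist(bins=4, ax=ax)
fig.savefig("bins_demo.png")
```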
What are boxplots?
A boxplot is a standardized way of displaying the distribution of data based on the five-number summary (minimum, first quartile, median, third quartile, maximum). It also shows outliers and their values.
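A sketch computing the five-number summary with pandas, assuming a hypothetical sample. The outlier rule shown (points more than 1.5 times the interquartile range (IQR) outside the box) is the default used by matplotlib and seaborn boxplots; other tools may use a different rule.

```python
import pandas as pd

# Hypothetical sample with one unusually large value
s = pd.Series([2, 4, 4, 5, 6, 7, 8, 30])

# Five-number summary: minimum, first quartile, median, third quartile, maximum
q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
summary = {"min": s.min(), "q1": q1, "median": median, "q3": q3, "max": s.max()}
print(summary)

# Whiskers reach the farthest points within 1.5 * IQR of the box;
# anything beyond that range is drawn as an individual outlier point
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = s[(s < lower) | (s > upper)]
print(outliers.tolist())  # [30]
```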
How do you create a seaborn boxplot?
import seaborn as sns
sns.boxplot(data=df, x='categorical', y='numerical', palette='BuGn')
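A runnable version of the seaborn card, assuming a hypothetical DataFrame with a categorical "team" column and a numerical "score" column. "BuGn" is a matplotlib colormap name, which seaborn accepts as a palette.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data: one categorical column and one numerical column
df = pd.DataFrame({
    "team": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "score": [70, 72, 68, 75, 71, 80, 85, 82, 79, 120, 60, 62, 65, 58, 61],
})

fig, ax = plt.subplots()
# One box per category on the x axis, colored from the BuGn colormap
sns.boxplot(data=df, x="team", y="score", palette="BuGn", ax=ax)
fig.savefig("score_boxplot.png")
```

Recent seaborn releases (0.13 and later) may emit a FutureWarning for passing palette without hue; assigning hue="team" together with legend=False is the suggested replacement there.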
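What does correlated mean?
Two numerical features are correlated when they tend to change together: as one increases, the other tends to increase (positive correlation) or decrease (negative correlation). As data scientists, we are often interested in whether (and how) different features of a dataset are related, and for numerical data, correlation is one key way to describe such a relationship.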
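To make the definition concrete, here is a sketch of the Pearson correlation coefficient r, the measure pandas' corr() uses by default, written directly from its formula. r ranges from -1 (perfect negative linear relationship) to 1 (perfect positive), with values near 0 meaning little linear relationship. The helper name pearson is hypothetical.

```python
import math

def pearson(x, y):
    """Hypothetical helper: Pearson correlation r of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    # r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)); the 1/n factors cancel
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

hours = [1, 2, 3, 4, 5]
print(pearson(hours, [2, 4, 6, 8, 10]))  # 1.0 (perfect positive)
print(pearson(hours, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
```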
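How do you check for correlation in Python?
Use the pandas df.corr() method, which returns a matrix of pairwise correlation coefficients between columns (Pearson by default).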
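A sketch on hypothetical study data. numeric_only=True (available since pandas 1.5) skips the text column; in pandas 2.x, calling corr() on a DataFrame that has non-numeric columns without it raises an error.

```python
import pandas as pd

# Hypothetical study data; "name" is text and cannot be correlated
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eli"],
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 60, 71, 80, 88],
    "hours_gaming": [9, 7, 6, 3, 1],
})

# Pairwise Pearson correlations between the numerical columns only
corr = df.corr(numeric_only=True)
print(corr.round(2))
```

Passing method="spearman" or method="kendall" computes rank correlations instead of Pearson.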
What is exploratory visualization?