DC Code Flashcards
How do we use Matplotlib?
Using the main object-oriented interface provided through the pyplot submodule.
How do we import the pyplot submodule?
import matplotlib.pyplot as plt
What does the plt.subplots() command do?
Creates and returns two objects: a Figure object and an Axes object.
What is the figure object?
A container which holds everything you see on the page.
What is the axes object?
The part of the page which holds the data.
What is the code to create an empty axes?
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
plt.show()
How do we add data to the axes?
ax.plot(data["COL-1"], data["COL-2"])
How do you add data to a figure?
Using the methods of the Axes object (e.g. ax.plot()).
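A minimal runnable sketch of the steps above, assuming a small hypothetical DataFrame with columns "COL-1" and "COL-2" in place of real data. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or desktop session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for a real dataset
data = pd.DataFrame({"COL-1": [1, 2, 3, 4], "COL-2": [10, 20, 15, 25]})

fig, ax = plt.subplots()               # one Figure holding one Axes
ax.plot(data["COL-1"], data["COL-2"])  # data goes onto the Axes, not the Figure
# In an interactive session, plt.show() would display the figure
```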
What ways can you customise plots?
- Linestyle
- Point style (markers)
- Colours
- Add axes labels
- Add title label
Why are markers useful?
If a plot appears continuous, markers show us where the data exists and which parts are just lines connecting the data points.
How do you add a marker?
ax.plot(marker="o")
How do you edit the linestyle?
ax.plot(linestyle="--")
How do you remove a line from the plot?
ax.plot(linestyle="None")
How do you edit the colour of the plotted data?
ax.plot(color="r")
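The marker, linestyle and colour options can be combined in one ax.plot() call alongside the data. A short sketch, assuming hypothetical x and y lists:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [3.1, 3.4, 2.9, 3.8, 4.0, 3.6]

fig, ax = plt.subplots()
# Circle markers, a dashed line and a red colour in a single call
ax.plot(x, y, marker="o", linestyle="--", color="r")
# Triangle markers with linestyle="None": points only, no connecting line
ax.plot(x, [v + 1 for v in y], marker="v", linestyle="None", color="b")
```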
How do you add axes labels?
ax.set_xlabel("Text")
ax.set_ylabel("Text")
What convention is used for capitalising the title?
Sentence case: write it as you would a sentence, capitalising only the first word and proper nouns.
How do you add a plot title?
ax.set_title("Text")
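Putting the label and title methods together, a sketch using hypothetical yearly rainfall figures (the data and wording are illustrative only):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical data for illustration only
years = [2018, 2019, 2020, 2021]
rainfall = [720, 650, 810, 690]

fig, ax = plt.subplots()
ax.plot(years, rainfall)
ax.set_xlabel("Year")
ax.set_ylabel("Rainfall (mm)")
ax.set_title("Annual rainfall by year")  # sentence case
```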
What are small multiples and what do they achieve?
Small multiples are used to plot several datasets side by side, showing similar data across different conditions.
They reduce the clutter that results when too much data is plotted on one graph, which makes trends easier to see.
What are small multiples called in matplotlib?
Subplots
What does calling subplots() with no inputs achieve?
Creates one subplot.
What does passing inputs to subplots() achieve?
subplots(X, Y)
The small multiples are arranged on the page as a grid of X rows and Y columns.
When creating small multiples, what can be said of the variable ax?
ax is no longer a single Axes object; it is now an array of Axes objects with shape X by Y.
How can you investigate the number of small multiples?
ax.shape
Shows the shape of the ax array.
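A quick check of the array behaviour described above. NumPy is imported only to show that ax is a NumPy array when a grid is requested, and a single Axes otherwise:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(3, 2)        # 3 rows, 2 columns of small multiples
print(ax.shape)                     # (3, 2)
print(isinstance(ax, np.ndarray))   # True

fig1, ax1 = plt.subplots()          # no inputs: a single Axes, not an array
print(isinstance(ax1, np.ndarray))  # False
```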
When we have small multiples, how do we add data using ax.plot()?
We now need to index the axes array and call the plot method on one of its elements.
eg ax[0, 0].plot()
If there is only one row or one column of plots, how do we call ax.plot?
The resulting array is one dimensional, so you only provide one index to access the elements of the array.
eg ax[0].plot() and ax[1].plot()
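Both indexing styles side by side, as a sketch with hypothetical x and y lists: a 2 x 2 grid needs [row, column], while a single column of plots is a 1-D array with one index.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical data for illustration only
x = [1, 2, 3]
y = [4, 1, 3]

# 2 x 2 grid: two indices, [row, column]
fig, ax = plt.subplots(2, 2)
ax[0, 0].plot(x, y)
ax[1, 1].plot(x, y)

# One column: a 1-D array, so a single index is enough
fig2, ax2 = plt.subplots(2, 1)
ax2[0].plot(x, y)
ax2[1].plot(y, x)
```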
When adding x and y labels to small multiples, what should we consider?
Labels can be shared between plots: e.g. for a 2 x 1 grid, only the bottom plot needs an x-axis label.
How do we account for different axis ranges of plots?
plt.subplots(2, 1, sharey=True)
To make sure the subplots share the same range of y-axis values, initialise the figure and its subplots with the keyword argument sharey=True.
This makes comparison across datasets easier.
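A sketch combining sharey=True with the labelling advice above, assuming two hypothetical datasets with different value ranges:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical datasets with different value ranges
x = [1, 2, 3, 4]
city_a = [2, 4, 3, 5]
city_b = [10, 12, 9, 14]

fig, ax = plt.subplots(2, 1, sharey=True)  # both rows use one y-axis range
ax[0].plot(x, city_a)
ax[1].plot(x, city_b)
ax[0].set_ylabel("Value")
ax[1].set_ylabel("Value")
ax[1].set_xlabel("Month")  # a 2 x 1 grid only needs an x label on the bottom plot
```

Without sharey=True, each row would autoscale to its own data, and the smaller series would look as large as the bigger one.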
If we import a pandas dataframe which represents a time series, what do we need to do?
Tell pandas to parse the date column as dates.
import pandas as pd
df = pd.read_csv("file.csv", parse_dates=["date"], index_col="date")
Here "date" is the name of the date column: parse_dates converts it to datetimes and index_col makes it the index.
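A self-contained version of the read_csv call, assuming hypothetical CSV contents wrapped in io.StringIO so no file.csv is needed:

```python
import io
import pandas as pd

# Hypothetical CSV contents; io.StringIO stands in for "file.csv"
csv_text = """date,col
2015-01-01,1.5
2015-02-01,2.0
2015-03-01,1.8
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")
print(type(df.index).__name__)  # DatetimeIndex
```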
How do we access the index (now a date) of a dataframe?
df.index
Once the date is set as the index, how do we plot the time series data?
ax.plot(df.index, df["col"])
How do we plot a particular time period?
Slice the DataFrame using two strings to denote the start and end date.
sixties = df["1960-01-01":"1969-12-31"]
Use this dataframe in the plot.
Matplotlib automatically adjusts the axis ticks to fit the new date range.
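A sketch of slicing and plotting one decade, assuming a hypothetical monthly series built in memory rather than read from a file:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly series from 1955 to 1974
dates = pd.date_range("1955-01-01", "1974-12-01", freq="MS")
df = pd.DataFrame({"col": range(len(dates))}, index=dates)

# Both endpoints of a date-string slice are included
sixties = df["1960-01-01":"1969-12-31"]

fig, ax = plt.subplots()
ax.plot(sixties.index, sixties["col"])
print(len(sixties))  # 120 monthly rows
```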
If we want to plot two variables with different scales on the same plot, what can we do?
Plot both on the same subplot with two different y-axes. The twinx() method creates a twin of the Axes that shares the same x-axis but has a separate y-axis.
ax2 = ax.twinx()
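A minimal twinx() sketch, assuming two hypothetical variables on very different scales. Each Axes autoscales its own y-axis while the x-axis is shared:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical variables on very different scales
x = [1, 2, 3, 4]
small = [0.1, 0.4, 0.3, 0.6]
large = [1200, 1500, 1400, 1900]

fig, ax = plt.subplots()
ax.plot(x, small)    # plotted against the left y-axis
ax2 = ax.twinx()     # new Axes: shares ax's x-axis, own y-axis on the right
ax2.plot(x, large)   # plotted against the right y-axis
```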
How can we further highlight the different axes?
Give each variable its own colour, and make its y-axis label and ticks the same colour. Pass the colour to ax.plot(), ax.set_ylabel() and ax.tick_params().
ax.tick_params() clarified in next point
How do we set the colour of the axis ticks?
ax.tick_params("y", colors="blue")
- NB: the keyword is colors, with an s
- The first argument selects the axis: "x" or "y"
How can we avoid retyping the same code to colour each time series?
Make a function we can reuse.
def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    # Plot y against x, then colour the y label and y ticks to match the line
    axes.plot(x, y, color=color)
    axes.set_xlabel(xlabel)
    axes.set_ylabel(ylabel, color=color)
    axes.tick_params("y", colors=color)
Then call this function passing in the relevant variables.
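A usage sketch: the function is repeated so the block runs on its own, and it is called once per y-axis of a twinx() pair. The monthly data and the column names "col1" and "col2" are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    # Plot y against x, then colour the y label and y ticks to match the line
    axes.plot(x, y, color=color)
    axes.set_xlabel(xlabel)
    axes.set_ylabel(ylabel, color=color)
    axes.tick_params("y", colors=color)

# Hypothetical monthly data with two columns on different scales
dates = pd.date_range("2010-01-01", periods=24, freq="MS")
df = pd.DataFrame(
    {"col1": range(24), "col2": [1000 + 50 * i for i in range(24)]},
    index=dates,
)

fig, ax = plt.subplots()
plot_timeseries(ax, df.index, df["col1"], "blue", "Time", "Column 1")

ax2 = ax.twinx()  # second variable on its own y-axis
plot_timeseries(ax2, df.index, df["col2"], "red", "Time", "Column 2")
```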
How do we add an annotation to a visualisation?
Using the annotate method of the Axes object.
ax.annotate()
What are the minimum inputs for annotate()?
ax.annotate("Annotation text", xy=(pd.Timestamp("2015-10-06"), 1))
At the very least it takes the annotation text as input (string) and the xy coordinate we want to annotate.
If the x position to annotate is a time stamp, how do we define it?
Using a pandas Timestamp object:
pd.Timestamp("2015-10-06")
How do we position the text in an appropriate place?
Add the argument xytext=(pd.Timestamp("2008-10-06"), -0.2), which sets where the text is placed while xy still marks the annotated point.
You may need to experiment to find a good position.
How do we connect an arrow between the annotation text and the annotated data?
Add the keyword arrowprops
arrowprops={}
Passing in an empty dictionary results in the default arrow.
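The annotation inputs above combined into one runnable sketch. The sine-shaped monthly series is hypothetical, chosen only so the annotated point at y=1 lies near the data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly series oscillating between -1 and 1
dates = pd.date_range("2005-01-01", "2016-12-01", freq="MS")
df = pd.DataFrame({"col": np.sin(np.arange(len(dates)) / 6)}, index=dates)

fig, ax = plt.subplots()
ax.plot(df.index, df["col"])
ann = ax.annotate(
    "Annotation text",
    xy=(pd.Timestamp("2015-10-06"), 1),         # point being annotated
    xytext=(pd.Timestamp("2008-10-06"), -0.2),  # where the text sits
    arrowprops={},                              # empty dict: default arrow
)
fig.canvas.draw()  # render once so the date coordinates are actually converted
```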
How do we customise the arrow?