Python & Plots Flashcards
Package
A collection of modules
Library
A collection of packages
Module
A bunch of related code saved in a file
Framework
A collection of modules and packages that contain the basic flow and architecture of an application 
Pandas
Open-source Python package used to manipulate and analyse tabular data. Built on NumPy.
Scatter plots
Great for viewing unordered data points
inflation_unemploy.plot(kind='scatter', x='unemployment_rate', y='cpi')
sns.scatterplot(x = "age", y = "value", size = "mpg", data = valuation)
sns.jointplot(x = 'age', y = 'value', data = valuation)
Line plots
Great for viewing ordered data points
dow_bond.plot(kind='line', x='date',y=['close_dow', 'close_bond'], rot=90)
Bar Charts
Great for viewing categorical data
- Bar plots should not use a logarithmic scale: the bars need a baseline at 0, and the log of 0 is undefined.
Horizontal Bar Plots
df.plot.barh(x='lab', y='val')
OR
sns.barplot(x="val", y="lab", data=df)
Histogram plot
Great for visualising the distribution of values in a data set.
The range of values is split into bins, and each bar shows how many values fall into that bin.
dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist()
To draw multiple histograms
~~~
dogs[["height_cm", "weight_kg"]].hist()
~~~
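To make the binning idea concrete, here is a minimal sketch that counts values per bin with np.histogram, which is the same counting a histogram plot draws as bars. The heights are made-up illustration values, not data from these cards.

```python
import numpy as np

# Hypothetical heights in cm, made up for illustration
heights = np.array([18, 22, 25, 31, 35, 38, 43, 49, 56, 59])

# Split the range 10-60 into 5 equal-width bins and count the values in each
counts, edges = np.histogram(heights, bins=5, range=(10, 60))

# Print one line per bin: its edges and how many values landed in it
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f} cm: {n} values")
```

Changing the bins argument changes how coarse or fine the distribution looks, which is the same effect as the bins argument of .hist().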
Series
A one-dimensional labelled array. Several Series that share an index make up a DataFrame.
Pandas loc
df.loc["row_label"]
df.loc[[row], [col]]
Is used for label-based indexing. A single bracket gives you a Series and a double bracket (a list of labels) gives you a DataFrame.
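A small sketch of the single-versus-double-bracket rule with loc. The DataFrame and its labels are hypothetical, made up for illustration.

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"], "population": [200.4, 143.5, 1252.0]},
    index=["BR", "RU", "IN"],
)

row_series = df.loc["RU"]                   # single label -> Series
row_frame = df.loc[["RU"]]                  # list of labels -> one-row DataFrame
subset = df.loc[["BR", "IN"], ["country"]]  # rows and columns, both by label

print(type(row_series).__name__)
print(type(row_frame).__name__)
print(subset)
```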
Pandas iloc
df.iloc[[1]]
Is used for integer-location based indexing.
print(df.iloc[:, 1:])
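As a quick illustration of position-based selection, the sketch below uses a tiny made-up DataFrame (the column names a, b, c are placeholders).

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

second_row = df.iloc[[1]]           # row at position 1, kept as a DataFrame
all_but_first_col = df.iloc[:, 1:]  # every row, columns from position 1 onward
single_value = df.iloc[0, 2]        # row 0, column 2

print(second_row)
print(all_but_first_col)
print(single_value)
```

Positions start at 0 and ignore the index labels, so the same call works whatever the rows and columns are named.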
Box plot
Used to compare the distribution of continuous variables for each category 
- Answers questions about the spread of variables.
- In a box plot, sorting by the IQR makes it easier to answer questions about how much variation there was among the “typical” population.
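One way to get the IQR ordering mentioned above is to compute it per category before plotting. This is a sketch with made-up numbers; the column names group and value are assumptions.

```python
import pandas as pd

# Hypothetical measurements for three categories, made up for illustration
df = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "value": [1, 2, 3, 4, 10, 20, 30, 40, 5, 5, 6, 6],
})

grouped = df.groupby("group")["value"]

# IQR = 75th percentile minus 25th percentile, computed per category
iqr = (grouped.quantile(0.75) - grouped.quantile(0.25)).sort_values()

print(iqr)
```

The resulting index order (smallest to largest IQR) can then be used as the category order for the box plot.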
Numpy Comparisons
- logical_and()
- logical_or()
- logical_not()
np.logical_and(bmi > 21, bmi < 22)
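A minimal sketch of the three functions on an array, including using the result as a boolean mask. The bmi values are made up for illustration.

```python
import numpy as np

# Hypothetical BMI values, made up for illustration
bmi = np.array([21.85, 20.98, 21.75, 24.75, 21.44])

between = np.logical_and(bmi > 21, bmi < 22)  # True where both conditions hold
either = np.logical_or(bmi < 21, bmi > 24)    # True where at least one holds
not_between = np.logical_not(between)         # element-wise negation

print(between)
print(bmi[between])  # use the boolean array to filter the values
```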
Enumerate a list
fam = [1.73, 1.68, 1.71, 1.89]
for index, height in enumerate(fam):
    print("index " + str(index) + ": " + str(height))
Looping in dictionaries
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}
for key, value in world.items():
    print(key + " -- " + str(value))
Looping 2d arrays
import numpy as np
np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
meas = np.array([np_height, np_weight])
# np.nditer visits every element of the 2D array, one at a time
for val in np.nditer(meas):
    print(val)
Looping pandas df
import pandas as pd
brics = pd.read_csv("brics.csv", index_col=0)
# iterrows() yields (row label, row as a Series) pairs
for lab, row in brics.iterrows():
    print(lab)
    print(row)
Pandas apply
- Can be used to add a new column by applying a function to every value of an existing one. It is shorter, and usually faster, than looping with iterrows().
import pandas as pd
brics = pd.read_csv("brics.csv", index_col=0)
brics["name_length"] = brics["country"].apply(len)
print(brics)
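To compare the two approaches side by side without needing brics.csv, the sketch below builds a small stand-in DataFrame inline (the rows are an assumption about what the file contains) and checks that the loop and apply give the same result.

```python
import pandas as pd

# Stand-in for brics.csv, built inline so the example is self-contained
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China", "South Africa"]},
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Loop version: visit each row and collect the length of the country name
lengths_from_loop = []
for lab, row in brics.iterrows():
    lengths_from_loop.append(len(row["country"]))

# apply version: call len on every value of the column in one step
brics["name_length"] = brics["country"].apply(len)

print(brics)
print(lengths_from_loop == brics["name_length"].tolist())
```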
In pandas, what do the following methods and attributes do?
- .head()
- .info()
- .shape
- .describe()
- .head() returns the first few rows (the “head” of the DataFrame).
- .info() shows information on each of the columns, such as the data type and number of missing values.
- .shape returns the number of rows and columns of the DataFrame.
- .describe() calculates a few summary statistics for each column.
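A short sketch that runs all four on a small made-up DataFrame (the dog names and heights are hypothetical).

```python
import pandas as pd

# Hypothetical data, made up for illustration
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
})

print(dogs.head())      # first 5 rows by default
dogs.info()             # prints column dtypes and non-null counts
print(dogs.shape)       # (rows, columns); an attribute, so no parentheses
print(dogs.describe())  # count, mean, std, min, quartiles, max for numeric columns
```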
In pandas, what do the following attributes return?
- .values
- .columns
- .index
- .values: A two-dimensional NumPy array of values.
- .columns: An index of columns: the column names.
- .index: An index for the rows: either row numbers or row names.
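The three attributes can be inspected on a tiny made-up DataFrame; the names and numbers below are hypothetical.

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame(
    {"height_cm": [56, 43], "weight_kg": [24, 23]},
    index=["Bella", "Charlie"],
)

print(df.values)   # 2D NumPy array holding the data
print(df.columns)  # Index object with the column names
print(df.index)    # Index object with the row labels
```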
How do you drop duplicates in pandas?
unique_dogs = vet_visits.drop_duplicates(subset=["name", "breed"])
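A sketch of what subset does, using a made-up vet_visits table: rows count as duplicates only when both name and breed match, so two different dogs called Max are both kept.

```python
import pandas as pd

# Hypothetical visit log, made up for illustration
vet_visits = pd.DataFrame({
    "name": ["Bella", "Bella", "Max", "Max"],
    "breed": ["Labrador", "Labrador", "Labrador", "Chow Chow"],
    "date": ["2018-09-02", "2019-06-07", "2019-01-04", "2019-03-19"],
})

# Keep the first row for each (name, breed) pair; the date column is ignored
unique_dogs = vet_visits.drop_duplicates(subset=["name", "breed"])

print(unique_dogs)
```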
How do you count values in a column in pandas?
unique_dogs["breed"].value_counts()
unique_dogs["breed"].value_counts(sort=True)
s.value_counts(normalize=True)
s.value_counts(normalize=True).sort_index()
normalize=True returns each value's proportion of the total (fractions that sum to 1) instead of raw counts. Multiply by 100 for percentages.
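The variants can be compared on a small made-up Series of breeds (hypothetical values).

```python
import pandas as pd

# Hypothetical breeds, made up for illustration
breeds = pd.Series(
    ["Labrador", "Poodle", "Labrador", "Chow Chow", "Labrador", "Poodle"]
)

counts = breeds.value_counts()               # raw counts, most frequent first
props = breeds.value_counts(normalize=True)  # proportions of the total
by_label = props.sort_index()                # same proportions, ordered by label

print(counts)
print(props)
print(by_label)
```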