Python Flashcards
Import all libraries.
import numpy as np
import scipy as sp
import scipy.stats as stats
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
Create a dictionary and turn it into a dataframe.
dict1 = {'Age': [23, 45, 17, 64, 57, 32],
         'Height': [1.7, 1.9, 1.55, 1.8, 1.75, 1.65],
         'Names': ['Aoife', 'Brian', 'Catherine', 'Daniel', 'Eamonn', 'Fiona'],
         'Weight': [82, 88, 55, 101, 75, 67]}
df = DataFrame(dict1)
Use pandas to read a csv file.
df = pd.read_csv('file_name.csv')
What does df.iloc[a, b] do?
Returns the single entry at row position a and column position b (integer positions, counted from 0).
What does df.iloc[3:5, :2] do?
Returns the rows at positions 3 and 4 (the slice end, 5, is excluded) and the columns at positions 0 and 1.
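A minimal sketch of both .iloc cards, assuming a small frame built from the dictionary values used earlier (the names single and block are only illustrative):

```python
import pandas as pd

# Illustrative frame; values reused from the dictionary card
df = pd.DataFrame({
    'Age': [23, 45, 17, 64, 57, 32],
    'Height': [1.7, 1.9, 1.55, 1.8, 1.75, 1.65],
    'Names': ['Aoife', 'Brian', 'Catherine', 'Daniel', 'Eamonn', 'Fiona'],
    'Weight': [82, 88, 55, 101, 75, 67],
})

# Single entry: row position 1, column position 2 (the 'Names' column)
single = df.iloc[1, 2]

# Slice: row positions 3 and 4, column positions 0 and 1
block = df.iloc[3:5, :2]

print(single)
print(block)
```

Positions count from 0 and the slice end is excluded, so 3:5 selects two rows, not three.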
What do .tail() and .head() do?
.tail() returns the last five rows and .head() returns the first five rows by default; pass a number, e.g. .head(10), to change how many rows are returned.
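A short sketch of the default and an explicit row count, assuming a made-up ten-row frame with a single column x:

```python
import pandas as pd

# Hypothetical frame with ten rows, x = 0..9
df = pd.DataFrame({'x': range(10)})

first_five = df.head()    # default: first 5 rows
last_three = df.tail(3)   # explicit count: last 3 rows

print(first_five)
print(last_three)
```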
How can a summary description be created?
data.describe()
How is the mean calculated for (a) a single column (b) all numeric columns? How would these calculations change to calculate standard deviation and maximum?
data["ColumnName"].mean()
data.mean(numeric_only=True)
Replace mean with std or max.
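A minimal sketch of these calculations, assuming a hypothetical grades frame with two numeric columns and one text column:

```python
import pandas as pd

# Hypothetical data; 'Name' is non-numeric and is skipped by numeric_only=True
data = pd.DataFrame({
    'Coursework': [60, 70, 80],
    'Exam': [50, 65, 95],
    'Name': ['A', 'B', 'C'],
})

col_mean = data['Coursework'].mean()       # mean of one column
all_means = data.mean(numeric_only=True)   # mean of every numeric column
col_std = data['Coursework'].std()         # sample standard deviation (ddof=1)
col_max = data['Exam'].max()               # largest value in one column

print(col_mean, col_std, col_max)
print(all_means)
```

Note that pandas .std() divides by n - 1 by default, unlike np.std, which divides by n.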
How is the loc function used? How can the id of the maximum be found?
data.loc[id, 'columnnamewithid']
data['columnnamewithmax'].idxmax()
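A small sketch combining the two calls, assuming a hypothetical frame whose index holds student ids:

```python
import pandas as pd

# Hypothetical data; the index values 101-103 act as row ids
data = pd.DataFrame(
    {'Name': ['Aoife', 'Brian', 'Catherine'], 'Exam': [62, 91, 78]},
    index=[101, 102, 103],
)

top_id = data['Exam'].idxmax()        # index label of the largest Exam value
top_name = data.loc[top_id, 'Name']   # look up another column by that label

print(top_id, top_name)
```

.loc selects by label, so the id returned by idxmax() can be passed straight to it.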
Code to create a scatterplot?
plt.figure()
plt.scatter(data.Coursework,data.Exam)
plt.xlabel('Coursework grade')
plt.ylabel('Exam grade')
plt.axis([0,100,0,100])
Code to create a histogram?
plt.figure()
plt.hist(data.Coursework, bins=10, alpha=0.5, label='Coursework')
plt.hist(data.Exam, bins=10, alpha=0.5, label='Exam')
plt.xlabel('Grade')
plt.ylabel('Number of students')
plt.legend()
Code to create a boxplot?
plt.figure()
data.boxplot()
plt.ylabel('Grade')
plt.ylim(0,100)
How to get the number of times each outcome in a column occurs, sorted by the outcome and not by the count?
dataFrameName.ColumnName.value_counts().sort_index()
How to seed the random number generator?
np.random.seed(seedNum)
How to select a random value from a column?
np.random.choice(dataFrameName.ColumnName)
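A minimal sketch of counting, seeding, and sampling, assuming a hypothetical column Roll. Here the counts are ordered by outcome with .sort_index(), which sorts on the outcome labels explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical column of die rolls
df = pd.DataFrame({'Roll': [3, 1, 3, 2, 3, 1]})

# Count each outcome, then order by outcome (the index), not by count
counts = df.Roll.value_counts().sort_index()

# Re-seeding with the same number reproduces the same random draw
np.random.seed(42)
first = np.random.choice(df.Roll)
np.random.seed(42)
second = np.random.choice(df.Roll)

print(counts)
print(first, second)
```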
What does the .unique() function do?
Returns the distinct values in a column, in order of first appearance.
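A short sketch, assuming a hypothetical Series of names with repeats; .nunique() is included as a related call that returns only the count:

```python
import pandas as pd

# Hypothetical column with repeated values
names = pd.Series(['Aoife', 'Brian', 'Aoife', 'Fiona', 'Brian'])

uniq = names.unique()      # distinct values, in order of first appearance
n_uniq = names.nunique()   # number of distinct values

print(list(uniq), n_uniq)
```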
Simulate a binomial experiment in which a fair coin is flipped ten times and the heads are counted; use a sample size of 15.
stats.binom.rvs(n=10, p=0.5, size=15)
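A minimal sketch of the simulation, with random_state=0 added as an assumption so that the draw is repeatable; each of the 15 values is the number of heads in ten flips:

```python
import scipy.stats as stats

# n = flips per experiment, p = probability of heads, size = number of experiments
# random_state is an added assumption, used only to make the result repeatable
heads = stats.binom.rvs(n=10, p=0.5, size=15, random_state=0)

print(heads)          # 15 integers, each between 0 and 10
print(heads.mean())   # should be near n * p = 5 for a fair coin
```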