Python Flashcards

1
Q

Import all libraries.

A

import numpy as np
import scipy as sp
import scipy.stats as stats
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

2
Q

Create a dictionary and turn it into a dataframe.

A

dict1 = {'Age': [23, 45, 17, 64, 57, 32],
         'Height': [1.7, 1.9, 1.55, 1.8, 1.75, 1.65],
         'Names': ['Aoife', 'Brian', 'Catherine', 'Daniel', 'Eamonn', 'Fiona'],
         'Weight': [82, 88, 55, 101, 75, 67]}
df = DataFrame(dict1)

3
Q

Use pandas to read a csv file.

A

df = pd.read_csv('file_name.csv')

4
Q

What does df.iloc[a, b] do?

A

Returns the entry at row position a and column position b. Positions are zero-based, so a = 0 is the first row.
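A minimal sketch of a single-entry lookup, assuming a small made-up DataFrame (the name demo and its values are illustrative only):

```python
import pandas as pd

# Hypothetical data, for illustration only
demo = pd.DataFrame({"Age": [23, 45, 17],
                     "Height": [1.7, 1.9, 1.55],
                     "Weight": [82, 88, 55]})

# Row position 1 and column position 2, both counted from zero
entry = demo.iloc[1, 2]
print(entry)  # 88
```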

5
Q

What does df.iloc[3:5, :2] do?

A

Returns the rows at positions 3 and 4 (the end of a slice is excluded) and the columns at positions 0 and 1.
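A short sketch of the same slice on a made-up DataFrame (demo and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical six-row frame, for illustration only
demo = pd.DataFrame({"Age": [23, 45, 17, 64, 57, 32],
                     "Height": [1.7, 1.9, 1.55, 1.8, 1.75, 1.65],
                     "Weight": [82, 88, 55, 101, 75, 67]})

# Rows at positions 3 and 4, columns at positions 0 and 1
subset = demo.iloc[3:5, :2]
print(subset)
```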

6
Q

What do .tail() and .head() do?

A

.tail() returns the last five rows and .head() returns the first five rows. Pass a number to change how many rows are returned, e.g. .head(10).
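A quick sketch on a made-up ten-row frame (demo is an illustrative name), showing the default of five rows and an explicit row count:

```python
import pandas as pd

# Hypothetical frame with the values 0..9 in one column
demo = pd.DataFrame({"x": range(10)})

first_five = demo.head()    # default: first five rows
last_three = demo.tail(3)   # explicit count: last three rows
print(first_five)
print(last_three)
```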

7
Q

How can a summary description be created?

A

data.describe()

8
Q

How is the mean calculated for (a) a single column (b) all numeric columns? How would these calculations change to calculate standard deviation and maximum?

A

data["ColumnName"].mean()
data.mean(numeric_only=True)

Replace mean with std or max.
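A small sketch of these calls, assuming a made-up frame called demo. Note that the pandas .std() method computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical data, for illustration only
demo = pd.DataFrame({"Names": ["Aoife", "Brian", "Catherine"],
                     "Age": [23, 45, 17],
                     "Weight": [82, 88, 55]})

age_mean = demo["Age"].mean()             # mean of a single column
all_means = demo.mean(numeric_only=True)  # means of Age and Weight only
age_std = demo["Age"].std()               # sample standard deviation
age_max = demo["Age"].max()               # largest value in the column
print(age_mean, age_std, age_max)
print(all_means)
```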

9
Q

How is loc used? How can the index label of the maximum be found?

A

data.loc[row_label, 'column_name']
data['column_name'].idxmax()
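A sketch combining the two, assuming a made-up frame (demo) with string index labels: idxmax() gives the label of the row holding the maximum, and loc looks up another column at that label.

```python
import pandas as pd

# Hypothetical data with explicit row labels, for illustration only
demo = pd.DataFrame({"Names": ["Aoife", "Brian", "Catherine"],
                     "Weight": [82, 88, 55]},
                    index=["a", "b", "c"])

heaviest_label = demo["Weight"].idxmax()          # label of the max Weight
heaviest_name = demo.loc[heaviest_label, "Names"] # name stored at that label
print(heaviest_label, heaviest_name)  # b Brian
```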

10
Q

Code to create a scatterplot?

A

plt.figure()
plt.scatter(data.Coursework,data.Exam)
plt.xlabel('Coursework grade')
plt.ylabel('Exam grade')
plt.axis([0,100,0,100])

11
Q

Code to create a histogram?

A

plt.figure()
plt.hist(data.Coursework,bins=10,alpha=0.5,label='Coursework')
plt.hist(data.Exam,bins=10,alpha=0.5,label='Exam')
plt.xlabel('Grade')
plt.ylabel('Number of students')
plt.legend()

12
Q

Code to create a boxplot?

A

plt.figure()
data.boxplot()
plt.ylabel('Grade')
plt.ylim(0,100)

13
Q

How to count the number of times each outcome in a column occurs, ordered by the outcome rather than by the count?
How to seed the random value?
How to select a random value from a column?

A

dataFrameName.ColumnName.value_counts().sort_index()
np.random.seed(seedNum)
np.random.choice(dataFrameName.ColumnName)
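A brief sketch of all three steps on a made-up Grade column (demo and its values are assumptions). Ordering by outcome is done here with sort_index(), and reseeding with the same number reproduces the same random pick.

```python
import numpy as np
import pandas as pd

# Hypothetical column of outcomes, for illustration only
demo = pd.DataFrame({"Grade": ["B", "A", "C", "A", "B", "A"]})

# Counts ordered by outcome (A, B, C) rather than by frequency
counts = demo.Grade.value_counts().sort_index()
print(counts)

# Same seed, same pick
np.random.seed(42)
pick1 = np.random.choice(demo.Grade)
np.random.seed(42)
pick2 = np.random.choice(demo.Grade)
print(pick1, pick2)
```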

14
Q

What does the .unique() function do?

A

Returns an array of the unique values in a column, in order of first appearance.
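A one-step sketch on a made-up Series (the name grades is illustrative):

```python
import pandas as pd

# Hypothetical column with repeated values
grades = pd.Series(["B", "A", "C", "A", "B"])

# Each distinct value once, in order of first appearance
values = grades.unique()
print(values)
```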

15
Q

Simulate 15 binomial trials, each counting the heads in ten flips of a fair coin.

A

stats.binom.rvs(n=10, p=0.5, size=15)

16
Q

Calculate the probability of observing exactly 40 heads in 60 flips of an unbiased coin.

A

stats.binom.pmf(40, n=60, p=0.5)

17
Q

Calculate the probability of observing 10 or fewer heads when an unbiased coin is flipped 30 times.

A

stats.binom.cdf(10, n=30, p=0.5)
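As a sanity check, the cdf is the sum of the pmf terms from 0 up to 10. A minimal sketch comparing a hand-written sum with scipy (manual_cdf and scipy_cdf are illustrative names):

```python
from math import comb
from scipy import stats

# P(X <= 10) for n = 30, p = 0.5, summed term by term
manual_cdf = sum(comb(30, k) * 0.5**30 for k in range(11))

# The same probability from scipy
scipy_cdf = stats.binom.cdf(10, n=30, p=0.5)

print(manual_cdf, scipy_cdf)
```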

18
Q

Simulate hypergeometric trials. Calculate the probability of a given outcome x of a hypergeometric trial.

A

stats.hypergeom.rvs(N,s,n,size=10420)
stats.hypergeom.pmf(x,N,s,n)
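scipy takes the hypergeometric parameters positionally as population size, number of success items, then number of draws. The sketch below assumes N, s and n on the card mean exactly that, and uses a made-up example (aces in a five-card hand) checked against the counting formula.

```python
from math import comb
from scipy import stats

# Assumed meaning of the card's symbols:
N = 52  # population size (cards in the deck)
s = 4   # success items in the population (aces)
n = 5   # number of draws (cards in the hand)
x = 1   # outcome of interest: exactly one ace

scipy_pmf = stats.hypergeom.pmf(x, N, s, n)

# Counting formula: C(s, x) * C(N - s, n - x) / C(N, n)
manual_pmf = comb(s, x) * comb(N - s, n - x) / comb(N, n)

print(scipy_pmf, manual_pmf)
```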

19
Q

Simulate a Poisson trial with a rate of 100 and sample size 10. Calculate the probability of exactly 60 occurring with the same rate. Calculate the probability of 60 or fewer occurring with the same rate.

A

stats.poisson.rvs(100, size=10)
stats.poisson.pmf(60, 100)
stats.poisson.cdf(60, 100)

20
Q

Code to plot the standard normal distribution.

A

x = np.linspace(-5,5,100)
plt.figure()
plt.plot(x,stats.norm.pdf(x,loc=0,scale=1))
plt.axis([-5,5,0,0.41])
plt.xlabel('$x$',fontsize=16)
plt.ylabel('$f(x)$',fontsize=16)
plt.xticks(fontsize=14)
plt.yticks([0,0.1,0.2,0.3,0.4],fontsize=14)
plt.title('Standard Normal Distribution',fontsize=16)

21
Q

Simulate a sample from a standard normal distribution. Calculate the probability of a value less than 0.5 occurring on a standard normal distribution.

A

stats.norm.rvs(loc=0,scale=1,size=100)
stats.norm.cdf(0.5,loc=0,scale=1)

22
Q

Code to plot an exponential distribution.

A

x = np.linspace(0,15,100)
plt.figure()
plt.plot(x,stats.expon.pdf(x,scale=4))
plt.axis([0,15,0,0.25])
plt.xlabel('iPad lifetime (years)',fontsize=16)
plt.ylabel('Probability density',fontsize=16)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)

23
Q

Simulate an exponential distribution with mean 4 (rate 1/4) and sample size 50. Calculate the probability of a value less than or equal to 5 occurring under the same distribution.

A

stats.expon.rvs(scale=4,size=50)
stats.expon.cdf(5,scale=4)
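In scipy the scale argument of expon is the mean, which is 1/rate. A minimal check against the closed form 1 - exp(-rate * x), with illustrative variable names:

```python
import math
from scipy import stats

mean_lifetime = 4          # passed to scipy as scale
rate = 1 / mean_lifetime   # events per unit time

scipy_cdf = stats.expon.cdf(5, scale=mean_lifetime)
manual_cdf = 1 - math.exp(-rate * 5)

print(scipy_cdf, manual_cdf)
```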

24
Q

How do you take the square root of n?

A

np.sqrt(n)

25
Q

How do you calculate a sample standard deviation as opposed to a population standard deviation?

A

np.std(X, ddof=1)
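A small sketch of the difference on made-up data: the default ddof=0 divides by n (population), while ddof=1 divides by n - 1 (sample).

```python
import numpy as np

# Hypothetical data: mean 5, squared deviations sum to 32
X = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_sd = np.std(X)        # sqrt(32 / 8) = 2.0
sample_sd = np.std(X, ddof=1)    # sqrt(32 / 7)

print(population_sd, sample_sd)
```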

26
Q

Code to perform a one-sample t-test (two-sided by default)?
Code to perform a two-sample t-test?

A

t_statistic, p_value = stats.ttest_1samp(X,mu0)
t_statistic, p_value = stats.ttest_ind(data_A, data_B)
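A runnable sketch of both tests on simulated data (the samples, seed and mu0 are assumptions for illustration). The one-sample statistic is also recomputed by hand as (mean - mu0) / (s / sqrt(n)).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data_A = rng.normal(loc=5.0, scale=1.0, size=30)  # hypothetical sample A
data_B = rng.normal(loc=5.5, scale=1.0, size=30)  # hypothetical sample B
mu0 = 5.0                                         # hypothesised mean

# One-sample t-test of data_A against mu0
t1, p1 = stats.ttest_1samp(data_A, mu0)

# Two-sample t-test (equal variances assumed by default)
t2, p2 = stats.ttest_ind(data_A, data_B)

# Manual one-sample statistic for comparison
manual_t = (data_A.mean() - mu0) / (np.std(data_A, ddof=1) / np.sqrt(len(data_A)))

print(t1, p1)
print(t2, p2)
```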

27
Q

Code to calculate the correlation coefficient.

A

df.corr()
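df.corr() returns a matrix of pairwise Pearson correlations between the numeric columns. A sketch on made-up, perfectly linear data (demo is an illustrative name), including the single-pair form:

```python
import pandas as pd

# Hypothetical data: y rises with x, z falls with x
demo = pd.DataFrame({"x": [1, 2, 3, 4],
                     "y": [2, 4, 6, 8],
                     "z": [4, 3, 2, 1]})

corr_matrix = demo.corr()            # all pairwise coefficients
r_xy = demo["x"].corr(demo["y"])     # one pair of columns

print(corr_matrix)
print(r_xy)
```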

28
Q

How to print out summary data for linear regression?

A

import statsmodels.formula.api as smf
model = smf.ols(formula='Y~X',data=df).fit()
print(model.summary())

29
Q

What function is used to predict values?

A

model.predict({'ColumnName': Value})

30
Q

Code to plot data with fitted regression line.

A

xrange = np.linspace(140,200,100)
plt.plot(xrange,model.params.Intercept+model.params.Height*xrange)
plt.scatter(df.Height,df.Weight)
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')

31
Q

Code to plot residuals and check assumptions.

A

residuals = model.resid
plt.figure()
plt.hist(residuals)
plt.axis([-20,20,0,20])
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.figure()
plt.scatter(df.Height,residuals)
plt.axis([142,195,-20,20])
plt.xlabel('Height')
plt.ylabel('Residuals')

32
Q

Syntax for printing a variable within a string to 4 decimal places.

A

print(f"{variable:.4f}")
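A short sketch with a made-up value, showing the format specifier inside a longer string (variable and text are illustrative names):

```python
variable = 3.14159265

# .4f rounds to four digits after the decimal point
text = f"{variable:.4f}"
print(f"The value is {variable:.4f}")  # The value is 3.1416
```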