Numpy Statistics Flashcards
What is the standard import procedure for numpy?
import numpy as np
How do you generate a list of 10000 random, normally distributed data points, centered at “x” with a standard deviation of “y”?
list_name = np.random.normal(x,y,10000)
How do you get the mean of a numpy list?
list_mean = np.mean(list_name)
How do you get the median of a numpy list?
list_median = np.median(list_name)
How do you generate a list of 10000 random integers, ranging from “x” (inclusive) to “y” (exclusive)?
list_name = np.random.randint(x,y,10000)
How do you get the mode of a numpy list (remember the import statement)?
from scipy import stats
list_mode = stats.mode(list_name)
How do you get the standard deviation of a numpy list?
list_std = list_name.std()
How do you get the variance of a numpy list?
list_var = list_name.var()
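A minimal sketch that ties the cards above together. The center (100), standard deviation (20), and seed are assumptions chosen here for illustration, and np.random.default_rng is used instead of np.random.normal only so the run is repeatable.

```python
import numpy as np

# Assumed values for illustration: center 100, standard deviation 20.
rng = np.random.default_rng(0)
list_name = rng.normal(100.0, 20.0, 10000)

list_mean = np.mean(list_name)
list_median = np.median(list_name)
list_std = list_name.std()  # population standard deviation (ddof=0)
list_var = list_name.var()  # population variance (ddof=0)

print(list_mean, list_median, list_std, list_var)
```

The variance is the square of the standard deviation, and for a sample this large the mean and median should both land close to the chosen center.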
How do you generate a list of 10000 random uniform data points, ranging from “x” to “y”?
list_name = np.random.uniform(x,y,10000)
How do you generate an evenly-spaced list of numbers from x to y with a spacing of “gap”?
list_name = np.arange(x, y, gap)  # for example, np.arange(-3, 3, 0.001)
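One detail worth remembering is that np.arange excludes the stop value. A small sketch, with the values of x, y, and gap assumed here for illustration:

```python
import numpy as np

# Assumed values for illustration.
x, y, gap = 0.0, 1.0, 0.25

list_name = np.arange(x, y, gap)
print(list_name)  # the stop value y is not included

# np.linspace includes both endpoints and takes a point count instead of a gap.
with_endpoint = np.linspace(x, y, 5)
print(with_endpoint)
```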
How do you visualize the normal probability density function with a given list (include necessary import statements)?
from scipy.stats import norm
import matplotlib.pyplot as plt
…
plt.plot(list_name, norm.pdf(list_name))
How do you visualize the exponential probability density function with a given list (include necessary import statements)?
from scipy.stats import expon
import matplotlib.pyplot as plt
…
plt.plot(list_name, expon.pdf(list_name))
How do you visualize the binomial probability mass function with a given list (include necessary import statements)?
from scipy.stats import binom
import matplotlib.pyplot as plt
n, p = 10, 0.5
list_name = np.arange(0, n + 1)  # integer counts 0..n; the PMF is zero at non-integer values
plt.plot(list_name, binom.pmf(list_name, n, p))
How do you visualize the Poisson probability mass function with a given list (include necessary import statements)?
from scipy.stats import poisson
import matplotlib.pyplot as plt
mu = 500
list_name = np.arange(400, 600)  # integer counts; the PMF is zero at non-integer values
plt.plot(list_name, poisson.pmf(list_name, mu))
How do you calculate the nth percentile of a numpy list?
per = np.percentile(list_name,n)
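A short sketch of np.percentile on a small, known list, assuming the integers 1 through 100 as sample data. With the default linear interpolation, the 50th percentile matches the median.

```python
import numpy as np

# Assumed sample data: the integers 1..100.
list_name = np.arange(1, 101)

per_50 = np.percentile(list_name, 50)
per_90 = np.percentile(list_name, 90)

print(per_50, np.median(list_name))
print(per_90)
```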
What is a statistical moment?
A quantitative measure of the shape of a probability density function.
Give the first four moments that were discussed.
First moment: mean.
Second moment: variance.
Third moment: skew.
Fourth moment: kurtosis.
What is the scale of the skew value (what does it mean to have negative, zero, or positive skew)?
A longer tail to the left represents negative skew.
A longer tail to the right represents positive skew.
A perfectly symmetric distribution, such as the normal distribution, has zero skew.
What is kurtosis?
It represents the shape of the tail and peak.
What is the scale of the kurtosis value (what does it mean to have low or high kurtosis)?
A sharp peak represents high kurtosis.
A normal distribution has zero excess kurtosis (a raw kurtosis of 3); sp.kurtosis reports excess kurtosis by default.
How do you get the skew of a numpy list?
import scipy.stats as sp
sp.skew(list_name)
How do you get the kurtosis of a numpy list?
sp.kurtosis(list_name)
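To connect the skew and kurtosis calls back to the idea of moments, here is a numpy-only sketch that computes the third and fourth standardized moments by hand. The helper name standardized_moment and both sample lists are hypothetical, introduced only for illustration. With scipy's default arguments, sp.skew and sp.kurtosis should agree with these values.

```python
import numpy as np

def standardized_moment(values, k):
    """Mean of the k-th power of the z-scores (population std, ddof=0)."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** k))

symmetric = [1, 2, 3, 4, 5]      # hypothetical symmetric sample
right_tailed = [1, 2, 3, 4, 10]  # hypothetical sample with a longer right tail

skew_symmetric = standardized_moment(symmetric, 3)
skew_right = standardized_moment(right_tailed, 3)

# Excess kurtosis subtracts 3 so that a normal distribution scores zero.
excess_kurtosis_symmetric = standardized_moment(symmetric, 4) - 3

print(skew_symmetric, skew_right, excess_kurtosis_symmetric)
```

The symmetric list has a skew of essentially zero, while the list with the longer right tail has positive skew.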
What is covariance?
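The measure of how two variables vary in tandem from their means. A positive covariance means they tend to move in the same direction, a negative covariance means they tend to move in opposite directions, and a near-zero covariance means there is little linear relationship. Its magnitude depends on the units of the variables, so it is hard to interpret on its own.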
What is correlation?
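The covariance divided by the standard deviations of both variables. It ranges from -1 (perfect inverse correlation) through 0 (no linear correlation) to 1 (perfect correlation).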
How do you get the correlation of two numpy lists?
np.corrcoef(x_list, y_list)
This returns a 2x2 array with the correlation at (0,1) and (1,0).
How do you get the covariance of two numpy lists?
np.cov(x_list, y_list)
This returns a 2x2 array with the covariance at (0,1) and (1,0).
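A sketch of how the two quantities relate, using synthetic data and a seed assumed here for illustration. Dividing the covariance by both standard deviations gives the correlation, and rescaling a variable changes the covariance but not the correlation.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed chosen here for repeatability

# Hypothetical data: y_list depends linearly on x_list, plus noise.
x_list = rng.normal(0, 1, 1000)
y_list = 2 * x_list + rng.normal(0, 0.5, 1000)

cov_xy = np.cov(x_list, y_list)[0, 1]
corr_xy = np.corrcoef(x_list, y_list)[0, 1]

# np.cov uses ddof=1 by default, so the standard deviations must match.
corr_manual = cov_xy / (np.std(x_list, ddof=1) * np.std(y_list, ddof=1))

# Changing the units of x_list scales the covariance but not the correlation.
cov_scaled = np.cov(100 * x_list, y_list)[0, 1]
corr_scaled = np.corrcoef(100 * x_list, y_list)[0, 1]

print(cov_xy, cov_scaled)
print(corr_xy, corr_manual, corr_scaled)
```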
What is Bayes’ Theorem?
P(A|B) = P(A) * P(B|A) / P(B)
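A worked example of the formula, with every probability below a hypothetical number chosen for illustration. P(B) is expanded with the law of total probability.

```python
# Hypothetical numbers chosen for illustration only.
p_a = 0.01              # P(A)
p_b_given_a = 0.95      # P(B|A)
p_b_given_not_a = 0.05  # P(B|not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' Theorem: P(A|B) = P(A) * P(B|A) / P(B)
p_a_given_b = p_a * p_b_given_a / p_b

print(round(p_a_given_b, 3))  # about 0.161
```

Even with a high P(B|A), the result stays low here because P(A) is small.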
How do you get a linear regression from two variables?
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(x_list,y_list)
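A sketch of what to do with the values that linregress returns, using synthetic data assumed here for illustration (true slope -3, true intercept 100). The predict helper is hypothetical, not part of scipy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # seed chosen here for repeatability

# Hypothetical data with a known linear relationship plus noise.
x_list = rng.normal(3.0, 1.0, 1000)
y_list = 100 - 3 * x_list + rng.normal(0, 0.3, 1000)

slope, intercept, r_value, p_value, std_err = stats.linregress(x_list, y_list)

# r_value is the correlation coefficient; squaring it gives R-squared.
r_squared = r_value ** 2

def predict(x):
    """Hypothetical helper: evaluate the fitted line at x."""
    return slope * x + intercept

print(slope, intercept, r_squared, predict(3.0))
```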
How do you generate two lists of data that have varying linearity?
import numpy as np
from pylab import *
pageSpeeds = np.random.normal(3.0, 1.0, 1000)
purchaseAmount = 100 - (pageSpeeds + np.random.normal(0, 0.1, 1000)) * 3
scatter(pageSpeeds, purchaseAmount)
When is a polynomial regression appropriate?
If the data is clearly non-linear.
How do you perform an nth degree polynomial regression?
import numpy as np
x = np.array(x_list)
y = np.array(y_list)
pn = np.poly1d(np.polyfit(x, y, n))
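A small sketch of the fit on noise-free data generated from a known quadratic, so the recovered coefficients can be checked by eye. The data and the degree n = 2 are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: an exact quadratic with no noise.
x = np.linspace(-3, 3, 50)
y = 2 * x**2 - x + 1

n = 2
pn = np.poly1d(np.polyfit(x, y, n))

print(pn.coeffs)  # highest power first, close to [2, -1, 1]
print(pn(2.0))    # evaluate the fitted polynomial at x = 2
```

The object returned by np.poly1d is callable, so pn(value) predicts y for a new x.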
Does a high degree polynomial regression necessarily improve things? (Y/N)
No.
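A sketch of why the answer is no, assuming synthetic data that really is a straight line plus noise. The sse helper is hypothetical. A higher degree can only lower the error on the data it was fit to, because it has more freedom to chase the noise, so that lower error says nothing by itself about how well the curve predicts new points.

```python
import numpy as np

rng = np.random.default_rng(3)  # seed chosen here for repeatability

# Hypothetical data: a straight line plus noise, with x scaled to [-1, 1].
x = np.linspace(-1, 1, 30)
y = 2 * x + rng.normal(0, 0.3, x.size)

def sse(degree):
    """Sum of squared errors of a polynomial fit, measured on its own training data."""
    pn = np.poly1d(np.polyfit(x, y, degree))
    return float(np.sum((y - pn(x)) ** 2))

sse_linear = sse(1)
sse_degree8 = sse(8)

print(sse_linear, sse_degree8)
```

Comparing the fits on held-out data, rather than on the training data, is the usual way to tell whether the extra degrees are helping.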
How is multivariate regression written?
y = A + B1*var_1 + B2*var_2 + …
How do you grab an Excel file at a specific link (don’t forget the import statement)?
import pandas as pd
df = pd.read_excel('link_name')
How do you show a few lines from an Excel file “df”?
df.head()
How do you create a multivariate regression summary?
import statsmodels.api as sm
df['Model_ord'] = pd.Categorical(df.Model).codes
X = df[['Mileage', 'Model_ord', 'Doors']]
y = df[['Price']]
X1 = sm.add_constant(X)
est = sm.OLS(y, X1).fit()
est.summary()
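The coefficients in the summary are ordinary least squares estimates. As a rough illustration of what that fit does, here is a numpy-only sketch that swaps in np.linalg.lstsq instead of statsmodels. The variables var_1 and var_2, the true coefficients, and the noise-free data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)  # seed chosen here for repeatability

# Hypothetical predictors and a noise-free response y = 5 + 2*var_1 - 3*var_2.
var_1 = rng.uniform(0, 10, 200)
var_2 = rng.uniform(0, 10, 200)
y = 5.0 + 2.0 * var_1 - 3.0 * var_2

# A leading column of ones plays the role of sm.add_constant (the intercept A).
X1 = np.column_stack([np.ones_like(var_1), var_1, var_2])

coeffs, residuals, rank, singular_values = np.linalg.lstsq(X1, y, rcond=None)
A, B1, B2 = coeffs

print(A, B1, B2)
```

Unlike est.summary(), this gives only the coefficients, with no standard errors, p-values, or R-squared.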
What is a multi-level model?
A model that accounts for effects occurring at several levels of a hierarchy, where the levels are interdependent.