Lesson 15 Statistics Flashcards
Import the packages for maths, statistics, NumPy, SciPy and pandas
import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd
Create a list x, then a copy x_with_nan with a nan inserted between 2.5 and 4
x = [8.0, 1, 2.5, 4, 28.0]
x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
What are the three different ways of getting a nan value?
- float('nan')
- math.nan
- np.nan
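As a small sketch of how these three values behave, the snippet below checks that each one is an ordinary float and that nan never compares equal to itself. The variable names are only illustrative.

```python
import math
import numpy as np

# Three equivalent ways to obtain a nan value
nan_values = [float('nan'), math.nan, np.nan]

# Each of them is a plain Python float
all_floats = all(isinstance(v, float) for v in nan_values)

# nan never compares equal to anything, including itself,
# so test for it with math.isnan() rather than ==
equal_to_itself = math.nan == math.nan
detected = [math.isnan(v) for v in nan_values]

print(all_floats, equal_to_itself, detected)
```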
Create np.ndarray and pd.Series objects that correspond to x and x_with_nan from the following lists:
x = [8.0, 1, 2.5, 4, 28.0]
x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
y, y_with_nan = np.array(x), np.array(x_with_nan)
z, z_with_nan = pd.Series(x), pd.Series(x_with_nan)
Find the mean using Python's built-in statistics library.
mean_ = statistics.mean(x)
mean_
What is another function to calculate the mean?
mean_ = statistics.fmean(x)
mean_
What value will the mean return if there are nan values present?
nan
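A quick check of this card, assuming the same x_with_nan list as above: the two statistics functions and np.mean all propagate the nan instead of skipping it.

```python
import math
import statistics
import numpy as np

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

# Compute the mean three ways on data that contains a nan
results = {
    "statistics.mean": statistics.mean(x_with_nan),
    "statistics.fmean": statistics.fmean(x_with_nan),
    "np.mean": np.mean(np.array(x_with_nan)),
}

# Every result is nan, because nan propagates through the sum
all_nan = all(math.isnan(v) for v in results.values())
print(results, all_nan)
```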
How do you calculate the mean with NumPy?
mean_ = np.mean(y)
mean_
Write the code to calculate the mean while ignoring any nan values.
np.nanmean(y_with_nan)
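For comparison, here is a sketch that sets np.nanmean next to the pandas Series.mean method, which skips nan by default through its skipna parameter. The pandas part is not on the card itself and is included only to contrast the two.

```python
import math
import numpy as np
import pandas as pd

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
y_with_nan = np.array(x_with_nan)
z_with_nan = pd.Series(x_with_nan)

# NumPy: nanmean drops the nan before averaging
nan_ignored = np.nanmean(y_with_nan)

# pandas: mean() skips nan by default (skipna=True)
pandas_default = z_with_nan.mean()

# pandas: with skipna=False the nan propagates
pandas_strict = z_with_nan.mean(skipna=False)

print(nan_ignored, pandas_default, pandas_strict)
```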
x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]
y, z, w = np.array(x), pd.Series(x), np.array(w)
wmean = np.average(y, weights=w)
print(wmean)
Calculate the weighted mean of a NumPy array or pandas Series
wmean = np.average(y, weights=w)
wmean = np.average(z, weights=w)
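To see what np.average does with the weights, the sketch below recomputes the weighted mean by hand as sum(w_i * x_i) / sum(w_i), using the same x and w as above, and compares it with the NumPy result.

```python
import numpy as np

x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]

# Manual weighted mean: weighted sum divided by the sum of the weights
manual_wmean = sum(w_i * x_i for x_i, w_i in zip(x, w)) / sum(w)

# NumPy equivalent
numpy_wmean = np.average(np.array(x), weights=np.array(w))

print(manual_wmean, numpy_wmean)
```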
Calculate the harmonic mean using the statistics library
hmean = statistics.harmonic_mean(x)
What happens if you input the following for a harmonic mean:
nan value
0
negative number
nan value: returns nan
0: returns 0
negative number: raises statistics.StatisticsError
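The three cases can be checked with a short sketch. The input lists are illustrative variations of x, not taken from the card.

```python
import math
import statistics

# nan in the data: the result is nan
with_nan = statistics.harmonic_mean([8.0, 1, 2.5, math.nan, 4, 28.0])

# 0 in the data: the result is 0
with_zero = statistics.harmonic_mean([8.0, 1, 2.5, 0, 4, 28.0])

# Negative number in the data: StatisticsError is raised
try:
    statistics.harmonic_mean([8.0, 1, 2.5, -2, 4, 28.0])
    negative_raised = False
except statistics.StatisticsError:
    negative_raised = True

print(with_nan, with_zero, negative_raised)
```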
Calculate the geometric mean
gmean = statistics.geometric_mean(x)
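As a cross-check, the geometric mean is the n-th root of the product of the n values. This sketch computes it by hand with math.prod and compares the result with statistics.geometric_mean.

```python
import math
import statistics

x = [8.0, 1, 2.5, 4, 28.0]

# Manual geometric mean: n-th root of the product of all items
manual_gmean = math.prod(x) ** (1 / len(x))

# Library version
library_gmean = statistics.geometric_mean(x)

print(manual_gmean, library_gmean)
```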
What is the main difference between the behavior of the mean and the median?
The main difference between the behavior of the mean and median is related to dataset outliers or extremes. The mean is heavily affected by outliers, but the median depends on outliers only slightly or not at all.
x is [1, 2.5, 4, 8.0, 28.0]
Find the median of the list x
median_ = statistics.median(x)
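To illustrate the outlier behavior described above, this sketch replaces the largest value in x with a much more extreme one (2800.0, a made-up value) and compares how the mean and median react.

```python
import statistics

x = [1, 2.5, 4, 8.0, 28.0]

# Same data, but with the largest value made far more extreme
x_extreme = x[:-1] + [2800.0]

mean_before, mean_after = statistics.mean(x), statistics.mean(x_extreme)
median_before, median_after = statistics.median(x), statistics.median(x_extreme)

# The mean jumps with the outlier; the median stays at the middle value
print(mean_before, mean_after)
print(median_before, median_after)
```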