Lesson 15 Statistics Flashcards

1
Q

Import the math, statistics, NumPy, SciPy and pandas packages.

A

import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

2
Q

Create a list inserting a nan between 2.5 and 4

x = [8.0, 1, 2.5, 4, 28.0]

A

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

3
Q

What are the three different ways of getting a nan value?

A
  • float('nan')
  • math.nan
  • np.nan
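
As a small illustration of the three spellings above, the sketch below checks that they all produce an ordinary float and that nan never compares equal to itself, which is why math.isnan() is used for detection. The variable names are made up for this example.

```python
import math
import numpy as np

# The three spellings from the card above.
nan_values = [float("nan"), math.nan, np.nan]

# Each one is a plain Python float.
all_floats = all(isinstance(v, float) for v in nan_values)

# nan is not equal to anything, including itself,
# so == cannot be used to detect it.
equal_to_itself = math.nan == math.nan

# math.isnan() is the reliable test.
detected = [math.isnan(v) for v in nan_values]

print(all_floats, equal_to_itself, detected)
```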
4
Q

Create np.ndarray and pd.Series objects that correspond to x and x_with_nan from the following lists:

x = [8.0, 1, 2.5, 4, 28.0]

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

A

y, y_with_nan = np.array(x), np.array(x_with_nan)
z, z_with_nan = pd.Series(x), pd.Series(x_with_nan)

5
Q

Find the mean using Python's built-in statistics module.

A

mean_ = statistics.mean(x)
mean_

6
Q

What is another function to calculate the mean?

A

mean_ = statistics.fmean(x)
mean_

7
Q

What value will the mean return if there are nan values present?

A

nan
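
To see this in practice, the sketch below feeds the x_with_nan list from the earlier cards to three mean functions and records what each returns. The results dictionary is only a convenience for this example.

```python
import math
import statistics
import numpy as np

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

# A single nan propagates through every ordinary mean.
results = {
    "statistics.mean": statistics.mean(x_with_nan),
    "statistics.fmean": statistics.fmean(x_with_nan),
    "np.mean": float(np.mean(np.array(x_with_nan))),
}

for name, value in results.items():
    print(name, value)
```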

8
Q

How do you calculate the mean with NumPy?

A

mean_ = np.mean(y)
mean_

9
Q

Write the code to calculate the mean while ignoring any nan values.

A

np.nanmean(y_with_nan)
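
A short sketch of what "ignoring nan" amounts to: filtering the nan entries out by hand gives the same result as np.nanmean(), and a pandas Series skips nan by default. The names nan_aware, masked and pandas_mean are introduced here for illustration.

```python
import math
import numpy as np
import pandas as pd

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
y_with_nan = np.array(x_with_nan)
z_with_nan = pd.Series(x_with_nan)

# np.nanmean() drops nan entries before averaging.
nan_aware = np.nanmean(y_with_nan)

# The same thing done by hand with a boolean mask.
masked = y_with_nan[~np.isnan(y_with_nan)].mean()

# pandas skips nan by default (skipna=True).
pandas_mean = z_with_nan.mean()

print(nan_aware, masked, pandas_mean)
```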

10
Q

Calculate the weighted mean of a NumPy array or pandas Series.

x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]

y, z, w = np.array(x), pd.Series(x), np.array(w)

A

wmean = np.average(y, weights=w)
OR
wmean = np.average(z, weights=w)
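
For reference, the weighted mean is the sum of each value times its weight, divided by the sum of the weights. The sketch below works this out by hand for the x and w on this card and compares it with np.average(). The names manual, vectorised and library are made up for the example.

```python
import math
import numpy as np

x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]

# Pure-Python version: sum(w_i * x_i) / sum(w_i).
manual = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

# The same formula with NumPy arrays.
y, w_arr = np.array(x), np.array(w)
vectorised = (w_arr * y).sum() / w_arr.sum()

# The library call from the card.
library = np.average(y, weights=w_arr)

print(manual, vectorised, library)
```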

11
Q

Calculate the harmonic mean using the statistics library.

A

hmean = statistics.harmonic_mean(x)

12
Q

What happens if you input the following for a harmonic mean:

nan value
0
negative number

A

nan value: returns nan
0: returns 0
negative number: raises statistics.StatisticsError
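
The three cases can be checked directly. This is a minimal sketch using made-up input lists; the behaviour shown is what CPython's statistics module does for these inputs.

```python
import math
import statistics

# A nan anywhere in the data propagates to the result.
with_nan = statistics.harmonic_mean([8.0, 1, math.nan, 4])

# A zero anywhere in the data makes the harmonic mean 0.
with_zero = statistics.harmonic_mean([8.0, 1, 0, 4])

# A negative value is rejected with StatisticsError.
try:
    statistics.harmonic_mean([8.0, 1, -2, 4])
    negative_raised = False
except statistics.StatisticsError:
    negative_raised = True

print(with_nan, with_zero, negative_raised)
```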

13
Q

Calculate the geometric mean

A

gmean = statistics.geometric_mean(x)
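
As a reminder of what the function computes, the geometric mean of n values is the n-th root of their product, or equivalently the exponential of the mean of their logarithms. The sketch below compares both hand-written forms with statistics.geometric_mean(), assuming Python 3.8 or later for math.prod(). The names by_root, by_logs and library are illustrative only.

```python
import math
import statistics

x = [8.0, 1, 2.5, 4, 28.0]

# n-th root of the product of the values.
by_root = math.prod(x) ** (1 / len(x))

# Exponential of the average logarithm (more robust for long lists).
by_logs = math.exp(sum(math.log(v) for v in x) / len(x))

# The library call from the card.
library = statistics.geometric_mean(x)

print(by_root, by_logs, library)
```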

14
Q

What is the main difference between the behaviour of the mean and median?

A

The main difference between the behavior of the mean and median is how they respond to outliers or extremes in the dataset. The mean is heavily affected by outliers, whereas the median depends on them only slightly or not at all.
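
A quick numerical illustration, using the list from the following cards with and without its largest value (28.0). The variable names are chosen for this example.

```python
import math
import statistics

x = [1, 2.5, 4, 8.0, 28.0]
x_without_outlier = x[:-1]  # drop 28.0

mean_with = statistics.mean(x)
mean_without = statistics.mean(x_without_outlier)

median_with = statistics.median(x)
median_without = statistics.median(x_without_outlier)

# Removing one extreme value moves the mean far more than the median.
print("mean:  ", mean_with, "->", mean_without)
print("median:", median_with, "->", median_without)
```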

15
Q

x is [1, 2.5, 4, 8.0, 28.0]

Find the median of the list x

A

median_ = statistics.median(x)

16
Q

x is [1, 2.5, 4, 8.0, 28.0]. Slice the list so you remove the 28.0 and find the median.

A

median_ = statistics.median(x[:-1])

17
Q

If the number of elements is even, there are two middle values. Remove the last element from this list so that four values remain, then find the lower median:

x is [1, 2.5, 4, 8.0, 28.0]

A

statistics.median_low(x[:-1])

18
Q

If the number of elements is even, there are two middle values. Remove the last element from this list so that four values remain, then find the higher median:

x is [1, 2.5, 4, 8.0, 28.0]

A

statistics.median_high(x[:-1])
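
To compare the three median functions side by side on the four remaining values, a minimal sketch (the names low, high and middle are made up here):

```python
import statistics

x = [1, 2.5, 4, 8.0, 28.0]
even = x[:-1]  # [1, 2.5, 4, 8.0], an even number of elements

low = statistics.median_low(even)    # smaller of the two middle values
high = statistics.median_high(even)  # larger of the two middle values
middle = statistics.median(even)     # average of the two middle values

print(low, high, middle)
```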

19
Q

Calculate the mode of a list u, returning a single value.

A

mode_ = statistics.mode(u)

20
Q

Calculate the mode returning all modes

A

mode_ = statistics.multimode(u)
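
The cards do not define u, so the sketch below uses two hypothetical lists, u and v, chosen only to show the difference between the two functions. It assumes Python 3.8 or later, where statistics.mode() returns the first mode encountered instead of raising an error when there is a tie.

```python
import statistics

# Hypothetical data, not taken from the cards.
u = [2, 3, 2, 8, 12]               # one clear mode
v = [12, 15, 12, 15, 21, 15, 12]   # 12 and 15 both appear three times

single_u = statistics.mode(u)
all_u = statistics.multimode(u)

# With a tie, mode() returns the first mode it meets,
# while multimode() returns every mode in order of first appearance.
single_v = statistics.mode(v)
all_v = statistics.multimode(v)

print(single_u, all_u, single_v, all_v)
```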

21
Q

Calculate the mode using the following series (finish the code):

u, v, w = pd.Series(u), pd.Series(v), pd.Series(

A

u, v, w = pd.Series(u), pd.Series(v), pd.Series([2, 2, math.nan])
u.mode()
v.mode()
w.mode()

22
Q

Calculate the variance

A

var_ = statistics.variance(x)

23
Q

Calculate the variance using NumPy

A

var_ = np.var(y, ddof=1)
OR
var_ = y.var(ddof=1)
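
The ddof=1 argument is what makes NumPy return the sample variance: the sum of squared deviations is divided by n - ddof, and NumPy's default is ddof=0 (the population variance). A minimal sketch of the arithmetic, using the x from the earlier cards and names introduced for this example:

```python
import math
import statistics
import numpy as np

x = [8.0, 1, 2.5, 4, 28.0]
y = np.array(x)

n = len(x)
mean_ = sum(x) / n
squared_deviations = sum((v - mean_) ** 2 for v in x)

sample_var = squared_deviations / (n - 1)  # matches ddof=1
population_var = squared_deviations / n    # matches ddof=0, NumPy's default

print(sample_var, statistics.variance(x), np.var(y, ddof=1))
print(population_var, statistics.pvariance(x), np.var(y))
```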

24
Q

Calculate the variance of an array that contains nan values, ignoring the nans.

A

np.nanvar(y_with_nan, ddof=1)

25
Q

Calculate the variance with pandas (it skips nan values by default).

A

z_with_nan.var(ddof=1)
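
The sketch below puts the nan behaviours side by side, using the x_with_nan list from the earlier cards: np.var() propagates nan, while np.nanvar() and the pandas method skip it. The result names are made up for this example.

```python
import math
import numpy as np
import pandas as pd

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
y_with_nan = np.array(x_with_nan)
z_with_nan = pd.Series(x_with_nan)

plain_numpy = np.var(y_with_nan, ddof=1)         # nan propagates
nan_aware_numpy = np.nanvar(y_with_nan, ddof=1)  # nan ignored
pandas_var = z_with_nan.var(ddof=1)              # skipna=True by default

print(plain_numpy, nan_aware_numpy, pandas_var)
```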

26
Q

Calculate the standard deviation

A

std_ = statistics.stdev(x)

27
Q

Use numpy to calculate standard deviation

A

np.std(y, ddof=1)
OR

y.std(ddof=1)
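
The standard deviation is the positive square root of the variance, so the calls above should agree with the square root of the sample variance. A brief check, with names chosen for this example:

```python
import math
import statistics
import numpy as np

x = [8.0, 1, 2.5, 4, 28.0]
y = np.array(x)

# Square root of the sample variance.
from_variance = math.sqrt(statistics.variance(x))

std_statistics = statistics.stdev(x)
std_numpy = np.std(y, ddof=1)

print(from_variance, std_statistics, std_numpy)
```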

28
Q

Use this list to find the sample quartiles (the 25th, 50th and 75th percentiles).

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]

A

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]
statistics.quantiles(x, n=4, method='inclusive')
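
With n=4 the function returns three cut points, which split the data into four equal groups. The sketch below shows the result for this list and, as a cross-check, compares it with np.percentile(), whose default linear interpolation matches the 'inclusive' method for this data. The names quartiles and numpy_quartiles are introduced for the example.

```python
import math
import statistics
import numpy as np

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]

# Three cut points: the 25th, 50th and 75th percentiles.
quartiles = statistics.quantiles(x, n=4, method='inclusive')

# NumPy's default (linear) interpolation for the same percentiles.
numpy_quartiles = np.percentile(np.array(x), [25, 50, 75])

print(quartiles)
print(numpy_quartiles)
```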

29
Q

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]

y = np.array(x)

For the array y, find the 5th percentile.
Find the 95th percentile.

A

# 5th percentile
np.percentile(y, 5)

# 95th percentile
np.percentile(y, 95)
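
As a sketch of how these calls relate to each other: np.percentile() accepts a sequence of percentiles in a single call, and np.quantile() does the same job with fractions between 0 and 1 in place of percentages. The names p5, p95, both and as_quantiles are made up for this example.

```python
import math
import numpy as np

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]
y = np.array(x)

p5 = np.percentile(y, 5)
p95 = np.percentile(y, 95)

# Several percentiles in one call.
both = np.percentile(y, [5, 95])

# The same values expressed as quantiles (fractions, not percentages).
as_quantiles = np.quantile(y, [0.05, 0.95])

print(p5, p95, both, as_quantiles)
```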

30
Q

Find the 25th, 50th and 75th percentiles of an array with nan values, ignoring the nans.

A

np.nanpercentile(y_with_nan, [25, 50, 75])
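
The card does not say which y_with_nan is meant, so this sketch assumes it is the percentile list from the previous cards with one nan inserted (an assumption made for illustration). It contrasts np.percentile(), which propagates nan, with np.nanpercentile(), which ignores it.

```python
import numpy as np

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]

# Assumed construction: insert one nan into the array at position 2.
y_with_nan = np.insert(np.array(x), 2, np.nan)

propagated = np.percentile(y_with_nan, [25, 50, 75])   # all nan
ignored = np.nanpercentile(y_with_nan, [25, 50, 75])   # nan skipped

print(propagated)
print(ignored)
```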

31
Q

Build the matrix of correlation coefficients for the following arrays:

np.array([14.2, 16.4, 15.2, 22.6, 17.2])
np.array([215, 325, 332, 445, 408])

A

corr_matrix = np.corrcoef(np.array([14.2, 16.4, 15.2, 22.6, 17.2]), np.array([215, 325, 332, 445, 408]))
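
Note that np.corrcoef() returns correlation coefficients, not covariances; np.cov() gives the covariance matrix. The two are related: each correlation is the covariance divided by the product of the two standard deviations. A minimal sketch of that relationship, with the names a, b and r_manual introduced for this example:

```python
import math
import numpy as np

a = np.array([14.2, 16.4, 15.2, 22.6, 17.2])
b = np.array([215, 325, 332, 445, 408])

cov = np.cov(a, b)        # 2x2 covariance matrix (sample, ddof=1)
corr = np.corrcoef(a, b)  # 2x2 correlation matrix

# Off-diagonal correlation rebuilt from the covariance matrix:
# cov(a, b) / (std(a) * std(b)).
r_manual = cov[0, 1] / math.sqrt(cov[0, 0] * cov[1, 1])

print(cov)
print(corr)
print(r_manual)
```

The diagonal of the correlation matrix is always 1, since each array is perfectly correlated with itself.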