Lesson 15 Statistics Flashcards

Question 1

Q

Import the packages for maths, statistics, NumPy, SciPy and pandas.

Answer

A

import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

Question 2

Q

Create a list inserting a nan between 2.5 and 4

x = [8.0, 1, 2.5, 4, 28.0]

Answer

A

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

Question 3

Q

What are the three different ways of getting a nan value?

Answer

A

float('nan')
math.nan
np.nan
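
A small sketch of how the three spellings behave. The names a, b and c are only for illustration. All three produce an ordinary float nan, and nan never compares equal to itself, so math.isnan() is the dependable check.

```python
import math
import numpy as np

# Three ways to produce the same kind of value: a float nan.
a, b, c = float('nan'), math.nan, np.nan

# nan is not equal to anything, including itself.
equal_to_itself = (a == a)

# math.isnan() is the reliable way to detect it.
all_nan = all(math.isnan(v) for v in (a, b, c))

print(equal_to_itself, all_nan)  # False True
```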

Question 4

Q

Create np.ndarray and pd.Series objects that correspond to x and x_with_nan from the following lists:

x = [8.0, 1, 2.5, 4, 28.0]

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

Answer

A

y, y_with_nan = np.array(x), np.array(x_with_nan)
z, z_with_nan = pd.Series(x), pd.Series(x_with_nan)

Question 5

Q

Find the mean using Python's built-in statistics library.

Answer

A

mean_ = statistics.mean(x)
mean_

Question 6

Q

What is another function to calculate the mean?

Answer

A

mean_ = statistics.fmean(x)
mean_

Question 7

Q

What value will the mean return if there are nan values present?

Answer

A

nan (statistics.mean(), statistics.fmean() and np.mean() all return nan if any value is nan)

Question 8

Q

How do you calculate the mean with NumPy?

Answer

A

mean_ = np.mean(y)
mean_

Question 9

Q

Write the code to calculate the mean while ignoring any nan values.

Answer

A

np.nanmean(y_with_nan)
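
To make the difference concrete, here is a minimal sketch using the x_with_nan list from Question 2. The variable names plain, ignored and pandas_mean are only for illustration.

```python
import math
import numpy as np
import pandas as pd

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
y_with_nan = np.array(x_with_nan)
z_with_nan = pd.Series(x_with_nan)

plain = np.mean(y_with_nan)       # nan propagates into the result
ignored = np.nanmean(y_with_nan)  # nan is skipped: 43.5 / 5
pandas_mean = z_with_nan.mean()   # pandas skips nan by default (skipna=True)

print(plain, ignored, pandas_mean)
```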

Question 10

Q

Calculate the weighted mean of a NumPy array or pandas Series, given:

x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]

y, z, w = np.array(x), pd.Series(x), np.array(w)

Answer

A

wmean = np.average(y, weights=w)
OR
wmean = np.average(z, weights=w)
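
As a check on what np.average() computes, the sketch below compares it with the formula sum(w * x) / sum(w), using the same x and w. The name manual is only for illustration.

```python
import numpy as np

x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]
y, w = np.array(x), np.array(w)

# Weighted mean by hand: each value times its weight, divided by the total weight.
manual = (y * w).sum() / w.sum()

# The same result from NumPy.
wmean = np.average(y, weights=w)

print(manual, wmean)  # both approximately 6.95
```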

Question 11

Q

Calculate the harmonic mean using statistics library

Answer

A

hmean = statistics.harmonic_mean(x)

Question 12

Q

What happens if you input the following for a harmonic mean:

nan value
0
negative number

Answer

A

A nan value returns nan
A 0 returns 0
A negative number raises statistics.StatisticsError
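
The three cases can be tried directly. This is a sketch with made-up input lists; the names with_nan, with_zero and negative_result are only for illustration.

```python
import math
import statistics

# A nan in the data propagates to the result.
with_nan = statistics.harmonic_mean([8.0, 1, math.nan])

# A zero in the data makes the harmonic mean 0.
with_zero = statistics.harmonic_mean([8.0, 1, 0])

# A negative value is rejected with StatisticsError.
try:
    statistics.harmonic_mean([8.0, 1, -2])
    negative_result = "no error"
except statistics.StatisticsError:
    negative_result = "StatisticsError"

print(with_nan, with_zero, negative_result)
```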

Question 13

Q

Calculate the geometric mean

Answer

A

gmean = statistics.geometric_mean(x)
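
For reference, the geometric mean of n values is the n-th root of their product. The sketch below checks that against statistics.geometric_mean(), assuming Python 3.8 or later (both math.prod and geometric_mean were added in 3.8). The name manual is only for illustration.

```python
import math
import statistics

x = [8.0, 1, 2.5, 4, 28.0]

# n-th root of the product of all n values.
manual = math.prod(x) ** (1 / len(x))

gmean = statistics.geometric_mean(x)

print(manual, gmean)  # both approximately 4.678
```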

Question 14

Q

What is the main difference between the behaviour of the mean and the median?

Answer

A

The main difference is how they respond to outliers (extreme values). The mean is heavily affected by outliers, while the median is affected only slightly or not at all.
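
A short sketch of that effect, using the list from the median questions with and without its largest value, 28.0. The variable names are only for illustration.

```python
import statistics

x = [1, 2.5, 4, 8.0, 28.0]

# With the outlier 28.0 included.
mean_all = statistics.mean(x)      # 8.7
median_all = statistics.median(x)  # 4

# With the outlier removed.
mean_trim = statistics.mean(x[:-1])      # 3.875
median_trim = statistics.median(x[:-1])  # 3.25

# The mean moves by almost 5; the median moves by less than 1.
print(mean_all - mean_trim, median_all - median_trim)
```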

Question 15

Q

x is [1, 2.5, 4, 8.0, 28.0]

Find the median of the list x

Answer

A

median_ = statistics.median(x)

Question 16

Q

x is [1, 2.5, 4, 8.0, 28.0]. Slice the list so you remove the 28.0 and find the median.

Answer

A

median_ = statistics.median(x[:-1])

Question 17

Q

If the number of elements is even, there are two middle values. Slice off the last element of x so that its length is even, then find the lower of the two middle values (the low median):

x is [1, 2.5, 4, 8.0, 28.0]

Answer

A

statistics.median_low(x[:-1])

Question 18

Q

If the number of elements is even, there are two middle values. Slice off the last element of x so that its length is even, then find the higher of the two middle values (the high median):

x is [1, 2.5, 4, 8.0, 28.0]

Answer

A

statistics.median_high(x[:-1])
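
Putting the three median functions side by side on the same four-element slice shows how they relate. The variable names are only for illustration.

```python
import statistics

x = [1, 2.5, 4, 8.0, 28.0]
even = x[:-1]  # [1, 2.5, 4, 8.0], two middle values: 2.5 and 4

low = statistics.median_low(even)    # 2.5, the smaller middle value
high = statistics.median_high(even)  # 4, the larger middle value
mid = statistics.median(even)        # 3.25, the average of the two

print(low, mid, high)
```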

Question 19

Q

Calculate the mode of a list u, returning a single value.

Answer

A

mode_ = statistics.mode(u)

Question 20

Q

Calculate the mode of a list u, returning all modes.

Answer

A

mode_ = statistics.multimode(u)
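
The cards do not show what u contains, so the sketch below uses made-up lists u and v (an assumption, not the deck's data) to show the difference between the two functions. It assumes Python 3.8 or later, where multimode() exists and mode() returns the first mode it meets instead of raising an error when there is a tie.

```python
import statistics

# Hypothetical data: u has one mode, v has two values tied for most frequent.
u = [2, 3, 2, 8, 12]
v = [12, 15, 12, 15, 21, 15, 12]

single = statistics.mode(u)           # 2
first_of_tie = statistics.mode(v)     # 12, the first mode encountered
all_modes = statistics.multimode(v)   # [12, 15]

print(single, first_of_tie, all_modes)
```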

Question 21

Q

Calculate the mode using the following series (finish the code):

u, v, w = pd.Series(u), pd.Series(v), pd.Series(

Answer

A

u, v, w = pd.Series(u), pd.Series(v), pd.Series([2, 2, math.nan])
u.mode()
v.mode()
w.mode()

Question 22

Q

Calculate the variance

Answer

A

var_ = statistics.variance(x)

Question 23

Q

Calculate the variance using NumPy

Answer

A

var_ = np.var(y, ddof=1)
OR
var_ = y.var(ddof=1)
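
The ddof=1 argument matters because NumPy defaults to ddof=0, which gives the population variance (divide by n) instead of the sample variance (divide by n - 1). A minimal sketch of the difference, with illustrative variable names:

```python
import statistics
import numpy as np

x = [8.0, 1, 2.5, 4, 28.0]
y = np.array(x)

sample_var = np.var(y, ddof=1)  # divides by n - 1, approximately 123.2
population_var = np.var(y)      # default ddof=0 divides by n, approximately 98.56

# The statistics module has one function for each convention.
print(sample_var, statistics.variance(x))
print(population_var, statistics.pvariance(x))
```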

Question 24

Q

Calculate the variance of an array that contains nan values, ignoring the nans.

Answer

A

np.nanvar(y_with_nan, ddof=1)

Question 25

Q

Calculate the variance with pandas (it skips nan values by default).

Answer

A

z_with_nan.var(ddof=1)
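
A sketch comparing the three behaviours on the same data: np.var() propagates nan, np.nanvar() skips it, and the pandas method skips it by default (skipna=True). The variable names are only for illustration.

```python
import math
import numpy as np
import pandas as pd

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
y_with_nan = np.array(x_with_nan)
z_with_nan = pd.Series(x_with_nan)

propagated = np.var(y_with_nan, ddof=1)     # nan
skipped = np.nanvar(y_with_nan, ddof=1)     # computed from the 5 real values
pandas_var = z_with_nan.var(ddof=1)         # same as np.nanvar here

print(propagated, skipped, pandas_var)
```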

Question 26

Q

Calculate the standard deviation

Answer

A

std_ = statistics.stdev(x)

Question 27

Q

Use numpy to calculate standard deviation

Answer

A

np.std(y, ddof=1)
OR

y.std(ddof=1)
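
The standard deviation is the square root of the variance, so the results from Questions 22 and 23 can be used as a check. A minimal sketch with illustrative variable names:

```python
import math
import statistics
import numpy as np

x = [8.0, 1, 2.5, 4, 28.0]
y = np.array(x)

std_stats = statistics.stdev(x)                   # sample standard deviation
std_numpy = np.std(y, ddof=1)                     # same convention in NumPy
std_from_var = math.sqrt(statistics.variance(x))  # square root of the sample variance

print(std_stats, std_numpy, std_from_var)  # all approximately 11.1
```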

Question 28

Q

Use this list to show the sample 25th and 75th percentiles.

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]

Answer

A

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]
statistics.quantiles(x, n=4, method='inclusive')
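
With n=4 the function returns three cut points, so the 25th and 75th percentiles are the first and last items of the result. A sketch of picking them out; the names quartiles, q1, q2 and q3 are only for illustration.

```python
import statistics

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]

# n=4 splits the data into four intervals, giving three cut points.
quartiles = statistics.quantiles(x, n=4, method='inclusive')
q1, q2, q3 = quartiles  # 25th, 50th and 75th percentiles

print(quartiles)  # [0.1, 8.0, 21.0]
```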

Question 29

Q

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]

y = np.array(x)

For the array y, find the 5th percentile and the 95th percentile.

Answer

A

# find the 5th percentile
np.percentile(y, 5)

# find the 95th percentile
np.percentile(y, 95)
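
By default np.percentile() interpolates linearly between the two nearest sorted values. The sketch below reproduces the 5th percentile by hand to show where the number comes from; pos, lo and manual_p5 are illustrative names, and the hand calculation relies on y already being sorted.

```python
import numpy as np

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]
y = np.array(x)

p5 = np.percentile(y, 5)
p95 = np.percentile(y, 95)

# Position of the 5th percentile on the 0..n-1 index scale: 0.05 * 8 = 0.4
pos = 0.05 * (len(y) - 1)
lo = int(np.floor(pos))

# Linear interpolation between y[lo] and y[lo + 1].
manual_p5 = y[lo] + (pos - lo) * (y[lo + 1] - y[lo])

print(p5, manual_p5, p95)  # approximately -3.44, -3.44, 34.92
```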

Question 30

Q

Find the 25th, 50th and 75th percentiles of an array with nan values.

Answer

A

np.nanpercentile(y_with_nan, [25, 50, 75])

Question 31

Q

Make a matrix of correlation coefficients from the following arrays:

np.array([14.2, 16.4, 15.2, 22.6, 17.2])
np.array([215, 325, 332, 445, 408])

Answer

A

corr_matrix = np.corrcoef(np.array([14.2, 16.4, 15.2, 22.6, 17.2]), np.array([215, 325, 332, 445, 408]))
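
Note that np.corrcoef() returns correlation coefficients, not covariances; np.cov() is the covariance counterpart. The sketch below shows how the two relate: the correlation is the covariance divided by the product of the two standard deviations. The names a, b, cov, corr and manual_r are only for illustration.

```python
import numpy as np

a = np.array([14.2, 16.4, 15.2, 22.6, 17.2])
b = np.array([215, 325, 332, 445, 408])

cov = np.cov(a, b)        # 2x2 covariance matrix (variances on the diagonal)
corr = np.corrcoef(a, b)  # 2x2 correlation matrix (ones on the diagonal)

# Correlation = covariance / (std of a * std of b)
manual_r = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

print(corr[0, 1], manual_r)
```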
