DA (Python) Flashcards
Mean (Pandas)
All Means
Mean for a column
df.mean()
df['a'].mean()
Mode (Pandas)
All Modes
Mode for a column
df.mode()
df['a'].mode()
Median (Pandas)
All Medians
Median for a column
df.median()
df['a'].median()
Standard Deviation (Pandas)
All Standard Deviations
Standard Deviation for a column
df.std()
df['a'].std()
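A small worked example can tie the four statistics cards together. The DataFrame and its column names `a` and `b` are made up for illustration. Two details worth remembering: `mode()` returns a Series because there can be several modes, and pandas `std()` gives the sample standard deviation (`ddof=1`) by default. If a frame also holds text columns, `df.mean(numeric_only=True)` restricts the calculation to the numeric ones.

```python
import pandas as pd

# Hypothetical data, only for illustration
df = pd.DataFrame({"a": [1, 2, 2, 3, 7], "b": [10, 20, 30, 40, 50]})

col_mean = df["a"].mean()      # arithmetic mean of column a
col_median = df["a"].median()  # middle value of column a
col_mode = df["a"].mode()      # a Series, because there can be several modes
col_std = df["a"].std()        # sample standard deviation (ddof=1)

print(col_mean, col_median, col_mode.tolist(), col_std)
print(df.mean())               # one mean per column
```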
Load data
import pandas as pd
df = pd.read_csv('SeaLevels.csv')
Read data
df.head()
First 5 rows
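To try the load and preview cards without a file on disk, the sketch below feeds `read_csv` an in-memory string instead. The CSV text and its column names `year` and `level` are invented stand-ins, not the contents of SeaLevels.csv.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a real file
csv_text = "year,level\n2000,1.0\n2001,1.5\n2002,2.1\n2003,2.4\n2004,3.0\n2005,3.3\n"

# read_csv accepts file-like objects as well as file paths
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head()   # first 5 rows by default
first_two = df.head(2)   # pass a number to change how many rows come back
print(first_rows)
print(df.shape)          # (number of rows, number of columns)
```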
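Dealing with duplicates
z = [1, 2, 3, 1, 4, 5, 1]
seen = set()
# keep x only the first time it appears; seen.add(x) returns None, so the `or` also records x
cz = [x for x in z if not (x in seen or seen.add(x))]
print(cz)
[1, 2, 3, 4, 5]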
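An alternative that may be easier to recall: dictionary keys are unique and keep insertion order (Python 3.7+), so `dict.fromkeys` removes duplicates while preserving the first occurrence of each value. This is a different technique from the set-based comprehension on the card, shown here only for comparison.

```python
z = [1, 2, 3, 1, 4, 5, 1]

# Build a dict whose keys are the items of z, then turn the keys back into a list
cz = list(dict.fromkeys(z))
print(cz)
```

Plain `set(z)` also removes duplicates, but a set does not guarantee the original order.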
Slices
Ranges from a list
myList = [1, 2, 3, 4, 5]
print(myList[:3])   # first three items (indices 0 to 2)
print(myList[1:])   # from index 1 to the end
print(myList[2:4])  # indices 2 and 3 (the third and fourth items)
[1, 2, 3]
[2, 3, 4, 5]
[3, 4]
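Slices also accept negative indices and an optional step. A short sketch using the same list (the variable names are illustrative):

```python
myList = [1, 2, 3, 4, 5]

last_two = myList[-2:]        # negative indices count from the end
every_other = myList[::2]     # the third slice value is the step
reversed_list = myList[::-1]  # a step of -1 walks the list backwards

print(last_two, every_other, reversed_list)
```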
Creating a dataframe
N = ['Jack', 'Jill', 'John']
H = [180, 170, 200]
S = [9, 5, 8]
df = pd.DataFrame({'N': N, 'H': H, 'S': S})
print(df)
Loop code
myList = [1, 2, 3, 4, 5, 6]
total = 0.0  # avoid the name sum, which would shadow Python's built-in sum()
for item in myList:
    total = total + item
print(total)
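The same total can be obtained without writing the loop, using Python's built-in `sum()`. This only works if the name `sum` has not been reassigned to something else earlier in the session.

```python
myList = [1, 2, 3, 4, 5, 6]

# The second argument is the starting value; 0.0 makes the result a float, as in the loop
total = sum(myList, 0.0)
print(total)
```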
Built-in functions (NumPy)
import numpy as np
# z is the list from the duplicates card: [1, 2, 3, 1, 4, 5, 1]
print(np.max(z))
print(np.min(z))
print(np.sum(z))
print(np.mean(z))
print(np.median(z))
Conditional selection
e = df[(df['N'] != 'Jack') & (df['H'] > 170)]
print(e)
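A self-contained version of the selection, rebuilt on the three-row DataFrame from the "Creating a dataframe" card. The `query` line is an optional alternative spelling of the same filter, added here for comparison.

```python
import pandas as pd

df = pd.DataFrame({"N": ["Jack", "Jill", "John"],
                   "H": [180, 170, 200],
                   "S": [9, 5, 8]})

# Each comparison yields a boolean Series; & combines them row by row.
# The parentheses are required because & binds tighter than != and >.
e = df[(df["N"] != "Jack") & (df["H"] > 170)]
print(e)

# Same filter written as a query string
q = df.query("N != 'Jack' and H > 170")
```

Only John's row passes: Jack is excluded by name, and Jill's height of 170 is not greater than 170.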
Function
def MULT_of_3(num):
    return num % 3 == 0
# multList is the list of integers defined later in this deck
mO3 = [num for num in multList if MULT_of_3(num)]
print("Multiples of 3 in the list:", mO3)
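Put together with the `multList` values that appear later in the deck, the card runs as follows. The `filter()` line is an added comparison, not part of the original card.

```python
def MULT_of_3(num):
    """Return True when num is divisible by 3."""
    return num % 3 == 0

multList = [4, 3, 6, 7, 43, 56, 453, 67, 544, 322, 37, 87, 77, 79, 36, 25, 320]

# Keep only the numbers for which the function returns True
mO3 = [num for num in multList if MULT_of_3(num)]
print("Multiples of 3 in the list:", mO3)  # [3, 6, 453, 87, 36]

# filter() applies the same test without a comprehension
same = list(filter(MULT_of_3, multList))
```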
Drop nulls
completeRows = df.dropna()
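A quick way to see what `dropna()` removes is to run it on a frame with a couple of gaps. The data below is hypothetical, and `hOnly` is an illustrative name; `subset` limits the check to the listed columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values
df = pd.DataFrame({"H": [180, np.nan, 200], "S": [9, 5, np.nan]})

completeRows = df.dropna()       # drop every row that contains any NaN
hOnly = df.dropna(subset=["H"])  # only require H to be present
missing = df.isna().sum()        # number of missing values per column

print(completeRows)
print(missing)
```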
- Select the 2nd and 3rd shoe size
- Select 1st and 2nd Name
- Find the mean height
- Find the max height
- Find the min shoe size
- Find the median shoe size
multList = [4,3,6,7,43,56,453,67,544,322,37,87,77,79,36,25,320]
print(multList[1:3])
print(multList[0:2])
print(np.mean(multList))
print(np.max(multList))
print(np.min(multList))
print(np.median(multList))
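The answers above use `multList` as a stand-in. Applied to the lists from the "Creating a dataframe" card, the same six tasks might look like the sketch below. It assumes `N` holds names, `H` heights and `S` shoe sizes, which the deck implies but never states.

```python
import numpy as np

# Assumed meaning: N = names, H = heights, S = shoe sizes
N = ["Jack", "Jill", "John"]
H = [180, 170, 200]
S = [9, 5, 8]

print(S[1:3])        # 2nd and 3rd shoe size
print(N[0:2])        # 1st and 2nd name
print(np.mean(H))    # mean height
print(np.max(H))     # max height
print(np.min(S))     # min shoe size
print(np.median(S))  # median shoe size
```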