Pandas revision from HW Flashcards
How many rows and columns does the data have?
# create a DataFrame called titanic from https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv
import pandas as pd
titanic = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')
# shape is a (rows, columns) tuple
titanic.shape
Show the data type of each series
titanic.dtypes
Count the number of survivors by gender
only_survivors = titanic[titanic['survived'] == 1]
only_survivors.groupby('sex')['survived'].count()
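A minimal sketch of the same count on a small made-up table, so it runs without downloading the data. The `toy` frame and its values are hypothetical. Because `survived` is coded 0/1, summing it per group should give the same answer as filtering first and then counting.

```python
import pandas as pd

# Hypothetical miniature of the titanic data (values invented for illustration)
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female"],
    "survived": [0, 1, 1, 1, 0],
})

# Approach from the flashcard: keep survivors only, then count rows per sex
only_survivors = toy[toy["survived"] == 1]
by_filter = only_survivors.groupby("sex")["survived"].count()

# Alternative: survived is 0/1, so the per-group sum is the survivor count
by_sum = toy.groupby("sex")["survived"].sum()

print(by_filter)
print(by_sum)
```

One difference to keep in mind: the filter-then-count version drops any group with no survivors, while the sum version keeps it with a value of 0.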
Show the total number of missing values in each series
all_missing = titanic.isna()
all_missing
total_missing = all_missing.sum()
total_missing
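To see why `.sum()` gives a count here, a small sketch on an invented frame (the `toy` columns and values are assumptions, not the real data): `isna()` returns a table of True/False, and summing treats True as 1.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with some gaps (values invented for illustration)
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "deck": ["C", None, None, None],
    "sex": ["male", "female", "female", "male"],
})

all_missing = toy.isna()           # boolean frame, True where a value is missing
total_missing = all_missing.sum()  # True counts as 1, so this is a per-column total

print(total_missing)
```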
Delete the rows which do not have information about the age of the person.
# work on a copy so the original titanic DataFrame stays untouched
clean_titanic_df = titanic.copy()
# delete any rows where there is no value for the age
clean_titanic_df = clean_titanic_df.dropna(subset=['age'])
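A short sketch of what is and is not a copy, using an invented three-row frame (`toy`, `alias`, `independent` and `clean` are hypothetical names). Plain assignment only creates a second name for the same DataFrame, whereas `.copy()` and `dropna` both return new objects. `subset=['age']` limits the check to the age column, so rows with gaps elsewhere are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical frame (values invented for illustration)
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "deck": [None, "C", None],
})

alias = toy               # same object under a second name, not a copy
independent = toy.copy()  # separate object with the same contents

# dropna returns a new frame; toy itself keeps all three rows
clean = toy.dropna(subset=["age"])

print(alias is toy)
print(independent is toy)
print(len(toy), len(clean))
```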
Group the passengers into 10-year age groups.
# use np.arange to create bin edges with a step size of ten (a bin is a range of values that pandas groups together for processing)
import numpy as np
# create a new series holding each passenger's age as a 10-year range
age_sort = clean_titanic_df.sort_values('age')
bins = np.arange(0, 100, 10)
group_names = ['0-10', '11-20', '21-30', '31-40', '41-50', '51-60', '61-70', '71-80', '81-90']
age_sort['age_groups'] = pd.cut(age_sort['age'], bins, labels=group_names)
age_sort[['survived', 'sex', 'age_groups', 'age']].tail(10)
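A small sketch of how `pd.cut` assigns labels, using a handful of invented ages (the `ages` series is hypothetical). Ten edges give nine bins, so there must be exactly nine labels. By default each bin is closed on the right, so an age of exactly 10 lands in '0-10', not '11-20'.

```python
import numpy as np
import pandas as pd

# Hypothetical ages chosen to sit on and around the bin edges
ages = pd.Series([0.5, 10.0, 10.5, 29.0, 80.0])

bins = np.arange(0, 100, 10)  # edges 0, 10, ..., 90 -> 9 intervals
group_names = ["0-10", "11-20", "21-30", "31-40", "41-50",
               "51-60", "61-70", "71-80", "81-90"]

# Intervals are right-closed by default: (0, 10], (10, 20], ...
age_groups = pd.cut(ages, bins, labels=group_names)

print(age_groups.tolist())
# ['0-10', '0-10', '11-20', '21-30', '71-80']
```

With these defaults an age of exactly 0 would fall outside the first bin and come back as NaN; passing `include_lowest=True` to `pd.cut` is one way to include it.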
How many passengers are male in each age category?
def count_male(x):
    return (x == 'male').sum()
count_male_df = age_sort.groupby('age_groups')['sex'].apply(count_male)
count_male_df
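The helper function is one way to do this. As a sketch of an alternative, on an invented frame (the `toy` data and its plain-string `age_groups` column are assumptions), the comparison can be done first and the resulting booleans summed per group, which avoids `apply`.

```python
import pandas as pd

# Hypothetical frame (values invented for illustration)
toy = pd.DataFrame({
    "age_groups": ["21-30", "21-30", "31-40", "31-40", "31-40"],
    "sex": ["male", "female", "male", "male", "female"],
})

def count_male(x):
    # x is the 'sex' values of one age group
    return (x == "male").sum()

# Approach from the flashcard
with_apply = toy.groupby("age_groups")["sex"].apply(count_male)

# Alternative: build the True/False series first, then sum it per age group
without_apply = (toy["sex"] == "male").groupby(toy["age_groups"]).sum()

print(with_apply)
print(without_apply)
```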
How many passengers in each age category travelled in first class?
def first_class(x):
    return (x == 1).sum()
first_class_df = age_sort.groupby('age_groups')['pclass'].apply(first_class)
first_class_df