Pandas revision from HW Flashcards

Question 1

Q

How many rows and columns does the data have?

Answer

A

titanic.shape
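`DataFrame.shape` returns a `(rows, columns)` tuple, so you can unpack it directly into two names. Here is a minimal sketch that uses a small hypothetical frame in place of `titanic`. The values are made up and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the titanic DataFrame (made-up values)
df = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, 38.0, None, 35.0],
})

# shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")  # 4 rows, 3 columns
```

`len(df)` gives the same row count as `df.shape[0]`.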

Question 2

Q

Create a DataFrame called titanic from the CSV file at https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv

Answer

A

import pandas as pd
titanic = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')
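`pd.read_csv` accepts any file-like object as well as a URL or a path. That means you can check the parsing offline by passing a few hand-written lines through `io.StringIO`. This is a minimal sketch: the rows below are hypothetical and only imitate a few columns of the real file.

```python
import io
import pandas as pd

# Hypothetical CSV text imitating a few columns of the seaborn titanic file
csv_text = """survived,pclass,sex,age
0,3,male,22.0
1,1,female,38.0
1,3,female,
"""

# read_csv parses a file-like object the same way it parses a URL or path;
# the empty age field in the last row becomes NaN
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.head())
```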

Question 3

Q

Show the data type of each column (Series)

Answer

A

titanic.dtypes
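`dtypes` is itself a Series indexed by column name. You can combine it with `select_dtypes` to pick out only the columns of a given kind. Here is a minimal sketch on a small hypothetical frame with made-up values:

```python
import pandas as pd

# Hypothetical stand-in for titanic (made-up values)
df = pd.DataFrame({
    "survived": [0, 1, 1],
    "sex": ["male", "female", "female"],
    "age": [22.0, 38.0, 26.0],
})

# dtypes maps each column name to its data type
print(df.dtypes)

# select_dtypes filters columns by dtype, here keeping only numeric ones
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['survived', 'age']
```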

Question 4

Q

Count the number of survivors by gender

Answer

A

# Keep only the passengers who survived (survived == 1), then count them per sex
only_survivors = titanic[titanic['survived'] == 1]
only_survivors.groupby('sex')['survived'].count()
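The `survived` column holds 1 for survivors and 0 otherwise. Summing it per group therefore gives the same counts without the filtering step. Here is a minimal sketch that compares both approaches on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows; survived is 1 for survived and 0 for died
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female"],
    "survived": [0, 1, 1, 1, 0],
})

# Approach from the card: filter survivors first, then count per sex
filtered = df[df["survived"] == 1].groupby("sex")["survived"].count()

# Shortcut: because survived is 0/1, summing it counts the survivors
summed = df.groupby("sex")["survived"].sum()

print(filtered)
print(summed)
```

The two results differ in one edge case. If a group had no survivors, the filtered version omits that group entirely, while the summed version reports 0 for it.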

Question 5

Q

Show the total number of missing values in each column (Series)

Answer

A

all_missing = titanic.isna()
all_missing
total_missing = all_missing.sum()
total_missing
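You can chain the two steps into one expression. Calling `mean()` instead of `sum()` on the boolean frame gives the fraction of missing values in each column, because `True` counts as 1 and `False` as 0. Here is a minimal sketch on a hypothetical frame with made-up gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with some missing values (made-up data)
df = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, np.nan],
    "deck": [np.nan, "C", np.nan, np.nan],
    "sex": ["male", "female", "female", "male"],
})

# Count missing values per column in one chained expression
missing_counts = df.isna().sum()

# The mean of a boolean frame is the share of True values per column
missing_share = df.isna().mean()

# Summing twice gives the grand total across all columns
total = df.isna().sum().sum()

print(missing_counts)
print(missing_share)
```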

Question 6

Q

Delete the rows which do not have information about the age of the person.

Answer

A

# Start from the titanic DataFrame (dropna below returns a new DataFrame)
clean_titanic_df = titanic

# Delete any rows where there is no value for age
clean_titanic_df = clean_titanic_df.dropna(subset=['age'])
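The `subset` argument matters here. With `subset=['age']`, only the age column is checked, so rows with gaps in other columns are kept. Without `subset`, a row is dropped if any column is missing. Here is a minimal sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: every row is missing either age or deck
df = pd.DataFrame({
    "age": [22.0, np.nan, 26.0],
    "deck": [np.nan, "C", np.nan],
})

# Only the age column is checked; missing deck values are ignored
with_age = df.dropna(subset=["age"])

# Equivalent boolean mask: keep rows where age is not missing
with_age_mask = df[df["age"].notna()]

# Without subset, any missing value drops the row (here: every row)
all_dropped = df.dropna()

print(with_age)
```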

Question 7

Q

Group the passengers by age into 10-year age groups.

Answer

A

Use np.arange to create bin edges with a step size of 10. A bin is an interval of values; pd.cut assigns each value to the bin it falls into so that the values can be grouped and processed together.

import numpy as np

# Sort by age so that tail(10) below shows the oldest passengers
age_sort = clean_titanic_df.sort_values('age')

# Edges 0, 10, ..., 90 give 9 right-closed intervals: (0, 10], (10, 20], ..., (80, 90]
bins = np.arange(0, 100, 10)
group_names = ['0-10', '11-20', '21-30', '31-40', '41-50', '51-60', '61-70', '71-80', '81-90']

# Create a new column holding each passenger's age range
age_sort['age_groups'] = pd.cut(age_sort['age'], bins, labels=group_names)
age_sort[['survived', 'sex', 'age_groups', 'age']].tail(10)
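By default, `pd.cut` builds intervals that are closed on the right. An age of exactly 10 therefore falls in the first bin, and an age of exactly 0 falls outside every bin unless you pass `include_lowest=True`. Here is a minimal sketch using the same bins and labels on a few hypothetical ages:

```python
import numpy as np
import pandas as pd

bins = np.arange(0, 100, 10)  # edges 0, 10, ..., 90 -> 9 intervals
group_names = ["0-10", "11-20", "21-30", "31-40", "41-50",
               "51-60", "61-70", "71-80", "81-90"]

# Hypothetical ages chosen to hit the interval edges
ages = pd.Series([0.42, 10.0, 10.5, 80.0, 0.0])

# Right-closed intervals: 10.0 lands in (0, 10], and 0.0 in no bin (NaN)
groups = pd.cut(ages, bins, labels=group_names)

# include_lowest=True makes the first interval include its left edge
groups_incl = pd.cut(ages, bins, labels=group_names, include_lowest=True)

print(pd.DataFrame({"age": ages, "group": groups, "group_incl": groups_incl}))
```

The labels are plain strings. An age such as 10.5 is labelled '11-20' because the underlying interval is (10, 20].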

Question 8

Q

How many passengers are male in each age category?

Answer

A

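# Count how many values in a group are 'male'
def count_male(x):
    return (x == 'male').sum()

# observed=False keeps age groups with no passengers in the result (count 0)
count_male_df = age_sort.groupby('age_groups', observed=False)['sex'].apply(count_male)
count_male_df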
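You don't need a helper function for this. You can turn the comparison into a 0/1 column and sum it per group. Alternatively, `pd.crosstab` shows male and female counts side by side. Here is a minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows with a categorical age_groups column like the card's
df = pd.DataFrame({
    "age_groups": pd.Categorical(
        ["0-10", "11-20", "11-20", "21-30", "21-30"],
        categories=["0-10", "11-20", "21-30", "31-40"],
    ),
    "sex": ["male", "female", "male", "male", "male"],
})

# 1 where the passenger is male, 0 otherwise; the sum per group counts males.
# observed=False keeps the empty 31-40 group with a count of 0.
male_counts = (
    df["sex"].eq("male").astype(int)
    .groupby(df["age_groups"], observed=False)
    .sum()
)

# crosstab counts every (age group, sex) combination
table = pd.crosstab(df["age_groups"], df["sex"])

print(male_counts)
print(table)
```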

Question 9

Q

How many passengers in each age group travelled in first class?

Answer

A

# Count how many values in a group equal 1 (first class)
def first_class(x):
    return (x == 1).sum()

first_class_df = age_sort.groupby('age_groups', observed=False)['pclass'].apply(first_class)
first_class_df
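This sketch shows the overall total next to the per-group breakdown and uses `value_counts` to count every class at once. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows (made-up values)
df = pd.DataFrame({
    "age_groups": ["0-10", "11-20", "11-20", "21-30"],
    "pclass": [1, 3, 1, 1],
})

# Total number of first-class passengers in the whole frame
total_first = (df["pclass"] == 1).sum()

# Per age group, without a helper function: sum a 0/1 indicator
first_by_group = df["pclass"].eq(1).astype(int).groupby(df["age_groups"]).sum()

# value_counts gives the number of passengers in every class
class_counts = df["pclass"].value_counts()

print(total_first)
print(first_by_group)
print(class_counts)
```

The card's answer uses `age_sort`, which only contains passengers with a known age. Its per-group counts can therefore add up to less than the total number of first-class passengers in `titanic`.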

(9 cards)