Pandas revision from HW Flashcards

1
Q

How many rows and columns does the data have?

A

titanic.shape
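As a quick illustration, shape is an attribute (not a method) that holds a (rows, columns) tuple. The sketch below uses a small hypothetical DataFrame named demo as a stand-in for titanic, so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical three-row stand-in for the titanic DataFrame
demo = pd.DataFrame({
    "survived": [0, 1, 1],
    "sex": ["male", "female", "female"],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = demo.shape
print(n_rows, n_cols)
```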

2
Q

Create a DataFrame named titanic from https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv

A

import pandas as pd
titanic = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')

3
Q

Show the data type of each column (Series)

A

titanic.dtypes

4
Q

Count the number of survivors by gender

A

only_survivors = titanic[titanic['survived'] == 1]
only_survivors.groupby('sex')['survived'].count()
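Because survived holds only 0 or 1, the same count can also be obtained without filtering first, by summing the column per group. A minimal sketch on a hypothetical five-row DataFrame named demo (not the real Titanic data):

```python
import pandas as pd

# Hypothetical stand-in for titanic: 1 = survived, 0 = did not survive
demo = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1],
    "sex": ["male", "female", "female", "male", "male"],
})

# Summing a 0/1 column within each group counts the survivors in that group
survivors_by_sex = demo.groupby("sex")["survived"].sum()
print(survivors_by_sex)
```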

5
Q

Show the total number of missing values in each column

A

all_missing = titanic.isna()
all_missing
total_missing = all_missing.sum()
total_missing
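To see why the two steps work, note that isna() returns a DataFrame of True/False flags, and sum() treats True as 1 when adding up each column. A small sketch on a hypothetical DataFrame named demo:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with some missing values
demo = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "deck": ["C", None, None, None],
    "sex": ["male", "female", "female", "male"],
})

# isna() flags missing cells as True; sum() counts the True values per column
missing_per_column = demo.isna().sum()
print(missing_per_column)
```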

6
Q

Delete the rows which do not have information about the age of the person.

A

clean_titanic_df = titanic

# Delete any rows that have no value for age
clean_titanic_df = clean_titanic_df.dropna(subset=['age'])
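The subset argument limits the check to the listed columns, so rows with gaps elsewhere are kept. A minimal sketch on a hypothetical DataFrame named demo:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: two rows lack an age, two rows lack a deck
demo = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "deck": ["C", None, None, "E"],
})

# Only rows with a missing age are dropped; a missing deck is tolerated
cleaned = demo.dropna(subset=["age"])
print(cleaned)
```

dropna returns a new DataFrame, so demo itself still has all four rows afterwards.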

7
Q

Group the passengers into 10-year age groups.

A

# Use np.arange to create bin edges with a step size of ten
# (a bin is a range of values that pandas groups together for processing)

import numpy as np

# Create a new column that holds each passenger's age as a 10-year range
age_sort = clean_titanic_df.sort_values('age')

bins = np.arange(0, 100, 10)
group_names = ['0-10', '11-20', '21-30', '31-40', '41-50', '51-60', '61-70', '71-80', '81-90']

age_sort['age_groups'] = pd.cut(age_sort['age'], bins, labels=group_names)
age_sort[['survived', 'sex', 'age_groups', 'age']].tail(10)
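It may help to check how pd.cut treats values that sit exactly on a bin edge. By default each interval is open on the left and closed on the right, so 10 falls into '0-10' while an age of exactly 0 falls outside every bin and becomes NaN. The ages below are hypothetical values chosen to show the edges:

```python
import numpy as np
import pandas as pd

# Hypothetical ages chosen to sit on or near the bin edges
ages = pd.Series([0, 0.5, 10, 11, 80])

bins = np.arange(0, 100, 10)  # edges 0, 10, ..., 90, giving nine intervals
group_names = ["0-10", "11-20", "21-30", "31-40", "41-50",
               "51-60", "61-70", "71-80", "81-90"]

# Default right=True: intervals are (0, 10], (10, 20], ..., (80, 90]
groups = pd.cut(ages, bins, labels=group_names)
print(groups.tolist())
```

If an age of exactly 0 should land in the first group, passing include_lowest=True to pd.cut is one option.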

8
Q

How many passengers are male in each age category?

A

def count_male(x):
    # Count how many entries in the group are 'male'
    return (x == 'male').sum()

count_male_df = age_sort.groupby('age_groups')['sex'].apply(count_male)
count_male_df
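An alternative that avoids a helper function is pd.crosstab, which tabulates both sexes per age group in one call. This is a sketch on a hypothetical DataFrame named demo, not the approach used in the answer above:

```python
import pandas as pd

# Hypothetical stand-in for age_sort with the age_groups column already built
demo = pd.DataFrame({
    "age_groups": ["21-30", "21-30", "21-30", "31-40", "31-40"],
    "sex": ["male", "male", "female", "female", "male"],
})

# Rows are age groups, columns are sexes, cells are passenger counts
counts = pd.crosstab(demo["age_groups"], demo["sex"])
print(counts["male"])
```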

9
Q

How many passengers in each age group travelled in first class?

A

def first_class(x):
    # Count how many entries in the group have pclass equal to 1
    return (x == 1).sum()

first_class_df = age_sort.groupby('age_groups')['pclass'].apply(first_class)
first_class_df
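If only the overall total is needed rather than a per-group breakdown, the same boolean-sum idea works without groupby. A minimal sketch on a hypothetical DataFrame named demo:

```python
import pandas as pd

# Hypothetical stand-in: pclass is 1, 2, or 3
demo = pd.DataFrame({"pclass": [1, 3, 1, 2, 3, 1]})

# The comparison yields True/False; sum() counts the True values
first_class_total = (demo["pclass"] == 1).sum()
print(first_class_total)
```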
