Pandas lesson 2 Flashcards
Find out how many sheets are in the excel
len(all_sheets)
Create a variable that reads only the sheet name from the 2000s
selected_sheet_only_w_index = pd.read_excel(r’C:\Users\User\Documents\CFG_DATA\Data_files\movies.xls’, sheet_name = 1)
Create a variable that reads all sheets from the excel file.
all_sheets = pd.read_excel(r’C:\Users\User\Documents\CFG_DATA\Data_files\movies.xls’, sheet_name = None)
How do I find out the number of rows in excel tab 2010
selected_sheet_2010s = pd.read_excel(r’C:\Users\User\Documents\CFG_DATA\Data_files\movies.xls’, sheet_name = ‘2010s’)
len(selected_sheet_2010s)
can also do
len(all_sheets[‘2010s’])
Create a new column where you increase the movie budget by 10%. Do this for only the first sheet.
def inc_by_10_pc(num):
return num * 1.10
movies_df[‘incrsd_budget’] = movies_df[‘Budget’].apply(inc_by_10_pc)
movies_df[[‘Budget’, ‘incrsd_budget’]]
Create a new column where you decrease the IMDB score by 1. Do this for only the first sheet.
movies_df = first_sheet_only
#adding column - subtract 1 point from IMDB Score
def minus1(num):
return num-1
movies_df[‘Decrease IMDB Score’] = movies_df[‘IMDB Score’].apply(minus1)
movies_df[[‘Decrease IMDB Score’, ‘IMDB Score’]]
Group data by year and then calculate mean gross earnings for each year. Use the movies_df
movies_mean = movies_df.groupby(‘Year’)[‘Gross Earnings’].mean()
movies_mean
Count number of movies by each country.
count_country = movies_df.groupby(‘Country’)[‘Title’].count()
count_country
Create a data series and add an index
emojis = dict(
USA = ‘🇺🇸’,
UK = ‘🇬🇧’,
France = ‘🇫🇷’,
Canada = ‘🇨🇦’,
Australia = ‘🇦🇺’,
Germany = ‘🇩🇪’,
Italy = ‘🇮🇹’,
Japan = ‘🇯🇵’,
Spain = ‘🇪🇸’ )
emojis = dict(
USA = ‘🇺🇸’,
UK = ‘🇬🇧’,
France = ‘🇫🇷’,
Canada = ‘🇨🇦’,
Australia = ‘🇦🇺’,
Germany = ‘🇩🇪’,
Italy = ‘🇮🇹’,
Japan = ‘🇯🇵’,
Spain = ‘🇪🇸’
)
flag_df = pd.Series(emojis, name=’flag’).reset_index()
flag_df
Create a new variable, movies_by_country_df where you create a new DataFrame from the series movies_by_country. The column headings should be Title and Title counts.
movies_by_country_df = movies_by_country_count.to_frame().rename(columns={“Title”: “Title counts”})
Join the two Dataframes movies_by_country and flag_df (left - Country, right - index)
movies_by_country_df.merge(flag_df, left_on=’Country’, right_on=’index’)
Join two Dataframes movies_by_country and flag_df on one column (Country)
movies_by_country_df.merge(flag_df, on=’Country’)