pandas lesson 3 Flashcards
How do I show only the first 10 rows of the duration column (use slicing)?
data['duration'][:10]
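A minimal sketch of the slicing options, using a small made-up frame in place of the lesson's data df (the values are hypothetical). Slicing the frame itself returns whole rows, while slicing a column returns a Series.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's data df (15 rows).
data = pd.DataFrame({"duration": range(100, 115), "title_year": range(2000, 2015)})

first_ten_col = data["duration"][:10]   # first 10 values of one column (Series)
first_ten_rows = data[:10]              # first 10 rows, all columns (DataFrame)
same_as_head = data.head(10)            # equivalent to data[:10]
```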
How do I check if a column has NaN values? Do this for the country column in the data df.
data.country.isna()
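isna() on its own returns one True/False per row. A short sketch, on a made-up frame, of turning that mask into a single yes/no answer or a count:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's data df.
data = pd.DataFrame({"country": ["USA", np.nan, "UK"]})

mask = data.country.isna()   # boolean Series, True where the value is missing
has_nan = mask.any()         # True if at least one value is missing
n_missing = mask.sum()       # number of missing values
```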
Replace any NaN values in the country column in the data df with an empty string.
data.country.fillna('')
Replace any NaN values in the column duration with the mean value.
data.duration = data.duration.fillna(data.duration.mean())
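A small worked example of the mean fill, on made-up values. Note that fillna() returns a new object, so the result has to be assigned back for the change to stick.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's data df.
data = pd.DataFrame({"duration": [100.0, np.nan, 140.0]})

# mean() skips NaN by default, so this is (100 + 140) / 2 = 120.
mean_duration = data["duration"].mean()

# Assign the result back; fillna() does not modify the column in place.
data["duration"] = data["duration"].fillna(mean_duration)
```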
Get rid of any rows that have a missing value.
data.dropna()
Drop rows in which every value is NA.
data.dropna(how='all')
Drop rows that have fewer than 5 non-NA values (thresh is the minimum number of non-NA values a row needs to be kept).
data.dropna(thresh=5)
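The three row-dropping modes are easy to mix up, so here is a sketch comparing them on a tiny made-up frame. A smaller thresh than the card's is used only because the example frame has three columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 has one NA, row 2 is all NA.
data = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 2.0, np.nan],
    "c": [3.0, 3.0, np.nan],
})

kept_default = data.dropna()           # drops any row with at least one NA
kept_how_all = data.dropna(how="all")  # drops only rows where every value is NA
kept_thresh = data.dropna(thresh=2)    # keeps rows with at least 2 non-NA values
```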
Drop rows that have an NA value in a specific column. Do this for the column 'title_year'.
data.dropna(subset=['title_year'])
Show rows that contain any NA values. Filter them out, don't drop them.
data[data.isna().any(axis=1)]
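The axis argument decides whether the check runs across each row or down each column. A sketch on a made-up frame showing both directions:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's data df.
data = pd.DataFrame({
    "title": ["A", "B", "C"],
    "country": ["USA", np.nan, "UK"],
    "duration": [120.0, 95.0, np.nan],
})

# axis=1 checks across each row: select rows with at least one NA.
rows_with_na = data[data.isna().any(axis=1)]

# axis=0 checks down each column: select columns with at least one NA.
cols_with_na = data.loc[:, data.isna().any(axis=0)]
```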
Drop columns that are all na values.
data.dropna(axis=1, how='all')
Drop columns that have any na values.
data.dropna(axis=1, how='any')
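A quick sketch of the two column-dropping variants on a made-up frame, where one column is partly missing and one is entirely missing:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "budget" has one NA, "notes" is entirely NA.
data = pd.DataFrame({
    "title": ["A", "B"],
    "budget": [np.nan, 10.0],
    "notes": [np.nan, np.nan],
})

no_empty_cols = data.dropna(axis=1, how="all")  # drops only "notes"
no_na_cols = data.dropna(axis=1, how="any")     # drops "budget" and "notes"
```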
Save the results to a CSV file.
data.to_csv(r'C:\Users\User\Documents\CFG_DATA\Data_files\movie_metadata.csv')
Read the csv file and ensure that duration is an integer.
data = pd.read_csv(r'C:\Users\User\Documents\CFG_DATA\Data_files\movie_metadata.csv', dtype={'duration': int})
Read the csv file and ensure that actor_2_facebook_likes is a string.
data = pd.read_csv(r'C:\Users\User\Documents\CFG_DATA\Data_files\movie_metadata.csv', dtype={'actor_2_facebook_likes': str})
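A sketch of a save-and-reload round trip with explicit dtypes. It writes to an in-memory buffer instead of the file path on the cards, and the values are made up. Passing index=False is an addition here: without it, to_csv also writes the row index, which comes back as an extra unnamed column on reload. Also note that dtype int raises an error if the column still contains missing values, so fill or drop them first.

```python
import io
import pandas as pd

# Hypothetical stand-in for the lesson's data df.
data = pd.DataFrame({"duration": [120, 95], "actor_2_facebook_likes": [936, 5000]})

# In-memory buffer used in place of a real file path.
buffer = io.StringIO()
data.to_csv(buffer, index=False)  # index=False avoids an extra unnamed column
buffer.seek(0)

# One dtype mapping can cover several columns at once.
loaded = pd.read_csv(buffer, dtype={"duration": int, "actor_2_facebook_likes": str})
```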
Change all characters in column movie_title to capital letters.
data['movie_title'].str.upper()
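Like fillna(), str.upper() returns a new Series, so a sketch that keeps the change assigns it back. The titles below are made up; missing values pass through unchanged.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's data df.
data = pd.DataFrame({"movie_title": ["Avatar", "spectre", None]})

# Assign back to keep the upper-cased titles; missing values stay missing.
data["movie_title"] = data["movie_title"].str.upper()
```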