pandas lesson 1 Flashcards

Question 1

Q

Read in excel file ‘movies’ and look at the first five columns from the sheet 1900s.

Answer

A

import pandas as pd
excel_df = pd.read_excel(‘C:/Users/User/Documents/CFG_DATA/Data_files/movies.xls’, sheet_name = ‘1900s’)
excel_df.head(5)

Question 2

Q

Show me a sample set of 5 rows from the excel file ‘movies’. Only select the following columns: [‘Title’, ‘Year’, ‘Duration’].

Answer

A

columns_to_select = [‘Title’, ‘Year’, ‘Duration’]
excel_df[columns_to_select].sample(n=5)

Question 3

Q

Filter the table to just the following columns and show the first 5 rows¶
[‘Title’, ‘Year’, ‘Genres’, ‘Language’, ‘Country’, ‘Content Rating’, ‘Budget’, ‘IMDB Score’]

Answer

A

test_df = excel_df[[‘Title’, ‘Year’, ‘Genres’, ‘Language’, ‘Country’, ‘Content Rating’, ‘Budget’, ‘IMDB Score’]]
test_df.head(5)

Question 4

Q

Check if the year 1920 is true or false

Answer

A

test_df[‘Year’] == 1920

Question 5

Q

Filter the test_df to rows where the IMDB Score is greater than 5

Answer

A

test_df[test_df[‘IMDB Score’] > 5]

Question 6

Q

Explain why two sets of square brackets are needed when filtering columns in pandas

Answer

A

The first set of square brackets ([]) contains a boolean expression that evaluates to a series of True or False values. The expression test_df[‘IMDB Score’] > 5 returns a series of True or False values depending on whether the ‘IMDB Score’ in each row is greater than 5 or not.

The second set of square brackets ([]) is used to select the rows of the dataframe where the boolean expression in the first set of square brackets evaluates to True. In other words, it selects all the rows where the ‘IMDB Score’ is greater than 5.

The reason why we need two sets of square brackets is because the first set returns a boolean series, which cannot be used to index a dataframe directly. However, by passing the boolean series inside another set of square brackets, we can use it to select only the rows of the dataframe where the condition is True.

Question 7

Q

Filter the test_df to films from the USA

Answer

A

test_df[test_df[‘Country’] == ‘USA’]

Question 8

Q

reset the test_df so that you only show the columns [‘Title’, ‘Year’, ‘Genres’, ‘Language’, ‘Country’, ‘Content Rating’, ‘Budget’, ‘IMDB Score’]. Then create three conditions where c1 country is usa, c2 country is uk and c3 country is germany. create a test_df where countries are from usa or UK.

Answer

A

test_df = excel_df[[‘Title’, ‘Year’, ‘Genres’, ‘Language’, ‘Country’, ‘Content Rating’, ‘Budget’, ‘IMDB Score’]]

condition_1 = (test_df[‘Country’] == ‘USA’)
condition_2 = (test_df[‘Country’] == ‘UK’)
condition_3 = (test_df[‘Country’] == ‘Germany’)

test_df= condition_1 | condition_2

print(test_df)

Question 9

Q

Filter the test_df to films which were made in the 1920s

Answer

A

test_df[(test_df[‘Year’] >= 1920) & (test_df[‘Year’] <=1929)]

Question 10

Q

what function can i use to get a numeric sense of the data

Answer

A

test_df.describe()

Question 11

Q

How can I find how many rows and columns we have

Question 12

Q

How do I find the minimum, maximum, mean

Answer

A

.mean()
.min()
.max()

Question 13

Q

What code do you write if you want to specify that the CSV file that you are reading the file contains column headers.

Answer

A

header = 0

Question 14

Q

What does parse_dates=True mean?

Answer

A

The date column should be parsed as a date-time object, so it will not be treated as a string column.

pandas lesson 1 Flashcards

(14 cards)