pandas lesson 1 Flashcards

1
Q

Read in the Excel file ‘movies’ and look at the first five rows of the sheet ‘1900s’.

A

import pandas as pd
excel_df = pd.read_excel('C:/Users/User/Documents/CFG_DATA/Data_files/movies.xls', sheet_name='1900s')
excel_df.head(5)

2
Q

Show a random sample of 5 rows from the Excel file ‘movies’. Only select the following columns: ['Title', 'Year', 'Duration'].

A

columns_to_select = ['Title', 'Year', 'Duration']
excel_df[columns_to_select].sample(n=5)

3
Q

Filter the table to just the following columns and show the first 5 rows:
['Title', 'Year', 'Genres', 'Language', 'Country', 'Content Rating', 'Budget', 'IMDB Score']

A

test_df = excel_df[['Title', 'Year', 'Genres', 'Language', 'Country', 'Content Rating', 'Budget', 'IMDB Score']]
test_df.head(5)

4
Q

Check, row by row, whether the Year is 1920 (the result is a Series of True or False values)

A

test_df['Year'] == 1920

5
Q

Filter the test_df to rows where the IMDB Score is greater than 5

A

test_df[test_df['IMDB Score'] > 5]

6
Q

Explain why two sets of square brackets are needed when filtering rows with a condition in pandas, as in test_df[test_df['IMDB Score'] > 5]

A

The inner set of square brackets selects a single column: test_df['IMDB Score'] returns that column as a Series. Comparing it with > 5 produces a boolean Series with one True or False value per row, depending on whether the 'IMDB Score' in that row is greater than 5 or not.

The outer set of square brackets indexes the dataframe with that boolean Series and keeps only the rows where the value is True. In other words, it selects all the rows where the 'IMDB Score' is greater than 5.

Two sets are needed because they do two different jobs: the inner set builds the condition from a column, and the outer set applies the condition to the dataframe. A boolean Series can index a dataframe directly, so the condition can also be stored in a variable first and then passed to a single set of square brackets.
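The two roles of the brackets can be seen by splitting the expression into steps. This is a minimal sketch on a small made-up table; demo_df and its values are hypothetical stand-ins for test_df, not data from movies.xls.

```python
import pandas as pd

# Hypothetical stand-in for test_df (the real data comes from movies.xls).
demo_df = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C"],
    "Year": [1920, 1925, 1931],
    "IMDB Score": [4.8, 7.1, 6.3],
})

# Inner brackets: pick one column, then compare it to get a boolean Series.
mask = demo_df["IMDB Score"] > 5

# Outer brackets: keep only the rows where the mask is True.
high_scores = demo_df[mask]

# For comparison, double brackets around a list of names select columns, not rows.
subset = demo_df[["Title", "Year"]]

print(mask)
print(high_scores)
print(subset)
```

Storing the condition in a variable such as mask gives the same result as writing it inline, and can be easier to read when several conditions are combined.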

7
Q

Filter the test_df to films from the USA

A

test_df[test_df['Country'] == 'USA']

8
Q

Reset test_df so that it only contains the columns ['Title', 'Year', 'Genres', 'Language', 'Country', 'Content Rating', 'Budget', 'IMDB Score']. Then create three conditions: condition_1 where the country is USA, condition_2 where the country is UK and condition_3 where the country is Germany. Finally, filter test_df to films from the USA or the UK.

A

test_df = excel_df[['Title', 'Year', 'Genres', 'Language', 'Country', 'Content Rating', 'Budget', 'IMDB Score']]

condition_1 = (test_df['Country'] == 'USA')
condition_2 = (test_df['Country'] == 'UK')
condition_3 = (test_df['Country'] == 'Germany')

# Index test_df with the combined mask; assigning the mask itself would leave only True/False values
test_df = test_df[condition_1 | condition_2]

print(test_df)
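As a small illustration of combining conditions with |, the sketch below uses a made-up table (demo_df and its rows are hypothetical). It also shows Series.isin, which is one alternative way to express "USA or UK" and is not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for test_df.
demo_df = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Country": ["USA", "UK", "Germany", "France"],
})

is_usa = demo_df["Country"] == "USA"
is_uk = demo_df["Country"] == "UK"

# | is the element-wise OR: a row is kept if either condition is True.
usa_or_uk = demo_df[is_usa | is_uk]

# Alternative: isin checks membership in a list of allowed values.
same_result = demo_df[demo_df["Country"].isin(["USA", "UK"])]

print(usa_or_uk)
```

The key point is that the combined mask goes inside the square brackets of the dataframe; the mask on its own is only a column of True and False values.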

9
Q

Filter the test_df to films which were made in the 1920s

A

test_df[(test_df['Year'] >= 1920) & (test_df['Year'] <= 1929)]
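A range filter like this can be checked on a few boundary years. The sketch below uses a hypothetical demo_df; it also shows Series.between, an alternative that includes both endpoints by default and is not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for test_df, with years on both sides of the decade.
demo_df = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Year": [1919, 1920, 1929, 1930],
})

# Each comparison needs its own parentheses because & binds tighter than >= and <=.
twenties = demo_df[(demo_df["Year"] >= 1920) & (demo_df["Year"] <= 1929)]

# Alternative: between() is inclusive on both ends by default.
twenties_alt = demo_df[demo_df["Year"].between(1920, 1929)]

print(twenties)
```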

10
Q

What function can I use to get a numeric summary of the data?

A

test_df.describe()

11
Q

How can I find how many rows and columns we have?

A

test_df.shape
This is an attribute, so it takes no parentheses; it returns a tuple of (rows, columns).

12
Q

How do I find the minimum, maximum and mean?

A

.mean()
.min()
.max()
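These methods are usually called on a single numeric column. A minimal sketch, assuming a hypothetical demo_df with made-up scores:

```python
import pandas as pd

# Hypothetical stand-in for test_df.
demo_df = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C"],
    "IMDB Score": [4.0, 6.0, 8.0],
})

scores = demo_df["IMDB Score"]

lowest = scores.min()
highest = scores.max()
average = scores.mean()

# agg computes several statistics in one call and returns them as a Series.
summary = scores.agg(["min", "max", "mean"])

print(lowest, highest, average)
print(summary)
```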

13
Q

What argument do you pass to pd.read_csv to specify that the first row of the CSV file contains the column headers?

A

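header=0
Row 0 of the file is used for the column names. When no names are passed, this is also what pandas infers by default.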
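The effect of the header argument can be seen by reading the same text twice. This is a sketch that uses an in-memory string in place of a real file; csv_text and its contents are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO lets read_csv treat it like a file.
csv_text = "Title,Year\nFilm A,1920\nFilm B,1925\n"

# header=0: the first row supplies the column names.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None: every row is data and the columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header)
print(no_header)
```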

14
Q

What does parse_dates=True mean?

A

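In pd.read_csv, parse_dates=True tells pandas to try to parse the index as date-time values, so it will not be treated as strings. To parse specific columns instead, pass a list of column names to parse_dates.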
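The difference between the two forms of parse_dates can be sketched with an in-memory CSV. The column name 'Release Date' and the rows below are hypothetical and do not come from the movies file.

```python
import io

import pandas as pd

# Hypothetical CSV content with an ISO-formatted date column.
csv_text = "Release Date,Title\n1920-01-15,Film A\n1925-06-30,Film B\n"

# Without parse_dates the date column is read as plain text.
as_text = pd.read_csv(io.StringIO(csv_text))

# A list of column names parses those columns as dates.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["Release Date"])

# parse_dates=True applies to the index, so an index column is set here.
indexed = pd.read_csv(
    io.StringIO(csv_text), index_col="Release Date", parse_dates=True
)

print(as_text.dtypes)
print(as_dates.dtypes)
print(type(indexed.index))
```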
