Basic Python Flashcards

1
Q

Write a lambda function that adds a column to the dataframe ‘shoes’ that returns Not Vegan if the shoe is made out of leather and Vegan if it is not.

A

shoes['vegan'] = shoes.shoe_type.apply(lambda x: 'Not Vegan' if x == 'Leather' else 'Vegan')

2
Q

Use the columns ‘gender’ and ‘last_name’ from ‘df’ to make a new column called ‘salutation’ that returns ‘Dear Mr.’ and the last name if gender is male or ‘Dear Ms.’ and last name if female.

A

df['salutation'] = df.apply(lambda row: 'Dear Mr. ' + row.last_name if row.gender == 'male' else 'Dear Ms. ' + row.last_name, axis=1)

3
Q

Get the last name only from a column that has the format “Skye Long” called “name”

A

get_last_name = lambda x: x.split()[-1]

df['last_name'] = df.name.apply(get_last_name)

4
Q

Rename individual columns in a df (rather than all of them at once)

A

df.rename(columns={'old': 'new', 'old2': 'new2'}, inplace=True)

5
Q

Rename the following columns in order:

name, last name, user

A

df.columns = ['first_name', 'last_name', 'id']

This only works if the list has exactly as many names as the DataFrame has columns. Because the names are assigned by position, it is also easy to overwrite the wrong column by accident. To rename specific columns, df.rename(columns={'old': 'new'}, inplace=True) is safer.

6
Q

What would this produce:

[1,2] + [3,4]

A

[1,2,3,4]

7
Q

What is the simple syntax to apply lambda to a row in a dataframe?

A

df.apply(lambda row: <expression that uses row['column_name']>, axis=1)

Example:

total_earned = lambda row: (row.hourly_wage * 40) + ((row.hourly_wage * 1.5) * (row.hours_worked - 40)) if row.hours_worked > 40 else row.hourly_wage * row.hours_worked

df['total_earned'] = df.apply(total_earned, axis=1)

Here the lambda is written separately, but it could also be written inline inside df.apply.

8
Q

What is the syntax for the .apply method?

A

df.col.apply(func)

Example:

get_last_name = lambda x: x.split()[-1]

df['last_name'] = df.name.apply(get_last_name)

To apply a function to each row instead of each value in a column:

df.apply(row_func, axis=1)

9
Q

Nested List Comprehensions

list1 = [0, 1, 2, 3, 4, 5]

list2 = ['a', 'b', 'c', 'd', 'e', 'f']

I want the value of list2 at every position where the value of list1 is less than 3

A

new_list = [list2[i] for i in range(len(list2)) if list1[i] < 3]

10
Q

Create a new list named double_nums by multiplying each number in nums by two

nums = [4, 8, 15, 16, 23, 42]

A

double_nums = [i * 2 for i in nums]

11
Q

create a list from 0 to 5

A

new_list = list(range(6))

12
Q

Write this for loop as a list comp:

for x in l:
    if x >= 45: x + 1
    else: x + 5

A

[x+1 if x >=45 else x + 5 for x in l]

13
Q

Write this as an if statement

[x+1 for x in l if x >= 45]

A

new_list = []
for x in l:
    if x >= 45:
        new_list.append(x + 1)

14
Q

What will this code produce?

nums = [4, 8, 15, 16, 23, 42]

parity = [0 if i%2 == 0 else 1 for i in nums]

print(parity)

A

[0, 0, 1, 0, 1, 0]

15
Q

Write this as a list comp:

nums2 = [4, 8, 15, 16, 23, 42]

parity2 = []

for i in nums2:
    if i % 2 == 0:
        parity2.append(0)
    else:
        parity2.append(1)

A

parity2 = [0 if i % 2 == 0 else 1 for i in nums2]

16
Q

If numbers are above 45 then add 1, if num <10 subtract 1, else add 5

l = [22, 13, 45, 50, 98, 69, 43, 44, 1]

A

l_2 = []
for i in l:
    if i > 45:
        l_2.append(i + 1)
    elif i < 10:
        l_2.append(i - 1)
    else:
        l_2.append(i + 5)
print(l_2)

l_3 = [i+1 if i > 45 else (i-1 if i < 10 else i+5) for i in l]

17
Q

Create a new list named first_character that contains the first character from every name in the list names

names = ["Elaine", "George", "Jerry", "Cosmo"]

A

names = ["Elaine", "George", "Jerry", "Cosmo"]

first_character = [i[0] for i in names]

print(first_character)

18
Q

Create a new list called greater_than_two, in which an entry at position i is True if the entry in nums at position i is greater than 2.

A

nums = [5, -10, 40, 20, 0]

greater_than_two = [True if i >2 else False for i in nums]

print(greater_than_two)

19
Q

Create a new list named product that contains the product of each sub-list of nested_lists

A

product = [x1 * x2 for (x1, x2) in nested_lists]

20
Q

Create a new list named greater_than that contains True if the first number in the sub-list is greater than the second number in the sub-list, and False otherwise.

A

nested_lists = [[4, 8], [16, 15], [23, 42]]

greater_than = [True if x1 > x2 else False for (x1, x2) in nested_lists]

https://colab.research.google.com/github/skyelong/Code-Academy/blob/master/Python/List_Comprehension_Code_Challenge.ipynb#scrollTo=f9aK6u58ROeq

21
Q

Create a new list named first_only that contains the first element in each sub-list of nested_lists.
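This card has no recorded answer. One possible answer, as a minimal sketch: it assumes nested_lists holds the same two-item sub-lists used on the neighbouring card, and simply takes index 0 of each sub-list.

```python
# Sample data borrowed from the neighbouring card; any list of non-empty sub-lists works.
nested_lists = [[4, 8], [16, 15], [23, 42]]

# Keep only the first element of every sub-list.
first_only = [sub[0] for sub in nested_lists]

print(first_only)  # [4, 16, 23]
```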

22
Q

Use list comprehension and the zip function to create a new list named sums that sums corresponding items in lists a and b. For example, the first item in the new list should be 5 from adding 1 and 4 together.

a = [1.0, 2.0, 3.0]

b = [4.0, 5.0, 6.0]
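This card has no recorded answer. A sketch of one way to do it, pairing the items with zip and adding each pair inside a list comprehension:

```python
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# zip yields (1.0, 4.0), (2.0, 5.0), (3.0, 6.0); each pair is summed.
sums = [x + y for (x, y) in zip(a, b)]

print(sums)  # [5.0, 7.0, 9.0]
```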

23
Q

You’ve been given two lists: a list of capitals and a list of countries. Create a new list named locations that contains the string “capital, country” for each item in the original lists. For example, if the 5th item in the capitals list is “Lima” and the 5th item in the countries list is “Peru”, then the 5th item in the new list should be “Lima, Peru”

A

capitals = ["Santiago", "Paris", "Copenhagen"]

countries = ["Chile", "France", "Denmark"]

locations = [x1 + ", " + x2 for (x1, x2) in zip(capitals, countries)]

https://colab.research.google.com/github/skyelong/Code-Academy/blob/master/Python/List_Comprehension_Code_Challenge.ipynb#scrollTo=jOeCN_NHUIy7

24
Q

You’ve been given two lists: a list of names and a list of ages. Create a new list named users that contains the string “Name: name, Age: age” for each pair of elements in the original lists. For example, if the 5th item in the names list is “John” and the 5th item in ages is 42, then the 5th item in the new list should be “Name: John, Age: 42”.

As you did in the previous exercise, concatenate your strings together using +. Make sure to add proper capitalization and spaces.

A

names = ["Jon", "Arya", "Ned"]

ages = [14, 9, 35]

users = ["Name: " + x1 + ", Age: " + str(x2) for (x1, x2) in zip(names, ages)]

print(users)

https://colab.research.google.com/github/skyelong/Code-Academy/blob/master/Python/List_Comprehension_Code_Challenge.ipynb#scrollTo=6Ol4V5VjUva5

25
Q

Create a new list named greater_than that contains True or False depending on whether the corresponding item in list a is greater than the one in list b. For example, if the 2nd item in list a is 3, and the 2nd item in list b is 5, the 2nd item in the new list should be False.

A

a = [30, 42, 10]

b = [15, 16, 17]

greater_than = [True if x1 > x2 else False for (x1, x2) in zip(a, b)]

print(greater_than)

https://colab.research.google.com/github/skyelong/Code-Academy/blob/master/Python/List_Comprehension_Code_Challenge.ipynb#scrollTo=qefmBpfVVkUh

26
Q

Create a lambda function named contains_a that takes an input word and returns True if the input contains the letter ‘a’. Otherwise, return False.
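This card has no recorded answer. A minimal sketch, assuming the check is for a lowercase 'a' only (the card does not say whether an uppercase 'A' should count):

```python
# The `in` operator already returns True or False, so no if/else is needed.
contains_a = lambda word: "a" in word

print(contains_a("banana"))  # True
print(contains_a("cherry"))  # False
```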

27
Q

Create a lambda function named long_string that takes an input str and returns True if the string has over 12 characters in it. Otherwise, return False.

A

long_string = lambda x: True if len(x) > 12 else False

28
Q

Create a lambda function named ends_in_a that takes an input str and returns True if the last character in the string is an a. Otherwise, return False.

A

ends_in_a = lambda x: True if x[-1] == "a" else False

29
Q

Create a lambda function named add_random that takes an input named num. The function should return num plus a random integer number between 1 and 10 (inclusive).

A

import random

add_random = lambda num: num + random.randint(1, 10)

30
Q

You run an online clothing store called Panda’s Wardrobe. You need a DataFrame containing information about your products.

Create a DataFrame with the following data that your inventory manager sent you:

Product ID   Product Name   Color
1            t-shirt        blue
2            t-shirt        green
3            skirt          red
4            skirt          black

A

import pandas as pd

df1 = pd.DataFrame({
    'Product ID': [1, 2, 3, 4],
    'Product Name': ['t-shirt', 't-shirt', 'skirt', 'skirt'],
    'Color': ['blue', 'green', 'red', 'black']
})

31
Q

From this DataFrame, select the columns clinic_north and clinic_south

A

clinic_north_south = df[['clinic_north', 'clinic_south']]

32
Q

Use iloc to return the third row from df

A

march = df.iloc[2]

33
Q

select rows 1-5 with .iloc from ‘df’

A

df_1 = df.iloc[1:6]

34
Q

Select rows from ‘df’ where the month is equal to ‘January’ and store them in a new DataFrame

A

january = df[df.month == 'January']

35
Q

select all rows for both ‘march’ and ‘april’ from df and store them in a new df called march_april

A

march_april = df[(df.month == 'March') | (df.month == 'April')]

36
Q

Use .isin to find rows containing ‘January’, ‘February’, or ‘March’ in the column month

A

january_february_march = df[df.month.isin(['January', 'February', 'March'])]

37
Q

Reset the index of a DataFrame that you have subsetted, and remove the old index

A

df2.reset_index(drop=True, inplace = True)

38
Q

Create a new column that changes the names to lower case using the str.lower and .apply

df = pd.DataFrame([
    ['JOHN SMITH', 'john.smith@gmail.com'],
    ['Jane Doe', 'jdoe@yahoo.com'],
    ['joe schmo', 'joeschmo@hotmail.com']
],
columns=['Name', 'Email'])

A

df['Lowercase Name'] = df.Name.apply(str.lower)

39
Q

subject = ["physics", "calculus", "poetry", "history"]

append ‘computer science’ to this list

A

subject.append("computer science")

40
Q

subject = ["physics", "calculus", "poetry", "history"]

grades = [98, 97, 85, 88]

zip these two together

and add ‘visual arts’ and the grade ‘93’

A

gradebook = list(zip(subject, grades))

gradebook.append(("visual arts", 93))

41
Q

inventory = ['twin bed', 'twin bed', 'headboard', 'queen bed', 'king bed', 'dresser', 'dresser', 'table', 'table', 'nightstand', 'nightstand', 'king bed', 'king bed', 'twin bed', 'twin bed', 'sheets', 'sheets', 'pillow', 'pillow']

find the len of this inventory

A

inventory_len = len(inventory)

42
Q

inventory = ['twin bed', 'twin bed', 'headboard', 'queen bed', 'king bed', 'dresser', 'dresser', 'table', 'table', 'nightstand', 'nightstand', 'king bed', 'king bed', 'twin bed', 'twin bed', 'sheets', 'sheets', 'pillow', 'pillow']

return the third object

return the last object

return objects 1-4

A

third = inventory[2]

last = inventory[-1]

inventory1_4 = inventory[1:5]

43
Q

inventory = ['twin bed', 'twin bed', 'headboard', 'queen bed', 'king bed', 'dresser', 'dresser', 'table', 'table', 'nightstand', 'nightstand', 'king bed', 'king bed', 'twin bed', 'twin bed', 'sheets', 'sheets', 'pillow', 'pillow']

return the number of twin beds in the inventory

A

twin_beds = inventory.count('twin bed')

44
Q

Write a function named append_sum that has one parameter: a list named lst.

The function should add the last two elements of lst together and append the result to lst. It should do this process three times and then return lst.

For example, if lst started as [1, 1, 2], the final result should be [1, 1, 2, 3, 5, 8].

A

def append_sum(lst):
    for x in range(3):
        lst.append(lst[-1] + lst[-2])
    return lst

45
Q

Write a function named larger_list that has two parameters named lst1 and lst2.

The function should return the last element of the list that contains more elements. If both lists are the same size, then return the last element of lst1.

A

def larger_list(lst1, lst2):
    list1_len = len(lst1)
    list2_len = len(lst2)
    if list1_len > list2_len:
        return lst1[-1]
    elif list1_len < list2_len:
        return lst2[-1]
    elif list1_len == list2_len:
        return lst1[-1]

46
Q

Create a function named more_than_n that has three parameters named lst, item, and n.

The function should return True if item appears in the list more than n times. The function should return False otherwise.

A

def more_than_n(lst, item, n):
    if lst.count(item) > n:
        return True
    else:
        return False

47
Q

Create a function called append_size that has one parameter named lst.

The function should append the size of lst (inclusive) to the end of lst. The function should then return this new list.

For example, if lst was [23, 42, 108], the function should return [23, 42, 108, 3] because the size of lst was originally 3.

A

def append_size(lst):
    x = len(lst)
    lst.append(x)
    return lst

lst = [23, 42, 108]
append_size(lst)

48
Q

Create a function called every_three_nums that has one parameter named start.

The function should return a list of every third number between start and 100 (inclusive). For example, every_three_nums(91) should return the list [91, 94, 97, 100]. If start is greater than 100, the function should return an empty list.

A

def every_three_nums(start):
    if start <= 100:
        lst = list(range(start, 101, 3))
        return lst
    else:
        lst = []
        return lst

49
Q

Create a function named remove_middle which has three parameters named lst, start, and end.

The function should return a list where all elements in lst with an index between start and end (inclusive) have been removed.

For example, the following code should return [4, 23, 42] because elements at indices 1, 2, and 3 have been removed: remove_middle([4, 8 , 15, 16, 23, 42], 1, 3)

A

def remove_middle(lst, start, end):
    new_list = lst[0:start] + lst[end+1:]
    return new_list

50
Q

Create a function named more_frequent_item that has three parameters named lst, item1, and item2.

Return either item1 or item2 depending on which item appears more often in lst.

If the two items appear the same number of times, return item1.

A

def more_frequent_item(lst, item1, item2):
    cnt_item1 = lst.count(item1)
    cnt_item2 = lst.count(item2)
    if cnt_item1 > cnt_item2:
        return item1
    elif cnt_item1 < cnt_item2:
        return item2
    elif cnt_item1 == cnt_item2:
        return item1

print(more_frequent_item([2, 3], 2, 3))

51
Q

Create a function named double_index that has two parameters: a list named lst and a single number named index.

The function should return a new list where all elements are the same as in lst except for the element at index. The element at index should be double the value of the element at index of the original lst.

If index is not a valid index, the function should return the original list.

For example, the following code should return [1,2,6,4] because the element at index 2 has been doubled:

double_index([1, 2, 3, 4], 2)

A

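def double_index(lst, index):
    # Return the original list if index is out of range
    # (negative indexes are treated as invalid here).
    if index < 0 or index >= len(lst):
        return lst
    before = lst[:index]
    after = lst[index+1:]
    new = [lst[index] * 2]
    new_list = before + new + after
    return new_list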

52
Q

Given a dataframe df, add a new column square which contains the square of each value in the points column for each row.

A

df['square'] = df.points.apply(lambda x: x**2)

53
Q

Select the rows from the column location that contain the information for Staten Island from the dataframe inventory and save them to staten_island.

A

staten_island = inventory[inventory.location == 'Staten Island']

54
Q

A customer just emailed you asking what products are sold at your Staten Island location. Select the column product_description from staten_island and save it to the variable product_request.

A

product_request = staten_island.product_description

55
Q

Another customer emails to ask what types of seeds are sold at the Brooklyn location.

Select all rows where location is equal to Brooklyn and product_type is equal to seeds and save them to the variable seed_request.

A

seed_request = inventory[(inventory.location == 'Brooklyn') & (inventory.product_type == 'seeds')]

56
Q

Add a column to inventory called in_stock which is True if quantity is greater than 0 and False if quantity equals 0.

A

inventory['in_stock'] = inventory.quantity.apply(lambda x: True if x > 0 else False)

57
Q

Petal Power wants to know how valuable their current inventory is.

Create a column called total_value that is equal to price multiplied by quantity.

A

total = lambda row: row.price * row.quantity

inventory['total_value'] = inventory.apply(total, axis=1)

58
Q

The DataFrame customers contains the names and ages of all of your customers. You want to find the median age:

A

median_age = customers.age.median()

print(median_age)

(This assumes the ages are stored in a column named age.)

59
Q

how many unique types of shoes from the df orders were bought? The name of the column is shoe_type

A

unique_type = orders.shoe_type.nunique()

print(unique_type)

60
Q

Print out all the unique types of shoes that are in ‘shoe_type’ in the dataframe orders

A

print(orders.shoe_type.unique())

61
Q

Our finance department wants to know the price of the most expensive pair of shoes purchased. Save your answer to the variable most_expensive.

A

most_expensive = orders.price.max()

62
Q

Our fashion department wants to know how many different colors of shoes we are selling. Save your answer to the variable num_colors.

A

num_colors = orders.shoe_color.nunique()

num_colors

63
Q

Suppose we have a grade book with columns student, assignment_name, and grade. We want to get an average grade for each student across all assignments. We could do some sort of loop, but Pandas gives us a much easier option: the method .groupby. Use .groupby to get the average grade

A

grades = df.groupby('student').grade.mean()

64
Q

This is the general syntax of .groupby

A

df.groupby('column1').column2.measurement()

65
Q

In the previous exercise, our finance department wanted to know the most expensive shoe that we sold.

Now, they want to know the most expensive shoe for each shoe_type (i.e., the most expensive boot, the most expensive ballet flat, etc.).

Save your answer to the variable pricey_shoes.

A

pricey_shoes = orders.groupby('shoe_type').price.max()

66
Q

Usually, we’d prefer that those indices were actually a column. In order to get that, we can use reset_index(). This will transform our Series into a DataFrame and move the indices into their own column.

You’ll usually see a groupby statement followed by reset_index:

A

df.groupby('column1').column2.measurement().reset_index()
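A small sketch of what reset_index() changes, using a made-up grade book (the student names and grades are invented for illustration). Without reset_index the result is a Series indexed by student. With it, student becomes an ordinary column of a DataFrame.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "student": ["Amy", "Amy", "Bob", "Bob"],
    "assignment_name": ["hw1", "hw2", "hw1", "hw2"],
    "grade": [90, 80, 70, 90],
})

# Series: the student names are the index.
as_series = df.groupby("student").grade.mean()

# DataFrame: the student names are moved into their own column.
as_frame = df.groupby("student").grade.mean().reset_index()

print(as_series)
print(as_frame)
```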

67
Q

For example, suppose we have a DataFrame teas containing data on types of tea:

id  tea                category  caffeine  price
0   earl grey          black     38        3
1   english breakfast  black     41        3
2   irish breakfast    black     37        2.5
3   jasmine            green     23        4.5
4   matcha             green     48        5
5   camomile           herbal    0         3
…

We want to find the number of each category of tea we sell.

A

teas_counts = teas.groupby('category').id.count().reset_index()

68
Q

Use rename to rename the column ‘id’ to ‘counts’

A

df = df.rename(columns={'id': 'counts'})

69
Q

Modify the code that finds the most expensive shoe from each shoe type so that it ends with reset_index, which will change pricey_shoes into a DataFrame.

A

pricey_shoes = orders.groupby('shoe_type').price.max().reset_index()

pricey_shoes

70
Q

we have a DataFrame of employee information called df that has the following columns:

id: the employee’s id number
name: the employee’s name
wage: the employee’s hourly wage
category: the type of work that the employee does

Our data might look something like this:

id     name           wage  category
10131  Sarah Carney   39    product
14189  Heather Carey  17    design
15004  Gary Mercado   33    marketing
11204  Cora Copaz     27    design
…

If we want to calculate the 75th percentile (i.e., the point at which 75% of employees have a lower wage and 25% have a higher wage) for each category, we can use the following combination of apply and a lambda function

A

high_earners = df.groupby('category').wage.apply(lambda x: np.percentile(x, 75)).reset_index()

71
Q

Let’s calculate the 25th percentile for shoe price for each shoe_color to help Marketing decide if we have enough cheap shoes on sale. Save the data to the variable cheap_shoes.

A

cheap_shoes = orders.groupby('shoe_color').price.apply(lambda x: np.percentile(x, 25)).reset_index()

72
Q

Pivot Tables general syntax

A

df.pivot(columns='ColumnToPivot', index='ColumnToBeRows', values='ColumnToBeValues')
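To make the general syntax concrete, here is a sketch on a tiny made-up table of shoe counts (the numbers are invented). Each distinct value of the pivoted column becomes its own column, and the index column supplies the rows.

```python
import pandas as pd

# Hypothetical long-format data: one row per (shoe_type, shoe_color) pair.
shoe_counts = pd.DataFrame({
    "shoe_type": ["ballet flats", "ballet flats", "sandals", "sandals"],
    "shoe_color": ["black", "brown", "black", "brown"],
    "id": [2, 5, 3, 4],
})

# Wide format: one row per shoe_type, one column per shoe_color.
wide = shoe_counts.pivot(
    columns="shoe_color", index="shoe_type", values="id"
).reset_index()

print(wide)
```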

73
Q

Pivot this table to make it easier to read and save the result as shoe_counts_pivot. We want to know the number of orders for each color of each shoe type.

   shoe_type     shoe_color  id
0  ballet flats  black       2
1  ballet flats  brown       5
2  ballet flats  red         3

A

shoe_counts_pivot = shoe_counts.pivot(columns='shoe_color', index='shoe_type', values='id').reset_index()

74
Q

The column utm_source contains information about how users got to ShoeFly’s homepage. For instance, if utm_source = Facebook, then the user came to ShoeFly by clicking on an ad on Facebook.com.

Use a groupby statement to calculate how many visits came from each of the different sources. Save your answer to the variable click_source.

Remember to use reset_index()!

A

click_source = user_visits.groupby('utm_source').id.count().reset_index()

75
Q

Our Marketing department thinks that the traffic to our site has been changing over the past few months. Use groupby to calculate the number of visits to our site from each utm_source for each month. Save your answer to the variable click_source_by_month.

A

click_source_by_month = user_visits.groupby(['utm_source', 'month']).id.count().reset_index()

76
Q

The head of Marketing is complaining that this table is hard to read. Use pivot to create a pivot table where the rows are utm_source and the columns are month. Save your results to the variable click_source_by_month_pivot.

A

click_source_by_month_pivot = click_source_by_month.pivot(index='utm_source', columns='month', values='id').reset_index()

77
Q

A movie review website employs several different critics. They store these critics’ movie ratings in a DataFrame called movie_ratings, which has three columns: critic, movie, and rating.

Write a command to find the average rating for each movie

A

movie_ratings.groupby('movie').rating.mean().reset_index()

78
Q

The City Library has several branches throughout the area. They collect all of their book checkout data in a DataFrame called checkouts. The DataFrame contains the columns ‘location’, ‘date’, and ‘book_title’. If we want to compare the total number of books checked out at each branch, what code could we use?

A

checkouts.groupby('location').book_title.count().reset_index()

79
Q

ad_clicks['is_click'] = ~ad_clicks.ad_click_timestamp.isnull()

What does ~ do in this operation?

A

The ~ is a NOT operator, and isnull() tests whether or not the value of ad_click_timestamp is null.

80
Q

Create a new column called is_click, which is True if ad_click_timestamp is not null and False otherwise.

A

ad_clicks['is_click'] = ~ad_clicks.ad_click_timestamp.isnull()
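A quick sketch of the ~ and isnull() combination on made-up data (the user ids and timestamps are invented): isnull() marks the missing timestamps as True, and ~ flips every value, so is_click is True exactly where a timestamp exists.

```python
import pandas as pd

# Hypothetical data: the second user never clicked, so the timestamp is missing.
ad_clicks = pd.DataFrame({
    "user_id": [1, 2, 3],
    "ad_click_timestamp": ["7:18", None, "10:27"],
})

is_missing = ad_clicks.ad_click_timestamp.isnull()  # False, True, False
ad_clicks["is_click"] = ~is_missing                  # True, False, True

print(ad_clicks)
```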

81
Q

What is the difference in outcome between these two codes:

crushing_it = sales_vs_targets[sales_vs_targets.revenue > sales_vs_targets.target]

crushing_it = sales_vs_targets.revenue > sales_vs_targets.target

A

The first returns a DataFrame containing only the rows where revenue is greater than target.

The second returns a Series of True/False values, one per row, for the condition revenue > target.
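The difference can be seen on a three-row made-up table (months and figures are invented): the comparison alone gives a boolean Series, and using that Series inside the square brackets keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical sales data, for illustration only.
sales_vs_targets = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "revenue": [120, 90, 150],
    "target": [100, 100, 140],
})

# A Series of booleans, one per row.
mask = sales_vs_targets.revenue > sales_vs_targets.target

# A DataFrame with only the rows where the mask is True.
crushing_it = sales_vs_targets[mask]

print(mask.tolist())               # [True, False, True]
print(crushing_it.month.tolist())  # ['Jan', 'Mar']
```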

82
Q

What is a left merge?

A

A left merge includes all rows from the first (left) table, but only those rows from the second (right) table that match the first table.

83
Q

What is an inner merge

A

An inner merge results in a table that has only the rows with matching values. Non-matching rows are dropped.

84
Q

What is an outer merge

A

An outer merge keeps every row from both tables. Where a row has no match in the other table, the missing values are filled with NaN.

85
Q

What is a Right Merge

A

Here, the merged table will include all rows from the second (right) table, but only rows from the first (left) table that match the second table.
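The four merge types can be compared side by side on two tiny made-up tables that share a key column (the table names, keys, and values are invented for illustration). Only the keys that survive each merge are collected.

```python
import pandas as pd

# Hypothetical tables: keys "b" and "c" appear in both, "a" only on the left, "d" only on the right.
left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Run the same merge with each value of `how` and record which keys remain.
keys_by_how = {
    how: pd.merge(left, right, how=how).key.tolist()
    for how in ["inner", "left", "right", "outer"]
}

for how, keys in keys_by_how.items():
    print(how, keys)
```

The inner merge keeps only b and c, the left merge keeps a, b, c, the right merge keeps b, c, d, and the outer merge keeps all four keys, with NaN wherever one side has no value.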

86
Q

What does pd.concat([df1,df2]) do?

A

It stacks the two DataFrames on top of each other. This is most useful when the two DataFrames are chunks of the same original DataFrame and therefore share the same columns.
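A short sketch with two made-up chunks (the contents are invented). Note that pd.concat keeps each chunk's original index, so chaining reset_index(drop=True) is a common follow-up.

```python
import pandas as pd

# Hypothetical chunks with identical columns.
df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["c", "d"]})

stacked = pd.concat([df1, df2])
print(stacked.index.tolist())  # [0, 1, 0, 1]: the old indexes are kept

stacked_clean = pd.concat([df1, df2]).reset_index(drop=True)
print(stacked_clean.index.tolist())  # [0, 1, 2, 3]
```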

87
Q

A veterinarian’s office stores all of their data on pets and their owners in two dataframes: pets and owners. The owners dataframe has the columns ‘id’, ‘first_name’, ‘last_name’ and ‘address’. The ‘pets’ dataframe has the columns id, name, owner_id, and type. If the office wanted to combine the two dataframes into one dataframe called pets_owners, what code could work?

A

pets_owners = pd.merge(pets, owners.rename(columns={'id': 'owner_id'}))

88
Q

A veterinarians office is run by two vets, Greg and Susan, and stores each of their appointment data in separate DataFrames, called greg_appointments and susan_appointments respectively. These DataFrames have the same columns. If the vet office wanted to combine the two DataFrames into a single DataFrame called appointments_all which of the following commands would they use?

A

appointments_all = pd.concat([greg_appointments, susan_appointments])

89
Q

What is the correct syntax for performing an outer merge on two Dataframes: df_one and df_two?

A

merged_df = pd.merge(df_one, df_two, how='outer')

90
Q

How would I select all the null values from a column of a dataframe?

A

null_df = df[df.column1.isnull()]

91
Q

Basic syntax of matplotlib

A

from matplotlib import pyplot as plt

x_values = [0, 1, 2, 3, 4]
y_values = [0, 1, 4, 9, 16]
plt.plot(x_values, y_values)
plt.show()

92
Q

Specify a different color for a line in matplotlib

A

plt.plot(days, money_spent, color='green')
plt.plot(days, money_spent_2, color='#AAAAAA')

93
Q

Which line of code will get the axes object of a plot and store it in a variable ax?

A

ax = plt.subplot()

94
Q

Which line of code will create a figure with a height of 7 inches and a width of 6 inches?

A

plt.figure(figsize=(6,7))

95
Q

What is the command to set a plot to display from x=-5 to x=5 and from y=0 to y=10?

A

plt.axis([-5,5,0,10])

96
Q

What is the command to label the x-axis with the label ‘Time’?

A

plt.xlabel('Time')

97
Q

Which line of code will set the y-axis ticks to be at 0, 1, 2, 4, and 9?

A

ax.set_yticks([0,1,2,4,9])

98
Q

Which line of code will set the x-axis labels to be [“Monday”, “Tuesday”, “Wednesday”]?

A

ax.set_xticklabels(['Monday', 'Tuesday', 'Wednesday'])

99
Q

What is the command to set the color of a line to be ‘green’?

A

plt.plot(x, y, color='green')

100
Q

What is the command to set the linestyle of a line to be dashed?

A

plt.plot(x, y, linestyle='--')

101
Q

What is the command to add a legend to a plot with the labels [‘Cats’, ‘Dogs’]?

A

plt.legend(['Cats', 'Dogs'])

102
Q

What is the command to create a figure with 3 rows and 2 columns, and a subplot in the second row and the first column?

A

plt.subplot(3,2,3)

103
Q

What is the command to set the horizontal spacing of subplots within a figure to 0.35?

A

plt.subplots_adjust(wspace=0.35)

104
Q

What is the result of adding autopct='%d%%' to a plt.pie function call?

A

The pie chart labels each slice with its percentage as a whole number (the %d format drops the decimal part rather than rounding).

105
Q

What does it mean to normalize a histogram?

A

Dividing the height of each bar by a constant so that the total area of the bars sums to 1. This preserves the shape of the distribution while making it possible to compare datasets with different numbers of samples.
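The arithmetic behind normalization can be sketched in plain Python on a small made-up dataset (the values and bin edges are invented). Each bar height becomes count / (total count × bin width), so the bar areas add up to 1. In matplotlib, passing density=True to plt.hist requests this scaling.

```python
# Hypothetical data and bin edges, for illustration only.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
bin_edges = [1, 2, 3, 4, 5, 6]  # five bins, each of width 1

# Count how many values fall into each bin [left, right).
counts = [0] * (len(bin_edges) - 1)
for value in data:
    for i in range(len(counts)):
        if bin_edges[i] <= value < bin_edges[i + 1]:
            counts[i] += 1
            break

widths = [bin_edges[i + 1] - bin_edges[i] for i in range(len(counts))]
total = sum(counts)

# Normalized heights: divide by (total count * bin width).
density = [c / (total * w) for c, w in zip(counts, widths)]

# The total area of the bars is now 1 (up to floating-point error).
area = sum(d * w for d, w in zip(density, widths))
print(counts)
print(round(area, 6))
```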

106
Q

What is the command to stack a set of bars representing y2 on top of the set of bars representing y1?

A

plt.bar(range(len(y2)), y2, bottom=y1)

107
Q

In the following function call, what does the list [0, 2, 4, 6, 8] represent?

plt.fill_between(range(5), [0, 2, 4, 6, 8], [4, 6, 8, 10, 12], alpha=0.2)

A

lower bound y values

108
Q

What is the command to set x-axis ticks to be “Carbohydrates”, “Lipids”, “Protein”?

A

ax.set_xticklabels(['Carbohydrates', 'Lipids', 'Protein'])

109
Q

What is a KDE plot

A
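This card has no recorded answer. In short, a KDE (kernel density estimate) plot draws a smooth curve that estimates the distribution of a dataset, like a smoothed histogram. Each data point contributes a small bump (the kernel), and the bumps are added together. The sketch below computes a Gaussian KDE by hand. The sample values, the bandwidth of 0.5, and the helper name gaussian_kde are all made up for illustration. Libraries such as seaborn offer this as a ready-made plot (kdeplot).

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x (hypothetical helper)."""
    n = len(samples)

    def density(x):
        total = 0.0
        for s in samples:
            z = (x - s) / bandwidth
            # Standard normal bump centred on the sample s.
            total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        return total / (n * bandwidth)

    return density

samples = [1.0, 1.5, 2.0, 4.0, 4.2]
density = gaussian_kde(samples, bandwidth=0.5)

# Evaluate the curve on a grid from -3.0 to 9.0; these points are what a KDE plot draws.
step = 0.1
xs = [i * step for i in range(-30, 91)]
ys = [density(x) for x in xs]

# Like a normalized histogram, the area under the curve is approximately 1.
approx_area = sum(y * step for y in ys)
print(round(approx_area, 3))
```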
110
Q

extract a column from pandas df as a list

A

newlist = df.col.tolist()

counts = cuisine_counts.name.tolist()

(df.col.values on its own returns a NumPy array rather than a list.)
