DATA STRUCTURES Flashcards

1
Q

How do you create an empty list?

A

empty_list_1 = []

or

empty_list_2 = list()

2
Q

phrase = ['Astra', 'inclinant', 'sed', 'non', 'obligant']

print(phrase[1])

A

inclinant

3
Q

phrase = ['Astra', 'inclinant', 'sed', 'non', 'obligant']

print(phrase[-1])

A

obligant

4
Q

phrase = ['Astra', 'inclinant', 'sed', 'non', 'obligant']

print(phrase[1:4])

A

['inclinant', 'sed', 'non']

5
Q

phrase = ['Astra', 'inclinant', 'sed', 'non', 'obligant']

print(phrase[:3])

print(phrase[3:])

A

['Astra', 'inclinant', 'sed']

['non', 'obligant']

6
Q

my_list = ['Macduff', 'Malcolm', 'Duncan', 'Banquo']

my_list[2] = 'Macbeth'

print(my_list)

A

['Macduff', 'Malcolm', 'Macbeth', 'Banquo']

7
Q

my_list = ['Macduff', 'Malcolm', 'Macbeth', 'Banquo']

my_list[1:3] = [1, 2, 3, 4]

print(my_list)

A

['Macduff', 1, 2, 3, 4, 'Banquo']

8
Q

num_list = [1, 2, 3]
char_list = ['a', 'b', 'c']

num_list + char_list

A

[1, 2, 3, 'a', 'b', 'c']

9
Q

list_a = ['a', 'b', 'c']

list_a * 2

A

['a', 'b', 'c', 'a', 'b', 'c']

10
Q

num_list = [2, 4, 6]

print(5 in num_list)
print(5 not in num_list)

A

False
True

11
Q

my_list = [0, 1, 1, 2, 3]
variable = 5

my_list.append(variable)

print(my_list)

A

[0, 1, 1, 2, 3, 5]

12
Q

my_list = ['a', 'b', 'd']

my_list.insert(2, 'c')

print(my_list)

A

['a', 'b', 'c', 'd']

13
Q

my_list = ['a', 'b', 'd', 'a']

my_list.remove('a')

print(my_list)

A

['b', 'd', 'a']

14
Q

my_list = ['a', 'b', 'c']

print(my_list.pop())

print(my_list)

A

c
['a', 'b']

When called without an index, pop() removes and returns the last item in the list.

15
Q

my_list = ['a', 'b', 'c']

my_list.clear()

print(my_list)

A

[]

16
Q

my_list = ['a', 'b', 'c', 'a']

my_list.index('a')

A

0

17
Q

my_list = ['a', 'b', 'c', 'a']

my_list.count('a')

A

2

18
Q

char_list = ['b', 'c', 'a']
num_list = [2, 3, 1]

char_list.sort()
num_list.sort(reverse=True)

print(char_list)
print(num_list)

A

['a', 'b', 'c']
[3, 2, 1]

19
Q

print(list('rocks'))

print(list(('stones', 'water', 'underground')))

A

['r', 'o', 'c', 'k', 's']
['stones', 'water', 'underground']

20
Q

num_list = [1, 2, 3]
num_list[0] = 5446

print(num_list)

A

[5446, 2, 3]

21
Q

test1 = (1)
test2 = (2,)

print(type(test1))
print(type(test2))

A

<class 'int'>
<class 'tuple'>

Note: When using parentheses to declare a tuple with just a single element, you must use a trailing comma.

22
Q

tuple1 = 1,
tuple2 = 2, 3

print(type(tuple1))
print(type(tuple2))

A

<class 'tuple'>
<class 'tuple'>

23
Q

Common uses of tuples include:

A

Returning multiple values from a function

Packing and unpacking sequences: You can use tuples to assign multiple values in a single line of code.

Dictionary keys: Because tuples are immutable, they can be used as dictionary keys, whereas lists cannot. (You’ll learn more about dictionaries later.)

Data integrity: Because tuples are immutable, they safeguard stored data against accidental changes.

Methods:

Because tuples are immutable, they have no methods that modify them in place. Python provides only two tuple methods:

count() returns the number of times a specified value occurs in the tuple.

index() searches the tuple for a specified value and returns the index of the first occurrence of the value.
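
The uses and methods listed above can be sketched in a few lines. This is only an illustration: the function name min_max and the sample data are made up for the example.

```python
def min_max(values):
    """Return two results at once; Python packs them into a tuple."""
    return min(values), max(values)

# Unpacking: assign both returned values in a single line.
low, high = min_max([3, 1, 4, 1, 5])

# Tuples are hashable, so they can serve as dictionary keys.
capitals = {('France', 'Europe'): 'Paris'}

# The two tuple methods: count() and index().
digits = (1, 2, 2, 3)
twos = digits.count(2)       # how many times 2 occurs
first_two = digits.index(2)  # position of the first 2

print(low, high, capitals[('France', 'Europe')], twos, first_two)
```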

24
Q

cities = ['Paris', 'Lagos', 'Mumbai']
countries = ['France', 'Nigeria', 'India']

places = zip(cities, countries)

print(places)
print(list(places))

A

<zip object at 0x7f6a4f94b8c8>

[('Paris', 'France'), ('Lagos', 'Nigeria'), ('Mumbai', 'India')]

____________________________________
Notice that, in this case, the list() function is used to generate a list of tuples from the iterator object. Here are a few things to keep in mind when using the zip() function.

It works with two or more iterable objects. The given example zips two sequences, but the zip() function will accept more sequences and apply the same logic.

If the input objects are of unequal length, the resulting iterator will be the same length as the shortest input.

If you give it only one iterable object as an argument, the function will return an iterator that produces tuples containing only one element from that iterable at a time.
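
The three points above can be checked with a small sketch. The lists here are hypothetical sample data, not part of the original card.

```python
letters = ['a', 'b', 'c']
numbers = [1, 2]
flags = [True, False, True]

# Unequal lengths: zip() stops at the shortest input.
pairs = list(zip(letters, numbers))

# More than two iterables: each tuple gets one element per input.
triples = list(zip(letters, numbers, flags))

# A single iterable: each tuple contains just one element.
singles = list(zip(letters))

print(pairs)
print(triples)
print(singles)
```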

25
Q

scientists = [('Nikola', 'Tesla'), ('Charles', 'Darwin'), ('Marie', 'Curie')]
given_names, surnames = zip(*scientists)

print(given_names)
print(surnames)

A

('Nikola', 'Charles', 'Marie')
('Tesla', 'Darwin', 'Curie')

____________________________________
Unzipping: You can also unzip an object with the * operator. Note that this operation unpacks the tuples in the original list element-wise into two tuples, thus separating the data into different variables that can be manipulated further.
26
Q

letters = ['a', 'b', 'c']
for index, letter in enumerate(letters):
    print(index, letter)

A

0 a
1 b
2 c

____________________________________
The enumerate() function is another built-in Python function that allows you to iterate over a sequence while keeping track of each element’s index. Similar to zip(), it returns an iterator that produces pairs of indices and elements.
27
Q

letters = ['a', 'b', 'c']
for index, letter in enumerate(letters, 2):
    print(index, letter)

A

2 a
3 b
4 c

Note that the default starting index is zero, but you can set it to whatever you want when you call the enumerate() function. In this case, the number two was passed as an argument to the function, and the first element of the resulting iterator had an index of two.

The enumerate() function is useful when an element’s place in a sequence must be used to determine how the element should be handled in an operation.
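
As a rough sketch of using an element's position to decide how it is handled, the example below numbers items starting at 1 and upper-cases the items at even positions. The list steps and the formatting rule are assumptions made for illustration.

```python
steps = ['mix', 'bake', 'cool']

# start=1 gives human-friendly numbering instead of zero-based indices.
numbered = []
for number, step in enumerate(steps, start=1):
    numbered.append(f'{number}. {step}')

# The index decides the handling: even positions are upper-cased.
styled = []
for index, step in enumerate(steps):
    if index % 2 == 0:
        styled.append(step.upper())
    else:
        styled.append(step)

print(numbered)
print(styled)
```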
28
Q

List comprehension

One of the most useful tools in Python is list comprehension. List comprehension is a concise and efficient way to create a new list based on the values in an existing iterable object. List comprehensions take the following form:

my_list = [expression for element in iterable if condition]

In this syntax:

expression - refers to an operation or what you want to do with each element in the iterable sequence.
element - is the variable name that you assign to represent each item in the iterable sequence.
iterable - is the iterable sequence.
condition - is any expression that evaluates to True or False. This element is optional and is used to filter elements of the iterable sequence.

A

numbers = [1, 2, 3, 4, 5]
new_list = [x + 10 for x in numbers]

print(new_list)
29
Q

numbers = [1, 2, 3, 4, 5]
new_list = [x + 10 for x in numbers]

print(new_list)

A

[11, 12, 13, 14, 15]

x + 10 is the expression, x is the element, and numbers is the iterable sequence. There is no condition.
30
Q

words = ['Emotan', 'Amina', 'Ibeno', 'Sankwala']
new_list = [(word[0], word[-1]) for word in words if len(word) > 5]

print(new_list)

A

[('E', 'n'), ('S', 'a')]

____________________________________
This list comprehension extracts the first and last letter of each word as a tuple, but only if the word is more than five letters long.
31
zip(), enumerate(), and list comprehension
make code more concise, and often more efficient, by reducing the need to write explicit loops when processing data and by simplifying work with iterables. Understanding these common tools will save you time when manipulating data.
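
To illustrate, here is a small sketch that combines all three tools and compares the result with an explicit loop. The names and scores are invented sample data.

```python
names = ['Ada', 'Grace', 'Alan']
scores = [91, 78, 85]

# zip() pairs each name with its score, enumerate() adds a position
# starting at 1, and the list comprehension keeps scores of 80 or more.
passed = [(position, name, score)
          for position, (name, score) in enumerate(zip(names, scores), start=1)
          if score >= 80]

# The same logic with an explicit loop and a manual counter, for comparison.
passed_loop = []
position = 1
for name, score in zip(names, scores):
    if score >= 80:
        passed_loop.append((position, name, score))
    position += 1

print(passed)
```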
32
Q

state_county_tuples = [('Arizona', 'Maricopa'),
                       ('California', 'Alameda'),
                       ('California', 'Sacramento'),
                       ('Kentucky', 'Jefferson'),
                       ('Louisiana', 'East Baton Rouge')]

# List comprehension to filter California counties
ca_counties = [county for state, county in state_county_tuples if state == "California"]

# Print the result
print(ca_counties)

A

['Alameda', 'Sacramento']
33
Q

state_county_tuples = [('Arizona', 'Maricopa'),
                       ('California', 'Alameda'),
                       ('California', 'Sacramento'),
                       ('Kentucky', 'Jefferson'),
                       ('Louisiana', 'East Baton Rouge')]

ca_counties = []
for (state, county) in state_county_tuples:
    if state == "California":
        ca_counties.append(county)

print(ca_counties)

A

['Alameda', 'Sacramento']
34
Q

Is this valid syntax?

isthisvalid_dict = {'numbers': 1, 2, 3}

A

No. Each key can correspond to only a single value, so this code throws an error (a SyntaxError).
35
Q

Is this valid syntax?

isthisvalid_dict = {'numbers': [1, 2, 3]}
print(isthisvalid_dict)

A

Yes. If you enclose multiple values within another single data structure, you can create a valid dictionary. The code prints:

{'numbers': [1, 2, 3]}
36
Q

my_dict = {'nums': [1, 2, 3], 'abc': ['a', 'b', 'c']}
print(my_dict['nums'])

A

[1, 2, 3]

To access a specific value in a dictionary, refer to its key using brackets.
37
Q

my_dict = {'nums': [1, 2, 3], 'abc': ['a', 'b', 'c']}
print(my_dict.values())

A

dict_values([[1, 2, 3], ['a', 'b', 'c']])

To access all values in a dictionary, use the values() method.
38
Q

my_dict = {'nums': [1, 2, 3], 'abc': ['a', 'b', 'c']}

# Add a new 'floats' key
my_dict['floats'] = [1.0, 2.0, 3.0]
print(my_dict)

A

{'nums': [1, 2, 3], 'abc': ['a', 'b', 'c'], 'floats': [1.0, 2.0, 3.0]}

Dictionaries are mutable data structures in Python. You can add to and modify existing dictionaries. To add a new key to a dictionary, use brackets.
39
Q

smallest_countries = {'Africa': 'Seychelles',
                      'Asia': 'Maldives',
                      'Europe': 'Vatican City',
                      'Oceania': 'Nauru',
                      'North America': 'St. Kitts and Nevis',
                      'South America': 'Suriname'}

print('Africa' in smallest_countries)
print('Asia' not in smallest_countries)

A

True
False

To check if a key exists in a dictionary, use the in keyword.
40
Q

my_dict = {'nums': [1, 2, 3], 'abc': ['a', 'b', 'c']}
del my_dict['abc']
print(my_dict)

A

{'nums': [1, 2, 3]}

To delete a key-value pair from a dictionary, use the del keyword.
41
Q

my_dict = {'nums': [1, 2, 3], 'abc': ['a', 'b', 'c']}
print(my_dict.items())

A

dict_items([('nums', [1, 2, 3]), ('abc', ['a', 'b', 'c'])])

Dictionaries are a core Python class. As you’ve learned, classes package data with tools to work with it. Methods are functions that belong to a class. Dictionaries have a number of very useful built-in methods. Some of the most commonly used are items(), keys(), and values(). The items() method shown here returns each key-value pair as a tuple.
42
my_dict = {'nums': [1, 2, 3], 'abc': ['a', 'b', 'c'] } print(my_dict.keys())
dict_keys(['nums', 'abc'])
43
my_dict = {'nums': [1, 2, 3], 'abc': ['a', 'b', 'c'] } print(my_dict.values())
dict_values([[1, 2, 3], ['a', 'b', 'c']])
44
Q

example_a = [1, 2, 2.0, '2']
set(example_a)

A

{1, 2, '2'}

It's a set. Notice that, in this example, 2 and 2.0 are evaluated as equivalent, even though one is an integer and the other is a float.
45
Q

example_b = ('apple', (1, 2, 2, 2, 3), 2)
set(example_b)

A

{'apple', 2, (1, 2, 2, 2, 3)}

In this example, (1, 2, 2, 2, 3) is a tuple, which is hashable (≈ immutable) and thus treated as a distinct single element in the resulting set.
46
Q

example_c = [1.5, {'a', 'b', 'c'}, 1.5]
set(example_c)

A

Error on line 2:
    set(example_c)
TypeError: unhashable type: 'set'

This example throws an error because each element of a set must be hashable (≈ immutable), but {'a', 'b', 'c'} is a set, which is a mutable (unhashable) object.
47
Q

example_d = {'mother', 'hamster', 'father'}
example_d.add('elderberries')
example_d

A

{'hamster', 'mother', 'father', 'elderberries'}
48
Q

Return a new set with elements from the set and all others. The operator for this function is the pipe ( | ).

set_1 = {'a', 'b', 'c'}
set_2 = {'b', 'c', 'd'}

print(set_1.union(set_2))
print(set_1 | set_2)

A

{'c', 'b', 'd', 'a'}
{'c', 'b', 'd', 'a'}

UNION
48
Q

example_e = [1.5, frozenset(['a', 'b', 'c']), 1.5]
set(example_e)

A

{1.5, frozenset({'a', 'c', 'b'})}

Unlike example_c previously, this set does not throw an error. This is because it contains a frozenset, which is an immutable type and can therefore be used in sets.
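
The difference between set and frozenset can be checked directly with the built-in hash() function. This is a minimal sketch; the variable names are illustrative only.

```python
mutable_set = {'a', 'b', 'c'}
frozen = frozenset(mutable_set)

# A regular set is mutable, so hashing it raises TypeError.
try:
    hash(mutable_set)
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

# A frozenset is immutable and hashable, so it can be a set element.
nested = {1.5, frozen}

print(set_is_hashable)
print(frozen in nested)
```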
49
Q

Return a new set with elements common to the set and all others. The operator for this function is the ampersand ( & ).

set_1 = {'a', 'b', 'c'}
set_2 = {'b', 'c', 'd'}

print(set_1.intersection(set_2))
print(set_1 & set_2)

A

{'b', 'c'}
{'b', 'c'}

INTERSECTION
50
Q

Return a new set with elements in the set that are not in the others. The operator for this function is the subtraction operator ( - ).

set_1 = {'a', 'b', 'c'}
set_2 = {'b', 'c', 'd'}

print(set_1.difference(set_2))
print(set_1 - set_2)

A

{'a'}
{'a'}

DIFFERENCE
51
Q

Return a new set with elements in either the set or other, but not both. The operator for this function is the caret ( ^ ).

set_1 = {'a', 'b', 'c'}
set_2 = {'b', 'c', 'd'}

print(set_1.symmetric_difference(set_2))
print(set_1 ^ set_2)

A

{'d', 'a'}
{'d', 'a'}

SYMMETRIC DIFFERENCE
52
How could you create a list of tuples called epa_tuples using the following lists?: state_list county_list aqi_list
epa_tuples = list(zip(state_list, county_list, aqi_list))
53
Fill in the blank: In Python, a dictionary’s _____ must be immutable.
keys
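
A short sketch of this rule in practice. Strictly speaking, keys must be hashable, which for built-in types means immutable. The coordinates and city names are made-up sample data.

```python
coordinates = {}

# A tuple is immutable (hashable), so it works as a key.
coordinates[(48.86, 2.35)] = 'Paris'

# A list is mutable (unhashable), so using it as a key raises TypeError.
try:
    coordinates[[6.52, 3.38]] = 'Lagos'
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(coordinates)
print(list_key_allowed)
```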
54
In Python, what does the items() method retrieve?
Both a dictionary’s keys and values
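
For instance, items() is commonly used to loop over keys and values together. The sketch below reuses two entries from the smallest_countries example; the output format is an arbitrary choice.

```python
smallest_countries = {'Africa': 'Seychelles', 'Asia': 'Maldives'}

# items() yields (key, value) tuples, which unpack neatly in a loop.
lines = []
for continent, country in smallest_countries.items():
    lines.append(f'{continent}: {country}')

print(lines)
```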
55
A data professional is working with two Python sets. What function can they use to combine the sets (i.e., find all of the distinct elements that exist in one or both sets)?
union()
56
How can you import numpy and then name it np?
import numpy as np
57
Q

What are the common aliases for numpy, pandas, seaborn, and matplotlib.pyplot?

A

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
58
import numpy as np To use the array() function on [2, 4, 6], you’d write:
np.array([2, 4, 6])
59
array_2d = np.array([(1, 2, 3), (4, 5, 6)]) array_2d
[[1 2 3] [4 5 6]]
60
array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) array_3d
[[[1 2] [3 4]] [[5 6] [7 8]]]
61
np.zeros((3, 2))
[[ 0. 0.] [ 0. 0.] [ 0. 0.]]
62
np.ones((2, 2))
[[ 1. 1.] [ 1. 1.]]
63
Q

np.full((5, 3), 8)

A

[[8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]]

This creates an array of a designated shape that is pre-filled with a specified value. In current NumPy versions, the result takes the data type of the fill value (integers here) unless a dtype is specified.

np.zeros(), np.ones(), and np.full() are useful for various situations:

To initialize an array of a specific size and shape, then fill it with values derived from a calculation
To allocate memory for later use
To perform matrix operations
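
As a sketch of the first use case, the example below allocates an array with np.zeros() and then fills it with computed values. The array name table and the values stored in it are assumptions for illustration.

```python
import numpy as np

# Allocate a 3x2 array of zeros, then fill it with computed values.
table = np.zeros((3, 2))
for row in range(3):
    table[row, 0] = row        # first column: the row number
    table[row, 1] = row ** 2   # second column: its square

# np.full infers the dtype from the fill value unless dtype is given.
eights = np.full((2, 2), 8)
eights_float = np.full((2, 2), 8, dtype=float)

print(table)
print(eights.dtype, eights_float.dtype)
```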
64
Q

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
array_2d.flatten()

A

[[1 2 3]
 [4 5 6]]

[1 2 3 4 5 6]
65
Q

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
array_2d.reshape(3, 2)

A

[[1 2 3]
 [4 5 6]]

[[1 2]
 [3 4]
 [5 6]]

This gives a new shape to an array without changing its data.
66
Q

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
array_2d.reshape(3, -1)

A

[[1 2 3]
 [4 5 6]]

[[1 2]
 [3 4]
 [5 6]]

Passing -1 for one dimension of the new shape is a convenience: it tells NumPy to infer that dimension automatically from the array's size and the other given dimensions.
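
The inference works for any single dimension, as this small sketch shows. The 12-element array is an arbitrary example, and the ValueError case illustrates what happens when the sizes cannot match.

```python
import numpy as np

array_1d = np.arange(12)  # 12 elements: 0 through 11

# NumPy infers the missing dimension: 12 elements / 3 rows = 4 columns.
three_rows = array_1d.reshape(3, -1)

# -1 can stand in for any one dimension, here the number of rows.
two_cols = array_1d.reshape(-1, 2)

# If the sizes cannot match (12 is not divisible by 5), NumPy raises ValueError.
try:
    array_1d.reshape(5, -1)
    bad_shape_ok = True
except ValueError:
    bad_shape_ok = False

print(three_rows.shape, two_cols.shape, bad_shape_ok)
```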
67
array_2d = np.array([(1, 2, 3), (4, 5, 6)]) print(array_2d) print() array_2d.tolist()
[[1 2 3] [4 5 6]] [[1, 2, 3], [4, 5, 6]] This converts an array to a list object. Multidimensional arrays are converted to nested lists.
68
Q

a = np.array([(1, 2, 3), (4, 5, 6)])
print(a)
print()

print(a.max())
print(a.mean())
print(a.min())
print(a.std())

A

[[1 2 3]
 [4 5 6]]

6
3.5
1
1.70782512766

NumPy arrays also have many methods that are mathematical functions:

ndarray.max() : returns the maximum value in the array or along a specified axis.
ndarray.mean() : returns the mean of all the values in the array or along a specified axis.
ndarray.min() : returns the minimum value in the array or along a specified axis.
ndarray.std() : returns the standard deviation of all the values in the array or along a specified axis.
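
The "along a specified axis" part can be sketched with the same array. For a 2-D array, axis=0 aggregates down each column and axis=1 aggregates across each row.

```python
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])

col_max = a.max(axis=0)    # maximum of each column
row_mean = a.mean(axis=1)  # mean of each row

print(col_max)   # [4 5 6]
print(row_mean)  # [2. 5.]
```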
69
Q

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()

print(array_2d.shape)
print(array_2d.dtype)
print(array_2d.size)
print(array_2d.T)

A

[[1 2 3]
 [4 5 6]]

(2, 3)
int64
6
[[1 4]
 [2 5]
 [3 6]]

NumPy arrays have several attributes that enable you to access information about the array. Some of the most commonly used attributes include the following:

ndarray.shape : returns a tuple of the array’s dimensions.
ndarray.dtype : returns the data type of the array’s contents.
ndarray.size : returns the total number of elements in the array.
ndarray.T : returns the array transposed (rows become columns, columns become rows).
70
a = np.array([(1, 2, 3), (4, 5, 6)]) print(a) print() print(a[1]) print(a[0, 1]) print(a[1, 2])
[[1 2 3] [4 5 6]] [4 5 6] 2 6 Access individual elements of a NumPy array using indexing and slicing. Indexing in NumPy is similar to indexing in Python lists, except multiple indices can be used to access elements in multidimensional arrays.
71
a = np.array([(1, 2, 3), (4, 5, 6)]) print(a) print() a[:, 1:]
[[1 2 3] [4 5 6]] [[2 3] [5 6]] Slicing may also be used to access subarrays of a NumPy array:
72
Q

a = np.array([(1, 2, 3), (4, 5, 6)])
b = np.array([[1, 2, 3], [1, 2, 3]])

print('a:')
print(a)
print()
print('b:')
print(b)
print()
print('a + b:')
print(a + b)
print()
print('a * b:')
print(a * b)

A

a:
[[1 2 3]
 [4 5 6]]

b:
[[1 2 3]
 [1 2 3]]

a + b:
[[2 4 6]
 [5 7 9]]

a * b:
[[ 1  4  9]
 [ 4 10 18]]

NumPy arrays support many operations, including mathematical functions and arithmetic. These include array addition and multiplication, which perform element-wise arithmetic on arrays.
73
a = np.array([(1, 2), (3, 4)]) print(a) print() a[1][1] = 100 a
[[1 2] [3 4]] [[ 1 2] [ 3 100]] NumPy arrays are mutable, but with certain limitations. For instance, an existing element of an array can be changed:
74
Q

a = np.array([1, 2, 3])
print(a)
print()

a[3] = 100
a

A

Error on line 5:
    a[3] = 100
IndexError: index 3 is out of bounds for axis 0 with size 3

Arrays cannot be lengthened or shortened.
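
When a longer array is needed, one option is np.append(), which builds a new array rather than resizing the original. This is a minimal sketch of that behavior, not the only way to grow data.

```python
import numpy as np

a = np.array([1, 2, 3])

# np.append does not modify `a`; it returns a new, longer array.
b = np.append(a, 100)

print(a)
print(b)
```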
75
A _____________ is a two-dimensional labeled data structure—essentially a table or spreadsheet—where each column and row is represented by a Series.
DataFrame
76
A _______________ is a one-dimensional labeled array that can hold any data type. It’s similar to a column in a spreadsheet or a one-dimensional NumPy array. Each element in a series has an associated label called an index. The index allows for more efficient and intuitive data manipulation by making it easier to reference specific elements of your data.
Series
77
How do you import pandas?
import pandas as pd
78
Q

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df

A

   col1  col2
0     1     3
1     2     4

Creates a DataFrame from a dictionary.
79
Q

df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df2

A

   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

Creates a DataFrame from a NumPy array.
80
df3 = pd.read_csv('/file_path/file_name.csv')
Creates a DataFrame from a CSV file.
81
Q

df = pd.DataFrame({
    'A': ['alpha', 'apple', 'arsenic', 'angel', 'android'],
    'B': [1, 2, 3, 4, 5],
    'C': ['coconut', 'curse', 'cassava', 'cuckoo', 'clarinet'],
    'D': [6, 7, 8, 9, 10]
    },
    index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])
df

A

             A  B         C   D
row_0    alpha  1   coconut   6
row_1    apple  2     curse   7
row_2  arsenic  3   cassava   8
row_3    angel  4    cuckoo   9
row_4  android  5  clarinet  10

This dataframe has named row indices, which loc[] can use to select rows by name.
82
Q

print(df.loc['row_1'])

A

A    apple
B        2
C    curse
D        7
Name: row_1, dtype: object

The row index of the dataframe contains the names of the rows. Use loc[] to select rows by name.
83
Q

print(df.loc[['row_1']])

A

           A  B      C  D
row_1  apple  2  curse  7

Inserting just the row index name in selector brackets returns a Series object. Inserting the row index name as a list returns a DataFrame object.
84
print(df.loc[['row_2', 'row_4']])
A B C D row_2 arsenic 3 cassava 8 row_4 android 5 clarinet 10 To select multiple rows by name, use a list within selector brackets:
85
print(df.loc['row_0':'row_3'])
A B C D row_0 alpha 1 coconut 6 row_1 apple 2 curse 7 row_2 arsenic 3 cassava 8 row_3 angel 4 cuckoo 9 You can even specify a range of rows by named index:
86
print(df) print() print(df.iloc[1])
A B C D row_0 alpha 1 coconut 6 row_1 apple 2 curse 7 row_2 arsenic 3 cassava 8 row_3 angel 4 cuckoo 9 row_4 android 5 clarinet 10 A apple B 2 C curse D 7 Name: row_1, dtype: object iloc[] lets you select rows by numeric position, similar to how you would access elements of a list or an array. Here’s an example.
87
print(df.iloc[[1]])
A B C D row_1 apple 2 curse 7 Inserting just the row index number in selector brackets returns a Series object. Inserting the row index number as a list returns a DataFrame object:
88
print(df.iloc[[0, 2, 4]])
A B C D row_0 alpha 1 coconut 6 row_2 arsenic 3 cassava 8 row_4 android 5 clarinet 10 To select multiple rows by index number, use a list within selector brackets:
89
print(df.iloc[0:3])
A B C D row_0 alpha 1 coconut 6 row_1 apple 2 curse 7 row_2 arsenic 3 cassava 8 Specify a range of rows by index number:
90
print(df['C'])
row_0 coconut row_1 curse row_2 cassava row_3 cuckoo row_4 clarinet Name: C, dtype: object Column selection works the same way as row selection, but there are also some shortcuts to make the process easier. For example, to select an individual column, simply put it in selector brackets after the name of the dataframe:
91
print(df[['A', 'C']])
A C row_0 alpha coconut row_1 apple curse row_2 arsenic cassava row_3 angel cuckoo row_4 android clarinet And to select multiple columns, use a list in selector brackets:
92
print(df.A)
row_0 alpha row_1 apple row_2 arsenic row_3 angel row_4 android Name: A, dtype: object Dot notation It’s possible to select columns using dot notation instead of bracket notation. For example:
93
print(df) print() print(df.loc[:, ['B', 'D']])
A B C D row_0 alpha 1 coconut 6 row_1 apple 2 curse 7 row_2 arsenic 3 cassava 8 row_3 angel 4 cuckoo 9 row_4 android 5 clarinet 10 B D row_0 1 6 row_1 2 7 row_2 3 8 row_3 4 9 row_4 5 10 Note that when using loc[] to select columns, you must specify rows as well. In this example, all rows were selected using just a colon (:).
94
print(df.iloc[:, [1,3]])
B D row_0 1 6 row_1 2 7 row_2 3 8 row_3 4 9 row_4 5 10 Similarly, you can use iloc[] notation. Again, when using iloc[], you must specify rows, even if you want to select all rows:
95
print(df.loc['row_0':'row_2', ['A','C']])
A C row_0 alpha coconut row_1 apple curse row_2 arsenic cassava Both loc[] and iloc[] can be used to select specific rows and columns together.
96
Q

print(df.iloc[[2, 4], 0:3])

A

             A  B         C
row_2  arsenic  3   cassava
row_4  android  5  clarinet

Notice that iloc[] excludes the end of a range (0:3 selects column positions 0, 1, and 2), whereas a range selected with loc[] includes its final element in the results.
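
The difference in how loc[] and iloc[] treat the end of a range can be checked with a small sketch. The three-row dataframe below is a cut-down, hypothetical version of the one used in these cards.

```python
import pandas as pd

df = pd.DataFrame({'A': ['alpha', 'apple', 'arsenic'],
                   'B': [1, 2, 3]},
                  index=['row_0', 'row_1', 'row_2'])

# loc[] slices by label and INCLUDES the end label.
by_label = df.loc['row_0':'row_1']

# iloc[] slices by position and EXCLUDES the end position.
by_position = df.iloc[0:1]

print(len(by_label))     # 2
print(len(by_position))  # 1
```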
97
Q

print(df.loc[0:3, ['D']])

A

Error on line 1:
    print(df.loc[0:3, ['D']])

Note that, when using rows with named indices, you cannot mix numeric and named notation. This code throws an error because loc[] expects row labels, and 0:3 is a range of positions.
98
Q

This is most convenient for viewing:

print(df.iloc[0:3][['D']])

But this is best practice and more stable for assignment or manipulation:

print(df.loc[df.index[0:3], 'D'])

A

       D
row_0  6
row_1  7
row_2  8

row_0    6
row_1    7
row_2    8
Name: D, dtype: int64

To view rows [0:3] at column 'D' (if you don’t know the index number of column D), you’d have to use selector brackets after an iloc[] statement.
99
df = pd.DataFrame({ 'A': ['alpha', 'apple', 'arsenic', 'angel', 'android'], 'B': [1, 2, 3, 4, 5], 'C': ['coconut', 'curse', 'cassava', 'cuckoo', 'clarinet'], 'D': [6, 7, 8, 9, 10] }, ) df
A B C D 0 alpha 1 coconut 6 1 apple 2 curse 7 2 arsenic 3 cassava 8 3 angel 4 cuckoo 9 4 android 5 clarinet 10 However, in many (perhaps most) cases your rows will not have named indices, but rather numeric indices. In this case, you can mix numeric and named notation. For example, here’s the same dataset, but with numeric indices instead of named indices.
100
print(df.loc[0:3, ['D']])
D 0 6 1 7 2 8 3 9 Notice that the rows are enumerated now. Now, this code will execute without error:
101
________________________ is a filtering technique that overlays a Boolean grid onto a dataframe in order to select only the values in the dataframe that align with the True values of the grid.
Boolean masking
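
A minimal sketch of the idea, using a three-row subset of the planets data that appears in the following cards:

```python
import pandas as pd

df = pd.DataFrame({'planet': ['Mercury', 'Earth', 'Jupiter'],
                   'moons': [0, 1, 80]})

# Comparing a column to a value yields a Boolean Series (the mask).
mask = df['moons'] < 20

# Indexing with the mask keeps only the rows where it is True.
few_moons = df[mask]

print(mask.tolist())
print(few_moons['planet'].tolist())
```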
102
Q

data = {'planet': ['Mercury', 'Venus', 'Earth', 'Mars',
                   'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
        'radius_km': [2440, 6052, 6371, 3390, 69911, 58232, 25362, 24622],
        'moons': [0, 0, 1, 2, 80, 83, 27, 14]
        }
df = pd.DataFrame(data)
df

A

   moons   planet  radius_km
0      0  Mercury       2440
1      0    Venus       6052
2      1    Earth       6371
3      2     Mars       3390
4     80  Jupiter      69911
5     83   Saturn      58232
6     27   Uranus      25362
7     14  Neptune      24622
103
print(df['moons'] < 20) moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 4 80 Jupiter 69911 5 83 Saturn 58232 6 27 Uranus 25362 7 14 Neptune 24622
0 True 1 True 2 True 3 True 4 False 5 False 6 False 7 True Name: moons, dtype: bool
104
print(df[df['moons'] < 20]) moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 4 80 Jupiter 69911 5 83 Saturn 58232 6 27 Uranus 25362 7 14 Neptune 24622
moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 7 14 Neptune 24622
105
mask = df['moons'] < 20 df[mask] ____________________________ moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 4 80 Jupiter 69911 5 83 Saturn 58232 6 27 Uranus 25362 7 14 Neptune 24622
moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 7 14 Neptune 24622
106
mask = df['moons'] < 20 df2 = df[mask] df2 _______________________________________ moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 4 80 Jupiter 69911 5 83 Saturn 58232 6 27 Uranus 25362 7 14 Neptune 24622
moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 7 14 Neptune 24622
107
mask = df['moons'] < 20 df.loc[mask, 'planet'] _______________________________________ moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 4 80 Jupiter 69911 5 83 Saturn 58232 6 27 Uranus 25362 7 14 Neptune 24622
0 Mercury 1 Venus 2 Earth 3 Mars 7 Neptune Name: planet, dtype: object
108
mask = (df['moons'] < 10) | (df['moons'] > 50) mask ______________________________________________ moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 4 80 Jupiter 69911 5 83 Saturn 58232 6 27 Uranus 25362 7 14 Neptune 24622
0 True 1 True 2 True 3 True 4 True 5 True 6 False 7 False Name: moons, dtype: bool
109
mask = (df['moons'] < 10) | (df['moons'] > 50) df[mask] ______________________________________________ moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 4 80 Jupiter 69911 5 83 Saturn 58232 6 27 Uranus 25362 7 14 Neptune 24622
moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 4 80 Jupiter 69911 5 83 Saturn 58232
110
mask = (df['moons'] > 20) & ~(df['moons'] == 80) & ~(df['radius_km'] < 50000) df[mask] ___________________________________________________________ moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 4 80 Jupiter 69911 5 83 Saturn 58232 6 27 Uranus 25362 7 14 Neptune 24622
moons planet radius_km 5 83 Saturn 58232
111
mask = (df['moons'] > 20) & (df['moons'] != 80) & (df['radius_km'] >= 50000) df[mask] ______________________________________________________ moons planet radius_km 0 0 Mercury 2440 1 0 Venus 6052 2 1 Earth 6371 3 2 Mars 3390 4 80 Jupiter 69911 5 83 Saturn 58232 6 27 Uranus 25362 7 14 Neptune 24622
moons planet radius_km 5 83 Saturn 58232
112
Q

clothes = pd.DataFrame({'type': ['pants', 'shirt', 'shirt', 'pants', 'shirt', 'pants'],
                        'color': ['red', 'blue', 'green', 'blue', 'green', 'red'],
                        'price_usd': [20, 35, 50, 40, 100, 75],
                        'mass_g': [125, 440, 680, 200, 395, 485]})
clothes

A

   color  mass_g  price_usd   type
0    red     125         20  pants
1   blue     440         35  shirt
2  green     680         50  shirt
3   blue     200         40  pants
4  green     395        100  shirt
5    red     485         75  pants
113
grouped = clothes.groupby('type') print(grouped) print(type(grouped))
114
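Q

grouped = clothes.groupby('type')
grouped.mean(numeric_only=True)

____________________________________
   color  mass_g  price_usd   type
0    red     125         20  pants
1   blue     440         35  shirt
2  green     680         50  shirt
3   blue     200         40  pants
4  green     395        100  shirt
5    red     485         75  pants

A

       mass_g  price_usd
type
pants   270.0  45.000000
shirt   505.0  61.666667

Passing numeric_only=True restricts the mean to the numeric columns. Recent pandas versions raise an error if mean() is applied to a text column such as color.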
115
clothes.groupby(['type', 'color']).min()
mass_g price_usd type color pants blue 200 40 red 125 20 shirt blue 440 35 green 395 50
116
Q

clothes.groupby(['type', 'color']).size()

A

type   color
pants  blue     1
       red      2
shirt  blue     1
       green    2
dtype: int64

To simply return the number of observations in each group, use the size() method. This results in a Series object with the relevant information.
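
size() counts every row in a group, while count() counts only non-null values. The sketch below shows the difference on a tiny, made-up dataframe with one missing price.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type': ['pants', 'pants', 'shirt'],
                   'price_usd': [20, np.nan, 35]})

grouped = df.groupby('type')

sizes = grouped.size()                 # rows per group, NaN included
counts = grouped['price_usd'].count()  # non-null values per group

print(sizes)
print(counts)
```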
117
count(): The number of non-null values in each group sum(): The sum of values in each group mean(): The mean of values in each group median(): The median of values in each group min(): The minimum value in each group max(): The maximum value in each group std(): The standard deviation of values in each group var(): The variance of values in each group
count(): The number of non-null values in each group sum(): The sum of values in each group mean(): The mean of values in each group median(): The median of values in each group min(): The minimum value in each group max(): The maximum value in each group std(): The standard deviation of values in each group var(): The variance of values in each group
118
clothes[['price_usd', 'mass_g']].agg(['sum', 'mean']) _______________________________________ color mass_g price_usd type 0 red 125 20 pants 1 blue 440 35 shirt 2 green 680 50 shirt 3 blue 200 40 pants 4 green 395 100 shirt 5 red 485 75 pants
119
clothes.agg({'price_usd': 'sum', 'mass_g': ['mean', 'median'] }) _______________________________________ color mass_g price_usd type 0 red 125 20 pants 1 blue 440 35 shirt 2 green 680 50 shirt 3 blue 200 40 pants 4 green 395 100 shirt 5 red 485 75 pants
120
clothes[['price_usd', 'mass_g']].agg(['sum', 'mean'], axis=1) _______________________________________ color mass_g price_usd type 0 red 125 20 pants 1 blue 440 35 shirt 2 green 680 50 shirt 3 blue 200 40 pants 4 green 395 100 shirt 5 red 485 75 pants
121
Q

clothes.groupby('color').agg({'price_usd': ['mean', 'max'],
                              'mass_g': ['mean', 'max']})

A

      price_usd      mass_g
           mean  max   mean  max
color
blue       37.5   40  320.0  440
green      75.0  100  537.5  680
red        47.5   75  305.0  485

The items in clothes are grouped by color, then the mean() and max() functions are applied to each group at the price_usd and mass_g columns.
122
grouped = clothes.groupby(['color', 'type']).agg(['mean', 'min']) grouped
mass_g price_usd mean min mean min color type blue pants 200.0 200 40.0 40 shirt 440.0 440 35.0 35 green shirt 537.5 395 75.0 50 red pants 305.0 125 47.5 20
123
grouped.loc[:, 'price_usd'] _____________________________________________________ mass_g price_usd mean min mean min color type blue pants 200.0 200 40.0 40 shirt 440.0 440 35.0 35 green shirt 537.5 395 75.0 50 red pants 305.0 125 47.5 20
mean min color type blue pants 40.0 40 shirt 35.0 35 green shirt 75.0 50 red pants 47.5 20
124
grouped.loc[:, ('price_usd', 'min')]
_____________________________________________________
            mass_g      price_usd
              mean  min      mean min
color type
blue  pants  200.0  200      40.0  40
      shirt  440.0  440      35.0  35
green shirt  537.5  395      75.0  50
red   pants  305.0  125      47.5  20

color  type
blue   pants    40
       shirt    35
green  shirt    50
red    pants    20
Name: (price_usd, min), dtype: int64
125
grouped.loc['blue', :]
_____________________________________________________
            mass_g      price_usd
              mean  min      mean min
color type
blue  pants  200.0  200      40.0  40
      shirt  440.0  440      35.0  35
green shirt  537.5  395      75.0  50
red   pants  305.0  125      47.5  20

      mass_g      price_usd
        mean  min      mean min
type
pants  200.0  200      40.0  40
shirt  440.0  440      35.0  35
126
grouped.loc[('green', 'shirt'), :]
_____________________________________________________
            mass_g      price_usd
              mean  min      mean min
color type
blue  pants  200.0  200      40.0  40
      shirt  440.0  440      35.0  35
green shirt  537.5  395      75.0  50
red   pants  305.0  125      47.5  20

mass_g     mean    537.5
           min     395.0
price_usd  mean     75.0
           min      50.0
Name: (green, shirt), dtype: float64
127
grouped.loc[('blue', 'shirt'), ('mass_g', 'mean')]
_____________________________________________________
            mass_g      price_usd
              mean  min      mean min
color type
blue  pants  200.0  200      40.0  40
      shirt  440.0  440      35.0  35
green shirt  537.5  395      75.0  50
red   pants  305.0  125      47.5  20
440.0
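The five loc selections on cards 123 to 127 can be tried side by side. This is a sketch under the assumption that clothes holds the six rows shown on the earlier cards; the construction code and the result variable names are illustrative, not from the original deck:

```python
import pandas as pd

# Rebuild clothes from the table printed on the earlier cards (assumed construction).
clothes = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'blue', 'green', 'red'],
    'mass_g': [125, 440, 680, 200, 395, 485],
    'price_usd': [20, 35, 50, 40, 100, 75],
    'type': ['pants', 'shirt', 'shirt', 'pants', 'shirt', 'pants'],
})
grouped = clothes.groupby(['color', 'type']).agg(['mean', 'min'])

# Top-level column label: returns a DataFrame with the 'mean' and 'min' columns.
price_stats = grouped.loc[:, 'price_usd']
# Full column tuple: returns a single Series.
price_min = grouped.loc[:, ('price_usd', 'min')]
# Top-level row label: returns a DataFrame indexed by the remaining level (type).
blue_rows = grouped.loc['blue', :]
# Full row tuple: returns one row as a Series.
green_shirt = grouped.loc[('green', 'shirt'), :]
# Full row tuple and full column tuple: returns a scalar.
value = grouped.loc[('blue', 'shirt'), ('mass_g', 'mean')]
print(value)
```

A partial label (one level) keeps a table; a full tuple on one axis narrows to a Series, and full tuples on both axes narrow to a single value.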
128
clothes.groupby(['color', 'type'], as_index=False).mean()
_____________________________________________________
            mass_g      price_usd
              mean  min      mean min
color type
blue  pants  200.0  200      40.0  40
      shirt  440.0  440      35.0  35
green shirt  537.5  395      75.0  50
red   pants  305.0  125      47.5  20

   color   type  mass_g  price_usd
0   blue  pants   200.0       40.0
1   blue  shirt   440.0       35.0
2  green  shirt   537.5       75.0
3    red  pants   305.0       47.5
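To see what as_index=False changes, a short sketch (assuming pandas, with clothes rebuilt from the table on the earlier cards; the names indexed and flat are illustrative) compares the default result with the flat one:

```python
import pandas as pd

# Rebuild clothes from the table printed on the earlier cards (assumed construction).
clothes = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'blue', 'green', 'red'],
    'mass_g': [125, 440, 680, 200, 395, 485],
    'price_usd': [20, 35, 50, 40, 100, 75],
    'type': ['pants', 'shirt', 'shirt', 'pants', 'shirt', 'pants'],
})

# Default: the grouping columns become a (color, type) MultiIndex.
indexed = clothes.groupby(['color', 'type']).mean()

# as_index=False: the grouping columns stay as ordinary columns
# and the rows get a default integer index.
flat = clothes.groupby(['color', 'type'], as_index=False).mean()
print(flat)
```

The flat form is often easier to merge or plot, while the indexed form supports label lookups such as indexed.loc[('green', 'shirt')].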
129
By default, returns the first 5 rows of the DataFrame, in whatever order the rows currently appear.
.head()
130
What is a pandas method that groups rows of a dataframe together based on their values at one or more columns?
groupby()
131
Fill in the blank: In pandas, a _____ is a one-dimensional, labeled array.
series
132
A data professional wants to join two dataframes together. The dataframes contain identically formatted data that needs to be combined vertically. What pandas function can the data professional use to join the dataframes?
concat()
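A minimal sketch of vertical concatenation, assuming pandas; the two small dataframes and their column names are made up for illustration:

```python
import pandas as pd

# Two hypothetical dataframes with identical columns.
first = pd.DataFrame({'city': ['Lagos', 'Lima'], 'sales': [10, 20]})
second = pd.DataFrame({'city': ['Oslo'], 'sales': [30]})

# axis=0 stacks rows; ignore_index=True renumbers the rows 0..n-1
# instead of keeping each frame's original index.
combined = pd.concat([first, second], axis=0, ignore_index=True)
print(combined)
```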
133
A data professional is working with a list named cities that contains data on global cities. What Python code can they use to add the string 'Mumbai' as the second element in the list?
cities.insert(1, 'Mumbai')
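As a quick check, with a hypothetical starting list (the city names other than 'Mumbai' are assumptions):

```python
# Hypothetical starting data.
cities = ['Tokyo', 'Cairo', 'Lima']

# Index 1 is the second position; later elements shift right.
cities.insert(1, 'Mumbai')
print(cities)  # ['Tokyo', 'Mumbai', 'Cairo', 'Lima']
```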
134
Fill in the blank: In Python, a dictionary’s keys must be _____.
immutable (more precisely, hashable)
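A small sketch of why this matters, using made-up data: Python requires keys to be hashable, which in practice means immutable built-ins such as strings, numbers, and tuples of immutable values.

```python
# A tuple of strings is hashable, so it works as a key (hypothetical data).
office_by_location = {('Paris', 'FR'): 'HQ', ('Lima', 'PE'): 'Branch'}

# A list is mutable and unhashable, so using one as a key raises TypeError.
try:
    bad = {['Paris', 'FR']: 'HQ'}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(list_key_allowed)  # False
```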
135
A data professional is working with a dictionary named employees that contains employee data for a healthcare company. What Python code can they use to retrieve both the dictionary’s keys and values?
employees.items()
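For example, with a hypothetical employees dictionary (names and roles are invented), items() yields key-value pairs that can be unpacked in a loop:

```python
# Hypothetical data.
employees = {'Ana': 'Nurse', 'Ben': 'Pharmacist'}

# Each item is a (key, value) tuple.
pairs = list(employees.items())
for name, role in employees.items():
    print(name, role)
```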
136
A data professional is working with two Python sets. What function can they use to find the elements present in one set, but not the other?
difference()
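A brief sketch with two made-up sets. Note that difference() is one-directional; symmetric_difference() is the related method for elements in either set but not both.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

# Elements of a that are not in b (same as a - b).
only_in_a = a.difference(b)

# Elements in exactly one of the two sets (same as a ^ b).
in_one_only = a.symmetric_difference(b)

print(only_in_a, in_one_only)
```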
137
A data professional is working with a NumPy array that has three rows and two columns. They want to change the data into two rows and three columns. What method can they use to do so?
reshape() The reshape() method in NumPy is used to change the shape of an array without altering its data. In this case, if the array has three rows and two columns, the data professional can use reshape() to transform it into two rows and three columns.
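A minimal sketch, assuming NumPy is installed and using made-up values:

```python
import numpy as np

# Three rows, two columns.
arr = np.array([[1, 2],
                [3, 4],
                [5, 6]])

# Same six values, read in row-major order, laid out as two rows of three.
reshaped = arr.reshape(2, 3)
print(reshaped)
# [[1 2 3]
#  [4 5 6]]
```

The new shape must hold the same number of elements; arr.reshape(4, 2) would raise a ValueError here.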
138
A data professional is working with a pandas dataframe named sales that contains sales data for a retail website. They want to know the price of the most expensive item. What code can they use to calculate the maximum value of the Price column?
sales['Price'].max()
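A quick sketch with a hypothetical sales dataframe (only the Price column name comes from the card; the Item column and all values are assumptions):

```python
import pandas as pd

# Hypothetical data.
sales = pd.DataFrame({
    'Item': ['mug', 'lamp', 'pen'],
    'Price': [19.99, 249.0, 5.5],
})

# Largest value in the Price column.
top_price = sales['Price'].max()

# idxmax() gives the row label of that maximum, useful for finding the item.
top_item = sales.loc[sales['Price'].idxmax(), 'Item']
print(top_price, top_item)
```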
139
A data professional is working with a pandas dataframe. They want to select a subset of rows and columns by index. What method can they use to do so?
iloc[] iloc is used to select rows and columns based on their integer index positions, rather than their labels. You can specify the row and column indices to select the desired subset.
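A short sketch contrasting iloc with the label-based loc, assuming pandas and a made-up dataframe:

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {'name': ['a', 'b', 'c'], 'qty': [1, 2, 3], 'price': [9.5, 4.0, 7.25]},
    index=['x', 'y', 'z'],
)

# Rows at positions 0 and 1, columns at positions 1 and 2 (end is exclusive).
subset = df.iloc[0:2, 1:3]

# The label-based equivalent with loc includes the end label.
same_by_label = df.loc['x':'y', 'qty':'price']
print(subset)
```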
140
A data professional wants to merge two pandas dataframes. They want to join the data so all of the keys in the left dataframe are included—even if they are not in the right dataframe. What technique can they use to do so?
Left join
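In pandas, a left join is typically written with merge() and how='left'. A minimal sketch with two made-up dataframes:

```python
import pandas as pd

# Hypothetical data.
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'c', 'd'], 'y': [10, 30, 40]})

# Every key from left is kept; keys missing from right get NaN in y,
# and keys found only in right ('d') are dropped.
merged = left.merge(right, on='key', how='left')
print(merged)
```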