2 Introduction to Python (II) Flashcards
Matplotlib, dictionaries, dataframes... https://colab.research.google.com/drive/1fKMFrRbIJQE8Tpa06us0qQPnamBn957z?usp=sharing
1 What is Matplotlib?
A plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits.
2 Complete code:
import matplot… as …
import matplotlib.pyplot as plt
3 Make a line plot (year x-axis, pop y-axis)
year = ['1975', '1976', '1977']
pop = [2340, 2405, 2890]
import matplotlib.pyplot as plt
plt.plot(year, pop)
plt.show()
4 How to display a matplotlib plot?
plt.show()
5 Print the last item of the list year:
year = ['1975', '1976', '1977']
print(year[-1])
print(year[2])
6 What is a scatter plot?
A type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data
7 Complete code (scatter plot):
x = [1, 3, 5]
y = [2, 6, 7]
'''import mat…
…
plt.show()'''
import matplotlib.pyplot as plt
plt.scatter(x, y)
plt.show()
8 Change the line plot below to a scatter plot
year = ['1975', '1976', '1977']
pop = [2340, 2405, 2890]
import matplotlib.pyplot as plt
plt.plot(year, pop)
plt.show()
plt.scatter(year, pop)
plt.show()
9 Put the x-axis on a logarithmic scale
day = [1, 2, 3]
virus = [18, 55, 320]
import matplotlib.pyplot as plt
plt.scatter(day, virus)
plt.show()
plt.scatter(day, virus)
plt.xscale('log')
plt.show()
10 What is a correlation coefficient?
A value that indicates the strength and direction of the linear relationship between two variables. It ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to 1 (perfect positive relationship).
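As a rough illustration of this definition, the sketch below computes the Pearson correlation coefficient by hand. The function name `pearson_r` and the `hours` and `grades` data are hypothetical and chosen only for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

hours = [1, 2, 3, 4, 5]        # hypothetical study hours
grades = [2, 4, 6, 8, 10]      # hypothetical grades, rising in step with hours
print(pearson_r(hours, grades))          # perfectly linear, increasing: close to 1
print(pearson_r(hours, grades[::-1]))    # perfectly linear, decreasing: close to -1
```

Because of floating-point rounding, the printed values may differ from 1 and -1 in the last decimal places.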
11 What is a histogram?
An approximate representation of the distribution of numerical data, built by splitting the range of values into bins and counting how many values fall into each bin.
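To make the idea of bins concrete, here is a minimal sketch that counts values per equal-width bin without any plotting library. The helper `histogram_counts` and the `grades` list are hypothetical, and the sketch assumes the values are not all identical (otherwise the bin width would be zero).

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, not to a bin past the end.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts

grades = [4.0, 5.5, 6.0, 6.5, 7.0, 7.0, 7.5, 8.0, 9.0, 10.0]  # hypothetical data
counts = histogram_counts(grades, bins=3)
for i, count in enumerate(counts):
    print(f"bin {i}: " + "#" * count)   # a text-only histogram
```

A plotting call such as plt.hist(grades, bins=3) draws bars for the same kind of counts.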
12 Create histogram
years = [1975,1976,1978,1975]
import matplotlib.pyplot as plt
plt.hist(years)
plt.show()
13 Create histogram with 5 bins using data (list)
import random
import matplotlib.pyplot as plt
data = [random.randint(1, 5) for _ in range(100)]
plt.hist(data, bins=5)
14 What is the use of plt.clf() ?
Clears the current figure so you can start a new plot from scratch.
15 You want to visually assess if the grades on your exam follow a particular distribution. Which plot do you use?
Histogram
16 You want to visually assess if longer answers on exam questions lead to higher grades. Which plot do you use?
Scatter plot
17 Add labels
year = list(range(1975, 2000))
scores = list(range(1, 26))
plt.scatter(year, scores)
…
plt.xlabel('year')
plt.ylabel('scores')
plt.show()
18 Add ‘scores’ as a title
data = [random.randint(1, 5) for _ in range(100)]
plt.hist(data, bins=5)
…
plt.title('scores')
plt.show()
19 Add log scale
year = list(range(1975, 2000))
scores = [2**n for n in range(25)]
plt.scatter(year, scores)
…
plt.yscale('log')
plt.show()
20 What are ticks in matplotlib?
Ticks are the markers that show specific points on a coordinate axis. A tick label can be a number or a string.
21 What is a legend in matplotlib?
An area of the graph that describes its elements: each entry pairs a label with the line or marker style of the data series it identifies.
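A small sketch of how a legend is typically built, assuming two data series: `pop_b` and both label strings are hypothetical. The Agg backend is selected only so the sketch runs without a display; in a notebook you would call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

year = [1975, 1976, 1977]
pop_a = [2340, 2405, 2890]   # values reused from the line plot card
pop_b = [2100, 2250, 2300]   # hypothetical second series

plt.plot(year, pop_a, label="country A")
plt.plot(year, pop_b, label="country B")
legend = plt.legend()        # builds the legend from the label= arguments
labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```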
22 Change the ticks in the x-axis to strings
x=[1, 3, 5]
y=[1, 5, 9]
import matplotlib.pyplot as plt
plt.scatter(x,y)
plt.xticks(x, ["one", "three", "five"])
plt.show()
23 Write a scatter plot with gdp as independent variable and population size as the size argument
gdp=[100, 200, 300]
life_exp=[50, 70, 82]
pop_size=[30,20,40]
import matplotlib.pyplot as plt
plt.scatter(gdp, life_exp, s=pop_size)
plt.show()
24 What is a dependent variable?
A variable (often denoted by y) whose value depends on that of another.
25 What is an independent variable?
A variable (often denoted by x) whose variation does not depend on that of another.
26 Code: Scatter plot with text ‘A’ pointing at the second element
gdp=[100, 200, 300]
life_exp=[50, 70, 82]
import matplotlib.pyplot as plt
plt.scatter(gdp, life_exp)
plt.text(195, 65, 'A')
plt.show()
27 Add a grid to a matplotlib figure
plt.grid(True)
28 Get the position of germany
countries = ['spain', 'france', 'germany', 'norway']
countries.index('germany')
29 What is the difference between list and dictionary in Python?
A list is an ordered sequence of objects accessed by position, whereas a dictionary maps keys to values and its items are accessed via keys. Since Python 3.7, dictionaries preserve insertion order, but they still cannot be indexed by position.
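The difference can be sketched with the country data used in the neighbouring cards; the variable names `capitals` and `position` are chosen here only for illustration.

```python
countries = ["spain", "france", "germany"]
capitals = {"spain": "madrid", "france": "paris", "germany": "berlin"}

print(countries[2])                     # list: access by position
print(capitals["germany"])              # dictionary: access by key
position = countries.index("germany")   # a list must be searched to locate a value
capitals["norway"] = "oslo"             # new key is added at the end
print(list(capitals))                   # keys come back in insertion order
```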
30 Get the keys
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}
Outcome:
dict_keys(['spain', 'france', 'germany', 'norway'])
print(europe.keys())
31 Get the capital of norway
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}
Outcome: oslo
print(europe['norway'])
32 Add italy and rome to the dictionary
europe = {'spain': 'madrid', 'france': 'paris',
          'germany': 'berlin'}
europe['italy'] = 'rome'
33 Check whether the dictionary has spain
europe = {'spain': 'madrid', 'france': 'paris',
          'germany': 'berlin'}
print('spain' in europe)
34 Outcome of:
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}
print('madrid' in europe)
False
35 Delete spain
europe = {'spain': 'madrid', 'france': 'paris',
          'norway': 'oslo'}
del europe['spain']
36 Update the capital of spain with madrid
europe = {'spain': 'Barcelona', 'france': 'paris',
          'norway': 'oslo'}
europe['spain'] = 'madrid'
37 Get the capital of france
europe = {'spain': {'capital': 'madrid', 'population': 46.77},
          'france': {'capital': 'paris', 'population': 66.03}}
print(europe['france']['capital'])
38 Complete Code
dr = [False, False, True]
names = ['Spain', 'France', 'UK']
...
...
Outcome:
  country  drives_right
0   Spain         False
1  France         False
2      UK          True
import pandas as pd
my_dict = {'country': names, 'drives_right': dr}
print(pd.DataFrame(my_dict))
39 Use names as the row labels (index) of the dataframe
ages = [i for i in range(3)]
df_ages = pd.DataFrame(ages, columns=['Ages'])
names = ['Jon', 'Jorge', 'Ana']
Outcome:
       Ages
Jon       0
Jorge     1
Ana       2
df_ages.index = names
print(df_ages)
40 Transform the csv to a dataframe called cars
cars.csv
import pandas as pd
cars = pd.read_csv('cars.csv')
41 Set the first column as row labels
import pandas as pd
cars = pd.read_csv('cars.csv', ..(code)..)
cars = pd.read_csv('cars.csv', index_col=0)
42 What is a pandas Series?
A one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). A Series is comparable to a single column of a spreadsheet.
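A minimal sketch of a Series on its own and as a column of a DataFrame. The names and ages are reused from later cards; the variable names `ages` and `column` are illustrative.

```python
import pandas as pd

# A Series pairs each value with an index label.
ages = pd.Series([15, 18], index=["nick", "jon"], name="age")
print(ages["jon"])       # label-based access
print(ages.iloc[0])      # position-based access

df = pd.DataFrame({"age": [15, 18]}, index=["nick", "jon"])
column = df["age"]       # selecting a single column yields a Series
print(type(column).__name__)
```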
43 Print the column country of df as a pandas Series
countries = ['Spain', 'France', 'UK']
df = pd.DataFrame(countries, columns=['country'])
Outcome:
0     Spain
1    France
2        UK
Name: country, dtype: object
print(df['country'])
44 Print the column country of df as dataframe
countries = ['Spain', 'France', 'UK']
df = pd.DataFrame(countries, columns=['country'])
Outcome:
  country
0   Spain
1  France
2      UK
print(df[['country']])
45 Print out columns a, b from df
print(df[['a', 'b']])
46 Print out first 2 observations (2 methods)
import pandas as pd
n = [i for i in range(3)]
df = pd.DataFrame(n, columns=['number'])
Outcome:
   number
0       0
1       1
print(df[:2])
print(df.head(2))
47 Print out the fourth, fifth and sixth observation
import pandas as pd
n = [i for i in range(0,20,2)]
df = pd.DataFrame(n, columns=['number'])
print(df.iloc[3:6])
48 What is loc in pandas?
A label-based indexer: it takes index labels (or a boolean array) and returns the matching rows as a Series or DataFrame. It raises a KeyError if a label does not exist in the index.
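The sketch below shows the usual ways of selecting rows with loc, using the small name/age table that appears in the following cards. The label "rank_9" is a hypothetical label that is deliberately missing from the index.

```python
import pandas as pd

df = pd.DataFrame({"name": ["nick", "jon"], "age": [15, 18]},
                  index=["rank_1", "rank_2"])

row = df.loc["rank_2"]             # single label: returns a Series
frame = df.loc[["rank_2"]]         # list of labels: returns a DataFrame
adults = df.loc[df["age"] >= 18]   # boolean mask: returns the matching rows

try:
    df.loc["rank_9"]               # label is not in the index
except KeyError:
    print("rank_9 is not a label in the index")
```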
49 What is a DataFrame in Python?
A two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns), provided by the pandas library.
50 Use iloc to get jon’s row as a Series
   name  age
0  nick   15
1   jon   18
Outcome:
name    jon
age      18
Name: 1, dtype: object
df.iloc[1]
51 Use iloc to get nick value
   name  age
0  nick   15
1   jon   18
#Outcome: nick
print(df.iloc[0, 0])
52 Use loc to get nick’s row as dataframe
        name  age
rank_1  nick   15
rank_2   jon   18
Outcome:
        name  age
rank_1  nick   15
print(df.loc[['rank_1']])
53 Output of:
data = {'name': ['nick', 'jon'], 'age': [15, 18]}
index_rows = ['rank_1', 'rank_2']
df = pd.DataFrame(data)
df.index = index_rows
df.loc['rank_2']
name    jon
age      18
Name: rank_2, dtype: object
54 Use loc to get jon’s age:
        name  age
rank_1  nick   15
rank_2   jon   18
df.loc['rank_2', 'age']
55 Use iloc to get age column as a dataframe
        name  age
rank_1  nick   15
rank_2   jon   18
df.iloc[:, [1]]
56 Outcome of:
print(True == False)
False
57 Outcome of:
print(-1 != 75)
True
58 Outcome of:
print(True == 1)
True
59 Outcome of:
print(True == 0)
False
60 Outcome of:
x = -3 * 6
print(x >= -10)
False
61 Complete code:
import numpy as np
my_house = np.array([18.0, 20.0, 10.75])
…
#Outcome: [ True  True False]
There are many possible answers.
#Answer: print(my_house > 11)
62 List out and name comparison operators
Equal: 2 == 2 is True
Not equal: 2 != 2 is False
Greater than: 2 > 3 is False
Less than: 2 < 3 is True
Greater than or equal to: 2 >= 3 is False
Less than or equal to: 2 <= 3 is True
63 Outcome of:
a, b = [2, 3]
a > b and a < b
False
64 Outcome of:
a, b = [2, 3]
a > b or a < b
True
65 Outcome of:
a, b = [2, 3]
not (a < 3)
False
66 List out the three Numpy Boolean operators
np.logical_and()
np.logical_or()
np.logical_not()
67 Use a numpy boolean
my_house = np.array([18.0, 20.0, 10.75])
print(np.logical_and(my_house>18, my_house<21))
68 What is control flow in Python?
The order in which a program’s code executes. In Python, control flow is regulated by conditional statements, loops, and function calls.
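A short sketch that combines the three mechanisms named in this answer. The function `size_label`, its thresholds, and the `areas` values are hypothetical.

```python
def size_label(area):
    """Function call: control jumps here and then returns to the caller."""
    if area < 10:          # conditional statement
        return "small"
    elif area < 20:
        return "medium"
    return "large"

areas = [9.5, 11.25, 20.0]   # hypothetical room areas
labels = []
for area in areas:           # loop: the body runs once per element
    labels.append(size_label(area))
print(labels)
```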
69 Outcome of:
for i in range(4):
    if i < 2:
        print("small")
    elif i == 2:
        print("medium")
    else:
        print("large")
small
small
medium
large
70 Complete code:
house = [2, 4, 6]
... house:
    ...(i < 4):
        print("small")
    ...(i == 4):
        print("medium")
    else:
        print("large")
Outcome:
small
medium
large
house = [2, 4, 6]
for i in house:
    if i < 4:
        print("small")
    elif i == 4:
        print("medium")
    else:
        print("large")
71 Filtering in pandas. Complete code:
   name  age
0  nick   15
1   jon   18
filter_ = …
selection = df[filter_]
print(selection)
Outcome:
   name  age
0  nick   15
filter_ = df['name'] == 'nick'
selection = df[filter_]
print(selection)
Filtering in pandas. Complete code:
       Name Country
rank1   Tom   Spain
rank2  Jack     USA
…[df……]
Outcome:
       Name Country
rank1   Tom   Spain
df[df['Country'] == 'Spain']
72 Complete code using np boolean and
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
age = …
between = np…(…>10,..<15)
df[…]
Outcome:
   Name  Age
2  juli   14
age = df['Age']
between = np.logical_and(age > 10, age < 15)
df[between]
73 Complete code
x = 1
…x < 4 :
    print(x)
    x = x…
Outcome:
1
2
3
x = 1
while x < 4:
    print(x)
    x = x + 1
74 Outcome of:
offset = 4
while offset != 0:
    offset = offset - 1
    print('correcting...')
    print(offset)
correcting...
3
correcting...
2
correcting...
1
correcting...
0
75 Loop over areas and print each element
areas = [11.25, 18.0, 20.0, 10.75, 9.50]
for area in areas:
    print(area)
76 Loop and enumerate
areas = [11.25, 18.0, 20.0]
Outcome:
1-11.25
2-18.0
3-20.0
for index, area in enumerate(areas, 1):
    print(str(index) + "-" + str(area))
77 Loop over the nested list and print each room with its area
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0]]
Outcome:
hallway-11.25
kitchen-18.0
living room-20.0
for x in house:
    print(str(x[0]) + "-" + str(x[1]))
78 Loop over dictionary. Complete code:
world = {"afghanistan": 30.55,
         "albania": 2.77,
         "algeria": 39.21}
for … in world…():
    …(key + " – " + str(value))
Outcome:
afghanistan – 30.55
albania – 2.77
algeria – 39.21
for key, value in world.items():
    print(key + " – " + str(value))
79 Outcome of:
import numpy as np
x = [i for i in range(1, 8, 2)]
np_x = np.array(x)
for i in np_x:
    print(i**2)
1
9
25
49
80 Loop over DataFrame (two ways)
        name  age
rank_1  nick   15
rank_2   jon   18
Outcome:
rank_1
15
rank_2
18
for ind, row in df.iterrows():
    print(ind)
    print(row['age'])
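The card title mentions two ways but shows only iterrows(). One possible second way, offered here as an assumption about what was intended, is to loop over a single column with Series.items(), which yields pairs of index label and value. The list `seen` is added only to make the result easy to check.

```python
import pandas as pd

df = pd.DataFrame({"name": ["nick", "jon"], "age": [15, 18]},
                  index=["rank_1", "rank_2"])

seen = []
for label, age in df["age"].items():   # pairs of (index label, value)
    print(label)
    print(age)
    seen.append((label, int(age)))
```

Looping over one column avoids building a full row object per iteration, which is usually cheaper when only that column is needed.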
81 Build this dataframe:
Name Country
rank1 Tom Spain
rank2 Jack USA
import pandas as pd
data = {'Name': ['Tom', 'Jack'], 'Country': ['Spain', 'USA']}
df = pd.DataFrame(data, index=['rank1', 'rank2'])
82 Loop over the dataframe and create a column with the length of the names
   Name Country
0   Tom   Spain
1  Jack     USA
for lab, row in df.iterrows():
    df.loc[lab, "name_length"] = len(row["Name"])
Outcome:
   Name Country  name_length
0   Tom   Spain          3.0
1  Jack     USA          4.0
83 How does the .seed() method work?
Seeding a pseudo-random number generator gives it its first “previous” value. Each seed value will correspond to a sequence of generated values for a given random number generator.
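The idea of a seed as the first "previous" value can be sketched with a toy linear congruential generator. This is an illustration only: the class `ToyLCG` and its constants are assumptions made for the example, and it is not the algorithm NumPy uses.

```python
class ToyLCG:
    """Toy linear congruential generator; illustrative only."""

    def __init__(self, seed):
        self.state = seed   # the seed acts as the first "previous" value

    def next_int(self):
        # Each new value is computed from the previous one.
        self.state = (1103515245 * self.state + 12345) % 2**31
        return self.state

a = ToyLCG(seed=123)
b = ToyLCG(seed=123)
c = ToyLCG(seed=124)
seq_a = [a.next_int() for _ in range(3)]
seq_b = [b.next_int() for _ in range(3)]
seq_c = [c.next_int() for _ in range(3)]
print(seq_a == seq_b)   # same seed, same sequence
print(seq_a == seq_c)   # different seed, different sequence
```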
84 Generate the same random number twice
import numpy as np
np.random.seed(123) #any number
print(np.random.rand())
np.random.seed(123)
print(np.random.rand())
85 Use randint() to simulate the throw of a die
import numpy as np
print(np.random.randint(1, 7))  # the upper bound 7 is exclusive
86 Use control flow and random numbers to simulate a simple walk with a dice:
Instructions:
np.random.seed(124)
1 or 2 is a step back
3 or 4 no step
5 or 6 step forward
Outcome:
dice: 5
step: 1
import numpy as np
np.random.seed(124)
step = 0
dice = np.random.randint(1, 7)
if dice <= 2:
    step = step - 1
elif dice > 4:
    step = step + 1
else:
    step = step
print('dice:', dice)
print('step:', step)
87 Simulate a random walk with a die. How many meters did the 'person' advance?
Instructions:
np.random.seed(124)
1 or 2 is a step back
3 or 4 no step
5 or 6 step forward
Outcome:
steps_walked: 10
meters_forward: 3
np.random.seed(124)
random_walk = [0]
step = 0  # not reset between rolls, so it holds the running total of all rolls so far
for i in range(10):
    dice = np.random.randint(1, 7)
    if dice <= 2:
        step = step - 1
    elif dice > 4:
        step = step + 1
    else:
        step = step
    random_walk.append(random_walk[-1] + step)
meters_forward = random_walk[-1]
steps_walked = len(random_walk) - 1  # the first entry is the starting position 0
print('steps_walked:', steps_walked)
print('meters_forward:', meters_forward)
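The code above keeps `step` as a running total across rolls. A variant in which every roll moves the walker by at most one meter could look like the sketch below. It is an assumption about the intended behaviour, and it uses the standard-library random module rather than np.random, so its rolls will not match the outcome listed on the card.

```python
import random

def random_walk(n_rolls, seed):
    """Return the list of positions, starting at 0, after each die roll."""
    rng = random.Random(seed)
    positions = [0]
    for _ in range(n_rolls):
        dice = rng.randint(1, 6)   # inclusive on both ends
        if dice <= 2:
            step = -1              # 1 or 2: one step back
        elif dice >= 5:
            step = 1               # 5 or 6: one step forward
        else:
            step = 0               # 3 or 4: no step
        positions.append(positions[-1] + step)
    return positions

walk = random_walk(n_rolls=10, seed=124)
print("steps_walked:", len(walk) - 1)
print("meters_forward:", walk[-1])
```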
88 Get the maximum value of this list comprehension
[i for i in range(10)]
max_value=max([i for i in range(10)])
89 What are list comprehensions used for?
They are used for creating new lists from other iterables.
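A few typical uses, sketched with the `areas` values from an earlier card; the variable names `squares`, `large`, and `labels` are chosen for illustration.

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

squares = [n ** 2 for n in range(5)]        # transform each item of an iterable
large = [a for a in areas if a > 11]        # keep only the items that pass a test
labels = [f"{i}-{a}" for i, a in enumerate(areas[:3], 1)]  # combine with enumerate

print(squares)
print(large)
print(labels)
```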
90 Get the number of meters advanced in this random_walk, get the number of steps taken, and use a matplotlib line plot to display the walk
random_walk = [0, 1, 1, 0, -1, 0, 1, 2, 3]  (0 = starting position)
Outcome:
steps_walked: 8
meters_forward: 3
import matplotlib.pyplot as plt
random_walk = [0, 1, 1, 0, -1, 0, 1, 2, 3]
steps_walked = len(random_walk) - 1
meters_forward = random_walk[-1]
print('steps_walked:', steps_walked)
print('meters_forward:', meters_forward)
plt.plot(random_walk)
plt.show()