First Flashcards
Type of variable
type(x)
Python lists
y = ['a', 'b', 'c']
List of lists
y = [['a', 'b'], ['c', 'd'], ['e', 'f']]
Subsetting lists
y[#]
Subset last element in list
y[-1]
List slicing
y[#:#]
[inclusive:exclusive]
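A small sketch of the inclusive:exclusive rule. The list contents and variable names are made up for illustration.

```python
y = ['a', 'b', 'c', 'd', 'e']

first_two = y[0:2]   # index 0 is included, index 2 is excluded
middle = y[1:4]      # indices 1, 2 and 3
from_two = y[2:]     # leaving out the end runs to the end of the list
last = y[-1]         # negative indices count from the end
```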
Remove from a list
del(y[#])
del()
List copy
When you assign a list to a new name (y = x), you create a reference to the same list, not a new list.
To create a new list, use list() or a slice:
x = ['a', 'b', 'c']
y = list(x)
or
y = x[:]
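A minimal sketch of the difference between a reference and a copy. The names alias and clone are illustrative only.

```python
x = ['a', 'b', 'c']

alias = x      # second name for the SAME list
clone = x[:]   # new list with the same elements (list(x) behaves the same way)

alias[0] = 'z' # changing the list through alias also changes x

# x is now ['z', 'b', 'c'], while clone is still ['a', 'b', 'c']
```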
Find maximum
max()
Round
round(df, #)
Length of a list or string
len()
List in ascending order
sorted()
Find where something is indexed
index()
> y = ['a', 1, 'b', 2, 'de']
y.index('b')
2
Change/add to your list
append()
> y = ['a', 1, 'b', 2, 'de']
y.append(44)
['a', 1, 'b', 2, 'de', 44]
Make all upper case
string.upper()
Count occurrences of x in a string
string.count('x')
Remove the first element of a list that matches the input
list.remove()
> y = ['a', 1, 'b', 2, 'de']
y.remove(1)
['a', 'b', 2, 'de']
Reverse the order of elements in the list
list.reverse()
Create numpy array
y = np.array(list)
Numpy subsetting
> y = np.array([1, 3, 5])
> y[1]
3
> y > 3
array([False, False,  True])
> y[y > 3]
array([5])
Numpy dimensions of a 2-D array
df.shape
> y = np.array([[1, 3, 5],
                [4, 5, 6]])
y.shape
(2, 3) # 2 rows, 3 cols
Numpy Subsetting 2-D array
> y = np.array([[1, 3, 5],
                [4, 5, 6]])
> y[0][2]
5
> y[0, 2]
5
> y[:, 1:3]
array([[3, 5],
       [5, 6]])
> y[1, :]
array([4, 5, 6])
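A runnable sketch of 2-D subsetting on the same two-row array. Variable names are illustrative. Note that a column slice such as 1:2 keeps two dimensions even when it selects a single column.

```python
import numpy as np

y = np.array([[1, 3, 5],
              [4, 5, 6]])

elem = y[0, 2]         # row 0, column 2
two_cols = y[:, 1:3]   # all rows, columns 1 and 2
one_col = y[:, 1:2]    # all rows, column 1 only, shape stays (2, 1)
row = y[1, :]          # second row as a 1-D array
big = y[y > 3]         # boolean mask returns a flat 1-D array
```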
Numpy mean
np.mean()
also subset with
np.mean(df[:, 0])
Numpy median
np.median()
also subset with
np.median(df[:, 0])
Numpy correlation coefficient
Are two things correlated?
np.corrcoef(x, y)
also subset with
np.corrcoef(df[:, 0], df[:,1])
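A short sketch with made-up data. np.corrcoef returns a 2 x 2 correlation matrix, so the coefficient between the two inputs sits at position [0, 1].

```python
import numpy as np

x = [1, 2, 3, 4]
y_up = [2, 4, 6, 8]      # rises with x
y_down = [8, 6, 4, 2]    # falls as x rises

matrix = np.corrcoef(x, y_up)             # 2 x 2 matrix, diagonal is 1
r_up = matrix[0, 1]                       # correlation between x and y_up
r_down = np.corrcoef(x, y_down)[0, 1]     # perfectly negative correlation
```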
Numpy std
np.std(x)
also subset with
np.std(df[:, 0])
Numpy sum
np.sum(x)
also subset with
np.sum(df[:, 0])
Numpy join two different lists into a single array
np.column_stack((df_x, df_y))
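A minimal sketch of np.column_stack with two made-up lists. Each list becomes one column of the result.

```python
import numpy as np

heights = [1.5, 1.6, 1.7]
weights = [60, 65, 72]

# The argument is a single tuple of sequences, hence the double parentheses
stacked = np.column_stack((heights, weights))
```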
Matplotlib Line Chart
plt.plot(x, y)
plt.show()
Matplotlib Scatter Plot
plt.scatter(x, y)
plt.show()
Matplotlib Histogram
plt.hist(x, bins = #)
plt.show()
Matplotlib Customize (x axis, y axis, title, ticks)
plt.xlabel('x')
plt.ylabel('y')
plt.title('title')
plt.yticks([0,1,2,3,4])
plt.xticks([0,1,2,3], ['0', '1B', '2B', '3B']) # Set the tick positions on the x-axis and rename the tick labels
Dictionary
dict = {'k': v, 'k1': v1, 'k2': v2, ...}
world = {'Nepal': 30.5, 'India': 1000, 'Bhutan': 0.5}
Dictionary find all keys
dict.keys()
> world = {'Nepal': 30.5, 'India': 1000, 'Bhutan': 0.5}
print(world.keys())
dict_keys(['Nepal', 'India', 'Bhutan'])
Dictionary add Key
dict['k'] = v
> world = {'Nepal': 30.5, 'India': 1000, 'Bhutan': 0.5}
world['China'] = 1050
print(world)
{'Nepal': 30.5, 'India': 1000, 'Bhutan': 0.5, 'China': 1050}
Dictionary delete key
del(dict['k'])
> world = {'Nepal': 30.5, 'India': 1000, 'Bhutan': 0.5}
del(world['Bhutan'])
world
{'Nepal': 30.5, 'India': 1000}
Pandas dataframe
pd.DataFrame(dict)
> world = {'Nepal': 30.5, 'India': 1000, 'Bhutan': 0.5}
df = pd.DataFrame(world, index = [0]) # all-scalar values need an index; the keys become columns
Pandas CSV
pd.read_csv('path/to/dataframe.csv', index_col = 0)
index_col = 0 means that pandas uses the first column of the CSV as the row index instead of adding a new default integer index
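A small sketch of what index_col = 0 changes. It reads from an in-memory string instead of a file; the CSV content and column names are made up for illustration.

```python
import io
import pandas as pd

csv_text = "code,country,capital\nBR,Brazil,Brasilia\nRU,Russia,Moscow\n"

# First column becomes the row labels
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without index_col, pandas adds a default integer index 0, 1, ...
default_index = pd.read_csv(io.StringIO(csv_text))
```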
Pandas select columns
df['colname']
Pandas select a column but keep it as a dataframe
df[['colname']]
Pandas select two columns
df[['col1', 'col2']]
Pandas select rows
df[#:#]
Pandas Label Based Discovery
df.loc[['k']]
> df.loc[['RU']]
    Country Capital  Area
RU   Russia  Moscow  17.1
Pandas Label Discovery Multiple Rows
df.loc[['k1', 'k2', 'k3']]
> df.loc[['RU', 'IN']]
    Country Capital  Area
RU   Russia  Moscow  17.1
IN    India   Delhi   3.2
Pandas Label Discovery Multiple Rows and columns
df.loc[['k1', 'k2', 'k3'], ['col1', 'col2']]
> df.loc[['RU', 'IN'], ['Country', 'Capital']]
    Country Capital
RU   Russia  Moscow
IN    India   Delhi
Pandas Index Discovery
df.iloc[[#]]
> df.iloc[[1]]
    Country Capital  Area
RU   Russia  Moscow  17.1
Pandas Index Discovery Multiple Rows
df.iloc[[#, #, #]]
> df.iloc[[1, 2]]
    Country Capital  Area
RU   Russia  Moscow  17.1
IN    India   Delhi   3.2
Pandas Index Discovery Multiple Rows and Columns
df.iloc[[#, #, #], [#, #]]
> df.iloc[[1, 2], [0, 1]]
    Country Capital
RU   Russia  Moscow
IN    India   Delhi
Pandas [ ] vs [[ ]]
[ ] returns a pandas Series, whereas [[ ]] returns a pandas DataFrame
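A runnable sketch that ties the loc, iloc and bracket cards together. The small frame reuses the RU and IN rows shown in the cards; the BR row and its Area value are assumptions added so that positions 1 and 2 line up with RU and IN.

```python
import pandas as pd

df = pd.DataFrame(
    {"Country": ["Brazil", "Russia", "India"],
     "Capital": ["Brasilia", "Moscow", "Delhi"],
     "Area": [8.5, 17.1, 3.2]},       # 8.5 is an illustrative value
    index=["BR", "RU", "IN"],
)

by_label = df.loc[["RU", "IN"], ["Country", "Capital"]]   # select by row and column labels
by_position = df.iloc[[1, 2], [0, 1]]                     # same selection by integer position

as_series = df["Country"]      # single brackets give a Series
as_frame = df[["Country"]]     # double brackets give a DataFrame
```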
and
both booleans need to be true
> True and True
True
> x = 12
x > 7 and x < 15
True
> False and True
False
or
at least one boolean needs to be true
> True or False
True
> x = 5
x < 7 or x > 13
True
Numpy array equivalent of: and, or, not
logical_and()
logical_or()
logical_not()
> y = np.array([[5, 7, 9]])
np.logical_and(y > 5, y < 9)
array([[False,  True, False]])
Filtering (subset) pd dataframe
Filter
> df2 = df['col'] > #
or
Subset
> df2 = df[df['col'] > #]
Subset using NP
Filter
> np.logical_and(df['col'] > #, df['col'] < #)
or
subset
> df[np.logical_and(df['col'] > #, df['col'] < #)]
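A minimal sketch of the filter and subset steps on a made-up frame. The column name area, the row labels and the values are all illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"area": [8.5, 17.1, 3.2, 9.6]},
                  index=["BR", "RU", "IN", "CH"])

mask = df["area"] > 8          # filter: a boolean Series, one value per row
large = df[mask]               # subset: keeps only the rows where mask is True

# Two conditions at once: np.logical_and combines the masks element by element
between = df[np.logical_and(df["area"] > 8, df["area"] < 10)]
```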
Enumerate FOR loop
> fam = [1.5, 1.6, 1.7]
> for index, height in enumerate(fam):
      print(str(index) + ' : ' + str(height))
0 : 1.5
1 : 1.6
2 : 1.7
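enumerate() counts from 0 by default. If the labels should start at 1, the optional start argument can be used, as in this small sketch (the lines list is only there to collect the output):

```python
fam = [1.5, 1.6, 1.7]

lines = []
for index, height in enumerate(fam, start=1):   # counting begins at 1 instead of 0
    lines.append(str(index) + ' : ' + str(height))

for line in lines:
    print(line)
```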
FOR loop over a dictionary
The key always comes first, then the value
> world = {'Nepal': 30.5, 'India': 1000, 'Bhutan': 0.5}
for k, v in world.items():
    print(k + ' : ' + str(v))
Nepal : 30.5
India : 1000
Bhutan : 0.5
FOR loop over rows
iterrows()
Not very efficient, because every iteration creates a new pandas Series
> for lab, row in brics.iterrows():
      print(lab + ' : ' + row['capital'])
BR : Brasilia
RU : Moscow
Calculate new column (Non math)
apply()
> brics['name_length'] = brics['country'].apply(len)
   country   capital  name_length
BR  Brazil  Brasilia            6
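A runnable sketch that puts iterrows() and apply() side by side. The two-row brics frame is an assumption built from the values shown in the cards.

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia"],
     "capital": ["Brasilia", "Moscow"]},
    index=["BR", "RU"],
)

# iterrows() yields (row label, row as a Series) pairs
labels = []
for lab, row in brics.iterrows():
    labels.append(lab + " : " + row["capital"])

# apply() calls len on every value of the column, with no explicit loop
brics["name_length"] = brics["country"].apply(len)
```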
Random number generator between 0 and 1
np.random.rand()
Set random seed manually
np.random.seed(#)
sets the random seed, so that your results are reproducible between simulations. As an argument, it takes an integer of your choosing. If you call the function, no output will be generated.
np.random.rand()
if you don’t specify any arguments, it generates a random float between zero and one.
> np.random.seed(123)
coin = np.random.randint(0, 2) # Randomly generate 0 or 1 (the upper bound 2 is excluded)
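A short sketch of why the seed matters: setting the same seed again replays the same sequence of random numbers. The seed value 123 follows the card; the variable names are illustrative.

```python
import numpy as np

np.random.seed(123)
first_float = np.random.rand()          # float in [0, 1)
first_coin = np.random.randint(0, 2)    # integer 0 or 1

np.random.seed(123)                     # same seed again
second_float = np.random.rand()
second_coin = np.random.randint(0, 2)

# The second run reproduces the first run exactly
```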
Transpose an array
np.transpose(df)