Preliminary Code Flashcards
Why do we need to be concerned with data handling?
Before starting to train a model, we need to get the data into a form that is compatible with training and testing.
Why do we train and test in batches?
Training on the full dataset all at once would take too long.
What is the main aim of building models (in classification or regression)?
To make predictions on unseen data.
This is known as generalisation.
What is the square root function?
np.sqrt(num)
What is a useful feature of NumPy for scientific computing?
It can deal with sets of numbers in arrays. An array contains only one type of element, often numbers.
How can we create an array?
array = np.array([1,2,3])
i.e. by passing it a list
How do we check the type of an object?
type(object)
How do we access elements of a 1D array?
Same indexing as a list
array[1] - single index
array[2:] - every element from position 2 onwards
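A minimal sketch of the array cards above, assuming NumPy is imported as np. The values in the array are illustrative only.

```python
import numpy as np

# Build a 1D array by passing a list (the values are made up for illustration).
array = np.array([10, 20, 30, 40, 50])

second = array[1]   # single index: the element at position 1
tail = array[2:]    # slice: every element from position 2 onwards

print(type(array))  # shows the type of the object
print(second, tail)
```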
What kinds of arrays are usually used for image data?
A 2D array is often used for an image, where each element of the array is the value of a pixel in the image.
How do you create a 2D array?
Combine multiple 1D sequences of the same size: enclose each in parentheses and separate them with commas.
array = np.array([(1,2), (3,4)])
How can you investigate the size of an array in each dimension?
array.shape
How do we tell Python what type of 1D vector we want (column or row vector)?
rowVec = array.reshape(1, -1)
rowVec.shape is (1, 7) for a 7-element array
colVec = array.reshape(-1, 1)
colVec.shape is (7, 1)
How can you see how many dimensions are in your array?
array.ndim
How can you see the total number of elements in a multi-dimensional array?
array.size
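The shape, reshape, ndim and size cards can be tried together. This is a small sketch that assumes a 7-element array built with np.arange purely for illustration.

```python
import numpy as np

array = np.arange(7)             # 1D array with 7 elements, shape (7,)

row_vec = array.reshape(1, -1)   # -1 asks NumPy to infer that dimension
col_vec = array.reshape(-1, 1)

print(array.shape, array.ndim, array.size)
print(row_vec.shape, row_vec.ndim)
print(col_vec.shape, col_vec.ndim)
```

Reshaping changes the number of dimensions (from 1 to 2 here) but not the total number of elements.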
How do you index a 2D array?
array[1, 2]
What does the : by itself indicate in selection?
You select the whole 1D array along that dimension.
e.g. array2D[3, :]
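A short sketch of 2D indexing, using a made-up 4x3 array. Note that selection uses square brackets directly on the array.

```python
import numpy as np

# Illustrative 4x3 array: four rows, three columns.
array2D = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)])

element = array2D[1, 2]   # row 1, column 2
row = array2D[3, :]       # every column of row 3
column = array2D[:, 0]    # every row of column 0

print(element, row, column)
```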
What are two built-in ways to quickly build arrays?
- linspace()
- arange()
Both output a 1D array of numbers
What does linspace do?
Outputs evenly spaced numbers between the “start” and “stop” values
np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
The default number of elements is 50.
endpoint defaults to True, so the stop value is included by default.
What does arange do?
Outputs evenly spaced values with a chosen step between start and stop; the length of the array follows from these. The stop value is not included.
np.arange(start, stop, step, dtype=None)
Default start is 0 and default step is 1.
What are linspace and arange functions useful for?
Useful for building sequences of values to iterate over.
e.g. x = np.arange(0, 5.1, 0.1)
for n in x:
    print(n)
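To compare the two functions side by side, here is a small sketch with illustrative start, stop and step values.

```python
import numpy as np

# linspace: choose the number of points; the stop value is included by default.
a = np.linspace(0, 1, num=5)

# arange: choose the step; the stop value is excluded.
b = np.arange(0, 10, 2)

print(a)
print(b)

# Either array can be looped over one value at a time.
for n in b:
    print(n)
```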
What should you check before joining arrays?
Ensure they have the same size along the dimension where you want to join them.
What are functions for stacking arrays?
vstack - vertical stacking
hstack - horizontal stacking
dstack - stack in depth
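A minimal sketch of the three stacking functions on two made-up 2x2 arrays. The shapes in the comments follow from the inputs chosen here.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

v = np.vstack((a, b))   # rows of b placed under rows of a: shape (4, 2)
h = np.hstack((a, b))   # columns of b placed beside columns of a: shape (2, 4)
d = np.dstack((a, b))   # stacked along a new third axis: shape (2, 2, 2)

print(v.shape, h.shape, d.shape)
```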
What are different ways you can initialise arrays?
- Pass a list
- Create an array of all zeros
- Create an array of all ones
- Create a constant array
- Create a random array
How do you create an array of zeroes?
np.zeros((2,2))
How do you create an array of ones?
np.ones((2,2))
How do you create a constant array?
np.full((2,2), 7)
How do you create an array of random numbers?
np.random.random((4,4))
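The initialisation cards above, gathered into one sketch. The shapes and the constant 7 are the same illustrative choices used in the cards.

```python
import numpy as np

zeros = np.zeros((2, 2))          # all zeros
ones = np.ones((2, 2))            # all ones
sevens = np.full((2, 2), 7)       # constant array filled with 7
rand = np.random.random((4, 4))   # random floats drawn from [0, 1)

print(zeros, ones, sevens, rand, sep="\n")
```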
What is the importance of being able to generate random numbers?
Random numbers are used throughout machine learning,
e.g. to make an initial guess for values that are then refined.
How do you add/subtract/multiply a matrix by a scalar?
a * 2
a + 2
a - 2
How do you perform element-wise addition/subtraction/multiplication/division of matrices?
a+b
a-b
a*b
a/b
How do you perform matrix multiplication?
np.matmul(a,b)
This is what PyTorch will use.
c_ij is the dot product of the ith row of a and the jth column of b.
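The difference between element-wise multiplication and matrix multiplication is easy to mix up, so here is a small sketch on two made-up 2x2 matrices. The last line checks one entry of the product against the row-by-column dot product described above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b         # multiplies matching entries
product = np.matmul(a, b)   # matrix multiplication

# Entry (0, 1) of the product: row 0 of a dotted with column 1 of b.
c_01 = np.dot(a[0, :], b[:, 1])

print(elementwise)
print(product)
print(c_01)
```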
How do we create an independent copy of an array, rather than just pointing to the same location in memory?
b = a.copy()
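A short sketch of why the copy matters. The name alias is hypothetical and only illustrates plain assignment.

```python
import numpy as np

a = np.array([1, 2, 3])

alias = a       # plain assignment: both names refer to the same array
b = a.copy()    # independent copy with its own memory

a[0] = 99       # modify the original

print(alias)    # follows the change made to a
print(b)        # keeps the original values
```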
How do we import pyplot?
import matplotlib.pyplot as plt
How do we plot data?
plt.plot(x, y)
How do we add an x or y label?
plt.xlabel("x")
plt.ylabel("y")
What must we add to the end of each plot?
plt.show()
What do we need to do when plotting two curves on one plot?
Add labels to distinguish the curves.
e.g. plt.plot(x, y, label="sin(x)")
Then call
plt.legend()
How do we see what curve corresponds to what function?
plt.legend()
How do we plot curves of functions?
Generate a set of data points using linspace or arange - set as x.
y is then computed as a function of x.
e.g.
x = np.linspace(0, 20, 1000)
y = np.sin(x)
How do you customise the line colour and style of a plot?
Add a format string argument such as "r--"
e.g. plt.plot(x, y, 'r--', label='y = sin(x)')
"r--" - red dashed line
"k-" - solid black line
How do you plot two curves side by side?
fig, axs = plt.subplots(1, 2, figsize=(10, 4), tight_layout=True)
axs[0].set_xlabel("x", fontsize=20)
axs[0].set_xlim([0, 20])
axs[0].tick_params(axis='x', labelsize=20)
axs[0].legend(loc='lower right', fontsize=20)
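Putting the plotting cards together, a possible side-by-side figure might look like the sketch below. The cosine curve in the second panel is an assumption added for illustration; the cards only mention sin(x).

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 20, 1000)

# One row, two columns of axes; tight_layout keeps the panels from overlapping.
fig, axs = plt.subplots(1, 2, figsize=(10, 4), tight_layout=True)

axs[0].plot(x, np.sin(x), "r--", label="y = sin(x)")   # red dashed line
axs[1].plot(x, np.cos(x), "k-", label="y = cos(x)")    # solid black line

for ax in axs:
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_xlim([0, 20])
    ax.legend(loc="lower right")
```

Calling plt.show() afterwards displays the figure when running as a script.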
How do we import pandas?
import pandas as pd
How do we create a data frame?
From an array:
pd.DataFrame(array)
From a dictionary:
pd.DataFrame(dictionary, index=row_labels)
The dictionary keys become the column names; index sets the row labels.
How do we make a data series or a data frame from a data frame column?
df['col_name'] - single brackets [] return a data series
df[['col_name']] - double brackets [[]] keep it as a dataframe
How do you select a row by index label?
df.loc[["label"]]
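The single versus double bracket distinction applies to rows as well as columns. A minimal sketch with a made-up dataframe (the column names and row labels are hypothetical):

```python
import pandas as pd

data = {"height": [1.6, 1.8], "weight": [60, 80]}
df = pd.DataFrame(data, index=["alice", "bob"])

col_series = df["height"]      # single brackets: a Series
col_frame = df[["height"]]     # double brackets: still a DataFrame

row_series = df.loc["bob"]     # single label: a Series
row_frame = df.loc[["bob"]]    # list of labels: still a DataFrame

print(type(col_series), type(col_frame))
print(type(row_series), type(row_frame))
```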
How do you load in a dataset from a file?
pd.read_csv('file.csv', index_col=0)
How do you plot information in a dataframe?
df.plot(x="col_x", y="col_y", kind="scatter")
How do you change the scale of a plot?
plt.yscale('log')
How do you plot by a categorical variable?
cats = set(df['category'].values)
Then iterate over the categories, subsetting the dataframe to the rows of each category.
When plotting the categories, use list(cats)
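One way the category loop could look, as a sketch on a made-up dataframe. The column names are hypothetical, and sorted(cats) is used in place of list(cats) only so the plotting order is predictable.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "category": ["a", "b", "a", "b"],
    "x": [1, 2, 3, 4],
    "y": [10, 100, 1000, 10000],
})

cats = set(df["category"].values)   # the unique categories

fig, ax = plt.subplots()
for cat in sorted(cats):
    subset = df[df["category"] == cat]   # rows belonging to this category
    ax.scatter(subset["x"], subset["y"], label=cat)

ax.set_yscale("log")   # log scale on the y axis
ax.legend()
```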
How do you extract the values of a column in a dataframe?
df['col'].values
How do you reduce things to a 1D structure?
.flatten()
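A small sketch of extracting values and flattening them, using a made-up single-column dataframe.

```python
import pandas as pd

df = pd.DataFrame({"col": [3, 1, 2]})

values = df["col"].values      # 1D NumPy array from a Series
col_2d = df[["col"]].values    # 2D NumPy array from a one-column DataFrame
flat = col_2d.flatten()        # reduced back to 1D

print(values.shape, col_2d.shape, flat.shape)
```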
How do you ensure that two side by side graphs do not overlap?
Add tight_layout=True to plt.subplots()
How do you plot with a log scale on the y axis?
ax.semilogy(x, y)
How do you sort values of a data frame?
df.sort_values("col", ascending=False)
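A brief sketch of sorting on a made-up dataframe. sort_values returns a new dataframe by default, so the original stays in its original order.

```python
import pandas as pd

df = pd.DataFrame({"col": [3, 1, 2]}, index=["a", "b", "c"])

sorted_df = df.sort_values("col", ascending=False)   # largest values first

print(sorted_df)
print(df)   # the original dataframe is unchanged
```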
How can you see the methods and attributes each object has?
dir(object)
How do we capitalise a string?
"hello".capitalize()
What does string1 + string2 do?
It concatenates the two strings.
What is def __init__(self, vars)?
A special method run each time an instance is created.
What is the first argument of any method created within a class?
self
It refers to the instance the method is called on, and is used to access that instance's attributes and methods.
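A minimal class sketch tying the last few cards together. The Greeter class and its names are hypothetical, chosen only to show __init__, self, capitalize and string concatenation in one place.

```python
class Greeter:
    def __init__(self, name):
        # Runs each time an instance is created; self is the new instance.
        self.name = name

    def greet(self):
        # self gives access to the attribute stored in __init__.
        return "Hello, " + self.name.capitalize()


g = Greeter("ada")
message = g.greet()

print(message)
print("greet" in dir(g))   # dir lists the methods and attributes of the object
```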
How do we define a function?
def function(vars):