W2 Flashcards

1
Q

How are images stored in a computer?

A
  • An image is composed of pixels stored in a 2-dimensional grid.
  • Each pixel has only one colour.
2
Q

How is data stored in each pixel?

A
  • Each pixel stores a finite, discrete numeric value that represents its intensity

EXAMPLE: Each number represents the intensity of the black colour of the
corresponding pixel.

3
Q

What information is needed to represent a greyscale image in Python?

A
  • Information needed:
    - Pixel values
    - Locations
  • We can use:
    - An integer to represent the grey level of each pixel
    - A nested sequence (like a list of lists) to represent the 2-dimensional grid, so that each element can be specified by its location
  • EXAMPLE:

zero = [[ 0, 0, 5,13, 9, 1, 0, 0],
[ 0, 0,13,15,10,14, 5, 0],
[ 0, 3,15, 2, 0,11, 8, 0],
[ 0, 4,12, 0, 0, 8, 8, 0],
[ 0, 5, 8, 0, 0, 9, 8, 0],
[ 0, 4,11, 0, 1,12, 7, 0],
[ 0, 2,14, 5,10,12, 0, 0],
[ 0, 0, 6,13,10, 0, 0, 0]]

4
Q

How do you import the relevant module and visualise a greyscale image?

A

import matplotlib.pyplot as plt

plt.figure(figsize=(2,2))

plt.imshow(zero, cmap=plt.cm.binary)

plt.gca().get_xaxis().set_ticks([])

plt.gca().get_yaxis().set_ticks([]);

5
Q

How can you locate values in a list of lists?

A

nested_list[row_index][column_index]

EXAMPLE:
zero[1][2] # “row” with index 1 and “column” with index 2 (2nd “row” and 3rd “column”)

6
Q

What is the RGB colour model?

A

RGB is an additive colour model where colours are created by combining varying intensities of light in three primary colours (red, green, and blue).

  • By mixing red, green, and blue colour light with different intensities, we can form a broad range of colours.
7
Q

How can you represent each colour in the RGB colour model?

A

We can represent each colour by 3 numbers in the RGB24 format (colour channels), which correspond to the intensities of red, green, and blue.

  • Each colour can be represented by 3 integers (r, g, b), with each integer in the range [0, 255]
    - 0 means the lowest intensity and 255 the highest intensity
    - Each colour intensity is stored in a byte, so a colour is represented by 24 bits
  • Only discrete combinations of R, G, and B values are allowed
  • They can represent millions of different colours (256³ = 16,777,216) with different hue, saturation, and brightness.
  • EXAMPLE:
    (255, 127, 80) represents an orange colour. It has relatively high red, medium green, and low blue intensity
  • EXAMPLE:
  • Black is rgb(0, 0, 0)
  • White is rgb(255, 255, 255)
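A minimal sketch of the "3 bytes, 24 bits" idea: the three channel values can be packed into a single integer and recovered again. The helper names pack_rgb and unpack_rgb are hypothetical and only serve this illustration.

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channel values into a single 24-bit integer."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be in the range [0, 255]")
    return (r << 16) | (g << 8) | b


def unpack_rgb(packed):
    """Recover the (r, g, b) tuple from a 24-bit integer."""
    return (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF


orange = pack_rgb(255, 127, 80)
print(f"{orange:06x}")       # the three bytes written in hexadecimal
print(unpack_rgb(orange))    # back to the (r, g, b) tuple
print(256 ** 3)              # number of distinct colours in RGB24
```

The hexadecimal form is the same notation commonly used for colours on web pages.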
8
Q

How can we represent a greyscale image in Python?

A

Using a nested sequence like a list of lists to represent a grid and then for each pixel, assigning a number to represent its grey level.

gray_img = [[0, 128],
            [255, 0]]
plt.imshow(gray_img, cmap=plt.cm.gray)
plt.gca().get_xaxis().set_ticks([]);plt.gca().get_yaxis().set_ticks([]);

NOTE: Lower values represent darker colours, so it behaves similarly to the RGB model

9
Q

How can we represent a colour image in Python?

A

For colour images, each pixel has 3 numbers to represent its colour. Therefore, we now need a nested sequence like a list of lists of lists to represent an image.

EXAMPLE:

colour_img = [[[255, 127, 80], [222, 49, 99]],
[[159, 226, 191], [64, 224, 208]]]

plt.imshow(colour_img)
plt.gca().get_xaxis().set_ticks([]);plt.gca().get_yaxis().set_ticks([]);

  • NOTE: To locate each value, 3 indexes are required. For instance, the “B” value of the pixel in the “row” with index 0 and the “column” with index 1 is given by colour_img[0][1][2]
10
Q

What is an extension of the RGB model with a fourth channel?

A

RGBA (red, green, blue, alpha) is the RGB colour model supplemented with a fourth channel, alpha.

  • Alpha indicates how opaque (or transparent) each pixel is, with 0 being fully transparent and 255 being fully opaque.
  • EXAMPLE: With the last integer representing how opaque each pixel is:
    colour_img = [[[255,0,0,0], [255,0,0,64], [255,0,0,128], [255,0,0,192], [255,0,0,255]]]
11
Q

What is the HSB colour model?

A

HSB (hue, saturation, brightness) is another colour model.

  • It describes colours based on three psychological dimensions of colours:
  1. Hue: “Type” of colour, like red or green
    - Hue can typically be represented quantitatively by a single number, like an angular position around a colour wheel
  2. Saturation (or chroma): “Colourfulness” or “vividness” of a colour
    - 0%: grey (no colour)
    - 100%: fully saturated
  3. Brightness (or value):
    - 0%: black
    - 100%: full intensity

NOTE: HSB is a cylindrical-coordinate representation of colours.
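To see how the three dimensions map onto RGB values, here is a small sketch using colorsys from the Python standard library, which expects hue, saturation and value as fractions in [0, 1]. The helper name hsb_to_rgb24 is hypothetical.

```python
import colorsys


def hsb_to_rgb24(h, s, b):
    """Convert HSB values in [0, 1] to an (r, g, b) tuple of integers in [0, 255]."""
    channels = colorsys.hsv_to_rgb(h, s, b)
    return tuple(round(c * 255) for c in channels)


red = hsb_to_rgb24(0.0, 1.0, 1.0)     # hue 0, fully saturated, full brightness
cyan = hsb_to_rgb24(0.5, 1.0, 1.0)    # hue halfway around the colour wheel
grey = hsb_to_rgb24(0.0, 0.0, 0.5)    # zero saturation: all channels equal
black = hsb_to_rgb24(0.0, 1.0, 0.0)   # zero brightness: black whatever the hue

print(red, cyan, grey, black)
```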

12
Q

How do you import HSB and display: 1. Different hues (same saturation and brightness), 2. Different saturations (same hues and brightness), 3. Different brightnesses (same hues and saturations)?

A

import numpy as np

import matplotlib.pyplot as plt

from matplotlib.colors import hsv_to_rgb

    1. Different hues (same saturation and brightness)

hsv_1 = [[[h, 1, 1] for h in np.arange(0, 1, 0.125)]]

rgb_1 = (hsv_to_rgb(hsv_1)*255).round().astype(np.uint8)

plt.imshow(rgb_1)

plt.gca().get_xaxis().set_ticks([]);plt.gca().get_yaxis().set_ticks([]);

    2. Different saturations (same hues and brightness)

hsv_2 = [[[0, s, 1] for s in np.linspace(0, 1, 9)]]

rgb_2 = (hsv_to_rgb(hsv_2)*255).round().astype(np.uint8)

plt.imshow(rgb_2)

plt.gca().get_xaxis().set_ticks([]);plt.gca().get_yaxis().set_ticks([]);

    3. Different brightnesses (same hues and saturations)

hsv_3 = [[[0, 1, v] for v in np.linspace(0, 1, 9)]]

rgb_3 = (hsv_to_rgb(hsv_3)*255).round().astype(np.uint8)

plt.imshow(rgb_3)

plt.gca().get_xaxis().set_ticks([]);plt.gca().get_yaxis().set_ticks([]);

13
Q

How is digital audio saved in a computer and what information is needed?

A

In digital audio, the sound wave of the audio signal is typically encoded as numerical samples in a sequence.

  • Two types of information are needed:
    1. Samples, which are represented as a sequence of numbers
       - It can also be 2 sequences (for left and right ears)
    2. Sample rate (how many samples are taken per second)
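As a rough illustration of how samples and sample rate fit together, the sketch below builds a short tone with the standard library only. The sample rate, frequency and duration are arbitrary values chosen for the example.

```python
import math

samplerate = 8000    # assumed: 8000 samples per second
frequency = 440.0    # assumed tone frequency in Hz
duration = 0.5       # assumed length in seconds

n_samples = int(samplerate * duration)
# One number per sample: the amplitude of the wave at time i / samplerate.
samples = [math.sin(2 * math.pi * frequency * i / samplerate)
           for i in range(n_samples)]

print(len(samples))                # number of samples in the sequence
print(len(samples) / samplerate)   # duration in seconds, recovered from both pieces
```

Without the sample rate, the same sequence of numbers could not be played back at the right speed.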
14
Q

How can you import sound data in Python?

A

from scipy.io import wavfile

samplerate, sound_data = wavfile.read('data/meow.wav')

NOTE: You can then inspect samplerate and sound_data by evaluating each name (e.g. in a notebook cell)

15
Q

How can you play sound in Python?

A

import sounddevice

sounddevice.play(sound_data, samplerate=samplerate)

NOTE: You may need to install 'sounddevice' first. sound_data and samplerate are the values read from the sound file with wavfile.read()

16
Q

How can you visualise a sound wave in Python?

A

import numpy as np

import matplotlib.pyplot as plt

time = np.arange(0, len(sound_data))/samplerate

plt.plot(time, sound_data); plt.xlabel("Time [s]"); plt.ylabel("Amplitude");

17
Q

How are videos saved in a computer?

A

In a computer, video is composed of a sequence of still images (“frames”) with audio.

  • Both the frames and the audio are sequential data, so their order matters: shuffling the frames or the audio samples produces a video that is out of sync
18
Q

What are modules?

A

Module: a collection of useful functions and variables for some specific purpose

EXAMPLE: You typically install a package and then import/use modules contained within it:

from scipy.io import wavfile
samplerate, sound_data = wavfile.read('data/meow.wav')

import sounddevice
sounddevice.play(sound_data, samplerate=samplerate)

19
Q

What is the Python standard library?

A

The Python standard library provides a wide range of modules, with many useful for data science.

  • You do NOT need to install the library or the modules from it, as they come with Python
20
Q

What are some EXAMPLES of the modules from the Python standard library?

A
  • math: provides mathematical functions and constants
  • random: generates pseudo-random numbers from various distributions
  • os: provides functions for interacting with the operating system
  • datetime: represents and manipulates dates and times
  • pickle: serialises and de-serialises Python objects

NOTE: More modules can be found here: https://docs.python.org/3/library/index.html

21
Q

How do you use a module?

A
  • To use a module, one needs to first import it.
  • Once a module is imported, it can be used for the rest of the script:

EXAMPLE:
import random
random.random()

22
Q

How can you use the random module?

A
  • Generate a “random number” in the range [0.0, 1.0)

random.random()

  • Generate a “random integer” in the range [1, 6] (both endpoints included)

random.randint(1, 6)
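A short check of the two ranges, as a sketch. The seed value is arbitrary and is set only so that the run is reproducible.

```python
import random

random.seed(115)   # arbitrary seed, used only for reproducibility

rolls = [random.randint(1, 6) for _ in range(1000)]
print(min(rolls), max(rolls))   # randint() can return both endpoints

x = random.random()
print(0.0 <= x < 1.0)           # random() can return 0.0 but never 1.0
```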

23
Q

What command will provide more information on the purpose of a module or a specific function?

A

help()

EXAMPLE:
  • For a whole module:
    help(random)
  • For a specific function:
    help(random.random)
24
Q

What are the 4 common ways to import a module, or objects from a module?

A
  1. import random
    * Creates a reference to the module ‘random’ in the current namespace
  2. import random as r
    * Creates a reference to the module random in the current namespace via the alias ‘r’
  3. from random import randint
    * Creates references to the specified object ‘randint’ in the module in the current namespace
  4. from random import *
    * Creates references to all public objects (e.g. randint, randrange, sample, seed) defined by the module in the current namespace
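The first three import styles can be compared side by side; the sketch below only checks that the different names end up pointing at the same objects. The wildcard form is left out because it is commonly discouraged: it can silently overwrite names already defined in the namespace.

```python
import random
import random as r
from random import randint

print(r is random)                 # the alias refers to the same module object
print(randint is random.randint)   # the imported name is the module's own function

# With the same seed, both ways of calling randint give the same result.
random.seed(1)
a = random.randint(1, 6)
random.seed(1)
b = randint(1, 6)
print(a == b)
```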
25
What are third-party packages/modules and how do you import them?
Third-party packages do not come with Python and need to be installed (e.g. through Anaconda) before use.
EXAMPLES of third-party packages:
* NumPy
* pandas
* matplotlib
* tensorflow
EXAMPLE:
import numpy as np
np.random.random(5)
26
How do you install packages from Anaconda?
You can install a package by running a conda command in a terminal:
conda install [package-name]
EXAMPLE:
conda install conda-forge::python-sounddevice
NOTE: 'pip' and 'conda' are the 2 main package managers for Python packages. Use 'conda install' if you are using Anaconda.
27
How do you update a package?
Use the command: conda update [package-name]
28
Which package provides a computer vision and machine learning software library?
opencv
To install it:
conda install conda-forge::opencv
29
What are some of the different dimensional data structures?
* 1-dimensional: sound_data
* 3-dimensional:
colour_img = [[[255, 127, 80], [222, 49, 99]],
              [[159, 226, 191], [64, 224, 208]]]
30
What are 3 common manipulations of numerical data?
1. Statistics, e.g. the average Bitcoin price
2. Transform data (apply the same function to each element of the array), e.g. the log of Bitcoin prices
3. Select a subset of the data, e.g. keep only the rows for cars from 1980
31
How can you use NumPy to find the difference between consecutive values in a list?
import numpy as np
aapl_prices = [222.64, 229.98, 228.26, 237.87, 233.28]
np.diff(aapl_prices)
OUT: array([ 7.34, -1.72,  9.61, -4.59])
32
What is NumPy and its main object?
NumPy (short for Numerical Python) is a fundamental package for scientific computing with Python.
* Main object: homogeneous multidimensional array (ndarray)
- Homogeneous: all elements are of the same type
- Fast vectorised operations on arrays
* NumPy offers numerical computing tools like mathematical functions, random number generators, linear algebra routines, etc.
33
How do you install NumPy and import it?
Install (in a terminal):
conda install numpy
Import (in Python):
import numpy as np
34
How do you create a NumPy ndarray?
Use np.array() and pass it a sequence such as a list.
EXAMPLE:
cases_a = np.array([17, 26, 39, 39, 38, 33, 33])
35
How do you create a 2d array in NumPy?
To create a 2d array, use np.array() and provide a nested sequence like a list of lists.
EXAMPLE:
ps = np.array([[86.5, 65, 42, 73, 93, 72],
               [  53, 62, 70, 83, 90, 65],
               [  71, 82, 65, 80, 81, 78]])
* Here the 2d ndarray is storing tabular data, with:
- 3 "rows", with each row representing a problem set
- 6 "columns", with each column representing a student
* You can also consider the example as a 3x6 matrix.
36
How can you represent a 3x3 matrix A with entries: 1, 2, 3, 2, 3, 0, 0, 1, 2?
A = np.array([[1, 2, 3], [2, 3, 0], [0, 1, 2]])
37
How can you use a 2d array to represent greyscale images?
zero = np.array([[ 0, 0, 5,13, 9, 1, 0, 0],
                 [ 0, 0,13,15,10,14, 5, 0],
                 [ 0, 3,15, 2, 0,11, 8, 0],
                 [ 0, 4,12, 0, 0, 8, 8, 0],
                 [ 0, 5, 8, 0, 0, 9, 8, 0],
                 [ 0, 4,11, 0, 1,12, 7, 0],
                 [ 0, 2,14, 5,10,12, 0, 0],
                 [ 0, 0, 6,13,10, 0, 0, 0]], dtype=np.uint8)
38
How can we determine the array data type?
Use the attribute: .dtype
NOTE: All elements of an array have the same data type.
EXAMPLE 1:
sound_data.dtype
OUT: dtype('int16')
EXAMPLE 2:
ps.dtype
OUT: dtype('float64')
EXAMPLE 3:
zero.dtype
OUT: dtype('uint8')
39
What are some EXAMPLES of array data types?
* int16: signed integer with 16 bits
* float64: floating point number with 64 bits
* uint8: unsigned integer with 8 bits
40
How can you determine the number of dimensions of the array?
Use the attribute: .ndim
EXAMPLE:
print(cases_a.ndim)
print(ps.ndim)
41
How can you determine the size of the array along each dimension?
Use the attribute: .shape
EXAMPLE 1:
cases_a.shape  # i.e. the number of elements in a vector
EXAMPLE 2:
ps.shape
OUT: (3, 6)  # i.e. considered as 3 rows and 6 columns
42
How can we verify that the data is stored as np.ndarray?
Use the built-in function: type()
EXAMPLE:
type(img)
OUT: numpy.ndarray
43
How can you calculate the element-wise difference between 2 arrays?
So long as they are the same size, you can calculate the element-wise difference between 2 vectors (i.e. x - y).
EXAMPLE:
cases_a = np.array([17, 26, 39, 39, 38, 33, 33])
cases_b = np.array([21, 20, 25, 34, 26, 41, 62])  # cases from region b
cases_a - cases_b
OUT: array([ -4,   6,  14,   5,  12,  -8, -29])
44
How can you add a scalar to each element in an array?
You can add a scalar to (or subtract a scalar from) each element of an array by writing the operation directly on the array (i.e. x + 3).
EXAMPLE:
ps = np.array([[86, 65, 42, 73, 93, 72],
               [53, 62, 70, 83, 90, 65],
               [71, 82, 65, 80, 81, 78]])
ps + 3
OUT:
array([[89, 68, 45, 76, 96, 75],
       [56, 65, 73, 86, 93, 68],
       [74, 85, 68, 83, 84, 81]])
45
How do vector calculations with arrays differ from calculations with lists directly?
We cannot do vector calculations directly with lists: an expression like cases_a_list - cases_b_list raises a TypeError. Instead, you need a loop to build the result element by element:
EXAMPLE:
new_list = []
for i in range(len(cases_a_list)):
    new_list.append(cases_a_list[i] - cases_b_list[i])
new_list
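A small sketch of the list behaviour, reusing the case counts from the earlier card as plain lists. It shows the error raised by subtracting lists, followed by a more compact alternative to the index-based loop using zip().

```python
cases_a_list = [17, 26, 39, 39, 38, 33, 33]
cases_b_list = [21, 20, 25, 34, 26, 41, 62]

try:
    cases_a_list - cases_b_list
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# zip() pairs up the elements, so no index variable is needed.
difference = [a - b for a, b in zip(cases_a_list, cases_b_list)]
print(difference)
```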
46
How can you add two arrays together and divide by 2?
EXAMPLE:
cases_a = np.array([17, 26, 39, 39, 38, 33, 33])
cases_b = np.array([21, 20, 25, 34, 26, 41, 62])
(cases_a + cases_b) / 2
47
How can you do element-wise addition of two 2d arrays?
EXAMPLE:
ps = np.array([[86, 65, 42, 73, 93, 72],
               [53, 62, 70, 83, 90, 65],
               [71, 82, 65, 80, 81, 78]])
penalty = np.array([[ 0, -10, -20,  0,  0,   0],
                    [ 0,   0,   0,  0,  0,   0],
                    [ 0,   0, -10,  0,  0, -30]])
ps + penalty
OUT:
array([[86, 55, 22, 73, 93, 72],
       [53, 62, 70, 83, 90, 65],
       [71, 82, 55, 80, 81, 48]])
47
What is important to note about the shapes of the arrays for calculations to be carried out?
The shapes of the arrays have to be compatible.
EXAMPLE: This does not work (it raises a ValueError) because the shapes (7,) and (2,) are not compatible:
print(cases_a + np.array([1, 2]))
48
What is broadcasting?
Broadcasting is a mechanism that allows NumPy to perform operations on arrays of different shapes.
* Subject to certain constraints, the smaller array is "broadcast" across the larger array so that they have compatible shapes.
* EXAMPLE:
ps_1 = np.array([86, 65, 42, 73, 93, 72])
ps_1 + 3
OUT: array([89, 68, 45, 76, 96, 75])
49
How can you use broadcasting to work with 1d and 2d arrays together?
EXAMPLE:
ps = np.array([[86, 65, 42, 73, 93, 72],
               [53, 62, 70, 83, 90, 65],
               [71, 82, 65, 80, 81, 78]])
adjustment = np.array([3, 3, 3, 0, 0, 0])
ps + adjustment
OUT:
array([[89, 68, 45, 73, 93, 72],
       [56, 65, 73, 83, 90, 65],
       [74, 85, 68, 80, 81, 78]])
50
What are the general broadcasting rules on compatibility and how can you check it?
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when:
- They are equal, or
- One of them is 1
* You can check compatibility by inspecting .shape of both arrays and comparing the dimensions from right to left.
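The rule can be written out as a few lines of plain Python. This is only a sketch of the compatibility check, not NumPy's own implementation, and the function name shapes_compatible is hypothetical.

```python
from itertools import zip_longest


def shapes_compatible(shape_a, shape_b):
    """Return True if two shapes can be broadcast together under the rule above."""
    # Walk both shapes from the rightmost dimension; a missing dimension counts as 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for dim_a, dim_b in pairs:
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True


print(shapes_compatible((3, 6), (6,)))     # 2d marks plus a 1d adjustment
print(shapes_compatible((7,), (2,)))       # 7 vs 2: not compatible
print(shapes_compatible((3, 1), (1, 6)))   # both arrays get stretched
```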
51
What are some of the descriptive statistics that you can compute in NumPy?
For an array 'btc':
* Max price: btc.max() (or) np.max(btc)
* Average price: btc.mean()
* 50th percentile (i.e. median): np.percentile(btc, 50)
* Standard deviation: btc.std()
For two arrays 'cases_a' and 'cases_b':
* Correlation coefficient: np.corrcoef(cases_a, cases_b)
LIST AVAILABLE HERE: https://numpy.org/doc/stable/reference/routines.statistics.html
52
How do you calculate the overall average of a 2d ndarray?
Use the method: .mean()
EXAMPLE:
ps = np.array([[86, 65, 42, 73, 93, 72],
               [53, 62, 70, 83, 90, 65],
               [71, 82, 65, 80, 81, 78]])
print(ps.mean())
OUT: 72.83333333333333
53
How do you calculate the average of each column?
Use the method: .mean(axis=0)
REMEMBER: Axis 0 is the n (rows) dimension of an n x m matrix. For a 3x6 matrix, each of the 6 columns is averaged over its 3 values, so 6 averages are returned.
EXAMPLE:
ps = np.array([[86, 65, 42, 73, 93, 72],
               [53, 62, 70, 83, 90, 65],
               [71, 82, 65, 80, 81, 78]])
ps.mean(axis=0)
OUT: array([70.        , 69.66666667, 59.        , 78.66666667, 88.        , 71.66666667])
54
How do you calculate the average of each row?
Use the method: .mean(axis=1)
EXAMPLE:
ps = np.array([[86, 65, 42, 73, 93, 72],
               [53, 62, 70, 83, 90, 65],
               [71, 82, 65, 80, 81, 78]])
ps.mean(axis=1)
OUT: array([71.83333333, 70.5       , 76.16666667])
55
What are Universal Functions?
Universal functions (or "ufuncs" for short) are simple mathematical functions that operate on arrays in an element-by-element fashion.
LIST AVAILABLE HERE: https://numpy.org/devdocs/reference/ufuncs.html#available-ufuncs
56
What function can you use to create an array from endpoint -3 to 3 with 25 evenly spaced numbers?
Use Command: np.linspace(-3, 3, 25)
57
What is an EXAMPLE of using universal functions to plot a function that involves an exponential?
x = np.linspace(-3, 3, 25)
y = 1/np.sqrt(2*np.pi)*np.exp(-(x**2)/2)
plt.scatter(x, y);
58
How can you conduct an element-wise comparison of two arrays?
Use the comparison operators between the arrays.
EXAMPLE:
cases_a = np.array([17, 26, 39, 39, 38, 33, 33])
cases_b = np.array([21, 20, 25, 34, 26, 41, 62])
cases_a > cases_b
OUT: array([False,  True,  True,  True,  True, False, False])
59
How can you compare each element of an array with a scalar (broadcasting)?
ps_1 = np.array([86, 65, 42, 73, 93, 72])
ps_1 >= 70
OUT: array([ True, False, False,  True,  True,  True])
60
What are 2 ways in which you can chain a boolean expression in comparing arrays?
You can use & (ampersand, "and") and | (vertical bar, "or") to chain boolean expressions, with each expression wrapped in ().
ps_1 = np.array([86, 65, 42, 73, 93, 72])
ps_2 = np.array([53, 62, 70, 83, 90, 65])
EXAMPLE 1: Both are >= 70
(ps_1 >= 70) & (ps_2 >= 70)
OUT: array([False, False, False,  True,  True, False])
EXAMPLE 2: At least one is >= 70
(ps_1 >= 70) | (ps_2 >= 70)
OUT: array([ True, False,  True,  True,  True,  True])
61
How can you index/select an element in NumPy?
Indexing works in a similar way to accessing the items of a Python list.
EXAMPLE:
ps_1 = np.array([86, 65, 42, 73, 93, 72])
ps_1[-1]
OUT: 72
62
How can you slice elements in NumPy?
You slice to get part of the array with `[start:stop:step]`
EXAMPLE:
ps_1 = np.array([86, 65, 42, 73, 93, 72])
x = ps_1[1:3]
x
OUT: array([65, 42])
63
How can you index/select an element in a 2d array in NumPy?
To get an item, you can use a "matrix"-like notation arr[i, j].
EXAMPLE: To get the element at the (0, 2) position
ps = np.array([[86, 65, 42, 73, 93, 72],
               [53, 62, 70, 83, 90, 65],
               [71, 82, 65, 80, 81, 78]])
ps[0, 2]
OUT: 42
64
How can you slice a 2d array in NumPy?
It is typically of the form [rows, columns].
EXAMPLE: To get all "rows" and the "columns" with index 1 and 2 (the stop index 3 is excluded)
ps = np.array([[86, 65, 42, 73, 93, 72],
               [53, 62, 70, 83, 90, 65],
               [71, 82, 65, 80, 81, 78]])
ps[:, 1:3]
OUT:
array([[65, 42],
       [62, 70],
       [82, 65]])
65
For a 3x3 matrix, what are some of the different arrays you can obtain by slicing?
* Top right 2x2: arr[:2, 1:]
* Bottom row: arr[2] or arr[2, :] (1d result), or arr[2:, :] (2d result with 1 row)
* Left 2 columns: arr[:, :2]
* Middle row of the left 2 columns: arr[1, :2] (1d result) or arr[1:2, :2] (2d result with 1 row)
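The difference between indexing with an integer and with a slice is easiest to see from the shapes. The sketch below uses a hypothetical 3x3 array holding the numbers 1 to 9.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)   # hypothetical 3x3 array holding 1..9

print(arr[:2, 1:])        # top right 2x2
print(arr[2].shape)       # an integer index drops that dimension
print(arr[2:, :].shape)   # a slice keeps it
print(arr[1, :2])         # 1d result
print(arr[1:2, :2])       # 2d result with one row
```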
66
How can we filter data via boolean array indexing?
We can easily select items based on some condition by boolean array indexing.
EXAMPLE: We would like to get the ps_2 marks for students with ps_1 >= 70:
ps_1 = np.array([86, 65, 42, 73, 93, 72])
ps_2 = np.array([53, 62, 70, 83, 90, 65])
ps_2[ps_1 >= 70]
OUT: array([53, 83, 90, 65])
NOTE: ps_1 >= 70 returns an array of True/False values, and only the items at the True positions are included in the result
67
How can we use Boolean Array indexing on a 2d array?
We use a boolean-valued array as an index, with True meaning the corresponding item will be selected. The result is a 1d array.
EXAMPLE: We want to get all the marks >= 70:
ps[ps >= 70]
OUT: array([86, 73, 93, 72, 70, 83, 90, 71, 82, 80, 81, 78])
68
How can you reshape an array to be 2d?
Use the method: .reshape()
EXAMPLE: To reshape to a 2x6 matrix
marks = np.array([86, 65, 42, 73, 93, 72, 53, 62, 70, 83, 90, 65])
ps = marks.reshape(2, 6)
ps
OUT:
array([[86, 65, 42, 73, 93, 72],
       [53, 62, 70, 83, 90, 65]])
69
What are the steps involved in loading a data set of images, getting the first image, and reshaping it to be 8x8?
1. Load the data from `sklearn`
from sklearn.datasets import load_digits
digits = load_digits()
data = digits['data']
type(data)
OUT: numpy.ndarray
2. Check the shape: 1797 images, with each image an array with 64 elements
data.shape
OUT: (1797, 64)
3. Get the first image (a 1d array with 64 elements)
first_img = data[0]
4. Reshape it to be 8x8
first_img.reshape((8, 8))
OUT:
array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])
70
What is a path?
A path is a unique location of a file or a folder in a file system of an operating system
71
What is an Absolute Path?
Absolute path: path that specifies the location of a file or directory from the root directory /
EXAMPLE:
/Users/christine/Documents/GitHub/st115_github/2024w-lecture-02/data/sample.txt
DO NOT PUT AN ABSOLUTE PATH IN SUBMISSIONS (it cannot be accessed from another computer)
72
What is a relative path?
Relative path: path that is relative to the current working directory.
* A relative path never starts with /
* The current working directory can be checked via:
import os
os.getcwd()
EXAMPLE: If the current working directory is /Users/christine/Documents/GitHub/st115_github/2024w-lecture-02, the relative path for
/Users/christine/Documents/GitHub/st115_github/2024w-lecture-02/data/sample.txt
is
data/sample.txt
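A quick sketch of how a relative path is resolved against the current working directory, using only os and os.path. The file does not need to exist for this to run.

```python
import os

relative_path = os.path.join("data", "sample.txt")   # example path from the card
absolute_path = os.path.abspath(relative_path)       # resolved against os.getcwd()

print(os.getcwd())                    # current working directory
print(os.path.isabs(relative_path))   # the relative path is not absolute
print(os.path.isabs(absolute_path))   # the resolved path is
print(absolute_path)
```

The printed absolute path depends on where the script is run, which is why relative paths are preferred in code that is shared with others.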
73
How do you load a file into Python?
To load a file into Python, you can use the built-in function open()
* open() creates a file object, which in this case we can use to read the content of the corresponding file
- The first argument is the path to the file (absolute/relative)
EXAMPLE:
f = open('data/sample.txt')
74
How do you close a file in Python?
To close the file, you use .close()
It is important to close the file to release its resources back to the operating system.
EXAMPLE:
f = open('data/sample.txt')
f.close()
75
How can you get the first line from a file?
Use 'break' to stop after getting the first line:
f = open('data/sample.txt')
for line in f:
    line_1 = line
    break
f.close()
76
How can you remove trailing whitespace from a line?
.rstrip() removes any trailing whitespace (including the newline character at the end of a line)
EXAMPLE:
line_1.rstrip()
77
How can you read files using with, and what are its advantages?
The with keyword is used for resource management.
EXAMPLE:
with open("data/sample.txt") as f:
    for line in f:
        print(line.rstrip())
* Advantages:
- No need to write f.close()
- Ensures the file is closed when the code that uses it finishes running, even if exceptions are thrown
- Easier to know which lines of code are related to reading data from the file
78
Encoding, writing files, and lines
See NOTES for Encoding, writing files in Python, and writing multiple lines
79
What is the notation for slicing an array in NumPy?
For slicing, we can again use the list-like notation [start:stop:step].
EXAMPLE:
ps = np.array([[86, 65, 42, 73, 93, 72],
               [53, 62, 70, 83, 90, 65],
               [71, 82, 65, 80, 81, 78]])
ps[:2]
OUT:
array([[86, 65, 42, 73, 93, 72],
       [53, 62, 70, 83, 90, 65]])
80
How do you use integer array indexing in NumPy?
We can create a new array by specifying the indexes of the items we would like to include:
EXAMPLE:
ps_1 = np.array([86, 65, 42, 73, 93, 72])
idx = [0, 2, 3]
ps_1[idx]
OUT: array([86, 42, 73])
81
How can you stack arrays in a sequence horizontally or vertically?
You can stack arrays in a sequence horizontally via np.hstack() or vertically via np.vstack()
EXAMPLE hstack:
ps_1 = np.array([86, 65, 42, 73, 93, 72])
ps_2 = np.array([53, 62, 70, 83, 90, 65])
np.hstack([ps_1, ps_2])
OUT: array([86, 65, 42, 73, 93, 72, 53, 62, 70, 83, 90, 65])
EXAMPLE vstack:
np.vstack([ps_1, ps_2])
OUT:
array([[86, 65, 42, 73, 93, 72],
       [53, 62, 70, 83, 90, 65]])
82
How can you transpose a 2d array?
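Use the attribute: .T
EXAMPLE:
ps = np.array([[86, 65, 42, 73, 93, 72],
               [53, 62, 70, 83, 90, 65],
               [71, 82, 65, 80, 81, 78]])
ps.T
OUT:
array([[86, 53, 71],
       [65, 62, 82],
       [42, 70, 65],
       [73, 83, 80],
       [93, 90, 81],
       [72, 65, 78]])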
83
How do you find the inverse of a 2d ndarray?
When a 2d ndarray represents a square matrix, we can find the inverse of the matrix with np.linalg.inv()
EXAMPLE:
A = np.array([[1, 2, 3],
              [2, 3, 0],
              [0, 1, 2]])
A_inverse = np.linalg.inv(A)
84
How can you do matrix multiplication?
We can do matrix multiplication with @
EXAMPLE:
A_inverse @ A
85
How can you find the determinant of a matrix?
We can find the determinant of a matrix by using np.linalg.det()
86
What command tells you the run time of a piece of code?
'%%time' is one of the built-in magic commands provided by the IPython kernel. Put it on the first line of a notebook cell to time the whole cell.
EXAMPLE: It returns output like:
CPU times: user 18.6 ms, sys: 26.3 ms, total: 44.8 ms
Wall time: 43.6 ms
87
Why is NumPy much faster than lists?
* np.ndarrays are densely packed in memory, with all elements of the same type
* Python lists are arrays of pointers to objects
* Many NumPy operations are implemented in C, avoiding the general cost of loops in Python
88
What is a view in NumPy and why is it different from copy?
A view in NumPy is another way of viewing the same data of an array.
* For some operations, what we get is a view of the original array, not a copy
* With a view, data is not copied
* Any changes made to a view are reflected in the original array
* With the use of a view, there is no need to spend time or memory on copying the data
A copy in NumPy is a new array created by duplicating the data.
* Use the .copy() method
* Data is copied
* Changes made to a copy are not reflected in the original array
* Making a copy is slower and uses more memory, but is sometimes necessary
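A short sketch of the difference in behaviour: writing through a slice (a view) changes the original array, while writing to a .copy() does not. The array contents are arbitrary.

```python
import numpy as np

arr = np.arange(5)      # array([0, 1, 2, 3, 4])

view = arr[1:4]         # basic slicing gives a view
view[0] = 100           # writes through to arr
print(arr)

arr_copy = arr.copy()   # explicit copy
arr_copy[0] = -1        # does not touch arr
print(arr)

print(np.shares_memory(arr, view), np.shares_memory(arr, arr_copy))
```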
89
EXAMPLE:
1. Create an array starting from 0 and ending before 20, with step size 2
2. Select the items at index 1, 5, 7
arr = np.arange(0, 20, 2)
sub_arr = arr[[1, 5, 7]]
90
EXAMPLE:
1. Create an array with 10 items, starting from 0.1 and ending at 1
2. Select the items > 0.5
arr = np.linspace(0.1, 1, 10)
sub_arr = arr[arr > 0.5]
91
EXAMPLE:
1. Create a 2x5 array with all ones
2. Transpose the array
3. Replace the first element with NaN (not a number)
mat = np.ones((2, 5))
mat = mat.T
mat[0, 0] = np.nan
92
EXAMPLE:
1. Create an array with all zeros of length 10
2. Reshape it to be a 2x5 array
3. Replace the first row with infinity
arr = np.zeros(10)
mat = arr.reshape((2, 5))
mat[0] = np.inf
93
EXAMPLE:
1. Create a 3x3 identity matrix
2. Create a diagonal matrix
3. What is the code for Euler's number e (the base of the exponential)?
4. What is the code for pi?
arr = np.eye(3, dtype=int)
mat = np.diag([1, 2, 3])
np.e
np.pi
94
What is one of the main differences between integer/boolean array indexing, astype() and flatten() on the one hand, and transpose, reshape() and ravel() on the other?
* Integer and boolean array indexing, astype() and flatten() return a copy
* Transpose returns a view; reshape() and ravel() return a view whenever possible (and a copy otherwise)
95
What is the os Module?
The os module provides functions for interacting with the operating system:
* Get the current working directory by os.getcwd()
* List all the files in the given path by os.listdir()
* Create a folder by os.mkdir()
* Remove a file by os.remove()
* Remove a folder (if it is empty) by os.rmdir()
96
What is the os.path Module?
The os.path module provides useful functions on pathnames:
* Check if a file/folder at the specified path exists by os.path.exists()
- NOTE: This does not tell you if it is a file or a folder. You can check that with os.path.isfile() and os.path.isdir()
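The three checks can be tried out safely inside a temporary folder. In this sketch the file name marks.txt is hypothetical, and everything is cleaned up automatically when the with block ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "marks.txt")   # hypothetical file name
    with open(file_path, "w") as f:
        f.write("70\n")

    checks = {
        "folder exists": os.path.exists(folder),
        "folder is a directory": os.path.isdir(folder),
        "file is a file": os.path.isfile(file_path),
        "file is a directory": os.path.isdir(file_path),
        "missing path exists": os.path.exists(os.path.join(folder, "missing.txt")),
    }
    print(os.listdir(folder))   # the folder contains only marks.txt

for name, result in checks.items():
    print(name, result)
```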
97
What is the pickle module?
The pickle module is for serialising and de-serialising Python objects.
* Serialising: the process of turning a Python object in memory into a stream of bytes for storage or for sending over a network
* De-serialising: the process of converting a stream of bytes back into a Python object
* We also say that we "pickle" and "unpickle" Python objects.
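The "stream of bytes" can be inspected directly with pickle.dumps() and pickle.loads(), which work in memory instead of on a file. A minimal sketch with a hypothetical record:

```python
import pickle

record = {"name": "adam", "marks": [70, 85, 90]}   # hypothetical object

payload = pickle.dumps(record)     # serialise to bytes in memory
restored = pickle.loads(payload)   # de-serialise back to a Python object

print(type(payload))
print(restored == record)   # same content
print(restored is record)   # but a new, separate object
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.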
98
How can you pickle and unpickle a list?
* We "pickle" a list by using pickle.dump()
* We can "unpickle" and get back the list by using pickle.load()
EXAMPLE pickle:
import pickle
import numpy as np
lst = [1, 2.0, '3', None, False, [], {}, (), np.array(1)]
with open("data/lst.pickle", mode="wb") as f:
    pickle.dump(lst, file=f)
EXAMPLE unpickle:
with open("data/lst.pickle", mode="rb") as f:
    unpickled_lst = pickle.load(f)
unpickled_lst
99
EXAMPLE: The folder data/ps_marks contains 60 files. Each file contains 3 problem set marks for the corresponding student. Assume all marks are integers. Write code to extract the marks for Adam and Annie, and store them in a list of lists with each "row" representing a student and each "column" representing a problem set. All marks should be int (not str).
adam_marks = []
with open('data/ps_marks/adam.txt') as f:
    for line in f:
        adam_marks.append(int(line))
with open('data/ps_marks/annie.txt') as f:
    annie_marks = [int(line) for line in f]
[adam_marks, annie_marks]
100
EXAMPLE: Write code to create
1. ps_marks (a 2d np.ndarray) to store the record for all 60 students. Each "row" of ps_marks represents a student and each "column" represents a problem set
2. student_names (a list or 1d np.ndarray) to store the names of the corresponding students
import os
import numpy as np
files = os.listdir('data/ps_marks/')
ps_marks = []
student_names = []
for file_name in files:
    if file_name[0] != '.':  # skip hidden files
        with open('data/ps_marks/' + file_name) as f:
            row = [int(line) for line in f]
        student_names.append(file_name[:-4])  # drop the '.txt' extension
        ps_marks.append(row)
ps_marks = np.array(ps_marks)
101
EXAMPLE:
1. Calculate the average mark for each student
2. Calculate the average mark for each problem set
3. Which student has the highest average mark?
4. What is the percentage of students with average mark >= 70?
5. Is there any problem set for which all students have marks >= 70?
#1
ps_marks.mean(axis=1)
#2
ps_marks.mean(axis=0)
#3
student_avg = ps_marks.mean(axis=1)
student_names[student_avg.argmax()]
#4
(student_avg >= 70).mean() * 100
#5
(ps_marks >= 70).all(axis=0)
102
How can we tell if the array is a view or a copy?
We can check the .base attribute of the array.
* It returns the original array if the array is a view, and None if the array owns its data (e.g. it is a copy)
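A sketch of the .base check on a small array, covering a slice, a reshape and an explicit copy. The variable names are arbitrary.

```python
import numpy as np

arr = np.arange(6)

sliced = arr[1:4]              # slicing: a view of arr
reshaped = arr.reshape(2, 3)   # reshape: a view in this case
copied = arr.copy()            # explicit copy

print(sliced.base is arr)
print(reshaped.base is arr)
print(copied.base is None)
print(arr.base is None)        # arr owns its own data
```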
103
EXAMPLE:
1. Create a Counter from collections to count the occurrence of each character. What are the ten most frequently occurring characters in data/book.txt?
2. Create a Counter from collections to count the occurrence of each word. What are the ten most frequently occurring words?
import collections
#1
character_count = collections.Counter()
with open('data/book.txt', 'r') as f:
    for line in f:
        character_count += collections.Counter(line)
character_count.most_common(10)
#2
word_count = collections.Counter()
with open('data/book.txt', 'r') as f:
    for line in f:
        words = line.split()
        for one_word in words:
            cleaned_word = one_word.strip().lower()
            if len(cleaned_word):
                word_count[cleaned_word] += 1
word_count.most_common(10)
NOTE: Some of the words have punctuation attached. To remove it, you can use string.punctuation together with strip():
import string
word_count = collections.Counter()
with open('data/book.txt', 'r') as f:
    for line in f:
        words = line.split()
        for one_word in words:
            cleaned_word = one_word.strip().lower().strip(string.punctuation)
            if len(cleaned_word):
                word_count[cleaned_word] += 1
word_count.most_common(10)
104
How can you remove punctuation attached to words?
If you have a look at word_count, you will realise you have some "words" like 'too,'. To remove the punctuation, you can use string.punctuation to get a string of punctuation characters:
import string
string.punctuation
and use strip() to remove them from both ends of a word:
cleaned_word = one_word.strip().lower().strip(string.punctuation)
105
How do you open and read a file?
To read the whole file into a string:
with open('data/auto-mpg-modified.csv') as file:
    content = file.read()
To load numeric CSV data into an ndarray:
import numpy as np
mpg_data = np.loadtxt('data/auto-mpg-modified.csv', delimiter=',')
mpg_data
106
EXAMPLE:
1. Use normal() from numpy.random to create 10000 random samples from a normal distribution with mean 10 and standard deviation 1
Then find the proportion of values from (1) that are:
2. Between 9 and 11
3. Between 8 and 12
4. Between 7 and 13
#1
import numpy.random as nr
x = nr.normal(10, 1, 10000)
#2
((9 < x) & (x < 11)).mean()
#3
((8 < x) & (x < 12)).mean()
#4
((7 < x) & (x < 13)).mean()
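For comparison, the theoretical proportions for a normal distribution with mean 10 and standard deviation 1 can be computed with NormalDist from the standard library's statistics module. This is a sketch to check simulated values against; the simulated proportions will differ slightly from run to run.

```python
from statistics import NormalDist

dist = NormalDist(mu=10, sigma=1)

within_1 = dist.cdf(11) - dist.cdf(9)    # within 1 standard deviation of the mean
within_2 = dist.cdf(12) - dist.cdf(8)    # within 2 standard deviations
within_3 = dist.cdf(13) - dist.cdf(7)    # within 3 standard deviations

for label, value in [("(9, 11)", within_1), ("(8, 12)", within_2), ("(7, 13)", within_3)]:
    print(f"proportion in {label}: {value:.4f}")
```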
107
EXAMPLE:
1. Use normal() from numpy.random to create 100000 random samples from a normal distribution with mean 10 and standard deviation 1, as a 2d array with 100 rows and 1000 columns
2. Take the mean of each column of the array from 1
3. What is the proportion of values from 2 that is outside of the range [9.804, 10.196]?
#1
import numpy.random as nr
x = nr.normal(10, 1, (100, 1000))
#2
x_avg = x.mean(axis=0)
#3
((x_avg < 9.804) | (x_avg > 10.196)).mean()
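The bounds 9.804 and 10.196 look like the mean plus or minus 1.96 standard errors of a column mean; that reading is an assumption, not something stated in the exercise. Under it, the range can be reproduced with the standard library:

```python
import math
from statistics import NormalDist

sigma = 1       # standard deviation of each individual sample
n_rows = 100    # each column mean averages 100 values

standard_error = sigma / math.sqrt(n_rows)   # spread of a column mean
z = NormalDist().inv_cdf(0.975)              # about 1.96
half_width = z * standard_error

print(round(10 - half_width, 3), round(10 + half_width, 3))
```

If the assumption holds, about 5% of the column means would be expected to fall outside the range.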