Lecture Notes Flashcards

1
Q

Define programming.

A

Programming means giving a computer a list of tasks, which it then runs in order to solve a problem.

2
Q

What are some advantages of computer programming?

A
  • Computers don’t get bored - automate repetitive tasks
  • Computers don’t get tired
  • Computers are calculators
  • Computer code is reproducible
3
Q

What can’t computers do?

A
  • Computers are not creative
  • Computers are not ethical
  • Computers only know what you tell them
4
Q

What are some advantages of Python?

A
  • High-level language
  • Emphasises readability, making use of white space and indentation
  • Dynamically typed
  • Interpreted language
  • Assigns memory automatically
  • Supports multiple approaches to programming
  • Extensive functionality
  • Portable
  • Open source
  • Very popular
5
Q

What are some disadvantages of Python?

A
  • Slower than compiled languages
  • Can be memory-intensive
6
Q

What are the different types of cells in a Jupyter notebook?

A
  • Code cells - interpreted as Python code
  • Markdown cells - for adding formatted text
7
Q

How do you add a comment to a Jupyter notebook?

A

Start the comment with # in a code cell. Python ignores everything after the # on that line.

8
Q

Why are comments important?

A
  • They allow you to keep track of what your code does
  • They help you avoid repetition and mistakes
  • They make your code easier for other people to follow
9
Q

What steps should you take for debugging?

A
  • Always read error messages carefully
  • Comment your code thoroughly
  • Tell your code to print outputs for intermediate steps
  • Use the internet
10
Q

How do you print in Python?

A

print()

Prints whatever is in the brackets.
Useful for displaying results and testing purposes.

11
Q

What does a variable have?

A

A name and a value.

The name is fixed, the value can change.

11
Q

What are the different types of variables in Python?

A
  • Numeric: integers, floats or complex numbers
  • Text: string, always marked by quotation marks
  • Boolean: True or False
  • Sequences: lists or arrays of numbers/letters
12
Q

How do you change the string x = '33.3' to a float?

A

float(x)

12
Q

How do you check the type of a variable?

A

type(x)

12
Q

How do you change a float to an integer?

A

int(x) - this gives a whole number by dropping the decimal part (it truncates towards zero rather than rounding to the nearest integer)
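
A short sketch of these conversions, reusing the string '33.3' from the earlier card. The variable names as_float and as_int are only illustrative.

```python
# Convert between strings, floats and integers.
x = "33.3"               # a string, because of the quotation marks
as_float = float(x)      # 33.3
as_int = int(as_float)   # 33: int() drops the decimal part, it does not round

print(type(x), type(as_float), type(as_int))
print(int(2.9), int(-2.9), round(2.9))  # 2 -2 3
```

If rounding to the nearest whole number is wanted, round() is the usual choice.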

13
Q

How do you get an input from the user?

A

variable = input("Enter your name: ")

14
Q

What is an expression?

A

Any combination of variables, constants and operators that together evaluates to a value.

15
Q

What are the common symbols used in basic math expressions?

A

+ (addition)
- (subtraction)
* (multiplication)
/ (division)
% (remainder)
** (raise to the power of)

16
Q

How do you concatenate two strings together?

A

string1 + string2
gives string1string2

The * operator repeats a string:
string1 * 3
gives string1string1string1

17
Q

How is Python indexed?

A

Zero-based indexing

18
Q

What is string slicing?

A

Extracting certain characters from a string.

19
Q

How do you access specific parts of a string?

A

Using the index with square bracket notation

  • string[0]
20
Q

Can we change a part of string in place?

A

We can access parts of a string to see their value, but we cannot change them in place - strings are immutable.

21
Q

How do we access a sequence (sub-string) of any length?

A

By specifying a range to slice. Ranges use a : notation eg [1:10]

The slice occurs before each index (eg between 0 and 1, and between 9 and 10), returning the characters at indices 1 to 9. The end index itself is not included.
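
A minimal sketch of indexing and slicing. The example string is an assumption chosen only for illustration.

```python
word = "programming"

print(word[0])    # first character (index 0): p
print(word[1:4])  # indices 1, 2 and 3, but not 4: rog
print(word[:3])   # from the start up to, but not including, index 3: pro
print(word[-1])   # negative indices count from the end: g
```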

22
Q

How can we create a new string with slicing?

A

We can store our sub-string as a new variable (then this can be manipulated)

string2 = string1[:8]

23
Q

What is string splitting?

A

String splitting is a very useful method for manipulating strings - it involves breaking a string into multiple parts, which are returned as a list.

string.split(' ')
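
As an illustration, splitting on a space and on a comma might look like this. The example strings are assumptions.

```python
sentence = "Beautiful is better than ugly"
words = sentence.split(" ")   # split wherever there is a space
print(words)                  # ['Beautiful', 'is', 'better', 'than', 'ugly']

row = "1.0,2.5,3.7"
parts = row.split(",")        # any separator string can be used
print(parts)                  # ['1.0', '2.5', '3.7']
print(len(parts))             # 3
```

Note that the parts are still strings; they would need float() before doing arithmetic with them.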

24
Q

What is a tuple?

A

A tuple is a type which holds an arbitrary sequence of items, which can be of different types.

They are used to store multiple items in a single variable.

Think multiple

25
Q

How can you declare a tuple?

A

my_tuple = ('A', 'tuple', 'of', 5, 'entries')

26
Q

How can you access an item in a tuple?

A

With the same square-bracket notation used for characters in a string

my_tuple[0]

27
Q

What is the advantage of a tuple over a list?

A

Tuples use less memory than lists, but once a tuple is created its items cannot be changed.

Tuples are immutable, like strings

A list is a similar but more flexible data type compared to a tuple.

Lists are also comma-separated, but use square brackets

28
Q

Give examples of immutable data types.

A

Tuples
Strings

29
Q

What is the difference in declaring a list vs a tuple?

A

Both are comma-separated lists.

Tuples - ()
Lists - []

Tuples are immutable
Lists support assignment - you can access an item and change its value
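
The difference can be seen in a short sketch. The contents of my_list are an assumption modelled on the tuple example from the earlier card.

```python
my_list = ["A", "list", "of", 5, "entries"]
my_tuple = ("A", "tuple", "of", 5, "entries")

my_list[3] = 6            # fine: lists support assignment
print(my_list)

try:
    my_tuple[3] = 6       # tuples are immutable, so this raises a TypeError
except TypeError as err:
    print("Cannot change a tuple:", err)

print(my_tuple[3])        # still 5
```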

30
Q

Lists support assignment - what does that mean?

A

You can access an item and change its value.

31
Q

How do you access/change items in a list?

A

list[index]

for a list of lists
list[i][j]

32
Q

How do you get the length of a list?

A

len(list)

33
Q

How do you compute the sum of values in a list?

A

sum(list)

34
Q

How do you find the minimum value in a list?

A

min(list)

35
Q

How do you find the maximum value in a list?

A

max(list)

36
Q

How do you make a copy of a list?

A

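Use the .copy() method and store the result in another variable

copied = list.copy()

Writing copied = list on its own does not make a copy; it only gives the same list a second name.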
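
A small sketch of why .copy() matters. The names original, alias and copied are illustrative assumptions.

```python
original = [1, 2, 3]
alias = original            # a second name for the SAME list
copied = original.copy()    # a new, independent list

original.append(4)

print(alias)    # [1, 2, 3, 4]  (changes with the original)
print(copied)   # [1, 2, 3]     (unaffected)
```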

37
Q

How do you add an element to a list?

A

list.append(value)

38
Q

What is the standard indent in Python?

A

Four spaces - most editors insert these when you press Tab

38
Q

What is a dictionary?

A

A handy way to store and access data.

A dictionary is a set of keyword and value pairs. You use the keyword to access the value. The value can be of any type, including another dictionary.

dict = { x:y, a:b }

Keys are most often strings, which need quotation marks, but any immutable type (eg a number or a tuple) can be used as a key.

38
Q

How do you define a dictionary?

A

dict = { x:y, a:b }

Keys are most often strings, which need quotation marks, but any immutable type (eg a number or a tuple) can be used as a key.
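
A minimal sketch of defining and using a dictionary. The name masses echoes the JSON cards later in the deck, and the values (rough planet masses relative to Earth) are illustrative assumptions.

```python
masses = {"Mercury": 0.055, "Venus": 0.815, "Earth": 1.0}

print(masses["Venus"])     # look a value up by its key: 0.815

masses["Mars"] = 0.107     # add a new key and value pair
masses["Earth"] = 1.00     # update an existing value

for planet in masses:      # looping over a dictionary gives its keys
    print(planet, masses[planet])

print(len(masses))         # 4
```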

38
Q

What is program flow?

A

Controlling which parts of your code get executed when, in what order, how many times, under what conditions, where to start and stop etc.

It is essential to making sure your program actually does what you want it to do.

Flow is controlled mainly by using conditional logic and loops.

38
Q

What is the advantage of a dictionary?

A

We don’t need to care about where the value we want is, we just have to remember what we called it.

Keys are most often strings, which need quotation marks, but any immutable type (eg a number or a tuple) can be used as a key.

39
Q

What is an if statement?

A

A block of code which first checks whether a specified condition is true, and carries out the task only in that case.

if condition:
    # body

The condition only applies to the indented code which follows the :

40
Q

What is an if-else statement?

A

If statements only execute if the condition is true.
The else statement executes if the condition is false.

if condition:
    # code
else:
    # code

41
Q

What is the elif statement?

A

If-elif-else: elif ("else if") checks a further condition, but only when the conditions before it were false.

if condition_1:
    # code
elif condition_2:
    # code
else:
    # code
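
A runnable sketch of the pattern. The variable names and thresholds are assumptions chosen only for illustration.

```python
temperature = 15

if temperature < 0:
    label = "freezing"
elif temperature < 20:
    label = "mild"        # reached only because the first condition was false
else:
    label = "warm"

print(label)  # mild
```

Only one branch ever runs: the first one whose condition is true, or the else branch if none is.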

42
Q

What is a loop?

A

A block of code that will iterate (execute consecutively) multiple times.

43
Q

What is a for loop?

A

A for loop requires something to iterate over, ie an “iterable” like a list (do something for every item in the list) or a string (do something for every character in the string).

for var in iterable:
    # code

for i in range(10):
    # code

44
Q

Which is the simplest kind of loop?

A

For loop

45
Q

How do you get a list of integers of length x, starting with 0?

A

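range(x) - produces the integers 0 to x-1 (as a range object, which is all a for loop needs)

list(range(x)) - converts the range into an actual list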

46
Q

What are the keywords used to control the flow of a loop?

A

pass - do nothing
continue - stop this iteration of the loop early, and go on to the next one
break - end the loop entirely
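
The three keywords can be seen together in a short sketch. The loop bounds and conditions are illustrative assumptions.

```python
visited = []

for i in range(10):
    if i == 2:
        pass              # placeholder: does nothing, the loop body carries on
    if i % 2 == 1:
        continue          # skip odd numbers and move to the next iteration
    if i > 6:
        break             # leave the loop entirely
    visited.append(i)

print(visited)  # [0, 2, 4, 6]
```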

47
Q

How do we open a file in Python?

A

open() function

r - reading only
w - for writing, if the file exists it overwrites it, otherwise it creates a new file
a - opens for file appending only, if it doesn’t exist, it creates the file
x - creates a new file, if the file exists it fails
+ - opens a file for updating (reading and writing); it is added to one of the other modes, eg r+

syntax:
f = open('zen_of_python.txt', 'r')

48
Q

What does f = open('zen_of_python.txt', 'r') do?

A

'r' - opens a file for reading only.

49
Q

What does f = open('zen_of_python.txt', 'w') do?

A

'w' - opens a file for writing. If the file exists, it overwrites it. Otherwise, it creates a new file.

50
Q

What does f = open('zen_of_python.txt', 'a') do?

A

'a' - opens a file for appending only. If the file doesn’t exist, it creates the file.

51
Q

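What does the '+' in a mode such as f = open('zen_of_python.txt', 'r+') do?

A

'+' - opens a file for updating (reading and writing). It must be combined with one of the other modes; '+' on its own raises a ValueError.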

51
Q

When are changes to a file saved?

A

When the file is closed

Use the .close() method if not using with/as

52
Q

What does f = open('zen_of_python.txt', 'x') do?

A

'x' - creates a new file. If the file exists, it fails.

53
Q

What do you have to do once you are finished with a file?

A

Close it, to release the resources used by the open file.

When writing to a file, the changes are not saved until the file is closed.

Use the .close() method

54
Q

What is the basic way to read from a file?

A

f = open("file_name.txt", "r")

then use
print(f.read()) or
print(f.readline())

55
Q

What arguments does the open function take?

A

The name of the file you want to look at and the mode with which you want to interact with the file

56
Q

What is the difference between .read(),.readline() and .readlines()?

A

.read() reads the entire contents of the file as a single string

.readline() reads only the next line; it can be called repeatedly until the entire file has been read

.readlines() is the most useful: it reads the file one line at a time and stores all the lines in a single list

57
Q

What happens if you run print(f.read()) twice?

A

The first output will print the entire contents of the file.

The second output will be blank. Once the file object has been read to the end, any subsequent calls return an empty string.

58
Q

What happens if you try f.read() from a closed file?

A

Results in an error (ValueError: I/O operation on closed file)

59
Q

How do you read each line of a file and store all the lines in a list?

A

.readlines()

f = open("file_name.txt", "r")
lines = f.readlines()
f.close()
print(lines)

The file is closed, but its contents are stored in a variable, so we can still get the lines we want by indexing.

60
Q

What is the safe way to open files?

A

We can make sure that files are only open for as long as we need them by using a with statement.

with open("file_name.txt", "r") as f:
    # put file operations in here
    print(f.read())
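
A self-contained sketch of the with statement. So that it runs anywhere, it first creates its own small file in a temporary directory; the path and file contents are assumptions made for illustration.

```python
import os
import tempfile

# Create a small example file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "file_name.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

# Read it back; the file is closed automatically when the block ends.
with open(path, "r") as f:
    lines = f.readlines()

print(f.closed)   # True
print(lines)      # ['first line\n', 'second line\n']
print(lines[1])   # the stored lines can still be indexed after closing
```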

61
Q

What happens if you try print(f.read()) after a with/as statement?

A

An error will be produced - the with/as syntax closes the file automatically at the end.

This is important for file writing, less important for file reading.

62
Q

How do you write to a file?

A

with open("file_name.txt", "w") as f:
    f.write("String")

Basic input and output only reads and writes strings. Passing anything else to f.write() (eg a number) causes an error and results in an empty file.
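
One way to write non-string values is to convert them first. This is a sketch under assumptions: the temporary path and the values written are invented for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file_name.txt")
number = 42

with open(path, "w") as f:
    f.write(str(number) + "\n")     # convert the integer to a string first
    f.write("%.2f\n" % 3.14159)     # or build the string with formatting

with open(path, "r") as f:
    contents = f.read()

print(contents)
```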

63
Q

What happens to the contents when you open a file in write mode?

A

It erases any previous contents

64
Q

How do you format a string?

A

%s - string
%d - integer
%f - float
%e - float, but using scientific notation

eg print('%f' % length)

or print("This is a %d word %s" % (length, datatype)) - can include as many variables as you want by putting several % signs in the string, and providing a tuple after the string.

The first % (inside the string) indicates that we are writing a variable. The letter that follows indicates what type of variable.

The second % sign (after the string) tells your code which variable to write at the first % sign.
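
A short sketch of % formatting. The variable values are assumptions; the %.2f form (two decimal places) is standard Python but is not mentioned in the card above.

```python
length = 5
datatype = "string"

message = "This is a %d word %s" % (length, datatype)
print(message)             # This is a 5 word string

print("%f" % 2.5)          # 2.500000 (six decimal places by default)
print("%.2f" % 2.5)        # 2.50
print("%e" % 12345.678)    # 1.234568e+04
```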

65
Q

How can you add a tab into the string?

A

"\t"

66
Q

How can you add a new line into the string?

A

"\n"

67
Q

What is a JSON file?

A

A JSON file is structured like a Python dictionary.

JavaScript Object Notation (JSON) is a standard text-based format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web applications.

68
Q

How is it best to read CSV or JSON files?

A

Using specialised modules, eg the csv and json modules

69
Q

How do we write JSON?

A

Using the json module.
Use json.dump to write to a file.

import json

Define a dictionary, eg masses, then:

with open("planets.json", "w") as f:
    json.dump(masses, f)

  • The two arguments are the thing you want to dump and the file you want to dump it into
70
Q

How do we read JSON?

A

Using the json module.
Use json.load to read from a file.

import json

with open("planets.json", "r") as f:
    new_dictionary = json.load(f)

print(new_dictionary) - to check that we have successfully read the JSON dictionary
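
A round-trip sketch combining json.dump and json.load. The contents of masses and the temporary path are assumptions made so the example runs on its own.

```python
import json
import os
import tempfile

masses = {'Earth': 1.0, 'Mars': 0.107}   # single quotes are fine in Python

path = os.path.join(tempfile.mkdtemp(), "planets.json")

with open(path, "w") as f:
    json.dump(masses, f)                 # write the dictionary to the file

with open(path, "r") as f:
    new_dictionary = json.load(f)        # read it back as a dictionary

with open(path, "r") as f:
    raw_text = f.read()                  # the file itself uses double quotes

print(new_dictionary == masses)          # True
print(raw_text)
```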

71
Q

What are the two calls to read and write json?

A

import json

write - json.dump()
read - json.load()

72
Q

What quotation marks does JSON use as standard?

A

Double quotes

You can define a dictionary with single quotes - Python doesn’t care but JSON does, so the json module converts them, eg so that all keys are written with double quotes

73
Q

When might a dictionary be a string?

A

Dictionaries may be stored as a string if the dictionary is one entry within a larger database

74
Q

How do we turn a dictionary into a string?

A

Simply add quotation marks around the dictionary literal (json.dumps(dictionary) converts an existing dictionary into a JSON string)

Can check the type with print(type(item))

If double quotes are used inside the string, create the overall string with single quotes. Using the same type of quote both around and within the string would end the string early.

75
Q

How can we turn a string into a dictionary?

A

json.loads()
pronounce load-S

the extra s is for string

eg dictionary = json.loads(string) (avoid calling the variable dict, which would hide Python’s built-in dict type)

print(type(dictionary)) to check it was successful
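
A minimal sketch of converting in both directions. The string contents and variable names are illustrative assumptions.

```python
import json

# The outer quotes are single because JSON needs double quotes inside.
text = '{"name": "Earth", "moons": 1}'

planet = json.loads(text)        # string -> dictionary
print(type(planet))              # <class 'dict'>
print(planet["moons"])           # 1

back = json.dumps(planet)        # dictionary -> string
print(type(back))                # <class 'str'>
```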

76
Q

What are the two cases in which we want code to fail gracefully?

A

Errors - a fundamental issue where Python cannot understand your code (syntax error)

Exceptions - code is written in valid Python syntax, but an operation cannot be completed successfully

77
Q

What is the syntax used to predict and catch exceptions under some circumstances?

A

The try/except block

try:
    # code
except:
    # fallback code

The except block prevents the code from crashing and lets you implement an emergency fallback option.

78
Q

Why do you need to be cautious about using a generic except statement?

A

It will catch all exceptions - even if the error is not what you think it is.

You should try to catch specific errors.

79
Q

What is a ValueError exception?

A

Raised when an operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError

80
Q

How do you extend the exception-handling block with additional steps to execute after the try…except?

A

try:
    # code
except:
    # code
else:
    # code - do if no exception
finally:
    # code - always do this at the end
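
A runnable sketch of the full block, catching a specific exception rather than using a generic except. The function name to_float and the fallback value of 0.0 are assumptions made for illustration.

```python
def to_float(text):
    """Convert text to a float, falling back to 0.0 if that fails."""
    try:
        value = float(text)
    except ValueError:
        # Right type (a string) but an inappropriate value, eg "abc".
        print("Could not convert %r, using 0.0 instead" % text)
        value = 0.0
    else:
        print("Converted %r successfully" % text)   # runs only if no exception
    finally:
        print("Finished handling %r" % text)        # always runs
    return value


print(to_float("33.3"))  # 33.3
print(to_float("abc"))   # 0.0
```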

81
Q

What is the difference between a type error and value error?

A

Passing arguments of the wrong type (e.g. passing a list when an int is expected) should result in a TypeError, but passing arguments with the wrong value (e.g. a number outside expected boundaries) should result in a ValueError.

82
Q

What is the benefit of using a function?

A

Functions are re-usable.

We often want to do the same operation at different times or with different data.

83
Q

What is a function?

A

A separate, named block of code for a specific purpose.

The code inside a function is “walled off” from the main code.

84
Q

What is required for a function?

A

Every function has a name and a list of (required and optional) inputs in parentheses, and returns something at the end.

def my_function():
    return

85
Q

What is the syntax for defining a function?

A

def my_function():
    return

You should give your function a meaningful name

86
Q

What are inputs of a function called?

A

Arguments (also called parameters). Arguments passed by name, eg b=1, are called keyword arguments.

87
Q

How do you call a function?

A

Call the function using its name, including the brackets (and any arguments required to be passed in)

eg hello_world()

88
Q

If a function requires an argument to be provided, but we don’t provide it, what happens?

A

We get an error message (a TypeError saying that a required argument is missing)

89
Q

When we call a function and assign it to a variable, what happens?

eg sum = my_sum(5, 6)

A

The variable will be assigned the value returned by the function

90
Q

What are global and local variables?

A

A global variable is a variable defined in the main body of the code. Any code executed after the variable has been defined is able to “see” the variable.

A local variable is a variable defined inside the function or other object. Its value is only accessible within the function or object (ie cannot be accessed outside of the function)

91
Q

If we want to make an input of a function optional, what do we need to do?

A

Give it a default value

def my_sum(a, b=1):
    return a + b

  • If you provide a value for b, it will override the default
  • If you don’t provide a value for b, it will use b = 1 as a default
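
Calling the function in the three possible ways might look like this. The argument values are illustrative assumptions.

```python
def my_sum(a, b=1):
    return a + b


print(my_sum(5))         # b falls back to its default: 6
print(my_sum(5, 6))      # b given by position: 11
print(my_sum(5, b=10))   # b given by keyword: 15
```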
92
Q

How do you reverse a list?

A

list.reverse()

93
Q

How do you declare a function with an arbitrary number of arguments?

A

def arb_function(*nums):
    # code

Within the code, you then loop over nums (the arguments arrive as a tuple)
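
A minimal sketch of such a function, here adding up however many numbers it is given. The body is an assumption; only the name arb_function and the *nums parameter come from the card.

```python
def arb_function(*nums):
    total = 0
    for n in nums:        # nums is a tuple of all the arguments passed in
        total += n
    return total


print(arb_function())            # 0
print(arb_function(1, 2, 3))     # 6
print(arb_function(2.5, 2.5))    # 5.0
```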

94
Q

Why might you want to declare an arbitrary number of variables for a function?

A

You may not know in advance how much data you will need to work with

95
Q

What do all functions in Python have in common?

A

All functions in Python return something.

If you do not specify a value (or leave out the return statement entirely), the function will return a None value by default.

Otherwise it returns the value we specify

96
Q

How many values can you return from a function?

What options do you have for these outputs?

A

You can return more than one value from a function, and return different types.

For the output:
- Provide the same number of variables as the number of values returned. Each returned value then goes to a separate variable.
- Provide a single variable. This will then contain a tuple of the values returned by the function.
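
Both options in a short sketch. The function min_and_max is a hypothetical example, not part of the original notes.

```python
def min_and_max(values):
    return min(values), max(values)    # two values, separated by a comma


lowest, highest = min_and_max([3, 1, 4])   # one variable per returned value
print(lowest, highest)                     # 1 4

both = min_and_max([3, 1, 4])              # a single variable gets a tuple
print(both)                                # (1, 4)
print(type(both))                          # <class 'tuple'>
```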

97
Q

What does the return statement do?

A

Returns variables, ends the function call and returns to the main code.

Therefore any code in the function after the return will not be executed.

This can be convenient if you want to put conditions for what to return.

98
Q

What is a lambda function?

A

A quick way to make short functions that can be defined in one line.

They can take any number of arguments, but can only have one expression.

name = lambda vars : code

eg doubler = lambda x: x*2

99
Q

How do you define a lambda function?

A

name = lambda vars : code

eg doubler = lambda x: x*2

100
Q

When would it be most appropriate to use a lambda function?

A

If we need to create a function for temporary use eg within another function.
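
A common case of temporary use is passing a lambda to another function, such as the key argument of sorted(). The data below are illustrative assumptions.

```python
doubler = lambda x: x * 2
print(doubler(4))   # 8

pairs = [("Earth", 1.0), ("Mars", 0.107), ("Venus", 0.815)]

# Sort by the second item of each pair, using a throwaway lambda as the key.
by_mass = sorted(pairs, key=lambda pair: pair[1])
print(by_mass)      # [('Mars', 0.107), ('Venus', 0.815), ('Earth', 1.0)]
```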

101
Q

How do you add an element to a list?

A

list.append(i)

102
Q

How do you sort a list?

A

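sorted(list) - returns a new, sorted list and leaves the original unchanged

list.sort() - sorts the list in place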

103
Q

What is a programming paradigm?

A

A paradigm is like a philosophy informing how we write code.

Usually there are many different ways to solve a problem with code. Different paradigms help to shape which approach we choose to use.

Examples: procedural programming and object-oriented programming.

104
Q

What are the most common paradigms in Python?

A

Procedural programming - the code is organised as a sequence of instructions (do this, then this). Each block performs a prescribed task to solve the problem.

OOP - data are stored as “objects” belonging to pre-defined “classes”. These objects have a set of “attributes” stored internally, which can be updated using built-in “methods”.

105
Q

What is a class?

A

A class is like a template designed in advance to handle a particular data structure, with a set of properties called attributes.

It also provides implementations of behaviour (member functions or methods). A method is normally called on an object of the class, so the syntax looks like object_name.method_name()

106
Q

How do you reverse a list?

A

list.reverse()

107
Q

How do you investigate all of the attributes and functions of a class or object?

A

dir(x)

or print(dir(x))

108
Q

What are alternative names for attributes and methods?

A

Attributes - properties
Methods - functions

109
Q

What do attributes with a double underscore represent?

A

Special attributes and methods that Python defines and uses internally (eg __class__, __init__). You would not normally change them directly.

110
Q

How do you create a new list?

A

my_list = [x,y,z]

111
Q

What is the relationship of an object and a class?

A

Any object is an instance of a class, created to represent a particular item of data.

An instance ie one specific example

112
Q

What do methods of an object do?

A

Update the internal state of the object eg reversing the list

113
Q

How can you check the class of an object?

A

object.__class__

or type(object)

114
Q

How do you create a class?

A

eg
class Animal():
    # Can list attributes
    # Can define functions using def function():

115
Q

How do you create an object? (ie particular instance of the class)

A

object = Class()

Passing in attributes as appropriate. In this case, the attributes would be set to their defaults.

116
Q

What is creating an object (ie a particular instance of the class) called?

A

Instantiation

117
Q

How do you check the value of an attribute for an object?

A

object.attribute

118
Q

How do you update the attribute of an object?

A

object.attribute = value

119
Q

How do you create a new attribute of an object?

A

object.attribute = value

We can add attributes to individual class instances. Doing so does not change the class itself or any other instance.

120
Q

Why use classes?

A

Objects store data in a way where it is easy to update and display the internal state of that data, using built-in methods.

OOP allows you to put your methods next to the data.

Once we have defined useful classes and instantiated objects, object-oriented code will mainly interact with the data object through its built-in methods.

121
Q

What function do we use when we create a class and we know that we will create many objects from that same class, with shared attributes and want to assign values when creating each object?

A

The __init__ function

122
Q

What does the __init__ function do?

A

The Python __init__ method is declared within a class and is used to initialise the attributes of an object as soon as the object is created.

123
Q

How do you use the __init__ function?

A

class Animal():
    def __init__(self, attribute1, attribute2):
        self.att1 = attribute1
        self.att2 = attribute2

Give the __init__ function a list of arguments; the first argument is always self. This is a special variable which represents the object itself once we have created it (a self-referential thing).

__init__ initialises the attributes of each object created from the class
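
A runnable sketch of a class with __init__. The attribute names, the speak method and the example values are assumptions made for illustration.

```python
class Animal():
    def __init__(self, name, sound):
        # self is the object being created; store the inputs on it.
        self.name = name
        self.sound = sound

    def speak(self):
        # Methods also take self, so they can read the object's attributes.
        return "%s says %s" % (self.name, self.sound)


pet = Animal("Rex", "woof")    # create the object and set attributes in one line
print(pet.name)                # Rex
print(pet.speak())             # Rex says woof

pet.sound = "grr"              # update an attribute
print(pet.speak())             # Rex says grr
```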

124
Q

What is the self parameter?

A

The self parameter is a reference to the current instance of the class, and is used to access the attributes and methods that belong to that instance.

125
Q

What does self.x mean?

A

The attribute “x” belonging to the object “self”

126
Q

When using the __init__ function, what input can the other methods defined in the class take?

A

They take “self” as their first input; you can then access any attribute with self.x

127
Q

What is a benefit of using __init__ when we create objects?

A

The object can be created and attributes defined in one line.

128
Q

What is hierarchical inheritance?

A

Hierarchical inheritance is a type of inheritance in which multiple classes inherit from a single superclass.

“Parent” class - Animal
“Child” class - Cat, Dog etc.
Object - your pet

A child class inherits the attributes and methods of its parent class, but we can also add new methods and attributes.

129
Q

How do you create a child class of a parent class?

A

Put the name of the parent class in the brackets when creating the child.

eg class Cat(Animal):
    # put attributes from the parent class that should be fixed for the child class first
    # then use super().__init__(attribute1, attribute2) within __init__ for the attributes we want to specify new values for

130
Q

How do we define the __init__ function for a child class?

A

def __init__(self, attribute1, attribute2):
    super().__init__(attribute1, attribute2)
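
A self-contained sketch of a parent and child class. The attributes name, legs and sound, and the describe method, are assumptions made for illustration.

```python
class Animal():
    def __init__(self, name, legs):
        self.name = name
        self.legs = legs

    def describe(self):
        return "%s has %d legs" % (self.name, self.legs)


class Cat(Animal):
    sound = "meow"    # fixed for every Cat

    def __init__(self, name):
        # Cats always have four legs, so only the name is passed in;
        # super() hands both values on to Animal.__init__.
        super().__init__(name, 4)


my_pet = Cat("Tom")
print(my_pet.describe())            # inherited method: Tom has 4 legs
print(my_pet.sound)                 # meow
print(isinstance(my_pet, Animal))   # True
```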

131
Q

If you define a useful function or class and want to use it in many different programs, what can you do instead of copying and pasting the code?

A

Make the code into a Python module.

This is a python file (extension .py) containing one or more classes or functions.

You can then import the class or function from the module easily.

132
Q

What is a python module?

A

A python file (extension .py) containing one or more classes or functions.

You can then import the class or function from the module easily.

133
Q

How do you import a module?

A

Ensure the .py file is in the same directory as the current notebook.

eg for the class Dog from the animals.py module

from animals import Dog

From this import, you can create Dog objects

(can also define and import functions, they don’t need to be part of a class, making it very easy to reuse them)

134
Q

What is a python package?

A

A collection of modules.

You can install (download) these packages and then have access to incredibly useful functions.

135
Q

What is NumPy?

A

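Numerical Python - a third-party package (not part of the Python standard library, so it must be installed, although distributions such as Anaconda include it).

Like any module, it is a pre-defined collection of functions we can load into our programs.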

NumPy arrays are multidimensional array objects.

We can use NumPy arrays for efficient computations on large data sets.

136
Q

How do we import NumPy?

A

import numpy as np

Then call functions as eg np.sin(0)

Alternatives (don’t use):
import numpy - the same import without the np alias, so functions are called as numpy.sin(0)

from numpy import sin - makes only a specific function available by name

137
Q

When might you only import a specific function required, rather than an entire module?

A

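If we only need one or two names from a module and want to call them without the module prefix. (Python still loads the whole module behind the scenes, so this does not actually save memory.)

We would need to know the name of the function/class that we want to import in advance.

The specific function is now a global name, so we don’t need to specify the module.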

This could cause issues when there are functions with the same name.

138
Q

How do you call trigonometry functions?

A

np.sin(), np.cos() and np.tan() (after import numpy as np)

The value passed to these functions should be in radians.
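
A short sketch of calling a trigonometry function with an angle given in degrees. The variable names are illustrative; np.deg2rad() is NumPy's own conversion helper.

```python
import numpy as np

angle_deg = 90
angle_rad = np.deg2rad(angle_deg)  # same as angle_deg * np.pi / 180

print(np.sin(angle_rad))  # sine of 90 degrees
print(np.cos(0.0))        # cosine of 0 radians
```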

139
Q

How do we investigate the contents of a module?

A

import module first

dir()
eg dir(np)

140
Q

What is an array?

A

A “grid” of values, all of the same dtype
ie all floats, or all strings etc.

141
Q

What is the difference between a 1D array and a list?

A

They look similar at first.

  • Arrays use less memory and support fast whole-array operations, so an array is much more efficient than a list, particularly for a large collection of numbers.
  • Lists are more flexible (can have mixed types)
142
Q

How do you create a list of length x?

A

my_list = []
for i in range(x):
    my_list.append(i)

143
Q

How do you print the first x items of a list?

A

print(my_list[:x])

144
Q

How do you create a NumPy array from a list?

A

my_array = np.array(my_list)

145
Q

How do you print the first x items of an array?

A

print(my_array[:x])

eg print(my_array[:10]) for the first 10 items

146
Q

How do you print the first array item?

A

print(my_array[0])

147
Q

How do you print the last item of an array?

A

print(my_array[-1])

148
Q

How do you print the time taken to execute a cell?

A

%%time

This is a Jupyter cell magic and must be the first line of the cell.

149
Q

Why is there a difference in the time taken for an operation on a list vs an operation on an array?

A

Lists - operations can only be performed on individual items, so calculations have to be done one at a time (eg in a loop).

Arrays - operation is performed on all elements with single function call - this is much quicker for large data sets.
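
A small sketch of the difference, doubling every value in a list and in an array. The variable names are illustrative.

```python
import numpy as np

my_list = list(range(5))
my_array = np.array(my_list)

# List: each item is handled individually
# (my_list * 2 would repeat the list rather than double the values)
doubled_list = [item * 2 for item in my_list]

# Array: one expression operates on every element
doubled_array = my_array * 2

print(doubled_list)
print(doubled_array)
```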

150
Q

Why are arrays more convenient for many mathematical operations?

A

You can write one line of code rather than a loop.

Eg a 3D nested list requires 3 nested for loops.

With NumPy, we don’t have to worry about the array shape (it is automatically preserved)

151
Q

What are different ways to create an array?

A

From a list:
my_array = np.array(my_list)

Create an array of zeros (n = how many elements you want)
np.zeros(n)

Create an array of ones
np.ones(n)

Create an array of numbers from a to b, with spacing c
np.arange(1, 10, 1)

Create an evenly spaced array from a to b, with c points
np.linspace(1, 10, 19)

Create an array of random numbers from 0 to 1 of length n
np.random.random(n)
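
The creation functions above can be tried side by side. This is a sketch with illustrative variable names; the random values will differ on each run.

```python
import numpy as np

zeros = np.zeros(4)               # [0. 0. 0. 0.]
ones = np.ones(4)                 # [1. 1. 1. 1.]
steps = np.arange(1, 10, 1)       # 1 to 9: the stop value is excluded
points = np.linspace(1, 10, 19)   # 19 points from 1 to 10, both ends included
rand = np.random.random(4)        # 4 random values, each >= 0 and < 1

print(steps)
print(points)
```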

152
Q

What does np.arange() do?

A

Creates an array of numbers from a to b with spacing c.

Pass in where to start, where to stop and the spacing that you want

Stops before the stop number ie if you want 1 - 10
np.arange(1, 11, 1)

153
Q

What does np.linspace() do?

A

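Creates an evenly spaced array from a to b with c points.

State how many points you want, rather than the spacing. Both a and b are included by default, which avoids the rounding problems np.arange() can have with non-integer spacing and gives direct control over the number of samples.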

154
Q

What function creates an array of numbers from a to b with spacing c?

A

np.arange(a, b, c)

155
Q

What does np.random.random(n) do?

A

Creates an array of n random numbers, each between 0 (inclusive) and 1 (exclusive)

156
Q

How do you create an array of random numbers of length n?

A

np.random.random(n)

This creates an array of random numbers ranging from 0 to 1. We can apply transformations (scale and shift) to get it into the range we want.
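
One possible transformation, sketched with made-up limits low and high: multiply by the width of the range, then add the lower limit.

```python
import numpy as np

low, high = 5.0, 8.0   # illustrative limits
n = 1000

# Scale the 0-1 values to the width of the range, then shift by the lower limit
samples = low + (high - low) * np.random.random(n)

print(samples.min(), samples.max())
```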

157
Q

What is each dimension of an array called in NumPy?

A

An axis

158
Q

How do you initialise a higher-dimension NumPy array?

A

You need to specify the data along each axis.

eg
my_2d_array = np.array(
    [[1, 2, 3],
     [4, 5, 6]]
)

This is a nested list. The items are the rows. Items at the same position (the same index) within each sub-list form the columns.

eg
my_3d_array = np.array(
    [[[1, 2], [3, 4], [5, 6]],
     [[7, 8], [9, 10], [11, 12]]]
)

159
Q

How do you select an element from a 2D array?

A

We need to supply N indexes, equal to the number of axes (dimensions)

print(my_2d_array[0,0])
print(my_2d_array[0][0])

The first index is the row, the second is the column.

160
Q

How do you select a whole row or column from a 2D array?

A

Get the first row, all columns
my_2d_array[0,:]

Get all rows, the first column
my_2d_array[:,0]

Get all rows, last two columns
my_2d_array[:,1:3]

The slice of the array is itself returned as an array.
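
A short sketch of element selection and slicing on the 2 x 3 example array from the earlier card. The result variable names are illustrative.

```python
import numpy as np

my_2d_array = np.array([[1, 2, 3],
                        [4, 5, 6]])

element = my_2d_array[1, 2]          # row index 1, column index 2
first_row = my_2d_array[0, :]        # first row, all columns
first_col = my_2d_array[:, 0]        # all rows, first column
last_two_cols = my_2d_array[:, 1:3]  # all rows, columns 1 and 2

print(element)
print(last_two_cols)
```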

161
Q

How do you determine how many dimensions an array has?

A

my_array.ndim gives the number of dimensions.

To get the number of dimensions AND the size of each dimension:

my_array.shape

162
Q

What happens if you apply a simple expression to an array? eg array * 2

A

We can do this; the operation is applied to every element in the array. Here we are multiplying the array by a scalar (constant value).

Addition, subtraction, multiplication and division all work this way.

163
Q

What happens if we multiply two arrays together?

A

We can add, subtract, multiply or divide one array by another - but the behaviour is different to scalar mathematical expressions.

Each element of the array will operate on the element in the same position in the other array (this is not matrix multiplication).

eg my_array * my_array - each element of the array is squared

If the arrays have different shapes/sizes you can get errors or unexpected behaviour.

164
Q

How can you combine arrays?

A

Combine an array with shape (m, n) with:
- Array with shape (1, n)
- Array with shape (m, 1)
ie an array holding a single row or a single column, whose length matches the number of columns or rows in the data.

When you add/multiply, NumPy will repeat the smaller array as many times as needed in order to match the rows/columns in your data.

The smaller array is “broadcast” to the shape of your data.
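
A minimal broadcasting sketch with a (2, 3) data array, a (1, 3) row and a (2, 1) column. The values are made up for illustration.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])      # shape (2, 3)
row = np.array([[10, 20, 30]])    # shape (1, 3)
col = np.array([[100], [200]])    # shape (2, 1)

plus_row = data + row   # the row is repeated for each of the 2 rows
plus_col = data + col   # the column is repeated for each of the 3 columns

print(plus_row)
print(plus_col)
```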

165
Q

What is masking?

A

Masking is the term used for selecting entries in an array, e.g. depending on their content.

We can apply the mask to our data to retrieve a new array that contains only the selected values.

We can specify conditions and return sub-sets of the array.

166
Q

What is a mask for getting even numbers?

A

even_numbers = (my_array %2 == 0)
my_array[even_numbers]

Testing each element for the condition

167
Q

What does (my_array %2 == 0) return?

A

The conditional statement returns an array of Boolean True/False, with the same shape as the array.

This can be used as a mask to pick out only the array elements where the condition is true.

We mask arrays using square bracket notation, similar to slicing.

168
Q

How do you apply multiple masks at the same time?

A

Using & (and) between the conditions; | (or) also works. Each condition must be in its own brackets.
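
A sketch of combining two masks, selecting the values that are both even and greater than 5. The array contents and mask names are illustrative.

```python
import numpy as np

my_array = np.arange(1, 11)     # 1 to 10

even = (my_array % 2 == 0)      # Boolean array, True where the value is even
big = (my_array > 5)            # Boolean array, True where the value exceeds 5

selected = my_array[even & big]  # keep elements where both masks are True
print(selected)
```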

169
Q

Why do we often work with 2D arrays in data science?

A

They are good for holding tabular data.

170
Q

How do you get pi in the Jupyter notebook?

A

import numpy as np

np.pi

171
Q

How do you generate data to plot for a sin curve?

A

x = np.linspace(0, 2*np.pi, 100)

y = np.sin(x)

Combine the two arrays into a new array
data = np.column_stack([x,y])

172
Q

How do you combine two arrays into a new array?

A

Using the np.column_stack() or np.row_stack() functions (np.row_stack() is an alias of np.vstack(), which newer NumPy versions prefer)

Takes one argument - a list of arrays to stack

The arrays to stack must be the same length as each other

173
Q

How can we change the shape of an array?

A

We transpose the array using .T
eg transposed_data = data.T

For more complicated manipulation we can use the shape attribute and the reshape method
- data.shape to check the current shape
- rd = data.reshape(2,100) - instead of 100 rows and 2 columns, reshape to 2 rows and 100 columns (the elements are refilled in their original order, so this is not the same as data.T)

174
Q

How do you transpose data?

A

data.T

The rows are now the columns

175
Q

How do you reshape data?

A

reshaped_data = data.reshape(2,100)

The size of the new array (n rows * n columns) must match the original ie the product of the axis lengths is constant.

eg instead of 2 columns and 100 rows we can have 2 rows and 100 columns

OR three_d_data = data.reshape(2,50,2)
the three numbers are the lengths of axis 0, axis 1 and axis 2 (2 blocks, each with 50 rows and 2 columns)
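
Transposing and reshaping can give the same shape but different contents. A sketch on a small 3 x 2 array (values made up so the difference is easy to see):

```python
import numpy as np

data = np.array([[1, 10],
                 [2, 20],
                 [3, 30]])       # 3 rows, 2 columns

transposed = data.T              # rows become columns
reshaped = data.reshape(2, 3)    # same elements, refilled row by row

print(transposed)
print(reshaped)
```

Both results have shape (2, 3), but only the transpose keeps each original column together as a row.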

176
Q

How do you calculate the sum of elements of an array?

A
  • OOP
    data.sum()
  • Procedural
    np.sum(data)
177
Q

How do you find the minimum and maximum of an array?

A

Method approach

data.min()
data.max()

178
Q

How do you compute statistics on the slices of an array?

A

Possible because the slice is just another array.

eg mean of the first column
print(np.mean(data[:,0]))

179
Q

How do you write an array to a file to save for later use?

A

Using the savetxt() function

Required arguments are the name of the file to save to (created if it does not exist, otherwise it will be overwritten by default) and the array to save. We can also specify the format of the data and the character to separate the data.

np.savetxt('name.csv', data, fmt='%.4f', delimiter=',')

180
Q

How do you load data from a file?

A

loadtxt() or genfromtxt() functions

Required argument - file name.
You can also specify the delimiter and dtype to ensure desired behaviour

eg arr = np.genfromtxt('file.csv', delimiter=',', dtype='float')
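
A sketch of a full save-and-load round trip. It writes to a temporary directory (an assumption made here so the example does not touch your working folder); the array values are made up.

```python
import os
import tempfile

import numpy as np

data = np.array([[0.0, 1.0],
                 [0.5, 2.0]])

# Save to a CSV file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "name.csv")
np.savetxt(path, data, fmt="%.4f", delimiter=",")

# Load it back, using the same delimiter
loaded = np.genfromtxt(path, delimiter=",", dtype="float")
print(loaded)
```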

181
Q

What is the standard plotting library in python?

A

Matplotlib

182
Q

What is Matplotlib?

A

A comprehensive library for creating static, animated and interactive visualisations in Python.

It makes easy things easy and hard things possible.

183
Q

How do you import the matplotlib module?

A

import matplotlib.pyplot as plt

184
Q

What is pyplot?

A

A set of functions that can be used to create a figure with procedural programming.

For better control over plotting, it is recommended to use an OO approach with Matplotlib objects.

185
Q

What are the fundamental objects used in Matplotlib?

A

Figures - the entire area of the figure
Axes - the area for plotting data

186
Q

How do you create the axis and figure for a plot?

When should you do this?

A

fig, ax = plt.subplots()

Do this at the start of the plot

187
Q

How do you obtain the size of the figure in pixels?

A

print(fig)

188
Q

What is the default resolution of the figure in pixels?

A

The default resolution is 100 pixels per inch.

189
Q

How can you specify the size of the figure?

A

Using the figsize argument

fig, ax = plt.subplots(figsize=(7,5))

Size in inches

190
Q

What condition needs to be met to plot some simple lines?

A

The points along the lines can be given as a list of x and y coordinates which must be the same length.

191
Q

What is the minimal code for plotting a line graph?

A

fig, ax = plt.subplots()

x = […]
y = […]

ax.plot(x,y)

ax.set_xlabel("X")
ax.set_ylabel("Y")

plt.show()

This is the OOP approach
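
A runnable version of the minimal plot, with a sine curve filled in as example data. The "Agg" backend line is an assumption so the sketch also runs as a plain script without a display; it is not needed in Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Example data: one period of a sine curve
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y)           # x and y must have the same length

ax.set_xlabel("X")
ax.set_ylabel("Y")

plt.show()
```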

192
Q

How do you set labels on your plot?

A

ax.set_xlabel("X")
ax.set_ylabel("Y")

Procedural:
plt.xlabel("X")
plt.ylabel("Y")

193
Q

How do you display the plot in Jupyter?

A

plt.show() is used in Python to display the plot, not always needed in Jupyter.

194
Q

What differences are there between the OOP and procedural approaches in plotting?

A

Procedural approach - we call functions from pyplot

Methods (OOP) often start with set_, eg ax.set_xlabel; the equivalent pyplot functions often do not, eg plt.xlabel.

In procedural, we don’t tell pyplot which axes to plot the data on; it infers which axes to use (the most recent one).

195
Q

What kind of objects is matplotlib built to handle?

A

Numpy arrays

196
Q

How do you plot two columns?

A

ax.plot(data[:,0],data[:,1])

197
Q

What ways can you customise a plot?

A
  • Changing the units
  • Changing the upper and lower limits on the axes
  • Changing the axes tick marks
  • Adding another curve to a figure and identifying the curves with a legend
  • Changing line styles and colours
  • Add arbitrary text labels
  • Add a title
198
Q

How can you customise the units of a plot?

A

Apply conversion in ax.plot. Can do numpy operations directly in the .plot as long as it produces another array.

eg ax.plot(data[:,0]/np.pi*180, data[:,1])

199
Q

What is the conversion between degrees and radians?

A

degrees = radians * 180/pi

radians = degrees * pi/180

200
Q

How can we change the upper and lower limits on the axis?

A

ax.set_xlim()

eg ax.set_xlim(0,360)

201
Q

How can we change the axes tick marks?

A

ax.set_xticks([…])
ax.set_yticks([…])

Pass in a list of the tick marks you want.

202
Q

How do you add a second curve to a plot?

A

Call ax.plot twice on the same axes, once for each curve.

203
Q

How do you distinguish between two curves on the same plot?

A

Add a legend

Give each curve a label= in ax.plot, then call

ax.legend()

204
Q

How do you change the location of a legend?

A

Using the loc keyword
- 'lower', 'center', 'upper' for vertical placement
- 'left', 'center', 'right' for horizontal placement

eg ax.legend(loc='upper center')

205
Q

How do you add a box to the legend?

A

frameon=True

eg ax.legend(loc='upper center', frameon=True)

206
Q

How do you alter the thickness and style of a line?

A

Add linestyle='-' and linewidth=2 to ax.plot

Available line styles include '-' (solid), '--' (dashed), ':' (dotted) and '-.' (dash-dot)

eg ax.plot(data[:,0]/np.pi*180, cosine, label='cos(x)', color='deeppink', linestyle='--', linewidth=2)

207
Q

How do you change the colour of a plotted line?

A

Add color= to ax.plot, eg color='deeppink'

208
Q

How do you add an arbitrary text label to a plot?

A

ax.text(120, 1, "Maximum", fontsize=20)

Providing the coordinates where you want to write the text and the string you want to put in.

209
Q

How can you customise font size?

A

Pass the fontsize= keyword, eg fontsize=20

210
Q

How do you add a title to a plot?

A

ax.set_title("Title")
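
The customisations from the last few cards can be combined in one figure. This is a sketch: the colours, tick positions and text coordinates are illustrative choices, and the "Agg" backend line is only an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
x_deg = x / np.pi * 180   # convert radians to degrees for the x axis

fig, ax = plt.subplots(figsize=(7, 5))

# Two curves on the same axes, each with a label for the legend
ax.plot(x_deg, np.sin(x), label="sin(x)", color="teal",
        linestyle="-", linewidth=2)
ax.plot(x_deg, np.cos(x), label="cos(x)", color="deeppink",
        linestyle="--", linewidth=2)

ax.set_xlim(0, 360)                         # axis limits
ax.set_xticks([0, 90, 180, 270, 360])       # tick marks
ax.text(100, 0.9, "Maximum", fontsize=12)   # arbitrary text label
ax.legend(loc="upper center", frameon=True)
ax.set_title("Title")
ax.set_xlabel("Angle (degrees)")
```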

211
Q

How do we display multiple axes on the same figure?

A

This means showing different information on different panels of a single figure.

Using the plt.subplots() function we can specify how many axes in the vertical direction with nrows and the horizontal direction with ncols.

fig, axes = plt.subplots(figsize=(8,8), nrows=2, ncols=1)
ax1 = axes[0]
ax2 = axes[1]

Now we plot on each panel using ax1 and ax2 etc.
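
A short sketch of a two-panel figure, with a sine curve on the top panel and a cosine curve on the bottom one. The data are illustrative, and the "Agg" backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# axes is a NumPy array holding one Axes object per panel
fig, axes = plt.subplots(figsize=(8, 8), nrows=2, ncols=1)
ax1 = axes[0]
ax2 = axes[1]

ax1.plot(x, np.sin(x))
ax1.set_title("sin(x)")
ax2.plot(x, np.cos(x))
ax2.set_title("cos(x)")
```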

212
Q

What is the keyword to generate a line graph?

A

plot, eg ax.plot(x, y)

213
Q

How do you create a scatter plot?

A

ax.scatter(x, y, marker="o")

214
Q

How do you plot a scatter plot with error bars?

A

plt.errorbar(x, y, yerr=yerr, xerr=xerr, fmt="o", color="r")

Pass the errors by keyword: positionally, yerr comes before xerr.

215
Q

In what ways can we customise a scatter plot?

A

Shape and colour of the points for errorbar

Outline:
- '.' : point
- '+', 'x' : crosses

Filled:
- 'o' : circle
- 's' : square
- '^', '<', '>', 'v' : triangles in different directions
- 'd', 'D' : diamonds; 'p' : pentagon; 'h', 'H' : hexagons; 'P' : filled plus
- '*' : star

Line plots:
- '-', '--', ':' etc

fmt='s'
color='gold'
markersize=6
markeredgewidth=2
markeredgecolor='k'
ecolor='k'

216
Q

How do you control the shape of the errorbar plot?

A

fmt (format)
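
A sketch of an error bar plot using the customisation keywords from the previous card. The data points and error values are made up, and the "Agg" backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Made-up measurements with uncertainties
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
xerr = 0.1                          # same x error for every point
yerr = np.array([0.3, 0.2, 0.4])    # one y error per point

fig, ax = plt.subplots()
container = ax.errorbar(x, y, yerr=yerr, xerr=xerr,
                        fmt="s", color="gold", markersize=6,
                        markeredgewidth=2, markeredgecolor="k", ecolor="k")
```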

217
Q

How do you create a histogram?

A

ax.hist(x)

Specifying the bins:
ax.hist(x, bins=20)

218
Q

What is the default number of bins if not specified?

A

10 bins (of equal width)

219
Q

What should you consider when choosing the size of your bin?

A

With finer bins, we can see more detail in the distribution.

But if we use too many bins we can overdo it and end up with lots of misleading gaps.

220
Q

How can you further customise a histogram?

A
  • Changing colour
  • Changing from a filled histogram to an outline
  • Normalise the histogram to plot the probability density rather than total frequency

ax.hist(heights, bins=20, color='teal', histtype='step', density=True)

221
Q

How do we get the values of the bin edges and the numbers in each bin?

A

The hist function returns these already.

counts, edges, patches = ax.hist(x)

numbers in each bin - counts
boundary edges - edges
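
A sketch showing what hist returns. The heights are random numbers generated here purely as stand-in data, and the "Agg" backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data: 500 random "heights" around 170
rng = np.random.default_rng(0)
heights = rng.normal(170, 10, 500)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(heights, bins=20)

print(len(counts))  # one count per bin
print(len(edges))   # one more edge than there are bins
```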

222
Q

What kind of plots are useful for categorical data?

A

Bar charts and pie charts

223
Q

How do you create a bar chart?

A

ax.bar(categories, counts, color=bar_colors)

These are all lists to be passed in

224
Q

How do you create a pie chart?

A

ax.pie(counts, labels=categories, colors=bar_colors, autopct='%d')

Include the optional argument autopct (auto percent) to print the percentages - %d means each is printed as an integer

225
Q

How do you display image data in Matplotlib?

A

Matplotlib has an easy way to make plots using images (eg a picture or photograph)

Data must be provided as a 2D NumPy array. Matplotlib will display the array as a grid of pixels, with the intensity of each pixel determined by the value of the array at that position.

image = np.genfromtxt('pixels.txt')

fig, ax = plt.subplots(figsize=(8,8))

ax.imshow(image, origin='lower', cmap='Greys_r', vmin=0, vmax=300)

  • origin determines which way up the image is displayed
  • cmap is the colour map used to display the values
  • vmin and vmax are the saturation points (at or below 0 = fully black, at or above 300 = fully white; important for contrast)
226
Q

What do you do if you don’t want to show any tick marks on a figure, eg for an image?

A

ax.set_xticks([])
ax.set_yticks([])

227
Q

How do you “zoom in” on an important part of an image? How do you add a circle to highlight this?

A

Using array slicing to zoom in
ax.imshow(image[80:220,80:220], origin='lower', cmap='Greys_r', vmin=0, vmax=1000)

Highlighting key features
ax.scatter(70,70,marker='o',s=10000,c='none',edgecolors='r',label='Supernova')

228
Q

How do you save a plot?

A

Reduce whitespace around your figure:
plt.tight_layout(pad=0.5)

Save your plot:
plt.savefig('image.png')

229
Q

What is Pandas?

A

Pandas builds on NumPy and introduces a new object called a data frame (or a series if one-dimensional)

230
Q

What is the difference between a dataframe and pandas series?

A

Pandas series is one-dimensional (more similar to a list rather than a tabular structure)

231
Q

What advantages do data frames provide for data science?

A
  • A data frame looks like a table or spreadsheet, with convenient column and row labels
  • A data frame includes methods for sorting, filtering and performing complex operations on data
  • Columns can be of different data types (unlike an array)
  • Provides some of the functionality of an array
232
Q

How do you load a dataframe from a file?

A

import pandas as pd

df = pd.read_csv("data.csv")

When we load the data into Pandas, the first row is assumed to be the column headings. If we wanted to we could override this behaviour by providing a list of column names to an optional keyword, names=.

233
Q

How do you import pandas?

A

import pandas as pd

234
Q

How do you determine the number of rows in a data frame?

A

Length

len(df)

235
Q

How do you examine the first few rows of a dataframe?

A

df.head(x) - where x is the number of rows to display

236
Q

If you want to display the dataframe, what could you do?

A

print(df) but this isn’t very nice

can call df directly but this must be the last command in the cell

237
Q

How are rows indexed?

A

The rows are given numerical indices by default.

Sometimes one of the columns in the data is already a convenient index. We can assign this as the index

df = pd.read_csv('titanic.csv', index_col='PassengerId')

238
Q

How do you assign an index within the read_csv() function?

A

index_col="Column Name"
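
A sketch of loading and inspecting a data frame. To keep it self-contained, the CSV text is made up and read from an in-memory buffer (io.StringIO) instead of a real file such as titanic.csv.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a real file
csv_text = "PassengerId,Name,Age\n1,Ann,29\n2,Bob,\n3,Cy,41\n"

# The first row supplies the column headings; PassengerId becomes the index
df = pd.read_csv(io.StringIO(csv_text), index_col="PassengerId")

n_rows = len(df)         # number of rows
first_two = df.head(2)   # first 2 rows
print(n_rows)
print(first_two)
```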

239
Q

What should the first step of any data analysis be?

A

Clean up the data set

  • removing unwanted data, missing values or duplicates
240
Q

How do you drop a column?

A

df = df.drop(columns=["X"])

you can drop multiple columns at once by providing a list of column labels

241
Q

How might missing values be represented in the dataset?

A

NaN

Not a number value - usually this represents a missing value

242
Q

How do you check for missing values?

A

df.isna()

Returns a data frame of True/False, with True wherever a value is missing.

eg df.isna().head(6)

243
Q

How do we remove rows with NaN values?

A

df.dropna()

eg
df = df.dropna(subset=['Age'])

If we don’t specify a subset of columns to use, it will remove all rows that have a NaN in any column.

244
Q

How can we remove any rows that appear more than once in the data set?

A

df.drop_duplicates()

df.drop_duplicates(subset='Ticket')
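
The clean-up steps from the last few cards, sketched on a tiny made-up data frame (the column names echo the Titanic examples, the values are invented).

```python
import numpy as np
import pandas as pd

# Made-up data with a missing age and a duplicated row
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Cy"],
    "Age": [29, np.nan, 41, 41],
    "Ticket": ["A1", "B2", "C3", "C3"],
    "Cabin": ["C85", None, None, None],
})

df = df.drop(columns=["Cabin"])    # remove an unwanted column
missing = df.isna()                # True wherever a value is missing
df = df.dropna(subset=["Age"])     # drop rows with no age
df = df.drop_duplicates()          # drop rows that appear more than once

print(df)
```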

245
Q

How do we slice data from a pandas data frame?

A

Using .loc[] and .iloc[]

NB: they use square brackets, like array indexing, not the round brackets of a function call

loc gets rows (and/or columns) with particular labels.

iloc gets rows (and/or columns) at integer locations.

246
Q

How do we get a single column from a data frame?

A

df['Age']

Slicing syntax

247
Q

How do we get the first row of a data frame?

A

df.iloc[0]

As this is only 1D it displays a series.

248
Q

How do we get rows 100-110 of a dataframe?

A

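df.iloc[100:110]

The end of the slice is excluded, so this returns the rows at positions 100 to 109; use df.iloc[100:111] to include position 110.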

249
Q

How do we get rows 100-110 and the first four columns of a dataframe?

A

df.iloc[100:110, :4]

The display shows 5 columns in total - the index and then the first 4 columns (the index itself is not counted as a column)

250
Q

Why might .loc be more useful than .iloc?

A

We may not know the integer position of a row or column, but we do know its label (eg the column title).

251
Q

How do we return only the “Name” column?

A

df.loc[:, 'Name']

252
Q

How do we retrieve the name, sex and age columns of the first 10 passengers?

A

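df.loc[:10, 'Name':'Age']

Label slices in .loc include both ends, so this returns the rows labelled up to and including 10, and every column from 'Name' to 'Age'.

df.loc[:10, ['Name', 'Age', 'Fare']]

NB: you can retrieve non-consecutive rows/columns by providing a list, as in the second example. Therefore df.loc is very flexible.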

253
Q

How do you use loc or iloc to return only the rows or columns where a certain condition is met?

A

Masking - provide a Boolean (True/False) array to loc.

eg df.loc[(df['Pclass']==1)]

254
Q

How do you check whether values in a data frame column are in a list of possible values?

A

Using the .isin() method

eg
df.loc[df['Pclass'].isin([1,2])]
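
A sketch comparing iloc, loc, masking and isin() on a small made-up data frame (the column names follow the Titanic examples, the values are invented).

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cy", "Di"],
     "Pclass": [3, 1, 2, 1],
     "Age": [22, 38, 26, 35]},
    index=pd.Index([1, 2, 3, 4], name="PassengerId"),
)

first_row = df.iloc[0]                    # a series
by_position = df.iloc[0:2, :2]            # positions 0 and 1 (end excluded)
by_label = df.loc[1:2, "Name":"Pclass"]   # labels 1 and 2 (end included)

first_class = df.loc[df["Pclass"] == 1]        # Boolean mask
upper = df.loc[df["Pclass"].isin([1, 2])]      # value in a list

print(first_class)
```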

255
Q

How do you compute summary statistics for numerical columns in a data frame?

A

df['Age'].mean()

mean / min / max - these calculations automatically ignore NaN values

256
Q

How do you sort values for a column in a data frame?

A

df['Age'].sort_values()

257
Q

How do you sort entries in a data frame by a particular column?

A

df.sort_values(by='Age')

df.sort_values(by='Age', ascending=False)

258
Q

How can we get the length of a dataframe / column?

A

len() python function - number of rows

df.size pandas property - number of cells ie rows by columns

259
Q

What is the result of adding two columns together?

A

A 1D data frame (a series) - the original indices are still present.

To get only the values, we can access .values - this is a property not a method, so no ()

260
Q

How do we get the values of a 1D dataframe / pandas series?

A

df.values

No brackets, it is a property not a method.

261
Q

How do we get the values of a column?

A

df["Name"].values

262
Q

How do you add columns together?

A

df.add() method

eg
relatives = df['SibSp'].add(df['Parch'], fill_value=0)

store it in a new column:
df['Relatives'] = df['SibSp'].add(df['Parch'], fill_value=0)

263
Q

What is the advantage of using the df.add() method to add columns, rather than using the + operator?

A

With +, any row with a NaN in either column gives NaN. With add() you can specify a value to use in place of missing values by including fill_value (a row that is NaN in both columns still gives NaN).

This is safer for handling NaN values than simple addition
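
A sketch of the difference between + and add() when values are missing. The numbers are made up; the column names follow the card above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"SibSp": [1, np.nan, 0],
                   "Parch": [0, 2, np.nan]})

plus = df["SibSp"] + df["Parch"]                         # NaN propagates
relatives = df["SibSp"].add(df["Parch"], fill_value=0)   # NaN treated as 0

print(plus)
print(relatives)
```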

264
Q

What operations can you use on columns so that NaN values can be handled appropriately?

A

df.add()
df.subtract()
df.multiply()
df.divide()

265
Q

How can you plot data from a data frame?

A

Matplotlib can naturally understand data frames just like Numpy arrays. You can pass the columns directly to plotting commands.

eg
ax.hist(df['Age'], bins=30)

266
Q

How do we apply functions to an entire column?

A

The apply() method
Define the function first, if required.
df["Name"].apply(function_name)
Returns a series containing the result for each row

Quicker = lambda functions, defined as a temporary function inside apply()
df['Name'].apply(lambda x: x.split(',')[0])

267
Q

How do you split a string?

A

my_string.split(',')

268
Q

How do you get the first/last part of a split string?

A

my_string.split(',')[0]
my_string.split(',')[-1]

269
Q

What is a benefit of using a lambda function in apply()

A

It is defined inline as a temporary, unnamed function, so we avoid writing and naming a separate function that is used only once
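
A sketch of apply() with a named function and with the equivalent lambda, extracting the surname from "Surname, Title. First name" strings. The names and the helper get_surname are made up for illustration.

```python
import pandas as pd

names = pd.Series(["Smith, Mr. John", "Jones, Miss. Amy"])


def get_surname(full_name):
    """Return the part of the string before the first comma."""
    return full_name.split(",")[0]


with_function = names.apply(get_surname)
with_lambda = names.apply(lambda x: x.split(",")[0])

print(with_lambda)
```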

270
Q

How do you group data in a data frame?

A

df.groupby()

classes = df.groupby('Pclass')

This produces a grouped (GroupBy) object rather than a new data frame.

The object has the attribute .groups, a dictionary where the keys are the groups and each value contains the row indexes belonging to that group.

print(classes.groups) to see the index of the rows belonging to each key.

271
Q

How do you see the keys (ie the groups) from grouped data?

A

new_groups = df.groupby('Embarked')

new_groups.groups.keys()

272
Q

Why is grouping useful?

A

We can quickly calculate statistics separately on each of the different groups.

Allows us to investigate aggregated data rather than working on the whole data frame directly.

We can do this with any column of our grouped data using square bracket notation.

273
Q

How do you calculate summary statistics for a grouped data frame?

A

classes = df.groupby('Pclass')
classes['Fare'].mean()

This returns a value for each of the group keys

274
Q

How do you determine how many entries fall into each group?

A

Look at the size of each group - classes.size()

NB: for a dataframe object, size is a property (no parentheses) but for a grouped object it is a method and requires parentheses

275
Q

How do you determine and rank by how many entries fall into each group?

A

classes.size().sort_values(ascending=False)

276
Q

How do you create a dataframe for a specific group?

A

Use get_group()

first = classes.get_group("Group Name")

The argument “Group Name” should match one of the keys in classes.groups.keys().
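
The grouping cards above, sketched together on a small made-up data frame (column names as in the Titanic examples, values invented).

```python
import pandas as pd

df = pd.DataFrame({"Pclass": [1, 3, 3, 2, 1, 3],
                   "Fare": [80.0, 8.0, 7.0, 13.0, 50.0, 9.0]})

classes = df.groupby("Pclass")

group_keys = list(classes.groups.keys())             # the groups
mean_fare = classes["Fare"].mean()                   # one value per group
sizes = classes.size().sort_values(ascending=False)  # entries per group, ranked
first = classes.get_group(1)                         # data frame for one group

print(mean_fare)
print(sizes)
```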

277
Q

How do you make an array of 12 random integers from 40 to 100?

A

data = np.random.randint(low=40, high=101, size=12)

high is exclusive, so high=101 is needed for 100 to be a possible value.

This will make a 1D array of length 12

278
Q

How do you convert a Numpy array into a dataframe?

A

Use the pd.DataFrame function

df = pd.DataFrame(data)

where data is a NumPy array

The shape and content are preserved, but the rows and columns now have explicit labels. By default these are the NumPy row and column indices (0, 1, 2, ...).

279
Q

How do you retrieve a column from a data frame?

A

Using familiar square bracket notation with the name of the column

df["Name"], or df[1] when the columns have the default integer labels

or more explicitly (better for more complex selections)
df.loc[:, "Name"]

280
Q

When creating a data frame from an array, rather than using default indices, how can we create memorable column headings or row indices?

A

use the index= and columns= keyword arguments

eg
df = pd.DataFrame(data, index=['Matt','Jonathan','Fiona','Deepak'], columns=['DSA8001','DSA8002','DSA8003'])

281
Q

How do we create a dataframe directly from a dictionary?

A

data_dict = {
    "Module 1": {"Matt": 80, "John": 60},
    "Module 2": {"Matt": 70, "John": 63},
    "Module 3": {"Matt": 82, "John": 76},
}

df = pd.DataFrame(data_dict)

Outer keys define the column headings ie modules will be the columns
Each nested dictionary holds the values for one column, and its keys (the names) become the row labels
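
Running the example confirms how the keys are used: outer keys become column headings, inner keys become row labels.

```python
import pandas as pd

data_dict = {
    "Module 1": {"Matt": 80, "John": 60},
    "Module 2": {"Matt": 70, "John": 63},
    "Module 3": {"Matt": 82, "John": 76},
}

df = pd.DataFrame(data_dict)

print(df)
print(df.loc["Matt", "Module 2"])  # one student's score for one module
```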

282
Q

Why do we not need to specify labels when creating a data frame directly from a dictionary?

A

Pandas will use the dictionary keys

283
Q

How can you add a column to an existing dataframe?

A

Insert method

df.insert(loc=1, column="Name", value=data)

The length of the array data needs to match the number of rows in the df.

284
Q

How do you add a new row to a data frame?

A

Concat function

Concatenates a new data frame to the end of the current one

new_row = pd.DataFrame(new_student, index=['New Student'], columns=['DSA8001','DSA8002','DSA8003','DSA8021'])
df = pd.concat([df, new_row])

May need to reshape the new data (eg into a single row) using reshape before building the new data frame.

Concatenation only works well if the column labels match. Otherwise it will fill the gaps with NaN and may convert existing data (NaN is not an integer, so columns may be converted to float).
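
A sketch of adding one row with pd.concat(). The scores are made up and only two module columns are used to keep it short.

```python
import pandas as pd

df = pd.DataFrame([[80, 70], [60, 63]],
                  index=["Matt", "John"],
                  columns=["DSA8001", "DSA8002"])

# The new row is itself a one-row data frame with matching column labels
new_row = pd.DataFrame([[75, 68]],
                       index=["New Student"],
                       columns=["DSA8001", "DSA8002"])

df = pd.concat([df, new_row])
print(df)
```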

285
Q

How do you save a data frame to a file?

A

to_csv() - typically we save as a CSV file using this built-in dataframe method

df.to_csv("file_name.csv")

286
Q

What might happen if we repeatedly read, edit and save CSV files with Pandas?

A

When you open a data frame with read_csv(), it adds a numerical index column by default.

We may end up doing this repeatedly, adding another index column each time.

Best to specify a particular column to use for the row indices when reading in the CSV

df = pd.read_csv("file.csv", index_col=0)
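
A small sketch of the problem, using an in-memory string instead of a real file (the data frame contents are hypothetical):

```python
import io
import pandas as pd

df = pd.DataFrame({"DSA8001": [80, 60]}, index=["Matt", "Jonathan"])

# With no file name, to_csv() returns the CSV text; the row index is
# written out as the first column
csv_text = df.to_csv()

# Without index_col, the saved index comes back as an ordinary column
# and pandas adds a fresh 0, 1, ... index on top of it
naive = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column is used as the row index again
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(naive.columns.tolist())
print(restored.index.tolist())
```

Saving `naive` and reading it back again would add yet another unnamed column, which is the repeated-index problem the card describes.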

287
Q

How should you read in a data frame from a file?

A

df = pd.read_csv("file.csv", index_col=0)

288
Q

Some columns contain JSON data, how is this formatted?

A

JSON is a string, formatted like a dictionary.

It is very flexible for dataframe columns that need to contain complex information.

289
Q

How is complex information stored in a data frame column?

A

JSON data - can be stored as a dictionary

290
Q

Before working with complex data stored in a column in JSON format, what do we need to do?

A

import json

291
Q

How do you create a data frame with JSON in a column?

A

eg
df = pd.DataFrame(index=[‘Matt’,’Jonathan’,’Fiona’,’Deepak’], columns=[‘module_scores’])

292
Q

What function is used to retrieve information from a data frame with JSON data?

A

json.loads(x)

eg getting the “DSA8002” column info.
df[‘module_scores’].apply(lambda x: json.loads(x)[‘DSA8002’])

293
Q

What does json.loads do?

A

The json.loads() method can be used to parse a valid JSON string and convert it into a Python Dictionary

294
Q

How do you get the value of a specific row and column from JSON data in a data frame?

A

Index the series returned by .apply() like any other pandas object, eg with .loc

eg
df[‘module_scores’].apply(lambda x: json.loads(x)[‘DSA8002’]).loc[‘Matt’]
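
Putting the JSON cards together, here is a hedged end-to-end sketch. The scores are invented, and json.dumps() is used here only to build the JSON strings for the example:

```python
import json
import pandas as pd

# Hypothetical scores for two students
scores = {
    "Matt": {"DSA8001": 80, "DSA8002": 70},
    "Fiona": {"DSA8001": 65, "DSA8002": 91},
}

# Store each student's scores as a JSON string in a single column
df = pd.DataFrame(
    {"module_scores": [json.dumps(s) for s in scores.values()]},
    index=list(scores.keys()),
)

# Parse each string back into a dictionary and pull out one module
dsa8002 = df["module_scores"].apply(lambda x: json.loads(x)["DSA8002"])

print(dsa8002.loc["Matt"])
```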

295
Q

How do we convert a column to a date time dtype?

A

pd.to_datetime

df[‘datetime’] = pd.to_datetime(df[‘datetime’])

296
Q

How do we check the data type of a column?

A

df[‘datetime’].dtypes

297
Q

How do we extract hours/years etc. from a date time object? (from the timestamp)

A

df[‘hour’] = df[‘datetime’].dt.hour

or dt.year etc.

298
Q

How do we calculate eg the total sales for each category in a data frame?

A

Group by and then sum

spend_by_hour = df.groupby(‘hour’)

spend_by_hour[‘total’].sum()

299
Q

When would data frame merging be more useful?

A

If two data frames contain only some columns in common, it is often more useful to merge rather than concatenate.

300
Q

What is the theory of merging two databases?

A

We find the column(s) the two tables have in common and return the rows whose values match in those columns, combined into a single table.

301
Q

What is merging a pandas data frame equivalent to?

A

JOIN statements in SQL.

302
Q

What are JOIN statements in SQL equivalent to in pandas data frame?

A

Merging data frames, eg with pd.merge()

303
Q

What function is used to merge data frames?

A

pd.merge(df1, df2, on=”Column”, how=”left”)

304
Q

What are the different ways data frames can be merged?

A
  • Left join - keep everything in the left table and what’s in the right table if available
  • Right join - keep everything in the right table and what’s in the left table if available
  • Inner join - return only entries that are present in both tables
  • Outer merge - returns all rows across both tables
305
Q

What is the opposite of the inner join?

A

The outer join - returns everything across both tables

306
Q

Which type of merge is most likely to have lots of NaNs?

A

The outer merge / full join in SQL
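
To see how the four merge types differ, this sketch merges two small made-up tables and records the number of rows and NaNs each type produces (table and column names are illustrative):

```python
import pandas as pd

citizens = pd.DataFrame({"Number": [1, 2, 3],
                         "City": ["Belfast", "Derry", "Lisburn"]})
welfare = pd.DataFrame({"Number": [2, 3, 4],
                        "Benefit": [100, 250, 75]})

results = {}
for how in ["left", "right", "inner", "outer"]:
    merged = pd.merge(citizens, welfare, on="Number", how=how)
    # Record (number of rows, total number of NaN cells)
    results[how] = (len(merged), int(merged.isna().sum().sum()))

print(results)
```

The inner merge has no NaNs because it keeps only keys present in both tables, while the outer merge keeps every key and so has the most gaps to fill.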

307
Q

Are data frames static?

A

No, new data can be inserted by adding rows or columns

308
Q

What is SQL?

A

Structured Query Language

Used as a tool to search relational databases. Can search, filter, group or combine databases to return entries matching certain criteria.

Most popular language to manage relational databases

309
Q

What is a relational database?

A
  • Data are stored as a table or tables with rows (records or tuples) and columns (attributes)
  • Each record has a unique key
  • Each table represents a particular type of data eg one table to store information on customers, another to store products
310
Q

What is the advantage of SQL?

A

It is written closer to natural language, so queries can be constructed more intuitively.

311
Q

What are the different data types in SQL?

A

Numeric - eg INTEGER, FLOAT(p) with p digits of precision

String types - CHARACTER (L) with fixed length L, or VARCHAR(L) with a maximum length L

DATE, TIME

BOOLEAN (True, False)

312
Q

Why are we able to use SQL to perform queries on pandas data frames?

A

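Pandas data frames are structured like the tables of a relational database (rows of records, columns of attributes), so the same kinds of queries apply.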

313
Q

Before performing SQL queries directly in Python/Pandas, what do we need to do?

A

import pandas as pd
import pandasql as ps

314
Q

What is pandasql?

A

A handy python module to query pandas data frames

315
Q

If you don’t have pandasql, what should you do?

A

!pip install pandasql

316
Q

What is the general syntax for writing and executing an SQL command in the Jupyter notebook?

A

query = ‘’’

’’’

ps.sqldf(query)

317
Q

What is a simple query to fetch all data?

A

query = ‘’’
SELECT *
FROM dataframe
‘’’

ps.sqldf(query)

318
Q

What are SQL queries composed of?

A

Combination of “clauses” with the names of tables and/or columns

319
Q

What clause returns entries of interest?

A

The SELECT clause

320
Q

What does the SELECT clause do?

A

Specifies which columns (or expressions) to return

321
Q

What does the FROM clause do?

A

Tells SQL which table to select the columns from

322
Q

Why are SQL clauses written in capital letters?

A

They are not case sensitive.

Writing in capitals helps differentiate them from the names of tables etc.

323
Q

What is returned when we use PandaSQL?

A

A Pandas DataFrame - which is very convenient for further database operations.

324
Q

How do you select a specific column from a database?

A

query = ‘’’
SELECT “Column Name1”, “Column Name2”
FROM dataframe
‘’’

ps.sqldf(query)

325
Q

How do we return entries that have a specific attribute?

A

Use conditional searches using the WHERE clause.

query = ‘’’
SELECT *
FROM dataframe
WHERE city = “Belfast”
‘’’

ps.sqldf(query)

NB: single = sign, not ==
NB: Need to put string of interest in different quotation marks to overall string query

326
Q

What conditions can we apply with WHERE?

A
  • =, <, <=, >, >=
  • BETWEEN X AND Y - number in a specified range
  • IN (‘X’, ‘Y’) - values in a given list
  • LIKE ‘%YZ%’ - value matches a given pattern YZ where % is used to represent free text before and/or after the pattern
327
Q

How can we apply multiple condition at the same time in SQL?

A

Use the AND operator within the WHERE clause to combine conditions

328
Q

How can we sort data by column values in SQL?

A

query = ‘’’
SELECT *
FROM dataframe
ORDER BY column DESC
‘’’

ps.sqldf(query)

Can specify ASC or DESC

329
Q

How can we sort data by multiple column values?

A

query = ‘’’
SELECT *
FROM dataframe
ORDER BY column DESC, column2 ASC
‘’’

ps.sqldf(query)

The orderings are applied one after the other: rows are sorted by the first column, and ties are then ordered by the second

330
Q

How do we modify the SQL query so that we only return a small number of rows?

A

The LIMIT clause. This is similar to the head() function in Pandas.

query = ‘’’
SELECT *
FROM dataframe
LIMIT 5
‘’’

ps.sqldf(query)

331
Q

What kind of data aggregation computing statistics are usually performed in SQL?

A
  • COUNT - returns the number of records
  • MIN, MAX - returns the smallest/largest entries in a column
  • SUM - sum of entries in a column
  • AVG - average of entries in a column
332
Q

How do you find out eg how many women are in a database using SQL?

A

query = ‘’’
SELECT COUNT(*)
FROM dataframe
WHERE Gender = “Female”
‘’’

ps.sqldf(query)

NB: COUNT(*) counts rows. Counting a single column gives the same answer as long as that column has no missing (NULL) values, because COUNT(column) skips NULLs.

333
Q

How do we use simple expressions in SQL to return a calculation?

A

query = ‘’’
SELECT Number, Income/1000 AS [Income ($k)]
FROM dataframe
‘’’

ps.sqldf(query)

NB: be aware of integer division here. Because the column is an integer and so is 1000, the result is an integer and the fractional part is dropped. Divide by 1000.0 to keep the decimals.
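
The integer-division behaviour can be checked directly against SQLite, which is what pandasql uses by default. This sketch uses Python's built-in sqlite3 module rather than pandasql so that it runs without extra packages; the table contents are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizens (Number INTEGER, Income INTEGER)")
conn.execute("INSERT INTO citizens VALUES (1, 52500)")

# Integer column divided by an integer literal: fractional part is dropped
int_result = conn.execute("SELECT Income/1000 FROM citizens").fetchone()[0]

# Dividing by a float literal keeps the decimals
float_result = conn.execute("SELECT Income/1000.0 FROM citizens").fetchone()[0]

print(int_result, float_result)
conn.close()
```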

334
Q

How do we name a created column in SQL?

A

Using the AS clause

It should come right after the column or expression of interest.

Need to include square brackets if you want the column name to have a space.

335
Q

What clause allows us to return different values depending on the content of a column?

A

The CASE clause

query = '''
SELECT Number, Gender, Age, City,
CASE
    WHEN City='New York City' THEN 'North'
    WHEN City='Dallas' THEN 'South'
END AS Region
FROM citizens
'''

ps.sqldf(query)

336
Q

What does the CASE clause do?

A

Allows us to return different values depending on the content of a column.

CASE and END mark the start and end of the logical criteria

AS specifies the column name in which to store the result

337
Q

How do we calculate summary statistics based on a categorical column?

A

Using the GROUP BY clause

query = ‘’’
SELECT Age, COUNT(*)
FROM citizens
GROUP BY Age
‘’’

ps.sqldf(query)

338
Q

When do we use GROUP BY in SQL queries?

A

To calculate aggregated statistics (counts, averages etc.).

You can't meaningfully display other columns from grouped tables; you just see the first record in each group.

Only the column used for grouping and aggregated statistics should be included with the SELECT clause.

339
Q

How do you group by multiple columns?

A

Select the two columns you want to group by, and select the aggregate statistic for the third column.

The order typed is the order they are grouped by

query = ‘’’
SELECT Gender, Age, AVG(Income)
FROM citizens
GROUP BY Age, Gender
‘’’

ps.sqldf(query)

340
Q

When combining WHERE and GROUP BY clauses, what order should they be stated in?

A

Order matters

Need to apply the WHERE clause before grouping the data, so that undesired rows are not included in the grouping stage.

341
Q

What clause do you need to apply conditions to grouped data?

A

HAVING clause

342
Q

How do you use the HAVING clause?

A

query = ‘’’
SELECT Gender, Age, AVG(Income)
FROM citizens
GROUP BY Gender, Age
HAVING Age > 30
‘’’

ps.sqldf(query)

“Group by gender and age, but show me only the ones having ages over 30”

343
Q

Why is the HAVING clause necessary?

A

It allows us to perform more complex conditions, by applying criteria to the statistics of each group.

Eg you only want groups with an average income over X. We need to do the grouping first before we can apply the condition.
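
The difference between WHERE (filters rows before grouping) and HAVING (filters groups after aggregation) can be sketched as follows. This uses Python's built-in sqlite3 module with a made-up citizens table, so the data and thresholds are assumptions for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizens (Gender TEXT, Age INTEGER, Income INTEGER)")
conn.executemany(
    "INSERT INTO citizens VALUES (?, ?, ?)",
    [("Female", 25, 30000), ("Female", 25, 40000),
     ("Male", 40, 60000), ("Male", 40, 80000),
     ("Female", 40, 20000)],
)

query = '''
SELECT Age, AVG(Income) AS avg_income
FROM citizens
WHERE Income > 25000          -- applied to individual rows, before grouping
GROUP BY Age
HAVING AVG(Income) > 50000    -- applied to each group's statistic
'''

rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

The 20000 income is removed by WHERE before the averages are computed, and only the age group whose average exceeds 50000 survives the HAVING condition.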

344
Q

What is the operation to join two tables in SQL?

A

JOIN

The first table is always the left table
The second table is always the right table

Joined based on a common column

345
Q

If you do not specify the JOIN type in an SQL query, which type of JOIN is automatically performed?

A

Inner join

ie only returns records present in both tables.

346
Q

What information do you need to provide in the SQL query when performing a JOIN?

A

You must provide the column to use for the join, otherwise the entire right table will be repeated for every record in the left table.

Specify columns to join on using ON
ON citizens.Number = welfare.IDNum

It is best to specify which table the column is coming from, to avoid any ambiguity if there is a column with the same name in each table.

347
Q

How do you write a JOIN query in SQL?

A

query = ‘’’
SELECT *
FROM citizens
JOIN welfare
ON citizens.Number = welfare.IDNum
‘’’

ps.sqldf(query)

348
Q

What can be handy to do in complex queries?

A

Give each table a shorthand name (an alias):

query = ‘’’
SELECT *
FROM citizens c
JOIN welfare w
ON c.Number = w.IDNum
‘’’

ps.sqldf(query)
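
A runnable sketch of the aliased JOIN, again using the built-in sqlite3 module in place of pandasql. The rows are invented, and a LEFT JOIN is included only for comparison with the default inner join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizens (Number INTEGER, City TEXT)")
conn.execute("CREATE TABLE welfare (IDNum INTEGER, Benefit INTEGER)")
conn.executemany("INSERT INTO citizens VALUES (?, ?)",
                 [(1, "Belfast"), (2, "Dallas"), (3, "Lisburn")])
conn.executemany("INSERT INTO welfare VALUES (?, ?)",
                 [(2, 100), (3, 250), (4, 75)])

# Plain JOIN is an inner join: only records present in both tables
inner = conn.execute('''
SELECT c.Number, c.City, w.Benefit
FROM citizens c
JOIN welfare w
ON c.Number = w.IDNum
ORDER BY c.Number
''').fetchall()

# LEFT JOIN keeps every record from the left table (citizens)
left = conn.execute('''
SELECT c.Number, c.City, w.Benefit
FROM citizens c
LEFT JOIN welfare w
ON c.Number = w.IDNum
ORDER BY c.Number
''').fetchall()

print(inner)
print(left)
conn.close()
```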

349
Q

When is an SQL sub-query used?

A

When it takes more than one query to get what we want.

Avoids hardcoding, which is not very efficient.

350
Q

How do you specify a sub-query?

A

Specified using round brackets. The table or value the sub-query returns feeds directly into the overall query.

query = ‘’’
SELECT *
FROM citizens
WHERE Income >
(SELECT MAX(Income)/2
FROM citizens)
‘’’

ps.sqldf(query)

351
Q

When might you need to use a sub query?

A

When you need to compare a column against a list of values.

Eg finding all citizens belonging to age groups with average incomes over 55000
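
One possible way to write the age-group example is to let a sub-query build the list of qualifying ages and compare against it with IN. The sketch below uses the built-in sqlite3 module and invented data; the card does not give the exact query, so treat this as an assumption about how it could be done:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizens (Number INTEGER, Age INTEGER, Income INTEGER)")
conn.executemany(
    "INSERT INTO citizens VALUES (?, ?, ?)",
    [(1, 25, 30000), (2, 25, 40000),
     (3, 40, 60000), (4, 40, 80000), (5, 40, 50000),
     (6, 60, 56000)],
)

query = '''
SELECT Number
FROM citizens
WHERE Age IN
    (SELECT Age
     FROM citizens
     GROUP BY Age
     HAVING AVG(Income) > 55000)
ORDER BY Number
'''

numbers = [row[0] for row in conn.execute(query)]
print(numbers)
conn.close()
```

Citizen 5 earns less than 55000 but is still returned, because the condition is on the average of the age group rather than on the individual.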

352
Q

What clause is used to convert a character string to JSON?

A

JSON_EXTRACT

JSON_EXTRACT(table.column, "$[x].key") AS new_column

Specifying which element of the JSON list we want to extract (0 is the first, 1 is the second etc.)

The key we want is optional

353
Q

What is the syntax for an SQL query using JSON_EXTRACT?

A

query = ‘’’
SELECT title, JSON_EXTRACT(credits.cast, “$[0].name”) AS starring
FROM credits
‘’’

ps.sqldf(query)

354
Q

If data has a very large dynamic range, what is it good to do?

A

Look at the logarithm of the data (convert to powers of 10)

355
Q

How do you convert an axis to the logarithmic scale?

A

using np.log10() function

create new columns with the logarithmic conversions and then plot these.

OR

plot directly with
plt.xscale(‘log’)

356
Q

How can you identify trends?

A

Looking for the slope of the relationship.

Using np.polyfit() to fit a polynomial to data.

357
Q

How do you use the polyfit function?

A

It takes an x array, y array and a degree (first = linear, second = quadratic etc.)

f = np.polyfit(x=df[‘log_galaxy_mass’], y=df[‘log_bh_mass’], deg=1)

f[0] is the slope
f[1] is the intercept

We can then plot this using dummy data
x_arr = np.arange(x1,x2,0.1)

y_mod = f[0] * x_arr + f[1]

plt.plot(x_arr, y_mod, color=’r’)

358
Q

How do you quantify the goodness of fit of the polynomial fitted?

A

Calculating the Mean Squared Error - the average squared difference between model and data

MSE = mean((data - model)**2)

Calculate the predicted y values
linear_prediction = f[0] * df[‘log_galaxy_mass’].values + f[1]

Calculate MSE
mse = np.mean( (linear_prediction - df[‘log_bh_mass’])**2 )
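
The fit-then-score workflow can be tried on synthetic data. The points below are invented (a straight line y = 2x + 1 with small fixed offsets), so the numbers have no connection to the galaxy data set:

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus small fixed offsets (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0 + np.array([0.1, -0.1, 0.1, -0.1, 0.0])

# Fit a first-degree polynomial: f[0] is the slope, f[1] the intercept
f = np.polyfit(x, y, deg=1)

# Predicted y values from the fitted line
prediction = f[0] * x + f[1]

# Mean squared error between model and data
mse = np.mean((prediction - y) ** 2)

print(f, mse)
```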

359
Q

When is the MSE good for a data set?

A

When it is small compared with the range of the y axis. Eg if the y values span a range several times bigger than 1 and the MSE is well below 1, the fit is doing better than random

360
Q

What is SciPy?

A

A Python scientific module which provides algorithms for many mathematical problems.

We use it for correlation in this module (does bigger x really mean bigger y)

361
Q

How is correlation determined?

A

Using a statistic called Spearman’s Rank Correlation coefficient

from scipy.stats import spearmanr

print(spearmanr(df[‘log_galaxy_mass’], df[‘log_bh_mass’]))

362
Q

How do you convert a list of strings to a single string?

A

” “.join(x)

363
Q

How do you remove an item from a list?

A

list.remove(item)

364
Q

How do you insert an item into a list?

A

list.insert(position, item)

365
Q

How do you add something to the end of a list?

A

list.append(item)

366
Q

How do you investigate the names of the keys in a dictionary?

A

dict.keys()

367
Q

What is XeY shorthand for?

A

XeY is short-hand in Python for “X times 10 to the power of Y”

368
Q

What does 1e6 represent?

A

1 x 1 000 000

369
Q

How do you write a dictionary to a file?

A

Within the with open as - json.dump(data, f)

(json.dumps(data) returns a string instead of writing to a file)

370
Q

How do you read in a dictionary within a file?

A

Within the with open as - json.load(f)
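
A short round-trip sketch of writing a dictionary with json.dump() and reading it back with json.load(). The dictionary and the file name are made up, and a temporary directory is used so nothing is left on disk:

```python
import json
import os
import tempfile

data = {"Matt": 80, "John": 60}  # hypothetical dictionary

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scores.json")

    # Write the dictionary to a file
    with open(path, "w") as f:
        json.dump(data, f)

    # Read it back in
    with open(path) as f:
        restored = json.load(f)

print(restored)
```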

371
Q

How do you check in a condition that a variable is of a certain type?

A

isinstance(string1, str)

372
Q

What syntax for raising an exception can be used in a function?

A

def paint(self, colour):

    try:
        if isinstance(colour, str):
            self.colour = colour
        else:
            raise TypeError("Paint should be provided as a string")
    except TypeError as err:
        # Print the error message itself rather than the TypeError class
        print(err, "- the colour remains", self.colour)

373
Q

How can you remove an item from a list?

A

del list[0]

374
Q

How does python store a list?

A

In a simplified sense, when you write x = [...] the list is stored in your computer memory, and x holds the address of that list, ie where the list is in memory.

This means that x does not actually contain all the list elements, rather it contains a reference to the list.

375
Q

How can you create a new list from an original list, so that it is passed by value rather than reference?

A

y = list(x) - this is a more explicit copy of the list

rather than y = x
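
The difference between the two assignments is easiest to see by changing an element afterwards (the values are arbitrary):

```python
x = [1, 2, 3]
y = x          # y refers to the same list as x
z = list(x)    # z is a new list with the same elements

y[0] = 99      # changes the one list that both x and y refer to

print(x)  # [99, 2, 3]
print(z)  # [1, 2, 3]
```

Note that list(x) makes a shallow copy: it is enough for a list of numbers or strings, but nested lists inside x would still be shared.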

376
Q

How do you find the maximum value of a list?

A

max(list)

377
Q

How can you round a value?

A

Round function
round(value, precision)

378
Q

How can you look at python documentation?

A

help(function_name)

379
Q

How do you find the length of a list?

A

len(list)

380
Q

How do you sort a list?

A

sorted(list, reverse=False) - returns a new sorted list

or list.sort(reverse=False) - sorts the list in place

381
Q

How do you get the index of a specific item in a list?

A

list.index(item)

382
Q

How do you count the number of time an element appears in a list/string?

A

list.count(element)

383
Q

How do you capitalise the first letter of a string?

A

string.capitalize()

384
Q

How do you replace part of a string with a different part?

A

string.replace(“x”,”y”)

385
Q

How do you convert an entire string to all caps?

A

string.upper()

386
Q

How do you reverse the order of a list?

A

list.reverse() - this changes the list it is called on

387
Q

What is the NumPy array an alternative to?

A

The Python list

388
Q

How do you create a NumPy array from a list?

A

np_array = np.array(list)

Assumes the list contains elements of the same type

389
Q

In a NumPy array, how are True and False treated?

A

As 1 and 0

390
Q

How do you investigate the size of a numpy array?

A

array.shape

391
Q

How can you subset a single element from a 2D NumPy array?

A

array[0][2]

or array[0,2]

392
Q

How can you get the mean of a column of a 2D NumPy array?

A

np.mean(dataset[:,0])

393
Q

How can you check if two columns of a 2D NumPy array are correlated?

A

np.corrcoef(dataset[:,0], dataset[:,1])

correlation coefficient

394
Q

How can you calculate the standard deviation of a NumPy array column?

A

np.std(dataset[:,0])

395
Q

How do you generate random data points from a normal distribution?

A

data = np.round(np.random.normal(1.75, 0.2, 5000), 2)

mean = 1.75, std = 0.2, 5000 samples

396
Q

What package do we use for data visualisation?

A

Matplotlib

import matplotlib.pyplot as plt

397
Q

When is it appropriate to plot a line graph?

A

When time is on the x-axis

398
Q

In a scatter plot, how do you set the size of the points?

A

Pass the s= argument, eg plt.scatter(x, y, s=sizes), where sizes is a NumPy array with one size per point

399
Q

How can you add grid lines to your plot?

A

plt.grid(True)

400
Q

How do you look at the keys of a dictionary?

A

dict.keys()

401
Q

What type of values can dictionary keys be?

A

immutable objects

402
Q

What are examples of immutable object types that can be used as dictionary keys?

A

Strings, Booleans, integers and floats

403
Q

How can you check if a key is already in a dictionary?

A

“key” in dictionary - see if it returns True or False

404
Q

How can you delete a value from a dictionary?

A

del(dictionary[“key”])

405
Q

How can you manually check if two arrays are compatible for broadcasting?

A

np.broadcast_to(array, shape) - raises a ValueError if the array cannot be broadcast to that shape

406
Q

How do you find the maximum value of a numpy array?

A

np.max(array)

407
Q

How do you find the index of the maximum value of a numpy array?

A

np.argmax(array)

408
Q

How do you transform all values in a numpy array to positive?

A

np.absolute(array)

409
Q

How do you find the base 10 logarithm of 1000?

A

np.log10(1000)

410
Q

How do you find the exponential of 1?

A

np.exp(1)

411
Q

What kinds of mathematical functions can you access with numpy?

A

np.sin(x)
np.cos(x)
np.pi

412
Q

How do you count the number of occurrences of eg a City in a database?

A

Group by city then find the size

eg home_team = matches.groupby(“Home Team Name”).size()

413
Q

How do you make a dataframe from a dictionary and change the names of the indexes?

A

pd.DataFrame(dictionary)

Indexes automatically given

df.index = [list_of_strings]

414
Q

How do you select a column from a data frame and keep it in a data frame (rather than a pandas series)?

A

Use double square brackets

df[[“column”]]

415
Q

How do you select multiple columns from a data frame by name?

A

df[[“column1”, “column2”]]

OR

df.loc[:, [“column1”, “column2”]]

416
Q

To carry out the slicing function my_array[rows, columns] on pandas data frames what do we need?

A

loc and iloc

417
Q

How can you only select certain columns and certain rows of a data frame?

A

df.loc[["row1", "row2"], ["col1", "col2"]]

418
Q

How do you apply multiple logical operators to a NumPy array / pandas series?

A

np.logical_and(array > 1, array < 5)

array[np.logical_and(array > 1, array < 5)]

np.logical_and()
np.logical_or()
np.logical_not()

419
Q

How do you write a for loop to include access to the index?

A

for index, var in enumerate(seq):
    expression

420
Q

How do you loop over a dictionary to access both key and value?

A

for key, value in dictionary.items():
    expression

421
Q

How do you loop over an array to get each element?

A

To get every element of an array, you can use a NumPy function called nditer (ND iter)

for val in np.nditer(array):
    print(val)

422
Q

When looping over a dataframe - what does the following print out?

for val in dataframe:
    print(val)

A

Prints out the column names

423
Q

How do you iterate over the rows of a data frame?

A

In pandas, you need to explicitly say that you want to iterate over the rows.

Each iteration yields the row label and the row's contents (as a pandas Series).

for label, row in dataframe.iterrows():
    print(label)
    print(row)
    dataframe.loc[label, "country_name_length"] = len(row["country"])

Can also select a specific column, eg print(row["column_name"]), or (as shown) create a new column

But this is inefficient - use .apply
eg dataframe["country_name_length"] = dataframe["country"].apply(len)
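
Both approaches can be compared side by side on a tiny made-up data frame (the countries, labels and column names are illustrative):

```python
import pandas as pd

dataframe = pd.DataFrame({"country": ["Ireland", "Peru"]}, index=["IE", "PE"])

# Row-by-row: works, but slow on large data frames
for label, row in dataframe.iterrows():
    dataframe.loc[label, "length_loop"] = len(row["country"])

# Element-wise with .apply: one call for the whole column
dataframe["length_apply"] = dataframe["country"].apply(len)

print(dataframe)
```

Both columns hold the same numbers; the .apply version avoids building a Series for every row.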

424
Q

How can you create a column that contains a calculation based on another column?

A

Use .apply(function)

eg dataframe["country_name_length"] = dataframe["country"].apply(len)

425
Q

What does .apply() do?

A

Allows you to apply a function on a particular column in an element-wise fashion.

426
Q

How do you generate random numbers, ensuring reproducibility?

A

Using a seed - generate pseudo random numbers

np.random.seed(123)
np.random.rand()

427
Q

How do you randomly generate a 0 or 1?

A

np.random.randint(0,2)

This simulates a coin toss

428
Q

How can you simulate a dice throw?

A

np.random.randint(1,7)

429
Q

In functions with eg subtracting, how can you account for the fact you can’t have a negative number?

A

x = max(0, calculated_value)

this ensures x never goes below zero

430
Q

How do you transpose a 2D NumPy array?

A

np.transpose(array)

431
Q

How do you add a description of a defined function?

A

Use of docstrings - placed inside triple double quotation marks

def function(parameters):
    """Describe what the function does here."""

432
Q

How do you change the value of a global parameter inside a function?

A

use keyword global

global name

433
Q

In a nested function, how can you change the value in an enclosing scope?

A

nonlocal keyword

434
Q

How do you allow for passing multiple arguments into a function?

A

*args

This collects the extra positional arguments into a tuple within the function body

435
Q

How do you allow for passing multiple keyword arguments into a function?

A

**kwargs

This turns the identifier-value pairs into a dictionary within the function body
Then, within the function body, we can print all the key-value pairs stored in the dictionary kwargs

for key, value in kwargs.items():
    print(key, value)
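
A minimal sketch of a function that accepts both kinds of flexible arguments. The function name report and the example values are hypothetical:

```python
def report(*args, **kwargs):
    """Collect positional arguments in a tuple and keyword arguments in a dict."""
    lines = [f"arg: {a}" for a in args]
    for key, value in kwargs.items():
        lines.append(f"{key}: {value}")
    return lines


print(report(1, 2, name="Matt", module="DSA8001"))
```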

436
Q

How do we apply a lambda function to all elements of a list?
How do we print results of this lambda function?

A

We need to use map() to apply the lambda function to all elements of the sequence
result = map(lambda x: x + 1, my_list)

(map takes the function first, then the sequence to apply it to)

It returns a map object, convert to list using list(result)

437
Q

How can you filter out elements of a list which don’t meet certain criteria?

A

result = filter(lambda x: len(x) > 6, list)
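
Both map() and filter() return lazy objects, so wrapping them in list() is what makes the results visible. A small sketch with an arbitrary list of words:

```python
words = ["pandas", "matplotlib", "numpy", "seaborn"]  # illustrative list

# map applies the lambda to every element
lengths = list(map(lambda w: len(w), words))

# filter keeps only the elements for which the lambda returns True
long_words = list(filter(lambda w: len(w) > 6, words))

print(lengths)
print(long_words)
```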

438
Q

What kind of error is thrown when an operation or function is applied to an object of an inappropriate type?

A

TypeError

439
Q

When should we raise an error (instead of catching in an except)?

A

eg if we don’t want our function to work for a particular set of values - such as don’t want to square root negative numbers

using an if statement, we can raise a value error for cases in which the user passes the function a negative number

if x < 0:
    raise ValueError("X must be non-negative")
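
As a sketch of how the check fits into a complete function, and how the caller can catch the error (the function name safe_sqrt is made up for this example):

```python
import math


def safe_sqrt(x):
    """Return the square root of x, refusing negative input."""
    if x < 0:
        raise ValueError("X must be non-negative")
    return math.sqrt(x)


try:
    safe_sqrt(-4)
except ValueError as err:
    message = str(err)   # the text passed to ValueError above

print(safe_sqrt(9), message)
```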

440
Q

in an SQL query, how do you count unique values?

A

COUNT (DISTINCT “column_name”)

441
Q

How do you determine the number of rows in a data frame?

A

len(df) or df.shape[0]

442
Q

How can you quickly inspect a data frame?

443
Q

What does df.describe() do?

A

The describe() method computes some summary statistics for numerical columns like mean and median

444
Q

What are the components of a data frame that you can access?

A

df.values - a 2D NumPy array
df.columns - column labels
df.index - row labels

445
Q

How can you sort a data frame by multiple column values?

A

df.sort_values([col1, col2], ascending=[True,False])

446
Q

How do you select multiple columns from a data frame?

A

Need double square brackets

df[[“col1”, “col2”]]

447
Q

How do you compare dates in a logical comparison?

A

The dates are in quotes, written as year, month then day
This is the international standard date format

448
Q

How can you filter a dataframe on multiple options of a categorical variable?

A

Using .isin()

dogs["colour"].isin(["Black", "Brown"])

449
Q

What method allows to calculate custom summary statistics?

A

Aggregate .agg()

def function(column):
    return column.quantile(0.3)

df[“column”].agg(function)

Can be used on multiple columns - pass in list [“col1”,”col2”]
Agg itself can also take a list of functions to apply at the same time

Can use .agg for the IQR

450
Q

How can you calculate the cumulative sum of a column?

A

Calling .cumsum() on a column returns not just one number, but a number for each row of the data frame

df[“column”].cumsum()

Can also have .cummax(), .cummin(), .cumprod()

These all return an entire column of a dataframe, rather than a single number

451
Q

When counting in a dataframe, how do you ensure you only count each “thing” once?

A

use .drop_duplicates()

eg df.drop_duplicates(subset=["col1", "col2"])

452
Q

After subsetting, how can you count the number of values in a table?

A

To count the dogs of each breed, we subset the breed column and use the value_counts() method

Can do .value_counts(sort=True)

453
Q

How can you turn counts into proportions of the total?

A

df[“column”].value_counts(normalize=True)

454
Q

How can you calculate the mean weight of each colour of dog?

A

dogs.groupby(“colour”)[“weight”].mean()

455
Q

What does the .agg method allow you to do?

A

Pass in multiple summary statistics at once to calculate

df[“column”].agg([np.min, np.max, np.sum])

456
Q

What are pivot tables?

A

A way of calculating grouped summary statistics

.pivot_table()

df.pivot_table(values=”col”, index=”colour”)

o The values argument is the column that you want to summarise
o The index column is a column that you want to group by

Automatically calculates the mean, if you want another statistic, use aggfunc
df.pivot_table(values="col", index="colour", aggfunc=np.median)

To group by more than one variable, pass in columns
df.pivot_table(values="col", index="colour", columns="breed", fill_value=0, margins=True)

457
Q

How do you set the index of a data frame?

A

df.set_index(“column”)

can include multiple columns df.set_index([“col1”, “col2”])

458
Q

How do you reset the index of a dataframe?

A

df.reset_index()

to get rid of it completely df.reset_index(drop=True)

459
Q

How can you subset a data frame with row labels?

A

.loc

df.loc[[item1, item2]]

460
Q

How do you subset rows at the outer level of an index vs the inner level, when there are two indexes?

A

Outer - df.loc[[item1, item2]] - with a list
Inner - df.loc[[(outeritem1, inneritem1), (outeritem2, inneritem2)]] - with a list of tuples

461
Q

How can you sort values by their index?

A

.sort_index()

For multiple indexes, by default it sorts all index levels from outer to inner, in ascending order. You can control this:
df.sort_index(level = [inner, outer], ascending=[True, False])

462
Q

What does slicing do?

A

Selects consecutive elements from objects

463
Q

If a column contains a date type, how can you access the different elements of the date?

A

df[“columns”].dt.year /.dt.month etc

464
Q

What is the simple way to plot?

A

eg df[“column”].hist()

avg_weight_by_breed.plot(kind="bar")

465
Q

How do you rotate axis labels by 45 degrees?

A

pass in rot=45

466
Q

How can you investigate if there are any missing values in your dataset?

A

Represented by NaN

df.isna().any() - tells you if there are any missing values in each column
df.isna().sum() - tells you how many missing values are in each column

467
Q

What can you do with missing values in a dataframe?

A

Drop - df.dropna()
Fill with 0 - df.fillna(0)

468
Q

How do you convert a data frame to a CSV file?

A

df.to_csv(“new filename.csv”)

469
Q

How do you find the value in column 1 based on a condition in columns 2?

A

journalsBSA.iloc[journalsBSA[“Rank”].idxmin()].loc[“Rank”]

correct - journalsBSA.loc[journalsBSA[“Rank”].idxmin(), “Title”]

470
Q

How do you change the range of the data shown on the axis?

A

Change the axis limits - ax.set_ylim()

471
Q

What are the steps of calculating the MSE?

A

determine the y values based on the predicted model and compare to actual values in table

MSE = np.mean( (predicted_y - df[“column”])**2)

472
Q

How do you count the number of occurrences in a grouped data frame?

A

phys_groups.size().sort_values(ascending=False)

473
Q

In databases, what are rows and columns referred as?

A

In the world of databases, rows are referred to as records
Columns are referred to as fields

474
Q

What SQL query do you use to only return unique values?

A

SELECT DISTINCT column1, column2
FROM dataframe

475
Q

What does the distinct key word do?

A

return the unique combinations of multiple field values

476
Q

What is an SQL view?

A

A view is a virtual table that is the result of a saved SQL SELECT statement
Views are considered virtual tables
There is no result set when creating a view

Then this table can be queried