Conceptual Questions Flashcards

1
Q

Briefly outline Defensive Programming

A

Defensive programming is a method of writing code that is designed to protect against bugs. This includes:

1) Building up code bit by bit
2) Using functions copiously
3) Testing as you go against known inputs/outputs
4) Logging each step of your progress
5) Using comments to remind you of any assumptions/cut corners
6) Be on the lookout for common “gotchas”
7) Remembering to focus on readability

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Briefly outline the general approach to debugging

A

1) Reproduce the bug
2) Determine exactly what the problem is
3) Eliminate “obvious” cases (e.g. Is it plugged in?)
4) Divide the process, separating out the parts that work from the part(s) that don’t (“isolate” the problem)
5) When you reach a dead end, reassess your information; then step through the process again
6) As you proceed, make predictions about what should happen and verify the outcome

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Briefly outline the Actuality of Software Testing

A

There are 2 types of software testing:

(1) Execution-based verification, which is where you:
a) Generate and execute set of representative test-cases/inputs, and check the correctness of the output
b) Examples are integration testing” (between system components), or “unit testing” the components of the system
(2) Non-execution-based verification, which is where you detect bugs by eyeballing the code directly, e.g. Code Review or Pair Programming

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Briefly describe the ideal specifications of Test Cases

A

Test cases should be designed independently of the software implementation, and (ideally) be designed to:

1) Test one thing each (e.g. one use case per input type)
2) Test over the spectrum of use cases (often based on the “boundaries” of inputs)
3) Identify and test “corner case” inputs (partly)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Briefly list Common Python Gotchas

A

1) Equality (==) vs. assignment (=)
2) Printing vs. returning from functions
3) Correct use of types (e.g. False vs. “False”)
4) Incorrect use of function/method (e.g. return list.sort())
5) Spelling and capitalisation
6) Loops and incrementing
7) Conditionals and indentation
8) Namespace problems
9) Incorrect indexing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Briefly Outline what an Iterator is

A

Definition: An object that keeps track of the traversal of a container. An iterable object will return an interator object when you pass it to the built-in Python function iter(). (This happens automatically with for… in…:) Iterators have a __next__ method that will return the next thing in the iteration, and update their state/memory of where they are up to. You can access it with the built-in function next(). Iterators raise a StopIteration exception when the container is empty.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Why use an iterator?

A

We would use iterators because there’s less scope for a programmer to make an error: (1) You do not have to initialise the index variable, (2) You do not have to define the end value of the index variable, and (3) you do not have to increment the index variable. Simply create the Iterator using iter and step through it using next. Python will take care of all the indexing. (Similarly, we prefer for over while for a subset of those reasons). Iterators are more memory efficient than storing and indexing a whole iterable: E.g. For … in … Is better than .readlines() … for.. in…

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the difference between an Iterable and an Iterator?

A
An iterable describes something that could conceptually be accessed element-by-element, while an iterator is an actual interface (or an object) that allows element-by-element access to an iterable.
An iterator:
- Has No Random Access
- "Remembers" last item seen
- has no len()
- can be infinite
- Traverse exactly once (forwards)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is the relationship between the “internet” and the “(worldwide) web”?

A

The internet is the open network by which computers are connected and communication between computers take place; the worldwide web is a subset of the internet used to link hypertext documents.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is an IP address?

A

An IP address is a numerical label assigned to each device participating in a computing network that uses Internet Protocol for communication. Each device on the Internet has a unique “IP address”. Devices address each other by their IP, but communicate via “Internet routers”. Communication is from a “client” to a “server”.

IPv4 vs. IPv6: In IPv4, IP addresses are represented as 4 “8-bit” integers, each in the range [0, 255], e.g. 128.250.36.33 is Tim’s main web server, but in IPv6, IP addresses are represented by 8 4-digit hexadecimal numbers.

Static vs. Dynamic: IP addresses can be allocated to a device either “statically” (an IP is reserved for a given device) or “dynamically” (an IP is allocated to a device dynamically when it connects to the Internet)

Local vs. External: IP addresses can also either be “local” (addressable only on a private network) or “external” (directly addressable from anywhere on the internet). In IPv4, a number of address rangers, including 192.168.x.x, are reserved for local IPs. Devices with local IPs can often still access the open Internet through “proxy servers” or “gateways”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is a Hostname?

A

A hostname is a hierarchically-organised, human-readable version of a machine address, that maps onto IP addresses using DNS, or the Domain Name System. We use this because humans tend to find sequences of numbers hard to remember, so we use hostnames, such as hum.csse.unimelb.edu.au, made up of (case-insensitive) letters and full stops.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a Port?

A

Machines communicate with other machines over the internet via a collection of numbered “ports”. A port is an endpoint of communication in an operating system, that specifies a process or type of service. A port is identified for each address and protocol by a 16-bit number known as the port number.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is a URL?

A

A URL is a “Uniform Resource Locator”, and Internet resources are “addressed” via URL, e.g.: http://nlp.stanford.edu:8080/parser/.
URLs are made up of the following parts: scheme://hostname:port/path, where: (1) scheme = the “protocol” for accessing the file, (2) hostname = the device the file lives on, (3) port = access port (optional), (4) path = where the file lives on the device

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

With the aid of an example, explain the difference between an HTML “tag”, and an HTML “entity”

A

A tag pair is used to apply some operation to a document extent (or insert an object of some description at a particular point in the document), whereas an entity is simply a special character. For example <b>text</b> is used to make ‘text’ appear bolded, and < is an entity used to represent the

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is the relationship between a hostname and an IP address?

A

Hostnames are hierarchically-organised, human-readable versions of machine addresses that map onto IP addresses using DNS, or a ‘Domain Name System’. They are used because humans find it easier to remember than the large sets of numbers that are IP addresses.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Explain what recursion is, what are its elements, and why we use it?

A

Recursion is where, in a function, we call the same function, and it is used to solve problems in a “divide-and-conquer” manner, breaking down the problem into smaller sub-problems and solving them in the same way as the big problem.

Recursion is generally made up of 2 parts: (1) A recursive function call on smaller inputs, and (2) a reachable base case to ensure the calculation halts. It is used when an iterative solution would involve a level of iterative testing proportionate to the size of the input, which occurs in the Powerset problem, for example.

Recursion has two forms, head recursion, where we recurse first, and then perform some local calculation, and tail recursion, which is where we perform some local calculation, then recurse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What are the limitations of recursion?

A

Function calls are computationally expensive, meaning deep recursion comes at a price. There is often an efficient iterative solution to the problem, although there may not be a general iterative solution. Recursion is elegant, but elegance isn’t more readable or efficient.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What are Raster images and vector images?

A

A raster image is an image represented by colour ‘pixels’ that are mapped from the top-left, row by row, bottom to right, whereas Vector images are represented as points, lines, and curves, on a map, and the order of the vectors is unimportant. Vector images allow us to zoom in, without loss in quality, as the image is determined by points and not pixels.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What are the types of digital image?

A

Binary images are images that have only 2 possible values for each pixel, 0 for black, and 1 for white.
Grayscale images are images that have a value for each pixel that ranges from 0, for black, to 255, for white.
Colour images are images that have 3 values for each pixel that range from 0 to 255, which represent Red, Green, and Blue (RGB), and can be used to represent any of over 16 million colours (256^3).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What are the 2 principal computational desiderata for algorithms?

A

Correctness: an algorithm should find and output the correct solution for every input, every time, and Efficiency: an algorithm should be as fast as possible (run-time efficiency) and it should require as little storage as possible (storage efficiency).

21
Q

Briefly outline the Linear Search Algorithm

A

The linear search algorithm takes a list of numbers, and a number to find, examines each number in the list, one by one, and tests whether that number is equal to the number it’s searching for. If found, it returns the index number in the list which matches the input it’s searching for, or if it doesn’t find the number it’s looking for, it returns a value or operation that indicates that the number wasn’t found; for example, None.

22
Q

Briefly outline the Binary Search Algorithm

A

The Binary Search Algorithm takes a sorted list of numbers, initialises a sub-list to the full list, and our index to the mid-point of the list. While the sub-list is non-empty, if the value at the mid-index is smaller than the one we wanted, continue to search over the right half of the current sub-list, or if it’s larger, continue to search over the left half of the current sub-list, or if the value is the one we wanted, return the index of that value within the full list. The run-time for Binary Search = log2(n).

23
Q

Outline the Process of Algorithmic Analysis, using an example.

A

In Algorithmic analysis, we ask three questions. (1) Is this algorithm correct? Proof is often required, if correctness is not obvious. (2) Is it run-time efficient? How efficient is it in its best, average, and worst cases? And (3) Is it storage efficient? How efficient is it in its best, average and worst cases?

For Linear search:

(1) It’s obvious that it’s correct.
(2) There is n = 1 step for the best case, n = len(list) / 2 is the average case, and n = len(list) steps, for the worst case. We have to compare this to other search algorithms and what is possible, when considering if this is efficient. Given the efficiency of binary search, which takes fewer steps, on average, the Linear search algorithm is run-time inefficient.
(3) It’s relatively storage efficient as it only requires the storage of the list, as does binary search.

24
Q

In the context of algorithmic development, what is the difference between an “exact” and “approximate” approach?

A

The ‘exact approach’ is calculating the solution with a guarantee of correctness, e.g.: (1) Brute force, or (2) Divide and conquer. The ‘approximate approach’ is estimating the solution, ideally with an additional estimate of how “close” this is to the exact solution (set), e.g. (1) simulation, (2) heuristic search.

25
Q

Outline the Brute-force/Generate & Test strategy of algorithmic problem solving, with the aid of an example.

A

Generate candidate answers and test each of them until the solution is found; it assumes a candidate answer is easy to test, and the set of candidate answers is ordered or can be generated exhaustively. It is commonly applied to problems where the number of solutions is relatively small, like linear search, for example.

26
Q

Outline the “divide-and-conquer” strategy of algorithmic problem solving, with the aid of an example.

A

Split the problem into sub problems, solve each sub problem, and extend the sub-solution to create the solution of the original problem. One common example of this is binary search, wherein for a set of ordered numbers, the middle index is found to be either larger or smaller or the same, and dependent on this, we either return the index, or continue our search in the bottom or top half of the list.

27
Q

Outline Memoisation, in the context of algorithmic problem solving, with the aid of an example.

A

Memoisation is used where, when solving a problem, we may have to perform the same operation multiple times. Memoisation is the process wherein we store the results of each operation, so that it may be used in future operations to speed up the algorithm. One example is where we generate Fibonacci numbers: to generate one large number, we have to generate the smaller numbers, and our operation might include, by the nature of the algorithm, finding each of these numbers multiple times. Memoisation will ensure that we don’t have to perform an operation to find any Fibonnaci number more than once.

28
Q

Outline the “Simulation” strategy of algorithmic problem solving, with the aid of an example.

A

Simulation is where we randomly generate a large amount of data to predict an overall trend. We use multiple runs to verify the stability of an answer. It is used in applications where it is possible to describe individual properties of a system, but hard/impossible to capture the interactions between them. For example, it is used in weather forecasting, and company valuations.

29
Q

Outline the “Heuristic Search” strategy of algorithmic problem solving, with the aid of an example.

A

The strategy in Heuristic search is to search via a cheap, approximate solution which works reasonably well most of the time… but where there is no proof of how close to optimal the proposed solution is. An example is playing your strongest (valid) card every play in Oh Hell! until you get your bid, then play the weakest.

30
Q

For the problem of sorting a list of integers, design: (a) a test to determine whether a list is in sort order and (b) a ‘brute-force’ algorithm to solve the problem. What is the best- and worst-case runtime efficiency of your algorithm?

A

a) One text would be to iterate over all elements of the list, and check that the desired inequality (

31
Q

For the problem of sorting a list of integers, design a “divide and conquer” algorithm to solve the problem (e.g. by applying the same basic logic that was used in the binary search algorithm). What is the best- and worst-case “runtime efficiency” of your algorithm?

A

There are various ways of doing this. One that is called ‘insertion sort’ would be to iteratively build up a sorted sub-list, using binary search to insert one element at a time into the position i in the sub-list such that the value of the element lies between the values of the ith and i + 1 elements.
The best run-time efficiency of the algorithm is n, in the case that the mid-point of the sub-list is the insertion point for all elements; the worst-case efficiency is roughly equivalent to n x log2(n), i.e. the case of each of the n elements taking the worst-case time for binary search (log2(n)) to insert, which occurs when binary search gets down to a one-element sub-list before it can finally insert the element.

32
Q

Briefly Describe Unicode, and its advantages

A

Unicode is an attempt to map all of the glyphs in the world’s documented languages (and more) to a unique code point. Unicode is great in that it supports all orthographies in a single document, there is scope for both precomposed (e.g. é), and composite characters/glyphs (e.g. ´+e), and there is scope to monotonically add more code points for different languages.

33
Q

Briefly Describe the difference between ‘fixed precision’ text encoding and ‘variable-width’ text encoding, using an example to outline

A

Fixed-precision text encoding is where there is a fixed number of digits to represent a code point for each character. For example, if our highest code point were 10^3-1, we could encode each point in our document with 3 decimal digits. If a = 97, b = 98, and c = 99, in a fixed-width encoding of 3 digits per code point, ‘abc’ would be 097098099. A variable-width text encoding, like UTF-8, checks the leading bit of each byte provided (it uses 1-4 8-bit digits), which determines how many code units wide the character is, and interprets the byte(s) accordingly.

34
Q

Briefly describe ASCII, its advantages and its disadvantages

A

ASCII is a fixed-width encoding that only uses one 7-bit code unit to represent many popular Unicode character encodings. So, its compact, however has 1 redundant bit, and it also has the obvious failing in that it can only encode a small number of characters.

35
Q

Briefly describe UTF-32, its advantages and its disadvantages

A

UTF-32 is an encoding which uses a 32-bit encoding to represent each Unicode code point value directly. This is great in that it can encode all Unicode characters, but is bloated in that documents are bigger than they need to be.

36
Q

Briefly describe UTF-8, its advantages and disadvantages

A

UTF-8 is a variable-width encoding that uses 8-bit code units to represent Unicode strings. It can represent any Unicode string, and it keeps documents smaller and less bloated than in UTF-32.

37
Q

Briefly Describe ISO-8859, its advantages and Disadvantages / Setting the top bit / Single-Byte Encoding

A

ISO-8859 is a single-byte encoding built on top of ASCII (7-bit) to include an extra 128 characters, which is enough to represent many orthographies. It isn’t enough to cover all of these orthographies at once, however, so the standard occurs in 16 variants (ISO-8859-1 to ISO-8859-16), for example ISO-8859-7 is Latin/Greek. It means that it can represent more than just Latin, however it cannot represent different orthographies, like Hebrew and Arabic, in the same document. Furthermore, they aren’t able to support ‘big’ orthographies, like Chinese, Japanese and Korean.

38
Q

Briefly Describe the relationship between UTF-8 and ASCII

A

ASCII is a fixed-width encoding that uses one 7-bit code unit to represent many of the popular Unicode character encodings, while UTF-8 is a variable-width encoding that uses 8-bit code units to represent Unicode strings, is a superset of ASCII, and can indeed be used to represent ASCII characters via using only one code unit.

39
Q

Briefly describe the relationship between UTF-8 and Unicode

A

Unicode is an attempt to map all of the glyphs in the world’s documented languages (and more) to a unique code point, while UTF-8 is a variable-width encoding that uses 8-bit code units to represent Unicode strings.

40
Q

What is cryptography?

A

Sending messages that are secret from everyone but the intended recipient. The sender has to “hide”, or “encrypt” the message so no one else can understand it. The receiver has to “un-hide” or “decrypt” the message to recover it.

41
Q

Explain Public-Key Cryptography

A

Public key cryptography works by having two parties, who want to send messages to one another: Party A, the sender, and Party B, the receiver. Party B has a public key, that anyone can view (shared publicly, hence ‘public key’), and a private key, that allows them to decrypt messages encrypted with the public key. Party A decrypts their message with the public key, sends it to Party B, who then decrypts this message with their private key.

42
Q

Explain why RSA is “secure”?

A

In a simplified example, RSA is secure as it takes message M, generates two large prime numbers (p & q), about 300 digits long, and multiplies them together to find N, which it uses to encrypt the message. It is only possible to decrypt this message knowing both p and q, and factorising N to find p and q takes too long even for an incredibly large amount of computing power, so effectively only the person who generated the public and private key is able to decrypt the message.

43
Q

What are the Security Requirements for an “Internet Voting System”?

A
  • Verifiability (so no one can manipulate the output)
    o Only eligible voters vote
    ♣ At most once (Multiple Voting)
    o Voters should get evidence that their vote was:
    ♣ Cast as they intended (Impersonation i.e. authentication failures)
    ♣ Counted as cast
  • Privacy, so coercers can’t manipulate inputs:
    o Coercion based on physical presence
    o Coercion based on electronic privacy invasion:
    ♣ Client machines privacy breach
    ♣ In transit privacy breach
    ♣ On the server privacy breach
44
Q

What is a FREAK attack?

A

In the 90s, the US government restricted the export of a strong crypto, in particular of RSA using more than 512 bit keys. So, lots of servers (and clients) maintained the option to use “export grade” crypto, just in case they had to communicate with a restricted computer. Unfortunately, many still do. A 512-bit “export-grade” RSA now costs about $100 to break running overnight on Amazon’s EC2 cloud. A FREAK attack is where an attacker intercepts the client’s request for any RSA encryption and makes it into a request for 512 bit RSA, the server provides a 512 bit RSA-EXP key, the user’s buggy client accepts a 512-bit key and uses that to encrypt, and the attacker uses the factored 512-bit key to control the session.

45
Q

Explain the role of the “public key” and “private key” in “public key encryption”, when securely transferring a file. For each of the “sender” and “receiver” of the file, state which keys they have access to.

A

The public key is used to encrypt the message, and is accessible by both the sender and receiver (the receiver generates it and makes it available), while the private key is used to decrypt the message, and is accessible only to the receiver, who generates it.

46
Q

With the aid of an example, describe what “alignment” means in the context of statistical machine translation, e.g. as used for language decipherment purposes:

A

Alignment is the process of determining which substrings in the target language are translations of which particular substrings in the source language. For example, in French, we have “Je parle la pauvre français”, and “I speak poor French”. Our program would use a large set of strings to determine probabilities of which words in french are related to which words in english, and would tell us, for example, that “la pauvre” would be associated and ‘aligned’, with “poor”, for example.

47
Q

Briefly describe how distributed computing was used in the discovery of the Higgs boson at CERN in 2012

A

CERN observed a large number of particle collisions per second, amounting to an enormous amount of data. In order to facilitate the storage of this data (around 4 Petabytes of data per year), and to aid in the simulation of the collisions, CERN not only has a very large computing system, but also is connected to a very large, tiered network of other computing facilities around the world, which aid in storage, simulation and analysis.

48
Q

Computational simulation was a vital component of the discovery of the Higgs boson at CERN in 2012. Without any details of the underlying particle Physics, briefly describe the interaction between “computational simulation”, “theoretical prediction” and “measurement” that led to the discovery.

A

Computational simulation was used to calculate the expected measurements associated with a given theoretical prediction. Measurement of the actual particle collisions were compared with the expected measurements, to see if any of the expected measurements fit with the actual measurements.

49
Q

With the aid of an example, explain what an “API” (application programming interface) is.

A

An API is a set of functions and procedures that allow the creation of applications which access the features or data of an operating system, application, or other service. For example, the Rome2Rio search API allows an application to access the data of the Rome2Rio server, and can be used to request searches for routes from, theoretically, any two cities in the world, which returns different methods of going from location to location, prices, and duration of travel.