Short Answer Section Flashcards

1
Q

What is a bloom filter and how would you implement one?

A

Bloom filters are a probabilistic data structure designed to quickly tell you with efficient memory whether an element exists in a given set. It can tell you if an item is definitely not in a set, or if it maybe is. It cannot give certainty that an item does exist in a set.

The tradeoff for optimized space is that you sometimes get false positives.

A bloom filter uses a Bit Vector (or bit array) as its base data structure.

First, you would hash an element a few times and then set the bits at the index of the hashes to one.

To test for membership in the set, you hash a string a number of times and check whether those values are set in the bloom filter.

If they aren’t, the item is not in the set. If they are, the item might be in the set, but might not if another element has the same hash indices.

Hashes must be uniformly distributed and quick (murmur or fnv).

Time complexity for insertion and membership testing: O(k) with k hashing functions.

Space complexity: O(m) where m is num of bits in the bit vector

Ex) Medium uses when recommending posts to readers by filtering out those already seen by readers.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a mutex?

A

A mutex stands for “mutual exclusion interface” and is basically a lock or binary flag that we set before accessing a shared resource and release when we’re done.

They are used to protect critical data by ensuring access by only one thread at a time.

While we hold the lock, any thread that tries to set it will block until we release it.

Type is pthread_mutex_t

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is a semaphore?

A

A semaphore is a counter used to provide access to a shared data object for multiple processes.

To obtain a shared resource, a process needs to:

1) Test the semaphore that controls the resource
2) If the value is positive, the process can use the resource and decrement semaphore by 1.
3) If value is 0, process sleeps until semaphore value is greater than 0.
4) When a process is done with resource, increments the semaphore by 1.

API: wait() and signal() or acquire() and release()

Binary semaphores are set to 1 and control just one resource, but a semaphore can be higher than 1 and control multiple units of a resource.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What’s the difference between a mutex and a semaphore?

A

A mutex is similar to a binary semaphore. The primary difference is ownership.

Only the task that locks a mutex can unlock it.

Whereas mutexes act as binary flags, semaphores can be thought of a signaling mechanism from one task to another. It uses wait() and signal() operations between producers and consumers.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a lock?

A

It’s a synchronization mechanism for enforcing limits on access to a resource in a multi-threaded environment.

Locks are used to enforce a mutual exclusion concurrency control policy - they can only be owned by one thread at any given time.

API: acquire() and release() or wait() and signal()

Examples are mutexes, binary semaphores and reader-writer locks.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Two generalizable types of locks

A

Advisory and mandatory. Advisory locks are when each thread needs to acquire the lock before accessing the resource.

Mandatory locks are advisory locks that throw exceptions when a process attempts to gain access without the lock.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Describe atomic operations

A

An operation that is composed of multiple steps, where either all the steps are performed successively or none are performed.

Any operation that requires more than one function call cannot be atomic.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Describe the relationship between locks and atomic operations

A

Locks generally require hardware support to be implemented efficiently.

Usually this is via atomic operations that allow a single process to test if the lock is free and if so, acquire it.

Atomic operations are required because two tasks can be testing and attempting to acquire and set the lock at the same time, which could result in a deadlock or livelock scenario.

Examples of atomic operations for this include “test-and-set”, “compare-and-swap” and “fetch-and-add.”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Describe hashing

A

Hashing is a mathematical function that maps keys to integers.

Hashing is deterministic. For a given input, the output will always be the same.

It’s also a one-way function, meaning the output cannot predict the input.

Ideally, hashing output is uniformly distributed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

How do hash tables work? What is the space and time complexity for its main API?

A

We use the output of a hash function to determine an index of the array where we will store a given item.

Pragmatically, hash tables are the best way to maintain a dictionary.

In steps:

1) Hash each key into a large integer.
2) Then, we mod it by the length of m (the number of bins) and play the item at that index.

If m is selected carefully (large prime not too close to 2^i + 1), the items should be uniformly distributed.

Collisions may occur when two items mod to the same index. In this case, we can use chaining or open addressing.

Time complexity:
Doubly-linked lists:
Insertion: O(n) worst-case (O(1) amortized)
Deletion: O(n) worst-case (O(1) amortized)
Search: O(n) (expected O(n/m)) O(1) average
Min, max, successor: O(m + n)

Open addressing:
Only difference is search is O(m)

Space: O(n)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How do deal with hash table collisions?

A

Chaining or open addressing.

With chaining, we use an array of pointers to linked-lists and each new collided element is added as a node to the linked list at that index. The tradeoff is that lists consume added memory.

The alternative is open addressing. An array consists of elements set to null. During insertion, if the spot is empty, we insert our element.

If taken, we find a different spot. Sequential probing takes the next spot available. The downside is that deletion can be a mess if we break the chain.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is an XSS attack?

A

XSS stands for Cross-Site Scripting.

XSS is a security vulnerability found in web applications that injects client-side scripts into web pages viewed by other users.

An injected script can collect private data from the user’s computer, as well as session cookies and user account data.

There are persistent and non-persistent forms, persistent being the most dangerous. because the script is triggered every time a user views the page.

The most common solution is contextual output encoding/escaping.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is CSRF?

A

CSRF stands for Cross-Site Request Forgery.

This happens when unauthorized commands are transmitted from a user that the application trusts.

Counter to XSS, which exploits a users trust in an application, CSRF exploits the trust an application has in its user.

An innocent user is tricked by an attacker into submitting a web request they didn’t intend when logged in. This can cause the user to inadvertently perform actions on the target site that can leak client or server data, change session state or manipulate a user’s account.

Example: Attacked creates fake link to a funds transfer, which the client clicks. Website validates request and transfers funds to attacker.

Simple preventative measure: authenticity token checks between cookies and request params whenever a form is submitted.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is functional programming?

A

It’s a programming paradigm that treats computation as the result of functions and avoids changing state or mutating data.

It’s declarative and uses expressions instead of full statements.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is object-oriented programming?

A

OOP is a programming paradigm based around “objects” that contain data called fields (or attributes) and code in the form of methods.

OOP uses statements and can mutate data..

Key concepts:
Abstraction: Helps to present information and an API for only what is relevant to user.
Inheritance: Can inherit methods, functions, properties, etc from parent class.
Encapsulation: Hides irrelevant info from user and helps prevent unauthorized access.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are the most common security vulnerabilities for web applications?

A

1) Injection attacks (i.e. SQL injections)
2) Broken authentication (hint: use a framework)
3) Cross-site scripting (XSS) - Solution is to not return HTML tags to client or using regex
4) Insecure direct object references (like accessing internal download file to access critical resources)
5) Security misconfiguration (i.e. debugger in production, outdated software, etc.)
6) Sensitive data exposure (not hashing passwords or encrypting data) - use HTTPS to mitigate and strong hashing
7) CSRF
8) Using components with vulnerabilities
9) Bad redirects

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What are database indexes and their benefits?

A

A database index is an auxiliary data structures that allows for faster retrieval of data in a database.

Benefits: They speed up reads for the column they are keyed off of, but they make writes slower and increase memory requirements.

Write and delete are slower because you have to update the index every time along with the data itself.

Memory increases by the size of the index, which only stores pointers to the data.

They are usually implemented with B-trees due to logarithmic insertion, deletion and retrieval. B-trees are also ordered and support inequalities and prefixes.

Bonus: R-trees are better for spacial data like finding locations within a certain radius.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is a composite index?

A

It’s an index based on multiple columns.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

If a function in C returns the pointer of a local integer variable set with a value, what will happen when you dereference it? Why?

A

It will most likely be undefined if the function has already returned and freed the item off of the stack.

The pointer will point to a value that will only live as long as the function runs.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Describe database normalization.

A

Normalization is a set of guidelines for designing a database in order to reduce redundancy and data corruption.

There are 7 steps, referred to as Normal Forms (NF).
1NF: Tables shouldn’t contain repeated groups of data (ex: single column for multiple phone numbers, or multiple phone number columns in “Person” table)
2NF: This requires that no field should only be partially dependent on any candidate key in the table. This does not only include the PK, but any fields combinations that would uniquely identify a row. (i.e. course name that depends on candidate key “Course ID”-“Semester”, as the semester has nothing to do with the course name.
3NF: Every field should only depend on the entire primary key, not part of it and not on any other field.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Why would de-normalizing a database be a good idea at times?

A

It can speed up reads and enforce constraints that you wouldn’t be able to otherwise.

For example, a check constraint comparing two columns of a table, like making sure all appointments fit within the working hours of a doctor.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What are foreign keys?

A

These define relationships between tables. When you want a row in one table to be linked to a row in another table, you place a FK column in the child table and use the value of the parent row’s PK as the value of the FK field.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

How do free() and malloc() work?

A

Malloc allocates memory via a specified number of bytes, where the initial value of the memory is indeterminate.

Free deallocates memory allocated by malloc.

Malloc and Free are implemented via a circularly-linked-list of free memory blocks with a head consisting of the size and some admin data.

Malloc returns a non-null pointer if successfuly, else a null pointer.

Malloc finds the first free block of memory in the linked list that’s large enough, and allocates more memory from OS if necessary.

Free() deallocates the space point to by a pointer. This space is usually not returned to the OS, because this can cause gaps in the heap.

Instead, it puts it into a pool of available memory or free block list that can be allocated in later calls.

malloc(size) - returns pointer to space of size (size)
free(*p) - deallocates space pointed to by p

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

How do http cookies work?

A

A cookie is a small piece of data that a server sends to a user’s browser. The browser may then store it and send it back to the same server in the next request.

They store state for the stateless http protocol and most commonly are used for session management, personalization and tracking.

1) Server receives HTTP request and sends a Set-Cookie header in the response
2) Cookie is stored in web browser and then sent back to server in next request in a Cookie HTTP header
3) An expiration date or duration can be specified.

These are most commonly session cookies. Permanent cookies expire after a specific duration or on a specific date.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What’s the difference between cookies and localStorage?

A

Localstorage is only client-side whereas cookies are primarily used server-side, with some client-side use. Local storage also provides more space to work with.

26
Q

What happens when you type a URL into your browser’s address box and hit enter

A

1) The browser looks up the IP address for the domain name.
2) First it checks the browser cache for the DNS records, then makes a system call to the OS cache, then the router DNS cache, then the ISP’s cache, then the ISP will perform a recursive search through the root nameserver, .com name server and then the specific domain’s name server.
3) The browser formulates an HTTP GET request to the web server with the URL, User-Agent header, maybe a cookie and Accept headers.
4) The browser sends the request over the network (mention Border Gateway Protocol, but that you don’t know how it works)
5) The request potentially interacts with proxy server, load balancer or CDN
6) The server may issue a permanent 301 redirect if the URL omitted “www’, in which case the browser would send another GET request
7) The server ‘handles’ the request. The web server receives the GET requests, processes it and sends a response. The web server selects a request handler which parses the request, params and cookies, routes it using Regex on the request path. Application layer assembles response and potentially updates data on the server.
8) The server sends back an HTML response with certain headers related to cookies, privacy, caching, etc.
9) Your browser parses the response and checks the header (in particular the status code).
10) The browser begins to render each html element, either painting to the DOM or executes the tag. While rendering, it will notice other tags that require additional requests for images, CSS stylesheets, etc., which will follow the same process as above.
11) If static files are available in the cache, it will not need to make requests to the server
12) The browser sends further AJAX requests

27
Q

What is threading?

A

A thread consists of the information necessary to represent an execution context within a process. This includes a thread ID, a set of register values, a stack, to name a few.

Processes alone represent a single thread of control, so multiple threads of control within a given process can allow better handling of asynchronous events syncronously and execute tasks at the same time that were otherwise handled serially.

The benefit of threading is that all threads within a process share all resources of the process, including memory, program text, global and heap memory, stacks and file descriptors.

Threading with pthreads reduces OS overhead and improves overall performance time.

In UNIX, threads are implemented as pthreads.

28
Q

How are the stack and the heap used in low-level systems?

A

Heap = dynamic memory allocation at run time, slower, shared by threads

Stack = static memory allocation at compile time, faster, each thread gets their own

The stack is used for automatic variables, along with information that is saved each time a function is called. For each function call, first the return address and certain info about the caller’s environment are saved onto the stack. Then, automatic and temporary variables are added to the stack.

The stack operates in a LIFO manner and when a given function returns, the stack frame or block is open for use for the next function call.

The heap is used for dynamic memory allocation. There is no enforced pattern, so memory can be allocated and deallocated at any time.

29
Q

What is the difference between processes and threads?

A

Both are independent sequences of execution. The main difference is that threads (within a given process) run in a shared memory space, while processes run in separate memory spaces.

30
Q

What do SQL explain and analyze do?

A

Using the EXPLAIN command shows you the query plan that the planner creates for any query.

ANALYZE executes the query and displays the actual output.

31
Q

Describe query planning

A

PostgresQL or whichever RDBMS you’re using creates a query plan to run the most performant query given a query structure and data type.

The structure is a tree of plan nodes, the bottom of which are scan nodes.

The query plan outputs one line per plan node, with a summary line at the top.

It will look something like…

Seq Scan on employees (cost:0.00…38811 rows=1 width=25)

In order, these represent: a) start-up time (in disk page fetches) b) max time c) rows and d) estimated average width of output rows in bytes

Note:

“Rows” are not those scanned, but those output.

32
Q

What are ORMs? What security advantage do they provide?

A

ORM stands for “Object Relational Mapping”.

It allows you to query and manipulate data in a database with an object-oriented programming paradigm. This is usually implemented with a library (like ActiveRecord in Rails).

ORMs protect you from SQL injection attacks because you no longer need to use SQL to interact with your data, you can use your programming language.

Pros:

  • DRY, OOP
  • Sanitization via method calls
  • Many things are handled automatically
  • No ugly SQL

Cons:

  • Can be slower for advanced SQL queries
  • Learning curve
33
Q

How does chunked transfer encoding work?

A

Chunked transfer encoding is a streaming data transfer mechanism on HTTP 1.1.

The data stream is divided into a series of non-overlapping chunks that are all sent out and received independently.

No knowledge of anything but the current chunk is needed by the sender and receiver at any given time.

Benefit: Allows a server to maintain an HTTP-persistent connection for dynamically generated content.

34
Q

Does HTTPS mean my boss can’t see my browsing behavior? Why or why not?

A

They can see your behavior if you are using a company-provided computer. A company can simply add their self-signed certificate to your list of CAs, intercept your HTTPS requests and decrypt them, modify and view them.

HTTPS takes normal HTTP and layers an SSL encryption layer on top of it.

The same HTTP protocol occurs, but through an encrypted SSL connection that encrypts and decrypts the data sent back and forth.

The SSL connection is established by a handshake:

1) Client sends a ClientHello message with all info the server needs to connect
2) The server proves identity to the client through Certificate Exchange. Client verifies certificate and that it’s approved by a Certificate Authority.
3) Key exchange with symmetric algorithm.

35
Q

How does a web application control the caching behavior of its clients?

A

A web application can control caching through a caching policy.

The policy depends on A) the caching entity itself and B) primarily, the content owner through the use of HTTP headers.

Example frequently-used headers: Cache-Control, Expires, E-tag, Content-Length

Cache-control includes options like no-cache and no-store, public and private.

36
Q

Explain WebSockets

A

WebSockets provide a persistent, low-latency connection between a client and server that both parties can use to transfer data at any time.

Benefit: reduces overhead of regular HTTP requests / responses (such as long-polling) when a bidirectional persistent connection is desired.

1) Client initiates a hand-shake with the server via an HTTP GET request with an Upgrade: websocket header.
2) If the server supports WebSockets and agrees, it completes the handshake by sending an Upgrade WebSocket header in the response
3) WebSockets then replaces the HTTP connection still using TCP/IP.

Data is transferred via “messages” with different “frames” of a given payload.

37
Q

What’s the difference between TCP and UDP?

A

TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) differ in that TCP is connection-oriented and UDP is a connectionless protocol.

TCP:

  • Connection-oriented, which means that client and server should establish a connection before communicating over the internet and close it afterwards.
  • Reliable: Guarantees data delivery and performs error checking
  • Sequencing: packets arrive in order
  • Both use the transfer layer of TCP/IP

UDP

  • Connection-less - does not require extensive error-checking or connection
  • Not reliable - doesn’t guarantee delivery of data
  • Good for broadcast and multicast transmissions
  • Sequencing: packets are not guaranteed to arrive in order

Bottom-line:

Choose TCP for higher reliability (www, SSH, email), but expect higher overhead and slower performance.

Choose UDP for lower reliability but higher speed and efficiency (streaming, broadcasting, etc.)

38
Q

What are monitors? What’s the difference between locks and monitors?

A

A monitor is a synchronization construct that allows threads to both have mutual exclusion and the ability to block for a condition to become false.

The defining characteristic is that its method are executed with mutual exclusion.

They are a collection of procedures manipulating a shared data structure.

They have both a mutex lock and condition variables.

A monitor is also known as a thread-safe object that wraps a mutex to safely allow access to a resource by more than one thread.

Difference between locks and monitors
Monitors include both locks and condition variables.

39
Q

What are condition variables?

A

They are used in synchronization constructs to signal that a particular condition is true or false before a thread executes.

When used with mutexes, they allow threads to wait in a race-free way for arbitrary conditions to occur.

API: wait(), signal(), broadcast()

40
Q

What are readers-writer locks?

A

Reader-writer locks are similar to mutexes, except that they allow for higher degrees of parallellism. (Also known as “shared-exclusive” locks)

They are suited for read-heavy situations.

Three states are possible:

a) Locked in read mode
b) Locked in write mode
c) Unlocked

Only one thread can hold a lock in write mode, while several can hold it in read mode.

41
Q

What is a deadlock?

A

Deadlocks occur when a process or thread enters a waiting state because a resource is currently being held by another waiting process.

All processes are waiting on the other processes which are waiting on the other processes.

If a process indefinitely cannot update its state because the resources that it requests are being held by another waiting process, then a deadlock occurs.

Ex) A thread tries to lock the same mutex twice.
Ex2) One thread holds a mutex and blocks while trying to lock a second mutex, while another thread with the second mutex tries to lock the first mutex.

They can be avoided by controlling the order in which mutexes are locked. Or, you can use pthread_mutex_trylock, which causes you to release the locks you hold if you can’t acquire a lock.

42
Q

What are race conditions?

A

Race conditions occur when multiple processes are trying to do something with shared data and the final outcome depends on the order in which the processes run.

This happens when multiple threads or processes do not lock the resource when attempting to access at the same time.

Solution: Manage resources with mutexes or other concurrency management tools to ensure orderly access.

43
Q

What is concurrency?

A

It’s the ability of different parts of a program or algorithm to be executed out-of-order or partial-order without affecting the outcome.

This allows for parallel execution of the concurrent units which can increase overall speed of execution.

44
Q

What are system calls and how are they implemented?

A

A system call is the programmatic way in which a computer program requests a service from the kernel of the operating system it’s executed on.

Sample requested services are storage, memory, network, etc.

1) A kernel code snippet is run on request of user process at the highest level of privilege.
2) Old way was to use interrupts
3) LInux now uses SYSENTER and SYSEXIT instructions.

45
Q

Describe CPU caching.

A

A CPU cache is a hardware cache used by the CPU of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations.

To bridge the gap between registers (super fast, lowest-level memory that’s very expensive and space limited) and DRAM (very cheap but takes hundreds of cycles and is very slow), we have cpu caches.

L1, L2, L3 in decreasing speed and cost.

Caching is intended to reduce latency.

A cache hit rate refers to a hit at the highest (slowest) level of CPU cache.

The bottleneck is leaving the CPU die and heading to RAM.

The main aspect of cache-friendly code is locality, which says to place related data close in memory to allow for efficient caching.

46
Q

What are green threads?

A

Green threads are cooperative multitasking implemented by the programming language runtime, instead of being
preemptive multitasking from the OS.

These contrast to “native threads” which are provided by the OS.

Often mutex locks or monitors can be eliminated by proper scheduling.

Generally more lightweight and supported on systems that don’t allow native threads.

47
Q

What are the most common HTTP methods and what do they mean?

A

GET - Requests representation of resource. Only retrieves data.

HEAD - identical to GET but without the response body

POST - submits an entity to the resource, causing a change in state or side effects to the server.

PUT - Replaces current representation of resource with new payload.

PATCH - partial modification to resource.

DELETE - deletes specified resource

CONNECT - establishes a tunnel to the resource

OPTIONS - describes communication options for resource

TRACE - performs message loop-back test

48
Q

Describe networking at a high and low level.

A

Networking is the interconnection of multiple devices (“hosts”) connected over multiple paths for the purpose of sharing and receiving data.

Devices include router, hub, switch, etc.

The layout of a network is called its topology.

OSI: Open Systems Interconnection. It specifies standards for communication protocols and the functionalities of each layer.

Host name: unique identifier of device on network.
IP address: network address of the system across the network (32-bits)
MAC address: physical address of device
Port: logical channel where data can be sent/received
Socket: IP address + Port number

49
Q

What’s the difference between the internet and the web?

A

The Internet is a global network of smaller sub-networks that communicate through standardized protocols.

The Web is a subset of the Internet. It’s a system of servers that support specially formatted documents in HTML.

50
Q

Design and describe a REST API

A

A) REST APIs are designed around “resources”, which are any kind of object, data or serviced that a client can access.
B) Resources have unique identifiers (like URIs) that are used to access the resource
C) Client’s interact with a service by exchanging “representations” of resources (like JSON).
D) REST APIs use a “uniform interface” (use the standard HTTP methods)
E) Stateless. Requests are independent, atomic operations and all info should be in resource itself.
F) REST APIs are driven by hypermedia links contained in the representation (like an order JSON object with links to update the customer accordingly. This is so that a client can navigate the entire set of resources without prior knowledge of URI scheme.

To design the RESTful API:

1) Create simple routes / URIs
2) Choose the right HTTP methods to use for each of our services
3) Define resources properly
4) Return data (preferably JSON but XML works too)
5) Return the proper status code

51
Q

What are AVL trees?

A

An AVL is a self-balancing binary search tree. It was the first structure of its kind to be invented.

In an AVL, the heights of each child subtree may differ by at most 1. If the difference goes above one, rebalancing occurs to reduce the difference to 1.

Insertion, lookup and deletion all take O(log(n)) in the average and worst cases.

Insertions and deletion may require the tree to be rebalanced through rotations.

52
Q

What are the tradeoffs between SQL and NoSQL?

A

SQL db’s are table-based while NoSQL are document-based, key-value pairs or graph-based.

SQL is relational. NoSQL is non-relational.

SQL db’s are vertically scalable (on same server through RAM, CPU or SSD). NoSQL db’s are horizontally scalable (sharding, adding more servers)

  • NoSQL is good for “schemaless” data b/c you don’t need to define your schema up front and can modify it as you go easily.
  • NoSQL favors a denormalized schema due to no support for JOINs
  • NoSQL solutions are generally very scalable
  • Arguably, complex queries are better performed with SQL
53
Q

Implement breadth-first search

A

BFS is an algorithm for traversing a tree or a graph starting at the root and proceeding one depth level at a time.

Implementation

def bfs(node):
  queue = [node]
  while len(queue) > 0:
    current = queue.pop(0)
    current.[process]
    for child in current.children:
      queue.append(child)
    return processed

Time: O(v+e)
Space: O(v)

54
Q

Implement depth first search

A

(recursively)

def dfs(node):
  if node is None:
    return
  node.process
  dfs(node.left)
  dfs(node.right)

Time: O(v+e)
Space: O(v)

(iteratively)

def dfs(root):
  stack = [root]
  while len(stack) > 0:
    current = stack.pop(0)
    current.process
    if root.right:
      stack.append(node.right)
    if root.left:
      stack.append(node.left)
55
Q

How would you write a web crawler?

A

Source: https://www.quora.com/How-can-I-build-a-web-crawler-from-scratch-What-language-or-framework-would-you-recommend/answer/Chris-Heller

Your web crawler is really just a function. It’s input is a set of seed URLs (or entry points), and it’s output is a set of HTML pages (or results).

Inside this web crawler function is something called the “horizon” which is just a list containing all unvisited URLs that your crawler intends to visit.

The initial value of the horizon is your set of entry points.

The computation performed by this function is as follows:

Continuously remove a URL from the horizon
Issue an HTTP GET on the URL
Download the contents, add the contents to the results
Parse the contents for new URLs and append any unvisited URLs to the horizon.

Your crawl is complete when the horizon is empty.

56
Q

Explain dynamic arrays (arraylists in java) and how resizing works.

A

Dynamic arrays are arrays with automatic resizing.

This way, you don’t have to specify number of elements ahead of time.

Arrays are allotted a contiguous section of memory based on the capacity of the array. When the size reaches capacity, a new array is generated in another block of memory that is twice the size as the original array, and all of the previous elements are copied into this new array.

Worst case insertions take O(n) time, but an an amortized basis, they are O(1) due to the infrequency of the resizing.

In total, resizing takes O(2n) –> O(n), because resizing is expressed as N + n/2 + n/4 … 1, which is 2n.

Pros: Fast lookups, variable size, cache-friendly due to contiguous memory
Cons: slow worst-case appends. Costly inserts and deletions.

Time: 
Insert and delete: O(n)
Append: O(1) average and amortized, O(n) worst
Lookup: O(1)
Space: O(n)
57
Q

Find the kth smallest element in a binary search tree

A

Reference code

58
Q

What are the tradeoffs between lists and arrays?

A

Compared to arrays, they have O(1) insert and delete. Retrieval is slow O(n).

59
Q

Time and space complexity for stacks and queues

A

reference

60
Q

How do binary heaps work? What’s their time and space complexity?

A

reference

61
Q

Explain binary search trees and their time and space complexity

A

reference

62
Q

What are red-black trees?

A

reference