Short Answer Section Flashcards
What is a bloom filter and how would you implement one?
Bloom filters are a probabilistic data structure designed to quickly tell you with efficient memory whether an element exists in a given set. It can tell you if an item is definitely not in a set, or if it maybe is. It cannot give certainty that an item does exist in a set.
The tradeoff for optimized space is that you sometimes get false positives.
A bloom filter uses a Bit Vector (or bit array) as its base data structure.
First, you would hash an element a few times and then set the bits at the index of the hashes to one.
To test for membership in the set, you hash a string a number of times and check whether those values are set in the bloom filter.
If they aren’t, the item is not in the set. If they are, the item might be in the set, but might not if another element has the same hash indices.
Hashes must be uniformly distributed and quick (murmur or fnv).
Time complexity for insertion and membership testing: O(k) with k hashing functions.
Space complexity: O(m) where m is num of bits in the bit vector
Ex) Medium uses when recommending posts to readers by filtering out those already seen by readers.
What is a mutex?
A mutex stands for “mutual exclusion interface” and is basically a lock or binary flag that we set before accessing a shared resource and release when we’re done.
They are used to protect critical data by ensuring access by only one thread at a time.
While we hold the lock, any thread that tries to set it will block until we release it.
Type is pthread_mutex_t
What is a semaphore?
A semaphore is a counter used to provide access to a shared data object for multiple processes.
To obtain a shared resource, a process needs to:
1) Test the semaphore that controls the resource
2) If the value is positive, the process can use the resource and decrement semaphore by 1.
3) If value is 0, process sleeps until semaphore value is greater than 0.
4) When a process is done with resource, increments the semaphore by 1.
API: wait() and signal() or acquire() and release()
Binary semaphores are set to 1 and control just one resource, but a semaphore can be higher than 1 and control multiple units of a resource.
What’s the difference between a mutex and a semaphore?
A mutex is similar to a binary semaphore. The primary difference is ownership.
Only the task that locks a mutex can unlock it.
Whereas mutexes act as binary flags, semaphores can be thought of a signaling mechanism from one task to another. It uses wait() and signal() operations between producers and consumers.
What is a lock?
It’s a synchronization mechanism for enforcing limits on access to a resource in a multi-threaded environment.
Locks are used to enforce a mutual exclusion concurrency control policy - they can only be owned by one thread at any given time.
API: acquire() and release() or wait() and signal()
Examples are mutexes, binary semaphores and reader-writer locks.
Two generalizable types of locks
Advisory and mandatory. Advisory locks are when each thread needs to acquire the lock before accessing the resource.
Mandatory locks are advisory locks that throw exceptions when a process attempts to gain access without the lock.
Describe atomic operations
An operation that is composed of multiple steps, where either all the steps are performed successively or none are performed.
Any operation that requires more than one function call cannot be atomic.
Describe the relationship between locks and atomic operations
Locks generally require hardware support to be implemented efficiently.
Usually this is via atomic operations that allow a single process to test if the lock is free and if so, acquire it.
Atomic operations are required because two tasks can be testing and attempting to acquire and set the lock at the same time, which could result in a deadlock or livelock scenario.
Examples of atomic operations for this include “test-and-set”, “compare-and-swap” and “fetch-and-add.”
Describe hashing
Hashing is a mathematical function that maps keys to integers.
Hashing is deterministic. For a given input, the output will always be the same.
It’s also a one-way function, meaning the output cannot predict the input.
Ideally, hashing output is uniformly distributed.
How do hash tables work? What is the space and time complexity for its main API?
We use the output of a hash function to determine an index of the array where we will store a given item.
Pragmatically, hash tables are the best way to maintain a dictionary.
In steps:
1) Hash each key into a large integer.
2) Then, we mod it by the length of m (the number of bins) and play the item at that index.
If m is selected carefully (large prime not too close to 2^i + 1), the items should be uniformly distributed.
Collisions may occur when two items mod to the same index. In this case, we can use chaining or open addressing.
Time complexity:
Doubly-linked lists:
Insertion: O(n) worst-case (O(1) amortized)
Deletion: O(n) worst-case (O(1) amortized)
Search: O(n) (expected O(n/m)) O(1) average
Min, max, successor: O(m + n)
Open addressing:
Only difference is search is O(m)
Space: O(n)
How do deal with hash table collisions?
Chaining or open addressing.
With chaining, we use an array of pointers to linked-lists and each new collided element is added as a node to the linked list at that index. The tradeoff is that lists consume added memory.
The alternative is open addressing. An array consists of elements set to null. During insertion, if the spot is empty, we insert our element.
If taken, we find a different spot. Sequential probing takes the next spot available. The downside is that deletion can be a mess if we break the chain.
What is an XSS attack?
XSS stands for Cross-Site Scripting.
XSS is a security vulnerability found in web applications that injects client-side scripts into web pages viewed by other users.
An injected script can collect private data from the user’s computer, as well as session cookies and user account data.
There are persistent and non-persistent forms, persistent being the most dangerous. because the script is triggered every time a user views the page.
The most common solution is contextual output encoding/escaping.
What is CSRF?
CSRF stands for Cross-Site Request Forgery.
This happens when unauthorized commands are transmitted from a user that the application trusts.
Counter to XSS, which exploits a users trust in an application, CSRF exploits the trust an application has in its user.
An innocent user is tricked by an attacker into submitting a web request they didn’t intend when logged in. This can cause the user to inadvertently perform actions on the target site that can leak client or server data, change session state or manipulate a user’s account.
Example: Attacked creates fake link to a funds transfer, which the client clicks. Website validates request and transfers funds to attacker.
Simple preventative measure: authenticity token checks between cookies and request params whenever a form is submitted.
What is functional programming?
It’s a programming paradigm that treats computation as the result of functions and avoids changing state or mutating data.
It’s declarative and uses expressions instead of full statements.
What is object-oriented programming?
OOP is a programming paradigm based around “objects” that contain data called fields (or attributes) and code in the form of methods.
OOP uses statements and can mutate data..
Key concepts:
Abstraction: Helps to present information and an API for only what is relevant to user.
Inheritance: Can inherit methods, functions, properties, etc from parent class.
Encapsulation: Hides irrelevant info from user and helps prevent unauthorized access.
What are the most common security vulnerabilities for web applications?
1) Injection attacks (i.e. SQL injections)
2) Broken authentication (hint: use a framework)
3) Cross-site scripting (XSS) - Solution is to not return HTML tags to client or using regex
4) Insecure direct object references (like accessing internal download file to access critical resources)
5) Security misconfiguration (i.e. debugger in production, outdated software, etc.)
6) Sensitive data exposure (not hashing passwords or encrypting data) - use HTTPS to mitigate and strong hashing
7) CSRF
8) Using components with vulnerabilities
9) Bad redirects
What are database indexes and their benefits?
A database index is an auxiliary data structures that allows for faster retrieval of data in a database.
Benefits: They speed up reads for the column they are keyed off of, but they make writes slower and increase memory requirements.
Write and delete are slower because you have to update the index every time along with the data itself.
Memory increases by the size of the index, which only stores pointers to the data.
They are usually implemented with B-trees due to logarithmic insertion, deletion and retrieval. B-trees are also ordered and support inequalities and prefixes.
Bonus: R-trees are better for spacial data like finding locations within a certain radius.
What is a composite index?
It’s an index based on multiple columns.
If a function in C returns the pointer of a local integer variable set with a value, what will happen when you dereference it? Why?
It will most likely be undefined if the function has already returned and freed the item off of the stack.
The pointer will point to a value that will only live as long as the function runs.
Describe database normalization.
Normalization is a set of guidelines for designing a database in order to reduce redundancy and data corruption.
There are 7 steps, referred to as Normal Forms (NF).
1NF: Tables shouldn’t contain repeated groups of data (ex: single column for multiple phone numbers, or multiple phone number columns in “Person” table)
2NF: This requires that no field should only be partially dependent on any candidate key in the table. This does not only include the PK, but any fields combinations that would uniquely identify a row. (i.e. course name that depends on candidate key “Course ID”-“Semester”, as the semester has nothing to do with the course name.
3NF: Every field should only depend on the entire primary key, not part of it and not on any other field.
Why would de-normalizing a database be a good idea at times?
It can speed up reads and enforce constraints that you wouldn’t be able to otherwise.
For example, a check constraint comparing two columns of a table, like making sure all appointments fit within the working hours of a doctor.
What are foreign keys?
These define relationships between tables. When you want a row in one table to be linked to a row in another table, you place a FK column in the child table and use the value of the parent row’s PK as the value of the FK field.
How do free() and malloc() work?
Malloc allocates memory via a specified number of bytes, where the initial value of the memory is indeterminate.
Free deallocates memory allocated by malloc.
Malloc and Free are implemented via a circularly-linked-list of free memory blocks with a head consisting of the size and some admin data.
Malloc returns a non-null pointer if successfuly, else a null pointer.
Malloc finds the first free block of memory in the linked list that’s large enough, and allocates more memory from OS if necessary.
Free() deallocates the space point to by a pointer. This space is usually not returned to the OS, because this can cause gaps in the heap.
Instead, it puts it into a pool of available memory or free block list that can be allocated in later calls.
malloc(size) - returns pointer to space of size (size)
free(*p) - deallocates space pointed to by p
How do http cookies work?
A cookie is a small piece of data that a server sends to a user’s browser. The browser may then store it and send it back to the same server in the next request.
They store state for the stateless http protocol and most commonly are used for session management, personalization and tracking.
1) Server receives HTTP request and sends a Set-Cookie header in the response
2) Cookie is stored in web browser and then sent back to server in next request in a Cookie HTTP header
3) An expiration date or duration can be specified.
These are most commonly session cookies. Permanent cookies expire after a specific duration or on a specific date.