Technical Interview Flashcards
Caching means saving requested data to a faster or closer data store, so that the data can be served more quickly the next time it is requested.
Caching takes advantage of the locality of reference, which is the tendency to access the same information over and over again. For more details refer to locality in the glossary.
Caching offers the following benefits:
1) Reduces user wait time
2) Saves network bandwidth
3) Eliminates unnecessary computation time
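As a minimal sketch of these benefits in Python, assume a hypothetical slow_square function that stands in for an expensive computation or remote call. The counter shows that a repeated request is answered from the cache instead of being recomputed.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive work actually runs

@lru_cache(maxsize=128)
def slow_square(n: int) -> int:
    """Hypothetical stand-in for an expensive computation or remote call."""
    global call_count
    call_count += 1
    return n * n

first = slow_square(12)   # cache miss: the function body runs
second = slow_square(12)  # cache hit: the stored result is returned
```

After both calls, call_count is still 1, because the second request never reached the function body.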
API is an acronym that stands for application programming interface.
An API exposes programming functions to third-party developers. These developers can incorporate and execute those functions in their code. These third-party developers cannot see or change how the underlying functions are implemented.
For example, a ridesharing app can request and display map data via the Google Maps API.
REST stands for Representational State Transfer. REST is a way for two systems to communicate over HTTP, similar to how web browsers communicate with servers.
REST is a widely used architectural style for exchanging information between two systems. Older alternatives, such as SOAP, are often considered more complex and verbose.
An API is called a REST API or RESTful API when it follows these six design principles:
1) Client-server architecture. The client and server applications are separate. Each one can change independently from one another.
2) Stateless. The client application’s state is not stored on the server. Instead, each request carries all the information the server needs to process it.
3) Cacheable. Responses must indicate whether they can be cached, so that clients and intermediaries can reuse them.
4) Uniform interface. The API must have a uniform interface. It should use descriptive naming conventions and consistent link and data formats, such as JSON.
5) Layered system. The API should work across layers. For example, APIs, data, and authentication systems can sit on different servers, and the client does not need to know which layer it is talking to.
6) Code on demand. This optional principle allows the server to return executable code to the client.
RESTful APIs typically use standard HTTP verbs such as:
GET - retrieves information from a server
POST - writes new information to a server
PUT - updates existing information on a server
DELETE - removes information from a server
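The four verbs can be sketched as a small dispatcher over an in-memory store. Everything here is a hypothetical illustration: the handle function, the notes store, and the choice of status codes are assumptions, and a real API would sit behind an HTTP server and usually let the server assign IDs on POST.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory "server" state: note id -> note text.
notes: Dict[int, str] = {}

def handle(method: str, note_id: int, body: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Map an HTTP verb onto a store operation and return (status code, body)."""
    if method == "GET":
        if note_id in notes:
            return 200, notes[note_id]
        return 404, None
    if method == "POST":
        notes[note_id] = body          # write new information
        return 201, body
    if method == "PUT":
        if note_id not in notes:
            return 404, None
        notes[note_id] = body          # update existing information
        return 200, body
    if method == "DELETE":
        if note_id not in notes:
            return 404, None
        del notes[note_id]
        return 204, None
    return 405, None                   # verb not supported

created = handle("POST", 1, "buy milk")
fetched = handle("GET", 1)
updated = handle("PUT", 1, "buy oat milk")
deleted = handle("DELETE", 1)
gone = handle("GET", 1)
```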
Tell me about “Reading from Cache”
When reading from the cache, sometimes the requested data is there. Sometimes not. We refer to this data availability as a cache hit or miss:
Cache Hit: Data is available in cache
Cache Miss: Data not available in cache
Along with data availability, we care about cache freshness. Freshness refers to whether the cache’s information is up to date. Out of date, or stale, data can be a concern.
For example, stale bank data can frighten both clients and banks. However, an older version of a personal web page can be less catastrophic.
The most common way cache systems determine freshness is with age. Cache systems often delete cache data that exceed an age threshold, which we call time-to-live (TTL).
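A minimal sketch of age-based freshness, assuming a hypothetical TTLCache class. The clock is injectable so the example can simulate time passing without sleeping.

```python
import time

class TTLCache:
    """Tiny cache whose entries expire after ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock      # injectable so expiry can be simulated
        self._entries = {}      # key -> (value, time stored)

    def put(self, key, value):
        self._entries[key] = (value, self.clock())

    def get(self, key):
        """Return the value on a fresh hit, or None on a miss or stale entry."""
        entry = self._entries.get(key)
        if entry is None:
            return None                     # cache miss
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._entries[key]          # stale: older than the TTL
            return None
        return value                        # cache hit

# Simulated clock: a one-element list the lambda reads from.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("balance", 100)
fresh = cache.get("balance")   # read at t=0, within the TTL
now[0] = 61.0
stale = cache.get("balance")   # read at t=61, past the TTL
```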
What are the three common cache writing policies?
- Write-back: Write to the cache only; permanent storage is updated later
Pro: Low latency and high throughput
Con: Potential data loss, especially if the cache holds the only copy during a crash
- Write-through: Write to cache and permanent storage at the same time
Pro: Data consistency between the cache and storage
Con: Higher latency. Every write operation has to be performed twice
- No-write (AKA write-around): Write to permanent storage only
Pro: Cache isn’t flooded with write requests
Con: Higher latency. A read for recently written data will create a cache miss
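The three policies can be compared side by side in a short sketch. Two dictionaries stand in for the cache and permanent storage; the function names, the dirty set, and the flush step are assumptions made for illustration.

```python
cache = {}      # fast store
storage = {}    # permanent store (hypothetical stand-in for a database)
dirty = set()   # keys written to the cache but not yet persisted

def write_back(key, value):
    cache[key] = value
    dirty.add(key)              # persisted later by flush()

def write_through(key, value):
    cache[key] = value
    storage[key] = value        # both writes happen before returning

def write_around(key, value):
    storage[key] = value
    cache.pop(key, None)        # assumption: drop any stale cached copy

def flush():
    """Persist write-back entries; anything not yet flushed is lost in a crash."""
    for key in dirty:
        storage[key] = cache[key]
    dirty.clear()

write_back("a", 1)
a_persisted_before_flush = "a" in storage   # False: only the cache has it
flush()
write_through("b", 2)
write_around("c", 3)
```

After these calls the next read of "c" misses the cache, which is the latency cost of the write-around policy.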
Tell me about “Replacing the Cache”
Caches do not have infinite space, so a cache system needs rules on what should be removed (evicted) first. Here are some of the most common policies:
1) Least Recently Used (LRU)
2) Most Recently Used (MRU)
3) Random Replacement
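As one example, least recently used (LRU) replacement can be sketched with Python's OrderedDict. The LRUCache class below is a hypothetical illustration, not a production cache.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()   # least recently used first, most recent last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now more recently used than "b"
cache.put("c", 3)    # over capacity, so "b" is evicted
```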
Machine Learning
Machine learning refers to algorithms that perform tasks based on inference rather than rules.
These inferences are derived from mathematical models that “learn” from large amounts of structured data.
For example, a developer can create a music recommendation service based on rules, such as an explicit rule that says if a user likes artist A then recommend music from artist B.
Machine learning is different. Instead of explicit rules, a machine learning algorithm is given training data. Based on that training data, the algorithm infers which data points predict a particular outcome.
Examples of training data for a music recommendation service include:
1) Listening history. Users who listened to one song are likely to enjoy other songs played by listeners with similar histories. This is called collaborative filtering.
2) Keywords. The song’s metadata or lyrics can provide clues. For example, the song’s metadata might indicate that the song is appropriate for toddlers. Or the song lyrics might indicate that the song is related to New York.
3) Audio file analysis. The music recommendation service might analyze the song file’s characteristics including tempo, loudness, key and time signature.
Other popular applications of machine learning include:
- Fraud detection
- Self-driving cars
- Voice recognition
- Email spam detection
- Shopping and movie recommendations
What are “features” when referring to Machine Learning?
A feature is a property of an event, observation, or data point. Here are some examples:
1) For a music recommendation, a song’s tempo combined with genre may accurately predict music one would like.
2) In email spam detection, the word “FREE” in the subject line may help predict spam.
3) In eCommerce, whether a user is using an Apple device may help predict how likely that user is to make a purchase.
Choosing features is very important. It can significantly impact prediction accuracy.
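A rough sketch of feature extraction for the spam example. The feature names and the extract_features function are hypothetical; which features actually help is something a real project would have to measure.

```python
def extract_features(subject: str, sender: str) -> dict:
    """Turn a raw email into named features (all feature names are hypothetical)."""
    return {
        "has_free": "FREE" in subject,
        "exclamation_count": subject.count("!"),
        "subject_length": len(subject),
        "sender_is_dot_com": sender.endswith(".com"),
    }

features = extract_features("FREE tickets!!!", "promo@example.com")
```

A model never sees the raw email, only this dictionary, which is why the choice of features affects prediction accuracy so strongly.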
What is “Training data” when referring to Machine Learning?
Training data, as the name implies, is data that trains machine learning models.
What is “Validation Data” when referring to Machine Learning?
After a machine learning model has been trained, the validation data set, which is kept separate from the training data, is processed to gauge the model’s accuracy.
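A small sketch of how training and validation data work together. The data set, the 80/20 split, and the one-threshold "model" are all assumptions chosen to keep the example short.

```python
import random

# Hypothetical labeled examples: (number of "!" in the subject, is_spam).
examples = [(0, False), (1, False), (0, False), (5, True), (4, True),
            (6, True), (1, False), (7, True), (0, False), (5, True)]

random.Random(0).shuffle(examples)   # fixed seed so the split is repeatable
split = int(len(examples) * 0.8)
training, validation = examples[:split], examples[split:]

# "Training": place a threshold midway between the two class averages.
spam = [x for x, is_spam in training if is_spam]
ham = [x for x, is_spam in training if not is_spam]
threshold = (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2

def predict(x):
    return x > threshold

# Validation: measure accuracy on examples the model never saw.
correct = sum(predict(x) == is_spam for x, is_spam in validation)
accuracy = correct / len(validation)
```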
Machine Learning: Supervised vs. Unsupervised Learning
Supervised learning trains a machine learning model on labeled data, where each input comes with the expected output.
Unsupervised learning trains a model on unlabeled data, so the model must find patterns without being told the expected output.
For example, a self-driving car application may be given a clear (labeled) input that certain intersections have red stop signs. Based on this labeled data, the machine learning algorithm can infer that cars should stop when they encounter intersections with stop signs. This is an example of supervised learning.
However, a self-driving car application may not be given labeled data indicating which intersections have red stop signs. Instead, it would have to infer from the data available to it, when the car should stop. For example, it may wrongly infer that:
Stop: When it approaches an intersection and other cars are slowing down.
Not stop: When it approaches an intersection and other cars aren’t present.
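The difference can be sketched on a toy data set. The speeds, the labels, and both tiny algorithms (a nearest-average classifier and a two-group clustering loop) are assumptions for illustration, not how a real self-driving system works.

```python
# Hypothetical observations: speed (mph) of a car nearing an intersection.
speeds = [2, 3, 4, 28, 30, 33]

def mean(values):
    return sum(values) / len(values)

# Supervised: every observation comes with a label to learn from.
labels = ["stop", "stop", "stop", "go", "go", "go"]
averages = {
    label: mean([s for s, l in zip(speeds, labels) if l == label])
    for label in set(labels)
}

def classify(speed):
    """Predict the label whose average training speed is closest."""
    return min(averages, key=lambda label: abs(averages[label] - speed))

# Unsupervised: no labels, so the algorithm can only discover groups.
def two_groups(values, rounds=10):
    low, high = min(values), max(values)    # initial group centers
    for _ in range(rounds):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(near_low), mean(near_high)
    return sorted(near_low), sorted(near_high)

groups = two_groups(speeds)
```

The supervised model can answer "stop" or "go". The unsupervised one only reports that two groups exist; deciding what each group means is left to whoever interprets the result.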
Machine Learning: Neural networks
A neural network is a type of machine learning model. Like other machine learning models, it learns from data rather than from explicit rules.
The neural network’s quirky name comes from the neurons in the human brain. That is, each unit in a neural network processes signals (i.e. data) like a neuron and can then signal the other units it is connected to.
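A minimal sketch of that signalling, assuming a sigmoid activation and made-up weights. Two hidden neurons receive the input signals and pass their outputs on to a third neuron.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of incoming signals passed through a sigmoid activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

# Two hidden neurons feed one output neuron (all weights are made-up values).
signals = [1.0, 0.0]
hidden = [
    neuron(signals, [0.5, -0.5], 0.0),
    neuron(signals, [-1.0, 1.0], 0.5),
]
output = neuron(hidden, [1.0, 1.0], -1.0)
```

Training a network means adjusting the weights and biases so that the final output moves closer to the desired answer.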
Machine Learning: Batching
Batching is the concept of aggregating portions of the training data together. These aggregations are “learned” together, instead of one by one. Batching reduces the number of times gradients are calculated to adjust weights through backpropagation.
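A sketch of the effect on the number of weight updates. The model is a single weight fitted to hypothetical data for y = 2x, and the learning rate and epoch count are arbitrary assumptions.

```python
def batches(data, batch_size):
    """Yield consecutive slices of the training data."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

# Hypothetical training pairs for y = 2x; the model is a single weight w.
data = [(x, 2 * x) for x in range(1, 9)]

def train(batch_size, epochs=50, learning_rate=0.01):
    w, updates = 0.0, 0
    for _ in range(epochs):
        for batch in batches(data, batch_size):
            # One gradient of the mean squared error for the whole batch.
            gradient = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * gradient
            updates += 1
    return w, updates

w_single, updates_single = train(batch_size=1)    # one update per example
w_batched, updates_batched = train(batch_size=4)  # one update per batch of 4
```

Both runs end near w = 2, but the batched run performs 100 weight updates instead of 400.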
Machine Learning: Deep Learning
Deep learning is a group of machine learning methods based on neural networks with many layers. This includes variations of neural networks such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs).
What is Microservice Architecture?
Also called microservices, microservice architecture is a way of designing an application as a set of loosely coupled services or submodules.
An example of this is an eCommerce store. There is an account service, an inventory service, and a shipping service, each with its own database. A mobile app and a browser would both tie into these loosely coupled services.
Because the services are loosely coupled, they can be developed and run independently. This makes the application easier to maintain, test, and scale. The microservices can even be written in different programming languages.
This is often considered better than a monolithic application, which is developed from start to finish as a single unit. Monolithic applications tend to become tightly coupled as they grow, making them hard to debug, maintain, or extend.
What is MapReduce?
MapReduce is a way of dividing a large chunk of work between multiple computers. These computers can do the work in parallel (this is the map phase of MapReduce). When the parallel work is completed, a final collation or summary is performed (the reduce phase of MapReduce) before returning the result. Apache Hadoop is a popular open-source MapReduce framework, and Apache CouchDB uses MapReduce to build its views.
A developer would use MapReduce to speed up the time it takes to process large sets of data. For example, a search on Facebook among billions of users, groups, businesses and pages would take too long on a single computer. Instead, a cluster of many computers can each search a portion of Facebook’s databases and return their partial result to the reducer, which could then combine with other partial results and present to the user.
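MapReduce is also useful for sorting. Unsorted data may be spread across a cluster, where each machine sorts its own portion. The sorted partials are then given to a reducer, which can merge them more quickly than a single machine could sort the whole data set.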
MapReduce may also be used to divide work between processes on a single machine. Querying a database with millions of records is much faster when several worker processes search assigned parts of the database in parallel.
MapReduce can process and transform data too. A photo sharing service may automatically generate thumbnails for user images. When an album is created, thumbnails can be created in parallel on many machines (map) and then combined onto the web page (reduce).
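The two phases can be sketched on a single machine with a word count. Threads stand in for the separate computers here, and the chunks and function names are assumptions for illustration; a real framework would also handle splitting the input and moving data between machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_count(chunk):
    """Map step: count the words in one chunk independently of the others."""
    return Counter(chunk.split())

def merge(total, partial):
    """Reduce step: fold one partial count into the running total."""
    total.update(partial)
    return total

# Hypothetical input, already divided into one chunk per worker.
chunks = ["the cat sat", "the dog sat", "the end"]

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(map_count, chunks))   # map phase, in parallel

word_counts = reduce(merge, partials, Counter())   # reduce phase
```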
What is CAP Theorem?
The CAP theorem states that between consistency, availability, and partition tolerance, a system may possess at most two traits simultaneously.
1) Consistency. Data is the same across all components of the system. Changes to data are reflected immediately; there is no discrepancy between data values.
2) Availability. The ability for the system to be unwaveringly operational and responsive.
3) Partition tolerance. The system continues to operate with communication breaks between system or data partitions.
In practice, any system complicated enough to warrant system design will need to scale onto multiple machines, so partition tolerance is almost always a given.
This leaves a choice between consistency and availability. In a perfect world, we would have all three, but a choice must be made. Consistency and availability tradeoffs can be made independently for different services within a microservice-based system architecture. That is, each microservice can choose to emphasize consistency or availability, depending on its goals.
Example Tradeoffs and their emphasis:
Search Engine: Availability; older search results are tolerable if it means the user doesn’t have to wait
Banking Apps: Consistency; inaccurate data is unacceptable
Concert Tickets: Consistency; not acceptable to sell the same concert ticket twice
Airfare search: Availability; near impossible to show 100% accurate data given volume of searches and combinations
Impatient users have emphasized the importance of quick results (availability) over accurate results (consistency). As a result, it’s more acceptable to have eventual consistency. Refer to the glossary for the definition of eventual consistency.
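A rough sketch of what choosing availability looks like in code: writes are accepted immediately on one copy and reach the other copies later, so a read in between can return out-of-date data. The Primary and Replica classes and the explicit replicate step are assumptions for illustration.

```python
class Replica:
    """A copy of the data that serves reads."""

    def __init__(self):
        self.data = {}

class Primary(Replica):
    """Accepts writes immediately and copies them to replicas later."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
        self.pending = []               # writes not yet replicated

    def write(self, key, value):
        self.data[key] = value          # respond right away (availability)
        self.pending.append((key, value))

    def replicate(self):
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()

replica = Replica()
primary = Primary([replica])
primary.write("seats_left", 41)
read_before = replica.data.get("seats_left")   # replica has not caught up yet
primary.replicate()
read_after = replica.data.get("seats_left")    # the copies now agree
```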
Discuss Horizontal Scaling
Horizontal scaling is running software across multiple machines simultaneously as opposed to running that software on a single machine.
In CAP terms, horizontal scaling supports better availability and partition tolerance. On the other hand, horizontal scaling makes consistency harder to maintain because the data in the different partitions may not be synchronized.
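One common way to spread data across machines is to hash each key and use the result to pick a machine. The sketch below assumes three hypothetical machine names and a simple modulo scheme.

```python
import hashlib

# Hypothetical machine names; in practice these would be real hosts.
machines = ["node-a", "node-b", "node-c"]

def machine_for(key: str) -> str:
    """Hash the key so the same key is always routed to the same machine."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return machines[int.from_bytes(digest[:8], "big") % len(machines)]

placement = {user: machine_for(user) for user in ["alice", "bob", "carol", "dave"]}
```

With plain modulo hashing, adding or removing a machine moves most keys to a different machine, which is one reason larger systems often use consistent hashing instead.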
Horizontal scaling is often contrasted with vertical scaling.
Discuss Vertical Scaling
Vertical scaling is moving software to a more powerful machine. That machine could have more memory, more storage, or faster CPUs.
While it can be straightforward, vertical scaling has a few problems.
Cost
The fastest CPUs, along with the biggest memory and storage upgrades, are very expensive. The ROI decreases as you pursue the best upgrades. In other words, you get diminishing returns.
Hardware Limits
After a certain point, CPUs, memory, and storage cannot be increased further, regardless of cost.
Application Size
Your application may be too intensive to be processed or stored on a single computer, regardless of cost.
Vertical scaling is often contrasted with horizontal scaling. Given the cost and limitations of vertical scaling, horizontal scaling is increasingly the more popular choice.
What do you know about TCP?
TCP stands for Transmission Control Protocol. TCP provides a specific method and structure for sending packets of data over an IP network such as the internet. TCP provides the following:
Ordered Packets
The receiver knows the exact position, within the total transmission, that each packet belongs in.
Acknowledgements
Receivers send “acknowledgements” to the sender for the packets they receive, and the sender retransmits anything that goes unacknowledged. This ensures that there are no missing packets.
Congestion Control
TCP controls the rate at which data is transmitted between the sender and receiver to avoid overwhelming a congested network.
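Ordering and acknowledgement can be sketched with numbered packets. This is a simplified model under stated assumptions: real TCP numbers bytes rather than whole packets and acknowledges them cumulatively, and the receive function here is hypothetical.

```python
def receive(packets, expected_count):
    """Reassemble a message from (sequence number, payload) pairs in any order."""
    received = {}
    for seq, payload in packets:
        received[seq] = payload     # a duplicate simply overwrites itself
    # Sequence numbers that never arrived and so cannot be acknowledged.
    missing = [seq for seq in range(expected_count) if seq not in received]
    message = "".join(received[seq] for seq in sorted(received))
    return message, missing

# Packets arrive out of order, and packet 2 is lost in transit.
arrived = [(1, "lo "), (0, "hel"), (3, "ld")]
partial, missing = receive(arrived, expected_count=4)

# The sender retransmits whatever was not acknowledged.
arrived.append((2, "wor"))
message, still_missing = receive(arrived, expected_count=4)
```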
UDP, or user datagram protocol, is another way to send packets across an IP network.
UDP is often compared with TCP. UDP skips TCP’s ordering and acknowledgement guarantees, which makes it faster but less reliable. Each one has advantages and disadvantages; the choice to use one over another depends on the situation.
Most websites use TCP transmission. SSH connections also use TCP.
UDP is used by real-time streaming applications such as Skype and FaceTime.