Data-Intensive Applications Flashcards
Impedance Mismatch
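The disconnect between objects in application code and the relational model of tables, rows, and columns, which requires an awkward translation layer when converting objects to SQL tables and vice versa.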
Another name for rows in a DB?
Tuples
Schema-on-read
The structure of the data is implicit and only interpreted when the data is read.
Schema-on-write
The traditional approach of relational databases, where the schema is explicit and the database ensures all written data conforms to it.
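The contrast between the two approaches can be sketched as follows. This is a minimal illustration, not a recommendation of any particular database: it assumes an in-memory SQLite table with a `NOT NULL` column as the schema-on-write side and a list of raw JSON strings as the schema-on-read side. The `users` table, the sample documents, and the `read_name` helper are all hypothetical.

```python
import json
import sqlite3

# Schema-on-write: the table definition is explicit and enforced on insert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))

try:
    # This row violates the NOT NULL constraint, so the database rejects it.
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (2, None))
    write_rejected = False
except sqlite3.IntegrityError:
    write_rejected = True

# Schema-on-read: documents are stored as-is, whatever their shape.
stored_documents = [
    '{"id": 1, "name": "Ada"}',
    '{"id": 2, "first_name": "Grace", "last_name": "Hopper"}',
]

def read_name(raw: str) -> str:
    """Interpret a document's structure at read time, handling both shapes."""
    doc = json.loads(raw)
    if "name" in doc:
        return doc["name"]
    return f'{doc["first_name"]} {doc["last_name"]}'

names = [read_name(raw) for raw in stored_documents]
```

With schema-on-write the bad row never reaches storage; with schema-on-read both document shapes are stored, and the reading code carries the burden of interpreting them.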
Document database
Data is stored as self-contained documents, so a JSON representation can be quite appropriate.
Document oriented databases: MongoDB, RethinkDB, CouchDB and Espresso.
Imperative Languages
Tell the computer to perform operations in a particular order. This makes the code hard to parallelize across multiple cores and machines.
Declarative languages
Specify the pattern of the results, not the algorithm used to determine them. This lends itself to parallel execution, because the engine is free to decide how to compute the results.
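A small side-by-side sketch may help. It assumes a hypothetical list of `(name, family)` animal records and uses an in-memory SQLite database as the declarative engine; the function names are illustrative only.

```python
import sqlite3

animals = [("lion", "mammal"), ("shark", "fish"), ("tuna", "fish")]

def fish_imperative(rows):
    """Imperative: spell out the iteration order and how the result is built."""
    result = []
    for name, family in rows:
        if family == "fish":
            result.append(name)
    return result

def fish_declarative(rows):
    """Declarative: state the pattern of the result; the engine picks the algorithm."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
    conn.executemany("INSERT INTO animals VALUES (?, ?)", rows)
    cursor = conn.execute(
        "SELECT name FROM animals WHERE family = 'fish' ORDER BY name"
    )
    return [row[0] for row in cursor]
```

The loop fixes the order in which records are visited, while the SQL query leaves the database free to use an index, reorder the work, or split it across workers.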
Explain MapReduce?
MapReduce is a programming model for processing and generating large datasets across many machines. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. It was popularized by Google, which used it in its search indexing infrastructure, and it became a staple of big data processing.
Imagine you’re a teacher with a large class of students. You need to find out how many books the class read over the summer in total. Instead of counting everything yourself, which would be quite time-consuming, you employ the MapReduce strategy.
- Map: You divide the task among your students. Each student lists the books they read over the summer and counts them. This division and individual counting is like the “map” phase of MapReduce.
- Shuffle & Sort: After your students complete their lists, you gather the counts together and arrange them in order, so that everything belonging to the same result ends up in one place.
- Reduce: Lastly, you go through the sorted counts and sum them to get the total number of books read. This final aggregation is like the “reduce” phase of MapReduce.
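The three phases can be sketched in a single process as follows. This is a toy illustration under stated assumptions, not a distributed implementation: the `reading_log` data and the function names are hypothetical, and the key is the student, so the reduce step yields one count per student, which is then summed for the class total.

```python
from collections import defaultdict

# Hypothetical reading log: one (student, book title) record per book read.
reading_log = [
    ("ana", "Dune"),
    ("ben", "Emma"),
    ("ana", "Ulysses"),
    ("ana", "Hamlet"),
    ("ben", "Beloved"),
]

def map_phase(record):
    """Map: turn one input record into intermediate (key, value) pairs."""
    student, _title = record
    yield (student, 1)

def shuffle_phase(pairs):
    """Shuffle & sort: group all intermediate values by key, in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    """Reduce: merge all values that share a key into a single result."""
    return (key, sum(values))

def map_reduce(records):
    intermediate = [pair for record in records for pair in map_phase(record)]
    return [reduce_phase(key, values) for key, values in shuffle_phase(intermediate)]

books_per_student = map_reduce(reading_log)
class_total = sum(count for _student, count in books_per_student)
```

In a real cluster the map and reduce calls would run on different machines, and the shuffle would move intermediate pairs over the network so that each key lands on one reducer.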
Which data model is more appropriate for data with one-to-many relationships (tree-structured) or no relationships between records?
The document model.
Which data model is appropriate if you have many-to-many relationships?
The relational model can handle simple cases, but as the connections within your data become more complex, it becomes more natural to model your data as a graph.
What is the basic structure of a Property Graph?
Each vertex has a unique identifier, a set of outgoing edges, a set of incoming edges, and a collection of properties (key-value pairs).
Each edge has a unique identifier, a tail vertex (where the edge starts), a head vertex (where the edge ends), a label describing the relationship between the two vertices, and a collection of properties (key-value pairs).
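The vertex and edge structure described above could be modeled in memory roughly like this. It is a sketch only: the class names, the `connect` helper, and the sample vertices are assumptions made for illustration, and a real graph database would store and index these differently.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare by identity; vertices and edges reference each other
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list, repr=False)  # edges starting here
    incoming: list = field(default_factory=list, repr=False)  # edges ending here

@dataclass(eq=False)
class Edge:
    id: str
    tail: Vertex  # vertex where the edge starts
    head: Vertex  # vertex where the edge ends
    label: str    # kind of relationship between the two vertices
    properties: dict = field(default_factory=dict)

def connect(edge_id, tail, head, label, **properties):
    """Create an edge and register it on both of its endpoint vertices."""
    edge = Edge(edge_id, tail, head, label, properties)
    tail.outgoing.append(edge)
    head.incoming.append(edge)
    return edge

lucy = Vertex("v1", {"type": "person", "name": "Lucy"})
idaho = Vertex("v2", {"type": "state", "name": "Idaho"})
born_in = connect("e1", lucy, idaho, "born_in")
```

Because each vertex keeps both its outgoing and incoming edges, the graph can be traversed in either direction starting from any vertex.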
When is it beneficial to use asynchronous programming models?
Asynchronous programming models are beneficial when dealing with I/O-bound operations, such as network requests or file operations. By allowing tasks to run concurrently and asynchronously, it helps improve responsiveness and resource utilization in scenarios where tasks spend a significant amount of time waiting for external operations to complete.
For example, when an application calls a remote API asynchronously, it can continue executing other tasks while waiting for the response. Here’s a step-by-step breakdown:
- The application sends an asynchronous request to the API using a designated function or library. This function typically takes a callback function as a parameter or returns a Promise object.
- Instead of waiting for the API response, the application can continue executing other tasks while the request is being processed by the API. This ensures that the application remains responsive and can perform other operations in the meantime.
- Once the API response is received, the callback function (or the resolved Promise) is triggered, allowing the application to handle the response. This callback function typically takes the response data as a parameter and contains the logic to process and display the data.
- The application can then update the webpage or perform any necessary operations with the received data.
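The steps above can be sketched with Python's `asyncio`, where a `Task` with a done-callback plays a role similar to a Promise with a callback. This is an illustration under assumptions: `fetch_from_api` is a hypothetical stand-in that simulates network latency with `asyncio.sleep` rather than calling a real API.

```python
import asyncio

async def fetch_from_api(resource: str) -> dict:
    """Stand-in for a network call; a real client would await actual I/O here."""
    await asyncio.sleep(0.05)
    return {"resource": resource, "status": 200}

async def main() -> list:
    events = []

    def on_response(task: asyncio.Task) -> None:
        # Callback: runs once the response is available.
        events.append(f"handled response {task.result()['status']}")

    # 1. Send the request without waiting for the response.
    request = asyncio.create_task(fetch_from_api("/users/1"))
    request.add_done_callback(on_response)

    # 2. Keep doing other work while the request is in flight.
    for step in range(2):
        events.append(f"other work {step}")
        await asyncio.sleep(0)  # yield control so the request can make progress

    # 3. Wait for the request to finish, then let any pending callback run.
    await request
    await asyncio.sleep(0)
    return events

events = asyncio.run(main())
```

The "other work" entries are recorded before the response is handled, which is the point of the pattern: the application is not blocked while the request is outstanding.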
When is it beneficial to use threading models?
Threading models are beneficial in scenarios involving CPU-bound tasks or parallel processing, where tasks require heavy computation and can benefit from utilizing multiple processor cores. Threading allows for true parallelism, enabling tasks to execute simultaneously and speed up the overall processing time.
Describe how to use async/await functions.
async/await:
- The async/await syntax is a modern approach that makes asynchronous code appear more synchronous and easier to read.
- It is built on top of Promises and provides a way to write asynchronous code that looks similar to synchronous code.
- The async keyword is used to declare an asynchronous function, and the await keyword is used to wait for the Promise to be resolved before continuing execution.
- The try/catch block can be used to handle any errors that occur within the async function.
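A minimal sketch of the same ideas in Python, which has its own `async`/`await` keywords: coroutines take the place of Promises, and `try`/`except` takes the place of `try`/`catch`. The `load_profile` function and its failure rule are hypothetical and exist only to give the error handling something to catch.

```python
import asyncio

async def load_profile(user_id: int) -> dict:
    """Hypothetical async operation that fails for unknown users."""
    await asyncio.sleep(0.01)  # simulated I/O wait
    if user_id < 0:
        raise ValueError("unknown user")
    return {"id": user_id, "name": "Ada"}

async def describe_user(user_id: int) -> str:
    try:
        # await suspends this coroutine until the result is ready,
        # so the code reads top to bottom like synchronous code.
        profile = await load_profile(user_id)
        return f"user {profile['name']}"
    except ValueError as error:
        # Errors raised inside the awaited call surface here.
        return f"error: {error}"

async def main() -> list:
    # gather runs both coroutines concurrently and keeps the argument order.
    return await asyncio.gather(describe_user(1), describe_user(-1))

results = asyncio.run(main())
```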
Describe the major parts of the Circuit Breaker Pattern
The Circuit Breaker pattern works by monitoring the availability and responsiveness of a service. It maintains the state of the service (closed, open, or half-open) based on the observed behavior. Here’s a step-by-step breakdown of how the pattern operates:
- Closed State: Initially, the circuit breaker is in the closed state, allowing requests to pass through to the remote service/API as usual.
- Monitoring: The circuit breaker monitors the responses from the service. It keeps track of metrics such as response times, error rates, or timeouts.
- Thresholds: The observed metrics are compared against configured thresholds to determine when to open the circuit. For example, if the error rate exceeds a certain threshold or the response time exceeds a specified duration, the circuit breaker opens.
- Open State: When the circuit breaker opens, it stops any further requests from reaching the remote service/API. Instead, it immediately returns a predefined fallback response (e.g., an error or cached data) without forwarding the request.
- Wait Duration: After the circuit breaker opens, it stays open for a wait duration known as the “open state timeout.” During this period, every request is rejected immediately, giving the remote service/API time to recover.
- Half-Open State: After the wait duration elapses, the circuit breaker enters the half-open state. It allows a small number of requests to reach the remote service/API. If these requests succeed, it indicates that the service is now available, and the circuit breaker transitions back to the closed state. However, if any request fails, the circuit breaker reopens and returns to the open state.
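The state machine described above can be sketched as a small class. This is a simplified illustration, not how any particular library implements the pattern: it counts consecutive failures instead of tracking error rates or response times, allows a single trial call in the half-open state, and raises an exception instead of returning a fallback. The names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, `open_timeout`, and `clock` are all assumptions.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, open_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.open_timeout = open_timeout            # seconds to stay open
        self.clock = clock                          # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_timeout:
                # Fail fast: do not forward the request at all.
                raise CircuitOpenError("circuit is open")
            self.state = "half-open"  # timeout elapsed: allow a trial request
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Success closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A caller would wrap each remote call in `breaker.call(...)` and catch `CircuitOpenError` to serve a fallback such as cached data.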
By utilizing the Circuit Breaker pattern, you can achieve the following benefits:
- Fail Fast: Requests are quickly intercepted and don’t waste resources on unresponsive or failing services.
- Graceful Degradation: Instead of completely blocking requests, a fallback response is provided during service unavailability, ensuring that the application can still function partially.
- Avoid Cascading Failures: By isolating problematic services, the Circuit Breaker pattern prevents the propagation of failures to other parts of the system.
Implementations of the Circuit Breaker pattern exist for many programming languages and frameworks. Popular libraries include Hystrix (Java, now in maintenance mode), resilience4j (Java), and Polly (.NET).