Web Application & Software Architecture 2 Flashcards

Question

# CAP Consistency Priority

Answer 1

Prioritizing consistency means blocking reads and writes on the affected nodes until the failed nodes are back online and all replicas are synchronized. This guarantees strong consistency but reduces availability.

Answer 2

The choice between availability and consistency depends on use cases and business requirements, defining system behavior during failures.

Answer 3

According to the CAP theorem, a distributed system cannot guarantee both availability and consistency during a network partition, so it must choose between them.

Answer 4

Globally dispersed nodes face network latency, so they cannot reach consensus instantaneously even in a partition-tolerant system. Balancing availability and consistency therefore remains a fundamental trade-off in distributed system design, as the CAP theorem implies.

Answer 5

Document-oriented databases, key-value data stores, wide-column databases, relational databases, graph databases, time-series databases, databases dedicated to mobile apps, and so on.

Answer 6

Document-oriented databases store data in a document-oriented model using independent documents, often in a JSON-like format.
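
As a rough illustration of the document model, the sketch below keeps two independent JSON-like documents in one collection. The field names and the `find` helper are hypothetical and do not correspond to any particular database's API.

```python
import json

# Two hypothetical product documents in the same collection.
# Each document is self-contained, and the two do not share a fixed schema.
products = [
    {"_id": 1, "name": "Laptop", "specs": {"ram_gb": 16, "cpu": "8-core"}},
    {"_id": 2, "name": "T-shirt", "sizes": ["S", "M", "L"], "color": "blue"},
]

def find(collection, **criteria):
    """Return documents whose top-level fields match all criteria."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

# Documents map directly to JSON text.
serialized = json.dumps(products[0])
```

For example, `find(products, color="blue")` returns only the T-shirt document, even though the laptop document has no `color` field at all.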

Answer 7

They align closely with the application code's data model, making data storage and querying simpler for developers.

Answer 8

MongoDB, CouchDB, OrientDB, Google Cloud Datastore, and Amazon DocumentDB are among the popular choices for document-oriented databases.

Answer 9

When the data is semi-structured and frequent schema changes are anticipated; when the initial schema is uncertain and requirements are expected to evolve; and when rapid scalability and continuous high availability are crucial.

Answer 10

Real-time feeds, live sports apps, product catalogs, inventory management, user comments, and web-based multiplayer games are typical scenarios well-suited for document-oriented databases.

Answer 11

Graph databases store data in nodes and edges representing relationships between entities, ideal for modeling complex relationships.

Answer 12

Graph databases simplify querying complex relationships, eliminating the need for multiple joins often required in relational databases.
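
A minimal in-memory sketch of why relationship queries are natural on a graph: finding "friends of friends" is a two-hop walk over edges rather than a multi-way join. The social graph and the `friends_of_friends` helper are hypothetical and only stand in for what a graph database does natively.

```python
# Hypothetical social graph: each node maps to the set of nodes it is connected to.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

def friends_of_friends(g, person):
    """People exactly two hops away who are not already direct friends."""
    direct = g[person]
    two_hops = set()
    for friend in direct:
        two_hops |= g[friend]       # follow each friend's edges
    return two_hops - direct - {person}
```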

Answer 13

Examples include social networks (like Facebook's graph search), recommendation engines, route planning (as seen in Google Maps), and NASA's data storage for lessons learned from missions.

Answer 14

Graph data models use vertices (nodes) and edges to represent entities and relationships, providing an efficient way to visualize and query complex data structures.

Answer 15

Graph databases are ideal for scenarios involving complex relationships such as social networks, recommendation engines, fraud analysis, knowledge graphs, AI applications, and genetic data storage.

Answer 16

Neo4j is a prominent graph database, known for its efficient handling of complex relationships and real-time querying capabilities.

Answer 17

Key-value databases use a simple key-value pairing method, enabling quick data retrieval with minimal latency.

Answer 18

Use cases include caching, managing real-time data, persisting user sessions, implementing queues, creating leaderboards, and pub-sub systems.

Answer 19

Redis, Hazelcast, Riak, Voldemort, and Memcached are among the popular key-value data stores used in the industry.

Answer 20

They offer minimal latency, typically fetching data in constant time, O(1), which makes them ideal for use cases that require very fast data retrieval.

Answer 21

Key-value databases are best suited for scenarios where data needs to be fetched rapidly with minimal complexity in data retrieval operations.

Answer 22

Twitter utilizes Redis in its infrastructure, while Google Cloud implements caching using Memcached on its platform.

Answer 23

Time-series databases handle data associated with events occurring over time, often tracked from IoT devices, sensors, financial markets, etc.

Answer 24

Storing time-series data enables the study of user patterns, system behaviors, anomalies, and facilitates running analytics to derive insights for informed business decisions.

Answer 25

InfluxDB, TimescaleDB, and Prometheus are among the popular time-series databases used in the industry.

Answer 26

Time-series databases are ideal for scenarios requiring continuous, real-time data management over extended periods, such as handling IoT device data or running real-time analytics.

Answer 27

M3DB powers time-series metrics workflows at Uber, while Apache Druid is used for real-time analytics at Airbnb.

Answer 28

Time-series data consists of data points associated with events occurring over time, often collected from various sources like sensors, IoT devices, social networks, etc.

Answer 29

Wide-column databases, also known as column-oriented databases, specialize in handling large volumes of data.

Answer 30

A record in a wide-column database has a dynamic number of columns, and a single record can hold billions of them.
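
The idea of a dynamic number of columns per record can be sketched with a dictionary of dictionaries. This is only a toy model under assumed names (`table`, `put`, `get_row`); it ignores the storage layout, partitioning, and replication that real wide-column stores provide.

```python
from collections import defaultdict

# row key -> {column name -> value}; each row may have a different set of columns.
table = defaultdict(dict)

def put(row_key, column, value):
    """Add or overwrite a single column in a row."""
    table[row_key][column] = value

def get_row(row_key):
    """Return a copy of all columns stored for a row (empty if the row is unknown)."""
    return dict(table.get(row_key, {}))

put("user:1", "name", "Ada")
put("user:1", "login:2024-01-01", "web")      # columns can be created on the fly
put("user:1", "login:2024-01-02", "mobile")
put("user:2", "name", "Linus")                # this row has only one column
```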

Answer 31

Cassandra, HBase, Google Bigtable, and ScyllaDB are among the well-known wide-column databases used in various industries.

Answer 32

Wide-column databases are best suited for scenarios involving Big Data, offering scalability, high performance, and availability.

Answer 33

Netflix employs Cassandra in its analytics infrastructure, while Adobe and other major companies use HBase for processing large volumes of data.

Answer 34

Wide-column databases are ideal for managing Big Data, offering scalability and high performance for analytical use cases.

Answer 35

1. Web-based multiplayer games
2. Product catalogues
3. Real-time feeds

Answer 36

Caching involves storing frequently accessed data from a database in RAM for faster response times, reducing latency, and improving throughput.

Answer 37

Caching ensures low latency and high throughput by intercepting and responding to frequent requests, allowing the database to focus on other operations.

Answer 38

By storing frequently requested data, caching reduces the need for database computations, especially for complex queries involving multiple table joins.

Answer 39

Caching can involve both static data (like images, CSS files) and dynamic data (such as frequently accessed database queries).

Answer 40

Dynamic data in a cache typically has a Time To Live (TTL), after which it is purged and replaced with fresh data. Cache invalidation ensures the data remains current.
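
A minimal sketch of TTL-based expiry, assuming a single-process in-memory cache. The `TTLCache` class and its injectable `clock` parameter are illustrative choices, not the API of any specific caching product.

```python
import time

class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock            # injectable so expiry can be tested
        self._store = {}              # key -> (value, expiry time)

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default            # never cached
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]      # purge the stale entry
            return default
        return value
```

After an entry expires, the caller is expected to fetch fresh data from the source and call `set` again.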

Answer 41

Caching might not be as effective for data that changes too often, like real-time stock prices or live sports scores, as the constant updates could invalidate the cached data quickly.

Answer 42

Caching static data reduces server load, speeds up content delivery, and minimizes data transfer by storing unchanging data closer to the user.

Answer 43

Static data can be cached on the client-side (in browsers), content delivery networks (CDNs), or on servers depending on the nature and sensitivity of the data.

Answer 44

Understanding the nature of the application's data, frequency of changes, and the balance between data volatility and potential performance improvements guides the decision to implement caching.

Answer 45

Caching helps in improving application performance by reducing data retrieval latency, especially with static data.

Answer 46

Caching can be utilized at various levels within an application, including client browsers for static data, with the database to intercept data requests, in REST API implementation, cross-module communication in microservices, and more.

Answer 47

Caching alleviates database load by intercepting data requests, reducing the need for repeated database queries and joins, leading to improved response times and reduced latency.

Answer 48

Even if the database experiences downtime, caching mechanisms continue to serve data requests, ensuring users experience uninterrupted service.

Answer 49

Caching is a fundamental part of the HTTP protocol. Caches can also store user sessions, and web applications primarily implement caching with key-value data stores.

Answer 50

While caching is beneficial, it should be implemented wisely to prevent data inconsistency issues, and it can be applied at various layers within an application architecture.

Answer 51

Caching can circumvent the need for performing complex joins in relational databases, significantly improving response times and application speed.

Answer 52

During database outages, the cached data can continue to serve user requests, ensuring a seamless experience even when the primary database is down.

Answer 53

Key-value data stores are predominantly utilized for implementing caching in web applications due to their ability to efficiently store and retrieve cached data.

Answer 54

The game used a caching mechanism (Memcache) to store updated stock prices every second, then scheduled batch operations to update the database at intervals, saving significantly on database write costs.

Answer 55

Polyhaven managed 5 million page views and 80TB traffic monthly for less than 400 USD by leveraging caching. Without caching, storing that data on cloud object-based storage could have cost them around 4K USD monthly.

Answer 56

Both applications saved on database storage costs by using caching mechanisms to store less critical or frequently updated data, reducing the frequency of database writes and associated expenses.

Answer 57

Instead of persisting updated stock prices every second, it cached the changes using Memcache and performed batch updates periodically, significantly reducing database write costs.

Answer 58

Caching helped minimize database write operations, utilizing cheaper cache storage (Memcache) compared to database storage costs, resulting in substantial cost savings for both applications.

Answer 59

Leveraging caching for non-critical or frequently updated data can significantly reduce storage costs, especially when using a more affordable cache storage solution compared to traditional database storage.

Answer 60

Caching reduced the frequency of expensive database writes by storing and updating data in a cache, allowing less critical or frequently updated information to be managed more cost-effectively.

Answer 61

Cache aside aims to minimize database hits by lazily loading data into the cache, fetching from the database if it's a cache miss, and updating the cache for future requests.
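
The cache-aside read path can be sketched as follows. Plain dictionaries stand in for the database and the cache, and the names `db`, `cache`, `stats`, and `get` are hypothetical.

```python
db = {"user:1": {"name": "Ada"}}    # stand-in for the database
cache = {}                           # stand-in for a cache such as Redis or Memcached
stats = {"db_reads": 0}              # counts how often the database is hit

def load_from_db(key):
    stats["db_reads"] += 1
    return db.get(key)

def get(key):
    """Cache-aside read: check the cache first, fall back to the DB on a miss."""
    if key in cache:
        return cache[key]            # cache hit, no database access
    value = load_from_db(key)        # cache miss
    if value is not None:
        cache[key] = value           # populate lazily for future requests
    return value
```

Calling `get("user:1")` twice reads the database only once; the second call is served from the cache.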

Answer 62

Read-through caching automatically maintains cache consistency with the database, handled by the cache library or framework, unlike cache aside where explicit logic updates the cache.

Answer 63

Write-through caching routes every write operation through the cache before updating the database, ensuring high data consistency between cache and database, albeit with a slight latency increase.
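
A minimal write-through sketch, assuming a dict-like object stands in for the database. The `WriteThroughCache` class is illustrative only.

```python
class WriteThroughCache:
    """Every write goes to the cache and is synchronously persisted to the database."""

    def __init__(self, db):
        self.db = db                  # any dict-like store standing in for the database
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value       # update the cache first...
        self.db[key] = value          # ...then persist before returning to the caller

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        return self.db.get(key)
```

The extra latency mentioned above comes from `write` not returning until both stores have been updated.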

Answer 64

Write-back caching directly writes data to the cache instead of the database, delaying database writes based on business logic, significantly reducing the frequency of database writes and associated costs.
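
A minimal write-back sketch, assuming a dict-like database and an explicit `flush` that the application would call on a schedule. The class and the ticker symbols are hypothetical.

```python
class WriteBackCache:
    """Writes land in the cache only; the database is updated later in a batch."""

    def __init__(self, db):
        self.db = db                  # dict-like stand-in for the database
        self.cache = {}
        self.dirty = set()            # keys changed in the cache but not yet persisted

    def write(self, key, value):
        self.cache[key] = value       # no database write happens here
        self.dirty.add(key)

    def flush(self):
        """Persist all pending changes in one batch and return how many keys were written."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        written = len(self.dirty)
        self.dirty.clear()
        return written
```

Repeated updates to the same key between flushes collapse into a single database write. Anything still in `dirty` is lost if the cache fails before `flush` runs.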

Answer 65

Cache aside is ideal for read-heavy workloads, caching data that doesn't frequently change, such as customer data, with a longer TTL.

Answer 66

Write-back caching risks data loss if the cache fails before updating the database. This strategy is often combined with others to balance performance and data reliability.

Answer 67

Read-through caching maintains cache consistency automatically through the cache library or framework, unlike cache aside where explicit logic is necessary for cache updates.

Answer 68

Write-through caching ensures high data consistency between the cache and the database, making it suitable for scenarios requiring strict data integrity.

Answer 69

Message queues follow the FIFO (First in, first out) principle, delivering messages in the order they are added to the queue.

Answer 70

Message queues enable asynchronous communication between modules, allowing them to communicate in the background without hindering their primary tasks.

Answer 71

An email service stores messages in a queue until the recipient is online, allowing asynchronous communication between sender and receiver without both needing to be online simultaneously.

Answer 72

Message queues allow tasks like sending confirmation emails in the background while users can continue navigating the application, enhancing the user experience.

Answer 73

A message queue executed the batch job responsible for updating stock prices at regular intervals in the database, optimizing application hosting costs.

Answer 74

The producer is responsible for sending messages, while the consumer receives and processes these messages. They don't need to reside on the same machine to communicate.
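
The producer and consumer roles can be sketched with Python's standard `queue.Queue` and a thread. Here both run in one process for simplicity; a real message queue would sit between separate services or machines. The `None` sentinel and the upper-casing "work" are illustrative assumptions.

```python
import queue
import threading

def producer(q, items):
    for item in items:
        q.put(item)               # enqueue messages in order
    q.put(None)                   # sentinel: no more messages

def consumer(q, out):
    while True:
        msg = q.get()             # blocks until a message is available
        if msg is None:
            break
        out.append(msg.upper())   # stand-in for real processing

def run(items):
    q = queue.Queue()             # FIFO queue
    out = []
    t = threading.Thread(target=consumer, args=(q, out))
    t.start()
    producer(q, items)            # the producer never waits for processing to finish
    t.join()
    return out
```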

Answer 75

Message queues enable defining rules for message processing, such as adding message priorities, acknowledgments, and handling failed messages.

Answer 76

In theory, a message queue is an unbounded buffer; in practice, its size is limited by the infrastructure resources the business provisions.

Answer 77

The publish-subscribe message routing model is widely used, allowing consumers to subscribe to specific message types and consume information as needed.

Answer 78

It enables single or multiple producers to broadcast messages to multiple consumers interested in specific topics or types of information.

Answer 79

Similar to a newspaper service delivering news to multiple subscribers, the pub-sub model delivers messages to multiple consumers subscribed to specific topics or segments.

Answer 80

Exchanges in message queues route messages to queues based on their type and established rules, acting as intermediaries between producers and consumers.

Answer 81

Some common exchange types include direct, topic, headers, and fanout, each with distinct functionalities and use cases.

Answer 82

The fanout exchange type excels in the pub-sub pattern by broadcasting messages to all connected queues, allowing multiple consumers to receive the same message.
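
A toy model of fanout routing, assuming in-memory deques as queues. The `FanoutExchange` class and queue names are hypothetical and do not mirror any broker's client API.

```python
from collections import deque

class FanoutExchange:
    """Copies every published message to all bound queues."""

    def __init__(self):
        self.queues = {}              # queue name -> deque of messages

    def bind(self, name):
        """Create a queue, bind it to the exchange, and return it."""
        self.queues[name] = deque()
        return self.queues[name]

    def publish(self, message):
        for q in self.queues.values():
            q.append(message)         # every bound queue gets its own copy
```

A consumer reading from one queue does not affect the copies held in the other queues.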

Answer 83

They power real-time feeds and notification systems, enabling users to receive continuous updates from followed pages or topics of interest.

Answer 84

The next lesson will cover the point-to-point messaging model, an alternative to the pub-sub pattern in message queue communication.

Answer 85

The point-to-point model involves a single producer sending a message to be consumed by only one consumer, akin to a one-to-one relationship compared to the one-to-many relationship in the publish-subscribe model.

Answer 86

Unlike broadcast-style messaging, the point-to-point model facilitates direct communication between entities, allowing only one consumer to receive and process a message from a producer.
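
In contrast to broadcasting, a point-to-point queue hands each message to exactly one receiver. The sketch below assumes a single in-memory queue; `PointToPointQueue` is an illustrative name.

```python
from collections import deque

class PointToPointQueue:
    """Each message is delivered to exactly one receiver, then removed."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        """Remove and return the oldest message, or None if the queue is empty."""
        return self._messages.popleft() if self._messages else None
```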

Answer 87

The widely used protocols in message queues are AMQP (Advanced Message Queuing Protocol) and STOMP (Simple Text Oriented Messaging Protocol), each of which has implementations in various messaging technologies.

Answer 88

A message queue facilitates real-time updates by distributing new posts to user connections, enabling asynchronous communication without the need for frequent database polling.

Answer 89

The pull-based approach relies on frequent database polling by users to retrieve new updates, which can be resource-intensive and lacks real-time synchronization. The push-based approach, with the aid of a message queue, immediately distributes new posts to connected users without the need for frequent polling, enhancing system performance and providing real-time updates.

Answer 90

The pull-based method leads to high database load due to frequent polling and doesn't offer real-time updates since new posts are only displayed upon database polling.

Answer 91

By employing message queues, the push-based method ensures immediate distribution of new posts to connected users without database polling, enhancing performance and enabling real-time updates.

Answer 92

Developers can address failure scenarios by implementing distributed transactions as a single unit, rolling back the entire transaction if either database persistence or message queue push fails. Additionally, failed messages can be stored in the database for future retrieval.

Answer 93

A message queue buffers concurrent update requests and processes them sequentially in FIFO order, ensuring high availability and consistency.

Answer 94

Facebook uses a message queue to manage surges in user requests during live streaming, fetching real-time data from the streaming server to populate the cache and serve queued requests efficiently.

Answer 95

By queuing update requests and processing them sequentially, a message queue helps maintain system consistency while handling a high volume of concurrent updates.

Answer 96

A pull-based approach is resource-intensive. In this approach, the application servers have to deal with a lot of unnecessary requests. Also, the notification delivery via a pull-based approach is not in real-time.

Answer 97

We can queue all the incoming requests to update a particular resource and then process them one by one instead of letting all the requests update the resource in no particular order, subsequently making things inconsistent.
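
A sketch of this idea, assuming many client threads submit updates while a single worker thread applies them one at a time. The function name and the use of numeric deltas are illustrative.

```python
import queue
import threading

def apply_updates_serially(initial, deltas):
    """Queue concurrent update requests and apply them one by one."""
    q = queue.Queue()
    state = {"value": initial}        # the shared resource

    def worker():
        while True:
            delta = q.get()
            if delta is None:         # sentinel: all requests have been queued
                break
            state["value"] += delta   # only this thread ever touches the resource

    t = threading.Thread(target=worker)
    t.start()
    # Clients enqueue concurrently; Queue.put is thread-safe.
    senders = [threading.Thread(target=q.put, args=(d,)) for d in deltas]
    for s in senders:
        s.start()
    for s in senders:
        s.join()
    q.put(None)
    t.join()
    return state["value"]
```

Because only the worker modifies the resource, no locks are needed around the update itself.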

Answer 98

The rise of the Internet of Things (IoT) has empowered entities to generate, transmit, and process vast amounts of data, enabling self-awareness and decision-making without human intervention.

Answer 99

IoT devices find extensive usage in industries, smart cities, wearables (like healthcare sensors, smartwatches), and gadgets (drones, cellphones), necessitating sophisticated backend systems to manage and derive meaningful insights from the data.

Answer 100

Processing streaming data aids businesses in making informed decisions, understanding customer needs and behavior, creating better products, executing effective marketing campaigns, and gaining deeper insights into the market, enhancing customer-centric approaches and loyalty.

Answer 101

Stream processing is instrumental in monitoring IoT device signals, ensuring correct functionality and uptime, providing critical metadata for businesses to assess product performance and service reliability.

Answer 102

Data ingestion is the process of collecting data from diverse sources, preparing it for system processing, and routing it through data pipelines for analysis and archiving.

Answer 103

The layers include Data Collection, Query, Processing, Visualization, Storage, and Security, each serving specific functions in the data processing ecosystem.

Answer 104

Data arrives from various sources in different formats and speeds; standardization is crucial to convert this heterogeneous data into a uniform format for seamless processing and analysis.
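
Standardization can be sketched as a function that maps source-specific records to one common shape. Both input formats and all field names below are hypothetical.

```python
from datetime import datetime, timezone

def normalize(record):
    """Map a source-specific record to a common {device, ts, celsius} shape."""
    if "temp_f" in record:
        # Hypothetical source A: Fahrenheit readings with epoch-second timestamps.
        return {
            "device": record["id"],
            "ts": datetime.fromtimestamp(record["time"], tz=timezone.utc).isoformat(),
            "celsius": round((record["temp_f"] - 32) * 5 / 9, 2),
        }
    # Hypothetical source B: Celsius readings with ISO 8601 timestamps.
    return {
        "device": record["device"],
        "ts": record["timestamp"],
        "celsius": record["temp_c"],
    }
```

Downstream processing can then rely on a single format regardless of which source produced the record.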

Answer 105

The data processing layer routes standardized data, performs business-specific processing, and directs data flows to different destinations based on requirements.

Answer 106

The data visualization layer presents insights obtained from data analysis in a comprehensible format using tools like Kibana, facilitating easy interpretation for stakeholders.

Answer 107

The data security layer ensures secure data movement and the prevention of security breaches throughout the data processing stages.

Answer 108

Real-time ingestion and batch ingestion are the two primary methods discussed. Real-time ingestion involves receiving data instantly, crucial for systems like medical or financial data analysis. Batch ingestion processes data at regular intervals and suits applications studying trends over time.
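
The difference between the two methods can be sketched as below. For simplicity, the batch variant flushes after a fixed number of events rather than on a timer; that substitution, like the function names, is an assumption made for illustration.

```python
def ingest_realtime(events, handle):
    """Hand each event to the handler as soon as it arrives."""
    for event in events:
        handle([event])

def ingest_batch(events, handle, batch_size):
    """Accumulate events and hand them to the handler in groups."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            handle(batch)
            batch = []               # start a new batch
    if batch:
        handle(batch)                # flush the final partial batch
```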

Answer 109

Real-time ingestion is favored in systems dealing with critical time-sensitive data, such as medical information (e.g., heartbeat monitoring) or financial data (e.g., stock market events). It ensures prompt availability of information crucial for decision-making.

Answer 110

Developers encounter several challenges, including:
1. Data Variety: Data from different sources arrives in varying formats, demanding standardization and transformation, which makes the process resource-intensive.
2. Resource Intensiveness: The process demands significant computing resources and time due to the data transformation and authentication stages.
3. Security Concerns: Moving data across stages poses security risks, requiring continuous effort to meet organizational security standards.

Answer 111

Real-time data ingestion provides immediate insights but might lack accuracy due to limited data availability for analysis. In contrast, batch processing considers the entire dataset over time, often resulting in more accurate results due to a comprehensive analysis.

Answer 112

Resource-intensive data flow processes demand substantial preparation before ingestion, requiring dedicated teams and, at times, the creation of custom solutions when available tools fail to meet specific needs.

Answer 113

Data movement across various stages poses vulnerability, necessitating continuous vigilance and additional resources to ensure the system meets security standards at every stage of data processing.

Answer 114

LinkedIn developed "Gobblin," an in-house data ingestion tool, to address the challenges of maintaining fifteen separate data ingestion pipelines, an instance where existing tools couldn't meet a company's requirements.

Answer 115

The rapid evolution of IoT devices results in changing data semantics, necessitating frequent updates in backend data processing code to adapt to these changes.

(141 cards)