Web Application & Software Architecture 2 Flashcards
Working with NoSQL Databases:
Ease of Development:
Simplified Operations:
No stress of managing complex queries or relationships.
Efficiency:
Key-based object retrieval leads to faster operations.
Popular NoSQL Databases:
Industry Usage:
Examples include MongoDB, Redis, Neo4J, Cassandra, Memcache, etc.
Performance Comparison: SQL vs. NoSQL
Technology Performance:
Equality in Performance:
Relational and non-relational databases are equally performant from a technology benchmarking standpoint.
Dependence on System Design:
System design and architecture play a critical role in performance, more so than technology choice.
Performance Comparison: SQL vs. NoSQL
Tech Stack Choices:
Popular Tech Stack Preferences:
Tech stacks like MEAN (MongoDB, ExpressJS, AngularJS/ReactJS, NodeJS) often prefer NoSQL databases.
Reasons for Prevalence:
Convenience, availability of resources, and commercial factors influence tech stack choices.
Performance Comparison: SQL vs. NoSQL
Importance of Fit:
Use Case Alignment:
Focus on picking the technology that best suits the specific use case rather than following popular stacks blindly.
Performance Comparison: SQL vs. NoSQL
Performance Factors:
Application Architecture Impact:
Performance heavily reliant on architecture, database design, network latency, etc.
Complexity Impact:
Join-heavy relational databases may impact response times but can match NoSQL speed when simplified.
Real-World Examples:
Facebook and Quora:
Facebook’s MySQL Use:
Utilizes MySQL for storing user social graphs, making engine tweaks to suit its use case.
Quora’s Efficient MySQL Use:
Efficient partitioning of data in MySQL achieved at the application level.
Emphasis on Design:
Design Impact on Performance:
Well-designed SQL data stores often outperform less optimized NoSQL stores.
Polyglot Persistence:
Hybrid Database Use:
Polyglot Persistence Concept:
Leveraging multiple databases (SQL and NoSQL) in an application for varied persistence needs.
Common Practice:
Large-scale online services often use a mix of SQL and NoSQL for optimal persistence behavior.
Benefits of Polyglot Persistence:
Tailored Solutions:
Specific Data Needs:
Selecting the right database for each unique data storage and access requirement.
Enhanced Performance:
Optimized Performance:
Improved performance by leveraging specialized databases for different functionalities.
Diverse Features:
Utilizing Unique Features:
Accessing and utilizing specific database features tailored for distinct purposes.
Scalability and Flexibility:
Scalability Support:
Scalable solutions catered to varied data handling scenarios.
Drawbacks of Polyglot Persistence:
Complexity Concerns:
Increased Complexity:
Challenges in integrating, managing, and maintaining multiple databases together.
Learning Curve:
Diverse Skill Set:
Requires expertise across different database technologies.
Real-World Application:
Example Scenario:
Social Networking App Design:
Utilizing multiple databases (relational, key-value, wide-column, etc.) to serve different functionalities within the application (user relationships, session management, analytics, ads, search, etc.).
Multi-Model Databases
Integration of Different Models:
Support for Multiple Models:
Enable usage of various data models (graph, document-oriented, relational, etc.) within a single database system.
Unified Database System:
Eliminate the need for managing multiple databases or services for different data models.
Operational Simplification:
Reduced Complexity:
Minimize operational complexities associated with managing multiple persistence technologies.
Single API Integration:
Provides access to different data models through a unified API
Popular Multi-Model Databases:
Notable Examples:
ArangoDB:
Known for its multi-model capabilities supporting graph, document, and key-value data.
CosmosDB:
Microsoft’s offering providing multi-model support for various data types and APIs.
OrientDB:
Combines graph and document capabilities within a single database.
Eventual Consistency
Eventual consistency is a model where datastores prioritize high availability over immediate consistency across all nodes in a distributed system. It’s a fundamental concept in distributed systems, ensuring data eventually reaches a consistent state globally, even if momentarily inconsistent across different nodes or geographical regions.
Key Aspects of Eventual Consistency:
High Availability Focus:
Primary Objective:
Prioritize system availability and continuous write operations over immediate global consistency.
Data Propagation Delay:
Propagation Timeframe:
Data changes take time to propagate across distant nodes or geographic zones.
Momentary Inconsistencies:
Users may observe temporarily different or outdated data due to propagation delays.
Strong Consistency
Data must be consistently the same across all nodes at any given time, requiring locking nodes during updates to ensure synchronicity.
Strong Consistency
Social Application
In a microblogging site, implementing strong consistency would involve locking all nodes globally when a user in one zone updates a post, preventing concurrent updates until a consensus is reached.
Strong Consistency
Locking Nodes
Nodes are locked during updates to ensure only one user can modify data at a time until a global consensus is achieved.
Strong Consistency
Real-World Application: Stock Market System
In financial applications like stock markets, strong consistency ensures users across regions see the same stock prices, preventing chaos due to simultaneous updates.
Strong Consistency
Challenge of Strong Consistency: Scaling and Availability
Strong consistency impedes scalability and availability, limiting concurrent updates while ensuring data consistency.
Strong Consistency
Implementation Strategy: Queuing Write Requests
Managing write requests in a queue ensures strong consistency but can limit system scalability. Details on this are covered in a message queue chapter.
Strong Consistency
Impact on ACID Transactions:
Strong consistency enables ACID transactions but hinders the ability to scale globally due to concurrent update restrictions.
Strong Consistency
Tradeoff with NoSQL and Distributed Systems:
NoSQL databases prioritize scalability and availability, sacrificing global ACID transactions due to their inherent design.
Strong Consistency
Purpose of NoSQL Technology:
NoSQL was developed to scale and ensure high availability, compromising on strong consistency for these benefits.
Strong consistency ensures synchronized data but constrains system scalability and concurrent updates, contrasting with NoSQL’s emphasis on scalability and high availability.
CAP Theorem
In case of network failure, the system can prioritize either availability or consistency, not both simultaneously.
CAP
Trade-off Explanation
During node failures, prioritizing availability allows continued write operations, leading to potential inconsistency upon offline nodes’ return.
CAP
Consistency Priority
Prioritizing consistency requires locking nodes until all are online, ensuring data synchronization and strong consistency but impacting availability.
CAP
Decision Determinants
The choice between availability and consistency depends on use cases and business requirements, defining system behavior during failures.
CAP
Distributed System Impact
The CAP theorem mandates choosing between availability and consistency in distributed systems, impossible to achieve simultaneously.
CAP
Latency Acknowledgment
Nodes globally dispersed face latency issues, making instantaneous consensus impossible despite partition tolerance.
Designing systems that balance availability and consistency remains a fundamental challenge due to the CAP theorem’s implications.
The CAP theorem necessitates trade-offs between availability and consistency, reflecting the reality of distributed system design.
Different types of databases
Document-oriented database
Key-value datastore
Wide-column database
Relational database
Graph database
Time-series database
Databases dedicated to mobile apps and so on.
What defines document-oriented databases among NoSQL systems?
Document-oriented databases store data in a document-oriented model using independent documents, often in a JSON-like format.
Why are document-oriented databases seen as developer-friendly?
They align closely with the application code’s data model, making data storage and querying simpler for developers.
What are some popular document-oriented databases?
MongoDB, CouchDB, OrientDB, Google Cloud Datastore, and Amazon DocumentDB are among the popular choices for document-oriented databases.
In what scenarios should one consider using a document-oriented database?
Semi-structured data and an anticipated need for frequent schema changes.
Uncertainty about the initial schema with expectations of evolving requirements.
When rapid scalability and continuous high availability are crucial.
What are some typical use cases for document-oriented databases?
Real-time feeds, live sports apps, product catalogs, inventory management, user comments, and web-based multiplayer games are typical scenarios well-suited for document-oriented databases.
What defines a graph database in contrast to other types of databases?
Graph databases store data in nodes and edges representing relationships between entities, ideal for modeling complex relationships.
Why might developers prefer graph databases over relational databases for managing relationships?
Graph databases simplify querying complex relationships, eliminating the need for multiple joins often required in relational databases.
What are some real-world applications that leverage graph databases?
Examples include social networks (like Facebook’s graph search), recommendation engines, route planning (as seen in Google Maps), and NASA’s data storage for lessons learned from missions.
What distinguishes the graph data model from other database models?
Graph data models use vertices (nodes) and edges to represent entities and relationships, providing an efficient way to visualize and query complex data structures.
When should one consider using a graph database for a project?
Graph databases are ideal for scenarios involving complex relationships such as social networks, recommendation engines, fraud analysis, knowledge graphs, AI applications, and genetic data storage.
Name some popular graph databases used in the industry.
Neo4J is one of the prominent graph databases used, known for its efficient handling of complex relationships and real-time querying capabilities.
What is the primary feature that distinguishes key-value databases?
Key-value databases use a simple key-value pairing method, enabling quick data retrieval with minimal latency.
What are the typical use cases for key-value databases?
Use cases include caching, managing real-time data, persisting user sessions, implementing queues, creating leaderboards, and pub-sub systems.
Name some popular key-value databases.
Redis, Hazelcast, Riak, Voldemort, and Memcached are among the popular key-value data stores used in the industry.
Why are key-value databases suitable for caching application data?
They offer minimum latency with constant time O(1) for data fetching, making them ideal for use cases requiring super-fast data retrieval.
When should one consider choosing a key-value database for a project?
Key-value databases are best suited for scenarios where data needs to be fetched rapidly with minimal complexity in data retrieval operations.
Provide examples of real-world implementations using key-value databases.
Twitter utilizes Redis in its infrastructure, while Google Cloud implements caching using Memcached on its platform.
What type of data do time-series databases handle?
Time-series databases handle data associated with events occurring over time, often tracked from IoT devices, sensors, financial markets, etc.
Why is it essential to store massive amounts of time-series data?
Storing time-series data enables the study of user patterns, system behaviors, anomalies, and facilitates running analytics to derive insights for informed business decisions.
Name some popular time-series databases.
InfluxDB, TimescaleDB, and Prometheus are among the popular time-series databases used in the industry.
When should one consider using a time-series database?
Time-series databases are ideal for scenarios requiring continuous, real-time data management over extended periods, such as handling IoT device data or running real-time analytics.
What are some real-world implementations of time-series databases?
M3DB powers time-series metrics workflows at Uber, while Apache Druid is used for real-time analytics at Airbnb.
Define time-series data.
Time-series data consists of data points associated with events occurring over time, often collected from various sources like sensors, IoT devices, social networks, etc.
What type of databases belong to the NoSQL family and handle massive amounts of data, particularly Big Data?
Wide-column databases, also known as column-oriented databases, specialize in handling large volumes of data.
How are records structured in a wide-column database?
Records in a wide-column database consist of a dynamic number of columns and can hold billions of columns.
Name a few popular wide-column databases.
Cassandra, HBase, Google BigTable, and ScyllaDB are among the well-known wide-column databases used in various industries.
When is it recommended to use a wide-column database?
Wide-column databases are best suited for scenarios involving Big Data, offering scalability, high performance, and availability.