Software Architecture Flashcards by David Stojanovski

What are main driving forces behind using a NoSQL database?

Better scalability than relational databases for cases with large datasets or high throughput requirements
Open source revolution
Query operations that relational models don’t do well
Desire for a more expressive data model than a rigid schema imposed by relational databases

How well did you know this?

Not at all

Perfectly

What is a key problem with the relational data model for applications that are commonly written today, and what is a way to mitigate this problem?

Most business applications are written in object-oriented languages, which require a translation layer between the inherit representation of an object and how it is represented in tables, rows and columns. ORM frameworks partially abstract away this mismatch between models.

How well did you know this?

Not at all

Perfectly

What are some of the inherent advantages to a JSON data model and when are these advantages realised?

Nested structures are better-suited to database entries that are largely self-contained, like a profile or a blog post with associated comments in a one-to-many tree structure
Closer relationship between the object model from OOP and the JSON model for representing the data
Better locality of related data, as all related data is contained within the structure, rather than requiring joins’
No schema, so the model is more flexible if the exact fields in an entry cannot be rigidly defined

How well did you know this?

Not at all

Perfectly

What is the primary idea behind normalisation in databases?

Removing duplication of meaningful data in databases that will be shared across multiple records in the database by instead using an ID foreign key mapping to a table with standardised values.

How well did you know this?

Not at all

Perfectly

What are the main benefits of normalising a database?

Updating information needs to be done in only one place and will be replicated consistently across all records, such as when a country name changes
Ensures consistency for values that exist within a logical set, e.g. countries and cities
Easier and more semantically significant search, e.g. determining nearby users by storing metadata about their profile location
Easier localisation by having human-readable labels translated into multiple different languages, but all associated with the same primary key

How well did you know this?

Not at all

Perfectly

What are the major drawback of a document-based model?

If there is a many-to-one where many records reference one common record, meaning it would need to be stored as an ID in the document to avoid duplication and multiple queries may be required to retrieve the related data
Even though a model may not have originally required many-to-one or many-to-many relationships or joins, it may evolve over time into a more interconnected structure that does and document databases have limited support for joins
Denormalising data or replicating joins in code can lead to worse maintainability, reliability and performance for the data model

How well did you know this?

Not at all

Perfectly

What are the main advantages of a relational database model over a document database model?

Better support for joins
Better support for many-to-one relationships
Better support for many-to-many relationships

How well did you know this?

Not at all

Perfectly

Why would one denormalise a database?

If using a non-relational data model (such as a document database), support for joins is typically more limited, so denormalising data (by duplicating common data across records), reduces the need for replicating joins in application code.

How well did you know this?

Not at all

Perfectly

What is the distinction between schema-on-read and schema-on-write models in a database?

For a schema-on-read model, no explicit schema is enforced on data added to the database (as in a JSON document model), and the structure of a record is only known once it is read and interpreted by application code. By contrast, schema-on-write databases (like conventional relational databases), enforce a structure on each record as it’s inserted in the database. This guarantees that any record read from the database will match certain constraints.

How well did you know this?

Not at all

Perfectly

Under which circumstances does the difference between a schema-on-read and a schema-on-write database become most apparent and what is the distinction on a high level?

The distinction arises when updating the structure of records in that model, such as fields in a document or columns in a particular table
In a schema-on-read model, application code handles cases where certain records have the old structure and new records, after the change, have a different structure
On a schema-on-write model, a migration must be done on existing records so that they conform with the updated schema, but no special casing in application code is subsequently required

How well did you know this?

Not at all

Perfectly

Under which situations would one prefer a data model without an explicit schema, as opposed to an explicit schema-on-write model?

When there are many different types of objects (in terms of the names and nature of their fields) or they change so frequently that it would be impractical to maintain them as separate tables and perform regular migrations
When the data being stored in the database comes from an external source and there are no guarantees on the structure of data from that source

How well did you know this?

Not at all

Perfectly

When is the data locality provided by a document database model disadvantageous?

As the document is stored in a serialised format, the whole document needs to be read even though only one field, or a relatively subset of the fields, are needed
When updating a field in a record, the whole record needs to be rewritten. This especially becomes a problem if the record increases in size as a result of writes and limits how writes can be done in-place

How well did you know this?

Not at all

Perfectly

What is a good compromise between the choices of using a relational database and a document database in an application that doesn’t perfectly fit either model?

One option would be to use support in modern relational databases for document column types, enabling a relational structure with document querying aspects
The logical alternative is to use a document database with built-in support for joins or does joins on document references implicitly to simplify application code

How well did you know this?

Not at all

Perfectly

What are the major differences between a declarative query language like SQL and an imperative query model?

Declarative languages are more limited in expressive power, but are more concise than imperative queries to write
Declarative languages make no inherent assumptions about the ordering of data records, while there is no guarantee that an imperative query does not assume records are ordered
Declarative languages make no assumptions about how a query is processed by the database, enabling the database to optimise queries more transparently
Imperative code is harder to parallelise, whereas databases can freely use distributed or parallel implementations of a query language

How well did you know this?

Not at all

Perfectly

What is MapReduce and when would it be useful?

MapReduce is a programming model for read-only queries over a large set of documents distributed over many machines
It is useful in scenarios where data needs to be aggregated en masse

How well did you know this?

Not at all

Perfectly

What is one implementation of MapReduce in a document-based model and what are its key components?

Study These Flashcards

MongoDB implements MapReduce
The key components are two functions
- Map: Processing function performed on each record
- Reduce: Aggregates all records processed by map

What is one major drawback of MongoDB’s MapReduce implementation and what did MongoDB introduce to mitigate it?

Study These Flashcards

MapReduce involves authoring two carefully-coordinated functions to correctly aggregate data
The aggregation pipeline mitigates this issue by providing a declarative JSON syntax for querying records

When would it be advantageous to use a graph-like data model and what is the primary advantage?

Study These Flashcards

Graph-like data models are useful when the objects in your data model are all highly-connected and have a high degree many-to-many relationships
By employing a standard graph structure with vertices and edges, standard graph algorithms can be used when analysing data in the network

What are the key components of a property graph model?

Study These Flashcards

A table representing the vertices in the graph, which has a unique identifier and a set of key-value pairs (properties)
A table representing the edges in the graph, has a unique identifier, properties, and most importantly foreign keys pointing to the head and tail node for the edge

What are some important aspects of the property graph model and what advantages does each provide?

Study These Flashcards

No schema defines which nodes can be connected to other nodes, enabling easy evolution of the model
By using indexes on the head and tail node foreign keys, graph traversal is efficient
Edges can define arbitrary relationship types through their properties, enabling the expression of a mix of model types in a graph

What is a real example of a graph database and a query language for that graph database?

Study These Flashcards

Neo4j is one graph database

2. Cypher is an example of a query language for that database

What is a key feature of the Cypher query language that allows it to efficiently query items in a graph database?

Study These Flashcards

Cypher’s expressive graph traversal queries allows defining queries through relationships succinctly, e.g.

MATCH
(person) -[:BORN_IN]-> () -[:Within*0]-> (us:Location {name:’Australia’})
RETURN person.name

This allows the query to traverse any number of edges (:Within*0) that ultimately point back to the Australia vertex, efficiently searching multiple paths.

What is one method that SQL implements graph database query structures and what is a major disadvantage of it?

Study These Flashcards

SQL has recursive common table expressions, which can be used to represent an arbitrary number of joins in a query
Its main disadvantage is that is rather clumsy to use in comparison with a native graph database query structure like Cypher

What is a triple-store model?

Study These Flashcards

A triple-store model is similar to a property graph model, and each graph entry is stored as the following triple: (subject, predicate, object), where

The subject is a vertex in the graph
The predicate is a graph edge if the object is also a vertex, otherwise it defines a property
The object, which is either a vertex (if describing a relationship) or a literal if describing a property of a vertex

What is SPARQL and when would one use it?

SPARQL is an efficient query language that was written to deal with triple-store databases, and is an efficient and concise choice when working with graph databases that use this model, e.g. when working with the Resource Description Framework (RDF).

Software Architecture Flashcards

Kleppman: Designing Data-Intensive Applications (25 cards)