Big Data Lecture 11 Document Stores Flashcards
Why do we need Document Stores?
We need to rebuild, for XML and JSON, the same stack we already have for tables: starting from how the data is stored as text, every layer has to be built again!
How can we make trees fit into tables?
We can:<br></br><ul><li>store flat trees (no nesting) directly as rows of a table,</li><li>split nested data into several relational tables linked by keys,</li><li>fill the fields missing from heterogeneous data with NULLs.</li></ul>
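The three techniques can be sketched on two small documents. This is a minimal illustration, not a general shredding algorithm: the sample data and the `to_tables` helper are hypothetical, only top-level scalar fields become columns, and only the `tags` array is split into a child table.

```python
# Hypothetical documents: "tags" is nested (an array) and "email" is missing from one of them.
docs = [
    {"id": 1, "name": "Ada", "email": "ada@example.org", "tags": ["math", "code"]},
    {"id": 2, "name": "Alan", "tags": ["logic"]},
]


def to_tables(documents):
    """Split documents into a flat main table and a child table linked by the parent id."""
    # Columns of the main table: union of all scalar (non-list) fields seen in any document.
    columns = sorted({k for d in documents for k, v in d.items() if not isinstance(v, list)})
    # Heterogeneity: a field missing from a document becomes None (a SQL NULL).
    main = [tuple(d.get(c) for c in columns) for d in documents]
    # Nesting: every array element becomes its own row, pointing back to its parent's id.
    tags = [(d["id"], tag) for d in documents for tag in d.get("tags", [])]
    return columns, main, tags


columns, main, tags = to_tables(docs)
# columns == ['email', 'id', 'name']
# main    == [('ada@example.org', 1, 'Ada'), (None, 2, 'Alan')]
# tags    == [(1, 'math'), (1, 'code'), (2, 'logic')]
```

The more nesting and heterogeneity the data has, the more child tables and NULLs this produces, which is the motivation for storing the trees natively instead.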
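What is the maximum document size in a document store such as MongoDB?
At most 16 MB per document (MongoDB's BSON document size limit); document stores are designed for many small documents rather than a few huge ones.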
Which functions of RDBMS do Document Stores implement?
<ul><li>Projection,</li><li>Selection,</li><li>Aggregation,</li><li>but NOT Joins! (To be implemented on the user side.)</li></ul>
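Since the join is left to the user, the application has to fetch both collections and combine them itself. A minimal sketch, assuming the two lists stand in for the results of two separate queries; `client_side_join`, the collection contents, and the field names are hypothetical. It uses a hash join: index one side by key, then probe it once per document of the other side.

```python
def client_side_join(left, right, left_key, right_key="_id", as_field="joined"):
    """Hash join done by the application: index `right`, then probe it for each `left` doc."""
    index = {doc[right_key]: doc for doc in right}
    result = []
    for doc in left:
        match = index.get(doc.get(left_key))
        if match is not None:  # inner join: documents without a partner are dropped
            result.append({**doc, as_field: match})
    return result


# Hypothetical query results from two collections.
orders = [
    {"_id": 1, "customer_id": 10, "total": 25},
    {"_id": 2, "customer_id": 11, "total": 40},
    {"_id": 3, "customer_id": 12, "total": 5},
]
customers = [{"_id": 10, "name": "Ada"}, {"_id": 11, "name": "Alan"}]

joined = client_side_join(orders, customers, "customer_id", as_field="customer")
# Order 3 has no matching customer, so only orders 1 and 2 appear, each with the customer embedded.
```

Embedding the matched document mirrors how the result would naturally look as a tree, rather than as a widened flat row.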
Is data in Document Stores validated? If so, when?
If a schema is provided, data is validated when it is inserted (at population time).<br></br><br></br>A schema can also be added later; the stored data is then validated against it.
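The two moments of validation described above can be sketched with a toy schema that maps field names to Python types. This is an assumption made for illustration: real systems use richer schema languages (MongoDB, for instance, accepts JSON Schema validators) and differ in whether existing data is rechecked, and the `Collection` class here is hypothetical.

```python
class ValidationError(Exception):
    pass


def validate(doc, schema):
    """Check that every schema field is present with the expected Python type."""
    for field, expected_type in schema.items():
        if field not in doc:
            raise ValidationError(f"missing field {field!r}")
        if not isinstance(doc[field], expected_type):
            raise ValidationError(f"field {field!r} is not of type {expected_type.__name__}")


class Collection:
    """Hypothetical in-memory collection with optional schema validation."""

    def __init__(self, schema=None):
        self.schema = schema
        self.docs = []

    def insert(self, doc):
        # Validation at population time, only if a schema is set.
        if self.schema is not None:
            validate(doc, self.schema)
        self.docs.append(doc)

    def set_schema(self, schema):
        # Schema added later: the documents already stored are checked first.
        for doc in self.docs:
            validate(doc, schema)
        self.schema = schema  # only reached if every stored document passed
```

Without a schema, `insert` accepts any document, which matches the schema-optional nature of document stores.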
What are implementations of Document Stores?
MongoDB, Elasticsearch, MarkLogic, ArangoDB…
How is data loaded into MongoDB?
It is ETL'd: extracted, transformed, and loaded into the store. Unlike with a data lake, we no longer query the raw files where they lie!
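The three ETL steps can be sketched on a small CSV input. This is a minimal illustration under stated assumptions: the CSV content, the `etl` helper, and the choice to drop empty fields and convert `age` to an integer are hypothetical, and a plain list stands in for the collection.

```python
import csv
import io

# Hypothetical raw input; Alan has no city.
RAW = "name,age,city\nAda,36,London\nAlan,41,\n"


def etl(raw_csv, collection):
    # Extract: parse the raw text into rows.
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Transform: drop empty fields instead of storing "", and convert types.
        doc = {k: v for k, v in row.items() if v != ""}
        if "age" in doc:
            doc["age"] = int(doc["age"])
        # Load: the store keeps its own copy; later queries never read the CSV again.
        collection.append(doc)
    return collection


people = etl(RAW, [])
# people == [{'name': 'Ada', 'age': 36, 'city': 'London'}, {'name': 'Alan', 'age': 41}]
```

The last comment in the loop is the contrast with a data lake: after loading, the raw file is no longer the thing being queried.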
How is data stored in MongoDB?
Using a binary encoding of JSON called BSON (documents are stored as BSON even when they are validated against a schema).
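To make "binary encoding of JSON" concrete, here is a sketch of a BSON encoder for a tiny subset of the format: a flat document whose values are strings or 32-bit integers. It covers only two element types (0x02 string and 0x10 int32) and is not a replacement for a real BSON library; `encode_bson` is a hypothetical name.

```python
import struct


def encode_bson(doc):
    """Encode a flat dict with str and 32-bit int values as a BSON document."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # field names are NUL-terminated C strings
        if isinstance(value, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # Type 0x02 (string): int32 byte length including the NUL, then the bytes.
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # Type 0x10 (int32): four bytes, little-endian.
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # Total length counts its own 4 bytes, the elements, and the trailing 0x00.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"
```

For example, `encode_bson({"hello": "world"})` yields the 22 bytes `b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"`. The length prefixes let a reader skip a whole document or string without scanning its contents, which a textual JSON parser cannot do.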
What is the CRUD paradigm?
The four basic operations that lower-level APIs offer on data:<br></br><ul><li>Create,</li><li>Read,</li><li>Update,</li><li>Delete.</li></ul>
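The four operations can be sketched as a tiny in-memory store keyed by a generated `_id`. The `DocumentStore` class and its method names are hypothetical and only mirror the C, R, U, D letters; they are not the API of any particular product.

```python
import itertools


class DocumentStore:
    """Hypothetical in-memory store offering the four CRUD operations."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)  # generates 1, 2, 3, ...

    def create(self, doc):
        doc_id = next(self._ids)
        self._docs[doc_id] = {**doc, "_id": doc_id}
        return doc_id

    def read(self, doc_id):
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None  # a copy, so callers cannot mutate the store

    def update(self, doc_id, changes):
        if doc_id not in self._docs:
            return False
        self._docs[doc_id].update(changes)
        return True

    def delete(self, doc_id):
        return self._docs.pop(doc_id, None) is not None


store = DocumentStore()
ada = store.create({"name": "Ada"})
store.update(ada, {"name": "Ada Lovelace"})
store.delete(ada)
```

Higher-level interfaces, such as the selection and projection features on the next cards, are built on top of these primitives.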
How does selection work in Document Stores?
We select the documents whose attribute matches a given value. We can also reach into nested fields with dot notation (not to be confused with matching a nested object, which only succeeds if it has exactly the given children).<br></br><br></br>We can combine conditions in a disjunction, or use a range query.<br></br><br></br>Querying for null also matches documents in which the field is absent.<br></br><br></br>We can also check whether an array contains a given value.
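The selection rules above can be sketched as a small matcher over Python dicts. The query shape borrows MongoDB's style (`"a.b"` paths, `$or`, `$gte`/`$lt`), but this is a simplified stand-in, not MongoDB's implementation: only four comparison operators are supported, and exact nested matches here ignore field order, which MongoDB does not. The sample data and all function names are hypothetical.

```python
import operator

# Sentinel for "field not present", distinct from an explicit null (None).
_MISSING = object()

# Comparison operators supported by this sketch (a small subset of MongoDB's).
_OPS = {"$gt": operator.gt, "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


def get_path(doc, path):
    """Follow a dotted path such as "address.city"; return _MISSING if any step is absent."""
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


def match_value(value, cond):
    """Test one field value against one condition."""
    if isinstance(cond, dict) and cond and all(k.startswith("$") for k in cond):
        # Range query, e.g. {"$gte": 18, "$lt": 65}: every operator must hold.
        if value is _MISSING:
            return False
        try:
            return all(_OPS[op](value, arg) for op, arg in cond.items())
        except TypeError:  # incomparable types simply do not match
            return False
    if cond is None:
        # Searching for null also matches documents where the field is absent.
        return value is None or value is _MISSING
    if isinstance(value, list):
        # Array check: match if any element equals cond (or the whole array does).
        return cond in value or value == cond
    # Plain equality; for a dict cond this is an exact match of the nested object.
    return value == cond


def matches(doc, query):
    """All top-level conditions must hold; "$or" takes a list of alternative queries."""
    for key, cond in query.items():
        if key == "$or":
            if not any(matches(doc, alt) for alt in cond):
                return False
        elif not match_value(get_path(doc, key), cond):
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


people = [
    {"name": "Ada", "age": 36, "address": {"city": "London", "zip": "N1"}, "tags": ["math", "code"]},
    {"name": "Alan", "age": 41, "address": {"city": "Manchester"}, "tags": ["logic"], "email": None},
    {"name": "Grace", "age": 85, "tags": []},
]
```

With this data, `{"address.city": "London"}` finds Ada, while `{"address": {"city": "London"}}` finds nobody, because Ada's address also has a `zip`. The query `{"email": None}` returns all three people: Alan has an explicit null and the others lack the field.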
How does projection work in Document Stores?
We mark with 1 the fields we want to project and with 0 the fields we want to project away. The two cannot be mixed (the only exception being _id): since there is no fixed set of fields, we can only list the fields we want or the fields we certainly do not want.
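The inclusion/exclusion rule can be sketched for top-level fields. This is a simplified model in the spirit of MongoDB's projections, assuming flags of 1 and 0 and treating `_id` as kept unless explicitly set to 0; the `project` helper and the sample document are hypothetical.

```python
def project(doc, projection):
    """Keep only the 1-fields, or drop the 0-fields; mixing both is rejected."""
    # _id is the only field whose flag may differ from the others.
    others = {k: v for k, v in projection.items() if k != "_id"}
    modes = set(others.values())
    if len(modes) > 1:
        raise ValueError("cannot mix inclusion (1) and exclusion (0) in one projection")
    keep_id = projection.get("_id", 1) == 1
    if modes == {1}:
        # Inclusion: return only the listed fields (plus _id unless excluded).
        return {k: v for k, v in doc.items() if k in others or (k == "_id" and keep_id)}
    # Exclusion: return everything except the listed fields.
    return {k: v for k, v in doc.items() if k not in others and (k != "_id" or keep_id)}


doc = {"_id": 7, "name": "Ada", "age": 36, "city": "London"}
# project(doc, {"name": 1})            == {"_id": 7, "name": "Ada"}
# project(doc, {"name": 1, "_id": 0})  == {"name": "Ada"}
# project(doc, {"age": 0, "city": 0})  == {"_id": 7, "name": "Ada"}
# project(doc, {"name": 1, "age": 0})  raises ValueError
```

A mixed projection would be ambiguous: for a field listed nowhere, it is unclear whether it should be kept, and without a fixed schema there is no full column list to resolve that.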
How can we aggregate the results of a query in a document store?
We can use sort, count, skip, limit, distinct, …, which work as in an RDBMS. We can also use 'aggregate', which takes a pipeline of stages, much like chaining transformations in a Spark query.
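The pipeline idea can be sketched as a loop that feeds each stage's output into the next stage. The stage names (`$match`, `$group`, `$sort`, `$skip`, `$limit`) and the `$sum` accumulator follow MongoDB's naming, but this is a toy: `$match` supports only equality, `$group` supports only `$sum`, and the `aggregate`/`group` functions and sales data are hypothetical.

```python
import operator


def group(docs, spec):
    """$group with only the $sum accumulator: {"_id": "$field", out: {"$sum": "$f" or number}}."""
    key_field = spec["_id"][1:]  # "$city" -> "city"
    groups = {}
    for doc in docs:
        key = doc.get(key_field)
        acc = groups.setdefault(key, {"_id": key, **{out: 0 for out in spec if out != "_id"}})
        for out, expr in spec.items():
            if out == "_id":
                continue
            operand = expr["$sum"]
            # "$field" adds that field's value; a literal such as 1 counts documents.
            acc[out] += doc.get(operand[1:], 0) if isinstance(operand, str) else operand
    return list(groups.values())


def aggregate(docs, pipeline):
    """Run each stage on the output of the previous one."""
    for stage in pipeline:
        name, arg = next(iter(stage.items()))
        if name == "$match":  # equality-only filter in this sketch
            docs = [d for d in docs if all(d.get(k) == v for k, v in arg.items())]
        elif name == "$group":
            docs = group(docs, arg)
        elif name == "$sort":  # 1 = ascending, -1 = descending; later keys sorted first
            for field, direction in reversed(list(arg.items())):
                docs = sorted(docs, key=operator.itemgetter(field), reverse=direction == -1)
        elif name == "$skip":
            docs = docs[arg:]
        elif name == "$limit":
            docs = docs[:arg]
        else:
            raise ValueError(f"unsupported stage: {name}")
    return docs


sales = [
    {"city": "Zurich", "amount": 30},
    {"city": "Basel", "amount": 10},
    {"city": "Zurich", "amount": 20},
    {"city": "Geneva", "amount": 5},
]
top_cities = aggregate(sales, [
    {"$group": {"_id": "$city", "total": {"$sum": "$amount"}, "n": {"$sum": 1}}},
    {"$sort": {"total": -1}},
    {"$limit": 2},
])
# top_cities == [{"_id": "Zurich", "total": 50, "n": 2}, {"_id": "Basel", "total": 10, "n": 1}]
```

Simple operations map onto Python directly: count is `len(...)` of a result and distinct is a set of one field's values; the pipeline form matters once several such steps have to be chained.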
How do we insert, update, and delete in MongoDB?
<ul><li>Using insertOne, or insertMany,</li><li>using updateOne, or updateMany,</li><li>using deleteOne, or deleteMany.</li></ul>
<div>The *Many variants act on all matching documents; the *One variants act only on the first match in the collection.</div>
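The difference between the *One and *Many variants can be sketched on a plain list acting as the collection, where "first" means first in list order. These `update` and `delete` helpers, with their `many` flag, are hypothetical illustrations of the semantics, not MongoDB driver calls; matching is equality-only.

```python
def _matches(doc, query):
    return all(doc.get(k) == v for k, v in query.items())


def update(collection, query, changes, many=False):
    """Set `changes` on the first matching document, or on all of them if many=True."""
    count = 0
    for doc in collection:
        if _matches(doc, query):
            doc.update(changes)
            count += 1
            if not many:
                break  # the *One behaviour: stop after the first match
    return count


def delete(collection, query, many=False):
    """Remove the first matching document, or all of them if many=True."""
    kept, removed = [], 0
    for doc in collection:
        if (many or removed == 0) and _matches(doc, query):
            removed += 1
        else:
            kept.append(doc)
    collection[:] = kept  # modify the list in place, like a collection
    return removed


users = [{"n": 1, "x": 0}, {"n": 2, "x": 0}, {"n": 3, "x": 1}]
update(users, {"x": 0}, {"x": 9})  # like updateOne: only n=1 changes
```

Both helpers return how many documents they touched, which makes the One/Many difference directly observable.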
What is the granularity of updates in MongoDB?
A single document: many people can work on the same database at once, but only one of them can modify a given document at a time.
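Document-level granularity can be pictured with one lock per document: writers to the same document take turns, while writers to different documents do not block each other. This is only a single-process analogy using Python threads, not how MongoDB implements concurrency internally; the `DocumentLockStore` class and the counter example are hypothetical.

```python
import threading


class DocumentLockStore:
    """Documents keyed by id, each guarded by its own lock."""

    def __init__(self):
        self._entries = {}  # doc_id -> (lock, document)
        self._guard = threading.Lock()  # protects _entries while new ids are added

    def _entry(self, doc_id):
        with self._guard:
            if doc_id not in self._entries:
                self._entries[doc_id] = (threading.Lock(), {})
            return self._entries[doc_id]

    def increment(self, doc_id, field):
        lock, doc = self._entry(doc_id)
        # At most one writer per document; writers on other documents are not blocked.
        with lock:
            doc[field] = doc.get(field, 0) + 1

    def get(self, doc_id):
        lock, doc = self._entry(doc_id)
        with lock:
            return dict(doc)


def worker(store, doc_id, times):
    for _ in range(times):
        store.increment(doc_id, "hits")


store = DocumentLockStore()
threads = [threading.Thread(target=worker, args=(store, doc_id, 1000)) for doc_id in ("a", "a", "b", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Each document received 2000 increments, none of them lost.
```

The read-modify-write in `increment` is exactly the kind of update that would lose increments if two writers interleaved on the same document.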
How can we query documents at a higher level?
With a query language, like JSONiq or XQuery!