XQueries Flashcards
Xqueries
A basic definition is to say they are SQL for XML files, and is an extension of XPaths.
FLWR
More general XQuery.
F -> for
L -> let
W -> where
R -> return
FLWR Example
let $doc := doc(“mydoc.xml)
for $s in $doc/university/student
where $s/module = “Computer Science”
return $s/name
let $doc := doc(“mydoc.xml)
The let clause, where we define the document we are going into.
for $s in $doc/university/student
The for clause, where we define the for loop.
where $s/module = “Computer Science”
The where clause, specifying conditions for the return.
return $s/name
The return clause, aka what is returned.
Return clause pairs
<pair>{variableone},{variabletwo}</pair>
What is returned
If we were doing this example:
let $doc := doc(“mydoc.xml)
for $s in $doc/university/student
where $s/module = “Computer Science”
return $s/name
an example for a return could be:
<name>Anna</name>
Order by
let $doc := doc(“mydoc.xml)
for $s in $doc/university/student
where $s/module = “Computer Science”
order by $s/name
return $s/name
Group by
let $doc := doc(“mydoc.xml”)
for $m in $doc/university/student/module
group by $mod:=$m
return <pair>{$mod},{count($m)}</pair>
Distinct values
distinct-values(let $doc := doc(“mydoc.xml)
for $m in $doc/university/student/module
return $m)
Three ways of XML to SQL
- Store XML documents as entries of a table.
- Store XML documents in schema-independent form.
- Store XML documents in shredded form across a number of attributes and relations.
Storing XML as an attribute
Raw XML is stored in serialised form, which makes it efficient to insert document into database and retrieve them in their original form since it is all serial.
Storing XML as a schema-independent representation
In simple terms, this means storing it as a tree structure in our database.
Recursive nature of schema-independent representation
Can cause problems, since if we had to retrieve something three layers deep, that’s three recursions, aka three queries.
Denormalised index
Overcomes recursive problem by containing combinations of path expressions and a link to the node and parent node.
Essentially, its a table which gives us the index of what we want instead of having to traverse the tree itself.
Storing XML in shredded form
In other words, putting extracted information from XML into database.
XML is decomposed into its constituent elements and data distributed over number of attributes in one or more relations.
All of this makes it easier for indexing values of some elements, and can also provide hierarchy.
NoSQL
“not only SQL” or “not relational”.
NoSQL Advantages
- Faster access to data.
- Fault-tolerance.
- Flexibility in data storage.
- Full ACID compliance can sometimes be relaxed.
Web Service database split
NoSQL Database:
Used for slightly less important things, like user profiles or shopping carts. This data doesn’t always need to be 100% correct; if something is not updated properly, it isn’t devastating. As we can see, the way it is store is slightly less strict, and contains very simply queries.
Relational Database:
Very important data like credit card transactions; requires compliancy and security.
NoSQL database setup
Typically distributed with nodes running on commodity hardware with a standard network.
NoSQL database setup advantages
- availability
- consistency (not the same as acid)
- scalability
- high performance
- partition tolerance
Availability
Every non-failing node always execute queries.
Consistency
Every read receives the most recent write, or an error.
Scalability
More capacity by adding new nodes.
High Performance
Often achieved by very simple interface.
Partition Tolerance
Even if nodes fail, the remaining subnetworks can continue their work.
CAP Theorem
States we cannot achieve consistency, availability and partition tolerance all at the same time.
Consistency and Availability
Single node DBMS.
Consistency and partition tolerance
NoSQL databases.
Availability and partition tolerance
NoSQL databases.
BASE
An alternative of ACID to compensate for the CAP theorem. Stands for Basically Available, Soft state, Eventually consistent.
Basically Available
Rather than enforcing consistency, it will ensure availability of data instead.
Soft state and Eventual consistency
The database state might occasionally be inconsistent but will eventually be made consistent.
Common classification for NoSQL databases
- key-value stores
- document store
- column stores
- graph databases
Key-value stores
Given a key and value, we can insert data into a database.
Given a key, we can find a value in a database.
Available systems for key-value stores
- Apache Cassandra
- Amazon DynamoDB
- Apache Voldemort
- Memcached
- Redis
- Riak
Distributed Storage
Each key value pair (k, v) is stored at some node.
Distributed Storage Steps
- Assign values v for key k to integer between 0 and (2^n)-1, in which (2^n)-1 gives us the amount of space to store places for nodes, as well as duplicates for these nodes. We do the hash function for these numbers.
- Distribute nodes to some of the integers (typically random).
- If (k,v) is assigned to integer i, then store at node following i.
Adding new nodes
Can be done easily with horizontal fragmentation, aka we split C horizontally and store it at different locations.
Then, any key-value pairs that would have been in C that are stored in A are now transferred over to C.
Replication
Ensures availability, storing copies of the key-value pairs on multiple nodes.
For example, if we have a key-value pair which is stored in the north-eastern A node, if A receives a duplicate then it will be stored in B.
If we receive multiple duplicates, we store one at each consecutive node.
Properties for key-value stores
- Scalability -> simple adding via horizonal fragmentation
- Availability and fault-tolerance -> via replication.
- High performance -> apply a hash function to determine anode, then ask the node. Same with writing.
Eventual consistency for key-value stores
One of the problems with key-value stores is that we cannot ensure consistency due to CAP Theorem.
Therefore, we provide eventual consistency, which allows multiple versions of data item to be present at the same time (versioning).
If the newer version is not available, the older one is updated and used instead.
Document Stores
Database which stores a collection of documents.
Document is essentially semi-structured data associated with an object id.
These documents are typically represented in JSON.
JSON vs XML
For simplicity reasons, the difference is syntax alone.
XML:
<students>
<student>
<name>Anna</name>
<number>57904</number>
</student>
</students>
JSON:
{“students”:[
{“name “ : “Anna”,
“number” : 57904},
…
]}
MongoDB Document Store commands
- creating/managing collections
- insert/update/delete documents
- finding documents
- indexing documents
Creating/managing collections in MongoDB
db.createCollection(“students”)
Insert/update/delete documents in MongoDB
db.students.insert({name: “Anna”})
Finding documents in MongoDB
db.studuents.find({name: “Anna”})
Indexing documents in MongoDB
db.students.createIndex({name: 1})
Techniques in MongoDB
- horizontal fragmentation -> collections are split into horizontal fragments based upon shard key, which is the indexed field in all documents.
- replication -> horizontal fragments of collections are replicated.
Column Stores
Hard to explain, but examples include Google Bigtable and Apache HBase.
HBase commands
- creating tables
- inserting rows
- finding documents
Creating tables in HBase
create ‘STUDENT’, ‘Name’, ‘ID’
Inserting rows in HBase
put ‘STUDENT’, ‘row1’, ‘Name:Fname’,’Anna’
Finding documents
get ‘STUDENT’, ‘row1’
scan ‘STUDENT’
Techniques in HBase
Fragmentation is split into two:
- Top level -> rows are divided into regions.
- Bottom level -> regions store different column families in different nodes.
Timestamps in HBase
Each item has a timestamp and one can access past versions of the database if setup.
Graph Databases
Simply, stores data as a graph.
Data is accessed using SQL-like path query language.
Also implements indexes.