Index Methods for Spatial Data Flashcards
Why do we use indexes in databases? Use examples.
An ordered data set is useful for searching and operations, but it can only be ordered for one field. A binary search can only be performed on one field and all other fields require a linear search.
For example, a data set can be indexed using first names. It can also be indexed using last names. Searches can be performed quickly using either the first name or the last name.
Explain the structure of B-tree and R-tree.
B-tree
Consists of a root node, internal nodes and leaf nodes. Each node contains pointers of two types: Node pointers (point to another node) and data pointers (point to data in the data set).
An n-B-tree can have at most n node pointers.
Internal nodes can have a pre-determined limit of child nodes. When data is inserted or removed from the data structure, the number of child nodes varies within a node. In order to maintain the predetermined limit, internal nodes can be joined or split.
R-tree
- An extended B-tree to store rectangular data.
- Each node in the R-tree is associated with a rectangle that includes all the rectangles of its descendents.
- R-tree enables fast access to spatial data. The search starts from the root and continues downward to the leaf nodes where data is stored. Only the branches containing data are investigated.
Explain and give examples of space-filling curves. Explain why they are useful.
Space-filling curve is a curve with a range that contains the entire 2-dimensional unit square. They include:
Hilbert curve
Moore curve
Gosper curve
Ideally, an index should have the property that “close objects should have a similar index value”. This should be realised by a continuous bijective function “f” that maps a 2D connected dataset R2 onto a 1D connected set R.
Theoretically, no such continuous bijective functions exists, but space filling curves have suitable properties to be used as index functions.
Explain the working principles of Morton Key.
- Convert the numbers to binary (base 2)
- Interleave the bits. Take the last digit in “y” and place it as the last digit for the Morton Key (mk)
- Take the last digit in “x” and place it as the second to last digit for the Morton Key
- Repeat steps 2 and 3 until all units are placed.
- If the binary numbers x and y are not of the same length, it is okay to add zeros at the start of the binary number. Other strategies for dealing with different binary lengths are possible, such as simply skipping over the shorter binary after it has been exhausted (shown below).
Calculate the Morton Key for (20, 9)
Convert (20, 9) to binary:
20 = 10100
9 = 1001
Interleaving results in (101100001)
101100001
= 28 + 26 + 25 + 20
= 256 + 64 + 32 + 1
= 353
What is a kD tree?
In a 2-dimensional kD-tree, each point in the data set is a node. Each point divides the space into two parts by a horizontal or vertical line.
The first point A is the root node and divides the planar space into two parts by a vertical line through the point.
The next point B is then placed in the node to the left or right of the root node, depending on its position in the space. It divides the part again using a horizontal line into two new parts.
The procedure is then repeated for each joint.
Explain some of the challenges of search engine indexing.
• Parallelism – collision between competing tasks, such as updating of the index while responding to search queries
• Compression – efficiency for large-scale searching
• Natural language processing – tokenization in
multilingual context
• Search neutrality – returning the most relevant results without manipulation. This is an ethical challenge!