8 GRIDS Flashcards
What is the main focus of the chapter?
The chapter focuses on handling multidimensional data and introduces the nearest-neighbor search as a use case.
What is nearest-neighbor search?
Nearest-neighbor search is finding the data point closest to a given search target.
Define nearest-neighbor search formally.
Given a set of N data points X = { x1, x2, …, xN }, a target value x’, and a distance function dist(x, y), find the point xi in X that minimizes dist(x’, xi).
How does nearest-neighbor search differ from binary search?
Binary search tests for an exact match, while nearest-neighbor search finds the closest match.
What types of problems can nearest-neighbor search address?
It can address spatial and non-spatial problems, such as finding nearby coffee shops or similar historical temperatures.
What is the linear scan algorithm used for?
It is a baseline algorithm for nearest-neighbor search that iteratively checks each data point.
What distance function is used in the example for nearest-neighbor search?
The absolute distance function: dist(x, y) = |x – y|.
What does the linear scan algorithm do?
It computes the distance from the target to each candidate point and updates the closest candidate.
What is the primary limitation of the linear scan algorithm?
It becomes inefficient as the number of data points increases.
What is the structure of a grid used for?
Grids are used for storing two-dimensional data, consisting of fixed bins or cells.
How are grid cells defined?
Grid cells are defined by spatial bounds, with multiple points potentially falling within the same cell.
What is an example of a data structure for two-dimensional spatial points?
Point { Float: x, Float: y }.
What information must be tracked in a grid’s top-level data structure?
- Number of bins along the x- and y-dimensions
- Spatial bounds (x_start, x_end, y_start, y_end)
What is the formula to map a point’s spatial coordinates to a grid bin?
xbin = Floor((x – x_start) / x_bin_width) and ybin = Floor((y – y_start) / y_bin_width).
True or False: A grid can store only one value per bin.
False.
What does the variable ‘x_bin_width’ represent?
The width of the bins along the x-dimension.
What advantage does using a grid provide in nearest-neighbor searches?
It allows for limiting searches based on spatial structure.
Fill in the blank: Nearest-neighbor search is most interesting when dealing with _______.
multidimensional data.
What are some example storage formats for two-dimensional data points?
- Ordered tuple (x, y)
- Small, fixed-size array [x, y]
- Composite data structure
What is the purpose of the distance function in nearest-neighbor search?
To define how to measure the ‘closeness’ between the search target and other values.
What is the typical structure of a grid for two-dimensional data?
Grid { Integer: num_x_bins, Integer: num_y_bins, Float: x_start, Float: x_end, Float: x_bin_width, Float: y_start, Float: y_end, Float: y_bin_width, Matrix of GridPoints: bins }.
What is the typical data structure for two-dimensional data?
Grid { Integer: num_x_bins, Integer: num_y_bins, Float: x_start, Float: x_end, Float: x_bin_width, Float: y_start, Float: y_end, Float: y_bin_width, Matrix of GridPoints: bins }
How is a point’s bin determined in a fixed-size grid?
xbin = Floor((x – x_start) / x_bin_width), ybin = Floor((y – y_start) / y_bin_width)
What significant change occurs with spatial partitioning in data storage?
Data can no longer be stored as fixed bins; each square may contain an arbitrary number of data points, requiring its own internal data structure.