8 GRIDS Flashcards by Kaman Hung

What is the main focus of the chapter?

The chapter focuses on handling multidimensional data and introduces the nearest-neighbor search as a use case.

How well did you know this?

Not at all

Perfectly

What is nearest-neighbor search?

Nearest-neighbor search is finding the data point closest to a given search target.

How well did you know this?

Not at all

Perfectly

Define nearest-neighbor search formally.

Given a set of N data points X = { x1, x2, …, xN }, a target value x’, and a distance function dist(x, y), find the point xi in X that minimizes dist(x’, xi).

How well did you know this?

Not at all

Perfectly

How does nearest-neighbor search differ from binary search?

Binary search tests for an exact match, while nearest-neighbor search finds the closest match.

How well did you know this?

Not at all

Perfectly

What types of problems can nearest-neighbor search address?

It can address spatial and non-spatial problems, such as finding nearby coffee shops or similar historical temperatures.

How well did you know this?

Not at all

Perfectly

What is the linear scan algorithm used for?

It is a baseline algorithm for nearest-neighbor search that iteratively checks each data point.

How well did you know this?

Not at all

Perfectly

What distance function is used in the example for nearest-neighbor search?

The absolute distance function: dist(x, y) = |x – y|.

How well did you know this?

Not at all

Perfectly

What does the linear scan algorithm do?

It computes the distance from the target to each candidate point and updates the closest candidate.

How well did you know this?

Not at all

Perfectly

What is the primary limitation of the linear scan algorithm?

It becomes inefficient as the number of data points increases.

How well did you know this?

Not at all

Perfectly

What is the structure of a grid used for?

Grids are used for storing two-dimensional data, consisting of fixed bins or cells.

How well did you know this?

Not at all

Perfectly

How are grid cells defined?

Grid cells are defined by spatial bounds, with multiple points potentially falling within the same cell.

How well did you know this?

Not at all

Perfectly

What is an example of a data structure for two-dimensional spatial points?

Point { Float: x, Float: y }.

How well did you know this?

Not at all

Perfectly

What information must be tracked in a grid’s top-level data structure?

Number of bins along the x- and y-dimensions
Spatial bounds (x_start, x_end, y_start, y_end)

How well did you know this?

Not at all

Perfectly

What is the formula to map a point’s spatial coordinates to a grid bin?

xbin = Floor((x – x_start) / x_bin_width) and ybin = Floor((y – y_start) / y_bin_width).

How well did you know this?

Not at all

Perfectly

True or False: A grid can store only one value per bin.

False.

How well did you know this?

Not at all

Perfectly

What does the variable ‘x_bin_width’ represent?

The width of the bins along the x-dimension.

How well did you know this?

Not at all

Perfectly

What advantage does using a grid provide in nearest-neighbor searches?

It allows for limiting searches based on spatial structure.

How well did you know this?

Not at all

Perfectly

Fill in the blank: Nearest-neighbor search is most interesting when dealing with _______.

multidimensional data.

How well did you know this?

Not at all

Perfectly

What are some example storage formats for two-dimensional data points?

Ordered tuple (x, y)
Small, fixed-size array [x, y]
Composite data structure

How well did you know this?

Not at all

Perfectly

What is the purpose of the distance function in nearest-neighbor search?

To define how to measure the ‘closeness’ between the search target and other values.

How well did you know this?

Not at all

Perfectly

What is the typical structure of a grid for two-dimensional data?

Grid { Integer: num_x_bins, Integer: num_y_bins, Float: x_start, Float: x_end, Float: x_bin_width, Float: y_start, Float: y_end, Float: y_bin_width, Matrix of GridPoints: bins }.

How well did you know this?

Not at all

Perfectly

What is the typical data structure for two-dimensional data?

Grid { Integer: num_x_bins, Integer: num_y_bins, Float: x_start, Float: x_end, Float: x_bin_width, Float: y_start, Float: y_end, Float: y_bin_width, Matrix of GridPoints: bins }

How well did you know this?

Not at all

Perfectly

How is a point’s bin determined in a fixed-size grid?

xbin = Floor((x – x_start) / x_bin_width), ybin = Floor((y – y_start) / y_bin_width)

How well did you know this?

Not at all

Perfectly

What significant change occurs with spatial partitioning in data storage?

Data can no longer be stored as fixed bins; each square may contain an arbitrary number of data points, requiring its own internal data structure.

How well did you know this?

Not at all

Perfectly

What data structure is commonly used to store points within a bin?

Linked list

What is the structure of an individual point in the grid?

GridPoint { Float: x, Float: y, GridPoint: next }

What is the high-level structure of a grid at the time of creation?

The spatial bounds and number of bins along each dimension are fixed.

What is the process to insert a point into a grid?

Find the correct bin and prepend the new point to the beginning of the linked list for that bin.

What does the function GridInsert check before adding a point?

It checks if the point is within the grid.

What does the approx_equal function do?

It checks if two points are within a threshold distance in both dimensions.

What are the steps to delete a point from a grid?

Find the correct bin, traverse the linked list to find a match, and remove the match by splicing it out.

How does the deletion function handle floating-point precision?

It uses the approx_equal function to check for equality due to limited precision.

What is the purpose of pruning bins in nearest-neighbor searches?

To exclude bins that are too far away from the current best distance, avoiding unnecessary computations.

What mathematical operation is used to compute Euclidean distance?

sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2))

What is the purpose of the MinDistToBin function?

To compute the closest distance from a target point to a given bin.

What does a linear scan over bins involve?

Iterating through all the bins and applying the minimum distance test before checking contents.

Fill in the blank: The function to insert a new point into a grid is called _______.

GridInsert

True or False: Each grid square can only contain a fixed number of data points.

False

What is the return value of the GridInsert function if a point cannot be inserted?

False

What happens if a point falls outside the range covered by the spatial data structure?

The function may return a Boolean to indicate whether the point could be inserted or may throw an exception.

What is the function of the GridDelete method?

To delete a point from the grid.

What condition must be checked before proceeding with deletion from a bin?

Check if the bin is empty.

How does the deletion function identify which point to delete?

It uses identifying information like name or ID number to determine which point to delete.

Fill in the blank: The code for computing the minimum distance from a point to a bin is encapsulated in the _______ function.

MinDistToBin

What does the return value of the MinDistToBin function indicate for invalid bins?

It returns Inf to indicate an invalid bin.

What is the purpose of the best_candidate variable in the GridLinearScanNN function?

To store the nearest neighbor found during the search.

What is the initial value of best_dist in the GridLinearScanNN function?

Inf

What is the purpose of the nearest-neighbor search algorithm described?

To find the closest point to a target point in a grid structure ## Footnote The algorithm uses a linear scan with pruning tests to efficiently search through bins.

What is the initial value of best_dist in the nearest-neighbor search?

Infinity ## Footnote This indicates that no best point has been found so far.

What condition must be met for the algorithm to check individual points in a bin?

The minimum distance to the bin must be less than the best distance found so far ## Footnote This allows the algorithm to prune out entire bins that cannot contain a better neighbor.

What happens if the grid is sparsely populated during the search?

It may be more costly to check each of the bins than checking each point individually.

What does the GridLinearScanNN function allow for in terms of target points?

It works with target points inside or outside the grid’s spatial bounds.

Describe the expanding search method used in the improved scanning method.

It prioritizes bins by proximity to the target point and halts when remaining bins are further than the nearest neighbor found ## Footnote This method reduces wasted computation by avoiding distant bins.

What is the visual analogy used to describe the expanding search?

Searching for car keys in a house, starting from the area where they should be and spiraling outwards.

What is the goal of checking neighboring bins after finding an initial candidate nearest neighbor?

To determine if there is a closer point in adjacent bins.

True or False: The initial candidate nearest neighbor found in the same bin as the target point is guaranteed to be the closest point.

False ## Footnote There could be closer points in adjacent bins.

What pattern does the simplified expanding search use to move outward?

A diamond-shaped pattern based on Manhattan distance.

What does the GridCheckBin function do?

It checks whether any points within a specified bin are closer to the target point than a given threshold.

What happens if none of the bin's points are closer than the threshold in GridCheckBin?

The function returns null.

What does the explore variable indicate in the GridSearchExpanding function?

It indicates whether the next iteration may include a valid bin to explore.

What is the significance of the grid's bin size?

It impacts the efficiency of the search; larger bins may require checking more points.

Fill in the blank: The code for performing the expanding search uses a WHILE loop to explore outward from the initial bin by increasing amounts of _______.

[steps]

What is the termination condition for the outer WHILE loop in the GridSearchExpanding function?

When every bin visited could not contain a closer neighbor or lies off the grid.

What does the MinDistToBin function check?

It checks whether the bin indices are valid and determines the distance to that bin.

How does the algorithm handle target points that lie outside the grid?

It maps them to their closest bin in the grid.

What is the impact of grid size on search efficiency?

The size of the grid’s bins affects the number of points checked per bin and the overall efficiency of the search.

What happens when we partition the grid into finer bins?

We may need to search more individual bins before finding the first candidate nearest neighbor, increasing the cost of checking bins.

True or False: A fine grid always improves search efficiency.

False

What are the tradeoffs of shrinking the size of the grid's bins?

* Increased number of individual bins to search * Higher chances of encountering empty bins * Increased cost of checking bins

What is an example of an ineffective fine grid?

A grid partitioned into 1 m by 1 m squares may contain mostly empty bins.

What are the implications of using coarse grid sizes?

Coarse grids, like 5 km by 5 km squares, may bucket entire cities together, leading to many empty bins.

What factors influence the optimal grid size?

* Number of points * Distribution of points

What is the challenge of applying grid-based techniques to higher-dimensional data?

The space required to store data structures increases exponentially with dimensions, requiring K^D individual bins.

What is the main drawback of using grids for higher-dimensional problems?

As the number of grid bins increases, the percentage of empty bins also likely increases, leading to wasted work.

What technique will be introduced in the next chapter to handle higher-dimensional data?

The k-d tree.

How can nearest-neighbor search be applied beyond spatial data?

It can be used to find similar items based on specific attributes, such as selecting a coffee brand based on strength and acidity.

What is a common distance measure used for non-spatial data?

Weighted Euclidean distance.

Fill in the blank: Nearest-neighbor search allows us to find points that are '______' to some target value.

[close]

Why is it important to consider all dimensions of interest in distance computation?

To ensure that the search accounts for all relevant factors, avoiding unsuitable results.

What is a key difference in data structure between grids and arrays?

Grids use linked-list or other structures to store multiple values per bin, unlike arrays.

What common task is illustrated by choosing the right number of bins?

Tuning our data structure for the specific problem at hand.

What will the next chapter combine to improve search efficiency?

The adaptive properties of trees with the spatial properties of grids.

8 GRIDS Flashcards

(82 cards)