All Flashcards
DER algorithm
Step 1: replication: duplicate the small table to every processor
Step 2: local inner join
Step 3: select the ROW IDs of left-table rows with no local matches
Step 4: redistribute the ROW IDs (hash on ROW ID)
Step 5: keep each ROW ID that appears as many times as the number of processors (i.e. matched nowhere)
Step 6: join the kept ROW IDs back to the left table to output the NULL-padded rows (sketch below)
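A minimal single-process sketch of the six DER steps, simulating the processors with plain lists; the tables, the rid % P hash, and the NULL padding are illustrative assumptions, not from the card:

```python
from collections import Counter

P = 3
R = [(0, 'a'), (1, 'b'), (2, 'x')]                  # small left table: (row_id, key)
S = [[('a', 's0')], [('b', 's1'), ('a', 's2')], []] # partitioned large table

results, buckets = [], [[] for _ in range(P)]
for p in range(P):                                  # step 1: R is replicated, so every
    matched = set()                                 # processor sees all of R
    for rid, k in R:                                # step 2: local inner join R x S_p
        for sk, sv in S[p]:
            if sk == k:
                results.append((rid, k, sv))
                matched.add(rid)
    for rid, k in R:                                # step 3: ROW IDs with no local match
        if rid not in matched:
            buckets[rid % P].append(rid)            # step 4: redistribute by ROW ID

R_by_id = dict(R)
for p in range(P):
    for rid, n in Counter(buckets[p]).items():
        if n == P:                                  # step 5: unmatched on ALL P processors
            results.append((rid, R_by_id[rid], None))   # step 6: NULL-padded dangling row
```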
ROJA Algorithm
Step 1: reshuffle the data based on the join attribute
Step 2: each processor performs the local outer join
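A minimal sketch of the two ROJA steps above, simulating the processors with lists and assuming unique join keys in S (data and hash function are illustrative):

```python
P = 2
R = [('a', 'r0'), ('b', 'r1'), ('c', 'r2')]
S = [('a', 's0'), ('c', 's1')]

def reshuffle(table):                               # step 1: hash on the join
    parts = [[] for _ in range(P)]                  # attribute, so matching keys
    for row in table:                               # land on the same processor
        parts[hash(row[0]) % P].append(row)
    return parts

Rp, Sp = reshuffle(R), reshuffle(S)
results = []
for p in range(P):                                  # step 2: local left outer join
    s_local = dict(Sp[p])
    for k, v in Rp[p]:
        results.append((k, v, s_local.get(k)))      # None = NULL-padded dangling row
```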
DOJA Algorithm
Step 1: replication: duplicate the small table to every processor
Step 2: local inner join
Step 3: hash-redistribute the inner-join result based on the join attribute X
Step 4: local outer join (sketch below)
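A minimal sketch of the four DOJA steps under the same simulated-processor setup; the duplicated table, data, and hashing are illustrative assumptions:

```python
P = 2
R = [('a', 'r0'), ('x', 'r1')]                      # step 1: small table, duplicated
S = [[('a', 's0')], [('a', 's1')]]                  # partitioned large table

inner = [[] for _ in range(P)]
for p in range(P):
    for k, rv in R:                                 # step 2: local inner join
        for sk, sv in S[p]:
            if sk == k:
                inner[hash(k) % P].append((k, rv, sv))  # step 3: redistribute on key

results = []
for p in range(P):                                  # step 4: local outer join against
    local_keys = {k for k, _, _ in inner[p]}        # the redistributed inner result
    results += inner[p]
    for k, rv in R:
        if hash(k) % P == p and k not in local_keys:
            results.append((k, rv, None))           # dangling row, NULL-padded once
```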
OJSO
When joining 3 tables:
1. Do redistribution on join attribute (same as ROJA)
2. local join (same as ROJA)
3. redistribute the joined table & the third table based on the join attribute
(ignore dangling records: their join attribute is NULL, so they cannot match)
4. local join
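A rough sketch of this pipeline for (R LOJ S) LOJ T; the schemas, keys, and the way dangling rows are set aside are illustrative assumptions:

```python
# R carries the first join key a, S maps a -> the second join key b, T carries b.
P = 2
R = [('a1', 'r0'), ('a2', 'r1')]
S = [('a1', 'b1')]
T = [('b1', 't0')]

def part(rows, col):                   # hash-partition rows on column `col`
    return [[r for r in rows if hash(r[col]) % P == p] for p in range(P)]

Rp, Sp = part(R, 0), part(S, 0)        # 1. redistribute on the first join attribute
joined, dangling = [], []
for p in range(P):                     # 2. local outer join (same as ROJA)
    s = dict(Sp[p])
    for a, rv in Rp[p]:
        (joined if a in s else dangling).append((a, rv, s.get(a)))

Jp, Tp = part(joined, 2), part(T, 0)   # 3. redistribute matched rows & T on the
                                       #    second key; dangling rows are skipped
results = [(a, rv, None, None) for a, rv, _ in dangling]   # already final
for p in range(P):                     # 4. local outer join with T
    t = dict(Tp[p])
    for a, rv, b in Jp[p]:
        results.append((a, rv, b, t.get(b)))
print(results)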
local join: scan cost
(after divide & broadcast, optimizing main memory)
((R_i/P) + (S_i/P) - (M/P)) x IO
M = size of main memory, P = page size
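A worked instance of the formula, with all parameter values assumed for illustration:

```python
R_i = 80 * 2**20    # local fragment of R: 80 MB
S_i = 40 * 2**20    # local fragment of S: 40 MB
M   = 20 * 2**20    # main memory: 20 MB
P   = 4096          # page size in bytes
IO  = 2.2e-3        # cost of one page I/O in seconds

pages = R_i / P + S_i / P - M / P   # pages that must actually hit the disk
print(pages * IO)                   # 56.32 s for this configuration
```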
Speed up
concerned with increasing processing speed while keeping the workload the same
elapsed time on uniprocessor / elapsed time on multiprocessor
Scale up
concerned with increasing workload while maintaining processing speed
elapsed time on small system / elapsed time on large system (processing a proportionally larger workload)
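Worked instances of both ratios (all numbers are assumptions):

```python
speed_up = 100 / 12.5   # same job: 100 s on 1 processor, 12.5 s on 8 -> 8.0 (linear)
scale_up = 100 / 100    # 8x workload on an 8x system, still 100 s -> 1.0 (ideal)
print(speed_up, scale_up)
```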
Downside of a shared-nothing architecture
load balancing becomes difficult
Downside of Shared-Memory & Shared-Disk architectures
both suffer from memory and bus contention
Processor activation or involvement of parallel search algorithms
Key comparison of parallel search algorithms
Divide and Broadcast […] join: Transfer cost
(S_i/P) x (n-1) x (m_p + m_l)
Divide and Broadcast […] join: Receiving cost
(S/P - S_i/P) x m_p
Divide and Broadcast […] join: Scan cost
(S_i/P) x IO
Divide and Broadcast […] join: Select cost
|S_i| x (t_r + t_w)
Divide and Broadcast […] join: Disk storing cost
(S/P - S_i/P) x IO
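A worked instance that adds up all five divide-and-broadcast components; every parameter value below is an assumption chosen only for illustration (n = number of processors, P = page size):

```python
n   = 4                    # number of processors
S   = 400 * 2**20          # whole table S: 400 MB, so S_i = S/n per processor
S_i = S / n
P   = 4096                 # page size in bytes
IO  = 2.2e-3               # I/O cost per page (s)
m_p, m_l = 1.3e-3, 2.4e-3  # message protocol cost and latency per page (s)
t_r = t_w = 3e-7           # per-record read / write cost (s)
recs_i = S_i / 100         # |S_i|, assuming 100-byte records

scan     = (S_i / P) * IO                     # read the local fragment
select   = recs_i * (t_r + t_w)               # form the outgoing records
transfer = (S_i / P) * (n - 1) * (m_p + m_l)  # broadcast to the other n-1 processors
receive  = (S / P - S_i / P) * m_p            # receive the other fragments
store    = (S / P - S_i / P) * IO             # write received pages to disk
print(scan + select + transfer + receive + store)   # ~610 s in total
```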
Local join: Scan cost
(after divide & broadcast)
((R_i/P) + (S/P)) x IO
Local Join: Select cost
(after divide & broadcast)
(|R_i| + |S|) x (t_r + t_w)
Local Join: Join cost
(after divide & broadcast)
|R_i| x (t_r + t_h) + |S| x (t_r + t_h + t_j)
Local join: Generating result cost
(after divide & broadcast)
|R_i| x σ_j x |S| x t_w
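A worked instance adding up the four local-join components above (σ_j is the join selectivity; all values below are assumptions):

```python
R_i, S = 100 * 2**20, 400 * 2**20   # local R fragment + the whole broadcast S
P, IO  = 4096, 2.2e-3               # page size (bytes), I/O cost per page (s)
t_r = t_w = 3e-7                    # per-record read / write cost (s)
t_h, t_j = 2e-7, 1e-7               # per-record hash / comparison cost (s)
sigma_j  = 1e-8                     # join selectivity
R_recs, S_recs = R_i / 100, S / 100 # record counts, assuming 100-byte records

scan   = (R_i / P + S / P) * IO
select = (R_recs + S_recs) * (t_r + t_w)
join   = R_recs * (t_r + t_h) + S_recs * (t_r + t_h + t_j)  # build on R, probe with S
result = R_recs * sigma_j * S_recs * t_w                    # write the matches
print(scan + select + join + result)
```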
Redistribution Step: select cost
(|R_i| + |S_i|) x (t_r + t_w)
Redistribution step: finding destination cost
(|R_i| + |S_i|) x t_d
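A worked instance of the two redistribution components (t_d = cost of computing a record's destination; record counts and unit costs are assumptions):

```python
R_recs, S_recs = 1_000_000, 4_000_000
t_r = t_w = 3e-7                    # per-record read / write cost (s)
t_d = 1e-7                          # cost of computing a record's destination (s)
select      = (R_recs + S_recs) * (t_r + t_w)
destination = (R_recs + S_recs) * t_d
print(select, destination)          # 3.0 s and 0.5 s
```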
Parallel Merge-All Sort
Parallel Binary-Merge Sort
Parallel Redistribution Binary-Merge Sort
Parallel Redistribution Merge-All Sort
Parallel Partitioned Sort
Parallel Group-By
DT: data parallelism
Data is partitioned vertically
Locally
- each processor calculates entropy for its features
Globally
- each processor shares entropy and target class count
- determine the best splitting attribute
- share which records to include in the subsequent partitions
iterative process: repeat the steps above
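A sketch of the scheme above: features are partitioned vertically across simulated processors, each computes information gain for its own features, and the best split is chosen globally (toy data assumed):

```python
import math
from collections import Counter

target = ['y', 'y', 'n', 'n']                   # class column, known to all
features = {'outlook': ['s', 's', 'r', 'r'],    # feature columns, partitioned
            'windy':   [0, 1, 0, 1]}            # vertically across processors
partitions = [['outlook'], ['windy']]           # processor p owns partitions[p]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(col, labels):                          # information gain of one feature
    g = entropy(labels)
    for v in set(col):
        sub = [l for c, l in zip(col, labels) if c == v]
        g -= len(sub) / len(labels) * entropy(sub)
    return g

local = [{f: gain(features[f], target) for f in part}   # local step, per processor
         for part in partitions]
best = max((g, f) for d in local for f, g in d.items()) # global exchange + decision
print(best)                                     # (1.0, 'outlook') here
```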
DT: result parallelism
- data is partitioned horizontally
- each processor needs to exchange counts with other processors
- determine the best splitting attribute for the root node
- redistribute records by attribute to the processor assigned to this attribute
- each processor shares entropy and information gain values
- determine the second splitting attribute
- …
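A compact sketch of the count-exchange step above on horizontally partitioned rows (toy data assumed):

```python
from collections import Counter

parts = [[('s', 'y'), ('s', 'y')], [('r', 'n'), ('r', 'n')]]   # (outlook, class) rows,
local_counts = [Counter(rows) for rows in parts]               # counted per processor
global_counts = sum(local_counts, Counter())                   # exchange + merge
print(global_counts)    # global (value, class) counts -> entropy / gain as usual
```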
Formula: Entropy
Entropy(S) = - Σ_i p_i log2(p_i), where p_i is the proportion of records in class i
ID3 algorithm
- compute entropy for dataset
- for every attribute/feature:
- calculate entropy for all categorical values
- take the average information entropy for the current attribute
- calculate the gain for the current attribute
- pick the highest gain attribute
- repeat until the tree is complete
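A minimal recursive ID3 following these steps, on assumed toy data:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:                 # pure node -> leaf
        return labels[0]
    if not attrs:                             # nothing left to split on -> majority
        return Counter(labels).most_common(1)[0][0]
    def gain(a):                              # dataset entropy minus the average
        g = entropy(labels)                   # information entropy after splitting on a
        for v in {r[a] for r in rows}:
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            g -= len(sub) / len(labels) * entropy(sub)
        return g
    best = max(attrs, key=gain)               # pick the highest-gain attribute
    return {best: {v: id3([r for r in rows if r[best] == v],
                          [l for r, l in zip(rows, labels) if r[best] == v],
                          [a for a in attrs if a != best])
                   for v in {r[best] for r in rows}}}

rows = [{'outlook': 's', 'windy': 0}, {'outlook': 's', 'windy': 1},
        {'outlook': 'r', 'windy': 0}, {'outlook': 'r', 'windy': 1}]
print(id3(rows, ['y', 'y', 'n', 'n'], ['outlook', 'windy']))
# {'outlook': {'s': 'y', 'r': 'n'}}
```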
k-means: data parallelism
Initialization:
- Divide the dataset among processors
- Replicate the initial centroids to each processor
In each processor:
- Compute the distance of each local data point to the centroids
- Construct local clusters
- Maintain a sum and a count of each local cluster
- At each iteration, the master process computes the new means and sends them to all processors
- Repeat steps 1-4 until convergence
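A single-process sketch of the data-parallel scheme above, in one dimension for brevity; the data, k, and the convergence test are assumptions:

```python
parts = [[1.0, 1.2, 8.0], [0.8, 8.2, 9.0]]   # dataset divided among 2 processors
centroids = [0.0, 10.0]                      # initial centroids, replicated to all

for _ in range(10):                          # master-driven iterations
    sums = [0.0] * len(centroids)            # global sum / count per cluster,
    counts = [0] * len(centroids)            # accumulated from the local ones
    for local in parts:                      # this loop runs in parallel in reality
        for x in local:
            c = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            sums[c] += x
            counts[c] += 1
    new = [sums[i] / counts[i] if counts[i] else centroids[i]
           for i in range(len(centroids))]
    if new == centroids:                     # master: converged?
        break
    centroids = new                          # master broadcasts the new means
print(centroids)                             # ~[1.0, 8.4]
```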
k-means: result parallelism
Initialization:
- Divide dataset D among P processors, and sort the data within each processor
- Divide the initial centroids among processors
- Allocate data points to the nearest cluster centroid
In each processor:
- For each cluster, calculate the distance between each local data point and the cluster centroid
- For extreme low and high data points in each cluster:
  - if they are closer to the centroid of another cluster on the same processor, move them into that cluster
  - if they are closer to the centroid of a cluster on a different processor, move them to that processor
- Repeat steps 1 & 2 until convergence
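A rough 1-D sketch of the boundary-exchange idea above, where each simulated processor owns one centroid plus a sorted slice of the data and only its extreme points are tested for migration (data and loop structure are assumptions):

```python
procs = [[1.0, 1.2, 4.9], [5.1, 8.0, 9.0]]      # sorted slices, one per processor
for _ in range(5):                              # repeat until no point moves
    cents = [sum(xs) / len(xs) for xs in procs] # one centroid per processor
    moved = False
    for p, xs in enumerate(procs):
        for x in (xs[0], xs[-1]):               # only extreme low / high points
            q = min(range(len(cents)), key=lambda c: abs(x - cents[c]))
            if q != p:                          # closer to another processor's
                xs.remove(x)                    # cluster: move the point there
                procs[q].append(x)
                procs[q].sort()
                moved = True
    if not moved:
        break
print(procs)                                    # [[1.0, 1.2], [4.9, 5.1, 8.0, 9.0]]
```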
Formula: Similarity
Bounded data stream
A bounded stream has a defined start and an end. […] we can ingest the entire data set before starting any computation
Dense Vector -> Sparse Vector
What is the purpose of watermarking?
to handle events that arrive late to the application
(e.g. event time ≠ received time)
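A minimal sketch of the watermark idea: the watermark trails the largest event time seen by an assumed allowed lateness, and events whose event time is already behind it are flagged as late:

```python
ALLOWED_LATENESS = 5
watermark = float('-inf')
for event_time in [10, 12, 11, 30, 14, 31]:    # arrival order != event-time order
    late = event_time <= watermark             # arrived after the watermark passed it
    watermark = max(watermark, event_time - ALLOWED_LATENESS)
    print(event_time, 'late' if late else 'on time', '| watermark =', watermark)
# event 14 is flagged late: event 30 already pushed the watermark to 25
```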