sdi 6 Flashcards

Question 1

Q

yelp scale estimation

Answer

A

assume 500M places, 100K qps, 20% growth in the number of places and QPS each year

Question 2

Q

yelp db schema

Answer

A

4 tables: locations, reviews, location photos, review photos

Question 3

Q

how many bytes are the following?
- ID
- name of a location
- lat/long
- description/review
- category/rating

Answer

A

8 bytes
256 bytes
8 bytes
512 bytes
1 byte

1 byte = 8 bits; 32 bits means we can have 2^32 unique IDs

Question 4

Q

yelp api definition

Answer

A

search(api_dev_key, search_terms, user_location, radius_filter,
maximum_results_to_return,
category_filter, sort, page_token)

category_filter (string): Optional category to filter search results, e.g., Restaurants, Shopping Centers
sort (number): Optional sort mode: Best matched (0 - default), Minimum distance (1), Highest rated (2).

Returns: (JSON)
A JSON containing information about a list of businesses matching the search query. Each result entry
will have the business name, address, category, rating, and thumbnail.

Question 5

Q

How will we build a QuadTree?

Answer

A

We will start with one node that will represent the whole world in
one grid. Since it will have more than 500 locations, we will break it down into four nodes and
distribute locations among them. We will keep repeating this process with each child node until there
are no nodes left with more than 500 locations.

Question 6

Q

How will we find the grid for a given location?

Answer

A

We will start with the root node and search
downward to find our required node/grid. At each step, we will see if the current node we are visiting
has children. If it has, we will move to the child node that contains our desired location and repeat this
process. If the node does not have any children, then that is our desired node.

Question 7

Q

How will we find neighboring grids of a given grid?

Answer

A

We can keep a pointer in each node to access its parent. We
can keep expanding our search for neighboring grids by going up through the parent pointers.

another option is to connect all leaf nodes with a doubly linked list.

Question 8

Q

yelp - search workflow

Answer

A

We will first find the node that contains the user’s location. If that node has enough desired places, we can return them to the user. If not, we will keep expanding to the neighboring nodes until either we find the required number of places or exhaust our search based on the maximum radius.

Question 9

Q

yelp - data sharding

Answer

A

based on region (problems w/ hot regions and dense regions)
better option: hash function will map each LocationID to a server where we will store that place.

Question 10

Q

yelp - basic system design considerations

Answer

A

we need to store and index each dataset (places, reviews, etc.). For users to query this massive database, the indexing should be read-efficient, since when searching for nearby places, users expect to see the results in real time. Given that the location of a place doesn’t change that often, we don’t need to worry about frequent updates of the data.

Question 11

Q

yelp - How would we insert a new Place into our system?

Answer

A

we need to insert it into the databases as well as in the QuadTree. If our tree resides on 1 server, it is easy to add a new Place, but if the QuadTree is distributed among different servers, first we need to find the grid/server of the new Place and then add it there.

Question 12

Q

How can we return most popular places within a given radius?

Answer

A

Let’s assume we keep track of the
overall popularity of each place. An aggregated number can represent this popularity in our system. We will store this number in the database as well as in the QuadTree. While searching for the
top 100 places within a given radius, we can ask each partition of the QuadTree to return the top 100 places with maximum popularity. Then the aggregator server can determine the top 100 places among all the places returned by different partitions.

Question 13

Q

uber backend - how do we handle drivers’ frequently changing locations?

Answer

A

Since all active drivers report their location every 3s, there’ll be a lot more updates than querying on our QuadTree. We can keep the latest position reported by each driver in a HT and update our QT less frequently. Let’s assume we guarantee that a driver’s current location will be reflected in the
QT within 15s.

Question 14

Q

How would “Request Ride” use case work?

Answer

A

The customer requests.
An Aggregator server takes the request and asks QuadTree servers to return nearby drivers.
Aggregator server collects all results and sorts them by ratings.
Aggregator sends a notification to the top (say 3) drivers; whoever accepts the request first will be assigned. If none of the 3 respond, Aggregator asks the next 3 on the list.
Once a driver accepts a request, the customer is notified.

Question 15

Q

uber backend - Do we need to repartition a grid as soon as it reaches the maximum limit?

Answer

A

We can have a cushion to let each grid grow a little bigger beyond the limit before we decide to partition it. Let’s say our grids can grow/shrink an extra 10% before we partition/merge them. This should decrease the load for a grid
partition or merge on high traffic grids.

Question 16

Q

ticketmaster Design Considerations

Answer

A

system should be scalable, highly available to keep up w/ traffic surges when popular movies come out
no partial ticket orders
to stop abuse, we can restrict users from booking more than 10 seats at a time

Question 17

Q

ticketmaster high level design

Answer

A

web servers will handle users’ sessions and application servers will handle ticket management, storing data in the databases and working w/ cache servers to process reservations.

Question 18

Q

ticketmaster - which APIs should we have?

Answer

A

SearchMovies, ReserveSeats

Question 19

Q

SearchMovies api def

Answer

A

SearchMovies(api_dev_key, keyword, city, lat_long, radius, start_datetime, end_datetime, postal_code, includeSpellcheck, results_per_page, sorting_order)

start_datetime (string): Filter movies with a starting datetime.
includeSpellcheck (Enum: “yes” or “no”):
sorting_order (string): Sorting order of the search result. Some allowable values : ‘name,asc’, ‘date,desc’, ‘distance,asc’, ‘date,name,desc’.

Returns
{
“MovieID”:
“ShowID”:
“Title”:
…
“Seats”: [
{ “Type”: (reg/prem)
“Price”:
“Status”: full/almost full/etc
}
{ … } ]
}

Question 20

Q

ReserveSeats api def

Answer

A

ReserveSeats(api_dev_key, session_id, movie_id, show_id, seats_to_reserve[])

session_id (string): User’s session ID to track this reservation. Once the reservation time expires,
user’s reservation on the server will be removed using this ID.

Returns: (JSON)
Returns the status of the reservation, which would be one of the following: 1) “Reservation Successful” 2) “Reservation Failed - Show is Full,” 3) “Reservation Failed - Retry, as other users are holding reserved seats”.

Question 21

Q

ticketmaster db design

Answer

A

standard tables for movies, users, cities.
4. payments (fk is BookingID)
5. cinemas (fk is CityID)
6. cinema halls (fk is CinemaID)
7. cinema seats, for the physical seat only. SeatNumber, Type. (fk is CinemaHallID)
8. shows (fks: CinemaHallID, MovieID)
9. bookings (fks: UserID, ShowID)
10. show seats (fks: ShowID, CinemaSeatID, BookingID)

Question 22

Q

ticketmaster - what are the 2 services we need?

Answer

A

ActiveReservationService: keep track of all active reservations (not yet booked) and remove expired reservations.
WaitingUserService: keep track of all the waiting user requests and, as soon as enough seats become available, it will notify the longest waiting user to choose the seats

Question 23

Q

ActiveReservationsService

Answer

A

We can keep all the reservations of a ‘show’ in the DB and in memory in a HT:
key: ShowID
val: Linked HashMap; { ‘BookingID’ : creation timestamp }
LHM allows us to jump to any reservation to remove it when the booking is complete. Its head points to the oldest reservation, so that the reservation can expire when the timeout is reached.

In the db, we store the reservation in the ‘Booking’ table and the expiry time will be in the Timestamp column. The ‘Status’ field will have a value of ‘Reserved (1)’ and, when booking is complete, ‘Status’ changes to ‘Booked (2)’ and the reservation record is removed from the above LHM.

Question 24

Q

WaitingUsersService

Answer

A

We can keep all the waiting users of a ‘show’ in the DB and in memory in a HT:
key: ShowID
val: Linked HashMap; { UserID: time they started waiting }
- Clients can use Long Polling to keep themselves updated on their reservation status.

Question 25

Q

ticketmaster - how to handle reservation expiration?

Answer

A

ActiveReservationsService keeps track of expiry times. Client will be shown a timer for expiry time, which could be a little out of sync with the server, so we can add a 5s buffer on the server. This ensures that the client never times out after the server, preventing a successful purchase.

Question 26

Q

ticketmaster - concurrency

Answer

A

2 users must not be able to book the same seat.
- If we are using an SQL server we can use Transaction Isolation Levels to lock the rows before we can update them.
One thing to note here; within a transaction, if we read rows, we get a write lock on them so that they can’t be updated by anyone else.
Once the above database transaction is successful, we can start tracking the reservation in
ActiveReservationService.

Question 27

Q

Serializable

Answer

A

‘Serializable’ is the highest Transaction Isolation Level (sql) and guarantees safety from Dirty (reading uncommited data), Nonrepeatable (reading a row twice but getting different data each time), and Phantoms (retrieves rows using the same criteria and new rows are returned).

Question 28

Q

What happens when ActiveReservationsService or WaitingUsersService crashes?

Answer

A

have a master-slave configuration so that, when the master crashes, the slave can take over. Similarly, we’ll have a master-slave setup for databases to make them fault tolerant.