sdi 6 Flashcards

1
Q

yelp scale estimation

A

assume 500M places, 100K qps, 20% growth in the number of places and QPS each year

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

yelp db schema

A

4 tables: locations, reviews, location photos, review photos

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

how many bytes are the following?
- ID
- name of a location
- lat/long
- description/review
- category/rating

A
  • 8 bytes
  • 256 bytes
  • 8 bytes
  • 512 bytes
  • 1 byte

1 byte = 8 bits; 32 bits means we can have 2^32 unique IDs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

yelp api definition

A

search(api_dev_key, search_terms, user_location, radius_filter,
maximum_results_to_return,
category_filter, sort, page_token)

category_filter (string): Optional category to filter search results, e.g., Restaurants, Shopping Centers
sort (number): Optional sort mode: Best matched (0 - default), Minimum distance (1), Highest rated (2).

Returns: (JSON)
A JSON containing information about a list of businesses matching the search query. Each result entry
will have the business name, address, category, rating, and thumbnail.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How will we build a QuadTree?

A

We will start with one node that will represent the whole world in
one grid. Since it will have more than 500 locations, we will break it down into four nodes and
distribute locations among them. We will keep repeating this process with each child node until there
are no nodes left with more than 500 locations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How will we find the grid for a given location?

A

We will start with the root node and search
downward to find our required node/grid. At each step, we will see if the current node we are visiting
has children. If it has, we will move to the child node that contains our desired location and repeat this
process. If the node does not have any children, then that is our desired node.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How will we find neighboring grids of a given grid?

A

We can keep a pointer in each node to access its parent. We
can keep expanding our search for neighboring grids by going up through the parent pointers.

another option is to connect all leaf nodes with a doubly linked list.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

yelp - search workflow

A

We will first find the node that contains the user’s location. If that node has enough desired places, we can return them to the user. If not, we will keep expanding to the neighboring nodes until either we find the required number of places or exhaust our search based on the maximum radius.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

yelp - data sharding

A
  • based on region (problems w/ hot regions and dense regions)
  • better option: hash function will map each LocationID to a server where we will store that place.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

yelp - basic system design considerations

A

we need to store and index each dataset (places, reviews, etc.). For users to query this massive database, the indexing should be read-efficient, since when searching for nearby places, users expect to see the results in real time. Given that the location of a place doesn’t change that often, we don’t need to worry about frequent updates of the data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

yelp - How would we insert a new Place into our system?

A

we need to insert it into the databases as well as in the QuadTree. If our tree resides on 1 server, it is easy to add a new Place, but if the QuadTree is distributed among different servers, first we need to find the grid/server of the new Place and then add it there.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How can we return most popular places within a given radius?

A

Let’s assume we keep track of the
overall popularity of each place. An aggregated number can represent this popularity in our system. We will store this number in the database as well as in the QuadTree. While searching for the
top 100 places within a given radius, we can ask each partition of the QuadTree to return the top 100 places with maximum popularity. Then the aggregator server can determine the top 100 places among all the places returned by different partitions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

uber backend - how do we handle drivers’ frequently changing locations?

A

Since all active drivers report their location every 3s, there’ll be a lot more updates than querying on our QuadTree. We can keep the latest position reported by each driver in a HT and update our QT less frequently. Let’s assume we guarantee that a driver’s current location will be reflected in the
QT within 15s.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How would “Request Ride” use case work?

A
  1. The customer requests.
  2. An Aggregator server takes the request and asks QuadTree servers to return nearby drivers.
  3. Aggregator server collects all results and sorts them by ratings.
  4. Aggregator sends a notification to the top (say 3) drivers; whoever accepts the request first will be assigned. If none of the 3 respond, Aggregator asks the next 3 on the list.
  5. Once a driver accepts a request, the customer is notified.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

uber backend - Do we need to repartition a grid as soon as it reaches the maximum limit?

A

We can have a cushion to let each grid grow a little bigger beyond the limit before we decide to partition it. Let’s say our grids can grow/shrink an extra 10% before we partition/merge them. This should decrease the load for a grid
partition or merge on high traffic grids.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

ticketmaster Design Considerations

A
  • system should be scalable, highly available to keep up w/ traffic surges when popular movies come out
  • no partial ticket orders
  • to stop abuse, we can restrict users from booking more than 10 seats at a time
17
Q

ticketmaster high level design

A

web servers will handle users’ sessions and application servers will handle ticket management, storing data in the databases and working w/ cache servers to process reservations.

18
Q

ticketmaster - which APIs should we have?

A

SearchMovies, ReserveSeats

19
Q

SearchMovies api def

A

SearchMovies(api_dev_key, keyword, city, lat_long, radius, start_datetime, end_datetime, postal_code, includeSpellcheck, results_per_page, sorting_order)

start_datetime (string): Filter movies with a starting datetime.
includeSpellcheck (Enum: “yes” or “no”):
sorting_order (string): Sorting order of the search result. Some allowable values : ‘name,asc’, ‘date,desc’, ‘distance,asc’, ‘date,name,desc’.

Returns
{
“MovieID”:
“ShowID”:
“Title”:

“Seats”: [
{ “Type”: (reg/prem)
“Price”:
“Status”: full/almost full/etc
}
{ … } ]
}

19
Q

ReserveSeats api def

A

ReserveSeats(api_dev_key, session_id, movie_id, show_id, seats_to_reserve[])

session_id (string): User’s session ID to track this reservation. Once the reservation time expires,
user’s reservation on the server will be removed using this ID.

Returns: (JSON)
Returns the status of the reservation, which would be one of the following: 1) “Reservation Successful” 2) “Reservation Failed - Show is Full,” 3) “Reservation Failed - Retry, as other users are holding reserved seats”.

20
Q

ticketmaster db design

A

standard tables for movies, users, cities.
4. payments (fk is BookingID)
5. cinemas (fk is CityID)
6. cinema halls (fk is CinemaID)
7. cinema seats, for the physical seat only. SeatNumber, Type. (fk is CinemaHallID)
8. shows (fks: CinemaHallID, MovieID)
9. bookings (fks: UserID, ShowID)
10. show seats (fks: ShowID, CinemaSeatID, BookingID)

21
Q

ticketmaster - what are the 2 services we need?

A

ActiveReservationService: keep track of all active reservations (not yet booked) and remove expired reservations.
WaitingUserService: keep track of all the waiting user requests and, as soon as enough seats become available, it will notify the longest waiting user to choose the seats

22
Q

ActiveReservationsService

A

We can keep all the reservations of a ‘show’ in the DB and in memory in a HT:
key: ShowID
val: Linked HashMap; { ‘BookingID’ : creation timestamp }
LHM allows us to jump to any reservation to remove it when the booking is complete. Its head points to the oldest reservation, so that the reservation can expire when the timeout is reached.

In the db, we store the reservation in the ‘Booking’ table and the expiry time will be in the Timestamp column. The ‘Status’ field will have a value of ‘Reserved (1)’ and, when booking is complete, ‘Status’ changes to ‘Booked (2)’ and the reservation record is removed from the above LHM.

23
Q

WaitingUsersService

A

We can keep all the waiting users of a ‘show’ in the DB and in memory in a HT:
key: ShowID
val: Linked HashMap; { UserID: time they started waiting }
- Clients can use Long Polling to keep themselves updated on their reservation status.

24
Q

ticketmaster - how to handle reservation expiration?

A

ActiveReservationsService keeps track of expiry times. Client will be shown a timer for expiry time, which could be a little out of sync with the server, so we can add a 5s buffer on the server. This ensures that the client never times out after the server, preventing a successful purchase.

25
Q

ticketmaster - concurrency

A

2 users must not be able to book the same seat.
- If we are using an SQL server we can use Transaction Isolation Levels to lock the rows before we can update them.
One thing to note here; within a transaction, if we read rows, we get a write lock on them so that they can’t be updated by anyone else.
Once the above database transaction is successful, we can start tracking the reservation in
ActiveReservationService.

26
Q

Serializable

A

‘Serializable’ is the highest Transaction Isolation Level (sql) and guarantees safety from Dirty (reading uncommited data), Nonrepeatable (reading a row twice but getting different data each time), and Phantoms (retrieves rows using the same criteria and new rows are returned).

27
Q

What happens when ActiveReservationsService or WaitingUsersService crashes?

A

have a master-slave configuration so that, when the master crashes, the slave can take over. Similarly, we’ll have a master-slave setup for databases to make them fault tolerant.