E4 Flashcards
Some of the challenges of creating big data applications
- Scaling problems
- Fault-tolerance issues
- Data corruption issues
Best approach to scaling problems?
Use multiple database servers and spread the table across all servers.
Each server will have a subset of the data.
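A minimal sketch of spreading one table across servers, so that each server holds a subset of the rows. The choice of SHA-256 with a modulo, the function names `shard_for` and `distribute`, and the `user<N>` keys are all assumptions made for illustration; the card does not prescribe a particular scheme.

```python
import hashlib


def shard_for(key: str, num_servers: int) -> int:
    """Map a row key to a server index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce to a server index.
    return int.from_bytes(digest[:8], "big") % num_servers


def distribute(keys, num_servers):
    """Group keys by the server that would store them."""
    shards = {i: [] for i in range(num_servers)}
    for key in keys:
        shards[shard_for(key, num_servers)].append(key)
    return shards


# Hypothetical keys: 1000 users spread over 4 database servers.
users = [f"user{i}" for i in range(1000)]
shards = distribute(users, 4)
for server, rows in shards.items():
    print(f"server {server}: {len(rows)} rows")
```

Every key lands on exactly one server, and the same key always maps to the same server as long as the number of servers does not change.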
Scaling using multiple databases. How?
- Deploy more database servers
- Use a different hash function
- Redistribute the users according to the new hash function
- Change the code of our application
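The steps above can be sketched as follows, assuming a simple hash-modulo placement (the card does not say which hash function is used). `server_for`, `plan_migration`, and the `user<N>` identifiers are hypothetical names introduced for this example.

```python
import hashlib


def server_for(user_id: str, num_servers: int) -> int:
    """Pick a server for a user with a hash-modulo scheme (assumed)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def plan_migration(user_ids, old_count, new_count):
    """Return {user_id: (old_server, new_server)} for users that must move."""
    moves = {}
    for uid in user_ids:
        old = server_for(uid, old_count)
        new = server_for(uid, new_count)
        if old != new:
            moves[uid] = (old, new)
    return moves


# Hypothetical scenario: growing from 4 to 5 database servers.
user_ids = [f"user{i}" for i in range(10_000)]
moves = plan_migration(user_ids, old_count=4, new_count=5)
print(f"{len(moves)} of {len(user_ids)} users must move")
```

With plain modulo placement, most users change servers whenever the server count changes, which is why redistribution and the accompanying application changes are costly steps.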
Fault-tolerance issues
When we run many database servers, it becomes common for the hard drive in one of them to fail
- We need to deal with having one of the databases down
- We need to add backups to each of the databases
Our system is not resilient to hardware errors
Data corruption issues
At some point we deploy code with a bug: instead of incrementing each video's view count by one, our code increments it by two. We notice the mistake only 24 hours later.
Now we have corrupted data: every video watched in the past 24 hours has an inflated view count. How do we solve this?
Our system is not resilient to human errors
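A small sketch of why this scenario is hard to repair, assuming the database stores only a running total per video and updates it in place (an assumption; the card does not describe the storage model). The `record_view` function, the video ids, and the view numbers are made up for illustration.

```python
from collections import Counter


def record_view(counts: Counter, video_id: str, step: int = 1) -> None:
    """Update the stored total in place; the individual view is not kept."""
    counts[video_id] += step


counts = Counter()

# Before the bad deploy: 3 real views of video "a", counted correctly.
for _ in range(3):
    record_view(counts, "a")

# During the buggy 24 hours: each real view adds 2 instead of 1.
for video_id in ["a", "a", "b"]:
    record_view(counts, video_id, step=2)

# What the totals should have been: 5 views of "a", 1 view of "b".
true_counts = Counter({"a": 5, "b": 1})

print("stored:", dict(counts))
print("true:  ", dict(true_counts))
```

The stored totals alone do not reveal how many of the views happened during the buggy window, so the inflation cannot be subtracted out without some other record of the individual views.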
The desired properties of Big Data systems are related both to
Complexity and scalability
Complexity
generally used to characterize something with many parts where those parts interact with each other in multiple ways
Scalability
ability to maintain performance in the face of increasing data or load by adding resources to the system
A big data system must
- perform well
- be resource-efficient
- be easy to reason about
Desired properties of a Big Data system
- Robustness and fault tolerance
- Low latency
- Minimal maintenance
- Ad hoc queries