Systems Design Flashcards
What is ML systems design?
It takes a systems approach to MLOps: the ML system is considered holistically so that all of its components (business requirements, the data stack, infrastructure, deployment, monitoring) and their stakeholders can work together to satisfy the specified objectives and requirements
What are the four general requirements for ML systems?
- Reliability
- Scalability
- Maintainability
- Adaptability
Why is reliability a requirement for ML systems?
The system should continue to perform the correct function at the desired level of performance even in the face of adversity (hardware or software faults, human error). Traditional software usually raises an explicit error when it fails, whereas an ML system can fail silently, for example by continuing to return poor predictions without any error
Why is scalability a requirement for ML systems?
ML systems can grow in multiple ways: complexity (more parameters), traffic volume (more predictions per unit of time), model count (more use cases). These are examples of resource scaling, but handling growth also includes artifact management
Why is maintainability a requirement for ML systems?
Workloads should be structured and infrastructure set up so that all contributors can work with the tools they prefer. Code should be documented. Code, data, and artifacts should be versioned. Models should be sufficiently reproducible.
Why is adaptability a requirement for ML systems?
To adapt to shifting data distributions and business requirements, the system should have some capacity both for discovering where performance can be improved and for allowing updates without service interruption
What are some examples of nonprobability sampling?
- Convenience sampling
- Snowball sampling
- Judgment sampling
- Quota sampling
What is convenience sampling?
A nonprobability sampling method where samples of data are selected based on their availability. This sampling method is popular because it’s convenient
What is snowball sampling?
A nonprobability sampling method where future samples are selected based on existing samples. For example, to scrape legitimate Twitter accounts, you start with a small number of accounts, then you scrape all the accounts they follow, and so on
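The expansion step can be sketched as a breadth-first walk over a follow graph. This is only an illustration: the in-memory `follows` dictionary, the account names, and the `max_size` cap are hypothetical stand-ins for whatever a real scraper would fetch.

```python
from collections import deque


def snowball_sample(follows, seeds, max_size):
    """Grow a sample from seed accounts by walking 'follows' edges breadth-first."""
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep their order
    selected = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(selected) < max_size:
        account = queue.popleft()
        selected.append(account)
        # Every account followed by an already selected account becomes a candidate.
        for followed in follows.get(account, []):
            if followed not in seen:
                seen.add(followed)
                queue.append(followed)
    return selected


# Hypothetical toy follow graph: account -> accounts it follows.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["alice"],
}
sample = snowball_sample(follows, seeds=["alice"], max_size=4)
```

Which accounts end up in the sample depends entirely on the seeds, which is why this is not a probability sample.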
What is judgment sampling?
A nonprobability sampling method where the experts decide what samples to include
What is quota sampling?
A nonprobability sampling method where samples are selected based on quotas for certain slices of data without any randomization. For example, when the same number of samples is selected per age group for a survey, regardless of the actual age distribution
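A minimal sketch of quota sampling, assuming records arrive in some fixed order and the first ones to fill each quota are kept. The respondent tuples, the age-group labels, and the `quota_sample` helper are hypothetical.

```python
from collections import defaultdict


def quota_sample(records, group_of, quota):
    """Keep the first `quota` records seen for each group, in arrival order."""
    counts = defaultdict(int)
    selected = []
    for record in records:
        group = group_of(record)
        if counts[group] < quota:
            counts[group] += 1
            selected.append(record)
    return selected


# Hypothetical survey respondents: (respondent_id, age_group).
respondents = [
    (1, "18-29"), (2, "18-29"), (3, "30-49"), (4, "18-29"),
    (5, "50+"), (6, "30-49"), (7, "30-49"), (8, "50+"),
]
picked = quota_sample(respondents, group_of=lambda r: r[1], quota=2)
```

No random number generator is involved, so the result is determined by the order in which records are encountered.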
What are some examples of random sampling?
- Simple random sampling
- Stratified sampling
- Weighted sampling
- Reservoir sampling
- Importance sampling
What is simple random sampling?
In this form of random sampling, all samples in the population are given equal probabilities of being selected
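As a small illustration, Python's standard library can draw such a sample without replacement. The population of integers and the `seed` argument are assumptions made for the example.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct items; every item is equally likely to be chosen."""
    rng = random.Random(seed)  # a seeded generator keeps the draw reproducible
    return rng.sample(population, k)


# Hypothetical population of 1,000 item ids.
population = list(range(1000))
sample = simple_random_sample(population, k=10, seed=0)
```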
What is a drawback of simple random sampling?
Rare categories of data might not appear in your selection
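To get a feel for how likely this is, the chance of missing a rare category entirely can be computed from counts alone, assuming sampling without replacement. The numbers below (10 rare items among 10,000, with a sample of 100) are made up for illustration.

```python
from math import comb


def prob_rare_class_missing(population_size, rare_count, sample_size):
    """Probability that a simple random sample contains no rare item at all."""
    # Ways to choose the whole sample from non-rare items, over all possible samples.
    return comb(population_size - rare_count, sample_size) / comb(population_size, sample_size)


# Hypothetical setup: 10 rare items among 10,000, sample of 100 (1%).
p_missing = prob_rare_class_missing(population_size=10_000, rare_count=10, sample_size=100)
```

Under these assumptions `p_missing` comes out at roughly 0.90, so about nine out of ten such samples would contain no rare item.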
What is stratified sampling?
A random sampling method where the population is divided into relevant groups and each group is sampled separately. For example, to sample 1% of data that has classes A and B, 1% can be sampled from each class separately. This way, both classes will be included in the selection, no matter how rare class A or B is
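A minimal sketch of the per-class draw, using only the standard library. The `stratified_sample` helper and the toy data are hypothetical. Rounding each group's sample size up to at least one item is an assumption added here, so that a very small class is never rounded down to zero.

```python
import math
import random
from collections import defaultdict


def stratified_sample(records, label_of, fraction, seed=None):
    """Sample the same fraction from each group (stratum) independently."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[label_of(record)].append(record)
    selected = []
    for members in strata.values():
        # Assumption: round up so every non-empty stratum yields at least one item.
        k = max(1, math.ceil(fraction * len(members)))
        selected.extend(rng.sample(members, k))
    return selected


# Hypothetical data: 990 records of class "B" and 10 of the rare class "A".
records = [("B", i) for i in range(990)] + [("A", i) for i in range(10)]
sample = stratified_sample(records, label_of=lambda r: r[0], fraction=0.01, seed=0)
```

With this toy data the sample holds 10 records of class B and 1 of class A, so the rare class is always represented.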