0. Intro Flashcards
What is vertical scaling and what are its limitations?
Vertical scaling refers to adding more resources to a single server or machine to handle increased load. This typically means upgrading the existing hardware: adding more RAM, installing a faster or more powerful processor, increasing disk space, or moving to faster storage such as SAS drives or SSDs.
The main challenge with vertical scaling is that you eventually hit a ceiling that more money can't overcome. A single machine can only be upgraded as far as the best hardware currently on the market allows - no machine offers unlimited resources. Even with an unlimited budget, you can only put so many CPUs or cores into a single server before you run into the limits of current technology.
Additionally, vertical scaling often becomes financially inefficient at a certain point. The cost of high-end hardware components typically increases exponentially rather than linearly, meaning you pay significantly more for each incremental improvement in performance. This approach also doesn’t address the fundamental risk of having all your resources concentrated in a single machine.
What is horizontal scaling and how does it differ from vertical scaling?
Horizontal scaling involves adding more machines to your infrastructure rather than upgrading a single machine. Instead of buying one extremely powerful (and expensive) server, you distribute your workload across multiple less expensive servers. This approach fundamentally changes how you architect your system, as you need to implement mechanisms to coordinate between these multiple machines.
The key advantage of horizontal scaling is that there’s no theoretical limit to how far you can scale - you can keep adding more machines as needed. It’s often more cost-effective because you can use commodity hardware rather than specialized high-end equipment. This approach also provides better redundancy since your system isn’t dependent on a single machine. If one server fails, the others can continue handling requests.
However, horizontal scaling introduces new complexities. You need to implement load balancing to distribute traffic across your servers effectively. You must handle session management across multiple servers, ensure data consistency, and manage database replication. These challenges require more sophisticated architectural solutions, but they ultimately provide a more robust and scalable system than vertical scaling alone.
What are the different approaches to load balancing and their pros/cons?
Load balancing is essential for distributing traffic across multiple servers, and there are several approaches, each with distinct advantages and challenges.
Round Robin DNS is one of the simplest approaches, where your DNS server returns different IP addresses for each request in rotation. While this is easy to implement and doesn’t require special hardware, it has significant limitations due to DNS caching. Clients might cache the DNS response, meaning they’ll continue hitting the same server even if it’s overloaded.
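As a minimal sketch, round robin DNS needs nothing more than multiple A records for the same hostname in a BIND-style zone file (the addresses below are placeholders):

```
; Two web servers behind one hostname. BIND rotates the order of
; the answers it returns, spreading clients across both addresses.
www    IN    A    192.0.2.10
www    IN    A    192.0.2.11
```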
Hardware load balancers are purpose-built devices that can make intelligent routing decisions based on factors like server load, response time, and current connections. They often provide excellent performance and reliability but come at a significant cost - enterprise-grade load balancers can run upwards of $100,000. They also need to be deployed in pairs for redundancy, so the load balancer itself doesn't become a single point of failure.
Software load balancers, like HAProxy or Linux Virtual Server, provide a more cost-effective alternative. They run on commodity hardware, offer significant flexibility in configuration, can implement various load-distribution algorithms, and often provide features like health checking and SSL termination. While they require more technical expertise to configure and maintain, they're often the most practical choice for many organizations.
Application-aware load balancing goes a step further by making routing decisions based on the content or type of request. For example, you might route image requests to servers optimized for serving static content while sending dynamic requests to application servers. This improves resource utilization but requires more complex configuration and maintenance.
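For a concrete sketch, here is a minimal HAProxy configuration combining both ideas above - round-robin balancing across backends, plus application-aware routing that sends static assets to a dedicated backend (server names and addresses are hypothetical):

```
frontend http-in
    bind *:80
    # Route requests for static assets to servers tuned for static content
    acl is_static path_end .jpg .png .gif .css .js
    use_backend static_servers if is_static
    default_backend app_servers

backend app_servers
    balance roundrobin
    server app1 10.0.0.11:80 check
    server app2 10.0.0.12:80 check

backend static_servers
    balance roundrobin
    server static1 10.0.0.21:80 check
```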
What are the key considerations when choosing a web hosting company?
When selecting a web hosting company, several critical factors need to be evaluated.
First, accessibility is crucial - you need to ensure that your hosting provider’s IP ranges aren’t blocked in regions where your users are located. This is particularly important if you’re serving users in countries or organizations that might restrict access to certain providers.
SFTP support is another essential feature. Unlike regular FTP, SFTP (Secure File Transfer Protocol) encrypts all traffic, including usernames and passwords. While encrypting regular content like images or public HTML files might not seem crucial, protecting authentication credentials is vital for security. Without encryption, anyone monitoring network traffic could potentially capture your login information.
You should be wary of hosting companies offering “unlimited everything” for very low prices. These offers typically involve shared hosting, where hundreds of customers share the same physical server. While this might work for small websites, it’s not suitable for applications that need reliable resources or plan to scale. The host is typically banking on the fact that most customers won’t use many resources, but this can lead to performance issues if you need consistent resource availability.
How can you improve performance?
Performance can be improved at several levels covered in these cards: scale vertically (faster hardware) or horizontally (more machines behind a load balancer), cache aggressively (static HTML generation, MySQL's query cache, memcached), replicate databases so reads can be spread across slaves, and partition data so each server handles a smaller share of the load.
How do you handle session state in a distributed environment?
Managing session state across multiple servers presents several challenges and possible solutions.
1. Shared storage: all web servers write their session data to a common file server or database, implemented with technologies like Fibre Channel, iSCSI, or NFS. This introduces a single point of failure unless made properly redundant.
2. Sticky sessions via load balancer cookies: the load balancer inserts a cookie containing either the server identifier or a random number mapped to a specific server, ensuring subsequent requests from the same user go to the same backend server (see the sketch after this list). This doesn't require shared storage, but it can create uneven load distribution if certain users are more active than others.
3. Database-backed session storage: session data is stored in a database like MySQL rather than in files. This can be combined with database replication for redundancy and works well with caching mechanisms like memcached to reduce database load.
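A minimal sketch of option 2 using HAProxy's cookie insertion (server names and addresses are made up): the load balancer sets a SERVERID cookie on the first response and uses it to pin later requests to the same backend.

```
backend web
    balance roundrobin
    # Insert a cookie identifying the chosen server; subsequent
    # requests carrying that cookie are routed back to it.
    cookie SERVERID insert indirect nocache
    server web1 10.0.0.11:80 check cookie web1
    server web2 10.0.0.12:80 check cookie web2
```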
What are the different types of database replication architectures and their use cases?
In the context of MySQL replication, there are several common architectures. The Master-Slave configuration involves one master database that handles all writes (INSERT, UPDATE, DELETE operations) and one or more slave databases that handle reads (SELECT queries). This setup works particularly well for read-heavy workloads, like Facebook’s early infrastructure where profile views were much more common than profile updates.
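As a rough sketch of how a classic master-slave pair is wired up (server IDs and credentials are placeholders; this is the older MySQL syntax, and details vary by version):

```ini
# my.cnf on the master: give it an ID and enable the binary log
[mysqld]
server-id = 1
log_bin   = mysql-bin

# my.cnf on the slave: a distinct ID, optionally read-only
[mysqld]
server-id = 2
read_only = 1
```

```sql
-- On the slave, point it at the master and start replicating
CHANGE MASTER TO MASTER_HOST='master.example.com',
  MASTER_USER='repl', MASTER_PASSWORD='secret';
START SLAVE;
```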
The Master-Master configuration allows writes to multiple master databases, with changes synchronizing between them. This provides better fault tolerance since either master can handle both reads and writes if the other fails. However, it’s more complex to maintain and requires careful consideration of potential conflicts.
For larger scale operations, you might implement partitioning alongside replication. For example, early Facebook used different servers for different universities, effectively partitioning data by school. While this simplified scaling initially, it created challenges when implementing cross-school features like messaging.
What role does caching play in scalable architectures and what are the different caching strategies?
Caching is crucial for performance optimization and can be implemented at multiple levels. At the content level, sites like Craigslist implement file-based caching by generating static HTML files from dynamic content. While this provides excellent performance since web servers are optimized for serving static files, it creates challenges for content updates and design changes.
Database query caching can be enabled in MySQL through configuration settings. This caches the results of identical SELECT queries, providing significant performance improvements for frequently accessed, rarely changed data. However, the cache needs to be carefully sized and managed to prevent memory issues.
Memcached provides a more flexible caching solution by offering a distributed memory caching system. It stores key-value pairs in RAM for quick access and can be used across multiple servers. When memory fills up, it automatically removes the least recently used items. This is particularly effective for storing user sessions, profile data, and other frequently accessed information that doesn’t need to be permanently stored.
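The classic usage pattern with memcached is cache-aside: check the cache first, fall back to the database on a miss, then populate the cache. A minimal Python sketch, assuming the pymemcache client library and a hypothetical fetch_user_json_from_db() helper that returns a JSON string:

```python
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def get_user_json(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)           # returns bytes, or None on a miss
    if cached is not None:
        return cached.decode()
    user_json = fetch_user_json_from_db(user_id)  # hypothetical DB helper
    cache.set(key, user_json, expire=300)         # keep it for five minutes
    return user_json
```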
How do you implement security in a scaled infrastructure?
Security in a scaled infrastructure requires careful consideration of network architecture and access controls. At the network level, you typically want to implement different security zones. The public-facing load balancers should only accept traffic on necessary ports (typically 80 and 443 for HTTP/HTTPS, and possibly 22 for SSH management).
Internal network traffic between load balancers and web servers often doesn’t need to be encrypted since it’s within your controlled network. This “SSL termination” at the load balancer level reduces CPU overhead on web servers and simplifies certificate management. However, you need to ensure your internal network is properly secured.
Database servers should never be directly accessible from the internet. They should only accept connections from application servers, and only on the necessary ports (like 3306 for MySQL). This follows the principle of least privilege, where each component only has the access it absolutely needs to function.
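As an illustrative sketch, firewall rules on a database server might enforce this with iptables (the application subnet 10.0.1.0/24 is a placeholder):

```sh
# Accept MySQL connections only from the application-server subnet
iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport 3306 -j ACCEPT
# Drop MySQL traffic from everywhere else
iptables -A INPUT -p tcp --dport 3306 -j DROP
```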
What strategies can be used for high availability in a scaled environment?
High availability requires eliminating single points of failure throughout your infrastructure. For load balancers, this typically means implementing active-active or active-passive pairs. In active-active configuration, both load balancers handle traffic simultaneously, while in active-passive, one stands ready to take over if the primary fails. They communicate through “heartbeat” messages to detect failures.
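One common way to implement such an active-passive pair is VRRP via keepalived: the passive node takes over a shared virtual IP when heartbeats from the active node stop. A minimal sketch (interface name and addresses are placeholders):

```
vrrp_instance VI_1 {
    state MASTER            # the peer node is configured as BACKUP
    interface eth0
    virtual_router_id 51
    priority 100            # the backup uses a lower priority
    advert_int 1            # heartbeat interval in seconds
    virtual_ipaddress {
        203.0.113.10        # clients connect to this floating IP
    }
}
```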
For network infrastructure, redundancy includes having multiple switches, with servers connected to both (dual-homing). This ensures network connectivity even if one switch fails. Power redundancy is achieved through multiple power supplies in servers and multiple power sources for the data center.
Geographic redundancy is implemented through multiple data centers in different availability zones. These should have separate power supplies, network connections, and physical security. DNS-based global load balancing can direct traffic between data centers based on factors like user location and data center health.
What considerations are important when implementing partitioning in a scaled system?
Partitioning involves dividing your data or services across multiple servers based on specific criteria. This can be done horizontally (sharding) where different rows of the same table are stored on different servers, or vertically where different types of data are separated (like having separate servers for images versus HTML content).
Early Facebook implemented a form of partitioning by school, with separate servers for different universities. While this worked initially, it created challenges when implementing cross-partition features like inter-school messaging. This highlights the importance of choosing partition keys that align with your application’s access patterns.
When implementing partitioning, you need to consider how to handle cross-partition queries, maintain consistency across partitions, and manage partition growth. You also need a strategy for rebalancing data when adding new partitions or when existing partitions become unbalanced.
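A minimal Python sketch of horizontal partitioning, mapping a record's key to one of N shards (the modulo scheme is illustrative; real systems often use consistent hashing so that adding a shard doesn't remap most keys):

```python
SHARDS = ["db1.example.com", "db2.example.com", "db3.example.com"]

def shard_for(user_id: int) -> str:
    """Deterministically map a user to one database server."""
    return SHARDS[user_id % len(SHARDS)]

# All reads and writes for user 42 go to the same shard
host = shard_for(42)
```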
What is SFTP and why is it important for web hosting?
SFTP (Secure File Transfer Protocol) is critical for web hosting security. Unlike regular FTP, SFTP encrypts all traffic between client and server. While encrypting regular content like images or public HTML files might seem unnecessary since they’ll be publicly accessible anyway, SFTP’s encryption is crucial for protecting sensitive data like usernames and passwords during file transfers. Without this encryption, anyone monitoring network traffic could potentially capture login credentials, making standard FTP a significant security risk in modern web hosting environments.
What is VPS hosting and how does it differ from shared hosting?
A VPS (Virtual Private Server) represents a significant step up from shared hosting in terms of resource isolation and control. Using hypervisor technology from companies like VMware or Citrix, VPS providers take powerful physical servers and divide them into multiple virtual machines. Each VPS gets its own dedicated portion of the server’s resources and its own operating system instance.
Unlike shared hosting where hundreds of customers share the same operating system and resources, VPS provides guaranteed resource allocation and complete operating system isolation. However, system administrators at the VPS company may still have access to your virtual machine, especially through features like single-user mode or diagnostic mode. For complete privacy, organizations would need to operate their own physical servers.
What are some common causes of scalability failures despite redundancy?
Even with careful planning, several factors can still cause system-wide failures:
Building-level failures: Power outages, fires, or natural disasters affecting an entire data center
Network connectivity issues: Problems with ISPs or network infrastructure
Cascading failures: When the failure of one component overloads others
Multi-zone failures: As seen with Amazon Web Services, where multiple availability zones can be affected simultaneously
DNS caching issues: Even after failing over to backup systems, cached DNS entries can continue directing traffic to failed systems
Human error: Misconfiguration or accidental shutdowns
These scenarios highlight the importance of not just having redundancy within a facility, but also maintaining geographic distribution of services and having comprehensive disaster recovery plans.
How do modern CPUs and cores affect server performance?
Modern server performance is significantly impacted by CPU architecture. Most current servers have multiple CPUs, with each CPU containing multiple cores. For example, a quad-core processor can literally perform four tasks simultaneously, unlike older single-core processors that could only handle one task at a time.
While older systems created the illusion of multitasking by rapidly switching between tasks (giving each program a split second of CPU time), modern multi-core systems can perform true parallel processing. This is particularly beneficial for web servers handling multiple simultaneous requests. However, software must be properly designed to take advantage of multiple cores, and not all applications can effectively utilize all available cores.
What is Amazon EC2 and how does it relate to scalability?
Amazon EC2 (Elastic Compute Cloud) is a self-service VPS platform that allows you to spawn virtual machines on demand. The key advantage is its elasticity - you can automatically scale up or down based on demand. For instance, if your site suddenly gets popular (like being featured on Reddit or getting “slashdotted”), you can automatically spawn more servers to handle the increased traffic. When traffic subsides, these servers can automatically power down, helping manage costs since you only pay for the time your instances are running (typically charged per minute).
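As a hedged sketch, spawning an extra instance programmatically with the boto3 library might look like this (the AMI ID and instance type are placeholders; in practice an Auto Scaling group usually does this for you):

```python
import boto3

ec2 = boto3.client("ec2")

# Launch one more web server when traffic spikes
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
```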
What approaches can websites take for handling static vs dynamic content?
The lecture used Craigslist as an example of an interesting approach to content delivery. Despite being a dynamic website where users can post ads, Craigslist saves the generated pages as static HTML files (a minimal sketch of this generate-on-write idea follows the lists below). This approach has several implications:
Advantages:
Very fast content delivery since Apache excels at serving static files
Reduced server load since pages don’t need to be generated for each request
Can handle high traffic volumes efficiently
Disadvantages:
Requires additional storage space since you’re storing both the database content and generated HTML
Makes site-wide design changes difficult since you need to regenerate all HTML files
Duplicates common elements (headers, footers) across many files
Less flexible for real-time content updates
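A minimal Python sketch of the generate-on-write idea referenced above: when an ad is saved, render it once to a static file that the web server can serve without touching the database (the ad fields and output path are hypothetical):

```python
from pathlib import Path

# Hypothetical ad row as it might come out of the database
ad = {"id": 123, "title": "Used bike", "body": "Good condition, $50."}

def render_static_page(ad: dict) -> None:
    """Write the ad out as a static HTML file so the web server
    can serve it later without regenerating it per request."""
    html = (
        "<html><head><title>{title}</title></head>"
        "<body><h1>{title}</h1><p>{body}</p></body></html>"
    ).format(**ad)
    Path("ads").mkdir(exist_ok=True)
    Path(f"ads/{ad['id']}.html").write_text(html)

render_static_page(ad)
```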
What role do different data center components play in a scaled infrastructure?
A complete data center infrastructure includes multiple components:
Switches:
Need redundant switches to avoid single points of failure
Servers should connect to multiple switches (dual-homing)
Must be configured to avoid network loops
Power Systems:
Redundant power supplies in servers
Multiple power sources for the facility
UPS systems and generators for backup
Network Connectivity:
Multiple internet connections from different providers
Different physical entry points to the building
Redundant internal networking
Physical Security:
Access controls
Environmental monitoring
Fire suppression systems
What challenges exist with multi-data center architectures?
Operating across multiple data centers introduces several complexities:
Data Synchronization:
Keeping databases synchronized across locations
Managing session state between centers
Ensuring consistent user experience
Traffic Routing:
Implementing global load balancing
Handling DNS-based routing
Managing failover between locations
Latency Issues:
Dealing with increased latency between data centers
Managing data replication delays
Handling cross-data center queries
Consistency Challenges:
Maintaining data consistency across locations
Handling conflicts in multi-master setups
Managing distributed transactions
What is the Query Cache in MySQL and how does it work?
MySQL’s query cache is a performance optimization feature that can be enabled by setting query_cache_type=1 in my.cnf configuration file. When enabled, MySQL stores the complete result set of SELECT queries along with the exact query text. If the same query is executed again and the underlying data hasn’t changed, MySQL can return the cached result instead of re-executing the query. This is particularly effective for read-heavy workloads where the same queries are executed frequently.
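A minimal my.cnf sketch enabling it (the cache size is arbitrary; note that the query cache was deprecated in MySQL 5.7 and removed entirely in 8.0, so this applies only to older versions):

```ini
[mysqld]
query_cache_type = 1     # cache SELECT results where possible
query_cache_size = 64M   # memory reserved for cached result sets
```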