0. Intro Flashcards
What is vertical scaling and what are its limitations?
Vertical scaling refers to adding more resources to a single server or machine to handle increased load. This typically means upgrading the existing hardware: adding more RAM, installing a faster or more powerful processor, increasing disk space, or moving to faster storage such as SAS drives or SSDs.
The main challenge with vertical scaling is that you eventually hit a ceiling that more money can't overcome. A single machine can only be upgraded as far as the best hardware currently on the market allows - no machine offers unlimited resources. Even with an unlimited budget, you can only put so many CPUs or cores into a single server before you run into the limits of current technology.
Additionally, vertical scaling often becomes financially inefficient at a certain point. The cost of high-end hardware components typically increases exponentially rather than linearly, meaning you pay significantly more for each incremental improvement in performance. This approach also doesn’t address the fundamental risk of having all your resources concentrated in a single machine.
What is horizontal scaling and how does it differ from vertical scaling?
Horizontal scaling involves adding more machines to your infrastructure rather than upgrading a single machine. Instead of buying one extremely powerful (and expensive) server, you distribute your workload across multiple less expensive servers. This approach fundamentally changes how you architect your system, as you need to implement mechanisms to coordinate between these multiple machines.
The key advantage of horizontal scaling is that there’s no theoretical limit to how far you can scale - you can keep adding more machines as needed. It’s often more cost-effective because you can use commodity hardware rather than specialized high-end equipment. This approach also provides better redundancy since your system isn’t dependent on a single machine. If one server fails, the others can continue handling requests.
However, horizontal scaling introduces new complexities. You need to implement load balancing to distribute traffic across your servers effectively. You must handle session management across multiple servers, ensure data consistency, and manage database replication. These challenges require more sophisticated architectural solutions, but they ultimately provide a more robust and scalable system than vertical scaling alone.
What are the different approaches to load balancing and their pros/cons?
Load balancing is essential for distributing traffic across multiple servers, and there are several approaches, each with distinct advantages and challenges.
Round Robin DNS is one of the simplest approaches, where your DNS server returns different IP addresses for each request in rotation. While this is easy to implement and doesn’t require special hardware, it has significant limitations due to DNS caching. Clients might cache the DNS response, meaning they’ll continue hitting the same server even if it’s overloaded.
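As a minimal sketch, round robin DNS needs nothing more than multiple A records for the same hostname in a BIND-style zone file (the addresses below are placeholders):

```
; Two web servers behind one hostname. BIND rotates the order of
; the answers it returns, spreading clients across both addresses.
www    IN    A    192.0.2.10
www    IN    A    192.0.2.11
```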
Hardware load balancers are purpose-built devices that can make intelligent routing decisions based on factors like server load, response time, and current connections. They often provide excellent performance and reliability but come at a significant cost - enterprise-grade load balancers can run upwards of $100,000. They also need to be deployed in pairs for redundancy, so the load balancer itself doesn't become a single point of failure.
Software load balancers, like HAProxy or Linux Virtual Server, provide a more cost-effective alternative. They run on commodity hardware, offer significant flexibility in configuration, can implement various load-distribution algorithms, and often provide features like health checking and SSL termination. While they require more technical expertise to configure and maintain, they're often the most practical choice for many organizations.
Application-aware load balancing goes a step further by making routing decisions based on the content or type of request. For example, you might route image requests to servers optimized for serving static content while sending dynamic requests to application servers. This improves resource utilization but requires more complex configuration and maintenance.
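For a concrete sketch, here is a minimal HAProxy configuration combining both ideas above - round-robin balancing across backends, plus application-aware routing that sends static assets to a dedicated backend (server names and addresses are hypothetical):

```
frontend http-in
    bind *:80
    # Route requests for static assets to servers tuned for static content
    acl is_static path_end .jpg .png .gif .css .js
    use_backend static_servers if is_static
    default_backend app_servers

backend app_servers
    balance roundrobin
    server app1 10.0.0.11:80 check
    server app2 10.0.0.12:80 check

backend static_servers
    balance roundrobin
    server static1 10.0.0.21:80 check
```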
What are the key considerations when choosing a web hosting company?
When selecting a web hosting company, several critical factors need to be evaluated.
First, accessibility is crucial - you need to ensure that your hosting provider’s IP ranges aren’t blocked in regions where your users are located. This is particularly important if you’re serving users in countries or organizations that might restrict access to certain providers.
SFTP support is another essential feature. Unlike regular FTP, SFTP (Secure File Transfer Protocol) encrypts all traffic, including usernames and passwords. While encrypting regular content like images or public HTML files might not seem crucial, protecting authentication credentials is vital for security. Without encryption, anyone monitoring network traffic could potentially capture your login information.
You should be wary of hosting companies offering “unlimited everything” for very low prices. These offers typically involve shared hosting, where hundreds of customers share the same physical server. While this might work for small websites, it’s not suitable for applications that need reliable resources or plan to scale. The host is typically banking on the fact that most customers won’t use many resources, but this can lead to performance issues if you need consistent resource availability.
How can you improve performance?
Performance can be improved at several levels covered in these cards: scale vertically (faster hardware) or horizontally (more machines behind a load balancer), cache aggressively (static HTML generation, MySQL's query cache, memcached), replicate databases so reads can be spread across slaves, and partition data so each server handles a smaller share of the load.
How do you handle session state in a distributed environment?
Managing session state across multiple servers presents several challenges and possible solutions.
1. Shared storage: all web servers write their session data to a common file server or database, implemented with technologies like Fibre Channel, iSCSI, or NFS. This introduces a single point of failure unless made properly redundant.
2. Sticky sessions via load balancer cookies: the load balancer inserts a cookie containing either the server identifier or a random number mapped to a specific server, ensuring subsequent requests from the same user go to the same backend server (see the sketch after this list). This doesn't require shared storage, but it can create uneven load distribution if certain users are more active than others.
3. Database-backed session storage: session data is stored in a database like MySQL rather than in files. This can be combined with database replication for redundancy and works well with caching mechanisms like memcached to reduce database load.
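A minimal sketch of option 2 using HAProxy's cookie insertion (server names and addresses are made up): the load balancer sets a SERVERID cookie on the first response and uses it to pin later requests to the same backend.

```
backend web
    balance roundrobin
    # Insert a cookie identifying the chosen server; subsequent
    # requests carrying that cookie are routed back to it.
    cookie SERVERID insert indirect nocache
    server web1 10.0.0.11:80 check cookie web1
    server web2 10.0.0.12:80 check cookie web2
```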
What are the different types of database replication architectures and their use cases?
In the context of MySQL replication, there are several common architectures. The Master-Slave configuration involves one master database that handles all writes (INSERT, UPDATE, DELETE operations) and one or more slave databases that handle reads (SELECT queries). This setup works particularly well for read-heavy workloads, like Facebook’s early infrastructure where profile views were much more common than profile updates.
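As a rough sketch of how a classic master-slave pair is wired up (server IDs and credentials are placeholders; this is the older MySQL syntax, and details vary by version):

```ini
# my.cnf on the master: give it an ID and enable the binary log
[mysqld]
server-id = 1
log_bin   = mysql-bin

# my.cnf on the slave: a distinct ID, optionally read-only
[mysqld]
server-id = 2
read_only = 1
```

```sql
-- On the slave, point it at the master and start replicating
CHANGE MASTER TO MASTER_HOST='master.example.com',
  MASTER_USER='repl', MASTER_PASSWORD='secret';
START SLAVE;
```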
The Master-Master configuration allows writes to multiple master databases, with changes synchronizing between them. This provides better fault tolerance since either master can handle both reads and writes if the other fails. However, it’s more complex to maintain and requires careful consideration of potential conflicts.
For larger scale operations, you might implement partitioning alongside replication. For example, early Facebook used different servers for different universities, effectively partitioning data by school. While this simplified scaling initially, it created challenges when implementing cross-school features like messaging.
What role does caching play in scalable architectures and what are the different caching strategies?
Caching is crucial for performance optimization and can be implemented at multiple levels. At the content level, sites like Craigslist implement file-based caching by generating static HTML files from dynamic content. While this provides excellent performance since web servers are optimized for serving static files, it creates challenges for content updates and design changes.
Database query caching can be enabled in MySQL through configuration settings. This caches the results of identical SELECT queries, providing significant performance improvements for frequently accessed, rarely changed data. However, the cache needs to be carefully sized and managed to prevent memory issues.
Memcached provides a more flexible caching solution by offering a distributed memory caching system. It stores key-value pairs in RAM for quick access and can be used across multiple servers. When memory fills up, it automatically removes the least recently used items. This is particularly effective for storing user sessions, profile data, and other frequently accessed information that doesn’t need to be permanently stored.
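The classic usage pattern with memcached is cache-aside: check the cache first, fall back to the database on a miss, then populate the cache. A minimal Python sketch, assuming the pymemcache client library and a hypothetical fetch_user_json_from_db() helper that returns a JSON string:

```python
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def get_user_json(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)           # returns bytes, or None on a miss
    if cached is not None:
        return cached.decode()
    user_json = fetch_user_json_from_db(user_id)  # hypothetical DB helper
    cache.set(key, user_json, expire=300)         # keep it for five minutes
    return user_json
```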
How do you implement security in a scaled infrastructure?
Security in a scaled infrastructure requires careful consideration of network architecture and access controls. At the network level, you typically want to implement different security zones. The public-facing load balancers should only accept traffic on necessary ports (typically 80 and 443 for HTTP/HTTPS, and possibly 22 for SSH management).
Internal network traffic between load balancers and web servers often doesn’t need to be encrypted since it’s within your controlled network. This “SSL termination” at the load balancer level reduces CPU overhead on web servers and simplifies certificate management. However, you need to ensure your internal network is properly secured.
Database servers should never be directly accessible from the internet. They should only accept connections from application servers, and only on the necessary ports (like 3306 for MySQL). This follows the principle of least privilege, where each component only has the access it absolutely needs to function.
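As an illustrative sketch, firewall rules on a database server might enforce this with iptables (the application subnet 10.0.1.0/24 is a placeholder):

```sh
# Accept MySQL connections only from the application-server subnet
iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport 3306 -j ACCEPT
# Drop MySQL traffic from everywhere else
iptables -A INPUT -p tcp --dport 3306 -j DROP
```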
What strategies can be used for high availability in a scaled environment?
High availability requires eliminating single points of failure throughout your infrastructure. For load balancers, this typically means implementing active-active or active-passive pairs. In active-active configuration, both load balancers handle traffic simultaneously, while in active-passive, one stands ready to take over if the primary fails. They communicate through “heartbeat” messages to detect failures.
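One common way to implement such an active-passive pair is VRRP via keepalived: the passive node takes over a shared virtual IP when heartbeats from the active node stop. A minimal sketch (interface name and addresses are placeholders):

```
vrrp_instance VI_1 {
    state MASTER            # the peer node is configured as BACKUP
    interface eth0
    virtual_router_id 51
    priority 100            # the backup uses a lower priority
    advert_int 1            # heartbeat interval in seconds
    virtual_ipaddress {
        203.0.113.10        # clients connect to this floating IP
    }
}
```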
For network infrastructure, redundancy includes having multiple switches, with servers connected to both (dual-homing). This ensures network connectivity even if one switch fails. Power redundancy is achieved through multiple power supplies in servers and multiple power sources for the data center.
Geographic redundancy is implemented through multiple data centers in different availability zones. These should have separate power supplies, network connections, and physical security. DNS-based global load balancing can direct traffic between data centers based on factors like user location and data center health.
What considerations are important when implementing partitioning in a scaled system?
Partitioning involves dividing your data or services across multiple servers based on specific criteria. This can be done horizontally (sharding) where different rows of the same table are stored on different servers, or vertically where different types of data are separated (like having separate servers for images versus HTML content).
Early Facebook implemented a form of partitioning by school, with separate servers for different universities. While this worked initially, it created challenges when implementing cross-partition features like inter-school messaging. This highlights the importance of choosing partition keys that align with your application’s access patterns.
When implementing partitioning, you need to consider how to handle cross-partition queries, maintain consistency across partitions, and manage partition growth. You also need a strategy for rebalancing data when adding new partitions or when existing partitions become unbalanced.
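A minimal Python sketch of horizontal partitioning, mapping a record's key to one of N shards (the modulo scheme is illustrative; real systems often use consistent hashing so that adding a shard doesn't remap most keys):

```python
SHARDS = ["db1.example.com", "db2.example.com", "db3.example.com"]

def shard_for(user_id: int) -> str:
    """Deterministically map a user to one database server."""
    return SHARDS[user_id % len(SHARDS)]

# All reads and writes for user 42 go to the same shard
host = shard_for(42)
```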
What is SFTP and why is it important for web hosting?
SFTP (Secure File Transfer Protocol) is critical for web hosting security. Unlike regular FTP, SFTP encrypts all traffic between client and server. While encrypting regular content like images or public HTML files might seem unnecessary since they’ll be publicly accessible anyway, SFTP’s encryption is crucial for protecting sensitive data like usernames and passwords during file transfers. Without this encryption, anyone monitoring network traffic could potentially capture login credentials, making standard FTP a significant security risk in modern web hosting environments.
What is VPS hosting and how does it differ from shared hosting?
A VPS (Virtual Private Server) represents a significant step up from shared hosting in terms of resource isolation and control. Using hypervisor technology from companies like VMware or Citrix, VPS providers take powerful physical servers and divide them into multiple virtual machines. Each VPS gets its own dedicated portion of the server’s resources and its own operating system instance.
Unlike shared hosting where hundreds of customers share the same operating system and resources, VPS provides guaranteed resource allocation and complete operating system isolation. However, system administrators at the VPS company may still have access to your virtual machine, especially through features like single-user mode or diagnostic mode. For complete privacy, organizations would need to operate their own physical servers.
What are some common causes of scalability failures despite redundancy?
Even with careful planning, several factors can still cause system-wide failures:
Building-level failures: Power outages, fires, or natural disasters affecting an entire data center
Network connectivity issues: Problems with ISPs or network infrastructure
Cascading failures: When the failure of one component overloads others
Multi-zone failures: As seen with Amazon Web Services, where multiple availability zones can be affected simultaneously
DNS caching issues: Even after failing over to backup systems, cached DNS entries can continue directing traffic to failed systems
Human error: Misconfiguration or accidental shutdowns
These scenarios highlight the importance of not just having redundancy within a facility, but also maintaining geographic distribution of services and having comprehensive disaster recovery plans.
How do modern CPUs and cores affect server performance?
Modern server performance is significantly impacted by CPU architecture. Most current servers have multiple CPUs, with each CPU containing multiple cores. For example, a quad-core processor can literally perform four tasks simultaneously, unlike older single-core processors that could only handle one task at a time.
While older systems created the illusion of multitasking by rapidly switching between tasks (giving each program a split second of CPU time), modern multi-core systems can perform true parallel processing. This is particularly beneficial for web servers handling multiple simultaneous requests. However, software must be properly designed to take advantage of multiple cores, and not all applications can effectively utilize all available cores.
What is Amazon EC2 and how does it relate to scalability?
Amazon EC2 (Elastic Compute Cloud) is a self-service VPS platform that allows you to spawn virtual machines on demand. The key advantage is its elasticity - you can automatically scale up or down based on demand. For instance, if your site suddenly gets popular (like being featured on Reddit or getting “slashdotted”), you can automatically spawn more servers to handle the increased traffic. When traffic subsides, these servers can automatically power down, helping manage costs since you only pay for the time your instances are running (typically charged per minute).
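As a hedged sketch, spawning an extra instance programmatically with the boto3 library might look like this (the AMI ID and instance type are placeholders; in practice an Auto Scaling group usually does this for you):

```python
import boto3

ec2 = boto3.client("ec2")

# Launch one more web server when traffic spikes
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
```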
What approaches can websites take for handling static vs dynamic content?
The lecture used Craigslist as an example of an interesting approach to content delivery. Despite being a dynamic website where users can post ads, Craigslist saves the generated pages as static HTML files (a minimal sketch of this generate-on-write idea follows the lists below). This approach has several implications:
Advantages:
Very fast content delivery since Apache excels at serving static files
Reduced server load since pages don’t need to be generated for each request
Can handle high traffic volumes efficiently
Disadvantages:
Requires additional storage space since you’re storing both the database content and generated HTML
Makes site-wide design changes difficult since you need to regenerate all HTML files
Duplicates common elements (headers, footers) across many files
Less flexible for real-time content updates
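A minimal Python sketch of the generate-on-write idea referenced above: when an ad is saved, render it once to a static file that the web server can serve without touching the database (the ad fields and output path are hypothetical):

```python
from pathlib import Path

# Hypothetical ad row as it might come out of the database
ad = {"id": 123, "title": "Used bike", "body": "Good condition, $50."}

def render_static_page(ad: dict) -> None:
    """Write the ad out as a static HTML file so the web server
    can serve it later without regenerating it per request."""
    html = (
        "<html><head><title>{title}</title></head>"
        "<body><h1>{title}</h1><p>{body}</p></body></html>"
    ).format(**ad)
    Path("ads").mkdir(exist_ok=True)
    Path(f"ads/{ad['id']}.html").write_text(html)

render_static_page(ad)
```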
What role do different data center components play in a scaled infrastructure?
A complete data center infrastructure includes multiple components:
Switches:
Need redundant switches to avoid single points of failure
Servers should connect to multiple switches (dual-homing)
Must be configured to avoid network loops
Power Systems:
Redundant power supplies in servers
Multiple power sources for the facility
UPS systems and generators for backup
Network Connectivity:
Multiple internet connections from different providers
Different physical entry points to the building
Redundant internal networking
Physical Security:
Access controls
Environmental monitoring
Fire suppression systems
What challenges exist with multi-data center architectures?
Operating across multiple data centers introduces several complexities:
Data Synchronization:
Keeping databases synchronized across locations
Managing session state between centers
Ensuring consistent user experience
Traffic Routing:
Implementing global load balancing
Handling DNS-based routing
Managing failover between locations
Latency Issues:
Dealing with increased latency between data centers
Managing data replication delays
Handling cross-data center queries
Consistency Challenges:
Maintaining data consistency across locations
Handling conflicts in multi-master setups
Managing distributed transactions
What is the Query Cache in MySQL and how does it work?
MySQL’s query cache is a performance optimization feature that can be enabled by setting query_cache_type=1 in my.cnf configuration file. When enabled, MySQL stores the complete result set of SELECT queries along with the exact query text. If the same query is executed again and the underlying data hasn’t changed, MySQL can return the cached result instead of re-executing the query. This is particularly effective for read-heavy workloads where the same queries are executed frequently.
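A minimal my.cnf sketch enabling it (the cache size is arbitrary; note that the query cache was deprecated in MySQL 5.7 and removed entirely in 8.0, so this applies only to older versions):

```ini
[mysqld]
query_cache_type = 1     # cache SELECT results where possible
query_cache_size = 64M   # memory reserved for cached result sets
```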