OpenStack Scaling and Capacity Planning Flashcards
What is the primary scaling approach used by OpenStack?
Horizontal scaling - adding more servers with identical configurations rather than upgrading to larger servers
What determines hardware failure likelihood in servers?
Its stage in the life cycle: a server's chance of failure is highest at the start and end of its life
What is the default CPU overcommit ratio in nova.conf?
16:1
What are the four methods to segregate an OpenStack cloud?
Cells, regions, availability zones, and host aggregates
How does the OpenStack dashboard currently handle regions?
It uses only a single region, so one dashboard service should be run per region
What should be considered when estimating API service load?
Usage patterns, user access patterns, VM lifetime, and frequency of VM creation/termination
What are the key considerations for hardware procurement in OpenStack?
Hardware should be stable, supported by an OpenStack-compatible Linux distribution, and have the same CPU type to support instance migration
What is the purpose of burn-in testing?
Testing that stresses hardware to its limits to trigger early-stage failures, typically done through CPU or disk benchmarks over several days
How can you determine the scalability needs of your cloud?
By tracking metrics like core count, VM expectations, and storage requirements, and applying ratios based on flavor templates
What are the default OpenStack flavors?
m1.tiny (1 core, 512MB RAM), m1.small (1 core, 2GB RAM), m1.medium (2 cores, 4GB RAM), m1.large (4 cores, 8GB RAM), m1.xlarge (8 cores, 16GB RAM)
What is the typical core capacity of a basic cloud controller server?
An eight-core, 8 GB RAM server typically handles up to a rack of compute nodes
What must be considered when horizontally scaling user-facing services?
They should be load balanced using standard HTTP methods like DNS round robin, a hardware load balancer, or software like Pound or HAProxy
What special consideration must be made when load balancing the dashboard’s VNC proxy?
The VNC proxy uses the WebSocket protocol, which can be challenging for L7 load balancers to handle
How can you calculate the expected number of VMs your cloud can support?
Expected VMs = (overcommit fraction × cores) / virtual cores per instance
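The formula can be sketched as a small Python helper. The function name and the sample numbers (a 16:1 CPU overcommit ratio, 200 physical cores, two virtual cores per instance) are illustrative assumptions, not values from a real deployment.

```python
def expected_vms(overcommit_ratio, physical_cores, vcpus_per_instance):
    """Estimate VM capacity: (overcommit ratio x cores) / vCPUs per instance."""
    if vcpus_per_instance <= 0:
        raise ValueError("vcpus_per_instance must be positive")
    # Round down: a fraction of an instance cannot be scheduled.
    return int(overcommit_ratio * physical_cores // vcpus_per_instance)


# Hypothetical cloud: 200 physical cores, 16:1 overcommit, 2-vCPU instances.
print(expected_vms(16, 200, 2))  # 1600
```

The result is an upper bound on CPU grounds only; RAM and disk limits may cap the count sooner.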
What role does nova-cells play in a cell-based deployment?
It manages the communication between the API cell and child cells
What are the shared services between all availability zones?
Keystone and all nova services
What metric should be considered when adding object storage nodes?
A weight should be specified that reflects the capability of the node
How does nova-scheduler handle compute nodes with different specifications?
It automatically handles differences in core count and RAM amounts
What is the main purpose of the API cell in a cells deployment?
It runs the nova-api service but no nova-compute services
What services can be configured to use multiple processes?
nova-api and glance-api through a flag in their configuration files
What types of separation does a region provide?
Discrete separation with separate API endpoints and no coordination between regions
How are availability zones defined in OpenStack?
They are defined locally on each server to identify the zone in which a specified compute host resides
What is a common use case for host aggregates?
To provide information for use with the nova-scheduler, such as grouping hosts that share specific flavors or images
What impact does leaving the OpenStack dashboard instances tab open have?
It refreshes the list of VMs every 30 seconds, potentially increasing load significantly
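A rough way to see why this matters is to turn the 30-second refresh into a request rate. This is a back-of-the-envelope sketch that assumes each open instances tab issues exactly one list request per refresh; the function name and the tab count are hypothetical.

```python
REFRESH_INTERVAL_SECONDS = 30  # dashboard instances tab refresh period


def listing_requests_per_second(open_tabs, interval=REFRESH_INTERVAL_SECONDS):
    """Approximate instance-list requests per second caused by idle open tabs."""
    # Assumption: one list request per tab per refresh interval.
    return open_tabs / interval


# Hypothetical: 300 users leave the instances tab open.
print(listing_requests_per_second(300))  # 10.0
```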
How should you handle service availability in a scaled cloud?
User-facing services should be load balanced using standard HTTP load-balancing methods
What is unique about the API cell’s configuration?
It’s the only cell that runs nova-api but doesn’t run nova-compute services
How are resource weights handled in object storage nodes?
A weight should be specified to reflect each node’s capability when adding new storage nodes
What is the recommended setup for dashboard services across regions?
Run one dashboard service per region since it only uses a single region
What determines the processing power needed for cloud controller cores?
User access patterns, VM creation frequency, and average VM lifetime
How does instance listing affect cloud controller load?
Frequent instance listing operations can significantly increase load on nova-api and its database
What is the purpose of burn-in testing duration?
To run long enough (several days) to trigger potential early-stage hardware failures
How should metadata keys be handled in host aggregates?
They should be set consistently and matched with instance type extra specs
What is the relationship between cells in a deployment?
Cells are configured in a tree structure with an API cell at the top level
How does OpenStack handle hardware diversity in compute nodes?
nova-scheduler manages differences in core count and RAM, but CPU speeds affect performance
What should be considered when planning cloud controller capacity?
API service load, database server load, and queue server load based on usage patterns
What is the recommended approach for scaling OpenStack services?
Horizontal scaling with identically configured services that communicate via message bus
Why is hardware stability important in a cloud environment?
To provide a stable foundation for hosting volatile cloud resources and services
Which configuration option related to availability zones was deprecated?
CONF.node_availability_zone was deprecated but still works
What is the recommended hardware type for OpenStack deployment?
Standard value-for-money offerings that most hardware vendors stock
How can you verify capacity requirements for cloud expansion?
By monitoring metrics detailed in Chapter 13, Logging and Monitoring
What types of benchmark tests are recommended for burn-in testing?
CPU and disk benchmarks run over several days
What is the main consideration when configuring services like nova-api for multiple processes?
Changes must be made through flags in their configuration files to enable work sharing between cores
How does the euca-describe-availability-zones command differ in verbose mode?
Verbose mode shows internal availability zones while non-verbose mode hides them
What happens to the internal availability zone in availability zone listings?
It’s hidden in non-verbose euca-describe-availability-zones output
What is considered experimental in OpenStack’s segregation methods?
The cells feature is considered experimental
What is the key difference in managing child cells vs. the API cell?
Child cells run all typical nova services except nova-api, while API cell runs only nova-api
What are the components needed in a child cell deployment?
All typical nova-* services except nova-api, plus its own message queue and database service
How can you calculate the storage requirements for your cloud?
Multiply the flavor disk size by the number of expected instances
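As a worked sketch of that multiplication, assuming a hypothetical flavor with 50 GB of disk and 1,600 expected instances (both numbers are illustrative, as is the function name):

```python
def required_storage_gb(flavor_disk_gb, expected_instances):
    """Estimate instance storage: flavor disk size x expected instance count."""
    return flavor_disk_gb * expected_instances


# Hypothetical: 1,600 instances of a flavor with a 50 GB disk.
print(required_storage_gb(50, 1600))  # 80000
```

If several flavors are in use, the same calculation can be repeated per flavor and summed.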
What is Table 5.2’s recommended use case for cells?
When you need a single API endpoint for compute or require a second level of scheduling
What services are shared between child cells and the API cell?
Only Keystone and nova-api
What considerations should be made for database loading in a scaled deployment?
MySQL load balancing options should be configured, and AMQP brokers should use built-in clustering support
What is the primary difference between segregating with cells/regions versus availability zones/host aggregates?
Cells and regions segregate entire cloud deployments, while availability zones and host aggregates divide a single Compute deployment
What is the AggregateInstanceExtraSpecsFilter used for?
To ensure instances are scheduled only on hosts in aggregates that define the same key to the same value
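The matching rule can be illustrated with a deliberately simplified sketch. This is not nova's implementation: the function name is hypothetical, the metadata is modeled as plain dictionaries, and any additional behavior of the real filter is omitted.

```python
def host_passes(aggregate_metadata, flavor_extra_specs):
    """Return True if every extra-spec key is defined with the same value
    in the host's aggregate metadata (simplified illustration)."""
    return all(
        aggregate_metadata.get(key) == value
        for key, value in flavor_extra_specs.items()
    )


# Hypothetical key/value pairs for illustration only.
print(host_passes({"ssd": "true"}, {"ssd": "true"}))   # True
print(host_passes({"ssd": "false"}, {"ssd": "true"}))  # False
```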
What are the two filters that use allocation ratio values in host aggregates?
AggregateCoreFilter and AggregateRamFilter
What is the recommended way to handle allocation ratios across multiple aggregates?
Each host should have only one allocation ratio for each resource, even when in multiple aggregates
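A simple consistency check along these lines could look like the following sketch. It assumes the metadata of each aggregate a host belongs to is available as a dictionary and that the ratio is stored under a key named cpu_allocation_ratio; both the helper name and the data layout are assumptions for illustration.

```python
def conflicting_ratios(host_aggregates, key="cpu_allocation_ratio"):
    """Return the set of differing ratio values a host would inherit,
    or an empty set when there is no conflict."""
    values = {agg[key] for agg in host_aggregates if key in agg}
    return values if len(values) > 1 else set()


# Hypothetical host that belongs to two aggregates with different CPU ratios.
aggregates = [{"cpu_allocation_ratio": "16.0"}, {"cpu_allocation_ratio": "4.0"}]
print(sorted(conflicting_ratios(aggregates)))  # ['16.0', '4.0']
```

An empty result means the host sees at most one value for that resource.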
What was changed regarding availability zones for services like nova-scheduler and nova-network?
These services now span all availability zones instead of having their own
What is CONF.internal_service_availability_zone used for?
It’s used when running operations like nova host-list, euca-describe-availability-zones verbose, and nova-manage service list
What happened to CONF.node_availability_zone?
It was renamed to CONF.default_availability_zone and is now only used by nova-api and nova-scheduler services
How should high-intensity and low-intensity computing loads be handled in the same cloud?
By setting different CPU and RAM allocation ratios through aggregate metadata and instance type extra specs
What is the recommended abstraction method for allocation ratios?
Define an additional key-value pair like ‘overcommit’ with values ‘high,’ ‘medium,’ or ‘low’ rather than matching directly on allocation ratios
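The abstraction can be pictured with plain dictionaries. In this sketch the aggregate names, flavor names, and ratio values are all hypothetical; only the idea of matching on an 'overcommit' label instead of on the ratio itself comes from the card above.

```python
# Aggregate metadata: the label is what flavors match on; the ratio can
# later be tuned without touching any flavor definition.
AGGREGATES = {
    "agg-high": {"overcommit": "high", "cpu_allocation_ratio": "16.0"},
    "agg-low": {"overcommit": "low", "cpu_allocation_ratio": "2.0"},
}

# Instance type extra specs reference only the label.
FLAVOR_EXTRA_SPECS = {
    "batch.small": {"overcommit": "high"},
    "hpc.large": {"overcommit": "low"},
}


def aggregates_for_flavor(flavor):
    """List the aggregates whose 'overcommit' label matches the flavor's."""
    wanted = FLAVOR_EXTRA_SPECS[flavor]["overcommit"]
    return sorted(
        name for name, meta in AGGREGATES.items()
        if meta.get("overcommit") == wanted
    )


print(aggregates_for_flavor("hpc.large"))  # ['agg-low']
```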
What parameters should be monitored for key hardware specifications?
Storage performance (spindles/core), memory availability (RAM/core), network bandwidth (Gbps/core), and CPU performance (CPU/core)
What types of separation can availability zones provide?
Physical isolation, redundancy, power supply separation, and network equipment separation
How should compute nodes be added to an existing installation?
They can be simply added and will be automatically picked up by the existing installation
What determines if a service can be installed on a new server for expansion?
Services that communicate only using the message queue internally can be easily added to new servers
What is the advantage of using regions for site separation?
They allow for discrete separation with a separate API endpoint per region, sharing only the Identity service (Keystone)
What is the primary concern when having a host in multiple aggregates?
Avoiding conflicting allocation ratio values for the same resource across different aggregates
What are the key metrics to monitor for cloud growth?
Resource usage and user growth patterns to determine when to procure additional resources
What is the benefit of using non-identical hardware in OpenStack?
New nodes don’t need to be the same specification or vendor as existing nodes, providing flexibility in scaling
How does nova-scheduler handle CPU differences in compute nodes?
It handles core count and RAM differences, but CPU speed differences will affect user experience
What is the significance of WebSocket protocol in dashboard deployment?
It’s used by the VNC proxy and can cause challenges with L7 load balancers
What are the three main building blocks for hardware procurement?
Compute, object storage, and cloud controller
What factors affect cloud controller core requirements?
User patterns, VM creation frequency, and typical VM lifetime