DevOps Flashcards
Continuous Integration
Continuous integration (CI) is the automated process of integrating code from potentially multiple sources in order to build and test it: unit, service/API, and functional/GUI tests (UI interaction tests, e.g. with Capybara). Continuous Integration is a development practice that ensures your application is always in a “good” state.
Gets triggered by source code commits
Runs any tests for frontend or backend code
Builds any artifacts (production js files, rails assets)
Publishes the artifacts (test results to dashboard, or assets to S3)
Triggers a deploy (in Opsworks)
Many CI systems are configured by a config file in the source code repo
Can build multiple branches and only deploy specific branches to specific environments
Continuous Delivery/Deployment
Continuous Delivery is an automated way to deploy your application to an environment. This can involve a number of automated or manual steps, including more integration testing, performance testing, or manual testing. The level of automation involved depends on your needs.
This includes everything from setting up a brand new environment and getting the code from the repository, to the creation of a fully tested and verified distribution.
Your software is deployable throughout its life-cycle
Your team prioritizes keeping the software deployable over working on new features
Anybody can get fast, automated feedback on the production readiness of their systems any time somebody makes a change to them
You can perform push-button deployments of any version of the software to any environment on demand
Continuous Deployment means that every change goes through the pipeline and automatically gets put into production, resulting in many production deployments every day.
Continuous Delivery just means that you are able to do frequent deployments but may choose not to do it, usually due to businesses preferring a slower rate of deployment. In order to do Continuous Deployment you must be doing Continuous Delivery.
Describe CI/CD pipeline
Your CI pipeline is usually triggered when code is checked into an integration branch by a developer. Unit tests are run to ensure basic functionality is correct, and then, binaries are built. The binaries created could be a JAR or Zip file or even a Docker container.
The CD can be triggered after a successful build, or it can be timed. Typically, for dev environments, your CD pipeline will be triggered by every successful build. Deployment to production can be an automatic process or can require manual sign off.
What is the point of CI/CD?
Instead of writing an entire app and investing a lot of time debugging, CI/CD automates the process as we develop. Saves lots of time. Speeds up onboarding too!
Describe CI/CD best practices?
It should run the tests after every commit.
A single Process
Remember: when declaring variables, we are allocating a small space in RAM*
- Each process has its own ID (PID - process ID)
- Programs you write are typically “interpreted”
- Even for high-level languages, the interpreter itself is typically written in C (including the system calls that ask the OS to allocate memory)
- Alternatively, “compiled” programs are usually written in C (or C++)
- A program written in C has to make system calls (eg. malloc) to interact with system resources
- the OS vendor needs to supply an interface (the system call API)?
- System calls are functions that the operating system is required to provide in the C language (glibc)
Processes as a tree
- Not a list; they are hierarchical
- if you shut the terminal, outer process will stop too ?
- Everything ultimately becomes a child of Process 1
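The tree bullets can be poked at directly (a Python sketch; every process has a PID and a pointer to its parent PID):

```python
import os
import subprocess
import sys

# Every process has its own ID (PID) and a parent process ID (PPID).
print("my pid:", os.getpid(), "parent pid:", os.getppid())

# Spawn a child process: its PPID is our PID, so processes form a tree,
# not a list.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getppid())"],
    capture_output=True, text=True,
)
assert int(child.stdout) == os.getpid()  # the child's parent is us
```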
Processes - Interaction
- System Calls to access resources
- Environment Variables are static over process lifetime
- always available
- Stdin, Stdout, Stderr are ways to accept text input from a keyboard, and write text output to a screen
- one pipe in, 2 pipes out.
- stdin = typing on a terminal and submitting
- stdout = whatever the output is
- we can chain stdin and stdout
- Signals are ways for outside processes to trigger your process, similar to an event listener.
- Ctrl + C / the kill utility use an OS concept, signals (SIGINT).
- interrupts the process; the process can indicate whether or not it handled the signal.
- Ctrl + D does not kill the OS; it sends end-of-file (EOF) on stdin, which typically makes the reading process exit.
- Optionally, the process can ask the OS for access to the networking interface, which will assign a port number to your process.
- To enable networking, the process has to make a system call that opens a network socket.
- Port ultimately gives you access to the network.
- Warcraft vs Slack? Which packet belongs to which port?
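The signal bullets above can be demonstrated in a few lines of Python (a sketch; Unix-only):

```python
import os
import signal

received = []

# Install a handler: like an event listener for SIGINT, the signal
# that Ctrl+C (or the kill utility) sends.
def on_sigint(signum, frame):
    received.append(signum)

signal.signal(signal.SIGINT, on_sigint)

# An outside process could send this via kill; here we signal ourselves.
os.kill(os.getpid(), signal.SIGINT)

# The handler ran instead of the process being interrupted and killed.
assert received == [signal.SIGINT]
```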
Networking
Networking: OSI Stack - common internet stack: 7 layers*
IP is the network layer. It gives us our IP address.
- Each process can open 1 or more ports
- Not confined to just one port*
- Ports determine which process will receive an inbound packet.
- Ports are either TCP or UDP
- TCP: stateful, always guaranteed to know if the message makes it. (everything not streaming-like)
- UDP: sends without guaranteed knowledge of whether the message makes it. (video streaming)
- Each network device has a MAC address
- A hardcoded ID - relatively random. No structure for addressing scheme.
- We look up to the next layer: IP address
- Any connected device also has an IP address
- We can have 2 IP addresses: one for the cable and one for the wifi.*
- You only get an IP address from someone else.
- IP Routing:
- Modem is given its IP by whatever it's connected to
- our IP is relatively close to our neighbor's
- from 0.0.0.0 to 255.255.255.255
- jurisdiction over ranges of IP addresses is split up.
- servers that reside under our street corner will have their own unique IPs.
- an address is like the front door to our house; we only have one. (private ranges like 192.168.x.x)
- “NAT” (Network Address Translation)
- ISP divvies out pipes
- be very familiar with IP addresses*
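A quick Python sketch of the port bullets: ask the OS for a port, then see how the port routes a packet to the right process (UDP here, since it needs no handshake):

```python
import socket

# Ask the OS for a UDP socket bound to port 0: the OS assigns a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
port = server.getsockname()[1]          # the port number the OS handed us
assert port > 0

# The port is what determines which process receives an inbound packet.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"hello", ("127.0.0.1", port))

data, addr = server.recvfrom(1024)      # UDP: no delivery guarantee built in
assert data == b"hello"

server.close()
client.close()
```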
Networking: DNS
- DNS maps a human readable domain to an IP address.
- DNS is a global directory
- DNS is hierarchical
- Domains can have subdomains as children
- The root of this tree is the 13 root servers
- Multiple domains may map to the same IP address
- Per DNS request, only 1 IP response
Networking: URL
- protocol = language/format (http, email, etc)
- port number
- Full URL includes password, port, path etc.
- combines all this information into one address.
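The URL pieces above can be pulled apart with Python's standard urlparse (the URL itself is a made-up example):

```python
from urllib.parse import urlparse

# A full URL combines protocol, credentials, host, port, path, query,
# and fragment into one address.
url = "http://user:secret@example.com:8080/search?q=dns#results"
parts = urlparse(url)

assert parts.scheme == "http"        # protocol = language/format
assert parts.username == "user"      # password/credentials part
assert parts.password == "secret"
assert parts.hostname == "example.com"
assert parts.port == 8080            # port number
assert parts.path == "/search"
assert parts.query == "q=dns"
assert parts.fragment == "results"
```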
What is a continuous integration server?
Jenkins: an open source tool written in Java with plugins (over 1000) built for CI. Plugins allow integrations at various DevOps stages.
It's the most widely accepted tool because of its flexibility and abundance of plugins. These plugins can help meet the individual needs of individual devs.
When devs make a change to the source code in the repo, the Jenkins server pulls the code and tries to make a build.
The built application is then deployed onto the test server for testing.
Devs are constantly notified of the results.
What are shortcomings of using a single Jenkins server?
1) If you need IE tests, you need to run a Windows machine.
2) Another build job may require another Linux box
Solution!
Jenkins distributed architecture.
Jenkins master distributes workload to the slaves.
Slaves provide the required environment
AWS
- AWS allows you to get access to computing hardware, and pay per hour: A physical or virtual machine
- They offer many different machine sizes (different resource allocations) at different pricing rates
- On top of this base offering, there are many different services available
- Other services though are typically just convenience or utility on top of these servers
- S3 is a little different: it's storage that's paid per data transfer and storage used. One big file storage system.
- Most AWS services use open source technology at their core
Regions
- Everything (except S3) in AWS is specific to a region
- AWS has data centers across the world and each is managed independently
- For us in California, there are 2 data centers close to us
- Of these 2, Oregon (us-west-2) is the cheapest (over N. California). Use Oregon server.
- Make sure to always double check your region when you log in
- There could be pricing differences between regions
EC2 - Instances
Instance is the primary location to look at the state of the machines you’re using.
There are machine types optimized for memory consumption
IAM role
EC2 - KeyPairs
SSH = secure shell that uses encryption;
EC2 uses SSH keys to grant you access to the instances you create
On new instance creation, you’ll be asked to select a key
You can upload an existing public key, to which you have the private key (~/.ssh/id_rsa.pub)
Or allow AWS to create a new one, and download the private key
generate one locally via the CLI, e.g. ssh-keygen (or openssl genrsa for a raw RSA key)
EC2 - VPCs
- VPCs are analogous to an office network
- Faster way for machines to communicate with each other: a new IP address that is used locally?
- Inside the VPC, machines can access each other via an internal IP
- Outside, machines are accessible according to sec group rules
- We will be using the default VPC setting for all of the exercises
- Once we pass through one iteration of a CI cycle, we could create a new VPC to use as a test environment?
RDS (Relational DB Service):
- RDS is a utility to automatically setup an EC2 instance, with a database service running on it
- RDS offers Mysql, Postgres, Oracle, SQL Server and others
- RDS also enables you to configure the database settings via the UI
- Big companies create a backup/snapshot that takes a copy of all the data and turns it into a text file.
- RDS can also be used for common database maintenance tasks, such as backups, creating slave dbs, or restoring snapshots.
- Only supports relational (SQL) databases.
EC2 - Volumes (EBS)
- Storage on these machines:
- It's like an external hard drive that can be attached to any of the instances. We can only plug it into one machine at a time.
- By default, instances created are completely clean, and any data stored will not persist after machine termination
- You can add persistent storage to any instance using EBS Volumes (we don't want to lose data if the machine shuts off…I think)
- Volumes will be auto-mounted in Ubuntu under /mnt/
- A volume can only be attached to a single instance at a time
- We will not be using EBS storage during our exercises
EC2 - Load Balancers
- AWS offers load balancers known as ELBs
- Routing is configured under the “Listeners” tab.
- ELBs allow you to add instances to the pool of machines
- Many services integrate with ELBs and will auto-add
- ELBs allow you to configure request routing via inbound port/protocol and outbound port/protocol
- ELBs can automatically check the health of an instance by requesting a configurable URL and considering a 2XX status code healthy
EC2 - Security Groups
- All instances we create have security rules, kind of like firewalls. If we can't connect to our machine, there's a chance that we are being blocked by the security group.
- Security Groups are applied to all instances and ELBs
- They are similar to a firewall, and restrict all traffic destined to that machine according to configurable rules: (SSH, Custom TCP rule. In inbound security groups).
S3
- S3 is a distributed file storage system
- You can upload files via the AWS API
- Files can be private or public
- S3 is outside of region settings
- These are static files.
Cloudfront
- Cloudfront is a distributed CDN (Content Delivery Network)
- It is outside of region settings
- Can be configured to serve content from an application server or an S3 bucket
- Can be configured to use a custom domain
- Expiry/Invalidation of cached content can be configured
- Invalidation can also be caused by API
Certificate Manager
- The certificate manager stores any SSL certificates that you wish to use in other AWS services
- We will be using this to enable HTTPS in our application
Route 53
- Route 53 is a domain name manager
- You can register new domains, or transfer existing domains to AWS nameservers
- Once you have a domain hosted on AWS, you can configure DNS records for the domain
- Route 53 integrates gracefully with other AWS services, allowing you to do things that would be difficult on other hosting platforms (geodns, elb)
Cloudwatch
Cloudwatch is used to monitor the health of your servers
Makes sure the website is up and working. The goal is to get to the bugs before the users do.
Monitored resources include:
CPU
RAM
Network
Hard Disk
“Alerts” are configurable in the AWS UI
Alerts notify devs by email when resource usage passes given thresholds
Can be integrated with other notification services (PagerDuty)
Opsworks
Opsworks enables the automation of machine configuration
IT orchestration or IT automation
Uses “Layers” to specify roles for each machine
Can use chef or puppet to author setup scripts
Scripts get triggered by instance lifecycle events
Deployment scripts can be triggered by API
Integrates well with other AWS services
More detail to come!
IAM
IAM (identity and access management) will come up frequently throughout your AWS usage
For our purposes we will ignore any IAM usage (for NOW, just default)
IAM allows teams of developers to use the same AWS account, with separate privileges, configured by an admin
IAM also allows you to create roles that are not intended for use by humans, but for API usage
API, Tokens & CLI
AWS API requires you to authenticate via 2 tokens:
aws_access_key_id
aws_secret_access_key
Tokens can be managed via (Account Name) > “My Security Credentials”
The easiest way to interact with the AWS API from your local machine is to use the aws_cli tool
The tool, as well as other AWS libraries, look for credentials in 2 files under ~/.aws:
credentials
config
The config file stores the default region setting (us-west-2)
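A sketch of how tools read those two files: they're plain INI format, so any INI parser works (the tokens below are fake examples, and a temp file stands in for ~/.aws/credentials):

```python
import configparser
import os
import tempfile

# The ~/.aws/credentials file is INI-format. The tokens here are
# made-up placeholders, never real credentials.
sample = """
[default]
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = examplesecretkey
"""

# Stand-in for ~/.aws/credentials so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix="credentials", delete=False) as f:
    f.write(sample)
    path = f.name

config = configparser.ConfigParser()
config.read(path)
os.unlink(path)

assert config["default"]["aws_access_key_id"] == "AKIAEXAMPLE"
```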
Jenkins
Jenkins is an open source pluggable build workflow manager with a web UI
Plugins and a public plugin repository allow Jenkins to be integrated with many services
Projects are configured in workflow stages:
Source Control
Build Steps
Post Build Steps (even if these fail, its still considered successful)
Each stage of the workflow can impact the next step (e.g. on failure)
Projects can be configured to trigger each other
The web UI shows a dashboard that can visualize build status, as well as test results
What does it mean to “build”?
Uglify:
- to save space (turn variables to one letter)
- remove white space
- condense repetition
- make code a lot smaller
- Take source code and create “binary”
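A toy sketch of the uglify idea (real minifiers like UglifyJS/Terser do far more, including renaming variables to one letter):

```python
import re

# A toy "uglify" pass over a JavaScript snippet: strip comments and
# collapse whitespace to make the code a lot smaller.
def minify(source):
    source = re.sub(r"//[^\n]*", "", source)   # drop line comments
    source = re.sub(r"\s+", " ", source)       # collapse whitespace
    return source.strip()

js = """
// add two numbers
function add(first, second) {
    return first + second;
}
"""

small = minify(js)
assert small == "function add(first, second) { return first + second; }"
assert len(small) < len(js)  # the whole point: smaller output
```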
What is a binary?
Any command we run is a compiled program.
The binary is the output of a compilation process.
The CI/CD server builds a “sacred” version of our output for production after testing.
Regression
Every time we find a bug, write a test that proves you've fixed it.
Jenkins Plugins
Jenkins plugins allow you to write custom source control integrations, build steps or post build steps
Plugin developers are also able to integrate with the web UI to provide options for users
Plugins needed for our exercises:
Default Plugins
Install RVM plugin
Install NVM Wrapper plugin
Version Locking
Make sure the same version we test on is ultimately the same version we are looking to deploy.
Gem set
When we have multiple projects, we might have different versions or dependencies, so we will have different gem sets.
Jenkins Source control
Jenkins can integrate with many different source control systems
Traditional source control systems would just poll on a regular basis
Github uses web hooks (http requests)
Jenkins Github plugin can support github webhooks
Can be configured to only run on certain branches
Build Steps
Build steps are steps in the workflow
These steps will be things like: Run Tests, Build Production Assets, Publish Assets to S3, Deploy via Opsworks
We will use the “bash script” build step
Although we could write our own Jenkins plugins to do each of these steps
Post Build Steps
The post build stage of the workflow
These are activities that are not considered crucial to the build process
Examples include:
Source Control triggering (build status)
Test Results publishing
Notifying developers
RDS
The ability to do a SQL join. Mongo is NoSQL and therefore not relational; MySQL/Postgres are.
Interface on top of EC2 that allows us to automatically create an instance.
- RDS provides pre-configured instances of common relational databases
- Because many of the features of these databases are shared across all implementations, the UI provides controls for common features
Relational Databases
- A DB is just a data structure that, instead of being stored in RAM, is stored on disk.
- The primary key is the key in the underlying BST.
- Data is stored in tables
- Tables stored as rows and columns
- Underlying data-structure is typically a BST or Array (fixed fields)
- Querying (searching) is typically performed via SQL
- Able to relate different tables
- Able to provide different ways to “Key” the BST
- “Primary Key” typically determines how the data is represented on disk
- Able to provide additional “indexes” to increase search speed
- The BST does not have all the data, but it has all the keys, and these keys point to the data on record.
- We want auto-incrementing IDs.
- Every time it adds a record, it has to rearrange the search tree.
- If we search by ID, we get it in O(log n); if we go by user name, it's much slower.
- Using a hash map we give up order (we typically use them to cache data).
- BST inserts are O(log n).
- Think phonebook (contiguous, like a LL or array) and index to the phonebook (BST)
Reads: O(logn)
Writes: O(logn)
- When the DB gets big, you will see your performance slow down.
- Either use an index to find data or resort to a “table scan” if you need to search linearly.
- Choose the opportunities to make indexes carefully. Having to write again and again adds up, even if each write is O(log n): 7 × O(log n) or 7000 × O(log n) adds up.
- Reads vs writes: a balancing act
Table scan
Think about apps by thinking about the data layer first. (“the app has users, which have songs that belong to artists, etc.”)
Indexes
Indexes tell the database to also represent the table in a secondary BST (or Hash), with keys only, values pointing to the original record. When querying, the database will try to use the best index available for optimal speed.
Primary Key Index
Foreign Key Index
Prefix Index
Full Text Index
Multi-column Index
Indexes will increase lookup speed, but slow inserts/deletes.
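The phonebook/index idea as a Python sketch: a sorted primary key gives O(log n) lookups, and a secondary index avoids the table scan (table contents are invented):

```python
import bisect

# A table as a list of rows, ordered by primary key (auto-incrementing id).
rows = [(i, "user%d" % i) for i in range(1, 10001)]
ids = [r[0] for r in rows]            # the "primary key index": keys only

# Indexed read: binary search on the key column, O(log n).
pos = bisect.bisect_left(ids, 7000)
assert rows[pos] == (7000, "user7000")

# A secondary index on name: keys only, values point back at the record.
name_index = {name: i for i, (pk, name) in enumerate(rows)}
assert rows[name_index["user42"]] == (42, "user42")

# Without an index, finding by name is a "table scan": O(n).
scan = next(r for r in rows if r[1] == "user42")
assert scan == (42, "user42")
```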
Read Replica
- Master/Slave model
*One DB is the source of truth. We can write to the master, but can't write to the slave.
- Master is read/write
- Slave is read only, used for large reads, also used for backups
- Master uses transaction log shipping to update slave
- Indexes can vary between master and slaves
- Build into the system a one true location of the data (single source of truth).
- When data has more than one source of truth, things get messy really quickly.
- If we want to split up the data, we want to separate it onto different servers based on joins/associations, so CPU usage is evenly spread across the machines. We are breaking the app into microservices. We are sharding the DB (vertically: billing and app data; or horizontally: one user mapping to other items on other DBs across multiple servers… a DB per client).
- You can do indexed reads on other databases while your source of truth isn't indexed.
Backups (Snapshots)
- All database implementations support some way to take a backup of the data
- AWS calls this feature “Snapshots”
- AWS helps you take snapshots and restore them through the UI
- You can also manually take backups in postgres:
- pg_dump --host=<host> --dbname=<db> -Fc > latest.dump
- pg_restore --host=<host> --dbname=<db> latest.dump
Goal of using orchestration software:
- Create a process to create a new application server…
* make sure everything is in sync.
Deploying to a large fleet - Series
- The process is error prone, especially if done manually.
* connect, run script, disconnect.
Deploying to a large fleet - Parallel
*Best way to keep machines in sync.
*Need a way to tell the server if something is out of sync.
Faster
Issues and Concerns
Downtime is always a concern, so speed is desired
We want all machines to be in a known state at all times.
We need 2 scripts.
- setup, deployment.
Deployment scripts are comprised of multiple steps
- Every step has a way to move forward and a way to move back.
Failure of a single step can cause the machine to be in an unknown state
Failures can occur from:
- Bugs in scripts
- Network issues
- Version locking issues (!! very common)
- Resource inconsistencies (bits can randomly flip from 1 to 0, for example)
We can mitigate risks by:
Minimizing network distance
Automating as much as possible
Idempotency with rollback
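A sketch of “every step has a way to move forward and a way to move back”: on failure, completed steps are rolled back so the machine returns to a known state (step names are invented for illustration):

```python
# Each deployment step pairs a forward action with a rollback action.
# State is modeled as a simple set of markers.
steps = [
    ("download_release", lambda s: s.add("release"), lambda s: s.discard("release")),
    ("install_deps",     lambda s: s.add("deps"),    lambda s: s.discard("deps")),
    ("restart_app",      lambda s: s.add("running"), lambda s: s.discard("running")),
]

def deploy(state, fail_at=None):
    done = []
    for name, forward, back in steps:
        if name == fail_at:                 # simulate a failure mid-deploy
            for _, _, undo in reversed(done):
                undo(state)                 # roll back completed steps
            return False
        forward(state)
        done.append((name, forward, back))
    return True

ok_state = set()
assert deploy(ok_state) is True
assert ok_state == {"release", "deps", "running"}

bad_state = set()
assert deploy(bad_state, fail_at="restart_app") is False
assert bad_state == set()                   # back to a known state
```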
Deployment Strategies - Cutover
A manager or server that runs something like Puppet or Chef.
Example:
“run version A on all machines. Now run B”.
Deployment Strategies - Pilot
Just do one test deployment to make sure it runs properly.
Deployment Strategies: Parallel
Deploy to half of the servers and test for errors: is the new version of the software more error prone?
Then we switch and update the rest.
We will never have downtime.
Other concerns
Each one is case by case:
Database Migrations
Check which version and what we want to update to.
Load Balancing
Have multiple load balancers.
A/B testing
Run two versions of our code and test them out. Business decisions?
Push vs Pull
Pull: servers report back to check if the versions match.
Multiple server roles with multiple environments
Puppet
Puppet scripts define the “final state” of the server
- blocks of code that have dependencies on each other, packaged as a script. Those are what need to happen to move to the next stage. A “dependency graph”.
Each “step” in the script is modelled as a “resource”
Each resource can be dependent on other resources
Puppet makes a graph of changes to apply based on deps
Each resource type has the ability to compare the current state to the desired state
- To do a Puppet deploy: point at the new git version, and the server will make a graph to get us ready.
We aren't writing a script. We are defining the final state, and the server makes a graph that gets there?
Each resource type also has the ability to modify the server’s state, to deploy or rollback
Puppet server communicates bi-directionally with puppet agents on the machines
Can be run in solo mode
Ruby (DSL):
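The dependency-graph idea can be sketched with Python's stdlib topological sorter (resource names are invented; Puppet itself uses a Ruby DSL):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Puppet-style: each resource declares what it depends on, and the tool
# derives an apply order from the graph rather than from a hand-written
# script. These resource names are made up for illustration.
resources = {
    "package[nginx]": set(),
    "file[nginx.conf]": {"package[nginx]"},
    "service[nginx]": {"package[nginx]", "file[nginx.conf]"},
}

order = list(TopologicalSorter(resources).static_order())

# Dependencies always come before the resources that need them.
assert order.index("package[nginx]") < order.index("file[nginx.conf]")
assert order.index("file[nginx.conf]") < order.index("service[nginx]")
```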
Chef
- Still defined in blocks of resources, but as repeatable steps rather than a declared final state.
Chef “recipes” define “repeatable steps” to get a server into a desired state.
Recipes are combined into cookbooks (pluggable)
Each “step” in the script is modeled as a “resource”
Each resource is intended to be parameterized and idempotent
Rollback is then just redeploy with previous params
Chef server communicates bi-directionally with chef agents on the machines
Can be run in solo mode
Ruby (DSL: Domain Specific language)
Ansible
*Does not use a middle-man role (we run it from our local machine). It does not have a central management server.
Connects to your nodes and pushes out small programs, called “Ansible modules”
Parallel
Modules are models of the desired state of the system
Ansible executes these modules (over SSH by default), and removes them when finished
No servers, daemons, or databases
Python
SaltStack
Client/server model, similar to Puppet and Chef
Parallel or series
Plugin system
SLS scripts
Python
Opsworks
AWS service
Uses client/server model
Uses custom server and custom client agent
Uses either puppet or chef scripts
Therefore uses puppet/chef in solo mode
Integrates with lifecycle of AWS EC2 Instances
Opsworks Units
- does not enforce how we set up our machines.
Stacks:
Group machines together based on shared resources
Layers:
Group machines together based on roles
Custom recipes per lifecycle event
Apps:
Deployable identifier, scripts can use app name
Custom JSON
Opsworks (and Chef) use a json configuration scheme to apply Chef Attributes:
- Stack level
- Overwritten by Layer level
- Overwritten by Deployment activity
Chef internally uses a similar override scheme in its recipes:
- default
- force_default
- normal
- override
- force_override
- automatic
“Apps”
This is a string, simply a variable that's passed to a JSON file. In the case of AWS it is only a string.
Domain Name Service (DNS)
when we register a domain, we are buying the right to point that name at our servers
- Hierarchical Server structure (this is very important - need to be able to edit DNS records)
- Responsible for mapping a string url to an IP address
- 13 Root servers around the world (IANA)
- [a-m].root-servers.net
- Nslookup
Domain Name service - Record types
A - Address - Maps a name to an IP address. (Route 53 also supports “Alias” records that point at AWS resources.)
CNAME - Canonical name - Maps name to other names
(same site hosted with multiple names)
SOA - Start of Authority - identifies the authoritative name server and zone parameters for the domain.
MX - Mail exchanger record. This allows different email providers to communicate with each other (ex: Hotmail sending to Gmail). For receiving emails.
NS - Name Server records
TXT - Text records
Symmetric Cryptography
Whereas hashing goes one way, crypto takes in x and a key, and it can go both ways.
The function is symmetric if key1 === key2; otherwise, it's asymmetric.
Encryption is like being sent a locked box without the key.
If public/private, the function is asymmetric.
Asymmetric Cryptography
Get to the point where both parties have a shared key.
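A toy sketch of the symmetric case: one shared key both encrypts and decrypts (XOR here is for illustration only; real systems use vetted ciphers like AES):

```python
# Toy symmetric cipher: XOR with a shared key. The same key both locks
# and unlocks (key1 === key2), which is what makes it symmetric.
def xor_cipher(data, key):
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"shared-secret"
message = b"deploy at dawn"

ciphertext = xor_cipher(message, key)
assert ciphertext != message               # scrambled without the key

# Decrypting is the same function with the same key: it goes both ways.
assert xor_cipher(ciphertext, key) == message
```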
CDN Integration: Cache
Application Cache vs HTTP Cache
Very fast storage, typically in memory
Great for storing results to frequent queries
Invalidation can be a problem
Memcached
- Older and larger community source base
- Simple data types
- HTTP Cache integration
Redis (Database mostly meant for caching)
- More modern, but less robust
- Complex data types
- Often hand-rolled solutions
HTTP: Hypertext Transfer Protocol
Typically available on port 80
Ascii/UTF8 data format
CDNs
Located close to large populations
Fewer hops, fewer delays
Typically backed by an application
If CDN doesn’t have the requested file in cache, get it from the application server
Uses http cache headers to decide how long to keep the file in cache
Some provide invalidation via API
Building your own is hard; best to use: Akamai, Cloudfront, Cloudflare, ChinaCache
Cache control/expiry - Cache Control
“Cache-Control” http header is used to direct caching rules for any device in the request/response route.
no-store - do not cache or even store the file in the client
no-cache - do not cache the file
public - may be cached by any device (some status codes don't cache by default)
private - store in browser cache only
max-age=<seconds> - set maximum cache time
must-revalidate - use validation headers
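A sketch of reading the directives out of a Cache-Control header, the way a caching device along the request/response route would:

```python
# Split a Cache-Control header into its directives: bare flags become
# True, name=value directives keep their value.
def parse_cache_control(header):
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if "=" in part:
            name, _, value = part.partition("=")
            directives[name] = value
        else:
            directives[part] = True
    return directives

cc = parse_cache_control("public, max-age=3600, must-revalidate")
assert cc["public"] is True            # any device may cache this
assert int(cc["max-age"]) == 3600      # cache for at most an hour
assert cc["must-revalidate"] is True   # then use validation headers
```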
Cache validation
Cache validation can be configured via a number of different headers.
Etag - fingerprints the response and returns the hash value
- If-none-match client header
Last-Modified - weak validation
- If-modified-since client header
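A sketch of Etag validation: fingerprint the response body, and answer 304 Not Modified when the client's If-None-Match matches (the helper names are invented):

```python
import hashlib

# Fingerprint the response body; the surrounding quotes are part of
# ETag syntax.
def make_etag(body):
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]

def respond(body, if_none_match=None):
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, None, etag        # cache still valid: no body sent
    return 200, body, etag            # full response plus fingerprint

body = b"<html>hello</html>"

status, payload, etag = respond(body)
assert status == 200 and payload == body

# The client revalidates by echoing the fingerprint in If-None-Match.
status, payload, _ = respond(body, if_none_match=etag)
assert status == 304 and payload is None
```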
Vary Header
The “vary” header tells any caching device to cache multiple copies based on another given header (the key).
Cloudfront
Cloudfront is the AWS CDN
Cloudfront uses Squid under the hood
Can use all the previous mentioned headers
Can also override http headers
Provides “Web” and “RTMP” distribution types