Cloud Computing Flashcards
What is cloud computing?
Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. The term is generally used to describe data centers available to many users over the Internet.
It is a core component for big data and data science.
What are some of the key benefits to cloud computing?
- Cost efficient: cheaper for maintenance and upgrades
- Unlimited storage available in seconds
- Improved backup and recovery
- Easy access to information
- On-demand model
What are some of the key cons of cloud computing?
- Technical issues
    - Can arise if your internet connection is not working
- Security concerns
    - You’re sending your information to a third-party service provider such as AWS
- Prone to attack
    - Storing info in the cloud makes you vulnerable to external hacks and threats
If Data Science is not a science, what is it?
- A methodology based on multidisciplinary knowledge
- It is a data processing model focused on extracting insights from data using machine learning and predictive analytics
What are the 3 roles in a Data Science team?
- Data Scientist = statisticians, data managers
- Data engineer = Data managers, database administrators
- Data analyst = business analysts
By definition, what is a computer?
an electronic machine that takes input, processes it and returns output
how does an analog computer work?
Analog computers represent values as continuous signals, e.g. points along a wave, so a quantity can take any value in a range.
They are NOT binary like digital computers, which work only with 0 and 1; the binary approach is simpler to build and use.
what is moore’s law?
the number of transistors in a dense integrated circuit doubles approximately every two years
Adjusted variants of the law track operations per second, transistor density, power consumption and cost (in constant dollars)
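The doubling rule lends itself to a quick back-of-the-envelope projection. A minimal sketch, where the 1971 starting point (Intel 4004, ~2,300 transistors) is an illustrative assumption:

```python
# Back-of-the-envelope Moore's-law projection: transistor counts double
# roughly every two years. The 1971 starting point (Intel 4004, ~2,300
# transistors) is an illustrative assumption.
def projected_transistors(start_count, start_year, year, doubling_period=2):
    doublings = (year - start_year) / doubling_period
    return start_count * 2 ** doublings

# 25 doublings between 1971 and 2021 -> roughly 7.7e10 transistors
print(f"{projected_transistors(2300, 1971, 2021):.2e}")
```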
What are the key components of a computer (system resources)?
- Motherboard
    - Main part of the computer
    - Has connectors to plug in the other components and allow communication between them
- Central Processing Unit (CPU)
    - This is where the magic happens. The CPU runs arithmetic, logical, control and I/O operations, like a calculator but much more complex, at billions of operations per second
- Memory (RAM)
    - Temporary, volatile storage area that holds the running program and its data
- Hard Disk Drive (HDD) / Solid State Drive (SSD)
    - Persistent data storage unit
    - Slower than memory but with more capacity
- Graphics Processing Unit (GPU)
    - Like a CPU but specialized for graphics
    - GPUs work with ARRAYS and MATRICES and can thus do many more operations at the same time than a CPU: GPUs work in PARALLEL while CPUs work largely one operation at a time
    - GPUs are now also used for physics, artificial intelligence and blockchain
- Network Interface
    - Connects the computer to a network (LAN, Internet, Ethernet, WiFi)
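The GPU's data-parallel idea can be sketched in plain Python. This is a conceptual sketch only, not real GPU code: the point is that the same operation is applied independently to every element, which is exactly what a GPU can do simultaneously across thousands of cores.

```python
# Conceptual sketch of the data-parallel pattern GPUs exploit (plain Python,
# not real GPU code): one operation applied independently to every element.
def scale_vector(vec, factor):
    # A CPU steps through this loop one element at a time;
    # a GPU would assign each multiplication to its own core and
    # run them all in parallel.
    return [x * factor for x in vec]

print(scale_vector([1, 2, 3, 4], 10))  # [10, 20, 30, 40]
```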
What is a TPU?
Tensor Processing Unit
An application-specific chip (ASIC) developed by Google for neural-network workloads.
For those workloads it can outperform a GPU because it is purpose-built to run large matrix (tensor) operations in parallel.
What if you need more of a resource, say memory, but adding more to the computer is super expensive?
Build a cluster = connect multiple computers and pool the resources.
What are typical bottlenecks in computing?
- CPU
    - Math calculations, simulations, compression
    - Example: training AI models (build a cluster OR, better, use a GPU or TPU)
- Memory
    - Databases, processing data in real time
    - Example: doing complex searches
- Storage
    - Storing huge amounts of data
    - Example: content delivery network for video (YouTube)
When was the first message sent on the internet?
In 1969, from UCLA in Los Angeles to the Stanford Research Institute; “LO”
(they wanted to say “login” but it crashed)
What are some of the main milestones for the internet?
- 1950s: Electronic computers
- 1969: First message “LO”
- 1970s: TCP/IP Protocols, LANs (local area networks)
- 1980: USENET (newsgroups), ethernet cable
- 1982: SMTP (email)
- 1983: DNS (domains)
- 1991: WWW (invented by British scientist, Tim Berners-Lee at CERN), URLs, HTML
- 1999: Napster (p2p protocol) for file-sharing
- 2000: Dot-com bubble
- 2008: Amazon EC2 (Elastic Compute Cloud) = start of AWS
What are physical servers, virtual machines and containers?
- Physical servers
    - Hardware + operating system (the old standard)
- Virtual machines (VirtualBox, KVM, VMware)
    - Emulated computer system = virtual hardware
    - Different virtual machines, even on the same physical computer (partitions), do not affect each other. We can have multiple “tenants” on the same machine.
- Containers (chroot, Docker, LXC)
    - Like virtual machines BUT much lighter weight, optimized for density (avoiding unused capacity)
    - Say we have a machine with 16 CPUs: we would most likely run 16 virtual machines on it, but we can run hundreds of containers.
https://www.youtube.com/watch?v=L1ie8negCjc
What does “on premises” mean?
- The hardware is in your own building, where you control the hardware and the network
- Better latency for employees in the office
- but… limited connectivity to the outside + centralization risk
In Data science, it’s common to use local servers to train machine-learning algorithms, deep neural networks etc. Why? For speed!
And… for security, some critical systems shouldn’t even be connected to the internet.
What is co-location in data centers?
You hire space in cabinets to co-locate your servers
The data center provides power, internet and physical security.
Data Centers are located in places with cheap electricity, good defenses and upstream connections.
NOTE!! You still own the hardware and will in some instances have to go to the data center for some interventions.
What is IaaS and examples thereof?
Infrastructure as a Service
Provider offers instances (servers) in their cloud (virtual or physical).
- You manage the operating system of the instance
- No need to worry about the hardware. The provider will fix it or move you to another machine.
AWS EC2, Google Compute Engine, Microsoft Azure
- Virtual Machines, Servers, Storage, Load Balancers, Network
Usage setups:
- All your infrastructure is IaaS (good for cash flow, since you rent everything)
- Own your infrastructure and use IaaS for temporary workloads (e.g. scaling up for Christmas; this is riskier security-wise but better for cash flow)
- Use IaaS just for external storage (as a CDN, content delivery network)
What is PaaS and examples thereof?
Platform as a Service
Provider offers an application platform and you just manage the settings of the service.
Cons:
- you cannot fully control or customize the system
Examples:
- Web hosting
- Databases (no need to worry about scalability, monitoring, availability)
- Heroku: deploy your software and they will host it, run it and scale it
- Development tools
What is SaaS and examples thereof?
Software as a Service
You use the software as is; the provider runs the infrastructure and maintains the software.
Usually pay-per-use or subscriptions.
Examples:
- CRM (Salesforce)
- Email (Google Apps)
- Customer Management (Zendesk)
- Virtual desktop, Communication, Games
What is “serverless”?
Not literally “no server”: the servers still exist, but the provider manages them entirely. You deploy individual functions that run on demand.
Providers:
- AWS Lambda
- Google Cloud Functions
- Azure Functions
Advantages:
- Cost, if the usage is low it will be cheaper than having an instance
- No need to tune or scale the setup
- Small, simple functions can make development more productive
Disadvantages:
- Performance, resources (time to start up)
- Difficult to monitor, complexity
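A minimal sketch of a serverless function in the AWS Lambda Python handler style (the event payload and greeting logic are illustrative assumptions): the platform calls `handler` per request, and you never manage the underlying server.

```python
import json

# Minimal AWS Lambda-style function (Python runtime). The platform invokes
# `handler` for each request; scaling and servers are the provider's problem.
def handler(event, context):
    # `event` carries the request payload; `context` holds runtime metadata.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local smoke test (no cloud needed):
print(handler({"name": "cloud"}, None))
```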
How can you compare IaaS, PaaS and SaaS?
By how much of the stack you manage versus the provider:
- IaaS: provider manages the hardware; you manage the operating system and everything above it
- PaaS: provider also manages the OS and platform; you manage only your application and its settings
- SaaS: provider manages everything; you just use the software
When is something scalable?
“A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added”
Werner Vogels, CTO Amazon
What does he mean by “Performance != Scalability”?
Performance = quality metric; the time it takes to execute one request
Scalability = the ability to maintain that performance under increasing load OR increase performance when adding resources
What is the Big O?
- Mathematical notation describing the limiting behavior of a function
- It is used to classify algorithms according to the complexity class (how their requirements grow as the input size grows)
- It gives us an upper bound (worst case) of how much time/space the algorithm will need
What does the Big O complexity chart show?
It illustrates how fast the number of operations for a problem of size n grows depending on the complexity class
What makes logarithmic algorithms efficient?
An O(log n) algorithm is highly efficient because the work grows far more slowly than the input: doubling the input size adds only one extra step.
What are logarithmic algorithms often used for?
- Binary trees and binary search
- Binary search; algorithm that finds the position of a target value within a sorted array
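Binary search can be sketched as follows; each comparison halves the remaining search range, giving O(log n):

```python
def binary_search(sorted_list, target):
    """Return the index of target in sorted_list, or -1 if absent.
    O(log n): each comparison halves the remaining range."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
```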
When is a linear algorithm optimal?
- in situations where the algorithm has to sequentially read its entire input
- Example; a procedure that adds up all elements of a list requires time proportional to the length of the list
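The summation example above, sketched minimally; the loop must visit every element exactly once, so the time is proportional to the list length:

```python
def total(values):
    # O(n): every element is read exactly once.
    s = 0
    for v in values:
        s += v
    return s

print(total([2, 4, 6, 8]))  # 20
```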
When do you use quadratic algorithms? O(n^2)
- Common with algorithms that involve nested iterations over the data set
- Examples:
- multiplying two n-digit numbers by a simple algorithm
- simple sorting algorithms such as bubble sort, selection sort and insertion sort
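Bubble sort, one of the simple sorts listed above, sketched minimally; the nested loops are what make it O(n^2):

```python
def bubble_sort(items):
    # O(n^2): nested loops compare adjacent pairs over and over,
    # bubbling the largest remaining value to the end each pass.
    a = list(items)
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

print(bubble_sort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```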
What do you often use polynomial for? O(n^c)
- Public Key Cryptography
- It is computationally hard to find prime factors of large numbers so we use such numbers on purpose to make decryption unfeasible
running time is upper bounded by a polynomial expression T(n) = O(n^c)
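A toy sketch of the asymmetry cryptography exploits (the primes chosen are illustrative, far smaller than real keys): multiplying two primes is cheap, but recovering them by trial division takes on the order of sqrt(n) steps, which is exponential in the number of digits.

```python
# Multiplying two primes is polynomial in the digit count;
# factoring the product by trial division is not.
def smallest_factor(n):
    i = 2
    while i * i <= n:   # up to ~sqrt(n) divisions in the worst case
        if n % i == 0:
            return i
        i += 1
    return n  # n itself is prime

p, q = 104729, 1299709           # two known primes (toy-sized)
n = p * q                        # fast to compute
print(smallest_factor(n) == p)   # True, but needs ~1e5 trial divisions
```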
What has been used to make algorithms win in chess and go?
Exponential algorithms, T(n) = O(2^(n^c))
How can you explain the use case of Factorial O(n!) ?
With current hardware, brute-force search over all n! permutations is feasible only up to around n = 20–21 (21! ≈ 5.1 × 10^19 possibilities).
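A brute-force traveling-salesman sketch illustrates the factorial blow-up (the distance matrix is an illustrative assumption): with n cities there are (n-1)! visiting orders to enumerate, which is why this only works for tiny inputs.

```python
from itertools import permutations

# Brute-force TSP: O(n!) — every visiting order of the cities is checked.
def shortest_tour(dist):
    n = len(dist)
    best = None
    for perm in permutations(range(1, n)):  # fix city 0 as the start
        route = (0, *perm, 0)
        length = sum(dist[a][b] for a, b in zip(route, route[1:]))
        if best is None or length < best:
            best = length
    return best

dist = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
print(shortest_tour(dist))  # 21, after enumerating (4-1)! = 6 tours
```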
Other than speed concerns, what other scalability types are important?
- Administrative
- increase number of organisations/users to easily share a single distributed system
- Functional
- enhance the system by adding new functionality at minimal effort
- Geographic
- expand from concentration in a local area to a more distributed geographic pattern, and keep performance
- Load
- expand/contract the resource pool to accommodate heavier/lighter loads or numbers of inputs
- Generation
- the ability of a system to scale up by using new generations of components (& different vendors)
What should you ask yourself when building a new system for scalability?
scalability is only possible if we architect and engineer our system to take scalability into account.
We must ask ourselves
- which axis do we expect the system to grow?
- where is redundancy required?
- how do we manage heterogeneity?
- where are the pitfalls and bottlenecks?
- etc.
What are the requirements to build a good distributed system?
- Scalability
- enlarge by adding more resources
- Elasticity
- able to provision resources at any time
- Performance
- good response time
- High availability
- avoid downtime / low downtime
- the 5 9s: 99.999% availability ≈ only about 5 minutes of downtime per year; only companies like Amazon, Google, Apple etc. operate at this level
- Maintainability
- automate deploys (DevOps)
- Monitoring
- know the status of the system
- Security
- keep the data secure
Why is monitoring so important?
because you can’t improve what you don’t measure
- measure your system to find bottlenecks
- optimize those bottlenecks
- verify the improvements
- rinse and repeat!
Why is the idea of ‘microservices’ powerful?
It is so complicated to create well-working distributed systems and software BUT… if you structure it as a collection of loosely coupled microservices, you will make your life much easier.
you make it easy to scale each microservice individually (usually in a container)
What can be said about the importance of data structure vs code?
Smart data structures and dumb code work a lot better than the other way around
importance:
Data structures > Code
- Code is easy to change
- Data schemas are difficult to migrate