Weeks 1 - 6 Flashcards

Question

Describe what the kube-controller-manager does.

Answer 1

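(Unsure whether this needs to be known.) The kube-controller-manager runs the cluster's controller processes. Each controller is a control loop that watches the cluster state through the API server and works to move the current state towards the desired state. Examples:
- Node controller: notices and responds when nodes go down
- Job controller: watches for Job objects that represent one-off tasks, then creates Pods to run those tasks to completion
- Endpoints controller: populates the Endpoints objects (i.e. joins Services and Pods)
- Service account and token controllers: create default accounts and API access tokens for new namespaces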
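Every controller listed above runs the same kind of control loop: observe the current state, compare it with the desired state, and act on the difference. Below is a minimal Python sketch of the node controller's health check. It assumes each node simply records the timestamp of its last heartbeat. `Node`, `node_controller_pass` and `GRACE_PERIOD_SECONDS` are hypothetical names chosen for illustration, not Kubernetes APIs, and the real node controller does considerably more (for example, evicting pods from unreachable nodes).

```python
from dataclasses import dataclass

# Hypothetical grace period, for illustration only
GRACE_PERIOD_SECONDS = 40.0


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time (in seconds) of the node's last status report
    ready: bool = True


def node_controller_pass(nodes: list[Node], now: float) -> list[str]:
    """One pass of a simplified control loop: compare each node's observed
    heartbeat age with the desired state (healthy) and update its status.
    Returns the names of nodes that were newly marked NotReady."""
    newly_not_ready = []
    for node in nodes:
        healthy = (now - node.last_heartbeat) <= GRACE_PERIOD_SECONDS
        if node.ready and not healthy:
            node.ready = False  # node went quiet: respond by marking it NotReady
            newly_not_ready.append(node.name)
        elif not node.ready and healthy:
            node.ready = True  # node resumed reporting: mark it Ready again
    return newly_not_ready


nodes = [Node("node-a", last_heartbeat=100.0), Node("node-b", last_heartbeat=30.0)]
print(node_controller_pass(nodes, now=110.0))  # ['node-b']
```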

Answer 2

Each worker node runs a kubelet (the "brain" of the worker node), which registers the node with the cluster, watches the API server for work (to execute tasks and maintain a reporting channel), and reports task failures.

Answer 3

The specified number of replicas of a service. | A ReplicaSet controller manages pods so that the desired number of replicas is always running, which provides self-healing and scaling.
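The self-healing and scaling behaviour can be sketched as a reconciliation step that compares the desired replica count with the pods actually running. This is a simplified illustration, not the real ReplicaSet controller (which selects pods by label and ranks which ones to delete). `reconcile` and the `pod-N` naming scheme are hypothetical.

```python
def reconcile(desired_replicas: int, running_pods: list[str], next_id: int) -> tuple[list[str], list[str]]:
    """Return (pods_to_create, pods_to_delete) so that the number of
    running pods converges on desired_replicas."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods (e.g. one crashed): create replacements (self-healing / scale up)
        return [f"pod-{next_id + i}" for i in range(diff)], []
    if diff < 0:
        # Too many pods (e.g. scaled down): delete the surplus from the end of the list
        return [], running_pods[diff:]
    return [], []  # already at the desired state: nothing to do


print(reconcile(3, ["pod-0"], next_id=1))                     # (['pod-1', 'pod-2'], [])
print(reconcile(1, ["pod-0", "pod-1", "pod-2"], next_id=3))   # ([], ['pod-1', 'pod-2'])
```

Running this step repeatedly, whatever the starting state, drives the cluster back to the requested replica count.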

Answer 4

ACID properties ensure that a DBMS maintains data integrity and consistency.
- Atomicity: all parts of a transaction must complete successfully, otherwise the whole transaction is aborted (it never partially executes: it is either done or not done)
- Consistency: a transaction takes the database from one valid state to another, preserving all defined rules and constraints; if a transaction would leave the data inconsistent, it is rolled back
- Isolation: data used during one transaction cannot be used by a second transaction until the first is completed, so concurrent transactions yield the same result as if they ran one after another
- Durability: the result of a committed transaction persists, even in the event of a system failure
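Atomicity and consistency can be demonstrated with Python's built-in `sqlite3` module. In this sketch, a CHECK constraint forbids negative balances. The `accounts` table, `transfer` and `balances` are hypothetical names. When a transfer would break the constraint, the whole transaction is rolled back, including the update that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as a single transaction."""
    try:
        # The connection's context manager commits if the block succeeds
        # and rolls back if any statement inside it raises.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # CHECK constraint failed on the debit, so the credit was rolled back too
        return False


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "bob", "alice", 500))  # False: bob's balance would go negative
print(balances(conn))                       # {'alice': 70, 'bob': 30}
```

The credit is deliberately applied before the debit, so the failed transfer shows a partially executed transaction being undone rather than left half-done.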

Answer 5

- SQL databases are table-based; NoSQL databases are document-based, key-value, graph, or wide-column stores
- SQL has a predefined schema; NoSQL has a dynamic schema for unstructured data
- SQL is the best fit for heavy-duty transactional applications
- NoSQL is the best fit for hierarchical data storage
- SQL follows the ACID properties
- NoSQL follows the CAP theorem

Answer 6

NoSQL:
- A class of DBMS
- Does not use SQL as its query language
- Distributed, fault-tolerant architecture
- No fixed or formal schema
- No joins
Benefits:
- Flexible: supports large numbers of concurrent users, and unstructured and semi-structured data
- Scalable
- High performance, with rare downtime
- Can rapidly adapt to changing requirements
- Can store big data and metadata

Answer 7

Column stores:
- Data is stored by columns rather than rows
- Useful because row-based systems are not efficient for column-wide operations (e.g. aggregating one column over every row)
- A columnar store can be accessed more efficiently for particular operations; great for big data processing
- High performance
- Highly scalable
Graph databases:
- Nodes and relationships: a node is an entity (e.g. a user, a category, or a piece of data); a relationship is an association between two nodes (e.g. friendship, works for)
- Useful for network data
- Models complex relationships and supports graph-based algorithms
Key-value stores:
- Values can be of different types
- Keys are strings
- Useful for frequent I/O operations on a simple data model (shopping carts, mobile apps, etc.)
Document-based stores:
- Designed to store and query data as JSON-like documents
- Easier for developers, who can store and query data using the same document-model format they use in their application code
- Work well for catalogs, user profiles, and content management systems, where each document is unique and evolves over time
Database shards:
- A shard is a horizontal partition of the data in a database
- Each shard acts as the single source for its subset of the data
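Database sharding can be sketched with hash-based routing, one common sharding strategy (range-based sharding is another). Each key hashes to exactly one shard, so that shard is the single source for that key's data. The shard names and the `shard_for` helper are hypothetical, and plain Python dicts stand in for real database nodes.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Map a key to exactly one shard using a stable hash.
    (Python's built-in hash() of a str is randomised per process, so it is not used.)"""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Each dict stands in for the data held by one shard
storage = {s: {} for s in SHARDS}
for user_id, profile in [("u1", {"name": "Ann"}), ("u2", {"name": "Ben"}), ("u3", {"name": "Cy"})]:
    storage[shard_for(user_id)][user_id] = profile

# A lookup only needs to visit the one shard that owns the key
print(storage[shard_for("u2")]["u2"])  # {'name': 'Ben'}
```

A plain `hash % number_of_shards` scheme remaps most keys when a shard is added or removed; consistent hashing is a commonly used alternative that limits this data movement.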

Answer 8

States that it is impossible for a distributed data store (e.g. a NoSQL database) to simultaneously provide more than two of the following three guarantees:
- Consistency: every read receives the most recent write, or an error
- Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write
- Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes. The system can sustain any amount of network failure that does not amount to a failure of the entire network; individual nodes may fail, but the database keeps running. In short, it is tolerant to failure and reconfiguration.

Answer 9

CA: consistency and availability
- All clients have the same view of the data
- Every client can always read and write
- The system may not be tolerant to failure and reconfiguration
- e.g. SQL systems such as MySQL
AP: availability and partition tolerance
- Every client can always read and write
- The system is tolerant to partitions
- Clients may have inconsistent views of the data
- e.g. Cassandra
CP: consistency and partition tolerance
- Every user has the same view of the data
- The system is tolerant to partitions
- Users may not always be able to access the data
- e.g. MongoDB, HBase, Redis
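The CP/AP trade-off only matters when a partition actually happens. The toy two-replica store below makes the choice explicit. All names in it are hypothetical, and real systems use quorums, leader election and more. In CP mode, a write that cannot be replicated is refused. In AP mode, it is accepted locally and the replicas diverge.

```python
class Replica:
    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}


class TwoReplicaStore:
    """Toy store with two replicas; `mode` decides how a partition is handled."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica("r1"), Replica("r2")]
        self.partitioned = False  # True = the replicas cannot reach each other

    def write(self, replica_index: int, key: str, value: str) -> bool:
        if not self.partitioned:
            for r in self.replicas:  # replicate synchronously to every replica
                r.data[key] = value
            return True
        if self.mode == "CP":
            return False  # give up availability rather than let replicas diverge
        self.replicas[replica_index].data[key] = value  # AP: accept the write locally
        return True

    def read(self, replica_index: int, key: str):
        return self.replicas[replica_index].data.get(key)


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write(0, "x", "1")
    store.partitioned = True
    accepted = store.write(0, "x", "2")
    print(mode, accepted, store.read(0, "x"), store.read(1, "x"))
# CP False 1 1   consistent, but the write was unavailable
# AP True 2 1    available, but the replicas now disagree
```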

Answer 10

The opposite of ACID transactions in relational databases.
Basically Available:
- A distributed system may have failed parts, but the system as a whole keeps working properly
- The system is guaranteed to be available in the event of a failure
Soft state:
- Whereas an RDBMS guarantees consistency and durability at all times, BASE allows the data to be temporarily inconsistent (for short periods of time)
- The state of the data may change without application interaction, due to eventual consistency
Eventually consistent:
- Rather than requiring consistency after every transaction, it is enough for the distributed database to eventually reach a consistent state
- Data is replicated to different nodes and eventually becomes consistent, but consistency is not guaranteed at the transaction level
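Eventual consistency needs a rule for reconciling replicas that have diverged. The sketch below uses last-writer-wins (LWW) with explicit timestamps. This is one common strategy, not the only one. `LwwReplica` and `merge_from` are hypothetical names, and the integer timestamps are assumed to be supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class LwwReplica:
    """Replica storing key -> (timestamp, value); the newest write wins on merge."""
    data: dict[str, tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, timestamp: int) -> None:
        self._apply(key, (timestamp, value))

    def _apply(self, key: str, entry: tuple[int, str]) -> None:
        current = self.data.get(key)
        # Keep whichever entry has the newer timestamp (ties broken by value)
        if current is None or entry > current:
            self.data[key] = entry

    def merge_from(self, other: "LwwReplica") -> None:
        """Anti-entropy step: pull in the other replica's entries."""
        for key, entry in other.data.items():
            self._apply(key, entry)

    def read(self, key: str):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


a, b = LwwReplica(), LwwReplica()
a.write("cart", "1 item", timestamp=1)
b.write("cart", "2 items", timestamp=2)
print(a.read("cart"), "|", b.read("cart"))  # 1 item | 2 items   (soft state: replicas differ)
a.merge_from(b)
b.merge_from(a)
print(a.read("cart"), "|", b.read("cart"))  # 2 items | 2 items  (eventually consistent)
```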

Answer 11

- NoSQL DB
- Open-source
- Schema-free
- Stores data as key-value pairs (similar to JSON objects) --> high performance, since related data kept together in one document needs fewer I/O operations
- Supports hierarchical objects (nested documents)
- Rich query language that supports CRUD operations
- Horizontal scalability
- CP NoSQL: always consistent and partition tolerant, but not always available
- A record is a document (field and value pairs)

Answer 12

Create, read, update, delete

Answer 13

MongoDB, Cassandra, HBase

Answer 14

- Column-oriented (wide-column) distributed database
- Open-source
- High performance when managing very large amounts of data --> big data use cases
- Scalable
- Fault-tolerant
- AP NoSQL: eventually consistent, high degree of partition tolerance, always available

Answer 15

- Open-source Apache project
- Runs on HDFS
- Written in Java
- CP NoSQL: consistent and partition tolerant, but not always available to users due to failures
- Column-oriented NoSQL

Answer 16

SQL:
- Pre-defined schema
- Table-based databases
- Structured query language for data manipulation and definition
- Best fit for heavy-duty transactional applications
- Follows ACID properties
- e.g. MySQL, SQLite, Oracle
NoSQL:
- Dynamic schema for unstructured data, no pre-defined schema
- Queries focused on collections of documents
- Best suited for hierarchical data storage
- Follows CAP theorem
- e.g. Cassandra, HBase, MongoDB, Redis

Answer 17

- Read/write speeds of traditional relational DBs are not fast enough for modern use cases (such as session stores)
- Introducing new tables or modifying an existing schema can be extremely complex, which makes adapting to an application's new features very difficult
- Limited number of concurrent operations

Answer 18

An in-memory data structure store | Used as a distributed, in-memory key-value database

Answer 19

- An in-memory data structure store
- Supports a large variety of data structures/types
Used for:
- A distributed, in-memory key-value DB
- Cache: applications store common, repeatedly read objects in Redis for fast memory access; caching makes data retrieval fast and limits the load on the DB server
- Session store (a unique session for each user) instead of relying on the DB
- Real-time analytics
- Metering service: helps manage the load on legacy servers during peak usage by rate-limiting the number of calls applications make every few seconds

Answer 20

Ubiquitous, convenient, on-demand access to a shared pool of computing resources and architecture that can be rapidly provisioned and released with minimal management effort or service provider interaction.
Characteristics:
- Ubiquitous
- On-demand self-service
- Measured service
- Resilient --> via checkpoints, restarting, health checking and monitoring against specifications of behaviour
- Shared resource pool
- Rapid scalability and elasticity

Answer 21

1. Private
- Owned by a single organisation
- Most expensive model
- High level of data privacy and control over data and computational resources
2. Public
- Public access to a pool of shared computational resources
- Limited data control and privacy
- Cheapest model
- Owned and managed by a third-party cloud provider
- No need for high-level IT proficiency to use
3. Community
- Like a public cloud but limited to a community of cloud consumers, with management and expenses shared across the community members
- May be owned by the community members or by a third-party provider
- More security than a public cloud, less than a private cloud
- Useful for sharing data and computational resources across a specified community or group
- e.g. a government cloud accessed by multiple departments
4. Hybrid
- A combination of two or more of the model types, most commonly private + public cloud
- Used to keep sensitive data secure and share it only with specified cloud users, while still making use of a public cloud
- Can be complex and challenging to create and maintain due to disparities across the cloud environments

Answer 22

IaaS (Infrastructure as a Service)
- Consumer manages everything above the physical infrastructure, with full administrative control over the cloud computational resources
- User sets up and configures the bare infrastructure: data, OS, middleware, software, runtime
- Cloud provider provisions and manages the physical processing, storage, networking and hosting required
PaaS (Platform as a Service)
- Consumer manages the application and data
- Limited administrative control
- User develops, tests, deploys, and manages cloud services and cloud-based solutions
- Provider provisions the underlying infrastructure, middleware, and other IT resources as required
SaaS (Software as a Service)
- Consumer accesses the front-end user interface
- Control is limited to usage and usage-related configuration
- Provider manages, implements, and maintains the cloud service

Answer 23

- An isolated section of a public cloud environment (IaaS) that provides a certain level of isolation between the different users of the provisioned resources
- Isolation is achieved through private IP subnets and secure channels through which specific users share resources
- A VPC creates a private network within the public cloud, on top of shared computing infrastructure, using logical network isolation (with encrypted channels such as VPN tunnels for connecting to it). Basically, it is an allocated private area for specified users within a public cloud, giving them more control over their resources.
- Achieves the benefits of both private and public clouds

Answer 24

AWS:
- VPC is regional
- Regions contain multiple zones
- Each subnet is bound to exactly one zone and cannot span zones, unlike GCP (a zone can hold several subnets)
- Communicating across VPCs needs extra configuration (e.g. peering or VPN tunnels), which adds cost and complexity
- VPCs are hierarchical, with multiple layers of control at the region, zone, subnet and instance level
GCP:
- VPC is a global resource
- Regions contain multiple zones
- Subnets are regional, so zones within the same region can communicate transparently through the same subnet
- Subnets:zones = many:many
- Relatively flat level of control

Answer 25

- A range of IP addresses within the network, reserved for private use
- IP addresses are allocated to specific users or resources, granted by the cloud provider or administrator
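Subnet arithmetic can be explored with Python's standard `ipaddress` module. The 10.0.0.0/16 range below is an assumed example taken from the RFC 1918 private block. Cloud providers typically reserve a few addresses in every subnet, so not all of them are assignable.

```python
import ipaddress

# Hypothetical VPC address range from the RFC 1918 private block 10.0.0.0/8
vpc = ipaddress.ip_network("10.0.0.0/16")
print(vpc.is_private, vpc.num_addresses)  # True 65536

# Carve the VPC range into /24 subnets (256 addresses each), e.g. one per tier or zone
subnets = list(vpc.subnets(new_prefix=24))
print(len(subnets), subnets[0], subnets[1])  # 256 10.0.0.0/24 10.0.1.0/24

# Find which subnet an instance's private address belongs to
addr = ipaddress.ip_address("10.0.1.25")
print(next(s for s in subnets if addr in s))  # 10.0.1.0/24
```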

Answer 26

- NAT maps private IP addresses (within subnets) to public IP addresses for connections with the public internet
- Many private IPs can share one public IP: private IPs : public IPs = M:1
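The many-to-one mapping works because the NAT device also rewrites ports, a technique usually called port address translation (PAT or NAPT). Below is a toy translation table. It assumes a single public IP (203.0.113.10, from a documentation range) and sequentially allocated public ports. `NatGateway` is a hypothetical class, and real NAT tables also track the protocol and destination of each connection.

```python
import itertools


class NatGateway:
    """Toy port address translation: many private (ip, port) pairs share one
    public IP and are told apart by the public port assigned to each."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._outbound: dict[tuple[str, int], int] = {}
        self._inbound: dict[int, tuple[str, int]] = {}

    def translate_outbound(self, private_ip: str, private_port: int) -> tuple[str, int]:
        key = (private_ip, private_port)
        if key not in self._outbound:
            port = next(self._ports)  # allocate a fresh public port for this flow
            self._outbound[key] = port
            self._inbound[port] = key
        return self.public_ip, self._outbound[key]

    def translate_inbound(self, public_port: int) -> tuple[str, int]:
        # Replies are mapped back to the private host that opened the connection
        return self._inbound[public_port]


nat = NatGateway("203.0.113.10")
print(nat.translate_outbound("10.0.1.25", 51000))  # ('203.0.113.10', 40000)
print(nat.translate_outbound("10.0.1.26", 51000))  # ('203.0.113.10', 40001)
print(nat.translate_inbound(40001))                # ('10.0.1.26', 51000)
```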

Answer 27

- The process of distributing a set of tasks over a set of resources
- Used to make overall processing more efficient and reliable
- Optimizes response time and avoids unevenly overloading some compute nodes while others are left idle

Answer 28

Round Robin:
- One of the simplest methods of distributing client requests across a group of servers
- A round-robin load balancer forwards each client request to the next server in turn, looping back to the start
- Does not take differences in hardware specification between nodes into account, so it is best for clusters of servers with identical specifications
- In a cluster of unequal servers, the weaker nodes can become overloaded
Weighted Round Robin:
- Same cyclic distribution as round robin
- Nodes with higher specifications are assigned a greater number of requests
- The load balancer is set up with a weight for each node according to its hardware specification (higher specification = higher weight)
Least Connections:
- Considers the number of current connections each server has when load balancing
- Fewer connections = higher priority for assignment
Weighted Least Connections:
- Applies a weight based on each server's computing capacity (hardware specification)
- The load balancer considers both the weight of each server and the number of clients currently connected to it
Random:
- Matches clients and servers at random, using a random number generator
- When the load balancer receives a large number of requests, this distributes them evenly across the nodes
- Like Round Robin, the Random algorithm is suitable for clusters of nodes with similar configurations
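The selection rules above can be sketched in a few lines of Python. The server and function names are hypothetical. The weighted round robin shown is the naive version that repeats each server `weight` times per cycle; production load balancers often interleave the picks more smoothly.

```python
import itertools
import random


def round_robin(servers: list[str]):
    """Yield servers in turn, looping back to the start."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: dict[str, int]):
    """Naive weighted round robin: a server with weight w appears w times per cycle."""
    expanded = [s for s, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def least_connections(active: dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def weighted_least_connections(active: dict[str, int], weights: dict[str, int]) -> str:
    """Pick the server with the fewest connections per unit of capacity."""
    return min(active, key=lambda s: active[s] / weights[s])


def pick_random(servers: list[str], rng: random.Random) -> str:
    """Random: choose any server with equal probability."""
    return rng.choice(servers)


rr = round_robin(["s1", "s2", "s3"])
print([next(rr) for _ in range(4)])   # ['s1', 's2', 's3', 's1']
wrr = weighted_round_robin({"big": 2, "small": 1})
print([next(wrr) for _ in range(6)])  # ['big', 'big', 'small', 'big', 'big', 'small']
print(least_connections({"s1": 5, "s2": 2, "s3": 7}))                              # s2
print(weighted_least_connections({"big": 6, "small": 4}, {"big": 3, "small": 1}))  # big
```

In the last line, "big" wins despite having more connections, because 6/3 = 2 is lower than 4/1 = 4.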

Answer 29

Refers to the OSI (Open Systems Interconnection) model.
Layer 7:
- The application layer of the model (the highest level)
- On the Internet, the application layer is dominated by HTTP
- Layer 7 LBs base their routing decisions on characteristics of the HTTP header and on the actual contents of the message, such as the URL, data type, or information in a cookie
- More computationally expensive than Layer 4, but routing is smarter: because the LB can tell what type of data is requested (e.g. via the URL), the same data does not need to be duplicated on all of the load-balanced servers
Layer 4:
- The transport (TCP/UDP) layer of the model
- This layer is responsible for transmitting data segments between points on a network
- Historically often implemented on dedicated hardware
- Bases routing decisions on IP addresses and ports, information found in the packet headers rather than in the message contents
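Each layer sees different information, which shapes how it routes. A Layer 7 balancer can pick a backend pool from the URL path. A Layer 4 balancer only has addresses and ports, so it typically hashes the connection details. The pool names, path prefixes and helper functions below are hypothetical, and for brevity the Layer 4 hash uses only the client address and port.

```python
import hashlib

# Hypothetical backend pools keyed by URL path prefix
POOLS = {
    "/images/": ["img-1", "img-2"],
    "/api/": ["api-1", "api-2", "api-3"],
}
DEFAULT_POOL = ["web-1", "web-2"]


def layer7_pool(path: str) -> list[str]:
    """Layer 7: inspect the HTTP request (here only the URL path) to choose a pool,
    so image content only needs to live on the image servers."""
    for prefix, pool in POOLS.items():
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


def layer4_backend(client_ip: str, client_port: int, backends: list[str]) -> str:
    """Layer 4: only addresses and ports are visible, so hash them to pick a backend;
    every backend must be able to serve any request."""
    digest = hashlib.sha256(f"{client_ip}:{client_port}".encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


print(layer7_pool("/images/logo.png"))  # ['img-1', 'img-2']
print(layer7_pool("/checkout"))         # ['web-1', 'web-2']
```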

Answer 30

db.collection.insertOne() | db.collection.insertMany()

Answer 31

db.collection.find() | db.collection.findOne()

Answer 32

db.collection.updateOne() | db.collection.updateMany() (the legacy db.collection.update() is deprecated)

Answer 33

db.collection.deleteOne() | db.collection.deleteMany() (the legacy db.collection.remove() is deprecated)