Weeks 1 - 6 Flashcards

1
Q

What is Docker? What are the motivations of using Docker technology?

A

Docker is a set of platform as a service (PaaS) products that use OS-level virtualisation to deliver software in packages called containers.
Motivations:
- portability and speed: enables you to separate applications from infrastructure so you can deliver software quickly. Reduces time spent on back-end development.
- lightweight: containers share the host OS kernel and virtualise only the application layer, whereas VMs virtualise both the OS kernel and the application layer and are much slower as a result.

2
Q

What is a Docker Image?

A

Lightweight, standalone, executable package of software that has everything you need to run an application.

3
Q

What is a Docker Registry?

A

A stateless, highly scalable server-side application that stores and lets you distribute Docker images.

4
Q

What is a Docker container?

A

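A runnable instance of an image. The image itself is read-only; a container adds a writable layer on top of it, so changes made inside the container (a sandbox environment) do not alter the image unless they are saved as a new image.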

5
Q

What is Dockerfile?

A

A text file that contains a collection of instructions and commands for building a Docker image, which can then be run as a container.

6
Q

RUN vs CMD?

A

RUN and CMD are both Dockerfile instructions. RUN executes a command while the image is being built and saves the result as a new image layer. CMD defines the default command to run when a container starts from the image.

7
Q

How to reduce Image Size?

A
  • smaller image base
  • only add necessary dependencies
  • cleanup commands to remove no longer needed libraries and downloads
8
Q

Describe Docker’s layer-wise architecture design?

A
  • refers to Docker images
  • each layer is a set of filesystem changes built on top of the previous layer
  • each layer cannot be changed (read-only) after it has been constructed
  • layer-wise architecture makes reuse and customisation of images much easier (you can add layers to existing images)
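
The layering idea can be sketched in a few lines of Python. This is only a toy model, not how Docker stores images: the helper names (`make_layer`, `build_image`, `read_file`) and the file paths are made up for illustration. It shows read-only layers, upper layers shadowing lower ones, and two images sharing the same base layers.

```python
from types import MappingProxyType


def make_layer(changes):
    """Freeze a {path: content} dict so the layer is read-only."""
    return MappingProxyType(dict(changes))


def build_image(*layers):
    """Model an image as an ordered tuple of read-only layers."""
    return tuple(layers)


def read_file(image, path):
    """Search from the top layer down; upper layers shadow lower ones."""
    for layer in reversed(image):
        if path in layer:
            return layer[path]
    raise FileNotFoundError(path)


# Hypothetical layers: a base OS, some dependencies, and an application.
base = make_layer({"/bin/sh": "shell", "/etc/os-release": "minimal-os"})
deps = make_layer({"/usr/lib/libfoo": "v1"})
app = make_layer({"/app/main.py": "print('hi')", "/etc/os-release": "patched"})

image_a = build_image(base, deps, app)
image_b = build_image(base, deps)  # reuses the same base and deps layers
```

Reading `/etc/os-release` from `image_a` returns the value from the top layer, while `image_b` still sees the original, because the shared base layer was never modified.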
9
Q

What are microservices?

A

An architectural and organisational approach to software development where software is composed of small independent services that communicate over well-defined APIs.
(Like splitting up a big program into specific services and functions, basic units of the service that cannot be further divided).
(extra) As opposed to monolithic architecture, microservices make up the business logic and data access layers.

10
Q

Pros and cons to microservices?

A

Pros:

  • technological freedom, language independent
  • easy deployment - usable code
  • agility - small teams can work on each microservice –> which can be a problem in monolithic architecture for teams
  • resilience
  • scalable

Cons:

  • infrastructure overhead - servers and database usage
  • complicated networking
11
Q

What is Docker Compose?

A

A tool for defining multi-container Docker applications and running them as a single service. Used to configure an application’s services. A single command creates and starts all services (running containers) from the configuration. Each container runs in isolation but can interact with the others when required.

12
Q

Benefits of Docker Compose?

A
  • single host deployment - ie. you can run everything on a single piece of hardware
  • quick and easy configuration
  • high productivity - reduces time it takes to perform tasks
  • security - containers are isolated from each other, reducing the threat landscape
13
Q

Key features of Docker Swarm

A
  • creates multiple containers on multiple hosts (unlike M:1 for compose)
  • decentralised design - easy for teams to manage and access the environment
  • scalable - can scale up or down as you wish: you declare the number of tasks you want the swarm to run and the swarm manager adjusts automatically
  • load balancing - specification of how to distribute service containers between nodes
  • highly secure - each node enforces transport layer security (TLS) mutual authentication and encryption to secure communications between itself and other nodes
  • rolling updates - the swarm manager lets you control the delay between service deployment to different sets of nodes. If any failures occur, you can roll back to a previous version of the service.
14
Q

What is an orchestrator? Give two examples

A

A tool that automates the configuration, management, and coordination of computer systems, applications, and services.
Examples: Docker Swarm and Kubernetes

15
Q

What is Docker Machine?

A

Allows you to provision Docker machines in a variety of environments - VMs either on local systems or on cloud provider systems, and physical computers.
Used to set up as many hosts as desired, local and remote hosts.

16
Q

What is Kubernetes?

A

Provides automatic deployment, scaling and management of containerised applications across multiple hosts (a cluster). It is a container orchestration system.
Follows the replica architecture.

17
Q

Benefits of Kubernetes?

A
  • Automated rollouts and rollbacks (resilience)
  • Storage orchestration
  • Self-healing
  • Load balancing
  • Horizontal scaling –> automated to create new containers, remove containers, reallocation of resources, etc.
    note: The automatic creation and deletion of containers achieves rollbacks/rollouts, self-healing, and scaling operations.
18
Q

Describe the Kubernetes Master Node?

A

Master node (control plane); a cluster should ideally have multiple masters. Controls everything. The scheduler watches for unassigned tasks and assigns them to available resources that match their requirements.
Detects and responds to cluster events.
The controller manager controls the nodes (ReplicaSet controller, endpoints controller, namespace controller).
All connected through an API server.

19
Q

Kubernetes architecture?

A
  • Multiple master nodes ideally
  • Worker nodes controlled by master
    Master node contains:
  • kube-scheduler, monitors new Pods with no assigned node, and selects a node for them to run on.
  • kube-controller-manager, runs controller processes, each of which runs separately for a different purpose (node control, job control, endpoints control, token and service account control)
  • kube-API server
  • etcd - consistent and highly-available key value store used as K8s’ store for cluster data
  • cloud-controller-manager, integrates cloud-specific control and config to link the cluster into cloud provider’s API. Not needed for on-premises clusters.

Each worker node contains:

  • Kubelet (the brain of each worker node), which registers the node with the cluster, watches the API server (to execute task and maintain reporting channel), and reports task failure.
  • Kube-proxy (network proxy that maintains network rules and communication between nodes)
  • Container runtime (performs container-related tasks)
20
Q

Kube-proxy?

A

Kube-proxy is responsible for local cluster networking: it makes sure each worker node gets its own IP address, and handles routing and load-balancing.
Is a part of the Kubernetes architecture.

21
Q

What is a Kubernetes Pod?

A
  • Container (Docker) = Pod (Kubernetes)
  • Sandbox environment for hosting containers.
  • Containers must always run inside of Pods.
  • Multiple containers in a Pod share the same Pod environment (networking, unique cluster IP address, storage (called ‘volumes’), container information).
  • a Pod must be run on a node, which the kube-scheduler decides on
  • Generally, one container per pod to keep things clean and easy.
22
Q

What is the Declarative model for K8s?

A

It is like a wishlist, or desired state, for how an application should run (like a docker-compose file). It works by declaring the desired state of a microservice in a manifest file (remember MF), which is posted to the API server. The desired state is stored in etcd, and Kubernetes works to make the cluster match it.

23
Q

Imperative vs. Declarative model in K8s?

A

Imperative –> how you get what you want.
Declarative –> what you want.
Imperative is a long list of platform-specific commands, whereas declarative is a simpler, more concise file. Declarative is ideal as it enables scalability, self-healing, and version control by telling the cluster how things should look.

24
Q

Describe the Pod scheduling sequence / Kubernetes Workflow

A

Control plane/master node:

  1. Request comes into the API server
  2. API server stores the request in the data store (etcd)
  3. Controller creates a new Pod for the requested task.
  4. Scheduler watches for unassigned Pods and assigns Pods to worker nodes.

Worker nodes:

  5. Kubelet watches for Pod assignments. Receives Pod assignment and deploys requested containers via container runtime (e.g. Docker)
  6. Kube-proxy manages communication between Pods.
  7. Kubelet updates Pod status to API server.

Master/Control plane:

  8. Removes request from etcd storage.

25
Q

Describe what the kube-controller-manager does?

A

Unsure whether this needs to be known.

  • Node controller: responsible for noticing and responding when nodes go down
  • Job controller: watches for Job objects that represent one-off tasks, then creates Pods to run those tasks to completion
  • Endpoints controller: populates the Endpoints object
  • Service account and token controllers: create default accounts and API access tokens for new namespaces
26
Q

What is a Kubelet and where does it sit in the architecture?

A

Each worker node has a Kubelet (the brain of each worker node), which registers the node with the cluster, watches the API server (to execute task and maintain reporting channel), and reports task failure.

27
Q

What is a ReplicaSet? What is a ReplicaSet Controller’s purpose?

A

A ReplicaSet maintains a specified number of identical Pod replicas of a service.

A replicaSet controller is used to manage the status of pods to provide self-healing and scaling.
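
The self-healing and scaling behaviour can be pictured as a reconciliation loop that compares the desired replica count with what is actually running. The sketch below is a simplified stand-in written for these notes, not Kubernetes code: `reconcile` and the pod names are hypothetical, and real controllers act through the API server.

```python
import itertools

# Hypothetical counter used only to give new pods unique names.
_pod_ids = itertools.count(1)


def reconcile(desired_replicas, running_pods):
    """Return a pod list that matches the desired replica count."""
    pods = list(running_pods)
    # Too few pods: create replacements (self-healing or scale up).
    while len(pods) < desired_replicas:
        pods.append(f"pod-{next(_pod_ids)}")
    # Too many pods: remove the extras (scale down).
    while len(pods) > desired_replicas:
        pods.pop()
    return pods


pods = reconcile(3, [])       # initial rollout: three replicas are created
pods.remove(pods[0])          # simulate one pod crashing
pods = reconcile(3, pods)     # controller restores the count to three
pods = reconcile(2, pods)     # desired state changes: scale down to two
```

The caller only ever states the desired count; the loop works out whether to create or remove pods, which is the declarative idea in miniature.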

28
Q

What is ACID? Transaction Properties

A

ACID properties ensure that a DBMS maintains data integrity and consistency.

Atomicity - all parts of a transaction must be completed successfully, otherwise the transaction is aborted (never partially executed: done or not done)
Consistency - a transaction takes the database from one valid state to another, so concurrent execution of transactions yields consistent results. If not consistent, roll back (to consistency again).
Isolation - data used during one transaction cannot be used by a second until the first is completed.
Durability - ensures that the result or effect of a committed transaction persists in case of a system failure.
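
Atomicity can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under assumptions made for illustration: the `accounts` table, the names, and the `transfer` helper are invented. The credit is applied before the debit on purpose, so that a failed debit leaves a half-finished transaction for the database to roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts: both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` fails on the debit, and the earlier credit to bob is undone as well, so the balances are unchanged.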

29
Q

Key differences between SQL and NoSQL?

A
  • SQL is a table-based DB; NoSQL is a document-based, key-value, graph, or wide-column DB
  • SQL has a predefined schema, NoSQL has a dynamic schema for unstructured data
  • SQL is best fit for heavy-duty transactional type application
  • NoSQL is best fit for hierarchical data storage
  • SQL follows ACID properties
  • NoSQL follows CAP theorem
30
Q

What is NoSQL? What are some benefits?

A

NoSQL:

  • A class of DBMS
  • does not use SQL as querying language
  • Distributed, fault-tolerant architecture
  • No fixed or formal schema
  • No joins

Benefits:

  • flexible, support large numbers of concurrent users, supports unstructured and semistructured data
  • scalable
  • high performance, rare to see downtime
  • rapidly adapt to changing requirements
  • can store Big Data and Meta Data
31
Q

What are the types of NoSQL Databases? 4 types.

A

Column stores:

  • data stored by columns rather than rows.
  • Helpful as row-based systems are not efficient for column-wide operations.
  • A columnar store can be accessed more efficiently for some particular operations. Great for Big Data processing.
  • offers high performance
  • highly scalable

Graph databases:

  • nodes and relationships: node = entity (like a user, category, or piece of data), relationship = an association between two nodes (like friendship, works for, etc.)
  • useful for network data
  • models complex relationships and supports graph-based algorithms

Key-value stores:

  • value can be different types
  • key is string
  • useful for frequent I/O operations in simple data model (shopping carts, mobile apps, etc.)

Document-based

  • designed to store and query data as JSON-like documents
  • easier for developers to store and query data in a database by using the same document-model format they use in their application code
  • works well with cases like catalogs, user profiles, and content management systems where each document is unique and evolves over time

Database shards (a related concept, not a database type):

  • horizontal partition of data in a DB, each partition is referred to as a shard
  • each shard acts as the single source for the subset of data
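
Horizontal partitioning can be illustrated with a small hash-based router. This is a sketch under assumptions, not any particular database's scheme: `shard_for` and `ShardedStore` are hypothetical names, and plain dictionaries stand in for the shards.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key-value store whose data is split across several shards."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        # Each key is written to exactly one shard.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        # Reads go to the same shard, the single source for that key.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for user in ["alice", "bob", "carol", "dave"]:
    store.put(user, {"name": user})
```

Because the hash is stable, a given key always routes to the same shard, which is what lets each shard act as the single source for its subset of the data.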
32
Q

What is CAP Theorem?

A

States that it is impossible for a distributed data store (NoSQL) to simultaneously provide more than two out of the following three:

  • Consistency (every read receives the most recent write, or an error)
  • Availability (every request receives a (non-error) response - without the guarantee that it contains the most recent write.)
  • Partition tolerance (the system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes. The system can sustain any amount of network failure that doesn’t result in a failure of the entire network: individual nodes may fail but the DB will keep running). Tolerant to failure and reconfiguration.
33
Q

Describe the NoSQL Implementation options: CA, AP, CP.

Provide examples.

A

CA: consistency and availability
- all clients will have same view of data
- each client can always read and write
- the system may not have tolerance to failure and reconfiguration.
- SQL systems –> eg. MySQL
AP: availability and partition tolerant
- each client will always be able to read and write
- the system is fault tolerant to partitions
- the clients will have inconsistent views of the data
- Cassandra
CP: consistency and partition tolerant
- every user has the same view of the data
- the system is fault tolerant to partitions
- users may not always be able to access data
- MongoDB, HBase, Redis

34
Q

What is BASE (ACID alternative)

A

The opposite of ACID transactions in relational databases.
Basically Available:
- parts of a distributed system may fail, but the system as a whole keeps working properly
- system is guaranteed to be available in the event of a failure

Soft-state:

  • whereas an RDBMS guarantees consistency and durability, BASE allows delays (for short periods of time) before the data becomes consistent
  • the state of the data could change without application interactions due to eventual consistency.

Eventually consistent:

  • rather than requiring consistency after every transaction, it is enough for the distributed database to eventually be in a consistent state.
  • data will be replicated to different nodes and will eventually reach a consistent state, but not guaranteed consistency on a transaction level.
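
Eventual consistency can be sketched with a few in-memory replicas. This is a toy model with made-up names (`Replica`, `write`, `sync`); real systems replicate over a network and resolve conflicts in more careful ways than "highest version wins".

```python
class Replica:
    """One copy of a single value, with a version counter."""

    def __init__(self):
        self.value = None
        self.version = 0


def write(replica, value):
    """Apply a write to one replica only; the others are now stale."""
    replica.version += 1
    replica.value = value


def sync(replicas):
    """Copy the highest-versioned value to every replica."""
    newest = max(replicas, key=lambda r: r.version)
    for replica in replicas:
        replica.value, replica.version = newest.value, newest.version


a, b, c = Replica(), Replica(), Replica()
write(a, "x")
stale_read = b.value      # b has not seen the write yet
sync([a, b, c])           # replication catches up
```

Between the write and the sync, a read from `b` returns old data; after the sync, all replicas agree, which is the "eventually" in eventually consistent.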
35
Q

Key features of MongoDB

A
  • NoSQL DB
  • open-source
  • schema free
  • store data by keys and values (similar to JSON objects) –> high performance, as reduces models with high I/Os
  • supports hierarchical objects (nested documents)
  • rich query language –> supports CRUD operations
  • horizontal scalability
  • CP NoSQL: always consistent, partition tolerance, isn’t always available
  • a record is a document (field and value pairs)
36
Q

What are CRUD operations?

A

Create, read, update, delete

37
Q

Three examples of NoSQL DBs?

A

MongoDB, Cassandra, HBase

38
Q

Key features of Cassandra DB

A
  • column-oriented distributed database
  • open-source
  • high performance in managing very large amounts of data –> big data use case
  • scalable
  • fault-tolerant
  • AP NoSQL: eventually consistent, high degree of partition tolerance, always available
39
Q

Key features of HBase?

A
  • Open-source Apache
  • Runs on HDFS
  • Java
  • CP NoSQL: consistent and partition tolerant, but not always available to users due to failures.
  • column-oriented NoSQL
40
Q

Key differences between SQL and NoSQL? Give three examples of each.

A

SQL:

  • Pre-defined schema
  • Table-based databases
  • Structured query language for data manipulation and definition
  • best fit for heavy-duty transactional type applications
  • Follows ACID properties
  • Eg: MySQL, SQLite, Oracle

NoSQL:

  • Dynamic schema for unstructured data, no pre-defined schema
  • Queries focused on collection of documents
  • Best suited for hierarchical data storage
  • Follows CAP Theorem
  • Eg: Cassandra, HBase, MongoDB, Redis
41
Q

Why is it difficult for traditional SQL architecture to accommodate to modern application needs?

A
  • read/write speeds of traditional DBs are not fast enough for modern use cases (such as session stores)
  • introducing new tables or modifying an existing schema can be extremely complex, which makes adapting to an application’s new features very difficult.
  • limited no. of concurrent operations
42
Q

What is Redis? What is it used for?

A
  • an in-memory data structure store
  • used as a distributed, in-memory key-value DB

43
Q

What is Redis? What is it used for?

A
  • an in-memory data structure store
  • supports large variety of data structures/types

Used for:

  • used as a distributed, in-memory key-value DB,
  • cache, providing memory access where applications store common and repeatedly read objects in Redis. Caching makes data retrieval fast and limits DB server load.
  • session store (unique session for each user) instead of relying on DB
  • real-time analytics
  • metering service, helps manage the load on legacy servers during peak usage by rate-limiting no. calls applications make every few seconds
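
The caching use case follows what is often called the cache-aside pattern. In the sketch below a plain dictionary stands in for Redis, and `CacheAside` and `slow_db_lookup` are hypothetical names; a real setup would use a Redis client and usually set expiry times on cached entries.

```python
class CacheAside:
    """Check the cache first; fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db
        self._cache = {}      # stand-in for Redis in this sketch
        self.db_reads = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]          # cache hit: no DB load
        self.db_reads += 1
        value = self._load_from_db(key)      # cache miss: read from the DB
        self._cache[key] = value             # store for later reads
        return value


def slow_db_lookup(user_id):
    """Pretend database query (hypothetical)."""
    return {"id": user_id, "name": f"user-{user_id}"}


users = CacheAside(slow_db_lookup)
first = users.get(42)    # goes to the database
second = users.get(42)   # served from the cache
```

Only the first read of a key reaches the database; repeated reads are answered from memory, which is how caching limits DB server load.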
44
Q

What is cloud computing?

Name some of its characteristics.

A

Ubiquitous, convenient, and on-demand access to a shared pool of computing resources and architecture that can be rapidly provisioned and released with minimal interaction and management effort with the service provider.

Characteristics:

  • ubiquitous
  • on-demand/self-service
  • measured service
  • resilient –> via checkpoints, restarting, health checking and monitoring against specifications of behaviour
  • shared resource pool
  • rapid scalability and elasticity
45
Q

4 types of deployment models for cloud environments

A
  1. Private
    - owned by single organisation
    - most expensive model
    - high level of data privacy and control over data and computational resources
  2. Public
    - public access to a pool of shared computation resources
    - limited data control and privacy
    - cheapest model
    - owned and managed by third-party cloud provider
    - no need for high-level IT proficiency to use
  3. Community
    - Like a public cloud but limited to a community of cloud consumers, with the management and expenses shared across the community members
    - it may be owned by the community members or by a third-party provider
    - higher amount of security than public cloud, less than private
    - useful for sharing data and computation resources across specified community or group
    - e.g. Government cloud for multiple departments to access
  4. Hybrid
    - any combination of two or more of the model types, most commonly private + public cloud
    - used to provide security for sensitive data and to share with only specified cloud users within a public cloud
    - can be complex and challenging to create and maintain due to disparity across the cloud environments
46
Q

Describe the three cloud delivery models. - haven’t finished this card yet

A

IaaS (Infrastructure)

  • consumer manages everything themselves, with full administrative control over cloud computational resources
  • user sets up and configures the bare infrastructure
  • consumer manages: data, OS, middleware, software, runtime
  • cloud provider provisions and manages the physical processing, storage, networking and hosting required

PaaS (Platform)

  • consumer manages application and data
  • limited administrative control
  • user develops, tests, deploys, and manages cloud services and cloud-based solutions
  • provider provisions underlying infrastructure, middleware, and other IT resources required, as necessary.

SaaS (Software)

  • access to front-end user-interface
  • usage and usage-related configuration control
  • provider manages, implements, and maintains cloud service
47
Q

What is a Virtual Private Cloud (VPC) network?

A
  • public cloud environment that provides a certain level of isolation between the different users using the resources provisioned (IaaS)
  • isolation is achieved through private IP subnets and secure channels through which specific users share resources
  • VPC uses encryption to create a private network, within the public cloud, by using privately shared computing infrastructure. basically, an allocated private area for specified users to use within a public cloud, providing them with more resources.
  • achieves the benefits of both private and public cloud
48
Q

Discuss the differences between VPCs in AWS and GCP?

A

AWS:

  • regional VPC network
  • regions contain multiple zones
  • however, zones cannot communicate with one another through subnets (as they can in GCP): each subnet is bound to a single zone (1:1)
  • need to specifically route/configure between multiple subnets with tunnels for communications, which is very expensive
  • VPC is regional and needs extra settings to communicate across VPCs
  • VPCs are hierarchical with multiple layers of control at the region, zone, subnet and instance level.

GCP:

  • VPC is used as a global resource
  • regions contain multiple zones
  • of which, the zones can communicate transparently through subnets within the same region
  • subnets:zones = Many:Many
  • relatively flat level of control
49
Q

What is a subnet?

A
  • A range of IP addresses within the network for private use
  • IP addresses are allocated to specific users, granted by the cloud provider or an administrator
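
A subnet as "a range of IP addresses" can be inspected with Python's standard `ipaddress` module. The `10.0.0.0/16` network below is just an example private range chosen for illustration, not something from the course material.

```python
import ipaddress

# Hypothetical private network, split into smaller /24 subnets.
network = ipaddress.ip_network("10.0.0.0/16")
subnets = list(network.subnets(new_prefix=24))

first = subnets[0]                                   # 10.0.0.0/24
inside = ipaddress.ip_address("10.0.0.42") in first  # address in the range
outside = ipaddress.ip_address("10.0.1.5") in first  # belongs to the next subnet
```

Each `/24` subnet covers 256 addresses, and membership tests like the ones above are how a router decides whether an address belongs to a given subnet.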
50
Q

What is NAT used for? Relates to VPCs.

A
  • Maps private IP addresses (subnets) to public IP addresses for connections with the public internet
  • Private IPs : Public IPs = M:1 (many private addresses typically share one public address)
51
Q

What is load balancing?

A
  • The process of distributing a set of tasks over a set of resources
  • used to be more efficient and reliable
  • optimizes response time and avoids unevenly overloading compute nodes, especially if others are left idle.
52
Q

What are the Load Balancing algorithms? Discuss.

A

Round Robin

  • one of the simplest methods of distributing client requests across a group of servers
  • round-robin load balancer forwards a client request to each server in turn, looping through
  • suitable when hardware specifications are not differentiated between nodes; best for clusters containing servers with identical specifications
  • can result in overloading of an imbalanced cluster

Weighted

  • same as round-robin LB in terms of cyclic distribution
  • but the node with higher specifications will be appointed the greater number of requests
  • can set up the LB with assigned weights to each node according to hardware specifications (higher specifications = higher weight).

Least Connections

  • considers the number of current connections each server has when LBing
  • fewer connections, higher priority for assignment

Weighted Least Connections

  • applies a weight component based on the computing capacities (hardware spec) of each server
  • load-balancer considers the weights of each server, and the number of clients currently connected to each server.

Random

  • algorithm matches clients and servers at random using a random number generator
  • in cases where the load balancer receives a large number of requests, this will be able to distribute the requests evenly to the nodes
  • like Round Robin, the Random algorithm is suitable for clusters consisting of nodes with similar configurations
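
Four of the algorithms above can be sketched as short Python functions. These are simplified illustrations written for these notes, with hypothetical server names and weights; production load balancers also track health checks and update connection counts live.

```python
import itertools


def round_robin(servers):
    """Yield servers in turn, looping forever."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in turn, repeating each one according to its weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(connections):
    """Pick the server with the fewest active connections."""
    return min(connections, key=connections.get)


def weighted_least_connections(connections, weights):
    """Pick the server with the lowest connections-to-weight ratio."""
    return min(connections, key=lambda s: connections[s] / weights[s])


rr = round_robin(["a", "b", "c"])
first_five = list(itertools.islice(rr, 5))

wrr = weighted_round_robin({"big": 2, "small": 1})
first_six = list(itertools.islice(wrr, 6))
```

In the weighted variants a server with weight 4 and 4 connections is preferred over a server with weight 1 and 3 connections, because relative to its capacity it is less loaded.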
53
Q

Discuss Layer 4 and Layer 7 load balancers.

A

Refers to the OSI (Open Systems Interconnect) model.
Layer 7:
- refers to the application layer of the model (highest level)
- on the Internet, HTTP is the dominant application-layer protocol
- Layer 7 LBs base their routing decisions on various characteristics of the HTTP header and on the actual contents of the message, such as the URL, data type, or information in a cookie
- more expensive than Layer 4 but is more efficient + don’t need to duplicate the same data on all of the load-balanced servers due to ability to determine what type of data (via URL)

Layer 4:

  • refers to the TCP/UDP layer of the model (transport layer)
  • this layer is responsible for the transmission of data segments between points on a network
  • requires dedicated hardware
  • bases routing decisions on IP address and ports, which is information found in TCP stream
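
The difference in what each layer can see is easiest to show side by side. Both functions below are hypothetical sketches: the Layer 4 router only uses the client's IP address and port, while the Layer 7 router inspects the request itself (here just the URL path). Pool and server names are invented.

```python
import ipaddress


def route_layer4(client_ip, client_port, backends):
    """Layer 4: choose a backend from connection data (IP and port) only."""
    key = int(ipaddress.ip_address(client_ip)) + client_port
    return backends[key % len(backends)]


def route_layer7(path, pools, default_pool):
    """Layer 7: inspect the HTTP request (the URL path) to choose a pool."""
    for prefix, pool in pools.items():
        if path.startswith(prefix):
            return pool
    return default_pool


# Hypothetical pools: images and API calls go to dedicated servers.
pools = {"/images": "static-pool", "/api": "api-pool"}

l4_choice = route_layer4("10.0.0.1", 80, ["s1", "s2"])
l7_choice = route_layer7("/images/cat.png", pools, "web-pool")
```

Because the Layer 7 router knows what is being requested, image requests can go to servers that hold only images, which is why the same data does not need to be duplicated on every load-balanced server.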
54
Q

How to create using CRUD operations?

A

db.collection.insertOne()

db.collection.insertMany()

55
Q

How to read in CRUD operations?

A

db.collection.find()

db.collection.findOne()

56
Q

How to update in CRUD operations?

A

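db.collection.updateOne()

db.collection.updateMany()

(db.collection.update() is the older form, deprecated in the current MongoDB shell)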

57
Q

How to delete in CRUD operations?

A

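db.collection.deleteOne()

db.collection.deleteMany()

(db.collection.remove() is the older form, deprecated in the current MongoDB shell)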