General Flashcards

1
Q

Encapsulation

A

Encapsulation is the mechanism of binding data together with the methods that operate on it, and hiding it from the outside world. Encapsulation is achieved when each object keeps its state private so that other objects don’t have direct access to it. Instead, they can access this state only through a set of public functions.
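A minimal Python sketch of the idea, using a hypothetical `BankAccount` class: the balance lives in a private attribute and can only be changed through public methods. Python enforces privacy only by convention (the leading underscore), so this shows the design intent rather than a hard guarantee.

```python
class BankAccount:
    """Hypothetical example: state is private, access goes through methods."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self._balance = 0  # leading underscore: internal state by convention

    def deposit(self, amount: int) -> None:
        # The public method guards the invariant that deposits are positive
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    @property
    def balance(self) -> int:
        # Read-only view of the private state (no setter is defined)
        return self._balance


account = BankAccount("alice")
account.deposit(50)
print(account.balance)  # 50
```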

2
Q

Abstraction

A

Abstraction can be thought of as the natural extension of encapsulation. It means hiding all but the relevant data about an object in order to reduce the complexity of the system. Abstraction helps by hiding the internal implementation details of objects and revealing only the operations that are relevant to other objects.
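A minimal sketch, assuming Python's `abc` module as the way to express the abstract interface; `Shape`, `Circle`, `Rectangle`, and `total_area` are hypothetical names. Callers only use `area()` and never see how each shape stores or computes its data.

```python
from abc import ABC, abstractmethod
import math


class Shape(ABC):
    """Abstract interface: callers only see area(), not how it is computed."""

    @abstractmethod
    def area(self) -> float:
        ...


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self._radius = radius  # implementation detail hidden from callers

    def area(self) -> float:
        return math.pi * self._radius ** 2


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        self._width, self._height = width, height

    def area(self) -> float:
        return self._width * self._height


def total_area(shapes: list[Shape]) -> float:
    # Works with any Shape without knowing its internals
    return sum(s.area() for s in shapes)


print(total_area([Rectangle(2, 3), Rectangle(1, 4)]))  # 10
```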

3
Q

Polymorphism

A

Polymorphism is the ability of an object to take different forms and thus, depending upon the context, to respond to the same message in different ways. Take the example of a chess game; a chess piece can take many forms, like bishop, castle, or knight and all these pieces will respond differently to the ‘move’ message.
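The chess example as a small Python sketch (class names are illustrative; `Rook` is the piece often called a castle). Every piece receives the same `move()` message and responds differently depending on its type. The move descriptions are simplified strings, not real move generation.

```python
class ChessPiece:
    """Base class; each subclass responds to move() in its own way."""

    def move(self) -> str:
        raise NotImplementedError


class Bishop(ChessPiece):
    def move(self) -> str:
        return "diagonally"


class Rook(ChessPiece):  # the "castle"
    def move(self) -> str:
        return "horizontally or vertically"


class Knight(ChessPiece):
    def move(self) -> str:
        return "in an L shape"


for piece in [Bishop(), Rook(), Knight()]:
    # Same message, different behavior depending on the object's type
    print(type(piece).__name__, "moves", piece.move())
```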

4
Q

How do I write clean code?

A
  • The logic should be straightforward, to make it hard for bugs to hide
  • Dependencies should be minimal, to ease maintenance
  • Performance should be close to optimal, so people aren’t tempted to turn the code into a mess
  • Clean code does one thing well
  • Bad code tries to do too much. Clean code is focused
  • Clean code is simple, orderly, and easy to read
  • No duplication
5
Q

The boy scout rule

A
  • Leave the campground cleaner than you found it
  • Cleaning things up means it’s much more difficult for the code to rot
6
Q

Why does good code rot?

A
  • The requirements changed in ways that thwart the original design
  • Schedules were too tight to do things right
  • Stupid managers and intolerant customers
  • At the end of the day, most of the responsibility falls on the programmer. It is unprofessional to bend to the will of managers who don’t understand the risks of making messes
  • Would a doctor stop washing their hands because a patient asked them to? No, because the patient doesn’t understand the risks of disease and infection
7
Q

serialization (encoding)

A
  • The translation from an in-memory representation to a byte sequence
  • Example: Python object to JSON string
  • JSON and XML are standardized serialization formats (encodings) that can be read by many programming languages. They are textual formats
  • A big downside is that encoding and decoding them can be quite slow
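A minimal Python sketch of serialization using the standard `json` module; the `record` dict is made-up sample data. The in-memory dict becomes JSON text, which is then encoded into a byte sequence that could be written to a file or sent over the network.

```python
import json

record = {"id": 7, "name": "Ada", "tags": ["admin", "ops"]}

# Serialize (encode): in-memory dict -> JSON text -> bytes
text = json.dumps(record)
payload = text.encode("utf-8")

print(text)           # {"id": 7, "name": "Ada", "tags": ["admin", "ops"]}
print(type(payload))  # <class 'bytes'>
```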
8
Q

deserialization (decoding)

A
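  • The reverse of serialization: translating a byte sequence back into an in-memory representation, e.g. JSON string to Python object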
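The reverse direction, again as a sketch with the standard `json` module and made-up sample bytes:

```python
import json

payload = b'{"id": 7, "name": "Ada", "tags": ["admin", "ops"]}'

# Deserialize (decode): bytes -> JSON text -> in-memory Python objects
record = json.loads(payload.decode("utf-8"))

print(record["tags"][0])  # admin
```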
9
Q

Binary serialization (encoding)

A
  • Both JSON and XML use a lot of space compared to binary formats
  • As a result, binary encodings can be more compact and faster, at the expense of not being human-readable. That is usually acceptable for internal data
  • Binary encodings for JSON: MessagePack, BSON, BJSON. Schema-based binary encodings: Apache Thrift, Protocol Buffers (protobuf), and Apache Avro
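To make the size difference concrete, here is a sketch that compares compact JSON with a hand-rolled fixed binary layout built with Python's `struct` module. This is not MessagePack, protobuf, or any of the formats listed above; it only illustrates why binary encodings are smaller and why the reader must know the layout (a schema) to decode them.

```python
import json
import struct

record = {"id": 12345, "score": 98.5, "active": True}

# Textual encoding: field names and values are spelled out as characters
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Hand-rolled binary layout (illustration only, not a real format):
# "<" little-endian, no padding; I = uint32, d = float64, ? = bool
LAYOUT = "<Id?"
as_binary = struct.pack(LAYOUT, record["id"], record["score"], record["active"])

print(len(as_json), len(as_binary))  # 39 13

# Decoding requires knowing the layout (the "schema") in advance
decoded = struct.unpack(LAYOUT, as_binary)
print(decoded)  # (12345, 98.5, True)
```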
10
Q

Load Balancers

A
  • Different load balancing methods (algorithms)
    • Round robin
      • Requests to the application servers are distributed in a round-robin fashion
      • Each server receives an equal share of the traffic, in circular order
    • Least connected
      • The next request is assigned to the server with the fewest active connections
    • IP-hash
      • A hash of the client’s IP address determines which server receives the request
      • Maintains session persistence
  • Overall goal is to distribute incoming requests to ensure high availability, reliability, and performance by avoiding overloading a single server
  • Session persistence. Ensure subsequent requests from the same client are directed to the same backend server
  • They also support SSL/TLS termination which offloads this burden from the backend servers
  • Improve perf: reduces load on any individual server
  • Ensure high availability: eliminates single points of failure
  • Scalability: easily scale infra as demand increases
  • Stateful load balancing
    • Source IP affinity: Assigns a client to a specific server based on the client’s IP address
    • Session affinity: Allocates a client to a specific server based on a session identifier such as a cookie or URL parameter
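A minimal Python sketch of the three balancing methods, assuming hypothetical backend names `app-1` to `app-3`. Real load balancers track connections and health themselves; here `LeastConnections` relies on the caller to call `release()` when a connection closes.

```python
import hashlib
import itertools

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical backend names


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        # Hand out servers in circular order
        return next(self._cycle)


class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        # Choose the server with the fewest active connections (ties: first listed)
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


def ip_hash(client_ip, servers):
    # A stable hash of the client IP always maps to the same server,
    # which gives session persistence as long as the server list is unchanged
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


rr = RoundRobin(SERVERS)
print([rr.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']

lc = LeastConnections(SERVERS)
first, second = lc.pick(), lc.pick()
print(first, second)  # app-1 app-2

print(ip_hash("203.0.113.9", SERVERS) == ip_hash("203.0.113.9", SERVERS))  # True
```

With this simple modulo scheme, adding or removing a server remaps most clients to a different server, which breaks session persistence for them.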
11
Q

API Gateway

A
  • Designed for routing. Receives requests from clients and routes them to the appropriate microservice, so clients can access a variety of services through a single entry point
  • Rate limiting
  • Caching
  • Authentication and Authorization
  • Load balancing
  • Transformation. For example, converting XML to JSON
  • Can incorporate a Web Application Firewall (WAF)
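A sketch of the routing and authentication responsibilities only, assuming a hypothetical route table (`user-service`, `order-service`) and a hard-coded token set standing in for real authentication. A production gateway would also handle rate limiting, caching, and the other items above.

```python
from typing import Optional, Tuple

# Hypothetical route table: path prefix -> internal microservice address
ROUTES = {
    "/users": "http://user-service:8080",
    "/orders": "http://order-service:8080",
}
VALID_TOKENS = {"secret-token"}  # stand-in for real authentication


def route(path: str, token: Optional[str]) -> Tuple[int, str]:
    """Return (status, upstream URL or error) for an incoming request."""
    # Authentication happens once, at the single entry point
    if token not in VALID_TOKENS:
        return 401, "unauthorized"
    # Longest matching prefix wins, so more specific routes take priority
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return 200, ROUTES[prefix] + path
    return 404, "no route"


print(route("/users/42", "secret-token"))  # (200, 'http://user-service:8080/users/42')
```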
12
Q

Linear and non-linear data structures

A

Linear
Direct access: array, matrix
Sequential access: linked list, stack, queue

Non-linear
Hierarchy: tree, trie, heap
Unordered: hash table, set, graph

13
Q

Arrays

A

Search (unsorted): O(n), linear scan
Search (sorted): O(log n), can use binary search
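A short Python sketch of both searches. `binary_search` relies on the standard `bisect` module and only gives correct results when the input is sorted.

```python
from bisect import bisect_left


def linear_search(items, target):
    # O(n): check every element; works on unsorted data
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1


def binary_search(sorted_items, target):
    # O(log n): halve the search space each step; requires sorted input
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1


print(linear_search([7, 3, 9, 1], 9))      # 2
print(binary_search([1, 3, 7, 9, 12], 9))  # 3
```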

14
Q

Matrix

A
  • The outer array holds all the rows, while each element of an inner array is the value in one column
  • Nested list comprehensions

[<return> <outer> <inner> <inner> ... <option>]
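A Python sketch of a matrix as a list of lists, built and reshaped with nested comprehensions. In a comprehension, the loops appear in the same order as they would in nested `for` statements, and the optional `if` filter comes last.

```python
rows, cols = 2, 3

# Outer list = rows; each inner list holds one value per column
matrix = [[r * cols + c for c in range(cols)] for r in range(rows)]
print(matrix)  # [[0, 1, 2], [3, 4, 5]]

# Access is matrix[row][col]
print(matrix[1][2])  # 5

# Nested comprehension: the outer loop comes first, then the inner loop
flat = [value for row in matrix for value in row]
print(flat)  # [0, 1, 2, 3, 4, 5]

# With the optional trailing condition
evens = [value for row in matrix for value in row if value % 2 == 0]
print(evens)  # [0, 2, 4]

# Transpose: swap rows and columns
transposed = [[matrix[r][c] for r in range(rows)] for c in range(cols)]
print(transposed)  # [[0, 3], [1, 4], [2, 5]]
```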

15
Q

TCP vs UDP

A

TCP
- Ensures reliable, ordered, and error-checked delivery of bytes between apps
- Retransmits lost or corrupted packets
- Establishes a connection/handshake before sending
- Used when data accuracy is more critical than speed

UDP
- Connectionless
- Sends messages called datagrams without establishing a connection
- Does not guarantee reliability or order
- Low overhead and fast
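A minimal sketch of UDP's connectionless model using Python's standard `socket` module on localhost. Unlike TCP, there is no `listen()`, `accept()`, or `connect()`: the sender just addresses a datagram and sends it. On the loopback interface the datagram normally arrives; over a real network UDP gives no such guarantee.

```python
import socket

# UDP receiver: bind to a free port on localhost; no listen()/accept() needed
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
address = receiver.getsockname()

# UDP sender: no connection or handshake, just send a datagram to an address
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", address)

data, source = receiver.recvfrom(1024)
print(data)  # b'hello'

sender.close()
receiver.close()
```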

16
Q

DNS Resolver

A

DNS resolvers are usually provided by internet service providers (ISPs) or other organizations. They act as intermediaries between users and DNS servers, receiving DNS queries from users and sending them to the appropriate DNS servers to be resolved. Once the resolver receives the answer, it caches the information and returns it to the user.

17
Q

DNS Resolution

A

DNS resolution is the process of converting a domain name into its corresponding IP address. There are two types of DNS queries involved in this process: recursive and iterative queries.

  • Recursive query: In a recursive query, the DNS resolver asks for the complete answer to a query from the DNS server. If the server has the answer, it responds with the required information. If not, the server takes responsibility for contacting other DNS servers to find the answer and then returns it to the resolver. Recursive queries put more responsibility on the DNS server to find the requested information.
  • Iterative query: In an iterative query, the DNS resolver asks the DNS server for the best answer it has at the moment. If the server doesn’t have the complete answer, it responds with a referral to another server that might have more information. The resolver then contacts that server with a new iterative query, repeating the process until it finds the complete answer. In iterative queries, the resolver takes on more responsibility for finding the requested information.
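A toy simulation of an iterative query, assuming a hypothetical hard-coded set of "servers" (no network access; the address comes from the documentation range 192.0.2.0/24). The resolver itself follows each referral; in a recursive query, the server would run this same loop on the resolver's behalf.

```python
# Hypothetical servers: each either answers or refers the resolver onward
SERVERS = {
    "root": {"refer": {"com.": "tld-com"}},
    "tld-com": {"refer": {"example.com.": "ns-example"}},
    "ns-example": {"answer": {"www.example.com.": "192.0.2.10"}},
}


def iterative_resolve(name: str) -> tuple[str, list[str]]:
    """Resolver-driven lookup: follow referrals until an answer is found."""
    server, path = "root", []
    while True:
        path.append(server)
        data = SERVERS[server]
        if name in data.get("answer", {}):
            return data["answer"][name], path
        # No answer: follow a referral whose zone is a suffix of the name
        for zone, next_server in data.get("refer", {}).items():
            if name.endswith(zone):
                server = next_server
                break
        else:
            raise LookupError(f"no referral for {name} at {server}")


print(iterative_resolve("www.example.com."))
# ('192.0.2.10', ['root', 'tld-com', 'ns-example'])
```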

To speed up the DNS resolution process, resolvers and servers cache the results of previous queries. When a resolver receives a query, it first checks its cache to see if the answer is already available. If it finds the cached information, it returns the answer without contacting other servers, saving time and reducing network traffic.

Each DNS record has an associated Time To Live (TTL) value, measured in seconds, which specifies how long the record may be stored in the cache. Once the TTL expires, the cached information is removed to ensure that outdated information is not used.
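A minimal sketch of TTL-based caching, assuming an injectable clock so expiry can be shown without waiting. `TTLCache` is a hypothetical name, not a real resolver's API.

```python
import time


class TTLCache:
    """Minimal resolver-style cache: each entry expires after its TTL (seconds)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be demonstrated
        self._entries = {}   # name -> (value, expires_at)

    def put(self, name, value, ttl_seconds):
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # cache miss: the resolver would query DNS servers
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: drop it so stale data is not used
            return None
        return value


now = [0.0]  # fake clock for the example
cache = TTLCache(clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.10", ttl_seconds=300)
print(cache.get("www.example.com"))  # 192.0.2.10
now[0] = 301.0
print(cache.get("www.example.com"))  # None
```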

18
Q

Hashing

A

A hash function takes an input (a message or key) and returns a fixed-size value that looks random. This output is the hash value.

The hash function converts input data of any length (like a book title) into a fixed-length value.

Use cases:
- Quick data retrieval
- Data integrity checks: hash a file; if even a tiny portion changes, the hash value will be different
- Password security
- Hash tables: use a hash function to compute the index where the desired value is stored
- Data deduplication: store and compare hashes instead of the full data to detect duplicates
- Load balancing
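A sketch of three of the use cases above with Python's standard `hashlib` (SHA-256). The data is made up.

```python
import hashlib

# Fixed-size output: SHA-256 hex digests are always 64 characters
short_digest = hashlib.sha256(b"a").hexdigest()
long_digest = hashlib.sha256(b"a" * 10_000).hexdigest()
print(len(short_digest), len(long_digest))  # 64 64

# Integrity check: a one-character change produces a completely different hash
original = hashlib.sha256(b"The quick brown fox").hexdigest()
tampered = hashlib.sha256(b"The quick brown fax").hexdigest()
print(original == tampered)  # False

# Deduplication: store each blob once, keyed by its hash
store = {}
for blob in [b"report", b"photo", b"report"]:
    store.setdefault(hashlib.sha256(blob).hexdigest(), blob)
print(len(store))  # 2
```

For password security, plain SHA-256 is not enough on its own; passwords should be stored with a salted, deliberately slow function such as `hashlib.scrypt` or `hashlib.pbkdf2_hmac`.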

19
Q

Clean Code - Meaningful Names

A
  • Classes should have noun or noun phrase names like Customer, WikiPage, Account, and AddressParser. Avoid words like Manager, Processor, Data, or Info. A class name should not be a verb
  • Methods should have verb or verb phrase names like postPayment, deletePage, or save
  • Accessors, mutators, and predicates should be prefixed with get, set, and is
  • Shorter names are generally better than longer ones, so long as they are clear. Add no more context to a name than is necessary
20
Q

Clean Code - Functions

A
  • The first rule of functions is that they should be small. Very small.
  • Functions should do one thing. They should do it well. They should do it only.
  • A way to know if a function is doing more than “one thing” is if you can extract another function from it with a name that is not merely a restatement of its implementation
  • We want the code to read like a top-down narrative. We want every function to be followed by those at the next level of abstraction so that we can read the program, descending one level of abstraction at a time as we read down the list of functions. Don’t mix multiple levels of abstraction in a single function:

High level of abstraction: getHtml()
Intermediate: String pagePathName = PathParser.render(pagePath)
Low: .append('\n')

  • You know you’re working on clean code when each routine turns out to be pretty much what you expected
  • Boolean parameters are ugly. They loudly proclaim that this function does more than one thing
  • Have no side effects: Side effects are lies. Your function promises to do one thing, but it also does other hidden things. Sometimes it will make unexpected changes to the variables of its own class. They are devious and damaging mistruths that often result in strange temporal couplings and order dependencies. A great example is a checkPassword function that also initializes a session if the password is correct. That’s a side effect!
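The checkPassword example translated into a small Python sketch; the function names and the module-level `session` dict are hypothetical, and comparing plaintext passwords is only for brevity (real code compares salted hashes). The first version hides a side effect; the refactored version separates checking from logging in.

```python
session = {}  # hypothetical session state


def check_password_with_side_effect(user, password, stored):
    # Name promises a check, but it also mutates session state (hidden side effect)
    if password == stored:
        session["user"] = user
        return True
    return False


def check_password(password, stored):
    # Does one thing: answers the question and changes nothing
    return password == stored


def log_in(user, password, stored):
    # The session change is now explicit, in a function named for it
    if check_password(password, stored):
        session["user"] = user
        return True
    return False
```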
21
Q

Heap

A
  • A data structure based on a complete binary tree. There are two types:

1) Max heap: every parent node has a value >= children
2) Min heap: every parent node has a value <= children

This ordering property allows for efficient operations like finding max or min values

  • Heaps are used in priority queues. Removing elements based on priority instead of insertion order

Find min/max O(1)
Insert O(log n)
Remove O(log n)
Heapify (create heap from array) O(n)
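A sketch with Python's standard `heapq` module, which implements a min-heap on a plain list. `heapq` has no built-in max-heap, so the common trick of negating values is shown, along with a tiny priority queue.

```python
import heapq

data = [5, 1, 8, 3, 9, 2]

heap = list(data)
heapq.heapify(heap)          # O(n): rearrange in place into a min-heap
print(heap[0])               # 1 (peek at the min in O(1))

heapq.heappush(heap, 0)      # O(log n) insert
print(heapq.heappop(heap))   # 0 (O(log n) remove min)
print(heapq.heappop(heap))   # 1

# Negate values to simulate a max-heap
max_heap = [-x for x in data]
heapq.heapify(max_heap)
print(-max_heap[0])          # 9

# Priority queue: (priority, item) tuples pop in priority order
tasks = []
heapq.heappush(tasks, (2, "write tests"))
heapq.heappush(tasks, (1, "fix outage"))
print(heapq.heappop(tasks)[1])  # fix outage
```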

22
Q

Clean Code - Comments

A
  • As code refactors, comments often get separated from the code as things get shifted around. The comments don’t always follow the code and they often become orphaned blurbs of decreasing accuracy
  • Truth can only be found in one place, the code
  • Examples of when it can be good to use a comment: explanation of intent, warning of consequences, amplification
  • Don’t use a comment when you can use a function or variable name
23
Q

Generator Functions - Python Morsels

A
  • A generator function is a function with a yield statement in it
  • When you call a generator function, it does not run the function body
  • It gives you a generator object
  • If you loop over the generator object, the function runs until a yield statement is hit; the generator then pauses and yields the next item. This repeats until the function finishes and you have consumed all of its items
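A small sketch showing the lazy behavior described above; `countdown` is a made-up example function.

```python
def countdown(n):
    print("starting")  # runs only when iteration begins
    while n > 0:
        yield n        # pause here and hand back n
        n -= 1


gen = countdown(3)         # nothing printed yet: the body has not run
print(type(gen).__name__)  # generator
print(list(gen))           # prints "starting", then [3, 2, 1]
print(list(gen))           # [] because the generator is exhausted
```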
24
Q

Iterator - Python Morsels

A

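If you want a class to be an iterator, it needs to implement __iter__ and __next__ (an iterable only needs __iter__)

class MyIterator:
    def __init__(self, items):
        self.items = list(items)
        self.index = 0

    def __iter__(self):
        return self  # iterators return themselves

    def __next__(self):
        # Raise StopIteration when exhausted (alternatively, write __iter__ as a generator with yield)
        if self.index >= len(self.items):
            raise StopIteration
        item = self.items[self.index]
        self.index += 1
        return item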
  • Iterator: the object which actually performs iteration over the iterable
  • You cannot call next() on lists or other sequences. They are not iterators
  • You can only call next() on iterators. Its signature is literally next(iterator[, default])
  • Call iter(my_list) to get an iterator
  • Note, iterators get exhausted just like generators. A generator is a special type of iterator which lazily yields each item
  • Iterators are iterables: they return themselves when passed to iter
  • The iter() function relies on __iter__ as implemented in the class. This simply returns an iterator
  • The next() function relies on __next__ as implemented in the class. This returns the next val in the sequence
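A short sketch of the points above using a built-in list; the values are arbitrary.

```python
numbers = [10, 20]

try:
    next(numbers)          # lists are iterables, not iterators
except TypeError as exc:
    print("TypeError:", exc)

it = iter(numbers)         # calls numbers.__iter__() and returns a list iterator
print(iter(it) is it)      # True: iterators return themselves from __iter__
print(next(it))            # 10
print(next(it))            # 20
print(next(it, "done"))    # done: the default is returned once exhausted
```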
25
Q

CAP Theorem

A
  • Generally, it’s an abstract framework for understanding the trade-offs between three essential properties of distributed systems: consistency, availability, and partition tolerance
  • States that it is impossible for a distributed system to simultaneously provide all three of these guarantees. Since network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability

Consistency:
- All clients see the same data at the same time, no matter which node they connect to
- Sacrifices availability
- Consequence: during a partition, the system may return an error to the client

Availability:
- Any client that requests data gets a response, even if some of the nodes are down
- Sacrifices consistency
- Consequence: the system continues to serve reads, but stale data may be returned

Partition tolerance:
- A partition indicates a communication break between two nodes. Partition tolerance means the system continues to operate despite network partitions.

26
Q

gRPC

A
  • Open-source remote procedure call (RPC) framework
  • Built on top of HTTP/2
  • Lets you call a function on a remote service as if you were making a local function call
  • Uses an IDL (Interface Definition Language) to define a contract for the data types and methods to be invoked
  • gRPC uses Protocol Buffers (protobuf) for serialization and deserialization. Data is encoded as binary, which has a smaller footprint and is therefore faster to transmit
  • Streaming: a single RPC can carry a stream of messages, and many calls can share one connection thanks to HTTP/2 multiplexing. Supports server-side, client-side, and bidirectional streaming
  • Code generation: client and server code is generated from the .proto file, which defines the data formats/types and the service endpoints
  • gRPC relies heavily on HTTP/2, so you cannot call a gRPC service directly from a web browser the way you would with REST (browsers need gRPC-Web and a proxy)
  • gRPC internals and communication between two microservices:
  • The service that provides the data is the gRPC server
  • The service that requires data from the provider is the gRPC client (consumer)
  • A gRPC server can also be a consumer when it needs data from another microservice
    1) Service definition, e.g. a .proto file. Defines the contract between client and server
    2) gRPC server stubs. Generated code that the server implements; the gRPC server handles client calls
    3) gRPC client stubs. Convert local method calls into remote function invocations

gRPC use cases
- Designed for low latency and high throughput communication. Works very well with microservices
- Point to point real time communication. Supports bi-directional streaming. gRPC services can push messages in real-time without polling
- Messages are serialized with Protobuf, a lightweight message format

27
Q

Name different types of Caching

A

Different types of caching:
- In memory. Faster than fetching from disk
- Disk caching. Faster than retrieving from remote source
- CDN caching for faster retrieval of static content
- Client caching (eg browser)
- DNS cache for faster domain resolution
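In-memory caching inside a single Python process can be sketched with the standard `functools.lru_cache` decorator. `load_config` is a hypothetical stand-in for a slow disk or remote read; the `calls` counter just shows how often the real work runs.

```python
from functools import lru_cache

calls = 0


@lru_cache(maxsize=128)
def load_config(name: str) -> str:
    # Stand-in for a slow read from disk or a remote source (hypothetical)
    global calls
    calls += 1
    return f"config for {name}"


load_config("billing")  # miss: runs the function
load_config("billing")  # hit: served from the in-memory cache
print(calls)                     # 1
print(load_config.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```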

28
Q

Data Partitioning

A
  • Technique in distributed systems and DBs to divide a large dataset into smaller parts referred to as partitions
  • Each partition is assigned to a separate node
  • Improves the performance and scalability of large-scale data processing applications, as it allows processing to be distributed across multiple nodes
  • The workload can also be balanced across nodes, so the system can handle more requests and process data efficiently

Partition: A smaller, more manageable part of a larger dataset, created as a result of data partitioning

Partition key: A data attribute used to determine how data is distributed across partitions. An effective partition key should provide an even distribution of data and support efficient query patterns

Shard: Often used interchangeably with a partition, particularly in the context of horizontal partitioning

1) Horizontal Partitioning (also known as sharding)
- Each shard contains a subset of the rows
- Each shard is typically assigned to a different server which allows for parallel processing and faster query execution times
2) Vertical Partitioning
- Each partition contains a subset of the columns
- Optimizes performance by reducing the amount of data that needs to be scanned

1) Range based sharding
- Data is divided based on a specific range of values for a given partition key. Example: order dates, IDs
2) Hash based sharding
- Applying a consistent hash function to the partitioning key
- Particularly useful when key has a large number of distinct values or is not easily divided into ranges.
- Example: sharding based on user IDs
3) Directory based sharding
- Use a custom lookup table to map each data entry to a specific shard
- Greater flexibility but introduces a layer of complexity as the directory must be maintained
4) Geographical sharding
- Shard by US State, Country, Zip
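A minimal Python sketch of how a shard could be chosen under three of the strategies, assuming 4 shards, made-up ID boundaries, and a hypothetical country-to-shard directory (which doubles as a simple geographical scheme).

```python
import bisect
import hashlib

NUM_SHARDS = 4


def hash_shard(user_id: int) -> int:
    # Hash-based: a stable hash spreads distinct keys evenly across shards
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Range-based: shard i holds keys below BOUNDARIES[i]; ids >= 3000 go to shard 3
BOUNDARIES = [1_000, 2_000, 3_000]  # hypothetical ID ranges


def range_shard(order_id: int) -> int:
    return bisect.bisect_right(BOUNDARIES, order_id)


# Directory-based: an explicit lookup table that must be maintained
DIRECTORY = {"US": 0, "CA": 1, "DE": 2}  # hypothetical country -> shard map


def directory_shard(country: str) -> int:
    return DIRECTORY[country]
```

With plain modulo hashing, changing `NUM_SHARDS` moves most keys to a different shard; consistent hashing is the usual technique to limit that reshuffling.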

Benefits of Data Partitioning
- Improved query performance
- Enhanced scalability
- Load balancing. Helps distribute the workload evenly
- Data isolation
- Parallel processing
- Storage efficiency
- Faster data recovery

Problems of Data Partitioning
- Complexity
- Data skew. Uneven data distribution across partitions
- Cross partition queries. When queries need to access data across multiple partitions, performance can suffer as the system must search and aggregate data from several partitions

29
Q

SQL vs No-SQL

A

Common Characteristics of NoSQL DBs
* Do not use the relational model
* Run well on clusters and can be sharded. Horizontally scalable
* Suited to storing massive amounts of data
* Most NoSQL stores don’t support joins
* Schema-less (dynamic) design, which allows for greater flexibility of data
* Perform well under specific workloads, such as high write loads or large-scale data storage and retrieval

Types of NoSQL Storage Models
* KV. Excel at high read and write throughput for simple data models like session management and real-time analytics. Examples: DynamoDB, Riak
* In-memory KV. Excel at low latency. Examples: Redis, Memcached
* Document. Data is stored as documents, typically in JSON or a JSON-like format. Each document can contain nested fields and complex data structures. Examples: Elasticsearch, MongoDB, CouchDB
* Graph. Stores data as nodes and the edges (relationships) between them
* Columnar. Primary use case is large-scale analytics. Examples: Cassandra (wide-column), Vertica, Redshift
* Time series. Store data in time-ordered streams, sorted by timestamps. Examples: Graphite, Prometheus, AWS Timestream

Common Characteristics of SQL
* Relational databases are ACID compliant, which provides strong safety guarantees: reliable transactions and consistency of the data (for example, atomicity guarantees that a transaction is either completed in its entirety or not at all).
* Most NoSQL databases sacrifice ACID compliance for availability, performance, and scalability
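A sketch of atomicity using Python's built-in `sqlite3` module with an in-memory database; the `accounts` table and the transfer are made-up examples. The second update violates a `CHECK` constraint, so the whole transaction is rolled back, including the first update that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance + 500 WHERE name = 'bob'")
        # Violates the CHECK constraint, so the whole transfer must fail
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 100, 'bob': 0}: bob's credit was rolled back too
```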

30
Q

Graph Search - Breadth-first search (BFS)

A

Searches a tree or graph one level of depth at a time

Means we explore all of a node’s neighbors before moving on to the nodes at the next level of depth

Uses a queue

A common application of BFS is path finding, such as finding the shortest path in an unweighted graph
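A minimal BFS sketch in Python over a made-up adjacency-list graph, using `collections.deque` as the queue. Because nodes are discovered level by level, the first time the goal is reached gives a path with the fewest edges.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return the shortest path (fewest edges) from start to goal, or None."""
    queue = deque([start])
    parent = {start: None}  # also serves as the visited set
    while queue:
        node = queue.popleft()  # FIFO: finish the current level first
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"]}
print(bfs_shortest_path(graph, "A", "F"))  # ['A', 'B', 'D', 'F']
```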

31
Q

Graph Search - Depth-first search (DFS)

A

Traverse as far as possible along each branch before backtracking, exploring until we reach a node without edges or a node that we’ve previously visited

DFS uses a stack rather than a queue to track locations to search next

A common application is topological sorting
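A sketch of both ideas on a small made-up graph: an iterative DFS with an explicit stack, and a DFS-based topological sort (recursive, so it uses the call stack). The topological sort assumes the graph is a DAG and does not detect cycles.

```python
def dfs(graph, start):
    """Iterative DFS using an explicit stack; returns nodes in visit order."""
    visited, order = set(), []
    stack = [start]
    while stack:
        node = stack.pop()  # LIFO: go deeper before backtracking
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push in reverse so the first-listed neighbor is explored first
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in visited:
                stack.append(neighbor)
    return order


def topological_sort(graph):
    """DFS-based topological sort of a DAG (edges point to dependents)."""
    visited, result = set(), []

    def visit(node):
        visited.add(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visit(neighbor)
        result.append(node)  # post-order: appended after all its dependents

    for node in graph:
        if node not in visited:
            visit(node)
    return result[::-1]


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dfs(graph, "A"))          # ['A', 'B', 'D', 'C']
print(topological_sort(graph))  # ['A', 'C', 'B', 'D']
```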

32
Q

Quorum

A

In a distributed environment, a quorum is the minimum number of servers on which a distributed operation needs to be performed successfully before declaring the operation’s overall success

It enforces the consistency requirement needed for distributed operations

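What value should we choose for a quorum? A majority of the nodes: for N nodes, floor(N/2) + 1. Any two majorities overlap in at least one node, so two conflicting operations cannot both reach a quorum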
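A small sketch of the majority rule; the function names are hypothetical.

```python
def majority_quorum(n: int) -> int:
    # Smallest number of nodes that is more than half of n
    return n // 2 + 1


def operation_succeeded(acks: int, n: int) -> bool:
    # Declare overall success only once a quorum of nodes has acknowledged
    return acks >= majority_quorum(n)


for n in (3, 4, 5):
    q = majority_quorum(n)
    # Any two majorities overlap (q + q > n), so they share at least one node
    print(n, q, q + q > n)
# 3 2 True
# 4 3 True
# 5 3 True
```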

33
Q

Microservices

A

An architecture that structures an application as a collection of loosely coupled services

Each service is independently deployable

Pros
- Each microservice can be independently scaled
- Flexible. Each microservice can be developed, deployed, and updated independently leading to faster iteration
- Resilient. Failure in one microservice doesn’t necessarily impact the broader system
- Technology diversity
- Autonomy

Cons
- Complexity. Numerous systems, cognitive overload, monitoring, deploys, logging
- Latency. Communication between microservices over a network can introduce latency
- Data management and schema migrations
- Deployment overhead
- Operational overhead