interview fix Flashcards

1
Q

Amazon Elastic Block Store

A

Amazon Elastic Block Store (EBS) is an easy to use, high performance block storage service designed for use with Amazon Elastic Compute Cloud (EC2) for both throughput and transaction intensive workloads at any scale. A broad range of workloads, such as relational and non-relational databases, enterprise applications, containerized applications, big data analytics engines, file systems, and media workflows are widely deployed on Amazon EBS.

2
Q

Application-Layer Attacks

A

The application layer is the topmost layer of the OSI network model and the one closest to the user's interaction with the system. Attacks that make use of the application layer focus primarily on direct Web traffic. Potential avenues include HTTP, HTTPS, DNS, or SMTP.

3
Q

Containerization

A

Containerization is defined as a form of operating system virtualization, through which applications are run in isolated user spaces called containers, all using the same shared operating system (OS).

4
Q

Data Availability vs. Durability

A

Availability and durability are two very different aspects of data accessibility. Availability refers to system uptime, i.e. the storage system is operational and can deliver data upon request. Historically, this has been achieved through hardware redundancy so that if any component fails, access to data will prevail. Durability, on the other hand, refers to long-term data protection, i.e. the stored data does not suffer from bit rot, degradation or other corruption. Rather than focusing on hardware redundancy, it is concerned with data redundancy so that data is never lost or compromised.

5
Q

DIFFERENCE BETWEEN STORAGE TYPES

A

File storage: Economical and easily structured, data are saved in files and folders. They are usually found on hard drives, which means that they appear exactly the same for the user and on the hard drive.

Block storage: Data are stored in blocks of uniform size. Although more expensive, complex, and less scalable, block storage is ideal for data that needs to be accessed and modified frequently.

Object storage: Data is stored as objects with unique metadata and identifiers. Although this type of storage is generally less expensive, object storage is only ideal for data that does not require modification.

6
Q

Encryption at rest vs in transit

A

At rest: Data at rest is typically in a stable state: it is not traveling within the system or network, and it is not being acted upon by any application or third party. It's something that has reached a destination, at least temporarily.

In transit: Data that is moving through a system or network. This data can be encrypted while in transit, using HTTPS for example.

7
Q

IDS

A

An intrusion detection system (IDS) is a device or software application that monitors a network or systems for malicious activity or policy violations. Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect your AWS accounts, workloads, and data stored in Amazon S3. With the cloud, the collection and aggregation of account and network activities is simplified, but it can be time consuming for security teams to continuously analyze event log data for potential threats. With GuardDuty, you now have an intelligent and cost-effective option for continuous threat detection in AWS. The service uses machine learning, anomaly detection, and integrated threat intelligence to identify and prioritize potential threats. GuardDuty analyzes tens of billions of events across multiple AWS data sources, such as AWS CloudTrail event logs, Amazon VPC Flow Logs, and DNS logs. With a few clicks in the AWS Management Console, GuardDuty can be enabled with no software or hardware to deploy or maintain. By integrating with Amazon CloudWatch Events, GuardDuty alerts are actionable, easy to aggregate across multiple accounts, and straightforward to push into existing event management and workflow systems.

8
Q

If you need to construct a 3-tier layer of storage, how can you divide where you store each file?

A

The answer is: you would use lifecycle management. The most accessed files go in S3 Standard, less frequently accessed files in S3 Standard-Infrequent Access, rarely accessed files in Amazon S3 Glacier, and extremely rarely accessed files in S3 Glacier Deep Archive. Non-AWS answer: SSD for fast access, SSHD for less frequent access, 7200 RPM HDD for rarely accessed files, and 5400 RPM HDD for extremely rarely accessed files.

A traditional IDS typically relies on monitoring network traffic at specific network traffic control points, like firewalls and host network interfaces. This allows the IDS to use a set of preconfigured rules to examine incoming data packet information and identify patterns that closely align with network attack types. Traditional IDS have several challenges in the cloud:

Networks are virtualized. Data traffic control points are decentralized and traffic flow management is a shared responsibility with the cloud provider. This makes it difficult or impossible to monitor all network traffic for analysis.
Cloud applications are dynamic. Features like auto-scaling and load balancing continuously change how a network environment is configured as demand fluctuates.
Most traditional IDS require experienced technicians to maintain their effective operation and avoid the common issue of receiving an overwhelming number of false positive findings. As a compliance assessor, I have often seen IDS intentionally de-tuned to address the false positive finding reporting issue when expert, continuous support isn't available.

GuardDuty analyzes tens of billions of events across multiple AWS data sources, such as AWS CloudTrail, Amazon Virtual Private Cloud (Amazon VPC) flow logs, and Amazon Route 53 DNS logs. This gives GuardDuty the ability to analyze event data, such as AWS API calls to AWS Identity and Access Management (IAM) login events, which is beyond the capabilities of traditional IDS solutions. Monitoring AWS API calls from CloudTrail also enables threat detection for AWS serverless services, which sets it apart from traditional IDS solutions. However, without inspection of packet contents, the question remained, "Is GuardDuty truly effective in detecting network level attacks that more traditional IDS solutions were specifically designed to detect?"

AWS asked Foregenix to conduct a test that would compare GuardDuty to market-leading IDS to help answer this question for us. AWS didn't specify any specific attacks or architecture to be implemented within their test. It was left up to the independent tester to determine both the threat space covered by market-leading IDS and how to construct a test for determining the effectiveness of threat detection capabilities of GuardDuty and traditional IDS solutions, which included open-source and commercial IDS.

Foregenix configured a lab environment to support tests that used extensive and complex attack playbooks. The lab environment simulated a real-world deployment composed of a web server, a bastion host, and an internal server used for centralized event logging. The environment was left running under normal operating conditions for more than 45 days. This allowed all tested solutions to build up a baseline of normal data traffic patterns prior to the anomaly detection testing exercises that followed this activity.

Foregenix determined that GuardDuty is at least as effective at detecting network level attacks as other market-leading IDS. They found GuardDuty to be simple to deploy and required no specialized skills to configure the service to function effectively. Also, with its inherent capability of analyzing DNS requests, VPC flow logs, and CloudTrail events, they concluded that GuardDuty was able to effectively identify threats that other IDS could not natively detect and required extensive manual customization to detect in the test environment. Foregenix recommended that adding a host-based IDS agent on Amazon Elastic Compute Cloud (Amazon EC2) instances would provide an enhanced level of threat defense when coupled with Amazon GuardDuty.

As a PCI Qualified Security Assessor (QSA) company, Foregenix states that they consider GuardDuty as a qualifying network intrusion technique for meeting PCI DSS requirement 11.4. This is important for AWS customers whose applications must maintain PCI DSS compliance. Customers should be aware that individual PCI QSAs might have different interpretations of the requirement, and should discuss this with their assessor before a PCI assessment.

Customer PCI QSAs can also speak with AWS Security Assurance Services, an AWS Professional Services team of PCI QSAs, to obtain more information on how customers can leverage AWS services to help them maintain PCI DSS Compliance. Customers can request Security Assurance Services support through their AWS Account Manager, Solutions Architect, or other AWS support.

We invite you to download the Foregenix Amazon GuardDuty Security Review whitepaper to see the details of the testing and the conclusions provided by Foregenix.

9
Q

IOPS

A

Input/output operations per second (IOPS, pronounced eye-ops) is an input/output performance measurement used to characterize computer storage devices like hard disk drives (HDD), solid state drives (SSD), and storage area networks (SAN).
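
An IOPS figure only says something about data rate once an I/O size is fixed. A minimal sketch of that arithmetic follows; the function name and the sample numbers are illustrative assumptions, not measurements of any real device.

```python
def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Approximate data rate implied by an IOPS figure at a fixed I/O size.

    throughput = operations per second * size of each operation
    """
    return iops * io_size_kib / 1024  # KiB/s converted to MiB/s


# Hypothetical volume: 3000 IOPS at 16 KiB per operation.
print(throughput_mib_per_s(3000, 16))  # 46.875
```

The same IOPS number with a larger I/O size implies a proportionally higher throughput, which is why the two metrics are usually quoted together.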

10
Q

NAS vs SAN

A
NAS
Attached to the network
Part of the local area network, so subject to bottlenecks
Single point of failure
Suited to small and medium businesses
SAN
Mounts like a local drive
More available
Multiple arrays shared across a cluster of servers
Dedicated network, aimed at enterprise-level use
Not affected by network traffic
High speed

11
Q

object storage Vs file system

A

File storage organizes and represents data as a hierarchy of files in folders; block storage chunks data into arbitrarily organized, evenly sized volumes; and object storage manages data and links it to associated metadata.

12
Q

OSI MODEL

A

https://www.cloudflare.com/learning/ddos/glossary/open-systems-interconnection-model-osi/

13
Q

Protocol Attacks

A

A protocol attack focuses on damaging connection tables in network areas that deal directly with verifying connections. By sending successively slow pings, deliberately malformed pings, and partial packets, the attacking computer can cause memory buffers in the target to overload and potentially crash the system. A protocol attack can also target firewalls. This is why a firewall alone will not stop denial of service attacks.

14
Q

RAID level 0 - Striping

A

In a RAID 0 system data are split up into blocks that get written across all the drives in the array. By using multiple disks (at least 2) at the same time, this offers superior I/O performance.
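
The striping described above can be sketched in a few lines. This is a toy model that assumes in-memory lists standing in for drives and a hypothetical fixed block size; a real controller works on disk sectors and does this in hardware or in the kernel.

```python
def stripe(data: bytes, num_drives: int, block_size: int) -> list[list[bytes]]:
    """Split data into blocks and deal them out across the drives in turn."""
    if num_drives < 2:
        raise ValueError("RAID 0 needs at least 2 drives")
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for index, block in enumerate(blocks):
        drives[index % num_drives].append(block)
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Read the blocks back in the order they were written."""
    blocks = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                blocks.append(drive[row])
    return b"".join(blocks)


drives = stripe(b"ABCDEFGH", num_drives=2, block_size=2)
print(drives)            # [[b'AB', b'EF'], [b'CD', b'GH']]
print(unstripe(drives))  # b'ABCDEFGH'
```

Because every drive holds a different part of the data and there is no redundancy, losing any one drive in this model loses the whole data set.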

15
Q

RAID level 1 -Mirroring

A

Data are stored twice by writing them to both the data drive (or set of data drives) and a mirror drive (or set of drives). If a drive fails, the controller uses either the data drive or the mirror drive for data recovery and continues operation.

16
Q

RAID level 10 -combining RAID 1 and RAID 0

A

It is possible to combine the advantages (and disadvantages) of RAID 0 and RAID 1 in one single system. This is a nested or hybrid RAID configuration. It provides security by mirroring all data on secondary drives while using striping across each set of drives to speed up data transfers.

17
Q

RAID level 5

A

RAID 5 is the most common secure RAID level. It requires at least 3 drives but can work with up to 16. Data blocks are striped across the drives, and for each stripe a parity checksum of the block data is written. The parity data are not written to a fixed drive; they are spread across all drives. Using the parity data, the computer can recalculate the data of one of the other data blocks, should those data no longer be available. That means a RAID 5 array can withstand a single drive failure without losing data or access to data.
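
The recalculation step can be illustrated with XOR parity, which is the usual way a single-parity scheme is explained. The sketch below assumes equal-length in-memory blocks and hypothetical sample bytes; it ignores how parity is rotated across drives.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe (sample values chosen for illustration).
data_blocks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data_blocks)

# Simulate losing the drive that held the second block, then rebuild it
# from the surviving data blocks plus the parity block.
recovered = xor_blocks([data_blocks[0], data_blocks[2], parity])
print(recovered == data_blocks[1])  # True
```

XOR works here because XOR-ing a value in twice cancels it out, so the surviving blocks and the parity leave exactly the missing block.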

18
Q

RAID level 6 - Striping with double parity

A

RAID 6 is like RAID 5, but the parity data are written to two drives. That means it requires at least 4 drives and can withstand 2 drives dying simultaneously. The chances that two drives break down at exactly the same moment are of course very small. However, if a drive in a RAID 5 system dies and is replaced by a new drive, it takes hours or even more than a day to rebuild the swapped drive. If another drive dies during that time, you still lose all of your data. With RAID 6, the RAID array will even survive that second failure.
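
One way to compare the RAID levels on these cards is by how many drives' worth of capacity remain usable. The sketch below is a simplification under stated assumptions: all drives are the same size, mirrors are two-way, and the function name is hypothetical.

```python
# Minimum drive counts per level (RAID 0, 5, and 6 as stated on the cards;
# 2 for RAID 1 and 4 for RAID 10 assume two-way mirroring).
MIN_DRIVES = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_drive_count(level: str, drives: int) -> int:
    """Return how many drives' worth of capacity hold user data."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "0":
        return drives            # striping only, no redundancy
    if level in ("1", "10"):
        if drives % 2:
            raise ValueError("two-way mirroring needs an even number of drives")
        return drives // 2       # every block is stored twice
    if level == "5":
        return drives - 1        # one drive's worth of parity
    return drives - 2            # RAID 6: two drives' worth of parity


for level in ("0", "1", "5", "6", "10"):
    print(level, usable_drive_count(level, 4))
```

With four drives, the trade-off is visible at a glance: RAID 0 keeps all four, RAID 5 keeps three, and RAID 1, 6, and 10 each keep two.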

19
Q

Symmetric and Asymmetric encryption

A

Symmetric encryption uses a single key that must be shared among the people who need to receive the message, while asymmetric encryption uses a key pair, a public key and a private key, to encrypt and decrypt messages when communicating.

20
Q

Throughput vs Latency

A

Latency is the time required to perform some action or to produce some result. Latency is measured in units of time – hours, minutes, seconds, nanoseconds or clock periods.

Throughput is the number of such actions executed or results produced per unit of time. This is measured in units of whatever is being produced (cars, motorcycles, I/O samples, memory words, iterations) per unit of time. The term “memory bandwidth” is sometimes used to specify the throughput of memory systems.

21
Q

Volumetric Attacks DDOS

A

The most common DDoS attack overwhelms a machine's network bandwidth by flooding it with false data requests on every open port the device has available. Because the bot floods ports with data, the machine continually has to deal with checking the malicious data requests and has no room to accept legitimate traffic. UDP floods and ICMP floods comprise the two primary forms of volumetric attacks.

22
Q

Web Application Firewall Vs Firewall

A

In a technical sense, the difference between application-level firewalls and network-level firewalls is the layers of security they operate on. While web application firewalls operate on layer 7 (application), network firewalls operate on layers 3 and 4 (network and transport). WAFs are focused on protecting applications, while network firewalls are more concerned with traffic into and out of your broader network.

23
Q

What are DDoS Attacks?

A

A distributed denial-of-service (DDoS) attack is a malicious attempt to disrupt normal traffic of a targeted server, service or network by overwhelming the target or its surrounding infrastructure with a flood of Internet traffic.

24
Q

What is Database Clustering

A

Database clustering is the process of combining more than one server or instance to connect to a single database. Sometimes one server is not adequate to manage the amount of data or the number of requests; that is when a database cluster is needed. Database clustering, SQL Server clustering, and SQL clustering are closely associated terms, since SQL is the language used to manage the database information.
The main reasons for database clustering are the advantages it brings: data redundancy, load balancing, high availability, and monitoring and automation.

25
Q

What is identity management?

A

Identity management, also known as identity and access management, is a framework of policies and technologies for ensuring that the proper people in an enterprise have the appropriate access to technology resources.

26
Q

Why hash a file

A

It simply helps you to verify the integrity of the file that you are downloading. The hashes are calculated using “good data” and it helps you to check for file corruption.
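
A minimal sketch of that integrity check, assuming the publisher lists a SHA-256 digest next to the download. The helper names are hypothetical; `hashlib` is from the Python standard library.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare the local file's digest with the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

If even one byte of the file was corrupted in transit, the computed digest will differ from the published value and the check returns False.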

27
Q

Load balancing

A

Load balancing is the process of distributing network traffic across multiple servers. This ensures no single server bears too much demand. By spreading the work evenly, load balancing improves application responsiveness. It also increases availability of applications and websites for users.
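
Round robin is one of the simplest distribution strategies and serves here as an illustration; real load balancers also offer least-connections, weighted, and health-aware policies. The class and server names below are assumptions made for the sketch.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation so each gets an equal share."""

    def __init__(self, servers: list[str]):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(list(servers))

    def next_server(self) -> str:
        return next(self._rotation)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.next_server() for _ in range(5)])
# ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```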

28
Q

DNS

A

Humans access information online through domain names, like nytimes.com or espn.com. Web browsers interact through Internet Protocol (IP) addresses. DNS translates domain names to IP addresses so browsers can load Internet resources.

29
Q

TCP vs. UDP

A

TCP is a connection-oriented protocol, whereas UDP is a connectionless protocol.
TCP is slower, while UDP is faster.
TCP uses a handshake (SYN, SYN-ACK, ACK), while UDP uses no handshake.
TCP performs error checking and error recovery (lost or corrupted segments are retransmitted), whereas UDP performs error checking but simply discards erroneous packets.
TCP has acknowledgment segments, but UDP does not have any acknowledgment segment.
TCP is heavy-weight, and UDP is lightweight.

30
Q

Unicast vs Multicast

A

A Unicast transmission/stream sends IP packets to a single recipient on a network. A Multicast transmission sends IP packets to a group of hosts on a network. If the streaming video is to be distributed to a single destination, then you would start a Unicast stream by setting the destination IP address and port on the AVN equal to the destination's values. If you want to view the stream at multiple concurrent locations, then you would set the AVN's destination IP address to a valid Multicast IP address (224.0.0.0 to 239.255.255.255).
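
Whether a destination address falls in the multicast range can be checked with the standard-library `ipaddress` module. The function name is a hypothetical helper for this sketch, and it only distinguishes multicast from everything else.

```python
import ipaddress


def stream_type(destination: str) -> str:
    """Classify a destination IP as multicast or unicast."""
    address = ipaddress.ip_address(destination)
    return "multicast" if address.is_multicast else "unicast"


print(stream_type("239.1.2.3"))     # multicast
print(stream_type("192.168.1.10"))  # unicast
```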

31
Q

When would DNS protocol use TCP vs. UDP transport protocol?

A

DNS uses both UDP and TCP. Name queries, whether regular (forward) or reverse, normally go over UDP because it is lightweight. A classic DNS message over UDP (without the EDNS0 extension) is limited to 512 bytes; a larger response is truncated, and the client then repeats the query over TCP. DNS also uses TCP for zone transfers, which routinely exceed that size. In short, UDP is used to exchange small messages, whereas TCP is used to exchange information larger than 512 bytes. If a client doesn't get a response from the DNS server, it retransmits the query, possibly over TCP, after an interval of 3-5 seconds.
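
To make the size limit and the truncation signal concrete, here is a rough sketch that builds a DNS query by hand and inspects the TC (truncated) flag in a message header. It follows the standard DNS wire format; the function names, the query ID, and the sample header are illustrative assumptions, and nothing is sent over the network.

```python
import struct

MAX_UDP_DNS_PAYLOAD = 512  # classic limit, without EDNS0


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal A-record query for `name`."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def is_truncated(message: bytes) -> bool:
    """True if the TC bit is set, meaning the client should retry over TCP."""
    flags = struct.unpack("!H", message[2:4])[0]
    return bool(flags & 0x0200)


query = build_query("example.com")
print(len(query), len(query) <= MAX_UDP_DNS_PAYLOAD)  # 29 True
```

A typical name query is a few dozen bytes, comfortably inside the UDP limit; it is large responses and zone transfers that push DNS onto TCP.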

32
Q

router vs switch

A

Just as a switch connects multiple devices to create a network, a router connects multiple switches, and their respective networks, to form an even larger network.

33
Q

Hypervisor - how does it distinguish multiple VMs running on it and isolate them from the underlying h/w?

A

A normal system call in a guest is processed by the guest OS without intervention of the hypervisor.

However, when the guest does cause a trap to the hypervisor (not a system call, but some other operation that requires hypervisor service), the hypervisor knows which guest it is because it knows which guest it scheduled on that CPU.

34
Q

Virtualization

A

Virtualization is the process of running a virtual instance of a computer system in a layer abstracted from the actual hardware. Most commonly, it refers to running multiple operating systems on a computer system simultaneously. To the applications running on top of the virtualized machine, it can appear as if they are on their own dedicated machine, where the operating system, libraries, and other programs are unique to the guest virtualized system and unconnected to the host operating system which sits below it.

35
Q

content delivery network

A

A content delivery network (CDN) refers to a geographically distributed group of servers which work together to provide fast delivery of Internet content.

A CDN allows for the quick transfer of assets needed for loading Internet content, including HTML pages, JavaScript files, stylesheets, images, and videos. The popularity of CDN services continues to grow, and today the majority of web traffic is served through CDNs, including traffic from major sites like Facebook, Netflix, and Amazon.

36
Q

cluster

A

A computer cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system.

37
Q

IPSec VS SSL VPN

A

One of the major differences between SSL and IPsec is which layer of the OSI model each one belongs to. The OSI model is an abstract representation, broken into “layers,” of the processes that make the Internet work.

The IPsec protocol suite operates at the network layer of the OSI model. It runs directly on top of IP (the Internet Protocol), which is responsible for routing data packets.

Meanwhile, SSL operates at the application layer of the OSI model. It encrypts HTTP traffic instead of directly encrypting IP packets.

38
Q

MPLS

A

Multiprotocol Label Switching (MPLS) is a routing technique in telecommunications networks that directs data from one node to the next based on short path labels rather than long network addresses, thus avoiding complex lookups in a routing table and speeding traffic flows.

39
Q

How to speed up a high latency link/high speed link?

A

Use UDP instead of TCP; disable encryption.

40
Q

CDN - How does a CDN make web sites faster?

A

By distributing content closer to website visitors by using a nearby CDN server (among other optimizations), visitors experience faster page loading times. As visitors are more inclined to click away from a slow-loading site, a CDN can reduce bounce rates and increase the amount of time that people spend on the site. In other words, a faster website means more visitors will stay and stick around longer.

41
Q

OSI

A

Layer 7 - Application
To further our bean dip analogy, the Application Layer is the one at the top - it's what most users see. In the OSI model, this is the layer that is the "closest to the end user". Applications that work at Layer 7 are the ones that users interact with directly. A web browser (Google Chrome, Firefox, Safari, etc.) or other app - Skype, Outlook, Office - are examples of Layer 7 applications.

Layer 6 - Presentation
The Presentation Layer represents the area that is independent of data representation at the application layer. In general, it represents the preparation or translation of application format to network format, or from network formatting to application format. In other words, the layer "presents" data for the application or the network. A good example of this is encryption and decryption of data for secure transmission - this happens at Layer 6.

Layer 5 - Session
When two devices, computers or servers need to "speak" with one another, a session needs to be created, and this is done at the Session Layer. Functions at this layer involve setup, coordination (how long should a system wait for a response, for example) and termination between the applications at each end of the session.

Layer 4 - Transport
The Transport Layer deals with the coordination of the data transfer between end systems and hosts. How much data to send, at what rate, where it goes, etc. The best known example of the Transport Layer is the Transmission Control Protocol (TCP), which is built on top of the Internet Protocol (IP), commonly known as TCP/IP. TCP and UDP port numbers work at Layer 4, while IP addresses work at Layer 3, the Network Layer.

Layer 3 - Network
Here at the Network Layer is where you'll find most of the router functionality that most networking professionals care about and love. In its most basic sense, this layer is responsible for packet forwarding, including routing through different routers. You might know that your Boston computer wants to connect to a server in California, but there are millions of different paths to take. Routers at this layer help do this efficiently.

Layer 2 - Data Link
The Data Link Layer provides node-to-node data transfer (between two directly connected nodes), and also handles error correction from the physical layer. Two sublayers exist here as well - the Media Access Control (MAC) layer and the Logical Link Control (LLC) layer. In the networking world, most switches operate at Layer 2.

Layer 1 - Physical
At the bottom of our OSI bean dip we have the Physical Layer, which represents the electrical and physical representation of the system. This can include everything from the cable type and radio frequency link (as in an 802.11 wireless system) to the layout of pins, voltages and other physical requirements. When a networking problem occurs, many networking pros go right to the physical layer to check that all of the cables are properly connected and that the power plug hasn't been pulled from the router, switch or computer, for example.

42
Q

Firewalls

A

In computing, a firewall is a network security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules. A firewall typically establishes a barrier between a trusted network and an untrusted network, such as the Internet.

43
Q

Route 53

A

Amazon Route 53 is a highly available and scalable Domain Name System (DNS) service offered by AWS. It is named for TCP and UDP port 53, to which DNS server requests are addressed. Like any DNS service, Route 53 handles domain registration and routes users' internet requests to your application, whether it's hosted on AWS or elsewhere.

But Route 53 also intelligently directs traffic based on sophisticated routing policies and, through automated health checks, away from servers that might be failing.

44
Q

Port numbers for (DNS, HTTP, Telnet)

A

DNS : 53
HTTP: 80
Telnet: 23

45
Q

CIDR

A

CIDR notation is a compact representation of an IP address and its associated routing prefix. The notation is constructed from an IP address, a slash (‘/’) character, and a decimal number. The trailing number is the count of leading 1 bits in the routing mask, traditionally called the network mask.
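
The pieces of a CIDR string can be pulled apart with the standard-library `ipaddress` module. The helper name and the sample address below are assumptions chosen for illustration.

```python
import ipaddress


def describe_cidr(cidr: str) -> dict:
    """Break an address/prefix string into its address, mask, and network."""
    interface = ipaddress.ip_interface(cidr)
    network = interface.network
    return {
        "address": str(interface.ip),
        "prefix_length": network.prefixlen,   # count of leading 1 bits
        "netmask": str(network.netmask),      # the same mask in dotted form
        "network": str(network),
        "addresses_in_block": network.num_addresses,
    }


print(describe_cidr("192.168.1.77/24"))
```

For `192.168.1.77/24`, the 24 leading 1 bits correspond to the mask 255.255.255.0, so the block is 192.168.1.0/24 and holds 256 addresses.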

46
Q

QOS and bandwidth control

A

QoS helps manage packet loss, delay and jitter on your network infrastructure. Since we’re working with a finite amount of bandwidth, our first order of business is to identify what applications would benefit from managing these three things. Once network and application administrators identify the applications that need to have priority over bandwidth on a network, the next step is to identify that traffic.

47
Q

Bastion host

A

A bastion host is a special-purpose computer on a network specifically designed and configured to withstand attacks. The computer generally hosts a single application, for example a proxy server, and all other services are removed or limited to reduce the threat to the computer.

48
Q

Difference between SQL and No SQL

A

SQL databases are table-based, whereas NoSQL databases can be document-based, key-value, or graph databases. SQL databases are vertically scalable, while NoSQL databases are horizontally scalable. SQL databases have a predefined schema, whereas NoSQL databases use a dynamic schema for unstructured data.

49
Q

hadoop - explain mapreduce

A

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. A MapReduce program is composed of a map procedure, which performs filtering and sorting, and a reduce method, which performs a summary operation.
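
The map and reduce steps can be sketched with the classic word-count example. This runs in a single process and only imitates the shape of the model; Hadoop would run the same phases in parallel across a cluster. All function names here are illustrative.

```python
from collections import defaultdict


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all values by key, as the framework does between the phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: summarize each key's values, here by summing the counts."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: list[str]) -> dict[str, int]:
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(mapped))


print(word_count(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```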

50
Q

difference between oltp vs olap

A

OLAP stands for On-Line Analytical Processing. It is used for analysis of database information from multiple database systems at one time, such as sales analysis and forecasting, market research, and budgeting. A data warehouse is an example of an OLAP system.

OLTP stands for On-Line Transactional Processing. It is used for maintaining online transactions and record integrity in multiple-access environments. An OLTP system manages a very large number of short online transactions, for example at an ATM.
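
The contrast shows up in the shape of the queries. The sketch below uses the standard-library `sqlite3` module and a hypothetical `sales` table purely for illustration; real OLTP and OLAP workloads usually run on separate, differently tuned systems.

```python
import sqlite3


def create_sales_table(connection: sqlite3.Connection) -> None:
    connection.execute(
        "CREATE TABLE sales ("
        "id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
    )


def record_sale(connection: sqlite3.Connection, region: str, amount: float) -> None:
    """OLTP-style: one short write transaction touching a single row."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "INSERT INTO sales (region, amount) VALUES (?, ?)", (region, amount)
        )


def revenue_by_region(connection: sqlite3.Connection) -> dict:
    """OLAP-style: a read-only aggregate scanning many rows."""
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


connection = sqlite3.connect(":memory:")
create_sales_table(connection)
for region, amount in [("east", 10.0), ("west", 5.0), ("east", 2.5)]:
    record_sale(connection, region, amount)
print(revenue_by_region(connection))  # {'east': 12.5, 'west': 5.0}
```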

51
Q

Data mart vs Data warehouse

A

A data mart is a subset of a data warehouse oriented to a specific business line. Data marts contain repositories of summarized data collected for analysis on a specific section or unit within an organization, for example, the sales department.

A data warehouse is a large centralized repository of data that contains information from many sources within an organization. The collated data is used to guide business decisions through analysis, reporting, and data mining tools.

52
Q

cluster and mirroring

A

Failover clusters provide high-availability support for an entire Microsoft SQL Server instance, in contrast to database mirroring, which provides high-availability support for a single database. Database mirroring works between failover clusters and, also, between a failover cluster and a nonclustered host.

53
Q

database caching server

A

In computing, a cache is a high-speed data storage layer which stores a subset of data, typically transient in nature, so that future requests for that data are served up faster than is possible by accessing the data's primary storage location. Caching allows you to efficiently reuse previously retrieved or computed data.
How does Caching work?
The data in a cache is generally stored in fast access hardware such as RAM (Random-access memory) and may also be used in correlation with a software component. A cache’s primary purpose is to increase data retrieval performance by reducing the need to access the underlying slower storage layer.

Trading off capacity for speed, a cache typically stores a subset of data transiently, in contrast to databases whose data is usually complete and durable.
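
A common way to put a cache in front of a database is the cache-aside pattern: check the cache first and fall back to the primary store only on a miss. The sketch below assumes an in-memory dict as the cache and a caller-supplied loader function; the class name is hypothetical, and eviction and expiry are left out.

```python
from typing import Callable


class CacheAside:
    """Serve repeated reads from memory; load from the store on a miss."""

    def __init__(self, load_from_store: Callable[[str], str]):
        self._load_from_store = load_from_store
        self._cache: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load_from_store(key)  # the slow path
        self._cache[key] = value
        return value
```

Only the first request for a key pays the cost of the slower storage layer; later requests for the same key are answered from memory.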

54
Q

db tuning

A

Good database design: Distribute the database workload across multiple disks to avoid or reduce disk overloading. Good design also includes proper sizing and organization of tables, indexes, and logs.
Disk I/O optimization: Disk I/O optimization is related directly to throughput and scalability. Access to even the fastest disk is orders of magnitude slower than memory access. Whenever possible, optimize the number of disk accesses. In general, selecting a larger block/buffer size for I/O reduces the number of disk accesses and might substantially increase throughput in a heavily loaded production environment.
Checkpointing: This mechanism periodically flushes all dirty cache data to disk, which increases the I/O activity and system resource usage for the duration of the checkpoint. Although frequent checkpointing can increase the consistency of on-disk data, it can also slow database performance. Most database systems have checkpointing capability, but not all database systems provide user-level controls. Oracle, for example, allows administrators to set the frequency of checkpoints, while users have no control over SQL Server 7.x checkpoints. For recommended settings, see the product documentation for the database you are using.
Disk and database overhead can sometimes be dramatically reduced by batching multiple operations together and/or increasing the number of operations that run in parallel (increasing concurrency). Examples:
Increasing the value of the Message bridge BatchSize or the Store-and-Forward WindowSize can improve performance as larger batch sizes produce fewer but larger I/Os.
Programmatically leveraging JDBC’s batch APIs.
Use the MDB transaction batching feature. See Tuning Message-Driven Beans.
Increasing concurrency by increasing max-beans-in-free-pool and thread pool size for MDBs (or decreasing it if batching can be leveraged).
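The batching idea can be illustrated outside JDBC as well. The sketch below uses Python's standard `sqlite3` module rather than JDBC (a substitution made for illustration), and the `events` table is a hypothetical example. It sends many inserts through one `executemany` call inside a single transaction instead of committing row by row.

```python
import sqlite3

# In-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(i, f"event-{i}") for i in range(1000)]

# One transaction and one batched call instead of 1000 separate commits.
with conn:
    conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)
```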

55
Q

Sharding

A

Sharding is the process of breaking up large tables into smaller chunks called shards that are spread across multiple servers. A shard is essentially a horizontal data partition that contains a subset of the total data set, and hence is responsible for serving a portion of the overall workload. The idea is to distribute data that can't fit on a single node onto a cluster of database nodes. Sharding is also referred to as horizontal partitioning. The distinction between horizontal and vertical comes from the traditional tabular view of a database. A database can be split vertically (storing different table columns in a separate database) or horizontally (storing rows of the same table in multiple database nodes).
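One common way to decide which shard a row belongs to is to hash a shard key. The sketch below assumes a hypothetical `customer_id` key and three hypothetical node names. It shows hash-based routing only, not how any particular database implements sharding.

```python
import hashlib

# Hypothetical database nodes; each holds a horizontal slice of the rows.
SHARDS = ["db-node-0", "db-node-1", "db-node-2"]


def shard_for(key: str) -> str:
    """Map a shard key to a node using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def partition(rows):
    """Group rows by the shard that owns their customer_id."""
    shards = {name: [] for name in SHARDS}
    for row in rows:
        shards[shard_for(row["customer_id"])].append(row)
    return shards


rows = [{"customer_id": f"c{i}", "total": i * 10} for i in range(12)]
placement = partition(rows)
for node, owned in placement.items():
    print(node, len(owned))
```

A stable hash (rather than Python's built-in `hash`, which is randomized per process for strings) keeps the same key on the same shard across restarts. Changing the number of shards would move most keys, which is why real systems often use consistent hashing or range-based schemes instead.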

56
Q

JSON vs. XML

A

JSON is Like XML Because
Both JSON and XML are “self describing” (human readable)
Both JSON and XML are hierarchical (values within values)
Both JSON and XML can be parsed and used by lots of programming languages
Both JSON and XML can be fetched with an XMLHttpRequest
JSON is Unlike XML Because
JSON doesn't use end tags
JSON is shorter
JSON is quicker to read and write
JSON can use arrays
The biggest difference is:

XML has to be parsed with an XML parser. JSON can be parsed by a standard JavaScript function.
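The same contrast shows up outside JavaScript. As a rough illustration in Python (the employee data is made up), JSON maps directly onto native lists and dictionaries, while XML is walked through a parser's element tree.

```python
import json
import xml.etree.ElementTree as ET

json_text = '{"employees": [{"name": "Ann"}, {"name": "Bob"}]}'
xml_text = (
    "<employees>"
    "<employee><name>Ann</name></employee>"
    "<employee><name>Bob</name></employee>"
    "</employees>"
)

# JSON: one call yields native dicts and lists (arrays come for free).
names_from_json = [e["name"] for e in json.loads(json_text)["employees"]]

# XML: parse into a tree, then navigate elements by tag name.
root = ET.fromstring(xml_text)
names_from_xml = [e.findtext("name") for e in root.findall("employee")]

print(names_from_json, names_from_xml)
```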

57
Q

How would you provide high availability for a database deployed on an instance in the cloud?

A

Deploy across multiple Availability Zones (Multi-AZ) and create a replica that can take over if the primary fails.

58
Q

Why is it hard to horizontally scale a SQL database?

A

Relational databases are designed to run on a single server in order to maintain the integrity of the table mappings and avoid the problems of distributed computing. With this design, if a system needs to scale, customers must buy bigger, more complex, and more expensive proprietary hardware with more processing power, memory, and storage. Upgrades are also a challenge, as the organization must go through a lengthy acquisition process, and then often take the system offline to actually make the change. This is all happening while the number of users continues to increase, causing more and more strain and increased risk on the under-provisioned resources.

59
Q

How can a database be scaled?

A

Use a "master-slave" model in which the "slaves" are additional servers that can handle parallel processing and replicated data, or data that is "sharded" (divided and distributed among multiple servers, or hosts) to ease the workload on the master server. Other options include shared storage, in-memory processing, better use of replicas, and distributed caching.

60
Q

What is database indexing and why is it important

A

Indexes are used to quickly locate data without having to search every row in a database table every time the table is accessed.
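The effect can be observed with SQLite's `EXPLAIN QUERY PLAN`. In this small sketch the `users` table and the index name `idx_users_email` are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT id FROM users WHERE email = ?"
args = ("user500@example.com",)

plan_before = query_plan(lookup, args)   # full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(lookup, args)    # index search

print(plan_before)
print(plan_after)
```

Before the index exists the plan reports a scan of the whole table. Afterwards it reports a search using the index.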

61
Q

What ports do databases use

A

SQL Server: 1433
Oracle: 1521
Aurora/MySQL/MariaDB: 3306

62
Q

horizontal Vs vertical scaling

A

Horizontal scaling (scaling out) helps you meet your computing requirements by adding more machines or servers to your resource pool, while vertical scaling (scaling up) helps you do that by adding more power or computing resources (CPU, RAM) to your existing machines.

63
Q

LAMP stack

A

A LAMP stack is a set of open-source software that can be used to create websites and web applications. LAMP is an acronym, and these stacks typically consist of the Linux operating system, the Apache HTTP Server, the MySQL relational database management system, and the PHP programming language.

64
Q

HA Architecture

A

High availability is a characteristic of a system which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period. Modernization has resulted in an increased reliance on these systems

65
Q

Lambda Architecture

A

Lambda architecture is a way of processing massive quantities of data (i.e. "Big Data") that provides access to batch-processing and stream-processing methods with a hybrid approach. Lambda architecture is used to solve the problem of computing arbitrary functions. The lambda architecture itself is composed of 3 layers:
Batch Layer

New data comes continuously, as a feed to the data system. It gets fed to the batch layer and the speed layer simultaneously. It looks at all the data at once and eventually corrects the data in the stream layer. Here we can find lots of ETL and a traditional data warehouse. This layer is built using a predefined schedule, usually once or twice a day. The batch layer has two very important functions:
To manage the master dataset
To pre-compute the batch views.

Serving Layer

The outputs from the batch layer in the form of batch views and those coming from the speed layer in the form of near real-time views get forwarded to the serving layer. This layer indexes the batch views so that they can be queried with low latency on an ad-hoc basis.

Speed Layer (Stream Layer)

This layer handles the data that are not already delivered in the batch view due to the latency of the batch layer. In addition, it only deals with recent data in order to provide a complete view of the data to the user by creating real-time views.
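The way the three layers fit together can be sketched with a toy page-view counter. All names and data here are hypothetical. Real systems would use a batch engine and a stream processor rather than in-memory counters.

```python
from collections import Counter

# Events already covered by the last scheduled batch run (master dataset).
master_dataset = ["page_a", "page_b", "page_a"]
# Events that arrived after that run and are only visible to the speed layer.
recent_events = ["page_a", "page_c"]


def batch_view(events):
    """Batch layer: recompute the view from the full master dataset."""
    return Counter(events)


def realtime_view(events):
    """Speed layer: cover only data the batch view has not seen yet."""
    return Counter(events)


def serve(batch, realtime, key):
    """Serving layer: answer a query by merging both views."""
    return batch[key] + realtime[key]


batch = batch_view(master_dataset)
realtime = realtime_view(recent_events)
print(serve(batch, realtime, "page_a"))
```

When the next batch run absorbs the recent events, the real-time view for them can be discarded, which is how the batch layer eventually corrects the speed layer.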

66
Q

what does kernel do?

A

The kernel is the central module of an operating system (OS). It is the part of the operating system that loads first, and it remains in main memory. Because it stays in memory, it is important for the kernel to be as small as possible while still providing all the essential services required by other parts of the operating system and applications. The kernel code is usually loaded into a protected area of memory to prevent it from being overwritten by programs or other parts of the operating system.

Typically, the kernel is responsible for memory management, process and task management, and disk management. The kernel connects the system hardware to the application software. Every operating system has a kernel.

67
Q

Difference between java & C

A

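C is a procedural, relatively low-level language that is compiled directly to machine code. Java is an object-oriented, high-level language that is compiled to bytecode, which the Java Virtual Machine (JVM) then interprets or just-in-time compiles. Java is organized around objects, while C is organized around functions. Java is often considered easier to learn and use because it's higher level, while C gives more direct control and can perform faster because it's closer to machine code.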

68
Q

OOP

A

Object-oriented programming (OOP) is a computer programming model that organizes software design around data, or objects, rather than functions and logic. An object can be defined as a data field that has unique attributes and behavior.

69
Q

continuous integration

A

Continuous Integration (CI) is a development practice where developers integrate code into a shared repository frequently, preferably several times a day. Each integration can then be verified by an automated build and automated tests. While automated testing is not strictly part of CI it is typically implied.

70
Q

Monolithic application

A

In software engineering, a monolithic application describes a single-tiered software application in which the user interface and data access code are combined into a single program from a single platform.

71
Q

loosely coupled application

A

In computer science, loose coupling (or loosely coupled) is a type of coupling that describes how multiple computer systems, even those using incompatible technologies, can be joined together for transactions, regardless of hardware, software and other functional components. Loosely coupled systems describe those that work on an exchange relationship where little input is needed from each of the additional systems. In a loosely coupled system hardware and software may interact, but they are not dependent on each other to work. Computers in a network are considered loosely coupled systems, as a client machine may request data from the server, but the two systems also work independently of each other.

72
Q

Command line interpreter vs compiler

A

A compiler transforms code written in a high-level programming language into machine code all at once, before the program runs, whereas an interpreter converts each high-level program statement, one by one, into machine code while the program runs. Compiled code generally runs faster, while interpreted code generally runs slower.

73
Q

BIG-O Notation

A

Big O notation is used in computer science to describe the performance or complexity of an algorithm. Big O gives an upper bound on growth and is most often used to describe the worst-case scenario. It can describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.
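A concrete way to see the difference between O(n) and O(log n) is to count comparisons in the worst case. The two helper functions below are illustrative only. They search a sorted list for its last element.

```python
def linear_search_steps(items, target):
    """O(n): check items one by one, counting comparisons."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(items, target):
    """O(log n): halve the sorted search range on every comparison."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return steps
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


data = list(range(1024))
linear_steps = linear_search_steps(data, 1023)
binary_steps = binary_search_steps(data, 1023)
print(linear_steps, binary_steps)
```

Doubling the list roughly doubles the linear count but adds only about one step to the binary count.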

74
Q

What would you do if a web application could not talk to the database?

A

Use a process of elimination. See if any other apps using the database are down, to establish whether it's a database server issue. If not, I'd move to the firewall and confirm that the database port (for example, 1521 for Oracle) is open on the database server. If that checked out, I would start on the webserver and probably begin by rebooting it, since it's down anyway. Start with the simple stuff and work through troubleshooting the webserver: is the webserver running, are services started, what do the error logs say, etc.
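The port check in that sequence can be scripted. This is a minimal sketch using Python's standard `socket` module. The function name `port_is_open` is made up for the example, and the host and port you pass in depend on your own environment.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Run from the web server, a call such as `port_is_open("db.example.internal", 1521)` (a hypothetical host name) shows whether the path to the database listener is blocked. A `False` result points at the firewall, routing, or the listener itself rather than the application.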

75
Q

How to manage web application state between EC2 instances running in an auto scaling group?

A

State Manager, which was launched as part of Amazon EC2 Systems Manager, helps you define and maintain consistent configuration of operating systems and applications. Using State Manager, you can control configuration details such as instance configurations, anti-virus definitions, firewall settings, and more. Based on a schedule that you define, State Manager automatically reviews your fleet and compares it against the specified configuration policy. If your configuration changes and does not match the wanted state, State Manager reapplies the policy to bring it back to the wanted state.

76
Q

What kind of Metrics would you monitor for the E-commerce website application front-end?

A

JavaScript Errors, Framework related issues, Performance issues, Network request failures.

77
Q

Explain how to scale from a single 5 user database to a 50 user, 500 user, 5000 user and then 5 million user database and explain how to overcome each hurdle.

A

https://www.youtube.com/watch?v=Ma3xWDXTxRg

78
Q

Which algorithm does an Elastic Load Balancer use?

A

By default, an Application Load Balancer uses a round-robin algorithm to distribute incoming requests to backend targets, and originally this was the only option. Requests are distributed among all the targets of a target group in round-robin fashion without consideration for capacity or utilization. This can lead to over-utilization or under-utilization of targets in target groups when requests have varied processing times or targets are frequently added or removed.
The alternative is the least outstanding requests algorithm: as a new request comes in, the load balancer sends it to the target with the fewest outstanding requests. Targets processing long-standing requests or having lower processing capabilities are not burdened with more requests, and the load is evenly spread across targets. This also helps new targets to effectively take load off overloaded targets.
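The two selection strategies can be compared with a small sketch. It illustrates the idea only, not how the load balancer is implemented. The target names and in-flight counts are made up.

```python
from itertools import cycle


def round_robin(targets):
    """Yield targets in a fixed rotation, ignoring how busy each one is."""
    return cycle(targets)


def least_outstanding(outstanding):
    """Pick the target with the fewest in-flight requests."""
    return min(outstanding, key=outstanding.get)


targets = ["a", "b", "c"]
rotation = round_robin(targets)
first_four = [next(rotation) for _ in range(4)]

# Hypothetical in-flight request counts per target.
in_flight = {"a": 5, "b": 1, "c": 3}
chosen = least_outstanding(in_flight)

print(first_four, chosen)
```

Round robin sends the fourth request back to target `a` even though it is the busiest, while least outstanding requests routes to `b`.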

79
Q

What's a three-tier web architecture, how do you monitor it to find a bottleneck, and how do you improve it?

A

A 3-tier application architecture is a modular client-server architecture that consists of a presentation tier, an application tier and a data tier. … The presentation tier communicates with the other tiers through application program interface (API) calls

80
Q

If you collected logs and wanted to store them for 14 days then move them over to permanent storage for a few years how would you do this

A

S3 Lifecycle management
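A lifecycle rule for this scenario might look like the sketch below, written as the dictionary that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the `logs/` prefix, the bucket name, the choice of the `GLACIER` storage class, and the three-year expiration are all assumptions made for illustration.

```python
# Lifecycle configuration: move log objects to archival storage after
# 14 days, then delete them after roughly three years (assumed retention).
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-logs-after-14-days",   # hypothetical rule name
            "Filter": {"Prefix": "logs/"},        # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [{"Days": 14, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 3 * 365},
        }
    ]
}

# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-log-bucket",              # hypothetical bucket
#       LifecycleConfiguration=lifecycle_configuration,
#   )
print(lifecycle_configuration["Rules"][0]["Transitions"])
```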

81
Q

What is a web application firewall and how do you use it, what layer is it on?

A

A web application firewall (WAF) filters, monitors, and blocks HTTP traffic to and from a web application. It operates at layer 7, the application layer. A WAF is differentiated from a regular firewall in that a WAF is able to filter the content of specific web applications, while regular firewalls serve as a safety gate between servers.

82
Q

What is TLS mutual authentication

A

Mutual TLS (mTLS) authentication ensures that traffic is both secure and trusted in both directions between a client and server. It allows requests that do not log in with an identity provider (like IoT devices) to demonstrate that they can reach a given resource. Client certificate authentication is also a second layer of security for team members who both log in with an identity provider (IdP) and present a valid client certificate.

With a root certificate authority (CA) in place, Access only allows requests from devices with a corresponding client certificate. When a request reaches the application, Access responds with a request for the client to present a certificate. If the device fails to present the certificate, the request is not allowed to proceed. If the client does have a certificate, Access completes a key exchange to verify.
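On the server side, requiring a client certificate comes down to one setting on the TLS context. The sketch below uses Python's standard `ssl` module. The certificate file names in the comments are hypothetical, and the lines that load them are left commented out so the example runs without any files.

```python
import ssl

# Server-side TLS context that refuses clients without a valid certificate.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.verify_mode = ssl.CERT_REQUIRED

# In a real deployment (hypothetical file names):
#   context.load_cert_chain("server.crt", "server.key")  # server identity
#   context.load_verify_locations("client-ca.crt")       # CA that signed client certs

print(context.verify_mode)
```

With `CERT_REQUIRED` set, the handshake fails unless the client presents a certificate that chains to the loaded CA, which is what makes the authentication mutual.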

83
Q

Difference between file storage, Block storage and object storage

A

File storage: Economical and easily structured, data are saved in files and folders. They are usually found on hard drives, which means that they appear exactly the same for the user and on the hard drive.

Block storage: Data are stored in blocks of uniform size. Although more expensive, complex, and less scalable, block storage is ideal for data that needs to be accessed and modified frequently.

Object storage: Data is stored as objects with unique metadata and identifiers. Although, in general, this type of storage is less expensive, object storage is only ideal for data that does not require modification.

84
Q

What is HTTP socket

A

WebSocket is a bidirectional communication protocol that can send the data from the client to the server or from the server to the client by reusing the established connection channel. The connection is kept alive until terminated by either the client or the server.

85
Q

Border Gateway Protocol

A

BGP is the postal service of the internet. When someone submits data across the internet, BGP is responsible for looking at all of the available paths that data could travel and picking the best route.

86
Q

Application programming interface (API)

A

APIs let your product or service communicate with other products and services without having to know how they're implemented. This can simplify app development, saving time and money. When you're designing new tools and products, or managing existing ones, APIs give you flexibility; simplify design, administration, and use; and provide opportunities for innovation.

87
Q

VLAN

A

VLANs (Virtual LANs) are logical groupings of devices in the same broadcast domain. VLANs are usually configured on switches by placing some interfaces into one broadcast domain and some interfaces into another. Each VLAN acts as a subgroup of the switch ports in an Ethernet LAN. A VLAN acts like a physical LAN, but it allows hosts to be grouped together in the same broadcast domain even if they are not connected to the same switch.

88
Q

Hash Function

A

A function that converts a given large key (for example, a big phone number) to a small practical integer value. The mapped integer value is used as an index in the hash table. In simple terms, a hash function maps a big number or string to a small integer that can be used as the index in the hash table.
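A toy version of such a function is sketched below. The table size of 11 and the polynomial string hash with multiplier 31 are arbitrary choices for illustration, not a recommendation for production use.

```python
TABLE_SIZE = 11  # small prime chosen only for illustration


def simple_hash(key):
    """Map an integer or string key to a slot index in the table."""
    if isinstance(key, int):
        return key % TABLE_SIZE
    total = 0
    for ch in str(key):
        # Polynomial rolling hash, reduced at each step to stay small.
        total = (total * 31 + ord(ch)) % TABLE_SIZE
    return total


table = [[] for _ in range(TABLE_SIZE)]  # each slot holds a list (chaining)
for phone in (5551234567, 5559876543):
    table[simple_hash(phone)].append(phone)

print(simple_hash(5551234567), simple_hash("alice"))
```

Different keys can land in the same slot, so each slot holds a list. Handling those collisions is part of any real hash table.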

89
Q

IP Address

A

An Internet Protocol address (IP address) is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions: host or network interface identification and location addressing.

90
Q

Name 5 File Systems

A

Linux supports numerous file systems, but common choices for the system disk on a block device include the ext* family (ext2, ext3 and ext4), XFS, JFS, and btrfs. For raw flash without a flash translation layer (FTL) or Memory Technology Device (MTD), there are UBIFS, JFFS2 and YAFFS, among others. SquashFS is a common compressed read-only file system.

Windows makes use of the FAT, NTFS, exFAT, Live File System and ReFS file systems (the last of these is only supported and usable in Windows Server 2012, Windows Server 2016, Windows 8, Windows 8.1, and Windows 10; Windows cannot boot from it).

91
Q

When do you use functions or subroutines?

A

They operate similarly, but they have one key difference. A function is used when a value is returned to the calling routine, while a subroutine is used when a task is needed but no value is returned.

92
Q

what is concurrency?

A

Concurrency is when two or more tasks can start, run, and complete in overlapping time periods within an OS.

93
Q

what is parallelism

A

Parallelism is when tasks literally run at the same time, for example on a multicore processor within an OS.

94
Q

what are deadlocks?

A

Deadlocks occur when a set of processes is blocked because each process is holding a resource and waiting for another resource held by another process.
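The classic case is two workers that each need the same two locks but request them in opposite order. The sketch below uses threads rather than processes. It avoids the deadlock by always acquiring locks in one agreed order, which is one common prevention technique among several. Sorting by `id` is just a convenient way to get a consistent order here.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
completed = []


def worker(name, first, second):
    """Acquire both locks in a globally consistent order, then do the work."""
    # Without this ordering, t1 could hold lock_a while t2 holds lock_b,
    # and each would wait forever for the other's lock.
    ordered = sorted((first, second), key=id)
    with ordered[0]:
        with ordered[1]:
            completed.append(name)


t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()

print(sorted(completed))
```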

95
Q

what is ETL?

A

ETL is a type of data integration that refers to 3 steps (extract, transform, load) used to blend data from multiple sources. It is often used to build a data warehouse. During this process, data is taken (extracted) from a source system, converted (transformed) into a format that can be analyzed, and stored (loaded) into a data warehouse.
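The three steps can be sketched end to end with the standard library. The CSV text, the `sales` table, and the cleaning rules are all made up for the example, with an in-memory SQLite database standing in for the warehouse.

```python
import csv
import io
import sqlite3

raw_csv = "name,amount\nann, 10\nbob, 20\n"  # hypothetical source export


def extract(text):
    """Extract: read raw rows from the source system."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names and convert amounts to integers."""
    return [(r["name"].strip().title(), int(r["amount"])) for r in rows]


def load(conn, rows):
    """Load: store the cleaned rows in the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount INTEGER)")
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(raw_csv)))
total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```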

96
Q

How does DHCP work to acquire new dynamic IP addresses?

A

DHCP operations fall into four phases: server discovery, IP lease offer, IP lease request, and IP lease acknowledgement. These stages are often abbreviated as DORA for discovery, offer, request, and acknowledgement. The DHCP operation begins with clients broadcasting a request.

97
Q

What is the role of TTL in DNS?

A

DNS TTL (time to live) is a setting that tells the DNS resolver how long to cache a query result before requesting a new one. The information gathered is stored in the cache of the recursive or local resolver for the duration of the TTL before the resolver reaches back out to collect new, updated details. For any critical records that may need to change quickly, you should keep the TTL low.

The TTL field in an IP packet header serves a different purpose: it avoids a situation in which an undeliverable datagram keeps circulating on an Internet system, and such a system eventually becoming swamped by such "immortals".

For critical DNS records, a good TTL range would be anywhere from 30 seconds to 5 minutes.
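How a resolver honors a DNS TTL can be sketched with a tiny cache. `TtlCache` is a hypothetical class, and the clock is passed in so that expiry can be demonstrated without actually waiting.

```python
class TtlCache:
    """Minimal resolver-style cache: entries expire after their TTL."""

    def __init__(self, clock):
        self._clock = clock      # callable returning the current time in seconds
        self._entries = {}       # name -> (value, expiry time)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None or self._clock() >= entry[1]:
            # Missing or expired: the resolver would query upstream again.
            self._entries.pop(name, None)
            return None
        return entry[0]


now = [0.0]  # fake clock, advanced manually for the demonstration
cache = TtlCache(clock=lambda: now[0])
cache.put("example.com", "192.0.2.10", ttl=60)

fresh = cache.get("example.com")    # still within the TTL
now[0] = 61.0
expired = cache.get("example.com")  # TTL elapsed, entry is gone
print(fresh, expired)
```

A short TTL means changes propagate quickly at the cost of more upstream queries. A long TTL does the opposite.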

98
Q

What is a VPN?

A

A virtual private network (VPN) extends a private network across a public network and enables users to send and receive data across shared or public networks as if their computing devices were directly connected to the private network.

99
Q

What is a proxy?

A

A proxy server is a server application or appliance that acts as an intermediary for requests from clients seeking resources from servers that provide those resources.

100
Q

Dedupe vs WAN optimization

A

Deduplication: Eliminates the transfer of redundant data across the WAN by sending references instead of the actual data. By working at the byte level, benefits are achieved across IP applications.
Compression: Relies on data patterns that can be represented more efficiently.

By eliminating the transfer of repetitive IP traffic, deduplication significantly improves WAN utilization and accelerates data transfers between geographically dispersed locations. This saves bandwidth costs and helps to overcome many obstacles when communicating across a WAN. Because WAN deduplication works on all IP traffic, it plays a key role in a variety of IT initiatives, including server centralization, virtualization, and application delivery. In addition, it is essential to improving the performance and reliability of data replication, backup, and recovery across the WAN. In this respect, WAN deduplication is actually a nice complement to storage deduplication, resulting in even higher cost savings and better Recovery Point and Time Objectives (RPO/RTOs) across the enterprise.

101
Q

What is hyperconverged infrastructure (HCI) ?

A

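Hyperconverged infrastructure combines compute, networking, and storage into one system, with the hypervisor and the storage controller running together.

In effect, it makes the entire rack behave like a hypervisor, with storage presented as virtual SAN, NAS, unified storage, etc.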

102
Q

Difference between LAN, WLAN and WAN

A

A WLAN is a wireless LAN.

103
Q

Difference between LAN, WLAN and WAN

A

A WLAN is a wireless LAN.
A LAN (Local Area Network) is a network that covers a small geographical area such as a home, an office, or a group of buildings, whereas a WAN (Wide Area Network) is a network that covers larger geographical areas and can span the globe.

104
Q

Difference between QOS and bandwidth control

A

Bandwidth control simply limits the amount of WAN bandwidth your router allows to be used. QoS actually prioritizes the packets based on the type of data they're carrying.