Exam Questions Flashcards by Pedro Dusso

Define Grid Computing

A Grid coordinates resources that are not subject to centralized control using standard, open, general purpose protocols and interfaces to deliver nontrivial qualities of service.

Explain the three-point checklist

1) Coordinate resources that are not subject to centralized control
- integrates and coordinates resources and users that live in different control domains
- for example, different administrative units of one company, or different companies
- addresses: security, policy, payment, membership
- SPPM

2) Using standard, open, general-purpose protocols and interfaces
- a grid is built from multi-purpose protocols and interfaces
- important because otherwise we are dealing with an application-specific system
- addresses: authentication, authorization, discovery, access
- DAAAc

3) To deliver nontrivial qualities of service
- resources can be used in a coordinated fashion
- offers various qualities of service (SLA)
- the utility of the combined system is greater than the sum of its parts
- addresses: response time, throughput, availability, security

Can we apply the three-point checklist to Clouds?

No, not all points. A cloud normally manages its resources in a centralized fashion: for example, a cloud provider hosts all of its cloud services inside a data center that is entirely under its control. This differs from a Grid, where resource management is distributed.

The second point does not apply to Clouds either, because Cloud Computing still has the data lock-in problem: cloud APIs are still largely proprietary, so it is hard to extract data and programs from one site and run them on another.

The only point shared by Grid and Cloud Computing is the third one. Both definitions require a certain level of quality of service or the negotiation of service level agreements (SLAs); in addition, a Cloud should be able to adjust to a required level of QoS.

Can you give an example of a grid used in science? And in business?

Grids are important for science nowadays because of the huge amount of data that experiments produce. Some examples are the LHC at CERN, monitoring of industrial equipment (airplane flights), and high-throughput sensor networks.

In business, the IT industry is evolving toward a mass-adoption stage in which people begin to adopt a kind of post-technology perspective. Two grid examples:

GDI-Grid: efficient mining and processing of spatial data for noise-dispersion simulation, flood simulation, and disaster management.
EGEE: provides large computational and storage resources.

Give an overview of each layer: FABRIC

FABRIC: interfaces for local control. Provides the resources to which shared access is mediated by grid protocols. Implements the local, resource-specific operations that occur on specific resources, whether physical or logical.

Richer fabric functionality enables more sophisticated sharing operations. However, if we place few demands on Fabric elements, then deployment of Grid infrastructure is simplified.

- Introspection/enquiry mechanisms: permit discovery of a resource's structure, state, and capabilities (for example, whether it supports advance reservation).
- Resource management mechanisms: provide some control over the delivered QoS.

Resources:

- Computational: start programs, monitor and control their execution. Enquire about load, queue state, and software characteristics.
- Storage: put/get files. High-performance transfers. Control over the resources allocated to data transfers: disk space, bandwidth, network, …
- Network: control over the resources allocated to network transfers (prioritization/reservation).

Give an overview of each layer: CONNECTIVITY

CONNECTIVITY: talking to things. Defines core communication and authentication protocols.

Communication protocols enable the exchange of data between Fabric layer resources (~ TCP/IP).
Authentication protocols build on communication services to provide cryptographically secure mechanisms for verifying the identity of users and resources. They should provide:
- SSO: users log on (authenticate) just once and then have access to multiple resources without further intervention
- Delegation: a user must be able to endow a program with the ability to run on that user’s behalf, so that the program can access the resources the user is authorized to use (possibly with restricted rights; delegation can be chained)
- Integration with various local security solutions
- User-based trust relationships

PKI/GSI

Give an overview of each layer: RESOURCE

RESOURCE: sharing single resources. Concerned entirely with individual resources; ignores issues of global state. Defines protocols for the secure negotiation, initiation, monitoring, control, accounting, and payment of sharing operations on individual resources.

Information protocols: used to obtain information about the structure and state of a resource - config, load, usage, policy, cost, …
Management protocols: used to negotiate access to a shared resource. They specify resource requirements (advance reservation and QoS) and the operations to be performed (process creation or data access).

Since they are responsible for instantiating sharing relationships, management protocols must serve as a “policy application point”: they ensure that the requested protocol operations are consistent with the policy under which the resource is to be shared.

Give an overview of each layer: COLLECTIVE

COLLECTIVE: coordinating multiple resources; not associated with any one specific resource. Services span from general purpose to highly application- or domain-specific. Captures interactions across collections of resources.

Sharing behaviors: directory services; co-allocation, scheduling and brokering services; monitoring and diagnostics; data replication.

Addresses security, policy and accounting issues: community authorization services (CAS, which enforce community policies) and community accounting and payment.

Open Grid Services Architecture.

Give an overview of each layer: APPLICATION

APPLICATION: executes grid applications. Grid service compositions (workflows) combine grid services into new grid applications. User applications are constructed by utilizing the services defined at each lower layer. Each of the lower layers must provide APIs and SDKs so that higher layers can integrate with it.

Define Cloud Computing

Clouds are a large pool of VIRTUALIZED RESOURCES. These resources (hardware, development platforms, services, …) can be DYNAMICALLY RECONFIGURED to adjust to a variable load. The pool is exploited through a PAY-PER-USE MODEL, and guarantees are offered by means of CUSTOMIZED SLAs.

What are the enablers for Cloud Computing?

Cloud computing has two main enablers. The first is technological: secure and efficient HARDWARE VIRTUALIZATION technology. The second is economic: economies of scale in infrastructure, and lightweight, contract-less, inexpensive computing.

What is the main technology behind the cloud?

The HYPERVISOR: it allows multiple guest operating systems to run concurrently on a host computer. It presents a virtual operating platform to each guest OS and monitors the execution of that guest OS. Multiple instances of a variety of OSs may share the virtualized hardware resources.

Which techniques of virtualization exist?

Four main techniques: hardware-assisted virtualization, full virtualization, paravirtualization, and operating-system-assisted virtualization.

Techniques of virtualization: hardware-assisted virtualization

Any critical operation can be detected on the fly, because the hardware allows the automatic trapping of all critical operations. Therefore, no scanning is needed a priori.

Techniques of virtualization: full virtualization

All executable code has to be scanned for critical operations. Each of them is replaced by a trap, and the original operation is recorded in a special directory so that it can be retrieved when the trap is executed. The hypervisor then has to emulate the behavior expected by the virtual machine.
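
The scan, replace, and emulate steps described above can be pictured with a toy model. This is only a sketch and not how a real hypervisor is written: the instruction strings, the CRITICAL set, and the function names are all made up for illustration.

```python
# Toy model of the scan-and-trap step; not a real hypervisor.
# The instruction strings and the CRITICAL set are hypothetical.
CRITICAL = {"HLT", "OUT", "LGDT"}


def scan_and_patch(code):
    """Replace each critical instruction with a trap and remember the original."""
    patched = []
    trap_directory = {}  # trap id -> original instruction
    for instruction in code:
        opcode = instruction.split()[0]
        if opcode in CRITICAL:
            trap_id = len(trap_directory)
            trap_directory[trap_id] = instruction
            patched.append(f"TRAP {trap_id}")
        else:
            patched.append(instruction)
    return patched, trap_directory


def run(patched, trap_directory):
    """Run patched code; each trap is handed to the 'hypervisor' for emulation."""
    log = []
    for instruction in patched:
        if instruction.startswith("TRAP "):
            original = trap_directory[int(instruction.split()[1])]
            log.append(f"emulated {original}")
        else:
            log.append(f"executed {instruction}")
    return log


guest_code = ["MOV a, 1", "OUT 0x3f8, a", "ADD a, 2", "HLT"]
patched, directory = scan_and_patch(guest_code)
log = run(patched, directory)
```

The up-front pass over all of the code is the cost that hardware-assisted virtualization avoids, since there the hardware traps critical operations by itself.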

Techniques of virtualization: paravirtualization

The operating system (OS) is modified so that it is aware of running in a virtual machine. Critical operations therefore need not be detected; they have already been rewritten so that the real system can be used.

Techniques of virtualization: operating system assisted virtualization

The best solution with respect to performance. The OS kernel allows multiple isolated user-space instances (instead of just one). Programs run in a virtual partition and use the normal system call interface of the OS, so standard system calls can be used at any time.

What is MapReduce? Give an example

MapReduce is a framework that simplifies the development of parallel programs that apply operations to large data sets. Essentially, MapReduce parallelizes the map operations and brings their results back together in the reduce operation.

MAP: takes as input a function and a sequence of values, and applies the function to each value in the sequence. Produces a set of intermediate key/value pairs.

REDUCE: combines all the elements of a sequence using a binary operation. The framework groups all intermediate values with the same intermediate key and passes them to Reduce(). For each key K, one output value is produced.

WORDCOUNT
MAP(words):
    string[] tokens = words.split(" ")
    foreach string key in tokens
        emit(key, 1)

REDUCE(key, values):
    sum = 0
    while (values.hasNext())
        sum += values.next()
    emit(key, sum)
How can we secure the cloud at the data level and at the application level?

Data level: classified/sensitive data is held by third parties, and we have no choice but to trust them to keep it safe and private. Unauthorized users must not access our data. ACCESS CONTROL defines and enforces access conditions. USAGE CONTROL generalizes access control and enforces what must and what must not happen after the data has been distributed (“delete after read”, etc.).

Application level: encryption is needed. We must encrypt the access to the cloud resource control interface, the administrative access to OS instances, the access to applications and the application data not currently in use.
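
The difference between access control and usage control at the data level can be sketched in a few lines. The class below is a hypothetical illustration of a "delete after read" obligation, not a real cloud API. In practice the enforcement would have to run on the side that holds the data.

```python
class UsageControlledDocument:
    """Data guarded by an access list plus a 'delete after read' obligation."""

    def __init__(self, content, allowed_users, max_reads=1):
        self._content = content
        self._allowed = set(allowed_users)
        self._reads_left = max_reads

    def read(self, user):
        # Access control: who may read at all.
        if user not in self._allowed:
            raise PermissionError(f"{user} is not authorized")
        # Usage control: what must happen after the data has been used.
        if self._reads_left <= 0:
            raise LookupError("content was deleted after its last permitted read")
        self._reads_left -= 1
        content = self._content
        if self._reads_left == 0:
            self._content = None  # obligation: delete after read
        return content


doc = UsageControlledDocument("quarterly figures", {"alice"})
first = doc.read("alice")  # allowed once; a second read raises LookupError
```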

Grid Security Infrastructure: what is that?

Supports security across organizational boundaries. Supports SSO and delegation when computations involve multiple resources and/or sites. Provides secure communication between the elements of a Grid. Uses PKI (CAs and certificates) for credentials and SSL/TLS for authentication and message protection. Provides two levels of security:

Transport level (TLS/SSL): protects each hop only. When the end points are not in direct communication, the message is unprotected inside an intermediary service.
Message level (WS-Security): end-to-end security; the message is always protected.

What are the types of Single Sign On?

SSO can be simple or complex.

Simple: SSO with a single Authentication Authority and a single or multiple Authentication Server(s).

Complex:

Client side caching of credentials.
Server-based: Credentials returned from primary authentication authority’s database.
Credential set (Service-based):
- Token - temporary token. Token from home domain can be used in partner domain.
- PKI - Public Key Credentials (Certificate and Private Key)

Resource Management: briefly explain OGSA.

Open Grid Services Architecture: a common, standard, open architecture for building Grids. OGSA requires stateful Web Services, so WSRF is the foundation upon which OGSA services are built. OGSA services are standard interfaces, behaviors and schema definitions that help in developing reusable components and interoperable systems.

Divided into: Core services, Data services, Resource Management Services.

Grid services have state, but Web Services don’t. Solution?

WS-Resource = Web Services + Stateful Resource.

Grids need access to stateful resources. WSRF contains 5 specifications to model and manage WS-Resources (WS-R). It is concerned primarily with their creation, addressing, inspection and lifetime management.

WSRF defines what a WS-Resource is and how to make its properties accessible through a Web Service interface.
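
The "stateless service plus stateful resource" pattern can be illustrated without any Web Services stack. The sketch below is an assumption-laden analogy in plain Python, not the WSRF API: CounterService, its methods, and the string keys (which stand in for endpoint references) are all hypothetical.

```python
import itertools
import time


class CounterService:
    """Stateless front end; all state lives in resources addressed by key."""

    def __init__(self):
        self._resources = {}  # resource key -> properties dict
        self._ids = itertools.count(1)

    def create(self, lifetime_s):
        """Creation: make a resource and return its key (its 'address')."""
        key = f"counter-{next(self._ids)}"
        self._resources[key] = {
            "value": 0,
            "expires": time.monotonic() + lifetime_s,
        }
        return key

    def _lookup(self, key):
        resource = self._resources.get(key)
        if resource is None or resource["expires"] <= time.monotonic():
            self._resources.pop(key, None)  # lifetime management: expired
            raise KeyError(f"unknown or expired resource {key}")
        return resource

    def add(self, key, amount):
        self._lookup(key)["value"] += amount

    def get_property(self, key, name):
        """Inspection: read one property of the addressed resource."""
        return self._lookup(key)[name]

    def destroy(self, key):
        """Lifetime management: explicit destruction."""
        self._resources.pop(key, None)


service = CounterService()
a = service.create(lifetime_s=60)
b = service.create(lifetime_s=60)
service.add(a, 5)
service.add(a, 2)
service.add(b, 1)
```

Every call names the resource it operates on, so the service operations themselves keep no per-client state. That covers the creation, addressing, inspection, and lifetime concerns listed above in miniature.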

What are the specifications of WSRF?

WS-ResourceProperties
WS-ResourceLifetime
WS-ServiceGroup
WS-BaseFaults (the "Fb", FaultBase, in the mnemonic below)
WS-RenewableReferences

Plus
WS-Notification
WS-Addressing

PLSgFbRefNA

Resource Management: briefly explain WS-GRAM.

WS-GRAM is one possible implementation of a Grid Resource Management Service (the Globus implementation). It exposes Web Services interfaces consistent with WSRF. It can run, terminate, and monitor jobs remotely, but it is NOT a job scheduler.

SLAs: what are they and what are they good for?

SLAs are agreements negotiated through the "submit, acquire, bind" operations. Each is an explicit contract between provider and consumer, with one SLA per resource. The policy and security details in an SLA do not have to be public. SLAs describe the minimum service qualities in terms of query response time, throughput, and availability (mean time between failures; mean time to recovery). They are usually defined as percentiles: 99.99% of all requests are answered within 300 ms.
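
A percentile-style SLA such as the 300 ms example can be checked mechanically against measured latencies. The sketch below uses made-up sample data and hypothetical function names. The availability formula MTBF / (MTBF + MTTR) is the common steady-state approximation, used here as an assumption rather than something the SLA itself prescribes.

```python
def fraction_within(latencies_ms, limit_ms):
    """Share of requests answered within the limit."""
    return sum(1 for t in latencies_ms if t <= limit_ms) / len(latencies_ms)


def meets_sla(latencies_ms, limit_ms=300, required_fraction=0.9999):
    """True if at least the required share of requests met the limit."""
    return fraction_within(latencies_ms, limit_ms) >= required_fraction


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from MTBF and MTTR."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical measurements: 9,999 fast requests and one slow one.
sample = [120] * 9999 + [450]
ok = meets_sla(sample)                # exactly 99.99% are within 300 ms
worse = meets_sla(sample + [450])     # one more slow request breaks the SLA
uptime = availability(mtbf_hours=999, mttr_hours=1)
```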

What is Globus Toolkit? Are there any other approaches?

Open source grid middleware for building computing grids. Other approaches include UNICORE and Venice.

Compare the different virtualization approaches:

OS-assisted virtualization is the best solution with respect to performance. Paravirtualization comes second, because the guest OS is aware of the virtualization and can directly access the system calls of the host OS. Full virtualization is the least efficient technique, because scanning for and replacing critical operations is quite time consuming. Hardware-assisted virtualization lies somewhere in the middle between paravirtualization and full virtualization, and is far more efficient than full virtualization without hardware assistance.

Cloud Computing security threats:

Privacy: your (or your customers') data is stored entirely at the provider. Dependency on the cloud provider: no physical control over the hardware; a foreign government can decide to shut down the cloud or force the provider to disclose your data. Customers are bound to their providers (vendor lock-in), because providers use custom solutions and proprietary APIs, i.e. non-standard services.

Security in Cloud Computing?

Data Security: different kinds of data (personal, business, administrative) require different security mechanisms.
- Access control: defines and enforces access conditions.
- Usage control: generalizes access control, enforcing what must and what must not happen to the data in the future.
- Use multitenancy (one code base, multiple clients).

Application Security: encryption is needed in the Cloud. We must encrypt:
- access to the cloud resource interface
- administrative access to OS instances
- access to applications
- application data not in use

What are the models for parallel architectures?

SISD, SIMD, (MISD) and MIMD. A parallel architecture can be:

Multiprocessors (shared memory): global address space. Implicit communication via shared data (changes to a memory location made by one processor are visible to all processors). UMA and NUMA variants. The programmer is responsible for ensuring correct access to global memory. Does not scale well.

Multicomputers (distributed memory): each processor has its own local memory. Memory addresses in one processor do not map to another processor, and there is no global address space across all processors; they operate independently. To retrieve information from another processor's memory, a message must be sent over the network. The programmer has to define explicitly how data is distributed and is responsible for many details of communication, data distribution, and synchronization. Scalable model.

Cluster: hybrid distributed-shared memory model. A type of parallel distributed system that consists of a collection of interconnected whole computers, used as a single unified machine.
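
The two programming styles can be contrasted with a small sketch. Threads inside one Python process only imitate the models (there is no real distributed memory here), and the function names are illustrative. The first version shares one variable and must guard it. The second keeps data local and communicates only through explicit messages.

```python
import queue
import threading

DATA = list(range(1, 101))
CHUNKS = [DATA[i::4] for i in range(4)]  # explicit data distribution


def shared_memory_sum():
    """Multiprocessor style: workers update one shared variable under a lock."""
    total = {"value": 0}
    lock = threading.Lock()

    def worker(chunk):
        for x in chunk:
            with lock:  # the programmer must guard the shared location
                total["value"] += x

    threads = [threading.Thread(target=worker, args=(c,)) for c in CHUNKS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum():
    """Multicomputer style: each worker owns its data and sends one message."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # local computation, then explicit communication

    threads = [threading.Thread(target=worker, args=(c,)) for c in CHUNKS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in CHUNKS)
```

Both functions return 5050 for the numbers 1 to 100. The difference lies in who is responsible for correctness: the lock in the first, the explicit partitioning and messages in the second.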

How can we classify the networks and what are the differences?

Four major domains:
- OCN (On-Chip Network): interconnects micro-architectural units: register files, caches, compute tiles, processors. Distance: centimeters.
- SAN (System/Storage Area Network): interconnects multiprocessor and multicomputer systems (interprocessor and processor-memory), as well as storage and I/O components. Present in server and data-center environments. Distance: a few hundred meters (Myrinet).
- LAN (Local Area Network): interconnects autonomous computers. Hundreds of devices (thousands with bridging). Distance: a few kilometers.
- WAN (Wide Area Network): interconnects systems distributed across the globe. Millions of devices.

What is a Virtual Organization?

A set of individuals and/or institutions defined by highly controlled sharing rules that clearly define what is shared, who is allowed to share, and the conditions under which sharing occurs.

What is Public Key Infrastructure?

Security infrastructure system. Provides symmetric and asymmetric cryptosystems for secure messaging, and digital signatures for data integrity.

What is a digital certificate?

A digital certificate is a digital document certifying that a certain public key (PK) is owned by a particular user. The PKI manages digital certificates. A certificate authority (CA) is an entity that issues certificates and is trusted by both parties.
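
The relationship between a CA signature and a certificate can be shown with textbook RSA. This is a deliberately insecure toy: the key is tiny by modern standards, there is no padding, and the certificate is a plain dictionary rather than X.509. All names are hypothetical. It requires Python 3.8+ for the modular inverse via pow.

```python
import hashlib

# Toy CA key pair built from two known Mersenne primes. Far too small and
# unpadded for real use; it only illustrates the signing relationship.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                            # CA public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # CA private exponent


def digest(subject, public_key):
    """Hash the (subject, key) binding down to an integer below N."""
    data = f"{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def issue_certificate(subject, public_key):
    """CA side: bind a subject to a public key by signing the pair."""
    signature = pow(digest(subject, public_key), D, N)
    return {"subject": subject, "public_key": public_key, "signature": signature}


def verify_certificate(cert):
    """Relying party: check the CA signature using the CA public key (E, N)."""
    expected = digest(cert["subject"], cert["public_key"])
    return pow(cert["signature"], E, N) == expected


cert = issue_certificate("alice", "alice-public-key-bytes")
forged = dict(cert, subject="mallory")  # same signature, different owner
```

Anyone who trusts the CA's public key (E, N) can run verify_certificate. The genuine certificate verifies, while the forged one does not, because the signature no longer matches the claimed subject.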

Exam Questions Flashcards

(35 cards)