9 - 11 Flashcards

1
Q

AGGREGATE FUNCTIONS FOR DATA OUTPUT

A

Selecting Literals, Functions, and Calculated Columns

The first of these is a number of built-in functions: COUNT, SUM, MAX, MIN, and AVG.

The COUNT function returns the number of tuples or values as specified in a query.
The functions SUM, MAX, MIN, and AVG are applied to a set or multiset of numeric values and return, respectively, the sum, maximum value, minimum value, and average (mean) of those values.
These functions can be used in the SELECT-clause or in a HAVING-clause
The functions MAX and MIN can also be used with attributes that have nonnumeric domains if the domain values have a total ordering among one another

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

OPERATORS FOR DATA MANIPULATION

A

Procedural Operator

In the procedural approach, we have to state which operations are to be performed on data, and the order in which they are to be performed – we describe how the result is to be computed.

Declarative operators

In the declarative approach, we do not specify explicitly how the result should be computed. Instead, we describe what the result looks like. We do this by stating the conditions that the result should satisfy.

In other words:
procedural system, we tell the system what to do (e.g. arithmetic operations, ordering, substring comparison)
in a declarative system, we tell it what result we want (e.g. LIKE, DISTINCT, IN, EXISTS etc).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

WINDOW FUNCTIONS

A

Window functions perform calculations across a set of table rows related to the current row, providing more granular insights compared to aggregate functions.

Common Functions:

ROW_NUMBER(): Assigns a unique number to each row in a partition.
RANK(): Assigns a rank to each row within a partition, with gaps for tied ranks.
DENSE_RANK(): Similar to RANK() but without gaps for ties.
NTILE(n): Divides rows into n equally sized groups.
LAG()/LEAD(): Access data from preceding or following rows.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

RECURSIVE QUERIES

A

Recursive Common Table Expressions (CTEs) allow queries to call themselves, enabling hierarchical or iterative data analysis.

Structure:
Anchor Clause: Defines the base result set.
Recursive Clause: Refers to the CTE itself to build upon the base results.

Applications:
Analysing organizational hierarchies.
Traversing parent-child relationships in a tree structure.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

PERFORMANCE

A

Applications can be made to run significantly faster by performance tuning, which consists of finding and eliminating bottlenecks and adding appropriate hardware such as memory or disks. There are many things an application developer can do to tune the application, and there are things that a database-system administrator can do to speed up processing for an application.

Benchmarks are standardized sets of tasks that help to characterize the performance of database systems. They are useful to get a rough idea of the hardware and software requirements of an application, even before the application is built.

Tuning the performance of a system involves adjusting various parameters and design choices to improve its performance for a specific application. Various aspects of a database-system design — ranging from high-level aspects such as the schema and transaction design to database parameters such as buffer sizes, down to hardware issues such as number of disks — affect the performance of an application. Each of these aspects can be adjusted so that performance is improved.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

WHY TUNING?

A

Applications sometimes exhibit poor performance, with queries taking a long time to complete, leading to users being unable to carry out tasks that they need to do. We describe a few real-world examples that we have seen, including their causes and how tuning fixed the problems.

Users can experiencing long delays and time-outs in the web applications.
The CPU usage can be very high, with negligible disk and network usage.
Queries running on the database can poorly perform
a simple lookup query on a large relation is using a full relation scan, which is quite expensive.
an unnecessarily complicated query, with several nested subqueries, and the optimizer produced a bad plan for the query.
while the database performs fine under light load during testing, it might completely stop working when subjected to heavy load when it is used by actual users.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

BOTTLENECKS

A

For simple programs, the time spent in each region of the code determines the overall execution time. However, database systems are much more complex, and query execution involves not only CPU time, but also disk I/O and network communication. A first step in diagnosing problems to use monitoring tools provided by operating systems to find the usage level of the CPU, disks, and network links.

It is also important to monitor the database itself, to find out what is happening in the database system. For example, most databases provide ways to find out which queries (or query templates, where the same query is executed repeatedly with different constants) are taking up the maximum resources, such as CPU, diskI/O, or network capacity.

In addition to hardware resource bottlenecks, poor performance in a database system may potentially be due to contention on locks, where transactions wait in lock queues for a long time.

Monitoring tools can help detect where the bottleneck lies (such as CPU, I/O, or locks), and to locate the queries that are causing the maximum performance problems.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

MONITORING TOOLS

A

Most database systems provide view relations that can be queried to monitor database system performance. For example, PostgreSQL provides view relations pg stat statements to monitor resource usage of SQL statements
and lock contention respectively.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

DATABASE BACKUP AND RECOVERY

A

Most of the DBMSs provide facilities for recovering from hardware or software failures. The backup and recovery subsystem of the DBMS is responsible for recovery.
If the computer system fails in the middle of a complex update program, the recovery subsystem is responsible for making sure that the database is restored to the state it was in before the program started executing.
Alternatively, the recovery subsystem could ensure that the program is resumed from the point at which it was interrupted so that its full effect is recorded in the database.

Backup

A backup is a copy of important data that is stored on an alternative location, so it can be recovered if deleted or it becomes corrupted. Data should be backed up at an interval determined by how often the data changes, how valuable it is, and how long it takes to perform the backup.
Full backup
Incremental

Recovery

Recovery is the rebuilding of a database, table or record space after a problem such as media or storage failure, power interruption, or application failure. If you have backed up your database, or individual table spaces, you can rebuild them should they become damaged or corrupted in some way.

Crash recovery protects a database from being left in an inconsistent, or unusable, state when transactions (also called units of work) are interrupted unexpectedly.
Disaster recovery consist of the process to restore a database in the event of a fire, earthquake, vandalism, or other catastrophic events.
Version recovery is the restoration of a previous version of the database, using an image that was created during a backup operation (roll back).
Roll forward recovery can be used to reapply changes that were made by transactions that were committed after a backup was made.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

RECOVERY TECHNIQUES

A

Two situations when we need recovery

If there is extensive damage to a wide portion of the database due to catastrophic failure, such as a disk crash, the recovery method restores a past copy of the database that was backed up to archival storage and reconstructs a more current state by reapplying or redoing the operations of committed transactions from the backed up log, up to the time of failure.

When the database is not physically damaged but has become inconsistent due to non-catastrophic failures, the strategy is to reverse any changes that caused the inconsistency by undoing some operations. It may also be necessary to redo some operations in order to restore a consistent state of the database. In this case we do not need a complete archival copy of the database. Rather, the entries kept in the on-line system log are consulted during recovery.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

ETHICAL ISSUES IN COMPUTING (1)

A

Intellectual Property

What counts as intellectual property
“Intellectual property is something that you create using your mind - for example, a story, an invention, an artistic work or a symbol.”
-Source: Gov.UK

You own intellectual property if you:

Created it (and it meets the requirements for copyright, a patent or a design)
Bought intellectual property rights from the creator or a previous owner
Have a brand that could be a trade mark, for example, a well-known product name

Intellectual property can:

Have more than one owner
Belong to people or businesses
Be sold or transferred

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Copyleft

A

“To copyleft a program, we first state that it is copyrighted; then we add distribution terms, which are a legal instrument that gives everyone the rights to use, modify, and redistribute the program’s code, or any program derived from it, but only if the distribution terms are unchanged. Thus, the code and the freedoms become legally inseparable.”

A related but separate movement is the open source movement, which was founded by Eric Raymond and Bruce Perens in 1998 as the Open Source Initiative.

Open source software includes the MySQL DBMS, Open Office, LibreOffice, Linux, Apache, PHP, Perl, and many other products. These products co-exist alongside competing proprietary software, and some of them are distributed and supported by major distributors of proprietary software, using a GNU license.
A major difference between open source and free software is that programmers who create new programs using open source can make their products proprietary.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Privacy

A

Individual privacy is challenged by the widespread use of computer technology to collect and share personal information that is stored in databases. The technology for creating and managing big data has made it possible to collect and exploit an almost unlimited amount of information about individuals.

Much of the data that people provide about themselves in the regular course of their lives ends up in databases.

School records, which contain personal information such as name, address, date of birth, and details about performance in school, are often kept in a database.
Employment information about goals, job history, educational background, and references may be stored in a database long after they have completed the application process.
When people file their income tax returns, they provide detailed information about the sources and amounts of their income, as well as name, address, and national insurance number.

Personal identity
Location
Habits
Beliefs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

PRIVACY LEGISLATION IN UK

A

UK’s Data Protection Act 2018, is the UK version and implementation of the General Data Protection Regulation (GDPR) of European Union known as UK GDPR and controls how the personal information is used by organisations, businesses or the government.

Everyone responsible for using personal data has to follow strict rules called ‘data protection principles’. They must make sure the information is:

used fairly, lawfully and transparently
used for specified, explicit purposes
used in a way that is adequate, relevant and limited to only what is necessary
accurate and, where necessary, kept up to date
kept for no longer than is necessary
handled in a way that ensures appropriate security, including protection against unlawful or unauthorised processing, access, loss, destruction or damage

There is stronger legal protection for more sensitive information, such as:

race
ethnic background
political opinions
religious beliefs
trade union membership
genetics
biometrics (where used for identification)
health
sex life or orientation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

ETHICAL STANDARDS FOR COMPUTING PROFESSIONALS

A

In their work, professionals encounter situations that require moral judgments. Many professions have codes of ethics that guide their members’
actions in such situations.

The Association for Computing Machinery (ACM) & Electrical and Electronics Engineers (IEEE) publishes a Code of Ethics and Professional Conduct that outlines general moral imperatives as well as professional and leadership responsibilities for computing professionals.

The Council of European Professional Informatics Societies (CEPIS) where British Computer Society (BCS) is member maintains an ethics repository containing the codes of ethics and codes of conduct.

Other Rules of Conduct

Microsoft’s Standards of Business Conduct
Oracle’s Code of Ethics and Business Conduct
IBM’s Code of Conduct.
…and many more

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

SOCIAL ISSUES – HUMAN FACTOR

A

A human factors system engineering approach includes consideration of the user at every stage of the system project. A typical system project can include these stages:

Conceptualization - Designers identify the objectives of the system and define specifications for it.

Definition - Designers define the system, including any subsystems. They describe the environment in which it will be used and the way it will be used.

Overall design - Designers identify all system components and their functions. Components include hardware, software, and human components. They analyse tasks and requirements.

Interface design - Designers use appropriate design standards to create user interfaces. They apply design principles, statistical data, mathematical modelling, results of empirical studies, and experience.

Development - Developers implement the system, test it, and modify it if indicated by the testing.

Deployment - When the system is deployed, developers evaluate its use and make modifications as needed.

Maintenance - Once the system is operational, procedures for continued evaluation, system evolution, and improvement are put in place.

17
Q

THE HUMAN–DATABASE INTERFACE

A

The ultimate measure of the success of a database project is the extent to which the users actually use it to obtain the information they need. A database developer normally spends a considerable amount of time making sure that the database is useful—that it actually satisfies the information needs of the enterprise.

Design / Development Process

The designer interviews users to determine their data needs
Designs the conceptual model
Applies techniques such as normalization to ensure the quality of the model,
Chooses the most appropriate database management system,
Create the logical mapping
Designs the internal model,
Develop the database and create the system around

The designer may be satisfied if the database itself is well designed and highly efficient. However, if it is not usable because of a poor user interface, end users will not be satisfied and will be disinclined to use the database.

Usability is an aspect of quality that merits the same kind of careful attention that the rest of database design does. When designing for interactive users, database designers should consider the elements of interactive design, which include the use of metaphors (e.g., desktop look and feel), the quality and consistency of navigational cues, the quality of error messages, effective feedback, and visual components such as layout, fonts, and colour.

The layout should be consistent from screen to screen, the screen should not be too crowded, the colours and fonts should enhance readability, and the language should be unambiguous, simple, clear, and appropriate for the users.

18
Q

UTILITY vs USABILITY

A

Utility – How functional is the system

Usability – How easy is to follow, understand and use the system by the user

Without utility, the database does not provide what users need, and without usability, users cannot get what they need because the user interface is too difficult.

Five quality components of usability:

Learnability refers to the ease with which users can accomplish tasks the first time they see the design.
Efficiency refers to how quickly users can accomplish tasks once they have mastered the design.
Memorability refers to how easily users can re-establish proficiency in using the design after a period of not using it.
Errors refers to the number and severity of errors that users make, and how easily they can recover from them.
Satisfaction refers to how pleasant it is to use the design.
Utility – How functional is the system

Usability – How easy is to follow, understand and use the system by the user

Without utility, the database does not provide what users need, and without usability, users cannot get what they need because the user interface is too difficult.

Five quality components of usability:

Learnability refers to the ease with which users can accomplish tasks the first time they see the design.
Efficiency refers to how quickly users can accomplish tasks once they have mastered the design.
Memorability refers to how easily users can re-establish proficiency in using the design after a period of not using it.
Errors refers to the number and severity of errors that users make, and how easily they can recover from them.
Satisfaction refers to how pleasant it is to use the design.

19
Q

GENERAL GOOD PRACTICES IN DATABASES DEVELOPMENT

A

Designing a database the right way requires some analysis of your data and planning around how you want to structure it. There are many different database types, models, and customizations you can use to achieve your goals.

Good database design is driven by several core principles:

Minimize redundancy: To save resources, create an efficient database, and simplify how the database works, data redundancy is minimized and duplication is avoided. However, the performance must be considers hence some trade-offs must be decided (e.g. de-normalisation)

Protect accuracy: Your database should keep information accurate and reduce the likelihood of accidentally damaging information.

Be accessible: The business intelligence systems that need reading and writing access should have it. Your database should provide access while also protecting data security.

Meet expectations: Of course, you keep a database to fulfil a specific purpose—so the database design must successfully support your data processing expectations.