Chapter 1 Flashcards

1
Q

What is a database?

A

collection of related data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is database management system?

A

A software system that enables users to define,create, maintain and control access to the database

Allows users to insert,update,delete, and retrieve from the database through Data Manipulation Language(DML)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are the roles in the database environent?

A
Data Admin-responsible for the management of the data resource, including planning,development and maintenance of standards,policies,conceptual and logical design
Database Admin(DBA)- responsible for the physical realization of the databasement including physical database design and implementation,security and integrity control,maintnence of the operaitonal system, and ensuring satisfactory performance of the application of users. It is more technicallly oriented than the DA.

Logical Database Designer-Considered with identifying the data,relationships between the data, and constraints on the data
Needs to have a thorough and complete understanding of the organization’s data and constraints on the data
Physical Database Designer-decides how the logical database design is to be physically realized
Mapping the logical database design into a set of tables and constraints
Selecting specific storage structures and access methods
Designing any security measures
1.4.4 End-Users
Clients of the database-can be classified accordingly
Naieve users-typically unaware of DBMS such as a clerk
Sophisticated Users-familiar with the structure of the db and facilities offered by the dBMS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

pros and cons of DBMS?

A

Control of Data redundancy- traditional file-based systems waste space by storing the same information in more than one file
Attempts to eliminate redundancy by integrating the files so that multiple copies of the same data are stored
Controls the amount
Data consistency- by eliminating the redundancy, we reduce the risk of inconsistencies occurring
If a data item is stored only once in the database, any update to its values must be only performed once
If a data item is stored more than once and the system is aware of this, the system can ensure that all copies of the items are kept consistent- no automatica way
More information from the same amount of data-may be possible to derive additional information from the data
More communication between networks
Sharing of data-belongs to the entire organization and can be shared by all authorized users
New applications can build on the existing data in the database and add only data that isnt stored
Improved Data Integrity- Validity and consistency of stored data
Usually expressed in terms of constraints ->consitency rules that the database is not permitted to violate
Improved security- protection of db from unauthorized users
Can be things like usernames and passwords and views
Improved backup and recovery services- provide facilities to minimize the amount of processing that is lost following a failure

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

what are the cons of a dbms?

A

omplexity- failure to understand the system can lead to bad design decisions
Cost of DBMS- can be taxed
Performance- file system is more specific so it can have better performance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

what are the three levels of database architecture?

A

The users view of the database. THis level describes that part of the database that is relevant to each user
Consists of a number of different external views of the database
Different views may have different representations of the same data
2.1.2 Conceptual Level
The community view of the database. THis level describes what data is stored in the database and the relationships among the data
Contains the logical structure of the entire database as seen by the DBA
Represents all entities,attributes,relationships,constraints,semantic information on the data,etc
2.1.3 Internal Level
The physical representation of the database on the computer - describes HOW the data is stored in the database
Covers the physical implementation of the database to achieve optimal runtime performance and storage space utilization
Interfaces with the os access methods to place the data on storage devices,build the indexes, retrieve data etc

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

whatt are the three types of database schema?

A

external schema-correspond to different views of data
conceptual schema-describes all the entities, attributes, and relationships with the integrity constraints
internal schema-
complete description of the internal model,containing defintions of things like stored records etc

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the bottom up approach ?

A

Approach to database design that starts off at the very fundamental levels - properties of entities and relationships -then with that you build more and more- complexity. Normalization is a bottom up approach
This style works well with a small database ;however, as you start adding more and more attributes then it starts to get more complex and this style doesnt work as well

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is the top down approach?

A

Basically, you start off the database with a few amount of entities,relationships, and the associated attributes then you apply succesive top down approaches

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Explain the physical database design more in-depth

A

basically this deals with how we implement the logical step of the database design
create relational tables,use methods to achieve maximum efficiency, designing security protocols for the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Why should the physical database design be different than the logical or the conceptual?

A

It deals with the what- not the how. Performed in a different time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are the steps of choosing a dbms?

A

Determine the terms of reference, basically stating the objectives and getting a set of criteria to evaluate against

  1. Shortlist two or three products
  2. Evaluate products against some set of criteria
  3. Choose that shit
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is transaction design, why is it important?

A

a transaction is an action that a user does that affects the accesses or changes the database. the rdbms makes sure that once a transaction happens, it is stored and follows the consistencies set by the constraints. There are a couple major things you have to define and document

  1. data to be used by the transaction
  2. functional characteristics of the transaction
  3. output of the transaction
  4. importance of the users
  5. expected rate of usage
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are three major types of transactions ?

A
  1. Retrieval-,update, and mixed.. Pretty self explanatory
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are some important questions to ask when testing your rdbms?

A

Learnability: How long does it take a new user to become productive with the system?
• Performance: How well does the system response match the user’s work practice?
• Robustness: How tolerant is the system of user error? • Recoverability: How good is the system at recovering from user errors?
• Adapatability: How closely is the system tied to a single model of work?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

what are the steps of the rdbms process?

A

The database is a fundamental component of an information system, and its development and usage should be viewed from the perspective of the wider requirements of the organization. Therefore, the lifecycle of an organizational information system is inherently linked to the lifecycle of the database that supports it.

Connolly, Thomas. Database Systems (p. 324). Pearson Education. Kindle Edition.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

what is a multi valued attribute?

A

can be broken down into multiples parts- for exmaple a street addy 7141 bob street = 3 different words

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

what is a derived attribute?

A

can be derived from other entities not necesarrily the same one as the attribute. for example rent duration is rent start from rent finish

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

what is a candidate key

A

minimal amount of attributes that uniquely identify an entry. this allows implies that the candidate key can not be a null value. for example,staff ID is unique to every member but u also need it to identify the person

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

what is a primary key?

A

it is the candidate key that is chosen to uniquely identify the person. an alternative key is a backup , for example, if a staff id had max 4 characters and that got maxed out and then their backup id was 6, it would go from 4 to 6

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

what is a composite key?

A

the key of an entity sometimes needs multiple keys to identify it so youd use on here

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

what is a strong entity type?

A

an entity type that is not existent on another entity. a characteristic is that each entity occurence can be uniquely identiifed by the primary key of that column

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is a weak entity type?

A

An entity type that is dependent on the existence of another entity. It also can not be uniquely identified by the primary key , one example of this is preference

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What is multiplicity and some examples?

A

Multiplicity is an example of one of the relationship constraints on entities that are used pretty often to maintain the integrity of the database.
An example of this is that a property for rent must have an owner or a staff must have a branch

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What are binary relationships usually referred as?

A

one to one, one to many , many to many

26
Q

Examples of one to one , one to many , many to many?

A

a member of staff manages a branch (1:1);
• a member of staff oversees properties for rent (1:);
• newspapers advertise properties for rent (
:*).

27
Q

What are the two constraints of multiplicity actually called?

A

Cardinality and participation

28
Q

What is cardinality?

A

Cardinality refers to the maximum amount of relationships for an entity participating in an entity relationship role

29
Q

What are fan traps?

A

When there is a relationship between entities but the pathway between the two is kind of ambiguous

30
Q

what are chasm traps?

A

when the model suggests an existence of a relationship between two entities but there actually isnt a pathway between the two

31
Q

What is the reason for normalization?

A

The reason for normalization is to provide a suitable set of relationships given the data requirements of an enterprise. Some attributes that you want are minimal redundancies, attributes with close logical relationships (dependencies), and the minimal amount of attributes

32
Q

What are some reasons to minimize data redundancy?

A

updates to the data stored in the database are done with a minimal amount of operations, reducing the chance of data inconsistencies. You can also reduce file storage which makes the DB cheaper :)

33
Q

What are the three kind of update anomalies? Why do they happen?

A

Insertion,update, and modificaiton. They happen when you have redundant data.

34
Q

What are functional dependencies?

A

They are the relationships between attributes and they are important because they help us to define noramlization

35
Q

What normal form do you absolutely need? What forms don’t you need? What “form” is the min recommended and why?

A

You need first normal form, you don’t Need anything after that . the minimum form recomennded is 3 because it helps with data redundancy and makes the DB less at risk for update anomalies

36
Q

What is first normal form?

A

First normal form is one of the steps of the normalization process. It revolves around taking data from whatever standard format it is in and making sure you put them into rows and columns with only one data entry in each area. it is atomic. Make sure to get rid of the repeating groups which are basically data entries.

37
Q

What is second normal form?

A

Second normal form has to do with composite keys. It is basically saying that every entry in that table relies on both the primary keys in the composite key to be identified

38
Q

what is third normal form?

A

can only happen when there are 3 or more attributes - means that an entry doesn’t depend on the primary key in that row.-transitive property

A relation that is in first and second normal form and in which no non-candidate-key attribute is transitively dependent on any candidate key.

Connolly, Thomas. Database Systems (p. 426). Pearson Education. Kindle Edition.

39
Q

What does referential integrity mean?

A

if the foreign key refers to a value, that parent value must exist

40
Q

what is a cascade delete?

A

if you delete one row, the rows that have foreign keys that refer to that row will be deleted, then the rows that depend on those rows will be deleted, and so on. ina CASCADATING manner

41
Q

What is a set null in term?

A

If you delete the parent row, then it sets the foreign keys that depend on the table to NULL

42
Q

what is set default?

A

if you delete the row from the parent table, then it sets the foreign key that references that number to some default value

43
Q

describe schema in DB

A

According to the ISO standard, relations and other database objects exist in an environment. Among other things, each environment consists of one or more catalogs, and each catalog consists of a set of schemas. A schema is a named collection of database objects that are in some way related to one another (all the objects in the database are described in one schema or another). The objects in a schema can be tables, views, domains, assertions, collations, translations, and character sets. All the objects in a schema have the same owner and share a number of defaults. The standard leaves the mechanism for creating and destroying catalogs as

Connolly, Thomas. Database Systems (p. 196). Pearson Education. Kindle Edition.

44
Q

what is an index?

A

An index is a structure that provides accelerated access to the rows of a table based on the values of one or more columns (see Appendix F for a discussion of indexes and how they may be used to improve the efficiency of data retrievals). The presence of an index can significantly improve the performance of a query. However, as indexes may be updated by the system every time the underlying tables are updated, additional overheads may be incurred. Indexes are usually created to satisfy particular search criteria after the table has been in use for some time and has grown in size.

45
Q

What is a view?

A

dynamic result of one or more relational operations operating on the base relations to produce another relation. A view is a virtual relation that does not necessarily exist in the database but can be produced upon request by a particular user, at the time of request. To the database user, a view appears just like a real table, with a set of named columns and rows of data. However, unlike a base table, a view does not necessarily exist in the database as a stored set of data values. Instead, a view is defined as a query on one or more base tables or views. The DBMS stores the definition of the view in the database. When the DBMS encounters a reference to a view, one approach is to look up this definition and translate the request into an equivalent request against the source tables of the view and then perform the equivalent request.

46
Q

What is view materialization?

A

View materialization is when you create a temporary table. after this, the queries are a lot faster

47
Q

What are the two types of ways that you can restrict control?

A

Discretionary access contrl-each user is giving certain rights to DB objects - this type of mechanism can be circumvented by devious unauthorized users tricking authorized users
Mandatory access control-certain tables are given certain clearence levels and each user is given a clearance level “top secret med etc”

48
Q

What are the two main methodologies in designing a data warehouse?

A

Kimball’s Business Dimensional Lifecycle and Inmons corporate information lifecycle

49
Q

Describe Inmons Corporate Information Lifecycle

A

dynamic result of one or more relational operations operating on the base relations to produce another relation. A view is a virtual relation that does not necessarily exist in the database but can be produced upon request by a particular user, at the time of request. To the database user, a view appears just like a real table, with a set of named columns and rows of data. However, unlike a base table, a view does not necessarily exist in the database as a stored set of data values. Instead, a view is defined as a query on one or more base tables or views. The DBMS stores the definition of the view in the database. When the DBMS encounters a reference to a view, one approach is to look up this definition and translate the request into an equivalent request against the source tables of the view and then perform the equivalent request.

50
Q

Describe Kimballs Business Dimensional Lifecycle

A

Kimball’s approach uses new methods and techniques in the development of an EDW. Kimball starts by identifying the information requirements (referred to as analytical themes) and associated business processes of the enterprise. This activity results in the creation of a critical document called a Data Warehouse Bus Matrix. The matrix lists all of the key business processes of an enterprise together with an indication of how these processes are to be analyzed. The matrix is used to facilitate the selection and development of the first database (data mart) to meet the information requirements of a particular group of users of the enterprise. This first data mart is critical in setting the scene for the later integration of other data marts as they come online. The integration of data marts ultimately leads to the development of an EDW. Kimball uses a new technique called dimensionality modeling to establish the data model (referred to as a dimensional model (DM) for each data mart. Dimensionality modeling results in the creation of a dimensional model (commonly called a star schema) for each data mart that is highly denormalized. Kimball believes that the use of star schemas is a more intuitive way to model decision support data and furthermore can enhance performance for complex analytical queries. In Section 32.4, we describe dimensionality modeling and in Section 32.5 we illustrate how dimensionality modeling can be used to create data marts and ultimately an enterprise data warehouse using the DreamHome case

51
Q

What are some pros and cons of the kimball method ?

A

Some pros of the Kimball method for buidling data warehouses are that since its a scaled down project, its easier to show value with a smaller budget ;however, since its a sequential method of adding data marts, it can get confusing and you might end up never having a comprehensive view of the data. The kimball method is good for when you want to show information to a certain group soon but showing teh entire entreprise data can wait

52
Q

What are some pros and cons of the inmon method?

A

Some pros are that it has good potential to provide a comprehensive view of the entreprises data ;however, it is a large complex project that isnt guranteed to succeed and might take time to finish. You’d want to use this method if it is necessary to have the warehouse meet the enterprises needs

53
Q

Which is preferred?

A

Kimball is preferred if you’re neutral because it includes new methods and is a little more versatile than the inmon method- that shit is boomer shit lmao

54
Q

What are the guiding principles of the kimball method and the ultimate goal?

A

Meet the information requirements of an entreprise by having a single,integrated, easy to use infrastructure which is delivered 6-12 month range. The ultimate goal is to deliver the entire solution, ad hoc query tools, reporting applications, advanced analytics, and all the training and support for users

55
Q

What is dimensionality modeling? Describe it

A

A logical design technique that strives to present the data in a standard easy to read intuitive manner that allows for high performance access. Basically, each DM has one table called the fact table, and a set of smaller tables- dimension tables. Each dimension table has a simple (noncomposite) key that corresponds to one of the components of the composite key in the fact table. This is called a star structure or star schema .Another important feature is that all natural keys are replaced with surrogate keys.

56
Q

What can be done to speed up query speed on star schemas? When should it be done? When shouldn’t it be done?

A

you can denormalize the reference data into a single table. This is appropriate when u have a large number of entities that are related to the dimension table that you access frequently, by doing this you avoid having to do a bunch of joins all the time. It isn’t appropriate when the data isn’t accessed very often because the overhead of scanning the expanded dimension table may not be offset by the timed saved by not doing a join

57
Q

What is a starflake schema?

A

A fact table surrounded by normalized and demoralized tables so its a mixture,

58
Q

What benefits do dimensional models offer?

A

efficiency, ability to handle changing requirements, ability to model business problems, and predictable queries

59
Q

Compare the ER and DM model

A

DM models are used to design the database component of a data warehouse while an ER model is traditionally used to describe an OTLP system. ER modeling is a technique used for identifying relationships between entities. A major goal of ER modeling is to reduce redundancy because it speeds up the process of queries by a lot. But since it is so efficient, it really isnt that great for ad hoc analysis.

60
Q

What is OLAP?

A

Online analytical processing. These are tools that are good for the dyanmic synthesis, analysis, and consolidation of large amounts of data. They are good for answering not just who or what but why questions whichi s what separates them from otlp systems

61
Q

What are the three necessities for a system to be considered an OLAP?

A

Time intelligence ,support for complex calculations, and ,multidimensional views of data

62
Q

What are the physical representations of OLAP tools?

A

In the OLAP environment multidimensional data is represented as n-dimensional data cubes. An alternative representation for a data cube is as a lattice of cuboids.

Connolly, Thomas. Database Systems (p. 1265). Pearson Education. Kindle Edition.