Introduction to databases Flashcards
What are Data, Information and Knowledge?
Data can be defined as symbols or facts, of a qualitative or quantitative type, that
represent properties of objects, e.g., the number 10
▪ Information is data compiled to derive meaningful inferences. It is structured,
processed and presented with an assigned meaning
• E.g., the number 10 from the data example gains a context if we say “10 km”
10 km on its own doesn’t mean much though. Is it the length of a road? The distance
between two points? Or something else entirely?
Hence, we apply knowledge. Knowledge is your own (or a collective) expertise used to
infer results from information
• E.g., 10 km is a walkable distance
What is the main difference between Data, Information and Knowledge?
The main difference between the three is their level of abstraction. Data
is the lowest (most concrete) level and knowledge is the highest (most
abstract) level.
Why is it important to make distinctions between Data, Information and Knowledge?
These distinctions are important:
• Because we need to understand the relationship between each type to make sense of
the purpose of a database
• E.g., we use databases to store data. However, this means that we still need someone
to make sense of said data (information) as well as utilize it in the intended way
(knowledge).
What different types of Data exist, and what is the difference between them?
Structured data and Unstructured data.
▪ Data that resides in a fixed field within a record or file is called structured data.
This includes data contained in relational databases and spreadsheets. E.g., a cell
in a spreadsheet may contain the number 10
Structured data depends on creating a data model – a model of the types of data
that will be recorded as well as how they will be stored, processed and accessed
We might, for example, have a spreadsheet containing students and their attributes. The
attributes in a row, taken together, describe a single student. The set of columns is our data model
▪ Unstructured data, however, is all those things that cannot be so readily classified
and fit into a neat box: photos/graphic images, videos, webpages, documents etc.
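As a rough illustration of the student spreadsheet above, a data model can be sketched as a fixed set of typed fields. The `Student` class and its attributes are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical attributes; each field corresponds to one spreadsheet column.
    student_id: int
    name: str
    course: str


# Each instance plays the role of one row in the spreadsheet.
students = [
    Student(2, "Linus", "Databases"),
    Student(1, "Ada", "Databases"),
]

# Because every row has the same fixed fields, sorting by a column is simple.
by_name = sorted(students, key=lambda s: s.name)
```

A photo or a free-form document has no such fixed fields, which is why it counts as unstructured.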
What language do you use to manage structured data?
▪ Structured data, in the context of databases, is often managed using Structured Query Language (SQL)
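A minimal sketch of what managing structured data with SQL can look like, using Python's built-in `sqlite3` module and an in-memory database. The `student` table and its rows are assumptions made up for this example.

```python
import sqlite3

# An in-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO student (name, course) VALUES (?, ?)",
    [("Ada", "Databases"), ("Linus", "Algorithms")],
)

# A query describes which rows we want, not how to find them.
rows = conn.execute(
    "SELECT name FROM student WHERE course = ? ORDER BY name",
    ("Databases",),
).fetchall()
conn.close()
```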
Where can unstructured data be stored?
▪ Unstructured data can be stored in non-relational databases (MongoDB,
JanusGraph etc.)
What is semi-structured data?
Semi-structured data is data that is not purely structured but still
retains some structure. E.g., we could store an
employee as a JSON document rather than as columns in a spreadsheet
▪ “Some structure” basically means that we’ll have some type of markers or tags
which can be used to identify elements within the data (much like structured
data), but it doesn’t have the same rigid structure
▪ For example, an email will have a sender, a recipient, a subject, a message text
and other fixed fields (structured data). But we can also attach an image or a file
(unstructured data) to our email before sending it
▪ As it’s neither entirely structured nor entirely unstructured, we call it semi-structured
data
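The employee example can be sketched with Python's `json` module. The field names (`name`, `email`, `skills`, `manager`) are hypothetical. The keys act as the markers that identify elements, but the two records do not share the same set of fields.

```python
import json

# Two employee records with different shapes: keys tag each value,
# but no fixed set of columns is enforced.
employees_json = """
[
  {"name": "Ada", "email": "ada@example.com"},
  {"name": "Linus", "skills": ["C", "Git"], "manager": {"name": "Ada"}}
]
"""

employees = json.loads(employees_json)
names = [e["name"] for e in employees]

# A field may be missing, so we read it with a default.
skills = [e.get("skills", []) for e in employees]
```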
What defines lists or spreadsheets?
In lists or spreadsheets, each row of data is intended to stand on its own. I.e., we’re allowed to
have rows with duplicated information
What are spreadsheets good and bad for?
▪ Spreadsheets are good for:
• Sorting: we can easily sort a column based on the values in its cells. E.g., we can sort by
student or course name if we want to
• Storing data: the example used is a small one, but we could add thousands of rows
without a problem
▪ Spreadsheets are bad for:
• Very large data sets: at some point we will run out of RAM
• Speed: well before we run out of RAM, we’ll notice how much longer it
takes to simply open the spreadsheet, never mind searching for a specific cell
What less obvious purposes do databases fulfill?
• To provide an organizational structure for data
• To provide a mechanism for creating (C), reading (R), updating (U) and deleting (D)
data, i.e., CRUD operations
What does CRUD stand for?
Creating (C), reading (R), updating (U) and deleting (D)
data
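The four CRUD operations map onto four SQL statements. Below is a small sketch using Python's built-in `sqlite3` module. The `student` table is a made-up example, not part of the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# Create: INSERT adds a new row.
cur = conn.execute("INSERT INTO student (name) VALUES (?)", ("Ada",))
student_id = cur.lastrowid

# Read: SELECT fetches existing rows.
after_create = conn.execute(
    "SELECT name FROM student WHERE id = ?", (student_id,)
).fetchone()

# Update: UPDATE changes an existing row.
conn.execute("UPDATE student SET name = ? WHERE id = ?", ("Ada Lovelace", student_id))
after_update = conn.execute(
    "SELECT name FROM student WHERE id = ?", (student_id,)
).fetchone()

# Delete: DELETE removes the row.
conn.execute("DELETE FROM student WHERE id = ?", (student_id,))
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
conn.close()
```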
How much data can be stored in a relational database?
There is no real limit to how much data can be stored in a relational database; you can
generally add more storage as you go
What are the downsides to using a database over spreadsheets?
▪ In short, it requires more technical expertise to work with (it is quite easy to
simply open a new sheet in Google Sheets and list away, comparatively)
▪ In some cases, it might also be redundant. If you know you’re going to be working
with a small set of data, why bother establishing a database for that purpose
when you might as well work with a smaller CSV-file (or an equivalent to that)?
What is a DBMS (database management system)?
▪ A DBMS (database management system) is a program used to manage a database. Among
other things, it provides a user with an interface to perform various CRUD operations on a database
• It also provides protection and security to the database itself and makes sure we don’t run
into issues when, e.g., multiple users are working with the database at the same time
• I.e., it helps us maintain data integrity (the opposite of data corruption)
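One way a DBMS can protect data integrity is by rejecting changes that break a declared rule and undoing the partial work. The sketch below uses SQLite through Python's `sqlite3` module. The `account` table, the `CHECK` rule and the amounts are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()

try:
    # Used as a context manager, the connection commits on success
    # and rolls back if an exception is raised inside the block.
    with conn:
        conn.execute("UPDATE account SET balance = balance + 150 WHERE name = 'b'")
        # This would make the balance negative, so the CHECK rule rejects it.
        conn.execute("UPDATE account SET balance = balance - 150 WHERE name = 'a'")
except sqlite3.IntegrityError:
    transfer_ok = False
else:
    transfer_ok = True

# The first UPDATE was rolled back together with the failed one.
balances = dict(conn.execute("SELECT name, balance FROM account"))
conn.close()
```

Both updates are undone together, so the data is never left half-changed.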
What is it that an RDBMS (relational database management system) does in addition to the DBMS (database management system) functionality?
An RDBMS will, in addition to the DBMS functionality, also automatically
keep track of the relationships between our data (or rather, our tables/relational
data models)
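A sketch of how a relationship between two tables can be declared and then enforced, using SQLite through Python's `sqlite3` module. The `course` and `student` tables are hypothetical. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " course_id INTEGER NOT NULL REFERENCES course(id))"
)
conn.execute("INSERT INTO course (id, title) VALUES (1, 'Databases')")
conn.execute("INSERT INTO student (name, course_id) VALUES ('Ada', 1)")

# A student pointing at a course that does not exist is rejected.
try:
    conn.execute("INSERT INTO student (name, course_id) VALUES ('Linus', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The declared relationship is also what we join on when reading.
joined = conn.execute(
    "SELECT student.name, course.title"
    " FROM student JOIN course ON course.id = student.course_id"
).fetchall()
conn.close()
```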
What is essential when working with databases?
Planning is essential when working with databases
What are the three general phases of designing a database?
The three general phases of designing a database are: conceptual design, logical
design and physical design
What do we focus on in the first phase of designing a database, conceptual design?
Conceptual design:
• A design phase where we focus on developing conceptual models of the data used in an organization.
I.e., we conceptualize the relational data models and their attributes as entities
What do we focus on in the second phase of designing a database,
logical design?
Logical design:
• In the second database design phase (i.e., logical design) we focus on translating
the ER model (the conceptual model) into a set of relational table definitions called schemas
What do we focus on in the third and last phase of designing a database,
physical design?
Physical design:
• In this final phase, we create actual tables in an actual database using SQL
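A small sketch of the physical design step, assuming the logical design produced a schema such as Student(id, name, email). That schema is invented for this example. SQLite is used through Python's `sqlite3` module, and the column types and constraints would differ between database systems.

```python
import sqlite3

# Hypothetical schema from the logical design, written as SQL DDL.
ddl = """
CREATE TABLE student (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT UNIQUE
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(ddl)  # the table now actually exists in the database

# SQLite-specific: table_info lists the columns; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
conn.close()
```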
What does a three-tier/layer architecture consist of?
A three-tier/layer architecture consists of a presentation layer, a logic layer and a
data layer
• The presentation layer represents the UI or view, i.e., what the end-user sees and interacts
with
• The logic layer acts as the “business layer”. It is here that we’ll coordinate the application by
running calculations, fetching data from and sending data to the database etc. I.e., it acts as the middleman between our UI and the database
• The data layer represents the information source, e.g., our database or file system
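The three layers can be sketched in a few lines of Python. All class and function names below (`StudentRepository`, `StudentService`, `render_roster`) are hypothetical. In a real application each layer would often be a separate program or service rather than a class in one file.

```python
import sqlite3


class StudentRepository:
    """Data layer: the only code that talks to the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS student "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, name: str) -> None:
        self.conn.execute("INSERT INTO student (name) VALUES (?)", (name,))

    def all_names(self) -> list[str]:
        return [row[0] for row in self.conn.execute("SELECT name FROM student")]


class StudentService:
    """Logic layer: rules and coordination between the UI and the data layer."""

    def __init__(self, repo: StudentRepository) -> None:
        self.repo = repo

    def register(self, name: str) -> None:
        cleaned = name.strip()
        if not cleaned:
            raise ValueError("name must not be empty")
        self.repo.add(cleaned)

    def roster(self) -> list[str]:
        return sorted(self.repo.all_names())


def render_roster(service: StudentService) -> str:
    """Presentation layer: turns data into what the end-user sees."""
    return "\n".join(f"- {name}" for name in service.roster())


service = StudentService(StudentRepository(sqlite3.connect(":memory:")))
service.register("Linus")
service.register("  Ada ")
output = render_roster(service)
```

The presentation function never touches SQL and the repository never formats text, so either side can be replaced without changing the other.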