Introduction to databases Flashcards
What are Data, Information and Knowledge?
Data can be defined as symbols or facts, of a qualitative or quantitative type, that
represent properties of objects, e.g., the number 10
▪ Information is data compiled to derive meaningful inferences. It is structured,
processed and presented with an assigned meaning
• E.g., the number 10 from the data example gains a context if we say “10 km”
10 km on its own doesn’t mean much though. Is it the length of a road? The distance
between two points? Or something else entirely?
Hence, we apply knowledge. Knowledge is your own (or a collective) expertise used to
infer results from information
• E.g., 10 km is a walkable distance
What is the main difference between Data, Information and Knowledge?
The main difference between the three is their level of abstraction. Data
is the lowest (most concrete) level and knowledge is the highest (most
abstract) level, with information in between.
Why is it important to make distinctions between Data, Information and Knowledge?
These distinctions are important:
• Because we need to understand the relationship between each type to make sense of
the purpose of a database
• E.g., we use databases to store data. However, this means that we still need someone
to make sense of said data (information) as well as utilize it in the intended way
(knowledge).
What different types of data exist, and what is the difference between them?
Structured data and Unstructured data.
▪ Data that resides in a fixed field within a record or file is called structured data.
This includes data contained in relational databases and spreadsheets. E.g., a cell
in a spreadsheet may contain the number 10
Structured data depends on creating a data model – a model of the types of data
that will be recorded as well as how they will be stored, processed and accessed
We might, for example, have a spreadsheet containing students and their attributes. The
attributes in a row, taken together, describe a single student. That layout is our data model
▪ Unstructured data, however, is all those things that cannot be so readily classified
and fit into a neat box: photos/graphic images, videos, webpages, documents etc.
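The student spreadsheet mentioned above can be sketched in Python as a small data model. This is only an illustration: the class name Student and its fields (student_id, name, course) are assumptions, not part of the original notes.

```python
from dataclasses import dataclass


# Hypothetical data model: the field names are assumptions for illustration.
@dataclass
class Student:
    student_id: int
    name: str
    course: str


# Each tuple plays the role of one spreadsheet row.
rows = [
    (1, "Ada", "Databases"),
    (2, "Linus", "Operating Systems"),
]

# Every row fits the same fixed fields, which is what makes the data structured.
students = [Student(*row) for row in rows]
print(students[0].name)  # Ada
```

A photo or a free-form document has no such fixed fields, which is why it counts as unstructured.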
What language do you use to manage structured data?
▪ Structured data, in the context of databases, is often managed using Structured Query Language (SQL)
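As a minimal sketch of what managing structured data with SQL can look like, the example below uses Python's built-in sqlite3 module with an in-memory database. The table name student and its columns are made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: the name and columns are assumptions for illustration.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO student (name, course) VALUES (?, ?)",
    [("Ada", "Databases"), ("Linus", "Operating Systems"), ("Grace", "Databases")],
)

# SQL describes which rows we want; the database works out how to find them.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE course = ? ORDER BY name", ("Databases",)
    )
]
print(names)  # ['Ada', 'Grace']
conn.close()
```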
Where can unstructured data be stored?
▪ Unstructured data can be stored in non-relational databases (MongoDB,
JanusGraph etc.)
What is semi-structured data?
Semi-structured data is data that is not purely structured but still
retains some structure. E.g., we could store an employee in JSON format
rather than in separate columns in a spreadsheet
▪ “Some structure” basically means that we’ll have some type of markers or tags
which can be used to identify elements within the data (much like structured
data), but it doesn’t have the same rigid structure
▪ For example, an email will have a sender, a recipient, a subject, a message text
and other fixed fields (structured data). But we can also attach an image or a file
(unstructured data) to our email before sending it
▪ As it’s neither entirely structured nor entirely unstructured, we call it
semi-structured data
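The employee example can be sketched with Python's built-in json module. The records and their keys (name, role, skills, office) are hypothetical. The point is that the keys act as tags, but the two records do not share the same fields.

```python
import json

# Hypothetical employee records; the keys are assumptions for illustration.
raw = """
[
  {"name": "Ada", "role": "Engineer", "skills": ["SQL", "Python"]},
  {"name": "Linus", "role": "Manager", "office": {"city": "Oslo", "floor": 3}}
]
"""
employees = json.loads(raw)

# The keys identify elements within each record, much like column names would.
for employee in employees:
    print(employee["name"], sorted(employee.keys()))

# There is no rigid schema, so a field missing from a record must be handled explicitly.
cities = [e.get("office", {}).get("city") for e in employees]
print(cities)  # [None, 'Oslo']
```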
What defines lists or spreadsheets?
In lists or spreadsheets, each row of data is intended to stand on its own. I.e., we’re allowed to
have rows with duplicated information
What are spreadsheets good and bad for?
▪ Spreadsheets are good for:
We can easily sort a column based on the values in the cells. E.g., we can sort by student- or
course names if we wanted to
They’re also good for storing data. The example used is a small one, but we could obviously
add thousands of rows without a problem
▪ Spreadsheets are bad for:
At some point we will run out of RAM
However - way before we run out of RAM - we’ll also start to note just how much longer it
takes to simply open the spreadsheet, never mind searching for a specific data cell
What less obvious purposes do databases fulfill?
• To provide an organizational structure for data
• To provide a mechanism for creating (C), reading (R), updating (U) and deleting (D)
data, i.e., CRUD operations
What does CRUD stand for?
creating (C), reading (R), updating (U) and deleting (D)
data
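The four CRUD operations map onto the SQL statements INSERT, SELECT, UPDATE and DELETE. The sketch below runs one of each against an in-memory SQLite database. The student table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this illustration.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# Create: INSERT adds a new row.
cur = conn.execute("INSERT INTO student (name) VALUES (?)", ("Ada",))
student_id = cur.lastrowid

# Read: SELECT fetches the row back.
name_after_create = conn.execute(
    "SELECT name FROM student WHERE id = ?", (student_id,)
).fetchone()[0]

# Update: UPDATE changes the existing row in place.
conn.execute("UPDATE student SET name = ? WHERE id = ?", ("Ada Lovelace", student_id))
name_after_update = conn.execute(
    "SELECT name FROM student WHERE id = ?", (student_id,)
).fetchone()[0]

# Delete: DELETE removes the row.
conn.execute("DELETE FROM student WHERE id = ?", (student_id,))
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(name_after_create, "|", name_after_update, "|", remaining)  # Ada | Ada Lovelace | 0
conn.close()
```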
How much data can be stored in a relational database?
There is no real practical limit to how much data can be stored in a relational database;
you can generally add more storage as you go
What are the downsides to using a database over spreadsheets?
▪ In short, it requires more technical expertise to work with (it is quite easy to
simply open a new sheet in Google Sheets and list away, comparatively)
▪ In some cases, it might also be redundant. If you know you’re going to be working
with a small set of data, why bother establishing a database for that purpose
when you might as well work with a smaller CSV-file (or an equivalent to that)?
What is a DBMS (database management system)?
▪ A DBMS (database management system) is a program used to manage a database. Among
other things, it provides the user with an interface for performing CRUD operations on a database
• It also provides protection and security to the database itself and makes sure we don’t run
into issues when, e.g., multiple users are working with the database at the same time
• I.e., it helps us maintain data integrity (the opposite of data corruption)
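One way a DBMS protects data integrity is by grouping changes into transactions and enforcing constraints. The sketch below is a single-connection illustration using SQLite, not a demonstration of concurrent users. The account table, the transfer helper and the balances are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, recipient, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, recipient)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, sender)
        )


try:
    transfer(conn, "alice", "bob", 500)  # alice cannot afford this
except sqlite3.IntegrityError:
    pass  # the CHECK constraint rejected the second update

# The first update was rolled back too, so no money appeared out of nowhere.
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'alice': 100, 'bob': 50}
conn.close()
```

In a spreadsheet, a half-finished edit like this would simply leave the data inconsistent.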
What is it that an RDBMS (relational database management system) does in addition to the DBMS (database management system) functionality?
An RDBMS will, in addition to the DBMS functionality, also automatically
keep track of the relationships between our data (or rather, our tables/relational
data models)
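A common way an RDBMS keeps track of relationships is through foreign keys. The sketch below uses SQLite with two hypothetical tables, course and student. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on; other systems typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement

# Hypothetical tables: each student row must point at an existing course row.
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "course_id INTEGER NOT NULL REFERENCES course(id))"
)
conn.execute("INSERT INTO course (id, title) VALUES (1, 'Databases')")
conn.execute("INSERT INTO student (name, course_id) VALUES ('Ada', 1)")

# Course 99 does not exist, so the database refuses the row.
try:
    conn.execute("INSERT INTO student (name, course_id) VALUES ('Linus', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The relationship also lets us combine the two tables in one query.
pairs = conn.execute(
    "SELECT student.name, course.title "
    "FROM student JOIN course ON course.id = student.course_id"
).fetchall()
print(rejected, pairs)  # True [('Ada', 'Databases')]
conn.close()
```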