BIM 6 Flashcards
What is data?
Data is raw facts, figures with no meaning
what is information?
information is a collection of data organized to have meaning
What is database technology?
a collection of related data organized to make it valuable and useful.
what is the relational database model?
a collection of tables to represent data and relationships of data.
What are columns/attributes/fields?
these are the columns of the table. they represent the different data categories
what are records/entity/instances/items?
these are rows of the table and the data.
what is a primary key?
to keep each row unique, we use a primary key to identify the different rows
what is a file?
these are the tables in the databases. the actual things data is in
What is a database?
the database includes the collection of files/tables containing information.
What is a data entity?
the things we store information about, this may be the people, places, objects. Basically a record/row of a table
What does SQl stand for?
sql stands for structured query language?
what does query mean?
query means a request for data/information from the database
What are the 3 languages/categories of SQL commands?
- Data Definition Language (DDL) - all about the structure of the table/file. It creates, deletes, and alters tables.
- Data Manipulation Language - DML is all about the content. It allows for editing, updating, inserting, etc
- Data Control Language - DCL is about access to the database
What are the DDL - creating tables commands?
CREATE TABLE tablename (columnname1 TEXT, column2, INTEGER)
What are the DDL data types?
- integer - whole numbers
- TEXT - text
- real - allows decimals
- blob - binary-large objects
renaming columnes
ALTER TABLE tablename RENAME COLUMN columnname TO newcolumnname
add columns
ALTER TABLE tablenname ADD columnname
using DML to manipulate table rows - add values into rows
INSERT INTO tablename VALUES (C1R1, …), (C1, R2)
INSERT INTO tablename (column1, column 2) VALUES (valueC1, …)
INSERT INTO [Phones] (PersonID, [Phone Number]) SELECT PersonID, Phone
From Person;
INSERT INTO table2
SELECT * FROM table1
WHERE condition;
Update values of rows
UPDATE tablename SET columnnameX=newvalue WHERE columnnameY=condition value
DML - reading rows
SELECT * FROM tablename WHERE columnname=conditionvalue
how do you order the columsn by order?
SELECT * FROM tablename WHERE condition=condition ORDER BY columnname ASC/DESC
what does lim and offset do?
lim limits the records by X amount and offset skips the first Y rows
how to join tables?
SELECT column FROM tablename JOIN table2 ON condition=condition
What is normalization?
the process of streamlining data to minimize redundancy and increase flexibility.
It is a technique to manage the trade-off between reducing data redundancy and ease of use.
What is data redundancy?
when the same piece of information occurs multiple times.
What is first normal form 1NF
there are 2 rules of 1NF
1. There are only atomic values and values in cells cannot be divided
2. Cells contain only 1 value from the attribute
What is 2NF
2NF is about reducing redundancy. The 2 rules are:
1. Follows 1NF
2. Does not have any partial dependencies in the case of composite keys. This partial dependency means that columns depend on both composite (primary) keys and not only one of them. Place depends only on tournament and not year. this is a partial dependency
What are composite keys?
Composite keys occur when there are multiple primary keys necessary to uniquely identify each row. So they are the multiple primary keys
what is 3NF
the rules of 3NF are
1. it follows 2Nf
2. There are no transitive dependencies. Transitive dependencies are columns that depend on columns that aren’t primary keys. They only relate to the primary keys as they depend on columns that depend on primary keys.
So rule 2 occurs if all columns depend on primary keys
What are entity-relationship diagrams?
ERDs describe the relationships between things of interest.
What 4 concepts does ERD rely on?
- Entity and entity sets
Entities are the rows/records while entity sets are the entire table. Entities describe the things of interest to store data on - Attributes
the columns and categories - Relationships and relationship sets
shows how entities are linked - Cardinalities
Cardinalities or relationship degrees represent the number of entities are associated with another tables entities
what are the 3 types of cardinalities?
- one to one 1:1
one entity is associated with one entity - one to many 1:M
one entitiy can be associated with many entities in Set B - many to many M;N
many entities in set A are associated with possible many entities in set B
what are the 3 models in making an ERD?
- Conceptual model
It is not detailed and just gives the files/tables that are the focus
About identifying what we want to store - Logical model
attributes and keys are attached. however, the problem with M:N is not fixed yet - Physical model
this is where the ERD is created, the database structure can be implemented and M:M relationships are fixed
What are the different ERD lines/conditions?
look at doc
how are many to many relationships solved?
solved through using an associate entity table. This creates a 1:M relationship and the table needs both primary keys from the tables as foreign keys as well.
What are bits and bytes?
Bits are the smallest unit of data that a computer can handle. 0 or 1.
Bytes are groups of 8 bits.
What is a primary and foreign key?
Primary key is the unique identifier for each row
Foreign keys are primary keys in other tables and mostly a look-up field to find data or align tables
What does select and join do?
Select creates a subset consisting of records in the file that meet the criteria
join combines tables
What is a blockchain?
A distributed database technology that enables firms to create transactions on a network without central authority. It stores transactions as a distributed ledger among computers. It maintains a continuously growing list of records called blocks. Each block contains a timestamp and link
What are the 3Vs of big data
- velocity - velocity of processing
- variety - wide variety
- volume - extreme volume of data
What is a datawarehouse?
data warehouses are databases that store data of potential interest. Data is available to anyone but it cannot be changed
What are data marts?
Data marts are subsets of data warehouses and summarize portions of data in a separate database for a specific population
What are datalakes? (hadoop)
a repository for raw, unstructured data that hasnt been analyzed yet
What is inmemory computing?
It relies on computer’s main memory and eliminates bottlenecks
What are some analytical tools?
- online analytical processing (OLAP)
Enables multidimensional data anlysis - Data mining
provides insight from data by finding hidden patterns and relationships in large databases - Text
tools that help businesses analyze big text data - sentiment analysis software
can mine text comments i emails, blogs, etc to determine opinions - web mining
the analysis of patterns from the webW
what are the types of data mining?
- Associations
occurences linked to a single event - Sequences
Events linked over time - Classifications
recognizes patterns to describe the group to which an item belongs by examining existing items that have been classified - clustesr
similar to classifications but no groups have been defined - forecasts
uses a series of existing values to forecast what other values will be
what is data governance?
encompasses policies through which data can be managed as organizational resources
what is data quality audit?
a structured survey on accuracy and level of completness of data
what is data cleansing
process of deleting and correctig data in database that is wrong