Databases Flashcards
What are databases?
Large collection of data organised for rapid search + retrieval (by comp). E.g. card index (punch cards storing data, like rolodex but in comp lang), flat file (Files store data in hierarchical data structure, analogue of physical storage) or relational (Entities can ref each other, more efficient as reduces dup, quicker search, can be hierarchal but also reps complex relationships).
What does a database system involve?
Schema (design/structure of data + how organised), what store + how related), DBMS (Database Management System, software used to store data efficiently, e.g. Access, MySQL), Query Language (domain specific language [DSL] to talk to DBMS, Insert, manipulate, search, extract e.g. SQL) + data itself.
What are the 3 levels of design?
Conceptual, logical, physical.
What are entities?
People, things, events + locations. Distinct thing with indep existence which we store data about. Logical thing, not specific e.g. dog = entity but Danny = instance of entity.
What are entity relationships?
We rep entities + relationships in Entity Relationship Diagram (ERD). Relationship is connection between entities. Cardinality of relationship is numerical relationship (3 flavours of cardinality) i.e. 1:1 (each entity has 1 related entity), 1:n (1 entity may have multiple related entities but entity on other side has only 1) + m:n (both entities can have many related entities)
What is an attribute?
Properties/characteristics of an entity, i.e. required/optional, simple/composite, single-valued/multi-valued, stored/derived, keys.
If an attribute is required, it cannot be blank or have a null value.
Simple vs Composite Attribute
Simple is self contained e.g. gender, whilst composite is made up of other attributes e.g. address.
Single vs Multi-Valued Attribute
Single means only 1 value e.g. DOB. Multi means multiple values e.g. skills.
Stored vs Derived Attribute
Stored are data we actually store e.g. DOB whereas derived are derived from stored e.g. age.
What is a primary key?
Attribute which uniquely identifies individual instances of an entity, quick way to find instance. Can’t be blank/null. E.g. student ID num.
What is a candidate key?
Attribute that could be a key, identified before choosing a primary key.
What is a composite key?
Combo of attributes that uniquely identifies individual instances of entity.
What does open source mean?
Software source code is provided + can be modified by anyone.
What is the difference between MySQL + MariaDB?
It’s a binary replacement with add+ features + dif licencing. MySQL more supported + MariaDB freer + more innovative.
What is a NoSQL DBMS?
‘Not Just SQL’ is increasingly pop approach to data storage, moves away from trad model of large tables where state (data consistency) closely controlled to large flexible databases focusing on availability. V. pop for large data + high availability (lots of users) applications. e.g. MongoDB.
What is ORM?
Object-Relational Mapping. Mapping of entities within software to entities within database (e.g. class in Java maps to table in database). Often done automatically – tables automatically created in database once code written.
What are the main data sources?
Social media platforms, pharmaceuticals, medicine & e-commerce (transport, banking, retail, construction, financial, insurance, energy).
What’s wrong with unstructured data?
Hard to use + hard to turn into info.
What is hierarchical structure?
Graph (tree-like structure). Each node may have 0+ children, each child has 1 parent, root note has no parent. 1st database model, created by IBM in 60s.
E.g. Windows registry stores low-level settings for Windows OS + apps using registry.
What are the pros + cons of hierarchical structure?
Pros: Clean + logical, fast to navigate, little data duplication, everything in place.
Cons: Many to Many not supported, rarely models real world.
What is set theory?
More flexible, items of data (nodes) can belong to 0+ sets, sets can contain 0+ items. Like Venn Diagram. Union (either/both), intersection (both) + set diff (\ in one, not other).
What is data mining?
Computerised process to discover patterns in data using combo of techniques from data science, machine learning, stats + DB systems. Involves pre-processing, data mining, validation & visualisation.
What is pre-processing?
Assembly of data into set + cleansing (removal of noise + fitting in missing data).