database management Flashcards
criteria for data integrity
relevance (provides meaningful info)
accuracy (degree to which data measures against true value)
correctness (conforms to approved standard or agrees with logic)
currency (up to date)
completeness (all required essential data is known)
corruption of data
errors in computer data that occur during writing, reading, storage, transmission or processing and that introduce unintended changes to the original data
hard drive failure
failure of hard drive that stores data results in loss of data or data corruption
RAID 1 (mirroring) + RAID 5 (striping with parity): provide one or more redundant hard drives in case of failure
backup: used to recover a previous version of data
audit trails: identify changes to data since the corruption (if stored in a place separate from the faulty drive)
human error
data accidentally deleted or updated incorrectly
minimise edit privileges: only a few trusted, experienced users have edit privileges
audit trail: record changes made, who made them and when
data validation: reject data that does not comply with validation rules
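The audit-trail idea above can be sketched in a few lines of Python. This is a minimal illustration, not a real DBMS feature: the function and field names are invented, and each change is simply appended to a log with what changed, who changed it, and when.

```python
import datetime

# Minimal audit trail sketch (function and field names are invented):
# every change is recorded with what changed, who changed it, and when.
audit_log = []

def update_record(user, field, old, new):
    audit_log.append({
        "user": user,
        "field": field,
        "old": old,
        "new": new,
        "when": datetime.datetime.now().isoformat(),
    })

update_record("admin", "surname", "Smith", "Smyth")
print(audit_log[0]["user"])   # admin
```

If a record is later found to be wrong, the log shows who made the change and what the previous value was, so the mistake can be traced and reversed.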
power outages
processes such as saving data and stopping applications and services are not executed, causing data loss/corruption
UPS (uninterruptible power supply): protects the system by providing emergency power when the main power source fails
redundant power supply: having multiple power supplies so that if one fails the second takes over and provides power (hot swappable)
malware
data corrupted by viruses or ransomware, or users unintentionally download malware
firewall : prevent malware from entering
anti malware: detect malware + take action
education: educate users on the network about social engineering
sql injection
running maliciously crafted SQL queries to gain access to confidential data or destroy tables
data validation: validate user input to check for possible SQL injection
GUI components: restrict data input
limit website access to database: remove INSERT, UPDATE, DELETE rights from the website's database account
don't send DB error messages to the client's web browser: attackers can use them to understand the DB structure and adapt their SQL injection
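A common complement to the measures above is the parameterised query. This is a small sketch using Python's built-in sqlite3 module; the table and data are invented for illustration.

```python
import sqlite3

# Sketch of a parameterised query as a defence against SQL injection.
# The users table and its contents are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user(username):
    # The ? placeholder makes the driver treat the input purely as data,
    # so input like "x' OR '1'='1" cannot change the query's meaning.
    cur = conn.execute("SELECT username FROM users WHERE username = ?", (username,))
    return cur.fetchall()

print(find_user("alice"))         # [('alice',)]
print(find_user("x' OR '1'='1"))  # [] - the injection attempt matches nothing
```

Had the query been built by string concatenation instead, the second call would have returned every row in the table.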
outdated data
affects currency
add new records
change existing records
deleting records no longer needed
invalid data
can be exposed and corrected using the validation checks required by the scenario, with GUI components, exception handling and if statements applying the validation rules
file synchronisation
done through internet connectivity
having the same data on more than one device by signing in with the same account credentials
SQL
Structured query language
query language used to perform CRUD (create, read, update, delete) operations on a relational database
with data structured into tables linked by primary and foreign keys
relational/SQL databases are not flexible: the schema (tables) must be designed before data can be added
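The four CRUD operations can be shown with plain SQL statements, here run through Python's built-in sqlite3 module; the table and data are invented for illustration.

```python
import sqlite3

# Sketch of the four CRUD operations in SQL (table and data invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE learners (id INTEGER PRIMARY KEY, name TEXT)")

conn.execute("INSERT INTO learners (name) VALUES ('Thabo')")       # Create
rows = conn.execute("SELECT id, name FROM learners").fetchall()    # Read
conn.execute("UPDATE learners SET name = 'Thabo M' WHERE id = 1")  # Update
conn.execute("DELETE FROM learners WHERE id = 1")                  # Delete

print(rows)   # [(1, 'Thabo')]
```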
database schema
table design
highly structured and not flexible
column cannot be added or table deleted w/o altering schema
if deleted- may violate referential integrity
NoSQL
not only SQL
unstructured, flexible
store data in documents
ideal for storing massive amounts of unstructured data without predefined schema
can adapt to changing data
Document NoSQL DB: a container that collects documents of many different types
Key-Value NoSQL database: stores data as key-value pairs (values are often JSON documents)
OLTP
OnLine Transactional Processing
type of data processing that quickly processes large volumes of simple queries
real-time updates: immediately responds to users' requests
data warehouse
combines data from variety of sources w/in an org for the purpose of producing reports and analysis
reports created from complex queries
unlike database, does not contain current info and is not updated in real time
data denormalised
OLAP
OnLine Analytical Processing
process data warehouse data - massive volumes of data quickly analysed using OLAP to produce reports
handles much larger data volumes than OLTP, but its complex analytical queries take longer to complete than OLTP's simple ones
big data
massive volume of structured and unstructured data so large that it is difficult to process using traditional database + software techniques
gathered from many sources (eg mobile devices, sensory technologies, sensors, audit trails, pictures, videos, GPS signals, transaction records, RFID readers)
used in many areas (eg business, genomics, meteorology, biological and environmental research, complex physics simulations)
big data characteristics
volume: size of the data that needs to be analysed and processed; so large that it requires different processing techniques and cannot be handled by a single laptop or desktop
velocity: speed at which data is generated; so high that it requires distributed processing techniques
variety: range of sources; data is either structured or unstructured
veracity: quality of the data (accuracy, applicability); high veracity means many valuable records, low veracity means a large proportion of meaningless data
KDD
can I see the macros please, kilojoules! (mnemonic for the step order below)
Knowledge Discovery in Databases
process of discovering useful knowledge from a collection of data
data cleaning ( remove inconsistent data)
data integration (combine data from multiple sources)
data selection ( select data relevant to analysis task)
data transformation ( transform data into appropriate form for mining)
data mining ( apply intelligent methods to extract patterns)
pattern evaluation ( interpret patterns of interest )
knowledge presentation ( translate useful patterns into tables/graphs understandable by others)
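A few of the KDD steps above can be walked through on a toy data set. Everything here is invented for illustration: the sales records, the cleaning rule, and the frequency pattern being mined.

```python
from collections import Counter

# Toy walk-through of some KDD steps on an invented list of sales records.
raw = [
    {"item": "pen", "qty": 2},
    {"item": "pen", "qty": 1},
    {"item": "book", "qty": None},   # inconsistent record
    {"item": "book", "qty": 1},
]

cleaned = [r for r in raw if r["qty"] is not None]   # data cleaning
selected = [r["item"] for r in cleaned]              # data selection
patterns = Counter(selected)                         # data mining: frequency pattern
print(patterns.most_common(1))                       # pattern evaluation: [('pen', 2)]
```

Knowledge presentation would then turn the evaluated pattern ("pens sell most often") into a table or graph for others to read.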
data mining
set of techniques for discovering hidden, valid and potentially useful patterns/trends in a data set
attempts to find new relationships amongst data to extract useful info
uses machine learning, statistics, AI, database technology
-extract, transform and load data into the data warehouse
-store + manage data in a multidimensional DB system
-provide data access to business analysts
-analyse data with application software
-present data in a useful format
JSON Files
JavaScript Object Notation
Convenient way of sharing data
Platform independent
More organised than text file
Flexible: can be used with most programming languages
Data representation format
Commonly used for APIs and configurations
{“Key” : value} - key value pair
arrayList
Dynamic: has no fixed size; grows/shrinks as elements are added/removed
[ //array
{ //each individual object
"Name": "Mr BG", "Age": 23, "hobbies": ["weight lifting", "bowling"]
},
{
"Name": "Mr Klaus", "Age": 24, "hobbies": ["drama", "singing"]
}
]
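An array like the example above can be parsed with Python's built-in json module (note that real JSON does not allow the // comments used as annotations above):

```python
import json

# Parsing a JSON array of objects like the flashcard example above.
text = """
[
  {"Name": "Mr BG", "Age": 23, "hobbies": ["weight lifting", "bowling"]},
  {"Name": "Mr Klaus", "Age": 24, "hobbies": ["drama", "singing"]}
]
"""
people = json.loads(text)           # the array becomes a Python list of dicts
print(people[0]["Name"])            # Mr BG
print(people[1]["hobbies"])         # ['drama', 'singing']
```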
security measures to ensure data in database is properly protected
keep data up to date with regular edits and purges
refrain from printing out personal info relating to users
actions to take in data breach
POPIA requires
- users notified by email or on website that info has been accessed
- authorities notified
data persistence
data stored in permanent storage so that it is available again after the program is closed and reopened later
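Persistence can be sketched as a write followed by a later read. This is a minimal illustration: the file name and the saved data are invented, and a temporary directory stands in for permanent storage.

```python
import json
import os
import tempfile

# Sketch of data persistence: state written to permanent storage survives
# the program being closed; a later run reads it back. Names are invented.
path = os.path.join(tempfile.gettempdir(), "highscores.json")

with open(path, "w") as f:
    json.dump({"best_score": 42}, f)   # save before the program closes

with open(path) as f:                  # "reopened later": load the saved state
    restored = json.load(f)
print(restored["best_score"])   # 42
```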