WEEK 10 Flashcards
IN WHAT WAY IS A DW SUBJECT-ORIENTED?
it is organized around business subjects (e.g. sales, customers, products) rather than applications, using fact tables and dimension tables
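A small sketch of a subject-oriented "sales" star schema, using Python's built-in sqlite3 with made-up table names and rows (all hypothetical). The fact table holds the measures and foreign keys; the dimension tables describe them; the analysis query joins them.

```python
import sqlite3

# Hypothetical "sales" subject: one fact table plus two dimension tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
-- Fact table: numeric measures plus foreign keys to each dimension
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER,
    revenue    REAL
);
INSERT INTO dim_product VALUES (1, 'Latte', 'Coffee'), (2, 'Muffin', 'Bakery');
INSERT INTO dim_store   VALUES (10, 'Calgary'), (20, 'Edmonton');
INSERT INTO fact_sales  VALUES (1, 10, 3, 15.0), (2, 10, 1, 3.5), (1, 20, 2, 10.0);
""")

# Typical analytical question about the subject: revenue by category and city
rows = conn.execute("""
SELECT p.category, s.city, SUM(f.revenue)
FROM fact_sales f
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_store   s ON f.store_id   = s.store_id
GROUP BY p.category, s.city
ORDER BY p.category, s.city
""").fetchall()
print(rows)  # [('Bakery', 'Calgary', 3.5), ('Coffee', 'Calgary', 15.0), ('Coffee', 'Edmonton', 10.0)]
```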
which system is for daily operations & has a high frequency of update operations?
OLTP
where is data not updated (only loaded and read)?
OLAP
what is the source of data for a dependent data mart?
EDW
WHICH of the following can be a relational DB?
ODS, EDW, data mart, operational DB (all of them)
what is a logical data mart composed of?
views
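A minimal sketch of a logical data mart as a view, again with sqlite3 and a hypothetical EDW table. Because the mart is only a view, no data is copied, new EDW rows show up immediately, and dropping the view when the project ends leaves the EDW untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical EDW table of reconciled data for every department
CREATE TABLE edw_sales (region TEXT, department TEXT, revenue REAL);
INSERT INTO edw_sales VALUES
  ('West', 'Marketing', 100.0),
  ('West', 'Finance',    40.0),
  ('East', 'Marketing',  60.0);

-- Logical data mart: a view over the EDW, so no data is copied
CREATE VIEW marketing_mart AS
  SELECT region, SUM(revenue) AS revenue
  FROM edw_sales
  WHERE department = 'Marketing'
  GROUP BY region;
""")

query = "SELECT region, revenue FROM marketing_mart ORDER BY region"
before = conn.execute(query).fetchall()   # [('East', 60.0), ('West', 100.0)]

# New EDW rows appear in the mart immediately, because the view is re-evaluated
conn.execute("INSERT INTO edw_sales VALUES ('East', 'Marketing', 15.0)")
after = conn.execute(query).fetchall()    # [('East', 75.0), ('West', 100.0)]

# When the project is over, dropping the view leaves the EDW data untouched
conn.execute("DROP VIEW marketing_mart")
remaining = conn.execute("SELECT COUNT(*) FROM edw_sales").fetchone()[0]
print(before, after, remaining)
```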
which of the following is typically created for a specific project and can be removed if no longer needed?
data mart
what is data called once it is cleaned and loaded into the EDW?
RECONCILED DATA
characteristics of big data: 5 V’s
volume: much larger quantity of data than is typical for a relational DB
variety: lots of different data types and formats
velocity: data arrives at a very fast rate! (mobile sensors, web clickstreams)
veracity: traditional data quality methods don’t apply, so how do we judge accuracy and relevance?
value: big data is meaningless if it does not provide value toward some meaningful goal
schema on read vs schema on write
schema on read (USING DATA FOR BANA)
- data model is determined later, depending on how you want to use the data
- capture and store the data first, and worry about how you want to use it later
- DATA MARTS ARE LIKE THIS! you create the schema when you are doing a project
schema on write (STORING DATA)
- preexisting data model (defined before any data is stored)
- this is how traditional DBs are designed (relational DB)
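A small sketch of schema on write, with sqlite3 and a hypothetical customer table: the table is defined first, and a row that carries a field the schema does not know about is rejected at write time. (SQLite is loose about column types, so this only demonstrates the column list being enforced.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema on write: the structure is defined BEFORE any data is stored
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")

# Data that fits the predefined schema is accepted
conn.execute("INSERT INTO customer (id, name, city) VALUES (1, 'Ana', 'Calgary')")

# Data with a field the schema does not know about is rejected at write time
try:
    conn.execute("INSERT INTO customer (id, name, twitter) VALUES (2, 'Ben', '@ben')")
    rejected = False
except sqlite3.OperationalError as err:
    rejected = True
    print("rejected:", err)

stored = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(stored)  # 1
```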
data lake
a large integrated repository for internal and external data that does not follow a predefined schema
capture everything, dive in anywhere, flexible access, use AI to pull the data that you need
trad database design: schema on write
gather requirements and structure > formal data model > database schema > DB use based on the predefined schema
big data approach: schema on read
collect large amounts of data with locally defined structures (JSON/XML…) > store data in the lake > analyze stored data to identify ways to structure it > structure or organize data during the analysis process
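A toy sketch of the schema-on-read flow: a hypothetical "lake" keeps raw JSON records exactly as they arrived, and each analysis imposes its own structure only when it reads them. The records and function names are made up for illustration.

```python
import json

# Hypothetical lake: raw records stored as-is, no schema enforced on the way in
lake = [
    '{"user": "ana", "page": "/home", "ms": 120}',
    '{"user": "ben", "clicks": [{"page": "/cart"}, {"page": "/pay"}]}',
    '{"sensor": "s1", "temp_c": 21.5}',
]

def page_views(raw_lines):
    """Structure chosen at read time for one question: (user, page) pairs."""
    views = []
    for line in raw_lines:
        rec = json.loads(line)
        if "page" in rec:
            views.append((rec["user"], rec["page"]))
        for click in rec.get("clicks", []):
            views.append((rec["user"], click["page"]))
    return views  # records that don't fit this question are simply skipped

def sensor_temps(raw_lines):
    """A different question imposes a different structure on the same raw data."""
    records = (json.loads(line) for line in raw_lines)
    return {r["sensor"]: r["temp_c"] for r in records if "temp_c" in r}

print(page_views(lake))    # [('ana', '/home'), ('ben', '/cart'), ('ben', '/pay')]
print(sensor_temps(lake))  # {'s1': 21.5}
```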
NoSQL
Not Only SQL
what does NoSQL mean?
a category of recently introduced data storage and retrieval technologies not based on the relational model
SCALING OUT (adding more servers) rather than SCALING UP (buying a bigger server)
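One common way to scale out is to spread keys over more machines by hashing them. The sketch below is a toy illustration with hypothetical node names, not how any particular NoSQL product partitions its data.

```python
import hashlib

def node_for(key, nodes):
    # Stable hash (Python's built-in hash() of a str changes between runs)
    digest = hashlib.sha256(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

def distribute(keys, nodes):
    """Assign every key to exactly one node."""
    placement = {n: [] for n in nodes}
    for k in keys:
        placement[node_for(k, nodes)].append(k)
    return placement

keys = [f"user:{i}" for i in range(1000)]
nodes3 = ["node-a", "node-b", "node-c"]
nodes4 = nodes3 + ["node-d"]          # scaling out = adding another machine

three = distribute(keys, nodes3)
four = distribute(keys, nodes4)
print({n: len(ks) for n, ks in three.items()})
print({n: len(ks) for n, ks in four.items()})  # each node now holds a smaller share

moved = sum(1 for k in keys if node_for(k, nodes3) != node_for(k, nodes4))
print("keys that changed node:", moved)
```

With plain modulo hashing, most keys change node when a machine is added; systems such as Cassandra use consistent hashing to limit that reshuffling.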
NOSQL CHARACTERISTICS
NATURAL FOR cloud environment (scaling out)
supports schema on read (big data is happy)
largely open source
generally not ACID compliant (atomicity, consistency, isolation and durability: the guarantees of traditional transaction processing)
BASE: basically available, soft state, eventually consistent
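A toy model of what BASE means in practice, assuming a store with three replicas where a write lands on one replica first and the others catch up later. The class and method names are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

class EventuallyConsistentStore:
    """Toy model: writes hit one replica now, the other replicas catch up later."""

    def __init__(self, n=3):
        self.replicas = [Replica(f"r{i}") for i in range(n)]
        self.pending = []  # replication log not yet applied to the other replicas

    def write(self, key, value):
        self.replicas[0].data[key] = value        # basically available: accept the write at once
        for r in self.replicas[1:]:
            self.pending.append((r, key, value))  # soft state: other replicas are temporarily stale

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        for r, key, value in self.pending:        # eventually consistent: replication catches up
            r.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart:ana", ["latte"])
fresh, stale = store.read("cart:ana", 0), store.read("cart:ana", 2)
print(fresh, stale)               # ['latte'] None  (replica r2 has not caught up yet)
store.sync()
print(store.read("cart:ana", 2))  # ['latte']  (consistent after replication)
```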
NoSQL: key value stores
a simple pair: a key and an associated collection of values
key is usually a string! the DB has no knowledge of the structure or meaning of the values
REDIS
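A minimal in-memory sketch of the key-value idea (a hypothetical class, not the Redis client API). The store only sees opaque bytes; encoding and decoding the value is the application's job. Redis exposes the same idea through commands such as SET, GET and DEL.

```python
import json
from typing import Dict, Optional

class KeyValueStore:
    """Minimal in-memory key-value store: values are opaque bytes to the store."""

    def __init__(self):
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


kv = KeyValueStore()
# The application chooses how to encode the value; the store never looks inside it
kv.put("session:42", json.dumps({"user": "ana", "cart": ["latte"]}).encode())
session = json.loads(kv.get("session:42"))  # decoding is the application's job
print(session["cart"])                      # ['latte']
kv.delete("session:42")
print(kv.get("session:42"))                 # None
```

Because the store cannot see inside values, a question like "which sessions have a latte in the cart?" would mean fetching and decoding every value.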
Document stores NOSQL
like a key-value store, but a document goes further than a value.
the doc is structured, so specific elements can be manipulated separately
MONGODB
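A sketch of why structure matters in a document store: one nested element can be addressed and updated without rewriting the whole value. The `set_path` helper is hypothetical; it only mimics the spirit of MongoDB's dot notation (e.g. `{"$set": {"customer.city": "Edmonton"}}`).

```python
import copy

# A document: structured, so the store can reach inside it (unlike a key-value value)
order = {
    "_id": "o1001",
    "customer": {"name": "Ana", "city": "Calgary"},
    "items": [
        {"sku": "LATTE", "qty": 2},
        {"sku": "MUFFIN", "qty": 1},
    ],
}

def set_path(doc, path, value):
    """Update one nested element addressed by a dotted path, e.g. 'items.1.qty'."""
    *parents, last = path.split(".")
    target = doc
    for part in parents:
        # Numeric path parts index into lists, other parts into dicts
        target = target[int(part)] if isinstance(target, list) else target[part]
    if isinstance(target, list):
        target[int(last)] = value
    else:
        target[last] = value

updated = copy.deepcopy(order)          # keep the original for comparison
set_path(updated, "customer.city", "Edmonton")
set_path(updated, "items.1.qty", 3)
print(updated["customer"], updated["items"][1])
```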
Wide column stores NOSQL
rows and columns
distribution of data based on both key values (records) and columns, using column groups/families; the key is two-dimensional (row key + column)
APACHE CASSANDRA
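A simplified model of the wide-column idea: a value is addressed by a two-dimensional key (row key, "family:column"), and different rows may have different columns. This is a hypothetical toy class; Cassandra's real data model (partition keys, clustering columns, CQL) is richer than this.

```python
from collections import defaultdict

class WideColumnTable:
    """Toy wide-column model: value addressed by (row key, 'family:column')."""

    def __init__(self, families):
        self.families = set(families)
        # row key -> {"family:column": value}; rows may have different columns
        self.rows = defaultdict(dict)

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        self.rows[row_key][f"{family}:{column}"] = value

    def get(self, row_key, family, column):
        # The key is two-dimensional: row key AND column
        return self.rows.get(row_key, {}).get(f"{family}:{column}")

    def family(self, row_key, family):
        """Read one column family of a row (families are typically stored together)."""
        prefix = family + ":"
        return {c[len(prefix):]: v
                for c, v in self.rows.get(row_key, {}).items() if c.startswith(prefix)}


t = WideColumnTable(["profile", "activity"])
t.put("user:ana", "profile", "city", "Calgary")
t.put("user:ana", "activity", "last_page", "/cart")
t.put("user:ben", "profile", "email", "ben@example.com")  # different columns per row
print(t.get("user:ana", "profile", "city"))  # Calgary
print(t.family("user:ana", "profile"))       # {'city': 'Calgary'}
```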
Graph-oriented DB NOSQL
maintains info regarding the relations between data items! nodes with properties
connections between nodes (relationships) can also have properties
Neo4j
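A plain-Python sketch of the property-graph idea, with hypothetical people and a company: both nodes and relationships carry properties, and queries follow relationships instead of joining tables.

```python
# Nodes and relationships both carry properties
nodes = {
    "ana":  {"label": "Person",  "name": "Ana"},
    "ben":  {"label": "Person",  "name": "Ben"},
    "acme": {"label": "Company", "name": "Acme"},
}
edges = [  # (from, relationship type, to, relationship properties)
    ("ana", "KNOWS", "ben", {"since": 2019}),
    ("ana", "WORKS_AT", "acme", {"role": "analyst"}),
    ("ben", "WORKS_AT", "acme", {"role": "developer"}),
]

def neighbours(node_id, rel_type):
    """Follow outgoing relationships of one type, returning (target, edge props)."""
    return [(dst, props) for src, rel, dst, props in edges
            if src == node_id and rel == rel_type]

def coworkers(person):
    """People who work at any company this person works at."""
    companies = {c for c, _ in neighbours(person, "WORKS_AT")}
    return sorted(src for src, rel, dst, _ in edges
                  if rel == "WORKS_AT" and dst in companies and src != person)

print(neighbours("ana", "KNOWS"))  # [('ben', {'since': 2019})]
print(coworkers("ana"))            # ['ben']
```

In Neo4j the coworker question would be written roughly as `MATCH (a:Person {name: 'Ana'})-[:WORKS_AT]->(c:Company)<-[:WORKS_AT]-(b:Person) RETURN b.name`.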
mongo db
NOSQL
document-store db
BSON based storage (binary json)
collections: equivalent to tables in a relational DB; a set of docs intended to be stored together
documents: equivalent to rows in a relational DB; docs do not need to have the same structure (unlike rows); the _id property uniquely identifies a document
relationships: the _id property serves as the primary key; another doc can hold a foreign key as another JSON property
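A plain-Python sketch of the _id / foreign-key idea, with lists of dicts standing in for two hypothetical collections (no MongoDB server or driver involved). Each order stores its customer's _id, and the "join" is done by looking that _id up.

```python
# Two hypothetical "collections", each a list of documents
customers = [
    {"_id": 1, "name": "Ana", "city": "Calgary"},
    {"_id": 2, "name": "Ben", "city": "Edmonton"},
]
orders = [
    {"_id": 101, "customer_id": 1, "total": 15.0},  # customer_id plays the foreign-key role
    {"_id": 102, "customer_id": 1, "total": 3.5},
    {"_id": 103, "customer_id": 2, "total": 10.0},
]

def find_one(collection, _id):
    """Look a document up by its primary key (_id); None if not found."""
    return next((doc for doc in collection if doc["_id"] == _id), None)

def orders_with_customer():
    """Manual 'join': resolve each order's customer_id to the customer document."""
    return [
        {"order": o["_id"], "customer": find_one(customers, o["customer_id"])["name"], "total": o["total"]}
        for o in orders
    ]

for row in orders_with_customer():
    print(row)
```

MongoDB can also do this kind of lookup on the server with the `$lookup` aggregation stage.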
mongo db: what is a table and what is a row?
collection=table
document =row
ask if mongo db will be on the final; if yes, then watch from 2:30 onwards:
https://ucalgary.zoom.us/rec/play/w-ozhS2JaZaj3_CT0YRHmU70ROoJCYw8aSQNwS5lEft-EiBnhPEhojS-Uk9cxqV5BkH7ZijL-hmJRIDI._rRt0Z1vyrQzztMe
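what is a cluster in mongo db?
the cluster is the server (in Atlas, usually a small group of servers holding copies of the same data) that hosts your databases
WHAT ARE JSON FILES?
JSON (JavaScript Object Notation): a plain-text format that writes data as "key": value pairs inside {}, with lists inside []; mongo db stores documents as BSON, a binary form of JSON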
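A quick look at JSON using Python's standard json module, with a made-up customer record: curly braces hold "key": value pairs separated by commas, square brackets hold lists, and objects can nest.

```python
import json

text = '{"_id": 1, "name": "Ana", "tags": ["vip", "coffee"], "address": {"city": "Calgary"}}'

doc = json.loads(text)            # JSON text -> Python dict
print(doc["address"]["city"])     # Calgary
print(json.dumps(doc, indent=2))  # dict -> JSON text again, pretty-printed
```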
Big data
data that exist in large volumes
many varieties (data types)
high velocity (processed at very high speed)
5 V’s of big data
volume: HIGH volume
variety: diff data types and formats
velocity: very fast rate
veracity: trad data quality methods don’t apply
value: meaningless if it does not provide value
mongo db atlas (MongoDB's cloud-hosted database service)
BASICALLY mongo db is made to handle big data
MONGO DB DATABASES ARE MADE OF COLLECTIONS. (tables)
-> collections are made of documents (rows), identified by {}
-> fields are separated by ,
why are mongo db collections cool?
because there is no requirement that each doc (row) in a collection have the same fields
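A sketch of a flexible collection, using a hypothetical product list: three documents, three different sets of fields. The `find` helper is made up for illustration; like an equality query in MongoDB, a document that lacks the field simply does not match.

```python
# One collection, three documents, three different sets of fields
products = [
    {"_id": 1, "name": "Latte", "price": 5.0, "sizes": ["S", "M", "L"]},
    {"_id": 2, "name": "Muffin", "price": 3.5, "gluten_free": True},
    {"_id": 3, "name": "Gift card"},  # no price at all, and that is allowed
]

def find(collection, **criteria):
    """Return docs whose fields equal the criteria; docs missing a field don't match."""
    return [d for d in collection
            if all(k in d and d[k] == v for k, v in criteria.items())]

all_fields = sorted({k for d in products for k in d})
print(all_fields)                                             # ['_id', 'gluten_free', 'name', 'price', 'sizes']
print([d["name"] for d in find(products, gluten_free=True)])  # ['Muffin']
```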