Unit 11: Behind the Scenes: Databases and Information Systems Flashcards
data warehouse
large-scale collection of data that contains and organizes in one place all the data from an organization’s multiple databases
3 sources for data warehouses
- internal sources - sales, billing, inventory, and customer databases
- external sources - vendors and suppliers
- clickstream data - captured by software on company websites that records info about each click a user makes while navigating the site
time-variant data?
data that doesn’t all pertain to one period of time
data staging?
an intermediate storage area used for data processing during the ETL process
ETL?
extract, transform, and load - the process of formatting/cleansing data so that data from different sources and of different types can commingle for analysis
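A minimal sketch of the three ETL steps in Python, included only to make the definition concrete. The two source lists, their field names, and the `warehouse` list are hypothetical stand-ins for real databases, not part of any actual ETL tool.

```python
from datetime import datetime

# Hypothetical source data: two systems that store the same kind of
# record in different formats.
BILLING_DB = [{"cust": " Alice ", "amount": "19.99", "date": "2023-01-05"}]
WEB_ORDERS = [{"customer_name": "BOB", "total_cents": 2500, "day": "01/07/2023"}]


def extract():
    """Extract: copy the raw rows out of each source."""
    return list(BILLING_DB), list(WEB_ORDERS)


def transform(billing, web):
    """Transform: cleanse each row and convert it to one shared format."""
    staged = []  # plays the role of the data staging area
    for row in billing:
        staged.append({
            "customer": row["cust"].strip().lower(),
            "amount": float(row["amount"]),
            "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        })
    for row in web:
        staged.append({
            "customer": row["customer_name"].strip().lower(),
            "amount": row["total_cents"] / 100,
            "date": datetime.strptime(row["day"], "%m/%d/%Y").date(),
        })
    return staged


def load(staged, warehouse):
    """Load: append the cleaned rows to the warehouse."""
    warehouse.extend(staged)


warehouse = []
billing, web = extract()
load(transform(billing, web), warehouse)
```

After the run, both rows share the same keys, lowercase customer names, amounts in dollars, and real date values, so they can be analyzed together.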
OLAP?
online analytical processing - software that provides standardized tools for viewing and manipulating data in a data warehouse
data mart?
a related set of data that is grouped together and separated out from the main body of data in the data warehouse
data mining?
the process by which large amounts of data are analyzed and investigated to spot significant patterns or trends
Hadoop?
an open-source platform that makes complex unstructured data easier to manage
how Hadoop stores files and processes data
- file storage - breaks data into chunks, which are then distributed across many servers to be stored
- data processing - uses MapReduce
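A toy illustration of the file-storage idea, assuming a simple round-robin placement. This is not the Hadoop API: the function names, the tiny chunk size, and the server names are hypothetical, and real Hadoop storage also keeps replicated copies of each chunk.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Break the data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def distribute(chunks: list[bytes], servers: list[str]) -> dict[str, list[bytes]]:
    """Assign chunks to servers in round-robin order (an assumed policy)."""
    placement = {server: [] for server in servers}
    for index, chunk in enumerate(chunks):
        placement[servers[index % len(servers)]].append(chunk)
    return placement


data = b"abcdefghij"
chunks = split_into_chunks(data, 4)
placement = distribute(chunks, ["server-1", "server-2"])
```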
MapReduce?
sends code to each of the servers storing the data; each server then processes its own set of data, so many processors work on the job at once
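A single-machine sketch of the MapReduce pattern, using word counting as the example task. The chunks and function names are hypothetical; in a real cluster the map step would run on the servers holding each chunk rather than in one loop.

```python
from collections import Counter
from functools import reduce

# Each string stands in for the chunk of data held by one server.
chunks = ["big data big", "data warehouse", "big warehouse data"]


def map_chunk(chunk: str) -> Counter:
    """Map: the same code runs against each chunk and counts its words."""
    return Counter(chunk.split())


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce: merge two partial results into one."""
    return left + right


partials = [map_chunk(chunk) for chunk in chunks]
totals = reduce(reduce_counts, partials, Counter())
```

Because each chunk is counted independently, the map step could run on all servers at the same time; only the small partial counts need to be combined at the end.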
6 data-mining techniques
- anomaly detection - identifies outliers
- association/affinity grouping - determines which data goes together
- classification - defines data classes to spot trends
- clustering - organizes data into smaller subgroups
- estimation/regression - assigns a value to data based on certain criteria
- visualization - presents data in a visual form
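As one concrete case from the list above, anomaly detection can be sketched with a z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. The sample data and the threshold of 2 are assumptions for illustration; real data-mining tools use a range of other methods.

```python
from statistics import mean, pstdev


def find_outliers(values: list[float], threshold: float = 2.0) -> list[float]:
    """Return values whose z-score magnitude exceeds the threshold."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - center) / spread > threshold]


daily_sales = [10, 11, 9, 10, 12, 50]  # hypothetical sample data
outliers = find_outliers(daily_sales)
```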
information system?
a software-based solution used to gather and analyze data
TPS?
transaction-processing system - an operational-level system that keeps track of everyday business transactions/activity
real-time processing?
the database is updated while the transaction is taking place
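A small sketch of real-time processing using Python's built-in `sqlite3` module: the inventory table is updated as part of the sale itself rather than in a later batch. The table, item name, and `sell` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
conn.commit()


def sell(conn: sqlite3.Connection, item: str, qty: int) -> bool:
    """Record a sale and update stock in the same transaction."""
    with conn:  # commits on success, rolls back if an error is raised
        cursor = conn.execute(
            "UPDATE inventory SET stock = stock - ? "
            "WHERE item = ? AND stock >= ?",
            (qty, item, qty),
        )
        return cursor.rowcount == 1  # False if there was not enough stock


def stock_of(conn: sqlite3.Connection, item: str) -> int:
    """Read the current stock level for an item."""
    row = conn.execute(
        "SELECT stock FROM inventory WHERE item = ?", (item,)
    ).fetchone()
    return row[0]


first_sale = sell(conn, "widget", 2)
stock_after_first = stock_of(conn, "widget")
second_sale = sell(conn, "widget", 10)
stock_after_second = stock_of(conn, "widget")
```

Since the stock level changes at the moment of the sale, the next transaction immediately sees the updated value.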