04_14 Big Data and NoSQL Flashcards
A data model that organizes data around a central entity based on the way the data will be used.
aggregate aware
A data model that does not organize data around a central entity based on the anticipated usage of the data.
aggregate ignorant
A process or set of operations in a calculation.
algorithm
A data processing method that runs data processing tasks from beginning to end without any user interaction.
batch processing
In the HDFS…
A report sent every 6 hours by the data node to the name node informing the name node which blocks are on that data node.
block report
A computer-readable format for data interchange that expands the JSON format to include additional data types including binary objects.
BSON (Binary JSON)
In a key-value database…
A logical collection of related key-value pairs.
bucket
In document databases…
A logical storage unit that contains similar documents, roughly analogous to a table in a relational database.
collection
In a column family database…
A collection of columns or super columns related to a collection of rows.
column family
A NoSQL database model that organizes data into key-value pairs, in which the value component is composed of a set of columns that vary by row.
column family database
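To make the "columns that vary by row" idea concrete, here is a minimal in-memory sketch using nested Python dictionaries. The row keys and column names are made up for illustration, and this is not the API of any real column family database.

```python
# Row key -> set of columns. Each row may carry a different set of columns.
# All keys and column names here are hypothetical.
customers = {
    "cust:1": {"name": "Ada", "email": "ada@example.com"},
    "cust:2": {"name": "Lin", "phone": "555-0100", "city": "Lima"},
}

# List which columns each row actually stores.
columns_by_row = {key: sorted(cols) for key, cols in customers.items()}
```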
A physical data storage technique in which data is stored in blocks, which hold data from a single column across many rows.
column-centric storage
A declarative query language used in Neo4j for querying a graph database.
Cypher
A NoSQL database model that stores data in key-value pairs in which the value component is composed of a tag-encoded document.
document database
In a graph database…
The representation of a relationship between nodes.
edge
Analyzing stored data to produce actionable results.
feedback loop processing
A MongoDB method to retrieve documents from a collection.
find()
A NoSQL database model based on graph theory that stores data on relationship-rich data as a collection of nodes and edges.
graph database
A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds.
Hadoop Distributed File System (HDFS)
In the HDFS…
A signal sent every 3 seconds from the data node to the name node to notify the name node that the data node is still available.
heartbeat
In a Hadoop environment…
A central program used to accept, distribute, monitor, and report on MapReduce processing jobs.
job tracker
A human-readable text format for data interchange that defines attributes and values in a document.
JSON (JavaScript Object Notation)
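As a small illustration of the JSON definition above, the following Python sketch serializes a made-up document to JSON text and parses it back. The attribute names and values are hypothetical, and only the standard `json` module is used.

```python
import json

# Hypothetical document: attribute names and values are illustrative only.
customer = {"name": "Ada", "orders": [{"sku": "A-100", "qty": 2}], "active": True}

# Serialize to human-readable JSON text; indent adds line breaks and indentation.
text = json.dumps(customer, indent=2)

# Parse the text back into Python objects.
restored = json.loads(text)
```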
A NoSQL database model that stores data as a collection of key-value pairs in which the value component is unintelligible to the DBMS.
key-value (KV) database
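The point that the value is opaque to the DBMS can be sketched with a toy in-memory bucket. The `Bucket` class and its `put` and `get` methods are assumptions made for this example, not the interface of a real key-value product.

```python
import json


class Bucket:
    """A logical collection of related key-value pairs (toy in-memory version)."""

    def __init__(self):
        self._pairs = {}

    def put(self, key: str, value: bytes) -> None:
        # The store keeps the bytes as-is and never inspects them.
        self._pairs[key] = value

    def get(self, key: str) -> bytes:
        return self._pairs[key]


carts = Bucket()
carts.put("user:42", json.dumps({"items": ["A-100"]}).encode("utf-8"))

# Only the application knows the value is JSON, so the application decodes it.
cart = json.loads(carts.get("user:42").decode("utf-8"))
```

Because the store sees only bytes, it can look a value up by key but cannot query inside it.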
The function in a MapReduce job that sorts and filters data into a set of key-value pairs as a subtask within a larger job.
map
An open-source API that provides fast data analytics services.
One of the main Big Data technologies that allows organizations to process massive data stores.
MapReduce
A program that performs a map function.
mapper
In the object-oriented data model…
A named set of instructions to perform an action.
method
Methods represent real-world actions and are invoked through messages. Also, a programmed function within an object used to manipulate the data in that same object.
A database model that attempts to provide ACID-compliant transactions across a highly distributed infrastructure.
NewSQL
In a graph database…
The representation of a single entity instance.
node
A new generation of database management systems that is not based on the traditional relational database model.
NoSQL
The coexistence of a variety of data storage and data management technologies within an organization’s infrastructure.
polyglot persistence
In MongoDB…
A method that can be chained to the find() method to improve the readability of retrieved documents through the use of line breaks and indentation.
pretty()
In a graph database…
The attributes or characteristics of a node or edge that are of interest to the users.
properties
The function in a MapReduce job that collects and summarizes the results of map functions to produce a single result.
reduce
A program that performs a reduce function.
reducer
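The map and reduce roles defined above can be sketched with a word count that runs in a single Python process. This is a toy under stated assumptions: the function names `map_words` and `reduce_counts` are hypothetical, and the grouping loop stands in for the distribution work a real framework such as Hadoop would do across nodes.

```python
from collections import defaultdict


def map_words(line):
    """Map: turn one input line into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(key, values):
    """Reduce: summarize all values collected for one key."""
    return key, sum(values)


lines = ["big data big ideas", "data stores"]

# Group the mapper output by key (the step a framework would handle).
grouped = defaultdict(list)
for line in lines:
    for key, value in map_words(line):
        grouped[key].append(value)

# Run the reducer once per key and collect the summarized results.
counts = dict(reduce_counts(k, v) for k, v in grouped.items())
```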
A physical data storage technique in which data is stored in blocks, which hold data from all columns of a given set of rows.
row-centric storage
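The difference between row-centric and column-centric storage can be sketched by laying out the same three records both ways. The table contents and the block size of two rows are invented for the example; real storage engines work with disk blocks, not Python lists.

```python
rows = [
    {"id": 1, "city": "Oslo", "sales": 120},
    {"id": 2, "city": "Lima", "sales": 80},
    {"id": 3, "city": "Oslo", "sales": 50},
]

# Row-centric: each block holds all columns for a set of rows.
row_blocks = [rows[0:2], rows[2:3]]

# Column-centric: each block holds one column's values across many rows.
column_blocks = {name: [r[name] for r in rows] for name in rows[0]}

# An aggregate over one column reads a single column block.
total_sales = sum(column_blocks["sales"])

# Fetching one whole record is a single lookup in a row block.
first_record = row_blocks[0][0]
```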
A method for dealing with data growth that involves distributing data storage across a cluster of commodity servers.
scaling out
A method for dealing with data growth that involves migrating the same structure to more powerful systems.
scaling up
A method of text analysis that attempts to determine if a statement conveys a positive, negative, or neutral attitude.
sentiment analysis
The processing of data inputs in order to make decisions about which data to keep and which data to discard before storage.
stream processing
Data that conforms to a predefined data model and has been formatted to facilitate storage, use, and information generation.
structured data
In a column family database…
A column that is composed of a group of other related columns.
super column
A program in the MapReduce framework responsible for running map and reduce tasks on a node.
task tracker
A query in a graph database.
traversal
Data that exists in its original, raw state, that is, in the format in which it was collected, and that does not conform to a predefined data model.
unstructured data
The degree to which data can be analyzed to provide meaningful insights.
value
A characteristic of Big Data that describes the speed at which data enters the system and must be processed.
velocity
The trustworthiness of a set of data.
veracity
The ability to graphically present data in such a way as to make it understandable to users.
visualization
A characteristic of Big Data that describes the quantity of data to be stored.
volume