04_14 Big Data and NoSQL Flashcards by D Styx

A data model that organizes data around a central entity based on the way the data will be used.

aggregate aware

A data model that does not organize data around a central entity based on the anticipated usage of the data.

aggregate ignorant

A process or set of operations in a calculation.

algorithm

A data processing method that runs data processing tasks from beginning to end without any user interaction.

batch processing

In the HDFS…

A report sent every 6 hours by the data node to the name node informing the name node which blocks are on that data node.

block report

A computer-readable format for data interchange that expands the JSON format to include additional data types including binary objects.

BSON (Binary JSON)

In a key-value database…

A logical collection of related key-value pairs.

bucket

In document databases…

A logical storage unit that contains similar documents, roughly analogous to a table in a relational database.

collection

In a column family database…

A collection of columns or super columns related to a collection of rows.

column family

A NoSQL database model that organizes data into key-value pairs, in which the value component is composed of a set of columns that vary by row.

column family database

A physical data storage technique in which data is stored in blocks, which hold data from a single column across many rows.

column-centric storage

A declarative query language used in Neo4j for querying a graph database.

Cypher

A NoSQL database model that stores data in key-value pairs in which the value component is composed of a tag-encoded document.

document database

In a graph database…

The representation of a relationship between nodes.

edge

Analyzing stored data to produce actionable results.

feedback loop processing

A MongoDB method to retrieve documents from a collection.

find()

A NoSQL database model based on graph theory that stores relationship-rich data as a collection of nodes and edges.

graph database

A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds.

Hadoop Distributed File System (HDFS)

In the HDFS…

A signal sent every 3 seconds from the data node to the name node to notify the name node that the data node is still available.

heartbeat

In a Hadoop environment…

A central program used to accept, distribute, monitor, and report on MapReduce processing jobs.

job tracker

A human-readable text format for data interchange that defines attributes and values in a document.

JSON (JavaScript Object Notation)
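
As a minimal sketch of what "attributes and values in a document" looks like in practice, the Python snippet below serializes a small record to JSON text and parses it back. The customer record and its field names are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical customer document; the field names are illustrative only.
customer = {
    "name": "Ada Lopez",
    "active": True,
    "orders": [{"sku": "A-100", "qty": 2}, {"sku": "B-205", "qty": 1}],
}

# Serialize to human-readable JSON text (attribute: value pairs).
text = json.dumps(customer, indent=2)

# Parse the text back into Python objects.
restored = json.loads(text)

print(text)
```

The `indent=2` argument only affects readability; the parsed result is the same with or without it.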

A NoSQL database model that stores data as a collection of key-value pairs in which the value component is unintelligible to the DBMS.

key-value (KV) database
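
The idea that the value is unintelligible to the DBMS can be sketched with a toy in-memory store. The `Bucket` class and its `put`/`get` methods are hypothetical, not the API of any real key-value product; the point is that the store handles only opaque bytes and the application does all encoding and decoding.

```python
import json


class Bucket:
    """Toy bucket: a logical collection of related key-value pairs."""

    def __init__(self, name):
        self.name = name
        self._pairs = {}

    def put(self, key, value):
        # The store accepts only opaque bytes; it never looks inside them.
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes")
        self._pairs[key] = value

    def get(self, key):
        return self._pairs[key]


# The application, not the store, decides how values are encoded.
profile = {"name": "Ada", "tier": "gold"}
customers = Bucket("customers")
customers.put("cust:1001", json.dumps(profile).encode("utf-8"))

raw = customers.get("cust:1001")                # opaque bytes from the store
decoded = json.loads(raw.decode("utf-8"))       # meaning restored by the app
```

Because the store cannot interpret the value, lookups in this sketch are possible only by key, which matches the definition above.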

The function in a MapReduce job that sorts and filters data into a set of key-value pairs as a subtask within a larger job.

map

An open-source API that provides fast data analytics services; one of the main Big Data technologies that allows organizations to process massive data stores.

MapReduce

A program that performs a map function.

mapper

In the object-oriented data model…

A named set of instructions to perform an action.

method

Footnote: Methods represent real-world actions and are invoked through messages. Also, a programmed function within an object used to manipulate the data in that same object.

A database model that attempts to provide ACID-compliant transactions across a highly distributed infrastructure.

NewSQL

In a graph database…

The representation of a single entity instance.

node

A new generation of database management systems that is not based on the traditional relational database model.

NoSQL

The coexistence of a variety of data storage and data management technologies within an organization's infrastructure.

polyglot persistence

In MongoDB…

A method that can be chained to the find() method to improve the readability of retrieved documents through the use of line breaks and indentation.

pretty()

In a graph database…

The attributes or characteristics of a node or edge that are of interest to the users.

properties

The function in a MapReduce job that collects and summarizes the results of map functions to produce a single result.

reduce

A program that performs a reduce function.

reducer
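
The map, reduce, mapper, and reducer cards can be tied together with a single-process word count. This is only a sketch of the idea in plain Python, not Hadoop's API: the `shuffle` helper stands in for the grouping step a real framework performs between the two phases, and the input lines are made up.

```python
from collections import defaultdict


def mapper(line):
    """Map: emit a (word, 1) key-value pair for each word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key (stand-in for the framework's shuffle step)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce: summarize all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big clusters", "data nodes"]

# Each line is an independent map subtask.
mapped = [pair for line in lines for pair in mapper(line)]

# Each key is reduced independently of the others.
counts = dict(reducer(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

In a real cluster the mapper calls would run on different nodes close to the data; here they run in one loop purely to show the data flow.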

A physical data storage technique in which data is stored in blocks, which hold data from all columns of a given set of rows.

row-centric storage
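
The difference between row-centric and column-centric storage can be sketched with plain Python lists standing in for storage blocks. The table contents and the block size of two rows are assumptions made for illustration.

```python
# Hypothetical three-row table.
rows = [
    {"id": 1, "name": "Ann", "balance": 120.0},
    {"id": 2, "name": "Raj", "balance": 75.5},
    {"id": 3, "name": "Mei", "balance": 310.25},
]

# Row-centric: each block holds all columns for a given set of rows.
row_blocks = [rows[0:2], rows[2:3]]

# Column-centric: each block holds one column across many rows.
column_blocks = {column: [row[column] for row in rows] for column in rows[0]}

# Aggregating one column touches a single column block.
total = sum(column_blocks["balance"])

# Fetching one complete record touches a single row block.
record = next(row for block in row_blocks for row in block if row["id"] == 2)

print(total, record)
```

The sketch suggests why column-centric layouts tend to suit analytic queries over a few columns, while row-centric layouts tend to suit reading or writing whole records.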

A method for dealing with data growth that involves distributing data storage across a cluster of commodity servers.

scaling out

A method for dealing with data growth that involves migrating the same structure to more powerful systems.

scaling up

A method of text analysis that attempts to determine if a statement conveys a positive, negative, or neutral attitude.

sentiment analysis
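
A deliberately simple sketch of sentiment analysis is a word-list scorer. The two word lists below are tiny, made-up lexicons; real systems use far larger lexicons or trained models, so this only illustrates the positive, negative, or neutral outcome described in the card.

```python
# Hypothetical, tiny lexicons for illustration only.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"slow", "broken", "hate"}


def sentiment(statement):
    """Classify a statement by counting positive and negative words."""
    words = [word.strip(".,!?").lower() for word in statement.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this fast database!"))
print(sentiment("The cluster is slow and broken."))
print(sentiment("The job finished."))
```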

The processing of data inputs in order to make decisions about which data to keep and which data to discard before storage.

stream processing
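
The keep-or-discard decision in stream processing can be sketched as a filter applied to each input before anything is stored. The sensor readings and the plausible range used by `keep` are assumptions for illustration.

```python
def sensor_stream():
    """Hypothetical input stream of (sensor_id, reading) pairs."""
    yield from [("s1", 20.5), ("s2", -999.0), ("s1", 21.0), ("s3", 150.0)]


def keep(reading, low=-50.0, high=60.0):
    """Decide, per input, whether a reading is worth storing."""
    return low <= reading <= high


stored = []
for sensor_id, reading in sensor_stream():
    # The decision is made as data arrives, before storage.
    if keep(reading):
        stored.append((sensor_id, reading))

print(stored)
```

Only the readings that pass the filter ever reach `stored`; the rest are dropped without being written anywhere.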

Data that conforms to a predefined data model and has been formatted to facilitate storage, use, and information generation.

structured data

In a column family database…

A column that is composed of a group of other related columns.

super column

A program in the MapReduce framework responsible for running map and reduce tasks on a node.

task tracker

A query in a graph database.

traversal
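
The node, edge, properties, and traversal cards can be illustrated with a tiny in-memory graph. This is a plain-Python sketch, not Cypher and not any graph database's API; the people, the `FOLLOWS` relationship, and the `traverse` helper are all hypothetical.

```python
from collections import deque

# Nodes: one entry per entity instance, each with its properties.
nodes = {
    "alice": {"label": "Person", "age": 34},
    "bob": {"label": "Person", "age": 29},
    "carol": {"label": "Person", "age": 41},
    "dave": {"label": "Person", "age": 52},
}

# Edges: (source, relationship type, target, edge properties).
edges = [
    ("alice", "FOLLOWS", "bob", {"since": 2019}),
    ("bob", "FOLLOWS", "carol", {"since": 2021}),
    ("carol", "FOLLOWS", "dave", {"since": 2022}),
]


def traverse(start, rel_type, max_depth):
    """Breadth-first traversal along one edge type, up to max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for source, rel, target, _props in edges:
            if source == node and rel == rel_type and target not in seen:
                seen.add(target)
                reached.append(target)
                queue.append((target, depth + 1))
    return reached


print(traverse("alice", "FOLLOWS", 2))
```

Limiting the depth to two hops reaches bob and carol but not dave, which is three hops from alice.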

Data that exists in its original, raw state; that is, in the format in which it was collected. It does not conform to a predefined data model.

unstructured data

The degree to which data can be analyzed to provide meaningful insights.

value

A characteristic of Big Data that describes the speed at which data enters the system and must be processed.

velocity

The trustworthiness of a set of data.

veracity

The ability to graphically present data in such a way as to make it understandable to users.

visualization

A characteristic of Big Data that describes the quantity of data to be stored.

volume

04_14 Big Data and NoSQL Flashcards

(49 cards)