Lesson 6.3 Advanced Analytics Flashcards
What is the “trough of disillusionment”?
The phase of the hype cycle in which our expectations are not met by reality
This is a centralized repository that allows you to store all your structured and unstructured data in its natural state and in its entirety.
data lake
This is an application of artificial intelligence (AI) that provides systems with the ability to automatically learn and improve from experience without being explicitly programmed.
Machine learning
This is anything that happens at a clearly defined time and that can be specifically recorded
event
These usually include data about the type of activity, when the activity occurred, and its location and cause
event objects
This is a constant, continuous flow of event objects that move into and around companies from thousands of connected devices, Medical Internet of Things devices, and other sensors
stream
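As a rough illustration of how event objects and a stream relate, the sketch below models each event object with the fields listed above (type of activity, time, location, cause) and filters a stream one event at a time as it arrives. The `Event` class, the `high_heart_rate_alerts` function, the activity labels, and the sample events are all hypothetical, not part of any particular streaming product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator


@dataclass
class Event:
    """Hypothetical event object carrying the fields described on the card."""
    activity: str        # type of activity
    timestamp: datetime  # when the activity occurred
    location: str        # where it occurred
    cause: str           # what caused it


def high_heart_rate_alerts(stream: Iterable[Event]) -> Iterator[Event]:
    """Inspect each event as it arrives and pass on only the ones of interest."""
    for event in stream:
        if event.activity == "heart_rate_high":  # hypothetical activity label
            yield event


# A small list standing in for a continuous feed from connected devices
incoming = [
    Event("heart_rate_high", datetime(2024, 1, 1, 8, 0), "Room 12", "tachycardia"),
    Event("door_opened", datetime(2024, 1, 1, 8, 1), "Room 12", "visitor"),
    Event("heart_rate_high", datetime(2024, 1, 1, 8, 5), "Room 7", "exercise"),
]

for alert in high_heart_rate_alerts(incoming):
    print(alert.timestamp, alert.location, alert.cause)
```

Because the filter is a generator, it handles an unbounded source the same way it handles this finite list: each event is examined once and then discarded.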
This is the final act of analyzing all of this data
processing
This is a step-by-step set of instructions for carrying out a process or solving a problem
algorithm
This is the identification of data in a data set that does not match an expected or projected pattern
anomaly detection
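One simple way to flag anomalies is a z-score rule: a value is treated as anomalous when it lies more than a chosen number of standard deviations from the mean. The sketch below assumes that rule, a threshold of 2, and made-up heart-rate readings; practical anomaly detection often uses more robust methods.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical resting heart-rate readings (beats per minute)
readings = [72, 75, 71, 74, 73, 70, 76, 140, 72, 74]
print(find_anomalies(readings))  # → [140]
```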
This is the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages
artificial intelligence
This is identifying similar data in a data set and grouping it together to understand the similarities and differences within the data set
clustering analysis
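A common clustering algorithm is k-means, which repeatedly assigns each point to its nearest cluster center and then moves each center to the mean of its points. The one-dimensional sketch below assumes k-means, two clusters, and hypothetical fasting glucose values; it is meant only to show the grouping idea.

```python
def kmeans_1d(values, k, iterations=100):
    """Group 1-D values into k clusters with a basic k-means loop (assumes 1 <= k <= len(values))."""
    data = sorted(values)
    # Start with k centroids spread evenly through the sorted data
    step = len(data) // k
    centroids = [data[i * step] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster with the nearest centroid
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty)
        new_centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return clusters


# Hypothetical fasting glucose readings (mg/dL)
glucose = [85, 90, 88, 92, 150, 160, 155, 87]
print(kmeans_1d(glucose, k=2))  # → [[85, 87, 88, 90, 92], [150, 155, 160]]
```

The two resulting groups show both the similarity within each cluster and the difference between them, which is the point of clustering analysis.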
This is an analysis of data to determine whether a positive or negative relationship exists between variables
correlation analysis
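Correlation is often quantified with the Pearson correlation coefficient r, which ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship), with values near 0 indicating little linear relationship. A minimal sketch, assuming Pearson's r is the measure wanted and using made-up exercise and heart-rate figures:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two spread terms
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: weekly hours of exercise vs. resting heart rate (bpm)
exercise_hours = [0, 1, 2, 3, 4]
resting_hr = [80, 77, 74, 71, 68]
r = pearson_r(exercise_hours, resting_hr)
print(f"r = {r:.2f}")  # → r = -1.00 (a negative relationship in this made-up data)
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.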
This is a subset of machine learning that uses hierarchical layers of artificial neural networks to carry out the process of machine learning
deep learning
This is a process in which data is extracted from a source, then transformed and loaded into a data warehouse
extract, transform, and load
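The three ETL steps can be sketched with only the Python standard library. The CSV text, column names, pound-to-kilogram conversion, and the in-memory SQLite database standing in for a data warehouse are all assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export with one missing weight value
RAW_CSV = """patient_id,weight_lb,visit_date
1,154,2023-01-05
2,198,2023-01-06
3,,2023-01-07
"""

LB_TO_KG = 0.453592


def extract(text):
    """Extract: read rows from the source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and convert pounds to kilograms."""
    cleaned = []
    for row in rows:
        if not row["weight_lb"]:
            continue  # skip records missing a weight
        weight_kg = round(float(row["weight_lb"]) * LB_TO_KG, 1)
        cleaned.append((int(row["patient_id"]), weight_kg, row["visit_date"]))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits (patient_id INTEGER, weight_kg REAL, visit_date TEXT)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the data warehouse
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM visits").fetchall())
# → [(1, 69.9, '2023-01-05'), (2, 89.8, '2023-01-06')]
```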
This is an open source framework for the storage and processing of Big Data across a distributed file system
Hadoop
This is a column-oriented data store allowing for fast access to data stored in HDFS
HBase
This is a file system for the storage of data across many computers
Hadoop Distributed File System (HDFS)
This is the use of supercomputers to rapidly solve complex problems
High-performance computing (HPC)
This is a Hadoop data system that facilitates the interrogation of data stored in HDFS using structured query language (SQL)
Hive
This is a database management system that stores data in memory rather than on disk, resulting in fast processing
in-memory database
This is the name for medical devices connected to the internet via sensors
Medical Internet of Things
This is a process in which software learns during data processing and becomes more accurate over time
machine learning
This is the process of breaking up problems into pieces that are then distributed across multiple computers on the same network or cluster
MapReduce
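The classic teaching example for MapReduce is counting words. The sketch below simulates the map, shuffle, and reduce phases in a single Python process; on a real Hadoop cluster the framework would run the map and reduce functions on many machines in parallel. The symptom words and the three input splits are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Each string stands in for a block of data stored on a different node
splits = ["fever cough fever", "cough headache", "fever"]
intermediate = chain.from_iterable(map_phase(s) for s in splits)
word_counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(intermediate).items())
print(word_counts)  # → {'fever': 3, 'cough': 2, 'headache': 1}
```

Because each split is mapped independently and each key is reduced independently, both phases can be spread across computers without them needing to share state.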
This is data about data: information about stored data elements
Metadata
This is an open source, reliable, high-performance, scalable document database
MongoDB
This is extracting information from text
natural language processing
This is an open source graph database
Neo4j
This is a data flow management application
NiFi
These are databases that do not use the relational model, such as databases that store documents, tweets and so on
NoSQL
This is a representation of a body of knowledge as a set of domain-specific concepts
Ontology
This is a data movement in which data sets are made available to the public for use without charge.
Open Data
This means applications in which the source code is available to the general public for use or modification
open source
This is the identification of patterns in data via algorithms
pattern recognition
This is a programming language used in the Hadoop framework
Pig
This is the use of existing data sets and algorithms to predict the probability that a future event will occur
predictive analytics
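One widely used technique for predicting the probability of a yes/no event is logistic regression. The sketch below, assuming that technique, fits a one-variable model by gradient descent to a small made-up history of prior admissions and readmissions, then estimates the probability of readmission for new patients. It is an illustration, not a validated clinical model; the learning rate and number of epochs are arbitrary choices.

```python
from math import exp


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every historical record
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Hypothetical history: prior admissions per patient, and whether they were readmitted (1) or not (0)
prior_admissions = [0, 0, 1, 1, 2, 3, 4, 5]
readmitted = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(prior_admissions, readmitted)
for x in (0, 5):
    print(f"{x} prior admissions -> estimated readmission probability {sigmoid(w * x + b):.2f}")
```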
This is a movement to incorporate the acquisition of data about oneself into all aspects of a person’s daily life
Quantified self
This is an open source programming language used for statistical computation, most commonly used to develop statistical software
R
This is a system in which treatments, therapies, and medications are recommended based on patient data
recommender systems
This is the use of algorithms to understand human feelings
sentiment analysis
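A very simple form of sentiment analysis is lexicon-based scoring: count words associated with positive and negative feelings and compare the totals. The sketch below assumes that approach with made-up word lists and patient comments; it ignores negation ("not good") and context, which practical systems must handle.

```python
import re

# Hypothetical, deliberately tiny word lists; real tools use large curated lexicons or trained models
POSITIVE_WORDS = {"good", "great", "helpful", "kind", "better"}
NEGATIVE_WORDS = {"bad", "rude", "ignored", "worse", "painful"}


def sentiment(text):
    """Label text positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    # Each positive word adds 1 and each negative word subtracts 1
    score = sum((w in POSITIVE_WORDS) - (w in NEGATIVE_WORDS) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in [
    "The nurses were kind and helpful.",
    "I was ignored and the receptionist was rude.",
    "My appointment is on Tuesday.",
]:
    print(sentiment(review), "-", review)
```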
This is data that is organized in a predetermined structure
structured data
This is data that does not conform to a predetermined structure, such as free text
unstructured data
Which of the brain's two thinking systems described by Daniel Kahneman represents automatic, intuitive thinking?
System 1
Which of the brain's two thinking systems described by Daniel Kahneman represents thinking that requires effort and attention?
System 2
Why is it important, when developing visualizations of healthcare data, to ensure that they invoke System 1 processes?
You do not want viewers spending time trying to figure out what the data represents; you want them to understand it immediately
What other factors should be considered when developing data visualizations?
colorblind-friendly palette
make sure they render on any platform