Midterm Flashcards
Define noisy data
Data containing errors or outliers
Tabular form
Data arranged in rows and columns
Define variable
a storage mechanism for a particular identifier, which contains information referred to as a value
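As a quick illustration, here is a minimal Python sketch of an identifier and its value. The name age and the numbers are made up for this card.

```python
# "age" is the identifier; 35 is the value it currently refers to.
age = 35

# Reassigning changes the value stored under the same identifier.
age = age + 1

print(age)  # 36
```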
Define randomization
the practice of using chance methods to assign participants to experimental conditions, without bias and without regard to anything known about the participant
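One way this could look in code is sketched below, assuming simple random assignment with near-equal group sizes. The function name assign_conditions, the participant IDs, and the condition names are hypothetical.

```python
import random

def assign_conditions(participants, conditions, seed=None):
    """Randomly assign participants to conditions in near-equal group sizes."""
    rng = random.Random(seed)      # seed is optional, only for reproducibility
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)          # chance alone decides the order
    # Deal the shuffled participants out to the conditions in turn.
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}

# Hypothetical participant IDs and condition names.
assignment = assign_conditions(
    ["P1", "P2", "P3", "P4"], ["treatment", "control"], seed=42
)
print(assignment)
```

Nothing about a participant (age, name, sign-up order) influences which condition they land in; only the shuffle does.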
WHERE
Specifies a condition that records must satisfy to be included in the result (e.g., age = 35)
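Assuming this card refers to the SQL WHERE clause, a minimal sketch using Python's built-in sqlite3 module might look like this. The table name people and its rows are invented for illustration.

```python
import sqlite3

# In-memory database with a small, made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", 35), ("Ben", 42), ("Cy", 35)],
)

# WHERE keeps only the rows that satisfy the condition.
rows = conn.execute(
    "SELECT name FROM people WHERE age = 35 ORDER BY name"
).fetchall()
print(rows)  # [('Ana',), ('Cy',)]
conn.close()
```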
What is business analytics?
The use of data to gain insights that maximize business outcomes
3 steps of getting data ready for analysis
clean, structure, integrate
5 stages of business analytics
- data wrangling
- descriptive analytics
- predictive analytics
- prescriptive analytics
- storytelling
data wrangling
wrestling with data to get it into a more structured format that is useful for analytics
Data integration
connecting two sources of data to offer more insights than each source would yield separately
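A small sketch of the idea, assuming two sources that share a customer ID. All names and values here are hypothetical.

```python
# Two hypothetical sources that share a customer ID.
purchases = {101: 250.0, 102: 80.0, 103: 40.0}           # id -> total spend
profiles = {101: "Denver", 102: "Austin", 104: "Boise"}  # id -> city

# Keep only the customers present in both sources (an inner join on the ID).
integrated = {
    cid: {"city": profiles[cid], "spend": purchases[cid]}
    for cid in purchases.keys() & profiles.keys()
}

# Spend by city is something neither source could report on its own.
for cid in sorted(integrated):
    print(cid, integrated[cid])
```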
predictive analytics
The practice of interpreting data to predict the likelihood of future business outcomes
Prescriptive analytics
the use of optimization techniques to advise businesses on what they should do
Spreadsheet tool
an interactive software application for structuring, transforming, analyzing, and storing data in rows and columns
Programming
The process of solving a problem using computer algorithms
Programming language
a formal set of instructions that can be used to produce various kinds of output
open-source programming tools
programming tools that are made freely available, often developed by and for the community
What are two well-known open-source programming tools?
R and Python
Programming code
a collection of statements written in a particular programming language
Record
row in a spreadsheet
stores a person’s or object’s response over a number of fields
Field
column in a spreadsheet
stores one unit of information we have about each record (e.g., a person’s age or income)
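To make the row/column distinction concrete, here is a sketch that represents a small table as a list of Python dicts. The names and numbers are made up.

```python
# Each dict is one record (row); each key is a field (column).
table = [
    {"name": "Ana", "age": 35, "income": 52000},
    {"name": "Ben", "age": 42, "income": 61000},
]

first_record = table[0]                    # one row: everything about Ana
age_field = [row["age"] for row in table]  # one column: every record's age

print(first_record)
print(age_field)  # [35, 42]
```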
Integer
a variable that holds a whole number, with no decimal point
Programming tool
a software package that allows for the execution of programming code
Big data
large sets of both structured and unstructured data
Relational database
A database that stores information in tables of rows and columns, which can be related to one another through shared keys so that the information can be retrieved with queries
Non-relational database
a database that does not store data in analysis-ready tables; it may instead be document-based or use a variety of other storage strategies
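As a rough sketch of what "document-based" can mean, the example below uses plain JSON from the Python standard library rather than any particular database product. The documents and their fields are hypothetical.

```python
import json

# Two hypothetical documents in one collection; they need not share a schema.
documents = [
    {"id": 1, "name": "Ana", "orders": [{"item": "laptop", "qty": 1}]},
    {"id": 2, "name": "Ben", "tags": ["vip"], "address": {"city": "Austin"}},
]

# Documents are commonly serialized as JSON text.
stored = [json.dumps(doc) for doc in documents]

# Reading a document back restores its nested structure.
restored = json.loads(stored[0])
print(restored["orders"][0]["item"])  # laptop
```

Unlike rows in a table, the two documents carry different fields, and one nests a list of orders inside itself.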
Hadoop
an open-source software framework that stores and processes large amounts of data