Mid Term Flashcards
Define noisy data
Containing errors or outliers
Tabular form
Data arranged in rows and columns
Define variable
a named storage location (the identifier) that holds information, referred to as its value
Define randomization
the practice of using chance methods to assign participants to experimental conditions, so that assignment is free of bias and does not depend on anything known about the person
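A minimal Python sketch of random assignment, assuming two hypothetical conditions labeled "A" and "B" and an illustrative function name `randomize`. Participants are shuffled by chance and then dealt out in turn, so nothing about a person influences which condition they land in.

```python
import random


def randomize(participants, conditions=("A", "B"), seed=None):
    """Randomly assign each participant to one of the conditions.

    The list is shuffled, then dealt out round-robin, so group sizes differ
    by at most one and the assignment ignores everything about the person.
    """
    rng = random.Random(seed)       # seeded generator makes the draw reproducible
    shuffled = list(participants)   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}


if __name__ == "__main__":
    people = ["Ana", "Ben", "Chen", "Dia", "Eli", "Fay"]
    print(randomize(people, seed=42))
```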
WHERE
Filters the results to the rows that meet a specified condition (ex. age = 35)
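A minimal sketch of WHERE in action, using Python's built-in `sqlite3` module and a hypothetical in-memory `customers` table. The table, columns, and rows are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory table of customers, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", 35, "OH"), ("Ben", 42, "TX"), ("Chen", 35, "TX"), ("Dia", 28, "OH")],
)

# WHERE keeps only the rows that satisfy the condition age = 35.
rows = conn.execute(
    "SELECT name FROM customers WHERE age = 35 ORDER BY name"
).fetchall()
print(rows)  # [('Ana',), ('Chen',)]
```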
What is business analytics
The use of data to gain insights that maximize business outcomes
3 steps of getting data ready for analysis
clean, structure, integrate
5 stages of business analytics
- data wrangling
- descriptive analytics
- predictive analytics
- prescriptive analytics
- storytelling
data wrangling
wrestling with data to get it into a more structured format that is useful for analytics
Data integration
connecting two sources of data to offer more insights than each source would yield separately
predictive analytics
The practice of interpreting data to predict the likelihood of future business outcomes
Prescriptive analytics
the use of optimization techniques to advise businesses on what they should do
Spreadsheet tool
an interactive software application for structuring, transforming, analyzing, and storing data in rows and columns
Programming
The process of solving a problem using computer algorithms
Programming language
a formal set of instructions that can be used to produce various kinds of output
open-source programming tools
programming tools that are made freely available, often developed by and for the community
What are two well-known open-source programming languages
R and Python
Programming code
a collection of statements written in a particular programming language
Record
row in a spreadsheet
stores a person’s or object’s response over a number of fields
Fields
column in a spreadsheet
stores the info unit we have about each record (e.g. a person’s age, income, etc.)
Integer
a variable type that holds whole numbers (numbers without decimal points)
Programming tool
a software package that allows for the execution of programming code
Big data
large sets of both structured and unstructured data
Relational database
A database that stores information in related tables (rows and columns) in such a way that it can be retrieved with queries
Non-relational database
a database that does not store data in tables ready for analysis; instead, it may be document-based or use a variety of other strategies
Hadoop
an open-source software framework that stores and processes large amounts of data
Document Databases
a database that pairs a key with a complex data structure
Scientific method
A set of techniques used for investigating phenomena commonly based on reasoning applied to the evidence of empirical data
controlled experiment
type of experiment in which a hypothesis is tested by observing changes in a dependent variable caused by manipulating an independent variable, the only factor allowed to change
Overconfidence
tendency to think too highly of one’s expertise
What does STP stand for
Segmentation, Targeting, Positioning
Wide-column Stores
a database that uses a column-oriented data structure, similar to an inverted table that has multiple attributes per key
Graph databases
databases that store data as a graph of nodes (entities) connected by edges (relationships between them)
multi-model database
A database that can support multiple data models against a single, integrated backend
NoSQL
"Not only SQL" (Structured Query Language); able to retrieve data from relational databases as well as from non-relational data sources
SQL
Structured Query Language; used to retrieve and manage data in a relational database
SQL vs. NoSQL
- SQL manages relational databases, while NoSQL manages non-relational databases
- NoSQL can handle large volumes of rapidly changing structured, semi-structured, and unstructured data
primary key
a column that uniquely identifies a row in the table
unique key
used to indicate that an index cannot accept duplicate entries
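A sketch of both constraints in SQLite via Python's `sqlite3` module, using a hypothetical `customers` table and an illustrative helper `try_insert`. `customer_id` is the primary key and `email` is declared UNIQUE, so an insert that duplicates either value is rejected with `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# customer_id uniquely identifies each row (primary key);
# email may not repeat across rows (unique key).
conn.execute(
    "CREATE TABLE customers ("
    " customer_id INTEGER PRIMARY KEY,"
    " email TEXT UNIQUE,"
    " name TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'ana@example.com', 'Ana')")


def try_insert(row):
    """Attempt an insert; return the error text if a key constraint blocks it."""
    try:
        conn.execute("INSERT INTO customers VALUES (?, ?, ?)", row)
        return "inserted"
    except sqlite3.IntegrityError as err:
        return str(err)


print(try_insert((1, "ben@example.com", "Ben")))  # rejected: duplicate primary key
print(try_insert((2, "ana@example.com", "Ben")))  # rejected: duplicate email
print(try_insert((2, "ben@example.com", "Ben")))  # inserted
```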
database management system
interacts with end-users to store and manage structured data
Table function
command that manages and changes tables within the database
Query function
SQL commands that ask questions of the data
SELECT
specifies which data (columns) to gather from a table
FROM
Establishes which table the data is gathered from
What are nodes
entities in a graph database, such as people, accounts, and firms
cloud
storing and accessing data and programs over the internet instead of a local computer
GROUP BY
Tells how to segment the data into groups, so that summary functions such as COUNT or AVG are computed per group (ex. group by state)
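A minimal sketch of SELECT, FROM, and GROUP BY working together, again through Python's `sqlite3` module with a hypothetical `customers` table (made-up rows). Each state becomes one group, and COUNT and AVG are computed within it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", 35, "OH"), ("Ben", 42, "TX"), ("Chen", 35, "TX"),
     ("Dia", 28, "OH"), ("Eli", 51, "TX")],
)

# GROUP BY state segments the rows; the aggregates are then computed per segment.
rows = conn.execute(
    "SELECT state, COUNT(*) AS n, AVG(age) AS avg_age "
    "FROM customers GROUP BY state ORDER BY state"
).fetchall()
for state, n, avg_age in rows:
    print(state, n, round(avg_age, 1))
```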
Join command
temporarily combines two tables for the query
Inner Join
Only the data that has matching records in both tables
Left join
All data from the left table and matching data from the right table
Right join
All data from the right table and matching data from the left table
Full join
All data from both tables, matched where possible
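A sketch of the four joins on two hypothetical tables, `customers` and `orders`, run with Python's `sqlite3` module. SQLite versions before 3.39.0 do not support RIGHT or FULL OUTER JOIN, so the sketch writes the right join as a left join with the tables swapped, and the full join as a left join plus the unmatched right-side rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben"), (3, "Chen")])
# Chen has no orders; order 12 refers to a customer (9) that does not exist.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 2), (12, 9)])


def q(sql):
    return conn.execute(sql).fetchall()


# Inner join: only rows with a match in both tables.
inner = q("SELECT c.name, o.order_id FROM customers c "
          "INNER JOIN orders o ON c.id = o.customer_id ORDER BY o.order_id")
# Left join: every customer, plus matching orders (None where there is none).
left = q("SELECT c.name, o.order_id FROM customers c "
         "LEFT JOIN orders o ON c.id = o.customer_id ORDER BY c.id")
# Right join, written as a left join with the tables swapped.
right = q("SELECT c.name, o.order_id FROM orders o "
          "LEFT JOIN customers c ON c.id = o.customer_id ORDER BY o.order_id")
# Full join: the left join plus orders that matched no customer.
full = q("SELECT c.name, o.order_id FROM customers c "
         "LEFT JOIN orders o ON c.id = o.customer_id "
         "UNION ALL "
         "SELECT c.name, o.order_id FROM orders o "
         "LEFT JOIN customers c ON c.id = o.customer_id WHERE c.id IS NULL")

print(inner)  # [('Ana', 10), ('Ben', 11)]
print(left)   # [('Ana', 10), ('Ben', 11), ('Chen', None)]
print(right)  # [('Ana', 10), ('Ben', 11), (None, 12)]
print(full)   # all four rows above, order not guaranteed
```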
What is a dendrogram
a diagram illustrating a hierarchy of clusters
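A dendrogram is typically drawn from the merge history of hierarchical (agglomerative) clustering: each point starts as its own cluster, and the two closest clusters are merged repeatedly. The sketch below assumes single linkage (the distance between two clusters is the distance between their closest members), which is one of several linkage choices. The function name is hypothetical. Each returned merge carries the distance at which it happened, which is the height at which a dendrogram would draw it.

```python
import math
from itertools import combinations


def agglomerative_merges(points):
    """Single-linkage agglomerative clustering.

    Returns a list of (cluster_a, cluster_b, distance) merges, where clusters are
    frozensets of point indices; this is the information a dendrogram plots.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


if __name__ == "__main__":
    for a, b, d in agglomerative_merges([(0, 0), (0, 1), (5, 5), (5, 6.5)]):
        print(sorted(a), sorted(b), round(d, 2))
```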
Psychographic segmentation
Segmenting people by their feelings about a product category.
A centroid
a center of mass of a geometric object of uniform density
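For a finite set of equally weighted points, the center of mass reduces to the per-coordinate mean; this is the sense in which cluster analysis (for example, k-means) uses centroids. A minimal sketch with a hypothetical helper name:

```python
def centroid(points):
    """Center of mass of equally weighted points: the mean of each coordinate."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


print(centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))  # (2.0, 1.0)
```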
Cluster analysis
a technique for grouping people so that those in the same group are more like one another compared with those in other groups
data mining
The process of finding patterns in large data sets
Segmentation
Used in marketing to divide the total population of customers into smaller, relatively homogeneous groups.
targeting
Identifying which segment(s) to pursue and appealing uniquely to particular segments of customers
Positioning
A business strategy that establishes the way a customer perceives a product or firm relative to the rest of the marketplace
k-means cluster analysis
iterative technique that seeks to allocate each observation to the cluster whose centroid is closest to it.
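A minimal sketch of the iteration, assuming Lloyd's algorithm (the standard k-means procedure), Euclidean distance, and initial centroids drawn at random from the data. The function name and parameters are illustrative, not from any library. Each pass assigns observations to the nearest centroid and then moves each centroid to the mean of its observations, stopping when no centroid moves.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen observations

    def assign(cents):
        # Each observation goes to the cluster whose centroid is closest to it.
        return [min(range(k), key=lambda j: math.dist(p, cents[j])) for p in points]

    for _ in range(max_iter):
        labels = assign(centroids)
        # Move each centroid to the mean of the observations assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, assign(centroids)


if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
    cents, labels = kmeans(pts, k=2)
    print(labels)  # the three low points share one label, the three high points the other
```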
behavioral data
a highly valued source of segmentation; includes usage rates and patterns for a product or category
A/B testing
a method for testing the effectiveness of a business effort via a controlled experiment that tests two or more conditions before exposure to the broader marketplace
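The card does not say how the two conditions are compared. One common approach for a conversion metric is a pooled two-proportion z-test, sketched below with a hypothetical function name and made-up counts. A small p-value suggests that the difference between conditions A and B is unlikely to be due to chance alone.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (rate_a, rate_b, z, two_sided_p)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # overall rate if A and B do not differ
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return p_a, p_b, z, p_value


# Hypothetical results: 200 of 5,000 converted under A, 260 of 5,000 under B.
rate_a, rate_b, z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p:.4f}")
```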
Sample size
The number of participants needed for all conditions of the A/B test or other experiment
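One common way to estimate sample size for an A/B test on conversion rates is the normal-approximation formula for comparing two proportions. The sketch assumes a two-sided test at significance level `alpha`, a desired statistical power `power`, and hypothetical baseline and target rates. The helper name is illustrative. It returns participants per condition, so the number needed for both conditions is twice that.

```python
import math
from statistics import NormalDist


def ab_sample_size(p_a, p_b, alpha=0.05, power=0.80):
    """Approximate participants per condition to detect a change from p_a to p_b."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    n = (z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2
    return math.ceil(n)


per_group = ab_sample_size(0.04, 0.05)
print(per_group, 2 * per_group)  # per condition, total across both conditions
```

With a 4% baseline and a 5% target rate, this works out to roughly 6,700 participants per condition. Smaller differences require more participants.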
qualitative
a type of data that uses words, photos, or graphs
quantitative
a type of data that uses numbers