Chapter 1 Flashcards
Define Data Analytics
the process of evaluating data with the purpose of drawing conclusions to address business questions.
What is used to analyze data to give organizations the information they need to make sound and timely decisions?
technologies, systems, practices, methodologies, databases, statistics, and applications.
Patterns are discovered from…
past archives
What is an analytics mindset?
recognizing when and how data analytics can address accounting questions
what are data scrubbing and data preparation
comprehend the process needed to extract (query), clean, and prepare the data before analysis
Define data quality
recognize what is meant by data quality, be it completeness, reliability, or validity
descriptive data analysis
perform basic analysis to understand the quality of the underlying data and their ability to address the business question
data analysis through data manipulation
demonstrate ability to sort, rearrange, merge, and reconfigure data in a manner that allows enhanced analysis
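A minimal Python sketch of sorting, merging, and reconfiguring data, assuming two small hypothetical tables (invoices and a customer master file) held as lists of dicts. The field names and values are illustrative only.

```python
# Hypothetical tables: invoices and a customer master file.
invoices = [
    {"invoice_id": 3, "customer_id": "C2", "amount": 75.5},
    {"invoice_id": 1, "customer_id": "C1", "amount": 250.0},
    {"invoice_id": 2, "customer_id": "C1", "amount": 40.0},
]
customers = [
    {"customer_id": "C1", "name": "Acme"},
    {"customer_id": "C2", "name": "Globex"},
]

# Sort: order the invoices by invoice ID.
by_id = sorted(invoices, key=lambda row: row["invoice_id"])

# Merge: attach each invoice's customer name (an inner join on customer_id).
names = {c["customer_id"]: c["name"] for c in customers}
merged = [
    {**row, "name": names[row["customer_id"]]}
    for row in by_id
    if row["customer_id"] in names
]

# Reconfigure: summarize to total amount per customer name.
totals = {}
for row in merged:
    totals[row["name"]] = totals.get(row["name"], 0) + row["amount"]

print(totals)
```

In practice a spreadsheet, SQL, or a dataframe library would do the same work; plain Python is used here only to keep the steps visible.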
problem solving through statistical data analysis
identify and implement an approach that will use statistical data analysis to draw conclusions and make recommendations on a timely basis
data visualization and data reporting
report results of analysis in a way that is accessible to each decision maker and suited to his or her specific needs
what is the objective of data extraction
to identify and obtain the data from the appropriate source
what is the objective of transforming data
to validate the data for completeness and integrity
what is the objective of loading data
to load the data into the appropriate tool for analysis
what are the five steps of the ETL process
determine the purpose and scope of the data request, obtain the data, validate the data for completeness and integrity, clean the data, load the data for data analysis.
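The five steps above can be sketched in Python as follows. This is a minimal illustration, assuming a small hypothetical invoice extract held in memory; the field names, the `expected_count` completeness check, and the use of an in-memory SQLite table as the "analysis tool" are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Step 1 (assumed): purpose and scope, expressed as the fields the request needs.
REQUIRED_FIELDS = ["invoice_id", "customer", "amount"]

# Hypothetical raw extract standing in for data obtained from a source system.
RAW_EXTRACT = """invoice_id,customer,amount
1001, Acme ,250.00
1002,Globex,
1003,Initech,75.50
"""

def obtain(text):
    """Step 2: obtain the data (here, parse an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def validate(rows, expected_count):
    """Step 3: check completeness (record count and required fields)."""
    if len(rows) != expected_count:
        raise ValueError(f"expected {expected_count} rows, got {len(rows)}")
    missing = [f for f in REQUIRED_FIELDS if f not in rows[0]]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return rows

def clean(rows):
    """Step 4: trim whitespace, convert types, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record; a real process would log it for follow-up
        cleaned.append((int(row["invoice_id"]), row["customer"].strip(), float(amount)))
    return cleaned

def load(rows):
    """Step 5: load the data into a tool for analysis (an in-memory SQLite table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (invoice_id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    return conn

conn = load(clean(validate(obtain(RAW_EXTRACT), expected_count=3)))
total = conn.execute("SELECT SUM(amount) FROM invoices").fetchone()[0]
print(total)  # 325.5
```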
Define classification
an attempt to assign each unit in a population to one of a few categories
define regression
a data approach that attempts to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.
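One common statistical model for regression is a straight line fitted by ordinary least squares. The sketch below assumes hypothetical advertising-spend and sales figures chosen so the fit is exact and easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend (x) and sales (y) for five periods.
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]  # exactly 1 + 2x, so the fitted line is exact
intercept, slope = fit_line(spend, sales)
predicted = intercept + slope * 6  # predict sales for a spend of 6
print(intercept, slope, predicted)  # 1.0 2.0 13.0
```

On Python 3.10 and later, `statistics.linear_regression` returns the same slope and intercept.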
define similarity matching
a data approach that attempts to identify similar individuals based on data known about them
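A minimal sketch of similarity matching, assuming each hypothetical customer is described by two numbers and that "similar" means the smallest Euclidean distance (one common choice among several).

```python
import math

# Hypothetical customers: (annual purchases in $000s, orders per year).
customers = {
    "A": (12.0, 4),
    "B": (50.0, 20),
    "C": (14.0, 5),
    "D": (48.0, 18),
}

def most_similar(target, data):
    """Return the other individual closest to `target` by Euclidean distance."""
    others = (name for name in data if name != target)
    return min(others, key=lambda name: math.dist(data[target], data[name]))

print(most_similar("A", customers))  # C
print(most_similar("B", customers))  # D
```

When the attributes are on very different scales, they are usually standardized first so that one attribute does not dominate the distance.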
define clustering
an attempt to divide individuals into groups in a useful or meaningful way
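A minimal sketch of clustering using k-means on a single variable. The order sizes, the choice of three clusters, and the hand-picked starting centroids are assumptions for illustration; real k-means is usually run on several variables with randomized starting points.

```python
def k_means_1d(values, centroids, iterations=10):
    """Basic k-means on one variable: assign each value to the nearest
    centroid, then move each centroid to the mean of its group."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Keep a centroid where it is if no values were assigned to it.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

# Hypothetical order sizes; starting centroids chosen by hand for repeatability.
order_sizes = [10, 12, 11, 95, 100, 105, 50, 52]
centroids, groups = k_means_1d(order_sizes, centroids=[0, 60, 120])
print(groups)  # [[10, 12, 11], [50, 52], [95, 100, 105]]
```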
define co-occurrence grouping
a data approach that attempts to discover associations between individuals based on transactions involving them (e.g., when Amazon says "customers who bought this also bought…")
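A minimal sketch of co-occurrence grouping: count how often each pair of items appears in the same transaction, then rank the partners of a given item. The orders below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the items in one customer order.
orders = [
    {"printer", "ink", "paper"},
    {"printer", "ink"},
    {"laptop", "mouse"},
    {"ink", "paper"},
    {"laptop", "mouse", "ink"},
]

pair_counts = Counter()
for order in orders:
    # sorted() so each pair is always counted under the same ordering
    for pair in combinations(sorted(order), 2):
        pair_counts[pair] += 1

def also_bought(item, counts):
    """Items most often bought together with `item`, most frequent first."""
    partners = Counter()
    for (a, b), n in counts.items():
        if a == item:
            partners[b] += n
        elif b == item:
            partners[a] += n
    return partners.most_common()

print(also_bought("printer", pair_counts))  # [('ink', 2), ('paper', 1)]
```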
define profiling
a data approach that attempts to characterize the “typical” behavior of an individual, group, or population by generating summary statistics about the data (e.g., mean, median, standard deviation)
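A minimal sketch of profiling with Python's `statistics` module, assuming hypothetical expense totals for one employee. The rule of flagging values more than two standard deviations from the mean is an illustrative assumption, not a fixed standard.

```python
import statistics

# Hypothetical expense-report totals for one employee.
expenses = [120.0, 95.0, 130.0, 110.0, 105.0, 940.0]

profile = {
    "mean": statistics.mean(expenses),
    "median": statistics.median(expenses),
    "stdev": statistics.stdev(expenses),  # sample standard deviation
}

# Flag values more than two standard deviations from the mean as atypical.
unusual = [x for x in expenses if abs(x - profile["mean"]) > 2 * profile["stdev"]]
print(profile)
print(unusual)  # [940.0]
```

Note how the single large value pulls the mean (250) well above the median (115); comparing the two is a quick way to spot skewed data.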
define link prediction
a data approach that attempts to predict a relationship between two data items (e.g., Facebook sees that you have 20 mutual friends with someone and suggests that person as a friend)
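A minimal sketch of link prediction using the mutual-friends idea: score each person who is not yet connected by the number of friends they share. The friendship graph is hypothetical.

```python
# Hypothetical friendship graph: each person maps to the set of their friends.
friends = {
    "you": {"ana", "ben", "cal"},
    "ana": {"you", "ben", "dee"},
    "ben": {"you", "ana", "dee"},
    "cal": {"you"},
    "dee": {"ana", "ben"},
}

def suggest_links(person, graph):
    """Rank people not yet connected to `person` by number of mutual friends."""
    scores = {}
    for other in graph:
        if other == person or other in graph[person]:
            continue
        mutual = len(graph[person] & graph[other])
        if mutual:
            scores[other] = mutual
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(suggest_links("you", friends))  # [('dee', 2)]
```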
define structured data
data that are stored in a database or spreadsheet and are readily searchable
define training data
existing data that have been manually evaluated and assigned a class
define test data
a set of data used to assess the degree and strength of a predicted relationship established by the analysis of training data
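A minimal sketch of how training data and test data work together in classification. It assumes a simple nearest-centroid classifier (one of many possible methods) and hypothetical receivables with two features and a class label; the model is built only from the training data and then checked against the test data.

```python
import statistics

# Hypothetical labeled data: (days_past_due, pct_of_credit_limit_used) -> class.
training = [
    ((5, 20), "pay"), ((10, 35), "pay"), ((0, 15), "pay"),
    ((60, 90), "default"), ((75, 85), "default"), ((90, 95), "default"),
]
test = [((8, 25), "pay"), ((70, 80), "default"), ((40, 70), "default")]

def train(rows):
    """Learn one centroid (average feature vector) per class from training data."""
    by_class = {}
    for features, label in rows:
        by_class.setdefault(label, []).append(features)
    return {
        label: tuple(statistics.mean(col) for col in zip(*feats))
        for label, feats in by_class.items()
    }

def classify(features, centroids):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((f - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(centroids, key=distance)

centroids = train(training)
correct = sum(classify(f, label_centroids := centroids) == label for f, label in test)
accuracy = correct / len(test)
print(accuracy)
```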
what is Benford’s law
it states that in a large set of naturally occurring numbers, the leading digit is more likely to be small (about 30% of such numbers begin with 1).
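Benford's law gives the expected share of numbers with leading digit d as log10(1 + 1/d). The sketch below computes those expected shares and the observed shares for a short list of hypothetical invoice amounts; a real test would use a large population of values.

```python
import math
from collections import Counter

def benford_expected(d):
    """Benford's law: expected share of numbers whose leading digit is d (1-9)."""
    return math.log10(1 + 1 / d)

def leading_digit(x):
    """First nonzero digit of a nonzero number, read from its decimal repr."""
    return int(next(ch for ch in repr(abs(x)) if ch in "123456789"))

def first_digit_shares(values):
    """Observed share of each leading digit 1-9, ignoring zeros."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

# Hypothetical invoice amounts (far too few for a real Benford test).
amounts = [120.5, 1530, 18.2, 245, 310, 0.07, 1999]
shares = first_digit_shares(amounts)
for d in range(1, 10):
    print(d, round(shares[d], 3), round(benford_expected(d), 3))
```

The expected shares for digits 1 through 9 sum to 1, since the logs telescope to log10(10).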
Define diagnostic analytics
procedures that explore the current data to determine why something has happened the way it has, typically comparing the data to a benchmark
what are the examples of diagnostic analytics
profiling, clustering, similarity matching, co-occurrence grouping
define predictive analytics and what are its examples
procedures that can generate a model that can be used to determine what can happen in the future. Examples are regression, classification, and link prediction.
what is nominal data
qualitative data that cannot be ranked (e.g., hair color)
what is ordinal data
qualitative data that can be ranked (e.g., gold, silver, bronze, or an A, B, C grade)
what is ratio data
quantitative data where 0 means the “absence” of something (e.g., cash)
what is interval data
quantitative data where 0 is just another point on the scale rather than an absence (e.g., temperature in degrees Celsius)
What is discrete data
quantitative data that can take only whole-number values (e.g., points in a basketball game)
What is continuous data
quantitative data that can take any value within a range, including decimals (e.g., height)
what are declarative visualizations
visualizations that present findings to an audience (e.g., financial results)
what are exploratory visualizations
visualizations used to gain insight while you are interacting with the data (e.g., identifying good customers)
what is the base standard
defines the formats for files & fields as well as master data requirements for users, business units, and tax tables.
what is the general ledger standard
Defines the chart of accounts, source listings, trial balance, and general ledger or journal entry detail
what is the order to cash subledger standard
defines sales orders and line items, shipments, invoices, open accounts receivable and adjustments, cash receipts, and customer master data.
what is the procure to pay subledger standard
defines purchases and line items, goods received, invoices received, open accounts payable and adjustments, payments, and supplier master data
what is the inventory subledger standard
defines inventory location master data, product master data, inventory on hand data, inventory movement transactions, physical inventory, and material cost
what is the fixed asset subledger standard
defines fixed asset master data, additions, removals, and depreciation calculations.
what is a homogeneous system
one single uniform installation or instance of a system
what is a heterogeneous system
multiple installations or systems
what does systems translator software do
attempts to map the various tables and fields from the varied enterprise systems in a heterogeneous system into a data warehouse
what is a data warehouse
a repository of data accumulated from internal and external data sources, where all of the data can be analyzed centrally
what is a flat file
a means of storing data in one place: a single table of data with user-defined attributes that is stored separately from any application
what is a correlation coefficient
a measure of how closely two datasets are correlated, or how predictive they are of one another
what numbers do correlation coefficients range between
-1 to 1
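A minimal sketch of the Pearson correlation coefficient, the most common correlation coefficient: the covariance of the two datasets divided by the product of their spreads, which is what keeps the result between -1 and 1. The data below are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient; always between -1 and 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient.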
what is the hot hand fallacy
assuming events are not independent when they are
what is selection bias
drawing the wrong conclusion because the group you derive your data from does not represent the population you care about
what is publication bias
studies with significant findings get published, while studies that find nothing significant tend not to be published
what is recall bias
participants do not remember previous events or experiences accurately, or, once they are told something happened a certain way, they believe they remember it that way.
what is survivorship bias
focusing on the people or things that made it past some sort of selection (we analyze data from existing companies, not the ones that failed)
what charts would you use for conceptual (qualitative) data
bar chart, pie chart, heat map, tree map (for comparison)
symbol map (for geographic data)
word cloud (for text data)
what charts would you use for data-driven (quantitative) data
box and whisker plot (for outlier detection)
scatter plot (for the relationship between two variables)
line chart (for trends over time)
filled map (for geographic data)