Final Exam Flashcards
According to the text, which of the following is NOT true?
An example of an SSBI tool is PowerPoint.
When preparing data, analysts use the ETL process. ETL stands for Explore, Transfer, Load.
False
According to the text, the data analysis process comprises three equally important stages. Which of the following is NOT one of those stages?
Review
Understanding “why” something is happening in your analysis is called _________ analytics.
Diagnostic
A visualization of a chart that compares actual vs expected monthly revenue would probably be found in the _________ area
auditing
In preparing data, the process of reviewing the data for possible issues is called
profiling
In the data analysis process, “C” in the MOSAIC model stands for “Cleaning”.
False
The CPA Exam and the CMA Exam both include topics on data analytics
True
Which is consistent with the Data Analytics Mindset?
all of these
Which of the following is best defined as a measure of dispersion
variance
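As a rough illustration of variance as a measure of dispersion, the sketch below uses Python's standard `statistics` module. The two revenue series are made up for the example; they share a mean but differ sharply in spread.

```python
import statistics

# Hypothetical monthly revenue figures (illustrative only)
steady = [100, 101, 99, 100, 100]
volatile = [60, 140, 80, 120, 100]

# Both series have the same mean...
mean_steady = statistics.mean(steady)
mean_volatile = statistics.mean(volatile)

# ...but the sample variance shows how widely each is spread around it
var_steady = statistics.variance(steady)
var_volatile = statistics.variance(volatile)

print(mean_steady, mean_volatile)
print(var_steady, var_volatile)
```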
Most of the data you will work with will come from
relational databases
Questions with single dimensions should be answered with pivot tables; questions with multiple dimensions should be answered with Excel functions.
False
Database elements can be represented in the REA model. The model’s elements are:
resources, events, agents
Which of the following is NOT one of the basic Excel functions used in foundational analysis
DISPLAYIF
In a relational database table, a primary key is
a unique value
A __________ is a bar chart of frequency distributions where the height of the bar represents the count of items in the interval
histogram
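A minimal sketch of the counting behind a histogram, assuming made-up invoice amounts and an arbitrary interval width of 20. Each bar's length is the number of items that fall in the interval.

```python
from collections import Counter

# Hypothetical invoice amounts (illustrative only)
amounts = [12, 18, 25, 31, 37, 44, 45, 52, 58, 95]
bin_width = 20  # assumed interval width

# Assign each amount to the interval [start, start + bin_width)
counts = Counter((a // bin_width) * bin_width for a in amounts)

# Text histogram: bar length = count of items in the interval
for start in sorted(counts):
    print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * counts[start]}")
```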
There are 4 types of joins used to link tables together, which type of join DOES NOT result in any null values being produced?
Inner
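To see why an inner join produces no nulls while an outer join can, here is a small sketch using Python's built-in `sqlite3` module. The `supplier` and `purchase` tables and their rows are hypothetical, and the source data itself is assumed to contain no nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY,
                           supplier_id INTEGER, amount REAL);
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Crate');
    INSERT INTO purchase VALUES (10, 1, 250.0), (11, 1, 75.0), (12, 2, 40.0);
""")

# INNER JOIN keeps only suppliers that have a matching purchase
inner_rows = conn.execute("""
    SELECT s.name, p.amount
    FROM supplier s INNER JOIN purchase p ON p.supplier_id = s.supplier_id
    ORDER BY p.purchase_id
""").fetchall()

# LEFT JOIN keeps every supplier; unmatched ones get NULL (None in Python)
left_rows = conn.execute("""
    SELECT s.name, p.amount
    FROM supplier s LEFT JOIN purchase p ON p.supplier_id = s.supplier_id
    ORDER BY s.supplier_id, p.purchase_id
""").fetchall()

print(inner_rows)
print(left_rows)
```

In this sketch the supplier with no purchases is absent from the inner join result and appears with a null amount in the left join result.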
Simultaneously filtering for multiple dimensions is called
data slicing
An action request made to a database is called a(n)
query
Which is the best tool when the desired result is known, but not the input value of a single variable that will achieve that result?
Goal Seek
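The idea behind Goal Seek can be sketched in a few lines: fix the target output and search for the one input that produces it. The profit model and its numbers below are hypothetical, and bisection is used only as a simple stand-in; it is not a claim about how Excel implements the feature.

```python
def profit(units, price=25.0, unit_cost=15.0, fixed_cost=4000.0):
    """Hypothetical profit model: contribution margin minus fixed cost."""
    return units * (price - unit_cost) - fixed_cost


def goal_seek(func, target, low, high, tol=1e-6):
    """Search [low, high] for x with func(x) close to target, by bisection.

    Assumes func is increasing on the interval.
    """
    for _ in range(200):
        mid = (low + high) / 2
        if func(mid) < target:
            low = mid
        else:
            high = mid
        if high - low < tol:
            break
    return (low + high) / 2


# Known desired result (profit of 1,000); unknown input (units sold)
units_needed = goal_seek(profit, target=1000.0, low=0, high=10_000)
print(round(units_needed, 2))
```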
An analysis prepared to support a predetermined belief is an example of
confirmation bias
an anomaly is
an observation that deviates from what is normal/expected
When examining the relationship between two variables, if one variable increases as the other variable decreases the relationship is
a negative correlation
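A short sketch of a negative correlation, computing Pearson's r by hand so the arithmetic is visible. The price and unit figures are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: as price rises, units sold fall
price = [10, 12, 14, 16, 18]
units_sold = [200, 185, 160, 150, 120]

r = pearson_r(price, units_sold)
print(round(r, 3))  # a value close to -1 indicates a strong negative relationship
```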
In a regression model prepared to predict revenue, which of the following is the correct interpretation of an adjusted R-squared of 0.85?
the independent variables in the model can explain 85% of the variation in revenue
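Adjusted R-squared is ordinary R-squared penalized for the number of independent variables. The sketch below applies the standard formula; the sample size, predictor count, and unadjusted R-squared are hypothetical inputs.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical model: 30 observations, 3 predictors, unadjusted R^2 of 0.87
adj = adjusted_r_squared(0.87, n_obs=30, n_predictors=3)
print(round(adj, 3))
```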
A spreadsheet model that allows evaluating how changes to values and assumptions affect an outcome is called a
what-if analysis
Determining if the analysis makes sense is associated with...
data analysis interpretation
An appropriate analysis to use to determine how many times an event has occurred would be
a frequency distribution
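A frequency distribution is just a count per category. A minimal sketch with a made-up list of payment methods:

```python
from collections import Counter

# Hypothetical log of payment methods (illustrative only)
payments = ["card", "cash", "card", "check", "card", "cash"]

# Count how many times each value occurs
frequency = Counter(payments)

for method, count in frequency.most_common():
    print(method, count)
```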
which of the following analyses can predict a future outcome
linear regression
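As a sketch of how linear regression supports prediction, the code below fits a least-squares line and extends it to a new input. The ad spend and revenue figures are hypothetical and deliberately lie on an exact line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: revenue = 10 + 2 * ad_spend, exactly
ad_spend = [1, 2, 3, 4, 5]
revenue = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, revenue)

# Predict revenue for a spend level not yet observed
forecast = intercept + slope * 6
print(slope, intercept, forecast)
```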
if the objective is to use historical data to identify patterns, which is the best analysis to use?
Trend analysis
which of the following describes part of the goal of the ETL process
Identify and obtain the data needed for solving the problem
the purpose of transforming data is
to validate the data for completeness and integrity
mastering the data can also be described via the ETL process. ETL process stands for:
Extract, Transform, Load
the advantages of storing data in a relational database include
help in enforcing business rules and integrating business processes
why is supplier ID considered to be a primary key for a supplier table
it contains a unique identifier for each supplier
Which of the following questions is not suggested by the Institute of Business Ethics to allow a business to create value from data use and analysis while still protecting the privacy of stakeholders?
Does the data used by the company include personally identifiable information?
which of the following is not a common way that data will need to be cleaned after extraction and validation
Clean up trailing zeroes
which attribute is required to exist in each table of a relational database and serves as the “unique identifier” for each record in a table?
Primary key
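A small sketch of a primary key acting as a unique identifier, using Python's built-in `sqlite3` module. The `supplier` table and its rows are hypothetical; the point is that the database refuses a second record with the same key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO supplier VALUES (1, 'Acme')")

# A second row reusing supplier_id 1 violates the primary key constraint
try:
    conn.execute("INSERT INTO supplier VALUES (1, 'Bolt')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]
print(duplicate_rejected, row_count)
```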
what are attributes that exist in a relational database that are neither primary nor foreign keys?
Descriptive attributes
the metadata that describes each attribute in a database is
data dictionary
which of the following best describes an unsupervised approach to the evaluation of data?
data exploration looking for potential patterns of interest
These data are organized and reside in a fixed field within a record or a file. Such data are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms.
structured data
which approach to data analytics attempts to assign each unit in a population into a small set of classes where the unit belongs
classification
an observation about the frequency of leading digits in many real-life sets of numerical data
Benford’s law
which approach to data analytics attempts to predict a relationship between two data items
link prediction
models associated with regression and classification data approaches have all these important parts except:
test data
auditing financial statements, and its desire to look for errors, anomalies, and possible fraud, is most consistent with which type of analytics?
Diagnostic analytics
in general, the simpler the model, the greater the chance of
underfitting the data
test data
set of data used to assess the degree and strength of a predicted relationship
in general, the more complex the model, the greater the chance of
overfitting the data
ratio data
considered the most sophisticated type of data
In the late 1960s, Ed Altman developed a model to predict if a company was at severe risk of going bankrupt. He called his statistic Altman’s Z-score, now a widely used score in finance. Based on the name of the statistic, which statistical distribution would you guess this came from?
standardized normal distribution
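In statistics generally, a z-score expresses a value as the number of standard deviations it lies from the mean. The sketch below shows that standardization on made-up numbers; it does not reproduce Altman's bankruptcy model, which combines financial ratios.

```python
import statistics


def z_scores(values):
    """Standardize values: (value - mean) / population standard deviation."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]


# Hypothetical observations with mean 5 and population standard deviation 2
scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(scores)
```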
the Fahrenheit scale of temperature measurement would best be described as an example of
interval data
Conceptual (Qualitative)
Comparison: Bar Chart, Pie Chart, stacked bar chart, Tree map, Heat map
Geographic data: Symbol map
Text Data: word cloud
Data-driven (quantitative)
Outlier detection: box and whisker plot
Relationship between two variables: scatter plot
Trend over time: line chart
Geographic data: filled map
least sophisticated type of data
nominal
not a typical example of nominal data
SAT scores
Anscombe’s quartet suggests that
visualizations should be used in tandem with statistics
line charts are not recommended for
qualitative data
letter grades would be best described as
ordinal data
which testing approach would be used to predict whether certain cases should be evaluated as having fraud or no fraud
classification
describes finding correspondences between at least two types of text or entries that may not match perfectly
fuzzy matching
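A minimal fuzzy matching sketch using `difflib` from Python's standard library. The vendor names are hypothetical, and the 0.6 similarity cutoff is simply the library default, not a recommended audit threshold.

```python
import difflib

# Hypothetical vendor master list
vendor_master = ["Acme Supply Co.", "Bolt Hardware LLC", "Crate & Carton Inc."]


def best_match(name, candidates, cutoff=0.6):
    """Return the closest candidate to name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A misspelled entry still finds its counterpart; an unrelated one does not
print(best_match("Acme Suply Co", vendor_master))
print(best_match("Zenith Travel", vendor_master))
```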
the determinants for sample size include all of the following except:
potential risk of account
Benford’s law suggests that the first digit of naturally occurring numerical datasets follow an expected distribution where
the leading digit of 8 is more common than 9
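The expected distribution under Benford's law gives leading digit d a share of log10(1 + 1/d). A short sketch that tabulates it:

```python
import math

# Expected share of each leading digit 1-9 under Benford's law
expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for digit, share in expected.items():
    print(digit, f"{share:.1%}")
```

The shares fall steadily from digit 1 to digit 9, which is why a leading 8 is expected more often than a leading 9.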
What type of analysis would help auditors find missing checks?
sequence check
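A sequence check looks for gaps in a numbered series. A minimal sketch, assuming a made-up list of check numbers:

```python
def missing_in_sequence(numbers):
    """Return the integers absent between the smallest and largest number."""
    present = set(numbers)
    return [n for n in range(min(present), max(present) + 1) if n not in present]


# Hypothetical check register with gaps
check_numbers = [1001, 1002, 1004, 1005, 1008]
missing = missing_in_sequence(check_numbers)
print(missing)
```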
CAAT (Computer assisted audit techniques)
Automated scripts that can be used to validate data, test controls, and enable substantive testing of transaction details or account balances and generate supporting evidence for the audit
which testing approach would be useful in assessing the value of inventory shrinkage given multiple environmental factors
regression
which items would be currently out of the scope of data analytics
direct observation of processes
which type of audit analytics might be used to find hidden patterns/variables linked to abnormal behavior
diagnostic analytics
what allows tax departments to view multiple years, periods, jurisdictions (state/federal/international) and differing scenarios of data, typically through use of a dashboard
tax data visualizations
the task of tax accountants and tax departments to minimize the amount of taxes paid in the future
tax planning
an example of a tax risk KPI would be
levels of late filing or error penalties
an example of a tax cost KPI would be
ETR (Effective tax rate)
an example of a tax efficiency and effectiveness KPI would be
amount of time spent on compliance vs strategic activities
tax departments interested in maintaining their own data are likely to have their own
tax data mart
in which stage of the IMPACT model would the use of tax cockpits fit?
track outcomes
predictive analysis of potential tax liability and the formulation of a plan to reduce the amount of taxes paid is
tax planning
the evaluation of the impact of different tax scenarios/alternatives on various outcome measures including the amount of taxable income or tax paid
what-if scenario analysis
an example of a tax sustainability KPI would be
number of audits closed and significance of assessment over time
dependent variable is
Y
To remove null values
In Power Query, right-click and choose Remove Empty
binary values
either 0 or 1
IMPACT
I-Identify the questions
M-Master the data
P-Perform the test
A-Address and refine results
C-Communicate insights
T-Track outcomes
A data approach that attempts to discover associations between individuals based on transactions involving them.
co-occurrence grouping
A data approach that attempts to characterize the “typical” behavior of an individual, group, or population by generating summary statistics about the data (including mean, standard deviations, etc.).
profiling
A data approach that attempts to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.
regression
Data that do not adhere to a predefined data model in a tabular format.
unstructured data
An information system for managing all interactions between the company and its current and potential customers.
Customer Relationship Management (CRM) system
Centralized repository of descriptions for all of the data attributes of the dataset.
data dictionary
A means of storing data in one place, such as in an Excel spreadsheet, as opposed to storing the data in multiple tables, such as in a relational database.
flat file
An information system that helps manage all the company’s interactions with suppliers.
Supply Chain Management (SCM) system
A data approach that attempts to divide individuals (like customers) into groups (or clusters) in a useful or meaningful way.
clustering
Procedures that summarize existing data to determine what has happened in the past. Some examples include summary statistics (e.g., Count, Min, Max, Average, Median), distributions, and proportions.
descriptive analytics
A numerical value (0 or 1) to represent categorical data in statistical analysis; values assigned a 1 indicate the presence of something and 0 represents the absence.
dummy variable
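A dummy variable can be sketched as a simple 0/1 recoding of a category. The region values and the choice of "East" as the flagged category are hypothetical.

```python
# Hypothetical categorical data
regions = ["East", "West", "East", "North"]

# 1 marks the presence of the category, 0 its absence
is_east = [1 if region == "East" else 0 for region in regions]
print(is_east)
```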
One way to categorize quantitative data, as opposed to discrete data. Continuous data can take on any value within a range. An example of continuous data is height.
continuous data
One way to categorize quantitative data, as opposed to continuous data. Discrete data are represented by whole numbers. An example of discrete data is points in a basketball game.
discrete data
The second most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio; a type of qualitative data. Ordinal can be counted and categorized like nominal data and the categories can also be ranked. Examples of ordinal data include gold, silver, and bronze medals.
ordinal data
The least sophisticated type of data on the scale of nominal, ordinal, interval, and ratio; a type of qualitative data. The only thing you can do with nominal data is count, group, and take a proportion. Examples of nominal data are hair color, gender, and ethnic groups.
nominal data
interval data
The third most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio; a type of quantitative data. Interval data can be counted and grouped like qualitative data, and the differences between each data point are meaningful. However, interval data do not have a meaningful 0. In interval data, 0 does not mean “the absence of” but is simply another number. An example of interval data is the Fahrenheit scale of temperature measurement.
Procedures used to generate a model that can be used to determine what is likely to happen in the future. Examples include regression analysis, forecasting, classification, and other predictive modeling.
predictive analytics
Procedures that work to identify the best possible options given constraints or changing conditions. These typically include developing more advanced machine learning and artificial intelligence models to recommend a course of action, or optimizing, based on constraints and/or changing conditions.
prescriptive analytics
Analysis technique of business processes used to diagnose problems and suggest improvements where greater efficiency may be applied.
process mining
tax legislation offering major change to existing tax code
2018 Tax Cuts and Jobs Act tax reform
A subset of the data warehouse focused on a specific function or department to assist and support its needed data requirements.
data mart
A repository of data accumulated from internal and external data sources, including financial data, to help management decision making.
data warehouse
A subset of a company-owned data warehouse focused on the specific needs of the tax department.
tax data mart