EXAM 2 Flashcards
4.1 How do Accountants Design Data Analysis Projects?
Create a Data Analysis Project Plan
What are the 5 steps?
Step 1: F___ on the o____e
Step 2: Select a d___ str___y
Step 3: Select an an___ str___y
Step 4: Consider r___
Step 5: Em___ co___s
Step 1: Focus on the objective
Step 2: Select a data strategy
Step 3: Select an analysis strategy
Step 4: Consider risks
Step 5: Embed controls
Create a Data Analysis Project Plan
Step 1: Focus on the objective
Keep the project’s 1) o___ve and 2) sp___ q____s in mind to 3) select the 4) b___ data and 4) an___ strategies to 5) ful__ the ob___tive and answer those 6) q___
Simply remembering to ask how the plan’s 7) pro__ d__and analysis st___y decisions 8) re___ to the 9) objective helps us make better 10) ch__
1) objective
2) specific questions
3) select
4) best data
4) analysis strategies
5) fulfill the objective
6) questions
7) proposed data and analysis strategy decisions
8) relate
9) objective
10) choices
Create a Data Analysis Project Plan
Step 2: Select a data strategy
Use 1) cr___ t____g to 2) d__p and r___ a few data 3) al___s.
This 4) en___ that we choose the data option most 5) ap___e for the objective
1) critical thinking
2) develop and rank
3) alternatives
4) ensures
5) appropriate
Create a Data Analysis Project Plan
Step 3: Select an analysis strategy
Use what was 1) le___ from 2) s___g the data 3) str__ and apply that same 4) dev___t and r____g process to analyses 5) alt___
Following this step 6) inc___ the likelihood of selecting the 7) b___ analysis 8) o___n
1) learned
2) selecting
3) strategy
4) development and ranking
5) alternatives
6) increases
7) best
8) option
Create a Data Analysis Project Plan
Step 4: Consider risks
1) Co___ and pr___g cr___l risks to both the data and the analysis strategies 2) re__s how these risks can create 3) mis___ and in___ results
1) Considering and prioritizing critical
2) reveals
3) misleading and invalid
Create a Data Analysis Project Plan
Step 5: Embed controls
Designing and implementing 1) pre____ and de___ co___s into the analysis process leads to results that are 2) ac___, v____, and re___
1) preventative and detective controls
2) accurate, valid, and reliable
Step 2: Select the Data Strategy
Develop several data alternatives that could help answer the objective question.
Then, to select the most useful data alternative for the project plan, identify the factors you want to use to rank these alternatives and assign values to each alternative’s factors.
The best data strategy alternative is the one with the 1) hi___t ov__l factor r___gs
1) highest overall factor
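Example (a minimal sketch, not from the textbook): ranking data alternatives by adding up factor ratings. The alternative names, the factors, and the 1-5 ratings are all made up for illustration, and summing unweighted ratings is an assumed scoring rule.

```python
# Hypothetical data alternatives with 1-5 ratings per factor (illustrative values only).
ratings = {
    "AR subledger extract": {"relevance": 5, "availability": 4, "reliability": 4},
    "Customer survey data": {"relevance": 3, "availability": 2, "reliability": 3},
    "Industry benchmark report": {"relevance": 4, "availability": 5, "reliability": 2},
}


def rank_alternatives(ratings):
    """Return (name, total rating) pairs sorted from highest to lowest total."""
    totals = {name: sum(factors.values()) for name, factors in ratings.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


ranked = rank_alternatives(ratings)
best_name, best_total = ranked[0]
print(best_name, best_total)
```

If some factors matter more than others, each rating could be multiplied by a weight before summing.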
Step 3: Select the Analysis Strategy
For any project plan, choosing the best analysis strategy involves considering and evaluating several possible alternative analyses given the objective questions and the already selected strategy for the data
Steps 4 and 5: Data Strategy Risks and Controls
These data risks can be controlled by comparing the extracted data to the source invoice and collection documents.
Possible risks involving the management’s estimates of bad debt percentages include human bias embedded in the authorized bad debt percentages, changes in customer payment behavior, and business process changes to credit customer approvals and collection practices.
Steps 4 and 5: Data Strategy Risks and Controls
One way to control for these risks is to ask the finance, sales, and accounts receivable managers if there were changes to the customer base, market conditions, or business policies and procedures for credit approval, write-offs, or collection policies during the year. This knowledge could confirm existing bad debt percentages or motivate their adjustment.
Another control for evaluating the risks in management estimates and assumptions is to compare 1) c___t data to p___ y__ data. Evaluating increases in the median number of days outstanding and the number of new to returning customers in each age group can offer insights into the reasonableness of bad debt estimation percentages.
1) current data to prior year data
4.2 What Should We Consider When Selecting Data for Analysis?
It’s easier to make better decisions when we have the 1) ne___ in___
1) necessary information
A successful data analysis project hinges on selecting data that are 1) rel___t and ap___ for the objective of the project, respecting the data’s 2) cha___s and me___ scales, and controlling for in__ data ri__
1) relevant and appropriate
2) characteristics and measurement scales, and controlling for inherent data risks
Identify Appropriate Data
Data can be considered appropriate for analysis when they are 1) r___t, av__e, and the ch___ ma__ the analysis method 2) req___.
Appropriate data can be 3) in___, e___al, or a co____ of both
1) relevant, available, and the characteristics match
2) requirements
3) internal, external, or a combination of both
Identify Appropriate Data
Internal data are generated within the organization, such as 1) s__ data, pur___ data, in__y data, c___er data, and ve___r data. Internal data can typically be more 2) ea___ co___d and v___d by an organization.
External data are obtained from sources outside of an organization. This data can include 3) we___ data, ge___ic data, and pub__ available co___or data. External data are somewhat 4) ri___r to use since we often 5) ca__t know if the data are 6) ac___ or co___. External data can, however, provide 7) in___ that internal data alone 8) c___t p___e
After identifying the available and relevant data alternatives, the 9) cha__ of the possible data sets need to be 10) ve___ as 11) s___e for the planned analysis.
1) sales data, purchase data, inventory data, customer data, and vendor data
2) easily controlled and verified
3) weather data, geographic data, and publicly available competitor data
4) riskier
5) cannot
6) accurate or complete
7) insights
8) cannot provide
9) characteristics
10) verified
11) suitable
Define Data Set
A 1) co____ of data 2) col___ and r___ available for analysis
1) collection
2) columns and rows
Understanding the 1) cha____ of a data set is important because, for example, 1.1) st___al me___ and t__s often require certain data characteristics or a minimum of data points.
2) Vi___ the data requirements for these measures and tests can threaten the 3) ac___, re___y, and sig____e of the analysis results.
1) characteristics
1.1) statistical measures and tests
2) Violating
3) accuracy, reliability, and significance
Define Fields
1) In____l columns representing the 2) cha___s about each 3) r___ stored in the columns of a data set
1) Individual columns
2) characteristics
3) record
Define Attributes
The data fields that describe aspects of a 1) re___, e___t, or a___t of the object of 2) in___.
When the data source is a database, they are the 3) co__ in a data set.
1) resource, event, or agent
2) interest
3) columns
Define Records
1) R___ in a data set from a database are records, which represent the collection of 2) co____ that hold the 3) descriptions of a 4) s___e oc_____ of the data set’s 5) pu____
1) Rows
2) columns
3) descriptions
4) single occurrence
5) purpose
In addition to understanding the content of data fields in accounting databases, considering the 1) so__ of the data is important because the 2) q____ of the data in the 3) f___s impacts the 3.1) q___ of the 4) a__s
1) source
2) quality
3) fields
3.1) quality
4) analysis
What are the 2 types of raw data fields?
-____d raw data
-N____-___ed raw data
-Measured raw data
-Non-measured raw data
Define Measured Raw Data
Data 1) cr____ or ca___d by a 2) con____ process capturing the 3) v__ of the data.
Examples include 4) p___e, c___t, n___r on hand, w___t, d___, h____s worked, temp___, sensor r___s, h____ observation, or ec____c value.
Their format can be 5) di___ or co___ data.
1) created or captured
2) controlled
3) value
4) price, cost, number on hand, weight, date, hours worked, temperature, sensor readings, human observation, or economic value
5) discrete or continuous
Measured raw numeric data
can be used in ma___l cal__
mathematical calculations
Define Non-measured raw data
Data often created 1) au___ by the 2) co___ or com___y policy for 3) c___l.
Examples include 4) ide____ codes, chart of ac___t nu___, standard d____s, product cat___ c__s, or loc___ codes such as city.
These fields are typically formatted as 5) di___ data
1) automatically
2) computer or company
3) control
4) identification codes, chart of account numbers, standard descriptions, product category codes, or location codes such as city
5) discrete
Non-measured raw data
Non-measured raw data that have been formatted as 1) nu___ and di___e, meaning that they are 2) ___n-con____ and will 3) n__r have 4) pa___ u___t v__.
These fields 5) ca___ be used in analysis for 6) mat____ cal___s because they refer to 7) u___e ide___ers for 8) in____ items and their 9) pro____category group.
1) numeric and discrete
2) non-continuous
3) never
4) partial unit values
5) cannot
6) mathematical calculations
7) unique identifiers
8) inventory items
9) product category
Define Calculated Data
Data created when 1) o__ or m___ fields in a particular 2) r____ (r___) have any number of 3) ma___l operators (such as +, −, ×, %) applied, and often are 4) d___d from using the data in another 5) fi___ or fi__ within the 6) sa__ record (7) n___ across 8) di____ rows).
These fields can be formatted as 9) d___e or co___ data.
1) one or more
2) record (row)
3) mathematical operators
4) derived
5) field or fields
6) same
7) never
8) different
9) discrete or continuous
Calculated data are also 1) nu__ and likely to be used in a variety of 2) dif___ mat___ ca___s in accounting.
1) numeric
2) different mathematical calculations
Define measurement scale
refers to the type of 1) in___ pr___ by the data.
Data measurement scales should be considered when 2) de___ the data strategy, as they impact which analyses can be 3) pe___ on the data.
1) information provided
2) designing
3) performed
What are the 4 types of data measurement scales?
-ca____
-o___l
-i____l
-r___
-categorical
-ordinal
-interval
-ratio
Define Categorical (nominal) Data
1) La____ or n____d data that can be sorted into 2) gr___s according to 3) sp___ ch___s. The data 4) d__ n___ have a 5) qu___ value
1) Labeled or named
2) groups
3) specific characteristics
4) do not
5) quantitative
Define Ordinal Data
1) Or___ or r__d ca___al data. 2) Di___ between the categories 3) d___ n__ need to be 4) k___ or e___
1) Ordered or ranked categorical
2) Distance
3) does not
4) known or equal
Define Interval Data
1) O___ data that have 2) e__l and con___ differences between 3) o___ns and an ar___ z___ po___t
1) Ordinal
2) equal and constant
3) observations and an arbitrary zero point
Define Ratio Data
1) Int__ data with a 2) n___l z__ p__t. A natural zero point means that it’s 3) n__ ar___
1) Interval
2) natural zero point
3) not arbitrary.
Consider Data Strategy Risks and Implement Controls
Another benefit of documenting data strategy in a project plan is that examining each choice can help identify three common data 1) ri___ that might affect an analysis’s 2) v___, a____, and r___ty
1) risks
2) validity, accuracy, and reliability
What are the 3 common data risks? (summarized)
- N___-re____e sa___ sel____
- O___ data points
- D__Data
- Non-representative sample selections
- Outlier data points
- Dirty Data
What are the recommended controls for the 3 data risks?
- Non-representative sample selections
-1) v___ rep____s of sample
- Outlier data points
-perform a 2) h___m or quartile analysis to identify 3) ou___, then 4) ex__ or justify the 5) r___ used for outlier 6) ad___t or re__l
- Dirty Data
-7) v___ in___y of data set and cl___ u__ dirty data is___
1) verify representativeness of sample
2) histogram
3) outliers
4) explain or justify
5) rule
6) adjustment or removal
7) verify integrity of data set and clean up dirty data issues
3 common data risks (more detailed)
The first risk is that a sample 1) ex____ from a 2) la___r po___ of data is a 3) po___ representation of the 4) un____ population.
That is why performing tests on the sample’s 5) repr____ can either 6) ve___ its 7) val___ or d___t its significant 8) we___
1) extracted
2) larger population
3) poor
4) underlying
5) representativeness
6) verify
7) validity or detect
8) weakness
3 common data risks (more detailed)
The second risk is potentially including unusual data points in an analysis.
Outlier data points are 1) un__ data points compared to the rest of the data in either an activity point (x-axis or the independent variable), such as number of units produced, or an unusual level of the economic value (y-axis or the dependent variable), such as total costs or total sales.
The best control for identifying outliers is to 2) vi___ the data in a graph to check if the measure is very different than the rest of the data.
Identified outliers can either be 3) re___ (if possible) or 4) re___ with a logical and documented rule.
1) unusual
2) visualize
3) remeasured
4) removed
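Example (a sketch under stated assumptions): a quartile analysis that flags outliers with a documented rule. The 1.5 x IQR fence is a common convention assumed here, not a rule given on these cards, and the cost figures are invented.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, k=1.5)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical total costs per period; one value is far from the rest.
total_costs = [100, 102, 98, 101, 99, 103, 97, 250]
print(find_outliers(total_costs))
```

Flagged points still need judgment: remeasure them if possible, or remove them and document the rule that was applied.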
Define Dirty Data
variety of data ___
provides 1) in___ or inc___e descriptions of the economic 2) ac___ of a business
errors
1) inaccurate or incomplete
2) activities
3 common data risks (more detailed)
Dirty Data
Dirty data include 1) m__g, in___, du__e, and inc___ formatted data.
For example, the purchase orders, invoices, or checks written by the company should be sequentially accounted for, with no missing numbers or two with the same identifying number.
Controls should 2) a__s t__t for dirty data 3) be__ the analysis is performed.
Data can be 4) co___ to its source 5) doc____, or if that is not possible, tested for 6) rea____. Testing for reasonableness can include checking for 7) m___g seq___ num__, 8) du___ numbers, or 9) ve___ the data have the ex___ fo___t or ac___ble characters.
1) missing, invalid, duplicate, and incorrectly
2) always test
3) before
4) compared
5) documents
6) reasonableness
7) missing sequence numbers
8) duplicate numbers
9) verifying the data have the expected format or acceptable characters.
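Example (a minimal sketch): the three reasonableness tests from this card written as small checks. The invoice numbers, the ID strings, and the `INV-` plus four digits format are hypothetical.

```python
import re
from collections import Counter


def missing_sequence_numbers(numbers):
    """Numbers absent between the smallest and largest value seen."""
    present = set(numbers)
    return [n for n in range(min(present), max(present) + 1) if n not in present]


def duplicate_numbers(numbers):
    """Numbers that appear more than once."""
    counts = Counter(numbers)
    return sorted(n for n, count in counts.items() if count > 1)


def badly_formatted(ids, pattern=r"INV-\d{4}"):
    """IDs that do not match the expected format (the pattern is an assumption)."""
    return [i for i in ids if not re.fullmatch(pattern, i)]


# Hypothetical extracted data.
invoice_numbers = [1001, 1002, 1004, 1004, 1006]
invoice_ids = ["INV-1001", "INV-1002", "inv 1004", "INV-10O6"]

print(missing_sequence_numbers(invoice_numbers))
print(duplicate_numbers(invoice_numbers))
print(badly_formatted(invoice_ids))
```

The last ID contains the letter O instead of a zero, which a format check catches even though it is easy to miss by eye.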
Once data have been selected, evaluated, and any risks identified and controlled, the next step is to develop an analysis 1) st___ ap___e for the 2) o___ of the project
1) strategy appropriate
2) objective
4.3 What Should We Consider When Selecting an Analysis?
What are the two questions when designing a data analysis strategy?
- Can the ch___ a___is strategy an__r the sp__c ob___ qu___s?
- Is the me____ sc___ of the data ap____ for the selected analysis strategy?
- Can the chosen analysis strategy answer the specific objective questions?
- Is the measurement scale of the data appropriate for the selected analysis strategy?
The specific type of descriptive and diagnostic analysis used depends on the 1) me____ s__ of the data
1) measurement scale
1) Des____ analyses are the most common analysis strategies using accounting data.
It is useful for 2) ev____g str____ pe____e because it provides more 3) me_____ and b____ss in____
1) Descriptive
2) evaluating strategy performance
3) meaning and business intelligence
Diagnostic analysis strategies can be compared to detective work. Accountants can use these analysis strategies to identify and discover the most likely 1) ca__s of accounting phenomena
Diagnostic strategies always require applying 2) cr___l thi___ skills.
1) causes
2) critical thinking
Define Variable
a data 1) it__ that will be 2) us___ in the analysis
1) item
2) used
Descriptive and diagnostic analyses help to better understand 1) w___ happened and 2) w__ it happened.
What if the objective is to predict a 3) fu___ outcome or determine what strategies 4) mi___t achieve a specific outcome? In those cases, we need to use 5) pre____ or pr____e analyses
1) what
2) why
3) future
4) might
5) predictive or prescriptive
Predictive analysis strategies use 1) hi___l data to create models that estimate a 2) fu___ va___e or o___e.
Prescriptive analysis strategies, on the other hand, help determine which 3) op___ is most likely to produce the 4) b___ ou___e given the objective.
Predictive and prescriptive analysis strategies can determine the best use of 5) res___, im___e pe___ce, anticipate market re__s and mo___s, and av___ process breakdowns:
1) historical
2) future value or outcome
3) option
4) best outcome
5) resources, improve performance, anticipate market reactions and movements, and avoid process breakdowns
The last two steps of a data analysis plan are to consider 1) ri___ in__t to the data and analysis 2) str___ and to put in place 3) co___s to 4) re___ those risks
1) risks inherent
2) strategies
3) controls
4) reduce
It is critical to consider these risks during analysis strategy planning and to add controls to help 1) m___ them.
Without those controls we risk 2) pre___, inter___g, and rep____ inc___t analysis res___, possibly causing ourselves or our stakeholders to make 3) har__ dec___s
1) mitigate
2) preparing, interpreting, and reporting incorrect analysis results
3) harmful decisions
4.4 How Do Data and Analysis Strategies Differ Across Practice Areas?
As the last step of the data analysis planning stage, accounting professionals across practice areas 1) d___ data and analysis 2) str___ based on their project 3) ob___s
1) design
2) strategies
3) objectives
Accounting Information Systems
Because a company’s accounting information system is involved in 1) pla___, ex___, con___, and rep___ the business’ operations, data analysis projects in this area often involve 2) in____ary data and 3) kn___
1) planning, executing, controlling, and reporting
2) interdisciplinary
3) knowledge
Accounting Information Systems
These project objectives can range in scope from impacting just one or two employees to involving most of the organization.
All four types of analytics objectives (1) de___e, diag___, pr___e, and pre____) are commonly used.
1) descriptive, diagnostic, predictive, and prescriptive
Accounting Information Systems
The data analyzed in these projects can include a variety of IT operational data with different levels of data 1) inte___ and co__l doc___n.
More traditional accounting data are often analyzed with 2) qua___ cat____l and or____ data:
-Counts of h___ tickets and inc____
-Er___ counts and d___y times
-L___n issues
-System s____n and _ service satisfaction
1) integrity and control documentation
2) qualitative categorical and ordinal
-Counts of help tickets and incidents
-Error counts and delay times
-Login issues
-System satisfaction and IT service satisfaction
Accounting Information Systems
AIS data analyses projects may also choose data strategies which use 1) qua____ int___l and ra___ data:
Costs such as equipment and software installations and updates, and IT staff labor.
Budget variances for any of these costs
1) quantitative interval and ratio
Accounting Information Systems
It is important to select data that captures the AIS system’s 1) per__, vuln___, and er___
1) performance, vulnerabilities, and errors
Accounting Information Systems
Accountants designing analysis strategies for AIS projects are focused on increasing 1) co___ve adv____ and improving the 2) or___’s op__s.
They often use 3) st___s to understand which 4) ele__ of the accounting system they should 5) f___ on. The mathematical formality of statistical tools can help to 6) pe___e management to make the necessary investments.
1) competitive advantage
2) organization’s operations
3) statistics
4) element
5) focus
6) persuade
Accounting Information Systems
Risks
Auditing
Auditors use statistics to determine 1) r__s to account balances and 2) un__ transactions, especially in 3) jo___l en__s that involve estimates and assumptions.
They commonly work with different kinds of 4) d___ so___, from process documentation, journal entries, general and subsidiary ledgers, trial balances, to financial ratios
1) risks
2) unusual
3) journal entries
4) data sources
Auditing
Auditors use analysis strategies for:
-Continuous auditing modules to test large data populations rather than testing samples with inference risks.
-Automated identification of dirty data, unusual transactions, and pattern anomalies to better reduce audit risk and fraud risks.
-Testing entire transaction cycles’ internal controls with process mining and tracing purchasing to payment flows, P-cards (purchasing credit cards) usage and payroll documentation. Testing the revenue cycle from order to collection
Auditing
Auditors use analysis strategies for:
-Using robotic process automation (RPA) to remove the human (and often inconsistent) element of repetitive audit tasks, freeing auditors to focus on areas that require thoughtful critical thinking and judgments. (RPA applications and benefits to accountants are explained in the chapter that covers data and analysis developments in accounting.)
-Testing hypotheses about inventory and fixed assets with sampling, statistical tests, and inferences for the population of these assets.
Auditing
Risks
Financial Accounting
Financial accountants are primarily responsible for 1) cap____, re__g, pro___g, sto___, and re__g accounting information. 2) Ac__y, thoroughness, and documentation are essential.
The accounting data they use for analyses are both guided and restrained by accounting 3) re___ns and 4) go___ agency compliance
1) capturing, recording, processing, storing, and reporting
2) Accuracy
3) regulations
4) government
Financial Accounting
Due to the measurement scales of the variables they need for analysis, financial accountants may have to 1) tran___ data depending on their 2) ob___ q___ns.
These data most often include 3) cat___al, in__al, and ra___ data
1) transform
2) objective questions
3) categorical, interval, and ratio
Financial Accounting
The purpose of a data analysis strategy might be 1) des___ and di____ng based on corresponding financial accounting rules and regulations:
-2) Na___, ti___g, and aut____ of transactions (and diagnose 3) is___) charged to each account.
-4) In___l control effectiveness and weaknesses.
-The 5) com___ and rea___ of adjusting entries at the end of the period.
1) describing and diagnosing
2) Nature, timing, and authorization
3) issues
4) Internal
5) completeness and reasonableness
Financial Accounting
Financial accountants also use analysis strategies designed for their 1) pre___ and pr____e objectives regarding financial outcomes for managers and board members. One example is 2) es___g additional information to be presented along with their financial statements, such as:
-Ranking alternative capital sources and costs of capital in terms of favorability.
-Future net income and cash flows from operations, investing, and financing activities.
-Impacts of new strategies on the financial statement.
-Expected future costs associated with contingent liabilities, pension costs, new business units or the discontinuation of business units
1) predictive and prescriptive
2) estimating
Financial Accounting
Financial accountants use statistics to identify 1) opp___s and pr___s with 2) pr___, li___y, and business va___
1) opportunities and problems
2) profitability, liquidity, and business valuation
Financial Accounting
Risks
Managerial Accounting
The purpose of most managerial accounting data analyses is to improve 1) pla___, op___al control, and de___-m___g to support an organization’s mission and strategies.
This type of data analysis is valuable for improving the 2) se____, ex___n, and ev____n of organizational strategy.
These kinds of analyses can also lead to decisions that give employees more access to information, which improves performance and, eventually, organizational culture and operation
1) planning, operational control, and decision-making
2) selection, execution, and evaluation
Managerial Accounting
Managerial accountants prepare analysis strategies, both for routine and ad hoc (one-time) objectives for each functional area of their organization. These strategies use data across measurement scales to describe, diagnose, predict, and prescribe strategy impacts:
-Identifying areas where innovation in business organization, policy and process will increase efficiencies, for example, to reduce non-value-added steps and delays.
-Identifying areas where innovation in business partnerships, operations, and internal controls will increase effectiveness.
-Increasing competitive advantage through many new business intelligence opportunities.
-Improving compliance with all legal and regulatory requirements
Managerial Accounting
Identifying risks to the data and analyses used in managerial accounting is important to 1) e__e analysis results are 2) ac___ and re__e
1) ensure
2) accurate and reliable
Managerial Accounting
Risks
Tax Accounting
Tax accountants have more information for their tax 1) pla___ and co___ce services, which improves their 2) jud____ and how they document and defend their positions to both clients and regulators.
1) planning and compliance
2) judgments
Tax Accounting
Perhaps the biggest effect of data analysis strategies on tax practice is the movement away from a dependency on 1) h__c data and toward a forward-looking, 2) v___-ad___ se__ perspective by using 3) pr___ and pre___g analytics strategies.
Examples of these predictive and prescriptive analytics include using complex data modeling to evaluate decision and position alternatives, which improves the value and accuracy of advice
1) historic
2) value-adding service
3) predicting and prescribing
Tax Accounting
Risks
5.1 What is Data Profiling?
Dirty Data
-can result in issues from 1) in__ pricing to an inability to detect 2) f__d to sending wrong bills to customers
-data preparation helps 3) a___d these 4) is__ and more
1) inaccurate
2) fraud
3) avoid
4) issues
Define Data Preparation
The process of 1) tran____ data into an analytical database by 2) pr___g, cle___, re__ng and in____g data prior to processing and analysis
It helps ensure 3) hi___-qu___ data and, therefore, improved 4) d___n-m__g.
1) transforming
2) profiling, cleaning, restructuring and integrating
3) high-quality
4) decision-making
Define Data Profiling
the process of 1) in____ data 2) q___y and st____re
1) investigating
2) quality and structure
What are the 3 parts of Data Profiling
- Inv___ the data q___ty
- Inv____ data str___e
- De___and inf___g
- Investigating the data quality
- Investigating data structure
- Deciding and informing
3 parts of Data Profiling
- Investigating data quality: Search for 1) an___s in the data. That is, are the data 2) di__?
- Investigating data structure: Find the 3) b__t way to 4) org___ the data and 5) im___ an__s.
- Deciding and informing: Make 6) dec___ about whether it is possible to address the identified 7) is___, what the 8) c__t of doing so would be, and consider the possible 9) cons___ if the issues are not 10) ad___.
1) anomalies
2) dirty
3) best
4) organize
5) improve analytics
6) decisions
7) issues
8) cost
9) consequences
10) addressed
Decisions made in the final phase will guide the 1) ex__, tran___, l__ (__) process by determining what must be 2) c___d
1) extract, transform, load (ETL)
2) changed
For now, keep in mind that data profiling detects data 1) iss__ and 2) __ corrects them.
1) issues
2) ETL (extract, transform, load)
Investigate Data Quality
When discussing the “quality” of the data we work with, we are referring to the 1) sui__ of using the data for 2) de___n-m___.
Assessing data quality identifies 3) fla___ values in the data set, which reveals if there are data that must be 4) cle___.
There are different methods for doing this:
-5) ru__-dr__ method
-6) exp___n and in___e
1) suitability
2) decision-making
3) flawed
4) cleaned
5) rule-driven method
6) exploration and inference
Investigate Data Quality
Rule-Driven Method
The rule-driven method is a 1) t__p-d___ approach.
A 2) log___ relationship, or 3) r___, is defined 4) a___g data and 5) te__ to determine if the data 6) con__ to it.
The number of rules that can be specified is nearly 7) unl___
1) top-down
2) logical
3) rule
4) among
5) tested
6) conform
7) unlimited
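Example (a sketch, not a prescribed implementation): the rule-driven method expressed as named rules tested against each record. The field names, the three rules, and the sample records are all hypothetical.

```python
from datetime import date

# Each rule is a logical relationship the data should conform to (illustrative rules).
rules = {
    "quantity is positive": lambda r: r["quantity"] > 0,
    "ship date not before order date": lambda r: r["ship_date"] >= r["order_date"],
    "total equals quantity x unit price": lambda r: abs(
        r["total"] - r["quantity"] * r["unit_price"]
    ) < 0.005,
}


def violations(records, rules):
    """Return (record index, rule name) for every rule a record fails."""
    found = []
    for index, record in enumerate(records):
        for name, rule in rules.items():
            if not rule(record):
                found.append((index, name))
    return found


# Hypothetical sales records.
sales = [
    {"quantity": 2, "unit_price": 10.0, "total": 20.0,
     "order_date": date(2024, 1, 5), "ship_date": date(2024, 1, 7)},
    {"quantity": -1, "unit_price": 5.0, "total": -5.0,
     "order_date": date(2024, 1, 6), "ship_date": date(2024, 1, 8)},
    {"quantity": 3, "unit_price": 4.0, "total": 15.0,
     "order_date": date(2024, 1, 10), "ship_date": date(2024, 1, 9)},
]

for index, name in violations(sales, rules):
    print(f"record {index} fails rule: {name}")
```

Because rules are just entries in a dictionary, more can be added without changing the testing loop, which fits the idea that the number of rules is nearly unlimited.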
Investigate Data Quality
Exploration and Inference Methods
The exploration and inference methods are 1) bot___-__ approaches. The goal is to find 2) an___s by 3) exa___ the data from many different 4) per____.
Sorting, 5) fre___ distributions, and 6) o___r analysis are examples of powerful techniques for exploration purposes.
The second bottom-up approach, inference, is a method that relies on 7) co___ algorithms to identify 8) an___s
EX. sorting and frequency distributions
1) bottom-up
2) anomalies
3) examining
4) perspectives
5) frequency
6) outlier
7) computer
8) anomalies
Rule-driven, exploration, and inference methods identify data 1) ano__, which occur when the data 2) d_ n_t meet expectations of 3) cor___, val__, co___y, and com___
1) anomalies
2) do not
3) correctness, validity, consistency, and completeness
Define Validation Rules
An 1) in___l part of data profiling, these rules define what 2) va__ are and are 3) n__ acceptable for analysis when investigating data 4) q___
1) integral
2) values
3) not
4) quality
What are Data quality characteristics?
-cor__
-va____
-inc___t value, inv___ value
-correctness
-validity
-incorrect value, invalid value
Correctness
Data describe facts about entities, such as the name of a customer, the price of a product, or the date of a transaction.
Data are incorrect when the value assigned to an entity’s characteristic is wrong
Validity
An incorrect value means the 1) wr__ value is assigned, and an invalid value means an 2) un____ value is assigned
1) wrong
2) unacceptable
Consistency
In addition to being correct and valid, data should be 1) con___
Data inconsistency occurs when the 2) s___e cha__c is represented 3) m__ ways
1) consistent
2) same characteristic
3) multiple
Using mgr, mngr, and manager to describe the same job position would create data in___
inconsistency
Consistency
How can we identify inconsistencies like these?
Here are two profiling techniques:
-Create a 1) li__ with all distinct 2) val___, then 3) s___ and review. The inconsistent values of mgr, mngr, and manager will likely be noticeable immediately.
-Build a 4) fre__ table, or a table that 5) co___ how many times a 6) v__e occurs. Values with a 7) l___ frequency might indicate 8) inc___ data.
Which is the best option?
1) list
2) values
3) sort
4) frequency
5) counts
6) value
7) low
8) inconsistent
the second one is the best option
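Example (a minimal sketch): building a frequency table to spot inconsistent values. The job titles are invented, and treating a count of 1 as "low frequency" is an assumed threshold.

```python
from collections import Counter

# Hypothetical job-title field with inconsistent spellings of "manager".
titles = ["manager", "manager", "mgr", "clerk", "manager", "clerk", "mngr", "clerk"]


def frequency_table(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


def low_frequency_values(values, threshold=1):
    """Values occurring at most `threshold` times (threshold is an assumption)."""
    return sorted(v for v, count in Counter(values).items() if count <= threshold)


for value, count in frequency_table(titles):
    print(f"{value:10} {count}")
print(low_frequency_values(titles))
```

Low-frequency values are only candidates for review; a rare value can also be legitimate.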
Completeness
Data that are correct, valid, and consistent are only 1) ac___ if they are also 2) co___e, as incomplete data might result in 3) dis___ insights
1) accurate
2) complete
3) distorted
Completeness
Data can be incomplete in two ways.
- A missing instance is when a 1) co__t occurred but is not 2) rec___, (nothing there) such as when goods were sold, but the sales transaction was not recorded. A missing instance such as a missing sales transaction can be identified with gap analysis. If there is a sequential number for each sales invoice, then a missing number might indicate a sale that occurred but was not recorded.
- A missing value occurs when the 3) tra___ is recorded, but we 4) __ n___ have information for all 5) cha__. (we have some info, but not complete) The result is empty cells. This might happen if a customer is recorded, but the record does not include the customer’s email address. The term null typically indicates a missing or unknown value. It is important to assess how the missing information affects decision-making
1) concept
2) recorded
3) transaction
4) do not
5) characteristics
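Example (a sketch under stated assumptions): counting missing values per field. The customer records are hypothetical, and treating an empty string the same as null (`None`) is an assumption that may not suit every data set.

```python
# Hypothetical customer records; None and "" both stand for a missing value here.
customers = [
    {"id": 1, "name": "Acme", "email": "ap@acme.example"},
    {"id": 2, "name": "Birch", "email": None},
    {"id": 3, "name": "Cedar", "email": ""},
]


def missing_value_counts(records):
    """Count, per field, how many records have a null or empty value."""
    counts = {}
    for record in records:
        for field, value in record.items():
            if value is None or value == "":
                counts[field] = counts.get(field, 0) + 1
    return counts


print(missing_value_counts(customers))
```

The counts show where information is missing; deciding whether that affects decision-making is still a judgment call.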
Investigate Data Structure
Along with their quality, data are investigated to assess whether they are 1) str___ in a way that makes data analytics 2) e__ and ef___t
1) structured
2) easy and efficient