Data Analysis & Reporting Tools Flashcards

1
Q

Which of the following steps takes place during the preparation phase of a data analysis engagement?

A. Determining whether predication exists

B. Articulating examination objectives

C. Building a profile of potential frauds

D. Cleansing and normalizing the data

A

D. Cleansing and normalizing the data

Articulating the examination objectives, determining whether predication exists, and building a profile of potential frauds are all steps of the planning phase of the data analysis process, which is the first phase that should be undertaken. The second phase of the data analysis process is the preparation phase. The results of a data analysis test will only be as good as the data used for the analysis. Thus, before running tests on the data, the fraud examiner must make certain the data being analyzed are relevant and reliable for the objective of the engagement. During the preparation phase of the data analysis process, the fraud examiner must complete several important steps, including:
• Identifying the relevant data
• Obtaining the requested data
• Verifying the data
• Cleansing and normalizing the data
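The cleansing and normalizing step can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the record layout, field names, and formatting rules are all assumptions, standing in for data pulled from different source systems.

```python
# Hypothetical sketch of cleansing/normalizing: standardize vendor names
# and coerce differently formatted amounts to one numeric representation.
def normalize_record(record):
    # Collapse whitespace, unify case, strip currency formatting
    return {
        "vendor": " ".join(record["vendor"].split()).upper(),
        "amount": round(float(str(record["amount"]).replace(",", "").replace("$", "")), 2),
    }

raw = [
    {"vendor": "  acme  Supply ", "amount": "$1,250.50"},
    {"vendor": "ACME SUPPLY", "amount": 1250.5},
]
cleaned = [normalize_record(r) for r in raw]
```

After normalization, the two superficially different records compare as equal, which is exactly what downstream duplicate or join tests depend on.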
2
Q

Which of the following is a limitation of Benford’s Law?

A. Benford’s Law applies best to data sets with three-digit numbers.

B. Benford’s Law can only be applied to data sets listed in U.S. dollars.

C. Benford’s Law only works on data sets with assigned numbers, such as bank account or telephone numbers.

D. Benford’s Law cannot be applied to data sets with non-natural numbers, such as check or invoice numbers.

A

D. Benford’s Law cannot be applied to data sets with non-natural numbers, such as check or invoice numbers.

Benford’s Law distinguishes between natural and non-natural numbers, and it is important to understand the difference between the two types because Benford’s Law cannot be applied to data sets with non-natural numbers. Natural numbers are those numbers that are not ordered in a particular numbering scheme and are not human-generated or generated from a random number system. For example, most vendor invoice totals will be populated by dollar values that are natural numbers. Conversely, non-natural numbers (e.g., employee identification numbers and telephone numbers) are designed systematically to convey information that restricts the natural nature of the number. Any number that is arbitrarily determined, such as the price of inventory held for sale, is considered a non-natural number.
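A first-digit test under Benford's Law can be sketched as follows. The expected share of leading digit d is log10(1 + 1/d); the observed shares come from the data set. The sample amounts are invented for illustration, and the digit extraction assumes positive values in plain decimal notation.

```python
import math
from collections import Counter

def benford_expected(d):
    # Expected share of leading digit d (1-9) under Benford's Law
    return math.log10(1 + 1 / d)

def first_digit_shares(amounts):
    # Observed share of each leading digit in a set of positive values
    digits = [int(str(a).lstrip("0.")[0]) for a in amounts]
    total = len(digits)
    counts = Counter(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}
```

Comparing observed shares against the expected curve (digit 1 should lead about 30.1% of natural amounts, digit 9 only about 4.6%) flags data sets whose first digits deviate suspiciously, provided the underlying numbers are natural.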

3
Q

A fraud examiner is conducting textual analytics on journal entry data and runs a keyword search using the terms override, write off, and reserve. With which leg of the fraud triangle are these fraud keywords typically associated?

A. Rationalization

B. Opportunity

C. Pressure

D. Capability

A

B. Opportunity

In conducting a textual analytics examination, the fraud examiner should come up with a list of fraud keywords that are likely to point to suspicious activity. This list will depend on the industry, the suspected fraud schemes or types of fraud risk present, and the data set the fraud examiner has available. In other words, if he is running a search through journal entry details, he will likely search for different fraud keywords than if he were running a search of emails.

The factors identified in the fraud triangle are helpful when coming up with a fraud keyword list. One of these factors is opportunity; consequently, the fraud examiner should consider how someone in the entity might have the opportunity to commit fraud. Examples of keywords that indicate the opportunity to commit fraud include override, write off, recognize revenue, adjust, discount, and reserve.

4
Q

Black, a fraud examiner, is conducting textual analytics on emails sent to and from specific employees that his client has identified as fraud suspects. He is using the fraud triangle to come up with a list of fraud keywords to use in his search. Which of the following words found in email text might indicate a fraudster is rationalizing his actions?

A. Quota

B. Override

C. Deserve

D. Write off

A

C. Deserve

In conducting a textual analytics examination, the fraud examiner should come up with a list of fraud keywords that are likely to point to suspicious activity. This list will depend on the industry, the suspected fraud schemes or types of fraud risk present, and the data set the fraud examiner has available. In other words, if he is running a search through journal entry details, he will likely search for different fraud keywords than if he were running a search of emails.

The factors identified in the fraud triangle are helpful when coming up with a fraud keyword list. One of these factors is rationalization; consequently, the fraud examiner should consider how someone in the entity might be able to rationalize committing fraud. Because most fraudsters do not have a criminal background, justifying their actions is a key part of committing fraud. Some keywords that might indicate a fraudster is rationalizing his actions include reasonable, deserve, borrow, and temporary.

Other keywords can be used to identify the other factors indicated by the fraud triangle. For example, write off and override would indicate opportunity to commit fraud, while deadline suggests pressure to commit fraud.

5
Q

Lee manages a house painting company. He notices paint expenses have increased substantially from the prior year, which is unexpected because the company had much less business this year and painted fewer houses. Which of the following data analysis functions would be the most useful in helping Lee determine the relationship between paint expense and houses painted?

A. Verifying multiples of a number

B. Correlation analysis

C. Stratification

D. Benford’s Law analysis

A

B. Correlation analysis

Using the correlation analysis function, investigators can determine the relationships between different variables in the raw data. Investigators can learn a lot about data files by examining the relationship between two variables. For example, we should expect a strong correlation between the following independent and dependent variables because a direct relationship exists between them:
• Hotel costs should increase as the number of days traveled increases.
• Gallons of paint used should increase as the number of houses painted increases.
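The strength of such a relationship is commonly measured with the Pearson correlation coefficient. Below is a minimal sketch; the figures for houses painted and gallons used are invented for illustration, not data from the scenario.

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation of paired observations: cov(x, y) / (sd_x * sd_y)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

houses_painted = [4, 7, 5, 9, 6]
gallons_used = [41, 70, 52, 88, 60]   # roughly 10 gallons per house
r = pearson_r(houses_painted, gallons_used)
```

A value of r near 1 confirms the expected direct relationship; if paint expense climbed while this correlation broke down, that divergence would merit investigation.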

6
Q

Which of the following is an example of a data analysis function that can be performed to help detect fraud through examination of payroll accounts?

A. Identify paycheck amounts over a certain limit.

B. Compare approved vendors to the cash disbursement payee list.

C. Generate depreciation to asset cost reports.

D. Compare customer credit limits and current or past balances.

A

A. Identify paycheck amounts over a certain limit.

The following are examples of data analysis queries that can be performed by data analysis software on payroll accounts to help detect fraud:
• Summarize payroll activity by specific criteria for review.
• Identify changes to payroll or employee files.
• Compare timecard and payroll rates for possible discrepancies.
• Prepare check amount reports for amounts over a certain limit.
• Check proper supervisory authorization on payroll disbursements.

7
Q

_____________ is a method of using software to extract usable information from unstructured data.

A. Benford’s Law

B. Textual analytics

C. The Fog Index

D. Linguistic analytics

A

B. Textual analytics

Textual analytics is a method of using software to extract usable information from unstructured text data. Through the application of linguistic technologies and statistical techniques—including weighted fraud indicators (e.g., fraud keywords) and scoring algorithms—textual analytics software can categorize data to reveal patterns, sentiments, and relationships indicative of fraud. For example, an analysis of email communications might help fraud examiners to gauge the pressures/incentives, opportunities, and rationalizations to commit fraud that exist in an organization.
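The weighted-indicator idea can be sketched as a toy scoring pass over message text. The keywords, weights, and sample emails below are all assumptions for illustration; production textual analytics tools apply far richer linguistic and statistical models.

```python
# Toy weighted fraud-keyword scoring over unstructured text.
FRAUD_KEYWORDS = {"write off": 3, "override": 3, "deserve": 2,
                  "borrow": 2, "deadline": 1}

def score_text(text, weights=FRAUD_KEYWORDS):
    # Sum the weights of every fraud indicator found in the text
    lowered = text.lower()
    return sum(w for kw, w in weights.items() if kw in lowered)

emails = [
    "I deserve a bonus and will just borrow from the account.",
    "Lunch at noon?",
]
scores = [score_text(m) for m in emails]
```

Ranking messages by score lets the examiner triage a large mailbox down to the communications most worth reading.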

8
Q

Which of the following is a data analysis tool that is effective in identifying indirect relationships and relationships with several degrees of separation?

A. Geospatial analysis

B. Word maps

C. Tree maps

D. Link analysis

A

D. Link analysis

Link analysis software is used by fraud examiners to create visual representations (e.g., charts with lines showing connections) of data from multiple data sources to track the movement of money; demonstrate complex networks; and discover communications, patterns, trends, and relationships.

Link analysis is very effective for identifying indirect relationships and relationships with several degrees of separation. For this reason, link analysis is particularly useful when conducting a money laundering investigation, since it can track the placement, layering, and integration of money as it moves through unexpected sources. It could also be used to detect a fictitious vendor (shell company) scheme. For instance, the investigator could map visual connections between a variety of entities that share an address and bank account number to reveal a fictitious vendor created to embezzle funds from a company.
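Underneath the visualization, link analysis is graph traversal: entities are nodes, and shared attributes (an address, a bank account) form the edges. The sketch below, with hypothetical entities, finds the degrees of separation between two parties via breadth-first search.

```python
from collections import deque

# Entities and their attributes; shared attributes create links.
links = {
    "Vendor A": {"123 Main St"},
    "Employee B": {"123 Main St", "Acct 9901"},
    "Shell Co C": {"Acct 9901"},
}

def build_graph(entity_attrs):
    # Connect entities that share at least one attribute
    graph = {e: set() for e in entity_attrs}
    for a in entity_attrs:
        for b in entity_attrs:
            if a != b and entity_attrs[a] & entity_attrs[b]:
                graph[a].add(b)
    return graph

def degrees_of_separation(graph, start, end):
    # Breadth-first search for the shortest chain of shared-attribute links
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == end:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

graph = build_graph(links)
```

Here "Vendor A" and "Shell Co C" share nothing directly, yet sit only two links apart through the employee — exactly the kind of indirect relationship link analysis surfaces.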

9
Q

Suppose you suspect there is a ghost employee scheme taking place in your organization and you want to compare the payroll records to the employee master file. Which data analysis technique would you use to match these two data records?

A. Correlation analysis

B. The join function

C. Compliance verification

D. Gap testing

A

B. The join function

The join function gathers the specified parts of different data files. Joining files combines fields from two sorted input files into a third file. Join is used to match data in a transaction file with records in a master file, such as matching invoice data in an accounts receivable file to a customer master file. For example, you might need to compare two different files to find records that differ between them.
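For the ghost employee scenario, the join reduces to matching payroll transactions against the employee master file on employee ID and keeping the non-matches. The records below are made up for illustration.

```python
# Dict-based sketch of joining a transaction file to a master file.
employee_master = {"E01": "Alice Ng", "E02": "Bob Diaz"}
payroll = [
    {"emp_id": "E01", "net_pay": 2400.00},
    {"emp_id": "E02", "net_pay": 2150.00},
    {"emp_id": "E99", "net_pay": 2600.00},  # no master record: possible ghost
]

def unmatched_payroll(payroll_rows, master):
    # Keep payroll rows whose employee ID has no match in the master file
    return [row for row in payroll_rows if row["emp_id"] not in master]

ghosts = unmatched_payroll(payroll, employee_master)
```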

10
Q

Text-based data is typically considered:

A. Documentary data

B. Unstructured data

C. Narrative data

D. Structured data

A

B. Unstructured data

Data are either structured or unstructured. Structured data is the type of data found in a database, consisting of recognizable and predictable structures. Examples of structured data include sales records, payment or expense details, and financial reports. Unstructured data, by contrast, is data that would not be found in a traditional spreadsheet or database. It is typically text-based.

11
Q

________ is a data analysis software function that allows users to relate several files by defining relationships in collected data, without the use of the join command.

A. Correlation analysis

B. Record selection

C. Multi-file processing

D. Verifying multiples of a number

A

C. Multi-file processing

Multi-file processing allows the user to relate several files by defining relationships between multiple files, without the use of the join command. A common data relationship would be to relate an outstanding invoice master file to an accounts receivable file based on the customer number. The relationship can be further extended to include an invoice detail file based on invoice number. This relationship will allow the user to see which customers have outstanding invoices sorted by date.
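The described relationship can be sketched with keyed lookups that chain the three files together without a physical join. The customer, invoice, and detail records are invented for illustration.

```python
# Relate customers -> invoices -> invoice details by key fields.
customers = {"C1": "Acme Supply", "C2": "Globex"}
invoices = [
    {"invoice_no": "INV-7", "customer_no": "C1", "date": "2024-03-01"},
    {"invoice_no": "INV-9", "customer_no": "C2", "date": "2024-03-15"},
]
details = [
    {"invoice_no": "INV-7", "amount": 500.0},
    {"invoice_no": "INV-7", "amount": 250.0},
    {"invoice_no": "INV-9", "amount": 900.0},
]

def outstanding_by_customer(customers, invoices, details):
    # Walk the defined relationships (by date) and total each customer's
    # outstanding invoice detail amounts
    totals = {}
    for inv in sorted(invoices, key=lambda i: i["date"]):
        name = customers[inv["customer_no"]]
        amount = sum(d["amount"] for d in details
                     if d["invoice_no"] == inv["invoice_no"])
        totals[name] = totals.get(name, 0.0) + amount
    return totals
```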

12
Q

Which of the following data analysis functions is most useful in testing for hidden journal entries?

A. Statistical sampling

B. Aging analysis

C. Gap testing

D. Identifying duplicates

A

C. Gap testing

Gap testing is used to identify missing items in a sequence or series, such as missing check or invoice numbers. It can also be used to find sequences where none are expected to exist (e.g., employee Social Security numbers). In reviewing journal entries, gaps might signal possible hidden entries.
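A basic gap test simply lists every number missing between the minimum and maximum observed values. The check numbers below are illustrative.

```python
def find_gaps(numbers):
    # Report values missing from the observed sequence
    present = set(numbers)
    return [n for n in range(min(present), max(present) + 1)
            if n not in present]

check_numbers = [1001, 1002, 1004, 1005, 1008]
gaps = find_gaps(check_numbers)
```

Each gap is then traced back to source documents: a voided check is benign, while an unexplained missing journal entry number may point to a hidden or deleted entry.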

13
Q

Which of the following is TRUE regarding textual analytics?

A. Textual analytics is used to figure out whether someone is lying or telling the truth based on context clues.

B. There is a universal list of fraud keywords to use when implementing textual analytics that is applicable to any fraud examination.

C. The purpose of performing textual analytics is to search for and find an admission of fraud that can be presented in court.

D. Textual analytics can be used to categorize data to reveal patterns, sentiments, and relationships indicative of fraud.

A

D. Textual analytics can be used to categorize data to reveal patterns, sentiments, and relationships indicative of fraud.

Textual analytics is a method of using software to extract usable information from unstructured text data. Through the application of linguistic technologies and statistical techniques—including weighted fraud indicators (e.g., fraud keywords) and scoring algorithms—textual analytics software can categorize data to reveal patterns, sentiments, and relationships indicative of fraud. For example, an analysis of email communications might help fraud examiners to gauge the pressures/incentives, opportunities, and rationalizations to commit fraud that exist in an organization. Textual analytics provides the ability to uncover additional warning signs of rogue employee behavior.

Depending on the type of fraud risk present in a fraud examiner’s investigation, he will want to come up with a list of fraud keywords that are likely to point to suspicious activity. This list will depend on the industry, the suspected fraud schemes, and the data set the fraud examiner has available. In other words, if he is running a search through journal entry details, he will likely search for different fraud keywords than if he were running a search of emails. It can be helpful to consider the three factors identified in the fraud triangle when coming up with a keyword list.

14
Q

Which of the following is an example of a data analysis function that can be performed to detect fraud through examination of accounts payable?

A. Sort asset values by asset type or dollar amount.

B. Identify debits to expense accounts outside of set default accounts.

C. Summarize cash disbursements by bank account.

D. Select samples for asset existence verification.

A

B. Identify debits to expense accounts outside of set default accounts.

The following are typical examples of data analysis queries that can be performed by data analysis software on accounts payable:
• Audit paid invoices for manual comparison with actual invoices.
• Summarize large invoices by amount, vendor, etc.
• Identify debits to expense accounts outside of set default accounts.
• Reconcile check registers to disbursements by vendor invoice.
• Verify vendor 1099 requirements.
• Create vendor detail and summary analysis reports.
• Review recurring monthly expenses and compare to posted/paid invoices.
• Generate a report on specified vouchers for manual audit or investigation.

15
Q

Scott, a fraud examiner, is concerned that employees are abusing their expense accounts and are spending more than the $30 per day allowed for meals. Which of the following is the most appropriate data analysis function for locating meal expenses greater than $30?

A. Gap testing

B. Duplicate search

C. Compliance verification

D. Multi-file processing

A

C. Compliance verification

Compliance verification determines whether company policies are met by employee transactions. If a company limits the amount of its reimbursements, the software can check to see that this limit is being observed. Many times, fraud examiners can find early indications of fraud by testing detail data for values above or below specified amounts. For example, when employees are out of town, do they adhere to company policy of spending not more than $30 per day for meals? To start, fraud examiners can look at all expense report data and select those with daily meal expenses exceeding $30. With the information returned from this simple query, there is a starting point for suspecting fraud.
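The simple query described — select expense rows above the policy limit — can be sketched as a filter. The expense rows are fabricated; only the $30 meal limit comes from the scenario.

```python
# Compliance check against a $30-per-day meal policy.
MEAL_LIMIT = 30.00

def over_limit(expenses, limit=MEAL_LIMIT):
    # Flag meal expenses exceeding the policy limit
    return [e for e in expenses
            if e["category"] == "meals" and e["amount"] > limit]

expenses = [
    {"employee": "Lopez", "category": "meals", "amount": 28.75},
    {"employee": "Chen", "category": "meals", "amount": 46.10},
    {"employee": "Chen", "category": "lodging", "amount": 140.00},
]
violations = over_limit(expenses)
```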

16
Q

Understanding the data, determining whether predication exists, and building a profile of potential frauds are all steps of which phase of the data analysis process?

A. The testing and interpretation phase

B. The planning phase

C. The preparation phase

D. The post-analysis phase

A

B. The planning phase

As with most tasks, proper planning is essential in a data analysis engagement. Without sufficient time and attention devoted to planning early on, the fraud examiner risks analyzing the data inefficiently, lacking focus or direction for the engagement, running into avoidable technical difficulties, and possibly overlooking key areas for exploration.

The first phase of the data analysis process is the planning phase. This phase consists of several important steps, including:
• Understanding the data
• Articulating examination objectives
• Building a profile of potential frauds
• Determining whether predication exists
17
Q

Which of the following is an example of a data analysis function that can be performed to help detect fraud through examination of asset accounts?

A. Recalculate expense and reserve amounts using replacement costs

B. Compare book and tax depreciation and indicate variances

C. Sort asset values by asset type or dollar amount

D. All of the above

A

D. All of the above

The following are examples of data analysis queries that can be performed by data analysis software on asset accounts to help detect fraud:
• Generate depreciation to cost reports.
• Compare book and tax depreciation and indicate variances.
• Sort asset values by asset type or dollar amount.
• Select samples for asset existence verification.
• Recalculate expense and reserve amounts using replacement costs.

18
Q

Why would a fraud examiner perform duplicate testing on data?

A. To determine whether company policies are met by employee transactions

B. To identify transactions with matching values in the same field

C. To determine the relationship between different variables in raw data

D. To identify missing items in a sequence or series

A

B. To identify transactions with matching values in the same field

Duplicate testing is used to identify transactions with duplicate values in specified fields. This technique can quickly review the file, or several files joined together, to highlight duplicate values of key fields. In many systems, the key fields should contain only unique values (no duplicate records).

For example, a fraud examiner would expect fields such as check numbers, invoice numbers, and Social Security numbers to contain only unique values within a data set; searching for duplicates within these fields can help the fraud examiner find anomalies that merit further examination.
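A duplicate test over a key field that should be unique can be sketched with a counter. The invoice numbers below are made up for illustration.

```python
from collections import Counter

def find_duplicates(values):
    # Return each value appearing more than once, with its count
    return {v: c for v, c in Counter(values).items() if c > 1}

invoice_numbers = ["A-100", "A-101", "A-102", "A-101", "A-103", "A-101"]
dupes = find_duplicates(invoice_numbers)
```

An invoice number billed three times, as here, is exactly the kind of anomaly that merits further examination.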

19
Q

During which phase of the data analysis process does the fraud examiner identify, obtain, and verify the relevant or requested data?

A. The post-analysis phase

B. The testing and interpretation phase

C. The preparation phase

D. The planning phase

A

C. The preparation phase

The second phase of the data analysis process is the preparation phase. The results of a data analysis test will only be as good as the data used for the analysis. Thus, before running tests on the data, the fraud examiner must make certain the data being analyzed are relevant and reliable for the objective of the engagement. During the preparation phase of the data analysis process, the fraud examiner must complete several important steps, including:
• Identifying the relevant data
• Obtaining the requested data
• Verifying the data
• Cleansing and normalizing the data