Class 6 & 7 - Diagnostic Analysis With Stat Review Flashcards

1
Q

What is the difference between a population and a sample?

A

Population - a group of phenomena having something in common
Parameter - a characteristic of a population (e.g., the population mean μ)
Sample - a subset of members of a population selected to represent that population
Statistic - a characteristic of a sample (e.g., the sample mean x̅)

2
Q

Describe the areas under the normal distribution curve within 1, 2, and 3 standard deviations of the mean.

A

68% of the observations are within 1 standard deviation of the mean.
95.4% of the observations are within 2 standard deviations of the mean.
99.7% of the observations are within 3 standard deviations of the mean.
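The percentages above can be reproduced from the normal curve itself. This is a minimal sketch using only the Python standard library; the function name `area_within` is made up for illustration.

```python
import math

def area_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.1%}")
```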

3
Q

Write the hypothesis testing formats. How are one-tailed and two-tailed tests set up for the sales example?

A

Null Hypothesis (H0): there is no significant difference between two populations, or the hypothesized relationship does not exist.
Alternative Hypothesis (HA): a hypothesis that is opposite of the null hypothesis, or a potential result that is expected.

Two-tailed tests:
H0: Saturday sales are equal to Sunday sales. (The difference is equal to zero.)
HA: Saturday sales are not equal to Sunday sales. (The difference is not equal to zero.)

One-tailed tests:
H0: Saturday sales are less than or equal to Sunday sales. (The difference is less than or equal to zero.)
HA: Saturday sales are greater than Sunday sales. (The difference is greater than zero.)

4
Q

How would you do hypothesis testing with a t-test?

A

Use the Student’s t-test to examine how “unusual” an outcome is.

  1. Compute the t-statistic, t = (M − μ_M) / S_M, where S_M = S / √N is the standard error of M. M and S are the sample mean and sample standard deviation, respectively, μ_M is the mean hypothesized under H0, and N is the sample size.
  2. Compare the t-statistic against the critical t-value (for right-tail target area):
    If t-statistic < critical t-value, do not reject the null hypothesis
    If t-statistic >= critical t-value, reject the null hypothesis
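The two steps above can be sketched in Python as follows. The sample data, the hypothesized mean of 3, and the function names are assumptions made for illustration; 1.895 is the one-tailed 5% critical value for 7 degrees of freedom taken from a standard t-table.

```python
import math
import statistics

def t_statistic(sample, hypothesized_mean):
    """t = (M - mu_M) / S_M, with S_M = S / sqrt(N)."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample standard deviation (divides by N - 1)
    standard_error = s / math.sqrt(n)
    return (m - hypothesized_mean) / standard_error

def reject_null_right_tail(t_stat, critical_t):
    """Right-tail rule: reject H0 when the t-statistic reaches the critical value."""
    return t_stat >= critical_t

# Hypothetical sample of 8 observations, testing H0: mean <= 3.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t_stat = t_statistic(sample, hypothesized_mean=3)
decision = reject_null_right_tail(t_stat, critical_t=1.895)  # df = 7, alpha = 5%, one-tailed
print(f"t = {t_stat:.4f}, reject H0: {decision}")
```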
5
Q

In hypothesis testing, what determines a critical t-value, and what are the common critical t-values? How do we find the critical values in Excel?

A

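The critical t-value depends on:
Degrees of freedom (df = N − 1, where N is the sample size)
Significance level (α) (e.g., 5% or 1%)
Common critical t-values (these correspond to df = 30):
One-tailed test:
α = 5%: 1.697
α = 1%: 2.457
Two-tailed test:
α = 5%: 2.042
α = 1%: 2.750
In Excel:
One-tailed test (right tail): =T.INV(1-α, df). Note that =T.INV(α, df) returns the left-tail value, which is the negative of the right-tail value.
Two-tailed test: =T.INV.2T(α, df)
α (alpha) = significance level (e.g., 0.05 for 5%)
df = degrees of freedom (N − 1)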
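For readers without Excel, the critical values can be approximated numerically. This is a rough sketch using only the standard library: it integrates the t density with Simpson's rule and searches for the cutoff by bisection. The function names are hypothetical, and it is not how Excel computes T.INV. The four values listed above match 30 degrees of freedom.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution (math.gamma overflows for very large df)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_area(t: float, df: int, steps: int = 1000) -> float:
    """P(T > t) for t >= 0, integrating the density on [0, t] with Simpson's rule."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_t(alpha: float, df: int, tails: int = 1) -> float:
    """Positive critical value; a two-tailed test puts alpha / 2 in each tail."""
    target = alpha / tails
    low, high = 0.0, 200.0
    for _ in range(50):                      # bisection search
        mid = (low + high) / 2
        if upper_tail_area(mid, df) > target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

for tails in (1, 2):
    for alpha in (0.05, 0.01):
        print(f"tails={tails}, alpha={alpha}: {critical_t(alpha, 30, tails):.3f}")
```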

6
Q

Explain how to use the p-value to decide whether to reject the null hypothesis in a t-test. Provide an Example:

Consider:
Data Given: Sample size N=100, t-statistic = 1.290, p-value = 0.09.

A

Step 1: Compare p-value with α (the significance level)
If p-value > α, do not reject the null hypothesis H0
If p-value ≤ α, reject the null hypothesis H0

Example:
Consider H0 (Saturday sales are less than or equal to Sunday sales) vs. HA (Saturday sales are greater than Sunday sales).
Data Given: Sample size N=100, t-statistic = 1.290, p-value = 0.09.
Since p-value = 0.09 > α = 0.05: We would conclude that “the test cannot reject the null hypothesis at the 5% significance level.”
The confidence level is calculated as 1 − α. If α = 5%, the confidence level is 95%, so we can also state:
The test cannot reject the null hypothesis (Saturday sales are less than or equal to Sunday sales) at the 95% confidence level.
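The decision rule in Step 1 is simple enough to write as a small function. This is an illustrative sketch; the function name is made up, and the p-value of 0.09 is the one given in the example.

```python
def p_value_decision(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 when the p-value is at or below alpha; otherwise do not reject."""
    if p_value <= alpha:
        return "reject H0"
    return "do not reject H0"

print(p_value_decision(0.09, alpha=0.05))   # the example's 5% significance level
print(p_value_decision(0.09, alpha=0.10))   # a looser 10% significance level
```

The same p-value leads to a different decision at the 10% level, which is why α should be chosen before looking at the result.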

7
Q

Explain Type I Error and Type II Error. Draw the Chart

A

CHART:
Decision Made | H0 is True | H0 is False
Reject H0 | Type I Error (α) | Correct (1 − β)
Accept H0 | Correct (1 − α) | Type II Error (β)

Type I Error (α): Rejecting a true null hypothesis (false positive)
Example: Saying a drug works when it does not

Type II Error (β): Accepting a false null hypothesis (false negative)
Example: Saying a drug does not work when it actually does

Correct Decisions:
1−α: Correctly accepting a true null hypothesis
1−β: Correctly rejecting a false null hypothesis

Power of the Test: The ability to detect a false null hypothesis (H0)
Power = 1 − β
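The chart can also be read as a lookup on two yes/no facts: whether H0 is actually true, and whether the test rejected it. A small sketch, with hypothetical function names and an assumed β of 0.20:

```python
def classify_outcome(h0_is_true: bool, h0_rejected: bool) -> str:
    """Map the four cells of the decision chart to their names."""
    if h0_is_true and h0_rejected:
        return "Type I error (false positive)"
    if not h0_is_true and not h0_rejected:
        return "Type II error (false negative)"
    return "correct decision"

def power(beta: float) -> float:
    """Power of the test: probability of rejecting a false H0."""
    return 1 - beta

print(classify_outcome(h0_is_true=True, h0_rejected=True))
print(classify_outcome(h0_is_true=False, h0_rejected=False))
print(power(0.20))   # beta = 0.20 is an assumed value
```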

8
Q

When do you use a one-tailed test or two-tailed test?

A

One-Tailed Test
When to Use: If the hypothesis involves direction (greater than, less than).

Two-Tailed Test
When to Use: If the hypothesis does not specify a direction. You are testing for a difference in either direction (greater or less).
Keywords: “not equal to”, “different from”, “changed”, “effect” (without specifying direction)

9
Q

What are the key goodness-of-fit measures for a regression model? How do you judge whether a coefficient is statistically significant?

A

Goodness-of-fit measures for the regression model:
R²
Adjusted R²
F-statistic (and its significance)

The estimated coefficient on an independent variable is statistically different from zero at the α significance level if:
|t-statistic| > critical t-value (given α and N),
or p-value < α

10
Q

Interpret the Regression Outputs: Interpret R-Square (R²):

A

R-Square (R²):

What it tells you: R² measures the proportion of variance in the dependent variable explained by the independent variables in the model.

Range: Between 0 and 1.
Closer to 1: A better fit (more variance explained).
Closer to 0: Poor fit (less variance explained).

Key Note: R² does not penalize for adding more variables, so it can be artificially high with many predictors.

Sentence Template:
“The R² value is [value], meaning that [value as a percentage]% of the variation in [dependent variable] is explained by the independent variables in the model.”
Example:
“The R² value is 0.6423, meaning 64.23% of the variation in college completion rates is explained by the predictors in the regression model.”
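To make the definition concrete, R² can be computed by hand as 1 minus the ratio of unexplained to total variation. This is a minimal sketch on made-up data (not the college completion data); the predicted values come from the least-squares line y = 2.2 + 0.6x for x = 1 to 5.

```python
def r_squared(actual, predicted):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(actual) / len(actual)
    ss_total = sum((y - mean_y) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total

actual = [2, 4, 5, 4, 5]                  # hypothetical dependent variable
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]     # fitted values from y = 2.2 + 0.6x
r2 = r_squared(actual, predicted)
print(f"The R² value is {r2:.4f}, meaning {r2:.2%} of the variation is explained.")
```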

11
Q

Interpret the Regression Outputs: Interpret Adjusted R-Square (R²):

A

Adjusted R-Square (Adjusted R²):
What it tells you: Adjusted R² modifies R² to account for the number of predictors. It penalizes for adding unnecessary variables (overfitting).
When to use: Use Adjusted R² instead of R² when comparing models with different numbers of predictors.
Key Note: Adjusted R² is never higher than R². Adding a variable never lowers R², but it raises Adjusted R² only if the variable improves the fit enough to offset the penalty for the extra predictor.
Sentence Template:
“The Adjusted R² is [value], which accounts for the number of predictors in the model. It indicates that [value as a percentage]% of the variance in [dependent variable] is explained after adjusting for the predictors.”
Example:
“The Adjusted R² is 0.6421, meaning 64.21% of the variation in college completion rates is explained by the predictors, adjusted for the number of variables in the model.”
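The size of the penalty is easier to see with the standard formula, Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of predictors. The card does not state this formula, and the inputs below are assumed values for illustration.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Same R² of 0.60 with one predictor, but different sample sizes.
small_sample = adjusted_r_squared(0.60, n_obs=5, n_predictors=1)
larger_sample = adjusted_r_squared(0.60, n_obs=50, n_predictors=1)
print(f"n = 5:  {small_sample:.4f}")
print(f"n = 50: {larger_sample:.4f}")
```

The penalty shrinks as the sample grows, which is why R² and Adjusted R² are nearly identical in large datasets.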

12
Q

Interpret the F-Statistic and Significance F (Overall Model Significance):

A

What it tells you: The F-statistic tests whether the regression model as a whole is significant.
Significance F (p-value): If p-value < α (e.g., 0.05), the model is statistically significant.

Key Note:
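A large F-statistic (relative to its critical value) and a small p-value indicate that the independent variables, taken together, explain a statistically significant share of the variation in the dependent variable.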

Sentence Template:
“The F-statistic is [value] with a p-value of [value]. Since the p-value is [less than/greater than] the significance level of 0.05, we conclude that the overall regression model is [significant/not significant].”
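As a rough illustration, the overall F-statistic can be derived from R² with the standard formula F = (R²/k) / ((1 − R²)/(n − k − 1)). The formula is not given on the card, and the inputs are assumed values; 10.13 is the 5% critical F value for (1, 3) degrees of freedom from a standard F-table.

```python
def f_statistic(r2: float, n_obs: int, n_predictors: int) -> float:
    """Overall F = explained variation per predictor / unexplained variation per residual df."""
    explained = r2 / n_predictors
    unexplained = (1 - r2) / (n_obs - n_predictors - 1)
    return explained / unexplained

f_value = f_statistic(0.60, n_obs=5, n_predictors=1)   # assumed inputs
critical_f = 10.13                                     # F-table value, alpha = 5%, df = (1, 3)
model_is_significant = f_value > critical_f
print(f"F = {f_value:.2f}, significant at 5%: {model_is_significant}")
```

Here an R² of 0.60 is still not significant, because five observations leave too few degrees of freedom.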

13
Q

Interpret the t-Statistic and p-Value (Variable Significance):

A

What it tells you: The t-statistic and p-value determine whether individual predictors are significant.
Steps:
Compare the p-value of each predictor to the chosen significance level (α = 0.05).
If p-value < 0.05, the variable is statistically significant.
Sentence Template:
“The t-statistic for [variable name] is [value], with a p-value of [value]. Since the p-value is [less than/greater than] 0.05, [variable name] is [significant/not significant] in explaining the [dependent variable].”
Example:
“The t-statistic for SAT_AVG is 47.74 with a p-value of 1.26E-25. Since the p-value is less than 0.05, SAT_AVG is a significant predictor of college completion rates.”
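For a regression with one predictor, the coefficient's t-statistic is the estimated slope divided by its standard error. The sketch below works through that on made-up data (not the SAT_AVG data); 3.182 is the two-tailed 5% critical t-value for 3 degrees of freedom from a standard t-table.

```python
import math

def slope_t_statistic(x, y):
    """Return (slope, standard error of the slope, t-statistic) for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(sse / (n - 2) / sxx)   # n - 2 residual degrees of freedom
    return slope, standard_error, slope / standard_error

x = [1, 2, 3, 4, 5]          # hypothetical predictor
y = [2, 4, 5, 4, 5]          # hypothetical outcome
slope, se, t_stat = slope_t_statistic(x, y)
is_significant = abs(t_stat) > 3.182   # two-tailed, alpha = 5%, df = 3
print(f"slope = {slope:.2f}, SE = {se:.4f}, t = {t_stat:.4f}, significant: {is_significant}")
```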

14
Q

What is the definition of Diagnostic Analysis?

A

Descriptive analytics answers the question, “What Happened?”
Diagnostic analytics takes it a step further by asking “Why did it happen?” and “What are the reasons for past results?”
Diagnostic analytics are performed to investigate the underlying reasons for past results that cannot be answered by simply looking at the descriptive evidence.

15
Q

What are the two broad types of Diagnostic Analysis?

A
  1. Identifying Anomalies and Outliers
    Look for unusual, unexpected results or transactions.
    Find out what might have occurred and why.
    Are they frauds, errors, or just extreme observations?
  2. Finding Patterns or Relationships among Variables
    Performing drill-down analytics: look for patterns in the data by examining correlations and summarizing data at different levels to understand why something happened.
    Performing statistical analyses: uncover patterns in the data or how data moves together.
16
Q

Identifying Anomalies and Outliers: Explain what an anomaly and an outlier are.

A

Anomaly – something that departs from the expected.
Outlier – a data point that lies outside its expected distribution (like an anomaly).
Business professionals expect certain levels of performance or outcomes from companies and various business processes.
When that doesn’t happen, they will conduct additional investigations to figure out why.
Are they frauds, errors, or outliers?

17
Q

Give an example of expectations vs anomaly and the explanation for that anomaly/outlier in financial accounting and managerial accounting.

A

Financial Accounting
Expectation: Past year’s performance or the performance of other firms in the same industry
Example of Anomaly: Profits tripled in the current year from the past.
Possible Explanation for the Anomaly/Outlier: New products are selling well and expenses are under control.

Managerial Accounting
Expectation: Standard cost is used for each foot of decking product produced.
Example of Anomaly: There is a large labor rate variance. The per hour charge of direct labor at the factory is much higher than anticipated.
Possible Explanation for the Anomaly/Outlier: Our employees had to put in overtime to complete the work. Overtime is paid at 1.5 times normal pay.

18
Q

Give an example of expectations vs anomaly and the explanation for that anomaly/outlier in audit (external), audit (internal), and tax.

A

Audit (External)
Expectation: IFRS is followed.
Example of Anomaly: An auditor found that its client capitalized research costs, contrary to IFRS.
Possible Explanation for the Anomaly/Outlier: The company bought long-term testing equipment for future research and argues it should be capitalized.
Audit (Internal)
Expectation: Segregation of duties (internal control) – two signers are required for checks over $50,000
Example of Anomaly: 15 checks greater than $50,000 only had 1 signer, violating their internal controls
Possible Explanation for the Anomaly/Outlier: There are only two possible signers for checks, and one was out on family leave for an extended period.
Tax
Expectation: Taxable income generally increases when net income increases
Example of Anomaly: Taxable income decreases, while net income increases.
Possible Explanation for the Anomaly/Outlier: Tax law changed making some types of income not subject to tax.

19
Q

What are the 8 Diagnostic Analytic Techniques?

A

Internal Controls Testing – Separation of Duties
Internal Controls Testing – Unusual Accounting Activity
Testing for Duplicate Transactions
Fuzzy Matching
Sequence Checks – Identifying Missing Checks
Bank Reconciliation
Variance Analytics (management accounting)
Benford’s Law

20
Q

Explain the diagnostic analytic techniques of Internal Controls Testing.

A

Internal Controls Testing – Separation of Duties

Duties are clearly assigned to individuals and segregated according to the tasks. The person who authorizes a transaction should not be the same individual as the one recording it, to prevent fraud and avoid errors.
Some checks were both signed and recorded by the same person. (Lab 7-1)

Internal Controls Testing – Unusual Accounting Activity
Most transactions are made and recorded during normal work hours and normal work days.
An unusual number of transactions occur on the weekends, on holidays, or at the end of the quarter. (Lab 7-2)
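Both internal-control tests amount to filtering the transaction list. The sketch below runs on a made-up check register; the names, dates, and field layout are assumptions, not the lab data.

```python
from datetime import date

# Hypothetical check register: (check number, signer, recorder, date)
checks = [
    (1001, "Avery", "Blake", date(2024, 3, 4)),   # Monday
    (1002, "Casey", "Casey", date(2024, 3, 5)),   # Tuesday, same signer and recorder
    (1003, "Avery", "Blake", date(2024, 3, 9)),   # Saturday
]

# Separation of duties: flag checks signed and recorded by the same person.
same_person = [num for num, signer, recorder, _ in checks if signer == recorder]

# Unusual activity: flag checks dated on a weekend (weekday() is 5 for Saturday, 6 for Sunday).
weekend = [num for num, _, _, when in checks if when.weekday() >= 5]

print("Signed and recorded by the same person:", same_person)
print("Dated on a weekend:", weekend)
```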

21
Q

Explain the diagnostic analytic techniques of Testing for Duplicate Transactions and Fuzzy Matching.

A

Testing for Duplicate Transactions
Each transaction is independent of (not identical to) any other similar transaction.
Why are there duplicates of some transactions in the financial reporting records? (Lab 7-7)
Fuzzy Matching
Fuzzy Matching is a way to compare two pieces of data (like addresses or names) to see if they are similar, even if they’re not exactly the same.
Addresses of the company vendors are independent of the company employees. Similarly, addresses of customers getting refunds are independent of the employee offering the refund.
Some vendor addresses are similar to employee addresses. (Lab 7-5)
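The duplicate-transaction test described above can be sketched by counting identical records. The payment data and the choice of fields that define a duplicate are assumptions for illustration.

```python
from collections import Counter

# Hypothetical payments: (vendor, invoice number, amount)
payments = [
    ("Acme Supply", "INV-100", 2500.00),
    ("Birch Lumber", "INV-207", 980.50),
    ("Acme Supply", "INV-100", 2500.00),
]

# Count identical records and keep those that appear more than once.
counts = Counter(payments)
duplicates = {record: n for record, n in counts.items() if n > 1}

for record, n in duplicates.items():
    print(f"{record} appears {n} times")
```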

22
Q

Explain the diagnostic analytic technique of Fuzzy Matching, provide an example, and explain how it works in Excel.

A

It is used to find potential equivalents when there is less than an exact fit.
Example: Some vendor addresses/names are similar to employee addresses/names.
What was the nature of these transactions?
Were they due to fraud, error, or just a coincidence?
Fuzzy Lookup: Setting the Similarity Threshold
Click Fuzzy Lookup and select the two tables to compare (e.g., Version 1 and Version 2).
The default similarity threshold is 0.5: pairs scoring above the threshold are reported as matches, and pairs scoring below it are not.
Raise or lower the threshold depending on how costly false positives are to you.
To catch fewer matches, raise the threshold; to catch more, lower it (useful for auditors).
Threshold set too low (too loose): too many matches
Type I errors (rejecting a true H0 of “No Fraud” too often)
Threshold set too high (too strict): too few matches
Type II errors (accepting a false H0 of “No Fraud” too often)
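The effect of the threshold can be illustrated outside Excel. This sketch swaps in Python's `difflib.SequenceMatcher` as the similarity score, which is not the algorithm the Excel Fuzzy Lookup add-in uses, so the scores will differ from Excel's. The addresses are made up.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity score between 0 and 1 (1 means identical), ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_matches(vendors, employees, threshold=0.5):
    """Return every vendor/employee pair whose similarity reaches the threshold."""
    return [(v, e) for v in vendors for e in employees if similarity(v, e) >= threshold]

# Hypothetical addresses
vendor_addresses = ["123 Main St", "77 Harbor Road"]
employee_addresses = ["123 Main Street", "9 Oak Ave"]

loose = fuzzy_matches(vendor_addresses, employee_addresses, threshold=0.5)
strict = fuzzy_matches(vendor_addresses, employee_addresses, threshold=0.9)
print("threshold 0.5:", loose)
print("threshold 0.9:", strict)
```

Raising the threshold can only shrink the list of matches, which is the trade-off between the two error types described above.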

23
Q

Explain the diagnostic analytic techniques of Sequence Checks – Identifying Missing Checks/Payments, and Bank Reconciliation.

A

Sequence Checks – Identifying Missing Checks/Payments
Checks are written in ascending order.
Why are some check numbers missing documentation? (Lab 7-6)
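A sequence check can be sketched as a search for gaps between the lowest and highest recorded check numbers. The check numbers below are made up.

```python
def missing_check_numbers(recorded):
    """Return the check numbers absent between the lowest and highest recorded numbers."""
    recorded_set = set(recorded)
    full_range = range(min(recorded_set), max(recorded_set) + 1)
    return [n for n in full_range if n not in recorded_set]

recorded_checks = [1001, 1002, 1004, 1007]   # hypothetical check register
print(missing_check_numbers(recorded_checks))
```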

Bank Reconciliation
Cash balance at the bank is the same amount as cash balance in the general ledger.
Examines differences between transactions recorded by the bank and the general ledger. (Lab 5-2 and Lab 7-3)
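A reconciliation can be sketched as a comparison of two record sets keyed by check number. The amounts are invented, and matching on check number alone is a simplifying assumption.

```python
# Hypothetical records: check number -> amount
ledger = {1001: 250.00, 1002: 75.25, 1003: 1200.00}
bank = {1001: 250.00, 1002: 57.25, 1004: 40.00}

# Items recorded on one side only (e.g., outstanding checks or unrecorded bank items).
only_in_ledger = sorted(set(ledger) - set(bank))
only_in_bank = sorted(set(bank) - set(ledger))

# Items on both sides whose amounts disagree (e.g., a transposition error).
amount_mismatches = {
    num: (ledger[num], bank[num])
    for num in ledger.keys() & bank.keys()
    if ledger[num] != bank[num]
}

print("Only in ledger:", only_in_ledger)
print("Only in bank:", only_in_bank)
print("Amount mismatches:", amount_mismatches)
```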

24
Q

Explain the diagnostic analytic techniques of Variance Analytics (management accounting) and Benford’s Law.

A

Variance Analytics (management accounting)
Budgeted cost of manufacturing product or providing service.
Why is the labor rate variance for direct labor unfavorable?
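The labor rate variance is commonly computed as (actual rate − standard rate) × actual hours. That formula is not stated on the card, and the rates and hours below are assumed values showing how overtime at 1.5 times normal pay produces an unfavorable variance.

```python
def labor_rate_variance(actual_rate, standard_rate, actual_hours):
    """Return the variance amount and whether it is favorable or unfavorable."""
    variance = (actual_rate - standard_rate) * actual_hours
    if variance > 0:
        label = "unfavorable"
    elif variance < 0:
        label = "favorable"
    else:
        label = "none"
    return variance, label

# Assumed figures: $20 standard rate, 100 hours worked, 40 of them at 1.5x overtime.
standard_rate = 20.0
total_pay = 60 * standard_rate + 40 * standard_rate * 1.5
actual_rate = total_pay / 100
result = labor_rate_variance(actual_rate, standard_rate, actual_hours=100)
print(result)
```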

Benford’s Law
The first digits of numbers in naturally occurring datasets follow an expected distribution.

Why does the first digit of some refunds depart from the distribution expected by Benford’s Law? (Lab 7-4)

Refunds may depart from Benford’s Law due to potential manipulation or artificial adjustments. For instance, refunds might be processed in rounded values, such as $10 or $50, rather than naturally occurring numbers. Additionally, fraudulent activity could cause larger starting digits to appear more frequently. Other reasons could include small sample size, system defaults, or non-natural data generation, which disrupt the expected Benford’s distribution.

25
Q

Show how you would use a Chi-Square test to evaluate the independence of two variables (e.g., actual vs expected) in a contingency table.

A

Step 1: Use Benford’s Law Formula
For each digit d (1 to 9), calculate the expected percentage using:
P(d) = log10(1 + 1/d)
Step 2: Count the Actual Occurrences
Count how many times each digit d appears as the first digit. Write these counts down in a table.
Step 3: Calculate the Actual %
To get the percentage for each digit: Actual % = (Count for digit / Total count) × 100
Step 4: Calculate the Expected Count
Using Benford’s Law percentage, calculate the expected count for each digit:
Expected Count = Benford’s Law % × Total Observations
Step 5: Chi-Square Test
Use the Chi-Square Formula to compare the actual and expected counts:
χ² = Σ (O_i − E_i)² / E_i, where O_i is the actual count for digit i and E_i is the expected count for digit i
Step 6: Interpret the Results
Use the Chi-Square result (with 9 − 1 = 8 degrees of freedom) to find the p-value.
Compare the p-value to α = 0.05
If p-value > 0.05 → No significant difference (data follows Benford’s Law). Cannot reject H0
If p-value ≤ 0.05 → Significant difference (potential irregularities or fraud). Reject H0
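Steps 1 to 5 can be sketched in Python as follows. Instead of computing a p-value, this sketch compares the statistic with 15.507, the 5% chi-square critical value for 8 degrees of freedom from a standard table, which gives the same decision. The function names and test data are assumptions for illustration.

```python
import math
from collections import Counter

def benford_expected():
    """Step 1: expected share for each first digit, P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(amount):
    """First nonzero digit of a nonzero number."""
    return int(str(abs(amount)).lstrip("0.")[0])

def benford_chi_square(amounts):
    """Steps 2 to 5: count first digits, compute expected counts, sum (O - E)^2 / E."""
    observed = Counter(first_digit(a) for a in amounts)
    total = len(amounts)
    chi2 = 0.0
    for digit, share in benford_expected().items():
        expected = share * total
        chi2 += (observed.get(digit, 0) - expected) ** 2 / expected
    return chi2

CRITICAL_5PCT_DF8 = 15.507                       # chi-square table value, alpha = 5%, df = 8

powers_of_two = [2 ** n for n in range(100)]     # a sequence known to track Benford's Law closely
rounded_refunds = [90] * 100                     # hypothetical refunds all starting with 9

for name, data in (("powers of two", powers_of_two), ("rounded refunds", rounded_refunds)):
    chi2 = benford_chi_square(data)
    print(f"{name}: chi-square = {chi2:.2f}, reject H0: {chi2 > CRITICAL_5PCT_DF8}")
```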

26
Q

When is Benford’s Law Useful and Not Useful?

A

Useful:
Mathematically Combined Numbers:
Example: Accounts receivable/payable (number sold × price).
Transaction-Level Data:
Example: Disbursements, sales, expenses.
Large Datasets (more observations → better results):
Example: Full year’s transactions.
Skewed Data with Higher Mean than Median:
Example: Most sets of accounting numbers.

Not Useful
Assigned Numbers:
Example: Check numbers, invoice numbers, zip codes.
Numbers Influenced by Human Thought:
Example: Prices set at $1.99, ATM withdrawals.
Firm-Specific Numbers:
Example: Accounts with set amounts, e.g., $100 refunds.
Built-In Minimum/Maximum Rules:
Example: Assets needing a threshold to record
No Transactions Recorded:
Example: Thefts, kickbacks, contract rigging.

27
Q

Explain how you would do Hypothesis Testing Using a Difference in Means

A

Purpose:
Comparing two groups to see if there’s a significant difference in their means.
Steps to Test Difference in Means:
Null Hypothesis (H₀): The means of two groups are equal (no difference).
Alternative Hypothesis (Hₐ): The means are not equal (two-tailed) or one is greater/lesser (one-tailed).

Example Questions:
Are Nordstrom’s average holiday-season sales higher than its average non-holiday-season sales?
H₀: Holiday Sales ≤ Non-Holiday Sales
Hₐ: Holiday Sales > Non-Holiday Sales (one-tailed test).
Is the average female executive salary lower than the male executive salary?
H₀: Female Salary ≥ Male Salary
Hₐ: Female Salary < Male Salary (one-tailed test).

Key Notes:
Use a t-test for comparing two means (independent samples).
Specify equal or unequal variances based on the problem setup.
Hypothesis Testing Using a Difference in Means: Excel Example
Steps to Calculate in Excel:
Go to Data > Data Analysis (requires Analysis ToolPak Add-In).
Choose t-Test: Two-Sample Assuming Unequal Variances.
Input Ranges:
Variable 1 Range: e.g., CEO Male Compensation.
Variable 2 Range: e.g., CEO Female Compensation.
Set Alpha (α) to 0.05 (default significance level).
Output Range: Select a cell to display results.
Click OK.
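The unequal-variances option corresponds to Welch's t-test. Its t-statistic and degrees of freedom can be sketched with the standard library as below; the two groups are made-up numbers, not compensation data, and the function name is hypothetical.

```python
import math
import statistics

def welch_t_test(group1, group2):
    """Return (t-statistic, degrees of freedom) for two samples with unequal variances."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)   # sample variances
    standard_error = math.sqrt(v1 / n1 + v2 / n2)
    t_stat = (statistics.mean(group1) - statistics.mean(group2)) / standard_error
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 / n1 + v2 / n2) ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t_stat, df

group_a = [10, 12, 14, 16, 18]   # hypothetical values for group 1
group_b = [8, 9, 10, 11, 12]     # hypothetical values for group 2
t_stat, df = welch_t_test(group_a, group_b)
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```

The resulting t-statistic is then compared with the critical t-value for those degrees of freedom, or its p-value is compared with α.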

28
Q

Explain how you would do Hypothesis Testing Using Regression

A

General Process of Hypothesis Testing with Regression:
State the Hypotheses:
Null Hypothesis (H₀): There is no relationship between the independent variable(s) and the dependent variable.
Example: H₀: Advertising expenses do not affect sales revenue.
Alternative Hypothesis (Hₐ): There is a relationship between the independent variable(s) and the dependent variable.
Example: Hₐ: Advertising expenses positively affect sales revenue.

Set up the regression equation:
Y = β₀ + β₁X + ε
Y: Dependent variable (e.g., Sales Revenue)
X: Independent variable (e.g., Advertising Expenses)
β₀: Intercept (baseline)
β₁: Slope (relationship between X and Y)
ε: Error term
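Estimating β₀ and β₁ by ordinary least squares can be sketched as below. The advertising and sales figures are invented for illustration, and the function name is hypothetical.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares estimates of the intercept (b0) and slope (b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

advertising = [1, 2, 3, 4, 5]        # hypothetical advertising expenses (X)
sales = [12, 15, 19, 22, 27]         # hypothetical sales revenue (Y)
b0, b1 = fit_simple_regression(advertising, sales)
print(f"Estimated equation: Y = {b0:.2f} + {b1:.2f}X")
```

A positive slope estimate is consistent with the alternative hypothesis, but its t-statistic or p-value still has to be checked before rejecting H₀.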