2 Flashcards

1
Q

Question 1
In fraud detection, the target fraud indicator is usually
a) easy to determine.
b) hard to determine.

A

b) hard to determine.

2
Q

Question 2
A spider construction is an example of a fraud schema often used in
b) Credit card fraud.
c) Insurance claim fraud.
d) Tax evasion.
e) All of the above.

A

d) Tax evasion.

3
Q

Question 3
The key novelty of CSLogit is that we now
a) maximize the average expected cost and the complexity.
b) minimize the average expected cost and the complexity.
c) maximize the likelihood and the complexity.
d) minimize the likelihood and the complexity.

A

b) minimize the average expected cost and the complexity.

4
Q

Question 4
CSLogit uses a
a) ridge regression complexity term.
b) LASSO complexity term.
c) elastic net complexity term.

A

b) LASSO complexity term.
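
The two CSLogit cards above can be tied together in a small sketch: an objective made of the average expected cost plus a LASSO (L1) penalty. This is only an illustration under stated assumptions. The function name `cslogit_objective`, the per-instance cost keys (`tp`, `fn`, `fp`, `tn`), the penalty weight `lam`, and the choice to leave the intercept unpenalized are all hypothetical and not taken from the cards.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a fraud probability."""
    return 1.0 / (1.0 + math.exp(-z))


def cslogit_objective(beta, X, y, costs, lam):
    """Average expected cost plus an L1 (LASSO) penalty.

    beta  : [intercept, coef_1, ..., coef_p]
    X     : list of feature rows
    y     : list of 0/1 labels (1 = fraud)
    costs : per-instance dicts with hypothetical keys 'tp', 'fn', 'fp', 'tn'
    lam   : penalty weight (assumed hyperparameter)
    """
    total = 0.0
    for row, label, c in zip(X, y, costs):
        s = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
        if label == 1:
            total += s * c["tp"] + (1 - s) * c["fn"]
        else:
            total += s * c["fp"] + (1 - s) * c["tn"]
    penalty = lam * sum(abs(b) for b in beta[1:])  # intercept not penalized
    return total / len(y) + penalty


# Toy data: one feature, one legitimate and one fraudulent case.
X = [[0.0], [1.0]]
y = [0, 1]
costs = [{"tp": 10, "fn": 100, "fp": 10, "tn": 0}] * 2

baseline = cslogit_objective([0.0, 0.0], X, y, costs, lam=0.1)
improved = cslogit_objective([0.0, 2.0], X, y, costs, lam=0.1)
```

In this toy setup, a coefficient that pushes the fraudulent case toward a higher score lowers the objective even after paying the L1 penalty, which is the quantity a gradient-based optimizer would try to minimize.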

5
Q

Question 5
To find the optimal parameters CSLogit uses
a) genetic algorithms.
b) gradient descent.

A

b) gradient descent.

6
Q

Question 6
When compared to traditional logistic regression,
a) CSLogit performs better in terms of savings.
b) CSLogit performs worse in terms of savings.

A

a) CSLogit performs better in terms of savings.

7
Q

Question 7
Which statement is CORRECT?
a) Supervised approaches can detect only previously known fraud patterns as they occurred in the past.
b) Unsupervised approaches look for unusual anomalous behavior deviating from a norm; hence they can detect previously unknown fraud (also referred to as anomaly detection methods).
c) Both statements are correct.

A

c) Both statements are correct.

8
Q

Question 8
Which statement is CORRECT?
a) In OLAP, a roll-up operation aggregates across one or more dimensions. An example of this is the distribution of the claim amount and the recency, aggregated across all values of the number of cars.
b) In OLAP, drill-down is the opposite operation of roll-up whereby more detail is asked for by adding another dimension to the analysis.
c) In OLAP, slicing refers to selecting a slice of the OLAP cube along one of its dimensions.
d) In OLAP, a dicing operation fixes values for all the dimensions and creates a sub-cube.
e) All statements are correct.

A

e) All statements are correct.
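
The four OLAP operations in this card can be sketched on a tiny in-memory cube. The records, the dimension names, and the helper `aggregate` are made up for illustration; a real OLAP engine would of course work differently.

```python
from collections import defaultdict

# Hypothetical claim records: (region, number_of_cars, recency, claim_amount)
claims = [
    ("north", 1, "recent", 1000),
    ("north", 2, "recent", 1500),
    ("south", 1, "old", 700),
    ("south", 2, "recent", 1200),
    ("north", 1, "old", 300),
]
REGION, CARS, RECENCY, AMOUNT = 0, 1, 2, 3


def aggregate(records, dims):
    """Sum the claim amount, grouped by the given dimension positions."""
    totals = defaultdict(int)
    for rec in records:
        totals[tuple(rec[d] for d in dims)] += rec[AMOUNT]
    return dict(totals)


# Roll-up: aggregate across the number-of-cars dimension.
rolled_up = aggregate(claims, [REGION, RECENCY])

# Drill-down: add the number-of-cars dimension back for more detail.
drilled_down = aggregate(claims, [REGION, CARS, RECENCY])

# Slice: fix one dimension (region = "north").
slice_north = [r for r in claims if r[REGION] == "north"]

# Dice: fix values on every dimension to obtain a sub-cube.
dice = [
    r for r in claims
    if r[REGION] in {"north"} and r[CARS] in {1, 2} and r[RECENCY] in {"recent"}
]
```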

9
Q

Question 9
Which statement is NOT CORRECT?
a) Traditional techniques for detecting outliers can be affected by outliers so strongly that the resulting fitted model may fail to reveal the deviating observations. This is called the masking effect.
b) When using traditional techniques for detecting outliers, some good data points might even appear to be outliers, which is known as swamping.
c) The goal of robust statistics is to find a fit which is different to the fit we would have found without the outliers.
d) The aim is not to replace traditional techniques with a robust alternative but to illustrate that robust methods can give you extra insights into the data and may improve the reliability and accuracy of your analysis.

A

c) The goal of robust statistics is to find a fit which is different to the fit we would have found without the outliers.

10
Q

Question 10
The z-score measures
a) how many standard deviations an observation lies away from the median for a variable.
b) how many standard deviations an observation lies away from the mean for a variable.
c) how many standard deviations an observation lies away from the minimum for a variable.
d) how many standard deviations an observation lies away from the maximum for a variable.

A

b) how many standard deviations an observation lies away from the mean for a variable.
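
As a small worked example, the z-score of each observation is (value - mean) / standard deviation. The amounts below and the cut-off of 2 are assumptions chosen for illustration; the sketch uses the sample standard deviation.

```python
import statistics


def z_scores(values):
    """Return (value - mean) / sample standard deviation for each value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical transaction amounts with one unusually large value.
amounts = [100, 110, 95, 105, 90, 100, 500]
scores = z_scores(amounts)

# Flag observations more than 2 standard deviations from the mean
# (the threshold is an assumption, not part of the definition).
flagged = [a for a, z in zip(amounts, scores) if abs(z) > 2]
```

Note that the large value inflates both the mean and the standard deviation it is measured against, so in small samples its z-score stays moderate.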

11
Q

Question 11
The median and interquartile range (IQR)
a) change when outliers are present.
b) do not change when outliers are present.

A

b) do not change when outliers are present.
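
A minimal sketch of this robustness, assuming a made-up sample in which the largest value is replaced by an extreme one: the median and IQR stay the same, while the mean and standard deviation shift noticeably. Quartiles are computed with `statistics.quantiles` and its default method.

```python
import statistics


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


clean = [90, 95, 100, 100, 105, 110, 115]
contaminated = [90, 95, 100, 100, 105, 110, 500]  # 115 replaced by an outlier

summary = {
    "median": (statistics.median(clean), statistics.median(contaminated)),
    "iqr": (iqr(clean), iqr(contaminated)),
    "mean": (statistics.mean(clean), statistics.mean(contaminated)),
    "stdev": (statistics.stdev(clean), statistics.stdev(contaminated)),
}
```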

12
Q

Question 12
In a multivariate setting, outliers can
a) not always be detected by simply applying outlier detection rules to each variable separately.
b) always be detected by simply applying outlier detection rules to each variable separately.

A

a) not always be detected by simply applying outlier detection rules to each variable separately.
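
To illustrate why per-variable rules can miss multivariate outliers, the sketch below uses made-up two-dimensional data in which x and y move together, plus one point, (3, 7), that breaks the pattern. Its univariate z-scores are unremarkable, but its Mahalanobis distance (used here as one possible multivariate measure, not one named in the card) is the largest in the sample.

```python
import statistics

# Hypothetical data: y roughly equals x, except for the last point (3, 7).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 3]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.9, 7.0]
n = len(xs)

mx, my = statistics.mean(xs), statistics.mean(ys)
sx, sy = statistics.stdev(xs), statistics.stdev(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Univariate view: z-score of each coordinate separately.
zx = [(x - mx) / sx for x in xs]
zy = [(y - my) / sy for y in ys]

# Multivariate view: Mahalanobis distance using the 2x2 sample covariance,
# inverted in closed form.
det = sx**2 * sy**2 - sxy**2


def mahalanobis(x, y):
    dx, dy = x - mx, y - my
    return ((sy**2 * dx**2 - 2 * sxy * dx * dy + sx**2 * dy**2) / det) ** 0.5


md = [mahalanobis(x, y) for x, y in zip(xs, ys)]
most_unusual = max(range(n), key=lambda i: md[i])
```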

13
Q

Question 13
Which statement is NOT CORRECT?
a) The antifraud rationale behind the use of Benford’s law is that producing empirical distributions of digits that conform to the law is difficult for non-experts. Fraudsters may thus be biased toward simpler and more intuitive distributions, such as the uniform.
b) If a data set complies with Benford’s law, it can still be fraudulent.
c) According to Benford’s law, the probability that the first digit equals 1 is about 4.6%, while it’s 30% for digit 9.
d) Most financial data and accounting numbers generally conform to Benford’s law.

A

c) According to Benford’s law, the probability that the first digit equals 1 is about 4.6%, while it’s 30% for digit 9.

It is the other way around: the first digit equals 1 with a probability of about 30%, and 9 with a probability of about 4.6%.

14
Q

Question 14
According to Benford’s law, the first digit d appears with a probability of:
P(d) = log10(1/d)
P(d) = log10(1 + 1/d)
P(d) = log10(d)
P(d) = log10(1 - 1/d)

A

P(d) = log10(1 + 1/d)
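
The formula can be checked numerically with a short sketch. The helper names `benford_probability` and `first_digit` are illustrative only; `first_digit` assumes a nonzero number.

```python
import math


def benford_probability(d):
    """Probability that the first significant digit equals d (1..9)."""
    if d not in range(1, 10):
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


def first_digit(x):
    """First significant digit of a nonzero number, via scientific notation."""
    return int(f"{abs(x):e}"[0])


probs = {d: benford_probability(d) for d in range(1, 10)}
```

The nine probabilities sum to 1, with digit 1 the most likely and digit 9 the least likely. Comparing these expected shares with the observed first-digit frequencies of a data set is the usual starting point for a Benford check.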

15
Q

Question 15
Which of the following data sets typically does not comply with Benford’s law?
a) Data where numbers represent sizes of facts or events.
b) Data in which numbers have no relationship to each other.
c) Data sets that arise from additive fluctuations.
d) Some well-known infinite integer sequences.

A

c) Data sets that arise from additive fluctuations.

16
Q

Question 16
Breakpoint analysis is an
a) inter-account fraud detection method.
b) intra-account fraud detection method.

A

b) intra-account fraud detection method.

17
Q

Question 17
Peer group analysis focusses on
a) local instead of global fraud patterns.
b) global instead of local fraud patterns.

A

a) local instead of global fraud patterns.

18
Q

Question 18
Deviating behaviour because of special events (e.g., Black Friday, Christmas) is more likely to be considered as anomalous by
a) Breakpoint analysis.
b) Peer group analysis.

A

a) Breakpoint analysis.

19
Q

Question 19
When using association rules for fraud detection, the frequency of an itemset is measured by its
a) support.
b) confidence.

A

a) support.
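
A minimal sketch of both measures, using made-up transactions: support is the share of transactions that contain the itemset, and confidence is the support of the whole rule divided by the support of its antecedent.

```python
# Hypothetical transactions, each a set of items (e.g. claim attributes).
transactions = [
    {"claim", "new_policy", "night"},
    {"claim", "new_policy"},
    {"claim", "night"},
    {"new_policy"},
    {"claim", "new_policy", "night"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by the
    support of the antecedent alone."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


s = support({"claim", "new_policy"}, transactions)
c = confidence({"new_policy"}, {"claim"}, transactions)
```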

20
Q

Question 20
Which statement about isolation forests is NOT CORRECT?
a) The idea is to train an ensemble or forest of binary trees to detect anomalies.
b) The intuition behind the algorithm is that anomalies will be detected using only a few splits whereas normal observations will need a lot of splits to get isolated.
c) If the average path length of x is close to zero, the anomaly score will be close to one, hence x can be regarded as an anomalous instance.
d) Isolation forest usually underperforms several other state-of-the-art anomaly detectors on various real-life data sets.

A

d) Isolation forest usually underperforms several other state-of-the-art anomaly detectors on various real-life data sets.
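
Statement c) follows from the usual isolation forest anomaly score, s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the average path length of x over the trees and c(n) is the average path length of an unsuccessful binary search tree lookup among n points. The sketch below only evaluates that score; it does not build any trees, and the subsample size of 256 is an assumed value.

```python
import math

EULER_GAMMA = 0.5772156649


def c(n):
    """Normalizing constant: 2 * H(n - 1) - 2 * (n - 1) / n,
    with the harmonic number H(i) approximated by ln(i) + Euler's constant."""
    if n < 2:
        raise ValueError("n must be at least 2")
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2 * harmonic - 2 * (n - 1) / n


def anomaly_score(avg_path_length, n):
    """Score in (0, 1]; values near 1 indicate anomalies."""
    return 2 ** (-avg_path_length / c(n))


n = 256  # assumed subsample size
short_path = anomaly_score(1.0, n)    # isolated after very few splits
typical_path = anomaly_score(c(n), n)  # average path length equals c(n)
long_path = anomaly_score(2 * c(n), n)
```

A path length near zero gives a score near one, a path length equal to c(n) gives a score of 0.5, and longer paths push the score toward zero.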

21
Q

Question 21
Which of the following is an advantage of isolation forests?
a) Computationally efficient
b) Easy to parallelize
c) Applicable to big and high dimensional data
d) Good performance
e) All of the above.

A

e) All of the above.

22
Q

Question 22
In the Local Outlier Factor method, the Reachability distance is
a) the maximum of the k-distance and the distance (e.g., Euclidean, Manhattan) between 2 observations.
b) the average of the k-distance and the distance (e.g., Euclidean, Manhattan) between 2 observations.
c) the minimum of the k-distance and the distance (e.g., Euclidean, Manhattan) between 2 observations.
d) none of the above.

A

a) the maximum of the k-distance and the distance (e.g., Euclidean, Manhattan) between 2 observations.

23
Q

Question 23
In the Local Outlier Factor method, a lower Local Reachability Density for a data point A implies that
a) the neighbors are far away from A.
b) the neighbors are close to A.

A

a) the neighbors are far away from A.

24
Q

Question 24
In the Local Outlier Factor method, an outlier has a LOF score of
a) bigger than 1.
b) around 1.
c) less than 1.

A

a) bigger than 1.
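
The three Local Outlier Factor cards above (reachability distance, local reachability density, LOF score) can be combined in a compact sketch. It is a plain, unoptimized version under assumptions: Euclidean distance, ties broken by index, no duplicate points, and made-up data with k = 2.

```python
import math


def k_neighbors(points, i, k):
    """Indices of the k nearest neighbours of points[i] (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def k_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour."""
    return math.dist(points[i], points[k_neighbors(points, i, k)[-1]])


def reach_dist(points, a, b, k):
    """Reachability distance of a from b: max(k-distance(b), d(a, b))."""
    return max(k_distance(points, b, k), math.dist(points[a], points[b]))


def lrd(points, a, k):
    """Local reachability density: inverse of the mean reachability distance."""
    nbrs = k_neighbors(points, a, k)
    return len(nbrs) / sum(reach_dist(points, a, b, k) for b in nbrs)


def lof(points, a, k):
    """Mean density of the neighbours relative to the density of a."""
    nbrs = k_neighbors(points, a, k)
    return sum(lrd(points, b, k) for b in nbrs) / (len(nbrs) * lrd(points, a, k))


# Four points forming a tight square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = [lof(points, i, k=2) for i in range(len(points))]
```

In this toy data the four corner points have neighbours as dense as themselves, so their scores sit at 1, while the distant point has a low local reachability density and a score well above 1.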

25
Q

Question 25
When using autoencoders for anomaly detection, the hidden layer has
a) more neurons than the input layer.
b) fewer neurons than the input layer.

A

b) fewer neurons than the input layer.

26
Q

Question 26
According to the research of Tiukhova et al. (2022), which of the following outlier detection techniques usually works best?
a) Local Outlier Factor
b) Autoencoders
c) iForest
d) Deep Autoencoders

A

c) iForest