1 - Tools of the Trade Flashcards

1
Q

What is Steve’s main task at Shu Money Financial?

A

To lead the development of a new strategy for prioritizing cases sent to the Recoveries Department.

2
Q

What happens to a debt if a customer has not paid anything owed in the previous six months?

A

The debt is charged off.

3
Q

What are the three tiers of account assignment in the Recoveries Department?

A
  • Tier 1: Accounts that have made a payment
  • Tier 2: Accounts that have never made a payment but had some contact
  • Tier 3: Accounts that have never made a payment and had no contact
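The tier rules above can be written as a small function. This is only a sketch: the name `assign_tier` and its two boolean parameters are hypothetical and are not taken from Shu Money Financial's actual systems.

```python
def assign_tier(has_made_payment: bool, has_had_contact: bool) -> int:
    """Return the Recoveries tier (1, 2, or 3) for an account."""
    if has_made_payment:
        return 1  # Tier 1: the account has made a payment
    if has_had_contact:
        return 2  # Tier 2: no payment, but some contact
    return 3      # Tier 3: no payment and no contact


print(assign_tier(True, False))   # 1
print(assign_tier(False, True))   # 2
print(assign_tier(False, False))  # 3
```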
4
Q

What is the primary goal for Steve in the Recoveries Department?

A

To increase the profitability of the Recoveries Department by improving work prioritization.

5
Q

What does a data-informed approach involve according to Steve?

A

Gathering appropriate data about customers and systematically analyzing it.

6
Q

What is the basic data workflow comprised of?

A
  • Data collection
  • Storage
  • Preparation
  • Exploration
  • Modeling (including experimentation and prediction)
7
Q

What does the acronym ETL stand for in data processing?

A

Extract, Transform, Load.

8
Q

What is the purpose of the data transformation step?

A

To ensure data quality by resolving inconsistencies, standardizing formatting, and removing duplicate records.
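A transformation step of this kind might look like the following sketch. The field names (`customer_id`, `state`) and the sample records are invented for illustration, and real pipelines usually apply many more rules.

```python
def transform(records):
    """Standardize formatting and drop duplicate records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Standardize formatting: trim whitespace and use upper case.
        row = {
            "customer_id": rec["customer_id"].strip().upper(),
            "state": rec["state"].strip().upper(),
        }
        # Remove duplicates: keep only the first copy of each record.
        key = (row["customer_id"], row["state"])
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


raw = [
    {"customer_id": " c001", "state": "ny"},
    {"customer_id": "C001 ", "state": "NY"},  # same customer, inconsistent format
    {"customer_id": "c002", "state": "Tx "},
]
cleaned = transform(raw)
print(cleaned)
```

After standardization the first two records are identical, so only one of them survives.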

9
Q

True or False: Data quality checks are unnecessary if the data is collected from reliable sources.

A

False.

10
Q

What are some key features of good data architectures?

A
  • Scalability
  • Availability
  • Security
  • Performance
  • Cost efficiency
11
Q

What is one major advantage of cloud computing over local computing?

A

Data can be accessed from anywhere with internet access.

12
Q

Fill in the blank: The process of loading transformed data can be done in a _______ load or in _______ loads.

A

full; incremental
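The difference between the two load styles can be sketched as follows. The target is modeled as a plain dictionary, and `full_load`, `incremental_load`, and the `updated_at` counter are hypothetical names chosen for this example rather than part of any particular ETL tool.

```python
def full_load(target, source_rows):
    """Replace everything in the target with the current source rows."""
    target.clear()
    target.update({row["id"]: row for row in source_rows})


def incremental_load(target, source_rows, last_loaded_at):
    """Add or update only the rows changed since the previous load."""
    for row in source_rows:
        if row["updated_at"] > last_loaded_at:
            target[row["id"]] = row


source = [
    {"id": 1, "balance": 500, "updated_at": 1},
    {"id": 2, "balance": 750, "updated_at": 2},
]
warehouse = {}
full_load(warehouse, source)  # first run: copy everything

# A new record arrives later; only it needs to be loaded.
source.append({"id": 3, "balance": 120, "updated_at": 3})
incremental_load(warehouse, source, last_loaded_at=2)
print(sorted(warehouse))  # [1, 2, 3]
```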

13
Q

What was a significant disadvantage of local data storage mentioned in the text?

A

Data was vulnerable to physical disasters and required manual backups.

14
Q

How does cloud computing help in terms of data security?

A

Cloud providers manage the security and integrity of the systems.

15
Q

What did Steve learn from his data analysis project at business school?

A

Cloud computing has major advantages over local storage solutions.

16
Q

What is the difference between local computing and cloud computing as explained by Brett?

A

Local computing stores data on a personal device, while cloud computing stores data on remote servers accessible via the internet.

17
Q

What is the primary responsibility of systems defenders?

A

Defending the integrity of systems

This includes protecting against ransomware and malware attacks.

18
Q

What is ransomware?

A

A type of malicious software that blocks access to a computer system until a payment is made.

19
Q

What is malware?

A

Software that disrupts, damages, or misuses a computer system.

20
Q

How has cloud computing improved data portability?

A

It allows data to remain in the country of interest while still being accessible from other locations.

21
Q

What are the key services delivered by cloud computing?

A
  • Data storage
  • Servers
  • Databases
  • Networking
  • Software
22
Q

What does it mean that cloud computing is elastic?

A

Users can adjust the number of services they need without being locked into a specific amount.

23
Q

What are the cost advantages of cloud computing?

A

Customers pay only for the storage and services they actually use.

24
Q

What security risks are associated with cloud computing?

A

Accounts can be hacked and data stolen.

25
What is a disadvantage of cloud storage?
Delays in transferring data and creating backups due to internet speeds.
26
What is vendor lock-in in cloud computing?
Difficulty in migrating from one cloud storage provider to another.
27
Which major cloud vendors are mentioned?
  • Google Cloud
  • Amazon Web Services
  • Microsoft Azure
28
What are APIs?
Sets of rules that explain how computers or applications communicate with one another.
29
What is the difference between structured and unstructured data?
  • Structured data: predefined values, stored in relational databases
  • Unstructured data: not captured in traditional databases; includes emails, chatbot conversations, videos, etc.
30
What is a data lake?
A data storage mechanism that holds raw data in its native format until needed.
31
What is the significance of data quality in data science?
High data quality is crucial for accurate analysis and predictive modeling.
32
What does 'garbage in, garbage out' mean?
The quality of output is determined by the quality of the input data.
33
What is SQL?
Structured Query Language, a language used for managing and querying relational databases.
34
What is the purpose of the SELECT command in SQL?
It specifies which fields to extract from a database.
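As a small illustration, the sketch below runs a SELECT statement from Python using the standard-library `sqlite3` module. The `accounts` table, its columns, and its rows are made up for the example; the WHERE clause is included only to show how SELECT is typically combined with a filter.

```python
import sqlite3

# Build a throwaway in-memory database with a hypothetical accounts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_id INTEGER, tier INTEGER, balance REAL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, 1, 500.0), (2, 3, 1200.0), (3, 2, 80.0)],
)

# SELECT names the fields to extract; here only account_id and balance.
rows = conn.execute(
    "SELECT account_id, balance FROM accounts WHERE tier = 1"
).fetchall()
print(rows)  # [(1, 500.0)]
conn.close()
```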
35
What are two popular programming languages for data science?
  • Python
  • R
36
What is the role of libraries and packages in Python and R?
They provide built-in functions that facilitate programming tasks.
37
What must a data scientist understand about input data?
The nature of the data, including cleaning priorities and idiosyncrasies.
38
What is the importance of collaboration between data science customers and teams?
It ensures that data preparation improves data quality for later modeling.
39
Fill in the blank: The _______ is a data storage mechanism that holds raw data in its native format.
data lake
40
What is the role of libraries and packages in programming languages?
They include built-in functions that can be readily called in the programming language.
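For instance, Python's standard `statistics` library supplies ready-made functions, so summary measures need not be coded by hand. The balances below are invented sample values.

```python
import statistics

balances = [500.0, 1200.0, 80.0, 430.0]

# Both functions come from the library; no custom code is required.
mean_balance = statistics.mean(balances)
median_balance = statistics.median(balances)
print(mean_balance, median_balance)  # 552.5 465.0
```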
41
What is R commonly used for in data science?
Statistical analysis, modeling, analyzing spatial and time series data, classifying, and clustering.
42
Name two popular integrated development environments for R.
  • RStudio
  • R Tools for Visual Studio
43
What is a key strength of Python?
It can be easily integrated with other programming languages.
44
Why have Python and R gained popularity over the past decade?
They are free and have large networks of developers.
45
Which programming language was once dominant for statistical programming?
SAS
46
List three major proprietary software options for data analysis.
  • MATLAB
  • Stata
  • SPSS
47
What can customers inquire about when using proprietary software?
The duration of the current license and plans for renewal.
48
True or False: Simple programs can be quickly rewritten in a new language like Python or R.
True
49
What is AutoML?
Automated machine learning solutions that can be used either in point-and-click mode or through hands-on programming.
50
What does feature engineering in AutoML involve?
Automatically exploring new potential input variables.
51
What is the primary function of code repositories in data science?
They allow multiple programmers to work on a project while tracking each person's contributions.
52
Name two code repository platforms.
  • GitHub
  • Bitbucket
53
What is the importance of documenting code?
It helps others understand the purpose and functionality of the code.
54
What is a data product in data science?
The translation of raw data into a practical solution tailored to the end user's needs.
55
What is a data dictionary?
It identifies the fields in a database and how each field is defined.
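A data dictionary can be as simple as a lookup from field name to definition. The sketch below uses hypothetical fields for a recoveries data set; the names, types, and definitions are illustrative only.

```python
# Hypothetical data dictionary: one entry per field in the database.
data_dictionary = {
    "account_id": {
        "type": "integer",
        "definition": "Unique identifier for the account",
    },
    "balance": {
        "type": "decimal",
        "definition": "Amount owed at charge-off, in dollars",
    },
    "last_contact_date": {
        "type": "date",
        "definition": "Date of the most recent customer contact",
    },
}


def describe(field):
    """Return a one-line description of a field."""
    entry = data_dictionary[field]
    return f"{field} ({entry['type']}): {entry['definition']}"


print(describe("balance"))
```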
56
What should a cleaned data set include for effective modeling?
Quality-checked and documented data, along with derived data fields.
57
What are predictive models used for in data products?
Estimating probabilities and expected outcomes relevant to business decisions.
58
What is the highest level of data product?
An automated decision-making tool.
59
Fill in the blank: The data science team is responsible for providing information for _______.
business decisions
60
What should a customer ask about data storage options?
The advantages and disadvantages of the current data storage system.
61
What is the significance of data quality checks?
They ensure the accuracy and reliability of data used in analysis.
62
What should be included in data quality checks?
  • Frequency of missing data
  • Patterns of missing data
  • Imputation methods for missing values
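The three checks above could be sketched like this, using only the standard library. The fields (`income`, `phone`), the sample rows, and the choice of median imputation are assumptions made for the example; the appropriate imputation method depends on the data.

```python
from collections import Counter
from statistics import median

rows = [
    {"income": 42000, "phone": "555-0101"},
    {"income": None, "phone": None},
    {"income": 58000, "phone": None},
    {"income": None, "phone": None},
]
fields = ["income", "phone"]

# Frequency: share of rows in which each field is missing.
missing_rate = {
    f: sum(r[f] is None for r in rows) / len(rows) for f in fields
}

# Patterns: which combinations of fields are missing together.
patterns = Counter(
    tuple(f for f in fields if r[f] is None) for r in rows
)

# Imputation: fill missing income with the median of the observed values.
observed = [r["income"] for r in rows if r["income"] is not None]
fill_value = median(observed)
imputed_income = [
    r["income"] if r["income"] is not None else fill_value for r in rows
]

print(missing_rate)
print(patterns)
print(imputed_income)
```

Here income is missing only in rows where phone is also missing, which is the kind of pattern worth raising with the data science team.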
63
What coding languages are often used in data science?
Python and R, among others.
64
What is an important consideration regarding data products?
The level of data product appropriate for the business's IT capacity and human resource skill set.
65
What does an automated decision-making algorithm do?
Identifies priorities and handles decision-making processes automatically.
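A toy version of such an algorithm is sketched below. It assumes a predictive model has already supplied a payment probability (`p_pay`) for each account and ranks accounts by expected recovery, that is, probability times balance. The ranking rule, the field names, and the numbers are illustrative assumptions, not the method described in the source.

```python
def expected_recovery(account):
    """Expected dollars recovered: payment probability times balance."""
    return account["p_pay"] * account["balance"]


def prioritize(accounts):
    """Order accounts from highest to lowest expected recovery."""
    return sorted(accounts, key=expected_recovery, reverse=True)


accounts = [
    {"id": "A", "balance": 1000.0, "p_pay": 0.10},  # expected: about 100
    {"id": "B", "balance": 400.0, "p_pay": 0.50},   # expected: about 200
    {"id": "C", "balance": 2000.0, "p_pay": 0.02},  # expected: about 40
]
queue = [a["id"] for a in prioritize(accounts)]
print(queue)  # ['B', 'A', 'C']
```

Account C has the largest balance but lands last because its estimated chance of payment is so low.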