Data Science for Business Leaders Flashcards

https://www.datacamp.com/courses/data-science-for-business-leaders

1
Q

What is data science?

A

Data science is a set of methodologies for taking in the thousands of forms of data available to us today and using them to draw meaningful conclusions.

2
Q

What can data do?

A
  • Describe the current state of an organization or process
  • Detect anomalous events
  • Diagnose the causes of events and behaviors
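As a minimal sketch of the "detect anomalous events" item, the snippet below flags values that sit far from the mean using a z-score. The daily order counts, the function name flag_anomalies and the threshold of 2 standard deviations are all hypothetical choices for illustration, not a method prescribed by the course.

```python
from statistics import mean, stdev

def flag_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical daily order counts; the seventh day (index 6) has an unusual spike.
daily_orders = [102, 98, 105, 99, 101, 97, 250, 103]
print(flag_anomalies(daily_orders))  # [6]
```

A z-score rule is only one simple option; seasonal data or very small samples usually call for more careful baselines.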
3
Q

What are the three steps of the data science workflow?

A
  • Data collection
  • Exploration and visualization
  • Experimentation and prediction
4
Q

What do we need for machine learning?

A
  • A well-defined question
  • A set of example data
  • A new set of data to use our algorithm on
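The three ingredients can be seen in a tiny sketch. Here the well-defined question is assumed to be "is this transaction fraudulent?", the example data is a hypothetical set of labeled (amount, hour of day) pairs, and the algorithm is a 1-nearest-neighbor classifier chosen purely for brevity.

```python
import math

# 1. A well-defined question (hypothetical): is a transaction fraudulent?
# 2. A set of example data: (amount in dollars, hour of day) with known labels.
examples = [
    ((20.0, 14), "legit"),
    ((35.0, 10), "legit"),
    ((15.0, 18), "legit"),
    ((900.0, 3), "fraud"),
    ((750.0, 2), "fraud"),
]

def predict(point, labeled):
    """Label a new point with the label of its closest example (1-nearest neighbor)."""
    _, label = min(labeled, key=lambda ex: math.dist(point, ex[0]))
    return label

# 3. A new set of data to use the algorithm on.
new_transactions = [(25.0, 12), (820.0, 4)]
print([predict(t, examples) for t in new_transactions])  # ['legit', 'fraud']
```

The features are left unscaled to keep the sketch short, so the dollar amount dominates the distance; a real model would normally normalize features first.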
5
Q

What are some applications of data science?

A

Fraud detection, IoT, image recognition…

6
Q

What are some common jobs in a data science team?

A

Data engineer, data analyst, machine learning scientist…

7
Q

What are the responsibilities of data engineers?

A
  • Information architects: control the flow of information
  • Build the storage solutions and infrastructure
  • Maintain data access: ensure the data is easy to access and process
8
Q

What tools do data engineers use?

A
  • SQL to store and manage big data
  • Java, Scala, or Python to process data and automate data-related tasks
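To illustrate how Python and SQL combine in a data engineer's work, here is a minimal sketch of an automated load step. It assumes a hypothetical CSV export with order_id, customer and amount columns, and uses an in-memory SQLite database as a stand-in for a real storage system.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, e.g. a nightly CSV dump from an order system.
raw_csv = """order_id,customer,amount
1,alice,19.99
2,bob,
3,carol,42.50
"""

def load_orders(conn, text):
    """Parse the CSV, skip rows with a missing amount, and load the rest into SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    rows = [
        (int(r["order_id"]), r["customer"], float(r["amount"]))
        for r in csv.DictReader(io.StringIO(text))
        if r["amount"]  # drop incomplete records
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
print(load_orders(conn, raw_csv))  # 2
```

In practice a job like this would be scheduled and pointed at a production database; the cleaning rule (dropping rows with no amount) is only an example.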
9
Q

What are the responsibilities of data analysts?

A
  • Create dashboards
  • Hypothesis testing
  • Data visualization
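As one concrete form of hypothesis testing, the sketch below runs a two-sided permutation test on a hypothetical A/B experiment (minutes spent on site under two page designs). The permutation test is a swapped-in technique chosen because it needs only the standard library; analysts often use t-tests or other methods instead.

```python
import random

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations

# Hypothetical minutes spent on site under two page designs.
design_a = [12, 15, 14, 10, 13, 16, 11, 14]
design_b = [18, 17, 20, 16, 19, 21, 17, 18]
diff, p_value = permutation_test(design_a, design_b)
print(f"difference = {diff:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between designs is unlikely to be due to random assignment alone.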
10
Q

What tools do data analysts use?

A
  • Spreadsheets for simple storage and analysis
  • SQL for large-scale analysis
  • BI Tools (Tableau, Power BI, Looker) for dashboarding and sharing information
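A typical analyst SQL query summarizes many rows at once. The sketch below assumes a hypothetical sales table and runs a GROUP BY aggregation through Python's built-in sqlite3 module; the same SQL would look much the same on a larger warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0), ("West", 200.0), ("South", 40.0)],
)

# Aggregate in SQL rather than row by row: total and average sale per region.
query = """
    SELECT region, SUM(amount) AS total, AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
for region, total, average in conn.execute(query):
    print(region, total, average)
```

A result like this is also the kind of table a BI tool would turn into a bar chart on a dashboard.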
11
Q

What are the responsibilities of a machine learning scientist?

A
  • Make predictions and extrapolations
  • Classify data
  • Predict stock prices
  • Process images
  • Automate text analysis
12
Q

What tools do machine learning scientists use?

A
  • Python or R for creating predictive models
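A very small predictive model in Python might look like the sketch below: an ordinary least squares line fitted by hand on hypothetical ad-spend and revenue figures. Machine learning scientists would normally reach for libraries (for example scikit-learn in Python or lm() in R) rather than write this themselves; the hand-written version just shows the idea.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: monthly ad spend (k$) vs. revenue (k$).
ad_spend = [1, 2, 3, 4, 5]
revenue = [3, 5, 7, 9, 11]
slope, intercept = fit_line(ad_spend, revenue)
print(slope * 6 + intercept)  # predicted revenue at 6 k$ spend: 13.0
```

Predicting outside the observed range (here, 6 k$) is an extrapolation and assumes the linear trend continues.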
13
Q

What are three types of team structures for a data science team?

A
  • Isolated
  • Embedded
  • Hybrid
14
Q

What are the characteristics of an isolated data science team?

A

An isolated data science team contains one or multiple types of data employees, without engineering or product members.

15
Q

What are the characteristics of an embedded data science team?

A

Each data employee is part of a squad containing engineers and product managers.

16
Q

What are the characteristics of a hybrid data science team?

A

The hybrid structure is similar to the embedded structure, but includes an additional sync for all data employees across all squads, which keeps data processes uniform.

17
Q

What are some common sources of data?

A
  • Web events
  • Customer data
  • Logistics data
  • Customer transactions
18
Q

What does PII mean?

A

Personally Identifiable Information

19
Q

What information does PII include?

A
  • Name
  • Location
  • Email address
  • Any other piece of information that can be used to tie a web event back to a real human
20
Q

What is data pseudonymization?

A

Assigning each user a user ID and storing the mapping between IDs and user names in a separate table with restricted access and regular audits of its access logs. Events are then identified by the user ID rather than the user's name.
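A minimal sketch of pseudonymization, assuming hypothetical web events that carry a user's name: names are swapped for random IDs, and the ID-to-name lookup is returned separately so it can be kept in a restricted, audited store. The function and field names are illustrative only.

```python
import uuid

# Hypothetical raw web events that still contain a user's name (PII).
raw_events = [
    {"user": "Alice Smith", "event": "login"},
    {"user": "Bob Jones", "event": "purchase"},
    {"user": "Alice Smith", "event": "logout"},
]

def pseudonymize(events):
    """Replace names with random user IDs.

    Returns the pseudonymized events plus the ID-to-name lookup table, which
    should live in a separate store with restricted, audited access.
    """
    name_to_id = {}
    clean_events = []
    for e in events:
        if e["user"] not in name_to_id:
            name_to_id[e["user"]] = uuid.uuid4().hex  # same person always gets the same ID
        clean_events.append({"user_id": name_to_id[e["user"]], "event": e["event"]})
    id_to_name = {uid: name for name, uid in name_to_id.items()}
    return clean_events, id_to_name

events, lookup_table = pseudonymize(raw_events)
```

Deleting lookup_table entirely would correspond to the anonymization described in the next card, although other fields in the events could still identify someone.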

21
Q

What is data anonymization?

A

Assigning each user a user ID, then destroying the table that links those IDs to the actual user names.

22
Q

What does GDPR mean?

A

General Data Protection Regulation

23
Q

What does GDPR involve?

A
  • Applies to the personal data of people in the EU
  • Gives individuals control over their personal data
  • Regulates how long data can be stored
  • Mandates appropriate anonymization
  • Requires disclosing data collection and obtaining consent
24
Q

What is solicited data?

A

Solicited data is data gathered by asking customers for their opinions.

25
Q

What is solicited data useful for?

A
  • Create marketing collateral
  • Reduce decision-making risk
  • Monitor quality
26
Q

What are some common types of solicited data?

A
  • Surveys
  • Customer reviews
  • In-app questionnaires
  • Focus groups
27
Q

What does NPS stand for?

A

Net Promoter Score

28
Q

What does the NPS measure?

A

The Net Promoter Score measures how likely users are to recommend a product.
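Under the standard NPS convention, respondents rate from 0 to 10 how likely they are to recommend the product; 9 to 10 counts as a promoter, 0 to 6 as a detractor, and the score is the percentage of promoters minus the percentage of detractors (so it ranges from -100 to 100). The sketch below applies that rule to hypothetical survey responses.

```python
def net_promoter_score(ratings):
    """Compute NPS from 0-10 'How likely are you to recommend us?' ratings.

    Promoters answer 9-10, detractors 0-6; passives (7-8) count only in the total.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical survey responses: 5 promoters, 2 passives, 3 detractors.
responses = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
print(net_promoter_score(responses))  # 20.0
```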

29
Q

What are the types of solicited data?

A
  • Qualitative (very subjective, requires a lot of analysis)
    • Conversations
    • Open-ended questions
    • Good for generating hypotheses
  • Quantitative (can be easily summarized in a graph)
    • Multiple choice
    • Rating scale
    • Good for validating hypotheses
30
Q

What are the two types of preferences?

A

Stated and revealed

31
Q

What is a stated preference?

A

A stated preference describes what a user says they want or believe.

32
Q

What is a revealed preference?

A

A revealed preference is a preference made visible by a user’s action or purchasing decision.

33
Q

What is an example where stated and revealed preference differ?

A

People will state they prefer to go to the gym and exercise, but their behavior reveals that they prefer to go to the beach and relax.

34
Q

What are some best practices when soliciting data?

A
  • Be specific
  • Avoid loaded language
  • Calibrate (compare to known quantities)
  • Require actionable results (have a hypothesis for each question)
35
Q

What are some common ways to collect external data?

A
  • APIs
  • Public records
  • Mechanical Turk
36
Q

What does API stand for?

A

Application Programming Interface

37
Q

What are some notable APIs?

A
  • Twitter
  • Wikipedia
  • Yahoo Finance
  • Google Maps
38
Q

What are some notable public records?

A
  • data.gov
  • data.europa.eu
39
Q

What does Mechanical Turk involve?

A

Mechanical Turk consists of getting humans to complete a task we eventually plan to computerize (for example, labeling pictures for image recognition). Each person labels a few images, and every image is labeled by several people to ensure label quality. Amazon Mechanical Turk (MTurk) can be used to recruit such workers.
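Because each image is labeled by several people, the answers have to be combined. A simple approach, sketched below under hypothetical data, is a majority vote with a minimum level of agreement; items where workers disagree too much are set aside for review. The function name and the min_agreement threshold are illustrative assumptions.

```python
from collections import Counter

# Hypothetical labels from three workers per image.
worker_labels = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
    "img_003": ["cat", "dog", "bird"],
}

def aggregate(labels_by_item, min_agreement=2):
    """Keep the majority label per item; use None when agreement is too low."""
    result = {}
    for item, labels in labels_by_item.items():
        label, votes = Counter(labels).most_common(1)[0]
        result[item] = label if votes >= min_agreement else None  # None = needs review
    return result

print(aggregate(worker_labels))
# {'img_001': 'cat', 'img_002': 'dog', 'img_003': None}
```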

40
Q

What can Mechanical Turk be used for?

A
  • Label customer reviews
  • Extract text from a form
  • Highlight key words in a sentence
41
Q

What are some types of data storage?

A
  • Unstructured (document database)
  • Tabular (relational database)
42
Q

What are some examples of unstructured data?

A
  • Email
  • Text
  • Video and audio files
  • Web pages
  • Social media
43
Q

What query language do document databases use?

A

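NoSQL: document databases belong to the NoSQL family and are usually queried through database-specific query languages or APIs rather than standard SQL.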

44
Q

What query language do relational databases use?

A

SQL

45
Q

What is a dashboard?

A

A dashboard is a set of metrics, usually in the form of graphs, that update on a schedule.

46
Q

What are some common dashboard elements?

A
  • Tracking a value over time
  • Tracking composition over time
  • Categorical comparison
  • Highlighting a single number
  • Displaying text
47
Q

Where can you build dashboards?

A
  • Spreadsheets: Excel or Google Sheets
  • BI Tools: Power BI, Looker
  • Customized tools: R Shiny or d3.js
48
Q

What is an ad hoc analysis request?

A
  • Not repeated on a weekly or daily basis
  • Can come from many places
    • Product
    • Finance
    • Engineering
49
Q
A