Data Science for Business Leaders Flashcards
https://www.datacamp.com/courses/data-science-for-business-leaders
What is data science?
Data science is a set of methodologies for taking in the thousands of forms of data available to us today and using them to draw meaningful conclusions.
What can data do?
- Describe the current state of an organization or process
- Detect anomalous events
- Diagnose the causes of events and behaviors
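As an illustration of the "detect anomalous events" point, here is a minimal sketch that flags values lying far from the average. The function name `find_anomalies`, the threshold of 2 standard deviations, and the `daily_orders` figures are all assumptions made up for this example; real anomaly detection usually needs a method chosen to fit the data.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily order counts; one day is clearly unusual.
daily_orders = [102, 98, 105, 97, 101, 99, 103, 100, 250, 96]
anomalies = find_anomalies(daily_orders)
print(anomalies)
```

With this made-up data only the day with 250 orders is flagged; the other days sit well within the threshold.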
What are the three steps of the data science workflow?
- Data collection
- Exploration and visualization
- Experimentation and prediction
What do we need for machine learning?
- A well-defined question
- A set of example data
- A new set of data to use our algorithm on
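The three ingredients above can be sketched in a few lines. This is only an illustrative toy, not the course's method: the churn question, the feature names, the example rows, and the choice of a 1-nearest-neighbor rule are all assumptions.

```python
import math

# 1. A well-defined question (hypothetical): "Will this customer churn?"

# 2. A set of example data: (monthly_logins, support_tickets) with a known outcome.
examples = [
    ((20, 0), "stays"),
    ((18, 1), "stays"),
    ((2, 4), "churns"),
    ((1, 5), "churns"),
]


def predict(features, labeled_examples):
    """Return the label of the closest labeled example (1-nearest-neighbor)."""
    nearest = min(labeled_examples, key=lambda ex: math.dist(features, ex[0]))
    return nearest[1]


# 3. A new set of data to use the algorithm on.
new_customers = [(19, 0), (3, 3)]
predictions = [predict(customer, examples) for customer in new_customers]
print(predictions)
```

The first new customer resembles the examples that stayed and the second resembles those that churned, so the sketch labels them accordingly.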
What are some applications of data science?
Fraud detection, IoT, image recognition…
What are some common jobs in a data science team?
Data engineer, data analyst, machine learning scientist…
What are the responsibilities of data engineers?
- Information architects: control the flow of information
- Build the storage solutions and infrastructure
- Maintain data access: ensure the data is easy to access and process
What tools do data engineers use?
- SQL, to store and manage big data
- Java, Scala, or Python to process data and automate data-related tasks
What are the responsibilities of data analysts?
- Create dashboards
- Hypothesis testing
- Data visualization
What tools do data analysts use?
- Spreadsheets for simple storage and analysis
- SQL for large-scale analysis
- BI Tools (Tableau, Power BI, Looker) for dashboarding and sharing information
What are the responsibilities of a machine learning scientist?
- Make predictions and extrapolations
- Classify data
- Predict stock prices
- Process images
- Automate text analysis
What tools do machine learning scientists use?
- Python or R for creating predictive models
What are three types of team structures for a data science team?
- Isolated
- Embedded
- Hybrid
What are the characteristics of an isolated data science team?
An isolated data science team contains one or multiple types of data employees, without engineering or product members.
What are the characteristics of an embedded data science team?
Each data employee is part of a squad containing engineers and product managers.
What are the characteristics of a hybrid data science team?
The hybrid structure is similar to the embedded structure, but includes an additional sync for all data employees across all squads, allowing uniform data processes.
What are some common sources of data?
- Web events
- Customer data
- Logistics data
- Customer transactions
What does PII mean?
Personally Identifiable Information
What information does PII include?
- Name
- Location
- Email address
- Any other piece of information that can be used to tie a web event back to a real human
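To make the PII list concrete, here is a minimal sketch of dropping PII fields from a web event before analysis. The field names in `PII_FIELDS`, the `strip_pii` helper, and the sample event are hypothetical; which fields count as PII in practice depends on the data and the applicable regulations.

```python
# Hypothetical names of the fields that hold PII in our web events.
PII_FIELDS = {"name", "location", "email"}


def strip_pii(event):
    """Return a copy of the event without any of the PII fields."""
    return {key: value for key, value in event.items() if key not in PII_FIELDS}


# A made-up web event mixing PII with behavioral data.
event = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "location": "Paris",
    "page": "/pricing",
    "clicked": True,
}
clean = strip_pii(event)
print(clean)
```

Only the behavioral fields (`page` and `clicked`) remain, and the original `event` dictionary is left unchanged.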