DataCamp: Data Science for Business Flashcards
What is data science?
A set of methodologies for taking the many forms of data available and drawing meaningful conclusions from them.
What can data do?
describe current state of an organisation or process
detect anomalous events
diagnose the cause of events and behaviours
predict future events
Data Science Workflow
Data Collection
Exploration and Visualization
Experimentation and Prediction
Applications of Data Science:
Traditional Machine Learning
Internet of Things - Refers to gadgets that are not standard computers but have the ability to transmit data, such as smart watches and home appliances.
Image classification using Deep Learning
Data Pseudonymization
Replacing identifying values in the data with artificial identifiers (pseudonyms) to protect privacy.
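One common way to do this is to swap an identifying field for a keyed hash, so the same person always maps to the same pseudonym but the original value is not stored. The sketch below is illustrative only: the `SECRET_KEY`, the `pseudonymize` helper, the field names and the records are all made up, and a real project would follow its own privacy and key-management rules.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept separate from the data.
SECRET_KEY = b"example-secret-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an identifying value using a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened only to keep the example readable


# Made-up records for illustration.
records = [
    {"email": "alice@example.com", "purchase": 30.0},
    {"email": "bob@example.com", "purchase": 12.5},
    {"email": "alice@example.com", "purchase": 7.0},
]

# Replace the email (PII) with a pseudonymous customer_id.
pseudonymized = [
    {"customer_id": pseudonymize(r["email"]), "purchase": r["purchase"]}
    for r in records
]

for row in pseudonymized:
    print(row)
```

Because the pseudonym is stable, purchases by the same customer can still be grouped together without exposing the email address.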
PII - Personally Identifiable Information
Data that can be linked to an individual
Data Sources
Web events, logistics data, financial transactions, customer data
Solicited Data (SD)
Obtained by requesting opinions from customers, for example through surveys, in-app questionnaires, focus groups and customer reviews. SD are used to: 1. De-risk decision making 2. Monitor quality and 3. Create marketing collateral.
Types of Solicited Data
Qualitative (subjective): conversations, open-ended questions. Helps to generate hypotheses.
Quantitative: multiple choice, rating scale. Used to validate hypotheses.
Net Promoter Score
Quantitative method of measuring stated preference
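As a rough sketch of how the score is usually calculated: respondents rate from 0 to 10 how likely they are to recommend the product; ratings of 9 or 10 count as promoters, 0 to 6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. The `net_promoter_score` function name and the sample ratings below are made up for illustration.

```python
def net_promoter_score(ratings):
    """Compute NPS from 0-10 'how likely are you to recommend us' ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)   # ratings of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # ratings of 0 to 6
    # Ratings of 7 or 8 (passives) count only toward the total.
    return 100.0 * (promoters - detractors) / len(ratings)


# Made-up survey responses.
ratings = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
score = net_promoter_score(ratings)
print(score)  # 4 promoters, 3 detractors, 10 responses -> 10.0
```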
Other data sources
APIs (a way of requesting data from 3rd parties over the internet), such as Twitter, Wikipedia, Google Maps, Yahoo Finance
Public records
Mechanical Turks - Manual data input by humans
Unstructured data
Data such as emails, text, video and audio files, web pages and social media posts. Typically stored in document databases. Use NoSQL for data retrieval.
Tabular data
Data organized in rows and columns, stored in relational databases. Use SQL for data retrieval.
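A minimal sketch of retrieving tabular data with SQL, using Python's built-in `sqlite3` module and an in-memory database. The `customers` table, its columns and its rows are invented for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Ada", "UK"), ("Grace", "US"), ("Linus", "FI"), ("Alan", "UK")],
)

# SQL query: how many customers are there per country?
rows = conn.execute(
    "SELECT country, COUNT(*) FROM customers GROUP BY country ORDER BY country"
).fetchall()
conn.close()

for country, n in rows:
    print(country, n)
```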
Dashboards
A dashboard is a set of metrics, usually in the form of graphs, that update on a pre-defined frequency such as daily, weekly or in real time. Dashboards help to visualize and explore collected data. Examples of dashboard elements include time series plots (which track a value over time).
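To make the time series idea concrete, the sketch below counts events per day, which is the kind of metric a daily dashboard might plot. The event dates are made up, and the text bars stand in for a real chart.

```python
from collections import Counter
from datetime import date

# Made-up event log: one entry per event, recorded by day.
events = [
    date(2024, 1, 1), date(2024, 1, 1),
    date(2024, 1, 2),
    date(2024, 1, 3), date(2024, 1, 3), date(2024, 1, 3),
]

# Aggregate into a daily time series: (day, number of events), sorted by day.
daily_counts = Counter(events)
series = sorted(daily_counts.items())

# Crude text "chart": one '#' per event.
for day, count in series:
    print(day.isoformat(), "#" * count)
```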
Ad hoc request
A request for data that does not need to be repeated on a weekly or daily basis. Such requests should be specific and include context and priority.