Intro to DSA Flashcards
Data Science and Analytics
New techniques to solve problems in a VUCA (Volatility, Uncertainty, Complexity, Ambiguity) world through data-driven approaches.
Data Science is a _ field using _ methods, _, and _ to gain _ from structured and unstructured data.
multidisciplinary, scientific, algorithms, systems, insights
_ _ is the basis of _ for performing _ and discovering _.
Mathematics knowledge, algorithms, analysis, insights
_ and _ skills are essential for data _, _, and _ using languages like Python, _, and _.
Programming, coding, extraction, transformation, storage, R, SQL
_ _ is the deep _ of an industry’s _, opportunities, _, and _ to apply data science effectively.
Domain expertise, knowledge, challenges, risks, methods
Accessing data from _ sources in _ formats, applying solutions _ _ fields. It is _ _.
various, different, across multiple; domain agnostic
Data science combines skills from _ disciplines like _, _, and _ knowledge. It is _.
multiple, math, programming, domain; multidisciplinary
Data science requires _ across _ and _ areas to achieve _ solutions. It is a _ _.
collaboration, roles, expertise, effective; team sport
What is Big Data?
Massive, complex datasets that traditional software can’t handle.
5 V’s of Big Data
Variety, Volume, Velocity, Veracity, Value (bonus: Variability).
Variety
The wide range of data types (texts, videos, etc.).
Volume
The large size of data sets.
Velocity
The speed at which data is generated and processed.
Veracity
The trustworthiness, accuracy, and quality of data.
Value
The benefit derived from extracting and transforming data.
Visualization
Presenting data graphically so that patterns and trends can be seen and interpreted.
MapReduce and Parallel Computing
A big data processing technique that splits data into chunks, processes the chunks in parallel (map), then aggregates the partial results (reduce).
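A minimal word-count sketch of the split → parallel map → aggregate idea, using only Python's standard library. It runs the map step on chunks in parallel threads with `concurrent.futures.ThreadPoolExecutor`. Real MapReduce frameworks (e.g., Hadoop) distribute chunks across many machines, which this sketch does not attempt. The function names `map_chunk`, `reduce_counts`, and `map_reduce_word_count` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk: list[str]) -> Counter:
    """Map step: count the words in one chunk of lines independently."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts into one."""
    return left + right


def map_reduce_word_count(lines: list[str], n_chunks: int = 2) -> Counter:
    # Split the input into n_chunks interleaved chunks.
    chunks = [lines[i::n_chunks] for i in range(n_chunks)]
    # Map: process the chunks in parallel worker threads.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce: aggregate the partial counts into the final result.
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    text = ["big data is big", "data science uses data", "parallel is fast"]
    print(map_reduce_word_count(text)["data"])  # 3
```

Each thread counts its own chunk without coordinating with the others, and the final merge only combines small per-chunk tallies; that split between independent work and a cheap aggregation step is what lets the approach scale out.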
Qualitative Data
Nominal (categories) and ordinal (ranks).
Quantitative Data
Discrete (countable) and continuous (measurable).
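A small sketch of why the four types are handled differently, assuming a made-up survey (all values are hypothetical): nominal values can only be counted by category, ordinal values need an explicit rank to be sorted, and quantitative values (discrete counts, continuous measurements) support arithmetic.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses.
colors = ["red", "blue", "red", "green"]       # nominal: categories with no order
sizes = ["medium", "small", "large", "small"]  # ordinal: categories with a rank
children = [0, 2, 1, 3]                        # discrete: countable whole numbers
heights_cm = [170.5, 162.0, 181.2, 158.9]      # continuous: measured values

# Nominal data supports counting (e.g., finding the most common category).
print(Counter(colors).most_common(1))

# Ordinal data needs an explicit rank; alphabetical order would be meaningless.
SIZE_RANK = {"small": 1, "medium": 2, "large": 3}
print(sorted(sizes, key=SIZE_RANK.__getitem__))

# Quantitative data supports arithmetic such as sums and means.
print(sum(children), round(mean(heights_cm), 2))
```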
Structured Data
Data in a standardized format, easy to search and organize (e.g., SQL database tables, Excel spreadsheets).
Unstructured Data
Data with no predefined structure, often text-heavy or multimedia (e.g., PDFs, MP3 audio, MP4 video).
Semi-structured Data
Contains tags or markers that label its elements, but does not follow a rigid, table-like schema (e.g., XML, JSON).
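A minimal sketch, using Python's standard `json` and `sqlite3` modules, of turning semi-structured JSON records into a structured table that can be searched with SQL. The records and the `people` table are hypothetical. The second record is missing a field, which is common in semi-structured data; it becomes `NULL` in the fixed table schema.

```python
import json
import sqlite3

# Semi-structured input: tagged key/value pairs, but fields vary per record.
raw = """
[
  {"name": "Ana", "age": 31, "city": "Lisbon"},
  {"name": "Ben", "age": 27},
  {"name": "Chloe", "age": 45, "city": "Lyon", "tags": ["vip"]}
]
"""
records = json.loads(raw)

# Structured target: a fixed schema that every row must fit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO people (name, age, city) VALUES (?, ?, ?)",
    # .get() turns missing fields into None (stored as NULL); extra fields are dropped.
    [(r.get("name"), r.get("age"), r.get("city")) for r in records],
)

# Once structured, the data is easy to search and organize.
rows = conn.execute("SELECT name FROM people WHERE age > 30 ORDER BY name").fetchall()
print(rows)
```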
4th Industrial Revolution
The rise of cyber-physical systems and connected, AI-driven technologies such as the Internet of Things (IoT) and AI assistants.
Blockchain
A shared, immutable ledger used to record transactions and track assets.
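A simplified sketch of why a blockchain ledger is effectively immutable: each block stores the SHA-256 hash of the previous block, so changing an earlier transaction breaks the link from the block after it. It uses Python's `hashlib` and `json`; the block layout and function names are illustrative assumptions. It also omits the distributed consensus (e.g., proof of work) that real blockchains use to keep the ledger shared across many participants.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically (keys sorted before encoding)."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_chain(transactions: list[str]) -> list[dict]:
    """Link each transaction to the hash of the block before it."""
    chain = []
    prev = "0" * 64  # placeholder hash for the first (genesis) block
    for tx in transactions:
        block = {"tx": tx, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain: list[dict]) -> bool:
    """Check that every block points at the hash of the block before it."""
    for earlier, later in zip(chain, chain[1:]):
        if later["prev_hash"] != block_hash(earlier):
            return False
    return True


ledger = build_chain(["A pays B 5", "B pays C 2", "C pays A 1"])
print(is_valid(ledger))           # True
ledger[0]["tx"] = "A pays B 500"  # tamper with an old transaction
print(is_valid(ledger))           # False
```

In this sketch, editing the newest block goes unnoticed because no later block references it yet; that gap is one reason the sketch is only an illustration of hash linking, not a complete ledger.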
Data Science Pipeline
Data Collection → Data Preparation → Data Visualization → Data Analysis → Data Storytelling.
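A toy sketch of the five stages as plain Python functions, assuming a hypothetical list of daily sales readings as the collected data. The function names and logic are illustrative only; a text bar chart stands in for a real plotting library in the visualization stage.

```python
from statistics import mean


def collect() -> list:
    # Data Collection: hypothetical raw readings, including bad entries.
    return ["120", "95", None, "130", "n/a", "110"]


def prepare(raw: list) -> list[int]:
    # Data Preparation: drop missing or invalid values and convert to integers.
    return [int(x) for x in raw if x is not None and x.isdigit()]


def visualize(values: list[int]) -> str:
    # Data Visualization: a crude text bar chart (one '#' per 10 units).
    return "\n".join(f"reading {i + 1}: " + "#" * (v // 10) for i, v in enumerate(values))


def analyze(values: list[int]) -> dict:
    # Data Analysis: simple summary statistics.
    return {"mean": mean(values), "max": max(values)}


def storytell(stats: dict) -> str:
    # Data Storytelling: turn the numbers into a plain-language message.
    return f"Average daily sales were {stats['mean']:.1f}, peaking at {stats['max']}."


clean = prepare(collect())
print(visualize(clean))
print(storytell(analyze(clean)))
```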
What are the four areas of math that data science uses?
Linear algebra, calculus, probability, and statistics.
It is used to address previously unsolvable problems.
Big Data.