Data Analytics and Data Science Flashcards
What is Data Analytics?
The process of inspecting, cleansing, transforming, and modeling data to extract useful information.
How does Data Mining differ from Data Analytics?
Data Mining focuses on discovering previously unknown patterns in large datasets, while Data Analytics is the broader process of inspecting, preparing, and modeling data to support informed decisions.
What is Data Science?
An interdisciplinary field that applies scientific methods, processes, and algorithms to extract knowledge from data.
What are some applications of Data Science and Analytics?
Healthcare (Medical imaging, genome analysis)
Finance (Fraud detection, risk analysis)
Retail (Product recommendation, logistics)
Scientific research (Black hole imaging, physics experiments)
What is the Data Analytics Pipeline?
Raw Data
Target Data
Processed Data
Transformed Data
Pattern Discovery
Knowledge Extraction
How is data stored?
Data can be stored in tabular form, where rows represent instances (objects) and columns represent features (attributes).
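A minimal sketch of this layout, assuming a small made-up table (the names `table` and `column`, and all values, are illustrative only):

```python
# Each dict is one row: an instance (object).
# Each key is one column: a feature (attribute).
# The records are made-up illustrative values.
table = [
    {"name": "Ana", "age": 34, "height_cm": 168.0},
    {"name": "Ben", "age": 51, "height_cm": 180.5},
    {"name": "Cho", "age": 27, "height_cm": 159.2},
]


def column(rows, feature):
    """Return every value of one feature, i.e. one column of the table."""
    return [row[feature] for row in rows]


n_instances = len(table)          # number of rows
features = list(table[0].keys())  # column names
ages = column(table, "age")       # one column across all rows
```

Libraries such as pandas offer a dedicated table type, but the row/column idea is the same.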
What are the different types of data values?
Nominal: Categories without order (e.g., country names).
Ordinal: Ordered categories (e.g., small, medium, large).
Interval: Ordered with meaningful differences but no true zero (e.g., temperature in Celsius or Fahrenheit).
Ratio: Ordered with meaningful differences and a true zero (e.g., weight, height).
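One way to see the difference between these scales is to check which operations stay meaningful. The sketch below uses made-up values; `SIZE_RANK` and the helper names are illustrative only.

```python
# Ordinal: categories can be ranked, so sorting is meaningful.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}
sizes = ["large", "small", "medium"]
sorted_sizes = sorted(sizes, key=SIZE_RANK.get)


# Interval: differences are meaningful, ratios are not.
# The ratio of two Celsius readings changes when the unit changes.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)


# Ratio: a true zero means ratios survive a change of unit.
def kg_to_lb(kg):
    return kg * 2.20462  # approximate conversion factor


ratio_kg = 80 / 40
ratio_lb = kg_to_lb(80) / kg_to_lb(40)
```

Here `ratio_celsius` and `ratio_fahrenheit` disagree, while `ratio_kg` and `ratio_lb` agree, which is why "twice as heavy" makes sense but "twice as hot" (in Celsius) does not.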
What is the difference between structured and unstructured data?
Structured Data: Pre-defined format (e.g., databases, Excel).
Unstructured Data: No pre-defined format (e.g., text, audio, images).
What are some examples of structured data?
Record Data (Fixed attributes per record)
Data Matrix (Table format with rows as objects and columns as attributes)
Transaction Data (Sets of related items, e.g., supermarket purchases)
Graph Data (Nodes and edges representing relationships, e.g., social networks)
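As a rough illustration of two of these forms, transaction data can be held as sets of items and graph data as an adjacency mapping. All baskets, node names, and variable names below are made up for the example.

```python
from collections import Counter

# Transaction data: each transaction is a set of related items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "eggs"},
    {"milk", "eggs"},
]

# Count how many transactions contain each item.
item_counts = Counter(item for basket in transactions for item in basket)

# Graph data: nodes connected by undirected edges,
# stored as a mapping from each node to its set of neighbours.
edges = [("ana", "ben"), ("ben", "cho")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)
```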
What are some examples of unstructured data?
Documents (Emails, research papers)
Images (Medical scans, satellite photos)
Audio (Speech recordings, music)
Videos (Surveillance footage, movies)
What is Data Preprocessing?
The process of cleaning and transforming raw data to improve its quality before analysis.
What are common data preprocessing techniques?
Handling missing values (mean imputation, carrying the last observed value forward, interpolation)
Attribute transformation (binarization, discretization)
Sampling (reducing dataset size while maintaining patterns)
Feature selection (removing irrelevant variables)
Dimensionality reduction (e.g., PCA)
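A few of the techniques above can be sketched in plain Python. This is a minimal illustration on made-up readings, with missing values marked as `None` (an assumption of this example); the function names are hypothetical, and interpolation, sampling, and feature selection are not shown.

```python
from statistics import mean


def impute_mean(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def carry_forward(values):
    """Replace each None with the most recent observed value.

    A leading None has no earlier value and therefore stays None.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def binarize(values, threshold):
    """Map each value to 1 if it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def discretize(values, cut_points):
    """Map each value to a bin index; cut_points must be ascending.

    A value equal to a cut point falls into the higher bin.
    """
    return [sum(v >= c for c in cut_points) for v in values]


readings = [10.0, None, 14.0, None, 18.0]
mean_filled = impute_mean(readings)
forward_filled = carry_forward(readings)
flags = binarize(mean_filled, threshold=12.0)
bins = discretize([5, 15, 25], cut_points=[10, 20])
```

Which strategy is appropriate depends on the data: carrying values forward only makes sense when the rows have a natural order, such as a time series.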
What is Principal Component Analysis (PCA)?
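A technique that reduces dimensionality by projecting the data onto a small number of orthogonal directions (principal components), chosen to retain as much of the data's variance as possible.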
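To make the idea concrete, the sketch below estimates only the first principal component, using power iteration on the sample covariance matrix. This is one simple way to find the direction of greatest variance, not necessarily how a library would do it; the function names and data points are made up, and the fixed starting vector is an assumption that would fail if it happened to be orthogonal to the true direction.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the direction of greatest variance by power iteration."""
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Repeatedly multiply by the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows, means, direction):
    """Reduce each row to a single coordinate along the given direction."""
    return [
        sum((r[j] - means[j]) * direction[j] for j in range(len(direction)))
        for r in rows
    ]


# Made-up 2-D points lying exactly on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
means, pc1 = first_principal_component(points)
scores = project(points, means, pc1)  # 2-D data reduced to 1-D
```

Because these points lie on a single line, one component captures all of their variance; real data would normally lose some variance in the projection.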
What are the two primary types of Data Modelling?
Regression: Predicting continuous values (e.g., house prices).
Classification: Predicting categorical values (e.g., spam vs. non-spam emails).
What is Linear Regression?
A statistical method that models the relationship between a dependent variable and one or more independent variables as a linear function.
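For the simplest case of a single independent variable, the least-squares line can be computed directly. The sketch below assumes made-up data and a hypothetical function name; it does not cover multiple independent variables.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data generated from y = 1 + 2x with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
prediction = intercept + slope * 4.0  # predict y for a new x
```

With noisy data the fitted line would not pass through every point; it would minimize the sum of squared vertical distances to them.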
What is Clustering?
A type of unsupervised learning that groups similar data points together.
What are common clustering techniques?
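K-Means Clustering: Partitions the data into K clusters by assigning each point to the nearest cluster centroid (mean) and updating the centroids until the assignments stabilize.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Groups points in dense regions into clusters of arbitrary shape and labels points in sparse regions as noise; it can struggle when clusters have very different densities.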
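The K-Means procedure can be sketched as alternating assignment and update steps. This minimal version covers K-Means only (not DBSCAN), uses Euclidean distance, and takes the first K points as initial centroids, which is a simplification made for this example; practical implementations usually choose starting centroids more carefully. The data points are made up.

```python
import math


def k_means(points, k, iterations=100):
    """Cluster points by alternating assignment and centroid updates."""
    # Simplifying assumption: start from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, assignments


# Made-up 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
centroids, assignments = k_means(points, k=2)
```

The result depends on the starting centroids, so different initial choices can produce different clusterings of the same data.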