Data Analytics and Data Science Flashcards

Question 1

Q

What is Data Analytics?

Answer

A

The process of inspecting, cleansing, transforming, and modeling data to extract useful information.

Question 2

Q

How does Data Mining differ from Data Analytics?

Answer

A

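Data Mining focuses on discovering previously unknown patterns in large datasets and is one step within the broader analytics process, while Data Analytics covers the whole process of inspecting, cleaning, transforming, and modeling data to support informed decisions.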

Question 3

Q

What is Data Science?

Answer

A

An interdisciplinary field that applies scientific methods, processes, and algorithms to extract knowledge from data.

Question 4

Q

What are some applications of Data Science and Analytics?

Answer

A

Healthcare (Medical imaging, genome analysis)
Finance (Fraud detection, risk analysis)
Retail (Product recommendation, logistics)
Scientific research (Black hole imaging, physics experiments)

Question 5

Q

What is the Data Analytics Pipeline?

Answer

A

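Raw Data
Target Data (the relevant subset selected from the raw data)
Processed Data (after cleaning, e.g., handling missing values and noise)
Transformed Data (after transformations such as feature selection or dimensionality reduction)
Pattern Discovery (applying data mining or modeling techniques)
Knowledge Extraction (interpreting and evaluating the discovered patterns)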
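As a rough illustration, the pipeline stages can be chained as ordinary functions, each turning one kind of data into the next. This is a toy sketch: the purchase log, the stage function names (select_target, preprocess, transform, discover_patterns, extract_knowledge) and the work each stage does are hypothetical choices for the example, not a standard implementation.

```python
from collections import Counter

# Hypothetical raw purchase log (made-up data); None marks a missing value.
RAW_DATA = [
    {"store": "A", "item": "bread"},
    {"store": "A", "item": "milk"},
    {"store": "B", "item": "bread"},
    {"store": "A", "item": None},
    {"store": "A", "item": "Bread "},
]


def select_target(rows, store):
    """Raw -> Target: keep only the records relevant to the question."""
    return [r for r in rows if r["store"] == store]


def preprocess(rows):
    """Target -> Processed: drop records with missing values, normalize spelling."""
    return [
        {**r, "item": r["item"].strip().lower()}
        for r in rows
        if r["item"] is not None
    ]


def transform(rows):
    """Processed -> Transformed: keep only the attribute needed for analysis."""
    return [r["item"] for r in rows]


def discover_patterns(items):
    """Transformed -> Patterns: count how often each item appears."""
    return Counter(items)


def extract_knowledge(counts):
    """Patterns -> Knowledge: turn the counts into a readable finding."""
    item, n = counts.most_common(1)[0]
    return f"Best-selling item: {item} ({n} sales)"


def run_pipeline(rows, store):
    return extract_knowledge(
        discover_patterns(transform(preprocess(select_target(rows, store))))
    )


print(run_pipeline(RAW_DATA, "A"))  # Best-selling item: bread (2 sales)
```

Keeping each stage a separate function makes it easy to inspect the intermediate data (target, processed, transformed) when debugging.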

Question 6

Q

How is data stored?

Answer

A

Data can be stored in tabular form, where rows represent instances (objects) and columns represent features (attributes).
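A minimal sketch of this row/column layout in plain Python, assuming a small made-up table; the column names and the helper column() are hypothetical.

```python
# Hypothetical table: each row is one instance (object),
# each column is one feature (attribute).
columns = ["name", "age", "height_cm"]
rows = [
    ["Alice", 34, 165.0],
    ["Bob", 29, 180.5],
    ["Chen", 41, 172.0],
]


def column(name):
    """Return every instance's value for one feature."""
    idx = columns.index(name)
    return [row[idx] for row in rows]


print(f"{len(rows)} instances x {len(columns)} features")  # 3 instances x 3 features
print(column("age"))  # [34, 29, 41]
print(rows[1])        # ['Bob', 29, 180.5]
```

Reading a row gives one object with all its attributes; reading a column gives one attribute across all objects.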

Question 7

Q

What are the different types of data values?

Answer

A

Nominal: Categories without order (e.g., country names).
Ordinal: Ordered categories (e.g., small, medium, large).
Interval: Ordered with meaningful differences but no true zero (e.g., temperature in Celsius or Fahrenheit).
Ratio: Ordered with meaningful differences and a true zero (e.g., weight, height, temperature in Kelvin).
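The practical consequence of these types is which operations are meaningful on each. The sketch below uses made-up values and a hypothetical helper, celsius_to_kelvin, to show equality for nominal data, ordering for ordinal data, differences for interval data, and ratios for ratio data.

```python
# Nominal: only equality tests are meaningful.
countries = ["France", "Japan", "France"]
same_country = countries[0] == countries[2]  # True

# Ordinal: order is meaningful, so map categories to ranks.
# The gaps between ranks carry no meaning.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}
is_larger = SIZE_RANK["large"] > SIZE_RANK["small"]  # True


# Interval: differences are meaningful, ratios are not,
# because 0 degrees Celsius is not "no temperature".
def celsius_to_kelvin(c):
    return c + 273.15


t1, t2 = 10.0, 20.0  # degrees Celsius
difference = t2 - t1  # 10.0 degrees: meaningful
# 20 C is not "twice as hot" as 10 C; on the Kelvin (ratio) scale the ratio is ~1.035.
true_ratio = celsius_to_kelvin(t2) / celsius_to_kelvin(t1)

# Ratio: a true zero exists, so ratios are meaningful.
w1, w2 = 40.0, 80.0  # kilograms
weight_ratio = w2 / w1  # 2.0: 80 kg really is twice 40 kg

print(same_country, is_larger, difference, round(true_ratio, 3), weight_ratio)
# True True 10.0 1.035 2.0
```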

Question 8

Q

What is the difference between structured and unstructured data?

Answer

A

Structured Data: Follows a pre-defined format or schema (e.g., relational database tables, Excel spreadsheets).
Unstructured Data: No pre-defined format or schema (e.g., free text, audio, images).

Question 9

Q

What are some examples of structured data?

Answer

A

Record Data (Fixed attributes per record)
Data Matrix (Table format with rows as objects and columns as attributes)
Transaction Data (Sets of related items, e.g., supermarket purchases)
Graph Data (Nodes and edges representing relationships, e.g., social networks)
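Transaction data and graph data fit naturally into simple Python structures: a list of sets for transactions and an adjacency list (a dict of sets) for a graph. The baskets and friendships below are made up for illustration, and counting co-purchased pairs is just one example of what transaction data supports.

```python
from collections import Counter
from itertools import combinations

# Transaction data: each transaction is a set of items (hypothetical baskets).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

# Count how often each pair of items is bought together.
# Sorting makes ("bread", "milk") and ("milk", "bread") the same key.
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)
print(pair_counts[("bread", "milk")])  # 2

# Graph data: an adjacency list mapping each node to its neighbors.
friendships = [("ann", "bo"), ("bo", "cy"), ("ann", "cy"), ("cy", "dee")]
graph = {}
for a, b in friendships:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)  # undirected edge: store both directions
print(sorted(graph["cy"]))  # ['ann', 'bo', 'dee']
```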

Question 10

Q

What are some examples of unstructured data?

Answer

A

Documents (Emails, research papers)
Images (Medical scans, satellite photos)
Audio (Speech recordings, music)
Videos (Surveillance footage, movies)

Question 11

Q

What is Data Preprocessing?

Answer

A

The process of cleaning and transforming raw data to improve its quality before analysis.

Question 12

Q

What are common data preprocessing techniques?

Answer

A

Handling missing values (mean imputation, carrying over the last observed value, interpolation)
Attribute transformation (binarization, discretization)
Sampling (reducing dataset size while preserving the patterns in the data)
Feature selection (removing irrelevant or redundant variables)
Dimensionality reduction (e.g., PCA)
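The missing-value and attribute-transformation techniques above can be sketched in plain Python, with None standing in for a missing value. The function names are hypothetical; the interpolation is linear and leaves gaps at the very start or end unfilled, and discretization uses equal-width bins, which is only one of several binning strategies.

```python
def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def carry_forward(values):
    """Replace each None with the most recent observed value (leading gaps stay None)."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last if v is None else v)
    return out


def interpolate(values):
    """Linearly interpolate interior gaps between the nearest observed neighbors."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


def binarize(values, threshold):
    """Map each value to 1 if it reaches the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]


def discretize(values, n_bins):
    """Equal-width binning: map each value to a bin index 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # all values identical: one bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


series = [1.0, None, 3.0, None, None, 6.0]  # made-up readings
print(carry_forward(series))  # [1.0, 1.0, 3.0, 3.0, 3.0, 6.0]
print(interpolate(series))    # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(binarize([2, 7, 5], threshold=5))           # [0, 1, 1]
print(discretize([0, 2.5, 5, 7.5, 10], n_bins=2))  # [0, 0, 1, 1, 1]
```

Carry-over suits ordered data such as time series, while mean imputation ignores order entirely; which one is appropriate depends on how the data was collected.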

Question 13

Q

What is Principal Component Analysis (PCA)?

Answer

A

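A technique for reducing data dimensionality by projecting the data onto a small number of orthogonal directions (principal components) that capture as much of the variance as possible.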
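For two-dimensional data the principal components can be computed in closed form, which keeps the example dependency-free. This is a sketch assuming 2-D input with at least two points; the function name pca_2d is hypothetical, and general-purpose implementations (for example scikit-learn's PCA) work in any dimension, typically via a singular value decomposition.

```python
import math


def pca_2d(points):
    """Project 2-D points onto their first principal component.

    Returns (projected values, fraction of total variance explained).
    Assumes at least two points.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix (lam1 >= lam2).
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius

    # Eigenvector for lam1: solves (a - lam1) * vx + b * vy = 0.
    if b != 0:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # Coordinates of each centered point along the first principal component.
    projected = [x * vx + y * vy for x, y in centered]
    total = lam1 + lam2
    explained = lam1 / total if total > 0 else 1.0
    return projected, explained


# Made-up points lying exactly on the line y = 2x.
projected, explained = pca_2d([(0, 0), (1, 2), (2, 4), (3, 6)])
print(round(explained, 6))  # 1.0: one component keeps all the variance
```

The fraction explained (largest eigenvalue divided by the sum of eigenvalues) is what guides how many components to keep: here the data is perfectly collinear, so one dimension loses nothing.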

Question 14

Q

What are the two primary types of predictive (supervised) Data Modelling?

Answer

A

Regression: Predicting continuous values (e.g., house prices).
Classification: Predicting categorical values (e.g., spam vs. non-spam emails).

Question 15

Q

What is Linear Regression?

Answer

A

A statistical method that models a dependent variable as a linear combination of one or more independent variables, with the coefficients usually estimated by least squares (minimizing the sum of squared residuals).
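With a single independent variable, the least-squares coefficients have a closed form: the slope is the sum of (x - mean_x)(y - mean_y) divided by the sum of (x - mean_x)^2, and the intercept is mean_y - slope * mean_x. A minimal sketch, assuming one predictor and made-up data; the function name is hypothetical, and with several predictors the same least-squares problem is usually solved with matrix methods.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations around the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data lying exactly on y = 1 + 2x.
intercept, slope = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)       # 1.0 2.0
print(intercept + slope * 5)  # 11.0 (prediction for x = 5)
```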

Question 16

Q

What is Clustering?

Answer

A

A type of unsupervised learning that groups similar data points together.

Question 17

Q

What are common clustering techniques?

Answer

A

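K-Means Clustering: Partitions data into K clusters by repeatedly assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points, until the assignments stop changing.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Groups points that lie in dense regions into clusters of arbitrary shape and labels points in sparse regions as noise.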
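Both techniques can be sketched in plain Python for small datasets. Assumptions: points are tuples of numbers compared by Euclidean distance (math.dist, Python 3.8+); the K-Means sketch naively uses the first k points as initial centroids, whereas practical implementations use random or k-means++ initialization; the DBSCAN parameters eps (neighborhood radius) and min_pts (minimum neighbors for a core point) are illustrative, and its neighbor search is a brute-force O(n^2) scan. The function names kmeans and dbscan are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's K-Means: returns (labels, centroids)."""
    # Simplifying assumption: the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


def dbscan(points, eps, min_pts):
    """Returns one label per point: a cluster id 0, 1, ... or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force search; the point itself counts towards min_pts.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: expand
                queue.extend(j_neighbors)
        cluster += 1
    return labels


blobs = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centroids = kmeans(blobs, k=2)
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
print(centroids)  # [(0.5, 0.5), (10.5, 10.5)]
print(dbscan(blobs + [(50, 50)], eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note the difference in inputs: K-Means needs the number of clusters up front and assigns every point to some cluster, while DBSCAN discovers the number of clusters from density and leaves the isolated point (50, 50) unassigned as noise.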