Data Analytics and Data Science Flashcards

1
Q

What is Data Analytics?

A

The process of inspecting, cleansing, transforming, and modeling data to extract useful information.

2
Q

How does Data Mining differ from Data Analytics?

A

Data Mining focuses on discovering patterns in large datasets, while Data Analytics involves processing data to make informed decisions.

3
Q

What is Data Science?

A

An interdisciplinary field that applies scientific methods, processes, and algorithms to extract knowledge from data.

4
Q

What are some applications of Data Science and Analytics?

A

Healthcare (Medical imaging, genome analysis)
Finance (Fraud detection, risk analysis)
Retail (Product recommendation, logistics)
Scientific research (Black hole imaging, physics experiments)

5
Q

What is the Data Analytics Pipeline?

A

Raw Data
Target Data
Processed Data
Transformed Data
Pattern Discovery
Knowledge Extraction

6
Q

How is data stored?

A

Data can be stored in tabular form, where rows represent instances (objects) and columns represent features (attributes).
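As a rough illustration of the rows-as-instances, columns-as-features layout, here is a minimal sketch in plain Python. The table contents and the `column` helper are hypothetical, made up for this example.

```python
# Hypothetical toy table: each dict is one row (instance),
# each key is one column (feature).
rows = [
    {"name": "Ada", "height_cm": 170, "country": "UK"},
    {"name": "Linus", "height_cm": 180, "country": "Finland"},
    {"name": "Grace", "height_cm": 165, "country": "USA"},
]


def column(table, feature):
    """Return all values of one feature, in row order."""
    return [row[feature] for row in table]


features = list(rows[0].keys())
heights = column(rows, "height_cm")
print(len(rows), "instances,", len(features), "features")
print(heights)
```

Reading across a dict gives one instance; calling `column` gives one feature across all instances.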

7
Q

What are the different types of data values?

A

Nominal: Categories without order (e.g., country names).
Ordinal: Ordered categories (e.g., small, medium, large).
Interval: Ordered with meaningful differences but no true zero (e.g., temperature in Celsius or Fahrenheit).
Ratio: Ordered with meaningful differences and a true zero (e.g., weight, height).
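One way to remember the four scales is by which operations make sense on each. The sketch below is only an illustration: the sample values, `SIZE_ORDER`, and both helper functions are assumptions chosen for the example.

```python
# Hypothetical sample values for each measurement scale.
SIZE_ORDER = ["small", "medium", "large"]  # ordinal: order is stated explicitly


def ordinal_less_than(a, b, order=SIZE_ORDER):
    """Compare two ordinal values by their position in the ordering."""
    return order.index(a) < order.index(b)


def celsius_to_kelvin(c):
    """Convert to Kelvin, a temperature scale that does have a true zero."""
    return c + 273.15


# Nominal: only equality checks are meaningful.
same_country = "France" == "Japan"

# Ordinal: ranking is meaningful, distances between ranks are not.
small_before_large = ordinal_less_than("small", "large")

# Interval: differences are meaningful, ratios are not.
# 20 degrees C is 10 degrees warmer than 10 degrees C, but not "twice as hot".
diff_c = 20 - 10
ratio_k = celsius_to_kelvin(20) / celsius_to_kelvin(10)  # about 1.035, not 2

# Ratio: a true zero makes ratios meaningful (80 kg is twice 40 kg).
weight_ratio = 80 / 40
```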

8
Q

What is the difference between structured and unstructured data?

A

Structured Data: Pre-defined format (e.g., databases, Excel).

Unstructured Data: No pre-defined format (e.g., text, audio, images).

9
Q

What are some examples of structured data?

A

Record Data (Fixed attributes per record)
Data Matrix (Table format with rows as objects and columns as attributes)
Transaction Data (Sets of related items, e.g., supermarket purchases)
Graph Data (Nodes and edges representing relationships, e.g., social networks)

10
Q

What are some examples of unstructured data?

A

Documents (Emails, research papers)
Images (Medical scans, satellite photos)
Audio (Speech recordings, music)
Videos (Surveillance footage, movies)

11
Q

What is Data Preprocessing?

A

The process of cleaning and transforming raw data to improve its quality before analysis.

12
Q

What are common data preprocessing techniques?

A

Handling missing values (mean imputation, carrying over the last observed value, interpolation)
Attribute transformation (binarization, discretization)
Sampling (reducing dataset size while maintaining patterns)
Feature selection (removing irrelevant variables)
Dimensionality reduction (e.g., PCA)
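The three missing-value strategies named above can be sketched in plain Python as follows. This is a minimal illustration, assuming missing entries are marked with `None`; the function names and the `readings` list are hypothetical.

```python
def fill_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def fill_carry_over(values):
    """Replace None with the most recent observed value (forward fill).

    Leading None entries have nothing to carry over and stay None.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def fill_interpolate(values):
    """Linearly interpolate runs of None between two observed neighbours.

    Gaps at the very start or end have only one neighbour and stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


readings = [1.0, None, None, 4.0, 10.0]
print(fill_mean(readings))         # [1.0, 5.0, 5.0, 4.0, 10.0]
print(fill_carry_over(readings))   # [1.0, 1.0, 1.0, 4.0, 10.0]
print(fill_interpolate(readings))  # [1.0, 2.0, 3.0, 4.0, 10.0]
```

Mean imputation ignores ordering, while carry-over and interpolation assume the values form an ordered sequence such as a time series.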

13
Q

What is Principal Component Analysis (PCA)?

A

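A technique for reducing data dimensionality by projecting the data onto a small number of orthogonal directions (principal components) that capture as much of its variance as possible.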
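To make the idea concrete, here is a minimal sketch of PCA restricted to two features, where the eigenvalues of the covariance matrix have a closed form. The function name `pca_2d` and the sample points are assumptions for illustration, and the sketch assumes the points are not all identical; real data with more features would normally go through a linear algebra library.

```python
import math


def pca_2d(points):
    """Return (component, explained_ratio, scores) for the first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest, smallest = half_trace + root, half_trace - root

    # Eigenvector for the largest eigenvalue, normalised to unit length.
    if sxy != 0:
        vx, vy = largest - syy, sxy
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    component = (vx / norm, vy / norm)

    # Project each centered point onto the component: 2 features become 1.
    scores = [x * component[0] + y * component[1] for x, y in centered]
    return component, largest / (largest + smallest), scores


# Hypothetical points lying exactly on the line y = 2x.
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
component, explained_ratio, scores = pca_2d(points)
```

Because these sample points lie on a single line, the first component points along that line and accounts for essentially all of the variance, so one coordinate per point is enough to describe the data.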

14
Q

What are the two primary types of Data Modelling?

A

Regression: Predicting continuous values (e.g., house prices).

Classification: Predicting categorical values (e.g., spam vs. non-spam emails).

15
Q

What is Linear Regression?

A

A statistical method that models the relationship between a dependent variable and one or more independent variables as a linear function, typically fitted by minimizing the squared prediction errors.
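For the single-variable case, the least-squares line can be computed directly from means and sums. The sketch below assumes one independent variable; `fit_line` and the house-size data are hypothetical and chosen so the fit is exact.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house size (square metres) vs. price (thousands).
sizes = [50, 60, 80, 100]
prices = [150, 180, 240, 300]
slope, intercept = fit_line(sizes, prices)
predicted = slope * 70 + intercept  # predicted price for a 70 square metre house
```

With more than one independent variable the same idea applies, but the coefficients are usually solved with matrix methods rather than these scalar formulas.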

16
Q

What is Clustering?

A

A type of unsupervised learning that groups similar data points together.

17
Q

What are common clustering techniques?

A

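K-Means Clustering: Assigns each data point to the nearest of K cluster centroids, then recomputes the centroids, repeating until the assignments stop changing.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Groups points in dense regions into clusters of arbitrary shape and labels points in sparse regions as noise. It can struggle when clusters have very different densities.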
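The K-Means loop can be sketched in a few lines of plain Python. This is an illustrative version only: seeding the centroids with the first K points is an assumption made to keep the example deterministic (practical implementations usually use random or smarter seeding), and the function name and sample points are hypothetical.

```python
import math


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical 2D points forming two visible groups.
points = [(1, 1), (1.5, 2), (8, 8), (9, 9), (1, 0.5), (8.5, 9.5)]
labels, centroids = k_means(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

The result depends on the starting centroids, which is why K-Means is often run several times with different seeds.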