Data Sources, Collection & Validation Flashcards

1
Q

What is the difference between primary and secondary data sources?

A

Primary data is first-hand information that you collect directly from the source (e.g., surveys, interviews), while secondary data already exists, having been collected by someone else, and is reused for analysis (e.g., government records, books).

2
Q

What are real-time data sources, and why are they useful?

A

Real-time data is collected and processed as it is generated, with minimal delay, allowing immediate decision-making. Examples include stock market tracking, IoT monitoring, and traffic navigation.

3
Q

What are three key considerations in data collection?

A

Accuracy (data correctness), reliability (consistency over time), and ethics (ensuring privacy and compliance).

4
Q

What are common data collection methods?

A

Surveys and questionnaires, interviews, sensors, transaction records, and web scraping.

5
Q

What are the six stages of the data processing cycle?

A

Collection, preparation, input, processing, output, and storage.
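
As a rough illustration, the six stages can be sketched as one small function per stage. The sample readings and every function name here are hypothetical, chosen only to make the order of the stages visible; real pipelines vary widely.

```python
import json
from typing import Dict, List


def collect() -> List[str]:
    # Collection: raw readings as they might arrive (hypothetical sample data).
    return [" 21.5", "19.0 ", "n/a", "22.5"]


def prepare(raw: List[str]) -> List[str]:
    # Preparation: trim whitespace and drop entries that are not numeric.
    cleaned = []
    for item in raw:
        text = item.strip()
        try:
            float(text)
        except ValueError:
            continue
        cleaned.append(text)
    return cleaned


def to_input(cleaned: List[str]) -> List[float]:
    # Input: convert the cleaned text into a machine-usable form.
    return [float(text) for text in cleaned]


def process(values: List[float]) -> Dict[str, float]:
    # Processing: derive a summary from the input values.
    return {"count": len(values), "mean": sum(values) / len(values)}


def output(summary: Dict[str, float]) -> str:
    # Output: present the result in a readable form.
    return f"{summary['count']} readings, mean {summary['mean']:.1f}"


def store(summary: Dict[str, float], archive: List[str]) -> None:
    # Storage: keep the result for later use (an in-memory list stands in for a database).
    archive.append(json.dumps(summary))


archive: List[str] = []
summary = process(to_input(prepare(collect())))
report = output(summary)
store(summary, archive)
```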

6
Q

How does batch processing differ from real-time processing?

A

Batch processing handles large amounts of accumulated data at scheduled times (e.g., payroll systems), while real-time processing handles each piece of data immediately as it is received (e.g., fraud detection).
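
A minimal sketch of the contrast, assuming a list of payment amounts and a made-up fraud threshold. The names `process_batch`, `process_stream`, and `FRAUD_LIMIT` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Iterable, List

FRAUD_LIMIT = 1000.0  # hypothetical threshold, for illustration only


def process_batch(amounts: List[float]) -> float:
    """Batch style: the whole accumulated dataset is handled in one scheduled run."""
    return sum(amounts)


def process_stream(amounts: Iterable[float], on_alert: Callable[[float], None]) -> None:
    """Real-time style: each record is checked as soon as it arrives."""
    for amount in amounts:
        if amount > FRAUD_LIMIT:
            on_alert(amount)


payments = [120.0, 4500.0, 80.0]

# Batch: run once over everything collected so far (e.g., at the end of the day).
total = process_batch(payments)

# Real-time: react to each payment individually as it comes in.
alerts: List[float] = []
process_stream(iter(payments), alerts.append)
```

The batch function only produces a result once all the data is available, whereas the streaming function can raise an alert before later records have even arrived.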

7
Q

What is data validation, and why is it important?

A

Data validation ensures data accuracy, completeness, and consistency before storage or processing, reducing errors and improving reliability.

8
Q

What are the main types of data validation?

A

Format validation: ensures data is in the correct format.

Range validation: checks that numerical values fall within set limits.

Presence check: ensures required fields are filled.

Consistency check: confirms data matches related information.

Uniqueness check: ensures values are distinct.
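
The checks above can be sketched in one small function. This is only an illustration: the record fields (`id`, `email`, `age`, `start_year`, `end_year`), the simplified email pattern, and the 0 to 120 age limit are all assumptions made for the example.

```python
import re
from typing import Any, Dict, List, Set

# Simplified pattern, assumed for illustration; real email rules are more complex.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED_FIELDS = ("id", "email", "age", "start_year", "end_year")


def validate_record(record: Dict[str, Any], seen_ids: Set[Any]) -> List[str]:
    """Return a list of validation errors; an empty list means the record passed."""
    errors: List[str] = []

    # Presence check: every required field must be filled in.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            errors.append(f"missing {field}")
    if errors:
        return errors  # the remaining checks need all fields to be present

    # Format validation: the email must match the expected pattern.
    if not EMAIL_PATTERN.match(str(record["email"])):
        errors.append("bad email format")

    # Range validation: the age must fall within the assumed limits.
    if not 0 <= record["age"] <= 120:
        errors.append("age out of range")

    # Consistency check: related fields must agree with each other.
    if record["end_year"] < record["start_year"]:
        errors.append("end_year before start_year")

    # Uniqueness check: the id must not have been seen before.
    if record["id"] in seen_ids:
        errors.append("duplicate id")
    seen_ids.add(record["id"])

    return errors


seen: Set[Any] = set()
good = validate_record(
    {"id": 1, "email": "a@example.com", "age": 30, "start_year": 2020, "end_year": 2022}, seen
)
bad = validate_record(
    {"id": 1, "email": "not-an-email", "age": 150, "start_year": 2022, "end_year": 2020}, seen
)
```

Here `good` is an empty list, while `bad` collects one error for each of the format, range, consistency, and uniqueness checks.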

9
Q

What are common disadvantages of data processing?

A

Inaccurate data can lead to misleading results, data breaches pose security risks, and maintaining infrastructure can be costly.

10
Q

How does cloud storage improve data collection and processing?

A

Cloud storage allows remote access, scalability, automatic backups, and integration with AI and big data tools for efficient processing.

11
Q

What are the four main types of data storage, and how do they differ in purpose and functionality?

A

Databases – Used for storing structured data, allowing efficient organization and retrieval.

Data Warehouses – Large repositories that store vast datasets from multiple sources, primarily for data analysis.

Cloud Storage – Stores data on remote servers accessible via the internet, offering scalability and flexibility.

Local Storage – Stores data on physical devices like hard drives or solid-state drives, providing direct access.

12
Q

What is the difference between online, distributed, and cloud-based processing?

A

Online processing is the interactive processing of data as it is input by users.

Distributed processing spreads data processing work across multiple computers.

Cloud-based processing uses cloud resources to process data.

13
Q

What are the steps in the data validation process?

A

Analyzing Data – Understanding business requirements, choosing the right analysis technique, and processing results.

Sampling – Testing a small subset of data before validating the full dataset to save time and resources.

Validating Database – Ensuring database data is relevant by comparing source and target data fields.

Comparison – Handling incomplete data and verifying output accuracy against expected results.
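
The sampling and source-to-target comparison steps can be sketched as follows. The helper names, the fixed random seed, and the sample rows are hypothetical; a real validation job would read from actual source and target databases.

```python
import random
from typing import Any, Dict, List


def sample_rows(keys: List[Any], k: int, seed: int = 0) -> List[Any]:
    """Sampling: pick a small random subset of keys to validate first."""
    rng = random.Random(seed)  # fixed seed keeps the illustration repeatable
    return rng.sample(keys, min(k, len(keys)))


def compare_source_target(
    source: Dict[int, Dict[str, Any]],
    target: Dict[int, Dict[str, Any]],
    fields: List[str],
) -> List[str]:
    """Comparison: report rows missing from the target and fields that differ."""
    mismatches: List[str] = []
    for key, src_row in source.items():
        tgt_row = target.get(key)
        if tgt_row is None:
            mismatches.append(f"{key}: missing in target")
            continue
        for field in fields:
            if src_row.get(field) != tgt_row.get(field):
                mismatches.append(
                    f"{key}.{field}: {src_row.get(field)!r} != {tgt_row.get(field)!r}"
                )
    return mismatches


source = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Alan", "city": "Wilmslow"},
    3: {"name": "Grace", "city": "New York"},
}
target = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Alan", "city": None},  # incomplete data in the target
}

# Validate a small sample first, then the full dataset.
sampled_keys = sample_rows(sorted(source), 2)
sample_issues = compare_source_target(
    {k: source[k] for k in sampled_keys}, target, ["name", "city"]
)
issues = compare_source_target(source, target, ["name", "city"])
```

Running the comparison on a sample first gives a cheap early signal; the full comparison then lists every incomplete or missing row.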
