Data Sources, Collection & Validation Flashcards

Question 1

Q

What is the difference between primary and secondary data sources?

Answer

A

Primary data is first-hand information that you collect directly from the source (e.g., surveys, interviews), while secondary data already exists and was collected by someone else (e.g., government records, books).

Question 2

Q

What are real-time data sources, and why are they useful?

Answer

A

Real-time data is collected and processed instantly, allowing immediate decision-making. Examples include stock market tracking, IoT monitoring, and traffic navigation.

Question 3

Q

What are three key considerations in data collection?

Answer

A

Accuracy (data correctness), reliability (consistency over time), and ethics (ensuring privacy and compliance).

Question 4

Q

What are five common data collection methods?

Answer

A

Surveys and questionnaires, interviews, sensors, transactions and web scraping.

Question 5

Q

What are the six stages of the data processing cycle?

Answer

A

Collection, preparation, input, processing, output, and storage.
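As a rough illustration only, the six stages can be walked through as one small function per stage. The function names and the sample readings below are made up for this sketch; real systems split these stages across many tools.

```python
import json

def collect():
    # Collection: raw readings as they might arrive from a form or sensor (made-up values).
    return [" 21.5", "19.0 ", "", "23.5"]

def prepare(raw):
    # Preparation: strip whitespace and drop empty entries.
    return [value.strip() for value in raw if value.strip()]

def to_input(cleaned):
    # Input: convert the cleaned text into a machine-usable form.
    return [float(value) for value in cleaned]

def process(numbers):
    # Processing: derive a result from the input.
    return {"count": len(numbers), "mean": sum(numbers) / len(numbers)}

def output(result):
    # Output: present the result in a readable form.
    return f"{result['count']} readings, mean {result['mean']:.2f}"

def store(result):
    # Storage: serialize the result so it can be kept for later use.
    return json.dumps(result)

numbers = to_input(prepare(collect()))
result = process(numbers)
report = output(result)
saved = store(result)
print(report)
```

Each function maps to one stage, so the order of the calls at the bottom mirrors the order of the cycle.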

Question 6

Q

How does batch processing differ from real-time processing?

Answer

A

Batch processing handles large amounts of accumulated data at scheduled times (e.g., payroll systems), while real-time processing handles each item immediately as it is received (e.g., fraud detection).
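A minimal sketch of the contrast, assuming a list of made-up transaction amounts and a hypothetical alert limit of 5000. The batch function works on the whole accumulated set in one run; the streaming function reacts to each item as it arrives.

```python
def process_batch(transactions):
    # Batch: handle the whole accumulated set in one scheduled run.
    return sum(transactions)

def process_stream(transactions, limit):
    # Real-time: inspect each item as it arrives and react immediately.
    for amount in transactions:
        if amount > limit:
            yield f"ALERT: {amount}"

transactions = [120, 45, 9800, 60]  # made-up amounts

batch_total = process_batch(transactions)
alerts = list(process_stream(iter(transactions), limit=5000))

print(batch_total)
print(alerts)
```

In a real system the stream would come from a queue or socket rather than a list; the iterator here only stands in for that.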

Question 7

Q

What is data validation, and why is it important?

Answer

A

Data validation ensures data accuracy, completeness, and consistency before storage or processing, reducing errors and improving reliability.

Question 8

Q

What are five common types of data validation?

Answer

A

Format validation (ensures data is in the correct format), range validation (checks that numerical values fall within limits), presence check (ensures required fields are filled), consistency check (confirms data matches related information), and uniqueness check (ensures values are distinct).
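The checks above can be sketched as one function applied to a record. The field names, the simple email pattern, and the 0 to 120 age limit are assumptions chosen for illustration, not a standard.

```python
import re

# Deliberately simple pattern for illustration; real email validation is stricter.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED_FIELDS = ("id", "email", "age", "start", "end")

def validate(record, seen_ids):
    errors = []

    # Presence check: every required field must be filled.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            errors.append(f"missing {field}")
    if errors:
        return errors

    # Format validation: the email must match the expected pattern.
    if not EMAIL_PATTERN.match(record["email"]):
        errors.append("bad email format")

    # Range validation: the age must fall within the assumed limits.
    if not 0 <= record["age"] <= 120:
        errors.append("age out of range")

    # Consistency check: the end date must not precede the start date.
    # ISO-formatted dates (YYYY-MM-DD) compare correctly as strings.
    if record["end"] < record["start"]:
        errors.append("end before start")

    # Uniqueness check: the id must not have been seen before.
    if record["id"] in seen_ids:
        errors.append("duplicate id")
    seen_ids.add(record["id"])

    return errors

seen = set()
good = {"id": 1, "email": "a@example.com", "age": 30,
        "start": "2024-01-01", "end": "2024-02-01"}
bad = {"id": 1, "email": "not-an-email", "age": 150,
       "start": "2024-03-01", "end": "2024-02-01"}

good_errors = validate(good, seen)
bad_errors = validate(bad, seen)
print(good_errors)
print(bad_errors)
```

Returning a list of error messages rather than a single pass/fail flag makes it easier to report every problem with a record at once.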

Question 9

Q

What are common disadvantages of data processing?

Answer

A

Inaccurate data can lead to misleading results, data breaches pose security risks, and maintaining infrastructure can be costly.

Question 10

Q

How does cloud storage improve data collection and processing?

Answer

A

Cloud storage allows remote access, scalability, automatic backups, and integration with AI and big data tools for efficient processing.

Question 11

Q

What are the four main types of data storage, and how do they differ in purpose and functionality?

Answer

A

Databases – Used for storing structured data, allowing efficient organization and retrieval.

Data Warehouses – Large repositories that store vast datasets from multiple sources, primarily for data analysis.

Cloud Storage – Stores data on remote servers accessible via the internet, offering scalability and flexibility.

Local Storage – Stores data on physical devices like hard drives or solid-state drives, providing direct access.

Question 12

Q

What is the difference between online, distributed, and cloud-based processing?

Answer

A

Online processing is the interactive processing of data as it is input by users.

Distributed processing involves distributing data processing work across numerous computers.

Cloud-based processing means using cloud-based resources to process data.
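The idea behind distributed processing can be sketched as split, process in parallel, then combine. This example is only a stand-in: it uses worker threads on a single machine in place of separate computers, and the chunk size and worker count are arbitrary choices.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, size):
    # Split the data into equally sized parts, one per worker.
    return [items[i:i + size] for i in range(0, len(items), size)]

def partial_sum(part):
    # The work each worker performs on its own part.
    return sum(part)

data = list(range(1, 101))
parts = chunk(data, 25)

# Threads stand in for separate computers in this sketch.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, parts))

# Combine the partial results into the final answer.
total = sum(partials)
print(partials, total)
```

A real distributed system also has to move data between machines and cope with workers that fail, which this sketch leaves out.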

Question 13

Q

What is the data validation process?

Answer

A

Analyzing Data – Understanding business requirements, choosing the right analysis technique, and processing results.

Sampling – Testing a small subset of data before validating the full dataset to save time and resources.

Validating Database – Ensuring database data is relevant by comparing source and target data fields.

Comparison – Handling incomplete data and verifying output accuracy against expected results.
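The sampling and source-to-target comparison steps can be sketched as follows. The customer records, the two planted errors, and the sample size of 50 are all invented for this example.

```python
import random

# Made-up source data and a target copy with two deliberately planted problems.
source = {i: f"customer-{i}" for i in range(1, 1001)}
target = dict(source)
target[42] = "customer-forty-two"  # value changed during the copy
del target[77]                     # record lost during the copy

def compare(source, target, keys):
    # Compare source and target for the given keys and list every mismatch.
    mismatches = []
    for key in keys:
        if key not in target:
            mismatches.append((key, "missing in target"))
        elif source[key] != target[key]:
            mismatches.append((key, "value differs"))
    return mismatches

# Sampling: check a small random subset first to save time.
rng = random.Random(0)  # fixed seed so the sample is repeatable
sample_keys = rng.sample(sorted(source), 50)
sample_issues = compare(source, target, sample_keys)

# Full validation: check every record.
full_issues = compare(source, target, sorted(source))
print(full_issues)
```

A sample can miss problems that the full comparison finds, so a clean sample is a quick early signal rather than proof that the whole dataset is valid.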