Explore Core Data Concepts Flashcards

1
Q

Why is data now accessible to nearly every business?

A

Data is easier to collect and cheaper to host. Software technologies and platforms can help facilitate the collection, analysis, and storage of valuable information

2
Q

What is data?

A

A collection of facts such as numbers, descriptions, and observations

3
Q

What is a common way of classifying data?

A

Structured, semi-structured, and unstructured

4
Q

What is structured data?

A

Tabular data represented by rows and columns with predefined data types in a database

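A minimal sketch (not from the source) of structured data using Python's built-in sqlite3 module: a table whose rows and columns have predefined data types. The table and column names are invented for illustration.

  import sqlite3

  # Hypothetical table: every column has a predefined type, and every row fits it.
  conn = sqlite3.connect(":memory:")
  conn.execute("""
      CREATE TABLE customers (
          id      INTEGER PRIMARY KEY,   -- numeric type
          name    TEXT NOT NULL,         -- string type
          balance REAL                   -- floating-point type
      )
  """)
  conn.execute("INSERT INTO customers (id, name, balance) VALUES (1, 'Ada', 99.50)")
  print(conn.execute("SELECT id, name, balance FROM customers").fetchall())
  # [(1, 'Ada', 99.5)]
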
5
Q

What is a relational database?

A

A database that holds tabular data

6
Q

What is semi-structured data?

A

Information that doesn’t reside in a relational database but still has some structure to it. Examples: JSON format documents, key-value stores, graph databases

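A minimal sketch (not from the source) of semi-structured data as JSON documents: the records share some structure, but fields can vary and nest, which wouldn't fit a rigid relational table. Field names are invented for illustration.

  import json

  # Two "customer" documents with some shared structure but not identical fields.
  docs = [
      {"id": 1, "name": "Ada"},
      {"id": 2, "name": "Grace", "email": "grace@example.com",
       "addresses": [{"city": "Arlington"}, {"city": "New York"}]},
  ]
  print(json.dumps(docs, indent=2))
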
7
Q

What is a key-value database?

A

Stores associative arrays. The key serves as a unique identifier used to retrieve a specific value; the value can be anything from a number or a string to a complex object such as a JSON document. Data is stored as a single collection without structure or relationships

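A minimal sketch (not from the source) of the key-value idea, using a plain Python dict as the associative array; real key-value databases behave the same way at this level. Keys and values here are invented for illustration.

  import json

  # The key is a unique identifier; the value can be a number, a string,
  # or a serialized complex object such as a JSON document.
  store = {}
  store["user:1001"] = json.dumps({"name": "Ada", "plan": "pro"})   # complex value
  store["page_hits"] = 42                                           # simple value

  # Retrieval is by key only; the store knows nothing about the value's structure.
  print(json.loads(store["user:1001"])["name"])   # Ada
  print(store["page_hits"])                       # 42
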
8
Q

What is a graph database?

A

Used to store and query information about complex relationships

Graph contains nodes (information about objects) and edges (information about the relationships between objects)

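A minimal sketch (not from the source) of the node/edge model using plain Python structures; real graph databases add their own storage and query languages, so this only illustrates the shape of the data.

  # Nodes hold information about objects; edges hold information about
  # the relationships between those objects.
  nodes = {
      "alice":   {"type": "Person"},
      "bob":     {"type": "Person"},
      "contoso": {"type": "Company"},
  }
  edges = [
      ("alice", "WORKS_FOR", "contoso"),
      ("bob",   "WORKS_FOR", "contoso"),
      ("alice", "MANAGES",   "bob"),
  ]

  # A relationship query: who works for contoso?
  print([src for src, rel, dst in edges if rel == "WORKS_FOR" and dst == "contoso"])
  # ['alice', 'bob']
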
9
Q

How is structured data typically stored?

A

Relational database such as SQL Server or Azure SQL Database

10
Q

How is unstructured data typically stored?

A

Azure Blob (Binary Large Object) storage

11
Q

How is semi-structured data typically stored?

A

Azure Cosmos DB

12
Q

What are the levels of access that users can be given to data?

A

Read-only - can view data but can’t modify existing data or create new data

Read/write - can view and modify data

Owner - can add new users and remove existing users

13
Q

What are the two broad data processing solutions?

A

Transaction processing

Analytical processing

14
Q

What is a transactional system?

A

Records transactions, which is the primary function of business computing

Examples include the movement of money between accounts in a banking system, or tracking payments for goods and services from customers in a retail system

A transaction is a small, discrete unit of work

Usually high volume, sometimes handling many millions of transactions in a single day

Often referred to as OLTP (Online Transactional Processing)

Data being processed has to be accessible very quickly

15
Q

How is fast processing supported in a transactional system?

A

Data is often divided into small pieces, i.e. each table involved in a transaction contains only the columns necessary to perform the transactional task, e.g.

for bank transfers, a table holding information about the funds in the account might only contain the account number and the current balance

Splitting tables out into separate groups of columns like this is called normalization

Normalization can enable a transactional system to cache much of the information required to perform transactions in memory and speed throughput

16
Q

What is a downside of transactional systems for querying?

A

Queries involving normalized tables will frequently need to join the data across several tables back together again.

This makes it difficult for business users who might need to examine the data.

17
Q

What is an analytical system?

A

Designed to support business users who need to query data and gain a big picture view of the information held in a database

Capture raw data and use it to generate insights

Most need to perform the following tasks:

  • data ingestion
  • data transformation
  • data querying
  • data visualization
18
Q

What is data ingestion?

A

Process of capturing raw data

Can be taken from control devices, point-of-sale devices, weather stations, recording of the movement of money between bank accounts, etc.

Can come from a separate OLTP system

Data needs to be stored in a repository which could be a file store, a document database, or even a relational database

19
Q

Why is data transformation/processing needed?

A

Raw data might not be in a format suitable for querying

Data might contain anomalies that should be filtered out or standardized

Data might need to be aggregated into KPIs (Key Performance Indicators). Key Performance Indicators are how businesses are measured for growth and performance

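A minimal sketch (not from the source) of the transformation step described above: filtering an anomaly out of raw records and aggregating the rest into a simple KPI. The field names and values are invented for illustration.

  # Raw point-of-sale records; the negative amount is an anomaly to filter out.
  raw_sales = [
      {"store": "A", "amount": 120.0},
      {"store": "A", "amount": -999.0},   # anomaly
      {"store": "B", "amount": 80.0},
      {"store": "B", "amount": 45.5},
  ]

  clean = [r for r in raw_sales if r["amount"] >= 0]   # filter anomalies

  # Aggregate to a KPI: total revenue per store.
  kpi = {}
  for r in clean:
      kpi[r["store"]] = kpi.get(r["store"], 0.0) + r["amount"]
  print(kpi)   # {'A': 120.0, 'B': 125.5}
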
20
Q

Why is data querying needed?

A

Most database management systems provide tools to enable you to perform ad-hoc queries against data and generate regular reports

May be looking for trends or attempting to determine cause of problems in your system

21
Q

Why is data visualization needed?

A

Data represented in tables isn’t always intuitive; visual representations such as charts can make trends and relationships easier to see

22
Q

Describe the benefits and problems of relational data, as well as solutions to known potential issues

A

Tables with rows and columns

The rigid structure can cause some problems, e.g. how to handle multiple addresses for one person - do you add four address columns and assume there can never be more than that?

Normalization will solve these types of problems

Primarily used to handle transaction processing

23
Q

What is normalization? What is the downside?

A

In relational data, data is split into a large number of narrow (few columns), well-defined tables with references from one table to another.

Querying data often requires reassembling data from multiple tables by joining the data back together at run-time. These types of queries can be expensive.

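A minimal sketch (not from the source) of normalization using Python's sqlite3: customers and addresses are split into two narrow tables that reference each other, and even a simple question has to reassemble them with a run-time JOIN. Table and column names are invented for illustration.

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      -- Two narrow, well-defined tables instead of one wide one.
      CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
      CREATE TABLE addresses (customer_id INTEGER REFERENCES customers(id),
                              city TEXT);
      INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
      INSERT INTO addresses VALUES (1, 'London'), (2, 'New York'), (2, 'Arlington');
  """)

  # Reading "a customer plus their addresses" must join the tables back together.
  rows = conn.execute("""
      SELECT c.name, a.city
      FROM customers AS c JOIN addresses AS a ON a.customer_id = c.id
  """).fetchall()
  print(rows)   # [('Ada', 'London'), ('Grace', 'New York'), ('Grace', 'Arlington')]
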
24
Q

Describe a non-relational database as well as its positive and negative features

A

Stores data in a format that more closely matches the original structure, i.e. in a document database

So to retrieve the details of a customer, including the address, you just need to read a single document

But this means information can be duplicated (i.e. 2 customers share an address) and maintenance becomes more complex (2 documents must be updated when a couple changes address)

25
Q

What is a transaction?

A

A sequence of operations that are atomic.

This means all operations in the sequence must be completed successfully. If something goes wrong, all operations run so far in the sequence must be undone

Each transaction has a defined beginning point, followed by steps to modify the data within the database. At the end, the database either commits the changes to make them permanent or rolls back the changes to the starting point.

Bank transfers are a good example: you deduct funds from one account and credit the equivalent funds to another account. If the system fails after deducting the funds, they must be reinstated in the original account.

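A minimal sketch (not from the source) of the bank-transfer example as an atomic transaction, using Python's sqlite3: either both the debit and the credit are committed, or an error rolls both back and the accounts are left unchanged.

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
      INSERT INTO accounts VALUES (1, 100.0), (2, 0.0);
  """)

  def transfer(amount, src, dst):
      try:
          with conn:   # begins a transaction; commits on success, rolls back on error
              conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                           (amount, src))
              if conn.execute("SELECT balance FROM accounts WHERE id = ?",
                              (src,)).fetchone()[0] < 0:
                  raise ValueError("insufficient funds")   # triggers the rollback
              conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                           (amount, dst))
      except ValueError as err:
          print("rolled back:", err)

  transfer(60, 1, 2)    # succeeds: balances become 40 and 60
  transfer(500, 1, 2)   # fails: both accounts are left unchanged
  print(conn.execute("SELECT id, balance FROM accounts").fetchall())
  # [(1, 40.0), (2, 60.0)]
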
26
Q

What is meant by acronym ACID?

A

A transactional database must adhere to the ACID (Atomicity, Consistency, Isolation, Durability) properties to ensure that the database remains consistent while processing transactions.

27
Q

What is Atomicity in ACID?

A

Atomicity guarantees that each transaction is treated as a single unit, which either succeeds completely or fails completely.

If any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged.

An atomic system must guarantee atomicity in each and every situation, including power failures, errors and crashes

28
Q

What is meant by Consistency in the ACID properties?

A

Ensures that a transaction can only take the data in the database from one valid state to another.

A consistent database should never lose or create data in a manner that can’t be accounted for.

For example, if you add funds to an account, there must be a corresponding deduction of funds somewhere, or a record that describes where the funds have come from if they have been received externally. You can’t suddenly create or lose money

29
Q

What is Isolation in the acronym ACID?

A

Ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially

A concurrent process can’t see the data in an inconsistent state (i.e. funds have been deducted from one account but not yet credited to another)

30
Q

What is Durability in the ACID acronym?

A

Guarantees that once a transaction has been committed it will remain committed even if there is a system failure such as a power outage or crash

31
Q

What makes database systems that process transactional workloads inherently complex?

A

Need to manage concurrent users possibly attempting to access and modify the same data at the same time, processing the transactions in isolation while keeping the database consistent and recoverable

32
Q

How do many transactional systems implement relational consistency and isolation? What are the downsides, if any?

A

Apply locks to data when it is updated.

The lock prevents another process from reading the data until the lock is released. The lock is released only when the transaction commits or rolls back.

Extensive locking can lead to poor performance while applications wait for locks to be released.

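A minimal sketch (not from the source) of lock-based isolation, using two SQLite connections to the same file: while the first transaction holds the write lock, a second writer waits and then gives up; committing the first transaction releases the lock. The file name is invented for illustration.

  import sqlite3

  db = "lock_demo.db"                                   # hypothetical on-disk database
  writer = sqlite3.connect(db, isolation_level=None)    # manage transactions explicitly
  writer.executescript("""
      CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, balance REAL);
      INSERT OR REPLACE INTO accounts VALUES (1, 100.0);
  """)

  writer.execute("BEGIN IMMEDIATE")                     # take the write lock
  writer.execute("UPDATE accounts SET balance = 50 WHERE id = 1")

  other = sqlite3.connect(db, timeout=1)                # a second, concurrent user
  try:
      other.execute("UPDATE accounts SET balance = 0 WHERE id = 1")
  except sqlite3.OperationalError as err:
      print("waited for the lock, then gave up:", err)  # 'database is locked'

  writer.execute("COMMIT")                              # lock released; others can proceed
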
33
Q

What is a distributed database?

A

A distributed database is a database in which data is stored across different physical locations. It may be held in multiple computers located in the same physical location (for example, a datacenter), or may be dispersed over a network of interconnected computers.

34
Q

What is a downside of a distributed database?

A

When compared to non-distributed database systems, any data update to a distributed database will take time to apply across multiple locations. If you require transactional consistency in this scenario, locks may be retained for a very long time, especially if there’s a network failure between databases at a critical point in time.

35
Q

How do many distributed database systems deal with the fact that locks can be retained for a very long time?

A

Relax the strict isolation requirements of transactions and implement “eventual consistency.”

36
Q

What is the concept of “eventual consistency”? What is the downside? Under what circumstances is eventual consistency ideal?

A

As an application writes data, each change is recorded by one server and then propagated to the other servers in the distributed database system asynchronously.

Can lead to temporary inconsistencies in the data.

Ideal where the application doesn’t require any ordering guarantees. Examples include counts of shares, likes, or non-threaded comments in a social media system.

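A minimal sketch (not from the source) of eventual consistency: a write is acknowledged by one server and propagated to a replica asynchronously, so a read from the replica can briefly return stale data. The "servers" here are just dictionaries, for illustration only.

  import threading, time

  primary = {"likes": 0}
  replica = {"likes": 0}

  def replicate_later(key, value, delay=0.5):
      """Propagate the change to the replica asynchronously, after a delay."""
      def apply():
          time.sleep(delay)
          replica[key] = value
      threading.Thread(target=apply).start()

  # A write lands on the primary and is acknowledged immediately.
  primary["likes"] = 1
  replicate_later("likes", 1)

  print(primary["likes"], replica["likes"])   # 1 0  -> temporarily inconsistent
  time.sleep(1)
  print(primary["likes"], replica["likes"])   # 1 1  -> eventually consistent
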
37
Q

What are analytical workloads?

A

Typically read-only systems that store vast volumes of historical data or business metrics, such as sales performance and inventory levels.

38
Q

How are analytics workloads used?

A

Analytical workloads are used for data analysis and decision making.

Analytics are generated by aggregating the facts presented by the raw data into summaries, trends, and other kinds of “Business information.”

Analytics can be based on a snapshot of the data at a given point in time, or a series of snapshots. Decision makers usually don’t require all the details of every transaction. They want the bigger picture.

39
Q

How are transactional and analytical workloads related?

A

Transactional information, however, is an integral part of analytical information. If you don’t have good records of daily sales, you can’t compile a useful report to identify trends.

40
Q

What is meant by streaming?

A

Processing data as it arrives

41
Q

What is meant by batch processing of data?

A

Buffering and processing the data in groups

42
Q

What are the advantages and disadvantages of batch processing?

A

Advantages of batch processing include:

  • Large volumes of data can be processed at a convenient time.
  • It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight, or during off-peak hours.

Disadvantages of batch processing include:

  • The time delay between ingesting the data and getting the results.
  • All of a batch job’s input data must be ready before a batch can be processed. This means data must be carefully checked. Problems with data, errors, and program crashes that occur during batch jobs bring the whole process to a halt. The input data must be carefully checked before the job can be run again. Even minor data errors, such as typographical errors in dates, can prevent a batch job from running.
43
Q

Give an example of when batch processing should be used

A

Moving data to a data analysis system where the data is not real-time.

44
Q

When should data streaming be used?

A

When new, dynamic data is generated on a continual basis

Ideal for time-critical operations that require an instant real-time response.

For example, a system that monitors a building for smoke and heat needs to trigger alarms and unlock doors to allow residents to escape immediately in the event of a fire.

45
Q

Describe the differences between batch processing and stream processing in terms of:

  • data scope
  • data size
  • performance
  • analysis
A

Data Scope:

  1. Batch processing can process all the data in the dataset.
  2. Stream processing typically only has access to the most recent data received, or within a rolling time window (the last 30 seconds, for example).

Data Size:

  1. Batch processing is suitable for handling large datasets efficiently.
  2. Stream processing is intended for individual records or micro batches consisting of few records.

Performance:

  1. The latency for batch processing is typically a few hours.
  2. Stream processing typically occurs immediately, with latency in the order of seconds or milliseconds. Latency is the time taken for the data to be received and processed.

Analysis:

  1. Typically use batch processing for performing complex analytics.
  2. Stream processing used for simple response functions, aggregates, or calculations such as rolling averages.
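
A minimal sketch (not from the source) contrasting the two modes on the same readings: the batch pass needs the whole dataset before it can produce its one result, while the streaming pass reacts to each record as it arrives using a small rolling window. The numbers and window size are invented for illustration.

  from collections import deque

  readings = [21.0, 21.5, 22.0, 29.5, 22.5]   # e.g. temperature samples

  # Batch: all data must be available first; one result at the end.
  print("batch average:", sum(readings) / len(readings))

  # Stream: process each record on arrival, over a rolling window of 3 readings.
  window = deque(maxlen=3)
  for value in readings:                      # imagine these arriving over time
      window.append(value)
      print(f"got {value}, rolling average: {sum(window) / len(window):.2f}")
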
46
Q

What does a Database Administrator do?

A

Manage databases, assign permissions to users, store backup copies of data and restore data in case of any failures

  1. Responsible for the design, implementation, maintenance, and operational aspects of on-premises and cloud-based database solutions
  2. Responsible for the overall availability and consistent performance and optimizations of the database solutions
  3. Work with stakeholders to implement policies, tools, and processes for backup and recovery plans to recover following a natural disaster or human-made error
  4. Responsible for managing the security of the data in the database, granting privileges over the data, granting or denying access to users as appropriate.
47
Q

What does a data engineer do?

A

Apply data cleaning routines, identify business rules, and turn data into useful information.

  1. Collaborate with stakeholders to design and implement data-related assets that include data ingestion pipelines, cleansing and transformation activities, and data stores for analytical workloads.
    • Use a wide range of data platform technologies, including relational and nonrelational databases, file stores, and data streams.
  2. Ensure privacy of data
  3. Manage and monitor data stores and pipelines to ensure data loads perform as expected.
48
Q

What does a Data Analyst Do?

A

Explore and analyze data to create visualizations and charts to enable organizations to make informed decisions.

  • Design and build scalable data models
  • Clean and transform data
  • Enabling advanced analytics capabilities through reports and visualizations.