Bulk API Flashcards
What is the max number of records that can be processed by Batch Apex?
50 million
How many GB of data can bulk queries retrieve, and how are they divided up?
Up to 15GB of data, divided into fifteen 1GB files
In which scenario would Batch Apex not work well?
Anything synchronous, like a Visualforce page that needs to query more than 50,000 records
What two operations does Bulk API support?
Query and queryAll
What does the QueryAll operation do?
- Returns records that have been deleted because of a merge or delete
- Returns information about archived Task and Event records
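As a minimal sketch of the difference, the Bulk API 2.0 request below creates a query job with the queryAll operation instead of query, so soft-deleted rows and archived Task/Event records are included; the instance URL, access token, and API version are placeholder assumptions.

```python
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # hypothetical
ACCESS_TOKEN = "00D...placeholder"                        # hypothetical

resp = requests.post(
    f"{INSTANCE_URL}/services/data/v59.0/jobs/query",
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    },
    # "queryAll" (rather than "query") also returns records deleted by a
    # merge or delete, plus archived Task and Event records
    json={"operation": "queryAll", "query": "SELECT Id, Subject FROM Task"},
)
print(resp.json()["id"])  # job Id, used later to poll status and get results
```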
What time limit is there on executing bulk API queries, and what error is thrown?
2 minutes and it fails with QUERY_TIMEOUT
For Bulk API, what happens when the results exceed the 1GB file size limit (or take longer than 10 minutes)?
The completed results are cached and another attempt is made.
How many attempts are made for Bulk API queries when they time out (or the file size is greater than 1GB), and what error is thrown?
15 attempts are made; after that the query fails with the error "Retried more than 15 times"
How long are Bulk API results stored?
7 days
Which API would be good to use when loading a few thousand to millions of records?
Bulk API
On which principle is the Bulk API based?
REST
What are the benefits of using Bulk API?
- Developed to simplify and optimize the process of loading or deleting large data sets
- Super-fast processing speeds
- Reduced client-side programming logic
- Easy-to-monitor job status
- Automatic retry of failed records
- Support for parallel processing
- Minimal round trips to Force.com
- Minimal API calls
- Limited dropped connections
- Easy-to-tune batch size
What is the default chunk size for Bulk API?
100,000 record chunks by default
How can you configure the chunk size for Bulk API?
Use the chunkSize parameter of the Sforce-Enable-PKChunking header to configure smaller chunks, or larger ones up to 250,000
What is the maximum chunk size for Bulk API?
250,000
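As a sketch of how the chunk size is configured, the Bulk API 1.0 job request below enables PK chunking at the 250,000-record maximum via the Sforce-Enable-PKChunking header; the instance URL, session Id, and API version are placeholder assumptions.

```python
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # hypothetical
SESSION_ID = "00D...placeholder"                          # hypothetical

# XML body describing a CSV query job on Account
job_xml = """<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <operation>query</operation>
  <object>Account</object>
  <contentType>CSV</contentType>
</jobInfo>"""

resp = requests.post(
    f"{INSTANCE_URL}/services/async/47.0/job",
    headers={
        "X-SFDC-Session": SESSION_ID,
        "Content-Type": "application/xml; charset=UTF-8",
        # default chunk size is 100,000; 250,000 is the maximum
        "Sforce-Enable-PKChunking": "chunkSize=250000",
    },
    data=job_xml,
)
```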
What is the file size limit for Bulk API?
10 MB
What is the limit on the number of records per batch for the Bulk API?
10,000 records per batch
What is the maximum character data limit for all the data in a batch when using Bulk API?
10 million characters of data
What is the maximum number of characters in a single field for Bulk API?
32,000 characters
What is the limit for fields per record for Bulk API?
5,000 fields
What is the limit for all characters per record for Bulk API?
400,000 characters per record
What is the max size of a file that can be loaded using Bulk API?
10MB
For binary content, what is the max zip file size when using Bulk API?
10MB
For binary content, what is the max total size of the unzipped content when using Bulk API?
The total size of the unzipped content can’t exceed 20MB
What is the degree of parallelism?
The amount of work completed (as a duration) divided by the actual amount of time it took to complete that work
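A quick worked example of that definition (the numbers are illustrative):

```python
# Four batches did a combined 40 minutes of processing work, but the job
# finished in 10 minutes of wall-clock time: degree of parallelism = 4.
batch_processing_minutes = [12, 9, 11, 8]  # per-batch processing durations
elapsed_minutes = 10                        # actual wall-clock job duration

parallelism = sum(batch_processing_minutes) / elapsed_minutes
print(parallelism)  # 4.0 -> roughly four batches ran concurrently
```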
How does the Bulk API work? (3 main steps)
- Data is streamed in large batches directly to temporary storage over a simple HTTP connection (the client creates the job, sends all the data to the server in batches, checks status, and at the end retrieves the results; see the sketch after this list)
- The data set is managed in a job that can be monitored and controlled (aborted) from Admin Setup (the job is split into multiple data batches that will return multiple results)
- The data set can be processed faster by allocating multiple servers to process in parallel (the processing servers dequeue batches from the job, insert or update records, and save the results back to the job)
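A minimal end-to-end sketch of those three steps using the Bulk API 2.0 ingest endpoints: create the job, stream a CSV batch, close the job so the server starts processing, then poll until it finishes. The instance URL, token, and API version are placeholder assumptions, not values from this deck.

```python
import time
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # hypothetical
HEADERS = {"Authorization": "Bearer 00D...placeholder"}   # hypothetical

# 1. Create the job
job = requests.post(
    f"{INSTANCE_URL}/services/data/v59.0/jobs/ingest",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={"object": "Account", "operation": "insert", "contentType": "CSV"},
).json()

# 2. Stream the data to temporary storage over a simple HTTP connection
csv_data = "Name\nAcme\nGlobex\n"
requests.put(
    f"{INSTANCE_URL}/services/data/v59.0/jobs/ingest/{job['id']}/batches",
    headers={**HEADERS, "Content-Type": "text/csv"},
    data=csv_data,
)

# 3. Close the job (state -> UploadComplete) so processing servers can
#    dequeue the batches, then poll the job status until it resolves
requests.patch(
    f"{INSTANCE_URL}/services/data/v59.0/jobs/ingest/{job['id']}",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={"state": "UploadComplete"},
)
while True:
    state = requests.get(
        f"{INSTANCE_URL}/services/data/v59.0/jobs/ingest/{job['id']}",
        headers=HEADERS,
    ).json()["state"]
    if state in ("JobComplete", "Failed", "Aborted"):
        break
    time.sleep(5)
```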
Which of these is true about bulk queries?
- Bulk API can access or query compound address and compound geolocation fields
- Bulk queries always time out when querying more than 100,000 records
- A bulk query can retrieve up to 15GB of data, divided into 1GB files
- In order to keep results lean, bulk query does not support queryAll operations
A bulk query can retrieve up to 15GB of data, divided into 1GB files
Which of these is one of the advantages of using Bulk API when uploading large volumes of data?
- Bulk API loads data in bite-size chunks, increasing the speed of jobs
- Bulk API is optimized for real-time client applications
- Bulk API only allows batches to be processed serially
- Bulk API moves the functionality and work from your client application to the server
Bulk API moves the functionality and work from your client application to the server
Which of the following is true? The hard delete function in the Bulk API:
- Is disabled by default
- Allows deleted records to stay in the Recycle Bin for 15 days
- Can be used only when deleting fewer than 10,000 records
- Is not a recommended strategy for deleting large data volumes
The hard delete function in the Bulk API is disabled by default
What is enabled for the Bulk API by default?
Parallel Mode
Describe Parallel Mode within the Bulk API
It is enabled by default. It allows for faster loading of data by processing batches in parallel
What are the trade-offs with respect to Parallel Mode?
There is a risk of lock contention. Serial mode is an alternative to parallel mode in order to avoid lock contention
When should you use Parallel Mode versus Serial Mode?
Whenever possible, as it is a best practice.
When should you use Serial Mode versus Parallel Mode?
When there is a risk of lock contention and you cannot reorganize the batches to avoid these locks.
How can you organize data load batches to avoid risks of lock contention?
By organizing the data by parent Id.
Suppose that you are inserting AccountTeamMember records and you have references to the same Account Id within multiple batches. You risk lock timeouts as these batches process in parallel and attempt to lock the Account record at the same time. To avoid this lock contention, organize your data by Account Id so that all AccountTeamMember records referencing the same Account Id are in the same batch (see the sketch below)
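A minimal sketch of that batching strategy in Python: sort the rows by AccountId before slicing them into batches, so rows sharing a parent stay contiguous instead of being scattered across many parallel batches. The field names follow the example above; the batch size is just a starting point.

```python
from itertools import islice
from operator import itemgetter

def batches_by_parent(rows, batch_size=5000):
    """Yield batches from rows sorted by AccountId, so all rows for a
    given Account are contiguous (at worst spanning adjacent batches)
    rather than spread across many batches processing in parallel."""
    ordered = sorted(rows, key=itemgetter("AccountId"))
    it = iter(ordered)
    while batch := list(islice(it, batch_size)):
        yield batch

rows = [
    {"AccountId": "001A", "UserId": "005X", "TeamMemberRole": "Sales"},
    {"AccountId": "001B", "UserId": "005Y", "TeamMemberRole": "Support"},
    {"AccountId": "001A", "UserId": "005Z", "TeamMemberRole": "Exec"},
]
for batch in batches_by_parent(rows, batch_size=2):
    print([r["AccountId"] for r in batch])  # ['001A', '001A'] then ['001B']
```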
What does the Bulk API do when it encounters locks?
- Waits a few seconds for the lock to be released
- If the lock is not released, the record is marked as failed
- If there are problems acquiring locks for more than 100 records in the batch, the remainder of the batch is put back in the queue and will be tried again later.
- When a batch is reprocessed, records that are marked as failed will not be retried. Resubmit these in a separate batch to have them processed
- The batch will be tried again up to 10 times before the batch is marked as failed
- As some records have succeeded, you should check the results of the data load to confirm success/error details
With respect to data loads, any batch job that takes longer than this amount of time is suspended and returned to the queue for later processing
10 minutes.
With respect to data loads, how can you optimize batch sizes?
All batches should run in under 10 minutes. Start with 5,000 records per batch and adjust based on processing time: if processing takes more than 5 minutes, reduce the batch size; if it takes only a few seconds, increase it. If you get a timeout error, split your batch into smaller batches.
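A sketch of that tuning loop as a simple heuristic; the thresholds mirror the guidance above, while the exact multipliers and the 10,000-record cap (the Bulk API per-batch maximum) are illustrative assumptions.

```python
def next_batch_size(current_size, processing_seconds):
    """Suggest the next batch size from the last batch's processing time."""
    if processing_seconds > 10 * 60:   # suspended/timed out: split much smaller
        return max(current_size // 4, 1)
    if processing_seconds > 5 * 60:    # more than 5 minutes: reduce
        return max(current_size // 2, 1)
    if processing_seconds < 60:        # only seconds: room to grow
        return min(current_size * 2, 10000)
    return current_size                # in the sweet spot: keep it

size = 5000                            # recommended starting point
for seconds in (20, 45, 400, 90):      # sample per-batch timings
    size = next_batch_size(size, seconds)
    print(size)                        # 10000, 10000, 5000, 5000
```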