Bulk API Flashcards
What is the max number of records that can be processed by Batch Apex?
50 million
How many GB of data can bulk queries retrieve, and how are they divided up?
Up to 15GB of data, divided into fifteen 1GB files
In which scenario would Batch Apex not work well?
Anything synchronous, like a Visualforce page that needs to query more than 50,000 records
What two operations does Bulk API support?
Query and queryAll
What does the QueryAll operation do?
- Returns records that have been deleted because of a merge or delete
- Returns information about archived Task and Event records
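As a minimal sketch of the difference, the Bulk API 2.0 request below creates a query job with the queryAll operation instead of query, so soft-deleted rows and archived Task/Event records are included; the instance URL, access token, and API version are placeholder assumptions.

```python
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # hypothetical
ACCESS_TOKEN = "00D...placeholder"                        # hypothetical

resp = requests.post(
    f"{INSTANCE_URL}/services/data/v59.0/jobs/query",
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    },
    # "queryAll" (rather than "query") also returns records deleted by a
    # merge or delete, plus archived Task and Event records
    json={"operation": "queryAll", "query": "SELECT Id, Subject FROM Task"},
)
print(resp.json()["id"])  # job Id, used later to poll status and get results
```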
What time limit is there on executing bulk API queries, and what error is thrown?
2 minutes and it fails with QUERY_TIMEOUT
For Bulk API, what happens when the results exceed the 1GB file size limit (or take longer than 10 minutes)?
The completed results are cached and another attempt is made.
How many attempts are made for Bulk API queries when they time out (or the file size is greater than 1GB), and what error is thrown?
15 attempts are made; after that the query fails with the error "Retried more than 15 times"
How long are Bulk API results stored?
7 days
Which API would be good to use when loading a few thousand to millions of records?
Bulk API
On which principle is the Bulk API based?
REST
What are the benefits of using Bulk API?
- Developed to simplify and optimize the process of loading or deleting large data sets
- Super-fast processing speeds
- Reduced client-side programming logic
- Easy-to-monitor job status
- Automatic retry of failed records
- Support for parallel processing
- Minimal round trips to Force.com
- Minimal API calls
- Limited dropped connections
- Easy-to-tune batch size
What is the default chunk size for Bulk API?
100,000 record chunks by default
How can you configure the chunk size for Bulk API?
Use the chunkSize parameter of the Sforce-Enable-PKChunking header to configure smaller chunks, or larger ones up to 250,000
What is the maximum chunk size for Bulk API?
250,000
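As a sketch of how the chunk size is configured, the Bulk API 1.0 job request below enables PK chunking at the 250,000-record maximum via the Sforce-Enable-PKChunking header; the instance URL, session Id, and API version are placeholder assumptions.

```python
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # hypothetical
SESSION_ID = "00D...placeholder"                          # hypothetical

# XML body describing a CSV query job on Account
job_xml = """<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <operation>query</operation>
  <object>Account</object>
  <contentType>CSV</contentType>
</jobInfo>"""

resp = requests.post(
    f"{INSTANCE_URL}/services/async/47.0/job",
    headers={
        "X-SFDC-Session": SESSION_ID,
        "Content-Type": "application/xml; charset=UTF-8",
        # default chunk size is 100,000; 250,000 is the maximum
        "Sforce-Enable-PKChunking": "chunkSize=250000",
    },
    data=job_xml,
)
```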
What is the file size limit for Bulk API?
10 MB
What is the limit on the number of records per batch for the Bulk API?
10,000 records per batch
What is the maximum character data limit for all the data in a batch when using Bulk API?
10 million characters of data
What is the maximum number of characters in a single field for Bulk API?
32,000 characters
What is the limit for fields per record for Bulk API?
5,000 fields
What is the limit for all characters per record for Bulk API?
400,000 characters per record
What is the max size of a file that can be loaded using Bulk API?
10MB
For binary content, what is the max zip file size when using Bulk API?
10MB
For binary content, what is the max total size of the unzipped content when using Bulk API?
The total size of the unzipped content can’t exceed 20MB
What is the degree of parallelism?
The amount of work completed (as a duration) divided by the actual amount of time it took to complete that work
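A quick worked example of that definition (the numbers are illustrative):

```python
# Four batches did a combined 40 minutes of processing work, but the job
# finished in 10 minutes of wall-clock time: degree of parallelism = 4.
batch_processing_minutes = [12, 9, 11, 8]  # per-batch processing durations
elapsed_minutes = 10                        # actual wall-clock job duration

parallelism = sum(batch_processing_minutes) / elapsed_minutes
print(parallelism)  # 4.0 -> roughly four batches ran concurrently
```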
How does the Bulk API work? (3 main steps)
- Data is streamed in large batches directly to temporary storage over a simple HTTP connection (the client creates the job, sends all the data to the server in batches, checks status, and at the end retrieves the results; see the sketch after this list)
- The data set is managed in a job that can be monitored and controlled (aborted) from Admin Setup (the job is split into multiple data batches that will return multiple results)
- The data set can be processed faster by allocating multiple servers to process in parallel (the processing servers dequeue batches from the job, insert or update records, and save the results back to the job)
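A minimal end-to-end sketch of those three steps using the Bulk API 2.0 ingest endpoints: create the job, stream a CSV batch, close the job so the server starts processing, then poll until it finishes. The instance URL, token, and API version are placeholder assumptions, not values from this deck.

```python
import time
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # hypothetical
HEADERS = {"Authorization": "Bearer 00D...placeholder"}   # hypothetical

# 1. Create the job
job = requests.post(
    f"{INSTANCE_URL}/services/data/v59.0/jobs/ingest",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={"object": "Account", "operation": "insert", "contentType": "CSV"},
).json()

# 2. Stream the data to temporary storage over a simple HTTP connection
csv_data = "Name\nAcme\nGlobex\n"
requests.put(
    f"{INSTANCE_URL}/services/data/v59.0/jobs/ingest/{job['id']}/batches",
    headers={**HEADERS, "Content-Type": "text/csv"},
    data=csv_data,
)

# 3. Close the job (state -> UploadComplete) so processing servers can
#    dequeue the batches, then poll the job status until it resolves
requests.patch(
    f"{INSTANCE_URL}/services/data/v59.0/jobs/ingest/{job['id']}",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={"state": "UploadComplete"},
)
while True:
    state = requests.get(
        f"{INSTANCE_URL}/services/data/v59.0/jobs/ingest/{job['id']}",
        headers=HEADERS,
    ).json()["state"]
    if state in ("JobComplete", "Failed", "Aborted"):
        break
    time.sleep(5)
```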
Which of these is true about bulk queries?
- Bulk API can access or query compound address and compound geolocation fields
- Bulk queries always time out when querying more than 100,000 records
- A bulk query can retrieve up to 15GB of data, divided into 1GB files
- In order to keep results lean, bulk query does not support queryAll operations
A bulk query can retrieve up to 15GB of data, divided into 1GB files
Which of these is one of the advantages of using Bulk API when uploading large volumes of data?
- Bulk API loads data in bite-size chunks, increasing the speed of jobs
- Bulk API is optimized for real-time client applications
- Bulk API only allows batches to be processed serially
- Bulk API moves the functionality and work from your client application to the server
Bulk API moves the functionality and work from your client application to the server
Which of the following is true? The hard delete function in the Bulk API:
- Is disabled by default
- Allows deleted records to stay in the Recycle Bin for 15 days
- Can be used only when deleting fewer than 10,000 records
- Is not a recommended strategy for deleting large data volumes
The hard delete function in the Bulk API is disabled by default
What is enabled for the Bulk API by default?
Parallel Mode
Describe Parallel Mode within the Bulk API
It is enabled by default. It allows for faster loading of data by processing batches in parallel
What are the trade-offs with respect to Parallel Mode?
There is a risk of lock contention. Serial mode is an alternative to parallel mode in order to avoid lock contention
When should you use Parallel Mode versus Serial Mode?
Whenever possible, as it is a best practice.
When should you use Serial Mode versus Parallel Mode?
When there is a risk of lock contention and you cannot reorganize the batches to avoid these locks.
How can you organize data load batches to avoid risks of lock contention?
By organizing the data by parent Id.
Suppose that you are inserting AccountTeamMember records and you have references to the same Account Id within multiple batches. You risk lock timeouts as these batches process in parallel and attempt to lock the Account record at the same time. To avoid this lock contention, organize your data by Account Id so that all AccountTeamMember records referencing the same Account Id are in the same batch (see the sketch below)
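A minimal sketch of that batching strategy in Python: sort the rows by AccountId before slicing them into batches, so rows sharing a parent stay contiguous instead of being scattered across many parallel batches. The field names follow the example above; the batch size is just a starting point.

```python
from itertools import islice
from operator import itemgetter

def batches_by_parent(rows, batch_size=5000):
    """Yield batches from rows sorted by AccountId, so all rows for a
    given Account are contiguous (at worst spanning adjacent batches)
    rather than spread across many batches processing in parallel."""
    ordered = sorted(rows, key=itemgetter("AccountId"))
    it = iter(ordered)
    while batch := list(islice(it, batch_size)):
        yield batch

rows = [
    {"AccountId": "001A", "UserId": "005X", "TeamMemberRole": "Sales"},
    {"AccountId": "001B", "UserId": "005Y", "TeamMemberRole": "Support"},
    {"AccountId": "001A", "UserId": "005Z", "TeamMemberRole": "Exec"},
]
for batch in batches_by_parent(rows, batch_size=2):
    print([r["AccountId"] for r in batch])  # ['001A', '001A'] then ['001B']
```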
What does the Bulk API do when it encounters locks?
- Waits a few seconds for the lock to be released
- If the lock is not released, the record is marked as failed
- If there are problems acquiring locks for more than 100 records in the batch, the remainder of the batch is put back in the queue and will be tried again later.
- When a batch is reprocessed, records that are marked as failed will not be retried. Resubmit these in a separate batch to have them processed
- The batch will be tried again up to 10 times before the batch is marked as failed
- As some records have succeeded, you should check the results of the data load to confirm success/error details
With respect to data loads, any batch job that takes longer than this amount of time is suspended and returned to the queue for later processing
10 minutes.
With respect to data loads, how can you optimize batch sizes?
All batches should run in under 10 minutes. Start with 5,000 records per batch and adjust based on processing time: if processing takes more than 5 minutes, reduce the batch size; if it takes only a few seconds, increase it. If you get a timeout error, split your batch into smaller batches.
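A sketch of that tuning loop as a simple heuristic; the thresholds mirror the guidance above, while the exact multipliers and the 10,000-record cap (the Bulk API per-batch maximum) are illustrative assumptions.

```python
def next_batch_size(current_size, processing_seconds):
    """Suggest the next batch size from the last batch's processing time."""
    if processing_seconds > 10 * 60:   # suspended/timed out: split much smaller
        return max(current_size // 4, 1)
    if processing_seconds > 5 * 60:    # more than 5 minutes: reduce
        return max(current_size // 2, 1)
    if processing_seconds < 60:        # only seconds: room to grow
        return min(current_size * 2, 10000)
    return current_size                # in the sweet spot: keep it

size = 5000                            # recommended starting point
for seconds in (20, 45, 400, 90):      # sample per-batch timings
    size = next_batch_size(size, seconds)
    print(size)                        # 10000, 10000, 5000, 5000
```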