What is multitenancy?
A means of providing a single application to multiple organizations (tenants) from a single software stack
What does Salesforce do when providing its CRM to a new customer?
Instead of providing a complete set of hardware and software resources to each organization, Salesforce inserts a layer of software between the single instance and each customer deployment. This layer is invisible to the organizations, which see only their own data and schemas, while Salesforce reorganizes the data behind the scenes to perform efficient operations.
How does Salesforce ensure that tenant-specific customizations do not breach the security of other tenants or affect their performance?
Salesforce uses a runtime engine that generates application components for each organization using the customer's metadata.
How does Salesforce store the application data for each organization?
In a few large database tables that are partitioned by tenant and serve as heap storage. The platform’s runtime engine then materializes virtual tables based on the customer’s metadata.
What are the two side-effects of the way that Salesforce stores customer application data?
1.) Traditional performance-tuning techniques yield little to no improvement.
2.) You cannot optimize the application's underlying SQL, because it is generated by the system rather than written by each tenant.
How long might it take before the text in a searchable object’s created or updated record is searchable?
15 minutes or more
In what order does Salesforce perform indexed searches?
1.) Searches the indexes for appropriate records
2.) Narrows down the results by access permissions, search limits, and other filters, creating a result set
3.) Once a result set reaches a predetermined size, all other records are discarded
4.) Finally, the result set is used to query the records from the database to retrieve the fields that a user sees
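For reference, a search enters this pipeline through SOSL. A minimal sketch of issuing one via the REST search endpoint, assuming the Python `requests` library and placeholder instance URL, API version, and access token:

```python
import requests
from urllib.parse import quote

INSTANCE = "https://yourInstance.salesforce.com"
TOKEN = "<accessToken>"

# SOSL drives the indexed-search steps above: index scan, permission and
# limit filtering, then retrieval of the fields the user sees.
sosl = "FIND {Acme} IN NAME FIELDS RETURNING Account(Id, Name)"
resp = requests.get(
    f"{INSTANCE}/services/data/v47.0/search/?q={quote(sosl)}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(resp.json()["searchRecords"])
```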
What are the main areas of the application that are impacted by differing architectures in implementations with large data volumes?
1.) The loading or updating of large numbers of records, either directly or with integrations.
2.) Extracting records using reports, list views, or queries.
What is the Force.com query optimizer?
The Force.com Query Optimizer works behind the scenes to determine the best path to the data being requested based on the filters in the query. It will determine the best index from which to drive the query, the best table from which to drive the query if no good index is available, and more.
What is PK Chunking?
PK Chunking (or Primary Key Chunking) is a strategy for querying large data sets. PK Chunking is a feature of the Bulk API that splits a query into chunks of records with sequential record Ids (i.e. the Primary Keys). Ids are always indexed, so this is an efficient method for querying large data sets.
When would you use PK Chunking?
When you need to query or extract tens or hundreds of millions of records, for example, when you need to query an entire data set to set up a replicated database, or when you need to query a set of data as part of an archival strategy where the record count could be in the millions.
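Conceptually, PK chunking rewrites one large query into a series of range-bounded queries on the indexed Id field. A sketch of the decomposition, with hypothetical boundary Ids (the platform computes the real boundaries from the object's index):

```python
base_query = "SELECT Name FROM Account"
# Hypothetical chunk boundaries; each chunk covers one contiguous Id range.
boundaries = ["00130000000000A", "00130000000132G", "00130000000264W"]

chunks = [
    f"{base_query} WHERE Id >= '{lo}' AND Id < '{hi}'"
    for lo, hi in zip(boundaries, boundaries[1:])
]
for soql in chunks:
    print(soql)
```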
What is the default size for PK chunking?
100,000 records
What is the maximum size for a PK Chunk?
250,000 records
What is the most efficient (recommended) chunk size for an organization?
It depends on a number of factors, such as the data volume and the filters of the query. Customers may need to experiment with different chunk sizes to determine what is optimal for their implementation. The default is 100,000 records per chunk and the maximum is 250,000, but more records per chunk means each chunk's query performs less efficiently.
What is the format of the header to include in the Bulk API to enable PK Chunking?
Sforce-Enable-PKChunking (for example, Sforce-Enable-PKChunking: TRUE)
When using PK chunking, how would you specify the chunk size in the header?
Sforce-Enable-PKChunking: chunkSize=100000
What is the best practice for querying a supported object’s share table using PK Chunking?
Determining the chunks is more efficient in this case if the boundaries are defined on the parent object record Ids rather than the share table record Ids. So, for example, the following header could be used for a Bulk API query against the OpportunityShare object table using PK Chunking:
Sforce-Enable-PKChunking: chunkSize=150000; parent=Opportunity
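For context, a minimal sketch of sending that header when creating a Bulk API (1.0) query job, assuming the Python `requests` library and placeholder instance URL, API version, and session Id:

```python
import requests

INSTANCE = "https://yourInstance.salesforce.com"
SESSION_ID = "<sessionId>"

# Create the query job with PK chunking driven by the parent Opportunity object.
job = requests.post(
    f"{INSTANCE}/services/async/47.0/job",
    headers={
        "X-SFDC-Session": SESSION_ID,
        "Content-Type": "application/json",
        "Sforce-Enable-PKChunking": "chunkSize=150000; parent=Opportunity",
    },
    json={"operation": "query", "object": "OpportunityShare", "contentType": "CSV"},
).json()

# Add the SOQL as a batch; the platform fans it out into one batch per chunk.
requests.post(
    f"{INSTANCE}/services/async/47.0/job/{job['id']}/batch",
    headers={"X-SFDC-Session": SESSION_ID, "Content-Type": "text/csv"},
    data="SELECT Id, OpportunityId FROM OpportunityShare",
)
```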
When loading data with parent references into Salesforce, what is more efficient? Using External ID or a Salesforce ID?
Using the native Salesforce Id is more efficient. An External Id incurs additional overhead because the platform performs a kind of "lookup" to find the record; this overhead does not occur (it is bypassed) when using the native Salesforce Id.
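For illustration, the two styles of parent reference in a Bulk API CSV load of Contacts; `MyExtId__c` is a hypothetical external Id field on Account:

```python
# Option 1: native Salesforce Id in the AccountId column (no lookup needed).
csv_with_sf_id = (
    "LastName,AccountId\n"
    "Smith,001300000000001AAA\n"  # hypothetical 18-character Account Id
)

# Option 2: relationship column keyed on an external Id (one lookup per row).
csv_with_external_id = (
    "LastName,Account.MyExtId__c\n"
    "Smith,ACME-0042\n"
)
```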
What is better? Performing upserts, or performing Inserts followed by Updates?
Inserts followed by Updates. Upserts are more costly than performing inserts and updates separately, so avoid upserts with large data volumes.
What are some of the best practices for optimizing your data load performance?
1.) Introduce bypass logic for triggers, validation rules, workflow rules (but not at the cost of data integrity)
2.) Defer Sharing Calculations
3.) Minimize the number of fields loaded for each record. Foreign keys (lookup relationships) and roll-up summary fields are likely to increase processing time.
4.) Minimize the number of triggers where possible. Also, where possible, convert complex trigger code to Batch Apex that processes asynchronously after data is loaded
What are skinny tables?
Skinny tables are tables created by Salesforce that contain frequently used fields in order to avoid joins and increase performance when running reports and queries
Why would Skinny tables be needed?
Behind the scenes, for each object, Salesforce maintains separate tables for standard fields and custom fields. Normally, when a query or report contains both types of fields, a join would be needed between these two behind-the-scenes tables. A Skinny Table, which could contain standard and custom fields for an object, would eliminate the need for those joins.
For what objects are Skinny tables available?
Account, Contact, Opportunity, Lead, Case, and custom objects
True or false: Picklist fields are available on Skinny tables
True
True or false: Lookup fields are available on Skinny Tables
False
True or false: Formula fields are available on Skinny Tables
False
True or False: Text Area(Long) fields are available on Skinny Tables
True
How do you create a Skinny Table for an object?
Contact Salesforce Support
How many columns can a Skinny Table contain?
100
True or False: Skinny tables cannot contain fields from other objects
True
Describe considerations with respect to Skinny Tables and Sandboxes
Skinny tables are copied to Full Copy sandboxes, but not to other sandbox types. If they are needed in other sandboxes, contact Salesforce Support.
For what fields does Salesforce automatically maintain indexes?
1.) RecordTypeId
2.) SystemModStamp/LastModifiedDate
3.) CreatedDate
4.) Id
5.) Division
6.) Name
7.) Foreign Keys (Lookups and Master-Detail fields)
8.) Email (Leads and Contacts)
Which data types cannot be indexed?
1.) Text Area (Long)
2.) Text Area (Rich)
3.) Multi-Select Picklist
4.) Non-Deterministic Formulas
5.) Encrypted Text
What custom field type is automatically indexed when created?
External Id
What data types can be External Ids?
1.) Text
2.) Email
3.) Auto-Number
4.) Number
Describe indexes and tables
The Salesforce architecture makes the underlying data tables for custom fields unsuitable for indexing. Therefore, Salesforce creates an Index Table that contains a copy of the data, along with information about the data types.
By default, index tables do not include null rows (records in which the indexed field is empty); however, you can work with Salesforce to include these if needed.
The Force.com Query Optimizer will use an index on a standard field if the filter:
Matches fewer than 30% of the first million records and fewer than 15% of the records beyond the first million, up to a combined maximum of 1 million records.
The Force.com Query Optimizer will use an index on a custom field if the filter:
Matches less than 10% of the total number of records for the object, up to a maximum of 333,333 records.
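As a worked example of both thresholds for an object with 2 million records (a sketch; the percentages and caps come from the two cards above):

```python
def standard_index_threshold(total: int) -> int:
    """Max rows a standard-field filter may match and still use its index."""
    first = min(total, 1_000_000)
    rest = max(total - 1_000_000, 0)
    return min(int(first * 0.30 + rest * 0.15), 1_000_000)

def custom_index_threshold(total: int) -> int:
    """Max rows a custom-field filter may match and still use its index."""
    return min(int(total * 0.10), 333_333)

print(standard_index_threshold(2_000_000))  # 450000 (300,000 + 150,000)
print(custom_index_threshold(2_000_000))    # 200000
```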
What should you always do to prepare for a data load?
Test in a Sandbox environment first
This is enabled for the Bulk API by default
Parallel Mode
Describe Parallel Mode within the Bulk API
It is enabled by default. It allows for faster loading of data by processing batches in parallel
What are the trade-offs with respect to Parallel Mode?
There is a risk of lock contention. Serial mode is an alternative to parallel mode that avoids lock contention.
When should you use Parallel Mode vs Serial Mode?
Whenever possible, as it is best practice
When should you use Serial Mode vs. Parallel Mode?
When there is a risk of lock contention and you cannot reorganize the batches to avoid these locks
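Serial mode is requested at job creation and cannot be changed afterward. A minimal sketch for Bulk API 1.0, with placeholder instance, version, and session Id:

```python
import requests

resp = requests.post(
    "https://yourInstance.salesforce.com/services/async/47.0/job",
    headers={"X-SFDC-Session": "<sessionId>", "Content-Type": "application/json"},
    json={
        "operation": "insert",
        "object": "AccountTeamMember",
        "contentType": "CSV",
        "concurrencyMode": "Serial",  # default is Parallel
    },
)
print(resp.json()["id"])  # the new job Id
```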
How can you organize data load batches to avoid risks of lock contention?
By organizing the data by parent Id.
Suppose that you are inserting AccountTeamMember records and you have references to the same Account Id within multiple batches. You risk lock timeouts as these multiple batches process (for example, in parallel) and attempt to lock the Account record at once. To avoid these lock contentions, organize your data by Account Id such that all AccountTeamMember records referencing the same Account Id are in the same batch.
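A minimal sketch of that batching strategy, assuming each row is a dict with an AccountId key; the batch size is illustrative:

```python
from collections import defaultdict

def batches_by_parent(rows, batch_size=5000):
    """Pack rows into batches so all records sharing an AccountId stay
    together, preventing two parallel batches from locking the same Account.
    (A single parent with more rows than batch_size still yields one
    oversized batch; split such groups manually if needed.)"""
    groups = defaultdict(list)
    for row in rows:
        groups[row["AccountId"]].append(row)

    batch = []
    for group in groups.values():
        if batch and len(batch) + len(group) > batch_size:
            yield batch
            batch = []
        batch.extend(group)
    if batch:
        yield batch
```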
What does the Bulk API do when it encounters locks?
1.) Waits a few seconds for the lock to be released.
2.) If the lock is not released, the record is marked as failed.
3.) If there are problems acquiring locks for more than 100 records in the batch, the remainder of the batch is put back in the queue to be tried again later.
4.) When a batch is reprocessed, records already marked as failed are not retried; resubmit these in a separate batch to have them processed.
5.) A batch is retried up to 10 times before it is marked as failed.
6.) Because some records may have succeeded, check the results of the data load to confirm success/error details.
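Because a batch can finish with a mix of successes and failures (point 6), read the per-record results once the batch completes. A sketch against the Bulk API 1.0 results endpoint for a CSV insert job; the job and batch Ids are placeholders:

```python
import csv
import io
import requests

INSTANCE = "https://yourInstance.salesforce.com"

resp = requests.get(
    f"{INSTANCE}/services/async/47.0/job/<jobId>/batch/<batchId>/result",
    headers={"X-SFDC-Session": "<sessionId>"},
)

# Result rows for CSV insert jobs carry Id, Success, Created, and Error columns.
for row in csv.DictReader(io.StringIO(resp.text)):
    if row["Success"].lower() != "true":
        print("failed:", row["Id"], row["Error"])
```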
What operations are likely to cause lock contention and, as a result, require data loads to be run in Serial Mode via the Bulk API?
1.) Creating New Users
2.) Updating User Roles
3.) Updating Territories
4.) Changing ownership for records with a Private sharing model
With respect to data loads, any batch job that takes longer than this amount of time is suspended and returned to the queue for later processing
10 minutes
With respect to data loads, how can you optimize batch sizes?
All batches should run in under 10 minutes. Start with 5,000 records per batch and adjust based on processing time: if a batch takes more than 5 minutes, reduce the batch size; if it completes in only a few seconds, increase it. If you get a timeout error, split your batches into smaller batches.
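A sketch of that adjustment loop; `submit_batch` is a hypothetical client call assumed to block until the batch finishes:

```python
import time

def load_with_tuned_batches(records, submit_batch, start_size=5000):
    """Submit records in batches, shrinking or growing the batch size so
    each batch stays well under the 10-minute suspension limit."""
    size, i = start_size, 0
    while i < len(records):
        batch = records[i : i + size]
        started = time.monotonic()
        submit_batch(batch)  # hypothetical: blocks until the batch completes
        elapsed = time.monotonic() - started
        i += len(batch)
        if elapsed > 300:        # more than 5 minutes: reduce the batch size
            size = max(size // 2, 1)
        elif elapsed < 10:       # only a few seconds: increase it
            size *= 2
    return size
```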
When loading data via batches, if more than N unprocessed requests/batches from a single organization are in the queue, additional batches from the organization will be delayed while batches from other organizations are processed
2,000