Large Data Volumes Flashcards
What is the criteria for Standard Indexes
Standard Indexed Fields
Used if the filter matches less than 30% of the total records, up to one million records.
For example, a standard index is used if:
• A query is executed against a table with two million records, and the filter matches 600,000 or fewer records.
• A query is executed against a table with five million records, and the filter matches one million or fewer records.
Skinny Tables
Salesforce creates skinny tables to contain frequently used fields and to avoid joins, and it keeps the skinny tables in sync with their source tables when the source tables are modified. To enable skinny tables, contact salesforce.com Customer Support.
Skinny tables are most useful with tables containing millions of records. They can be created on custom objects, and on Account, Contact, Opportunity, Lead, and Case objects. In addition, they can enhance performance for reports, list views, and SOSL.
Skinny tables can contain a maximum of 100 columns.
• Skinny tables cannot contain fields from other objects.
• Skinny tables are not copied to sandbox organizations. To have production skinny tables activated in a sandbox organization, contact salesforce.com Customer Support.
Custom Indexes
Custom Indexed Fields
Used if the filter matches less than 10% of the total records, up to 333,333 records.
For example, a custom index is used if:
• A query is executed against a table with 500,000 records, and the filter matches 50,000 or fewer records.
• A query is executed against a table with five million records, and the filter matches 333,333 or fewer records.
What happens if one of the search criteria is not indexed?
If the criteria for an indexed field are not met, only that index is excluded from the query—other indexes might still be used if they are in the WHERE clause and meet the thresholds for records.
Do all fields in the OR clause need to be indexed for it to work?
All fields in the OR clause must be indexed in order for any index to be used
What are the OOB standard indexed fields?
- RecordTypeId
- Division
- CreatedDate
- Systemmodstamp (LastModifiedDate)
- Name
- Email (for contacts and leads)
- Foreign key relationships (lookups and master-detail)
- The unique Salesforce record ID, which is the primary key for each object
Salesforce also supports custom indexes on custom fields, with the exception of multi-select picklists, text area (long), text area(rich), non-deterministic formula fields, and encrypted text fields.
External IDs cause an index to be created on that field, which is then considered by the Force.com query optimizer. External IDs can be created on only the following fields. • Auto Number • Email • Number • Text
What are two column custom indexes
Two-column custom indexes are a specialized feature of the Salesforce platform. They are useful for list views and other situations in which you want to use one field to select the records to display and a second field to sort those records. For example, an Account list view that selects by State and sorts by City can use a two-column index with State in the first column and City in the second.
Two-column indexes are subject to the same restrictions as single-column indexes, with one exception. Two-column indexes can have nulls in the second column by default, whereas single-column indexes cannot, unless
salesforce.com Customer Support has explicitly enabled the option to include nulls.
What are the techniques to optimize performance for LDV?
- Mashups
External website or Ape Callout to get data - Defer Sharing Calculation
Perform sharing calc at night or weekend - Using SOQL and SOSL
- Deleting Data
Use bulk APIs hard delete option - Search
What are the advantages and disadvantages of Mashups?
Advantages of Using Mashups
• Data is never stale.
• No proprietary method needs to be developed to integrate the two systems.
Disadvantages of Using Mashups
• Accessing data takes more time.
• Functionality is reduced. For example, reporting and workflow do not work on the external data.
When would you use SOQL?
Use SOQL when:
• You know in which objects or fields the data resides.
• You want to:
◊ Retrieve data from a single object or from multiple objects that are related to one another
◊ Count the number of records that meet specified criteria
◊ Sort results as part of the query
◊ Retrieve data from number, date, or checkbox fields
When would you use SOSL?
Use SOSL when:
• You don’t know in which object or field the data resides, and you want to find it in the most efficient way possible.
• You want to:
◊ Retrieve multiple objects and fields efficiently, and the objects might or might not be related to one another
◊ Retrieve data for a particular division in an organization using the divisions feature, and you want to find it in the most efficient way possible
Consider the following when using SOQL or SOSL
• Both SOQL WHERE filters and SOSL search queries can specify text you should look for. When a given search can use either language, SOSL is generally faster than SOQL if the search expression uses leading wildcards or a CONTAINS term.
• In some cases, when multiple WHERE filters are being used in SOQL, indexes cannot be used even though the fields in the WHERE clause can be indexed. In this situation, decompose the single query into multiple queries, each of which should
have one WHERE filter, and then combine the results.
• Executing a query with a WHERE filter that has null values for picklists or foreign key fields doesn’t use the index, and should
be avoided.
• When designing custom query-search user interfaces, it’s important to:
◊ Keep the number of fields to be searched or queried to a minimum. Using a large number of fields leads to a large
number of permutations, which can be difficult to tune.
◊ Determine whether SOQL , SOSL, or a combination of the two is appropriate for the search.
Best Practices for Reporting?
Reduce the number of records to query—use a value in the data to segment the query. For example, query for only a single state instead of for all states. (Divisions)
• Minimize the number of:
◊ Objects queried in the report
◊ Relationships used to generate the report
• De-normalize data when practical—“over de-normalizing” the data results in more overhead. Use summarized data stored on the parent record for the report. This practice is more efficient than having the report summarize the child records.
Reduce the number of fields queried—only add fields to a
report, list view, or SOQL query that is required.
• Reduce the amount of data by archiving unused
records—move unused records to a custom object table to reduce the size of the report object.
• Use report filters that emphasize the use of standard or
custom indexed fields. Use index fields in report filters,
whenever possible.
Best Practices when Loading Data from API
Use the Salesforce Bulk API when you have more than a few hundred thousand records.
Use the fastest operation possible—insert() is fastest,
update() is next, and upsert() is next after that. If
possible, also break upsert() into two operations:
create() and update().
• Ensure that data is clean before loading when using the
Bulk API. Errors in batches trigger single-row processing
for that batch, and that processing heavily impacts
performance.
When updating, send only fields that have changed (delta-only loads).
Use Public Read/Write security during initial load to avoid
sharing calculation overhead
For custom integrations:
• Authenticate once per load, not on each record.
• Use GZIP compression and HTTP keep-alive to avoid
drops during lengthy save operations.
If possible for initial loads, populate roles before populating sharing rules. If possible, add people and data before creating and assigning groups and queues.
Disable Apex triggers, workflow rules, and validations during loads; investigate the use of batch Apex to process records after the load is complete.
When changing child records, group them by parent—group records by the field ParentId in the same batch to minimize locking conflicts.
Best Practices while querying data from the API
• Use the getUpdated() and getDeleted() SOAP API
to sync an external system with Salesforce at intervals
greater than 5 minutes. Use the outbound messaging
feature for more frequent syncing.
• When using a query that can return more than one million results, consider using the query capability of the Bulk API, which might be more suitable.