Columnstores and Sparse Columns Flashcards
What are columnstore indexes used for?
Storing and querying large data warehousing fact tables (Large amounts of data that don’t change)
What is a column store?
It’s data that is logically organized in columns and rows but physically stored in a column-wise data format
What is a rowgroup?
A group of rows that are compressed into columnstore format at the same time.
What is the maximum amount of rows in a rowgroup?
1,048,576 rows
What happens to a rowgroup that has all data deleted?
It transitions from COMPRESSION to TOMBSTONE state and is later removed by a background process named the tuple-mover.
What happens with a columnstore that has many small rowgroups?
The columnstore index quality suffers.
What is a column segment?
A column of data from within the rowgroup.
What does the rowgroup contain?
One column segment for every column on the table.
What is the clustered columnstore index?
It is the physical storage for the entire table.
What is a Delta Rowgroup?
It is a clustered B-Tree index that stores rows until the number of rows reaches the threshold of a columnstore (1048576 rows) and then it is moved into the columnstore.
What are the benefits to using a columnstore?
- It provides a high level of data compression, typically by 10x, to reduce data warehouse storage costs.
- Performance is better for queries
- Smaller in memory footprint
- Batch execution by processing multiple rows together
- Queries usually only select a few columns which reduces I/O
Can you have a columnstore index on a rowstore table?
Yes, if you have 2016 or higher. You have to use a nonclustered columnstore and it takes more space.
When do you use a nonclustered columnstore index?
To perform analysis in real time on an OLTP workload.
How many columns must be included for a nonclustered columnstore index on a Memory Optimized OLTP table?
All of them.
When should you use sparse columns?
When the space saved is at least 20 percent to 40 percent