Schema Design Flashcards
What are the restrictions on column family names?
Column family names must consist of printable characters.
Is it better to use longer or shorter column family and column names, and why?
Shorter. Every cell (KeyValue) stored in an HFile repeats the column family name and the column qualifier alongside the row key, so long names waste space in every cell.
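To see the overhead in numbers, here is a simplified Python model of one cell's size. It assumes the classic KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte key type, value) and ignores tags, compression and block encoding. The family names, qualifiers and row key are hypothetical.

```python
def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    # Key = row length (2) + row + family length (1) + family
    #       + qualifier + timestamp (8) + key type (1)
    key_len = 2 + len(row) + 1 + len(family) + len(qualifier) + 8 + 1
    # Cell = key length (4) + value length (4) + key + value
    return 4 + 4 + key_len + len(value)


def total_size(num_cells: int, family: str, qualifier: str) -> int:
    row = b"user#000001"  # hypothetical row key
    value = b"42"         # hypothetical small value
    return num_cells * keyvalue_size(row, family.encode(), qualifier.encode(), value)


long_names = total_size(1_000_000, "customer_profile_data", "last_login_timestamp")
short_names = total_size(1_000_000, "d", "ll")
# Bytes spent purely on the longer names across one million cells.
print(long_names - short_names)  # 38000000
```

The names are repeated in every cell, so the waste grows linearly with the number of cells, not with the number of columns in the schema.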
What is the recommended maximum number of column families?
No more than 3 column families per table.
When designing column families, what is recommended?
Keep data that is accessed together in the same column family.
Flushing and compaction occur per what?
Region
What triggers a minor compaction?
The number of store files (HFiles) for a column family reaching a configured threshold.
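A simplified Python model of that trigger: each memstore flush adds one store file, and once the count reaches the threshold a minor compaction merges them. The threshold of 3 is assumed to mirror the default of hbase.hstore.compactionThreshold; real HBase file selection (size ratios, maximum files per compaction) is left out, and the helper names are hypothetical.

```python
COMPACTION_THRESHOLD = 3  # assumed default of hbase.hstore.compactionThreshold


def needs_minor_compaction(store_file_count: int, threshold: int = COMPACTION_THRESHOLD) -> bool:
    # A store becomes eligible once it has at least `threshold` files.
    return store_file_count >= threshold


def simulate_flushes(num_flushes: int, threshold: int = COMPACTION_THRESHOLD) -> tuple[int, int]:
    """Return (files left in the store, minor compactions run)."""
    files = 0
    compactions = 0
    for _ in range(num_flushes):
        files += 1  # each flush writes one new store file
        if needs_minor_compaction(files, threshold):
            files = 1  # simplified: all files are merged into one
            compactions += 1
    return files, compactions


print(simulate_flushes(7))  # (1, 3)
```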
If one column family holds most of the data and triggers a flush, will the other column families in the same region also be flushed from the MemStore?
Yes
The more column families, the greater the ___ load?
I/O load
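The link between these cards can be sketched in Python: under the flush-everything behavior described above, one region flush writes a separate file for every column family that holds data, so each extra family means extra files, extra I/O and, eventually, extra compactions. The 128 MB region flush size is an assumed default for hbase.hregion.memstore.flush.size, and the family names are hypothetical.

```python
FLUSH_SIZE = 128 * 1024 * 1024  # assumed region-wide memstore flush trigger


def flush_region(memstore_bytes: dict[str, int], flush_size: int = FLUSH_SIZE) -> list[tuple[str, int]]:
    """Return the (family, bytes) store files written by one region flush."""
    if sum(memstore_bytes.values()) < flush_size:
        return []  # region memstore still below the trigger
    # The whole region flushes: every family with data gets a new file,
    # even if it holds only a few hundred kilobytes.
    return [(family, size) for family, size in memstore_bytes.items() if size > 0]


memstores = {
    "big": 127 * 1024 * 1024,  # carries the bulk of the data
    "small1": 512 * 1024,
    "small2": 600 * 1024,
}
files = flush_region(memstores)
print(len(files))  # 3
```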
What are the most common attributes on a column family?
COMPRESSION, VERSIONS, TTL, MIN_VERSIONS, BLOCKSIZE, IN_MEMORY, BLOCKCACHE, BLOOMFILTER
What are the valid values for compression? What is the default?
NONE, GZ, LZO, SNAPPY.
The default is NONE.
What are the valid values for VERSIONS? What is the default?
1+. The default is 1 (it was 3 before HBase 0.96).
What are the valid values for MIN_VERSIONS? What is the default?
0+. The default is 0.
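How VERSIONS, MIN_VERSIONS and TTL interact can be modeled in a few lines of Python. This is a simplified sketch, not HBase's implementation: it ignores deletes and the fact that HBase drops old versions lazily at read and compaction time. TTL is given in seconds in HBase; milliseconds are used here only to match cell timestamps, and the function name is hypothetical.

```python
def retained_versions(timestamps, versions, min_versions=0, ttl_ms=None, now_ms=0):
    """Return the cell timestamps (newest first) that survive under a
    simplified model of VERSIONS, MIN_VERSIONS and TTL."""
    # VERSIONS caps how many of the newest versions are kept.
    newest_first = sorted(timestamps, reverse=True)[:versions]
    if ttl_ms is None:
        return newest_first
    kept = []
    for i, ts in enumerate(newest_first):
        expired = now_ms - ts > ttl_ms
        # MIN_VERSIONS keeps the newest versions even after they expire.
        if not expired or i < min_versions:
            kept.append(ts)
    return kept


history = [100, 200, 300, 400, 500]
print(retained_versions(history, versions=3))  # [500, 400, 300]
print(retained_versions(history, versions=3, min_versions=1, ttl_ms=150, now_ms=1000))  # [500]
```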
What are the valid values for BLOCKSIZE? What is the default?
1 byte to 2 GB.
The default is 64 KB (65,536 bytes).
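BLOCKSIZE controls how many blocks (and therefore block index entries) an HFile has. The Python arithmetic below is an approximation, since HFile blocks are closed once they reach roughly the configured size; the trade-off it illustrates is that smaller blocks mean a larger index but less data read per random lookup, while larger blocks suit sequential scans.

```python
import math


def approx_block_count(data_bytes: int, block_size: int = 64 * 1024) -> int:
    # Roughly one block, and one block index entry, per block_size bytes.
    return math.ceil(data_bytes / block_size)


one_gib = 1024 ** 3
print(approx_block_count(one_gib))            # 16384 with the 64 KB default
print(approx_block_count(one_gib, 8 * 1024))  # 131072 with 8 KB blocks
```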
What are the valid values for IN_MEMORY? What is the default?
true, false
The default is false.
What are the valid values for BLOOMFILTER? What is the default?
NONE, ROW, ROWCOL.
The default is ROW (it was NONE before HBase 0.96).
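As a recap of the value ranges on these cards, here is a hypothetical Python checker for a column family attribute dict. It only encodes the values listed on the cards (newer HBase releases support additional codecs such as LZ4), treats a missing key as valid, and assumes the "2 GB" BLOCKSIZE bound is Java's maximum int.

```python
VALID_COMPRESSION = {"NONE", "GZ", "LZO", "SNAPPY"}
VALID_BLOOMFILTER = {"NONE", "ROW", "ROWCOL"}
MAX_BLOCKSIZE = 2**31 - 1  # assumed: "2 GB" taken as Java's int max


def validate_family_attrs(attrs: dict) -> list[str]:
    """Return the names of attributes whose values are out of range."""
    errors = []
    # The .get() fallbacks are placeholders so that missing keys pass.
    if attrs.get("COMPRESSION", "NONE") not in VALID_COMPRESSION:
        errors.append("COMPRESSION")
    if attrs.get("VERSIONS", 1) < 1:
        errors.append("VERSIONS")
    if attrs.get("MIN_VERSIONS", 0) < 0:
        errors.append("MIN_VERSIONS")
    if not 1 <= attrs.get("BLOCKSIZE", 64 * 1024) <= MAX_BLOCKSIZE:
        errors.append("BLOCKSIZE")
    if not isinstance(attrs.get("IN_MEMORY", False), bool):
        errors.append("IN_MEMORY")
    if attrs.get("BLOOMFILTER", "NONE") not in VALID_BLOOMFILTER:
        errors.append("BLOOMFILTER")
    return errors


print(validate_family_attrs({"COMPRESSION": "SNAPPY", "VERSIONS": 3}))  # []
print(validate_family_attrs({"COMPRESSION": "LSO", "VERSIONS": 0}))   # ['COMPRESSION', 'VERSIONS']
```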
Is compression recommended?
Yes, except for column families that hold already-compressed data such as JPEG or PNG images.
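The exception can be demonstrated with Python's zlib, which implements DEFLATE, the algorithm behind the GZ codec. Repetitive text shrinks dramatically, while random bytes do not shrink at all. The random bytes are an assumed stand-in for already-compressed data such as JPEG or PNG, which is similarly close to incompressible; the sample record is hypothetical.

```python
import os
import zlib

# Highly repetitive text stands in for typical uncompressed cell data.
text = b"rowkey,colfam:qual,2024-01-01,status=ACTIVE\n" * 2000
# Random bytes stand in for already-compressed data (JPEG, PNG, ...).
random_like = os.urandom(len(text))


def ratio(data: bytes) -> float:
    # Compressed size divided by original size; below 1.0 means savings.
    return len(zlib.compress(data)) / len(data)


print(f"text:   {ratio(text):.3f}")
print(f"random: {ratio(random_like):.3f}")
```

For already-compressed values the codec spends CPU on every write and read without saving space, which is why compression is not recommended for them.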
What is the syntax for enabling compression on a column family?
alter 'table', {NAME => 'colfam', COMPRESSION => 'codec'}
What does the VERSIONS attribute specify?
The maximum number of versions of a cell to retain.