Schema Design Flashcards
What are the restrictions on column family names?
Must use printable characters
Is it better to use longer or shorter column family and column names, and why?
Shorter. Every cell stored in an HFile carries the column family name and the column name along with the row key, so long names waste space.
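To make the space cost concrete, here is a rough sketch of per-cell size. It assumes the classic uncompressed KeyValue layout (length fields, row key, family, qualifier, timestamp, type, value); the row key, names, and value are hypothetical, and compression or data block encoding would shrink the on-disk numbers.

```python
# Fixed bytes per cell, assuming the classic KeyValue layout:
# 4-byte key length, 4-byte value length, 2-byte row length,
# 1-byte family length, 8-byte timestamp, 1-byte type.
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # 20 bytes


def cell_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate uncompressed size of one cell, in bytes."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


# Hypothetical row key and value, stored under long and short names.
row, value = b"user123", b"42"
long_size = cell_size(row, b"customer_profile", b"number_of_logins", value)
short_size = cell_size(row, b"p", b"nl", value)
saved_per_cell = long_size - short_size
```

Under these assumptions the long names cost 61 bytes per cell against 32 for the short ones, and that difference repeats for every cell in the table.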
What is the recommended maximum number of column families?
No more than 3 column families per table.
When designing column families, what is recommended?
Keep data that is accessed together in the same column family
Flushing and Compaction occur per what?
Region
What triggers a minor compaction?
The number of store files in a column family reaching a configured threshold
If one column family is large and has lots of files, will the other column families for that table also be flushed from the MemStore?
Yes
The more column families, the greater the ___ load?
I/O load
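The cards above on flushing and compaction can be tied together with a toy model. It assumes the region-wide flush described here, that every family holds some data at flush time, and a minor-compaction threshold of 3 store files; the family names and the threshold value are illustrative assumptions, not values read from a cluster.

```python
# Assumed store file count at which a family becomes a minor-compaction candidate.
COMPACTION_THRESHOLD = 3


def flush_region(store_files: dict) -> None:
    """Region-wide flush: every column family writes out one new store file."""
    for family in store_files:
        store_files[family] += 1


def minor_compaction_candidates(store_files: dict) -> list:
    """Families whose store file count has reached the threshold."""
    return sorted(f for f, n in store_files.items() if n >= COMPACTION_THRESHOLD)


# Hypothetical table with one busy family and one quiet family.
store_files = {"big": 0, "small": 0}
for _ in range(3):
    flush_region(store_files)  # each flush is triggered by "big" filling up

candidates = minor_compaction_candidates(store_files)
```

In this model the quiet family ends up with as many (small) files as the busy one, so both become compaction candidates, which is why each extra column family adds I/O load.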
What are the most common attributes on a column family?
COMPRESSION, VERSIONS, TTL, MIN_VERSIONS, BLOCKSIZE, IN_MEMORY, BLOCKCACHE, BLOOMFILTER
What are the valid values for compression? What is the default?
NONE, GZ, LZO, SNAPPY.
The default is NONE.
What are the valid values for VERSIONS? What is the default?
1+. The default is 3 (changed to 1 in HBase 0.96 and later).
What are the valid values for MIN_VERSIONS? What is the default?
0+. The default is 0.
What are the valid values for BLOCKSIZE? What is the default?
1 byte - 2GB
The default is 64 KB
What are the valid values for IN_MEMORY? What is the default?
true, false
The default is false
What are the valid values for BLOOMFILTER? What is the default?
NONE, ROW, ROWCOL
The default is NONE.
Is compression recommended?
Yes, for columns that do not contain already compressed data such as JPEG or PNG
What is the syntax for enabling compression on a column family?
alter 'table', {NAME => 'colfam', COMPRESSION => 'codec'}
What does the VERSIONS attribute specify?
The maximum number of versions of a cell to retain
What does TTL specify?
The Time to Live for a cell. Cells are automatically deleted after the specified number of seconds
What does MIN_VERSIONS specify?
The minimum number of versions of a cell to retain.
When specifying MIN_VERSIONS what else must be specified?
TTL
Are there any restrictions on the value of MIN_VERSIONS?
It must be smaller than the value of VERSIONS
What scenario does using all three of the VERSIONS, TTL and MIN_VERSIONS settings allow?
Keep the last T seconds' worth of data, at most N versions, but retain at least M versions
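The combined rule is easier to check with a small simulation. This is a sketch of the retention logic as the card states it, not HBase's actual implementation; the function name and the timestamps are hypothetical, with `versions` as N, `ttl` as T seconds, and `min_versions` as M.

```python
def retained(timestamps, now, versions, ttl, min_versions):
    """Return the cell versions that survive, newest first."""
    kept = []
    for rank, ts in enumerate(sorted(timestamps, reverse=True)):
        if rank >= versions:
            break  # never keep more than N versions
        expired = now - ts > ttl
        # Expired versions survive only while fewer than M are kept.
        if not expired or rank < min_versions:
            kept.append(ts)
    return kept


now = 1000
# Two versions are still inside the 60-second TTL window.
recent = retained([990, 950, 900, 500, 100], now, versions=3, ttl=60, min_versions=1)
# Every version has expired, but MIN_VERSIONS keeps the newest one.
stale = retained([500, 100], now, versions=3, ttl=60, min_versions=1)
# With MIN_VERSIONS at 0, expired data disappears completely.
gone = retained([500, 100], now, versions=3, ttl=60, min_versions=0)
```

Here `recent` is `[990, 950]`, `stale` is `[500]`, and `gone` is empty, which shows MIN_VERSIONS acting as a floor that TTL cannot remove.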
What does the BLOCKSIZE specify?
The HFile block size, which is the minimum amount of data read during any read request