KX Interview Questions 03/11/22 Flashcards
What is the difference between SQL and kdb?
kdb+ is typically faster than traditional row-oriented relational databases at time-series queries because of its column-oriented architecture. qSQL is tailored to time-series work: it offers time-based joins such as aj and wj, and its queries are usually shorter and simpler than the equivalent standard SQL. Functional select statements also allow queries to be built dynamically. kdb+ tables are column oriented, whereas most relational databases are row oriented.
What makes KDB so fast?
Vector programming (vectors can carry attributes, and the operators are optimised to work on whole vectors), columnar storage, parallel computation and a small memory footprint.
What is vector programming?
A vector is an array of items of the same type. Q is built around optimizing vector usage. There are many built-in attributes and operations in Q that make working with vectors more efficient. Q stores table columns as vectors, so these efficiencies also apply to tables.
What is columnar storage?
Columnar storage means table data in Q is broken into columns, each of which is stored and read by the system as a single vector. Many other database systems save data in a row-oriented manner. Column-oriented storage allows faster queries when only some columns are needed, and it also allows more efficient compression, as compression algorithms work best on arrays of the same type.
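The difference between the two layouts can be sketched in Python. This is only a conceptual illustration, not how kdb+ is implemented; the table contents and the helper name to_columns are made up for the example.

```python
# Row-oriented layout: one record per trade, all fields stored together.
rows = [
    {"sym": "AAPL", "price": 150.0, "size": 100},
    {"sym": "MSFT", "price": 300.0, "size": 200},
    {"sym": "AAPL", "price": 151.0, "size": 300},
]


def to_columns(records):
    """Turn a list of row records into a dict of same-length column lists."""
    names = list(records[0].keys())
    return {name: [r[name] for r in records] for name in names}


# Column-oriented layout: one list (vector) per column.
columns = to_columns(rows)

# Aggregating one column touches every record in the row layout...
total_from_rows = sum(r["size"] for r in rows)
# ...but only a single contiguous list in the column layout.
total_from_columns = sum(columns["size"])
```

In the column layout the size values sit together in one list, so an aggregate over size never has to read sym or price.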
What are parallel computations?
Parallel computations are carried out simultaneously. A large problem is broken down into smaller ones that are solved at the same time, which speeds up the overall calculation. In q, peach runs a function over the items of a list in parallel across multiple cores when the process has been started with secondary threads (the -s command-line option).
What is a small memory footprint?
KDB has a small memory footprint as it does not take up a lot of RAM while running.
What is the vanilla tickerplant architecture? (tick.q)
Data Feed (Source) -> Tickerplant (Logs/Publishes) -> RDB (Current day's data) -> HDB (Historical data) -> Gateway (Entry point for queries)
What is meant by database maintenance (dbmaint.q)?
dbmaint.q contains utility functions for maintaining partitioned tables. It can be used to make table schema changes such as adding columns, renaming columns and changing column types.
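How is data handled in the feed handler?
The feed handler does not store data. It is usually written in a compiled language and converts the raw source feed into a kdb+-compatible format, which it then publishes to the tickerplant.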
How is data stored in the tickerplant?
The tickerplant holds q table schemas whose first two columns are time and sym, with the grouped attribute applied to the sym column. It writes incoming updates to a log file and publishes them to subscribers such as the RDB.
How is data stored in the real time database?
Stores the current day's data as in-memory Q tables whose first two columns are time and sym, with the grouped attribute applied to the sym column.
How is data stored in the HDB?
The Q tables are saved to disk as partitioned tables (by date in the vanilla setup) with the parted attribute applied to their sym column.
What is a splayed table?
Used when a table is too large to fit fully into memory. It is saved with dir set .Q.en[directory of the sym file; table], where dir is the table's directory path ending in a slash. Each column is saved in a separate file named after the column, and each file holds a vector of the same type as the corresponding column of the Q table.
What is a partitioned table?
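A partitioned database is structured as a series of partitions (for example one directory per date), each of which contains its own splayed tables. A partitioned table is the combination of the splayed slices of that table across all partitions.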
What is a segmented table?
A segmented table is used when a partitioned table becomes too large to exist on one storage device, so its partitions are spread across different storage locations. Data can then be read from the different locations in parallel. The sym file and par.txt, which lists the segment directories, exist in the root directory.
What are the attributes?
Grouped, Unique, Parted, Sorted
When to use the grouped attribute (`g#)?
Use on lists with no apparent structure. It creates a dictionary that maps each distinct value to the list of positions where it occurs, which speeds up select queries with where clauses on that column. It carries a large memory overhead.
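The value-to-positions dictionary described above can be sketched in Python. This is a conceptual illustration only, not the kdb+ implementation; build_group_index and the sample data are hypothetical.

```python
from collections import defaultdict


def build_group_index(values):
    """Map each distinct value to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, value in enumerate(values):
        index[value].append(pos)
    return dict(index)


syms = ["a", "b", "a", "c", "b", "a"]
prices = [10.0, 20.0, 11.0, 30.0, 21.0, 12.0]

index = build_group_index(syms)

# Equivalent of "where sym = a": look the positions up directly
# instead of scanning the whole sym column.
a_positions = index.get("a", [])
a_prices = [prices[i] for i in a_positions]
```

The index stores one position per row, which is why the memory overhead grows with the length of the list.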
When to use the unique attribute (`u#)?
Use on lists of distinct items with no repeats. It creates an internal hash map, which uses a large amount of memory, and speeds up lookups and operations such as distinct.
When to use the parted attribute (`p#)?
Use when identical items are stored contiguously. It is typically applied to on-disk sym columns and has a small memory overhead. It creates an internal map from each item to the position of its first occurrence. Once that position is found in the map, retrieval can jump straight to the item's block and skip the other rows, which makes it fast.
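A rough Python sketch of the first-occurrence map follows. It illustrates the idea only and does not reflect kdb+ internals; build_parted_index and rows_for are hypothetical helpers.

```python
def build_parted_index(values):
    """Map each value to the position of its first occurrence.

    Raises ValueError if equal items are not stored contiguously.
    """
    starts = {}
    previous = object()  # sentinel that never equals a real value
    for pos, value in enumerate(values):
        if value != previous:
            if value in starts:
                raise ValueError("items are not contiguous")
            starts[value] = pos
            previous = value
    return starts


def rows_for(values, starts, key):
    """Return the contiguous range of positions holding key."""
    if key not in starts:
        return range(0)
    start = starts[key]
    later = [s for s in starts.values() if s > start]
    end = min(later) if later else len(values)
    return range(start, end)


syms = ["a", "a", "b", "b", "b", "c"]
starts = build_parted_index(syms)
b_rows = list(rows_for(syms, starts, "b"))
```

The map holds one entry per distinct value rather than one per row, which is why the overhead stays small.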
When to use the sorted attribute (`s#)?
Use on data sorted in ascending order. It has no memory overhead and allows binary search to be used instead of a linear scan.
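Binary search on ascending data can be sketched with Python's standard bisect module. This only illustrates the technique, not the q implementation; find_range and the sample times are made up.

```python
from bisect import bisect_left, bisect_right


def find_range(sorted_values, key):
    """Return the range of positions equal to key in an ascending list."""
    lo = bisect_left(sorted_values, key)   # first position with value >= key
    hi = bisect_right(sorted_values, key)  # first position with value > key
    return range(lo, hi)


times = [1, 3, 3, 3, 7, 9]
threes = list(find_range(times, 3))
missing = list(find_range(times, 4))
```

Each lookup takes a number of steps proportional to the logarithm of the list length, and no extra index has to be stored.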
What is enumeration?
Enumeration replaces repeated, memory-expensive values such as symbols with integer indices into a list of the unique values. On disk that list is the sym file, which acts as the map for converting the indices back to the original symbols.
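The idea can be sketched in Python, with a plain list standing in for the sym file. This is a conceptual illustration under that assumption; enumerate_symbols and denumerate are hypothetical names, not q functions.

```python
def enumerate_symbols(values, sym):
    """Replace each value with its index in sym, appending unseen values."""
    lookup = {s: i for i, s in enumerate(sym)}
    codes = []
    for value in values:
        if value not in lookup:
            lookup[value] = len(sym)
            sym.append(value)
        codes.append(lookup[value])
    return codes


def denumerate(codes, sym):
    """Map integer codes back to the original values."""
    return [sym[c] for c in codes]


sym = []  # stands in for the sym file
column = ["ibm", "msft", "ibm", "ibm", "aapl"]
codes = enumerate_symbols(column, sym)
restored = denumerate(codes, sym)
```

Only the unique values are stored once in sym; the column itself becomes a list of small integers.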
What is multithreaded programming, each vs peach?
Multithreaded programming splits work across several threads so that it can run at the same time. each applies a function to every item of a list one after another on the main thread. peach applies it to the items in parallel when q has been started with secondary threads (the -s option); without them it behaves like each.
What is the purpose of the “.Q” namespace?
The .Q namespace contains utility objects for q programming.
What is the purpose of the “.z” namespace?
The .z namespace contains environment variables and functions, and hooks for callbacks. It is used with IPC (for example the .z.pg and .z.ps message handlers) and for finding system information.
Name two other namespaces?
.j (JSON serialisation and parsing) and .h (HTTP and markup utilities)
What is get used for?
Reads or memory-maps a variable or kdb+ data file and returns its value. Often used to map the columns of a splayed table into memory.
What is set used for?
Assigns a value to a global variable or saves an object as a file or directory. To splay a table: dir set t, or equivalently set[dir;t], where dir is a directory path ending in a slash.
What is load used for?
Loads binary data from a file or directory into a global variable named after the file, and returns that name rather than the contents. rload is used to load a splayed table from a directory.