Tick/Attributes/Joins/FStatements Interview Questions Flashcards
Name each attribute in KDB.
Sorted (#s), Unique (#u), Parted (#p) & Grouped (#g).
Where and why would you use the #s attribute.
To lists/columns that are sorted in ascending order. Allows for binary search.
Where and why would you use the #u attribute.
To a group of distinct items such as a key-column/domain. Allows to speed up searches (distinct etc.), allows q to exit some comparisons early.
Where and why would you use the #p attribute.
On lists where the items are contiguous. Creates an internal map via the position of the first output of each occurrence. Once the first instance is found, data retrieval is quick .
Where and why would you use the #g attribute.
On lists where there is no apparent structure. Maps each output to a list of positions. Speeds up select where queries. (Large overhead of memory)
Rank the attributes in terms of memory overhead.
- Grouped (large memory usage)
- Unique
- Parted
- Sorted (no memory)
Name the 5 types of joins.
Simple Join (,), Inner Join (ij), Left Join (lj), Union Join (uj) & Asof Join (aj).
Explain ij.
Joins columns of t1 which have an entry in the key column of t2. t2 must be keyed. It’s key columns must be columns of t1.
Explain lj.
Joins columns of t1 along the matching columns of t2. Returns a record of every entry of t1 regardless of whether it appears in t2 or not.
Explain uj.
Joins two tables vertically. No need for keys or columns of the same name sharing data types. Often used to join trade and quote and sort ascending by time.
Explain aj.
Joining tables with reference to time. Used for getting the prevailing quote at the time of a trade. aj[‘sym; ‘time; trade; quote]. First argument is column to search on (sym), second argument is column to join on (time).
What attributes would you use with asof join?
Apply the grouped attribute to the sym column on the quote table to aid finding the prevailing quote. If the table is on disk, apply the parted attribute.
What is the syntax of functional select/exec?
?[t;c;b;a]
t = a table
c = a list of where specifications
b = a dictionary of grouping constraints
a = a dictionary of aggregates
What is the 5th argument in a functional select?
The window argument. Used for returning rows. If the fifth argument was 5 it would return the first five rows.
What is the 6th argument in a functional select?
The indices specification argument. Used for specifying the indices of the table table that you want.
What is the syntax for functional exec?
?[table;();();()].
What is the syntax for functional update?
![t;c;b; updatedColumns]
What is the syntax for functional delete?
![t;c;0b;a]
a = list of symbols referring to columns to be removed.
t + c same as select.
a or c to be present not both.
Why are functional statements used instead of qSql queries?
Allows users to dynamically select columns and build where clauses.
Avoiding overly complicated and long qSql statements.
What does fby do?
function-by. Applies aggregate functions to each member of the group selected. Saves you having to create an intermediary table and doing a left join.
Give an example of fby?
select from t where they price>(avg;price) fby (sym;size).
This is selecting each sym and sizes in the table whose price exceeds the average price of the whole table.
What is a compound column?
A column which contains lists.
How will a compound column appear on disk?
2 columns associated. A name file and a name# file. name# stores flattened values of the column. The name file will hold the count of each list in the column.
How would you query a string column in a HDB?
Using “like”, select from t where ID like “abc” or using adverbs (~:) select from t where ID ~: “abc”.
Why is a compound column split into two separate files?
To speed up queries. Scanning an index file is faster than scanning one large general list. The trade off is memory.
How would you set the q timer.
-t in the command line or \t in a q session. Timer logic is defined in .z.ts.
What arguments does .z.ts take?
The current time stamp.
What’s the difference between scan and over?
Over will operate on a list so that the 2nd argument is applied iteratively to the 1st and it only returns a final result. Scan operates the same but returns intermediary results.
Give an example of scan and over in use.
+/ (over) +\ (scan). (+) over 1 2 3 (+) scan 1 2 3
What are the pros and cons of using strings to store data?
Pros: They don’t need to be enumerated so they will not bloat the sym file.
They can be easily changed if needed.
Cons: Queries are slower than symbols. Cannot use = or ~ without adverbs. Require compound columns to speed up queries which take up memory.
What is a namespace?
Containers within a kdb workspace used to conveniently divide an application between modules, logically divide code into callable libraries and avoid name clashes.
How can you create a namespace?
Directly assigning a variable containing the dot notation or by moving into a namespace using \d.
What is meant by protected execution?
(Also the two forms of functional amend)
@ and . operators to provide meaningful error responses.
How would you calculate VWAP per sym for a table?
select vwap: size wavg price by sym from trade.