Tick/Attributes/Joins/FStatements Interview Questions Flashcards

Question 1

Q

Name each attribute in KDB.

Answer

A

Sorted (`s#), Unique (`u#), Parted (`p#) and Grouped (`g#).

Question 2

Q

Where and why would you use the `s# attribute?

Answer

A

Apply it to lists/columns that are sorted in ascending order. It allows q to use binary search instead of a linear scan.

Question 3

Q

Where and why would you use the `u# attribute?

Answer

A

Apply it to a list of distinct items, such as a key column or an enumeration domain. q builds a hash map for the list, which speeds up searches (find, distinct, etc.) and lets q exit some comparisons early, at the first match.

Question 4

Q

Where and why would you use the `p# attribute?

Answer

A

On lists where identical items are stored contiguously. It creates an internal map from each distinct value to the position of its first occurrence. Once the first instance is found, data retrieval is quick.

Question 5

Q

Where and why would you use the `g# attribute?

Answer

A

On lists where there is no apparent structure. It maps each distinct value to the list of positions where it occurs, which speeds up select ... where queries. It has a large memory overhead.

Question 6

Q

Rank the attributes in terms of memory overhead.

Answer

A

Grouped (large memory usage)
Unique
Parted
Sorted (no memory)
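As a rough illustration of the four attributes covered in the cards above, the sketch below applies each one to a small list and reads it back with attr. The variable names and data are made up for the example.

```q
s:`s#1 2 3 5 8            / sorted: the list must already be ascending, else 's-fail
u:`u#`a`b`c               / unique: all items must be distinct, else 'u-fail
p:`p#`a`a`b`b`b`c         / parted: equal items must be contiguous
g:`g#`b`a`c`a`b           / grouped: no ordering required
attr each (s;u;p;g)       / `s`u`p`g

/ hypothetical table: apply grouped to a column in place
t:([] sym:`b`a`c`a`b; px:1 2 3 4 5f)
update `g#sym from `t
attr t`sym                / `g
```

Appending items that break the ordering or uniqueness typically causes q to drop the attribute, so it is worth re-checking with attr after bulk changes.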

Question 7

Q

Name the 5 types of joins.

Answer

A

Simple Join (,), Inner Join (ij), Left Join (lj), Union Join (uj) & Asof Join (aj).

Question 8

Q

Explain ij.

Answer

A

Joins the columns of t2 onto the rows of t1 that have a matching entry in the key columns of t2; rows of t1 without a match are dropped. t2 must be keyed. Its key columns must be columns of t1.

Question 9

Q

Explain lj.

Answer

A

Joins the columns of t2 onto t1 along the matching key columns of t2. t2 must be keyed. Returns a record for every entry of t1 regardless of whether it appears in t2 or not; unmatched rows get nulls in the new columns.

Question 10

Q

Explain uj.

Answer

A

Joins two tables vertically. The tables need not be keyed or share the same columns; the result has the union of the columns, with nulls where a table lacks a column. Often used to join trade and quote and then sort ascending by time.
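A minimal sketch of ij, lj and uj on toy tables may help when comparing them. The tables t1, t2 and t3 and their columns are hypothetical.

```q
t1:([] sym:`a`b`c; px:1 2 3f)
t2:([sym:`a`b`d] sz:10 20 40)   / keyed on sym
t3:([] sym:`e`f; vol:5 6)

t1 ij t2    / rows for a and b only, with sz added
t1 lj t2    / all three rows of t1; sz is null for c
t1 uj t3    / five rows; columns sym, px, vol with nulls where a column is absent
```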

Question 11

Q

Explain aj.

Answer

A

Joins tables with reference to time. Used for getting the prevailing quote at the time of a trade: aj[`sym`time; trade; quote]. The first argument is a list of column names: the leading columns (sym) are matched exactly and the last column (time) is matched as-of, taking the most recent quote at or before each trade time.
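A small worked example of an asof join, using made-up trade and quote tables. The first argument to aj is the symbol list `sym`time, and the quote table is assumed to be sorted by time within each sym.

```q
trade:([] sym:`a`b`a; time:09:30:01 09:30:02 09:30:05; price:10 20 11f)
quote:([] sym:`a`b`a`b; time:09:30:00 09:30:01 09:30:04 09:30:06; bid:9.9 19.9 10.9 20.1)

/ each trade picks up the latest quote for its sym at or before the trade time
aj[`sym`time; trade; quote]     / bid column: 9.9 19.9 10.9
```

The b quote at 09:30:06 is never used because it arrives after the only b trade.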

Question 12

Q

What attributes would you use with asof join?

Answer

A

Apply the grouped attribute to the sym column on the quote table to aid finding the prevailing quote. If the table is on disk, apply the parted attribute.

Question 13

Q

What is the syntax of functional select/exec?

Answer

A

?[t;c;b;a]
t = a table
c = a list of where specifications
b = a dictionary of grouping constraints
a = a dictionary of aggregates
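To make the four arguments concrete, here is a sketch that writes one qSQL query in functional form. The table t and the column name avgPx are invented for the example.

```q
t:([] sym:`a`b`a`b; price:10 20 30 50f; size:1 2 3 4)

/ qSQL: select avgPx:avg price by sym from t where size>1
c:enlist (>;`size;1)                   / list of where specifications (parse trees)
b:(enlist `sym)!enlist `sym            / dictionary of grouping columns
a:(enlist `avgPx)!enlist (avg;`price)  / dictionary of aggregates
?[t;c;b;a]                             / keyed table: a -> 30, b -> 35
```

Running parse on the qSQL string, e.g. parse "select avgPx:avg price by sym from t where size>1", is a common way to discover the parse trees for c, b and a.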

Question 14

Q

What is the 5th argument in a functional select?

Answer

A

The window argument. Used for returning rows. If the fifth argument was 5 it would return the first five rows.
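A short sketch of the fifth argument, assuming a kdb+ version that supports the extended form of functional select. The table t is hypothetical.

```q
t:([] sym:`a`b`c`d; px:1 2 3 4f)

?[t;();0b;();2]     / first two rows, like select[2] from t
?[t;();0b;();-2]    / last two rows, like select[-2] from t
```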

Question 15

Q

What is the 6th argument in a functional select?

Answer

A

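The sort specification argument, a pair (g;cn) where g is a grade function (iasc or idesc) and cn is a column name. Used together with the fifth argument to choose which rows of the table are returned, as in select[n;>cn].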

Question 16

Q

What is the syntax for functional exec?

Answer

A

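?[t;c;b;a], the same form as functional select, but with the by argument b set to the empty list () instead of 0b or a dictionary. For example: ?[table;();();()].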

Question 17

Q

What is the syntax for functional update?

Answer

A

![t;c;b;a], where a is a dictionary mapping the updated column names to their update expressions.

Question 18

Q

What is the syntax for functional delete?

Answer

A

![t;c;0b;a]

a = list of symbols referring to the columns to be removed.

t and c are the same as in functional select.

Either a or c should be present, not both: supply c to delete rows, or a to delete columns.
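The following sketch shows functional update and both kinds of functional delete against a throwaway table. The table and its values are assumptions for illustration only.

```q
t:([] sym:`a`b`a; price:10 20 30f; size:1 2 3)

/ qSQL: update price:2*price from t where sym=`a
![t; enlist (=;`sym;enlist `a); 0b; (enlist `price)!enlist (*;2;`price)]

/ qSQL: delete from t where size>1   (rows: c given, a empty)
![t; enlist (>;`size;1); 0b; `symbol$()]

/ qSQL: delete size from t           (columns: a given, c empty)
![t; (); 0b; enlist `size]
```

Note that a symbol constant inside a parse tree is enlisted (enlist `a) so that q does not treat it as a column name.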

Question 19

Q

Why are functional statements used instead of qSql queries?

Answer

A

Allows users to dynamically select columns and build where clauses.

Avoiding overly complicated and long qSql statements.

Question 20

Q

What does fby do?

Answer

A

function-by. Applies an aggregate function within each group and returns the result for every member of that group, so it can be compared row by row in a where clause. Saves you having to create an intermediary table and doing a left join.

Question 21

Q

Give an example of fby?

Answer

A

select from t where price>(avg;price) fby ([]sym;size).

This selects the rows whose price exceeds the average price of their own sym and size group, not the average of the whole table.
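A minimal fby sketch on invented data, first grouping by a single column and then by two columns. When grouping by more than one column, the group is passed as a table, ([]sym;size).

```q
t:([] sym:`a`a`b`b; size:1 1 1 2; price:10 30 100 300f)

/ rows priced above the average for their sym (a: 20, b: 200)
select from t where price>(avg;price) fby sym             / prices 30 and 300

/ rows priced above the average for their (sym;size) group
select from t where price>(avg;price) fby ([]sym;size)    / price 30 only
```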

Question 22

Q

What is a compound column?

Answer

A

A column which contains lists.

Question 23

Q

How will a compound column appear on disk?

Answer

A

As two associated files: a name file and a name# file. The name# file stores the flattened values of the column. The name file holds the count of each list in the column.

Question 24

Q

How would you query a string column in a HDB?

Answer

A

Using like: select from t where ID like "abc", or using an adverb (each-left, ~\:): select from t where ID ~\: "abc".
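The two approaches can be compared on a small in-memory table. The same where clauses apply to a string column in an HDB; the table t and column ID here are hypothetical.

```q
t:([] ID:("abc";"abd";"xyz"); v:1 2 3)

select from t where ID like "abc"     / pattern match, one row
select from t where ID like "ab*"     / wildcard, two rows
select from t where ID ~\: "abc"      / exact match via each-left, one row
```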

Question 25

Q

Why is a compound column split into two separate files?

Answer

A

To speed up queries. Scanning an index file is faster than scanning one large general list. The trade off is memory.

Question 26

Q

How would you set the q timer?

Answer

A

-t in the command line or \t in a q session. Timer logic is defined in .z.ts.

Question 27

Q

What arguments does .z.ts take?

Answer

A

The current time stamp.
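A small sketch of the timer in a q session. The counter name ticks is made up; system"t 1000" is the in-code equivalent of typing \t 1000.

```q
ticks:0
/ .z.ts receives the timestamp of the tick as its single argument
.z.ts:{[ts] ticks::ticks+1}

system "t 1000"     / fire .z.ts every 1000 ms
system "t"          / query the current interval in ms
system "t 0"        / switch the timer off
```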

Question 28

Q

What’s the difference between scan and over?

Answer

A

Over applies a function iteratively across a list, feeding each result back in as the first argument with the next item as the second, and returns only the final result. Scan operates the same way but also returns the intermediate results.

Question 29

Q

Give an example of scan and over in use.

Answer

A

+/ (over) and +\ (scan). (+) over 1 2 3 returns 6; (+) scan 1 2 3 returns 1 3 6.
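The same idea as a runnable sketch, including a seeded form and a second operator:

```q
(+/) 1 2 3          / 6
(+\) 1 2 3          / 1 3 6
10 +/ 1 2 3         / 16, starting from an initial value of 10
(*) scan 1 2 3 4    / 1 2 6 24, the running product
```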

Question 30

Q

What are the pros and cons of using strings to store data?

Answer

A

Pros: They don’t need to be enumerated so they will not bloat the sym file.
They can be easily changed if needed.
Cons: Queries are slower than symbols. Cannot use = or ~ without adverbs. Require compound columns to speed up queries which take up memory.

Question 31

Q

What is a namespace?

Answer

A

A namespace is a container within a kdb workspace, used to conveniently divide an application between modules, logically divide code into callable libraries and avoid name clashes.

Question 32

Q

How can you create a namespace?

Answer

A

Directly assigning a variable containing the dot notation or by moving into a namespace using \d.
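Both ways of creating a namespace, sketched with invented names (.util and .calc):

```q
/ direct assignment with dot notation creates the namespace .util
.util.double:{2*x}

/ switch into namespace .calc; names defined now live in .calc
\d .calc
inc:{x+1}
/ return to the root namespace
\d .

.util.double 4      / 8
.calc.inc 4         / 5
key `.util          / ``double
```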

Question 33

Q

What is meant by protected execution?
(Also the two forms of functional amend)

Answer

A

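The triadic forms of the @ and . operators (trap): @[f;x;e] for a unary function and .[f;args;e] for a function of several arguments. If evaluating f signals an error, e is returned (or, if e is a function, called with the error message), so a meaningful error response can be provided.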
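A brief sketch of protected execution with both operators. The lambdas and handler text are illustrative only.

```q
@[{1+x}; 2;  {"error: ",x}]         / 3
@[{1+x}; `a; {"error: ",x}]         / "error: type"
.[{x%y}; (6;3);  {"error: ",x}]     / 2f
.[{x+y}; (1;`a); `failed]           / `failed, a plain value returned on error
```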

Question 34

Q

How would you calculate VWAP per sym for a table?

Answer

A

select vwap: size wavg price by sym from trade.
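A worked example on a hypothetical trade table, so the weighting is easy to check by hand:

```q
trade:([] sym:`a`a`b; price:10 20 5f; size:1 3 2)

/ a: (10*1 + 20*3) % 4 = 17.5;  b: 5
select vwap:size wavg price by sym from trade
```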

Question 35

Q

What is the basic syntax for a complete select statement?

Answer

A

select … by … from … where …

Question 36

Q

What does the by column do in qSql?

Answer

A

The by phrase contains the key columns of the new table.

Question 37

Q

Describe a typical tick setup

Answer

A

Data Feed (Data Source)
Feed Handler (Formats data for q and provides it to the TP)
Ticker-Plant (Captures and logs incoming data, and formats it into q tables)
Log File (Data Recovery)
RDB (Stores the current day's data and writes it to the HDB at end of day)
HDB (Loads all data before the current day for access)
Real Time Subscriber (streaming analytics)
Gateway (user interface to query HDB)

Question 38

Q

Describe how a ticker-plant operates.

Answer

A

Data from the data feed is parsed by the feed handler. Parsed data is sent to the TP. The TP logs this data and populates its internal tables. The TP publishes table data to the RDB. The RDB stores data from that day. At EOD the TP starts a new log file, and the RDB saves that day's data to the HDB and discards its table data. The HDB is used to load and query all data saved by the RDB before the current day.

Question 39

Q

What is a chained ticker plant?

Answer

A

A nested ticker-plant: a ticker-plant that subscribes to a master ticker-plant and republishes the data to its own subscribers.

Question 40

Q

In what type of language is a feed handler, which receives data from the data feed, usually written?

Answer

A

Usually a compiled language (C/Java)

Question 41

Q

What feed-handler function sends data to the ticker-plant?

Question 42

Q

What does -w do? (tick setup)

Answer

A

Sets the workspace memory limit in MB: the maximum memory that a process such as an RDB can take up before it shuts down (usually 4x expected feed).

Question 43

Q

What would you do if your ticker-plant goes down?

Answer

A

A complete mirror tick setup can be used.

Question 44

Q

What would you do if an RDB fails?

Answer

A

Restart the RDB which will replay the log file and recover lost data.

Question 45

Q

What is batching mode? (tick setup)

Answer

A

The TP temporarily stores data in its tables and publishes it to the clients in time intervals.

Used for tasks where update speed is not critical and ticker-plant efficiency can be prioritised, e.g. human trading via a GUI.

Question 46

Q

What is zero latency mode? (tick setup)

Answer

A

The TP doesn’t store any data, and publishing to clients happens every time a message is received via the .u.upd function.

Used for tasks where update speed is important, e.g. high frequency trading.

Question 47

Q

How would you choose batching mode? (tick setup)

Answer

A

Set a timer with -t on the command line when starting the TP. tick.q has an if statement that tests the timer interval (system"t") and defines the batch-mode versions of .z.ts and .u.upd when it is set.

Question 48

Q

How can you check if a log file exists?

Question 49

Q

In what format does the TP receive data from the feed-handler?

Answer

A

Column oriented lists.

Question 50

Q

How does the TP create a q table from the data it receives?

Answer

A

Logic in the .u.upd function gets columns of the table from the schema and creates a dictionary with these columns as keys.
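The core of that step can be sketched outside the tickerplant. The schema and data below are invented; the expression key flip value mirrors how the column names are read from the schema, and flip f!x turns the column-oriented lists into a table.

```q
/ hypothetical schema, as it might be loaded from the schema file
trade:([] time:`timespan$(); sym:`symbol$(); price:`float$(); size:`long$())

f:key flip value `trade                / column names: `time`sym`price`size
x:(2#.z.N; `a`b; 10 20f; 100 200)      / column-oriented lists from the feed handler
flip f!x                               / dictionary keyed by column name, flipped into a table
```

For a single record (atoms rather than lists), the dictionary would be enlisted instead of flipped.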

Question 51

Q

How to tell if RTS is hogging memory or acting slow.

Answer

A

.z.W dictionary will show TP connections and the outstanding bytes waiting in queue for each.

Question 52

Q

How does the RDB write down data to the HDB?

Answer

A

The .Q.hdpf function saves all tables by calling .Q.dpft, clears the tables and sends a reload message to the HDB. Its arguments are (HDB port; directory of HDB; partition name; `p# field).

Question 53

Q

How does the RDB tell the HDB to reload itself?

Answer

A

The .Q.hdpf call within the RDB's .u.end function: after the tables are saved with .Q.dpft, .Q.hdpf sends the reload message to the HDB.

Question 54

Q

When querying a HDB, what column should the where clause point to first?

Answer

A

The date partition. Otherwise the query scans every partition, which can be extremely slow or exhaust memory.

Question 55

Q

What is a .d file?

Answer

A

Stores the ordered list of column names of a splayed table, which q uses to reconstruct the table in memory.

Question 56

Q

How would you get a file path in a segmented database?

Answer

A

Use .Q.par which is able to read par.txt file and build the file path from there.

Question 57

Q

What is the virtual column in a table?

Answer

A

A hidden column called “i” which specifies the index for each row.
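A quick sketch of the virtual column on a made-up table. When i is selected without a name, the result column is called x.

```q
t:([] sym:`a`b`c; px:1 2 3f)

select i, sym from t where px>1     / column x holds the row indices 1 2
exec i from t where px>1            / 1 2
```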

Question 58

Q

Why would you splay a table?

Answer

A

When the full table is too large to fit in memory.

Question 59

Q

Why partition a table?

Answer

A

Years' worth of data would be too large to manage when split only by columns, so we also split it horizontally, usually by time.

Question 60

Q

Why segment a table?

Answer

A

Spread it out over different I/O channels to facilitate parallel data retrieval.

Question 61

Q

How would you save a splayed table?

Answer

A

directory set .Q.en[directory of sym file; table name/data], where the target directory path ends with a trailing slash.
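A minimal sketch of saving a splayed table, assuming a scratch directory named db under the current working directory (a hypothetical path). .Q.en enumerates the symbol column against db/sym before the table is written.

```q
t:([] sym:`a`b`a; price:1 2 3f)

/ the trailing slash on the target tells set to splay the table
`:db/t/ set .Q.en[`:db; t]

get `:db/t/         / maps the splayed table back from disk
```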

Question 62

Q

Explain .u.tick (ticker-plant).

Answer

A

Executes .u.init and verifies that all tables on the TP have time, sym as their first two columns.

Applies grouped attribute to sym columns of all tables.

Set .u.d as current date
Execute .u.ld to create TP log file

Arguments include name of schema file and directory of log file and HDB.

Question 63

Q

Explain .u.init (ticker-plant function).

Answer

A

Executed within .u.tick.

Defines the list of tables .u.t which can be subscribed to.

Defines the dictionary .u.w, which maps each table name to its related subscriber handles.

No arguments

Question 64

Q

Explain .u.ld (ticker-plant function).

Answer

A

Executed within .u.tick at start-up and within .u.endofday.

Creates the ticker-plant log file and opens a handle to it.

1 argument: the date, .u.d

Answer 63

A

Executed as a result of a synchronous call from an RTS.

Connects a RTS to TP.

Subscribes to specified tables/syms.

Arguments include Table Name(s); Required Syms.

The RTS will be added to .u.w and will return empty schemas, location and number of messages in the log file.

The RTS will then replay the first .u.i messages to get up to speed.

Answer 64

A

Executed within .u.sub

Clears out any pre-existing subscriptions from the RTS to the table they have subscribed to within .u.w.

Arguments include table name(s) and handle.

Answer 65

A

Executed within .u.sub

Modifies .u.w with the new subscription.
Returns an empty copy of the relevant table to the new subscriber.

Arguments include table name(s) ; required syms.

Answer 66

A

Executed within .u.pub.

Grabs the selected subset of the table.

Arguments: table contents and required syms

Answer 67

A

Executed within .z.ts if publishing on timer in batch mode or within .u.upd if in zero latency mode.

Publishes out the relevant rows of the input tables to RTS.

Arguments: table name; current contents

Answer 68

A

Executed within .u.endofday

Sends asynchronous message to each RTS to execute the .u.end functions.

Arguments: yesterday's date

Answer 69

A

Executed in .u.ts

Sends same message to RTS to execute EOD function .u.end.

Increment current date .u.d

Close connection to old log file and connect to new log file.

No arguments.

Answer 70

A

Executed within timer function .z.ts

Checks whether it is past midnight by testing against .u.d

Args: the current date.

Answer 71

A

Executed every time a message is received from feed handler.

Checks if timestamp is present and adds if not.
Inserts data locally in ticker plant
Checks if log path is valid and if so sends it a message
Increments journal counter .u.j by one

Arguments: table name; new data

Answer 72

A

Executed every n milliseconds.

Publish relevant updates to RTS.

Clear tables and apply grouped attribute to sym column.

Set .u.i value to .u.j

Check if EOD

Arguments: current timestamp

Answer 73

A

Every time a message is received.

Publishes via .u.upd
Runs timer to see if EOD
Checks if timestamp is present adds if not

Creates a q table from the list of data received from feed-handler and publishes to subscribers

Checks if log path is valid if so sends it a message

Increments journal counter .u.i by one

Arguments: Table name; new data

Answer 74

A

Executed every 1 second
Every execution calls .u.ts to check for EOD
Arguments: current timestamp

Answer 75

A

Executed at end of day, after the ticker-plant message is received.

Filters out any tables which don't have the `g# attribute applied to their sym columns.

Uses the .Q.hdpf function, which saves all tables by calling .Q.dpft, clears the tables and sends a reload message to the HDB.

Applies the grouped attribute back to each table

Arguments: the date to save, as sent by the ticker-plant

Answer 76

A

Executed on a standalone line when RDB starts up

Initialises TP tables on the RDB to be empty

Uses -11! to replay TP log file as necessary

CD to the top level of the HDB

Arguments: TP subscription data; .u.i and .u.L

Answer 77

A

Path to the TP log file.

Answer 78

A

Handle to the log file. Set as end result of .u.ld function.

Answer 79

A

.u.L set () initialises the log file at the path .u.L to be an empty list.

Answer 80

A

Within the .u.upd function, the last step checks whether the handle to the log file exists and, if so, sends a message containing the table name and the data to be inserted to the log file.

Answer 81

A

Lists, each list with the first item as the function and the rest of the items as its arguments.

Answer 82

A

Sent in Binary format, retrieved in 3 item lists, the upd function, the table name and the data to be written.

Answer 83

A

The .u.rep function takes two arguments: the empty TP tables and a two item list (.u.i and .u.L). It checks if .u.i is null: if yes, it returns; if no, it runs -11! with the two item list as its argument.

Answer 84

A

-11! (logFilePath) replays the whole log file into memory

-11! (N; logFilePath) this will replay first N records

-11! (-2; logFilePath) count of all the messages in the log file
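The forms above can be tried on a tiny hand-made log. This is a sketch only: the file name demo.log, the table name trade and the replayed list are assumptions, and the log is written to the current working directory.

```q
logFile:`:demo.log
logFile set ()                            / start with an empty log
h:hopen logFile
h enlist (`upd; `trade; (`a; 10f))        / append one (function; table; data) message
hclose h

-11!(-2; logFile)                         / number of valid messages in the log: 1

/ replay evaluates each message, so upd must be defined first
replayed:()
upd:{[t;x] replayed::replayed,enlist (t;x)}
-11!logFile                               / replays the whole log, calling upd once
```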