Tick/Attributes/Joins/FStatements Interview Questions Flashcards

1
Q

Name each attribute in kdb+.

A

Sorted (s#), Unique (u#), Parted (p#) and Grouped (g#).

2
Q

Where and why would you use the s# attribute?

A

On lists or columns that are sorted in ascending order (e.g. a time column). q can then use binary search instead of a linear scan.
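A minimal Python sketch of what s# makes possible. It models the idea only and is not q's implementation. On an ascending list, a lookup or range query can use binary search (O(log n)) instead of a linear scan. The helper names find_sorted and within_sorted are hypothetical.

```python
from bisect import bisect_left, bisect_right

def find_sorted(xs, value):
    """Index of the first occurrence of value in ascending list xs, or len(xs)
    if absent (q's ? likewise returns the count when an item is not found)."""
    i = bisect_left(xs, value)
    return i if i < len(xs) and xs[i] == value else len(xs)

def within_sorted(xs, lo, hi):
    """Indices of items with lo <= x <= hi: two binary searches bound the range."""
    return range(bisect_left(xs, lo), bisect_right(xs, hi))

times = [1, 3, 3, 7, 9, 12]
print(find_sorted(times, 7))              # 3
print(list(within_sorted(times, 3, 9)))   # [1, 2, 3, 4]
```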

3
Q

Where and why would you use the u# attribute?

A

On a list whose items are all distinct, such as a key column or an enumeration domain. q keeps a hash table of the items. This speeds up searches and lookups (e.g. ? and distinct) and lets q exit some comparisons early.
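A small Python model of the u# idea, assuming a plain dict is an acceptable stand-in for q's internal hash table. Distinct items are indexed once, so finding an item is an average O(1) lookup instead of a scan. The class name UniqueList is hypothetical.

```python
class UniqueList:
    """Model of a u#-style list: items must be distinct, and a hash map from
    item to position makes lookups O(1) on average."""

    def __init__(self, items):
        self.items = list(items)
        self.index = {}
        for pos, item in enumerate(self.items):
            if item in self.index:
                # q likewise rejects u# on a list that contains duplicates
                raise ValueError(f"duplicate item {item!r}: list is not unique")
            self.index[item] = pos

    def find(self, item):
        """Position of item, or len(items) if absent (like q's ?)."""
        return self.index.get(item, len(self.items))

    def contains(self, item):
        return item in self.index

syms = UniqueList(["AAPL", "IBM", "MSFT"])
print(syms.find("IBM"))    # 1
print(syms.find("GOOG"))   # 3
```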

4
Q

Where and why would you use the p# attribute?

A

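On lists where identical items are stored contiguously, e.g. the sym column of an HDB partition sorted by sym. q keeps an internal map from each distinct item to the position of its first occurrence. Once that position is found, the whole block can be read directly, so data retrieval is quick.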
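A Python sketch of the parted idea. It is a model, not q's internal layout: each distinct value maps to the start and end of its single contiguous block. A query for one value then becomes one slice. The function name parted_index is hypothetical.

```python
def parted_index(xs):
    """Map each distinct value to (start, end) of its contiguous block (end is
    exclusive). Raises if a value reappears after its block ended (not parted)."""
    index = {}
    start = 0
    for i in range(1, len(xs) + 1):
        if i == len(xs) or xs[i] != xs[start]:
            value = xs[start]
            if value in index:
                raise ValueError(f"{value!r} is not contiguous")
            index[value] = (start, i)
            start = i
    return index

syms = ["IBM", "IBM", "AAPL", "AAPL", "AAPL", "MSFT"]
idx = parted_index(syms)
start, end = idx["AAPL"]
print(start, end)   # 2 5
```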

5
Q

Where and why would you use the g# attribute?

A

On lists with no particular order or structure. q maps each distinct item to the list of positions where it occurs. This speeds up select ... where queries on that column, at the cost of a large memory overhead.
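A Python model of the grouped index. It assumes a dict of position lists is a fair stand-in for q's internal structure. No ordering is needed, but the index stores one position for every item, which is where the memory cost comes from. The function name grouped_index is hypothetical.

```python
from collections import defaultdict

def grouped_index(xs):
    """Map each distinct value to every position where it occurs."""
    index = defaultdict(list)
    for pos, value in enumerate(xs):
        index[value].append(pos)
    return dict(index)

syms = ["IBM", "AAPL", "IBM", "MSFT", "AAPL", "IBM"]
idx = grouped_index(syms)
# "where sym=`IBM" becomes one lookup instead of a scan of the column
print(idx["IBM"])   # [0, 2, 5]
```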

6
Q

Rank the attributes in terms of memory overhead.

A
  1. Grouped (large memory usage)
  2. Unique
  3. Parted
  4. Sorted (no memory)
7
Q

Name five common types of join.

A

Simple Join (,), Inner Join (ij), Left Join (lj), Union Join (uj) & Asof Join (aj).

8
Q

Explain ij.

A

t1 ij t2 returns only the rows of t1 whose key columns have a matching entry in t2, with the non-key columns of t2 joined on. t2 must be keyed, and its key columns must be columns of t1.
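A Python model of ij. It assumes a keyed table can be represented as a dict from key tuples to rows of non-key columns. Only matching rows of t1 survive. The function name inner_join and the sample tables are hypothetical.

```python
def inner_join(t1, t2, key_cols):
    """Model of t1 ij t2. t1 is a list of row dicts; t2 is a keyed table modelled
    as {key tuple: {non-key column: value}}. Only t1 rows whose key appears in t2
    are kept, with t2's non-key columns added (t2 wins for same-named columns here)."""
    result = []
    for row in t1:
        key = tuple(row[c] for c in key_cols)
        if key in t2:
            result.append({**row, **t2[key]})
    return result

trade = [{"sym": "IBM", "px": 101.0},
         {"sym": "MSFT", "px": 330.0},
         {"sym": "IBM", "px": 102.0}]
ref = {("IBM",): {"sector": "Tech", "ccy": "USD"}}   # keyed on sym
for row in inner_join(trade, ref, ["sym"]):
    print(row)   # only the two IBM rows survive
```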

9
Q

Explain lj.

A

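t1 lj t2 returns every row of t1, whether or not it has a match in t2. t2 must be keyed. For each row of t1, the non-key columns of the matching t2 row are joined on, and t2 values overwrite same-named t1 columns. Rows without a match get nulls in the new columns.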
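The same kind of Python model, this time for lj: every row of t1 is kept. None stands in for q's typed nulls. The function name left_join and its t2_cols parameter (needed to know which columns to null-fill) are hypothetical.

```python
def left_join(t1, t2, key_cols, t2_cols):
    """Model of t1 lj t2. t2 is modelled as {key tuple: {non-key column: value}}
    and t2_cols names its non-key columns. Every t1 row is kept; matches take t2's
    values, and unmatched rows get None in t2 columns they don't already have."""
    result = []
    for row in t1:
        key = tuple(row[c] for c in key_cols)
        if key in t2:
            result.append({**row, **t2[key]})
        else:
            out = dict(row)
            for c in t2_cols:
                out.setdefault(c, None)   # keep t1's own value if it has one
            result.append(out)
    return result

trade = [{"sym": "IBM", "px": 101.0}, {"sym": "MSFT", "px": 330.0}]
ref = {("IBM",): {"sector": "Tech"}}
for row in left_join(trade, ref, ["sym"], ["sector"]):
    print(row)
```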

10
Q

Explain uj.

A

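Joins two tables vertically, appending the rows of the second to the first. The tables don't need keys or identical columns. The result has the union of the columns, with nulls where a table lacks a column. (If both tables are keyed, rows with matching keys are updated rather than appended.) Often used to combine trade and quote tables, then sort ascending by time: time xasc trade uj quote.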
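A Python model of uj for two unkeyed tables, with tables as lists of row dicts. Rows are appended, columns are unioned, and None stands in for q's nulls. The keyed case is not modelled. The function name union_join and the sample data are hypothetical.

```python
def union_join(t1, t2):
    """Model of t1 uj t2 for unkeyed tables: t2's rows are appended to t1's, the
    result has the union of both column sets (t1's columns first), and a row
    missing a column gets None."""
    cols = []
    for row in t1 + t2:
        for c in row:
            if c not in cols:
                cols.append(c)
    return [{c: row.get(c) for c in cols} for row in t1 + t2]

trade = [{"time": 3, "sym": "IBM", "price": 101.0, "size": 100}]
quote = [{"time": 1, "sym": "IBM", "bid": 100.5, "ask": 101.5},
         {"time": 4, "sym": "IBM", "bid": 100.8, "ask": 101.2}]
# like time xasc trade uj quote (Python's sorted is stable, as xasc is)
merged = sorted(union_join(trade, quote), key=lambda r: r["time"])
for row in merged:
    print(row)
```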

11
Q

Explain aj.

A

Joins tables with reference to time. Used to get the prevailing quote at the time of each trade: aj[`sym`time;trade;quote]. The first argument is a list of column names. sym is matched exactly. The last column, time, is matched "as of": each trade gets the most recent quote whose time is at or before the trade time.
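A Python model of the prevailing-quote lookup. It assumes quotes are sorted by time within each sym, as aj expects. Quotes are grouped by sym, and a binary search finds the last quote at or before each trade time. The trade's own time is kept. The function name asof_join and the sample data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def asof_join(trades, quotes, match_col="sym", time_col="time"):
    """Model of aj[`sym`time;trade;quote]: each trade gets the columns of the last
    quote with the same sym whose time is <= the trade time, or None if none."""
    rows_by_sym = defaultdict(list)          # sym -> its quotes (g#-style grouping)
    for quote in quotes:
        rows_by_sym[quote[match_col]].append(quote)
    times_by_sym = {s: [q[time_col] for q in rows] for s, rows in rows_by_sym.items()}
    quote_cols = [c for c in (quotes[0] if quotes else {}) if c not in (match_col, time_col)]

    result = []
    for trade in trades:
        sym, t = trade[match_col], trade[time_col]
        # binary search: number of this sym's quotes with time <= t
        i = bisect_right(times_by_sym.get(sym, []), t)
        prevailing = rows_by_sym[sym][i - 1] if i else {}
        result.append({**trade, **{c: prevailing.get(c) for c in quote_cols}})
    return result

trades = [{"time": 2, "sym": "IBM", "price": 101.0},
          {"time": 5, "sym": "IBM", "price": 102.0},
          {"time": 1, "sym": "MSFT", "price": 330.0}]
quotes = [{"time": 1, "sym": "IBM", "bid": 100.5},
          {"time": 4, "sym": "IBM", "bid": 101.5},
          {"time": 3, "sym": "MSFT", "bid": 329.0}]
print([r["bid"] for r in asof_join(trades, quotes)])   # [100.5, 101.5, None]
```

The MSFT trade at time 1 gets None because its only quote arrives later, at time 3.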

12
Q

What attributes would you use with asof join?

A

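Apply the grouped attribute (g#) to the sym column of the quote table, with quotes sorted by time within each sym, so the prevailing quote can be found quickly. If the table is on disk, apply the parted attribute (p#) to sym instead, with the data sorted by sym and then time.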

13
Q

What is the syntax of functional select/exec?

A

?[t;c;b;a]
t = a table
c = a list of where constraints (as parse trees)
b = a dictionary of grouping (by) columns; 0b for no grouping in a select, () in an exec
a = a dictionary of aggregates/selected columns; () for all columns

14
Q

What is the 5th argument in a functional select?

A

The row-limit (window) argument, equivalent to select[n]. If the fifth argument is 5 it returns the first five rows. -5 returns the last five, and a pair such as 10 5 returns five rows starting at row 10.
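A Python model of the row-limit argument, with rows as a plain list. It covers the three forms described: first n, last n, and an (offset, count) pair. The function name take_window is hypothetical, and this does not claim to reproduce every edge case of q's select[n].

```python
def take_window(rows, n):
    """Model of the 5th argument of ?[t;c;b;a;n] (select[n]): a non-negative int
    keeps the first n rows, a negative int the last |n| rows, and a
    (start, count) pair keeps count rows beginning at row start."""
    if isinstance(n, tuple):
        start, count = n
        return rows[start:start + count]
    return rows[:n] if n >= 0 else rows[n:]

rows = list(range(10))
print(take_window(rows, 3))        # [0, 1, 2]
print(take_window(rows, -2))       # [8, 9]
print(take_window(rows, (4, 3)))   # [4, 5, 6]
```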

15
Q

What is the 6th argument in a functional select?

A

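The ordering argument: a pair made of a grade function and a column name, e.g. (idesc;`price). It sorts the result before the row limit in the 5th argument is applied, and is the functional form of select[n;>price].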

16
Q

What is the syntax for functional exec?

A

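?[t;c;();a]: the same arguments as select, but with an empty list () as the by argument instead of 0b. For example, exec from t is ?[t;();();()].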

17
Q

What is the syntax for functional update?

A

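![t;c;b;a], where a is a dictionary mapping the names of the columns to update to their new values (as parse trees). t, c and b are as in select (b is 0b for no grouping).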

18
Q

What is the syntax for functional delete?

A

![t;c;0b;a]

a = a list of symbols naming the columns to be removed.

t and c are the same as in select.

Give either c (to delete rows) or a (to delete columns), not both.

19
Q

Why are functional statements used instead of qSql queries?

A

Allows users to select columns and build where clauses dynamically, e.g. when the column names are only known at run time.

Avoids overly complicated and long qSQL statements.

20
Q

What does fby do?

A

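function-by. (aggregate;column) fby group applies the aggregate to the column within each group and returns one value per row, aligned with the original table, so it can be used in a where clause. Saves you having to create an intermediate aggregated table and left-join it back.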

21
Q

Give an example of fby?

A

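select from t where price>(avg;price) fby ([]sym;size)

This selects the rows of t whose price exceeds the average price of their own (sym, size) group, not the average of the whole table. To group by several columns, fby takes a table of them; to group by one column, write fby sym.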

22
Q

What is a compound column?

A

A column in which each item is a simple list of the same type, e.g. a column of strings.

23
Q

How will a compound column appear on disk?

A

Two associated files: name and name#. name# stores the flattened values of the column. The name file holds an index recording the boundaries of each row's list within name#.
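A Python model of the two-file layout. It is a sketch only: it stores cumulative end positions as the index, and the exact on-disk format of the name file is not modelled. One flat vector holds every item, and the index lets any row be recovered with a single slice. The function names split_compound and row_at are hypothetical.

```python
from itertools import accumulate

def split_compound(column):
    """Split a column of lists into a flat vector (like name#) and the end
    position of each row's list (an index like the name file)."""
    flat = [item for row in column for item in row]
    ends = list(accumulate(len(row) for row in column))
    return flat, ends

def row_at(flat, ends, i):
    """Recover row i by slicing the flat data between consecutive end positions."""
    start = ends[i - 1] if i else 0
    return flat[start:ends[i]]

column = ["abc", "de", "", "fghi"]   # e.g. a string column
flat, ends = split_compound(column)
print(ends)                              # [3, 5, 5, 9]
print("".join(row_at(flat, ends, 1)))    # de
```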

24
Q

How would you query a string column in a HDB?

A

Using like, e.g. select from t where ID like "abc", or using match with the each-left iterator (~\:), e.g. select from t where ID~\:"abc".

25
Q

Why is a compound column split into two separate files?

A

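To speed up queries. q can memory-map the flat data file and use the index file to jump straight to each row's list, which is faster than scanning one large general list. The trade-off is the extra space used by the index.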

26
Q

How would you set the q timer?

A

Use -t N (milliseconds) on the command line or \t N in a q session. The timer logic is defined in .z.ts.

27
Q

What arguments does .z.ts take?

A

The current time stamp.

28
Q

What’s the difference between scan and over?

A

Over (/) applies a function cumulatively across a list: with a binary function, each result becomes the first argument of the next call. It returns only the final result. Scan (\) works the same way but returns every intermediate result.

29
Q

Give an example of scan and over in use.

A

+/ (over) and +\ (scan): +/1 2 3, or (+) over 1 2 3, returns 6, while +\1 2 3, or (+) scan 1 2 3, returns 1 3 6.
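For comparison, the closest Python equivalents: functools.reduce behaves like over and itertools.accumulate behaves like scan. This is an analogy, not q code.

```python
from functools import reduce
from itertools import accumulate
from operator import add

xs = [1, 2, 3]
over = reduce(add, xs)            # like +/1 2 3: only the final result
scan = list(accumulate(xs, add))  # like +\1 2 3: every intermediate result
print(over)   # 6
print(scan)   # [1, 3, 6]
```

The last item of a scan is always the over result.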

30
Q

What are the pros and cons of using strings to store data?

A

Pros: They don't need to be enumerated, so they will not bloat the sym file.
They can easily be changed if needed.
Cons: Queries are slower than on symbols. A column of strings cannot be compared with = or ~ without an iterator (e.g. ~\:) or like. They are stored as compound columns, which take up extra space.

31
Q

What is a namespace?

A

A container within the q workspace, used to divide an application into modules, organise code into callable libraries and avoid name clashes.

32
Q

How can you create a namespace?

A

Either assign a variable using dot notation (e.g. .util.x:1 creates the namespace .util), or move into a namespace with \d .util and define variables there.

33
Q

What is meant by protected execution?
(Also the two forms of functional amend)

A

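Using the trap forms of @ and . to catch errors: @[f;x;handler] for a unary function and .[f;(x;y);handler] for a function of several arguments. If f fails, the handler is applied to the error string instead of the call aborting, so you can return a meaningful error response.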
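A Python model of the two trap forms, using try/except as a stand-in for q's error handling. The model only accepts a callable handler; in q a non-function handler value is returned as-is, which is not modelled here. The names trap and trap_n are hypothetical.

```python
def trap(f, x, handler):
    """Model of @[f;x;handler]: apply f to x; if that raises, return
    handler(error message) instead of aborting."""
    try:
        return f(x)
    except Exception as err:
        return handler(str(err))

def trap_n(f, args, handler):
    """Model of .[f;args;handler]: the same for a function of several arguments."""
    try:
        return f(*args)
    except Exception as err:
        return handler(str(err))

print(trap(int, "42", lambda e: None))                 # 42
print(trap(int, "abc", lambda e: "bad input: " + e))   # handler runs instead of aborting
print(trap_n(divmod, (7, 0), lambda e: "error"))       # error
```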

34
Q

How would you calculate VWAP per sym for a table?

A

select vwap: size wavg price by sym from trade.
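What size wavg price computes per sym, written out as a Python model: the size-weighted average price, sum(size*price) divided by sum(size), within each group. The function name vwap_by_sym and the sample trades are hypothetical.

```python
from collections import defaultdict

def vwap_by_sym(trades):
    """Model of select vwap:size wavg price by sym from trade."""
    notional = defaultdict(float)
    volume = defaultdict(float)
    for t in trades:
        notional[t["sym"]] += t["size"] * t["price"]
        volume[t["sym"]] += t["size"]
    return {sym: notional[sym] / volume[sym] for sym in notional}

trade = [{"sym": "IBM", "size": 100, "price": 10.0},
         {"sym": "IBM", "size": 300, "price": 20.0},
         {"sym": "MSFT", "size": 50, "price": 300.0}]
print(vwap_by_sym(trade))   # {'IBM': 17.5, 'MSFT': 300.0}
```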

35
Q

What is the basic syntax for a complete select statement?

A

select <columns> by <groupings> from <table> where <constraints>

36
Q

What does the by column do in qSql?

A

The by phrase contains the key columns of the new table.

37
Q

Describe a typical tick setup

A

Data Feed (data source)
Feed Handler (converts the feed data into q format and publishes it to the TP)
Ticker-Plant (captures and logs incoming data and publishes it to subscribers as q tables)
Log File (data recovery)
RDB (stores the current day's data; writes it to the HDB at end of day)
HDB (loads all data before the current day for access)
Real-Time Subscriber (streaming analytics)
Gateway (single entry point for user queries, routing them to the RDB and/or HDB)

38
Q

Describe how a ticker-plant operates.

A

Data from the data feed is parsed by the feed handler and sent to the TP. The TP logs this data (and, in batch mode, holds it in internal tables) and publishes it to the RDB and other subscribers. The RDB stores that day's data. At end of day the TP rolls over to a new log file and tells its subscribers. The RDB then saves its data to the HDB and clears its tables. The HDB is used to load and query all data saved by the RDB before the current day.

39
Q

What is a chained ticker plant?

A

A nested ticker-plant: one that subscribes to a master ticker-plant and republishes its data to its own subscribers.

40
Q

In what type of language is the feed handler, which receives data from the data feed, usually written?

A

Usually a compiled language (C/Java)

41
Q

What feed-handler function sends data to the ticker-plant?

A

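.u.upd: the feed handler calls .u.upd on the TP with the table name and the data, typically asynchronously.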

42
Q

What does -w do? (tick setup)

A

Sets the workspace memory limit (in MB) of a q process such as the RDB. If the limit is exceeded, q signals wsfull and the process exits (usually set to 4x the expected feed).

43
Q

What would you do if your ticker-plant goes down?

A

A complete mirror tick setup can be used.

44
Q

What would you do if an RDB fails?

A

Restart the RDB: on startup it replays the TP log file and recovers the lost data.

45
Q

What is batching mode? (tick setup)

A

The TP temporarily stores data in its tables and publishes it to the clients in time intervals.

Used for tasks where update speed is not critical and ticker-plant efficiency can be prioritised, e.g. human trading via a GUI.

46
Q

What is zero latency mode? (tick setup)

A

The TP doesn’t store any data, and publishing to clients happens every time a message is received via the .u.upd function.

Used for tasks where update speed is important, e.g. high-frequency trading.

47
Q

How would you choose batching mode? (tick setup)

A

Start the TP with a timer on the command line (-t N, in milliseconds). tick.q checks if[system"t";...]. If a timer is set, it defines the batch-mode versions of .z.ts and .u.upd; otherwise it uses zero-latency mode.

48
Q

How can you check if a log file exists?

A

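.u.l, the handle to the log file. .u.upd only writes to the log when it is set (if[l;...] in tick.q). It stays 0 when the TP is started without a log directory.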

49
Q

In what format does the TP receive data from the feed-handler ?

A

Column oriented lists.

50
Q

How does the TP create a q table from the data it receives?

A

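Logic in the .u.upd function gets the column names of the table from the schema (key flip value t). It then creates a dictionary with these columns as keys and the received lists as values, and flips it into a table (flip f!x).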
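A Python model of that dictionary-then-flip step. In q the result stays column-oriented; the list of row dicts here is only for readability. The function name columns_to_table and the trade schema are hypothetical.

```python
def columns_to_table(col_names, data):
    """Pair the schema's column names with the received column lists (like f!x),
    then flip that dictionary into rows (like flip f!x). A single row of atoms
    becomes a one-row table (like enlist f!x)."""
    if not isinstance(data[0], list):        # one row of atoms
        return [dict(zip(col_names, data))]
    as_dict = dict(zip(col_names, data))     # column name -> list of values
    return [dict(zip(as_dict, values)) for values in zip(*as_dict.values())]

cols = ["time", "sym", "price"]              # hypothetical schema for a trade table
print(columns_to_table(cols, [[1, 2], ["IBM", "MSFT"], [101.0, 330.0]]))
print(columns_to_table(cols, [3, "IBM", 99.0]))
```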

51
Q

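How can you tell if an RTS is acting slow or causing memory to build up?

A

On the TP, the .z.W dictionary shows the open connections (e.g. to each RTS) and the outstanding bytes waiting in the queue for each. A subscriber whose queue keeps growing is not keeping up.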

52
Q

How does the RDB write down data to the HDB?

A

The .Q.hdpf function saves all tables by calling .Q.dpft, clears the tables and sends a reload message to the HDB. Arguments: (HDB port; HDB directory; partition name; field to sort by and apply p# to, usually `sym).

53
Q

How does the RDB tell the HDB to reload itself?

A

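Through .Q.hdpf, called within the RDB's .u.end function. After saving the tables, it connects to the HDB port and sends the HDB a message to reload (\l .).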

54
Q

When querying a HDB, what column should the where clause point to first?

A

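The partition column (date), so q only reads the partitions it needs. Otherwise the query scans every partition, which is slow and may run out of memory.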

55
Q

What is a .d file?

A

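A file in a splayed table's directory that holds the list of column names, in order, so q knows which column files make up the table and how to reconstruct it.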

56
Q

How would you get the file path to a partition in a segmented database?

A

Use .Q.par[dir;partition;table], which reads the segments listed in par.txt and builds the file path from there.

57
Q

What is the virtual column in a table?

A

A hidden column called “i” which specifies the index for each row.

58
Q

Why would you splay a table?

A

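When the full table is too large to fit comfortably in memory. Each column is stored in its own file, so a query only needs to map the columns it uses.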

59
Q

Why partition a table?

A

Years' worth of data would be too large even when split by column, so the table is also split horizontally into partitions, usually by date.

60
Q

Why segment a table?

A

Spread it out over different I/O channels to facilitate parallel data retrieval.

61
Q

How would you save a splayed table?

A

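`:dir/tableName/ set .Q.en[`:dir;table]. .Q.en enumerates the symbol columns against the sym file in dir, and the trailing slash on the target path makes set splay the table.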

62
Q

Explain .u.tick (ticker-plant).

A

Executes .u.init and verifies that all tables on the TP have time and sym as their first two columns.

Applies the grouped attribute to the sym column of every table.

Sets .u.d to the current date.
Executes .u.ld to create the TP log file.

Arguments: name of the schema file and the directory for the log file.

63
Q

Explain .u.init (ticker-plant function).

A

Executed within .u.tick.

Defines .u.t, the list of tables that can be subscribed to.

Defines the dictionary .u.w, which maps each table name to its subscribers (pairs of handle and requested syms).

No arguments

64
Q

Explain .u.ld (ticker-plant function).

A

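Executed within .u.tick and .u.endofday.

Creates the TP log file for the given date if it does not exist, sets .u.i and .u.j to the number of messages already in it, and opens a handle to it.

1 argument: the date (.u.d).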

65
Q

Explain .u.sub. (ticker-plant function).

A

Executed as a result of a synchronous call from an RTS.

Connects an RTS to the TP.

Subscribes it to the specified tables/syms.

Arguments: table name(s); required syms.

The RTS is added to .u.w and receives the empty table schemas. In the same call, the standard RDB also fetches the number of messages (.u.i) and the location (.u.L) of the log file.

The RTS then replays the first .u.i messages to get up to speed.

66
Q

Explain .u.del (ticker-plant function).

A

Executed within .u.sub.

Clears out any existing subscription for the RTS's handle from the subscribed table's entry in .u.w.

Arguments: table name(s) and handle.

67
Q

Explain .u.add (ticker-plant function).

A

Executed within .u.sub

Modifies .u.w with the new subscription.
Returns an empty copy of the relevant table to the new subscriber.

Arguments include table name(s) ; required syms.

68
Q

Explain .u.sel (ticker-plant function).

A

Executed within .u.pub.

Returns the subset of the table for the requested syms (all rows if no syms were requested).

Arguments: table contents and required syms

69
Q

Explain .u.pub (ticker-plant function).

A

Executed within .z.ts if publishing on timer in batch mode or within .u.upd if in zero latency mode.

Publishes out the relevant rows of the input tables to RTS.

Arguments: table name; current contents
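A Python model of how .u.w, .u.sel and .u.pub fit together. It is a sketch, not tick.q: an empty sym list stands in for q's null symbol ("all syms"), and `send` stands in for the asynchronous message to a subscriber handle. The handles, data and function names are hypothetical.

```python
# Model of .u.w: table name -> list of (subscriber handle, requested syms)
subscriptions = {"trade": [(5, []), (7, ["IBM"])]}

def sel(rows, syms):
    """Model of .u.sel: all rows if no syms were requested, else only those syms."""
    if not syms:
        return rows
    return [r for r in rows if r["sym"] in syms]

def pub(table, rows, send):
    """Model of .u.pub: send each subscriber of `table` only its selected rows,
    skipping subscribers for whom the selection is empty."""
    for handle, syms in subscriptions.get(table, []):
        selected = sel(rows, syms)
        if selected:
            send(handle, table, selected)

sent = []
pub("trade", [{"sym": "IBM", "price": 101.0}, {"sym": "MSFT", "price": 330.0}],
    lambda handle, table, rows: sent.append((handle, len(rows))))
print(sent)   # [(5, 2), (7, 1)]
```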

70
Q

Explain .u.end (ticker-plant function).

A

Executed within .u.endofday.

Sends an asynchronous message to each RTS to execute its .u.end function.

Argument: the date that has just ended (yesterday's date).

71
Q

Explain .u.endofday (ticker-plant function).

A

Executed in .u.ts

Calls .u.end, which sends each RTS the message to run its EOD function .u.end.

Increments the current date .u.d.

Closes the handle to the old log file and opens a new log file for the new date (via .u.ld).

No arguments.

72
Q

Explain .u.ts (ticker-plant function).

A

Executed within timer function .z.ts

Checks whether it is past midnight by comparing its argument with .u.d; if so, runs .u.endofday.

Args: the current date.

73
Q

Explain .u.upd in batch mode (ticker-plant function).

A

Executed every time a message is received from feed handler.

Checks if a timestamp is present and adds the current time if not.
Inserts the data into the ticker-plant's local tables.
Checks whether the log handle .u.l is set and, if so, writes the message to the log file.
Increments the journal counter .u.j by one.

Arguments: table name; new data

74
Q

Explain .z.ts in batch mode (ticker-plant function).

A

Executed every n milliseconds.

Publish relevant updates to RTS.

Clear tables and apply grouped attribute to sym column.

Set .u.i value to .u.j

Check if EOD

Arguments: current timestamp

75
Q

Explain .u.upd in zero latency mode (ticker-plant function).

A

Executed every time a message is received.

Publishes via .u.pub.
Calls .u.ts to check for EOD.
Checks if a timestamp is present and adds it if not.

Creates a q table from the list of data received from the feed handler and publishes it to subscribers.

Checks whether the log handle is set and, if so, writes the message to the log file.

Increments the journal counter .u.i by one.

Arguments: Table name; new data
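A Python model contrasting the two .u.upd modes described above. This is a hypothetical sketch, not tick.q itself. Rows are dicts, `publish` stands in for .u.pub and `self.log` for the log file. Logging is always on here, whereas tick.q only logs when a log handle is open, and time.time() merely stands in for the time q adds.

```python
import time

class TickerPlantModel:
    """Batch mode holds rows and counts them in j until the timer publishes;
    zero-latency mode publishes on every update and counts them in i."""

    def __init__(self, batch, publish):
        self.batch = batch        # True when a timer was set on the command line
        self.publish = publish    # callable(table, rows), like .u.pub
        self.log = []             # stands in for the TP log file
        self.tables = {}          # local tables, only filled in batch mode
        self.i = 0                # message count published/confirmed (.u.i)
        self.j = 0                # message count logged in batch mode (.u.j)

    def upd(self, table, rows):
        # add a timestamp to rows that arrive without one
        rows = [r if "time" in r else {"time": time.time(), **r} for r in rows]
        if self.batch:
            self.tables.setdefault(table, []).extend(rows)   # insert locally
            self.log.append(("upd", table, rows))
            self.j += 1
        else:
            self.publish(table, rows)                         # publish straight away
            self.log.append(("upd", table, rows))
            self.i += 1

    def on_timer(self):
        """Batch-mode .z.ts: publish what is held, clear the tables, set i to j."""
        for table, rows in self.tables.items():
            if rows:
                self.publish(table, rows)
        self.tables = {table: [] for table in self.tables}
        self.i = self.j

sent = []
tp = TickerPlantModel(batch=True, publish=lambda t, rows: sent.append((t, len(rows))))
tp.upd("trade", [{"time": 1, "sym": "IBM"}])
tp.upd("trade", [{"time": 2, "sym": "MSFT"}])
print(sent, tp.i, tp.j)   # [] 0 2
tp.on_timer()
print(sent, tp.i, tp.j)   # [('trade', 2)] 2 2
```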

76
Q

Explain .z.ts in zero-latency mode (ticker-plant function).

A

Executed every 1 second (tick.q sets a 1000 ms timer when none was given)
Each execution calls .u.ts to check for EOD
Arguments: current timestamp

77
Q

Explain .u.end (RDB function)

A

Executed at end of day after the ticker-plant's message.

Filters out any tables which don't have the g# attribute applied to their sym columns, remembering the rest so the attribute can be reapplied.

Uses the .Q.hdpf function, which saves all tables by calling .Q.dpft, clears the tables and sends a reload message to the HDB.

Applies the grouped attribute back to the sym column of each of those tables.

Argument: the date (partition) to save.

78
Q

Explain .u.rep (RDB function)

A

Executed on a standalone line when the RDB starts up.

Initialises the TP tables on the RDB as empty.

Uses -11! to replay the TP log file as necessary.

Changes directory (cd) to the top level of the HDB.

Arguments: TP subscription data (table names and empty schemas); .u.i and .u.L.

79
Q

What is .u.L (uppercase L)?

A

Path to the TP log file.

80
Q

What is .u.l (lowercase l)

A

Handle to the log file. Set as end result of .u.ld function.

81
Q

How do you initialise a log file?

A

.u.L set () writes an empty list to the file at path .u.L, creating an empty log file that messages can then be appended to.

82
Q

How does the TP send messages to the log file?

A

The last step of .u.upd checks that the log handle .u.l is set. If it is, .u.upd appends a message made of the function name, the table name and the data to the log file (l enlist(`upd;t;x) in tick.q).

83
Q

What is contained in a log file?

A

Lists, each list with the first item as the function and the remaining items as its arguments.

84
Q

In what format are log file entries?

A

Written in q's binary (serialized) format. When read back, each entry is a 3-item list: the upd function, the table name and the data to be written.

85
Q

How does the RDB replay a log file?

A

-11!

86
Q

How does the RDB use -11! ?

A

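The .u.rep function takes two arguments: the empty TP tables and a two-item list (.u.i;.u.L). After creating the empty tables, it checks whether .u.i is null. If so, there is no log to replay and it returns. Otherwise it runs -11! with the two-item list as its argument, replaying the first .u.i messages.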

87
Q

What are the 3 overloads of -11! ?

A

-11!(logFilePath) replays the whole log file, evaluating each message (i.e. calling upd for each)

-11!(N;logFilePath) replays only the first N messages

-11!(-2;logFilePath) returns the count of messages in the log file without replaying them
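A Python model of log writing and the -11! overloads. It is an illustration under stated assumptions: pickle stands in for q's binary serialization, and corrupt-log detection is not modelled. Each entry is a (function name, table, data) message that replay evaluates by calling the named handler. The file name and all function names are hypothetical.

```python
import os
import pickle
import tempfile

def write_log(path, messages):
    """Append serialized (function name, table, data) messages to a log file."""
    with open(path, "ab") as f:
        for msg in messages:
            pickle.dump(msg, f)

def replay(path, handlers, n=None):
    """Model of -11!: evaluate logged messages in order by calling the named
    handler (e.g. upd), all of them or only the first n as with -11!(n;path).
    Returns the number of messages replayed."""
    count = 0
    with open(path, "rb") as f:
        while n is None or count < n:
            try:
                fname, table, data = pickle.load(f)
            except EOFError:
                break
            handlers[fname](table, data)
            count += 1
    return count

def count_messages(path):
    """Model of -11!(-2;path): count the messages; the no-op handler applies nothing."""
    return replay(path, {"upd": lambda table, data: None})

log_path = os.path.join(tempfile.mkdtemp(), "sym2024.01.01")   # hypothetical log name
write_log(log_path, [("upd", "trade", [1]), ("upd", "trade", [2]), ("upd", "quote", [3])])

rdb = {}
def upd(table, data):
    rdb.setdefault(table, []).extend(data)

print(count_messages(log_path))              # 3
print(replay(log_path, {"upd": upd}, n=2))   # 2
print(rdb)                                   # {'trade': [1, 2]}
```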