KX Interview Questions 03/11/22 Flashcards

1
Q

What is the difference between SQL and kdb?

A

kdb+ is faster than traditional SQL databases at time-series queries because of its column-oriented architecture. qSQL is optimized for time-series work and has more time-based joins, such as the as-of join aj and the window join wj. Time-series queries are usually shorter and simpler in qSQL than in standard SQL. Functional select statements in q also allow queries to be built dynamically and more concisely than readable SQL code. q is column-oriented, whereas traditional relational databases are row-oriented.
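As a rough illustration of one of those time-based joins, the sketch below runs an as-of join (aj) over two tiny tables. The table names, syms, times and prices are made up for the example; for each trade, aj picks the latest quote for the same sym at or before the trade time.

```q
/ hypothetical trade and quote tables for one sym
trade:([] time:09:30:00.000 09:30:02.000; sym:`AAPL`AAPL; price:150.0 150.5)
quote:([] time:09:29:59.000 09:30:01.000; sym:`AAPL`AAPL; bid:149.9 150.4)
/ as-of join on sym and time (time must be the last join column)
r:aj[`sym`time;trade;quote]
r
```

Each trade row gains the bid of the most recent earlier quote, which in SQL would need a noticeably longer query.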

2
Q

What makes KDB so fast?

A

Vector programming (vectors can carry attributes, and the built-in operators are optimised for vectors), columnar storage, parallel computation and a small memory footprint.

3
Q

What is vector programming?

A

A vector is a list of items of the same type. q is built around operating on whole vectors efficiently, and there are many built-in attributes and operators unique to q that make working with vectors more efficient. q stores table columns as vectors, so these efficiencies are also available on tables.
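A minimal sketch of vector operations in q, assuming nothing beyond a fresh q session: arithmetic, aggregation and filtering all act on the whole vector without an explicit loop.

```q
v:1 2 3 4 5          / a long vector: every item has the same type
v*2                  / applied to every item at once: 2 4 6 8 10
sum v                / 15
v where v>2          / filter by a boolean vector: 3 4 5
```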

4
Q

What is columnar storage?

A

Table data in q is broken into columns, and each column is stored and read by the system as a single vector. Many other databases store data row by row instead. Column-oriented storage allows faster queries that only need a few columns, and more efficient compression, since compression algorithms work best on arrays of the same type.
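To see the column layout from inside q, the sketch below (with a hypothetical table t) flips a table into the dictionary of column vectors it is built from and pulls out one column as a single typed vector.

```q
t:([] sym:`a`b`c; px:10.0 20.0 30.0)   / hypothetical table
d:flip t                               / the table as a dictionary: column name -> vector
key d                                  / `sym`px
t`px                                   / the px column as one vector: 10 20 30f
type t`px                              / 9h: a float vector
```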

5
Q

What are parallel computations?

A

Parallel computations are carried out simultaneously: a large problem is broken down into smaller ones that can all be solved at the same time, which speeds up calculations. In q, peach (parallel each) is used to spread this work across multiple cores.
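A small sketch comparing each and peach, with f standing in for any hypothetical CPU-bound function. peach only runs in parallel when q is started with secondary threads (for example q -s 4); otherwise it runs like each, so the results match either way.

```q
f:{sum sqrt til x}             / hypothetical CPU-bound function
a:f each 1000 2000 3000        / each: items processed one after another
b:f peach 1000 2000 3000       / peach: items shared across secondary threads
a~b                            / same result, only the scheduling differs
system "s"                     / number of secondary threads available
```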

6
Q

What is a small memory footprint?

A

kdb+ has a small memory footprint: it does not take up a lot of RAM while running.
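One way to check the footprint from inside a session is .Q.w[], which reports the process's memory statistics in bytes. This is only a sketch; the numbers depend on the machine and on what has been loaded.

```q
m:.Q.w[]        / dictionary of memory statistics for this process, in bytes
m`used`heap     / memory in use, and heap reserved from the OS
```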

7
Q

What is the vanilla tickerplant architecture (tick.q)?

A

Data feed (source) -> Feed handler (converts to q) -> Tickerplant (logs/publishes) -> RDB (current day's data) -> HDB (historical data) -> Gateway (entry point for user queries)

8
Q

What is meant by database maintenance (dbmaint.q)?

A

dbmaint.q contains utility functions for maintaining partitioned tables. It can be used to make schema changes such as adding columns, renaming columns and changing column types.

9
Q

How is data stored in the feed handler?

A

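It converts the data received from the feed (often in a format produced by software written in a compiled language) into column-oriented q lists, ready to be published to the tickerplant.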

10
Q

How is data stored in the ticker plant?

A

It converts the incoming column-oriented lists into q tables whose first two columns are time and sym, with the grouped attribute applied to the sym column, logs each update and publishes it to subscribers.

11
Q

How is data stored in the real time database?

A

It stores the current day's data in memory as q tables whose first two columns are time and sym, with the grouped attribute applied to the sym column.

12
Q

How is data stored in the HDB?

A

It stores the tables on disk as partitioned tables (typically one partition per date), with the parted attribute applied to the sym column.

13
Q

What is a splayed table?

A

Used when a table is too large to hold comfortably in memory as a single object. A table is splayed with `:dir/tablename/ set .Q.en[`:dir;table], where .Q.en enumerates the symbol columns against the sym file in dir. Each column is saved in a separate file with the same name as the column, and each file holds a vector of that column's type. A .d file records the column order.
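A minimal sketch of splaying, assuming a writable /tmp/kxdemo directory (the mkdir call assumes Linux or macOS) and a made-up table t with a symbol column:

```q
system "mkdir -p /tmp/kxdemo"                    / hypothetical database root
t:([] sym:`ibm`msft`ibm; px:101.5 40.2 102.0)    / hypothetical table
/ enumerate sym against /tmp/kxdemo/sym, then splay (trailing slash: one file per column)
`:/tmp/kxdemo/t/ set .Q.en[`:/tmp/kxdemo;t]
key `:/tmp/kxdemo/t                              / .d (column order), px and sym
s:get `:/tmp/kxdemo/t/                           / map the splayed table back in
```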

14
Q

What is a partitioned table?

A

A partitioned database is structured as a series of partitions (for example one directory per date), each of which contains a splayed table of its own.
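The sketch below writes one hypothetical partition with .Q.dpft and then loads the database. The /tmp/kxhdb root, the trade table and the date are assumptions for illustration; .Q.dpft enumerates the symbol columns, sorts the table by the given field (sym) and applies the parted attribute to it.

```q
/ hypothetical table; .Q.dpft needs it as a global, passed by name
trade:([] sym:`msft`ibm`msft; price:40.1 101.5 40.3)
system "mkdir -p /tmp/kxhdb"               / hypothetical HDB root (Linux/macOS)
/ save trade into partition 2022.03.11: enumerated, sorted by sym, `p# on sym
.Q.dpft[`:/tmp/kxhdb;2022.03.11;`sym;`trade]
key `:/tmp/kxhdb/2022.03.11                / `trade: the splayed table in this partition
system "l /tmp/kxhdb"                      / load the HDB (q also changes into this directory)
select from trade where date=2022.03.11
```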

15
Q

What is a segmented table?

A

A segmented database is used when a partitioned database becomes too large to exist on one storage device, so its partitions are spread across different storage locations (segments). Queries can read from several segments in parallel. The sym file and par.txt sit in the root directory.

16
Q

What are the attributes?

A

Grouped, Unique, Parted, Sorted
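A quick sketch of applying each attribute with # and checking it with attr; the lists are made up, and each one satisfies the condition its attribute requires:

```q
g:`g#`b`a`c`a          / grouped: any order
u:`u#`a`b`c            / unique: no repeated items
p:`p#`a`a`b`b`c        / parted: equal items are contiguous
s:`s#1 3 5 7           / sorted: ascending
attr each (g;u;p;s)    / `g`u`p`s
attr 1 2 3             / empty symbol: no attribute
```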

17
Q

When to use the `g# (grouped) attribute?

A

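Use it on lists with no particular structure, such as an in-memory sym column. It builds an internal dictionary that maps each distinct value to the list of positions where it occurs, which speeds up select queries with where clauses on that column. It has a large memory overhead.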
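The group keyword builds the kind of value-to-positions dictionary described above, so it gives a rough picture of what `g# maintains internally; this is a conceptual sketch, not a view of q's internal structure.

```q
syms:`ibm`msft`ibm`aapl`msft
group syms        / `ibm`msft`aapl!(0 2;1 4;,3): each value -> its positions
gs:`g#syms        / same list, with the grouped attribute applied
attr gs           / `g
```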

18
Q

When to use the `u# (unique) attribute?

A

Use it when all items are distinct (no repeated items). It builds an internal hash map, which uses a large amount of memory but speeds up lookups such as find (?) and where clauses on that column.

19
Q

When to use the `p# (parted) attribute?

A

Use it when all occurrences of each value are contiguous, as with the sym column of an on-disk partitioned table. It has a small memory overhead: it keeps an internal map from each distinct value to the position of its first occurrence, so once that position is found, q can read the whole block of matching rows and skip the rest.

20
Q

When to use the `s# (sorted) attribute?

A

Use it on data in ascending order, such as a time column. It has no memory overhead and allows q to use binary search instead of a linear scan.
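A small sketch of binary search on sorted data, using made-up minute times. bin returns the index of the last item less than or equal to its argument (or -1 if there is none).

```q
t:`s#09:30 09:31 09:35 09:40    / sorted minute times
t bin 09:33                      / 1: 09:31 is the last time <= 09:33
t bin 09:29                      / -1: earlier than every item
```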

21
Q

What is enumeration?

A

Enumeration replaces each memory-expensive value, such as a symbol, with a cheaper integer index into a list of the distinct values. The original values can be recovered by indexing into that list, which is saved on disk as the sym file.
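A minimal sketch of enumeration in memory, using a global list named sym as the domain (the name mirrors the sym file but is only an assumption here):

```q
sym:`symbol$()              / empty domain list
e:`sym?`ibm`msft`ibm        / enumerate, appending new values to sym
sym                         / `ibm`msft
`long$e                     / 0 1 0: the indices that are actually stored
value e                     / `ibm`msft`ibm: mapped back through sym
```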

22
Q

What is multithreaded programming, each vs peach?

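A

Multithreaded programming splits work across several threads so that independent pieces run at the same time, usually on different cores. each applies a function to every item of a list, one item after another. peach (parallel each) does the same but shares the items among q's secondary threads, which are enabled with the -s command-line option (for example q -s 4); with no secondary threads, peach runs like each.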
23
Q

What is the purpose of the “.Q” namespace?

A

The .Q namespace contains utility functions for q programming, such as .Q.en (enumerate symbol columns) and .Q.dpft (save a table to a partition).

24
Q

What is the purpose of the “.z” namespace?

A

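The .z namespace contains environment variables and functions, and hooks for callbacks. The callbacks (such as .z.po when a connection opens and .z.pg for synchronous messages) are used with IPC, and variables such as .z.h (host) and .z.u (user) provide system information.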
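A few .z entries, as a sketch; the hook bodies below are hypothetical and just print a message when a client connects or disconnects:

```q
.z.h            / host name, a symbol
.z.u            / user ID, a symbol
.z.P            / local timestamp (.z.p is the UTC equivalent)
/ hypothetical hooks: q calls .z.po when a connection opens, .z.pc when it closes
.z.po:{[h] -1 "connection opened on handle ",string h;}
.z.pc:{[h] -1 "connection closed on handle ",string h;}
```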

25
Q

Name two other namespaces?

A

.j (JSON) and .h (HTTP and markup)

26
Q

What is get used for?

A

Reads or memory-maps a kdb+ file, or returns the value of a variable. Often used to map the columns of a splayed table into memory.

27
Q

What is set used for?

A

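Assigns a value to a global variable, or saves an object to a file or directory. To splay a table, give a directory handle with a trailing slash: `:dir/t/ set t (or set[`:dir/t/;t]); any symbol columns must be enumerated first.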
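A minimal sketch of both uses of set, plus get to read the file back; the /tmp/kxobj path is an assumption:

```q
`v set 42                    / assign the global v, like v:42
v                            / 42
`:/tmp/kxobj set 1 2 3       / serialise a list to a single binary file
get `:/tmp/kxobj             / 1 2 3, read back from the file
```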

28
Q

What is load used for?

A

Loads binary data from a file or directory into a global variable named after the file; load returns that name rather than the data itself. rload is used to load a splayed table from a directory.

29
Q

What is save used for?

A

Saves a global variable to a binary file named after the variable.
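A sketch of save and load together, assuming a writable /tmp and a hypothetical global table kxt (save and load take the variable name from the file name):

```q
kxt:([] a:1 2 3)             / hypothetical global table
save `:/tmp/kxt              / writes kxt to the binary file /tmp/kxt
delete kxt from `.           / remove kxt from the root namespace
load `:/tmp/kxt              / recreates the global kxt from the file
kxt
```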

30
Q

What is 0: ?

A

File text operator. It has five forms:
1. Prepare text: convert a table to delimited strings (delimiter 0: t, e.g. csv 0: t).
2. Save text: write strings to a file (filesymbol 0: strings).
3. Load CSV: parse delimited strings or a delimited file into a table.
4. Load fixed: parse fixed-width text into lists.
5. Key-value pairs: parse a delimited string into keys and values.
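The first three forms can be sketched as a round trip through a CSV file; the table and the /tmp/kxt.csv path are hypothetical. In the load form, "JS" gives the column types (long, symbol) and enlist csv tells q the first line holds the column names.

```q
t:([] a:1 2; b:`x`y)                      / hypothetical table
s:csv 0: t                                / 1. prepare text: ("a,b";"1,x";"2,y")
`:/tmp/kxt.csv 0: s                       / 2. save text: one line per string
u:("JS";enlist csv) 0: `:/tmp/kxt.csv     / 3. load CSV back into a table
u~t
```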

31
Q

What is read0 used for?

A

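Reads text from a file or process handle, returning one string per line. It can also read part of a file: read0 (`:foo;6;5) reads 5 characters starting at offset 6, e.g. "world" if foo contains "hello world".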

32
Q

What is 1: ?

A

File binary operator. Used to read and parse or write bytes.

33
Q

What is read1 used for?

A

Reads raw bytes from a file, returning a byte vector.
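A sketch of writing and reading raw bytes, assuming a writable /tmp; read1 also accepts (file;offset;length) to read part of a file:

```q
`:/tmp/kxbytes 1: 0x0102030405     / write raw bytes to a file
b:read1 `:/tmp/kxbytes              / read them all back: 0x0102030405
read1 (`:/tmp/kxbytes;1;2)          / 2 bytes starting at offset 1: 0x0203
```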

34
Q

How do you check RAM on Linux?

A

Use the free command; free -h shows the figures in human-readable units.

35
Q

How do you manage disk space on Linux?

A

du shows the disk usage of files and directories; df shows the disk space available on each file system.

36
Q

How do you handle unresponsive q/kdb+ processes in linux?

A

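From inside the q session, \\ exits q (a single \ only toggles between q and k mode). For an unresponsive process, Ctrl-C interrupts the running query; if that does not help, find the process ID with ps or top and stop it with kill, using kill -9 as a last resort.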

37
Q

Describe your undergraduate final project.

A
38
Q

Describe your undergraduate degree.

A

My undergraduate degree was in Psychology and Computing. It combined psychological research, general computing and design modules, reflecting the growing importance of human-computer interaction research and of developing technologies that are psychologically rewarding rather than damaging, as the world becomes more technology-driven with developments such as AI and virtual reality. I studied programming, data mining, UX design, UI design, interaction design, usability design and user research, alongside research modules such as research methods and statistical analysis for research. All members of our class gained accreditation from the Psychological Society of Ireland for our research.

39
Q

Describe your masters.

A

My postgraduate degree was in Business Information and Analytical Systems, a highly technical business analytics degree. It included project management modules, business analytics modules, and technical data analytics and programming modules, for which we used R, LINDO and Python. We had a year-long project within the programme in which we built a product demand forecasting system in Python. It used open-source Met Eireann weather forecast data to predict demand for weather-sensitive products such as ice cream, barbecue food, and cold and hot drinks. We also created a full business plan covering the employment, marketing and finance needs for deploying our product.

40
Q

What are some relevant modules you conducted in your undergraduate degree?

A

The technical modules such as 3 years of programming with python, data mining, relational databases. I’m sure the design modules will also be helpful in the future.

41
Q

What are some relevant modules you participated in your postgraduate degree?

A

Programming for data analytics with Python, prescriptive analytics with LINDO and descriptive and predictive analytics with R. We also had an AWS cloud technology module which I will likely use in the future alongside some of the business analytics and project management modules.

42
Q

What are some hobbies you have outside of work?

A

I enjoy playing soccer and going to the gym in my free time. I frequently travel as well whenever I get the chance.

43
Q

What are some technical skills you have in computing?

A

Python, R, LINDO, now kdb+/q, and some SQL.

44
Q

Describe your postgraduate final project.

A

We had a year-long project within the programme in which we built a product demand forecasting system in Python. It used open-source Met Eireann weather forecast data to predict demand for weather-sensitive products such as ice cream, barbecue food, and cold and hot drinks. We also created a full business plan covering the employment, marketing and finance needs for deploying our product.

45
Q

What are the most relevant modules you studied for this position?

A

Business data strategy and management, IS project planning and oversight, descriptive and predictive analytics, prescriptive analytics, programming for data analysis, cloud technologies, and secure data acquisition and management. Also SQL and relational databases, Python programming, and data mining with pandas.

46
Q

How would you rate your skill level in each of the technical skills you mentioned?

A

Python: good. kdb+: nearly as good and still improving. SQL: okay, but it needs revising, as I haven't used it since my first-year SQL module.

47
Q

What question do you want to ask us?

A

What is the role in detail, what kind of projects will I be working on and how big is the team?

48
Q

What is the difference between -7h and 7h?

A

-7h is the type of a long atom; 7h is the type of a long vector (a list of longs).
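The type keyword shows the difference directly; a minimal sketch:

```q
type 42          / -7h: a long atom
type 1 2 3       / 7h: a long vector
type enlist 42   / 7h: a one-item vector is still a vector
```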

49
Q

State and explain the four ways in which data (or tables) can be saved to disk.

A

Flat/simple (the whole table saved as a single binary file)
Splayed (a table directory where each column is saved as its own file)
Partitioned (data split by a value, usually date; each partition directory holds a splayed table)
Segmented (partitions spread across different storage locations, listed in par.txt)
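A sketch of the first two layouts, assuming a writable /tmp and a made-up table with no symbol columns (so no enumeration is needed); the only difference in the set call is the trailing slash:

```q
t:([] a:1 2 3; b:1.5 2.5 3.5)   / hypothetical table, no symbol columns
`:/tmp/kxflat set t             / flat: the whole table in one binary file
`:/tmp/kxsplay/ set t           / splayed: trailing slash, one file per column
key `:/tmp/kxsplay              / lists .d (column order), a and b
```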

50
Q

What is the sym file? Why is it useful?

A

It holds the list of distinct symbols that the enumerated symbol columns refer to. On disk those columns store integer indices into this list, so the sym file is what allows the tables to be mapped back into memory with their original symbol values.

51
Q

What is the difference between a splayed and partitioned table?

A

A splayed table is a table directory containing one file per column. A partitioned table is split across partition directories (for example one per date); each partition directory holds a splayed table with the data for that partition's value or time interval.

52
Q

Why would we segment a database?

A

Time-series databases often grow too large to be stored in one storage location. Segmenting a database stores ranges of partitions in different storage locations, and values from multiple locations can be read at once through parallel computing.

53
Q

What is the par.txt file?

A

par.txt lists the segment directories (one path per line); each segment directory holds a subset of the database's partitions.

54
Q

Create a matrix with two horizontal rows: (1 2 3) and (`a`b`c).

A

m:(1 2 3;`a`b`c)

55
Q

How do you define a function?

A

f:{[params] body}, for example f:{[x;y] x+y}. Without a parameter list, x, y and z are the implicit first three arguments.

56
Q

What is a projection?

A

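A projection is a function called with fewer arguments than it takes. The supplied arguments are fixed and the result is a new function of the remaining ones, so a projection can be used to keep one or more arguments constant.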
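A minimal sketch of function definitions and a projection; add, sq and add10 are hypothetical names:

```q
add:{[x;y] x+y}      / explicit parameter list
sq:{x*x}             / no parameter list: x is the implicit first argument
add10:add[10]        / projection: x fixed at 10, y still to be supplied
add10 5              / 15
add10 1 2 3          / 11 12 13 (y can be a vector)
```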

57
Q

Describe the processes comprising a basic tick architecture.

A

The data feed is the source of the data, usually delivered in a format produced by software written in a compiled language. It is sent to the feed handler, which converts the data into column-oriented q lists. These are sent to the tickerplant, which logs each message, converts the data into q tables with time and sym as their first two columns, and publishes the tables to the RDB, where the current day's data can be accessed and queried in memory. At the end of the day the RDB saves this data to disk, the HDB reloads to pick it up, and the RDB discards it. The HDB serves historical data from on-disk partitioned tables, which clients can query, but not modify, via the gateway.

58
Q

State the arguments comprising a functional select statement.

A

?[t;c;b;a]: t is the table; c is a list of constraints (the where clause); b is a dictionary of grouping columns (the by clause), or 0b for no grouping; a is a dictionary of the aggregates or columns to select (what is selected), or an empty list to select all columns.
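A sketch showing that a functional select with these four arguments matches the equivalent qSQL; the table t and its values are made up:

```q
t:([] sym:`a`b`a; px:1 2 3f)                        / hypothetical table
r1:select total:sum px by sym from t where px>1      / qSQL
/ functional form ?[t;c;b;a]
r2:?[t; enlist (>;`px;1); (enlist `sym)!enlist `sym; (enlist `total)!enlist (sum;`px)]
r1~r2                                                / 1b: same keyed table
```

Building c, b and a as data is what makes functional selects useful when column names are only known at run time.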

59
Q

Describe and explain a parse tree.

A

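A parse tree is the nested-list form of a q expression: a list whose first item is a function and whose remaining items are its arguments, which may themselves be parse trees. parse returns the parse tree of a string and eval evaluates one. Parsing a qSQL statement shows the arguments (with column names as symbols) needed for the equivalent functional select.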
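A minimal sketch with parse and eval (the expressions are arbitrary examples):

```q
p:parse "1+2*3"           / parse tree (+;1;(*;2;3)): function first, then its arguments
eval p                    / 7, since q evaluates right to left: 1+(2*3)
eval (+;1;(*;2;3))        / the same tree written by hand also gives 7
/ the parse tree of a select shows the t, c, b and a arguments of ?[t;c;b;a]
parse "select px from t where sym=`ibm"
```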

60
Q

How do you open an IPC connection between two q processes?

A

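Make one process listen on a port: start it with q -p 5000 on the command line, or run \p 5000 inside the q session. In the other process, open a connection handle with h:hopen `::5000 (or `:host:5000 for a remote machine), send messages through h, and close it with hclose h.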

61
Q

What is the difference between synchronous/asynchronous messaging?

A

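Synchronous messages (h"expression") wait for the remote process to return a result, so they are used when the connecting process needs an output, such as fetching tables or variables. Asynchronous messages are sent on the negative handle (neg[h]"expression"), return immediately and produce no result, so they are used when no output is needed, such as assigning variables or sending tables.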

62
Q

Are there any q processes in a tickerplant that would make use of synchronous (or asynchronous) messaging?

A

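The feedhandler publishes to the tickerplant asynchronously, since it does not need to receive any data back. The tickerplant in turn publishes updates to subscribers such as the RDB asynchronously. The RDB makes a synchronous call when it first subscribes to the tickerplant, because it needs the reply (the table schemas and the tickerplant log details) before it can start processing updates.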

63
Q

State and explain the four attributes that one can apply to tables

A

Grouped (no structure), Unique (no repeated items), Parted (identical items are contiguous but not necessarily sorted) and Sorted (items in ascending order).

64
Q

What is multi-threading?

A

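Parallel computing that runs parts of a task on different threads, and therefore on different cores, to increase performance. In q, secondary threads are started with the -s command-line option and used by peach.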

65
Q

What is .z.u?

A

It returns the user ID as a symbol; inside a message handler it is the user ID of the client that sent the message.

66
Q

Which UNIX command(s) would show you the disk usage?

A

du

67
Q

Which UNIX command(s) would show you available disk space?

A

df

68
Q

Which UNIX command(s) would show you currently running processes in the static case?

A

ps (for example ps aux), which prints a one-off snapshot

69
Q

Which UNIX command would show you currently running processes in the dynamic case?

A

top (or htop), which refreshes continuously

70
Q

Which UNIX command(s) would one use to kill a currently running process?

A

kill <pid> asks the process to terminate; kill -9 <pid> forces it to stop if it does not respond.