Folio Flashcards

Question 1

Q

Folio organizes data into a three-level hierarchy:

Answer

A

Projects: or “projs” are the top-level unit of organization used to group records together (a project typically corresponds to a real-life project). Projects encapsulate a flat list of records; there are no pre-defined tree structures or tables in Folio.
Records: or “recs” are the basic unit of data modeling. Records are essentially associative arrays defined by a flat map of tags.
Tags: tags are the leaf level of the model. A tag is a name/value pair.
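
As a rough mental picture only, the three levels can be sketched with plain Python containers. The project name, ids, and tag names below are illustrative assumptions, and MARKER is a stand-in for a marker value, not Folio's actual representation.

```python
# Hypothetical in-memory picture of the project -> record -> tag hierarchy.
MARKER = "marker"  # stand-in for a marker value (assumption)

project = {
    "name": "demoProj",  # hypothetical project name
    "recs": [            # flat list of records: no trees, no tables
        {"id": "a1", "dis": "Main Site", "site": MARKER},
        {"id": "b2", "dis": "AHU-1", "equip": MARKER, "siteRef": "a1"},
    ],
}

# Each record is a flat map of tags; each tag is a name/value pair.
for rec in project["recs"]:
    for name, value in rec.items():
        print(rec["id"], name, value)
```

Any relationship between records (here the hypothetical siteRef tag) is expressed as a tag value rather than as nesting.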

Question 2

Q

A given SkySpark server hosts one or more projects. Projects are used to group records together into a single “database”. The following features operate at the project level:

Answer

A

Database security (although the user database is shared between projects)
Fresco theming
Snapshots (backup and restore)
Queries and filter pathing

Projects must be named using a legal programmatic name of four characters or longer. Project names of three characters or fewer are reserved for SkyFoundry, as are names such as “skyspark”, “folio”, etc. To take advantage of future cloud support, it is highly recommended to use a unique project name for all your projects.

Question 3

Q

Projects are physically stored on the file system under a directory structured as follows:

Answer

A

{skyspark-home}/
  db/
    projA/
      data/               // mastered data
        proj.diffs        // diff log file
        password.props    // password storage
      bins/               // bins directory
        binXXXX/          // bins sharded into sub-dirs by age
        binXXXY/          // bins sharded into sub-dirs by age
      snapshots/          // default location of snapshot zips
      cache/              // used for various temporary cache files
    projB/ …

Use the Host App page to manage your projects.

Question 4

Q

Records

Answer

A

A record or rec is the basic unit of modeling in the Folio database. Records are defined as a map of tags (name/value pairs). The tags assigned to a record are free-form; you may add, update, or remove tags at any time. Different extensions define tag libraries for modeling data using standard conventions.

Question 5

Q

Tags which have special meaning.

Answer

A

id: all records have a required id tag with a Ref value which uniquely identifies the record
mod: a DateTime timestamp indicating the last time the record was modified; this value is used for concurrency control
dis: display name is an optional tag which should be used on any record that models data an end-user would see; this is the default tag used as the “title” of the record and links to the record
name: the programmatic name of the record; see Naming

Question 6

Q

Tags

Answer

A

Tags are the name/value pairs stored in a record. The name of a tag must follow the standard rules for Naming.

Question 7

Q

The value of the tag is one of the following scalar types:

Answer

A

Marker: indicates the tag is used solely to mark the record; markers are typically used to assign the record into a “type”
NA: indicates not available
Ref: identifier for Folio records
Bool: true or false
Number: 64-bit floating point number with optional Units
Str: unicode string
Uri: universal resource identifier per RFC 3986
Date: standard Fantom date class
Time: standard Fantom hour of day class
DateTime: standard Fantom timestamp with timezone
Bin: binary streamed data stored on file system
Coord: geographic coordinate in latitude/longitude

Note that although Bool is supported, convention is to use presence of a marker tag.

Question 8

Q

Storage

Answer

A

Folio persists data to disk, but operates as an in-memory database. Most records and tags are read from disk on startup and stored in RAM for fast access. This is required to support the real-time nature of sensor data, but it also imposes limits on Folio since most hardware tends to have less RAM than disk space.

Every project, record, and tag is stored in RAM during runtime with the exception of bins (discussed next). As you design your data models for the tag database, you should take care to limit what is stored in memory and utilize bin tags as necessary. For example, the time-series historian uses records/tags for indexing, but bins to actually store the time-series samples.

Persistence is managed using a file called “proj.diffs” for each project. This is a simple text based, append-only log file. As diffs are committed, they are applied in-memory and appended to the log file. During restart, the log file is replayed from beginning to end to reconstruct the state of the database. This design is extremely robust; in the rare case of data corruption the file is easily repaired with a text editor. However the design does require periodic compression of the log file using snapshots.
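
The commit-then-replay cycle described above can be sketched as follows. This is a minimal illustration, not Folio's implementation: the one-JSON-object-per-line encoding and the helper names commit and replay are assumptions, since the actual proj.diffs format is not described here.

```python
import json
import os
import tempfile

def commit(log_path, recs, diff):
    """Apply a diff in memory, then append it to the log as one text line."""
    recs.setdefault(diff["id"], {}).update(diff["changes"])
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(diff) + "\n")

def replay(log_path):
    """Rebuild in-memory state by replaying the log from beginning to end."""
    recs = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            diff = json.loads(line)
            recs.setdefault(diff["id"], {}).update(diff["changes"])
    return recs

# Hypothetical log file in a temp directory (line encoding is an assumption).
log_path = os.path.join(tempfile.mkdtemp(), "proj.diffs")
live = {}
commit(log_path, live, {"id": "a1", "changes": {"dis": "Main Site"}})
commit(log_path, live, {"id": "a1", "changes": {"dis": "HQ", "area": 5000}})

restored = replay(log_path)  # simulates a restart
```

Note that the log keeps both versions of dis even though only the last one matters after replay, which is why such a log needs periodic compression.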

Question 9

Q

Bins

Answer

A

Bins are special tags used to store blobs or files on disk (as opposed to other tags which are stored in-memory during runtime). To future-proof for cloud/grid architectures, only stream access is provided to bins (no random access I/O). However, Folio does support an efficient model for append-only transactions to a bin.

The value of the bin tag in-memory specifies the MIME type of the file. The bin tag must be created before attempting to read/write data to the stream.

Because a bin is just a normal tag, a given record can have multiple bins. The standard tag for storing a normal file is file. But as an example, for an image file you might also store a thumbnail tag on the rec.

Question 10

Q

Naming

Answer

A

Programmatic names are used by Folio for features such as:

as tag names
as Grid column names
as project names
as value of the name tag

Programmatic names use camelCaseNaming as follows:

first char must be ASCII lower case letter: a - z
rest of chars must be ASCII letter, digit, or underscore: a - z, A - Z, 0 - 9, or _
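
The two rules above translate directly into a regular expression. The helper name is_name is hypothetical and only illustrates the rule; it is not a Folio API.

```python
import re

# First char: ASCII lower case letter. Rest: ASCII letters, digits, or "_".
NAME_RE = re.compile(r"[a-z][A-Za-z0-9_]*")

def is_name(s):
    """Return True if s is a legal programmatic name under the rules above."""
    return NAME_RE.fullmatch(s) is not None

for candidate in ("equipRef", "EquipRef", "site_2", "2ndFloor", "site-ref"):
    print(candidate, is_name(candidate))
```

This sketch checks only the character rules; it does not check the additional length and reserved-name constraints that apply to project names.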

The name tag may be used to assign a well-known name to records. A given project has one namespace, so it is illegal to create multiple records with the same name tag value. Named records may be easily accessed by name in addition to their id. Funcs are uniquely named via the name tag.

Question 11

Q

Queries

Answer

A

The APIs for querying a Folio database are based on filters. Filters allow you to construct predicates using basic boolean logic and comparison operators. Filters support pathing: any tag with a Ref value may be traversed using the -> operator during the query operation.

The following types of queries are supported:

readAll: query all the records which match a filter
read: query a single record which matches a filter
readById: convenience for id == xxx
readByName: convenience for name == xxx
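
To make the four query types and pathing concrete, here is a small in-memory sketch. Filters are modeled as plain Python predicates rather than Folio's filter syntax, and the function names, sample records, and the choice to return None when nothing matches are all assumptions for illustration.

```python
MARKER = object()  # stand-in for a marker value (assumption)

recs = {
    "s1": {"id": "s1", "site": MARKER, "dis": "HQ"},
    "e1": {"id": "e1", "equip": MARKER, "siteRef": "s1", "name": "ahu1"},
    "e2": {"id": "e2", "equip": MARKER, "siteRef": "s1"},
}

def resolve(rec, path):
    """Evaluate a tag path such as ["siteRef", "dis"] (i.e. siteRef->dis)."""
    cur = rec
    for i, name in enumerate(path):
        val = cur.get(name)
        if val is None:
            return None
        if i == len(path) - 1:
            return val
        cur = recs.get(val)  # treat the value as a Ref and follow it
        if cur is None:
            return None
    return None

def read_all(pred):
    return [r for r in recs.values() if pred(r)]

def read(pred):
    return next((r for r in recs.values() if pred(r)), None)

def read_by_id(rid):
    return read(lambda r: r["id"] == rid)

def read_by_name(name):
    return read(lambda r: r.get("name") == name)

# Roughly "equip and siteRef->dis == HQ" expressed as a predicate.
equips_at_hq = read_all(
    lambda r: "equip" in r and resolve(r, ["siteRef", "dis"]) == "HQ"
)
```

The last two helpers are written in terms of read to mirror the description of readById and readByName as conveniences over an equality filter.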

Question 12

Q

Indexing

Answer

A

All queries to a Folio project take the form of a predicate Filter which is used to match a set of records. In the simplest case, each record in a project is scanned and checked against the filter for a match. Because the records are stored in RAM this operation is very fast; a run-of-the-mill server can scan roughly 10K records every millisecond. Scan time scales linearly with the number of records in your database.
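
As a back-of-the-envelope check of the linear scaling, the quoted rate of roughly 10K records per millisecond can be turned into an estimate. The rate is only the rough figure given above, so the results are order-of-magnitude estimates, not measurements.

```python
RECS_PER_MS = 10_000  # rough scan rate quoted above for a run-of-the-mill server

def full_scan_ms(num_recs):
    """Estimated duration in milliseconds of an unindexed full scan."""
    return num_recs / RECS_PER_MS

for n in (10_000, 1_000_000, 5_000_000):
    print(n, "recs ->", full_scan_ms(n), "ms")
```

Under this estimate a database of a few million recs lands in the hundreds of milliseconds per full scan.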

Question 13

Q

Query Optimizer

Answer

A

To optimize performance as the number of records grows, Folio will build and maintain a memory-based index for “hot” tags. The query optimizer uses these indexes to avoid scanning every record. Let's take an example query:

site and equipRef==xxxxx

This query has two tags which are used: site and equipRef. In the case of site we only care if a rec has the tag (we don’t care about its value). In the case of equipRef we care if a rec has the tag and it has a specific Ref value. Folio indexing handles both cases: it indexes which records have a tag, but it also sub-indexes the tag values.

The query optimizer will select the best index to use for the scan. If we have 300 recs with the tag site (whatever the value might be) and we have 15 recs with the tag/value pair equipRef==xxxxx, then the query optimizer will choose the smallest index. In this case it will choose the equipRef==xxxxx bucket and we only have to scan 15 items before determining the result.
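
The selection rule can be sketched with two dictionaries, one indexing which recs have a tag and one sub-indexing tag values. The data structures and function names here are assumptions chosen for illustration; only the 300/15 counts come from the example above.

```python
def build_index(recs):
    has_tag, by_val = {}, {}
    for rid, rec in recs.items():
        for name, val in rec.items():
            has_tag.setdefault(name, set()).add(rid)        # recs with the tag
            by_val.setdefault((name, val), set()).add(rid)  # recs per tag value
    return has_tag, by_val

def query(recs, has_tag, by_val, terms):
    """terms is a list of (tag, value); value None means 'has the tag'."""
    buckets = [
        has_tag.get(n, set()) if v is None else by_val.get((n, v), set())
        for n, v in terms
    ]
    smallest = min(buckets, key=len)  # scan only the smallest candidate bucket
    matches = [
        rid for rid in smallest
        if all(n in recs[rid] and (v is None or recs[rid][n] == v)
               for n, v in terms)
    ]
    return sorted(matches), len(smallest)

# Hypothetical data: 300 recs with site, 15 with equipRef == "xxxxx".
recs = {}
for i in range(297):
    recs[f"s{i}"] = {"site": "M"}
for i in range(3):
    recs[f"b{i}"] = {"site": "M", "equipRef": "xxxxx"}
for i in range(12):
    recs[f"p{i}"] = {"equipRef": "xxxxx"}
for i in range(50):
    recs[f"q{i}"] = {"equipRef": "yyyyy"}

has_tag, by_val = build_index(recs)
matches, scanned = query(recs, has_tag, by_val,
                         [("site", None), ("equipRef", "xxxxx")])
```

Every term is still checked against each candidate, so the result is the same as a full scan; only the number of recs examined changes.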

Question 14

Q

Auto Indexing

Answer

A

You do not need to configure anything special to use indexing. Folio always keeps track of what tags are being used by queries. As soon as it detects a “hot” tag, it will automatically build an index for that tag. The current algorithm indexes a tag once it has been used in 100 queries.
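
The hot-tag detection described above amounts to a per-tag usage counter with a threshold. The class below is a hypothetical sketch of that idea; only the threshold of 100 queries comes from the text, and the real algorithm may differ in its details.

```python
from collections import Counter

HOT_THRESHOLD = 100  # a tag is indexed once used in this many queries

class AutoIndexer:
    def __init__(self):
        self.uses = Counter()  # tag name -> number of queries that used it
        self.indexed = set()   # tags for which an index has been built

    def record_query(self, tags):
        for tag in tags:
            self.uses[tag] += 1
            if self.uses[tag] >= HOT_THRESHOLD:
                self.indexed.add(tag)  # a real system would build the index here

auto = AutoIndexer()
for _ in range(99):
    auto.record_query(["site", "equipRef"])
before = set(auto.indexed)    # nothing is hot after 99 queries
auto.record_query(["site"])   # the 100th use of site crosses the threshold
```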

Question 15

Q

Query Tuning

Answer

A

A full scan of a large database with millions of recs might take hundreds of milliseconds. This might be fast enough for populating user interface screens, but when distributed across multiple functions it can add up quickly. So when working with large databases it is important to ensure that hot queries are utilizing the index.

The best way to analyze queries is with the Debug tab in the Folio app. This tab provides a wealth of statistics on tag indexes and the queries being run.

Question 16

Q

The Tag Index section lists statistics on all tags which have been analyzed for optimization. This list includes statistics on tags which have not been indexed yet:

Answer

A

tag: name of the tag used by a query filter
total: number of times this tag has been used in a query where the tag was eligible for index optimization
indexSize: if indexed, the number of recs which have this tag
indexVals: if indexed, the number of unique values of the tag which have been sub-indexed
lastUsed: how long ago was this tag used in a query

Question 17

Q

The Queries section lists statistics on all the queries which have been run for the project since boot time:

Answer

A

pattern: this is the canonical, normalized pattern of the query with variables replaced by “?”
total: total number of times the query pattern has been executed since boot time
avgDur: average time it takes to run this query. In the case of indexed queries, this duration excludes any queries which were run before the index was built
indexUsed: the tag index (or indexes) used by the optimizer for this query. Queries with multiple tags might use different indexes in different cases which is indicated by the percentage
lastUsed: how long ago this query pattern was executed

Question 18

Q

Diffs

Answer

A

Modifications to a Folio database are encapsulated as diffs. Diffs are a set of changes to apply to a record. Diffs work just like a patch file in a version control system. Diffs include the ability to:

add or remove a record
update/add a set of tags on an existing record
remove a set of tags on an existing record (using the special remove value)
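
The three capabilities above can be sketched as a function that patches a dictionary of records. The diff layout (keys id, add, remove, changes) and the REMOVE sentinel are assumptions made for this example, not Folio's actual diff encoding.

```python
REMOVE = object()  # stand-in for the special remove value (assumption)

def apply_diff(recs, diff):
    """Apply one diff to the in-memory map of id -> record."""
    rid = diff["id"]
    if diff.get("remove"):      # remove the whole record
        del recs[rid]
        return
    if diff.get("add"):         # add a new record
        recs[rid] = {"id": rid}
    rec = recs[rid]
    for name, val in diff.get("changes", {}).items():
        if val is REMOVE:
            rec.pop(name, None)  # remove a tag
        else:
            rec[name] = val      # add or update a tag

recs = {}
apply_diff(recs, {"id": "a1", "add": True,
                  "changes": {"dis": "Main", "area": 10}})
apply_diff(recs, {"id": "a1", "changes": {"dis": "HQ", "area": REMOVE}})
after_update = dict(recs["a1"])
apply_diff(recs, {"id": "a1", "remove": True})
```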

Question 19

Q

Transient Diffs

Answer

A

In general when a diff is committed, it is written to the log file for durability. However, if your application has rapidly changing real-time data this can cause serious performance issues. To support real-time data, Folio supports the concept of transient diffs. Transient diffs are applied only to the in-memory representation of the records, but are not serialized to the log file. Transient diffs do not update the mod tag of the record. Transient diffs are allowed to be committed even when the project is in readonly mode.

Question 20

Q

Transient Recs

Answer

A

Folio also supports transient recs which are records never serialized to disk. Transient records are defined by specifying the transient marker tag during creation. Diffs applied to transient records are never written to disk regardless of whether the diff itself is marked transient.

Question 21

Q

Trash

Answer

A

Records are moved into the trash bin by adding the trash marker tag. Trash recs continue to operate in the database just like any other record with two exceptions:

trash recs are not included in filtered queries unless the filter explicitly includes the trash tag
trash recs and their bins are skipped during snapshot backups
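
The first exception can be illustrated with a simplified query helper in which a filter is just a list of tag names that must be present. That filter model, the helper name, and the "M" marker stand-in are assumptions for this sketch.

```python
recs = [
    {"id": "a", "site": "M"},
    {"id": "b", "site": "M", "trash": "M"},  # moved to the trash
]

def read_all(recs, tags):
    """Return recs having every tag in tags, hiding trash unless asked for."""
    include_trash = "trash" in tags  # filter explicitly mentions trash
    return [
        r for r in recs
        if all(t in r for t in tags) and (include_trash or "trash" not in r)
    ]

visible = read_all(recs, ["site"])
trashed = read_all(recs, ["site", "trash"])
```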

Question 22

Q

Concurrency Control

Answer

A

All records support the required mod tag indicating the timestamp of their last persistent modification. This timestamp is used to implement optimistic concurrency control. This model allows queries and diffs to operate without explicit locking. When a diff is constructed, it is passed the version of the rec which was read. If during the diff commit the database detects that the record has been modified since the last read, then the commit fails with a ConcurrentChangeErr.

Diffs support the ability to force a commit to bypass concurrency control. This is typically used when updating status tags under complete control of a given application. Transient diffs do not update the mod tag; however, unless the force flag is used they are still checked for concurrent change.
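
The interaction between mod, the force flag, and transient diffs can be sketched as below. The commit function, its parameters, and the integer counter standing in for a DateTime mod value are assumptions for illustration; only the ConcurrentChangeErr name comes from the text.

```python
import itertools

class ConcurrentChangeErr(Exception):
    pass

_clock = itertools.count(1)  # increasing stand-in for a DateTime mod value

def commit(db, old_rec, changes, force=False, transient=False):
    """Commit changes against the version of the rec that was read."""
    cur = db[old_rec["id"]]
    if not force and cur["mod"] != old_rec["mod"]:
        raise ConcurrentChangeErr(old_rec["id"])  # rec changed since the read
    cur.update(changes)
    if not transient:
        cur["mod"] = next(_clock)  # only persistent diffs update mod
    return dict(cur)

db = {"a": {"id": "a", "mod": 0, "status": "ok"}}
first = dict(db["a"])   # two clients read the same version
second = dict(db["a"])

commit(db, first, {"dis": "AHU-1"})        # succeeds, mod becomes 1
try:
    commit(db, second, {"dis": "AHU-2"})   # stale read, must fail
    stale_failed = False
except ConcurrentChangeErr:
    stale_failed = True

# A forced transient diff skips the check and leaves mod untouched.
commit(db, second, {"status": "down"}, force=True, transient=True)
```

No locks are taken anywhere in this sketch; a conflicting writer is detected only at commit time, which is what makes the scheme optimistic.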

Question 23

Q

Compaction

Answer

A

You can run a compaction operation on a folio project to compress the “proj.diffs” file. This can result in a smaller file size and speed up load time on restart. During a compaction operation the project is set into read-only mode.

Fantom API: Proj.compact
Axon API: folioCompact

User must be su to perform a compaction.

Question 24

Q

Snapshots

Answer

A

Folio supports the ability to take a snapshot of a project during runtime. A snapshot is a zip file which includes an atomic backup of the records, tags, and all the bin files. During a snapshot, the project is set into read-only mode. Any diffs committed or attempts to write to a bin during a snapshot operation will fail.

Fantom API: Proj.snapshot
Axon API: folioSnapshot

User must be su to perform a snapshot or restore.

Question 25

Q

Proj Meta

Answer

A

Every Folio project should have exactly one rec with the projMeta tag. This record is used to store project-wide settings. The following tags may be configured on the projMeta rec:

dis: display name for the project if you wish to use a string other than the project’s programmatic name
doc: summary string for the project
dateSpanDefault: string encoding of DateSpan used to configure the default date span used by Fresco apps
siteUri: URI value used to configure the public HTTP address of the SkySpark server
logoutRedirect: URI value to specify where the user is redirected after logging out of the project

Many of these tags can be configured in the SettingApp, although some may require manual editing using the FolioApp.

In addition to the tags above, the system automatically maintains a version tag on the projMeta record (do not modify this tag). The projMeta rec is also used to configure tuning parameters for the Folio database.

Question 26

Q