Folio Flashcards

1
Q

Folio organizes data into a three-level hierarchy:

A
  • Projects: “projs” are the top-level unit of organization used to group records together (a project typically corresponds to a real-life project). A project encapsulates a flat list of records; there are no predefined tree structures or tables in Folio.
  • Records: “recs” are the basic unit of data modeling. A record is essentially an associative array defined by a flat map of tags.
  • Tags: tags are the leaf level of the model. A tag is a name/value pair.
2
Q

A given SkySpark server hosts one or more projects. Projects are used to group records together into a single “database”. The following features operate at the project level:

A
  • Database security (although the user database is shared between projects)
  • Fresco theming
  • Snapshots (backup and restore)
  • Queries and filter pathing

Projects must be named using a legal programmatic name that is four characters or longer. Names of three characters or fewer are reserved for SkyFoundry, as are names such as “skyspark”, “folio”, etc. To take advantage of future cloud support, it is highly recommended to use a unique project name for all your projects.

3
Q

Projects are physically stored on the file system under a directory structured as follows:

A

{skyspark-home}/
  db/
    projA/
      data/               // mastered data
        proj.diffs        // diff log file
        password.props    // password storage
      bins/               // bins directory
        binXXXX/          // bins sharded into sub-dirs by age
        binXXXY/          // bins sharded into sub-dirs by age
      snapshots/          // default location of snapshot zips
      cache/              // used for various temporary cache files
    projB/ …

Use the Host App page to manage your projects.

4
Q

Records

A

A record, or rec, is the basic unit of modeling in the Folio database. Records are defined as a map of tags (name/value pairs). The tags assigned to a record are free-form; you may add, update, or remove tags at any time. Different extensions define tag libraries for modeling data using standard conventions.

5
Q

Tags which have special meaning.

A
  • id: all records have a required id tag with a Ref value which uniquely identifies the record
  • mod: a DateTime timestamp indicating the last time the record was modified; this value is used for concurrency control
  • dis: display name is an optional tag which should be used on any record that models data an end-user would see; this is the default tag used as the “title” of the record and links to the record
  • name: the programmatic name of the record; see Naming
6
Q

Tags

A

Tags are the name/value pairs stored in a record. The name of a tag must follow the standard rules for Naming.

7
Q

The value of the tag is one of the following scalar types:

A
  • Marker: indicates the tag is used solely to mark the record; markers are typically used to assign the record into a “type”
  • NA: indicates not available
  • Ref: identifier for Folio records
  • Bool: true or false
  • Number: 64-bit floating point number with optional Units
  • Str: unicode string
  • Uri: universal resource identifier per RFC 3986
  • Date: standard Fantom date class
  • Time: standard Fantom hour of day class
  • DateTime: standard Fantom timestamp with timezone
  • Bin: binary streamed data stored on file system
  • Coord: geographic coordinate in latitude/longitude

Note that although Bool is supported, convention is to use presence of a marker tag.

8
Q

Storage

A

Folio persists data to disk, but operates as an in-memory database. Most records and tags are read from disk on startup and stored in RAM for fast access. This is required to support the real-time nature of sensor data, but it also imposes limits on Folio, since most hardware tends to have less RAM than disk space.

Every project, record, and tag is stored in RAM during runtime, with the exception of bins (discussed next). As you design your data models for the tag database, you should take care to limit what is stored in memory and use bin tags as necessary. For example, the time-series historian uses records/tags for indexing, but bins to actually store the time-series samples.

Persistence is managed using a file called “proj.diffs” for each project. This is a simple, text-based, append-only log file. As diffs are committed, they are applied in memory and appended to the log file. During restart, the log file is replayed from beginning to end to reconstruct the state of the database. This design is extremely robust; in the rare case of data corruption the file is easily repaired with a text editor. However, the design does require periodic compression of the log file using snapshots.
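
The commit-then-replay cycle described above can be illustrated with a minimal Python sketch. This is not Folio's implementation: the JSON-lines log format, the function names, and the use of None to mean "remove this tag" are all assumptions made for illustration, and the real “proj.diffs” format is not shown in this deck.

```python
import io
import json

def apply_diff(db, rec_id, changes):
    """Apply one set of tag changes to the in-memory record map."""
    rec = db.setdefault(rec_id, {})
    for tag, val in changes.items():
        if val is None:          # assumption: None stands in for "remove tag"
            rec.pop(tag, None)
        else:
            rec[tag] = val

def commit(db, log, rec_id, changes):
    """Apply the diff in memory, then append it to the append-only log."""
    apply_diff(db, rec_id, changes)
    log.write(json.dumps({"id": rec_id, "changes": changes}) + "\n")

def replay(log_text):
    """Rebuild the database by replaying the log from beginning to end."""
    db = {}
    for line in log_text.splitlines():
        if line.strip():
            entry = json.loads(line)
            apply_diff(db, entry["id"], entry["changes"])
    return db

# An in-memory text buffer stands in for the log file on disk.
log = io.StringIO()
live = {}
commit(live, log, "a1", {"dis": "Site A", "area": 1200})
commit(live, log, "a1", {"dis": "Site Alpha"})
commit(live, log, "a1", {"area": None})

restored = replay(log.getvalue())
```

Because the log only ever grows, three commits to the same record leave three lines in the log even though the final record holds a single tag, which is why a log of this kind needs periodic compression.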

9
Q

Bins

A

Bins are special tags used to store blobs or files on disk (as opposed to other tags which are stored in-memory during runtime). For future proofing for cloud/grid architectures, only stream access is provided to bins (no random access I/O). However, Folio does support an efficient model for append only transactions to a bin.

The value of the bin tag in-memory specifies the MIME type of the file. The bin tag must be created before attempting to read/write data to the stream.

Because a bin is just a normal tag, a given record can have multiple bins. The standard tag for storing a normal file is file. But as an example, for an image file you might also store a thumbnail tag on the rec.

10
Q

Naming

A

Programmatic names are used by Folio for features such as:

  • as tag names
  • as Grid column names
  • as project names
  • as value of the name tag

Programmatic names use camelCaseNaming as follows:

  • first char must be ASCII lower case letter: a - z
  • rest of chars must be ASCII letter, digit, or underscore: a - z, A - Z, 0 - 9, or _

The name tag may be used to assign a well-known name to records. A given project has one namespace, so it is illegal to create multiple records with the same name tag value. Named records may be easily accessed by name in addition to their id. Funcs are uniquely named via the name tag.
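
The two character rules above translate directly into a regular expression. The following Python check is only an illustration of those rules; the function name is hypothetical and is not part of any Folio API.

```python
import re

# First char: ASCII lower case letter. Rest: ASCII letters, digits, or "_".
NAME_PATTERN = re.compile(r"[a-z][A-Za-z0-9_]*")

def is_programmatic_name(name: str) -> bool:
    """Return True if name satisfies the naming rules listed above."""
    return NAME_PATTERN.fullmatch(name) is not None

print(is_programmatic_name("equipRef"))  # True
print(is_programmatic_name("EquipRef"))  # False: starts with upper case
print(is_programmatic_name("site-1"))    # False: "-" is not allowed
```

Note that this checks only the character rules; the additional four-character minimum for project names would need a separate length check.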

11
Q

Queries

A

The APIs for querying a Folio database are based on filters. Filters allow you to construct predicates using basic boolean logic and comparison operators. Filters support pathing: any tag with a Ref value may be traversed using the -> operator during the query operation.

The following types of queries are supported:

  • readAll: query all the records which match a filter
  • read: query a single record which matches a filter
  • readById: convenience for id == xxx
  • readByName: convenience for name == xxx
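
To make pathing concrete, here is a small Python sketch of how a path such as equipRef->siteRef->dis could be resolved by following Ref values from record to record. It is an illustration only: the Ref class, the resolve and read_all helpers, and the siteRef tag are assumptions for this example, not Folio's actual filter engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """Stand-in for a Folio Ref value pointing at another record."""
    id: str

def resolve(db, rec, path):
    """Follow a list of tag names, dereferencing a Ref between each step."""
    val = rec
    for i, tag in enumerate(path):
        if not isinstance(val, dict) or tag not in val:
            return None
        val = val[tag]
        if i < len(path) - 1:            # intermediate steps must be Refs
            if not isinstance(val, Ref):
                return None
            val = db.get(val.id)
    return val

def read_all(db, predicate):
    """Return every record that matches the predicate."""
    return [rec for rec in db.values() if predicate(rec)]

db = {
    "s1": {"id": Ref("s1"), "dis": "HQ"},
    "e1": {"id": Ref("e1"), "dis": "AHU-1", "siteRef": Ref("s1")},
    "p1": {"id": Ref("p1"), "dis": "Temp", "equipRef": Ref("e1")},
}

# Roughly: equipRef->siteRef->dis == "HQ"
path = ["equipRef", "siteRef", "dis"]
matches = read_all(db, lambda r: resolve(db, r, path) == "HQ")
```

Only the point record p1 matches here, because it is the only record whose equipRef leads to an equip whose siteRef leads to a site with the display name "HQ".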
12
Q

Indexing

A

All queries to a Folio project take the form of a predicate Filter which is used to match a set of records. In the simplest case, each record in a project is scanned and checked against the filter for a match. Because the records are stored in RAM, this operation is very fast; a run-of-the-mill server can scan roughly 10K records every millisecond. This time scales up linearly with the size of your database.

13
Q

Query Optimizer

A

To optimize performance as the number of records grows, Folio will build and maintain a memory-based index for “hot” tags. The query optimizer uses these indexes to avoid scanning every record. Let's take an example query:

site and equipRef==xxxxx

This query has two tags which are used: site and equipRef. In the case of site we only care if a rec has the tag (we don’t care about its value). In the case of equipRef we care if a rec has the tag and it has a specific Ref value. Folio indexing handles both cases: it indexes which records have a tag, but it also sub-indexes the tag values.

The query optimizer will select the best index to use for the scan. If we have 300 recs with the tag site (whatever the value might be) and we have 15 recs with the tag/value pair equipRef==xxxx, then the query optimizer will choose the smallest index. In this case it will choose the equipRef==xxxx bucket and we only have to scan 15 items before determining the result.
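
The selection step can be sketched in Python using the same numbers as the example: 300 recs with site and 15 recs with one particular equipRef value. The index layout (a set of ids per tag plus a value-to-ids sub-index) and all function names are assumptions for illustration, not Folio's internal data structures.

```python
def build_index(recs, tag):
    """Index which recs have a tag, and sub-index them by tag value."""
    has_tag, by_val = set(), {}
    for rec_id, rec in recs.items():
        if tag in rec:
            has_tag.add(rec_id)
            by_val.setdefault(rec[tag], set()).add(rec_id)
    return has_tag, by_val

def query(recs, indexes, marker_tag, ref_tag, ref_val):
    """Evaluate `marker_tag and ref_tag==ref_val` using the smallest index."""
    with_marker = indexes[marker_tag][0]
    with_value = indexes[ref_tag][1].get(ref_val, set())
    candidates = min(with_marker, with_value, key=len)
    # Every candidate is still checked against the full filter.
    matches = [i for i in candidates
               if marker_tag in recs[i] and recs[i].get(ref_tag) == ref_val]
    return sorted(matches), len(candidates)

# Sample data: "M" stands in for a marker value, "e1"/"e2" for Ref values.
recs = {i: {"site": "M"} for i in range(300)}
for i in range(290, 305):
    recs.setdefault(i, {})["equipRef"] = "e1"
for i in range(305, 400):
    recs[i] = {"equipRef": "e2"}

indexes = {tag: build_index(recs, tag) for tag in ("site", "equipRef")}
matches, scanned = query(recs, indexes, "site", "equipRef", "e1")
```

With this data the site index holds 300 ids and the equipRef "e1" bucket holds 15, so only 15 candidates are scanned instead of all 400 records.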

14
Q

Auto Indexing

A

You do not need to configure anything special to use indexing. Folio always keeps track of what tags are being used by queries. As soon as it detects a “hot” tag, it will automatically build an index for that tag. The current algorithm indexes a tag once it has been used in 100 queries.
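
A usage counter is one simple way to picture this behavior. The Python sketch below is an assumption-laden illustration: the class and method names are hypothetical, and only the threshold of 100 queries comes from the text above.

```python
from collections import Counter

HOT_THRESHOLD = 100  # from the text: indexed once used in 100 queries

class TagUsage:
    """Tracks how often each tag appears in queries and flags hot tags."""

    def __init__(self):
        self.uses = Counter()
        self.indexed = set()

    def record_query(self, tags):
        """Count one query per distinct tag; mark a tag hot at the threshold."""
        for tag in set(tags):
            self.uses[tag] += 1
            if self.uses[tag] >= HOT_THRESHOLD:
                self.indexed.add(tag)  # a real system would build the index here

usage = TagUsage()
for _ in range(100):
    usage.record_query(["site", "equipRef"])
usage.record_query(["point"])
```

After this run, site and equipRef have crossed the threshold and point, used once, has not.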

15
Q

Query Tuning

A

A full scan of a large database with millions of recs might take hundreds of milliseconds. This might be fast enough for populating user interface screens, but when distributed across multiple functions it can add up quickly. So when working with large databases, it is important to ensure that hot queries are using the index.

The best way to analyze queries is with the Debug tab in the Folio app. This tab provides a wealth of statistics on tag indexes and the queries being run.

16
Q

The Tag Index section lists statistics on all tags which have been analyzed for optimization. This list includes statistics on tags which have not been indexed yet:

A
  • tag: name of the tag used by a query filter
  • total: number of times this tag has been used in a query where the tag was eligible for index optimization
  • indexSize: if indexed, the number of recs which have this tag
  • indexVals: if indexed, the number of unique values of the tag which have been sub-indexed
  • lastUsed: how long ago was this tag used in a query
17
Q

The Queries section lists statistics on all the queries which have been run for the project since boot time:

A
  • pattern: this is the canonical, normalized pattern of the query with variables replaced by “?”
  • total: total number of times the query pattern has been executed since boot time
  • avgDur: average time it takes to run this query. In the case of indexed queries, this duration excludes any queries which were run before the index was built
  • indexUsed: the tag index (or indexes) used by the optimizer for this query. Queries with multiple tags might use different indexes in different cases which is indicated by the percentage
  • lastUsed: how long ago this query pattern was executed
18
Q

Diffs

A

Modifications to a Folio database are encapsulated as diffs. A diff is a set of changes to apply to a record. Diffs work just like a patch file in a version control system. Diffs include the ability to:

  • add or remove a record
  • update/add a set of tags on an existing record
  • remove a set of tags on an existing record (using the special remove value)
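
The three abilities above can be sketched as a single Python function that patches a dictionary of records. This is a hedged illustration: the REMOVE sentinel stands in for the special remove value, and the function signature and its add/remove flags are assumptions, not the Fantom or Axon diff API.

```python
REMOVE = object()  # stand-in for the special remove value

def apply_diff(db, rec_id, changes=None, *, add=False, remove=False):
    """Patch db in place: add or remove a record, then set or remove tags."""
    if add:
        if rec_id in db:
            raise KeyError(f"record already exists: {rec_id}")
        db[rec_id] = {"id": rec_id}
    if remove:
        del db[rec_id]
        return
    rec = db[rec_id]
    for tag, val in (changes or {}).items():
        if val is REMOVE:
            rec.pop(tag, None)   # remove the tag if present
        else:
            rec[tag] = val       # add or update the tag

db = {}
apply_diff(db, "a1", {"dis": "AHU-1", "floor": 2}, add=True)  # add a record
apply_diff(db, "a1", {"dis": "AHU-01", "zone": "north"})      # update/add tags
apply_diff(db, "a1", {"floor": REMOVE})                       # remove a tag
snapshot = dict(db["a1"])
apply_diff(db, "a1", remove=True)                             # remove the record
```

Like a patch file, each diff names only what changes; tags not mentioned in the diff are left untouched.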
19
Q

Transient Diffs

A

In general, when a diff is committed, it is written to the log file for durability. However, if your application has rapidly changing real-time data, this can cause serious performance issues. To support real-time data, Folio supports the concept of transient diffs. Transient diffs are applied only to the in-memory representation of the records, and are not serialized to the log file. Transient diffs do not update the mod tag of the record. Transient diffs are allowed to be committed even when the project is in read-only mode.

20
Q

Transient Recs

A

Folio also supports transient recs, which are records never serialized to disk. Transient records are defined by specifying the transient marker tag during creation. Diffs applied to transient records are never written to disk, regardless of whether the diff itself is marked transient.

21
Q

Trash

A

Records are moved into the trash bin by adding the trash marker tag. Trash recs continue to operate in the database just like any other record with two exceptions:

  • trash recs are not included in filtered queries unless the filter explicitly includes the trash tag
  • trash recs and their bins are skipped during snapshot backups
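
The first exception can be pictured with a short Python sketch. Representing a filter as a predicate plus the set of tag names it mentions is an assumption made for this example; it is not how Folio parses filters.

```python
def read_all(recs, predicate, filter_tags):
    """Match recs, hiding trash unless the filter itself mentions trash."""
    include_trash = "trash" in filter_tags
    return [r for r in recs
            if (include_trash or "trash" not in r) and predicate(r)]

recs = [
    {"id": "a", "site": "M"},
    {"id": "b", "site": "M", "trash": "M"},  # "M" stands in for a marker
]

# Filter: site
live = read_all(recs, lambda r: "site" in r, {"site"})
# Filter: site and trash
trashed = read_all(recs, lambda r: "site" in r and "trash" in r,
                   {"site", "trash"})
```

The plain site query returns only record a, while the query that names trash explicitly returns record b.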
22
Q

Concurrency Control

A

All records support the required mod tag indicating the timestamp of their last persistent modification. This timestamp is used to implement optimistic concurrency control. This model allows queries and diffs to operate without explicit locking. When a diff is constructed, it is passed the version of the rec which was read. If during the diff commit the database detects that the record has been modified since the last read, then the commit fails with a ConcurrentChangeErr.

Diffs support the ability to force a commit to bypass concurrency control. This is typically used when updating status tags under complete control of a given application. Transient diffs do not update the mod tag; however, unless the force flag is used, they are still checked for concurrent change.
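
The check-on-commit behavior can be sketched in Python as follows. This is an illustration under stated assumptions: the commit function and its force/transient flags are hypothetical, an integer counter stands in for DateTime timestamps, and only the ConcurrentChangeErr name comes from the text above.

```python
import itertools

class ConcurrentChangeErr(Exception):
    """Raised when a rec changed after the version the diff was built from."""

_clock = itertools.count(1)  # stand-in for real DateTime timestamps

def commit(db, old_rec, changes, *, force=False, transient=False):
    """Apply changes if old_rec is still current (or force is set)."""
    cur = db[old_rec["id"]]
    if not force and cur["mod"] != old_rec["mod"]:
        raise ConcurrentChangeErr(old_rec["id"])
    cur.update(changes)
    if not transient:
        cur["mod"] = next(_clock)  # only persistent diffs bump mod
    return dict(cur)               # caller gets a copy of the new version

db = {"a": {"id": "a", "mod": 0, "dis": "old"}}
read1 = dict(db["a"])            # two readers see the same version
read2 = dict(db["a"])

commit(db, read1, {"dis": "new"})            # succeeds and bumps mod
try:
    commit(db, read2, {"dis": "stale"})      # read2 is now out of date
    stale_rejected = False
except ConcurrentChangeErr:
    stale_rejected = True
```

No lock is held between the read and the commit; the stale writer simply fails and would need to re-read the record before retrying.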

23
Q

Compaction

A

You can run a compaction operation on a Folio project to compress the “proj.diffs” file. This can result in a smaller file size and speed up load time on restart. During a compaction operation the project is set into read-only mode.

  • Fantom API: Proj.compact
  • Axon API: folioCompact

User must be su to perform a compaction.

24
Q

Snapshots

A

Folio supports the ability to take a snapshot of a project during runtime. A snapshot is a zip file which includes an atomic backup of the records, tags, and all the bin files. During a snapshot, the project is set into read-only mode. Any diffs committed or attempts to write to a bin during a snapshot operation will fail.

  • Fantom API: Proj.snapshot
  • Axon API: folioSnapshot

User must be su to perform a snapshot or restore.

25
Q

Proj Meta

A

Every Folio project should have exactly one rec with the projMeta tag. This record is used to store project-wide settings. The following tags may be configured on the projMeta rec:

  • dis: display name for the project if you wish to use a string other than the project’s programmatic name
  • doc: summary string for the project
  • dateSpanDefault: string encoding of DateSpan used to configure the default date span used by Fresco apps
  • siteUri: URI value used to configure the public HTTP address of the SkySpark server
  • logoutRedirect: URI value to specify where the user is redirected after logging out of the project

Many of these tags can be configured in the SettingApp, although some may require manual editing using the FolioApp.

In addition to the tags above, the system automatically maintains a version tag on the projMeta record (do not modify this tag). The projMeta rec is also used to configure tuning parameters for the Folio database.
