Folio Flashcards
Folio organizes data into a three-level hierarchy:
- Projects: projects, or “projs”, are the top-level unit of organization used to group records together (a project typically corresponds to a real-life project). A project encapsulates a flat list of records; there are no predefined tree structures or tables in Folio.
- Records: records, or “recs”, are the basic unit of data modeling. A record is essentially an associative array defined by a flat map of tags.
- Tags: tags are the leaf level of the model. A tag is a name/value pair.
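As a rough illustration of the three levels, the hierarchy can be modeled with plain Python dictionaries. This is only a sketch of the data shape, not Folio's API; the project name, ids, and tag values are made up, and the MARKER constant is a stand-in for Folio's Marker value.

```python
# Hypothetical sketch: a project is a flat list of records,
# and each record is a flat map of tag name -> tag value.
MARKER = "marker"  # stand-in for Folio's Marker value (assumption)

project = {
    "name": "demoProj",  # made-up project name
    "recs": [
        {"id": "@a1", "dis": "Main Campus", "site": MARKER},
        {"id": "@b2", "dis": "AHU-1", "equip": MARKER, "siteRef": "@a1"},
    ],
}

# Records are flat: there are no nested tables or trees, only tags.
for rec in project["recs"]:
    tag_names = sorted(rec)
    print(rec["id"], tag_names)
```

Relationships between records are expressed with Ref-valued tags (here the hypothetical siteRef) rather than by nesting.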
A given SkySpark server hosts one or more projects. Projects are used to group records together into a single “database”. The following features operate at the project level:
- Database security (although the user database is shared between projects)
- Fresco theming
- Snapshots (backup and restore)
- Queries and filter pathing
Project names must be legal programmatic names (see Naming) and must be four characters or longer. Names of three characters or fewer are reserved for SkyFoundry, as are names such as “skyspark”, “folio”, etc. To take advantage of future cloud support, it is highly recommended to use a unique name for each of your projects.
Projects are physically stored on the file system under a directory structured as follows:
{skyspark-home}/
  db/
    projA/
      data/               // mastered data
        proj.diffs        // diff log file
        password.props    // password storage
      bins/               // bins directory
        binXXXX/          // bins sharded into sub-dirs by age
        binXXXY/          // bins sharded into sub-dirs by age
      snapshots/          // default location of snapshot zips
      cache/              // used for various temporary cache files
    projB/ …
Use the Host App page to manage your projects.
Records
A record, or rec, is the basic unit of modeling in the Folio database. Records are defined as a map of tags (name/value pairs). The tags assigned to a record are free-form; you may add, update, or remove tags at any time. Different extensions define tag libraries for modeling data using standard conventions.
The following tags have special meaning:
- id: all records have a required id tag with a Ref value which uniquely identifies the record
- mod: a DateTime timestamp indicating the last time the record was modified; this value is used for concurrency control
- dis: display name is an optional tag which should be used on any record that models data an end-user would see; this is the default tag used as the “title” of the record and links to the record
- name: the programmatic name of the record; see Naming
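The role of mod in concurrency control can be pictured as an optimistic check: an update is accepted only if the caller's copy carries the same mod as the stored record. The sketch below is an assumption about how such a check might work, not Folio's actual commit logic; commit, StaleRecordError, and the in-memory db dict are hypothetical names.

```python
from datetime import datetime, timezone

class StaleRecordError(Exception):
    """Raised when the caller's copy of a record is out of date."""

def commit(db, rec_id, expected_mod, changes):
    """Apply tag changes only if the stored mod matches expected_mod."""
    stored = db[rec_id]
    if stored["mod"] != expected_mod:
        raise StaleRecordError(rec_id)
    # Merge the changes and stamp a new modification time.
    updated = {**stored, **changes, "mod": datetime.now(timezone.utc)}
    db[rec_id] = updated
    return updated

t0 = datetime(2020, 1, 1, tzinfo=timezone.utc)
db = {"@a1": {"id": "@a1", "dis": "Old name", "mod": t0}}

fresh = commit(db, "@a1", t0, {"dis": "New name"})  # accepted: mod matches
rejected = False
try:
    commit(db, "@a1", t0, {"dis": "Conflict"})      # rejected: t0 is now stale
except StaleRecordError:
    rejected = True
```

A client that loses the race would typically re-read the record to pick up the new mod and then retry its change.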
Tags
Tags are the name/value pairs stored in a record. The name of a tag must follow the standard rules for Naming.
The value of the tag is one of the following scalar types:
- Marker: indicates the tag is used solely to mark the record; markers are typically used to assign the record into a “type”
- NA: indicates not available
- Ref: identifier for Folio records
- Bool: true or false
- Number: 64-bit floating point number with optional Units
- Str: unicode string
- Uri: universal resource identifier per RFC 3986
- Date: standard Fantom date class
- Time: standard Fantom hour of day class
- DateTime: standard Fantom timestamp with timezone
- Bin: binary streamed data stored on file system
- Coord: geographic coordinate in latitude/longitude
Note that although Bool is supported, the convention is to use the presence of a marker tag instead.
Storage
Folio persists data to disk, but operates as an in-memory database. Most records and tags are read from disk on startup and stored in RAM for fast access. This is required to support the real-time nature of sensor data. But it also imposes limits on Folio, since most hardware tends to have less RAM than disk space.
Every project, record, and tag is stored in RAM during runtime, with the exception of bins (discussed next). As you design your data models for the tag database, you should take care to limit what is stored in memory and utilize bin tags as necessary. For example, the time-series historian uses records/tags for indexing, but bins to actually store the time-series samples.
Persistence is managed using a file called “proj.diffs” for each project. This is a simple text-based, append-only log file. As diffs are committed, they are applied in-memory and appended to the log file. During restart, the log file is replayed from beginning to end to reconstruct the state of the database. This design is extremely robust; in the rare case of data corruption, the file is easily repaired with a text editor. However, the design does require periodic compression of the log file using snapshots.
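The append-and-replay idea can be sketched as follows. The line format used here (one JSON object per line, with null meaning "remove this tag") is invented for illustration; the real proj.diffs syntax is not described in this text, and append_diff and replay are hypothetical helper names.

```python
import json
import os
import tempfile

def append_diff(path, rec_id, changes):
    """Append one diff to the log; a None value means 'remove this tag'."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"id": rec_id, "changes": changes}) + "\n")

def replay(path):
    """Rebuild the in-memory state by applying every diff in order."""
    recs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            diff = json.loads(line)
            rec = recs.setdefault(diff["id"], {"id": diff["id"]})
            for name, val in diff["changes"].items():
                if val is None:
                    rec.pop(name, None)   # tag removal
                else:
                    rec[name] = val       # tag add or update
    return recs

log_path = os.path.join(tempfile.mkdtemp(), "demo.diffs")
append_diff(log_path, "@a1", {"dis": "Site A", "area": 100})
append_diff(log_path, "@a1", {"area": None, "dis": "Site A (renamed)"})
state = replay(log_path)
print(state)
```

Both diffs stay in the file even though the second supersedes the first, which is why a log of this kind keeps growing until it is compacted.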
Bins
Bins are special tags used to store blobs or files on disk (as opposed to other tags, which are stored in-memory during runtime). To future-proof for cloud/grid architectures, only stream access is provided to bins (no random access I/O). However, Folio does support an efficient model for append-only transactions to a bin.
The value of the bin tag in-memory specifies the MIME type of the file. The bin tag must be created before attempting to read/write data to the stream.
Because a bin is just a normal tag, a given record can have multiple bins. The standard tag for storing a normal file is file. But as an example, for an image file you might also store a thumbnail tag on the rec.
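The split between the in-memory tag value and the on-disk bytes can be sketched like this. BinStore and its methods are hypothetical and the file naming scheme is made up; the sketch only mirrors the rules stated above (the tag holds the MIME type, the tag must exist before I/O, and access is by sequential stream with append support).

```python
import os
import tempfile

class BinStore:
    """Toy bin storage: MIME type kept in the record, bytes kept on disk."""

    def __init__(self, root):
        self.root = root

    def _path(self, rec, tag):
        # Made-up naming scheme: one file per (record, tag) pair.
        return os.path.join(self.root, f"{rec['id'].lstrip('@')}-{tag}.bin")

    def create(self, rec, tag, mime):
        rec[tag] = mime                           # in-memory value is only the MIME type
        open(self._path(rec, tag), "wb").close()  # start with an empty file

    def append(self, rec, tag, data):
        if tag not in rec:
            raise KeyError(f"bin tag {tag!r} must be created first")
        with open(self._path(rec, tag), "ab") as out:  # append-only write
            out.write(data)

    def read_stream(self, rec, tag):
        return open(self._path(rec, tag), "rb")   # sequential read stream

store = BinStore(tempfile.mkdtemp())
rec = {"id": "@doc1", "dis": "Notes"}
store.create(rec, "file", "text/plain")
store.append(rec, "file", b"hello ")
store.append(rec, "file", b"world")
with store.read_stream(rec, "file") as stream:
    content = stream.read()
```

Only the short MIME string lives in the record, so a large file adds almost nothing to the in-memory footprint.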
Naming
Programmatic names are used by Folio for features such as:
- as tag names
- as Grid column names
- as project names
- as value of the name tag
Programmatic names use camelCaseNaming as follows:
- first char must be ASCII lower case letter: a - z
- remaining chars must be ASCII letters, digits, or underscore: a - z, A - Z, 0 - 9, or _
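The two rules above translate directly into a regular expression. The helper names below are hypothetical; is_project_name adds the four-character minimum for project names but does not check the reserved-name list, which is not fully enumerated in this text.

```python
import re

# First char a-z; remaining chars ASCII letters, digits, or underscore.
NAME_RE = re.compile(r"[a-z][a-zA-Z0-9_]*")

def is_name(s):
    """Return True if s is a legal programmatic name under the two rules."""
    return NAME_RE.fullmatch(s) is not None

def is_project_name(s):
    """Project names must also be four chars or longer."""
    return is_name(s) and len(s) >= 4

for candidate in ("siteRef", "equip_1", "SiteRef", "2ndFloor", "site-ref"):
    print(candidate, is_name(candidate))
```

Using fullmatch rather than match matters here: match would accept a string such as "site-ref" because its prefix "site" satisfies the pattern.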
The name tag may be used to assign a well-known name to records. A given project has one namespace, so it is illegal to create multiple records with the same name tag value. Named records may be easily accessed by name in addition to their id. Funcs are uniquely named via the name tag.
Queries
The APIs for querying a Folio database are based on filters. Filters allow you to construct predicates using basic boolean logic and comparison operators. Filters support pathing: any tag with a Ref value may be traversed using the -> operator during the query operation.
The following types of queries are supported:
- readAll: query all the records which match a filter
- read: query a single record which matches a filter
- readById: convenience for id == xxx
- readByName: convenience for name == xxx
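To make the four query types concrete, here is a toy in-memory version in which a filter is simply a Python predicate function. The function names mirror the list above, but their signatures, the path helper, and the sample records are assumptions for illustration, not the Folio API.

```python
recs = {
    "@s1": {"id": "@s1", "name": "hq", "site": True, "dis": "HQ"},
    "@e1": {"id": "@e1", "equip": True, "siteRef": "@s1", "dis": "AHU-1"},
    "@e2": {"id": "@e2", "equip": True, "siteRef": "@s1", "dis": "AHU-2"},
}

def path(rec, *names):
    """Follow Ref-valued tags, like siteRef->dis; None if any step is missing."""
    val = rec
    for i, name in enumerate(names):
        if i > 0:
            val = recs.get(val)  # dereference the Ref found in the previous step
        if val is None or name not in val:
            return None
        val = val[name]
    return val

def read_all(pred):
    """All records matching the predicate."""
    return [r for r in recs.values() if pred(r)]

def read(pred):
    """A single matching record, or None."""
    matches = read_all(pred)
    return matches[0] if matches else None

def read_by_id(rec_id):
    return read(lambda r: r["id"] == rec_id)

def read_by_name(name):
    return read(lambda r: r.get("name") == name)

# Roughly the filter: equip and siteRef->dis == "HQ"
equips_at_hq = read_all(lambda r: "equip" in r and path(r, "siteRef", "dis") == "HQ")
```

The last line shows pathing: the predicate tests a tag on the referenced site record, not on the equip record itself.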
Indexing
All queries to a Folio project take the form of a predicate Filter which is used to match a set of records. In the simplest case, each record in a project is scanned and checked against the filter for a match. Because the records are stored in RAM, this operation is very fast; a run-of-the-mill server can scan roughly 10K records every millisecond. This time scales linearly with the size of your database.
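As a back-of-the-envelope check on that figure, the linear relationship can be written out directly. The rate constant is the rough number quoted above, not a measured benchmark, and full_scan_ms is a hypothetical helper.

```python
RECS_PER_MS = 10_000  # rough scan rate quoted in the text

def full_scan_ms(num_recs):
    """Estimated milliseconds for an unindexed scan of num_recs records."""
    return num_recs / RECS_PER_MS

for n in (50_000, 1_000_000, 5_000_000):
    print(f"{n:>9,} recs -> ~{full_scan_ms(n):g} ms")
```

At this rate a project with a million records costs on the order of 100 ms per unindexed query.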
Query Optimizer
To optimize performance as the number of records grows, Folio will build and maintain a memory-based index for “hot” tags. The query optimizer uses these indexes to avoid scanning every record. Let's take an example query:
site and equipRef==xxxxx
This query has two tags which are used: site and equipRef. In the case of site we only care if a rec has the tag (we don’t care about its value). In the case of equipRef we care if a rec has the tag and it has a specific Ref value. Folio indexing handles both cases: it indexes which records have a tag, but it also sub-indexes the tag values.
The query optimizer will select the best index to use for the scan. If we have 300 recs with the tag site (whatever the value might be) and we have 15 recs with the tag/value pair equipRef==xxxxx, then the query optimizer will choose the smallest index. In this case it will choose the equipRef==xxxxx bucket, and we only have to scan 15 items before determining the result.
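The bucket selection described above can be sketched with two dictionaries: one mapping each tag to the records that have it, and one sub-indexing by value. TagIndex and query_site_and_equip_ref are hypothetical names, and the data reproduces the 300 versus 15 example; Folio's real index structures are not described in this text.

```python
from collections import defaultdict

class TagIndex:
    """Toy index: which recs have a tag, sub-indexed by tag value."""

    def __init__(self):
        self.has_tag = defaultdict(set)                        # tag -> rec ids
        self.by_value = defaultdict(lambda: defaultdict(set))  # tag -> value -> rec ids

    def add(self, rec):
        for name, val in rec.items():
            if name == "id":
                continue
            self.has_tag[name].add(rec["id"])
            self.by_value[name][val].add(rec["id"])

def query_site_and_equip_ref(recs, index, ref):
    """Evaluate `site and equipRef==ref` by scanning only the smallest bucket."""
    buckets = [index.has_tag["site"], index.by_value["equipRef"][ref]]
    smallest = min(buckets, key=len)
    matches = [recs[i] for i in smallest
               if "site" in recs[i] and recs[i].get("equipRef") == ref]
    return matches, len(smallest)

# 300 recs carry the site tag; 15 of them also have equipRef == "@x".
recs = {}
for n in range(300):
    rec = {"id": f"@r{n}", "site": True}
    if n < 15:
        rec["equipRef"] = "@x"
    recs[rec["id"]] = rec

index = TagIndex()
for rec in recs.values():
    index.add(rec)

matches, scanned = query_site_and_equip_ref(recs, index, "@x")
print(len(matches), "matches after scanning", scanned, "recs")
```

Every candidate in the chosen bucket is still checked against the full filter, so picking the smallest bucket changes only how much work is done, never the result.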
Auto Indexing
You do not need to configure anything special to use indexing. Folio always keeps track of what tags are being used by queries. As soon as it detects a “hot” tag, it will automatically build an index for that tag. The current algorithm indexes a tag once it has been used in 100 queries.
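A minimal sketch of this kind of usage tracking follows. The threshold of 100 comes from the text; AutoIndexer is a hypothetical class, and whether Folio builds the index on exactly the 100th query or just after it is an assumption.

```python
from collections import Counter

HOT_THRESHOLD = 100  # queries after which a tag is indexed (figure from the text)

class AutoIndexer:
    """Counts how often each tag appears in queries and flags hot tags."""

    def __init__(self):
        self.query_counts = Counter()
        self.indexed = set()

    def record_query(self, tags_used):
        """Count tag usage for one query and index any tag that turns hot."""
        for tag in tags_used:
            self.query_counts[tag] += 1
            if self.query_counts[tag] >= HOT_THRESHOLD and tag not in self.indexed:
                self.indexed.add(tag)  # a real system would build the index here

auto = AutoIndexer()
for _ in range(99):
    auto.record_query(["site", "equipRef"])
before = set(auto.indexed)   # nothing is hot after 99 queries
auto.record_query(["site"])  # 100th query that uses site
```

After the final call only site has crossed the threshold; equipRef stays unindexed at 99 uses.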
Query Tuning
A full scan of a large database with millions of recs might take hundreds of milliseconds. This might be fast enough for populating user interface screens, but when spread across many function calls it can add up quickly. So when working with large databases, it is important to ensure that hot queries are utilizing the index.
The best way to analyze queries is with the Debug tab in the Folio app. This tab provides a wealth of statistics on tag indexes and the queries being run.