Data Storage Flashcards
What is the storage layer for in iOS applications?
The storage layer is responsible for storing data and keeping track of state. Objects and classes in this layer perform storing, saving, persisting, and mapping/serialization operations on data that other layers of the app, such as the service layer, provide.
In practice, this layer holds anything from simple in-memory arrays and dictionaries to your own custom model objects and Core Data or Realm databases.
The main point is that this layer decouples everything related to data storage and persistence from other classes and layers of your application.
Here’s a typical set of classes that you’d have in your storage layer:
- A wrapper around Keychain
- A wrapper around NSUserDefaults
- A wrapper around file manager
- A wrapper around AVFoundation to store or retrieve audio and video files to/from disk
- A repository object that performs the actual reads and writes to disk or a database
- NSManagedObject and its subclasses, used to persist your domain model objects to Core Data (if you use Core Data)
- A Post custom domain model class whose instances represent the individual posts in your application
- A PostsStorage object that initiates storing/fetching of Post models using a repository object and maps the resulting NSManagedObjects to Post model objects
What can you use to store data on iOS?
Generally there are the following ways to store data in order from simple to complex:
- In-memory arrays, dictionaries, sets, and other data structures
- NSUserDefaults/Keychain
- File/Disk storage
- Core Data, Realm
- SQLite
In-memory arrays, dictionaries, sets, and other data structures are perfectly fine for storing data. They are fast and simple to use.
The main disadvantage though is that they can’t be persisted without some work and can’t be used to store large amounts of data.
NSUserDefaults and Keychain are simple key-value stores. NSUserDefaults is not secure; Keychain is. Their advantages are that they are easy to use, relatively fast, and actually persist data to disk. Their disadvantages are that they were not made as a replacement for databases and can’t handle large amounts of data or extensive querying.
File/Disk storage means writing pieces of data (serialized or not) to disk and reading them back using NSFileManager. The great thing about it is that it can handle big files and large amounts of data; the disadvantage is that it was not made for querying.
Core Data and Realm are frameworks that simplify working with databases. They are great for large amounts of data and well suited to querying and filtering. Their disadvantages are the setup overhead and the learning curve.
What is NSCoding?
NSCoding is a widely used protocol for data serialization. Some data-storing operations that use NSUserDefaults, NSFileManager, or Keychain require it.
NSCoding is a Cocoa protocol that allows objects that adopt it to be serialized for NSUserDefaults, NSFileManager, or Keychain storage. To adopt it, you implement the init?(coder:) initializer and the encode(with:) method (encodeWithCoder: in Objective-C) in your class. encode(with:) encodes the object for persistence, and init?(coder:) decodes it on retrieval.
The gotcha with implementing NSCoding is that every property that you encode and decode needs to conform to the NSCoding protocol as well. “Primitive” values such as String, NSNumber, and Int are already supported, but every custom object that you serialize as one of the properties has to conform to the protocol itself.
NSCoding is one of the fundamental protocols for implementing “lightweight” persistence in iOS applications. Every iOS dev should be familiar with it.
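As a rough illustration of the two methods described above, here is a minimal sketch. The UserProfile class, its properties, and the two helper functions are hypothetical names invented for this example. The sketch adopts NSSecureCoding, a refinement of NSCoding that the current NSKeyedArchiver convenience methods expect; treat that choice as an assumption rather than a requirement of the text above.

```swift
import Foundation

// Hypothetical model class used only for illustration.
final class UserProfile: NSObject, NSSecureCoding {
    static var supportsSecureCoding: Bool { true }

    let name: String
    let age: Int

    init(name: String, age: Int) {
        self.name = name
        self.age = age
        super.init()
    }

    // Decoding: rebuild the object from the archive, failing if a value is missing.
    required init?(coder: NSCoder) {
        guard let name = coder.decodeObject(of: NSString.self, forKey: "name") else {
            return nil
        }
        self.name = name as String
        self.age = coder.decodeInteger(forKey: "age")
        super.init()
    }

    // Encoding: write each property under its own key.
    func encode(with coder: NSCoder) {
        coder.encode(name as NSString, forKey: "name")
        coder.encode(age, forKey: "age")
    }
}

// Serializes a profile into Data that can be written to a file or another store.
func archive(_ profile: UserProfile) throws -> Data {
    try NSKeyedArchiver.archivedData(withRootObject: profile, requiringSecureCoding: true)
}

// Restores a profile from previously archived Data.
func unarchiveProfile(from data: Data) throws -> UserProfile? {
    try NSKeyedUnarchiver.unarchivedObject(ofClass: UserProfile.self, from: data)
}
```

The resulting Data value is what you would hand to NSUserDefaults, a file on disk, or Keychain.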
What is NSUserDefaults?
NSUserDefaults is one of the common tools used in virtually every application for lightweight storage. Every iOS developer should be familiar with it.
NSUserDefaults is a key-value store that can persist primitives and NSCoding-compliant objects once they are serialized. Unlike Keychain, it is not secure, and its contents do not survive an app uninstall.
Its main purpose is to store small values that are easy to retrieve and not critical if lost.
A typical use case for it is some locally stored user preferences and/or flags. Do not use it as a database replacement because it was not built for extensive querying or for handling large amounts of data.
Using user defaults for data that needs to be secure is a red flag. For example, you would never want to store a user’s password or access token in user defaults; use Keychain for that instead.
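The preferences-and-flags use case could be wrapped roughly as follows. This is a sketch only: PreferencesStore and both key names are hypothetical, and UserDefaults is the Swift name for NSUserDefaults.

```swift
import Foundation

// Hypothetical wrapper; the key names below are illustrative.
struct PreferencesStore {
    private enum Key {
        static let hasSeenOnboarding = "hasSeenOnboarding"
        static let preferredFontSize = "preferredFontSize"
    }

    private let defaults: UserDefaults

    // Injecting the UserDefaults instance keeps the wrapper testable.
    init(defaults: UserDefaults = .standard) {
        self.defaults = defaults
    }

    // A simple flag; bool(forKey:) returns false when nothing is stored.
    var hasSeenOnboarding: Bool {
        get { defaults.bool(forKey: Key.hasSeenOnboarding) }
        nonmutating set { defaults.set(newValue, forKey: Key.hasSeenOnboarding) }
    }

    // An optional preference; nil means "not set yet".
    var preferredFontSize: Int? {
        get { defaults.object(forKey: Key.preferredFontSize) as? Int }
        nonmutating set {
            if let value = newValue {
                defaults.set(value, forKey: Key.preferredFontSize)
            } else {
                defaults.removeObject(forKey: Key.preferredFontSize)
            }
        }
    }
}
```

Keeping the keys private to the wrapper means the rest of the app never touches raw strings or NSUserDefaults directly.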
What is Keychain and when do you need it?
Storing data securely is important for every iOS app, big or small. Keychain is a secure alternative to NSUserDefaults. It is a key-value store that is encrypted by the system and, unlike NSUserDefaults, files on disk, and Core Data databases, persists between app reinstalls.
The advantage of Keychain is that it is secure, but the disadvantage is that its API is difficult to use. The main use case for Keychain is to store small objects and primitives, such as tokens and passwords, securely.
Use it instead of NSUserDefaults for that purpose, and just like with NSUserDefaults do not use it to store large amounts of data, such as databases, images, and videos.
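To give a feel for why the API is considered awkward, here is a minimal sketch of a wrapper around the Security framework's SecItem functions. KeychainStore, the service string, and the account names are hypothetical. The sketch ignores access groups and accessibility attributes, which a production wrapper would likely need to consider.

```swift
import Foundation
import Security

// Hypothetical wrapper; the service name is an illustrative placeholder.
struct KeychainStore {
    let service: String

    // Attributes that identify one generic-password item.
    private func baseQuery(for account: String) -> [String: Any] {
        [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrService as String: service,
            kSecAttrAccount as String: account
        ]
    }

    // Saves or overwrites the value for the account. Returns true on success.
    @discardableResult
    func save(_ value: Data, account: String) -> Bool {
        // Remove any existing item first so SecItemAdd does not report a duplicate.
        SecItemDelete(baseQuery(for: account) as CFDictionary)
        var query = baseQuery(for: account)
        query[kSecValueData as String] = value
        return SecItemAdd(query as CFDictionary, nil) == errSecSuccess
    }

    // Returns the stored value, or nil if there is none or the lookup fails.
    func read(account: String) -> Data? {
        var query = baseQuery(for: account)
        query[kSecReturnData as String] = true
        query[kSecMatchLimit as String] = kSecMatchLimitOne
        var result: CFTypeRef?
        let status = SecItemCopyMatching(query as CFDictionary, &result)
        guard status == errSecSuccess else { return nil }
        return result as? Data
    }

    // Deleting a missing item is treated as success.
    @discardableResult
    func delete(account: String) -> Bool {
        let status = SecItemDelete(baseQuery(for: account) as CFDictionary)
        return status == errSecSuccess || status == errSecItemNotFound
    }
}
```

A token would then be stored with something like `KeychainStore(service: "com.example.app").save(Data(token.utf8), account: "accessToken")`, where both strings are placeholders.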
How do you save data to a disk on iOS?
File storage is used to persist large amounts of data on disk, such as images, videos, and other kinds of files. NSFileManager is the class you would use to manipulate your app’s folder on disk. It is capable of creating subdirectories and storing files. You can store or read any NSData object, whether it’s an image, a video, or an object serialized through the NSCoding protocol.
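A minimal sketch of such a file wrapper might look like this. FileStore and its method names are hypothetical; FileManager and Data are the Swift names for NSFileManager and NSData. In an app the directory would typically come from `FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)`, while the sketch accepts any directory URL.

```swift
import Foundation

// Hypothetical wrapper around FileManager for one directory of files.
struct FileStore {
    let directory: URL

    // Creates the directory (and any missing parents) if it does not exist yet.
    init(directory: URL) throws {
        self.directory = directory
        try FileManager.default.createDirectory(at: directory,
                                                withIntermediateDirectories: true)
    }

    private func url(for name: String) -> URL {
        directory.appendingPathComponent(name)
    }

    // Writes atomically so a crash mid-write does not leave a partial file.
    func write(_ data: Data, named name: String) throws {
        try data.write(to: url(for: name), options: .atomic)
    }

    // Returns nil if the file does not exist or cannot be read.
    func read(named name: String) -> Data? {
        FileManager.default.contents(atPath: url(for: name).path)
    }

    func delete(named name: String) throws {
        try FileManager.default.removeItem(at: url(for: name))
    }
}
```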
What database options are there for iOS applications?
The go-to database solution on iOS is Core Data. You can also use SQLite directly, but the tooling for that is less advanced, so you’ll have to build some customizations of your own. Another popular database framework is Realm. Each has its own advantages and disadvantages.
Core Data is an object graph and persistence framework that is the go-to solution for local database storage on iOS. Advantages of that framework are that it is widely used and is supported by Apple. You can use it almost out of the box in your project, and it does a decent job of persisting data and making querying more or less straightforward.
A disadvantage is that the Core Data API is not that easy to use in some scenarios, specifically in a multi-threaded environment. Another big disadvantage of Core Data is its learning curve: it is not a straightforward layer on top of a relational database where each object represents a row in a table (as in ActiveRecord, for example), but rather an object graph store.
Realm is an alternative to Core Data. It was built from the ground up to be easier to use and faster than Core Data or SQLite. Advantages of Realm are that it’s fast, has reactive features, is easier to use, is secure, and has an entire cloud platform for syncing and other more advanced features. A disadvantage is that it is still in development (although the Realm team has made a lot of progress recently), and as of the time of this writing it doesn’t have features on par with Core Data’s NSFetchedResultsController. There are also issues with the size of Realm databases.
Because of its playback feature, Realm has to store far more data in order to replay the events that happened, whereas Core Data and SQLite store only the latest snapshot without keeping a history of all the changes. Realm has a lot of potential to become the most popular solution for database storage on iOS in the long run, especially with all the backend/syncing functionality being built into it.
SQLite is the relational database behind Core Data’s default persistent store. It can be accessed directly on iOS and used without Core Data, but that requires implementing custom tooling for accessing, reading, and writing to it. The main advantages of using SQLite directly are that it is fast and that, if you have SQL skills, you can leverage them.
The main disadvantage though is that you’ll have to do all the heavy lifting of setting things up and accessing and migrating the database yourself; the system gives you only the low-level C API, with no higher-level tooling on top of it.
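To show what that heavy lifting tends to involve, here is a minimal sketch that uses the SQLite C API from Swift. NotesDatabase, the notes table, and its columns are hypothetical, and the sketch uses an in-memory database (the ":memory:" path) so that it stays self-contained. Error reporting is reduced to Bool results and empty arrays to keep it short.

```swift
import Foundation
import SQLite3

// SQLITE_TRANSIENT is a C macro that Swift does not import; this is the usual equivalent.
// It tells SQLite to make its own copy of the bound string.
let sqliteTransient = unsafeBitCast(-1, to: sqlite3_destructor_type.self)

// Hypothetical minimal store; the table and column names are illustrative.
final class NotesDatabase {
    private var db: OpaquePointer?

    init?(path: String = ":memory:") {
        let schema = "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL);"
        guard sqlite3_open(path, &db) == SQLITE_OK,
              sqlite3_exec(db, schema, nil, nil, nil) == SQLITE_OK else {
            sqlite3_close(db)
            db = nil   // closing a nil handle later is a harmless no-op
            return nil
        }
    }

    deinit { sqlite3_close(db) }

    // Inserts one row using a prepared statement with a bound parameter.
    func insert(body: String) -> Bool {
        var statement: OpaquePointer?
        defer { sqlite3_finalize(statement) }
        guard sqlite3_prepare_v2(db, "INSERT INTO notes (body) VALUES (?);",
                                 -1, &statement, nil) == SQLITE_OK else { return false }
        sqlite3_bind_text(statement, 1, body, -1, sqliteTransient)
        return sqlite3_step(statement) == SQLITE_DONE
    }

    // Reads every row back, stepping through the result set manually.
    func allBodies() -> [String] {
        var statement: OpaquePointer?
        defer { sqlite3_finalize(statement) }
        guard sqlite3_prepare_v2(db, "SELECT body FROM notes ORDER BY id;",
                                 -1, &statement, nil) == SQLITE_OK else { return [] }
        var bodies: [String] = []
        while sqlite3_step(statement) == SQLITE_ROW {
            if let text = sqlite3_column_text(statement, 0) {
                bodies.append(String(cString: text))
            }
        }
        return bodies
    }
}
```

Even this small example has to manage statement lifetimes, parameter binding, and manual row mapping, which is the work Core Data or Realm would otherwise do for you.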
These days saying that there’s only Core Data on iOS for databases would raise a red flag because the expectation is that developers are constantly looking for better solutions and are aware of other alternatives such as Realm or SQLite.
How is data mapping important when you store data?
One of the three main purposes of the storage layer, besides actually storing and persisting the data, is data serialization. Just as the service layer receives JSON or another format from external APIs and then serializes and maps it into your custom domain model, the storage layer needs to serialize and map data between your custom domain model objects and the format that your storage understands.
The “mapping” chain for reading data looks like this: db -> raw data format -> custom domain models. And for writing like this: custom domain models -> raw data format -> db.
For example, if you use Core Data, then serializing your data before saving means mapping it to NSManagedObjects and then saving those to the Core Data database. And vice versa: when you need to retrieve data from Core Data, you create a predicate to query it and get back a collection of NSManagedObjects and/or their subclasses as the result. You then need to map those objects into your own custom domain model objects to be able to work with them easily.
With NSManagedObjects specifically, there are different approaches to working with data, and quite often NSManagedObject subclasses are used directly as model objects throughout the application. It is convenient to use them, after all, since the mapping of values and properties is easily defined in the Core Data entity schema UI in Xcode.
But that approach has a disadvantage: it couples responsibilities inside the NSManagedObject subclasses. If you use them throughout your application as domain models, you couple yourself to Core Data directly and carry all the functionality of NSManagedObject with them throughout the application. This issue is especially apparent when, inevitably, issues with multi-threading and concurrency arise.
A cleaner way of doing it would be to use NSManagedObject and/or its subclasses only for data persistence and retrieval and use your own custom objects throughout the application as domain models.
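The read and write chains described above could be sketched like this. Everything named here is an assumption made for the example: the Post struct, the "PostEntity" entity with its id and title attributes, and this particular shape of PostsStorage. The entity is defined in code and backed by an in-memory store only so the sketch needs no .xcdatamodeld file; a real project would normally define it in the Xcode model editor.

```swift
import CoreData

// Plain domain model used by the rest of the app; it knows nothing about Core Data.
struct Post: Equatable {
    let id: Int64
    let title: String
}

// Builds a hypothetical one-entity model in code.
func makePostModel() -> NSManagedObjectModel {
    let id = NSAttributeDescription()
    id.name = "id"
    id.attributeType = .integer64AttributeType
    id.isOptional = false

    let title = NSAttributeDescription()
    title.name = "title"
    title.attributeType = .stringAttributeType
    title.isOptional = false

    let entity = NSEntityDescription()
    entity.name = "PostEntity"
    entity.properties = [id, title]

    let model = NSManagedObjectModel()
    model.entities = [entity]
    return model
}

final class PostsStorage {
    private let container: NSPersistentContainer

    init() {
        container = NSPersistentContainer(name: "Sketch", managedObjectModel: makePostModel())
        let description = NSPersistentStoreDescription()
        description.type = NSInMemoryStoreType   // keeps the sketch self-contained
        container.persistentStoreDescriptions = [description]
        container.loadPersistentStores { _, error in
            if let error = error { fatalError("Store failed to load: \(error)") }
        }
    }

    // Writing: domain model -> NSManagedObject -> store.
    func save(_ post: Post) throws {
        let context = container.viewContext
        let object = NSEntityDescription.insertNewObject(forEntityName: "PostEntity", into: context)
        object.setValue(post.id, forKey: "id")
        object.setValue(post.title, forKey: "title")
        try context.save()
    }

    // Reading: predicate -> NSManagedObjects -> domain models.
    func posts(titleContaining text: String) throws -> [Post] {
        let request = NSFetchRequest<NSManagedObject>(entityName: "PostEntity")
        request.predicate = NSPredicate(format: "title CONTAINS[c] %@", text)
        request.sortDescriptors = [NSSortDescriptor(key: "id", ascending: true)]
        return try container.viewContext.fetch(request).compactMap { object -> Post? in
            guard let id = object.value(forKey: "id") as? Int64,
                  let title = object.value(forKey: "title") as? String else { return nil }
            return Post(id: id, title: title)
        }
    }
}
```

Callers only ever see Post values, so NSManagedObject and its threading rules stay inside the storage class.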
How would you approach major database/storage migration in your application?
In practice, database or underlying storage migrations happen very rarely in iOS applications. Typically codebases end up getting stuck with whatever they picked as the initial storage/database solution (quite often Core Data). But there’s a way you could organize your code using the Single Responsibility Principle so that your codebase is completely decoupled from, and agnostic of, the persistence framework you use.
The main idea is to have a clear separation between the code that needs to access and use data from the database and the code that actually knows what database to use and how to access data in it. Typically that role is played by some kind of storage object that is the main object responsible for getting data in and out of the database for the rest of the application.
Internally that object would use one or more other objects that actually know how to work with an underlying database, let’s say Core Data. Only those objects in the storage class refer to Core Data and know how to query it and how to write to it. Since the rest of the application doesn’t know anything about Core Data or whatever database solution you use, when the time comes, you could easily swap the underlying database for Realm, for example. You’d have to write some data migration code that maps and copies data from the existing Core Data store to the new Realm database. But the main approach will remain the same - the rest of your application continues to rely on the storage object to get the data and that object knows how to actually work with it.
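The separation described above could be sketched as follows. To keep the example self-contained, an in-memory repository stands in for the existing backend (Core Data in the text) and a JSON file repository stands in for the new one (Realm in the text). All type and method names here are hypothetical.

```swift
import Foundation

struct Post: Codable, Equatable {
    let id: Int64
    let title: String
}

// The only persistence contract the storage object depends on.
protocol PostsRepository {
    func loadAll() throws -> [Post]
    func saveAll(_ posts: [Post]) throws
}

// Stand-in for the existing backend.
final class InMemoryPostsRepository: PostsRepository {
    private var posts: [Post] = []
    func loadAll() throws -> [Post] { posts }
    func saveAll(_ posts: [Post]) throws { self.posts = posts }
}

// Stand-in for the new backend.
final class JSONFilePostsRepository: PostsRepository {
    private let fileURL: URL
    init(fileURL: URL) { self.fileURL = fileURL }

    func loadAll() throws -> [Post] {
        // A missing file simply means nothing has been saved yet.
        guard FileManager.default.fileExists(atPath: fileURL.path) else { return [] }
        return try JSONDecoder().decode([Post].self, from: Data(contentsOf: fileURL))
    }

    func saveAll(_ posts: [Post]) throws {
        try JSONEncoder().encode(posts).write(to: fileURL, options: .atomic)
    }
}

// The rest of the app talks only to this object.
final class PostsStorage {
    private var repository: PostsRepository

    init(repository: PostsRepository) { self.repository = repository }

    func allPosts() throws -> [Post] { try repository.loadAll() }

    func add(_ post: Post) throws {
        try repository.saveAll(try repository.loadAll() + [post])
    }

    // One-off migration: copy everything into the new backend, then switch over.
    func migrate(to newRepository: PostsRepository) throws {
        try newRepository.saveAll(try repository.loadAll())
        repository = newRepository
    }
}
```

Because callers only use allPosts() and add(_:), swapping the backend touches the repository implementations and the migration step, not the rest of the application.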