Data Files and Serialization Flashcards

1
Q

What is data integration?

A

Data integration is combining data that resides in different sources and presenting it to the user in a unified view.

File transfer - when two systems need to communicate, they produce shared files that both of them can use.

Shared database - often used because it is language & platform independent.

Handling different data formats depending on the use case.

A common data transfer mechanism is essential - communication between systems should be independent of language & platform.

2
Q

What is file-based data integration?

A

When two or more applications make use of shared files.

The integrator is responsible for defining the file format and the other common conventions that govern how the files are produced and consumed.
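A minimal Python sketch of the idea, assuming two hypothetical applications that have agreed on a CSV file with the columns `order_id`, `customer` and `amount`. The producer only writes the agreed format and the consumer only reads it; neither needs to know how the other works internally. The function names and the temporary "shared" directory are illustrative assumptions.

```python
import csv
import os
import tempfile

# Hypothetical format contract agreed on by the integrator.
AGREED_HEADER = ["order_id", "customer", "amount"]


def export_orders(path, orders):
    """Producer app: writes its data to the shared file in the agreed format."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=AGREED_HEADER)
        writer.writeheader()
        writer.writerows(orders)


def import_orders(path):
    """Consumer app: reads the shared file and checks it matches the contract."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != AGREED_HEADER:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        return list(reader)


# A temporary directory stands in for the shared location (an assumption).
shared_file = os.path.join(tempfile.mkdtemp(), "orders.csv")
export_orders(shared_file, [{"order_id": 1, "customer": "Ann", "amount": 9.5}])
rows = import_orders(shared_file)
print(rows)
```

Note that the consumer gets every value back as a string, so agreeing on types is part of the format contract too.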

3
Q

Mention a few pros of file-based data integration.

A

Files are popular because every popular language can read and write them.

Integrators only need to know the input/output, not how the consuming apps work.

Highly flexible.

No extra tools needed.

Easy to understand.

4
Q

Mention a few cons of file-based data integration.

A

Scalability - Adaptability - Security - Data synchronisation.

A file format must also be agreed upon before files can be shared.

More desirable approaches are a shared database, messaging or RPC (remote procedure call).

5
Q

Which file types/formats are popular in Data Files?

A

CSV - XML - JSON

TEXT (not used for integration; mostly used for logging errors/processes, testing or reading tokens)

6
Q

When would you use text files as your data file?

A

When logging errors/processes, testing, or reading tokens.

7
Q

When could you use CSV files as a Data file?

A

To move large amounts of data (fast, because there is no structural overhead as in JSON or XML).

To view a data table (very human-readable).

To organise tabular data in rows and columns.

8
Q

What pros are there for using CSV as a Data file?

A

Human-readable and easy to modify.
Faster to handle (no overhead).
It is easy to generate CSV files.
Very popular; many use CSV, so it is well known.

9
Q

What cons are there for using CSV as a Data file?

A

It can only handle flat, basic data; a comma inside a value breaks the structure, since the comma is used as the separator (unless the value is quoted).

No distinction between text & numeric values.

Problems when importing CSV to SQL (no distinction between NULL and an empty quoted string).
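A small Python sketch (standard `csv` module, hypothetical sample data) illustrating these cons: splitting a line on commas breaks a value that itself contains a comma, while the `csv` module's quoting handles it; every value is read back as text; and `None` and an empty string are written identically, which is the root of the NULL problem when importing into SQL.

```python
import csv
import io

# Hypothetical sample data: one value contains a comma, one cell is None.
rows = [
    ["name", "city", "zip", "note"],
    ["Ann", "Copenhagen, DK", 2100, None],
    ["Bob", "Aarhus", 8000, ""],
]

# csv.writer quotes values that contain the separator.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Naive parsing: splitting on commas breaks the quoted value apart.
naive = [line.split(",") for line in text.splitlines()]

# Proper parsing with the csv module respects the quotes.
parsed = list(csv.reader(io.StringIO(text)))

print(naive[1])   # five fields instead of four
print(parsed[1])  # the zip code comes back as the string '2100'
print(parsed[1][3] == parsed[2][3])  # True: None and "" are indistinguishable
```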

10
Q

When is JSON used as a Data file?

A

When creating microservices & RESTful services.

When storing data temporarily; this is much easier than storing it in a database.

Often used for configuration files & manifests.

11
Q

What pros are there for using JSON as a Data file?

A

Easy to read - both for machines and humans.

Easy to parse - most languages support parsing JSON.

Better performance than XML (faster to parse, less verbose).

Very popular.
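To illustrate "easy to parse", here is a Python sketch using the standard `json` module on a hypothetical configuration document. Unlike CSV, JSON keeps numbers, booleans, lists and null as distinct types, so the parsed result can be used directly.

```python
import json

# Hypothetical configuration document.
config_text = """
{
  "service": "orders",
  "port": 8080,
  "debug": false,
  "tags": ["api", "v1"],
  "owner": null
}
"""

# One call turns the text into native Python types.
config = json.loads(config_text)
print(config["port"] + 1)              # 8081, port is already an int
print(type(config["debug"]).__name__)  # bool
print(config["owner"] is None)         # True, JSON null becomes None

# And one call turns it back into text.
print(json.dumps(config, indent=2))
```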

12
Q

What cons are there for using JSON as a Data file?

A

Does not support comments.

Its nested (encapsulated) structure is rigid: adding new data may require dividing the structure into separate objects.
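A quick Python check of the "no comments" point, using the standard `json` module on a hypothetical snippet: a JavaScript-style comment makes the document invalid JSON. The `_comment` field shown afterwards is only a common convention for carrying notes as ordinary data, not part of the JSON format.

```python
import json

# Hypothetical config with a JavaScript-style comment.
with_comment = """
{
  // port used in production
  "port": 8080
}
"""

try:
    json.loads(with_comment)
    comment_accepted = True
except json.JSONDecodeError as err:
    comment_accepted = False
    print("Invalid JSON:", err.msg)

# Workaround by convention: store the note as an ordinary data field.
workaround = json.loads('{"_comment": "port used in production", "port": 8080}')
print(workaround["port"])  # 8080
```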

13
Q

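When could you use XML files as a Data file?

A

Heavily used in SOA (service-oriented architecture).

In SOAP, when you need complex objects.

Configuration files and manifests for apps (e.g. Maven's pom.xml, Android's AndroidManifest.xml).

Standard for Office file formats (.docx, .xlsx and .pptx files are zipped XML).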

14
Q

What pros are there for using XML as a Data file?

A

Great at describing data using attributes.

Strict syntax makes documents straightforward to parse and validate.
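A Python sketch with the standard `xml.etree.ElementTree` module showing both points on a hypothetical document: attributes such as `isbn` and `currency` describe the data they sit on, and the strict syntax means a malformed document is rejected instead of being guessed at.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: attributes describe the data (isbn, lang, currency).
doc = """
<book isbn="978-0" lang="en">
  <title>Data Files</title>
  <price currency="DKK">199.95</price>
</book>
"""

book = ET.fromstring(doc.strip())
print(book.attrib)                         # the isbn and lang attributes
print(book.find("price").get("currency"))  # DKK
print(book.findtext("title"))              # Data Files

# Strict syntax: a mismatched closing tag is rejected outright.
try:
    ET.fromstring("<book><title>Unclosed</book>")
    well_formed = True
except ET.ParseError:
    well_formed = False
```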

15
Q

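What cons are there for using XML as a Data file?

A

Data redundancy (e.g. every value wrapped in opening and closing tags), which leads to large file sizes and storage costs.

No native array type; lists have to be expressed as repeated child elements.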
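A Python sketch (standard `json` and `xml.etree.ElementTree` modules, hypothetical list of tags) contrasting the two formats: XML represents the list as repeated `<tag>` elements that have to be collected back by hand, and the repeated tag names make the XML text noticeably longer than the equivalent JSON array.

```python
import json
import xml.etree.ElementTree as ET

tags = ["api", "v1", "orders"]  # hypothetical list to store

# XML: no array type, so each item becomes a repeated <tag> element.
root = ET.Element("tags")
for t in tags:
    ET.SubElement(root, "tag").text = t
xml_text = ET.tostring(root, encoding="unicode")

# JSON: the list maps directly onto an array.
json_text = json.dumps({"tags": tags})

# Reading the XML back means collecting the repeated elements manually.
from_xml = [el.text for el in ET.fromstring(xml_text).findall("tag")]

print(xml_text)
print(json_text)
print(len(xml_text), "vs", len(json_text), "characters")
```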

16
Q

Which file types are used as configuration files?

A

JSON, XML

17
Q

Which file types can be used to transfer data from one database to another?

A

CSV, XML, JSON

18
Q

Which file type is often used for logs & reading encrypted files?

A

TEXT

19
Q

What is Serialisation?

A

Serialisation is the process of converting an object into a stream of bytes in order to store it or transmit it to memory, a database, or a file.

An example is storing an object as XML or JSON.

Serialisation is also called marshalling in some contexts.

20
Q

What is the reversed process of Serialisation?

A

Deserialisation (Unmarshalling).
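A minimal Python round trip with the standard `json` module, using a hypothetical `order` object: serialisation turns it into bytes that could be stored or sent, and deserialisation rebuilds an equal but separate object from those bytes.

```python
import json

order = {"id": 42, "items": ["pen", "paper"], "paid": True}  # hypothetical object

# Serialisation: object -> text -> bytes that can be stored or sent.
data = json.dumps(order).encode("utf-8")
print(type(data).__name__, len(data), "bytes")

# Deserialisation: bytes -> text -> an equivalent, new object.
restored = json.loads(data.decode("utf-8"))
print(restored == order, restored is order)  # True False
```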

21
Q

Mention some use cases of Serialisation.

A

Sending objects to a remote application via API.

Maintaining security or user-specific information across applications, stored as bytes.

A method of storing data.

22
Q

Mention 4 serialisers we have used in class.

A

JSON serialiser

Msgpack

Pickle

Protobuf

23
Q

What is characteristic of the JSON serialiser?

A

Schema-less - no schema (predefined representation of the data) is needed before the data is stored.

Self describing and easy to read by both human & machine.

Supported by most languages.

Standard when transferring data over the web.

24
Q

What is characteristic of Msgpack?

A

Schema-less - no schema (predefined representation of the data) is needed before the data is stored.

It lets you exchange data among multiple languages, like JSON.

Structurally a lot like JSON, but Msgpack is a binary format and therefore typically faster & smaller than JSON.

25
Q

What is characteristic of Pickle?

A

Schema-less - no schema (predefined representation of the data) is needed before the data is stored.

Python-specific; only used in Python.

Can serialise most Python objects.

Relatively fast.

Less secure: deserialising (unpickling) untrusted data can execute malicious code.
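A Python sketch with the standard `pickle` module. The first part pickles an instance of a hypothetical `Cart` class, something JSON cannot do without extra code. The second part shows why untrusted pickles are dangerous: an object's `__reduce__` method tells pickle which callable to invoke while loading. Here it is a harmless `print`, but it could be any function, which is why the Python documentation warns never to unpickle data from an untrusted source.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class Cart:  # hypothetical example class
    owner: str
    items: list = field(default_factory=list)

    def item_count(self):
        return len(self.items)


cart = Cart("ann", ["pen", "paper"])

# pickle handles arbitrary Python objects, not just dicts and lists.
blob = pickle.dumps(cart)
restored = pickle.loads(blob)
print(restored, restored.item_count())


# __reduce__ makes pickle.loads call print(...) while loading the data.
class Sneaky:
    def __reduce__(self):
        return (print, ("this ran inside pickle.loads!",))


pickle.loads(pickle.dumps(Sneaky()))
```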

26
Q

What is characteristic of Protobuf?

A

Used when serialising structured data, because it has a defined schema.

It provides type checking.

Good integration with many languages, including C++, Java, Go and Python.

It involves an IDL (interface description language) - the .proto file.

Not human-readable (binary format).

27
Q

What is the main purpose of serialisation?

A

To save the state of an object in order to be able to recreate it when needed.
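A Python sketch of "save the state, recreate it later", assuming a hypothetical `Counter` class whose state is just its attributes. The state is written to a JSON file with the standard `json` module and a new, equivalent object is rebuilt from it; the file name and helper functions are illustrative only.

```python
import json
import os
import tempfile


class Counter:  # hypothetical example class
    def __init__(self, name, count=0):
        self.name = name
        self.count = count

    def increment(self):
        self.count += 1


def save_state(obj, path):
    # vars(obj) is the instance's attribute dict, i.e. its state.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vars(obj), f)


def load_state(path):
    # Recreate an equivalent object from the saved state.
    with open(path, encoding="utf-8") as f:
        return Counter(**json.load(f))


visits = Counter("visits")
visits.increment()
visits.increment()

path = os.path.join(tempfile.mkdtemp(), "counter.json")
save_state(visits, path)

again = load_state(path)
print(again.name, again.count)  # visits 2
```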