08 - External data representation Flashcards
Explain the difference between internal and transferred data representation.
- Internal data is represented as data structures, arrays, or objects in programming languages like C, Java, or Python.
- Transferred data is represented as byte sequences for transmission, so it must be flattened (serialized) first.
What are the key aspects of data transmission format?
- Agreement on a common format for the transmitted data.
- Conversion of data by the transmitter (into the common format) and by the receiver (out of the common format).
- Alternatively, data is transmitted in the sender’s format and converted only by the receiver.
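The second approach is often called "receiver makes right". The Python sketch below assumes a made-up wire layout (a one-byte byte-order flag followed by 32-bit integers in the sender's own byte order) and hypothetical flag values; it illustrates the idea and is not a real protocol.

```python
import struct
import sys

# Hypothetical flag values for this sketch.
BIG, LITTLE = 0, 1

def send_ints(values, order=sys.byteorder):
    """Sender: prefix a byte-order flag, then write ints in its own byte order."""
    flag = LITTLE if order == "little" else BIG
    fmt = ("<" if flag == LITTLE else ">") + f"{len(values)}i"
    return bytes([flag]) + struct.pack(fmt, *values)

def receive_ints(data):
    """Receiver: read the flag and decode the payload in the sender's byte order."""
    flag, payload = data[0], data[1:]
    fmt = ("<" if flag == LITTLE else ">") + f"{len(payload) // 4}i"
    return list(struct.unpack(fmt, payload))

# A big-endian and a little-endian sender both reach the same receiver.
from_big = receive_ints(send_ints([1, -2, 300], "big"))
from_little = receive_ints(send_ints([1, -2, 300], "little"))
```

The sender never converts anything; the price is that every receiver must understand every possible sender format.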
What are data serialization and deserialization in the context of data transfer, and what are their requirements?
- Serialization involves converting data structures into a format that can be stored or transmitted.
- Deserialization is the reverse process, where the received or stored data is converted back into usable data structures.
- Requirements: correctness, efficiency, interoperability, and ease of use.
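A minimal round-trip sketch in Python using only the standard `struct` module. The `Student` record and its byte layout (big-endian 32-bit integers, a length-prefixed UTF-8 name, a count-prefixed score list) are assumptions made for illustration; the correctness requirement shows up as `deserialize(serialize(x)) == x`.

```python
import struct
from dataclasses import dataclass

@dataclass
class Student:
    # In memory, the string and list are separate objects referenced by this record.
    id: int
    name: str
    scores: list[int]

def serialize(s: Student) -> bytes:
    """Flatten the record into one contiguous byte sequence."""
    name = s.name.encode("utf-8")
    header = struct.pack(">iI", s.id, len(name))        # id, then name length
    tail = struct.pack(f">I{len(s.scores)}i", len(s.scores), *s.scores)
    return header + name + tail

def deserialize(data: bytes) -> Student:
    """Rebuild the record by reading the fields back in the same order."""
    sid, name_len = struct.unpack_from(">iI", data, 0)
    pos = 8
    name = data[pos:pos + name_len].decode("utf-8")
    pos += name_len
    (count,) = struct.unpack_from(">I", data, pos)
    pos += 4
    scores = list(struct.unpack_from(f">{count}i", data, pos))
    return Student(sid, name, scores)

original = Student(7, "Ana", [90, 85])
wire = serialize(original)
```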
Discuss the limitations of CORBA, Java Object Serialization, DCOM/COM+, JSON, Plain Text, and XML in data representation.
- CORBA: Over-designed and heavyweight.
- Java Object Serialization: Tailored to the Java environment.
- DCOM/COM+: Tailored to the Windows environment.
- JSON, Plain Text, XML: Lack a protocol description, have high parsing overhead, and require maintenance by the programmer.
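A small Python illustration of why text formats such as JSON leave work to the programmer: nothing in the data records what the original types were. The values are arbitrary examples, and the size comparison is not a parsing benchmark.

```python
import json
import struct

# In memory: a dict with integer keys.
record = {1: "one", 2: "two"}
round_trip = json.loads(json.dumps(record))
# JSON object keys are always strings, so the keys come back as "1" and "2";
# restoring the original types is left to the programmer.

# Two 32-bit integers as JSON text versus fixed-size binary.
text = json.dumps([1000000, 2000000]).encode("utf-8")
binary = struct.pack(">2i", 1000000, 2000000)
```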
Explain the concept of endianness in data representation.
- Endianness refers to the order of bytes in a multi-byte number.
- Big-endian stores higher-order bytes at lower memory addresses.
- Little-endian stores lower-order bytes at lower memory addresses.
- In networking, protocol header fields such as addresses and port numbers are always transmitted in big-endian order (network byte order).
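Both byte orders can be observed with Python's `struct` module, where `>` (or `!`, network order) means big-endian and `<` means little-endian. The value 0x12345678 is an arbitrary example.

```python
import socket
import struct
import sys

value = 0x12345678
big = struct.pack(">I", value)      # most significant byte at the lowest address
little = struct.pack("<I", value)   # least significant byte at the lowest address
network = struct.pack("!I", value)  # "network byte order" is big-endian

# socket.htonl converts a host-order integer to network order; on a
# little-endian host this swaps the bytes, on a big-endian host it is a no-op.
converted = socket.htonl(value)
print(sys.byteorder, hex(converted))
```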
What challenges arise in data representation due to heterogeneity in communication?
- Data layout details such as alignment, data sizes, and pointer representation vary across machines.
- Alignment on word boundaries can change the size of a structure between machines.
- Pointers, while convenient, have no meaning outside the machine (address space) that defines them.
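The alignment and pointer issues can be seen from Python. The `@` prefix in `struct` uses the host C compiler's native sizes and alignment, while `=` uses standard sizes with no padding; the native size is platform dependent, so the sketch only prints it.

```python
import ctypes
import struct

# The same C-like structure: a 1-byte char followed by a 4-byte int.
native_size = struct.calcsize("@ci")   # host alignment: padding before the int (typically 8)
packed_size = struct.calcsize("=ci")   # standard sizes, no padding: always 5
print(native_size, packed_size)

# A pointer is just an address inside this process.
buf = ctypes.create_string_buffer(b"hello")
address = ctypes.addressof(buf)
# Sending `address` to another machine would be meaningless there;
# the pointed-to bytes must be sent instead.
payload = ctypes.string_at(address, 5)
```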
How should data be represented for efficient communication?
- Decide on the data types to support: base types, flat types, complex types.
- Determine encoding methods for data transmission and decoding methods for data reception.
- Use self-describing encodings (type tags sent with the data) or implicit descriptions (both ends already know the types, so no tags are sent).
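A sketch contrasting the two options, using hypothetical one-byte tags (`TAG_INT`, `TAG_STR`) invented for this example. The tagged encoding is larger but can be decoded without prior knowledge; the implicit encoding relies entirely on an agreement between the two ends.

```python
import struct

# Hypothetical one-byte type tags, invented for this sketch.
TAG_INT, TAG_STR = 1, 2

def encode_tagged(value) -> bytes:
    """Self-describing: every value is preceded by a type tag."""
    if isinstance(value, int):
        return struct.pack(">Bi", TAG_INT, value)
    data = value.encode("utf-8")
    return struct.pack(">BI", TAG_STR, len(data)) + data

def decode_tagged(data: bytes):
    """The receiver learns the type from the tag, with no prior agreement."""
    tag = data[0]
    if tag == TAG_INT:
        return struct.unpack_from(">i", data, 1)[0]
    if tag == TAG_STR:
        (length,) = struct.unpack_from(">I", data, 1)
        return data[5:5 + length].decode("utf-8")
    raise ValueError(f"unknown tag {tag}")

def encode_implicit(value: int) -> bytes:
    """Implicit: both ends already agree this field is a 32-bit int, so no tag."""
    return struct.pack(">i", value)

def decode_implicit(data: bytes) -> int:
    return struct.unpack(">i", data)[0]
```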
What is the role of stub generation in data representation?
- Systems generate stub code from an independent specification (IDL).
- IDL (Interface Description Language) describes an interface in a language-neutral way.
- Stub generation separates logical data description from dispatching, marshaling/unmarshaling, and data wire format.
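To make the separation concrete, here is a hypothetical Python sketch in which a dictionary stands in for an IDL file and client stubs are built from it at runtime. Real IDL compilers emit source code instead and also handle dispatching and transport, which are omitted here; the names `INTERFACE`, `ENCODERS`, and `generate_stub` are invented for illustration.

```python
import struct

# Hypothetical interface description standing in for an IDL file:
# procedure name -> list of parameter types.
INTERFACE = {"add": ["int", "int"], "greet": ["string"]}

# Wire-format layer: how each logical type is encoded (XDR-like, big-endian).
def _enc_int(v):
    return struct.pack(">i", v)

def _enc_string(v):
    data = v.encode("utf-8")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad

ENCODERS = {"int": _enc_int, "string": _enc_string}

def generate_stub(name, param_types):
    """Build a client-side marshalling stub from the description alone."""
    encoders = [ENCODERS[t] for t in param_types]

    def stub(*args):
        if len(args) != len(encoders):
            raise TypeError(f"{name} expects {len(encoders)} arguments")
        return b"".join(enc(arg) for enc, arg in zip(encoders, args))

    return stub

stubs = {name: generate_stub(name, types) for name, types in INTERFACE.items()}
```

Changing the wire format only touches `ENCODERS`; changing the interface only touches `INTERFACE`.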
Describe the key features of XDR’s approach to standardizing data representations.
- XDR defines a single byte order (Big Endian) and floating-point representation (IEEE).
- It decouples programs creating/sending portable data from those using/receiving it.
- New machines or languages don’t affect existing programs; they just need to convert between standard and local representations.
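A minimal Python sketch of converting between local values and XDR's canonical form for two basic types. It relies only on the standard `struct` module, whose `>`-prefixed formats are big-endian with IEEE 754 floats; the function names are hypothetical and this is not the XDR library API.

```python
import struct

def xdr_encode_int(v: int) -> bytes:
    # XDR integer: 4 bytes, two's complement, big-endian.
    return struct.pack(">i", v)

def xdr_decode_int(b: bytes) -> int:
    return struct.unpack(">i", b)[0]

def xdr_encode_double(v: float) -> bytes:
    # XDR double: 8-byte IEEE 754 double precision, big-endian.
    return struct.pack(">d", v)

def xdr_decode_double(b: bytes) -> float:
    return struct.unpack(">d", b)[0]
```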
What are the canonical data types defined by XDR?
- Basic data: Integer, Unsigned Integer, Hyper Integer and Unsigned Hyper Integer (64-bit), Floating-Point, Void.
- Variable size data: Fixed-Length Opaque Data, Variable-Length Opaque Data, String.
- Composed data: Fixed-Length Array, Variable-Length Array, Structure, Discriminated Union.
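The encoding rules for several of these types can be sketched in Python, following the layout described in RFC 4506: every item is padded with zero bytes to a multiple of four, and variable-size data carries a length or count prefix. The function names and the int-or-string union are hypothetical. A structure is simply its components encoded one after another.

```python
import struct

def _pad(n: int) -> bytes:
    # XDR pads every item to a multiple of 4 bytes with zero bytes.
    return b"\x00" * ((4 - n % 4) % 4)

def xdr_fixed_opaque(data: bytes) -> bytes:
    return data + _pad(len(data))               # no length prefix: size agreed in advance

def xdr_var_opaque(data: bytes) -> bytes:
    return struct.pack(">I", len(data)) + data + _pad(len(data))

def xdr_string(s: str) -> bytes:
    return xdr_var_opaque(s.encode("ascii"))    # same layout as variable-length opaque

def xdr_hyper(v: int) -> bytes:
    return struct.pack(">q", v)                 # 64-bit hyper integer

def xdr_var_int_array(values) -> bytes:
    return struct.pack(f">I{len(values)}i", len(values), *values)  # count, then elements

def xdr_union_int_or_string(value) -> bytes:
    # Hypothetical discriminated union: discriminant 0 -> int arm, 1 -> string arm.
    if isinstance(value, int):
        return struct.pack(">i", 0) + struct.pack(">i", value)
    return struct.pack(">i", 1) + xdr_string(value)
```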
What is the function of the XDR Library in data representation?
- XDR library solves data portability problems by transforming data to/from a canonical format.
- It allows reading and writing arbitrary C constructs in a consistent and well-documented manner.
- The library consists of functions for encoding/decoding data, known as filters, each tied to a defined data type.
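In the C library, a filter such as `xdr_int` takes an XDR stream whose direction decides whether the call encodes or decodes. The Python sketch below mimics that idea; `XdrStream`, `xdr_int`, and `xdr_point` here are simplified stand-ins, not the real C signatures.

```python
import struct

ENCODE, DECODE = "encode", "decode"   # loosely mirrors XDR_ENCODE / XDR_DECODE in C

class XdrStream:
    """Minimal in-memory stream with a direction, assumed for this sketch."""
    def __init__(self, op, data=b""):
        self.op = op
        self.buf = bytearray(data)
        self.pos = 0

def xdr_int(xdrs, value=None):
    """One filter for both directions: encodes `value`, or decodes and returns one."""
    if xdrs.op == ENCODE:
        xdrs.buf += struct.pack(">i", value)
        return value
    (value,) = struct.unpack_from(">i", xdrs.buf, xdrs.pos)
    xdrs.pos += 4
    return value

def xdr_point(xdrs, point=(None, None)):
    # A composed filter: the same calls, in the same order, serve both directions.
    x = xdr_int(xdrs, point[0])
    y = xdr_int(xdrs, point[1])
    return (x, y)

# Usage: encode a point, then decode it with the very same filter.
out = XdrStream(ENCODE)
xdr_point(out, (3, -4))
decoded = xdr_point(XdrStream(DECODE, bytes(out.buf)))
```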
Describe the types of data filters provided by XDR.
- Basic Data Filters: Handle types like char, short, int, long, float, double, and void.
- Variable Size Data Filters: For handling fixed-length opaque data, variable-length opaque data, and strings.
- Composed Data Filters: Used for fixed-length arrays, variable-length arrays, structures, and discriminated unions.
- Pointer Filters: Manage data structures involving pointers, with functions for transformation and memory allocation.
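Pointers themselves are never sent. In XDR's optional-data encoding, a pointer becomes a four-byte boolean (non-NULL or NULL) followed by the pointed-to object when present, and the decoder allocates new memory for it. A Python sketch for a linked list, with a hypothetical `Node` type standing in for a C struct; the list is walked iteratively, which yields the same bytes as the recursive definition.

```python
import struct
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    next_node: Optional["Node"] = None

def encode_list(node: Optional[Node]) -> bytes:
    # Each pointer becomes a 4-byte boolean; the node follows only if it is TRUE.
    out = bytearray()
    while node is not None:
        out += struct.pack(">Ii", 1, node.value)
        node = node.next_node
    out += struct.pack(">I", 0)      # the terminating NULL pointer
    return bytes(out)

def decode_list(data: bytes) -> Optional[Node]:
    # The decoder allocates a fresh node for every TRUE flag it reads.
    head = tail = None
    pos = 0
    while True:
        (present,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if not present:
            return head
        (value,) = struct.unpack_from(">i", data, pos)
        pos += 4
        node = Node(value)
        if head is None:
            head = node
        else:
            tail.next_node = node
        tail = node

original = Node(1, Node(2))
wire = encode_list(original)
```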
How is XDR used in data transfer?
- XDR-encoded data contains no metadata (no type tags), so the type of a value cannot be determined from the binary data alone.
- Client and server must agree on the format of transferred data.
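- XDR specification files used with RPC (Remote Procedure Call) are compiled (e.g., by rpcgen) into matching encoder/decoder routines, so both sides stay compatible.
- The same filter code is used for both encoding and decoding, so the filter functions must be called in the same order on both sides.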
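Because no type information travels with the data, the same bytes mean different things depending on what the peers agreed. A short Python demonstration with arbitrary values, including what happens when two fields are decoded in the wrong order:

```python
import struct

wire = b"\x3f\x80\x00\x00"               # four bytes, no type information attached
as_int = struct.unpack(">i", wire)[0]     # if the peers agreed on a 32-bit int
as_float = struct.unpack(">f", wire)[0]   # if the peers agreed on a 32-bit float

# A record encoded as (int age, float score):
record = struct.pack(">if", 30, 1.5)
right = struct.unpack(">if", record)      # decoded in the agreed order
wrong = struct.unpack(">fi", record)      # wrong order: no error, just garbage
print(as_int, as_float, right, wrong)
```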
What are the main features of Google Protocol Buffers?
- Defined by Google and widely used internally and externally.
- Supports common types and service definitions.
- Natively generates C++, Java, and Python code, with over 20 other languages supported by third parties.
- Efficient binary encoding (significantly smaller and faster to process than XML) and a human-readable text encoding.
- It’s not a full RPC system but handles marshalling, with many third-party RPC implementations available.
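The core of the binary encoding can be sketched by hand. Following the wire format in the Protocol Buffers encoding documentation, each field is a key `(field_number << 3) | wire_type` followed by the value, and integers use base-128 varints. This is an illustration, not the official `protobuf` library, and it handles only non-negative integers and strings.

```python
VARINT, LEN = 0, 2   # wire types used in this sketch

def encode_varint(n: int) -> bytes:
    # 7 bits per byte, least significant group first; the high bit is set
    # on every byte except the last (non-negative values only here).
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_key(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)

def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_key(field_number, VARINT) + encode_varint(value)

def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_key(field_number, LEN) + encode_varint(len(data)) + data
```

For comparison, field 1 = 150 takes 3 bytes here, while an XML element such as `<a>150</a>` takes 10.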
What properties characterize Google Protocol Buffers?
- Efficient binary serialization.
- Support for protocol evolution, allowing addition of new parameters.
- The order of specified parameters is not important, and non-essential parameters can be skipped.
- Supports moderately complex structures (such as nested messages and repeated fields) and provides compile-time error checking.
- Used for RPC calls, serializing data to non-relational databases, and as a long-term storage format due to its backward compatibility.
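Protocol evolution works because every field carries its number and wire type, so a reader can skip fields it does not know. The Python sketch below plays an "old" reader that only knows field 1; the schema, the field name `id`, and the sample messages are hypothetical, and unknown fields are simply discarded here.

```python
def decode_varint(data: bytes, pos: int):
    """Return (value, new_position) for a base-128 varint starting at pos."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

# Hypothetical "old" schema: this reader only knows field 1.
KNOWN_FIELDS = {1: "id"}

def parse_old(data: bytes) -> dict:
    fields = {}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:                  # varint
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:                # length-delimited: skip by its length
            length, pos = decode_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 1:                # fixed 64-bit
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 5:                # fixed 32-bit
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in KNOWN_FIELDS:    # unknown fields are skipped, not errors
            fields[KNOWN_FIELDS[field_number]] = value
    return fields
```

A newer sender that adds field 2 does not break this reader, field order does not matter, and an absent field simply does not appear in the result.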