08 - External data representation Flashcards
Explain the difference between internal and transferred data representation.
- Internal data is represented as data structures, arrays, or objects in programming languages like C, Java, or Python.
- Transferred data is represented as byte sequences, so structured data must be flattened before transmission.
What are the key aspects of data transmission format?
- Agreement on a common format for the transmitted data.
- Conversion of data to and from that format by the transmitter and receiver.
- Alternatively, data is transmitted in the sender’s format and converted by the receiver.
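Below is a minimal C sketch of the receiver-converts approach. It assumes a hypothetical one-byte order tag at the start of each message, and the tag values and function names are illustrative, not part of any standard. The sender copies the value in its native layout. The receiver swaps bytes only when the sender's order differs from its own. The sketch assumes the host is either big- or little-endian.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical order tags carried in the first byte of each message. */
enum { ORDER_BIG = 0, ORDER_LITTLE = 1 };

/* Detects the host byte order (assumes the host is big- or little-endian). */
static int host_order(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1 ? ORDER_LITTLE : ORDER_BIG;
}

/* Reverses the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Sender: writes its own order tag, then the value in native layout (no conversion). */
static void send_u32(uint32_t v, unsigned char out[5]) {
    out[0] = (unsigned char)host_order();
    memcpy(out + 1, &v, 4);
}

/* Receiver: converts only if the sender's order differs from the host's. */
static uint32_t recv_u32(const unsigned char in[5]) {
    uint32_t v;
    memcpy(&v, in + 1, 4);
    return in[0] == host_order() ? v : swap32(v);
}
```

When both ends share a byte order, no conversion happens at all. The agreed-common-format approach always converts on both sides instead.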
What are data serialization and deserialization in the context of data transfer, and what are their requirements?
- Serialization involves converting data structures into a format that can be stored or transmitted.
- Deserialization is the reverse process, where the received or stored data is converted back into usable data structures.
- Requirements: correctness, efficiency, interoperability, and ease of use.
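The sketch below shows serialization and deserialization of a small C structure that contains a pointer. The wire layout is an assumption for illustration: a 4-byte big-endian id, a 2-byte length, then the name bytes. The names `struct person`, `person_serialize` and `person_deserialize` are hypothetical. Serialization copies the pointed-to characters into the byte sequence. Deserialization allocates fresh memory for them on the receiving side.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* In-memory form: 'name' is a pointer, meaningless on another machine. */
struct person {
    uint32_t id;
    char *name;
};

/* Layout: id (4 bytes, big-endian), name length (2 bytes, big-endian), name bytes.
   Returns bytes written, or 0 if the name is too long or buf is too small. */
static size_t person_serialize(const struct person *p, unsigned char *buf, size_t cap) {
    size_t len = strlen(p->name);
    if (len > 0xFFFF || cap < 6 + len) return 0;
    buf[0] = (unsigned char)(p->id >> 24);
    buf[1] = (unsigned char)(p->id >> 16);
    buf[2] = (unsigned char)(p->id >> 8);
    buf[3] = (unsigned char)p->id;
    buf[4] = (unsigned char)(len >> 8);
    buf[5] = (unsigned char)len;
    memcpy(buf + 6, p->name, len);   /* copy the pointed-to data, not the pointer */
    return 6 + len;
}

/* Rebuilds a person; allocates p->name, which the caller must free.
   Returns 1 on success, 0 on malformed input or allocation failure. */
static int person_deserialize(const unsigned char *buf, size_t n, struct person *p) {
    if (n < 6) return 0;
    size_t len = ((size_t)buf[4] << 8) | buf[5];
    if (n < 6 + len) return 0;
    p->id = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
            ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    p->name = malloc(len + 1);
    if (p->name == NULL) return 0;
    memcpy(p->name, buf + 6, len);
    p->name[len] = '\0';
    return 1;
}
```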
Discuss the limitations of CORBA, Java Object Serialization, DCOM/COM+, JSON, Plain Text, and XML in data representation.
- CORBA: Over-designed and heavyweight.
- Java Object Serialization: Tailored to the Java environment.
- DCOM/COM+: Tailored to the Windows environment.
- JSON, Plain Text, XML: Lack a formal protocol description, have high parsing overhead, and require maintenance by the programmer.
Explain the concept of endianness in data representation.
- Endianness refers to the order of bytes in a multi-byte number.
- Big-endian stores higher-order bytes at lower memory addresses.
- Little-endian stores lower-order bytes at lower memory addresses.
- In networking, multi-byte protocol fields such as addresses are always sent in big-endian order (network byte order).
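Here is a small C sketch that makes the two byte orders concrete. `host_is_little_endian` checks how the host stores a known 32-bit value. `put_be32` and `get_be32` are hypothetical helper names; they write and read a value in big-endian order using shifts, so the result is the same on any host.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte at the lowest address. */
static int host_is_little_endian(void) {
    const uint32_t probe = 0x01020304u;
    unsigned char bytes[4];
    memcpy(bytes, &probe, sizeof bytes);
    return bytes[0] == 0x04;
}

/* Writes v in big-endian (network) order, independent of the host's order. */
static void put_be32(unsigned char out[4], uint32_t v) {
    out[0] = (unsigned char)(v >> 24);  /* most significant byte at the lowest address */
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Reads a big-endian 32-bit value back into host representation. */
static uint32_t get_be32(const unsigned char in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

On POSIX systems, `htonl` and `ntohl` from `<arpa/inet.h>` perform the same host-to-network conversion for 32-bit values.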
What challenges arise in data representation due to heterogeneity in communication?
- Data layouts differ across machines: alignment, data type sizes, and pointer representation vary.
- Alignment on word boundaries can change the size of a structure between machines.
- Pointers, while convenient, have no meaning outside the machine (address space) that defined them.
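The following C sketch illustrates the alignment problem under a typical ABI. The compiler usually pads `struct sample` so that `value` starts on a 4-byte boundary, making `sizeof` commonly 8 rather than 5. The exact layout is implementation-defined, though. Encoding each field explicitly into a fixed 5-byte layout avoids sending padding whose size depends on the machine. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Padding is usually inserted after 'tag'; the exact layout is implementation-defined. */
struct sample {
    uint8_t  tag;
    uint32_t value;
};

/* Explicit wire layout: exactly 5 bytes, no padding, value in big-endian order. */
enum { SAMPLE_WIRE_SIZE = 5 };

static void sample_encode(const struct sample *s, unsigned char out[SAMPLE_WIRE_SIZE]) {
    out[0] = s->tag;
    out[1] = (unsigned char)(s->value >> 24);
    out[2] = (unsigned char)(s->value >> 16);
    out[3] = (unsigned char)(s->value >> 8);
    out[4] = (unsigned char)s->value;
}
```

Sending the raw struct with `memcpy` would transmit `sizeof(struct sample)` bytes, including padding. The receiver might not even agree on that size.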
How should data be represented for efficient communication?
- Decide on the data types to support: base types, flat types, complex types.
- Determine encoding methods for data transmission and decoding methods for data reception.
- Use self-describing encodings (tags) or implicit descriptions (type knowledge shared by both ends) for data encoding.
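The difference between the two encoding styles can be sketched in C. The tag values and function names are hypothetical. The tagged form spends one extra byte per value so that a generic reader can decode it by dispatching on the tag. The implicit form is smaller, but only a receiver that already knows the value is a 16-bit integer can decode it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags for a self-describing encoding. */
enum { TAG_U8 = 1, TAG_U16 = 2 };

/* Self-describing: a type tag precedes the big-endian value. */
static size_t encode_tagged_u16(uint16_t v, unsigned char *out) {
    out[0] = TAG_U16;
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)v;
    return 3;
}

/* Implicit: only the value; the reader must know its type in advance. */
static size_t encode_implicit_u16(uint16_t v, unsigned char *out) {
    out[0] = (unsigned char)(v >> 8);
    out[1] = (unsigned char)v;
    return 2;
}

/* Generic reader for the tagged form. Returns bytes consumed, or 0 on error. */
static size_t decode_tagged(const unsigned char *in, size_t n, uint32_t *value) {
    if (n < 1) return 0;
    switch (in[0]) {
    case TAG_U8:
        if (n < 2) return 0;
        *value = in[1];
        return 2;
    case TAG_U16:
        if (n < 3) return 0;
        *value = ((uint32_t)in[1] << 8) | in[2];
        return 3;
    default:
        return 0;   /* unknown tag */
    }
}
```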
What is the role of stub generation in data representation?
- Systems generate stub code from an independent specification (IDL).
- IDL (Interface Description Language) describes an interface in a language-neutral way.
- Stub generation separates logical data description from dispatching, marshaling/unmarshaling, and data wire format.
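To show what generated stub code does, here is a hand-written C sketch for a hypothetical IDL declaration `int32 add(int32 a, int32 b)`. The procedure number, message layout and function names are assumptions, and the network transport is omitted. The client stub marshals the procedure number and arguments. The server-side dispatcher unmarshals them, calls the implementation, and marshals the result.

```c
#include <stdint.h>

/* Hypothetical procedure number assigned to add() by the interface definition. */
enum { PROC_ADD = 1 };

/* Marshals a 32-bit integer in big-endian order. */
static void put_i32(unsigned char *p, int32_t v) {
    uint32_t u = (uint32_t)v;
    p[0] = (unsigned char)(u >> 24); p[1] = (unsigned char)(u >> 16);
    p[2] = (unsigned char)(u >> 8);  p[3] = (unsigned char)u;
}

/* Unmarshals it (assumes the usual two's-complement conversion back to int32_t). */
static int32_t get_i32(const unsigned char *p) {
    uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return (int32_t)u;
}

/* Server-side implementation written by the programmer. */
static int32_t add_impl(int32_t a, int32_t b) { return a + b; }

/* Client stub: procedure number followed by the arguments. */
static void add_marshal_request(int32_t a, int32_t b, unsigned char req[12]) {
    put_i32(req, PROC_ADD);
    put_i32(req + 4, a);
    put_i32(req + 8, b);
}

/* Server dispatcher: picks the procedure, unmarshals, calls, marshals the reply.
   Returns the reply length, or -1 for an unknown procedure. */
static int dispatch(const unsigned char req[12], unsigned char reply[4]) {
    if (get_i32(req) != PROC_ADD) return -1;
    put_i32(reply, add_impl(get_i32(req + 4), get_i32(req + 8)));
    return 4;
}
```

A stub generator would emit the marshaling and dispatch parts from the IDL. Only `add_impl` is written by hand.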
Describe the key features of XDR’s approach to standardizing data representations.
- XDR defines a single byte order (big-endian) and a single floating-point representation (IEEE 754).
- It decouples programs creating/sending portable data from those using/receiving it.
- New machines or languages don’t affect existing programs; they just need to convert between standard and local representations.
What are the canonical data types defined by XDR?
- Basic data: Integer, Unsigned Integer, (Hyper) Integer, Floating-Point, Void.
- Variable size data: Fixed-Length Opaque Data, Variable-Length Opaque Data, String.
- Composed data: Fixed-Length Array, Variable-Length Array, Structure, Discriminated Union.
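Two of the canonical types can be sketched directly from the XDR specification (RFC 4506). An integer is 4 bytes in two's complement, most significant byte first. A string is a 4-byte unsigned length, then the bytes, then zero padding up to a multiple of 4. The helper names below are hypothetical and are not the XDR library's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XDR unsigned integer: 4 bytes, most significant byte first. */
static size_t put_xdr_uint(unsigned char *out, uint32_t u) {
    out[0] = (unsigned char)(u >> 24);
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
    return 4;
}

/* XDR integer: same layout, two's complement bit pattern. */
static size_t put_xdr_int(unsigned char *out, int32_t v) {
    return put_xdr_uint(out, (uint32_t)v);
}

/* XDR string: length, bytes, then 0 to 3 zero bytes so the total is a multiple of 4. */
static size_t put_xdr_string(unsigned char *out, const char *s) {
    uint32_t len = (uint32_t)strlen(s);
    size_t pad = (4 - len % 4) % 4;
    put_xdr_uint(out, len);
    memcpy(out + 4, s, len);
    memset(out + 4 + len, 0, pad);
    return 4 + len + pad;
}
```

For example, the string "hello" occupies 12 bytes: a 4-byte length of 5, five characters and three padding bytes.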
What is the function of the XDR Library in data representation?
- XDR library solves data portability problems by transforming data to/from a canonical format.
- It allows reading and writing arbitrary C constructs in a consistent and well-documented manner.
- The library's encoding/decoding functions, which operate on the defined data structures, are known as filters.
Describe the types of data filters provided by XDR.
- Basic Data Filters: Handle types like char, short, int, long, float, double, and void.
- Variable Size Data Filters: For handling fixed-length opaque data, variable-length opaque data, and strings.
- Composed Data Filters: Used for fixed-length arrays, variable-length arrays, structures, and discriminated unions.
- Pointer Filters: Manage data structures involving pointers, with functions for transformation and memory allocation.
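A pointer filter cannot send the address itself. XDR instead encodes optional data as a boolean followed by the pointed-to object when the boolean is true; the real library provides filters such as `xdr_pointer` for this. The C sketch below applies that rule to a hypothetical linked list, with illustrative names. Encoding walks the list, and decoding allocates one node for each "present" flag.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory linked list; 'next' cannot be sent as-is. */
struct node {
    int32_t value;
    struct node *next;
};

static void put_u32(unsigned char *p, uint32_t u) {
    p[0] = (unsigned char)(u >> 24); p[1] = (unsigned char)(u >> 16);
    p[2] = (unsigned char)(u >> 8);  p[3] = (unsigned char)u;
}

static uint32_t get_u32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Each pointer becomes a 4-byte boolean (1 = present) followed by the node's value;
   a 0 flag encodes NULL. The caller must supply a large enough buffer. */
static size_t list_encode(const struct node *n, unsigned char *out) {
    size_t off = 0;
    for (; n != NULL; n = n->next) {
        put_u32(out + off, 1);
        put_u32(out + off + 4, (uint32_t)n->value);
        off += 8;
    }
    put_u32(out + off, 0);
    return off + 4;
}

/* Allocates one node per "present" flag; stops early on truncated input
   or allocation failure. */
static struct node *list_decode(const unsigned char *in, size_t len) {
    struct node *head = NULL;
    struct node **tail = &head;
    size_t off = 0;
    while (off + 8 <= len && get_u32(in + off) == 1) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL) break;
        n->value = (int32_t)get_u32(in + off + 4);
        n->next = NULL;
        *tail = n;
        tail = &n->next;
        off += 8;
    }
    return head;
}

/* Releases the nodes allocated by list_decode. */
static void list_free(struct node *n) {
    while (n != NULL) {
        struct node *next = n->next;
        free(n);
        n = next;
    }
}
```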
How is XDR used in data transfer?
- XDR-encoded data carries no metadata, so the type of a value cannot be determined from the binary data alone.
- Client and server must agree on the format of transferred data.
- XDR specification files used by RPC (Remote Procedure Call) tools produce compatible encoders and decoders for both sides.
- The same code is used for both encoding and decoding, so the filter functions must be called in the same order on both sides.
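In the XDR library, one filter per type serves both directions. For example, `xdr_int` operates on an XDR stream that `xdrmem_create` set up in `XDR_ENCODE` or `XDR_DECODE` mode. The self-contained C sketch below imitates that design with hypothetical names (`struct xstream`, `x_int`, `x_point`) rather than calling the real library. Because `x_point` calls the basic filter in a fixed order, the same function both writes and reads a `struct point`, and the two sides stay in sync.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for XDR's stream handle: a direction flag and a buffer. */
enum xop { X_ENCODE, X_DECODE };

struct xstream {
    enum xop op;
    unsigned char *buf;
    size_t pos, cap;
};

/* Basic filter: encodes or decodes one 32-bit integer depending on xs->op.
   Returns 1 on success, 0 if the buffer is exhausted. */
static int x_int(struct xstream *xs, int32_t *v) {
    if (xs->pos + 4 > xs->cap) return 0;
    unsigned char *p = xs->buf + xs->pos;
    if (xs->op == X_ENCODE) {
        uint32_t u = (uint32_t)*v;
        p[0] = (unsigned char)(u >> 24); p[1] = (unsigned char)(u >> 16);
        p[2] = (unsigned char)(u >> 8);  p[3] = (unsigned char)u;
    } else {
        uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8) | (uint32_t)p[3];
        *v = (int32_t)u;
    }
    xs->pos += 4;
    return 1;
}

struct point { int32_t x, y; };

/* Composed filter: one call sequence for both directions keeps encoder and decoder in sync. */
static int x_point(struct xstream *xs, struct point *pt) {
    return x_int(xs, &pt->x) && x_int(xs, &pt->y);
}
```

Swapping the two `x_int` calls on only one side would silently exchange `x` and `y`. Nothing in the data itself reveals the mistake, because XDR carries no metadata.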
What are the main features of Google Protocol Buffers?
- Defined by Google and widely used internally and externally.
- Supports common types and service definitions.
- Natively generates C++, Java, and Python code, with over 20 other languages supported by third parties.
- Efficient binary encoding and readable text encoding, significantly smaller and faster to process than XML.
- It’s not a full RPC system but handles marshalling, with many third-party RPC implementations available.
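Much of the compact binary encoding comes from varints. Below is a minimal C sketch of the base-128 varint encoding used by Protocol Buffers; the function names are hypothetical. Each byte carries 7 bits of the value, least significant group first. The high bit of a byte signals that more bytes follow.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes v as a varint; returns bytes written (at most 10). */
static size_t varint_encode(uint64_t v, unsigned char *out) {
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (unsigned char)((v & 0x7F) | 0x80);  /* 7 bits + continuation bit */
        v >>= 7;
    }
    out[n++] = (unsigned char)v;                        /* last byte: high bit clear */
    return n;
}

/* Decodes a varint; returns bytes consumed, or 0 if truncated or longer than 10 bytes. */
static size_t varint_decode(const unsigned char *in, size_t len, uint64_t *v) {
    uint64_t result = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        result |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0) {
            *v = result;
            return i + 1;
        }
    }
    return 0;
}
```

For example, 300 encodes as the two bytes 0xAC 0x02, and any value below 128 takes a single byte. A fixed-size 32-bit encoding would always use 4 bytes.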
What properties characterize Google Protocol Buffers?
- Efficient binary serialization.
- Support for protocol evolution, allowing addition of new parameters.
- The order of specified parameters is not important, and non-essential parameters can be skipped.
- Supports somewhat complex structures and provides compile-time error checking.
- Used for RPC calls, serializing data to non-relational databases, and as a long-term storage format due to its backward compatibility.
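Protocol evolution works because every field on the wire starts with a key holding its field number and wire type. A reader can therefore skip fields it does not know. The C sketch below uses hypothetical names and does not handle groups (wire types 3 and 4). It acts as an old reader that understands only field 1 and skips everything else, in whatever order the fields arrive.

```c
#include <stddef.h>
#include <stdint.h>

/* Base-128 varint decoder; returns bytes consumed, or 0 on truncated input. */
static size_t varint_decode(const unsigned char *in, size_t len, uint64_t *v) {
    uint64_t result = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        result |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0) {
            *v = result;
            return i + 1;
        }
    }
    return 0;
}

/* Hypothetical "old" reader that only knows field 1 (a varint id). Unknown fields,
   e.g. ones added in a newer .proto version, are skipped using their wire type.
   Returns 1 on success, 0 on malformed input. */
static int read_id(const unsigned char *in, size_t len, uint64_t *id) {
    size_t pos = 0;
    while (pos < len) {
        uint64_t key, val;
        size_t n = varint_decode(in + pos, len - pos, &key);
        if (n == 0) return 0;
        pos += n;
        switch (key & 7) {                        /* low 3 bits: wire type */
        case 0:                                   /* varint */
            n = varint_decode(in + pos, len - pos, &val);
            if (n == 0) return 0;
            pos += n;
            if ((key >> 3) == 1) *id = val;      /* the only field this reader knows */
            break;
        case 1:                                   /* fixed 64-bit: skip 8 bytes */
            if (len - pos < 8) return 0;
            pos += 8;
            break;
        case 2:                                   /* length-delimited: skip payload */
            n = varint_decode(in + pos, len - pos, &val);
            if (n == 0 || val > len - pos - n) return 0;
            pos += n + (size_t)val;
            break;
        case 5:                                   /* fixed 32-bit: skip 4 bytes */
            if (len - pos < 4) return 0;
            pos += 4;
            break;
        default:                                  /* groups (3, 4) not handled here */
            return 0;
        }
    }
    return 1;
}
```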
What is the primary goal of Protocol Buffers?
- Provide a language- and platform-neutral way to specify and serialize data.
- Ensure the serialization process is efficient, extensible, and simple to use.
- Allow serialized data to be stored or transmitted over the network.
Describe the features of the Protocol Buffer Language.
- Messages contain uniquely numbered fields.
- Fields are represented by field-type, data-type, field-name, encoding-value, and an optional default value.
- Supports primitive, enumerated, and nested message data-types.
- Enables structuring data into a hierarchy.
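The unique field numbers are what actually appears on the wire. Each encoded field starts with the key `(field_number << 3) | wire_type`, which is itself varint-encoded, so field numbers 1 to 15 take a single byte. The formula and wire-type values come from the Protocol Buffers encoding; the enum and function names in this small C sketch are hypothetical.

```c
#include <stdint.h>

/* Protocol Buffers wire types (groups, types 3 and 4, omitted). */
enum wire_type { WT_VARINT = 0, WT_I64 = 1, WT_LEN = 2, WT_I32 = 5 };

/* Key written before each field: field number in the high bits, wire type in the low 3. */
static uint32_t field_key(uint32_t field_number, enum wire_type wt) {
    return (field_number << 3) | (uint32_t)wt;
}
```

For example, field 1 holding an integer starts with 0x08. Field 2 holding a string starts with 0x12.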
What are the different field types in Protocol Buffers?
- Required fields: Must be present exactly once in a well-formed message.
- Optional fields: Can appear zero or one time in a well-formed message.
- Repeated fields: Can appear any number of times (including zero) in a well-formed message.
What is the role of a .proto file in Protocol Buffers?
- The .proto file contains the specification of the message.
- It is compiled by the protoc tool, which generates code allowing programmers to manipulate the message type.
Explain the function of protoc-c in Protocol Buffers.
- The programmer writes the .proto file following the language syntax, for example:
message M1 { required string str = 1; optional int32 i = 2; }
- protoc-c then generates .h and .c files, including structures and functions for manipulating messages.
- Provides functions such as m1__init, m1__get_packed_size, m1__pack, m1__unpack, and m1__free_unpacked.
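The generated pack functions can be approximated by hand to see what goes on the wire for message M1. The sketch below is not protoc-c output: `M1Sketch` and the `m1_sketch_*` functions are hypothetical stand-ins, and the real generated struct also has a `ProtobufCMessage base` member. Field 1 is written as key 0x0A plus a length and the bytes. Field 2 is written as key 0x10 plus a varint, and only when `has_i` is set.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the generated struct for M1. */
typedef struct {
    char   *str;    /* required string str = 1 */
    int     has_i;  /* presence flag for optional int32 i = 2 */
    int32_t i;
} M1Sketch;

static size_t varint_size(uint64_t v) {
    size_t n = 1;
    while (v >= 0x80) { v >>= 7; n++; }
    return n;
}

static size_t put_varint(uint64_t v, unsigned char *out) {
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (unsigned char)((v & 0x7F) | 0x80);
        v >>= 7;
    }
    out[n++] = (unsigned char)v;
    return n;
}

/* Analogue of m1__get_packed_size. Negative int32 values are sign-extended
   to 64 bits on the wire, so they take 10 varint bytes. */
static size_t m1_sketch_packed_size(const M1Sketch *m) {
    size_t len = strlen(m->str);
    size_t n = 1 + varint_size(len) + len;                  /* key, length, bytes */
    if (m->has_i)
        n += 1 + varint_size((uint64_t)(int64_t)m->i);      /* key, varint */
    return n;
}

/* Analogue of m1__pack: writes the encoding and returns the bytes written. */
static size_t m1_sketch_pack(const M1Sketch *m, unsigned char *out) {
    size_t len = strlen(m->str), pos = 0;
    out[pos++] = (1 << 3) | 2;                              /* field 1, length-delimited */
    pos += put_varint(len, out + pos);
    memcpy(out + pos, m->str, len);
    pos += len;
    if (m->has_i) {                                         /* absent optional: nothing sent */
        out[pos++] = (2 << 3) | 0;                          /* field 2, varint */
        pos += put_varint((uint64_t)(int64_t)m->i, out + pos);
    }
    return pos;
}
```

With the real generated code, the usual pattern is to call m1__get_packed_size, allocate that many bytes, and call m1__pack. The receiver calls m1__unpack and later releases the result with m1__free_unpacked.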
What are the field rules in protoc-c?
- Required fields must appear exactly once in a well-formed message.
- Optional fields may appear zero or one time, with a boolean flag indicating their presence.
- Repeated fields can occur multiple times in a message, and their count is tracked.
How are optional fields handled in protoc-c?
- Optional fields can have default values defined in the .proto file.
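- If the .proto file specifies no default, the field takes a type-specific default value (zero for numeric types, the empty string for strings, false for booleans).
- For optional scalar fields, the generated C structure includes a boolean (has_<field>, e.g. has_i) indicating whether the field is present and should be transmitted.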
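Here is a hypothetical C sketch of the optional-field pattern for a field declared as `optional int32 i = 2 [default = 7]`; the default of 7 is an assumption for illustration. Initialization stores the default and clears the `has_i` flag. Setting a value also sets the flag, so an encoder knows to transmit the field. `OptSketch` and its functions are illustrative names, not generated code.

```c
#include <stdint.h>

/* Hypothetical struct shaped like protoc-c output for one optional field. */
typedef struct {
    int     has_i;  /* 1 if i was explicitly set (and should be transmitted) */
    int32_t i;
} OptSketch;

#define OPT_SKETCH_I_DEFAULT 7  /* assumed [default = 7] from the .proto file */

/* Like a generated init: the field holds its default, the presence flag is cleared. */
static void opt_sketch_init(OptSketch *m) {
    m->has_i = 0;
    m->i = OPT_SKETCH_I_DEFAULT;
}

/* Setting the field marks it as present. */
static void opt_sketch_set_i(OptSketch *m, int32_t v) {
    m->i = v;
    m->has_i = 1;
}
```

Reading `i` always yields a usable value. `has_i` distinguishes "explicitly set" from "left at the default".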
Explain the handling of repeated fields in protoc-c.
- Repeated fields represent arrays or lists in a message.
- The C structure includes a count of the number of elements and a pointer to the array.
- Memory management for repeated fields must be handled manually in C.
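protoc-c represents a repeated field as a count plus a pointer; for `repeated int32 values`, the members are `n_values` and `values`. The C sketch below uses a hypothetical `RepSketch` struct with that shape to show the manual memory management. It grows the array with `realloc` when appending and frees it afterwards. For messages returned by an unpack function, the library allocates the arrays, and the matching free_unpacked function releases them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical struct shaped like protoc-c output for: repeated int32 values = 3; */
typedef struct {
    size_t   n_values;  /* number of elements */
    int32_t *values;    /* heap array owned by the caller */
} RepSketch;

/* Appends one element, growing the array by one. Returns 1 on success, 0 on
   allocation failure (the existing array stays valid in that case). */
static int rep_sketch_add(RepSketch *m, int32_t v) {
    int32_t *grown = realloc(m->values, (m->n_values + 1) * sizeof *grown);
    if (grown == NULL) return 0;
    grown[m->n_values++] = v;
    m->values = grown;
    return 1;
}

/* Releases the array and resets the field to empty. */
static void rep_sketch_clear(RepSketch *m) {
    free(m->values);
    m->values = NULL;
    m->n_values = 0;
}
```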
Describe the use of enumerated fields in protoc-c.
- Enumerated fields allow a field to have one of a predefined list of values.
- Enums are defined in the .proto file and translated into C enums.
- Fields can be defined to use these enumerated types, providing a set of allowed values.
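Below is a minimal sketch of how a .proto enum maps to C, assuming a hypothetical `enum Color { RED = 0; GREEN = 1; BLUE = 2; }`. The names protoc-c generates carry a prefix derived from the enum name; the plain C enum below only mirrors that shape. On the wire the value travels as a varint of its number. The hypothetical helper `color_is_valid` shows how a receiver can restrict a field to the allowed set.

```c
#include <stdint.h>

/* Plain C mirror of the hypothetical .proto enum Color. */
typedef enum {
    COLOR_RED = 0,
    COLOR_GREEN = 1,
    COLOR_BLUE = 2
} Color;

/* Accepts only numbers that belong to the predefined list of values. */
static int color_is_valid(int32_t v) {
    switch (v) {
    case COLOR_RED: case COLOR_GREEN: case COLOR_BLUE:
        return 1;
    default:
        return 0;
    }
}

/* Maps a value back to its symbolic name. */
static const char *color_name(Color c) {
    switch (c) {
    case COLOR_RED:   return "RED";
    case COLOR_GREEN: return "GREEN";
    case COLOR_BLUE:  return "BLUE";
    }
    return "?";
}
```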
How are message types used in protoc-c?
- Message types are created for each specific use case, like RPC calls or data storage.
- Each message type has its own structure and set of fields.
- Fields can be required, optional, or repeated, and are manipulated using generated C code.