Describe features of structured, semi-structured and unstructured data Flashcards
Structured Data
Structured data is data that adheres to a fixed schema, so all of the data has the same fields or properties. Most commonly, the schema for structured data entities is tabular - in other words, the data is represented in one or more tables that consist of rows to represent each instance of a data entity, and columns to represent attributes of the entity.
-It is typically found in relational databases.
-Examples include data in SQL databases, Excel spreadsheets, and CSV files.
-Structured data is easy to query and analyze because of its organized nature.
-It adheres to a schema, defining the data types and relationships.
Use cases:
-Ideal for traditional database applications.
-Well-suited for scenarios requiring strict data integrity and relationships.
Azure Services: Azure SQL Database, Azure Database for MySQL, Azure Database for PostgreSQL.
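The fixed, tabular schema described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customer table, its columns, and the sample rows are assumptions made up for the example, and SQLite stands in for any relational database.

```python
import sqlite3

# In-memory database; table, columns, and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL,"
    " email TEXT)"
)

# Every row supplies the same set of columns defined by the schema.
conn.executemany(
    "INSERT INTO customer (id, first_name, last_name, email) VALUES (?, ?, ?, ?)",
    [
        (1, "Joe", "Jones", "joe@example.com"),
        (2, "Samir", "Nadoy", None),  # email is optional, but the column still exists
    ],
)

rows = conn.execute(
    "SELECT id, first_name, last_name, email FROM customer ORDER BY id"
).fetchall()

# The schema rejects a row that omits a required attribute.
try:
    conn.execute("INSERT INTO customer (id, email) VALUES (3, 'x@example.com')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows)
print("row without required fields rejected:", rejected)
```

Each row has the same four columns, and the insert that leaves out the NOT NULL columns is refused, which is the kind of integrity checking that makes structured data easy to query.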
Semi-structured Data
Semi-structured data is information that has some structure but allows for some variation between entity instances. For example, while most customers may have an email address, some might have multiple email addresses, and some might have none at all.
-Common formats include JSON (JavaScript Object Notation) and XML (eXtensible Markup Language).
-Semi-structured data is more flexible than structured data: individual records can add or omit fields without a change to a predefined schema.
Use cases:
-Commonly used in web development for exchanging data between a server and a web application.
-Suitable for scenarios where the data structure may evolve over time.
Azure Services: Azure Cosmos DB (supports JSON), Azure Blob Storage (can store JSON, XML, etc.).
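The customer email example above can be sketched as a small JSON document parsed with Python's standard json module. The customer names, the emails field, and the addresses are hypothetical values chosen for illustration.

```python
import json

# Three customer entities with the same general shape but varying fields
# (all names and addresses are made-up sample data).
raw = """
[
  {"name": "Joe Jones", "emails": ["joe@example.com", "jj@example.org"]},
  {"name": "Samir Nadoy", "emails": ["samir@example.com"]},
  {"name": "Ana Lopez"}
]
"""

customers = json.loads(raw)

# Code that reads semi-structured data has to tolerate missing fields,
# here by defaulting to an empty list when "emails" is absent.
email_counts = {c["name"]: len(c.get("emails", [])) for c in customers}

print(email_counts)
```

All three records are valid even though one has two addresses and one has none; the reader of the data, not a schema, handles the variation.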
Unstructured Data
Not all data is structured or even semi-structured. For example, documents, images, audio and video data, and binary files might not have a specific structure. This kind of data is referred to as unstructured data.
-Unstructured data is more challenging to analyze directly because it lacks a clear structure.
-Natural language processing (NLP) and machine learning techniques are often used to extract insights from unstructured data.
Use cases:
-Valuable for sentiment analysis, image recognition, and other AI-driven applications.
-Widely used in big data analytics for extracting insights from diverse sources.
Azure Services: Azure Blob Storage (for images, videos, documents), Azure AI services, formerly Azure Cognitive Services (for processing unstructured data like images and text).
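To illustrate why unstructured data needs processing before it can be analyzed, here is a deliberately naive sketch that pulls a rough sentiment signal out of free text. The review text and the POSITIVE and NEGATIVE word lists are assumptions invented for the example, and simple word counting is only a stand-in for real NLP or machine learning techniques.

```python
import re
from collections import Counter

# Raw bytes as they might be read from a file or blob; there are no fields
# or columns to query (the text itself is made-up sample data).
review = b"The delivery was late, but the support team was great. Great service overall!"

# Step 1: impose some structure by decoding and splitting into words.
text = review.decode("utf-8")
words = re.findall(r"[a-z']+", text.lower())
word_counts = Counter(words)

# Step 2: derive a crude signal from hypothetical keyword lists.
POSITIVE = {"great", "good", "excellent"}
NEGATIVE = {"late", "bad", "poor"}
score = sum(word_counts[w] for w in POSITIVE) - sum(word_counts[w] for w in NEGATIVE)

print(word_counts.most_common(3))
print("naive sentiment score:", score)
```

Nothing in the raw bytes says "sentiment"; the structure (word counts, a score) only exists after the processing step, which is why this kind of data is harder to analyze directly.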
Data Stores
Organizations typically store data in structured, semi-structured, or unstructured format to record details of entities (for example, customers and products), specific events (such as sales transactions), or other information in documents, images, and other formats. The stored data can then be retrieved for analysis and reporting later.
There are two broad categories of data store in common use: file stores and databases.