Foundations - Roles and Concepts Flashcards
What issues will organizations have?
Data processing in silos
Excessive data movement
Data duplication
Data Engineer role
Raw data into valuable insights
Design, develop, and maintain data architectures and ETL (extract, transform, load) processes
Getting Data from sources, Making it useful and Serving to stakeholders
What are the key functions of a data engineer (DE)?
Build and Maintain Data Infrastructure
Ingest data from Various sources
Prepare ingested data for analytics
Catalog and document curated datasets
Automate regular data flows
Ensure Data Quality, Security and Compliance
Build and Maintain Data Infrastructure
Set up databases, data lakes, and data warehouses on AWS services such as Amazon Simple Storage Service (Amazon S3), AWS Glue, and Amazon Redshift.
Ingest data from Various sources
Use tools like AWS Glue jobs or AWS Lambda functions to ingest data from databases, applications, files, and streaming devices into the centralized data platforms.
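As a rough illustration of the Lambda side of this card, the sketch below shows a handler that reads an S3 event notification and lists the objects that arrived. The function names and the returned shape are hypothetical; a real handler would copy or load each object into the central platform instead of only reporting it.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    # Hypothetical handler: this sketch only reports which objects arrived.
    objects = extract_s3_objects(event)
    return {"ingested": [f"s3://{bucket}/{key}" for bucket, key in objects]}
```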
Prepare ingested data for analytics
Use technologies like AWS Glue, Apache Spark, or Amazon EMR to prepare data by cleaning, transforming, and enriching it.
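The three preparation steps named on this card (cleaning, transforming, enriching) can be sketched in plain Python. This is only a conceptual illustration with made-up column names (`order_id`, `amount`, `country`); in practice the same steps would typically run as Spark or AWS Glue jobs over much larger datasets.

```python
def clean(records):
    """Trim whitespace from text fields and drop rows with no order id."""
    cleaned = []
    for row in records:
        stripped = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if not stripped.get("order_id"):
            continue
        cleaned.append(stripped)
    return cleaned


def transform(records):
    """Cast the amount column from text to float."""
    return [{**row, "amount": float(row["amount"])} for row in records]


def enrich(records, region_by_country):
    """Add a region column looked up from a reference mapping."""
    return [
        {**row, "region": region_by_country.get(row["country"], "UNKNOWN")}
        for row in records
    ]


def prepare(records, region_by_country):
    return enrich(transform(clean(records)), region_by_country)
```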
Catalog and document curated datasets
Use AWS Glue crawlers to determine the format and schema, group data into tables, and write metadata to the AWS Glue Data Catalog. Use metadata tagging in the Data Catalog for data governance and discoverability.
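To make "determine the format and schema" concrete, here is a toy schema-inference sketch. It is not how AWS Glue classifiers are implemented; it only assumes that sample rows arrive as dictionaries of strings and widens each column to the most general type seen. The type names and function names are illustrative.

```python
# Ordered from most specific to most general.
TYPE_RANK = {"bigint": 0, "double": 1, "string": 2}


def infer_type(value):
    """Guess a column type for a single text value."""
    for caster, name in ((int, "bigint"), (float, "double")):
        try:
            caster(value)
            return name
        except (ValueError, TypeError):
            continue
    return "string"


def infer_schema(rows):
    """Return {column: type}, widening a column when values disagree."""
    schema = {}
    for row in rows:
        for column, value in row.items():
            seen = infer_type(value)
            current = schema.get(column)
            if current is None or TYPE_RANK[seen] > TYPE_RANK[current]:
                schema[column] = seen
    return schema
```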
Automate regular data flows and pipelines
Simplify and accelerate data processing using services like AWS Glue workflows, AWS Lambda, or AWS Step Functions.
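One way to picture this automation is a small AWS Step Functions state machine that runs a Glue job and then invokes a Lambda function. The sketch below only builds the definition as a Python dictionary in Amazon States Language; the state names, job name, and function name are hypothetical, and deploying it would need an IAM role and a separate API or console step.

```python
import json


def build_pipeline_definition(glue_job_name, notify_function_name):
    """Build a two-step state machine definition: Glue job, then Lambda."""
    return {
        "Comment": "Illustrative pipeline: run a Glue job, then notify",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # The .sync suffix makes the state wait for the job to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Next": "NotifyCompletion",
            },
            "NotifyCompletion": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": notify_function_name},
                "End": True,
            },
        },
    }


def to_json(definition):
    """Serialize the definition to the JSON string Step Functions expects."""
    return json.dumps(definition, indent=2)
```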
Ensure Data Quality, Security and Compliance
Create access controls, establish authorization policies, and build monitoring processes. Use Amazon DataZone or AWS Lake Formation to manage and govern access to data using fine-grained controls. These controls help ensure that users get access with the right level of privileges and context.
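A fine-grained control in AWS Lake Formation can be as narrow as SELECT on specific columns of one table. The sketch below assembles the keyword arguments that could be passed to the boto3 Lake Formation client's `grant_permissions` call; the helper name, principal ARN, database, table, and column names are all hypothetical, and nothing here contacts AWS.

```python
def build_column_grant(principal_arn, database, table, columns):
    """Build arguments for a column-level SELECT grant in Lake Formation."""
    if not columns:
        raise ValueError("at least one column is required for a column-level grant")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Assumed usage (requires boto3 and AWS credentials, so it is left commented):
# import boto3
# boto3.client("lakeformation").grant_permissions(**build_column_grant(...))
```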
Chief data officer (CDO) - Responsibility
Builds a culture of using data to solve problems and accelerate innovation
Chief data officer (CDO) - Area of Interest
Data quality, data governance, data and artificial intelligence (AI) strategy, evangelizing the value of data to the business
Data architect - Responsibility
Architects technical solutions to meet business needs; focuses on solving complex data challenges to help the CDO deliver on their vision
Data Architect - Area of Interest
Data pipeline, data processing, data integration, data governance, and data catalogs
Data engineer - Responsibility
Delivers usable, accurate datasets to the organization in a secure and high-performing manner
Data Engineer - Area of Interest
The variety of tools used for building data pipelines, ease of use, configuration, and maintenance