Foundations - Roles and Concepts Flashcards
What issues will an organization face?
Data processing in silos
Excessive data movement
Data duplication
Data Engineer role
Turn raw data into valuable insights
Design, develop, and maintain data architectures and ETL processes
Get data from sources, make it useful, and serve it to stakeholders
What are the key functions of a data engineer (DE)?
Build and Maintain Data Infrastructure
Ingest data from Various sources
Prepare ingested data for analytics
Catalog and document curated datasets
Automate regular data flows
Ensure Data Quality, Security and Compliance
Build and Maintain Data Infrastructure
Set up databases, data lakes, and data warehouses using AWS services such as Amazon Simple Storage Service (Amazon S3), AWS Glue, and Amazon Redshift.
Ingest data from Various sources
Use tools like AWS Glue jobs or AWS Lambda functions to ingest data from databases, applications, files, and streaming devices into the centralized data platforms.
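As a minimal sketch of the Lambda side of ingestion, the handler below reads an S3 event notification and lists the objects that arrived. The bucket and key values used with it are hypothetical, and the step that would actually copy or load each object is left as a comment.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the S3 objects named in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    # A real function would now copy or load each object into the data platform.
    return {"ingested": objects}
```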
Prepare ingested data for analytics
Use technologies like AWS Glue, Apache Spark, or Amazon EMR to prepare data by cleaning, transforming, and enriching it.
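The three preparation steps can be illustrated in plain Python. This is a toy sketch, not Spark or AWS Glue code; the field names (order_id, country, amount) and the region lookup are assumptions made up for the example.

```python
def prepare(records, region_by_country):
    """Clean, transform, and enrich a list of raw order records."""
    prepared = []
    for rec in records:
        # Clean: skip rows that have no amount.
        if rec.get("amount") in (None, ""):
            continue
        # Transform: normalize the country code and convert the amount to a number.
        country = rec["country"].strip().upper()
        amount = round(float(rec["amount"]), 2)
        # Enrich: add a region from a lookup table.
        region = region_by_country.get(country, "UNKNOWN")
        prepared.append({
            "order_id": rec["order_id"],
            "country": country,
            "amount": amount,
            "region": region,
        })
    return prepared
```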
Catalog and document curated datasets
Use AWS Glue crawlers to determine the format and schema, group data into tables, and write metadata to the AWS Glue Data Catalog. Use metadata tagging in Data Catalog for data governance and discoverability
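One way to set up such a crawler is through boto3. The sketch below builds the request separately from the call so the request can be inspected without AWS access; the crawler name, role ARN, database, and S3 path are all hypothetical placeholders.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build the arguments for a crawler that scans one S3 path."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start(request):
    """Create the crawler and start it (requires AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
```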
Automate regular data flows and Pipelines
Simplify and accelerate data processing using services like AWS Glue workflows, AWS Lambda, or AWS Step Functions.
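A pipeline of this kind could be described for AWS Step Functions as a state machine definition. The sketch below builds a two-step definition (run a Glue job, then invoke a Lambda function) as a Python dictionary and serializes it to JSON; the job name "prepare-sales" and function name "notify-team" are hypothetical.

```python
import json


def build_pipeline_definition(glue_job_name, notify_function_name):
    """Return a state machine definition: run a Glue job, then call a Lambda."""
    return {
        "Comment": "Run a Glue job, then invoke a Lambda function",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # The .sync suffix makes the state wait until the job finishes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Next": "Notify",
            },
            "Notify": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": notify_function_name},
                "End": True,
            },
        },
    }


definition_json = json.dumps(
    build_pipeline_definition("prepare-sales", "notify-team"), indent=2
)
```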
Ensure Data Quality, Security and Compliance
Create access controls, establish authorization policies, and build monitoring processes. Use Amazon DataZone or AWS Lake Formation to manage and govern access to data using fine-grained controls. These controls help ensure access with the right level of privileges and context
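Fine-grained control in AWS Lake Formation can mean granting a principal SELECT on specific columns only. The sketch below builds such a grant request, which could then be passed to the boto3 Lake Formation client's grant_permissions call; the principal ARN, database, table, and column names are hypothetical.

```python
def column_grant(principal_arn, database, table, columns):
    """Build a request granting SELECT on selected columns of one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Only these columns become readable; others (for example PII) stay hidden.
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }
```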
Chief data officer (CDO) - Responsibility
Builds a culture of using data to solve problems and accelerate innovation
Chief Data Officer - Area of Interest
Data quality, data governance, data and artificial intelligence (AI) strategy, evangelizing the value of data to the business
Data architect - Responsibility
Architects technical solutions to meet business needs; focuses on solving complex data challenges to help the CDO deliver on their vision
Data Architect - Area of Interest
Data pipeline, data processing, data integration, data governance, and data catalogs
Data engineer - Responsibility
Delivers usable, accurate datasets to the organization in a secure and high-performing manner
Data Engineer - Area of Interest
The variety of tools used for building data pipelines, ease of use, configuration, and maintenance
Data security officer - Responsibility
Ensures that data security, privacy, and governance are strictly defined and adhered to
Data Security Officer - Area of Interest
Keeping information secure, complying with data privacy regulations, protecting personally identifiable information (PII), and applying fine-grained access controls and data masking
Data scientist - Responsibility
Constructs the means for quickly extracting business-focused insight from data for the business to make better decisions
Data Scientist - Area of Interest
Tools that simplify data manipulation and provide deeper insight than visualization tools, and tools that help build the machine learning (ML) pipeline
Data analyst - Responsibility
Reacts to market conditions in real time; must be able to find data and perform analytics quickly and easily
Data Analyst - Area of Interest
Querying data and performing analysis to create new business insights, and producing reports and visualizations that explain those insights