Database Specialty - Database Migration, DMS and SCT Flashcards
1
Q
Why database migration? (Use cases)
A
- Move from legacy to cloud
- Switch the database solution
- Replicate databases across regions
- Replicate to streaming platforms
- In-place DB upgrades
- Archive data to S3
2
Q
Migration scenarios
A
- On-premises → AWS cloud
- Relational → Non-relational
- DBs hosted on EC2 → Managed AWS services
3
Q
Migration tools
A
- Native tools (e.g. mysqldump)
- SCT (Schema Conversion Tool)
- AWS DMS (Database Migration Service)
- Migration Playbooks (migration templates + best practices)
4
Q
DMS – Database Migration Service
A
- Quickly and securely migrate databases to the AWS cloud; the service is resilient and self-healing
- The source database remains available during the migration
- Minimal to zero-downtime
- Costs as low as $3 / TB
- Supports homogeneous and heterogeneous migrations
- Continuous data replication using CDC (Change data capture)
5
Q
DMS architecture
A
- Endpoints include DB connection and credentials information
- Replication instance is an EC2 instance running in a VPC, can be Multi-AZ
- Replication task specifies actual data tables to migrate and data transformation rules
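The three building blocks above can be sketched as request payloads shaped like the parameters of the corresponding boto3 `create_*` DMS calls. All names, ARNs, and credentials below are placeholders, not real resources.

```python
import json

# 1) Endpoints hold the DB connection details and credentials.
source_endpoint = {
    "EndpointIdentifier": "src-oracle",          # hypothetical name
    "EndpointType": "source",
    "EngineName": "oracle",
    "ServerName": "onprem-db.example.com",       # placeholder host
    "Port": 1521,
    "Username": "dms_user",
    "Password": "example-password",
}

# 2) The replication instance is an EC2 instance inside a VPC;
#    MultiAZ adds a standby replica for failover.
replication_instance = {
    "ReplicationInstanceIdentifier": "dms-repl-1",
    "ReplicationInstanceClass": "dms.t3.medium",
    "MultiAZ": True,
}

# 3) The replication task ties endpoints and instance together and
#    carries the table mappings (which tables, what transformations).
replication_task = {
    "ReplicationTaskIdentifier": "migrate-sales",
    "SourceEndpointArn": "arn:aws:dms:...:endpoint:SRC",  # placeholder
    "TargetEndpointArn": "arn:aws:dms:...:endpoint:TGT",  # placeholder
    "ReplicationInstanceArn": "arn:aws:dms:...:rep:RI",   # placeholder
    "MigrationType": "full-load-and-cdc",
    "TableMappings": json.dumps({"rules": []}),  # filled in per task
}
```

This is only a shape sketch; real calls would go through `boto3.client("dms")` with valid ARNs and networking in place.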
6
Q
DMS – Database Migration Service > Points
A
- DMS can create target tables / primary keys if they don’t exist on the target
- Tables/keys can also be created manually
- Creates only those objects required for an efficient migration (e.g. primary keys / unique indexes) – this is called Basic Schema Copy
- Remaining DB elements/schema (e.g. secondary indexes, FK constraints, or data defaults) can be migrated using other tools
- e.g. use SCT or native schema export tools
7
Q
SCT – AWS Schema Conversion Tool
A
- Converts DB/DW schema from source to target (including procedures / views / secondary indexes / FKs and constraints)
- Mainly for heterogeneous DB migrations and DW migrations
- For homogeneous migrations, can use it for migration assessment
- Can be installed on EC2 or locally on your computer (closer to the source DB)
- Application conversion – can convert SQL statements embedded in the application code
- Script conversion – can convert ETL scripts (Oracle / Microsoft / Teradata scripts → Aurora / Redshift scripts)
- Can optimize schemas in Redshift (recommends distribution and sort keys, different from the native Redshift advisor)
9
Q
SCT – AWS Schema Conversion Tool Provides
A
- Provides an assessment report
- Granular report showing which objects can be converted automatically and which need manual intervention (with color codes)
- Can be used to create a business case for migration
- Identifies issues / limitations / actions for schema conversion
- For objects that cannot be converted automatically, SCT provides guidance to help you create the equivalent schema manually
10
Q
Workload Qualification Framework (WQF)
A
- Standalone app that is included with SCT
- For workload assessment
- Qualifies OLTP workloads based on
- Ease/complexity of migration
- Estimated time/effort required
- Recommends migration strategy and tools
- Integrated with SCT and DMS
- Just provide DB connection strings and WQF can generate the assessment reports
- Available as EC2 AMI
11
Q
DMS tasks (replication tasks)
A
- Lets you specify migration options
- Table mapping (transformation rules)
- Filters (to migrate data selectively)
- Migration type – full load, CDC, or both
- Supports data validation
- Monitoring options
- task status
- task’s control table
- Allows reloading table data (in case of errors)
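The table-mapping and filtering options above are expressed as a JSON document attached to the task. Below is a minimal sketch combining one selection rule (with a row filter) and one transformation rule (renaming a schema); the schema, table, and column names are examples only.

```python
import json

# Minimal DMS table-mapping document: a selection rule with a row filter
# plus a transformation rule. Names below are illustrative placeholders.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-orders",
            "object-locator": {"schema-name": "SALES", "table-name": "ORDERS"},
            "rule-action": "include",
            "filters": [
                {
                    "filter-type": "source",
                    "column-name": "REGION",
                    "filter-conditions": [
                        {"filter-operator": "eq", "value": "EU"}
                    ],
                }
            ],
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "rename-schema",
            "rule-target": "schema",
            "object-locator": {"schema-name": "SALES"},
            "rule-action": "rename",
            "value": "sales",
        },
    ]
}

# DMS expects the mapping as a JSON string in the task definition.
table_mappings_json = json.dumps(table_mappings)
```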
12
Q
DMS task assessment reports
A
- For pre-migration assessment (optional, recommended)
- To run an assessment, a task must be in a Stopped state
- Lets you know potential issues that might occur during a given migration
- Includes JSON output containing a summary and details of unsupported data types
13
Q
DMS migration types - Full Load
A
- Migrate existing data
- performs a one-time migration from source to target
14
Q
DMS migration types - CDC Only
A
- Ongoing replication
- Only replicates data changes from source to target, without migrating existing data
- CDC = Change data capture
15
Q
DMS migration types - Full Load + CDC
A
- Migrate existing data and replicate ongoing changes
- First, performs a one-time migration from source to target
- Then, continues replicating data changes
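The three migration types on these cards map to the `MigrationType` values of the DMS CreateReplicationTask API; a small lookup makes the correspondence explicit.

```python
# Mapping of the console wording to the MigrationType API values
# (per the AWS DMS CreateReplicationTask API).
MIGRATION_TYPES = {
    "Migrate existing data": "full-load",
    "Replicate data changes only": "cdc",
    "Migrate existing data and replicate ongoing changes": "full-load-and-cdc",
}
```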
16
Q
DMS – Good things to know
A
- Recommended to create only the primary keys before full load
- Secondary keys and FKs should be created only after full load is complete
- In full load, multiple tables are loaded in parallel, which can create issues if you have FKs (hence create them later)
- Secondary keys can slow down the full load operation (hence create them later)
- Enable Multi-AZ for ongoing replication (for high availability and failover support)
- DMS can read/write from/to encrypted DBs
17
Q
Migrating Large Tables
A
- Break the migration into multiple tasks
- Use row filtering on a key or partition key to create multiple tasks
- Example – if you have an integer primary key from 1 to 8,000,000
- Create 8 tasks using row filtering to migrate 1,000,000 records each
- Example 2 – if you have a date field as primary key
- Partition the data by month using row filtering
- Use full load tasks to migrate data of previous months
- Use full load + CDC to migrate current month data
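The first example above can be sketched as a small generator that emits one table-mapping document per task, each with a `between` row filter on the integer primary key. The schema, table, and column names are hypothetical.

```python
import json

# Split an 8,000,000-row table into 8 DMS tasks of 1,000,000 rows each,
# using "between" row filters on the integer primary key.
def make_range_rule(rule_id, lo, hi):
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": f"chunk-{rule_id}",
        "object-locator": {"schema-name": "APP", "table-name": "BIG_TABLE"},
        "rule-action": "include",
        "filters": [{
            "filter-type": "source",
            "column-name": "ID",            # hypothetical integer PK
            "filter-conditions": [{
                "filter-operator": "between",
                "start-value": str(lo),
                "end-value": str(hi),
            }],
        }],
    }

chunk = 1_000_000
# One table-mapping document per task; each task migrates one ID range.
task_mappings = [
    json.dumps({"rules": [make_range_rule(i + 1, i * chunk + 1, (i + 1) * chunk)]})
    for i in range(8)
]
```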
18
Q
DMS Migrating LOBs / CLOBs
A
- LOB = Large Object (BLOB = Binary Large Object)
- CLOB = Character Large Object
- DMS migrates LOB data in two phases
- creates a new row in the target table and populates all data except the LOB data
- updates the row with the LOB data
- LOB options
- Don’t include LOB columns – LOB data is ignored
- Full LOB mode – migrates all LOB data, piecewise in chunks (you provide the LOB chunk size)
- Limited LOB mode – truncates each LOB to Max LOB size (is faster)
19
Q
Best practices for handling LOBs with DMS
A
- Full LOB mode performance is slowest, but data is not truncated
- For LOBs < few MBs
- Use Limited LOB mode with Max LOB size = largest LOB size in your DB
- Allocate enough memory to the replication instance
- For LOBs > few MBs
- Create a separate DMS task with Full LOB mode
- Ideally, separate task on a new replication instance
- LOB chunk size should allow DMS to capture most LOBs in as few chunks as possible
- Inline LOB mode
- Combines the advantages of both modes (full LOB and limited LOB mode)
- Migrate without truncating the data or slowing the task’s performance
- You specify InlineLobMaxSize (Full LOB mode must be set to true)
- Small LOBs are transferred inline, large LOBs by using source table lookup
- Supported only during full load (not during CDC)
20
Q
SCT Extractors (=DW migration)
A
- SCT extractors are migration agents installed locally (or on EC2)
- Extracts data from the source DW in parallel
- Supports encryption
- Data is optimized for Redshift and stored in local files
- Files are then loaded into S3 (over the network or via Snowball Edge)
- Use the COPY command to load data from S3 into Redshift
- If you are using Oracle or SQL Server, you can use DMS to keep
your DBs in sync with Redshift / or target engine
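The final load step above can be sketched as building a Redshift COPY statement over the S3 prefix the extractors staged their files to; the bucket, table, and IAM role names are placeholders.

```python
# Build a Redshift COPY statement for files staged in S3 by SCT extractors.
# All identifiers below are hypothetical examples.
def copy_statement(table, s3_prefix, iam_role_arn):
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "GZIP DELIMITER '|';"
    )

sql = copy_statement(
    "analytics.sales_fact",
    "s3://my-migration-bucket/sales_fact/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
```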
21
Q
Migration playbooks
A
- Series of step-by-step guides published by AWS
- Provide best practices and blueprints for different heterogeneous migrations
- https://aws.amazon.com/dms/resources/
22
Q
Service Substitutions
A
- Service Substitutions are highlighted in playbooks
- e.g. SQL Server has a DB Mail feature to send email based on certain events
- This feature is not available in open-source engines
- We can use Lambda functions invoked from Aurora MySQL, along with SNS, to emulate the email functionality
- Similar substitutions are possible with queues, file management, etc.
- Can use AWS Glue to manage ETL pipelines
- SCT can convert your Teradata / Oracle scripts and move them into AWS Glue by auto-generating the necessary Python code
23
Q
Monitoring DMS
A
- Can monitor task progress by:
- checking the task status
- using the task’s control table
- or with CloudWatch
- DMS Task Logs and service metrics/statistics are provided by CloudWatch
- Task monitoring tab shows CloudWatch metrics
- Table statistics tab shows statistics for each table
- Can subscribe to event notifications (uses SNS)
- API calls are logged in CloudTrail
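The CloudWatch side of the monitoring above can be sketched as a parameter set for a GetMetricStatistics call; `CDCLatencyTarget` is a real `AWS/DMS` metric, while the instance and task identifiers are placeholders.

```python
from datetime import datetime, timedelta, timezone

# Parameters for boto3.client("cloudwatch").get_metric_statistics(**params)
# querying a DMS task metric over the last hour. Identifiers are examples.
now = datetime.now(timezone.utc)
params = {
    "Namespace": "AWS/DMS",
    "MetricName": "CDCLatencyTarget",
    "Dimensions": [
        {"Name": "ReplicationInstanceIdentifier", "Value": "dms-repl-1"},
        {"Name": "ReplicationTaskIdentifier", "Value": "migrate-sales"},
    ],
    "StartTime": now - timedelta(hours=1),
    "EndTime": now,
    "Period": 300,               # 5-minute datapoints
    "Statistics": ["Average"],
}
```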