Transcript
What I learned while migrating MySQL on-premises to Amazon RDS
Michael Coburn Wednesday, October 14th, 2015
Who am I?
• Senior Architect with Percona
• Canadian, but I now live in Costa Rica
• I work with 6 named customers who use a mix of AWS and on-premises instances
Use Case for this presentation
• To deliver a minimal-downtime migration from on-premises to RDS
• Downtime just long enough to cleanly disconnect all connections from the on-premises instance, and then start them up against RDS
• Achieved by setting the RDS Primary as a slave to the on-premises Primary
• Bring RDS current in replication
• Migrate application connections over (a cutover sketch follows below)
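A minimal sketch of that cutover, assuming replication into RDS is already flowing; the endpoint, credentials (assumed in ~/.my.cnf), and the application stop/start steps are placeholders:

    # 1. Stop application writes against the on-premises primary (site-specific)
    # 2. Wait for the RDS replica to fully catch up
    RDS=myapp.xxxxxx.us-east-1.rds.amazonaws.com   # placeholder endpoint
    until mysql -h "$RDS" -e 'SHOW SLAVE STATUS\G' \
          | grep -q 'Seconds_Behind_Master: 0'; do
        sleep 1
    done
    # 3. Detach RDS from the old primary so it becomes a standalone master
    mysql -h "$RDS" -e 'CALL mysql.rds_stop_replication; CALL mysql.rds_reset_external_master;'
    # 4. Repoint the application connection strings at the RDS endpoint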
What’s there to love about RDS?
• You can scale your CPU, IOPS, and storage space independently
• Provisioned IOPS can be scaled on the fly (now that’s hot; see the sketch after this list)
• Backups, software patching, failure detection, and (some) recovery are automated for you
• DB event notifications via Amazon SNS
• It is trivial to set up read replicas (slaves)
• You can set up a secondary synchronous instance for failover of your primary master (think DRBD); this is called Multi-AZ Deployment and is the default
• You can choose from MySQL Community, and now MariaDB!
• And all the other great stuff from AWS: VPC, encryption, firewalls, etc.
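A hedged sketch of that on-the-fly IOPS scaling via the AWS CLI; the instance identifier and target value are placeholders:

    # Raise Provisioned IOPS on a running instance without replacing it;
    # --apply-immediately skips the wait for the next maintenance window
    aws rds modify-db-instance \
        --db-instance-identifier myapp-prod \
        --iops 2000 \
        --apply-immediately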
What are the biggest surprises?
• You do not have shell access to your database
  • No more gdb, strace, oprofile: no solution here
  • No tailing the error and slow logs from a shell, no pt-diskstats, etc.
  • But much of this is provided for by API calls or through CloudWatch metrics (see the sketch after this list)
• No SUPER privilege
  • But in most cases RDS provides a stored procedure to overcome the missing functionality
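A hedged sketch of pulling the same signals through the AWS APIs instead; the instance identifier, metric, and time range are placeholders:

    # Fetch a CloudWatch metric for the instance
    aws cloudwatch get-metric-statistics \
        --namespace AWS/RDS --metric-name ReadIOPS \
        --dimensions Name=DBInstanceIdentifier,Value=myapp-prod \
        --start-time 2015-10-14T00:00:00Z --end-time 2015-10-14T01:00:00Z \
        --period 60 --statistics Average
    # Download a chunk of the slow query log instead of tailing it over SSH
    aws rds download-db-log-file-portion \
        --db-instance-identifier myapp-prod \
        --log-file-name slowquery/mysql-slowquery.log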
Only one SQL interface to the database
• That’s right, just one: the mysql client
  • No Percona XtraBackup, gdb, oprofile, pt-diskstats, etc.
• So how do you load your data? (see the sketch after this list)
  • mysqldump
  • mydumper
• Changes to configuration variables can help the load:
  • innodb_flush_log_at_trx_commit=0|2
  • Consider higher Provisioned IOPS during loading
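A hedged sketch of a straightforward dump-and-load, assuming innodb_flush_log_at_trx_commit has been relaxed in the instance's DB parameter group (it cannot be set with SET GLOBAL without SUPER); hosts, credentials, and filenames are placeholders:

    # Dump once to disk so the binlog coordinates in the header are kept
    # --single-transaction: consistent InnoDB snapshot without locking
    # --master-data=2: records the master's binlog file/position as a comment
    mysqldump --single-transaction --master-data=2 --routines --triggers \
        -h onprem-primary -u repl_user -p mydb > mydb_dump.sql
    head -n 30 mydb_dump.sql | grep 'CHANGE MASTER'   # note file/position for later
    # Load into RDS over the wire through the one interface you have: mysql
    mysql -h myapp.xxxxxx.us-east-1.rds.amazonaws.com -u admin -p mydb < mydb_dump.sql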
RDS timezone
• Is your database already in UTC? Good, nothing to see here :)
  • Check for system_time_zone=UTC
• If not, watch for the following combination of conditions:
  1. RDS as slave and binlog_format=STATEMENT
  2. Application not explicitly setting the timezone at the session level
  • TIMESTAMP fields will be wrong in RDS!
• Advice: always have your databases running in UTC
  • Deal with timezone localisation at the presentation layer in your application
(a sketch of the checks follows below)
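A minimal sketch of the checks plus a session-level fallback; hosts are placeholders and credentials are assumed in ~/.my.cnf:

    # Confirm both sides agree on the server timezone
    mysql -h onprem-primary -e "SELECT @@system_time_zone, @@global.time_zone;"
    mysql -h myapp.xxxxxx.us-east-1.rds.amazonaws.com \
        -e "SELECT @@system_time_zone, @@global.time_zone;"
    # If the application cannot run in UTC, pin the session timezone
    # explicitly on every connection instead:
    mysql -h myapp.xxxxxx.us-east-1.rds.amazonaws.com \
        -e "SET time_zone = '+00:00'; SELECT NOW();"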
Smaller max_allowed_packet than default
• Did your expected 4-day load cycle just fail 3 1/2 days in with “packet too big”?
• You just got bit by max_allowed_packet=1M (the default is 16M)
• If you are using mysqldump, then the default is extended-insert format
  • This is the optimal loading pattern for MySQL, since it packs many values into one INSERT
  • But it can also mean you have very long lines == large packets
• Advice: set global max_allowed_packet=16M; (see the parameter-group sketch after this list)
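Since SET GLOBAL generally requires the SUPER privilege that RDS withholds, the durable route is the instance's DB parameter group; a hedged sketch with placeholder names and value:

    # Raise max_allowed_packet in the DB parameter group;
    # it is a dynamic variable, so the change can apply immediately
    aws rds modify-db-parameter-group \
        --db-parameter-group-name myapp-params \
        --parameters "ParameterName=max_allowed_packet,ParameterValue=16777216,ApplyMethod=immediate"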
No SUPER Privilege
• Affects monitoring tools
  • e.g. use of the reserved max_connections + 1 connection
• Affects admin commands
  • CHANGE MASTER
  • KILL
  • PURGE BINARY LOGS
• Breaks commonly used tools
  • pt-table-sync tries to set binlog_format on slaves
(RDS stored-procedure stand-ins are sketched below)
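A hedged sketch of the RDS stored procedures that stand in for some of these commands; the endpoint, thread ID, and retention value are placeholders:

    # KILL equivalent: terminate a connection, or just its running query
    mysql -h myapp.xxxxxx.us-east-1.rds.amazonaws.com -u admin -p \
        -e "CALL mysql.rds_kill(1234); CALL mysql.rds_kill_query(1234);"
    # In place of PURGE BINARY LOGS: cap binlog retention in hours
    mysql -h myapp.xxxxxx.us-east-1.rds.amazonaws.com -u admin -p \
        -e "CALL mysql.rds_set_configuration('binlog retention hours', 24);"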
Triggers and Stored Procedures
• Remember to update the SQL DEFINER entry
  • It exists so that a user with SUPER can deploy a TRIGGER/SP that runs as another user
• Easiest is to set DEFINER=CURRENT_USER() (see the sketch after this list)
  • But this might not map well to your security model
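One hedged way to make that change in bulk on a dump file before loading; the filename is a placeholder, and the pattern assumes the backtick-quoted definers that mysqldump emits:

    # Replace every DEFINER=`user`@`host` clause with CURRENT_USER()
    # so triggers and stored procedures load without SUPER on RDS
    sed -E -i.bak 's/DEFINER=`[^`]+`@`[^`]+`/DEFINER=CURRENT_USER()/g' schema_dump.sql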
Think about using --force
• Consider that it isn’t trivial to restart a failed database load into RDS, so…
• Think about using the --force option to the mysql client, which makes the client skip errors and keep running (see the sketch after this list)
• Maybe you can clean it up later with pt-table-checksum and pt-table-sync
• Or re-load just the one erroring table
• Nothing causes more frustration than a failure when the load is 95% complete
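The same load with --force, sketched with placeholder host and filenames; capturing stderr keeps a record of what was skipped so it can be repaired afterwards:

    # --force: log errors and keep going rather than dying mid-load
    mysql --force -h myapp.xxxxxx.us-east-1.rds.amazonaws.com -u admin -p mydb \
        < mydb_dump.sql 2> load_errors.log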
Splitting your dump up
• Firstly, learn and love mydumper (see the sketch after this list)
  • Parallel dump and load
  • One file per table on disk (gzip)
  • Schema kept separate from data
• If you still use mysqldump
  • At least dump the schema separately from the data
• These steps will simplify troubleshooting later!
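A hedged sketch of both approaches; hosts, credentials, paths, and thread counts are placeholders:

    # mydumper: parallel, compressed, one file per table, schema separate
    mydumper -h onprem-primary -u repl_user -p secret -B mydb \
        --threads 4 --compress --outputdir /backups/mydb
    # ...and the matching parallel load into RDS
    myloader -h myapp.xxxxxx.us-east-1.rds.amazonaws.com -u admin -p secret \
        --threads 4 --directory /backups/mydb
    # mysqldump fallback: at least split the schema from the data
    mysqldump --no-data -h onprem-primary -u repl_user -p mydb > schema.sql
    mysqldump --no-create-info -h onprem-primary -u repl_user -p mydb > data.sql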
Skipping events is SLOW
• RDS provides rds_skip_repl_error(), but it can take 2-3s to execute
• It takes no argument, so you are skipping only one event at a time (see the loop sketch after this list)
• pt-table-checksum will give you the scope of the data differences, so you can decide whether to fix with pt-table-sync or reload
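A hedged sketch of skipping repeatedly until the SQL thread runs clean; the endpoint is a placeholder, credentials are assumed in ~/.my.cnf, and each call really does skip only one event:

    # Keep calling the skip procedure while replication reports an error
    RDS=myapp.xxxxxx.us-east-1.rds.amazonaws.com
    while mysql -h "$RDS" -e 'SHOW SLAVE STATUS\G' \
          | grep -q 'Last_SQL_Errno: [1-9]'; do
        mysql -h "$RDS" -e 'CALL mysql.rds_skip_repl_error;'
    done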
pt-table-sync is slow with RDS
• This isn’t RDS’s fault (at least, I don’t think so); it is probably Percona Toolkit’s
• It does work, but it is very slow
• Generally I would not use pt-table-sync if my data difference was > 20% or the rows different were > 10 million
(a checksum-first sketch follows below)
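A hedged sketch of measuring the difference before deciding; hosts and credentials are placeholders, and --no-check-binlog-format is needed because without SUPER the tools cannot set binlog_format on the RDS slave:

    # Checksum from the on-premises master; results replicate to the RDS slave.
    # Depending on your topology, the tool may also need --recursion-method
    # to discover the RDS replica.
    pt-table-checksum --no-check-binlog-format \
        --replicate percona.checksums h=onprem-primary,u=admin,p=secret
    # If the difference is small enough, repair the slave from the checksums
    pt-table-sync --execute --replicate percona.checksums \
        h=onprem-primary,u=admin,p=secret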
innodb_log_file_size cannot be changed
• innodb_log_file_size is hardcoded to 128MB
• So you are capped at 256MB total (two log files)
• This can be too low for some write-intensive workloads, leading to stalls
• UPDATE: NOW YOU CAN CHANGE IT! (see the sketch after this list)
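Per that update, a hedged sketch of changing it through the DB parameter group; group and instance names and the byte value are placeholders, and since it is a static variable the change only lands on reboot:

    # Stage the new size, then reboot the instance to apply it
    aws rds modify-db-parameter-group \
        --db-parameter-group-name myapp-params \
        --parameters "ParameterName=innodb_log_file_size,ParameterValue=536870912,ApplyMethod=pending-reboot"
    aws rds reboot-db-instance --db-instance-identifier myapp-prod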
CHANGE MASTER doesn't exist in RDS
• Instead of CHANGE MASTER, you call the stored procedure rds_set_external_master() (see the sketch below)
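A hedged sketch of pointing RDS at the on-premises primary and starting replication; the endpoint, credentials, and the binlog coordinates (taken from the dump header noted earlier) are placeholders:

    # Equivalent of CHANGE MASTER TO ... on a stock slave
    mysql -h myapp.xxxxxx.us-east-1.rds.amazonaws.com -u admin -p -e "
        CALL mysql.rds_set_external_master(
            'onprem-primary', 3306,
            'repl_user', 'secret',
            'mysql-bin.000042', 107,  -- binlog file/position from the dump
            0);                       -- 0 = no SSL
        CALL mysql.rds_start_replication;"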
New: Amazon Aurora!
• Compatible with MySQL 5.6
• Claims 5x faster than MySQL 5.6
  • Possibly eliminates the doublewrite buffer, plus changes to the InnoDB log files
• Improved durability and availability
  • Possibly an object store at the database page level
  • Automatically replicated into 3 AZs
• Scalable to 32 cores, 244GB RAM, 64TB disk
  • Disk is auto-scaled with no downtime
• VPC-supported
• Inexpensive vs. commercial databases
Thank you!
• Questions? Send them to: [email protected]
• Also: the call for papers for PLMCE 2016 will open THIS WEEK, so please check our official conference website often for details about submitting your speaker proposals.
  • PLMCE 2016 Information
• You can also join our mailing list to stay informed about conference updates.
  • PLMCE 2016 Mailing List