A Cluster Scenario
In this example we create the smallest possible MySQL cluster based on four nodes running on three machines. Node 1 will run the cluster management software, Node 2 and Node 3 will serve as dats nodes and Node 4 is the MySQSL API, which runs on the same VM on Node 1.
+------------------+ +------------------+ +-------------------+ | Node 1 & 4 | | Node 2 | | Node 3 | | IP: 192.168.0.81 | | IP: 192.168.0.82 | | IP: 192.168.0.83 | | ndb_mgm; MySQL | | ndbd | | ndbd | +------------------+ +------------------+ +-------------------+ +------------------------------------------------------------------+ | VirtualBox Host | +------------------------------------------------------------------+
What is a Cluster and When to Use it?
When applications grow, there are several possibilities to improve the performance of the database layer. The two technologies used in most scenarios are replication and clustering.
Replication is used for moving the data from a master database to one or more slave databases. Per default, this data exchange is asynchronously. Semi-synchronous data replication can be implemented, but if you need synchronous data replication on several machines, clusters are the way to go.
A MySQL Cluster implements a shared nothing architecture and supports synchronous replication. You can very easily add and remove nodes to your cluster and scale out horizontally. This is especially true if you need scaling the database writes. Clusers use a special storage engine called NDB. This technology offers high-availability and high-redundancy. The documentation for a MySQL 5.7 cluster can be found here.
In this example, we will use VirtualBox as virtualization platform and run three nodes as the smallest possible cluster. In the first step, we create a template virtual machine, which comes with the basic packages, passwordless keybased authentication and a manually configured IP address. We will clone this machine and adapt it to our needs. Parts of this tutorial are based on the digital ocean howto.
We create the first VM based on Ubuntu 16.05 LTS and install the basics. MySQL requires the libaio1 package, which we will install in the template like this:
sudo apt-get install libaio1
Then it is time to download the MySQL Cluster software, which can be obtained for free from the MySQL developer page. This are the two steps necessary:
# Download wget https://dev.mysql.com/get/Downloads/MySQL-Cluster-7.5/mysql-cluster-gpl-7.5.4-debian7-x86_64.deb # Install sudo dpkg -i mysql-cluster-gpl-7.5.4-debian7-x86_64.deb
Assign a static IP address in the file /etc/network/interfaces .
# The loopback network interface auto lo iface lo inet loopback ## The primary network interface auto enp0s3 iface enp0s3 inet static address 192.168.0.81 netmask 255.255.255.0 gateway 192.168.0.2 dns-nameservers 192.168.0.2,18.104.22.168
Make sure to install the guest additions, add a MySQL user, copy the MySQL startup script etc. Now the template machine is ready. Shut it down and clone it two times. Make sure to assign a new MAC address and then assign a different, static IP address.
Setting Up the Machines
In the following steps, we are going to setup the virtul machines. We run three machines which provide four services.
Install the Manager on Node 1
The cluster will be controlled via Node 1. First, create a data directory, where we will store the configuration files.
mkdir /opt/mysql/server-5.7/mysql-cluster/cluster-data chown -R mysql:mysql /opt/mysql/server-5.7/mysql-cluster/cluster-data
Then create the configuration file. This file defines the cluster and will be read during the startup of the cluster. Each node needs to have a unique id. Usualy, the ID starts with 0, but in this example we defined the node IDs starting from 1 in order to align it with the IP addresses.
root@node1:~# cat /opt/mysql/server-5.7/mysql-cluster/config.ini [ndb_mgmd] NodeId=1 # Management process options: hostname=192.168.0.81 #node1 # Hostname of the manager datadir=/opt/mysql/server-5.7/mysql-cluster/cluster-data # Directory for the log files [ndbd] NodeId=2 hostname=192.168.0.82 #node2 # Hostname of the first data node datadir=/usr/local/mysql/data # Remote directory for the data files [ndbd] NodeId=3 hostname=192.168.0.83 datadir=/usr/local/mysql/data # Remote directory for the data files [mysqld] # SQL node options: NodeId=4 hostname=192.168.0.81
Note that if you make changes to the config.ini files, they might not be loaded directly, but are read from a cache file. So if you change for instance the data directory, make sure to delete all cached files, e.g. ndb_1_config.bin.1*.
Install the MySQL Server on Node1
The installation of the MySQL service is pretty straight forward and follows a similar pattern as a regular MySQL setup. You just need to ensure that you use the Cluster Version which we already installed in the template machine and create symlinks accordingly. The configuration of the MySQL Server is done in /etc/my.cnf , as usual. In the following you can see an example configuration.
root@node1:~# cat /etc/my.cnf [mysqld_safe] socket = /tmp/mysql.sock nice = 0 [mysqld] # # * Basic Settings # user = mysql pid-file = /tmp/mysql.pid socket = /tmp/mysql.sock port = 3306 basedir = /opt/mysql/server-5.7 datadir = /opt/mysql/server-5.7/data tmpdir = /tmp skip-external-locking key_buffer_size = 16M max_allowed_packet = 16M query_cache_limit = 1M query_cache_size = 16M log_error = /var/log/mysql/error.log server-id = 1 expire_logs_days = 10 max_binlog_size = 100M sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES ndbcluster # run NDB storage engine ndb-connectstring=192.168.0.81 [mysql_cluster] # Options for MySQL Cluster processes: ndb-connectstring=192.168.0.81 # location of management server
The most important bits are at the bottom of the file. The command ndbcluster makes the NDB storage engine available and the section [mysql_cluster] points to the cluster management node. In our case, the MySQL server node and the cluster management node are on the same machine. In larger settings it can be benefitial to have a dedicated VM for each task.
Make sure to set the permissions for the data directory correctly and to place the startup script in /etc/init.d/mysql . Make sure to start the MySQL server on Node 1 and check the error.log file for any strange messages.
Setting up the Data Nodes
In the next step, we need to configure the data nodes. Launch Node 2 and login.
# Create the data directory sudo mkdir -p /var/mysql/data # permissions sudo chown -R mysql:mysql /var/mysql
The basic configuration is very simple. Just edit the /etc/my.cnf file and provide the address of the management node.
stefan@node2:~$ cat /etc/my.cnf [mysqld] # Options for mysqld process: ndbcluster [mysql_cluster] ndb-connectstring=192.168.0.81
Then we can start the data node with the following command:
sudo /opt/mysql/server-5.7/bin/ndbd --initial
You should see something like this:
stefan@node3:~$ sudo /opt/mysql/server-5.7/bin/ndbd 2016-11-26 16:39:29 [ndbd] INFO -- Angel connected to '192.168.0.81:1186' 2016-11-26 16:39:29 [ndbd] INFO -- Angel allocated nodeid: 3
If something goes wrong, check the error log file called ndb_X_out.log , where X is the node id. So in this example the actual file name is ndb_2_out.log . Proceed the same way with the second data node called Node 3.
Starting the Cluster
Now comes the great moment. We start the cluster. Please note that you cannot start th cluster before you have configured at least two data nodes and a MySQL API node. All components need to be running, otherwise there will be error messages.
/opt/mysql/server-5.7/bin/ndb_mgmd --reload -f /opt/mysql/server-5.7/mysql-cluster/config.ini
If there are no errors, we can login into the management console for the cluster with the following command.
You can then get an overview of your cluster with the show command. In the ideal case, the output reads as follows:
root@node1:~# /opt/mysql/server-5.7/bin/ndb_mgm -- NDB Cluster -- Management Client -- ndb_mgm> show Connected to Management Server at: 192.168.0.81:1186 Cluster Configuration --------------------- [ndbd(NDB)] 2 node(s) id=2 @192.168.0.82 (mysql-5.7.16 ndb-7.5.4, Nodegroup: 0, *) id=3 @192.168.0.83 (mysql-5.7.16 ndb-7.5.4, Nodegroup: 0) [ndb_mgmd(MGM)] 1 node(s) id=1 @192.168.0.81 (mysql-5.7.16 ndb-7.5.4) [mysqld(API)] 1 node(s) id=4 @192.168.0.81 (mysql-5.7.16 ndb-7.5.4)
Connecting to MySQL
Now that our cluster is up and running, we can interact with MySQL, by connecting to the MySQL server running on Node 1. In order to distribute the data automatically on the nodes, we need to utilise the NDB Engine when we create our tables. This is a distributed version of the InnoDB engine and comes with different features. A comparison is given here. Connect to the MySQL instance on Node 1 as you always would.
root@node1:~# mysql -u root -p Enter password:
Then we can create a test table to see of the system works properly. In contrast to replication, the nodes do not store the whole data each, but rather gets the data distributed over at least two nodes. This is also the reason why we need to have at least two nodes.
Sakila in the Cluster
We can use the famous Sakila database that is available from the MySQL web site to test our Cluster briefly. The following series of commands retrieves the database and imports it into our cluster.
# Fetch the data wget http://downloads.mysql.com/docs/sakila-db.tar.gz # Extract tar xvfz sakila-db.tar.gz cd sakila-db/ # Change the default engine with the cluster engine sed -i 's/engine=InnoDB/ndbcluster/g' sakila-schema.sql # Import schema and data mysql -u root -p < sakila-schema.sql mysql -u root -p sakila < sakila-data.sql
Now the data resides in our cluster and is distributed via our two nodes. We can run the following sample query and the system transparently gets the data from the two nodes.
USE sakila; SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer, address.phone, film.title FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id INNER JOIN address ON customer.address_id = address.address_id INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id INNER JOIN film ON inventory.film_id = film.film_id WHERE rental.return_date IS NULL AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE(); ... 183 rows in set (0.00 sec)
If you try to start the cluster before all components are up and ready, finding the error can be a bit tricky. In the following example, we did not start the data nodes. The MySQL API node is running, but it does not show up.
root@node1:~# /opt/mysql/server-5.7/bin/ndb_mgm -- NDB Cluster -- Management Client -- ndb_mgm> show Connected to Management Server at: 192.168.0.81:1186 Cluster Configuration --------------------- [ndbd(NDB)] 2 node(s) id=2 (not connected, accepting connect from 192.168.0.82) id=3 (not connected, accepting connect from 192.168.0.83) [ndb_mgmd(MGM)] 1 node(s) id=1 @192.168.0.81 (mysql-5.7.16 ndb-7.5.4) [mysqld(API)] 1 node(s) id=4 (not connected, accepting connect from 192.168.0.81)
Check the error log of the management cluster in that case. As long as the cluster does not consist of at least two data nodes and a MySQL API node, it will not start.