Jupyter docker stacks with a custom user

Jupyter allows to set a custom user instead of jovyan which is the default for all containers of the Jupyter Docker Stack. You need to change this user or its UID and GID in order to get the permissions right when you mount a volume from the host into the Jupyter container. The following steps are required:

  1. Create an unprivileged user and an asociated group on the host. Here we call the user and the group docker_worker
  2. Add your host user to the group. This gives you the permissions to modify and read the files also on the host. This is useful if your working directory on the hist is under source code control (eg. git)
  3. Launch the container with the correct settings that change the user inside the container

It is important to know that during the launch the container needs root privileges in order to change the settings in the mounted host volume and inside the container. After the permissions have been changed, the user is switched back and does not run with root privileges, but your new user. Thus make sure to secure your Docker service, as the permissions inside the container also apply to the host.

Prepare an unprivileged user on the host

1. sudo groupadd -g 1011 docker_worker
2. sudo useradd -s /bin/false -u 1010 -g 1020 docker_worker
3. Add your user to the group: sudo usermod -a -G docker_worker stefan

Docker-compose Caveats

It is important to know that docker-compose supports either an array or a dictionary for environment variables (docs).  In the case below we use arrays and we quote all variables. If you accidentally use a dictionary, then the quotes would be passed along to the Jupyter script. You would then see this error message: 

/usr/local/bin/start-notebook.sh: ignoring /usr/local/bin/start-notebook.d/*
Set username to: docker_worker
Changing ownership of /home/docker_worker to 1010:1020
chown: invalid user: ‘'-R'’

The docker-compose file

version: '2'
services:
    datascience-notebook:
        image: jupyter/base-notebook:latest
        volumes:
            - /tmp/jupyter_test_dir:/home/docker_worker/work            
        ports:
            - 8891:8888
        command: "start-notebook.sh"
        user: root
        environment:
          NB_USER: 'docker_worker'
          NB_UID: 1010
          NB_GID: 1020
          CHOWN_HOME: 'yes'
          CHOWN_HOME_OPTS: -R

Here you can see that we set the variables that cause the container to ditch jovyan in favor of docker_worker.

NB_USER: ‘docker_worker’
NB_UID: 1010
NB_GID: 1020
CHOWN_HOME: ‘yes’
CHOWN_HOME_OPTS: -R

This facilitates easy version control of the working directory of Jupyter. I also added the snipped to my Github Jupyter template.

Continue reading


Verifying Replication Consistency with Percona’s pt-table-checksum

Replication is an important concept for improving database performance and security. In this blog post, I would like to demonstrate how the consistency between a MySQL master and a slave can be verified. We will create two Docker containers, one for the master one for the slave.

Installing the Percona Toolkit

The Percona Toolkit is a collection of useful utilities, which can be obained for free from the company’s portal. The following commands install the prerequisits, download the package and eventually the package.

sudo apt-get install -y wget libdbi-perl libdbd-mysql-perl libterm-readkey-perl libio-socket-ssl-perl
wget https://www.percona.com/downloads/percona-toolkit/3.0.4/binary/debian/xenial/x86_64/\
    percona-toolkit_3.0.4-1.xenial_amd64.deb
sudo dpkg -i percona-toolkit_3.0.4-1.xenial_amd64.deb 

Setting up a Test Environment with Docker

The following command creates and starts a docker container. Note that these are minimal examples and are not suitable for a serious environment.

docker run --name mysql_master -e MYSQL_ALLOW_EMPTY_PASSWORD=true -d mysql:5.6 --log-bin \
   --binlog-format=ROW --server-id=1

Get the IP address from the master container:

# Get the IP of the master 
docker inspect mysql_master | grep IPAddress 

"SecondaryIPAddresses": null, 
    "IPAddress": "172.17.0.2"

You can connect to this container like this and verify the server id:

stefan@Lenovo ~/Docker-Projects $ mysql -u root -h 172.17.0.2
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.35-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 1     |
+---------------+-------+
1 row in set (0,00 sec)

We repeat the command for the slave, but use a different id. port and name:

docker run --name mysql_slave -e MYSQL_ALLOW_EMPTY_PASSWORD=true -d mysql:5.6 --server-id=2

For simplicity, we did not use Docker links, but will rather use IP addresses assigned by Docker directly.

Replication Setup

First, we need to setup a user with replication privileges. This user will connect from the slave to the master.

# On the host, interact with the master container
## Get the IP address of the slave container
$ docker inspect mysql_slave | grep IPAddress
            "SecondaryIPAddresses": null,
            "IPAddress": "172.17.0.3",
                    "IPAddress": "172.17.0.3",

## Login to the MySQL console of the master
### Grant permissions
GRANT REPLICATION SLAVE ON *.* TO `replication`@'172.17.0.3' IDENTIFIED BY 'SLAVE-SECRET';
### Get the current binlog position
mysql> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| mysqld-bin.000002 | 346 | | | |
+-------------------+----------+--------------+------------------+-------------------+
1 row in set (0,00 sec)

Now log into the slave container and add the connection details for the master:

## Connect to the MySQL Slave instance
$ mysql -u root -h 172.17.0.3

### Setup the slave

mysql> CHANGE MASTER TO   
  MASTER_HOST='172.17.0.2',
  MASTER_PORT=3306,
  MASTER_USER='replication', 
  MASTER_PASSWORD='SLAVE-SECRET',
  MASTER_LOG_FILE='mysqld-bin.000002', 
  MASTER_LOG_POS=346;
Query OK, 0 rows affected, 2 warnings (0,05 sec)

### Start and check
mysql>   start slave;
Query OK, 0 rows affected (0,01 sec)

mysql> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.17.0.2
                  Master_User: percona
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000002
          Read_Master_Log_Pos: 346
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 284
        Relay_Master_Log_File: mysqld-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Now our simple slave setup is running.

Get some test data

Lets download the Sakila test database and import it into the master. It will be replicated immediately.

wget http://downloads.mysql.com/docs/sakila-db.tar.gz
~/Docker-Projects $ tar xvfz sakila-db.tar.gz 

mysql -u root -h 172.17.0.2 < sakila-db/sakila-schema.sql 
mysql -u root -h 172.17.0.2 < sakila-db/sakila-data.sql

Verify that the data is on the slave as well:

mysql -u root -h 172.17.0.3 -e "USE sakila;SHOW TABLES;"
+----------------------------+
| Tables_in_sakila           |
+----------------------------+
| actor                      |
| actor_info                 |
| address                    |
| category                   |
| city                       |
| country                    |
| customer                   |
...
| store                      |
+----------------------------+

After our setup is completed, we can proceed with Percona pt-table checksum.

Percona pt-table-checksum

The Percona pt-table-checksum tool requires the connection information of the master and the slave in a specific format. This is called the DSN (data source name), which is a coma separated string. We can store this information in a dedicated database called percona in a table called dsns. We create this table on the master. Note that the data gets replicated to the slave within the blink of an eye.

CREATE DATABASE percona;
USE percona;
 
CREATE TABLE `DSN-Table` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `dsn` varchar(255) NOT NULL,
 PRIMARY KEY (`id`)
);

The next step involves creating permissions on the slave and the master!

GRANT REPLICATION SLAVE,PROCESS,SUPER, SELECT ON *.* TO 'percona'@'172.17.0.1' IDENTIFIED BY 'SECRET'; 
GRANT ALL PRIVILEGES ON percona.* TO 'percona'@'172.17.0.1';

The percona user is needed to run the script. Note that the IP address is this time from the (Docker) host, having the IP 172.17.0.1 by default. In real world scenarios, this script would either be run on the master or on the slave directly.

Now we need to add the information about the slave to the table we created. The Percona tool could also read this from the process list, but it is more reliable if we add the information ourselves. To do so, we add a record to the table we just created, which describes the slave DSN:

INSERT INTO percona.DSN-Table VALUES (1,'h=172.17.0.3,u=percona,p=SECRET,P=3306');

The pt-table-checksum tool the connects to the master instance and the the slave. It computes checksums of all databases and tables and compares results. You can use the tool like this:

pt-table-checksum --replicate=percona.checksums --create-replicate-table --empty-replicate-table \
  --recursion-method=dsn=t=percona.DSN_Table -h 172.17.0.2 -P 3306 -u percona -pSECRET
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-10T10:13:11      0      0        0       1       0   0.020 mysql.columns_priv
09-10T10:13:11      0      0        3       1       0   0.016 mysql.db
09-10T10:13:11      0      0        0       1       0   0.024 mysql.event
09-10T10:13:11      0      0        0       1       0   0.014 mysql.func
09-10T10:13:11      0      0       40       1       0   0.026 mysql.help_category
09-10T10:13:11      0      0      614       1       0   0.023 mysql.help_keyword
09-10T10:13:11      0      0     1224       1       0   0.022 mysql.help_relation
09-10T10:13:12      0      0      585       1       0   0.266 mysql.help_topic
09-10T10:13:12      0      0        0       1       0   0.031 mysql.ndb_binlog_index
09-10T10:13:12      0      0        0       1       0   0.024 mysql.plugin
09-10T10:13:12      0      0        6       1       0   0.287 mysql.proc
09-10T10:13:12      0      0        0       1       0   0.031 mysql.procs_priv
09-10T10:13:12      0      1        2       1       0   0.020 mysql.proxies_priv
09-10T10:13:12      0      0        0       1       0   0.024 mysql.servers
09-10T10:13:12      0      0        0       1       0   0.017 mysql.tables_priv
09-10T10:13:12      0      0     1820       1       0   0.019 mysql.time_zone
09-10T10:13:12      0      0        0       1       0   0.015 mysql.time_zone_leap_second
09-10T10:13:12      0      0     1820       1       0   0.267 mysql.time_zone_name
09-10T10:13:13      0      0   122530       1       0   0.326 mysql.time_zone_transition
09-10T10:13:13      0      0     8843       1       0   0.289 mysql.time_zone_transition_type
09-10T10:13:13      0      1        4       1       0   0.031 mysql.user
09-10T10:13:13      0      0        1       1       0   0.018 percona.DSN_Table
09-10T10:13:13      0      0      200       1       0   0.028 sakila.actor
09-10T10:13:13      0      0      603       1       0   0.023 sakila.address
09-10T10:13:13      0      0       16       1       0   0.033 sakila.category
09-10T10:13:13      0      0      600       1       0   0.023 sakila.city
09-10T10:13:13      0      0      109       1       0   0.029 sakila.country
09-10T10:13:14      0      0      599       1       0   0.279 sakila.customer
09-10T10:13:14      0      0     1000       1       0   0.287 sakila.film
09-10T10:13:14      0      0     5462       1       0   0.299 sakila.film_actor
09-10T10:13:14      0      0     1000       1       0   0.027 sakila.film_category
09-10T10:13:14      0      0     1000       1       0   0.032 sakila.film_text
09-10T10:13:14      0      0     4581       1       0   0.276 sakila.inventory
09-10T10:13:15      0      0        6       1       0   0.030 sakila.language
09-10T10:13:15      0      0    16049       1       0   0.303 sakila.payment
09-10T10:13:15      0      0    16044       1       0   0.310 sakila.rental
09-10T10:13:15      0      0        2       1       0   0.029 sakila.staff
09-10T10:13:15      0      0        2       1       0   0.020 sakila.store

The result shows a difference in the MySQL internal table for permissions. This is obviously not what we are interested in, as permissions are individual to a host. So we rather exclude the MySQL internal database and also the percona database, because it is not what we are interested in. Also in order to test it the tool works, we delete the last five category assignments from the table  with mysql -u root -h 172.17.0.3 -e “DELETE FROM sakila.film_category WHERE film_id > 995;” and update a row in the city table with 

mysql -u root -h 172.17.0.3 -e "update sakila.city SET city='Innsbruck' WHERE city_id=590;"

Now execute the command again:

pt-table-checksum --replicate=percona.checksums --create-replicate-table --empty-replicate-table \
   --recursion-method=dsn=t=percona.DSN_Table --ignore-databases mysql,percona -h 172.17.0.2 -P 3306 -u percona -pSECRET
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-10T10:46:33      0      0      200       1       0   0.017 sakila.actor
09-10T10:46:34      0      0      603       1       0   0.282 sakila.address
09-10T10:46:34      0      0       16       1       0   0.034 sakila.category
09-10T10:46:34      0      1      600       1       0   0.269 sakila.city
09-10T10:46:34      0      0      109       1       0   0.028 sakila.country
09-10T10:46:34      0      0      599       1       0   0.285 sakila.customer
09-10T10:46:35      0      0     1000       1       0   0.297 sakila.film
09-10T10:46:35      0      0     5462       1       0   0.294 sakila.film_actor
09-10T10:46:35      0      1     1000       1       0   0.025 sakila.film_category
09-10T10:46:35      0      0     1000       1       0   0.031 sakila.film_text
09-10T10:46:35      0      0     4581       1       0   0.287 sakila.inventory
09-10T10:46:35      0      0        6       1       0   0.035 sakila.language
09-10T10:46:36      0      0    16049       1       0   0.312 sakila.payment
09-10T10:46:36      0      0    16044       1       0   0.320 sakila.rental
09-10T10:46:36      0      0        2       1       0   0.030 sakila.staff
09-10T10:46:36      0      0        2       1       0   0.027 sakila.store

You see that there is a difference in the tables sakila.city and in the table sakila.film_category. The tool does not report the actual number of differences, but rather the number of different chunks. To get the actual differences, we need to use a different tool, which utilises the checksum table that the previous step created.

Show the differences with pt-tabel-sync

The pt-table-sync tool is the counter part for the pt-table-checksum util. It can print or even replay the SQL statements that would render the slave the same state again to be in sync with the master. We can run a dry-run first, as the tool is potentially dangerous.

pt-table-sync --dry-run  --replicate=percona.checksums --sync-to-master h=172.17.0.3 -P 3306 \
   -u percona -pSECRET --ignore-databases mysql,percona
# NOTE: --dry-run does not show if data needs to be synced because it
#       does not access, compare or sync data.  --dry-run only shows
#       the work that would be done.
# Syncing via replication P=3306,h=172.17.0.3,p=...,u=percona in dry-run mode, without accessing or comparing data
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 Chunk     08:57:51 08:57:51 0    sakila.city
#      0       0      0      0 Nibble    08:57:51 08:57:51 0    sakila.film_category

With –dry-run, you only see affected tables, but not the actual data because it does not really access the databases tables in question. Use –print additionally or instead of dry-run to get a list:

pt-table-sync --print --replicate=percona.checksums --sync-to-master h=172.17.0.3 -P 3306 \
  -u percona -pSECRET --ignore-databases mysql,percona
REPLACE INTO `sakila`.`city`(`city_id`, `city`, `country_id`, `last_update`) VALUES \
   ('590', 'Yuncheng', '23', '2006-02-15 04:45:25') 
  \ /*percona-toolkit src_db:sakila src_tbl:city  ...
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ... 
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('997',... 
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('998', '11 ...
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('999', '3', ...
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('1000', '5', ... 

The command shows how we can rename back from Innsbruck to Yuncheng again and also provides the INSERT statements to get the deleted records back.When we replace –print with –execute, the data gets written to the master and replicated to the slave. To allow this, we need to set the permissions on the master

GRANT INSERT, UPDATE, DELETE ON sakila.* TO 'percona'@'172.17.0.1';
pt-table-sync --execute  --replicate=percona.checksums --check-child-tables \ 
  --sync-to-master h=172.17.0.3 -P 3306 -u percona -pSECRET --ignore-databases mysql,percona
REPLACE statements on sakila.city can adversely affect child table `sakila`.`address` 
   because it has an ON UPDATE CASCADE foreign key constraint. 
   See --[no]check-child-tables in the documentation for more information. 
   --check-child-tables error  while doing sakila.city on 172.17.0.3

This error indicates that updating the city table has consequences, because it is a FK to child tables. In this example, we are bold and ignore this warning. This is absolutely not recommended for real world scenarios.

pt-table-sync --execute  --replicate=percona.checksums --no-check-child-tables \
   --no-foreign-key-checks --sync-to-master h=172.17.0.3 -P 3306 -u percona -pSECRET \ 
   --ignore-databases mysql,percona

The command –no-check-child-tables ignores child tables and the command –no-foreign-key-checks ignores foreign keys.

Run the checksum command again to verify that the data has been restored:

pt-table-checksum --replicate=percona.checksums --create-replicate-table --empty-replicate-table \ 
   --recursion-method=dsn=t=percona.DSN_Table --ignore-databases mysql,percona 
   -h 172.17.0.2 -P 3306 -u percona -pSECRET

            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-10T11:24:42      0      0      200       1       0   0.268 sakila.actor
09-10T11:24:42      0      0      603       1       0   0.033 sakila.address
09-10T11:24:42      0      0       16       1       0   0.029 sakila.category
09-10T11:24:42      0      0      600       1       0   0.275 sakila.city
09-10T11:24:42      0      0      109       1       0   0.023 sakila.country
09-10T11:24:43      0      0      599       1       0   0.282 sakila.customer
09-10T11:24:43      0      0     1000       1       0   0.046 sakila.film
09-10T11:24:43      0      0     5462       1       0   0.284 sakila.film_actor
09-10T11:24:43      0      0     1000       1       0   0.036 sakila.film_category
09-10T11:24:43      0      0     1000       1       0   0.279 sakila.film_text
09-10T11:24:44      0      0     4581       1       0   0.292 sakila.inventory
09-10T11:24:44      0      0        6       1       0   0.031 sakila.language
09-10T11:24:44      0      0    16049       1       0   0.309 sakila.payment
09-10T11:24:44      0      0    16044       1       0   0.325 sakila.rental
09-10T11:24:44      0      0        2       1       0   0.029 sakila.staff
09-10T11:24:44      0      0        2       1       0   0.028 sakila.store

0 DIFFS, we are done!

 

 

 

 

 

 

 

 

 

 

 

 

 

Continue reading


Persistent Data in a MySQL Docker Container

Running MySQL in Docker

In a recent article on Docker in this blog, we presented some basics for dealing with data in containers. This article will present another popular application for Docker: MySQL containers. Running MySQL instances in Docker allows isolating database infrastructure with ease.

Connecting to the Standard MySQL Container

The description of the MySQL docker image provides a lot of useful information how to launch and connect to a MySQL container. The first step is to create standard MySQL container from the latest available image.

sudo docker run \
   --name=mysql-instance 
   -e MYSQL_ROOT_PASSWORD=secret 
   -p 3307:3306 
   -d 
   mysql:latest

This creates a MySQL container where the root password is set to secret. As the host is already running its own MySQL instance (which has nothing to do with this docker example), the standard port 3306 is already taken. Thus we publish utilise the port 3307 on the host system and forward it to the 3306 standard port from the container.

Connect from the Host

We can then connect from the command line like this:

mysql -uroot -psecret -h 127.0.0.1 -P3307

We could also provide the hostname localhost for connecting to the container, but as the MySQL client per default assumes that a localhost connection is via a socket, this would not work. Thus when using the hostname localhost, we needed to specify the protocol TCP, wo that the client connects via the network interface.

mysql -uroot -psecret -h localhost --protocol TCP -P3307

Connect from other Containers

Connecting from a different container to the MySQL container is pretty straight forward. Docker allows to link two containers and then use the exposed ports between them. The following command creates a new ubuntu container and links to the MySQL container.

sudo docker run -it --name ubuntu-container --link mysql-instance:mysql-link ubuntu:16.10 bash

After this command, you are in the terminal of the Ubuntu container. We then need to install the MySQL client for testing:

# Fetch the package list
root@7a44b3e7b088:/# apt-get update
# Install the client
root@7a44b3e7b088:/# apt-get install mysql-client
# Show environment variables
root@7a44b3e7b088:/# env

The last command gives you a list of environment variables, among which is the IP address and port of the MySQL container.

MYSQL_LINK_NAME=/ubuntu-container/mysql-link
HOSTNAME=7a44b3e7b088
TERM=xterm
MYSQL_LINK_ENV_MYSQL_VERSION=5.7.14-1debian8
MYSQL_LINK_PORT=tcp://172.17.0.2:3306
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MYSQL_LINK_PORT_3306_TCP_ADDR=172.17.0.2
MYSQL_LINK_PORT_3306_TCP=tcp://172.17.0.2:3306
PWD=/
MYSQL_LINK_PORT_3306_TCP_PORT=3306
SHLVL=1
HOME=/root
MYSQL_LINK_ENV_MYSQL_MAJOR=5.7
MYSQL_LINK_PORT_3306_TCP_PROTO=tcp
MYSQL_LINK_ENV_GOSU_VERSION=1.7
MYSQL_LINK_ENV_MYSQL_ROOT_PASSWORD=secret
_=/usr/bin/env

You can then connect either manually of by providing the variables

mysql -uroot -psecret -h 172.17.0.2
mysql -uroot -p$MYSQL_LINK_ENV_MYSQL_ROOT_PASSWORD -h $MYSQL_LINK_PORT_3306_TCP_ADDR -P $MYSQL_LINK_PORT_3306_TCP_PORT

If you only require a MySQL client inside a container, simply use the MySQL image from docker. Batteries included!

Continue reading


Persistent Docker Containers

Docker Fundamentals

Docker has become a very popular tool for orchestrating services. Docker it much more lightweight than virtual machines. For instance do containers not require a boot process. Docker follows the philosophy that one container serves only one process. So in contrast to virtual machines which often bundle several services together, Docker is built for running single services per container. If you come from the world of virtualised machines, Docker can be a bit confusing in the beginning, because it uses its own terminology. A good point to start is as always the documentation and there are plenty of great tutorials out there.

Images and Containers

Docker images serves as templates for the containers. As images and containers both have hexadecimal ids they are very easy to confuse. The following example shows step by step how to create a new container based on the Debian image and how to open shell access.

# Create a new docker container based on the debian image
sudo docker create -t --name debian-test debian:stable 
# Start the container
sudo docker start  debian-test
# Check if the container is running
sudo docker ps -a
# Execute bash to get an interactive shell
sudo docker exec -i -t debian-test bash

A shorter variant of creating and launching a new container is listed below. The run command creates a new container and starts it automatically. Be aware that this creates a new container every time, so assigning a container name helps with not confusing the image with the container. The command run is in particular tricky, as you would expect it to run (i.e. launch) a container only. In fact, it creates a new one and starts it.

sudo docker run -it --name debian-test debian:stable bash

Important Commands

The following listing shows the most important commands:

# Show container status
sudo docker ps -a
# List available images
sudo docker images 
# Start or stop a container
sudo docker start CONTAINERNAME
sudo docker stop CONTAINERNAME
# Delete a container
sudo docker rm CONTAINERNAME

You can of course create your own images, which will not be discussed in this blog post. It is just important to know that you can’t move containers from your host so some other machine directly. You would need to commit the changes made to the image and create a new container based on that image. Please be aware that this does not include the actual data stored in that container! You need to manually export any data and files from the original container and import it in the new container again. This is another trap worth noting. You can, however,  also mount data in the image, if the data is available at the host at the time of image creation. Details on data in containers can be found here.

Persisting Data Across Containers

The way how Docker persists data needs getting used to in the beginning, especially as it is easy to confuse images with containers. Remember that Docker images serve only as the template. So when you issue the command sudo docker run …  this actually creates a container from an image first and then starts it. So whenever you issue this command again, you will end up with a new container which does share any data with the previously created container.

Docker 1.9 introduced data volume containers, which allow to create dedicated data containers which can be used from several other containers. Data volume containers can be used for persisting data. The following listing shows how to create a data volume container and mount the volume in a container.

# Create a data volume
sudo docker volume create --name data-volume-test
# List all volumes
sudo docker volume ls
# Delete the container
sudo docker rm debian-test
# Create a new container, now with the data volume 
sudo docker create -v data-volume-test:/test-data -t --name debian-test debian:stable
# Start the container
sudo docker start debian-test
# Get the shell
sudo docker exec -i -t debian-test bash

After we logged into the shell, we can see the data volume we mounted on the directory test-data:

root@d4ac8c89437f:/# ls -la
total 76
drwxr-xr-x  28 root root 4096 Aug  3 13:11 .
drwxr-xr-x  28 root root 4096 Aug  3 13:11 ..
-rwxr-xr-x   1 root root    0 Aug  3 13:10 .dockerenv
drwxr-xr-x   2 root root 4096 Jul 27 20:03 bin
drwxr-xr-x   2 root root 4096 May 30 04:18 boot
drwxr-xr-x   5 root root  380 Aug  3 13:11 dev
drwxr-xr-x  41 root root 4096 Aug  3 13:10 etc
drwxr-xr-x   2 root root 4096 May 30 04:18 home
drwxr-xr-x   9 root root 4096 Nov 27  2014 lib
drwxr-xr-x   2 root root 4096 Jul 27 20:02 lib64
drwxr-xr-x   2 root root 4096 Jul 27 20:02 media
drwxr-xr-x   2 root root 4096 Jul 27 20:02 mnt
drwxr-xr-x   2 root root 4096 Jul 27 20:02 opt
dr-xr-xr-x 267 root root    0 Aug  3 13:11 proc
drwx------   2 root root 4096 Jul 27 20:02 root
drwxr-xr-x   3 root root 4096 Jul 27 20:02 run
drwxr-xr-x   2 root root 4096 Jul 27 20:03 sbin
drwxr-xr-x   2 root root 4096 Jul 27 20:02 srv
dr-xr-xr-x  13 root root    0 Aug  3 13:11 sys
drwxr-xr-x   2 root root 4096 Aug  3 08:26 test-data
drwxrwxrwt   2 root root 4096 Jul 27 20:03 tmp
drwxr-xr-x  10 root root 4096 Jul 27 20:02 usr
drwxr-xr-x  11 root root 4096 Jul 27 20:02 var

We can navigate into that folder and create a 100 M data file with random data.

root@d4ac8c89437f:~# cd /test-data/
root@d4ac8c89437f:/test-data# dd if=/dev/urandom of=100M.dat bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 6.69175 s, 15.7 MB/s
root@d4ac8c89437f:/test-data# du -h .
101M	.

When we exit the container, we can see the file in the host file system  here:

stefan@stefan-desktop:~$ sudo ls -l /var/lib/docker/volumes/data-volume-test/_data
insgesamt 102400
-rw-r--r-- 1 root root 104857600 Aug  3 15:17 100M.dat

We can use this volume transparently in the container, but it is not depending on the container itself. So whenever we have to delete to container or want to use the data with a different container, this solution works perfectly. Thw following command shows how we mount the same volume in an Ubuntu container and execute the ls command to show the content of the directory.

stefan@stefan-desktop:~$ sudo docker run -it -v data-volume-test:/test-data-from-debian --name ubuntu-test ubuntu:16.10 ls -l /test-data-from-debian
total 102400
-rw-r--r-- 1 root root 104857600 Aug  3 13:17 100M.dat

You can display a lot of usefil information about a container with the inspect command. It also shows the data container and where it is mounted.

sudo docker inspect ubuntu-test

...
        "Mounts": [
            {
                "Name": "data-volume-test",
                "Source": "/var/lib/docker/volumes/data-volume-test/_data",
                "Destination": "/test-data-from-debian",
                "Driver": "local",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
...

We delete the ubuntu container and create a new one. We then start the container, open a bash session and write some test data into the directory.

stefan@stefan-desktop:~$ sudo docker create -v data-volume-test:/test-data-ubuntu -t --name ubuntu-test ubuntu:16.10
f3893d368e11a32fee9b20079c64494603fc532128179f0c08d10321c8c7a166
stefan@stefan-desktop:~$ sudo docker start ubuntu-test
ubuntu-test
stefan@stefan-desktop:~$ sudo docker exec -it  ubuntu-test bash
root@f3893d368e11:/# cd /test-data-ubuntu/
root@f3893d368e11:/test-data-ubuntu# ls
100M.dat
root@f3893d368e11:/test-data-ubuntu# touch ubuntu-writes-a-file.txt

When we check the Debian container, we can immediately see the written file, as the volume is transparently mounted.

stefan@stefan-desktop:~$ sudo docker exec -i -t debian-test ls -l /test-data
total 102400
-rw-r--r-- 1 root root 104857600 Aug  3 13:17 100M.dat
-rw-r--r-- 1 root root         0 Aug  3 13:42 ubuntu-writes-a-file.txt

Please be aware that the docker volume is just a regular folder on the file system. Writing from both containers the same file can lead to data corruption. Also remember that you can read and write the volume files directly from the host system.

Backups and Migration

Backing up data is also an important aspect when you use named data volumes as shown above. Currently, there is no way of moving Docker containers or volumes natively to a different host. The intention of Docker is to make the creation and destruction  of containers very cheap and easy. So you should not get too attached to your containers, because you can re-create them very fast. This of course is not true for the data stored in volumes. So you need to take care of your data yourself, for instance by creating automated backups like this sudo tar cvfz Backup-data-volume-test.tar.gz /var/lib/docker/volumes/data-volume-test and re-store the data when needed in a new volume. How to backup volumes using a container is described here.

Continue reading