Monday, August 10, 2015

Image Decomposition

Looking at the swarm image we built previously, it's a couple of hundred MB in size -- for a small application. The image size will not slow down execution, since only the files needed when the application starts are touched, but it will still offend an aesthete's eye. Especially that of mainframers, who tried to save every bit because their core memory only had a couple of them (my knowledge is entirely anecdotal...). How can we kill that pain?

A container image only needs to come with the files that are really needed to execute the application in it. Typically, the base image is an entire distribution, which comes with lots of libraries, tools, and other files. And then the installation orgy starts in the Dockerfile... But it is not that hard to do better. Let's pick the swarm image and decompose it. docker save is the starting point: it exports the image's structure into a tarball.
mkdir temp
cd temp
docker save swarm | tar xf -
We'll see a pile of directories -- these directories correspond to the individual build steps. Each directory again carries a tarball with the incremental changes of that layer. We'll restore the image's contents by simply throwing all the files together:
for i in */layer.tar ; do tar xf "$i" ; done
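
To see what "throwing all the files together" does, here is a small self-contained sketch that needs no docker. It builds two synthetic layer tarballs and unpacks them with the same kind of loop as above. The directory names layer1 and layer2, the helper merge_layers, and the file contents are all made up for illustration. A file that exists in both layers ends up with the content of whichever tarball is extracted last. Note that the glob sorts directory names alphabetically, which need not match the build order of a real image.

```shell
#!/bin/sh
# Sketch: merge synthetic "layers" the same way the loop above does.
# layer1/layer2 and all file names are invented for this demo.

merge_layers() {
    # $1 = directory holding */layer.tar, $2 = target root directory
    src=$1; dest=$2
    mkdir -p "$dest"
    for t in "$src"/*/layer.tar; do
        # Later tarballs overwrite files unpacked from earlier ones.
        tar xf "$t" -C "$dest"
    done
}

work=$(mktemp -d)
mkdir -p "$work/img/layer1" "$work/img/layer2"
mkdir -p "$work/stage/etc" "$work/stage/bin"

# Layer 1: a config file and a placeholder for a binary.
echo "version=1" > "$work/stage/etc/app.conf"
echo "binary"    > "$work/stage/bin/app"
tar cf "$work/img/layer1/layer.tar" -C "$work/stage" etc bin

# Layer 2: only the changed config file.
echo "version=2" > "$work/stage/etc/app.conf"
tar cf "$work/img/layer2/layer.tar" -C "$work/stage" etc/app.conf

merge_layers "$work/img" "$work/rootfs"
cat "$work/rootfs/etc/app.conf"
```

Here layer2 sorts after layer1, so the merged tree carries the config file from layer2 and the untouched bin/app from layer1. For the swarm image this distinction does not matter much, since we only pick the binary and its libraries out of the merged tree.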

Our swarm binary has been put into go/bin/swarm (visible in the Dockerfile, or when doing docker inspect on the image). We only need that one executable. Well, plus some libraries, but which ones? ldd will answer that. To make ldd resolve the libraries against the image's files rather than the host's, we run it inside a chroot of the restored tree.
PATH=$PATH:/bin LD_LIBRARY_PATH=/usr/local/gccgo/lib64 chroot . ldd go/bin/swarm > filelist

(Put everything into one line, in case your browser breaks it.) Now filelist looks like this:
=> /usr/local/gccgo/lib64/ (0x000003fffc029000)
=> /lib/s390x-linux-gnu/ (0x000003fffbf8a000)
=> /usr/local/gccgo/lib64/ (0x000003fffbf77000)
=> /lib/s390x-linux-gnu/ (0x000003fffbdea000)
=> /lib/s390x-linux-gnu/ (0x000003fffbdcb000)
/lib/ (0x000002aae1461000)
and shows the dependencies of the swarm binary. These libraries and the swarm binary itself are all the container needs.
Add some sed magic to get the relative file names out of it and re-import *just* these files plus the swarm binary itself into a docker image. The longish parameter list of the docker import command mimics all the adjustments done in the Dockerfile of swarm and underlying images:
tar chvf - go/bin/swarm `cat filelist | sed 's/.*=>//' | sed 's/(.*$//' | sed 's! */!!' ` | docker import -c "ENV SWARM_HOST :2375" -c "ENV LD_LIBRARY_PATH /usr/local/gccgo/lib64:/lib/s390x-linux-gnu:/lib64" -c "EXPOSE 2375" -c "VOLUME /.swarm" -c 'ENTRYPOINT ["/go/bin/swarm"]' -c 'CMD ["--help"]' - swarm-mini

(Note: again put everything into one monster line...)
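
The sed part is easier to follow in isolation. The following is a minimal sketch, not the exact pipeline used above: it folds the three sed calls into one function and prints a line only if it actually contains an absolute path. The function name ldd_to_relpaths and the library names in the sample are hypothetical and only serve the demo.

```shell
#!/bin/sh
# Sketch: turn ldd output into relative paths that tar can pick up.
# The library names in the sample are hypothetical, not taken from the post.

ldd_to_relpaths() {
    # Reads ldd output on stdin, prints one relative path per line.
    #   1. drop everything up to and including "=>"
    #   2. drop the load address in parentheses
    #   3. drop trailing whitespace
    #   4. drop leading whitespace plus the leading "/", and print the
    #      line only if that worked (so entries without a path are skipped)
    sed -n -e 's/.*=>//' \
           -e 's/(.*$//' \
           -e 's/[[:space:]]*$//' \
           -e 's!^[[:space:]]*/!!p'
}

sample='    libexample.so.1 => /usr/lib/libexample.so.1 (0x000003fffc029000)
    libother.so.6 => /lib/libother.so.6 (0x000003fffbf8a000)
    /lib/ld-example.so.1 (0x000002aae1461000)'

printf '%s\n' "$sample" | ldd_to_relpaths
```

With such a function in place, the backtick expression in the monster line could be written as `ldd_to_relpaths < filelist`. The paths are relative on purpose: tar is run from the root of the restored image tree, so it picks the image's copies of the libraries and not the host's.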

With that, the new swarm image is just under 50MB in size. How delightful.

Now that we've learned how to squeeze images to the bare minimum, one comment: the memory footprint (RAM -- main memory) will be the same for both the original and the new image. Only the files actually required to run the application are mapped into memory. I guess this is the pragmatic Unix answer to mainframe frugality. Plus, an update is much easier done the wasteful way. But... aesthetics has its right to exist, and good design is often reduced design.
