Friday Apr 28, 2006

Zones Top Ten

I've been playing with Solaris 10 Zones for quite a while, and have even made very good use of them for isolating development projects in a sandbox environment. So far, everything I needed could be done with a simple 'zonecfg', 'zoneadm install', 'zoneadm boot' and then 'zlogin'. Recently I started to dig a little deeper and ran into problems with networking between local zones, partly from not enough RTFM :-) and partly from real issues. The real problems were mainly caused by trying this on a system running Nevada with the BrandZ bits BFU-ed on top of it. Some piece in ZFS broke a dependency in my SMF.

This article is about where I went "off the right track" while setting up my zones: what was obvious and what wasn't, often because I had read something somewhere that was dead wrong (yes, also on blogs :-) or at least incomplete. Because I'm a Letterman fan, I've decided to do this in a Top Ten format. As on the Late Show, don't pay too much attention to the ranking.


This is important: during the install of a zone, never ever start to "peek 'n poke" around in your local zone's directory to see how far things have come. Don't even think about it!! I know, it is especially tempting when the install takes a little longer, which happens on a slower system. Just by doing a 'cd' into your zone's directory tree, you will prevent NFS mounts from being unmounted and the install will mess up hopelessly. Under the hood, what happens during the install is a combination of NFS mounting and hard links (not symbolic links).


There are full and sparse zones. What matters here is whether you specify "-b" with "create" in zonecfg. The man page says about this: "Use the -b to create a blank configuration. Without arguments, create applies the Sun default settings." You would expect the default to inherit nothing and therefore create a full zone, but it is the other way round. The default comes with automatic inherits for /lib, /platform, /sbin and /usr. Because /bin is linked to /usr/bin, that one is also inherited from the global zone. By adding "-b" when you create a zone, you don't get these inherits and therefore you get a full zone, which takes roughly two gigs per zone. I always end my zonecfg session with the sequence "verify", "commit" and "info". The latter shows you clearly what your inherited package directories are, even when you use the default.
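As a minimal sketch of the full-zone case, the same zonecfg session can be kept in a command file and fed to 'zonecfg -f'. The zone path /zones/thezone and the file name under /tmp are made-up values for illustration, and no network resource is configured here.

```shell
#!/bin/sh
# Sketch: write a zonecfg command file for a full (whole-root) zone.
# ZONEPATH and CMDFILE are hypothetical values, not from the article.
ZONE=thezone
ZONEPATH=/zones/$ZONE
CMDFILE=/tmp/$ZONE.zonecfg

cat > "$CMDFILE" <<EOF
create -b
set zonepath=$ZONEPATH
verify
commit
info
EOF

# Show what would be fed to zonecfg.
cat "$CMDFILE"

# On a Solaris 10 global zone, as root, apply it with:
#   zonecfg -z "$ZONE" -f "$CMDFILE"
```

Replacing 'create -b' with a plain 'create' in the same file should give the sparse variant, and 'info' then lists the inherited directories.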


Many READMEs and HOWTOs on Zones only mention 'zonecfg', 'zoneadm install', 'zoneadm boot' and then 'zlogin'. What is missing there is that the first time you do a 'zoneadm boot', all the services get initialized and the zone needs its network configuration. So, before booting the zone for the first time, open another window and enter 'zlogin -C thezone'. This will open a console for that zone (close it with '~.'). When you now boot, you will see that the services get initialized first and then you have to answer the typical questions about network, naming service, locale, etc. At the end the zone will reboot, and from then on you can 'zlogin' straight into your zone.
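The order of steps can be sketched as a small script. This is only an illustration: the run() wrapper and the DRY_RUN switch are my own additions, and by default the script just prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: prints the commands by default. Set DRY_RUN=0 on a real
# Solaris 10 global zone (as root) to actually execute them.
ZONE=${ZONE:-thezone}      # zone name used in the article
DRY_RUN=${DRY_RUN:-1}      # hypothetical switch, not a zoneadm feature

run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

first_boot() {
    run zoneadm -z "$1" install || return 1
    if [ "$DRY_RUN" != 1 ]; then
        # The console has to be attached BEFORE the first boot.
        printf 'Run "zlogin -C %s" in another window, then press Enter: ' "$1"
        read _answer
    fi
    run zoneadm -z "$1" boot
}

first_boot "$ZONE"
```

The pause before 'boot' is the whole point: the console window has to be waiting before the zone comes up, otherwise the configuration questions are never seen.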


Instead of doing 'zoneadm -z thezone reboot' from the global zone, you can just as well run a normal Solaris 'reboot' command from within the local zone. This makes even more sense when you are connected to the zone's console ('zlogin -C').


If you failed to have a console open when the zone was booted for the first time, you can do the following: use 'zlogin -C' to open a console into your zone and log in as root. Execute 'sys-unconfig'; this will halt the zone. Now, in another window, boot the zone again with 'zoneadm -z thezone boot'. In the console window you will now see the same questions as during a first boot.


The forums are full of people who have trouble ssh-ing into their zones. For a while I had the same issues. Many places recommend that you manually create your ssh keys (when missing) with "/lib/svc/method/sshd -c". That will create the keys, but it doesn't solve the underlying problem, which is that you didn't go through a proper configuration process after installing the zone. See the tip above about opening a console with 'zlogin -C' before the first boot.


If you've configured DNS as your naming service and after a reboot you get that typical sendmail message, "unable to qualify my own domain name", the solution is to edit your /etc/hosts file and change the entry from
          localhost        thezone        loghost
into
          localhost        thezone        thezone.thedomain.the        loghost

IMHO it doesn't make sense, but it fixes the problem.
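The edit can also be scripted. The sketch below works on a sample copy rather than on the real /etc/hosts. The IP addresses and the two-line layout of the sample are assumptions for illustration only, since your file will look different.

```shell
#!/bin/sh
# Sketch: add the fully qualified name right after the short host name.
# Works on a sample file; the addresses below are made up.
ZONE=thezone
DOMAIN=thedomain.the
HOSTS=/tmp/hosts.sample

printf '127.0.0.1\tlocalhost\n192.168.1.10\t%s\tloghost\n' "$ZONE" > "$HOSTS"

# On Solaris 10 use nawk or /usr/xpg4/bin/awk; the old /usr/bin/awk lacks -v.
awk -v h="$ZONE" -v d="$DOMAIN" '
{
    fqdn = h "." d
    has_host = 0; has_fqdn = 0
    for (i = 2; i <= NF; i++) {
        if ($i == h)    has_host = 1
        if ($i == fqdn) has_fqdn = 1
    }
    # Only rewrite non-comment lines that name the zone without its FQDN.
    if (has_host && !has_fqdn && $0 !~ /^#/) {
        out = $1
        for (i = 2; i <= NF; i++) {
            out = out "\t" $i
            if ($i == h) out = out "\t" fqdn
        }
        print out
    } else {
        print
    }
}' "$HOSTS" > "$HOSTS.new"

cat "$HOSTS.new"
```

Compare the .new file with the original before copying it into place inside the zone.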


I discovered that hacking the zone configs is pretty straightforward and fun. You know the slew of disclaimers that should follow here, of course. In your global zone, go to /etc/zones. There is an 'index' file and, for each zone, an XML file that holds the zone's configuration. In the 'index' file you can change the zone directory, but I wouldn't start messing with that GUID. Also, I think the zone status field is better left alone. If you start editing the index or the XML file, keep the two consistent. This hacking is of course dangerous stuff, but I used it successfully when I couldn't delete a corrupted zone.
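For a read-only look before touching anything, here is a small sketch that lists what an index file says. It assumes the fields are colon-separated in the order name, status, zone directory, GUID, so check your own file first. The sample data, including the all-zero GUID, is made up.

```shell
#!/bin/sh
# Sketch: print name, status and directory for each entry in a zones index.
# Assumed layout per line: name:status:zonepath:guid
INDEX=/tmp/zones-index.sample

cat > "$INDEX" <<'EOF'
# sample data, not from a real system
global:installed:/
thezone:installed:/zones/thezone:00000000-0000-0000-0000-000000000000
EOF

list_zones() {
    # Skip comment lines, then split each entry on ':'.
    grep -v '^#' "$1" | while IFS=: read -r name state path guid; do
        [ -n "$name" ] || continue
        echo "zone=$name state=$state path=$path"
    done
}

list_zones "$INDEX"
```

Pointing list_zones at a copy of /etc/zones/index is a safer way to explore than editing the live file.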


What happens on a 'create' and a 'create -b' in zonecfg is driven by two XML files in the /etc/zones directory: SUNWdefault.xml and SUNWblank.xml. Those are the files to edit if you want to change that behaviour. At least have a look at them to see what the difference is between the two.


Somewhere along the way in Nevada, I guess around build #30, an option was added to move or clone a zone. Very cool!!
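A rough sketch of how the two operations look, assuming they are the 'move' and 'clone' subcommands of zoneadm. The zone name thezone2 and the path /zones/elsewhere are made up, run() and DRY_RUN are my own additions, and the script only prints the commands by default.

```shell
#!/bin/sh
# Sketch only: prints the commands by default. Set DRY_RUN=0 on a real
# system (as root, with the zones involved halted) to execute them.
DRY_RUN=${DRY_RUN:-1}      # hypothetical switch, not a zoneadm feature

run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# Clone: assumes the new zone ($2) is already configured with zonecfg
# and has its own zonepath.
clone_zone() {
    run zoneadm -z "$2" clone "$1"
}

# Move: relocate an existing zone to a new zonepath.
move_zone() {
    run zoneadm -z "$1" move "$2"
}

clone_zone thezone thezone2
move_zone thezone /zones/elsewhere
```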

That's it for now. Happy hacking and have fun with your zones ...