Hi everyone
We had a scheduled maintenance window over the weekend while some work was carried out elsewhere in the building. As a precaution, we shut down the infrastructure that we still keep onsite (the rest is in colo). All we have here is a 2-node ESXi cluster managed by the vCenter Server Appliance (Novell SUSE Linux Enterprise 11, 64-bit). It only has 6 VMs running on it, as most of them have been migrated offsite.
This morning we started up the OpenFiler NFS storage box and the ESXi nodes. All VMs started ok, including the vCenter Server Appliance.
However, I was unable to connect to the vCenter Appliance using the vSphere Client. I could connect to the ESXi nodes directly without problems, and all DNS hostnames resolve to their IP addresses without problems.
Looking at the vSphere Client logs, I see the following message each time I try to connect:
System.Net.Sockets.SocketException: No connection could be made because the target machine actively refused it 10.10.0.205:443
So I connected to the vCenter Appliance over SSH and it looks like it's run out of disk space. Here's the df and mount output:
vlevcenter01:/ # df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       9.8G  9.8G     0 100% /
devtmpfs        4.0G  104K  4.0G   1% /dev
tmpfs           4.0G  4.0K  4.0G   1% /dev/shm
/dev/sda1       130M   18M  105M  15% /boot
/dev/sdb1        20G  3.8G   15G  21% /storage/core
/dev/sdb2        20G  717M   18G   4% /storage/log
/dev/sdb3        20G   19G     0 100% /storage/db
vlevcenter01:/ # mount -l
/dev/sda3 on / type ext3 (rw,acl,user_xattr)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
devtmpfs on /dev type devtmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,mode=1777)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/sda1 on /boot type ext3 (rw,acl,user_xattr)
/dev/sdb1 on /storage/core type ext3 (rw)
/dev/sdb2 on /storage/log type ext3 (rw)
/dev/sdb3 on /storage/db type ext3 (rw)
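For anyone wanting to repeat this check regularly, the full filesystems can be picked out of `df` output with a small shell function. This is only a sketch: `full_filesystems` is a hypothetical name, and it assumes the standard six-column layout (Use% in column 5, mount point in column 6), which `df -P` guarantees by never wrapping long device names onto a second line.

```shell
#!/bin/sh
# Hypothetical helper (not part of the appliance): print every filesystem
# whose Use% is at or above a threshold, reading df output on stdin.
# Usage: df -P | full_filesystems 90
full_filesystems() {
    threshold="${1:-95}"
    # Skip the header line; column 5 is Use% (e.g. "100%"),
    # column 6 is the mount point.
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)          # strip the trailing "%"
        if (pct + 0 >= t) print $6, $5
    }'
}
```

Run against the output above with a threshold of 95, it would report `/` and `/storage/db`, the same two partitions identified by eye.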
So the root and database partitions have run out of space. Looking in /var/log/localmessages, I see:
Jul 15 10:10:51 vlevcenter01 startproc: startproc: Empty pid file /var/run/slapd/slapd.pid for /usr/lib/openldap/slapd
Jul 15 10:10:51 vlevcenter01 startproc: startproc: exit status of parent of /usr/lib/openldap/slapd: 1
Jul 15 10:10:52 vlevcenter01 checkproc: checkproc: Empty pid file /var/run/slapd/slapd.pid for /usr/lib/openldap/slapd
It looks like the services can't create their PID files because there is no space left on the root partition.
We already have 60GB of VMDKs allocated to this VM. We do have space on our datastore to add more, but I'm concerned that the vCenter Appliance will just keep growing.
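One way to notice this growth before it takes vCenter down again is a periodic usage check. This is an assumed approach, not something that ships with the appliance: `check_mount` is a hypothetical function, the 85% default is arbitrary, and it is meant to be called from cron (or any scheduler) for `/` and `/storage/db`.

```shell
#!/bin/sh
# Hypothetical monitoring sketch (not a VMware-provided tool): warn when a
# mount point's usage reaches a threshold percentage.
# Returns 0 if below the threshold, 1 if at/above it, 2 if usage is unreadable.
check_mount() {
    mount_point="$1"
    threshold="${2:-85}"
    # df -P prints exactly one data line per filesystem (line 2 here).
    used=$(df -P "$mount_point" 2>/dev/null | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }')
    if [ -z "$used" ]; then
        echo "cannot read usage for $mount_point" >&2
        return 2
    fi
    if [ "$used" -ge "$threshold" ]; then
        echo "WARNING: $mount_point is ${used}% full (threshold ${threshold}%)"
        return 1
    fi
    return 0
}
```

Calling `check_mount / 85` and `check_mount /storage/db 85` from a cron job, and forwarding any output to email or syslog (for example with `logger`), would flag both partitions well before they reach 100%.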
1) Is there any scheduled maintenance that runs on the vCenter Appliance to keep disk usage under control? This is what I would expect from an enterprise product, so I'm very surprised to see this problem.
2) Where should I start looking to clear some disk space so we can get the vCenter service back up and running?
I'd like to find a longer-term solution to this; otherwise we might just have to keep adding VMDKs until our datastore runs out of space.
Cheers, B