TECHNICAL DOCUMENTATION

Hagenberg

January 2008

© Copyright 2008

Roland Dworschak, Sabine Huber

Alexander Leitner, Joachim Pöttinger

All rights reserved

Contents

1   Concept

1.1    Overview

1.1.1       Requirements

1.2    Configuration files

1.2.1     lbvm.conf

1.2.2     lbresources.conf

1.3    Load balancing algorithms

1.3.1     Resources

1.3.2     Targets

1.3.3     Examples

1.4    Processes and scripts

1.4.1     LBM

1.4.2     LB LOG

1.4.3     LB MONITOR

1.4.4     Cluster scripts

1.5    Integration with Red Hat Cluster Suite

1.5.1      rgmanager

2  Manual

2.1    Operating systems

2.1.1     Server installation

2.2    Shared storage

2.2.1    iSCSI target

2.3    Cluster

2.3.1     Packages

2.3.2     Configuration and services

2.4    Virtualization

2.4.1     OpenVZ

2.4.2     Xen

2.5    Load balancer

2.5.1     Packages

2.5.2     Configure and register services

2.5.3     Add VM

LBVM

1. Concept

1.1 Overview

The LBVM allows sharing virtual machines among physical servers in a predefined cluster. It is fully configurable and requires only a few setup steps before running automatically. LBM, LB LOG and LB MONITOR are the core scripts and perform the necessary steps.

The LBM script is the management interface to the load balancer and is used to view all balanced virtual machines, review log files and reports, manually migrate virtual machines from one server to another and add an existing virtual machine to the cluster.

LB LOG is a small cron job which runs regularly on each server to monitor predefined resources. The resource logs are stored on a shared storage and are evaluated by the load balancer.

The load balancer LB MONITOR runs as a clustered service, uses different algorithms to decide which virtual machines should be moved or reported and finally starts the live migration process.

The current release of the LBVM includes configuration files and scripts for the virtualization technologies OpenVZ and Xen to support live migrations with zero downtime.

For further details concerning the scripts see section 1.4.

1.1.1 Requirements

The following requirements have to be met before the LBVM can be configured:

  • The LBVM operates on top of the Red Hat Cluster Suite. A fully configured and running Red Hat Cluster is therefore mandatory. The number of nodes does not matter, but a cluster of at least three nodes is recommended.

  • In order to perform live migrations with zero downtime the virtual machines have to reside on shared data storage. This may be any kind of shared storage such as iSCSI, a SAN or, if only two physical nodes are used, DRBD. If virtual machines are later added to the cluster, LBM automatically moves them to the shared storage and updates the corresponding configuration files.

  • The load balancing scripts and command settings for the virtualization technology have to be in place. This step is not necessary if the prebuilt RPM package is used. If you want to add your own technology, be sure to check the inline documentation of the configuration files.

1.2 Configuration files

The LBVM searches for its main configuration file lbvm.conf in /etc/cluster/ by default. It contains all necessary parameters for the load balancer and a reference to the resource file. We recommend placing the configuration and resource files on the shared data storage and symlinking lbvm.conf back to /etc/cluster/lbvm.conf. Modifications of the lbvm.conf file on one node are then automatically distributed by the underlying file system.
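
A minimal sketch of this setup, assuming the shared storage is mounted at /vz and a hypothetical directory /vz/lbvm holds the configuration files (adjust the paths to your environment); the symlink has to be created on every node:

 # mkdir -p /vz/lbvm
 # mv /etc/cluster/lbvm.conf /vz/lbvm/lbvm.conf
 # ln -s /vz/lbvm/lbvm.conf /etc/cluster/lbvm.conf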

1.2.1 lbvm.conf

The lbvm.conf is the main configuration file and contains settings for all virtual machines which are under the control of the load balancer. The following parameters are allowed (a complete sample file is shown after the list):

Paths:

  • sharedstorage <dir>; Directory of the shared data storage.

  • resources <file>; Absolute path to the resource file.

  • logdir <dir>; Directory where log files and resource usages are stored.

VM settings:

  • virtualization <technology>; This is usually "openvz" or "xen" and is used by LBM when adding existing virtual machines to the cluster.

  • balance <vm>; Comma-separated list of virtual machines that are controlled by LB MONITOR.

LB algorithm (see 1.3 for details):

  • Default algorithm:

     default {
         info ;
         algorithm {

         }
     }

  • Specific algorithm for a virtual machine:

     <veid> {
         info ;
         algorithm {

         }
     }
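
The following sketch combines the parameters above into one sample lbvm.conf. All values (paths, VE IDs and thresholds) are examples only, and the exact syntax should be verified against the inline documentation shipped with the package:

 sharedstorage   /vz;
 resources       /vz/lbvm/lbresources.conf;
 logdir          /var/log/lb;
 virtualization  openvz;
 balance         101,102,201;

 default {
     info ;
     algorithm {
         if ( average(res('cpu',5)) > 0.70 && average(res2('cpu',5)) < 0.40 ) {
             migrate();
         }
     }
 }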
    

1.2.2 lbresources.conf

This file contains a list of resources that are monitored and logged by LB LOG. Each resource consists of a name that is used as a reference by the load balancing algorithm, a command line that returns the appropriate usage and an optional name for the log file (default is to use the resource name).

The following example uses /proc/stat to measure the CPU usage and is later referenced as "cpu". The logfile parameter is optional:

 resource cpu {
     cmd     (cat /proc/stat |grep ^cpu\ |awk '{print $4,$2+$3+$4}' && sleep 3 && cat /proc/stat |grep ^cpu\ |awk '{print $4,$2+$3+$4}') | awk '{idle+=$1;sum+=$2;idle=-idle;sum=-sum}END{print 1-idle/sum}';
     logfile cpu-usage;
 }
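
As a further illustration, a hypothetical resource for free memory in MB (referenced as "ram" in example 3 of section 1.3.3) could be defined as follows; the command is an assumption and not part of the shipped configuration:

 resource ram {
     cmd     free -m | awk '/^Mem:/{print $4}';
     logfile ram-free;
 }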

1.3 Load balancing algorithms

The load balancing algorithms are defined in lbvm.conf. Every virtual machine can have its own algorithm. If none is defined for a specific machine, the default algorithm is used. The protocol for the load balancing algorithm is a Perl statement that results in a migrate() or a report() call, providing an extensible interface to the resource usage of the cluster nodes. Algorithms are evaluated by the Perl function eval. For example:

 my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
 if ( $hour >= 8 && $hour <= 18 && <condition> && <condition> )
 {
     migrate();
 }

1.3.1     Resources

The utilization of the current server is read by the function res:

 res(<resource>, <interval>)
 res('cpu', 5) # returns the CPU usage of the last 5 minutes

The resource usage of other possible target nodes can be read with res2, which takes the same parameters. The names of the resources are defined in the resource file lbresources.conf as seen in section 1.2.2. The interval has to be an integer greater than zero.

In order to operate with a resource, a single value has to be determined that can be used in further Perl operations like ">", "<" or "==". One can either use the implemented mathematical operations (minimum, maximum, average and median) on a resource, write a custom mathematical operation or use one of the stored values by accessing the resource array at a specific position.

 if ( min(res('cpu', 5)) > 0.50 ) {
     # minimum CPU load above 50% for the last 5 minutes
 }

 my @cpu = res('cpu', 3);
 if ( average(@cpu) > 0.50 && $cpu[0] > $cpu[1] && $cpu[1] > $cpu[2] ) {
     # average CPU load above 50% and increasing
 }

1.3.2     Targets

By default all available nodes except the current one are evaluated by the algorithm and marked as possible new targets if migrate() or report() is called. Virtual machines can, however, be restricted to a failover domain (a list of physical nodes that may be used for migration), which is defined in the cluster.conf of the Red Hat Cluster Suite. LBM and LB MONITOR are both aware of this option and restrict their evaluations and operations to failover domains if defined.

Example: VM 201 is restricted to openvz1 and openvz2

  <failoverdomains>
     <failoverdomain name="usb-cardreader" ordered="1" restricted="1">
         <failoverdomainnode name="openvz1.01100011.com" priority="2"/>
         <failoverdomainnode name="openvz2.01100011.com" priority="1"/>
     </failoverdomain>
 </failoverdomains>

 <service name="201" domain="usb-cardreader">
     <openvz name="201"/>
 </service>

1.3.3 Examples

Example 1: Usage of mathematical operation and a resource.

 algorithm {
    if ( average(res('cpu',5)) > 0.70 )
    {
       migrate();
    }
 }

Example 2: Any combination of resources is possible.

 algorithm {
    if ((average(res('cpu',5)) > 0.70) && average(res2('cpu',2)) < 0.40)
    {
       migrate();
    }
 }

Example 3: Even perl functions can be used.

 algorithm {
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
    if ((average(res('cpu',5)) > 0.60 && min(res2('ram',2)) > 512) &&
        ($hour >= 8 && $hour <= 18))
    {
       migrate();
    }
 }

1.4 Processes and scripts

1.4.1     LBM

The load balancer manager (LBM) allows existing virtual machines to be transformed into load-balanced, highly available services managed by the Red Hat Cluster Suite. This process moves the virtual machine along with its configuration to the shared storage, adapts the paths in the configuration files, registers the service for rgmanager and adds it to the list of balanced machines. LBM also offers information about the virtual machines: where they are running and what happened to them in the past. Live migrations may also be initiated manually through the LBM.

The automatic transformation currently supports OpenVZ and Xen. New virtualization technologies may be added by defining the protocol in the LBM script and setting the new virtualization technology in lbvm.conf. Once the virtual machine has been successfully transformed, it is automatically started in the cluster.

Example 1: Adding a Xen domain to the cluster

 [root@xen1 ~]# lbm -c /etc/xen/conf/centosDomU
 update xen config file ............ done
 copy image and xen config file .... done
 update lbvm.conf .................. done
 update cluster.conf ............... done

 run 'ccs_tool update' to upgrade to new cluster config file!

Example 2: Checking the history

 [root@xen1 ~]# lbm -l
 Balanced virtual machine(s):
 centos1 on node xen1.01100011.com
 centos2 on node xen2.01100011.com
 centos3 on node xen3.01100011.com

1.4.2  LB LOG

The LB LOG process provides the resource usage of a physical server as input for the load balancer. The script must run at a regular interval to provide useful information. We recommend executing this script as a cron job every minute.

 [root@openvz1  ~]# crontab -l
 * * * * * /usr/share/lbvm/lblog

This script queries lbvm.conf for the path of the resource file and parses every resource defined there. The given commands are executed one after the other and their results are stored with a timestamp in separate log files on the shared storage in the following format:

Location: <FQDN>-<RESOURCE>

Log format: <TIMESTAMP> <VALUE>\n

Example 1: LB LOG log file for CPU-resource (see 1.2.2 for details)

[root@openvz1 ~]# tail -n 3 /var/log/lb/openvz1.01100011.com-cpu
1195668541 0.01
1195668601 0.32
1195668661 0.12

1.4.3  LB MONITOR

LB MONITOR is the actual load balancer which migrates or reports a balanced virtual machine if the specified algorithm succeeds (see section 1.3 for details). We provide a clustered service script lbvm.sh that is used by the Red Hat Cluster Suite to assure the high availability of the load balancer.

The start function of lbvm.sh checks whether LB MONITOR is in place. The status of this service is checked every minute by default, and each status check is used to execute LB MONITOR. The interval can be changed by setting the corresponding value in the cluster script (see example below). An internal lock file prevents LB MONITOR from running more than once in parallel if a migration or calculation takes longer than the regular check interval. The stop function is currently unused.

<action name="status" interval="60s" timeout="10"/>

Each time LB MONITOR is executed, all virtual machines defined by balance in the lbvm.conf are checked. For each virtual machine the defined algorithm (or the default algorithm if none is specified) is evaluated for the current server and every possible target server, restricted by failover domains if defined (see section 1.3.2 for details). Resources used in the algorithm are read from the resource log files generated by LB LOG.

If an algorithm succeeds and a new target has been found, LB MONITOR, depending on the final function, either only reports the possible target or also starts the live migration of the specific machine. Because the virtual machines are defined as clustered services (see section 1.4.4 for details), LB MONITOR uses the "Cluster User Service Administration Utility" clusvcadm to perform the relocation of a virtual machine.

clusvcadm -r <VEID> -m <TARGET SERVER>

Possible targets and actual migrations are logged in report.log and migrate.log which reside in /var/log/lb/ by default. The logs are also available in readable format through LBM (see example 2 of section 1.4.1).

1.4.4 Cluster scripts

The cluster scripts are used by rgmanager (see 1.5.1) to manage the virtual machines as clustered services. Fully working scripts are provided for OpenVZ and Xen and reside in /usr/share/cluster/ by default. Live migrations are performed by relocating a service which results in stopping the service on the current node and starting it on the new one.

openvz.sh

OpenVZ offers a limited form of live migration through the vzmigrate script. The virtual machine itself is copied from one node to another via rsync, which takes a considerable amount of time depending on the size of the machine. Afterwards it is suspended on the local node, its memory is dumped and also copied via rsync to the other node. There the memory dump is read and the virtual machine is finally resumed.

We took the suspend/resume logic from that script and implemented it in our own openvz.sh, which fully supports clustered file systems. Stopping an OpenVZ service suspends the virtual machine and dumps its memory onto the clustered file system. The start process checks for a memory dump and either resumes or starts the machine. As the data no longer needs to be copied, the whole process takes very little time and live migrations become feasible. The status of the service is checked regularly (every 30 seconds by default) and in case of an error the service is either restarted on the local node or migrated to another one.
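
The underlying vzctl commands roughly correspond to the following sketch: the container is checkpointed on the source node and restored on the target node. VEID 101 and the dump file path are placeholders; the actual paths are handled by openvz.sh:

 vzctl chkpnt 101 --dumpfile /vz/dump/Dump.101
 vzctl restore 101 --dumpfile /vz/dump/Dump.101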

xen.sh

Xen supports live migrations directly. To migrate a domain the "Xen management user interface" xm is used with the following command:

xm migrate --live domain destination

With the --live flag xm attempts to keep the domain running while the migration is in progress, resulting in typical down times of just 60-300 ms.
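
As an illustration, using the domain and node names from the example in section 1.4.1 (placeholders for your own setup), a manual live migration could be triggered as follows; the xend relocation server has to be enabled on the destination host for this to work:

 xm migrate --live centos1 xen2.01100011.com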

To integrate the live migration into the LBVM, the stop/start functions of the Red Hat vm.sh script are used. Stopping a Xen service calls the stop function, which issues the xm live migration command if the Xen virtual machine exists in the migration folder of the shared storage. In this case the start function recognizes the migration and does not initiate an additional start.

1.5 Integration with Red Hat Cluster Suite

1.5.1 rgmanager

The main task of rgmanager is to handle defined cluster services as well as administrator requests like service start, stop (disable), relocate and restart. It is also responsible for restarting and relocating services following a failure. Virtual machines have to be defined as clustered services before they can be handled by rgmanager. We included scripts for the virtualization technologies OpenVZ and Xen providing live migrations with zero downtime (see the previous section for details). The virtual machines are defined in the cluster.conf as a service or resource.

Example:

<service name="centos">
    <openvz name="100"/>
</service>

These services can be handled directly through rgmanager or through our load balancer manager LBM (recommended). The service definitions in the cluster.conf are created automatically if a virtual machine is added using LBM.

2. Manual

2.1    Operating systems

2.1.1 Server installation

The server installation is performed with a CentOS 5.0 CD. To accelerate the installation procedure a kickstart file is used for a minimal installation. For security purposes, review the running services and disable those that are not required, as shown below.
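
A short sketch of such a review; the service name is only an illustration:

# chkconfig --list | grep ":on"
# chkconfig cups off
# service cups stop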

2.2    Shared storage

In this scenario we use a dedicated host to provide an iSCSI target as shared storage. Other solutions like a SAN or DRBD (if you run a two-node cluster) are also supported.

2.2.1 iSCSI target

Perform these steps to install iSCSI Enterprise Target (http://iscsitarget.sourceforge.net/) on a dedicated host.

1.  Install the precompiled rpm package.

#  rpm -ivh iscsitarget-0.4.15-1.i386.rpm

2.  Alter the ietd.conf file in /etc and set the path to the shared hard disk (i.e. /dev/sdb).

Target iqn.2007-10.com.01100011.iscsi:iscsi
       IncomingUser
       OutgoingUser
       Lun 0 Path=/dev/sdb,Type=blockio
       Alias scsi

3.  Start the iSCSI service and add it to the runlevels.

# /etc/init.d/iscsi-target start
Starting iSCSI target service:                      [  OK  ]
# chkconfig iscsi-target on

4.  Verify the setup.

# cat /proc/net/iet/volume
tid:1 name:iqn.2007-10.com.01100011.iscsi:iscsi
lun:0 state:0 iotype:blockio iomode:wt path:/dev/sdb

2.3 Cluster

2.3.1    Packages

Install cman, rgmanager, gfs-utils and lvm2-cluster packages on all cluster nodes.

# yum install cman
# yum install rgmanager
# yum install gfs-utils
# yum install lvm2-cluster

2.3.2    Configuration and services

1.  Edit the cluster.conf to include each node of the cluster. The following is an example of a cluster with three members:

<clusternodes>
        <clusternode name="openvz1.01100011.com" nodeid="1" votes="1">
                <fence>
                        <method name="1">
                                <device name="manual" nodename="openvz1.01100011.com"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="openvz2.01100011.com" nodeid="2" votes="1">
                <fence>
                        <method name="1">
                                <device name="manual" nodename="openvz2.01100011.com"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="openvz3.01100011.com" nodeid="3" votes="1">
                <fence>
                        <method name="1">
                                <device name="manual" nodename="openvz3.01100011.com"/>
                        </method>
                </fence>
        </clusternode>
</clusternodes>
<cman/>
<fencedevices>
       <fencedevice agent="fence_manual" name="manual"/>
</fencedevices>

The fence device is set to manual fencing for testing purposes. This is not suitable for a production environment.
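
For a production setup a real fence agent should be used instead. A sketch using IPMI-based fencing, assuming IPMI-capable hardware; the name, address and credentials are placeholders, and the device has to be referenced in the <fence> block of the corresponding cluster node:

<fencedevices>
       <fencedevice agent="fence_ipmilan" name="ipmi-openvz1" ipaddr="192.168.1.201" login="admin" passwd="secret"/>
</fencedevices>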

2.  Install the iSCSI initiator on each cluster node.

(a) Install iSCSI packages.

# yum install iscsi-initiator-utils

(b)  Start iSCSI service and add it to the runlevels.

# /etc/init.d/iscsi start

# chkconfig iscsi on

(c)  Use iscsiadm utility to discover the iSCSI target.

# iscsiadm --mode discovery --type sendtargets --portal 192.168.1.100

(d)  Use iscsiadm utility to connect to the iSCSI target.

#  iscsiadm --mode node --targetname iqn.2007-10.com.01100011.iscsi:iscsi \
   --portal 192.168.1.100:3260 --login

3. Create a logical volume on the iSCSI disk (see 2.2 for details) using LVM2. Perform the following commands on any cluster node.

(a) Check if the iSCSI disk is already attached.

# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: VMware,  Model: VMware Virtual S Rev: 1.0
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: IET      Model: VIRTUAL-DISK     Rev: 0
  Type:   Direct-Access                    ANSI SCSI revision: 04

(b) Use the whole disk as a physical volume.

# pvcreate /dev/sdb
# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               cluster
  PV Size               50.00 GB / not usable 4.00 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              12799
  Free PE               5119
  Allocated PE          7680
  PV UUID               TVIJTI-4LBE-Wp1r-vQLb-mqVP-BLlO-5zpF5K

(c) Create the volume group cluster on the physical volume.

# vgcreate cluster /dev/sdb
# vgdisplay
  --- Volume group ---
  VG Name               cluster
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               50.00 GB
  PE Size               4.00 MB
  Total PE              12799
  Alloc PE / Size       7680 / 30.00 GB
  Free  PE / Size       5119 / 20.00 GB
  VG UUID               XuFRIR-Lku8-iegf-51ZT-bZUd-dzvf-JkS93n

(d) For OpenVZ create a 15 GB logical volume named openvz in the volume group.

# lvcreate -L 15G -n openvz cluster
# lvdisplay
  --- Logical volume ---
  LV Name                /dev/cluster/openvz
  VG Name                cluster
  LV UUID                rDaZ55-0o9b-PQPl-wAN9-qyNC-8AwD-MfJ1mM
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                15.00 GB
  Current LE             3840
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:2

(e) If Xen is used as the virtualization technology, create a 15 GB logical volume named xen in the volume group cluster.

# lvcreate -L 15G -n xen cluster

4.  Edit /etc/lvm/lvm.conf and change the locking type to cluster-wide locking: locking_type = 3
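
After the change the relevant line in /etc/lvm/lvm.conf should look like the following; alternatively, the lvmconf tool from the lvm2-cluster package can make the same change:

# grep locking_type /etc/lvm/lvm.conf
    locking_type = 3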

The logical volume /dev/cluster/<openvz,xen> is now ready to be formatted with the appropriate file system.

5.  Establish GFS on the shared storage (/dev/cluster/<openvz,xen>) with lock_dlm as locking protocol:

# mkfs.gfs -p lock_dlm -t openvz:openvz -j 3 /dev/cluster/openvz

# mkfs.gfs -p lock_dlm -t xen:xen -j 3 /dev/cluster/xen

Remark: GFS2 is not supported by the current Xen kernel, therefore GFS has to be installed.

6. Edit /etc/fstab and add the shared storage for the Xen or OpenVZ cluster:

/dev/cluster/xen     /xen  gfs  defaults,noatime  0 0
/dev/cluster/openvz  /vz   gfs  defaults,noatime  0 0
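
The mount points referenced in /etc/fstab have to exist on every node (only the one matching the node's virtualization technology is actually needed); the GFS file systems are then mounted when the gfs service is started in step 9:

# mkdir -p /xen /vz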

7.  Edit cluster.conf and add the LVM resource for the Xen or OpenVZ cluster.

<rm>
    <resources>
        <lvm name="lvm" vg_name="cluster" lv_name="<xen,openvz>"/>
    </resources>
</rm>

8.  Distribute the configuration file to all other nodes of the cluster.

# ccs_tool update /etc/cluster/cluster.conf

9.  Start the cluster services.

# service cman start
# service clvmd start
# service gfs start
# service rgmanager start

10. Activate the cluster and GFS services in the runlevels.

# chkconfig cman on
# chkconfig clvmd on 
# chkconfig gfs on
# chkconfig rgmanager on

2.4 Virtualization

In our project we used two different virtualization technologies: OpenVZ and Xen. The following sections describe the installation and configuration of these technologies.

2.4.1 OpenVZ

The following steps have to be executed on each OpenVZ node:

1.  Install OpenVZ repository.

# cd /etc/yum.repos.d
# wget http://download.openvz.org/openvz.repo
# rpm --import  http://download.openvz.org/RPM-GPG-Key-OpenVZ

2.  Install OpenVZ-patched kernel.

#  yum install ovzkernel

3.  Check if boot loader was configured automatically.

title CentOS (2.6.18-8.1.14.el5.028stab045.1)
       root (hd0,0)
       kernel /vmlinuz-2.6.18-8.1.14.el5.028stab045.1 ro root=/dev/VolGroup00/LogVol00
       initrd /initrd-2.6.18-8.1.14.el5.028stab045.1.img

4.  Change the kernel parameters as follows:

net.ipv4.ip_forward = 1
net.ipv4.conf.all.rp_filter = 1
kernel.sysrq = 1
net.ipv4.conf.all.send_redirects = 0
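
These parameters are usually placed in /etc/sysctl.conf so that they persist across reboots, and they can be applied immediately with sysctl:

# sysctl -p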

5.  Reboot the system and install the OpenVZ tools.

# yum install vzctl vzquota

6.  OpenVZ is now ready to be started.

# service vz start
Starting OpenVZ:                                           [  OK  ]
Bringing up interface venet0:                              [  OK  ]
Configuring interface venet0:                              [  OK  ]

7. Install the tools needed to create OS template caches.

# yum install vzpkg vzyum vzrpm43-python vzrpm44-python

8.  Install OS template metadata (CentOS 4, Fedora Core 7).

# yum install vztmpl-centos-4

# yum install vztmpl-fedora-7

9.  Create the OS template caches.

# vzpkgcache fedora-7-i386-minimal

# vzpkgcache centos-4-i386-minimal

2.4.2 Xen

The following steps have to be executed on each Xen node:

# yum install xen-libs.i386 3.0.3-25.0.4.el5
# yum install xen.i386 3.0.3-25.0.4.el5
# yum install kernel-xen.i686 2.6.18-8.1.15.el5
# yum install kmod-gfs-xen.i686 0.1.16-6.2.6.18_8.1.15.el5

After installation an entry should be added to grub.conf. This entry tells GRUB where to find Xen and what boot parameters should be passed to it. Reboot the nodes with the new grub entry to enable Xen. Detailed information about Xen can be found in the Xen Users Manual (http://bits.xensource.com/Xen/docs/user.pdf).
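
A sample grub.conf entry, assuming the kernel versions installed above and the root device from the OpenVZ example; the file names have to be adapted to the actual installation:

title CentOS Xen (2.6.18-8.1.15.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-8.1.15.el5
        module /vmlinuz-2.6.18-8.1.15.el5xen ro root=/dev/VolGroup00/LogVol00
        module /initrd-2.6.18-8.1.15.el5xen.img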

2.5 Load balancer

2.5.1     Packages

Install the RPM packages on each cluster node.

1.   OpenVZ nodes:

rpm -ivh lbvm-1.0-1.rpm lbvm-openvz-1.0-1.i386.rpm

2.  Xen nodes:

rpm -ivh lbvm-1.0-1.rpm lbvm-xen-1.0-1.i386.rpm

2.5.2     Configure and register services

1.  Edit cluster.conf to add the monitoring service.

<service name="monitoring">
    <lbvm name="lbvm1" />
</service>

2.  Distribute the configuration file to all other nodes of the cluster. Be sure to also increase the version number.

# ccs_tool update /etc/cluster/cluster.conf

2.5.3  Add VM

To add a virtual machine to the load balancer, use the "load balancer manager" lbm with the configuration file of an existing Xen or OpenVZ guest. The lbm modifies the cluster.conf, therefore the cluster configuration has to be redistributed afterwards.

# lbm -c /etc/vz/conf/123.conf
	
	copy config file + link ........... done
	update lbvm.conf .................. done
	update cluster.conf ............... done

	run 'ccs_tool update' to upgrade to new cluster config file!