This is a more complex and more reliable solution than my previous one, and it is suitable for production.
From Wikipedia we know that:
High-availability clusters (also known as HA Clusters or Failover Clusters) are computer clusters that are implemented primarily for the purpose of providing high availability of services which the cluster provides. They operate by having redundant computers or nodes which are then used to provide service when system components fail. Normally, if a server with a particular application crashes, the application will be unavailable until someone fixes the crashed server.
We will build a cluster for a production system that provides web services and consists of two nodes. Let’s call them srv1 and srv2; these are their hostnames.
Here is the ifconfig output for srv1:
eth0 Link encap:Ethernet HWaddr 08:00:27:7B:7E:40
inet addr:10.0.30.1 Bcast:10.255.255.255 Mask:255.0.0.0
inet6 addr: fe80::a00:27ff:fe7b:7e03/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:347 errors:1 dropped:0 overruns:0 frame:0
TX packets:50 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:86003 (83.9 KiB) TX bytes:8214 (8.0 KiB)
Interrupt:11 Base address:0xc020
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:560 (560.0 b) TX bytes:560 (560.0 b)
and for srv2:
eth0 Link encap:Ethernet HWaddr 08:00:27:7B:7E:03
inet addr:10.0.30.2 Bcast:10.255.255.255 Mask:255.0.0.0
inet6 addr: fe80::a00:27ff:fe7b:7e03/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:347 errors:1 dropped:0 overruns:0 frame:0
TX packets:50 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:86003 (83.9 KiB) TX bytes:8214 (8.0 KiB)
Interrupt:11 Base address:0xc020
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:560 (560.0 b) TX bytes:560 (560.0 b)
As you can see, there are no virtual interfaces yet. Now let’s look at the hosts file; it should be the same on both nodes:
cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 srv2 localhost.localdomain localhost
10.0.30.1 srv1
10.0.30.2 srv2
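A quick way to confirm that the hosts file maps the node names to the same addresses on both nodes is a small lookup helper. This is only a sketch: lookup_host is a name I made up for illustration, and it reads the file directly instead of going through the system resolver.

```shell
#!/bin/sh
# Print every address that a hosts-format file assigns to a name.
# lookup_host is a hypothetical helper; the file defaults to /etc/hosts.
lookup_host() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }        # skip comment lines
        {
            # field 1 is the address, the remaining fields are names
            for (i = 2; i <= NF; i++)
                if ($i == n) { print $1; break }
        }
    ' "$hosts_file"
}
```

Run lookup_host srv1 and lookup_host srv2 on each node and compare the results. If a name prints more than one address, for example a loopback address as well as the LAN one, it is worth checking which one the node actually uses.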
OK, now we have to install heartbeat.
CentOS, Fedora Core and Red Hat users can use this command:
yum install heartbeat
Debian users can use this one:
apt-get install heartbeat
For other distributions, install it with your package manager according to its documentation, or compile it from source. The sources are always available at the project’s official site.
Copy the configuration files from the examples into the working directory (the default for heartbeat is /etc/ha.d):
cp /usr/share/doc/heartbeat*/haresources /etc/ha.d/
cp /usr/share/doc/heartbeat*/ha.cf /etc/ha.d/
cp /usr/share/doc/heartbeat*/authkeys /etc/ha.d/
Then edit ha.cf so that it looks like this:
cat /etc/ha.d/ha.cf
# debug log file
debugfile /var/log/ha-debug
# general log file
logfile /var/log/ha-log
# syslog facility
logfacility local0
# seconds between heartbeat packets
keepalive 2
# UDP port that heartbeat listens on
udpport 694
# interface used for broadcast messages
bcast eth0
# move the resources back to the preferred node when it returns
auto_failback on
# list of nodes; each name must match the output of `uname -n` on that node
node srv1
node srv2
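Since the node names have to match `uname -n`, it can save some debugging to check this before starting heartbeat. Here is a minimal sketch; node_in_hacf is a hypothetical helper name, and it assumes the ha.cf layout shown above.

```shell
#!/bin/sh
# Succeed if the given name appears in a "node" directive of ha.cf.
# node_in_hacf is a hypothetical helper; the path defaults to /etc/ha.d/ha.cf.
node_in_hacf() {
    name=$1
    conf=${2:-/etc/ha.d/ha.cf}
    awk -v n="$name" '
        # a node directive may list one or several names
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$conf"
}
```

On each node you could then run: node_in_hacf "$(uname -n)" || echo "this host is not listed in ha.cf"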
All the configuration files are well documented by the development team, so I will not describe them in depth. The next config file is haresources:
cat /etc/ha.d/haresources
srv1 10.0.30.3 httpd script1 script2
It consists of three main fields:
1st – the name of the node that normally owns the resources. It must match `uname -n` on that node, so an arbitrary label should not be used here.
2nd – the IP address of the virtual interface.
3rd – the names of one or more scripts in /etc/init.d that are started on the node that takes over the resources, for example when the active node crashes.
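To make the field layout easier to see, here is a small sketch that splits one haresources line and labels each part. It assumes the first field is the node that normally owns the resources and treats anything that looks like a dotted IPv4 address as the virtual IP. describe_haresources_line is a hypothetical name, and the sample line in the usage note uses srv1 as an assumed node name.

```shell
#!/bin/sh
# Label the fields of a single haresources line.
# describe_haresources_line is a hypothetical helper for illustration only.
describe_haresources_line() {
    # Deliberately unquoted: let the shell split the line on whitespace.
    set -- $1
    [ $# -gt 0 ] || return 1
    echo "node: $1"
    shift
    for res in "$@"; do
        case $res in
            [0-9]*.[0-9]*.[0-9]*.[0-9]*) echo "virtual ip: $res" ;;
            *)                           echo "script: $res" ;;
        esac
    done
}
```

For example, describe_haresources_line 'srv1 10.0.30.3 httpd script1 script2' prints one "node" line, one "virtual ip" line and three "script" lines.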
Unfortunately, if one of the defined services crashes, heartbeat does not migrate the cluster group to the other node :( In other words, if your web site goes down, the current node stays active. Maybe this feature is available, but I have found nothing about it in the official documentation.
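One possible workaround is a small watchdog, run from cron for example, that checks the service and stops heartbeat on the active node when the check fails, so that the other node takes over. This is a sketch of that idea and not a heartbeat feature: failover_if_down is a hypothetical name, and the check and release commands in the usage note are assumptions about your init scripts.

```shell
#!/bin/sh
# Run a health check; if it fails, run a command that makes this node
# give up its resources so the peer can take over.
# failover_if_down is a hypothetical helper, not part of heartbeat.
failover_if_down() {
    check_cmd=$1      # command that exits 0 while the service is healthy
    release_cmd=$2    # command that releases the resources on this node
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "check failed, releasing resources"
        sh -c "$release_cmd"
    fi
}
```

A call could look like: failover_if_down "/etc/init.d/httpd status" "/etc/init.d/heartbeat stop". Passing both commands as arguments keeps the helper easy to try out with harmless commands first.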
The last config file is authkeys. Its configuration is quite simple: it consists of two lines and determines which authentication method is used between the nodes.
cat /etc/ha.d/authkeys
auth 1
1 sha1 "HI!"
“sha1”, like “md5”, authenticates the heartbeat packets with a keyed hash; it signs them rather than encrypting them. To switch authentication off, set “crc” instead of “sha1”.
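A short fixed string works for a test, but for production a random key is a better idea. Heartbeat also refuses to start unless authkeys is readable by its owner only (mode 600). The sketch below writes such a file; write_authkeys is a hypothetical helper name, and the 32-byte key length is my own choice.

```shell
#!/bin/sh
# Write an authkeys file with a random sha1 key and owner-only permissions.
# write_authkeys is a hypothetical helper; the path defaults to /etc/ha.d/authkeys.
write_authkeys() {
    out=${1:-/etc/ha.d/authkeys}
    # 32 random bytes rendered as 64 hex characters
    key=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # create the file with a restrictive umask so it is never world-readable
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$out" )
    chmod 600 "$out"
}
```

The same file has to be present on both nodes, so generate it once and copy it over rather than running the helper on each node.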
Now try to start it:
/etc/init.d/heartbeat start
If it starts without any errors, you can move on to the second node. Copy your edited config files there:
scp /etc/hosts 10.0.30.2:/etc/
scp /etc/ha.d/ha* 10.0.30.2:/etc/ha.d/
scp /etc/ha.d/authkeys 10.0.30.2:/etc/ha.d/
Then go to the second node and start heartbeat there:
srv2:#/etc/init.d/heartbeat start
If you have done everything right, one of your nodes should now have an eth0:0 interface, which indicates the active node. It looks like this:
srv1:# ifconfig eth0:0
eth0:0 Link encap:Ethernet HWaddr 08:00:27:7B:7E:03
inet addr:10.0.30.3 Bcast:10.255.255.255 Mask:255.0.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:11 Base address:0xc020
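If you want to script this check, for monitoring for example, a filter over the ifconfig output is enough. This sketch assumes the older ifconfig format shown above (with "inet addr:"); newer versions print addresses differently. has_address is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the given IPv4 address appears in ifconfig-style text on stdin.
# The trailing space keeps 10.0.30.3 from matching 10.0.30.31.
# has_address is a hypothetical helper for illustration.
has_address() {
    grep -qF "inet addr:$1 "
}
```

On either node: ifconfig | has_address 10.0.30.3 && echo "$(uname -n) is the active node"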
The last step in our configuration is testing the cluster. Go to the active node and run:
reboot
Within several seconds you will see that the httpd processes have started and the eth0:0 interface has appeared on the other node:
srv2:# ifconfig eth0:0
eth0:0 Link encap:Ethernet HWaddr 08:00:27:7B:7E:03
inet addr:10.0.30.3 Bcast:10.255.255.255 Mask:255.0.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:11 Base address:0xc020
ps uax | grep http | grep -v grep
root 2300 0.0 3.6 23004 9424 ? Ss 02:37 0:01 /usr/sbin/httpd
apache 2323 0.0 1.8 23004 4780 ? S 02:37 0:00 /usr/sbin/httpd
apache 2324 0.0 1.8 23004 4780 ? S 02:37 0:00 /usr/sbin/httpd
apache 2325 0.0 1.8 23004 4780 ? S 02:37 0:00 /usr/sbin/httpd
apache 2326 0.0 1.8 23004 4780 ? S 02:37 0:00 /usr/sbin/httpd
apache 2327 0.0 1.8 23004 4780 ? S 02:37 0:00 /usr/sbin/httpd
apache 2328 0.0 1.8 23004 4780 ? S 02:37 0:00 /usr/sbin/httpd
apache 2331 0.0 1.8 23004 4780 ? S 02:37 0:00 /usr/sbin/httpd
apache 2332 0.0 1.8 23004 4780 ? S 02:37 0:00 /usr/sbin/httpd
It looks like everything went well.
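To get a rough idea of how long the service is unreachable during a failover, you can probe the virtual IP from a third machine while the active node reboots. The helper below is a sketch under that assumption; count_failed_probes is a hypothetical name, and the curl command in the usage note is just one possible probe.

```shell
#!/bin/sh
# Run a probe command a fixed number of times, pausing between attempts,
# and print how many attempts failed.
# count_failed_probes is a hypothetical helper for illustration.
count_failed_probes() {
    probe_cmd=$1          # command that exits 0 when the service answers
    attempts=$2           # how many times to probe
    interval=${3:-1}      # seconds to sleep between probes
    failed=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        sh -c "$probe_cmd" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
        sleep "$interval"
    done
    echo "$failed"
}
```

For example, start count_failed_probes "curl -fs -m 1 http://10.0.30.3/" 60 and then reboot the active node. With the default one-second interval, the printed number is roughly the downtime in seconds.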