Two SLES 11 SP4 servers were installed at different sites connected by a stretched network. The SUSE HA solution cannot be implemented there because the connection between the sites is unreliable and not redundant. Therefore, we will configure HANA System Replication. Failover will be performed manually using prepared scripts. A virtual IP will follow the active instance using additional scripts.
Configure NTP and name resolution. Even if you have a reliable DNS, put everything associated with the cluster in /etc/hosts, and copy it between the nodes:
# cat /etc/hosts
127.0.0.1 localhost
192.168.80.31 vhana01.domain.local vhana01
192.168.80.32 vhana02.domain.local vhana02
192.168.80.33 vhana.domain.local vhana
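Since the same hosts file has to exist on both nodes, a quick check can confirm that no cluster name is missing. The following is only a sketch: missing_hosts is a hypothetical helper written for this article, not part of any SAP or SUSE tool.

```shell
#!/bin/bash
# Sketch: report cluster names that are missing from a hosts file.
# missing_hosts is a hypothetical helper, not part of any SAP or SUSE tool.
missing_hosts() {
    # $1 = hosts file, remaining arguments = names that must be present
    local file=$1 name
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any column after the IP.
        awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$file" || echo "$name"
    done
}
```

Running missing_hosts /etc/hosts vhana01 vhana02 vhana on each node prints nothing when all three names are present, and prints the missing names otherwise.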
Exchange root SSH keys and copy the host SSH keys from one node to the other.
The installation of SAP products must be performed using the hdblcm tool. You can also use this tool to perform other post installation tasks, such as renaming a host.
Locate the hdblcm tool:
root@vhana01:~ # find /mnt/cdrom -type f -name hdblcm
/mnt/cdrom/HANA/DATA_UNITS/SAP HANA DATABASE 1.0 FOR B1/LINX64SUSE/SAP_HANA_DATABASE/hdblcm
root@vhana01:~ # HDBLCM=$(find /mnt/cdrom -type f -name hdblcm)
The first command found exactly one copy of the tool on my installation media, so I assigned the result to a variable to shorten the following commands.
You can generate the configuration file template as follows:
root@vhana01:~ # "$HDBLCM" --action=install --dump_configfile_template=install.ini
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "C.utf8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
SAP HANA Lifecycle Management - SAP HANA 1.00.122.05.1481577062
***************************************************************
Scanning Software Locations...
Detected components:
    SAP HANA Database (1.00.122.05.1481577062) in /mnt/cdrom/HANA/DATA_UNITS/SAP HANA DATABASE 1.0 FOR B1/LINX64SUSE/SAP_HANA_DATABASE/server
Config file template '/root/install.ini' written
Password file template '/root/install.ini.xml' written
Configuration file template created.
Now you can edit it. The default values can be omitted. Here is an example of my file:
root@vhana01:~ # cat /root/install.ini
[General]
component_root=/mnt/cdrom/HANA/DATA_UNITS
components=all
ignore=check_signature_file
[Server]
root_password=ROOT_PASSWORD
sid=PRD
number=00
sapadm_password=SAPADM_PASSWORD
password=INSTANCE_OWNER_PASSWORD
system_user_password=SYSTEM_PASSWORD
autostart=y
isc_mode=standard
[Action]
action=install
Install SAP HANA on both nodes with the same SID (PRD) and instance number (00):
# "$HDBLCM" --configfile=install.ini --ignore=check_signature_file -b
..
Note: Deployment of SAP Host Agent configurations finished with errors
I will not show the whole output here because it is very long. Check the log files for errors. Despite the error message about the Host Agent, everything seems to be working. Make sure that the users created by the installer have the same UID on both servers.
A full backup is required to initialize the replication. Create it on the node that will be the master.
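The UID check can be scripted by comparing passwd data from both nodes. This is a minimal sketch under assumptions: uid_of and uid_mismatches are hypothetical helpers, and the two input files are assumed to be dumps of "getent passwd" taken on each node.

```shell
#!/bin/bash
# Sketch: compare numeric UIDs of selected users in two passwd-format files.
# uid_of and uid_mismatches are hypothetical helpers; the input files could
# be produced by "getent passwd" on each node.
uid_of() {
    # $1 = passwd-format file, $2 = user name; prints the UID (third field)
    awk -F: -v u="$2" '$1 == u { print $3; exit }' "$1"
}

uid_mismatches() {
    # $1, $2 = passwd-format files from node A and node B, rest = user names
    local a=$1 b=$2 user ua ub
    shift 2
    for user in "$@"; do
        ua=$(uid_of "$a" "$user")
        ub=$(uid_of "$b" "$user")
        if [ "$ua" != "$ub" ]; then
            echo "$user: ${ua:-missing} vs ${ub:-missing}"
        fi
    done
}
```

For example, after saving "getent passwd" from vhana01 and vhana02 into two files, uid_mismatches FILE1 FILE2 prdadm sapadm prints one line per user whose UID differs and nothing when they match.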
root@vhana01:~ # su - prdadm
vhana01:~> hdbsql -u SYSTEM -i 00 "BACKUP DATA USING FILE ('/hana/backup/FULL')"
Single Sign-On authentication failed
Password:
0 rows affected (overall time 6328.477 msec; server time 6327.586 msec)
Enable system replication on the master node and give it a logical name related to the location of the site:
vhana01:~> hdbnsutil -sr_enable --name=PRODSITE
checking for active nameserver ...
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.
vhana01:~> hdbnsutil -sr_state
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
online: true
mode: primary
site id: 1
site name: PRODSITE
Host Mappings:
~~~~~~~~~~~~~~
done.
Stop the secondary database:
root@vhana02:~ # su - prdadm
vhana02:~> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/PRD/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400
18.12.2017 14:21:51
Stop
OK
Waiting for stopped instance using: /usr/sap/PRD/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2
18.12.2017 14:22:11
WaitforStopped
OK
hdbdaemon is stopped.
Register the secondary system:
vhana02:~> hdbnsutil -sr_register \
    --remoteHost=vhana01 \
    --remoteInstance=00 \
    --replicationMode=async \
    --operationMode=logreplay \
    --name=DRSITE
adding site ...
checking for inactive nameserver ...
nameserver vhana02:30001 not responding.
collecting information ...
updating local ini files ...
done.
vhana02:~> HDB start
..
vhana02:~> hdbnsutil -sr_state
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
online: true
mode: async
site id: 2
site name: DRSITE
active primary site: 1
Host Mappings:
~~~~~~~~~~~~~~
vhana02 -> [PRODSITE] vhana01
vhana02 -> [DRSITE] vhana02
primary masters:vhana01
done.
The hdbnsutil -sr_state command also works on the primary site.
vhana01:~> hdbnsutil -sr_state --sapcontrol=1
checking for active or inactive nameserver ...
SAPCONTROL-OK: <begin>
online=true
mode=primary
site id=1
site name=PRODSITE
mapping/vhana01=PRODSITE/vhana01
mapping/vhana01=DRSITE/vhana02
SAPCONTROL-OK: <end>
done.
You can get much more information with another command:
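The key=value form printed with --sapcontrol=1 is convenient for scripts. As a sketch, the hypothetical helper below pulls a single value out of that output; it relies only on the line format shown in the transcript above.

```shell
#!/bin/bash
# Sketch: extract one key from "hdbnsutil -sr_state --sapcontrol=1" output.
# sr_value is a hypothetical helper; it only relies on the key=value lines
# shown in the transcript above.
sr_value() {
    # $1 = key (for example "mode" or "site name"); reads the output on stdin
    awk -v k="$1" '
        index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }
    '
}
```

A script could then run hdbnsutil -sr_state --sapcontrol=1 | sr_value mode and branch on whether the answer is primary or a replication mode such as async.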
vhana01:~> hdbcons -e hdbindexserver "replication info"
There are many interesting features for tuning the replication connection.
The first is the ability to replicate not only the data but also the settings in the "ini" files. This is configured in the [inifile_checker] section of the global.ini file: set the replicate parameter to "true" (the default is "false"). You can change it online on the master node using HANA Studio or hdbsql:
vhana01:~> hdbsql -u SYSTEM -i 00 \
    "ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') SET ('inifile_checker','replicate') = 'true'"
Single Sign-On authentication failed
Password:
0 rows affected (overall time 1467 usec; server time 698 usec)
vhana01:~> cat /usr/sap/PRD/SYS/global/hdb/custom/config/global.ini
[inifile_checker]
replicate = true
[persistence]
basepath_datavolumes = /hana/data/PRD
basepath_logvolumes = /hana/log/PRD
[system_replication]
mode = primary
actual_mode = primary
site_id = 1
site_name = PRODSITE
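To verify such a setting from a script instead of reading the whole file, a small parser is enough. This is a sketch only: ini_get is a hypothetical helper, not an SAP tool, and it assumes the simple "key = value" layout seen in the file above.

```shell
#!/bin/bash
# Sketch: read one parameter from an ini file such as global.ini.
# ini_get is a hypothetical helper, not an SAP tool.
ini_get() {
    # $1 = file, $2 = section name without brackets, $3 = parameter name
    awk -v s="[$2]" -v k="$3" '
        /^\[/ { in_section = ($0 == s); next }   # track the current section
        in_section {
            split($0, kv, "=")
            key = kv[1]
            gsub(/^[ \t]+|[ \t]+$/, "", key)      # trim the parameter name
            if (key == k) {
                val = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim the value
                print val
                exit
            }
        }
    ' "$1"
}
```

For example, ini_get /usr/sap/PRD/SYS/global/hdb/custom/config/global.ini inifile_checker replicate should print true after the change; an empty result means the parameter is not set in the custom file.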
Another feature is traffic compression, which can be very important for a connection between two sites. The two parameters enable_log_compression and enable_data_compression belong to the [system_replication] section of the global.ini file. This section is not covered by the ini file replication enabled earlier. According to the documentation, the parameters should be set on the secondary side, but when you switch between sites, the primary node becomes the secondary. Therefore, we add the parameters manually on both sides.
Stop both instances, edit the global.ini file, and start both instances again.
vhana01:~> cat /usr/sap/PRD/SYS/global/hdb/custom/config/global.ini
..
[system_replication]
enable_log_compression = true
enable_data_compression = true
enable_log_retention = auto
..
The third parameter, enable_log_retention, is worth setting to auto so that unshipped logs are kept on the primary node. This can fill up the file system, but it ensures data integrity.
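Because retained logs can fill the log volume while the secondary is unreachable, it is worth watching the free space there. The sketch below is an illustration only: used_pct, check_log_volume and the 80 percent default threshold are assumptions, not SAP recommendations.

```shell
#!/bin/bash
# Sketch: warn when the file system holding a path is nearly full.
# used_pct, check_log_volume and the 80 percent default are illustrative.
used_pct() {
    # Reads "df -P" output on stdin; prints the Capacity column without "%".
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

check_log_volume() {
    # $1 = path to check, $2 = warning threshold in percent (default 80)
    local path=$1 limit=${2:-80} used
    used=$(df -P "$path" 2>/dev/null | used_pct)
    case "$used" in
        ''|*[!0-9]*)
            echo "cannot read usage of $path" >&2
            return 2 ;;
    esac
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full"
        return 1
    fi
    return 0
}
```

Called from cron as check_log_volume /hana/log/PRD 80, it returns a non-zero status and prints a warning once the threshold is reached, which leaves time to restore the secondary or drop replication.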
Takeover is performed on the secondary node. The command does not attempt to connect to the main site, stop the main database, or switch it to secondary mode. It only promotes the secondary node, which becomes a standalone primary server. As a result, if the previous primary is still alive, you will get two active databases and a potential split brain.
So a correct takeover procedure should look like this: stop the database on the old primary, run the takeover on the secondary, then register the old primary as the new secondary and start it.
First we will do this manually; later we will use the scripts.
root@vhana01:~ # su - prdadm
vhana01:/usr/sap/PRD/HDB00> HDB stop
root@vhana02:~ # su - prdadm
vhana02:/usr/sap/PRD/HDB00> hdbnsutil -sr_takeover
checking local nameserver ...
done.
root@vhana01:~ # su - prdadm
vhana01:/usr/sap/PRD/HDB00> hdbnsutil -sr_register \
    --remoteHost=vhana02 \
    --remoteInstance=00 \
    --replicationMode=async \
    --operationMode=logreplay \
    --name=PRODSITE
adding site ...
checking for inactive nameserver ...
nameserver vhana01:30001 not responding.
collecting information ...
updating local ini files ...
done.
vhana01:/usr/sap/PRD/HDB00> HDB start
A semi-automatic fail-over script will be here.
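Until that script is published, here is a minimal sketch of how the manual steps could be chained for a planned switch. It is not the author's script. It assumes that root SSH keys are exchanged, that the instance owner is prdadm, and that both nodes are reachable; run_on, failover and the DRY_RUN switch are hypothetical names. By default it only prints the commands.

```shell
#!/bin/bash
# Sketch of a semi-automatic planned failover from the old to the new primary.
# Assumptions: root SSH keys are exchanged, the instance owner is prdadm,
# and DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
SIDADM=${SIDADM:-prdadm}
INSTANCE=${INSTANCE:-00}

run_on() {
    # $1 = host, $2 = command to run there as the instance owner
    if [ "$DRY_RUN" = 1 ]; then
        echo "[$1] $2"
    else
        ssh "root@$1" "su - $SIDADM -c '$2'"
    fi
}

failover() {
    # $1 = old primary host, $2 = new primary host, $3 = site name of old primary
    local old=$1 new=$2 old_site=$3
    # 1. Stop the old primary first to avoid two active databases.
    run_on "$old" "HDB stop" || return 1
    # 2. Promote the secondary.
    run_on "$new" "hdbnsutil -sr_takeover" || return 1
    # 3. Register the old primary as the new secondary, then start it.
    run_on "$old" "hdbnsutil -sr_register --remoteHost=$new --remoteInstance=$INSTANCE --replicationMode=async --operationMode=logreplay --name=$old_site" || return 1
    run_on "$old" "HDB start"
}
```

Calling failover vhana01 vhana02 PRODSITE prints the four steps in order; setting DRY_RUN=0 would execute them over SSH. If the old primary cannot be reached, the first step fails and the sketch stops, so the operator has to confirm by other means that the old database is really down before promoting the secondary.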
If the logs fill up because the secondary has been down for a long time and you want to drop the replication:
Unregister the secondary system:
hdbnsutil -sr_unregister
On the primary system, disable system replication:
hdbnsutil -sr_disable