Enduro/X

SYNOPSIS
--------
*exsinglesv*
*exsingleckl*
*exsingleckr*

DESCRIPTION
-----------
*exsinglesv* is a special XATMI server that provides lock services for
singleton process groups. Exactly one copy of the server must be configured
for each singleton group.

The lock server uses file system-based fcntl() locking. When the process
acquires the lock, it reports the status to shared memory, so that *ndrxd(8)*
and *cpmsrv(8)* can perform the boot or failover sequences. At lock time, the
secondary "heartbeat" lock file is updated: the process takes the maximum
counter value found in the file and increments it by a value larger than *1*
(set by the *NDRX_SGLOCKINC* *ex_env(5)* parameter). This ensures that if,
due to some error, another node continues to perform heartbeats, it will
detect that it has lost the lock.
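
The lock-time counter update can be sketched in C as follows. This is a
minimal illustration under assumptions, not the *exsinglesv* source: the
heartbeat file is modeled as an in-memory array of per-node counters (the file
layout, checksums and I/O are omitted), and the function name
`lock_time_counter` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the counter value a node writes to the
 * heartbeat file when it acquires the lock. It is the largest counter
 * currently stored in the file plus the configured increment
 * (NDRX_SGLOCKINC), so the new owner ends up strictly ahead of every
 * counter already present; a node that lost the lock will then see a
 * higher counter than its own at its next check. */
static uint64_t lock_time_counter(const uint64_t *counters, size_t n,
                                  uint64_t lock_inc)
{
    uint64_t max_seen = 0;
    size_t i;

    /* the description requires an increment larger than 1 */
    assert(lock_inc > 1);

    for (i = 0; i < n; i++)
    {
        if (counters[i] > max_seen)
        {
            max_seen = counters[i];
        }
    }

    return max_seen + lock_inc;
}
```

For example, with counters 100, 250 and 90 in the file and an increment of
10, the new lock owner would write 260.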

The lock server periodically checks the lock health by performing the following
actions:

* Checks the primary lock file and regularly increments the counter stored in
the secondary heartbeat (ping) lock file. Each cluster node stores two
checksummed copies of the current counter for its own node id. The currently
active *exsinglesv* instance verifies that its counter is the highest of all
counters found in the file. If this condition is false, the group is unlocked
and the process exits with failure.

* Verifies the lock state against the process state and the shared memory
state.

If any check fails, the singleton process group in shared memory (if it was
locked) and the lock files are unlocked, and the process exits with failure.
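
The periodic "highest counter" check can be sketched as below. The record
layout (`hb_slot_t`), the placeholder `hb_checksum()` and all function names
are hypothetical; the real heartbeat file format and checksum algorithm are not
described on this page. In this sketch the newest copy that passes its
checksum is used, and a tie with another node counts as a failed check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node record: two copies of the counter, each with
 * its own checksum. */
typedef struct
{
    uint64_t counter[2];
    uint64_t checksum[2];
} hb_slot_t;

/* Placeholder checksum, an assumption (not the Enduro/X algorithm). */
static uint64_t hb_checksum(uint64_t v)
{
    return v ^ 0x5A5A5A5A5A5A5A5AULL;
}

/* Store both checksummed copies of a node's counter. */
static void hb_write_slot(hb_slot_t *s, uint64_t v)
{
    int i;
    for (i = 0; i < 2; i++)
    {
        s->counter[i] = v;
        s->checksum[i] = hb_checksum(v);
    }
}

/* Read the highest copy whose checksum matches; corrupt copies are
 * skipped. Returns false if neither copy is valid. */
static bool hb_read_slot(const hb_slot_t *s, uint64_t *out)
{
    bool found = false;
    int i;
    for (i = 0; i < 2; i++)
    {
        if (hb_checksum(s->counter[i]) == s->checksum[i] &&
            (!found || s->counter[i] > *out))
        {
            *out = s->counter[i];
            found = true;
        }
    }
    return found;
}

/* True if node `own` has a valid counter strictly above every other
 * valid counter in the file. On false, the caller would unlock the
 * group and exit with failure. */
static bool hb_we_are_highest(const hb_slot_t *slots, size_t n, size_t own)
{
    uint64_t mine, other;
    size_t i;

    if (!hb_read_slot(&slots[own], &mine))
    {
        return false; /* own record cannot be trusted */
    }
    for (i = 0; i < n; i++)
    {
        if (i != own && hb_read_slot(&slots[i], &other) && other >= mine)
        {
            return false;
        }
    }
    return true;
}
```

Keeping two checksummed copies presumably lets a reader fall back to the other
copy if a write of one of them was interrupted.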

The lock provider distinguishes two locking modes:

1. If the process is started by a normal boot (i.e. it is not respawned after
a crash), it locks the primary file and immediately reports the locked status
to shared memory.

2. If the process is respawned after a crash, it locks the primary file but
does not report the status to shared memory immediately. Instead, it waits for
a number of lock health checks (effectively a time delay), and only after that
does the group become locked. This gives the other (previously active)
Enduro/X node time to kill the processes of the group that lost the lock.
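
The difference between the two modes can be sketched as a small state machine
driven by the health checks. The `lp_state_t` type and the function names are
hypothetical; `locked_wait` refers to the configuration parameter of the same
name described in the CONFIGURATION section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state kept by the lock provider between health checks. */
typedef struct
{
    int  checks_left; /* checks remaining before the status is reported */
    bool reported;    /* locked status already published to shared memory */
} lp_state_t;

/* Called right after the primary lock file was locked. Returns true if
 * the group must be marked as locked in shared memory immediately
 * (normal boot), false if reporting is delayed (respawn after crash). */
static bool lp_on_locked(lp_state_t *st, bool respawned, int locked_wait)
{
    st->checks_left = (respawned && locked_wait > 0) ? locked_wait : 0;
    st->reported = (st->checks_left == 0);
    return st->reported;
}

/* Called on every successful health check. Returns true exactly once,
 * when the delay has elapsed and the group should now be reported. */
static bool lp_on_check(lp_state_t *st)
{
    if (st->reported)
    {
        return false; /* already published earlier */
    }
    st->checks_left--;
    if (st->checks_left == 0)
    {
        st->reported = true; /* delay elapsed: publish now */
        return true;
    }
    return false;
}
```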

The status of the process groups can be checked with the following *xadmin(8)*
command:

--------------------------------------------------------------------------------
$ xadmin psg
--------------------------------------------------------------------------------

*exsinglesv* has two user exits which may be set in the configuration file:
*exec_on_bootlocked* and *exec_on_locked*. The first program is executed if
the lock was acquired during the normal startup. The second
(*exec_on_locked*) is executed if a failover has happened (lock files that
were locked by another node became unlocked), or if *exsinglesv* crashed, was
restarted and acquired the lock. If the programs specified by these parameters
return a non-*0* status, *exsinglesv* unlocks any resources it has locked and
exits with failure.
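
A user exit is run through system(), as noted in the CONFIGURATION section, so
its result has to be decoded with the POSIX wait-status macros. The sketch
below shows one way to do that; `run_user_exit()` is a hypothetical name, and
treating termination by a signal as a failure is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run an optional user exit (exec_on_bootlocked / exec_on_locked) via
 * system(3). Returns true if no exit is configured or the program
 * terminated normally with status 0. */
static bool run_user_exit(const char *cmd)
{
    int rc;

    if (cmd == NULL || cmd[0] == '\0')
    {
        return true; /* parameter not set: nothing to run */
    }

    rc = system(cmd);
    if (rc == -1)
    {
        return false; /* the shell could not be started */
    }

    /* A non-zero exit code or death by a signal counts as failure. */
    return WIFEXITED(rc) && WEXITSTATUS(rc) == 0;
}
```

When the function returns false, the provider would unlock everything it has
locked and exit with failure.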

*exsingleckl* provides the cluster consistency check service used by
*tmsrv(8)* when the *sg_nodes_verify* attribute of the
*<procgroups>/<procgroup>* tag in *ndrxconfig.xml(5)* is set to *Y*. For load
balancing, the service can be started in multiple copies. Separate process
instances are required for each singleton group.

The advertised service name is "@SGLOC-<node_id>-<grpno>". When a request is
received, the service checks whether the process group is locked on the local
node, and if so, it calls the network services of the other nodes to verify
their group statuses (the group must not be locked on any of them). If the
network service of a node is not available, the check is done in read-only
mode against the heartbeat file (*lockfile_2*): the current group must have
the highest counter number in the whole cluster. If that condition is not met,
the group is unlocked and the service reports that the group is not locked.
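
The decision made by the "@SGLOC" service can be summarized with the following
C sketch. The status enumeration, the function `sg_verify()` and its
parameters are hypothetical; the actual network calls, the heartbeat file read
and the unlocking are left out, and the heartbeat result is passed in
precomputed for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical answer obtained from one other node of the group. */
typedef enum
{
    REM_UNLOCKED,   /* node reports the group as not locked */
    REM_LOCKED,     /* node reports the group as locked */
    REM_UNAVAILABLE /* node's network service could not be reached */
} rem_status_t;

/* local_locked: group lock flag from local shared memory.
 * remote:       answers from the other nodes.
 * hb_highest:   read-only heartbeat-file result (own counter is the
 *               highest in the cluster), relevant only when some node
 *               could not be reached.
 * Returns true if the group is confirmed as locked on this node. */
static bool sg_verify(bool local_locked, const rem_status_t *remote,
                      size_t n, bool hb_highest)
{
    size_t i;

    if (!local_locked)
    {
        return false;
    }
    for (i = 0; i < n; i++)
    {
        if (remote[i] == REM_LOCKED)
        {
            return false; /* another node also holds the group */
        }
        if (remote[i] == REM_UNAVAILABLE && !hb_highest)
        {
            return false; /* heartbeat fallback check failed */
        }
    }
    return true;
}
```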

*exsingleckr* provides remote check services for the locks. Requests are
received from the *exsinglesv* and *exsingleckl* servers via *tpbridge(8)*.
The remote check service reads the local shared memory singleton process group
table and returns the group status to the caller. The service
"@SGREM-<node_id>-<grpno>" is advertised, and several instances of the server
can be started. The service is optional; however, for performance reasons it
is recommended to configure *exsingleckr* regardless of the *sg_nodes_verify*
attribute value, as *exsinglesv* always attempts the network call first during
its periodic checks.
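
A hypothetical helper that builds service names following the
"@SGLOC-<node_id>-<grpno>" and "@SGREM-<node_id>-<grpno>" patterns is shown
below. It assumes that the node id and group number are formatted as plain
decimal numbers, which may differ from what the real servers do. With the
configuration from the EXAMPLE section (`sg_nodes="1,4"`, `grpno="5"`), node 1
would presumably query node 4 through "@SGREM-4-5".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<prefix>-<node_id>-<grpno>" into buf. Plain decimal formatting
 * is an assumption of this sketch. Returns 0 on success, -1 if the
 * buffer is too small. */
static int sg_svcname(char *buf, size_t bufsz, const char *prefix,
                      int node_id, int grpno)
{
    int n = snprintf(buf, bufsz, "%s-%d-%d", prefix, node_id, grpno);

    return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```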

The *exsinglesv*, *exsingleckl* and *exsingleckr* binaries use the same
configuration section. For these XATMI servers, the *<procgrp_lp>* tag must be
set to the name of the group for which they provide locking services.
Configuration sections of different groups may be separated by CCTAGs.

For a more detailed description of the singleton group setup, functions and
limitations, please see the "High availability features" section of
*ex_adminman(guides)* (Enduro/X Administration Manual).

The binaries are available starting from Enduro/X version *8.0.10*.

CONFIGURATION
-------------

The process requires the common configuration (ini files) to be set up for the
Enduro/X instance where *exsinglesv* is booted.

The process reads its settings from the "[*@exsinglesv*[/<NDRX_CCTAG>]]"
section.
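
The section name can be derived from the *NDRX_CCTAG* value as sketched below.
The helper `sg_section()` is hypothetical, and how the real configuration
loader combines a tagged section with the untagged one is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compose "@exsinglesv" (no CCTAG) or "@exsinglesv/<cctag>".
 * Returns 0 on success, -1 if the buffer is too small. */
static int sg_section(char *buf, size_t bufsz, const char *cctag)
{
    int n;

    if (cctag != NULL && cctag[0] != '\0')
    {
        n = snprintf(buf, bufsz, "@exsinglesv/%s", cctag);
    }
    else
    {
        n = snprintf(buf, bufsz, "@exsinglesv");
    }
    return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```

For example, the servers in the EXAMPLE section run with the CCTAG `GRPVCCT`
and therefore read the `[@exsinglesv/GRPVCCT]` section.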

*lockfile_1*='FILE_PATH'::
Primary lock file path on which the fcntl() advisory write lock is acquired.
The parameter is mandatory. If the file is missing, it will be created. For
distributed servers, the file must be located on a shared file system, such as
GPFS, GFS2 or any other shared file system that supports fcntl() locking.
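
A minimal sketch of acquiring such a lock with the POSIX fcntl() interface is
shown below. The helper name `try_lock_file()`, the file mode *0660* and the
choice of a non-blocking attempt (F_SETLK rather than F_SETLKW) are
assumptions of this sketch, not documented behavior of *exsinglesv*.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try to take an exclusive advisory lock on the whole file, creating
 * the file if it is missing. Non-blocking: if another process (possibly
 * on another node of the shared file system) holds the lock, returns -1
 * with errno set to EACCES or EAGAIN. On success returns the open
 * descriptor, which must stay open for as long as the lock is held. */
static int try_lock_file(const char *path)
{
    struct flock fl;
    int fd = open(path, O_RDWR | O_CREAT, 0660);

    if (fd == -1)
    {
        return -1;
    }

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means up to end of file: whole file */

    if (fcntl(fd, F_SETLK, &fl) == -1)
    {
        int err = errno;    /* keep the fcntl() error across close() */
        close(fd);
        errno = err;
        return -1;
    }
    return fd;
}
```

Note that fcntl() locks belong to the process: closing any descriptor that
refers to the same file releases them, so a lock holder should not open and
close the lock file through other descriptors.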

*lockfile_2*='FILE_PATH'::
Heartbeat lock file, to which the lock counter is written periodically (every
*chkinterval* seconds). Additionally, the file is used to check the lock
status of remote nodes in case the remote check service (@SGREM) is not
available, or the *noremote* parameter is set to *1*. The file system must be
writable by the process. If the file is missing, it will be created. The file
is expected to be located in the same directory as *lockfile_1*.

*exec_on_bootlocked*='FILE_PATH'::
Optional user exit, executed if the lock was acquired during the normal
startup. The program is executed by the system() call. If the program returns
a non-*0* status, *exsinglesv* unlocks the files and restarts.

*exec_on_locked*='FILE_PATH'::
Optional user exit, executed if the lock was acquired during a failover or
after a crash. The program is executed by the system() call. If the program
returns a non-*0* status, *exsinglesv* unlocks the files and restarts.

*chkinterval*='NUMBER_OF_SECONDS'::
Interval in seconds between lock checks. The parameter is optional. The
default value is the *NDRX_SGREFRESH* environment variable value divided by
*3*. The final value must be greater than *0*. Additionally, a ULOG warning is
generated if the check interval multiplied by *3* is longer than the time
specified in the *NDRX_SGREFRESH* setting. Note that the default
*NDRX_SGREFRESH* value is *30* seconds; in that case, if the parameter is not
set, the check interval defaults to *10* seconds.
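
The default calculation and the warning condition for *chkinterval* can be
written out as follows. The function `chk_interval()` is hypothetical, integer
division is assumed, and the warning is printed to stderr as a stand-in for the
ULOG message.

```c
#include <assert.h>
#include <stdio.h>

/* configured: chkinterval from the ini file, 0 if not set.
 * sg_refresh: NDRX_SGREFRESH value in seconds (30 by default).
 * Returns the effective interval, or -1 if it is not greater than 0. */
static int chk_interval(int configured, int sg_refresh)
{
    int interval = (configured != 0) ? configured : sg_refresh / 3;

    if (interval <= 0)
    {
        return -1; /* the final value must be greater than 0 */
    }

    if (interval * 3 > sg_refresh)
    {
        /* stand-in for the ULOG warning */
        fprintf(stderr, "WARNING: chkinterval %d * 3 > NDRX_SGREFRESH %d\n",
                interval, sg_refresh);
    }
    return interval;
}
```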

*locked_wait*='NUMBER_OF_CHECKS'::
The number of check cycles after which the locked status of the file is
reported to the shared memory of the Enduro/X application domain. This
parameter is used only if the *lockfile_1* lock was acquired while the binary
was already running (e.g. failover from another node to this one), or if
*exsinglesv* itself was restarted due to a crash (failed exit, etc.). At
normal application domain boot, this wait does not apply and the file lock is
reported to shared memory immediately. The parameter is optional; the default
value is calculated as the *NDRX_SGREFRESH* value divided by *chkinterval*,
multiplied by *2*. With all defaults, this value is *6*, which means that after
a failover, the singleton group depending on this lock provider will be booted
only after *60* seconds.
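
The default *locked_wait* and the resulting failover delay can be computed as
shown below (hypothetical function names, integer arithmetic assumed). The
delay is approximate, since the group boot itself also depends on the
*ndrxd(8)* sanity cycle.

```c
#include <assert.h>

/* Default locked_wait: NDRX_SGREFRESH / chkinterval * 2. */
static int locked_wait_default(int sg_refresh, int chkinterval)
{
    assert(chkinterval > 0); /* chkinterval is always greater than 0 */
    return sg_refresh / chkinterval * 2;
}

/* Approximate seconds before a failed-over group is reported as locked:
 * locked_wait check cycles, each chkinterval seconds apart. */
static int failover_delay(int locked_wait, int chkinterval)
{
    return locked_wait * chkinterval;
}
```

With the defaults (*NDRX_SGREFRESH* = 30, *chkinterval* = 10) this gives
6 checks and a delay of 60 seconds, matching the figures above.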

*noremote*='NO_REMOTE_SETTING'::
If set to *1*, *exsinglesv* and *exsingleckl* check the cluster lock status in
the heartbeat file instead of calling services on the remote machines. For file
systems such as GPFS, this might give a performance benefit (as local checks on
such a file system may be faster than a remote call). The need for this flag
should be evaluated by testing on the given system configuration. The default
value is *0*, meaning that the cluster check first attempts to call the remote
services and, if that fails, checks the heartbeat file.


LIMITATIONS
-----------

If *exsinglesv* is started manually with an *xadmin(8)* command (start, restart
or sreload) on an already booted application, the command may take longer to
return: depending on the current *ndrxd(8)* sanity cycle, the singleton process
group may be started first, and only then does xadmin return the results to the
command line. This is subject to change in future releases, where, after the
first boot, all start/stop operations on *exsinglesv* might be treated as
failover recovery. Currently, manual starts and stops are treated as fresh boot
operations (i.e. the group is locked immediately).


EXIT STATUS
-----------
*0*::
Success

*1*::
Failure


EXAMPLE
-------

This section demonstrates a simple configuration for one group. Note that the
configuration must match on all cluster nodes that serve the given singleton
group.

*ndrxconfig.xml* demonstrates the configuration for the group named "GRPV":

---------------------------------------------------------------------
<?xml version="1.0" ?>
<endurox>
    <procgroups>
        <procgroup grpno="5" name="GRPV" singleton="Y" sg_nodes="1,4" sg_nodes_verify="Y"/>
    </procgroups>
    <servers>

        <!-- lock provider for group 5 -->
        <server name="exsinglesv">
            <!-- only one lock provider for the group! -->
            <min>1</min>
            <max>1</max>
            <srvid>10</srvid>
            <sysopt>-e ${NDRX_ULOG}/exsinglesv.log -r</sysopt>
            <procgrp_lp>GRPV</procgrp_lp>
            <cctag>GRPVCCT</cctag>
        </server>

        <!-- support servers, local -->
        <server name="exsingleckl">
            <min>10</min>
            <max>10</max>
            <srvid>15</srvid>
            <sysopt>-e ${NDRX_ULOG}/exsingleckl.log -r</sysopt>
            <procgrp_lp>GRPV</procgrp_lp>
            <cctag>GRPVCCT</cctag>
        </server>

        <!-- support servers, remote -->
        <server name="exsingleckr">
            <min>10</min>
            <max>10</max>
            <srvid>30</srvid>
            <sysopt>-e ${NDRX_ULOG}/exsingleckr.log -r</sysopt>
            <procgrp_lp>GRPV</procgrp_lp>
            <cctag>GRPVCCT</cctag>
        </server>

        <!-- banksv1 is configured as singleton in the cluster -->
        <server name="banksv1">
            <min>1</min>
            <max>1</max>
            <srvid>120</srvid>
            <sysopt>-e ${NDRX_ULOG}/banksv1.log -r</sysopt>
            <procgrp>GRPV</procgrp>
        </server>

        ...

        <!-- for demo purposes, we show configuration for client daemon processes too -->
        <server name="cpmsrv">
            <min>1</min>
            <max>1</max>
            <srvid>9999</srvid>
            <sysopt>-e ${NDRX_ULOG}/cpmsrv.log -r -- -k3 -i1</sysopt>
        </server>

    </servers>
    <clients>
        <!-- bankcl is also singleton in the cluster -->
        <client cmdline="bankcl" procgrp="GRPV">
            <exec tag="BANK1" subsect="1" autostart="Y" log="${NDRX_ULOG}/bankcl-1.log"/>
        </client>
        ...
    </clients>
</endurox>
---------------------------------------------------------------------

*app.ini*

---------------------------------------------------------------------
...
[@exsinglesv/GRPVCCT]
lockfile_1=/path/to/shared/file/system/GRPV_lock_1
lockfile_2=/path/to/shared/file/system/GRPV_lock_2
...
---------------------------------------------------------------------

BUGS
----
Report bugs to support@mavimax.com

SEE ALSO
--------
*ex_env(5)* *ndrxconfig.xml(5)* *xadmin(8)* *ex_adminman(guides)*

COPYING
-------
(C) Mavimax, Ltd