Tuning Redundant Engine attributes
- Last Updated: Aug 14, 2025
Multiple variables (I/O points, number of objects, number of historized attributes, DIObject distribution) are involved in the detection and execution of a Redundant AppEngine Failover. The following tables describe key Engine attribute values that can be modified to ensure proper failover performance.
AppEngine Object Settings
|
Parameter |
Forced failover timeout |
|
Editor Tab |
Redundancy |
|
Attribute |
Redundancy.ForcedFailoverTimeout |
|
Description |
The maximum allowed time, in milliseconds, for a standby engine to become active after a forced failover has been initiated using the ForceFailoverCmd attribute. If the standby engine does not become active within this time period, the active role reverts to the original active engine. |
|
Default |
90,000 ms (90 seconds) |
|
Tuning |
30,000 ms (less than 3,000 I/O); 45,000 to 240,000 ms (from 3,000 I/O to 40,000 I/O); 300,000 ms (more than 40,000 I/O) |
|
Notes |
I/O values represent the load on the individual AppEngine, not the Galaxy size. If the setting is too small, forced failover will not succeed. If the setting is too large, failure will not be detected in a timely manner. Tuning values represent a range that can be adjusted as required. |
|
Parameter |
Maximum checkpoint deltas buffered |
|
Editor Tab |
Parameter not shown; edit the Attribute value if necessary |
|
Attribute |
Redundancy.CheckpointDeltasBufferedMax |
|
Description |
The maximum number of checkpoint deltas that can be buffered before a full checkpoint synchronization is performed. |
|
Default |
0 |
|
Tuning |
N/A |
|
Notes |
N/A |
|
Parameter |
Maximum alarm state changes buffered |
|
Editor Tab |
Parameter not shown; edit the Attribute value if necessary |
|
Attribute |
Redundancy.AlarmStateChangesBufferedMax |
|
Description |
The maximum number of alarm state changes that can be buffered before a full snapshot of the alarm state changes for the engine is performed. |
|
Default |
0 |
|
Tuning |
N/A |
|
Notes |
N/A |
|
Parameter |
Active engine heartbeat period |
|
Editor Tab |
Redundancy |
|
Attribute |
Redundancy.ActiveHeartbeatPeriod |
|
Description |
The time interval, in milliseconds, at which heartbeats are sent by the failover service on the active engine to the failover service on the standby engine via RMC. |
|
Default |
1,000 ms (1 second) |
|
Tuning |
May be increased to avoid false failovers. |
|
Notes |
N/A |
|
Parameter |
Standby engine heartbeat period |
|
Editor Tab |
Redundancy |
|
Attribute |
Redundancy.StandbyHeartbeatPeriod |
|
Description |
The time interval, in milliseconds, at which heartbeats are sent by the failover service on the standby engine to the failover service on the active engine via RMC. |
|
Default |
1,000 ms (1 second) |
|
Tuning |
May be increased to avoid false failovers. |
|
Notes |
N/A |
|
Parameter |
Maximum consecutive heartbeats missed from Active engine |
|
Editor Tab |
Redundancy |
|
Attribute |
Redundancy.ActiveHeartbeatsMissedConsecMax |
|
Description |
The maximum number of heartbeats from the active engine that can be missed before a bad connection is assumed by the standby engine via RMC. For example, if the maximum consecutive heartbeats missed from active engine is configured as 5, and the active engine heartbeat period is configured as 1000 milliseconds, then the standby engine will assume a bad connection from the active engine if no heartbeats are received within five seconds. |
|
Default |
5 |
|
Tuning |
5 (less than 3,000 I/O); 10 to 30 (from 3,000 I/O to 40,000 I/O); ~60 (more than 40,000 I/O) |
|
Notes |
I/O values represent the load on the individual AppEngine, not the Galaxy size. Setting this value too low produces false failovers. Setting this value too high results in slow detection of a required failover. |
|
Parameter |
Maximum consecutive heartbeats missed from Standby engine |
|
Editor Tab |
Redundancy |
|
Attribute |
Redundancy.StandbyHeartbeatsMissedConsecMax |
|
Description |
The maximum number of heartbeats from the standby engine that can be missed before a bad connection is assumed by the active engine. If a bad connection is detected, the active engine will switch to the "Active - Standby Not Available" state via RMC. For example, if the maximum consecutive heartbeats missed from the standby engine is configured as 5, and the standby engine heartbeat period is configured as 1,000 milliseconds, then the active engine assumes a bad connection from the standby engine if no heartbeats are received within five seconds. |
|
Default |
5 |
|
Tuning |
5 (less than 3,000 I/O); 10 to 30 (from 3,000 I/O to 40,000 I/O); ~60 (more than 40,000 I/O) |
|
Notes |
I/O values represent the load on the individual AppEngine, not the Galaxy size. Setting this value too low produces false failovers. Setting this value too high results in slow detection of a required failover. |
|
Parameter |
Maximum time to maintain good quality after failure |
|
Editor Tab |
Redundancy |
|
Attribute |
Redundancy.StandbyActivateTimeout |
|
Description |
The maximum time period, in milliseconds, after the active engine fails before subscribed references to it are set to "uncertain." |
|
Default |
15,000 ms (15 seconds) |
|
Tuning |
15,000 ms (less than 3,000 I/O); 120,000 ms (from 3,000 I/O to 40,000 I/O); 150,000 ms (more than 40,000 I/O) |
|
Notes |
I/O values represent the load on the individual AppEngine, not the Galaxy size. Assuming remote I/O, setting the value too low causes all I/O references to unsubscribe, then resubscribe on failover. The optimum setting ensures that remote I/O references are preserved for failover. This behavior also applies in the RDI Object context. |
|
Parameter |
Maximum time to discover partner |
|
Editor Tab |
Redundancy |
|
Attribute |
Redundancy.PartnerConnectTimeout |
|
Description |
The maximum time period, in milliseconds, allowed for the connection to the failover partner to be established before the failover partner state is set to "unknown." |
|
Default |
15,000 ms (15 seconds) |
|
Tuning |
N/A |
|
Notes |
N/A |
|
Parameter |
Restart engine when it fails |
|
Editor Tab |
Parameter not shown; can be viewed in the Attribute tab |
|
Attribute |
Engine.RestartOnFailure |
|
Description |
The AppEngine object automatically attempts to restart if a failure occurs. |
|
Default |
True |
|
Tuning |
N/A |
|
Notes |
This behavior cannot be changed, even if the attribute is set to false. |
|
Parameter |
Checkpoint period |
|
Editor Tab |
General |
|
Attribute |
Scheduler.CheckpointPeriod |
|
Description |
Checkpointing saves run-time attribute values. The checkpoint period is the interval, in milliseconds, at which checkpointing is performed. The default checkpoint period is 10,000 ms. If set to 0, the checkpoint period defaults to the scan period, but checkpointing may occur at a slower rate because it runs as fast as possible as a background task. The minimum checkpoint interval for retentive attributes is 10,000 ms. Retentive attributes are those configured as calculated retentive, or as object- or user-writeable. If the checkpoint period is set to less than 10,000 ms, retentive attributes are not saved at every checkpoint. For example, if the checkpoint period is set to 4,000 ms, retentive attribute values are saved only at every third checkpoint (4,000 x 3 = 12,000 ms). Retentive attributes retain the last value set during run time, and the run-time value is saved across redeployments. Non-retentive attributes revert to their configured values at redeployment. |
|
Default |
10,000 ms (10 seconds) |
|
Tuning |
10,000 ms (up to 3,000 I/O); 20,000 ms (up to 20,000 I/O); 60,000 ms (more than 20,000 I/O) |
|
Notes |
I/O values represent the load on the individual AppEngine, not the Galaxy size. Setting this value too low results in high resource usage. Setting this value too high means that if both partners fail, checkpointed data may not be current. |
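The retentive-attribute arithmetic in the Checkpoint period description can be sketched in a few lines of Python. This is an illustrative calculation only, not part of the product: the function name `retentive_save_interval` and the constant `RETENTIVE_MIN_INTERVAL_MS` are hypothetical, and the sketch assumes a retentive save happens on the first checkpoint at or after the 10,000 ms minimum, which is consistent with the 4,000 ms example in the table. A checkpoint period of 0 (scan-period checkpointing) is not modeled.

```python
import math

# Minimum interval between retentive-attribute saves, per the
# Checkpoint period description (hypothetical constant name).
RETENTIVE_MIN_INTERVAL_MS = 10_000


def retentive_save_interval(checkpoint_period_ms: int) -> tuple[int, int]:
    """Return (checkpoints per retentive save, effective save interval in ms).

    Assumption: retentive values are saved on the first checkpoint that
    falls at or after the 10,000 ms minimum interval.
    """
    if checkpoint_period_ms <= 0:
        raise ValueError("this sketch only models a positive checkpoint period")
    # Smallest whole number of checkpoints that spans at least 10,000 ms.
    checkpoints = math.ceil(RETENTIVE_MIN_INTERVAL_MS / checkpoint_period_ms)
    return checkpoints, checkpoints * checkpoint_period_ms


for period in (4_000, 10_000, 20_000):
    count, interval = retentive_save_interval(period)
    print(f"period={period} ms: retentive save every {count} checkpoint(s), {interval} ms")
```

With a 4,000 ms period the sketch reports every third checkpoint and a 12,000 ms effective interval, matching the example in the table. Periods of 10,000 ms or more save retentive values at every checkpoint.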
WinPlatform Object Settings
|
Parameter |
NMX heartbeat period |
|
Editor Tab |
General |
|
Attribute |
NetNMXHeartbeatPeriod |
|
Description |
The time interval, in milliseconds, at which heartbeats are sent to other platforms. Heartbeats will only be established between platforms if a publish/subscribe relationship exists between engines on the platforms. For example, if an engine on WinPlatformA is subscribed to data from an engine on WinPlatformB, then heartbeats will be sent between WinPlatformA and WinPlatformB. WinPlatformA will send heartbeats to WinPlatformB at the rate specified by the WinPlatformA NetNMXHeartbeatPeriod attribute. WinPlatformB will send heartbeats to WinPlatformA at the rate specified by the WinPlatformB NetNMXHeartbeatPeriod attribute. |
|
Default |
2,000 ms (2 seconds) |
|
Tuning |
Use the default value for a platform object with a low I/O count (up to 3,000). |
|
Notes |
I/O values represent the load on individual AppEngines, not the Galaxy size. |
|
Parameter |
Consecutive number of missed NMX heartbeats allowed |
|
Editor Tab |
General |
|
Attribute |
NetNMXHeartbeatsMissedConsecMax |
|
Description |
The maximum number of consecutive heartbeats that are allowed to be missed from a platform before a platform communication error is generated for that platform. For example, assume an engine on WinPlatformA is subscribed to data from an engine on WinPlatformB. If the NetNMXHeartbeatsMissedConsecMax attribute on WinPlatformB has a value of 5, then WinPlatformA will generate a platform communication error when it misses six consecutive heartbeats from WinPlatformB. If the NetNMXHeartbeatsMissedConsecMax attribute on WinPlatformA has a value of 2, then WinPlatformB will generate a platform communication error when it misses three consecutive heartbeats from WinPlatformA. |
|
Default |
3 |
|
Tuning |
Small configurations (up to 10,000 I/O per engine): 3; larger configurations (more than 10,000 I/O per engine): 6 |
|
Notes |
I/O values represent the load on individual AppEngines, not the Galaxy size. The missed consecutive heartbeats value determines how many missed heartbeats trigger the redundant engine to act. Setting the value smaller makes the engines more sensitive to network failure. Setting the value larger makes the engines more tolerant of high CPU loads that can cause missed heartbeats. Specifying a value of 0 is not recommended, as this may trigger false communication errors that can degrade system performance. |
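As a rough illustration of the NetNMXHeartbeatsMissedConsecMax example in the table, the following Python sketch computes how many consecutive missed heartbeats raise a platform communication error, and approximately how long that takes. The function and parameter names are hypothetical. The sketch assumes the error is raised on the first miss beyond the allowed maximum (as in the example, where a value of 5 produces an error on the sixth miss) and that the elapsed time is that count multiplied by the sending platform's NetNMXHeartbeatPeriod. Actual timing in a running system may differ.

```python
def nmx_error_threshold(sender_missed_consec_max: int,
                        sender_heartbeat_period_ms: int = 2_000) -> tuple[int, int]:
    """Return (misses that trigger the error, approximate elapsed time in ms).

    Both arguments describe the platform that SENDS the heartbeats, because
    the table's example uses the sender's attribute values (hypothetical model).
    """
    if sender_missed_consec_max < 0:
        raise ValueError("missed-heartbeat maximum cannot be negative")
    if sender_heartbeat_period_ms <= 0:
        raise ValueError("heartbeat period must be positive")
    # The allowed maximum can be missed without error; the next miss raises it.
    misses_to_error = sender_missed_consec_max + 1
    return misses_to_error, misses_to_error * sender_heartbeat_period_ms


# WinPlatformA watching WinPlatformB (B's maximum is 5), and the reverse (A's maximum is 2).
for watcher, sender, limit in (("WinPlatformA", "WinPlatformB", 5),
                               ("WinPlatformB", "WinPlatformA", 2)):
    misses, elapsed_ms = nmx_error_threshold(limit)
    print(f"{watcher} reports an error after {misses} missed heartbeats "
          f"from {sender} (about {elapsed_ms} ms at the default period)")
```

Keeping the sender's settings as the inputs mirrors the example: each platform's own attributes govern how its partner judges its heartbeats.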
The failover services on the two nodes communicate with each other over the RMC to determine the communication status between the nodes. The status is determined by monitoring the Heartbeat attributes.
Message Channel Heartbeat settings control the heartbeat intervals; that is, how often the redundant platforms send heartbeats through the RMC.
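The relationship between the heartbeat period and the missed-heartbeat limit can be sketched as a small calculation. The Python below is illustrative only: the function name `bad_connection_window_ms` and the `TUNING_BANDS` table are hypothetical, and the sketch assumes the detection window is simply the heartbeat period multiplied by the consecutive-miss limit, as in the five-second example in the AppEngine table. It applies that product to the tuning values listed for the missed-heartbeat attributes, using the default 1,000 ms heartbeat period.

```python
def bad_connection_window_ms(heartbeat_period_ms: int, missed_consec_max: int) -> int:
    """Approximate time without heartbeats before a bad connection is assumed."""
    if heartbeat_period_ms <= 0 or missed_consec_max <= 0:
        raise ValueError("period and missed-heartbeat limit must be positive")
    return heartbeat_period_ms * missed_consec_max


# Tuning values for the consecutive-missed-heartbeat attributes, copied from
# the AppEngine table as (low, high) pairs. The last band is given as "~60".
TUNING_BANDS = {
    "less than 3,000 I/O": (5, 5),
    "3,000 to 40,000 I/O": (10, 30),
    "more than 40,000 I/O": (60, 60),
}

DEFAULT_PERIOD_MS = 1_000  # default active/standby heartbeat period

for band, (low, high) in TUNING_BANDS.items():
    low_s = bad_connection_window_ms(DEFAULT_PERIOD_MS, low) / 1000
    high_s = bad_connection_window_ms(DEFAULT_PERIOD_MS, high) / 1000
    print(f"{band}: detection window {low_s:.0f} s to {high_s:.0f} s")
```

Because the window is a product, raising either the heartbeat period or the miss limit lengthens detection time. This is the trade-off between false failovers and slow detection described in the table notes.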