Inter-node communications
The following section considers platform communication when the platform is deployed across a widely distributed and/or intermittent network (SCADA). The brief summary below is included for context; it is not intended as a recommendation, but rather as a starting point for tuning communications to accommodate the needs, and mitigate the effects, of a SCADA system.
The information assumes multiple platforms are deployed on multiple nodes in a SCADA topology.
Communication summary
Communication between distributed platforms occurs at two levels: heartbeats and messages (data change requests and replies, subscriptions, status updates/replies, and so on). Messages are handled by Message Exchange (MX) services.
Application Server monitors heartbeats and messages (sends/receives) on a regular, configurable basis. Several attributes can be used to monitor and tune the system to avoid problems in a SCADA environment; for example, heartbeats missed because of an intermittent network may cause all subscriptions to be dropped and re-initiated, saturating the network and preventing successful reconnection with remote nodes.
The actual settings depend on the particular network environment.
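To make the heartbeat mechanism concrete, the following is a minimal sketch, assuming a simple monitor that tracks the time since the last heartbeat and declares the connection broken after a configurable number of missed periods. The class, method, and parameter names are hypothetical and do not correspond to any Application Server or NmxSvc API.

```python
# Illustrative sketch only: this is NOT the platform's API. It models how a
# node might track heartbeats from a remote NmxSvc and declare the
# connection broken, which is the behavior the attributes below tune.
import time


class HeartbeatMonitor:
    """Tracks heartbeats from one remote node (illustrative only)."""

    def __init__(self, heartbeat_period_ms=2000, max_missed=3):
        # Hypothetical parameters mirroring NetNMXHeartbeatPeriod and
        # NetNMXHeartbeatsMissedConsecMax from the table below.
        self.heartbeat_period_s = heartbeat_period_ms / 1000.0
        self.max_missed = max_missed
        self.last_heartbeat = time.monotonic()

    def record_heartbeat(self):
        # Called whenever a heartbeat arrives from the remote node.
        self.last_heartbeat = time.monotonic()

    def connection_broken(self):
        # Declared broken once the time since the last heartbeat exceeds
        # (max_missed + 1) heartbeat periods, matching the formula given
        # later in this section.
        elapsed = time.monotonic() - self.last_heartbeat
        return elapsed > (self.max_missed + 1) * self.heartbeat_period_s


monitor = HeartbeatMonitor()
monitor.record_heartbeat()
print(monitor.connection_broken())  # False right after a heartbeat
```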
Tune the following attributes when implementing Redundant Platforms/Engines within a SCADA environment:
| NmxSvc Attributes | Primitive | Default Value | Remarks |
|---|---|---|---|
| NMXMsgMxTimeout | WinPlatform | 30,000 ms | Can be set at config time, and at run time if the platform is Off Scan. Specifies how long an engine waits for a response from another engine before declaring a timeout. |
| NetNMXHeartbeatPeriod | WinPlatform | 2000 ms | Can be set at config time and run time. Specifies how frequently the NmxSvc sends heartbeats to the remote NmxSvc services connected to it. |
| NetNMXHeartbeatsMissedConsecMax | WinPlatform | 3 | Can be set at config time and run time. Specifies how many consecutive heartbeats can be missed before the remote NmxSvc declares the connection broken. |
| DataNotifyFailureConsecMax | Engine | 0 | Determines the number of consecutive Data Change Notification failures allowed before the subscription is torn down by the publisher engine. |
These attributes can be set to balance correct and timely error notification with stable system performance. For example, the default DataNotifyFailureConsecMax value of 0 means that the system begins tearing down (and rebuilding) subscriptions as soon as a single Data Change Notification failure occurs. Initiating this action floods the network with subscription messages, both when the subscriptions are torn down and when they are rebuilt.
This behavior may not be realistic in an environment in which certain connections are sporadically intermittent.
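As an illustration of that threshold behavior only (not platform code), the sketch below counts consecutive failed notifications and tears down the subscription once the configured allowance is exceeded. The class and attribute names are hypothetical, and the exact comparison the platform uses is an assumption here.

```python
# Illustrative sketch only: models the consecutive-failure allowance
# attributed to DataNotifyFailureConsecMax, not the platform's implementation.
class PublisherSubscription:
    """One subscription on the publisher engine (illustrative only)."""

    def __init__(self, max_consecutive_failures=0):
        # Mirrors DataNotifyFailureConsecMax; with the default of 0 the
        # first failed notification already exceeds the allowance.
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0
        self.subscribed = True

    def notify(self, delivered):
        """Record the outcome of one Data Change Notification."""
        if delivered:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures > self.max_consecutive_failures:
            # Tearing down (and later rebuilding) the subscription is what
            # generates the extra network traffic described above.
            self.subscribed = False


sub = PublisherSubscription(max_consecutive_failures=0)
sub.notify(delivered=False)
print(sub.subscribed)  # False: a single failure tears the subscription down
```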
Using NetNMXHeartbeatsMissedConsecMax and NetNMXHeartbeatPeriod together provides the total time elapsed since the last heartbeat before the connection is declared broken. The formula is:
(NetNMXHeartbeatsMissedConsecMax + 1) * NetNMXHeartbeatPeriod
Setting these values to smaller numbers detects broken connections faster, but may also report "false" broken connections when the NmxSvc does not get enough CPU time to process incoming messages.
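For example, with the default values from the table, a connection is declared broken 8,000 ms after the last received heartbeat. A minimal sketch of the calculation (the variable names are illustrative, not platform identifiers):

```python
# Worked example of the formula above, using the default attribute values.
heartbeats_missed_consec_max = 3   # NetNMXHeartbeatsMissedConsecMax
heartbeat_period_ms = 2000         # NetNMXHeartbeatPeriod

# Total time elapsed since the last heartbeat before the connection
# is declared broken.
broken_after_ms = (heartbeats_missed_consec_max + 1) * heartbeat_period_ms
print(broken_after_ms)  # 8000 ms with the defaults
```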
Note: These attributes do not directly affect failover. They specify when Message Exchange will declare communication errors.
Note that recovery time on a distributed network, or recovery from an external disaster, is longer on a redundant system.
Note: The redundant pair must be at the same physical location; they cannot be geographically separate.
Redundancy for Application Object Server engines may be applied as needed at remote sites. The primary and backup nodes must each include a dedicated NIC for the RMC channel and must be connected by a simple crossover cable. The only impact on network traffic is a small number of additional packets during deployment from the central GR node to both the primary and backup nodes.
Load balancing
Load balancing is relevant only in the central supervisory setting, because load balancing implies moving processing to another CPU at the same location, whereas SCADA systems have a physically distributed architecture. In a central location, use a cluster of Application Servers to distribute processing activities.