Node Apoptosis: How Shardeum Regulates Malfunctioning Nodes

Shardeum is at the forefront of blockchain innovation, striving to address the scalability trilemma through multiple innovations in its protocol, Shardus, while also focusing on other capabilities that are equally vital for achieving its goals. These efforts are designed to make decentralization both affordable and accessible to everyone. If you are new to our updates, we encourage you to explore the articles about the solutions our engineering team has developed, which have made Shardeum feature-complete. These articles provide a deeper understanding of the project and may deepen your engagement with blockchain technology. One of these capabilities is node apoptosis, also called node self-destruct, although no node actually self-destructs; rather, it voluntarily exits the active validator set.

So, what is node apoptosis, and what does an obscure term like apoptosis even mean? The term is derived from Greek and refers to programmed cell death in biological organisms, where cells intentionally die without causing harm to the organism. In the context of Shardeum, node apoptosis is the process in which an out-of-sync node voluntarily removes itself from the network to prevent further issues. Nodes on Shardeum can be viewed as cells in a larger system: just as cells undergo apoptosis to preserve the health of an organism, nodes can voluntarily “self-destruct” and remove themselves to maintain the efficiency of the network. This self-regulation keeps the network robust and secure, mirroring the natural processes that keep biological organisms healthy and functional.

Key Terminologies

To make this blog easier to follow, we would like to start by sharing the key terms used in the upcoming sections.

  • Cycle: A cycle is a period of approximately 60 seconds on Shardus/Shardeum during which a sequence of different operations occurs.
  • Cycle Marker: A cycle marker is a cryptographic hash that Shardus/Shardeum produces once per cycle and uses as a checkpoint. It serves to verify the integrity and consistency of the data up to that point, facilitating efficient data synchronization and validation across the network.
  • Lost Nodes: Active nodes that have become unresponsive or disconnected, or that have accumulated enough downtime, are flagged on Shardeum as lost nodes without triggering the formal apoptosis process. These nodes are no longer active participants in the network’s consensus or transaction processing.
  • resyncChain(): A function within a node’s software that attempts to resynchronize the node’s local copy of the ledger with the consensus version held by the rest of the network. It is used when discrepancies or data mismatches are detected.
  • repairChain(): A function that attempts to fix broken data links within the ledger history stored by a node. If successful, it restores continuity and integrity to the node’s ledger, allowing the node to remain part of the network’s ongoing consensus processes.
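
To make the idea of a cycle marker as a checkpoint more concrete, here is a minimal sketch. It is not the Shardus implementation: the record fields (`counter`, `previous`, `active`), the use of SHA-256 over canonical JSON, and the helper names `cycle_marker` and `chain_is_consistent` are all assumptions made for illustration.

```python
import hashlib
import json


def cycle_marker(record: dict) -> str:
    """Hash a cycle record into a hex checkpoint.

    Assumption: SHA-256 over canonical JSON stands in for whatever
    hashing scheme the real protocol uses.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def chain_is_consistent(records: list) -> bool:
    """Check that each record's `previous` field equals the marker of
    the record before it (hypothetical record layout)."""
    for prev, cur in zip(records, records[1:]):
        if cur["previous"] != cycle_marker(prev):
            return False
    return True


# Two hypothetical consecutive cycle records.
genesis = {"counter": 0, "previous": None, "active": 3}
next_record = {"counter": 1, "previous": cycle_marker(genesis), "active": 3}
```

Because each record in this sketch embeds the marker of the one before it, any change to an older record shows up as a mismatch further along the chain, which is the property that makes a marker useful as a checkpoint.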

6 Phases of Node Apoptosis on Shardeum

As stated, node apoptosis in the context of Shardeum refers to a mechanism where an individual node decides to exit the network under certain conditions, specifically when it is unable to sync effectively. This concept is crucial for maintaining network integrity and performance. Below, we will discuss the 6 major phases of node apoptosis, followed by potential future enhancements:

  1. Trigger for Apoptosis
  2. Apoptosis Message Broadcasting
  3. Security Considerations
  4. Additional Message Information
  5. Handling Edge Cases
  6. Self-Destruct Scenarios

Let’s now break down each of the listed technical aspects of node apoptosis to gain a more comprehensive understanding.

1. Trigger for Apoptosis

Nodes in Shardeum constantly synchronize data with one another. For example, standby nodes, once randomly selected to become active nodes, must first sync the most up-to-date state data. Syncing within the network is a prerequisite to having a fully operational node that can engage in consensus. However, a node triggers apoptosis when it detects that its data is persistently out of sync with the rest of the network and cannot be repaired effectively. There are also specific points in the node’s operational code where checks for data synchronization are performed. If the node is significantly out of sync at these points, it can decide to exit the network.
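
As a rough illustration of the trigger, the sketch below treats “persistently out of sync and not repairable” as a run of consecutive failed sync checks. The `SyncCheck` type, the `should_apoptose` helper, and the threshold of three are hypothetical; the article does not state how many failed checks the real node software tolerates.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SyncCheck:
    """Result of one sync checkpoint in the node's code (hypothetical)."""
    out_of_sync: bool
    repaired: bool


def should_apoptose(history: Sequence[SyncCheck],
                    max_consecutive_failures: int = 3) -> bool:
    """Return True once the node has been out of sync and unrepaired
    for `max_consecutive_failures` checks in a row (assumed threshold)."""
    streak = 0
    for check in history:
        if check.out_of_sync and not check.repaired:
            streak += 1
            if streak >= max_consecutive_failures:
                return True
        else:
            # In sync, or successfully repaired: reset the streak.
            streak = 0
    return False
```

A node that falls out of sync once and is then repaired keeps running in this sketch; only a sustained failure leads to an exit.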

2. Apoptosis Message Broadcasting

The node initiates apoptosis by creating and broadcasting an “apoptosis” message. This message includes the current cycle number to prevent replay attacks and is digitally signed by the node to prove its authenticity. Receiving nodes then handle and verify the message for security. Finally, the message is propagated across the network using a gossip protocol: nodes that receive the apoptosis message save it and disseminate it further during specific periods of a cycle called quarters (roughly 15-second intervals in a 60-second cycle). In more detail, the process is as follows:

  • Cycle Number: The “apoptosis” message includes a field named “when”, which represents the cycle number at the time of the message’s creation. This inclusion is essential for preventing replay attacks, as it contextualizes the message to a specific cycle, reducing the risk of its misuse from a different cycle.
  • Digital Signature: Each “apoptosis” message is signed digitally. The message contains a sign object which includes details about the owner of the signature and the signature itself (owner and sig). This digital signature proves the authenticity of the message and confirms that the node declaring its departure is the legitimate sender of the message.
  • Handling and Verification: When the message is received, there is a process in place to ensure that the sender ID matches the ID in the signed message. Additionally, the system checks whether the cycle number in the message falls within an acceptable range of the current cycle (not more than one cycle ahead or behind). These checks help prevent inappropriate or malicious use of old messages.
  • Gossip Protocol Use: The “apoptosis” message is propagated using a gossip protocol, ensuring that the information reaches all active nodes efficiently. The message is stored and then gossiped in the next available “quarter 1” cycle window. Propagating the apoptosis message in Quarter 1 (Q1) ensures efficient dissemination, reduces the risk of replay attacks, and allows immediate verification and action by nodes. This strategy maintains network integrity and operational continuity.
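
The store-then-gossip behaviour in the last bullet can be sketched as follows. This assumes a 60-second cycle split into four equal 15-second quarters; `quarter_of` and `ApoptosisGossipBuffer` are hypothetical names, and the real gossip layer is not shown.

```python
CYCLE_SECONDS = 60
QUARTER_SECONDS = CYCLE_SECONDS // 4  # 15 seconds per quarter


def quarter_of(seconds_into_cycle: float) -> int:
    """Return the quarter (1 to 4) for an offset within a cycle."""
    if not 0 <= seconds_into_cycle < CYCLE_SECONDS:
        raise ValueError("offset must be within one cycle")
    return int(seconds_into_cycle // QUARTER_SECONDS) + 1


class ApoptosisGossipBuffer:
    """Holds received apoptosis messages until the next Q1 window."""

    def __init__(self) -> None:
        self.pending = []

    def receive(self, message: dict) -> None:
        # Store now; gossip later.
        self.pending.append(message)

    def flush_if_q1(self, seconds_into_cycle: float) -> list:
        """Return the messages to gossip, but only during quarter 1."""
        if quarter_of(seconds_into_cycle) != 1:
            return []
        ready, self.pending = self.pending, []
        return ready
```

A message received 30 seconds into a cycle (quarter 3) stays in the buffer and is released the next time `flush_if_q1` is called with an offset inside quarter 1.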

These features of the “apoptosis” message – specifically the cycle number inclusion and digital signature – provide robust protections against threats like replay attacks. This method ensures that messages are not only tied to a specific time frame but also verified for authenticity, maintaining the integrity and trustworthiness of the network’s operations.
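
The checks described above can be sketched in a few lines. Two caveats: the field names `when`, `sign`, `owner` and `sig` come from the description above, but the `id` field and the helper names are assumptions, and HMAC-SHA256 with a per-node secret is used here only as a standard-library stand-in for the real digital signature scheme, which the article does not specify.

```python
import hashlib
import hmac
import json


def _payload(node_id: str, when: int) -> bytes:
    """Canonical bytes that get signed (assumed layout)."""
    return json.dumps({"id": node_id, "when": when}, sort_keys=True).encode()


def make_apoptosis_message(node_id: str, when: int, key: bytes) -> dict:
    """Build a signed message. HMAC is a stand-in, not a real signature."""
    sig = hmac.new(key, _payload(node_id, when), hashlib.sha256).hexdigest()
    return {"id": node_id, "when": when,
            "sign": {"owner": node_id, "sig": sig}}


def verify_apoptosis_message(msg: dict, sender_id: str,
                             current_cycle: int, keys: dict) -> bool:
    """Apply the three checks: sender match, cycle window, signature."""
    # 1. The sender must be the node named in the signed message.
    if msg["id"] != sender_id or msg["sign"]["owner"] != sender_id:
        return False
    # 2. The cycle number must be at most one cycle ahead or behind.
    if abs(msg["when"] - current_cycle) > 1:
        return False
    # 3. The signature must match the signed fields.
    key = keys.get(sender_id)
    if key is None:
        return False
    expected = hmac.new(key, _payload(msg["id"], msg["when"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["sign"]["sig"])
```

Replaying a message captured at cycle 100 during cycle 103 fails the second check. Editing the `when` field to make the message look fresh fails the third check, because the signature no longer matches.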

3. Security Considerations

On Shardeum, security is paramount, so it is designed into the system at every level. One potential vulnerability is an attacker deliberately causing nodes to fall out of sync through DDoS (Distributed Denial of Service) attacks, thereby triggering apoptosis. The attacker could also combine this with additional attacks in an attempt to gain more control over the network. To prevent this, nodes must have mechanisms to detect and mitigate DDoS attacks. Upon detecting a DDoS attack, a node logs a fatal error and reports it to a network monitoring server, alerting the relevant parties to the attack.

This notification ensures that the relevant parties are informed about serious issues immediately, allowing for prompt intervention to mitigate damage or rectify the problem. The system also facilitates rapid incident response and a thorough analysis of the underlying issues. This, in turn, informs the development of strategic responses designed to prevent similar incidents in the future.

4. Additional Message Information

Nodes undergoing apoptosis due to DDoS attacks include information about the attack in their apoptosis message. This serves as an alert to other nodes and operators about potential network-wide attacks.
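
A hedged sketch of how these pieces could fit together is shown below. The `attackInfo` field name, the shape of the report, and the `report_to_monitor` callback are all hypothetical; the article does not describe the actual message layout or the monitoring server’s interface.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("node")


def build_exit_notice(node_id: str, when: int,
                      attack_info: Optional[dict] = None) -> dict:
    """Build the unsigned body of an apoptosis message, attaching
    attack details only when there are any (assumed field name)."""
    notice = {"id": node_id, "when": when}
    if attack_info is not None:
        notice["attackInfo"] = dict(attack_info)
    return notice


def on_ddos_detected(node_id: str, when: int, attack_info: dict,
                     report_to_monitor: Callable[[dict], None]) -> dict:
    """Log a fatal error, alert the monitor, and return the notice."""
    logger.critical("fatal: suspected DDoS on %s: %s", node_id, attack_info)
    report_to_monitor({"node": node_id, "level": "fatal",
                       "attack": dict(attack_info)})
    return build_exit_notice(node_id, when, attack_info)
```

Passing the reporter in as a callback keeps the sketch independent of any particular monitoring transport.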

5. Handling Edge Cases

Firstly, in cases where both a lost node message and an apoptosis message are present for the same node, the apoptosis message takes precedence. Prioritizing apoptosis messages helps the network adapt quickly by acknowledging the permanent removal of a node, rather than waiting for a possibly recoverable node to come back online, which could delay network operations and reduce efficiency. Prioritizing apoptosis messages over lost node messages also gives the rest of the network a clear and unambiguous signal about the node’s status. Other nodes can then immediately take the necessary actions, such as reallocating workloads or gauging whether additional nodes are required, without the ambiguity that might come with lost node messages.
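
The precedence rule is simple enough to express directly. In this sketch the function name and the status strings are illustrative only:

```python
def resolve_node_status(node_id: str, lost_ids: set,
                        apoptosized_ids: set) -> str:
    """Decide a node's status for the cycle; apoptosis wins over lost."""
    if node_id in apoptosized_ids:
        return "apoptosized"
    if node_id in lost_ids:
        return "lost"
    return "active"
```

Checking the apoptosis set first is what gives the apoptosis message priority when a node appears in both sets.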

Secondly, when the network is operating at its minimum node capacity and a node undergoes apoptosis, the network halts new data transactions until the minimum node count is restored. For the same reason, nodes are not allowed to exit voluntarily while the network is at minimum capacity. Halting serves several functions. It prevents the remaining nodes from becoming overloaded with transactions they cannot process effectively, which could lead to performance degradation or errors. It preserves data consistency and accuracy, as a reduced number of nodes might not be able to uphold the network’s standard data validation and consensus mechanisms. Finally, it provides a buffer period for the network to recover by adding new nodes from the standby set, thus restoring the network to its full operational capacity.
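
One possible reading of the minimum-capacity rule is sketched below. The helper names are hypothetical, and standby selection is simplified to taking nodes from the front of a list, whereas Shardeum selects standby nodes randomly.

```python
def can_process_transactions(active_count: int, min_nodes: int) -> bool:
    """New transactions are accepted only at or above the minimum."""
    return active_count >= min_nodes


def may_exit_voluntarily(active_count: int, min_nodes: int) -> bool:
    """A voluntary exit must not take the network below the minimum."""
    return active_count > min_nodes


def restore_from_standby(active: list, standby: list, min_nodes: int):
    """Promote standby nodes until the minimum is met or none remain.

    Returns new (active, standby) lists without mutating the inputs.
    """
    active, standby = list(active), list(standby)
    while len(active) < min_nodes and standby:
        active.append(standby.pop(0))
    return active, standby
```

With a minimum of three nodes, two active nodes, and two on standby, one standby node is promoted and transaction processing can resume.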

6. Self-Destruct Scenarios

A key question is which sync scenarios constitute apoptosis, or trigger node self-destruct. Before answering, we must first understand that Shardeum has a repair process that resolves inconsistencies or discrepancies arising during transaction validation at the state level. This repair mechanism is crucial for maintaining the integrity of the ledger. It essentially functions as a form of self-healing for the network, allowing it to continue operating correctly despite errors or misalignments in the data propagated across the network. To return to the original question, the answer is the following: nodes engage in apoptosis if their data remains out of sync beyond a repairable extent.

The failure can occur during initial data synchronization or if the data repair system itself fails. Initial data synchronization begins when a standby node is selected to transition into an active role in consensus. Before participating, the node must first synchronize with the most current state of the network, functioning as a syncing node. If a node’s attempts to resynchronize its chain (using the two functions resyncChain and repairChain) fail, it triggers apoptosis. This can happen if the discrepancies in cycle markers suggest a break in the chain that predates the earliest available backup state, making restoration impossible.
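
The decision flow can be sketched as follows. The real resyncChain and repairChain functions are represented here by plain callables that return True on success, and the order in which they are tried is an assumption, since the article does not specify it. `is_repairable` and `sync_or_exit` are hypothetical helpers.

```python
from typing import Callable


def is_repairable(break_cycle: int, earliest_backup_cycle: int) -> bool:
    """A chain break can be repaired only if it does not predate the
    earliest available backup state."""
    return break_cycle >= earliest_backup_cycle


def sync_or_exit(resync_chain: Callable[[], bool],
                 repair_chain: Callable[[], bool],
                 apoptosize: Callable[[], None]) -> str:
    """Try resync, then repair; exit the network only if both fail."""
    if resync_chain():
        return "synced"
    if repair_chain():
        return "repaired"
    apoptosize()
    return "apoptosis"
```

For example, a break at cycle 40 with backups reaching back only to cycle 50 is not repairable, so a repair routine built on `is_repairable` would report failure and the node would proceed to apoptosis.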

Future Considerations and Improvements

Looking ahead, Shardeum is committed to continually refining its node apoptosis mechanism. Future enhancements could include more sophisticated sync detection technologies and the development of scalable solutions tailored for enterprise applications. These advancements will focus on optimizing long-term operational efficiency and broadening the applicability of our systems to meet diverse business needs.

Conclusion

In conclusion, node apoptosis in Shardeum is a critical self-regulatory mechanism that enhances network integrity by allowing nodes to exit voluntarily when they cannot maintain synchronization. This feature prevents errors and non-performant nodes from spreading through the network, and it supports overall network health through secure message broadcasting, robust security measures against potential attacks, and precise handling of operational scenarios. Shardeum’s approach ensures operational reliability and sustainability, embodying advanced design and proactive network management.

