Monday, March 22, 2010

Posted by Jitu at 1:00 AM Labels: Crash, Failure Detector 0 comments

Failure Detectors and it's properties

Failure detection is essential for many algorithms. And any failure detector is composed of two basic properties.

Completeness
Accuracy

There can be 2 types of completeness property

Strong completeness
weak completeness

Strong completeness

Every crashed node is eventually detected by all correct nodes.

Weak completeness

Every crashed node is eventually detected by some correct nodes.

Actually the weak and the strong completeness is equivalent. Because given a weak completeness we can achieve strong completeness. As some correct nodes has the information about all crashed nodes it can disseminate[using BEB, and as the some nodes are correct so all the correct nodes will get the information] the list of crashed nodes and every one will be aware of the crashed nodes.

Correctness is a liveness property. It can be proved as if some node do not detect the crashed node yet then there is possibility that it will be detected at some point is the future :)

Now let's talk about the accuracy property. It can be classified into 4 as,

Strong accuracy
Eventual Strong accuracy
weak accuracy
Eventual weak accuracy

Strong accuracy

No correct node is ever suspected.

This is really a strong property and this is safety property as well. Because if you detect a correct node as failed then the property is violated for any execution after that.

Eventual Strong accuracy

Eventually no correct node is suspected.

So now this property becomes liveness property because if a correct node is suspected now then there is still a possibility that in future execution no correct node will be detected as failed.

Weak accuracy

There exists a correct node which is never suspected by any node.

So this is a safety property because if none of the node is recognized as correct node then the property can not be satisfied in any future execution.

Eventual Weak accuracy

Eventually there exists a correct node which is never suspected by any node. And this is a liveness property.

So any failure detector is a combination of completeness property and any of the accuracy properties.

So here are the possible failure detectors(considering that strong and weak completeness is equivalent)

Accuracy/Completeness	Strong
Strong	Perfect Failure Detector
Eventually Strong	Eventual Perfect Failure Detector
Weak	Strong Failure Detector
Eventually Weak	Eventual Strong Failure Detector

References

Introduction to Reliable Distributed Programming by Rachid Guerraoui and Luís Rodrigues

Friday, February 12, 2010

Posted by Jitu at 5:58 PM Labels: Crash 1 comments

Node failure

In a distributed system a node can fail/crash in a lot of way and it is very important that we understand which failure model we are supporting while defining an algorithm. Because execution of an algorithm can change depending on the failure model.

Failure model

A node can fail in four way.

Crash Stop
Omissions
Crash Recovery
Byzantine / Arbitrary

The properties of each failure model is following,

Crash Stop

Node stop doing anything(sending/receiving/processing)
Once fail never recover

Omissions

It can be of 2 type as,

Sending omission - not sending data where node is suppose to send according to algorithm
Receiving omission - not receiving data when other node has sent message

Omission is actually a temporary state which eventually turns to a crash or a crash recovery state. Because it may recover from that state which will make it crash recovery and if it do not recovers then it becomes a crash stop.

Crash Recovery

A node might crash
It recovers after crashing and initiates a recovery process

Some crash recovery model also has stable storage. Nodes store state information in the stable storage and it can retrieves necessary information while it is in recovery state.

Byzantine / Arbitrary

A node may behave arbitrary (sending/receiving messages that are not specified by algorithm). This can be a malicious attack to the system.

Fault tolerance hierarchy

There is a hierarchy in the crash models. Like crash is a special type of omission, where the process never recovers. Omission is an intermediate state for crash recovery, where process runs it's recovery procedure and eventually recovers from the omission state.

And crash recovery can be a special case of byzantine failure since it allows any kind of deviation from the regular algorithm. So any kind of crash can be a special case of byzantine failure.

Figure: Fault tolerance hierarchy

References

Introduction to Reliable Distributed Programming by Rachid Guerraoui and Luís Rodrigues
Slides from the lectures of Professor Seif Haridi, SCS/ICT/KTH.

Ashraf on Distributed Computing

Monday, March 22, 2010

Failure Detectors and it's properties

Strong completeness

Weak completeness

Strong accuracy

Eventual Strong accuracy

Weak accuracy

Eventual Weak accuracy

Friday, February 12, 2010

Node failure

Failure model

Crash Stop

Omissions

Crash Recovery

Byzantine / Arbitrary

Fault tolerance hierarchy

Figure: Fault tolerance hierarchy

References

About Me

Labels

Search

Blog Archive

Ashraf on Distributed Computing

Monday, March 22, 2010

Failure Detectors and it's properties

Strong completeness

Weak completeness

Strong accuracy

Eventual Strong accuracy

Weak accuracy

Eventual Weak accuracy

Friday, February 12, 2010

Node failure

Failure model

Crash Stop

Omissions

Crash Recovery

Byzantine / Arbitrary

Fault tolerance hierarchy

Figure: Fault tolerance hierarchy

References

About Me

Labels

Search

Blog Archive

Subscribe To