Friday, February 19, 2010

Posted by Jitu at 2:27 PM Labels: broadcast 0 comments

Reliable Broadcast

Reliable broadcast is a necessary component for many distributed algorithm. For client server architecture we are lucky that there is TCP :) to support reliable message delivery, which handles packet loss, packet retransmission, duplicate messages etc. But unfortunately distributed architecture is much complex then that, where we have multiple recipients and our target is to scale as the recipients grow.

There are lots of ways to broadcast messages. Each provides some property and cost so that we can pick according to our need.

We can classify reliable broadcast in many categories(in general),

Best-effort broadcast
Reliable broadcast
Uniform reliable broadcast
Lazy reliable broadcast
Eager reliable broadcast
Causal reliable broadcast

Here are the properties, pros and cons of each type,

Best-effort broadcast (BEB)

Properties

BEB1: Validity	If a correct process pi broadcasts a message m, then pi eventually delivers m
BEB2: No duplication	No duplicate message is delivered
BEB3: No creation	No message delivered unless broadcast

Pros

Simpler implementation
Works if only sender is reliable

Cons

Sender centric algorithm, so it is not scalable
Does not work if sender crashes

Implementation

Send the message to all the nodes using perfect channel. Perfect channel takes care of the Best-effort-Validity.

Reliable broadcast (RB)

Property

RB1: Validity	BEB1
RB2: No duplication	BEB2
RB3: No creation	BEB3
RB4: Agreement	If a message m is delivered by some correct process p_i, then m is eventually delivered by every correct process p_j

Pros

Carries all pros from BEB
Provides atomicity(either all or none of the correct node delivers)

Cons

Carries all pros from BEB

Uniform Reliable broadcast (URB)

Property

URB1: Validity	BEB1
URB2: No duplication	BEB2
URB3: No creation	BEB3
URB4: Agreement	If a message m is delivered by any process p_i, then m is eventually delivered by every correct process p_j

Lazy Reliable Broadcast

Requires: Perfect failure detector

Property

The is a non zero probability if the sender and receiver is correct then eventually the receiver will receive the message.

Basic idea

If any node detects that the sender has crashed then RB deliver a copy of the message to all node.

Eager Reliable Broadcast

Property

Same as Lazy Reliable Broadcast. But it does not use any failure detector. It work is fail silent model

Basic idea

It does not wait for node to crash and does the RB deliver to all node.

Causal reliable broadcast

Till now we did not consider order of the messages. But if the broadcast messages are related and has a order such that m₁ precedes m₂ then,

Properties

if m1 and m2 is broadcasted from the same process then m1 broadcasted before m2
if m1 is delivered by process p then p broadcasted m2 after delivering m1
the precedence is transitive, so if m1 precedes m' and m' precedes m2 then m1 precedes m2

References

Introduction to Reliable Distributed Programming by Rachid Guerraoui and Luís Rodrigues
Slides from the lectures of Professor Seif Haridi, SCS/ICT/KTH.

Friday, February 12, 2010

Posted by Jitu at 5:58 PM Labels: Crash 1 comments

Node failure

In a distributed system a node can fail/crash in a lot of way and it is very important that we understand which failure model we are supporting while defining an algorithm. Because execution of an algorithm can change depending on the failure model.

Failure model

A node can fail in four way.

Crash Stop
Omissions
Crash Recovery
Byzantine / Arbitrary

The properties of each failure model is following,

Crash Stop

Node stop doing anything(sending/receiving/processing)
Once fail never recover

Omissions

It can be of 2 type as,

Sending omission - not sending data where node is suppose to send according to algorithm
Receiving omission - not receiving data when other node has sent message

Omission is actually a temporary state which eventually turns to a crash or a crash recovery state. Because it may recover from that state which will make it crash recovery and if it do not recovers then it becomes a crash stop.

Crash Recovery

A node might crash
It recovers after crashing and initiates a recovery process

Some crash recovery model also has stable storage. Nodes store state information in the stable storage and it can retrieves necessary information while it is in recovery state.

Byzantine / Arbitrary

A node may behave arbitrary (sending/receiving messages that are not specified by algorithm). This can be a malicious attack to the system.

Fault tolerance hierarchy

There is a hierarchy in the crash models. Like crash is a special type of omission, where the process never recovers. Omission is an intermediate state for crash recovery, where process runs it's recovery procedure and eventually recovers from the omission state.

And crash recovery can be a special case of byzantine failure since it allows any kind of deviation from the regular algorithm. So any kind of crash can be a special case of byzantine failure.

Figure: Fault tolerance hierarchy

References

Introduction to Reliable Distributed Programming by Rachid Guerraoui and Luís Rodrigues
Slides from the lectures of Professor Seif Haridi, SCS/ICT/KTH.

Friday, February 5, 2010

Posted by Jitu at 2:37 AM 0 comments

Distributed solution why and where

Distributed approach for solving problem is an emerging and hot topic now a days. It is really a powerful solution. But we should all know first what is the strength and weakness of this approach before we consider of choosing this.

Disadvantage

Well in short the disadvantage of this approach is that it is fairly complex and thus costly for development and maintenance. Skilled people are needed for this kind of solution development.

Then where should we use it

In general there are 2 kind of situation where distributed approach is proffered.

Some solutions are only possible by using distributed solution
Some solutions are inherently has characteristic of distributed solution

If you want to provide system which survives hardware failure, you need to provide replica which is only possible via distributed solution by having more than one node to support hardware failure. Now if resources are distributed then you have no option but to use distributed solution. Another reason to use distributed approach is to improve performance(by distributing independent tasks to more than one nodes).

Let us think about skype. Skype does not use traditional client server model rather it uses peer to peer model. This is why it can provide scalability without using costly centralized client server structure.

Ashraf on Distributed Computing

Friday, February 19, 2010

Reliable Broadcast

Best-effort broadcast (BEB)

Properties

Cons

Implementation

Reliable broadcast (RB)

Property

Cons

Uniform Reliable broadcast (URB)

Property

Lazy Reliable Broadcast

Property

Basic idea

Eager Reliable Broadcast

Property

Basic idea

Causal reliable broadcast

Properties

References

Friday, February 12, 2010

Node failure

Failure model

Crash Stop

Omissions

Crash Recovery

Byzantine / Arbitrary

Fault tolerance hierarchy

Figure: Fault tolerance hierarchy

References

Friday, February 5, 2010

Distributed solution why and where

Distributed approach for solving problem is an emerging and hot topic now a days. It is really a powerful solution. But we should all know first what is the strength and weakness of this approach before we consider of choosing this.

Disadvantage

Then where should we use it

About Me

Labels

Search

Blog Archive

Ashraf on Distributed Computing

Friday, February 19, 2010

Reliable Broadcast

Best-effort broadcast (BEB)

Properties

Cons

Implementation

Reliable broadcast (RB)

Property

Cons

Uniform Reliable broadcast (URB)

Property

Lazy Reliable Broadcast

Property

Basic idea

Eager Reliable Broadcast

Property

Basic idea

Causal reliable broadcast

Properties

References

Friday, February 12, 2010

Node failure

Failure model

Crash Stop

Omissions

Crash Recovery

Byzantine / Arbitrary

Fault tolerance hierarchy

Figure: Fault tolerance hierarchy

References

Friday, February 5, 2010

Distributed solution why and where

Distributed approach for solving problem is an emerging and hot topic now a days. It is really a powerful solution. But we should all know first what is the strength and weakness of this approach before we consider of choosing this.

Disadvantage

Then where should we use it

About Me

Labels

Search

Blog Archive

Subscribe To