Review

31 Dec 2023 . distributed basic .

Distributed System

In a distributed system, there is no shared memory but only message passing – it’s a fundamental difference from a shared memory non- distributed system.

Transparency

access transparency: local and remote resources are accessed using the same operations.
location transparency: remote resources are accessed using location independent names.
concurrency transparency: processes can access resources without interfering with each other.
failure transparency: failures are concealed for the users of a resource.
replication transparency: users of a resource access it as if it was not replicated.

Network

quality of service

reliability
security
performance

Internet, network

A network: a hardware and software data communication system that provides interconnection of computers and other devices.
The Internet*: Internet is a set of networks connected with routers. The largest internet that includes commercial, military, university, and other networks with different physical links and various protocols, including IP (Internet Protocol)

client-server, peer-to-peer

client-server: client is the active part.
peer-to-peer: all nodes are active and can be the initiator of operations.

synchronous, asynchronous

synchronous: upper bound of time to perform an operation is known.
asynchronous: time to perform an operation does not have an upper bound.
difference: possibility of perfect failure detectors.

UDP, TCP

UDP

small message that should be sent with little delay.

best effort delivery of messages to a process.

TCP

large messages or a sequence of messages.

full-duplex stream between two processes.
port numbers can be found in the TCP header.
window size a limiting factor for high capacity communication: over long fat communication links.

active/passive replicated server

active replicated server

Multiple servers actively process client requests simultaneously.
Each server maintains an identical copy of the state and is capable of processing client requests independently.
The servers continually communicate to ensure that their states remain consistent. Replicas are in a consistent state because they reliably receive all requests in a total order.

passive replicated server

Only one server (the primary or master) actively processes client requests, while the others (the backups or replicas) remain in a standby mode.
The primary server processes client requests and updates its state accordingly.
The state updates are then replicated to the backup servers to ensure that they have an up-to-date copy of the data. Use view-synchronous group communication so that all or none of the backup servers have received an update before a new primary is elected.
If the primary server fails, one of the backup servers is elected as the new primary, and it takes over processing client requests.

RPC(remote procedure call)

invocation semantic: at least once
An idempotent operation is an operation that can be performed repeatedly with the same effect as if it had been performed exactly once.

Java RMI(remote method invocation)

argument pass: remote objects as reference, all other as copies.

reason: the object contains a mutable state that should not be duplicated.
invocation semantic: at most once
level of transparency: location transparency

At-most-once or At-least-once

At-most-once

The request has been executed once.
For non-idempotent operations.
Implemented using a history or simply not resending requests.
If an operation fails, the request might have been executed at most once.

At-least-once

The request has been executed at least once.
For idempotent In re-sending.
No need for a history; simply resend requests until a reply is received.
If an operation fails, the request might have been executed at least once.

Erlang

process communicate: asynchronous message passing
provide location transparency: use a process identifier without having to know the address of the node where the process lives.
destination in send operation: process identifier or a local or remote registered name.
circular construction: processes can refer to each other in a circular way.

File system

Descriptors, table entries and i-nodes

i-node: Inode holds the metadata for this file and, traditionally, the pointers to the disk blocks where the file resides on the hard drive.

soft link, hard link

soft link: a path that is resolved to another path.
hard link: a mapping of a name to a file identifier.

NFS(Network File System)

client-side cache entry validation: if the validity was checked less than t seconds ago or if the server modification time is equal to the client modification time.
authentication control: NFS is a stateless server, so user’s identity and access right must be checked by the server on each request.
what position to read and write an opened file: each read and write operation holds the position.

AFS

client side caching implementation: server promise to notify the client if a file is modified by another client. Client receive a call-back promise. The file is valid if the promise exists and is not too old.
inconsistent view of a file: when callback message is lost, an old version of a file may be opened after it has been updated by another client.

URL, URN, URI

URL and URN are subsets of URI.
URL: location and access method of resource.
URN: assign a unique name for resource, regardless of location.

DNS

inconsistencies of the resolver cache: each entry has a time-to-live.
inconsistent cached entries removed: all entries have an expiration time set when cached.

Time and Clock

Christian’s algorithm

Berkeley algorithm

perform internal synchronization.

Leader Selection: One of the nodes in the network is designated as the leader or coordinator. This leader is responsible for coordinating the time synchronization process.
Time Measurement: The leader periodically sends a time request message to all other nodes in the network. Each node records its current local time and sends it back as a time response message to the leader.
Time Calculation: The leader collects the time response messages from all nodes. It then calculates the average time offset (difference) between its own clock and the clocks of all other nodes.
Time Adjustment: The leader sends time adjustment messages to all nodes, instructing them to adjust their local clocks based on the calculated average offset. This adjustment aims to bring the clocks into approximate alignment.
Clock Adjustment: Each node, upon receiving the time adjustment message, adjusts its local clock accordingly.

NTP(Network Time Protocol)

Client-Server Model: computers (clients) periodically request time information from NTP servers. The servers respond with highly accurate time data.
Clock Synchronization: The NTP client compares the time received from the server with its own local clock. It then applies a correction to adjust its clock to match the server’s clock more closely.
Primary servers are connected directly to a time source; secondary servers are synchronized with primary servers. The servers are connected in a logical hierarchy called a synchronization sub-net, whose levels are called strata. Primary servers occupy stratum 1: they are at the root. Stratum 2 servers are secondary servers that are synchronized directly with the primary servers; stratum 3 servers are synchronized with stratum 2 servers, and so on.

Lamport, Vector

only the vector clock gives a complete description of the happened before order.

happened-before

A happened before B ⇒ A must have occurred in real-time before B.
A occurred in real-time before B ⇒ A could have happened before B.
A caused B ⇒ A happened before B.

Lamport

if L(a) < L(b) then a could have caused b
if L(a) ≮ L(b) then a did not happened before b
when implementing mutual-exclusion: prevent to deadlock

Vector

problem: a dynamic set of processes.

Global State

A global state corresponds to the initial prefixes of the individual process histories.
A *cut **is a subset in the global history up to a specific event in each history.
consistent cut: if e is in the cut and f happened-before e then f is in the cut. A consistent cut ***corresponds to a **consistent global state.
A run ***is a total ordering of all events in a global history **consistent with each local history.
A linearization **or **consistent run **is a run that describes transitions between **consistent global states.
stable global state: if a system enters a state where the predicate holds true it will remain true in all future states.

example: deadlock.
unstable global state: the predicate could hold true in a state but then be false in future states.

global state not local node

Deadlock Detection: If we do not compare the global state with the local state of the nodes, we could determine a deadlock which isn’t really one (“phantom deadlock”).
Garbage Collection: In distributed garbage collection it could happen, that we remove an object whose reference is still on the way to the receiver.
Lost Token Detection: The liveness of ring-based networks mainly depends on the availability of the token. The token is passed further by each node in the ring. If we capture a global state while the token is on the way from one node to the other, we could mistakenly determine the loss of the token.

Chandy and Lamport algorithm

there is a linearization from the original state to the final state that passes through the snapshot.

itiation of Snapshot: The snapshot algorithm begins when a process, called the Initiator, decides to take a snapshot. The Initiator does not inform other processes about this decision.
Marker Messages: The Initiator initiates the process by sending a special marker message along all outgoing channels. When a process receives a marker message for the first time, it records the state of its local variables (this is the “local snapshot”).
Snapshot Collection: Once has received marker messages on all incoming channels, it has captured a consistent global snapshot.

Coordination

Broadcast: No one keeps track of who listens in a broadcast service.
Multicast: In a multicast service, the sender sends a message to a specific group; the system keeps track of who should receive it.

Bully algorithm

better turnaround time.

Leader Detection Trigger: The algorithm starts when a process detects that the current leader has failed or when a new process joins the network and needs to elect a leader.
Initiating an Election: When a process detects the absence of a leader, it initiates an election process by sending an election message to all processes with higher IDs.
Receiving OK Messages: If the initiating process receives an “OK” message from all processes with higher IDs, it becomes the new leader and sends a victory message to inform all processes of its leadership.

ring-based election

nodes can easily change priority in-between elections.

Initiation: The election process is initiated when a triggering event occurs, such as the failure of the current leader node or a request to elect a new leader.
Message Passing: The initiation node (initiator) sends an “election” message containing its own identifier to its successor node. The message includes the initiator’s identifier and a flag indicating that an election is in progress.
Processing at Nodes:
- When a node receives an “election” message, it compares the received identifier with its own.
- If the received identifier is greater, the node updates the message with its own identifier and continues to pass it along to the next node.
- If the received identifier is smaller, it means that the message has completed the full circle and reached the initiator, signifying the end of the election.
Election Completion: Once the initiator receives the message back with its own identifier, it knows that it has the highest identifier and declares itself the new leader.

reliable multicast

messages are guaranteed to be delivered to all correct processes.
Implement a reliable multicast using only basic multicast: by resending each received message to all other nodes.

FIFO

Messages are delivered to processes in the same order in which they were sent.
This means that if process A sends message X and then message Y to process B, process B will receive and process X before Y.

casual order

It ensures that messages are delivered in an order that respects the causal relationships between events in the system.
If event A causally precedes event B, then any message related to A must be delivered before any message related to B.
Causal ordering does not strictly imply FIFO.

total order

In total order, every process receives messages in the same order. This ensures that all processes agree on the exact sequence of message deliveries.

Uniform agreement

Uniform agreement: if any node, correct or incorrect, delivers a message, then all correct nodes will deliver the message.
Non-uniform agreement: if a correct node delivers a message, then all correct nodes will deliver the message.

Transaction

problem

dirty read: reading a value that has not been committed.
premature writes: writing a value that has not been committed.
lost update: both transactions read the same object and attempt to update the same data concurrently, and one of the updates is lost, resulting in an incorrect final state.
inconsistent retrieval: both transactions write to the same object and different transactions reads different value for the same object.

serially equivalent

The combined effect is the same as if the transactions had been performed one at a time in some order is a serially equivalent interleaving.

The use of serial equivalence as a criterion for correct concurrent execution prevents the occurrence of lost updates and inconsistent retrievals.
Two transactions are serially equivalent if, and only if, all pairs of conflicting operations of the two transactions are executed in the same order on all the objects they both access.

conflicting operations

write-read: conflict (dirty read)
read-write: conflict (unrepeatable read)
write-write: conflict (overwriting uncommitted data)

phantom deadlock

one that is detected but not stable.

optimistic concurrency control

hard to provide liveness.

Execution Phase: The transaction performs its operations on the data it has read. This includes any updates, deletions, or insertions.
Validation Phase: Once the transaction has completed its operations, it must validate that no other transaction has modified the data it read before it commits.
Conflict Resolution: If a conflict is detected during validation, the system must resolve it. This can be done by: Abort the current transaction and restart it, allowing it to re-read the data and try the operations again.
Commit Phase: If no conflicts are detected during validation, the transaction is ready to commit. It applies its changes to the database.

two-phase locking

Growing Phase: In this phase, a transaction can acquire locks but cannot release any. Once a transaction releases a lock, it cannot acquire any more locks.
Shrinking Phase: After a transaction releases its first lock, it enters the Shrinking Phase. In this phase, a transaction can release locks but cannot acquire any more.

two-phase commit

phase one (voting): ask participants to vote for commit or abort
- if voting for commit, one has to be able to commit even after a node crash
- if anyone aborts, all must abort
phase two (completion): inform all participants of the result

forward/backward validation

Forward validation: checks the transaction undergoing validation with other later(future) transactions, which are still active.
Backward validation: checks the transaction undergoing validation with other preceding(past) overlapping transactions – those that entered the validation phase before it.

Timestamp ordering

Each operation in a transaction is validated when it is carried out
The timestamp defines its position in the time sequence of transactions.
A transaction’s request to write an object is valid only if that object was last read and written by earlier transactions.
A transaction’s request to read an object is valid only if that object was last written by an earlier transaction.

acid

atomicity: either all or no operations in the transaction are performed
isolation: intermediate results must not be visible to other transactions
durability: the effects of the transaction will remain even if we have a server crash

Group Membership

view: the set of processes belonging to a group.
a process that enters the group will deliver: all messages starting from the view when it enters.
different from reliable multicast: a message is sent and delivered only by processes belonging to the same view.

DHT(Distributed Hash Table)

limit the number of hops to reach any key: at most log(N)
leaf set: shortcut the look-up when is close to the destination key.

leaf set includes nodes less than self and nodes greater than self.

Mutual exclusion

Ricart and Agrawalas altorithm

Requesting Access: When a process wants to access a critical section (a shared resource), it sends a “request” message to all other processes in the system.The “request” message includes the process’s unique identifier and a timestamp representing the order in which requests were made.
Receiving Requests:
- When a process receives a “request” message from another process, it compares the received request’s timestamp with its own.
- If the received request has a higher priority (earlier timestamp), the receiving process defers.
- If the received request has a lower priority (later timestamp), the receiving process continues processing and does not grant access to the critical section.

Central Solution

In a central solution, there is a single central entity that has the authority to make decisions and coordinate activities.