Service Discovery, Configuration
- Provides service discovery and a service registry to enable inter-service communication
- System-wide configurations can be quickly rolled out and kept in sync
Client-side Discovery
The client queries a service registry, then uses a load-balancing algorithm to select one of the available service instances and makes the request directly (see the sketch after the examples below).
Examples:
- Netflix Eureka (service registry) + Netflix Ribbon (IPC client that works with Eureka to load balance requests across the available service instances)
- Amazon Cloud Map
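A minimal sketch of the flow in Python, assuming a hypothetical lookup_instances() registry query (standing in for a Eureka/Consul/etcd client) and a random pick as the load-balancing step:

```python
import random
import urllib.request

def lookup_instances(service_name):
    """Stub for a registry query; in practice this would call Eureka, Consul, or etcd."""
    return [{"host": "10.0.0.5", "port": 8080},
            {"host": "10.0.0.6", "port": 8080}]

def call_service(service_name, path):
    instances = lookup_instances(service_name)   # 1. ask the registry for live instances
    instance = random.choice(instances)          # 2. client-side load balancing (random pick here)
    url = f"http://{instance['host']}:{instance['port']}{path}"
    with urllib.request.urlopen(url) as resp:    # 3. call the chosen instance directly
        return resp.read()
```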
Server-side Discovery
The client makes a request to a service via a load balancer. The load balancer queries the service registry and routes each request to an available service instance (a minimal contrast sketch follows the pros/cons below).
Examples:
- AWS Elastic Load Balancer (ELB)
- Kubernetes runs a proxy on each host in the cluster. The proxy plays the role of a server‑side discovery load balancer.
Pros:
- Details of discovery are abstracted away from the client; clients simply make requests to the load balancer.
- Some deployment environments provide this functionality for free.
Cons:
- If the load balancer is not provided by the deployment environment, it is yet another highly available system component to set up and maintain.
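For contrast, a client under server-side discovery only needs a stable virtual address; the hostname below is a made-up Kubernetes-style service DNS name, and kube-proxy or the ELB picks the instance:

```python
import urllib.request

def call_orders_service(path):
    # The client addresses a stable virtual hostname (hypothetical name);
    # the load balancer / kube-proxy resolves it to a healthy instance.
    with urllib.request.urlopen(f"http://orders.default.svc.cluster.local{path}") as resp:
        return resp.read()
```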
Service Registry
A database containing the network locations of service instances.
A service registry needs to be highly available and up to date.
Examples:
- etcd
- consul
- Apache Zookeeper
Kubernetes and AWS do not have an explicit service registry; it is a built-in part of the infrastructure.
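When you run your own registry, each instance typically registers itself on startup. A sketch using ZooKeeper via the kazoo client (the ensemble address and paths are assumptions); the ephemeral znode disappears automatically when the instance's session ends, which keeps the registry up to date:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")   # assumed local ensemble
zk.start()

# Register this instance as an ephemeral, sequential znode; ZooKeeper
# removes it automatically if the instance dies, keeping the registry fresh.
zk.ensure_path("/services/orders")
zk.create("/services/orders/instance-", b"10.0.0.5:8080",
          ephemeral=True, sequence=True)

# A client discovers instances by listing the children:
print(zk.get_children("/services/orders"))
```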
Tools
- Build your own:
- Consul: from HashiCorp; agents use a gossip protocol to exchange data between cluster nodes.
- etcd: a distributed, reliable key-value store for the most critical data of a distributed system.
- ZooKeeper: originally a Hadoop subproject, now a top-level Apache project.
- Built-in: Kubernetes, AWS
Consul vs Zookeeper/etcd
https://www.consul.io/intro/vs/zookeeper.html
Consul has native support for multiple datacenters.
All of these systems have roughly the same semantics when providing key/value storage: reads are strongly consistent and availability is sacrificed for consistency in the face of a network partition.
ZooKeeper et al. provide only a primitive K/V store and require that application developers build their own system to provide service discovery. Consul, by contrast, provides an opinionated framework for service discovery and eliminates the guesswork and development effort.
ZooKeeper
ZooKeeper is:
- an atomic messaging system that keeps all of the servers in sync.
- a distributed lock server.
- a fast, highly available, fault tolerant, distributed coordination service.
- a distributed, hierarchical file system that facilitates loose coupling between clients and provides an eventually consistent view of its znodes, which are like files and directories in a traditional file system.
It’s hard to use correctly.
One server acts as the leader while the rest are followers. When the ensemble starts, a leader is elected and all followers synchronize their state with the leader. All write requests are routed through the leader, and changes are broadcast to all followers; this broadcast is termed atomic broadcast.
It provides basic operations such as creating, deleting, and checking the existence of znodes. It provides an event-driven model in which clients can watch for changes to specific znodes, for example when a new child is added to an existing znode. ZooKeeper achieves high availability by running multiple ZooKeeper servers, called an ensemble, with each server holding an in-memory copy of the distributed file system to service client read requests.
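A sketch of that event-driven model using the kazoo client (ensemble address and paths are assumptions); the decorated callback fires with the current children and again whenever a child znode is added or removed:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")   # assumed local ensemble
zk.start()
zk.ensure_path("/app/config")

# Called immediately with the current children, then again on every change.
@zk.ChildrenWatch("/app/config")
def on_children_change(children):
    print("children of /app/config are now:", children)

zk.create("/app/config/feature-flag", b"on")   # adding a child triggers the watch
```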
Consensus Algorithms
- Raft
  - the newer, easier-to-understand algorithm
  - https://raft.github.io/
  - interactive tutorial: http://thesecretlivesofdata.com/raft/
- Paxos: the "classic" algorithm; used for lightweight transactions in Cassandra and by Chubby, Bigtable's lock manager.
- ZAB: ZooKeeper's atomic broadcast protocol
- blockchain consensus (e.g., proof of work)
Use cases
Coordination service and distributed configuration store:
- Leader election (primary/secondary selection)
- Distributed locking (see the sketch after this list)
- Task queue / producer-consumer queue
- Metadata store: as a storage service for some classes of data
- Name server
- System-wide configuration
  - fast to roll out (compared to a code push)
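A sketch of the distributed-locking use case with kazoo's Lock recipe (lock path, identifier, and ensemble address are assumptions); ZooKeeper ensures only one process holds the lock at a time:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")   # assumed local ensemble
zk.start()

lock = zk.Lock("/locks/reindex-job", "worker-1")   # identifier is informational
with lock:   # blocks until this process acquires the lock
    print("only one worker runs this critical section at a time")
```

kazoo's Election recipe covers the leader-election use case in the same style.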