How I built a custom dynamic ZooKeeper management and migration framework and an asynchronous job framework for Tableau Server 11

 

First of all, Tableau Server 11 is not out yet; Tableau Server 10 was just released yesterday.

Product Strategy Overview

Tableau Server is an enterprise data analytics and visualization platform, and our customers are business people. To make the product experience as smooth as possible, we believe we should enable business people to manage Tableau Server on their own, without any help from enterprise IT.

Therefore, one of the most critical goals for developing Tableau Server is to provide GUIs and built-in workflows that automate as many management processes as possible.

Project and Feature Requirements

The Tableau core server team is building a new Java-based Tableau Server architecture for Tableau Server 11 (the target release, which might change in the future) to replace the old JRuby one.

Notably, the old JRuby Tabadmin persists cluster-wide configuration and topology information in a local file (workgroup.yml). ZooKeeper (referred to as ZK below) is only used as a distributed lock and a temporary data store – it keeps volatile data that can be lost or wiped out without anyone caring. This characteristic also makes managing and migrating ZooKeeper very easy – you only need to clean up all the data in the existing ZooKeeper nodes, deploy the ZooKeeper bits to the new nodes, and bring the new cluster up.

Unlike the old world, the new Tabadmin will persist all essential configuration and topology data in ZK as a single source of truth, which makes managing, expanding, and migrating ZK much harder than before.

The Problem

The problem we are facing is how to maintain ZK's data and expand/shrink the ZK cluster, while making this process extremely reliable and easy for users.

Let's elaborate on the problem.

Say we start with a single Tableau Server node A, which has a single-node ZK cluster running on it. Then we want to expand the Tableau Server cluster from 1 node (A) to 3 nodes (A, B, C), along with the ZK cluster that runs on top of it. How do we achieve that?

(A) -> (A, B, C)

Non-Goal

For customers who already have a ZK cluster in house and want to reuse it, Tableau Server provides configuration options that enable them to do so. That is not part of this project and will not be discussed here. This project focuses on smoothing the experience of managing Tableau Server's built-in ZooKeeper cluster.

Naive Approach

The naive approach is to shut down (A), deploy the ZK bits on B and C, change their configuration files, and bring up (A, B, C). Well, this can lead to disaster. The reason is that ZK runs in quorum mode, meaning that if more than half of the nodes in a ZK cluster form a quorum, their decision becomes the final decision that all nodes must follow. In this case, if B and C come up first and form a quorum, their decision will be "there's no data in our cluster". When A joins them, B and C will force A to delete its data, causing data loss.

ZooKeeper’s built-in Approach

Actually, all developers using ZooKeeper face the same problem. Think about running a ZooKeeper ensemble in your data center. If a ZooKeeper node fails, the common practice is for an admin to bring it back manually or through scripts. If the machine dies, admins have to bring up a new machine, change its IP to match the failed one, and join it to the cluster. And machine failures are anything but rare.

So ZooKeeper 3.5 started to build a native rolling-reconfiguration approach to solve this common problem. The principle is to bring up one new node at a time, join it to the existing cluster, wait for them to sync up and become stable, and then bring up another new node.

(A) -> (A, B) -> (A, B, C)

The problem is that ZK was not originally developed to support this use case. The feature involves lots of refactoring, has produced lots of bugs, and is not stable enough for production.

Take a look at ZK's release page. 3.5.0 was released around 3 years ago. Apache then released 3.5.1 and 3.5.2, both alpha versions. Even they don't recommend using them in production.

 

 

Other Open-Source Approaches / Frameworks

The closest solution we found online is Netflix's Exhibitor library. Exhibitor is a Java supervisor system for ZooKeeper built and open-sourced by Netflix.

Though Exhibitor is an off-the-shelf ZooKeeper-managing library from Netflix, we found it does not fit our needs because it also takes the rolling-reconfiguration approach. What's worse, it's not even a native solution – it's only a wrapper. It's so prone to errors that even Netflix doesn't recommend relying on it.

As stated here, Netflix has warned users: "Experience has shown that rolling config changes can lead to runtime problems with the ZooKeeper ensemble."

Our conclusion is that Exhibitor is more of a monitoring tool with a good UI than a true automated ZK management framework.

Our Tableau Approach

We took a completely new approach that is different from any solution on the market.

The core idea behind the scenes is that Tableau Server will run two ZooKeeper clusters side by side to facilitate migration. If they overlap on a machine, they run on different ports. In addition, this migration framework relies on our async job framework built on top of ZooKeeper.
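To illustrate the side-by-side idea, here is a minimal sketch of a Controller holding one Curator client per ensemble during a migration. The host names and ports (2181 for the old ensemble, 12181 for the new one) are purely illustrative assumptions, not Tableau's actual defaults:

```java
// Sketch only: two Curator clients, one per ZooKeeper ensemble, used during
// a migration. Node A hosts an instance of both ensembles, so the new
// instance listens on a different client port.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SideBySideClients {

    static CuratorFramework connect(String connectString) {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                connectString, new ExponentialBackoffRetry(1000, 3));
        client.start();
        return client;
    }

    public static void main(String[] args) {
        // Old single-node ensemble (A1) and new ensemble (A2, B2, C2).
        CuratorFramework oldZk = connect("nodeA:2181");
        CuratorFramework newZk = connect("nodeA:12181,nodeB:12181,nodeC:12181");
        // ... migration steps (data copy, two-phase commit) use both clients ...
    }
}
```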

Our New Asynchronous Job Framework in Tableau Server 11

First things first: the new Async Job Framework I'm talking about is different from Async Background Jobs.

Async Background Jobs are taken care of by backgrounder processes in Tableau Server. Those jobs are all kinds of scheduled jobs that can be run in parallel by multiple backgrounders.

Async Jobs in the new Tableau Server are a series of configuration and initialization jobs that have to be executed in order, one at a time, to perform changes to the cluster's Configuration and Topology.

Each Async Job is created as a PERSISTENT_SEQUENTIAL node under the ZK node /asyncjobs, so the name of each async job node looks like asyncjob00000000001, asyncjob00000000002, and so on, increasing monotonically. How that works is discussed here.
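To make that concrete, here is a minimal sketch (with assumed class names and payload format, not the actual Tableau code) of creating such a job node with Apache Curator:

```java
// Sketch: enqueue an async job as a PERSISTENT_SEQUENTIAL znode under /asyncjobs.
// ZooKeeper appends a monotonically increasing sequence suffix to the node name.
import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.CreateMode;

public class AsyncJobQueue {
    private static final String JOBS_ROOT = "/asyncjobs";

    private final CuratorFramework client;

    public AsyncJobQueue(CuratorFramework client) {
        this.client = client;
    }

    /** Creates the job node and returns its full path, sequence suffix included. */
    public String enqueue(byte[] jobPayload) throws Exception {
        return client.create()
                .creatingParentsIfNeeded()
                .withMode(CreateMode.PERSISTENT_SEQUENTIAL)
                .forPath(JOBS_ROOT + "/asyncjob", jobPayload);
    }
}
```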
The other components involved are:

  • Configuration / Topology
    • Key-value pairs stored as string binaries in a ZK node
  • Agent
    • Runs on every machine. All Agents are the same.
    • Does NOT make any decisions, but executes whatever commands it receives, either from Configuration/Topology in ZooKeeper or from commands issued by users on the terminal
    • Controls the actual life cycle of ZK:
      • Starts/stops the local ZK instance on a given port
      • Creates/deletes/updates zoo.cfg and myid files to support cluster migration
      • Restarts itself when a new ZK connection string appears in Configuration
  • Controller
    • Runs on every machine. Only one Controller across the cluster is the Controller Leader (selected by Leader Election with Curator; see the sketch after this list).
    • Any Controller can create Async Jobs, but only the Controller Leader can execute them
    • The Controller Leader curates the whole ZK migration process
    • Reads/writes Topology (which machines run ZK) and Configuration (the ZK connection string) from/to both the old and new ZK clusters
    • Triggers a rollback or recovers from a broken state if anything goes wrong
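Since the post only says the Controller Leader is selected by leader election with Curator, here is a minimal sketch of what that could look like using Curator's LeaderLatch recipe; the latch path and class names are my own assumptions, not the actual Tableau implementation:

```java
// Sketch: Controller leader election via Curator's LeaderLatch recipe.
// Only the Controller that currently holds leadership executes async jobs.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.LeaderLatch;

public class ControllerLeaderElection implements AutoCloseable {
    private final LeaderLatch latch;

    public ControllerLeaderElection(CuratorFramework client, String controllerId) throws Exception {
        // "/leader/controller" is an assumed latch path for illustration.
        this.latch = new LeaderLatch(client, "/leader/controller", controllerId);
        this.latch.start();
    }

    /** True only on the one Controller that currently holds leadership. */
    public boolean isLeader() {
        return latch.hasLeadership();
    }

    @Override
    public void close() throws Exception {
        latch.close();
    }
}
```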

High Level Steps

The high-level steps for migrating from (A) to (A, B, C) are as follows:

  1. Take a global lock on ZK, so that no one can write/update important data during the migration

  2. Identify the machines the new cluster is targeted to sit on, and create a new Async Job in cluster (A1) – say AsyncJob(A, B, C) in our case. The async job is composed of several substeps:

  3. Substep 1. The Controller Leader adds the target cluster as a config key to Configuration in ZooKeeper, deploys the ZK bits to B and C, and creates data dirs and configurations for (A2, B2, C2)

  4. Substep 2. Bring up the new ZK cluster (A2, B2, C2) and migrate all data (all persistent data excluding async jobs; discussed in detail later in this post) from (A1) to (A2, B2, C2)

  5. Substep 3. Migrate all persistent data excluding async jobs via backup() and restore(), and do a live check by iterating the trees in both ZK clusters (a sketch of this tree copy follows the diagram below). At the end, create the same AsyncJob(A, B, C) in the new ZK cluster

  6. Substep 4. Begin a two-phase commit:

    • Ask all clients to try to connect to the new ZK cluster. If anyone is missing, roll back.

    • If all show up, update all clients' ZK connection strings. Clients have a built-in mechanism to watch their configuration files and will restart automatically when their ZK connection string is updated.

    • If any client doesn't show up in the new ZK after the switch, manual intervention is required (discussed later in the post).

  7. When the Controller Leader restarts and comes back up (it is itself a ZK client, so it also picks up the new connection string), it ensures all clients show up in the new ZK (A2, B2, C2).

    • If so, it proceeds to Substep 5 to clean up the old ZK cluster (A1) and marks the migration as succeeded.

    • If not, an error message is shown to users asking for manual intervention, and the old ZK cluster won't be deleted.

  8. Release the global lock so clients can resume operations

(A1) -> ((A1, A2), B2, C2)
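As a rough illustration of the data-migration idea in Substep 3 (not Tableau's actual backup()/restore() code), here is a sketch that walks the source tree and recreates every persistent znode in the destination cluster, skipping /asyncjobs and ZooKeeper's internal /zookeeper subtree:

```java
// Sketch: copy all persistent znodes from the old ensemble to the new one.
import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.data.Stat;

public class ZkTreeCopier {
    private final CuratorFramework source;
    private final CuratorFramework destination;

    public ZkTreeCopier(CuratorFramework source, CuratorFramework destination) {
        this.source = source;
        this.destination = destination;
    }

    public void copy(String path) throws Exception {
        // Async jobs are recreated in the new cluster rather than copied;
        // /zookeeper is ZooKeeper's own internal subtree.
        if (path.startsWith("/asyncjobs") || path.startsWith("/zookeeper")) {
            return;
        }
        Stat stat = source.checkExists().forPath(path);
        if (stat == null || stat.getEphemeralOwner() != 0) {
            return; // only persistent data is migrated
        }
        byte[] data = source.getData().forPath(path);
        if (destination.checkExists().forPath(path) == null) {
            destination.create().creatingParentsIfNeeded().forPath(path, data);
        } else {
            destination.setData().forPath(path, data);
        }
        for (String child : source.getChildren().forPath(path)) {
            copy(path.equals("/") ? "/" + child : path + "/" + child);
        }
    }
}
```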

More Details that are Important to Keep In Mind

  • The Controller Leader needs to lock() both ZK clusters to prevent data from being modified during the migration, and releases the locks after the migration finishes (see the lock sketch at the end of this section).

  • The # of ZK instances is required to be an odd number less than or equal to 7, because 1) due to ZK's quorum mechanism, an even number of ZK instances doesn't make sense, and 2) too many ZK instances will not improve the stability of Tableau Server and may cause trouble (think of fsync issues).

So the ZK migration scenarios are as follows (# of old ZK cluster -> # of new ZK cluster; the machines may overlap or be totally different ones):

1 -> 3
1 -> 5
3 -> 3
3 -> 5
5 -> 5
  • Why not migrate all async jobs?
    • Old jobs don't need to be there. As mentioned above, Async Jobs in the new Tableau Server are a series of configuration and initialization jobs that have to be executed in order, one at a time, to perform changes to the cluster's Configuration and Topology. Any time you migrate ZK, the Topology has changed, and you need to rerun all of those serial jobs anyway, so there's no need to migrate them.
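As a final illustration, here is a minimal sketch of what locking both ensembles during a migration could look like, using Curator's InterProcessMutex recipe. The lock path, timeout, and class names are my own assumptions, not the actual Tableau implementation:

```java
// Sketch: hold a migration lock in both the old and the new ensemble so that
// no client can change Configuration/Topology mid-migration.
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;

public class MigrationLock {
    private final InterProcessMutex oldClusterLock;
    private final InterProcessMutex newClusterLock;

    public MigrationLock(CuratorFramework oldZk, CuratorFramework newZk) {
        // "/locks/migration" is an assumed lock path for illustration.
        this.oldClusterLock = new InterProcessMutex(oldZk, "/locks/migration");
        this.newClusterLock = new InterProcessMutex(newZk, "/locks/migration");
    }

    /** Acquire the lock in both ensembles, or fail without holding either. */
    public void lockBoth() throws Exception {
        if (!oldClusterLock.acquire(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("Could not lock old ZK cluster");
        }
        if (!newClusterLock.acquire(30, TimeUnit.SECONDS)) {
            oldClusterLock.release();
            throw new IllegalStateException("Could not lock new ZK cluster");
        }
    }

    public void releaseBoth() throws Exception {
        newClusterLock.release();
        oldClusterLock.release();
    }
}
```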