Installing Tachyon (In-Memory-File-System) As A Cluster

This post shows how to set up and configure Tachyon as a cluster.

Quick Pointers:

We are using Tachyon version 0.6.4.

1 – Layout

We will have one master and 5 workers (the master also doubles as a worker).

2 – Process

  1. configure files on master
  2. push config files from master to all workers
  3. start Tachyon cluster from master

3 – Installation Pre-Requisites

3.1 –  Linux Machines

We have provisioned Linux machines on Amazon AWS. Here is our configuration:

  • OS : CentOS 6.4 (AMI image ???)
  • instance type : m3.large
  • CPU : 2 cores
  • Memory : 7G

We will refer to the machines as:

  • master / worker1
  • worker2
  • worker3
  • worker4
  • worker5

3.2 – Setting up Machines for Password-less SSH

Only do this if you don’t have an existing SSH setup between the machines. The master should be able to log in to all workers via SSH.

Step 1 : Log in to the master

Step 2 : Create SSH key

$ ssh-keygen

Step 3 : Set up password-less login

$ ssh-copy-id    worker2

Enter your credentials when prompted.

If you cannot log in with username/password credentials and must use a key instead (as is the case with fresh EC2 instances), you can use this command:

$ cat ~/.ssh/ | ssh -i your-key worker1 "mkdir -p ~/.ssh && cat >>  ~/.ssh/authorized_keys"
The command above assumes that you have temporarily copied your-key, the key you used to log in to the master instance, to the master's home directory. Remove this key after you are done.

Now repeat this for all the remaining workers:
$ ssh-copy-id  worker3
 ..... and so on .....

You may have to add the key manually for the 'master' host (the one you are working on). Then
$ ssh localhost
will also work.
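
With more than a handful of workers, the repetition can be scripted. Here is a minimal sketch, assuming the hostnames worker2 through worker5 resolve from the master. The function name copy_key_to_workers and the COPY_CMD override are made up for this illustration; setting COPY_CMD=echo previews the loop without connecting anywhere.

```shell
# Sketch: run ssh-copy-id once per worker host given on the command line.
# copy_key_to_workers and COPY_CMD are illustrative names, not part of any tool.
copy_key_to_workers() {
    local host
    for host in "$@"; do
        echo "==== $host ===="
        # COPY_CMD defaults to ssh-copy-id; COPY_CMD=echo gives a dry run
        ${COPY_CMD:-ssh-copy-id} "$host" || echo "key copy failed for $host" >&2
    done
}
```

Call it as `copy_key_to_workers worker2 worker3 worker4 worker5`. Each host still asks for its credentials once.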

3.3 – Testing SSH-password-less login

Step 1 :
Let’s create a $HOME/hosts file with the following contents, filling in the IP addresses of all hosts in the cluster:

master ip address
worker2 ip address
worker3 ip address
worker4 ip address
worker5 ip address

We have a utility script called ‘‘ .  It will execute commands on remote nodes.

Get the code and execute it

$  wget 
$  chmod 755 
$  ./   -h hosts  ls

This executes the `ls` command on all nodes listed in the hosts file. The following is sample output:

==== worker 1 ====
==== worker 2 ====
.... and so on

(you may have to add the host to /etc/hosts)
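
If the utility script is not at hand, the idea behind it is simple enough to sketch. This is a stand-in, not the script itself: run_on_hosts and the SSH_CMD override are hypothetical names, and the hosts file is assumed to hold one address per line.

```shell
# Sketch: run one command on every host listed in a file (one host per line).
# run_on_hosts and SSH_CMD are illustrative names; this is not the post's script.
run_on_hosts() {
    local hosts_file="$1" host
    shift
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue      # skip blank lines
        echo "==== $host ===="
        # -n stops ssh from reading stdin, which would swallow the rest of the hosts file
        ${SSH_CMD:-ssh -n} "$host" "$@"
    done < "$hosts_file"
}
```

For example, `run_on_hosts ~/hosts ls` lists the home directory on every node, and `SSH_CMD=echo run_on_hosts ~/hosts ls` only prints what would be run.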

Getting, Installing and Configuring Tachyon

From the master node:

Step 1 : Get Tachyon

$   wget 
$   tar xvf tachyon-0.6.4-bin.tar.gz 
$   mv tachyon-0.6.4/ tachyon 

So our tachyon installation directory is   $HOME/tachyon

Step 2 : Configure Tachyon

We will configure Tachyon on the master and push the changes out to all worker nodes.

Step 2A :

Use the config script to generate a config file.
Config file : $HOME/tachyon/conf/

$ ~/tachyon/bin/tachyon bootstrap-conf <tachyon_master_hostname> 

For this, use the master’s public IP address (not the internal IP address).

Here is my sample config file. Three things need to be set:

  1. JAVA_HOME  : make sure to set this on top of the script!
  2. TACHYON_MASTER_ADDRESS : set by bootstrap-conf
  3. TACHYON_WORKER_MEMORY_SIZE : set by bootstrap-conf

Look for the XXX tag in the config file.

Note: if you don’t have Java installed (like on fresh EC2 instances), then now is a good time to install it on all hosts.
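
A quick way to confirm that all three settings made it into the file is to grep for them. This is a rough sketch, assuming the settings appear as plain shell assignments (with or without a leading `export`). check_conf is a made-up helper name, and you pass it the path of your own generated config file.

```shell
# Sketch: report which of the three settings are present in a config file.
# check_conf is an illustrative name, not a Tachyon command.
check_conf() {
    local conf_file="$1" name missing=0
    for name in JAVA_HOME TACHYON_MASTER_ADDRESS TACHYON_WORKER_MEMORY_SIZE; do
        # match an uncommented assignment, optionally prefixed by "export"
        if grep -Eq "^[[:space:]]*(export[[:space:]]+)?${name}=" "$conf_file"; then
            echo "ok: $name"
        else
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns non-zero when anything is missing, so it can gate the later steps.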

Step 2B : Edit  $HOME/tachyon/conf/workers

This file lists the IP addresses of all worker nodes, one per line. In our case, this is the same file as $HOME/hosts.

$ cp    ~/hosts   ~/tachyon/conf/workers

Step  3 : Distribute Config files to all nodes

For this we are going to use a handy script.
Here is the script on GitHub.

Execute it like this:

$ wget
$ chmod 755
$  ./

This pushes the tachyon directory out to all nodes.
After this, all nodes will have Tachyon installed at $HOME/tachyon.
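
If you cannot fetch the script, a plain rsync loop does a similar job. This is a hedged stand-in, not the script mentioned above: push_dir and RSYNC_CMD are hypothetical names, rsync is assumed to be installed on every node, and the hosts file is assumed to hold one address per line.

```shell
# Sketch: copy a local directory to the same-named place on every host.
# push_dir and RSYNC_CMD are illustrative names; this is not the post's script.
push_dir() {
    local hosts_file="$1" src="$2" dest="$3" host
    while read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue      # skip blank lines
        echo "==== $host ===="
        # trailing slashes: copy the contents of src into dest;
        # stdin is redirected so nothing can consume the hosts file
        ${RSYNC_CMD:-rsync -az} "$src/" "$host:$dest/" < /dev/null
    done < "$hosts_file"
}
```

For this post's layout the call would be `push_dir ~/hosts ~/tachyon tachyon`; a relative destination is resolved against the remote user's home directory.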

Step 4 : Let’s verify that the Tachyon files are installed on all nodes

$ ./   -h hosts   ls

You should see the tachyon directory listed on all nodes.

Now we are all set to run Tachyon!

Running Tachyon

Step 1 : Format the Tachyon file system

This is a destructive command: you will lose all files stored in the Tachyon file system. Beware!

$ ~/tachyon/bin/tachyon format

Your output may look like this:

Formatting Tachyon Worker @ worker1
Formatting Tachyon Worker @ worker2

Step 2 : Manual fix for a format bug

The format command does not create the data directory inside the storage directory, so we will create it manually:

$ ~/  -h hosts "mkdir -p $HOME/tachyon/underfs/tmp/tachyon/data"

Step 3 : Let’s start Tachyon!

$  ~/tachyon/bin/ all SudoMount

If everything went OK, we will see RAMDISK mounts on all nodes.

$ ~/ -h hosts  "(mount | grep ramdisk)"

We should see output like this…

====== worker1 ======
ramfs on /mnt/ramdisk type ramfs (rw,size=5009266kb)
====== worker2 ======
ramfs on /mnt/ramdisk type ramfs (rw,size=5009266kb)

Step 4 : Check UI

The Tachyon master UI is available on port 19999 of the master.

So go to :   http://master-host-ip-address:19999

in your browser. You will see something like below.



Yay! We have Tachyon up and running.
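
The same check can be done from a terminal, which is handy when no browser is around. A small sketch, assuming curl is installed. ui_url and check_ui are made-up helper names, and any HTTP response below 400 is treated as "up".

```shell
# Sketch: probe the Tachyon master web UI from the command line.
# ui_url and check_ui are illustrative names, not Tachyon commands.
ui_url() {
    # the port defaults to 19999, the master UI port used in this post
    echo "http://$1:${2:-19999}"
}

check_ui() {
    # -f makes curl exit non-zero on HTTP errors; --max-time caps the wait at 5 seconds
    if curl -s -f -o /dev/null --max-time 5 "$(ui_url "$@")"; then
        echo "UI reachable at $(ui_url "$@")"
    else
        echo "UI not reachable at $(ui_url "$@")" >&2
        return 1
    fi
}
```

Run it as `check_ui master-host-ip-address`, substituting the master's address.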


Testing / Kicking The Tires

Step 1: Let’s copy some files into Tachyon.

For this, we will use the $HOME/tachyon/bin/tachyon command-line client.

$ ~/tachyon/bin/tachyon tfs copyFromLocal  ~/hosts   /hosts

Inspect the file using the File Browser in the Tachyon UI.


Step 2: Creating some test data

We will create some reasonably large data files to copy into Tachyon.

# the following will create a 1G file
$  dd if=/dev/zero of=1G bs=1M count=1000
# 2G file
$  dd if=/dev/zero of=2G bs=1M count=2000
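
For reference, dd writes bs × count bytes, and GNU dd reads `1M` as 1024 × 1024 bytes, so the "1G" file is really 1000 MiB. A tiny sketch of the arithmetic; dd_bytes is a made-up helper name.

```shell
# Sketch: bytes written by dd when bs is given in MiB (bs=<n>M) and count blocks are copied.
# dd_bytes is an illustrative name, not a real utility.
dd_bytes() {
    local bs_mib="$1" count="$2"
    echo $(( bs_mib * 1024 * 1024 * count ))
}

dd_bytes 1 1000   # the "1G" file: prints 1048576000
dd_bytes 1 2000   # the "2G" file: prints 2097152000
```

This is worth keeping in mind when comparing file sizes against the memory each worker offers.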

Copy the files to Tachyon:

$ ~/tachyon/bin/tachyon tfs copyFromLocal  1G    /1a
# copy again
$ ~/tachyon/bin/tachyon tfs copyFromLocal  1G    /1b

Check File Browser in Tachyon UI.


As you can see, the file is only copied to ONE Tachyon node. The ‘copyFromLocal’ command only copies to the local node.

Let’s test this by logging in to another worker node and copying a file:

$  ssh worker2
# create a 1G file
$ dd if=/dev/zero of=1G bs=1M count=1000
$ ~/tachyon/bin/tachyon tfs copyFromLocal  1G    /2a

Now check the UI again. As you can see, the files are stored on their local nodes only.
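
The placement can also be checked from the command line instead of the UI. This is a sketch under an assumption: that the tfs shell in your Tachyon version offers the `ls` and `location` subcommands. Neither is used elsewhere in this post, so check your installation's usage output first. show_file and TACHYON_BIN are made-up names.

```shell
# Sketch: list the Tachyon root directory, then ask which hosts hold a file's data.
# Assumes "tfs ls" and "tfs location" exist in your Tachyon version.
# show_file and TACHYON_BIN are illustrative names.
show_file() {
    local path="$1"
    local tachyon="${TACHYON_BIN:-$HOME/tachyon/bin/tachyon}"
    "$tachyon" tfs ls /
    "$tachyon" tfs location "$path"
}
```

For the file copied from worker2 above, the call would be `show_file /2a`.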







In this post, we have shown you how to install Tachyon as a cluster and use it.



Written by:

Tim Fox

Tim Fox is an AI and Data Engineering consultant focused on engineering solutions in Artificial Intelligence, Machine Learning, Big Data Architecture, Data Science, and Analytics.
