Introduction
Currently, my technology stack is using Python to blend and clean data before pushing the data up to Tableau Online (SaaS version of Tableau Server). Most of the dashboards need to be refreshed daily, some hourly. Today we’re going to walk through how to automate the data refresh with Tableau Server Client & Pandelau. Let’s jump in!
Getting Started
- You will need typical packages setup already (python, pandas, numpy, etc), if not, easiest way is to get Anaconda.
- Install Tableau’s Extract API 2.0, instructions on Tableau.
- Download the .zip file
- Unzip,
cd
to directory and (*nix) thensudo python setup.py install
- Note: we are only going to be working with .hyper extracts here.
- Install Tableau Server Client
pip install tableauserverclient
- Install pandelau
pip install pandleau --no-deps
- Had issues with one of the dep packages, so I ran it with –no-deps to bypass it.
Let’s start automating.
Import what we need:
import pandas as pd
import tableauserverclient as TSC
from pandleau import *
First, load in your data, then change the dataframe to a pandleau object, and then write out a .hyper extract. If you have spatial data you’ll have to specify which column, see the README.md on how to do that.
df = pd.read_csv('yourdata.csv')
df = pandleau(df)
df.to_tableau('/mydir/mydata.hyper', add_index=False)
Authenticate to your Tableau Server.
# Don't recommend saving your credentials in the script, store elsewhere
tableau_auth = TSC.TableauAuth('eric.kahei.chan@gmail.com',
'password123', site_id = 'mysite')
server = TSC.Server('https://us-east-1.online.tableau.com/',
use_server_version=True)
server.auth.sign_in(tableau_auth)
Find your project’s id, we use TSC.Pager
here because the .get() method will only returns the top 100 records.
projectname = 'Business Intelligence'
print([(project.name, project.id) for project in TSC.Pager(server.projects) if project.name.startswith(projectname)])
Now use the project id we just identified and specify a datasource name and push the .hyper file we created up to Tableau Server.
#If Datasource with the same name exist it will overwrite.
# Put the projectid you just found here, and name the datasource you want to publish
mydatasourceitem = TSC.DatasourceItem('myprojectid_12345',
name='myfirstdatasource')
item = server.datasources.publish(mydatasourceitem,
'/mydir/mydata.hyper', 'Overwrite')
print("{} published with id: {}".format(item.name, item.id))
server.auth.sign_out()
Wrapping up
Automation - Setup a cron job (crontab -e
) and schedule the script to create the .hyper and publish it up to Tableau Server.
You can put everything in a function, something like this:
import pandas as pd
import tableauserverclient as TSC
from pandleau import *
def publishtotableau(df, folder_path, projectid, datasource_name, auth_list, site='yoursite'):
"""
Login to Tableau Online and publish a pandas dataframe
Assumes the following pages are imported:
- tableauserverclient as TSC
- pandleau import *
- pandas as pd
Args:
df: dataframe to publish
folder_path: folder to store temp.hyper file generated
projectid: Tableau Server Project ID
datasource_name: Name of the datasource to publish
auth_list: List-like with username on index 0, password on index 1
site: Tableau server site
Returns:
None
"""
pandleau(df).to_tableau(folder_path+'temp.hyper', add_index=False)
tableau_auth = TSC.TableauAuth(auth_list[0], auth_list[1], site_id = site)
server = TSC.Server('https://us-east-1.online.tableau.com/', use_server_version=True)
with server.auth.sign_in(tableau_auth):
mydatasourceitem = TSC.DatasourceItem(projectid, name=datasource_name)
item = server.datasources.publish(mydatasourceitem,folder_path+'temp.hyper', 'Overwrite')
print("{} successfully published with id: {}".format(item.name, item.id))