Pushing Python's pandas DataFrames to Tableau Server/Online


Currently, my technology stack is using Python to blend and clean data before pushing the data up to Tableau Online (SaaS version of Tableau Server). Most of the dashboards need to be refreshed daily, some hourly. Today we’re going to walk through how to automate the data refresh with Tableau Server Client & Pandelau. Let’s jump in!

Getting Started

  • You will need typical packages setup already (python, pandas, numpy, etc), if not, easiest way is to get Anaconda.
  • Install Tableau’s Extract API 2.0, instructions on Tableau.
    • Download the .zip file
    • Unzip, cd to directory and (*nix) then sudo python setup.py install
    • Note: we are only going to be working with .hyper extracts here.
  • Install Tableau Server Client
    • pip install tableauserverclient
  • Install pandelau
    • pip install pandleau --no-deps
    • Had issues with one of the dep packages, so I ran it with –no-deps to bypass it.

Let’s start automating.

Import what we need:

import pandas as pd
import tableauserverclient as TSC
from pandleau import *

First, load in your data, then change the dataframe to a pandleau object, and then write out a .hyper extract. If you have spatial data you’ll have to specify which column, see the README.md on how to do that.

df = pd.read_csv('yourdata.csv')
df = pandleau(df)
df.to_tableau('/mydir/mydata.hyper', add_index=False)

Authenticate to your Tableau Server.

# Don't recommend saving your credentials in the script, store elsewhere
tableau_auth = TSC.TableauAuth('eric.kahei.chan@gmail.com',
                               'password123', site_id = 'mysite')
server = TSC.Server('https://us-east-1.online.tableau.com/',

Find your project’s id, we use TSC.Pager here because the .get() method will only returns the top 100 records.

projectname = 'Business Intelligence'
print([(project.name, project.id) for project in TSC.Pager(server.projects) if project.name.startswith(projectname)])

Now use the project id we just identified and specify a datasource name and push the .hyper file we created up to Tableau Server.

#If Datasource with the same name exist it will overwrite.
# Put the projectid you just found here, and name the datasource you want to publish
mydatasourceitem = TSC.DatasourceItem('myprojectid_12345',
item = server.datasources.publish(mydatasourceitem,
                                  '/mydir/mydata.hyper', 'Overwrite')
print("{} published with id: {}".format(item.name, item.id))


Wrapping up

Automation - Setup a cron job (crontab -e) and schedule the script to create the .hyper and publish it up to Tableau Server.

You can put everything in a function, something like this:

import pandas as pd
import tableauserverclient as TSC
from pandleau import *

def publishtotableau(df, folder_path, projectid, datasource_name, auth_list, site='yoursite'):
    Login to Tableau Online and publish a pandas dataframe
    Assumes the following pages are imported:
        - tableauserverclient as TSC
        - pandleau import *
        - pandas as pd

        df: dataframe to publish
        folder_path: folder to store temp.hyper file generated
        projectid: Tableau Server Project ID
        datasource_name: Name of the datasource to publish
        auth_list: List-like with username on index 0, password on index 1
        site: Tableau server site

    pandleau(df).to_tableau(folder_path+'temp.hyper', add_index=False)

    tableau_auth = TSC.TableauAuth(auth_list[0], auth_list[1], site_id = site)
    server = TSC.Server('https://us-east-1.online.tableau.com/', use_server_version=True)

    with server.auth.sign_in(tableau_auth):
        mydatasourceitem = TSC.DatasourceItem(projectid, name=datasource_name)
        item = server.datasources.publish(mydatasourceitem,folder_path+'temp.hyper', 'Overwrite')
        print("{} successfully published with id: {}".format(item.name, item.id))

Thoughts, comments?:

Reach out on LinkedIn or email.