New open source software to securely automate deployment of data scientists’ interactive visualizations in all languages and frameworks
For example, Voilà, described in its simplest form, displays a Jupyter notebook as a user-friendly web app: code cells are hidden, and the notebook is run automatically from top to bottom without the user having to shift+enter their way through it. Python-based widget libraries provide simple user controls such as sliders and dropdowns.
Data scientists and analysts can start using these frameworks by following a short tutorial. And they clearly have the technical ability to find a way to host the resulting web apps, perhaps on their own laptop temporarily, or by deploying through an AWS account.
But this simple deployment step turns out to be the most common blocker for the adoption of dashboarding frameworks in an organization.
Although the process can be straightforward, it is unrewarding in itself: a few tedious commands to run just when the data scientist has proudly finished their brand-new analysis. Even worse, it is error-prone and can leave sensitive data exposed.
In traditional software development, automated testing and continuous integration make releasing new features fun rather than a chore. The same applies here: any barrier to sharing a data dashboard stifles innovation and discourages new iterations.
Leaving the deployment choices to your data scientists is unwise, as technically astute as they undoubtedly are. They aren’t going to have time to secure the servers, and will likely choose the easiest authentication system. For secure regular deployment, you really need a unified approach to hosting across your data team that can be approved wholesale by your information security department.
These dashboards are technically web apps. They don’t need IT to spend three months auditing them for security, but there needs to be an approved method for deploying them that wouldn’t horrify the IT department as much as the ad-hoc methods that your data scientists will use if left to their own devices.
You need to know where all your dashboards are running and how they are authenticated. Otherwise, there may be outdated and insecure servers running out there in the wild exposing your networks and sensitive data. When employees leave your organization, you need to be able to terminate their access to these dashboards. If the employee happens to be the one running a handful of dashboards on their personal AWS account, you don’t want to rely on them to remember what was running where so you can turn them off or transfer ownership!
The ContainDS Product Suite
Free open source ContainDS software products can provide a unified deployment platform for your data scientists, allowing them to share dashboards based on open source frameworks in an automated, secure, and reproducible way.
Any open source dashboarding framework can be utilized. Supported as standard are Voilà, Streamlit, Plotly Dash, Bokeh, Panel, and R Shiny. Those should be a good starting point for any projects your data science team is likely to face!
ContainDS is a collection of two main products and related open source technologies.
ContainDS Dashboards is a platform for hosting and sharing dashboards over the internet or an internal network, with named authenticated users — perhaps specific colleagues or clients.
Sometimes even this is too open. Where dashboards need to be shared offline, either because there is no internet access or because contractual terms forbid access to the data over a network, ContainDS Desktop is an app for Windows or Mac that lets you run dashboards on your local machine and share them with others as single flat files.
Here we’ll focus on the online dashboards software.
Online Dashboards through JupyterHub
ContainDS Dashboards is an extension for the popular JupyterHub software. This makes it especially easy to install if you already have a JupyterHub in use, but setting one up for the first time is not too complicated and it can be useful to have anyway.
JupyterHub is a way to centrally manage Jupyter notebook environments for your whole team. The standard installation allows each user to spin up their own Jupyter notebook, and the ContainDS Dashboards extension allows them to directly start user-friendly dashboards instead, sharing them with other authenticated users.
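As a rough sketch of what enabling the extension involves, the JupyterHub configuration file gains a few lines along the following lines. The import paths and setting names are assumptions based on the ContainDS Dashboards documentation at the time of writing and may differ between versions, so check them against the current docs before use. The spawner and builder settings, which depend on your JupyterHub distribution, are left out here.

```python
# jupyterhub_config.py (fragment). JupyterHub supplies the `c` config object
# when it loads this file, so the fragment does not run on its own.

# Standard JupyterHub option: lets each user run more than one server.
# Needed here because each dashboard runs as an additional server.
c.JupyterHub.allow_named_servers = True

# Assumed import paths, taken from the ContainDS Dashboards docs.
from cdsdashboards.app import CDS_TEMPLATE_PATHS
from cdsdashboards.hubextension import cds_extra_handlers

# Add the Dashboards menu pages and their request handlers to the hub.
c.JupyterHub.template_paths = CDS_TEMPLATE_PATHS
c.JupyterHub.extra_handlers = cds_extra_handlers
```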
Different ‘distributions’ of JupyterHub provide different approaches to maintenance and scalability. There are lots of bespoke options, but the two main paths are Zero to JupyterHub, which runs on Kubernetes and allows seamless scaling of resources over multiple machines for large numbers of users or projects, and The Littlest JupyterHub, which sets up JupyterHub on a single VM (given the range of VMs available on cloud providers these days, this can still support surprisingly heavy usage!).
Loading your app’s files
If you are already heavy users of Jupyter notebooks, and perhaps just want to deploy notebooks as Voilà or Panel apps, then it might make sense to use the first option — Jupyter Tree. You can edit your notebooks as normal, then once you’re happy head to the Dashboards menu in JupyterHub to enter the path to your notebook and see it deployed automatically as a new dashboard.
Even better, there is now a companion Jupyter extension so you can create a dashboard directly from JupyterLab or Notebook with one click.
Alternatively, if you are used to editing your Streamlit, R Shiny, Plotly Dash apps etc on your local machine, it might be more convenient to check your code into a Git repo and then instruct ContainDS Dashboards to pull it straight from your repo and deploy it. You can use public or private Git repos, and GitHub integration means you can one-click login to JupyterHub through your GitHub account and automatically grant access to your repos in the process.
Using either file source method, you can also select from multiple Conda environments if you’ve made them available to your JupyterHub users.
It’s also important to select the correct ‘framework’ from the dropdown to ensure the right mechanism is used to serve the dashboard. As already listed, Voilà, Streamlit, Plotly Dash, Bokeh, Panel, and R Shiny are currently supported out-of-the-box, but it is easy to add any custom framework that works as a web app.
Once deployed, dashboards are really just like separate Jupyter servers, but instead of running Jupyter notebook they directly run the server software of your chosen framework. If you’ve ever tried the Voilà Preview button in a Jupyter notebook, you will be familiar with the end result, but in the case of ContainDS Dashboards the deployment has no Jupyter front-end at all. Your apps will be deployed as pure web apps. This is exactly what you need for apps that you are going to share with others: the end users should not be able to run arbitrary code on your server.
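To make the idea of a per-framework serving mechanism concrete, here is a minimal sketch that maps a framework name to the command line that would start its server. The `LAUNCH_TEMPLATES` table and the `build_command` helper are hypothetical names for illustration only. The flags shown are each tool's own command-line options, but the launchers ContainDS Dashboards actually uses may well differ.

```python
from typing import Dict, List

# Illustrative command templates only (an assumption, not the ContainDS
# launchers). Each entry uses the named tool's own CLI flags.
LAUNCH_TEMPLATES: Dict[str, List[str]] = {
    "voila": ["voila", "{path}", "--port={port}", "--no-browser"],
    "streamlit": ["streamlit", "run", "{path}",
                  "--server.port", "{port}", "--server.headless", "true"],
    "bokeh": ["bokeh", "serve", "{path}", "--port={port}"],
    "panel": ["panel", "serve", "{path}", "--port={port}"],
}


def build_command(framework: str, path: str, port: int) -> List[str]:
    """Return the argv list that would start `path` under `framework`."""
    try:
        template = LAUNCH_TEMPLATES[framework]
    except KeyError:
        raise ValueError(f"unsupported framework: {framework!r}") from None
    # Fill the placeholders in each argument of the chosen template.
    return [part.format(path=path, port=port) for part in template]


if __name__ == "__main__":
    print(build_command("voila", "analysis.ipynb", 8866))
    # prints ['voila', 'analysis.ipynb', '--port=8866', '--no-browser']
```

Picking the wrong entry would hand a notebook to a server that expects a script, or the other way round, which is why the framework dropdown matters.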
Sharing with Other Users
The new Dashboards menu that is added to JupyterHub is not only used to register a new dashboard for deployment, but also serves as a list of contents for any dashboards that have been shared with you.
When you create the dashboard, you can choose whether to make it available to all users in your JupyterHub, or just to selected named users. JupyterHub allows a wide range of authentication methods — so, for example, using LDAP or Google Single-sign-on, all your colleagues can easily access your dashboards through an account that will be automatically created for them.
Authorized users can click into any dashboard that has been shared with them, click to confirm the OAuth consent screen, then immediately start interacting with the dashboard.
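The choice between sharing with all users and sharing with selected named users comes down to a simple access check. The sketch below shows that logic in isolation. The `Dashboard` record, its field names, and the `can_view` function are all hypothetical and are not the ContainDS Dashboards data model.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Dashboard:
    """Hypothetical record of a shared dashboard (for illustration only)."""
    name: str
    owner: str
    share_with_all: bool = True
    allowed_users: Set[str] = field(default_factory=set)


def can_view(dashboard: Dashboard, username: str) -> bool:
    """The owner can always view; others need all-user or named access."""
    if username == dashboard.owner:
        return True
    if dashboard.share_with_all:
        return True
    return username in dashboard.allowed_users


if __name__ == "__main__":
    sales = Dashboard("sales", owner="alice",
                      share_with_all=False, allowed_users={"bob"})
    print(can_view(sales, "bob"))    # prints True
    print(can_view(sales, "carol"))  # prints False
```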
Extendable and Configurable
Everything about JupyterHub is highly configurable: from where you host it (Kubernetes, on a cloud VM, or on your internal network) to how users authenticate at login.
The same applies to ContainDS Dashboards: you have full control over the way it behaves, and you can even plug in your own dashboarding and visualization frameworks (e.g. Flask-based web apps) just by editing the configuration files.
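To give a feel for what a custom framework has to provide, here is a minimal sketch of a standalone web app that listens on whatever port it is given. It uses Python's standard library rather than Flask to stay dependency-free. The `DashboardHandler` class, the `make_server` helper, and the page content are hypothetical, and the way ContainDS Dashboards passes a port to a custom launcher is set in its configuration and is not shown here.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DashboardHandler(BaseHTTPRequestHandler):
    """Serve one static HTML page for every GET request."""

    def do_GET(self) -> None:
        body = b"<h1>Hello from a custom dashboard</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # suppress per-request logging to keep the sketch quiet


def make_server(port: int) -> ThreadingHTTPServer:
    """Bind to localhost on `port` (0 lets the OS pick a free port)."""
    return ThreadingHTTPServer(("127.0.0.1", port), DashboardHandler)
```

Calling `make_server(8000).serve_forever()` would start the app on port 8000. Anything that can be started this way, with a port chosen by the launcher, behaves as a plain web app from the hub's point of view.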
That was a quick overview of ContainDS Dashboards, explaining how easy it is for a data scientist to deploy a new interactive visualization to share with clients or colleagues.
Your data scientists are already experimenting with the new visualization frameworks that have appeared on the open source landscape over the last few years. Their apps work great on their development machines, but it’s always a pain when they need to deploy them.
Often, they revert to exporting a PDF or just copy-and-pasting graphs into emails instead. This is a real missed opportunity to allow decision-makers to truly immerse themselves in the data models.
If the dashboard does end up being deployed, it’s often not in an IT-approved manner, relying on simplistic authentication and hosting on arbitrary cloud servers.
For medium-to-large data science teams, different projects have different needs, and data scientists want to choose the open source frameworks that make sense for their own skills and the project’s requirements.
To overcome these problems, ContainDS Dashboards provides a unified deployment and sharing model that can be administered by IT and used effortlessly by data science teams whatever technologies they are using to drive their analyses.
For installation details, see the ContainDS Dashboards documentation.