Creating Interactive Dashboards
23 Dec 2019
Last updated: 23 Dec 2019
I am pleased to have another guest post from Duarte O. Carmo. He wrote a series of posts in July on report generation with Papermill that were very well received. In this article, he explores how to use Voilà and Plotly Express to convert a Jupyter notebook into a standalone interactive website. In addition, the article shows examples of collecting data through an API endpoint, performing sentiment analysis on that data, and several approaches to deploying the dashboard.
Since this is a long article, here is a table of contents for easier navigation:
Jupyter notebooks are one of my favorite tools to work with data: they are simple to use, fast to set up, and flexible. However, they do have their disadvantages: source control, collaboration, and reproducibility are just some of them. As I illustrated in my prior post, I tend to enjoy seeing what I can accomplish with them.
An increasing need is the sharing of our notebooks. Sure, you can export your notebooks to HTML, PDF, or even use something like nbviewer to share them. But what if your data changes constantly? What if every time you run your notebook, you expect to see something different? How can you go about sharing something like that?
In this article, I’ll show you how to create a Jupyter Notebook that fetches live data, builds an interactive plot and then how to deploy it as a live dashboard. When you want to share the dashboard, all you need to share with someone is a link.
Let’s have some fun with the data first.
We will use Reddit as the source of data for our dashboard. Reddit is a tremendous source of information, and there are a million ways to get access to it. One of my favorite ways to access the data is through a small API called pushshift. The documentation is right here.
Let’s say you wanted the most recent comments mentioning the word “python”. In Python, you could use requests to get a JSON version of the data:
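A minimal sketch of that request (the endpoint path follows the pushshift documentation; the call itself is commented out because it needs network access to pushshift.io):

```python
import requests

# pushshift's comment search endpoint; "q" filters comments by keyword
url = "https://api.pushshift.io/reddit/search/comment/?q=python"

# request = requests.get(url)       # needs network access to pushshift.io
# json_response = request.json()    # a dict with a "data" list of comments
```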
You can add a multitude of parameters to this request, such as:

- in a certain subreddit
- after a certain day
- sorted by up votes
- many more
To make my life easier, I built a function that allows me to call this API as a function:
Using the `payload` parameter and `kwargs`, I can add any payload I wish as function arguments. For example, a call with the desired query parameters returns the JSON response. Pretty sweet, right?
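Reconstructing from the description above, the wrapper might look like this (the parameter handling is an assumption; the notebook in the repo has the definitive version):

```python
import requests

def get_pushshift_data(data_type, **kwargs):
    """Query the pushshift API and return the parsed JSON response.

    data_type: "comment" or "submission"; any keyword arguments are
    forwarded as query-string parameters (the payload).
    """
    base_url = f"https://api.pushshift.io/reddit/search/{data_type}/"
    payload = kwargs
    response = requests.get(base_url, params=payload)
    return response.json()

# For example, the 10 most recent comments mentioning "python":
# get_pushshift_data(data_type="comment", q="python", size=10)
```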
To answer the above question, we start by getting the data with our function:
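Assuming the `get_pushshift_data`-style wrapper described earlier, the call could look like this (parameter names follow the pushshift docs; the call is commented out because it needs network access):

```python
# Comments mentioning "python" from the last 48 hours,
# aggregated (grouped) by subreddit
payload = {"q": "python", "after": "48h", "size": 1000, "aggs": "subreddit"}

# data = get_pushshift_data(data_type="comment", **payload)
```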
The `aggs` keyword asks pushshift to return an aggregation into subreddits, which basically means: group the results by subreddit (read about it in the documentation).
Since the json response is pretty nested, we’ll need to navigate a bit inside of the dictionary.
And we transform the list of dictionaries returned into a pandas DataFrame, and get the top 10.
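Putting those two steps together on a mocked response (the shape mirrors pushshift's aggregation output; the counts are the ones from the table below):

```python
import pandas as pd

# A trimmed, hard-coded stand-in for the API's JSON response
data = {
    "aggs": {
        "subreddit": [
            {"doc_count": 352, "key": "learnpython"},
            {"doc_count": 220, "key": "AskReddit"},
            {"doc_count": 177, "key": "Python"},
            {"doc_count": 139, "key": "learnprogramming"},
        ]
    }
}

# Navigate into the nested dictionary...
subreddits = data["aggs"]["subreddit"]

# ...then build a DataFrame from the list of dicts and keep the top 10
df = pd.DataFrame.from_records(subreddits)[0:10]
```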
Here’s what our DataFrame looks like:
|   | doc_count | key |
| --- | --- | --- |
| 0 | 352 | learnpython |
| 1 | 220 | AskReddit |
| 2 | 177 | Python |
| 3 | 139 | learnprogramming |

These are the names of the subreddits where the word python appears most frequently in their comments!
Let’s plot our results with the Plotly Express library. Plotly Express is great, in my opinion, if you want to:

- create figures quickly
- create figures that are a bit more interactive than matplotlib
- don’t mind a bit more installation and (imo) a bit less documentation
Here’s all the code you need:
Yes, perhaps a bit more verbose than matplotlib, but you get an interactive chart!
All of the details are included in the notebook for this article.
To answer this question, our function will again come in handy. Let’s aggregate things a bit.
Don’t get scared; this is a one-liner that will produce results similar to the above:
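Reassembled from the description, the one-liner is probably something like this (the column list is inferred from the table below; the hard-coded row stands in for the "data" list of a score-sorted pushshift query):

```python
import pandas as pd

# Stand-in for the API response: the "data" list of a query sorted by
# score, e.g. get_pushshift_data(data_type="comment", q="python",
# after="7d", sort_type="score", sort="desc")["data"]
data = [
    {"author": "Saiboo", "subreddit": "learnpython", "score": 111,
     "body": "Suppose you create the following python file...",
     "permalink": "/r/learnpython/comments/abc123/"},
]

# One-liner: records -> DataFrame, top 10 rows, interesting columns only
df = pd.DataFrame.from_records(data)[0:10][
    ["author", "subreddit", "score", "body", "permalink"]
]
```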
The above code will return the top 10 most upvoted comments of the last 7 days. To make a DataFrame column clickable, you can apply the following function to it:
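A minimal version of such a function, applied via pandas' `Styler.format` (the exact HTML wrapping is an assumption):

```python
import pandas as pd

def make_clickable(val):
    """Wrap a permalink value in an HTML anchor so it renders as a link."""
    return f'<a href="{val}">Link</a>'

df = pd.DataFrame({"permalink": ["/r/python/comments/abc123/"]})

# Applied to the permalink column when displaying the DataFrame:
# df.style.format({"permalink": make_clickable})
```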
|   | author | subreddit | score | body | permalink |
| --- | --- | --- | --- | --- | --- |
| 0 | Saiboo | learnpython | 111 | Suppose you create the following python file calle… | Link |
| 1 | Kompakt | ProgrammerHumor | 92 | Some languages don’t have switch statements…look… | Link |
| 2 | clown_world_2020 | MrRobot | 47 | Just goes to show that Esmail isn’t the only brill… | Link |
| 3 | Leebertysauce | AnimalsBeingBros | 28 | They wont even be mad when the python decide to ta… | Link |
| 4 | Kompakt | ProgrammerHumor | 23 | Yep it’s true, and depending on the design of the … | Link |
| 5 | niceboy4431 | Cringetopia | 23 | I have a theory (someone prove me wrong if you kno… | Link |
| 6 | kingguru | Denmark | 22 | Brug af Python: +1 Brug af Python 3: +2 … | Link |
| 7 | MintyAroma | totalwar | 20 | We really need Bretonnian Men-At-Arms shouting Mon… | Link |
| 8 | aspiringtobeme | gifsthatkeepongiving | 19 | Amazing. Brought [this Monty Python clip](… | Link |
| 9 | CrimsonSpooker | TwoBestFriendsPlay | 19 | “Why can’t Three Houses be gritty and “realistic” … | Link |

In the notebook, you can click the link column to be taken right into the comment. Hooray!
Alright, the final analysis is a bit more complicated. We want to see the sentiment in the /r/python subreddit on some sort of timeline.
First, we already know how to retrieve the most upvoted comments of the past 2 days:
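As a sketch, assuming the `get_pushshift_data` wrapper from before and a hypothetical `columns_of_interest` list (the real call is commented out because it needs network access):

```python
# Columns we will keep for the sentiment analysis (assumed names)
columns_of_interest = ["author", "body", "created_utc", "score", "permalink"]

# Most upvoted /r/python comments of the past 2 days, using the wrapper:
# data = get_pushshift_data(data_type="comment", subreddit="python",
#                           after="2d", size=1000,
#                           sort_type="score", sort="desc")["data"]
# df = pd.DataFrame.from_records(data)[columns_of_interest]
```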
This gives us a pandas DataFrame with the columns specified in `columns_of_interest`. But how do we get the sentiment of every comment?
Enter TextBlob, a simple library that makes it ridiculously easy to get the sentiment of a sentence. TextBlob returns two values: the sentiment polarity (-1 is negative, 0 is neutral, and 1 is positive) and the sentiment subjectivity (0 is objective and 1 is subjective).
Here’s an example:
Read more about the library here.
Now that we know how to extract sentiment from a piece of text, we can easily create some other columns for our DataFrame of comments:
Finally, it’s time to plot our figure with the help of Plotly Express:
And here’s the output!
In this view, we can see the comments made in /r/python in the last 48 hours. We can see that most comments are rather on the positive side, but some are also negative. In your own notebook you’ll notice that you can hover over the comments and read the preview to see why they were classified as negative or positive.
The cool thing here is that if you run the same script tomorrow, you’ll get a different output.
So how can we put this somewhere that is “automatically” updated whenever we look at it?
Voilà has a simple premise: “Voilà turns Jupyter notebooks into standalone web applications.”
Let’s back up a bit and get everything you need running on your system. The first step is to have a working setup with everything above; for that, follow these instructions.
Once that is done, you should be able to launch the dashboard with:
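Assuming the notebook lives at notebooks/Dashboard.ipynb, as in the example repo, that would be:

```shell
voila notebooks/Dashboard.ipynb
```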
Now, you should see a web-like application, built from the notebook we created, in a new tab in your browser!
Feel free to modify this notebook according to your interests. You’ll notice that I have created some general variables in the first notebook cell, so you can fire up Jupyter Lab, and modify them and see what comes out!
Here are the general modifiable cells:
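As a sketch, such a configuration cell might look like this (these variable names are my own assumptions, not the notebook's actual ones):

```python
# Parameters you can tweak before re-running the dashboard
TERM_OF_INTEREST = "python"       # keyword to search comments for
SUBREDDIT_OF_INTEREST = "python"  # subreddit for the sentiment timeline
TIMEFRAME = "48h"                 # how far back to fetch comments
```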
Once you have modified your dashboard, you can launch Voilà again to see the results.
The most important thing about Voilà is that every time it runs, it actually re-runs your whole code. Yes, that makes things a bit slow, but it also means that the results get updated every time the page is refreshed! :tada:
Binder helps you turn a simple GitHub repo into an interactive notebook environment. They do this by using docker images to reproduce your GitHub repo’s setup.
We don’t really care about all that. We just want to publish our Voilà dashboard. To do that, follow these steps:
- Create a public GitHub repo.
- Add the notebooks you want to publish as dashboards to it.
- Add a `requirements.txt` file, just as I have in the example repo, with all of your dependencies.
- Go to mybinder.org.
- In the `GitHub` field, add your repo’s URL.
- In the `GitHub branch, tag, or commit` field, add “master” (unless you know what you are doing).
- In the `Path to a notebook` field, add `/voila/render/path/to/notebook.ipynb`, where `path/to/notebook.ipynb` is the location of your notebook in your repo. In the example, this results in `voila/render/notebooks/Dashboard.ipynb`.
- In the `Path to a notebook` field, toggle `URL` (instead of the default `File` option).
- Hit launch.
Your dashboard will automatically launch :open_mouth: :tada:
You can share the link with others and they will have access to the dashboard as well.
Here is the running example of our reddit dashboard. (It takes a bit to build the first time.)
WARNING: This option is not 100% safe, so make sure to only use it for testing or proof of concepts, particularly if you are dealing with sensitive data!
If you want to have your dashboard running on a typical URL (such as mycooldash.com for example), you probably want to deploy it on a Linux server.
Here are the steps I used to accomplish that:
- Set up your virtual private server; this Linode guide is a good start.
- Make sure port 80 (the regular HTTP port) is open.
- Once you have your repo in GitHub or somewhere else, clone it to your server.
- You should already have Python 3 installed. Try typing `python3` in your console. If that fails, these instructions will help you.
- Make sure you can run your dashboard by creating a virtual environment and installing the dependencies.
Now, if you type the Voilà command in your console and specify the port:
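Something along these lines (port 80 so plain HTTP works; binding to it may require elevated privileges):

```shell
voila notebooks/Dashboard.ipynb --port=80
```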
You can probably navigate to your server’s IP and see the dashboard. However, as soon as you exit your server, your dashboard will stop working. We are going to use a nifty trick with a tool called tmux.
Tmux is a “terminal multiplexer” (wow, that’s a big word). It basically allows us to create multiple terminal sessions at the same time and then (yes, you guessed it) keep them running indefinitely. If this sounds confusing, let’s just get to it.
Install tmux:
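On a Debian/Ubuntu server (adjust for your distribution’s package manager):

```shell
sudo apt-get install tmux
```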
Once installed, we create a new terminal session:
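For example, a session named voila (the name is arbitrary):

```shell
tmux new -s voila
```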
You are now inside a new terminal session. Let’s get Voilà running there.
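Inside the tmux session, start it just as before:

```shell
voila notebooks/Dashboard.ipynb --port=80
```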
You should see the dashboard in your browser
And now, for the magic: in your terminal, hit `ctrl` + `b` and then `d` on your keyboard. This will “detach” you from the terminal where Voilà is running.
You are now back in your original terminal session. Notice that your dashboard is still running. This is because your `voila` terminal session is still running.
You can see it by listing the terminal sessions with:
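Like so:

```shell
tmux ls
```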
And then attach to it via:
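Using the session name from before:

```shell
tmux attach -t voila
```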
And you’ll see your Voilà logs outputting.
This is arguably a bit of a hack to have things running, but it works - so no complaints there.
Tmux is an awesome tool, and you should definitely learn more about it here.
There are a million ways of deploying, and Voilà also has good documentation on these.
That was a long post! But we are finally done! Let’s summarize everything we learned:
- We learned how to transform an API endpoint into a function with `**kwargs`
- We learned how to analyze Reddit data with Python and Plotly Express
- We learned how to analyze sentiment from sentences with TextBlob
- We learned how to transform a Jupyter notebook into a dashboard using Voilà
- We learned how to deploy those dashboards with mybinder.org
- We learned how to use tmux to deploy these kinds of tools on a server
That was a lot of stuff, and there are probably some bugs in my notebook or explanation, so make sure to:

- Visit the GitHub repo where both the code and post are stored.
- If there is something wrong in the code, please feel free to submit an issue or a pull request.
Reference: https://pbpython.com/interactive-dashboards.html
Hey everyone! My name is Duarte O. Carmo and I’m a consultant at Jabra who loves working with Python and data. Make sure to visit my website if you want to find out more about me.