Building a Blog for Your Portfolio
Last updated
Data science blogs can be a fantastic way to demonstrate your skills, learn topics in more depth, and build an audience. Quite a few data science and programming blogs have helped their authors land jobs or make important connections. Writing a data science blog is thus one of the most valuable things an aspiring programmer or data scientist can do on a regular basis.
(This is the second in a series of posts on how to build a Data Science Portfolio. You can find links to the other posts in this series at the bottom of the post.)
Unfortunately, one fairly arbitrary barrier to blogging is knowing how to set up a blog in the first place. In this post, we’ll cover how to create a blog using Python, how to write posts in Jupyter notebooks, and how to deploy the blog live using GitHub Pages. After reading this post, you’ll be able to create your own data science blog and author posts in a familiar, simple interface.
Fundamentally, a static site is just a folder full of HTML files. We can run a server that allows others to connect to this folder and retrieve files. The nice thing about this is that it doesn’t require a database or any other moving parts, and it’s very easy to host on sites like GitHub. Making your data science blog a static site is a great idea, because it keeps maintenance very simple. One way to create a static site is to manually edit HTML, then upload the folder full of HTML to a server. In this scenario, you would at minimum need an index.html file. If your website URL was thebestblog.com and visitors went to http://www.thebestblog.com, they would be shown the contents of index.html. Alongside index.html, the folder for thebestblog.com would hold one HTML file per post, such as first-post.html.
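On the above site, visiting http://www.thebestblog.com/first-post.html would show you the content in first-post.html, and so on. Each of these files is a complete HTML document, so first-post.html has to contain not only the post itself but also the page’s head, styles, title, and footer.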
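To make the folder-plus-server idea concrete, here is a minimal sketch that uses only Python’s standard library. The page content is hypothetical: it writes two hand-made pages into a temporary folder, serves them with http.server, and requests each URL. Notice that the title and footer have to be repeated in every file.

```python
import http.client
import http.server
import pathlib
import tempfile
import threading
from functools import partial

# Hypothetical hand-written pages: the title and footer are repeated in each.
PAGES = {
    "index.html": (
        "<html><head><title>The Best Blog</title></head><body>"
        '<h1>The Best Blog</h1><a href="first-post.html">First post</a>'
        "<footer>Written by me</footer></body></html>"
    ),
    "first-post.html": (
        "<html><head><title>The Best Blog</title></head><body>"
        "<h1>First post</h1><p>Hello, world!</p>"
        "<footer>Written by me</footer></body></html>"
    ),
}


def write_site(root):
    """Write every page as a plain HTML file directly in the site folder."""
    for name, page in PAGES.items():
        (pathlib.Path(root) / name).write_text(page, encoding="utf-8")


def serve(root):
    """Serve the folder in a background thread; port 0 picks a free port."""
    handler = partial(http.server.SimpleHTTPRequestHandler, directory=str(root))
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, path):
    """GET a path from the running server and return the body as text."""
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_site(tmp)
        server = serve(tmp)
        try:
            print(fetch(server, "/"))                 # "/" is answered with index.html
            print(fetch(server, "/first-post.html"))  # each URL maps to one file
        finally:
            server.shutdown()
            server.server_close()
```

Any static host does essentially the same thing: it maps each URL path onto a file in the folder.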
You might immediately notice a few problems with manually editing HTML:
Manually editing HTML is incredibly painful.
If you want to make multiple posts, you have to copy over styles and other elements, like the title and footer, into every file.
If you want to integrate comments or other plugins, you’ll have to write JavaScript.
Generally, when you’re blogging, you want to focus on the content, not spend time fighting with HTML. Thankfully, you can create a data science blog without hand-editing HTML by using tools known as static site generators.
Static site generators allow you to write blog posts in simple formats, usually Markdown, then define some settings. The generators then convert your posts into HTML automatically. Using a static site generator, we could dramatically simplify first-post.html into first-post.md, a Markdown file that holds only the post’s content.
This is much easier to manage than the HTML file! Common elements, like the title and the footer, can be placed into templates, so they can be easily changed. There are a few different static site generators; one of the most popular is Jekyll, which is written in Ruby. Since we’ll be making a data science blog, we want a static site generator that can process Jupyter notebooks. Pelican is a static site generator written in Python that, with the help of a plugin, can take in Jupyter notebook files and convert them to HTML blog posts. Pelican also makes it easy to deploy our blog to GitHub Pages, where other people can read it.
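The template idea can be sketched in a few lines. This toy generator is hypothetical and far simpler than Pelican, which uses a real Markdown parser and Jinja2 templates. One shared layout holds the site title and footer, and each post only supplies its own title and body.

```python
import html
from string import Template

# Shared layout: the site title and footer are defined once for every post.
LAYOUT = Template(
    "<html><head><title>$site_title</title></head>\n"
    "<body>\n<h1>$post_title</h1>\n$body\n"
    "<footer>$footer</footer>\n</body></html>"
)


def markdown_to_html(text):
    """Toy converter: blank-line-separated blocks become <p>, '# ' lines <h2>.

    Headings start at <h2> because the layout already uses <h1> for the
    post title. Text is HTML-escaped; no other Markdown syntax is handled.
    """
    blocks = []
    for block in text.strip().split("\n\n"):
        block = block.strip()
        if block.startswith("# "):
            blocks.append(f"<h2>{html.escape(block[2:])}</h2>")
        elif block:
            blocks.append(f"<p>{html.escape(block)}</p>")
    return "\n".join(blocks)


def render_post(post_title, markdown_text,
                site_title="The Best Blog", footer="Written by me"):
    """Fill the shared layout with one post's title and converted body."""
    return LAYOUT.substitute(
        site_title=html.escape(site_title),
        post_title=html.escape(post_title),
        body=markdown_to_html(markdown_text),
        footer=html.escape(footer),
    )


if __name__ == "__main__":
    print(render_post("First post", "Hello, world!\n\n# Details\n\nMore text."))
```

Changing the footer now means editing LAYOUT once instead of every HTML file.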
Before we get started, here’s a repo that’s an example of what we’ll eventually build. If you don’t have Python installed, you’ll need to do some preliminary setup first. Here are setup instructions for Python. We recommend using Python 3.5. Once you have Python installed:
Create a folder to hold your blog content and styles. We’ll refer to it in this tutorial as jupyter-blog, but you can call it whatever you want.
cd into jupyter-blog.
Create a file called .gitignore, and add in the content from this file. We’ll eventually need to commit our repo to git, and this file excludes some files when we do.
Create and activate a virtual environment.
Create a file called requirements.txt in jupyter-blog that lists the Python packages the blog needs, one per line.
Run pip install -r requirements.txt in jupyter-blog to install all of the packages in requirements.txt.
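The setup steps above can also be scripted. This is a hypothetical sketch: the package list is unpinned and is our assumption, not the original post’s file (pelican builds the site, jupyter and nbconvert handle notebooks, and ghp-import publishes the output folder), and placing the virtual environment in a venv folder inside jupyter-blog is likewise our choice.

```python
import pathlib
import venv

# Hypothetical, unpinned package list. Add anything else the pelican-ipynb
# README asks for, and pin versions you have tested together.
REQUIREMENTS = ["pelican", "jupyter", "nbconvert", "ghp-import"]


def setup_blog_folder(root, create_venv=True):
    """Create the blog folder, write requirements.txt, and optionally a venv."""
    root = pathlib.Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / "requirements.txt").write_text(
        "\n".join(REQUIREMENTS) + "\n", encoding="utf-8"
    )
    if create_venv:
        # Same effect as `python -m venv venv`. Activating it still has to
        # happen in your shell, and venv/ should be listed in .gitignore.
        venv.create(root / "venv", with_pip=True)
    return root
```

Calling setup_blog_folder("jupyter-blog"), activating the new environment in your shell, and running pip install -r requirements.txt covers the list above; the .gitignore still comes from the file linked earlier.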
Once you’ve done the preliminary setup, you’re ready to create your blog! Run pelican-quickstart in jupyter-blog to start an interactive setup sequence for your blog. You’ll get a sequence of questions that will help you set up your blog properly. For most of the questions, it’s okay to just hit Enter and accept the default value. The only ones you should fill out are the title of the website, the author of the website, n for the URL prefix, and the timezone.
After running pelican-quickstart, you should have two new folders in jupyter-blog, content and output, along with several files, such as pelicanconf.py and publishconf.py.
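For orientation, pelicanconf.py holds the settings Pelican uses for local builds, and it is the file we’ll edit in the next steps. The sketch below shows the kind of assignments pelican-quickstart writes; the values are placeholders, and the exact set of settings varies between Pelican versions.

```python
# pelicanconf.py (sketch): placeholder values; pelican-quickstart fills in
# the answers you gave it.
AUTHOR = "Your Name"
SITENAME = "My Data Science Blog"
SITEURL = ""  # left empty for local previews; publishconf.py overrides it

PATH = "content"  # folder Pelican reads posts from

TIMEZONE = "America/New_York"
DEFAULT_LANG = "en"

# Feeds are usually switched off while developing locally.
FEED_ALL_ATOM = None
CATEGORY_FEED_ATOM = None

DEFAULT_PAGINATION = 10  # posts per index page
```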
Pelican doesn’t support writing blog posts using Jupyter by default, so we’ll need to install a plugin that enables this behavior. We’ll install the plugin as a git submodule to make it easier to manage. If you don’t have git installed, you can find instructions here. Once you have git installed:
Run git init to initialize the current folder as a git repository.
Create the folder plugins.
Run git submodule add https://github.com/danielfrg/pelican-ipynb.git plugins/ipynb to add in the plugin.
You should now have a .gitmodules file and a plugins folder, with the plugin’s code in plugins/ipynb.
In order to activate the plugin, we’ll need to modify pelicanconf.py and add these lines at the bottom:
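The lines below follow the pelican-ipynb README as we understand it; treat them as a sketch and check the plugin’s documentation for your versions, since older Pelican releases used a single PLUGIN_PATH string instead of the PLUGIN_PATHS list.

```python
# Appended to pelicanconf.py (sketch).
MARKUP = ("md", "ipynb")      # file types Pelican should read as posts
PLUGIN_PATHS = ["./plugins"]  # where the git submodule lives
PLUGINS = ["ipynb.markup"]    # plugin module that reads .ipynb files

# Optional: keep Pelican's default ignore pattern and also skip Jupyter's
# .ipynb_checkpoints autosave folders.
IGNORE_FILES = [".#*", ".ipynb_checkpoints"]
```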
These lines tell Pelican to activate the plugin when generating HTML.
Once the plugin is installed, we can create the first post:
Create a Jupyter notebook with some basic content. Here’s an example you can download if you want.
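Copy the notebook file into the content folder.
Create a file that has the same name as your notebook, but with the extension .ipynb-meta. For example, a notebook called first-post.ipynb needs a file called first-post.ipynb-meta.
Add the fields described below to the .ipynb-meta file, one per line in the form Field: value (for example, Title: My First Post), changing the values to match your own post.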
Here’s an explanation of the fields:
Title: the title of the post.
Slug: the path at which the post will be accessed on the server. If the slug is first-post and your server is jupyter-blog.com, then with Pelican’s default URL settings you’d access the post at http://www.jupyter-blog.com/first-post.html.
Date: the date the post will be published.
Category: a category for the post (this can be anything).
Tags: a comma-separated list of tags to use for the post. These can be anything.
Author: the name of the author of the post.
Summary: a short summary of your post.
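Since every post needs its own .ipynb-meta file, a small helper can save some typing. This is a hypothetical sketch, not part of Pelican or pelican-ipynb. It writes one Field: value line per field next to the notebook, and it expects Tags as a comma-separated string because Pelican splits that field on commas.

```python
import datetime
import pathlib

# Field order follows the list above; every field except Date is required here.
FIELDS = ("Title", "Slug", "Date", "Category", "Tags", "Author", "Summary")


def write_meta(notebook, **values):
    """Write <name>.ipynb-meta next to a notebook, one 'Field: value' per line.

    Date defaults to the current local time in YYYY-MM-DD HH:MM form.
    """
    notebook = pathlib.Path(notebook)
    if notebook.suffix != ".ipynb":
        raise ValueError(f"expected a .ipynb file, got {notebook.name}")
    values.setdefault("Date", datetime.datetime.now().strftime("%Y-%m-%d %H:%M"))
    missing = [name for name in FIELDS if name not in values]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    meta_path = notebook.with_suffix(".ipynb-meta")
    lines = [f"{name}: {values[name]}" for name in FIELDS]
    meta_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return meta_path
```

For example, write_meta("content/first-post.ipynb", Title="First Post", Slug="first-post", Category="posts", Tags="python, firsts", Author="Your Name", Summary="My first post.") would create content/first-post.ipynb-meta with today’s date.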
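You’ll need to copy in a notebook file and create an .ipynb-meta file whenever you want to add a new post to your blog. Once you’ve created the notebook and the meta file, you’re ready to generate your blog HTML files. At this point, the jupyter-blog folder should include .gitignore, .gitmodules, requirements.txt, pelicanconf.py, publishconf.py, and the content, output, and plugins folders, with your notebook and its .ipynb-meta file inside content.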
In order to generate HTML from our post, we’ll need to run Pelican to convert the notebooks to HTML, then run a local server to be able to view them:
Switch to the jupyter-blog folder.
Run pelican content to generate the HTML.
Switch to the output directory.
Run python -m pelican.server.
Visit localhost:8000 in your browser to preview the blog.
You should be able to browse a listing of all the posts in your data science blog, along with the specific post you created.
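If you preview often, the build and serve steps can be wrapped in a small script. This is a hypothetical sketch: it runs pelican content from the blog folder and then serves output with Python’s built-in http.server, which behaves similarly to, but is not, pelican.server.

```python
import functools
import http.server
import subprocess


def build(blog_dir, command=("pelican", "content")):
    """Run the build command from inside the blog folder.

    The default mirrors `pelican content`; check=True raises
    CalledProcessError if the build fails.
    """
    subprocess.run(list(command), cwd=blog_dir, check=True)


def make_preview_server(output_dir, port=8000):
    """Return a server for the generated output folder.

    Uses the standard library handler; port=0 lets the OS pick a free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(output_dir)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

For example, build("jupyter-blog") followed by make_preview_server("jupyter-blog/output").serve_forever() would serve the preview at localhost:8000 until you press Ctrl+C.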
GitHub Pages is a feature of GitHub that allows you to quickly deploy a static site and let anyone access it using a unique URL. In order to set it up, you’ll need to:
Sign up for GitHub if you haven’t already.
Create a repository called username.github.io, where username is your GitHub username. Here’s a more detailed guide on how to do this.
Switch to the jupyter-blog folder.
Add the repository as a remote for your local git repository by running git remote add origin git@github.com:username/username.github.io.git, replacing both references to username with your GitHub username.
A GitHub Page will display whatever HTML files are pushed up to the master branch of the repository username.github.io at the URL username.github.io (the repository name and the URL are the same). If the repository’s Pages settings publish from a different branch, such as main, switch the source to master. First, we’ll need to modify Pelican so that URLs point to the right spot:
Edit SITEURL in publishconf.py so that it is set to https://username.github.io, where username is your GitHub username.
Run pelican content -s publishconf.py, which builds the site with the deployment settings file. When you only want to preview your blog locally, run pelican content instead, and switch back to pelican content -s publishconf.py before you deploy.
If you want to store your actual notebooks and other files in the same Git repo as a GitHub Page, you can use git branches.
Run git checkout -b dev to create and switch to a branch called dev. We can’t use master to store our notebooks, since that’s the branch used by GitHub Pages.
Create a commit and push to GitHub like normal, using git add, git commit, and git push (the first push of the new branch needs git push -u origin dev).
We’ll need to add the content of the blog to the master branch for GitHub Pages to work properly. Currently, the HTML content is inside the folder output, but we need it to be at the root of the repository, not in a subfolder. We can use the ghp-import tool for this:
Run ghp-import output -b master to import everything in the output folder to the master branch.
Use git push origin master to push your content to GitHub.
Try visiting username.github.io; you should see your page!
Whenever you make a change to your data science blog, just rerun the pelican content -s publishconf.py, ghp-import output -b master, and git push origin master commands above, and your GitHub Page will be updated.
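The three deployment commands can also be scripted so they always run in the same order. This is a hypothetical sketch using subprocess; the function names are ours, not part of Pelican or ghp-import, and dry_run=True only prints the commands.

```python
import subprocess


def deploy_commands(settings="publishconf.py", branch="master", remote="origin"):
    """The three commands from the steps above, in order."""
    return [
        ["pelican", "content", "-s", settings],  # build with deployment settings
        ["ghp-import", "output", "-b", branch],  # copy output/ onto the Pages branch
        ["git", "push", remote, branch],         # publish to GitHub
    ]


def deploy(blog_dir, dry_run=False):
    """Run each command from blog_dir, stopping at the first failure.

    With dry_run=True the commands are only printed.
    """
    commands = deploy_commands()
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=blog_dir, check=True)
    return commands
```

Calling deploy("jupyter-blog") would rebuild, import, and push in one go; because check=True is set, a failed build stops the script before anything is pushed.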
Comments are one way to interact with your readers. Disqus is a good tool for this, and it integrates with Pelican easily: themes that support Disqus only need the DISQUS_SITENAME setting. Follow these steps:
Go to the Disqus site and register.
Click “Get Started”, then choose “I want to install Disqus on my site”.
Enter your Website Name. This will serve as a unique key that links Disqus to your blog once you add it to the publishconf.py file (more on this in a later step).
Choose a Disqus subscription plan; a basic plan is perfect for a personal blog.
When Disqus asks which platform your site is on, scroll down and choose “I don’t see my platform listed, install manually with Universal Code”.
On the Universal Code page, scroll down again and click “Configure”.
On the Configure page, fill in the “Website URL” section with your actual website address (https://username.github.io). You can also add information about your comment policy (if you don’t have one, Disqus gives suggestions) and enter a description for your site. Click “Complete Setup”.
You’ll now have the ability to configure your site’s community settings. Click into this section and look around. Among other things, you’ll be able to control whether guests can comment and activate ads.
In the toolbar on the left, click “Advanced” and add your website to Trusted Domains as username.github.io.
Lastly, update publishconf.py. Make sure to specify DISQUS_SITENAME = "website-name", where “website-name” is the Website Name you entered earlier.
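Putting the publishconf.py changes from the GitHub Pages and Disqus steps together, a hedged sketch of the overrides might look like this. username and website-name are placeholders. The generated publishconf.py already starts by importing everything from pelicanconf.py, so it only needs the values that differ for the live site.

```python
# publishconf.py overrides (sketch); the import of pelicanconf's settings at
# the top of the generated file is omitted here.
SITEURL = "https://username.github.io"  # replace username with your GitHub username
RELATIVE_URLS = False                   # build absolute links for the live site
DISQUS_SITENAME = "website-name"        # the Website Name you registered with Disqus
```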
Now rerun the pelican content -s publishconf.py, ghp-import output -b master, and git push origin master commands to update your GitHub Page. Refresh your website and you’ll see Disqus appearing under each post.
The Pelican community offers a variety of themes at pelicanthemes.com. You can choose any theme you like, but here are some quick tips:
Keep it simple. The design should not distract from the actual content.
Remember the “Rule of Three Colors”. According to a University of Toronto study, most people prefer combinations of two to three colors. This way, colors don’t fight for attention.
Pay attention to the width of your page; it should be wide enough to contain infographics and code that you may want to publish.
Once you’ve picked a theme, go to the folder where you wish to store your themes and clone the pelican-themes repository: git clone --recursive https://github.com/getpelican/pelican-themes pelican-themes. Create a THEME variable in your pelicanconf.py file and set its value to the location of the theme, for example THEME = 'E:\\Pelican\\pelican-themes\\flex' on Windows.
Here, we are using the Flex theme by Alexandre Vicenzi. Run the usual finishing commands (pelican content -s publishconf.py, ghp-import output -b master, and git push origin master) and enjoy a new look!
We’ve come a long way! You should now be able to author blog posts and push them to GitHub Pages. Anyone should be able to access your data science blog at username.github.io (replacing username with your GitHub username). This gives you a great way to show off your data science portfolio. As you write more posts and gain an audience, you may want to dive deeper into a few areas:
Your own custom URL
Using username.github.io is nice, but sometimes you want a more custom domain. Here’s a guide on using a custom domain with GitHub Pages.
Plugins
Check out the list of plugins here. Plugins can help you set up analytics, commenting, and more.
At Dataquest, our interactive guided projects are designed to help you start building a data science portfolio to demonstrate your skills to employers and get a job in data. If you’re interested, you can sign up and do our first module for free.
If you liked this, you might like to read the other posts in our ‘Build a Data Science Portfolio’ series:
Reference: https://www.dataquest.io/blog/how-to-setup-a-data-science-blog/