Google Colab for Data Analytics

The Ultimate Cloud-Based Interactive Notebook for Data Scientists

I have too many Python installations on my laptop! I am confused.

It takes too much time to set up TensorFlow!

Sending Jupyter Notebook versions back and forth is a time waster!

Have you ever wished you had a ready-to-launch environment where you could kickstart your analysis directly, without installing libraries and dependencies?

In this tutorial I would like to share with you an amazing product that I regularly use in my day job: Google Colab.

Google Colab (Colaboratory) is a data analysis tool which combines code, output, and descriptive text into one document — source

Essentially, you get GPU and TPU processing in the cloud, in an environment already set up for data analysis pipelines.

This allows you to run deep learning directly with popular libraries such as Theano, TensorFlow, and Keras. Similarly, you can generate graphics and share analysis outputs just as you would with Google Docs and Sheets.

Interested? Let’s get started!!

What is Google Colab?

Google Colab is Jupyter Notebook + Cloud + Google Drive

Colaboratory is a data analysis tool which combines code, output and descriptive text into one document (interactive notebook).

Colab provides GPUs and is free to use. By using Google Colab, you can:

  • Build your analytics products quickly in a standardized environment

  • Use popular DL libraries such as PyTorch and TensorFlow on the go

  • Share code and results within your Google Drive

  • Save copies and create playground modes for knowledge sharing

  • Run Colab in the cloud or on a local server with Jupyter

Resources in the free version of Colab are limited and not guaranteed; the usage limits fluctuate frequently, which is necessary for Colab to provide these resources for free.

The good news is that you can subscribe to Colab Pro or run a dedicated server on Google Cloud Platform. Both options are easy and can be deployed within 20 minutes.

Feel free to use this Colab file if you want to follow along with this tutorial. Run it and discover how Colab simplifies your job as a data analyst/scientist.

Getting Google Colab Ready to Use

With Colab you can write a shareable interactive Python notebook in Google Drive. In this tutorial, we are going to walk through 6 simple exercises to get our hands dirty with Google Colab.

Opening Google Colab

Finding Google Colab

  1. Open your Google Drive

  2. Right-click and open Google Colab

  3. Rename your notebook

  4. Done! Enjoy your notebook.

Installing Libraries

Thankfully, Colab already provides numerous Python libraries in each runtime.

Occasionally, you might need additional libraries. In that case, install them with the pip or apt package managers through shell commands in Colab.

All you need to do is start the command with !. This tells Colab that the line is a shell command rather than Python code.

For example: !pip freeze or !pip install <DL Library>

The free version of the Google Colaboratory runtime is not persistent: it is reset whenever a session is terminated. This means you need to reinstall any additional libraries every time you reconnect to your Colab notebook.
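
Because additional libraries disappear with the session, it can help to open a notebook with a small setup cell that installs only what is missing. The following is a minimal sketch rather than anything from the Colab documentation: the helper name ensure_installed is hypothetical, and it assumes pip is available in the runtime.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(package, module_name=None):
    """Install `package` with pip only if its module cannot be found.

    Hypothetical helper for illustration. `module_name` is the import
    name when it differs from the pip package name.
    Returns True if an install was triggered, False otherwise.
    """
    module_name = module_name or package
    if importlib.util.find_spec(module_name) is not None:
        return False  # already importable, nothing to do
    # Same effect as running `!pip install <package>` in a Colab cell.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    return True


# `json` ships with Python, so this call never triggers an install.
already_there = not ensure_installed("json")
print("json available without install:", already_there)
```

Calling the helper once per extra library at the top of the notebook keeps reconnects cheap, since libraries that Colab already ships are skipped.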

To list the libraries that are already available, you can use !pip freeze
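
If you prefer to inspect the installed libraries from Python itself, for instance to filter the list, the standard library can produce roughly the same information. This is a sketch under the assumption that the runtime uses Python 3.8 or newer; the function name installed_packages is made up for this example.

```python
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings for installed distributions.

    Hypothetical helper; roughly comparable to the output of `pip freeze`.
    """
    entries = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            entries.add(f"{name}=={dist.version}")
    return sorted(entries)


packages = installed_packages()
print(f"{len(packages)} packages found")
# Show only the entries that mention a library of interest.
print([p for p in packages if p.lower().startswith("pip")])
```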

Note that top deep learning libraries, including Keras, TensorFlow, and Theano, already exist in the runtime and are ready to use.

There is no need to install them again and again.

Run Codes and Visualizations

Run your code as usual in a Python notebook. Import the libraries and use them right away. You do not need to add %matplotlib inline to display visualizations in Colab.

This has already been configured in the Colab runtime.

Visualization from Colab Runtime

Shortcuts

The shortcuts are similar to those in Jupyter Notebook, with CTRL/CMD + M added as a prefix.

To view the shortcut list, press CTRL/CMD + M + H

Creating and running new cells

  • CTRL/CMD + M + A → Create a cell above the selection

  • CTRL/CMD + M + B → Create a cell below the selection

  • CTRL/CMD + M + M → Switch the cell to Markdown mode

  • CTRL/CMD + M + Y → Switch the cell to code mode

  • CTRL/CMD + M + D → Delete the cell

There are also similar shortcuts for running the cells

  • SHIFT + ENTER → Run and move to the cell below (a new cell is added if none exists)

  • ALT + ENTER → Run and add a cell below

  • CTRL + ENTER → Run and stay on the selected cell

Magics

Colab and IPython both have “magics”, which allow you to mix Python with other languages (such as shell or Dremel SQL).

These are some of the magics which I use frequently to improve my efficiency.

One important caveat: Cell magics (those starting with %%) must be the first non-comment line in a cell. This means you can't import and use a cell magic in a single cell.

%%shell

The shell magic is useful for viewing the specs of the current server environment. To get better specs than the current server offers, you need to launch a Google Cloud Platform (GCP) virtual machine (VM) and connect it to Colab as the runtime (port forwarding).
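
Some of the same environment details can also be read from Python, which is convenient when you want to use them later in the notebook. The sketch below relies only on the standard library; runtime_specs is a hypothetical helper, and the check for nvidia-smi is only a rough hint that NVIDIA driver tools are installed, not proof that a GPU is attached.

```python
import os
import platform
import shutil


def runtime_specs(path="/"):
    """Collect a few basic facts about the current machine.

    Hypothetical helper for illustration; `path` selects the disk to inspect.
    """
    disk = shutil.disk_usage(path)
    return {
        "python_version": platform.python_version(),
        "operating_system": platform.platform(),
        "cpu_count": os.cpu_count(),  # may be None if undetermined
        "disk_total_gb": round(disk.total / 1e9, 1),
        "disk_free_gb": round(disk.free / 1e9, 1),
        # Rough hint only: True when the nvidia-smi tool is on the PATH.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


for name, value in runtime_specs().items():
    print(f"{name}: {value}")
```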

%%HTML

Writing HTML in Colab is handy for building visualizations that are not available in Python libraries. It also lets us work directly with the web platform in HTML and CSS. In this case, we use SVG to draw scalable vector graphics.
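
When the SVG depends on your data, it can be generated as a string in Python instead of being typed by hand. The following is a minimal sketch, not the example from the original notebook: bar_chart_svg and its parameters are hypothetical names chosen for illustration.

```python
def bar_chart_svg(values, bar_width=40, gap=10, height=120):
    """Return an SVG string with one rectangle per value.

    Hypothetical helper for illustration; bars are scaled so that the
    largest value uses the full drawing height minus a 20 px margin.
    """
    if not values or max(values) <= 0:
        raise ValueError("values must contain at least one positive number")
    peak = max(values)
    width = len(values) * (bar_width + gap) + gap
    rects = []
    for index, value in enumerate(values):
        bar_height = round(value / peak * (height - 20))
        x = gap + index * (bar_width + gap)
        y = height - bar_height  # SVG's y axis points downward
        rects.append(
            f'<rect x="{x}" y="{y}" width="{bar_width}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">' + "".join(rects) + "</svg>"
    )


svg = bar_chart_svg([3, 7, 5])
print(svg)
```

In a Colab cell, the resulting string can be rendered by passing it to IPython.display.HTML, or pasted below a %%HTML line.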

%%time

We should always start our exploration with small samples.

Before scaling to production volume, it makes sense to measure how long the algorithm takes to run. We can then optimize it and scale it for faster analytics on a large production dataset.
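
%%time reports the time for a whole cell. When you want to time a single function on a small sample and keep the number for later comparison, plain Python can do something similar. This is a sketch only: the helper timed and the toy workload sum_of_squares are assumptions made for this example.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def sum_of_squares(numbers):
    """Toy workload standing in for a real analysis step."""
    return sum(n * n for n in numbers)


# Time the step on a small sample before trying the full dataset.
small_sample = range(10_000)
total, seconds = timed(sum_of_squares, small_sample)
print(f"small sample took {seconds:.6f} s")
```

Repeating the measurement at a few sample sizes gives a rough sense of how the runtime grows before you commit to production volume.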

And there are more magics to explore.

Build Interactive Form

Interactive forms enhance the presentation of your visualizations for clients. Use forms to give your clients a way to interact with the data.

To do this, start with the following syntax:

----Form starter----

#@title {}

----Form item----

variable = #@param {}

The form starter renders an interactive form in Google Colab.

The form items are intuitive. Just by adding #@param with its attributes, you can control the interaction. Google Colab injects the chosen values into your variables and, when the form is set to run automatically (run: "auto" in the #@title attributes), re-runs the cell every time you change a value.
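
As an illustration of the syntax, here is a small form cell. It is a sketch with made-up variable names (sample_size, column_name, show_summary) and made-up dropdown options; the annotations are ordinary Python comments, so the same cell also runs unchanged outside Colab.

```python
#@title Sample settings {run: "auto"}

# Slider: Colab rewrites the number on the left when the slider moves.
sample_size = 100  #@param {type: "slider", min: 10, max: 1000, step: 10}

# Dropdown: the list after #@param holds the allowed choices.
column_name = "price"  #@param ["price", "quantity", "discount"]

# Checkbox for a True/False option.
show_summary = True  #@param {type: "boolean"}

# Ordinary code below the form uses the injected values.
message = f"Sampling {sample_size} rows from column '{column_name}'"
if show_summary:
    print(message)
```

Keeping the form fields at the top of the cell and the logic underneath makes it easy to hide the code and show clients only the controls.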

Versatile and easy to build interactive forms

Store/Download/Insert to Github

There are 3 ways to export your work from Colab:

  1. Download it as .ipynb, .py, or even .html

  2. Store it in Google Drive and extract the share link

  3. Connect it directly to your GitHub repository: this is an easy way to develop your portfolio, since it uploads the notebook to your GitHub repository.

Really handy to develop your data scientist portfolios

Github Link Created

Reference : https://towardsdatascience.com
