Google Colab for Data Analytics
The Ultimate Cloud-Based Interactive Notebook for Data Scientists
I have too many Python installations on my laptop! I am confused.
It takes too much time to set up TensorFlow!
Sending Jupyter Notebook versions back and forth is a waste of time!
Have you ever wished for a ready-to-launch environment where you could kickstart your analysis directly, without installing libraries and dependencies?
In this tutorial, I would like to share with you an amazing product that I use regularly in my day job: Google Colab.
Google Colab (Colaboratory) is a data analysis tool that combines code, output, and descriptive text into one document (source)
Essentially, you get GPU and TPU processing in the cloud, already optimized for data analysis pipelines.
This lets you run deep learning directly with popular libraries such as Theano, TensorFlow, and Keras. You can also generate graphics and share analysis outputs just as you would with Google Docs and Sheets.
Interested? Let’s get started!!
Google Colab is Jupyter Notebook + Cloud + Google Drive
Colaboratory is a data analysis tool which combines code, output and descriptive text into one document (interactive notebook).
Colab provides GPUs and is free to use. By using Google Colab, you can:
Build your analytics products quickly in a standardized environment
Use popular DL libraries such as PyTorch and TensorFlow on the go
Share code & results within your Google Drive
Save copies and create playground modes for knowledge sharing
Run Colab in the cloud or on a local server with Jupyter
The free version of Colab is limited and its resources are not guaranteed; usage limits fluctuate frequently, which is part of how Colab is able to provide these resources for free.
The good news is that you can subscribe to Colab Pro or run a dedicated server on Google Cloud Platform. Both options are easy to set up, in about 20 minutes.
Feel free to use this Colab file if you want to follow along with this tutorial. Run it and discover how Colab simplifies your job as a data analyst/scientist.
With Colab, you can code a shareable interactive Python notebook in Google Drive. In this tutorial, we are going to work through 6 simple exercises to get our hands dirty with Google Colab.
Finding Google Colab
Open your Google Drive
Right-click and open Google Colab
Rename your notebook
Done! Enjoy your notebook.
Thankfully, Colab already provides numerous Python libraries in each runtime.
Occasionally, you might need additional libraries. In this case, install them with pip or apt package manager commands in Colab.
All you need to do is start the command with an exclamation mark (!).
This tells Colab that the line is not Python code but a command-line (shell) instruction. For example: !pip freeze or !pip install <DL Library>
Note that the free Google Colaboratory runtime is not persistent: it is reset whenever a session is terminated. This means you need to reinstall any extra libraries every time you connect to your Colab notebook.
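Because the runtime resets, it can help to keep a small setup cell at the top of the notebook that installs only what is missing. The sketch below is one way to do that with the standard library; `ensure_installed` and its `pip_names` mapping are hypothetical helpers (not part of Colab), and with the default `install=False` the function only reports which top-level modules cannot be imported.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(modules, pip_names=None, install=False):
    """Return the modules that cannot be imported; optionally pip-install them.

    modules   -- top-level importable module names, e.g. ["sklearn", "xgboost"]
    pip_names -- optional mapping from module name to pip package name
                 (they differ for some packages, e.g. sklearn -> scikit-learn)
    install   -- if True, run `python -m pip install` for the missing ones
    """
    pip_names = pip_names or {}
    # find_spec returns None when a top-level module is not importable
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    if install and missing:
        packages = [pip_names.get(m, m) for m in missing]
        # Same effect as `!pip install ...`, but via the current interpreter
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
    return missing


if __name__ == "__main__":
    # Report only; pass install=True in a Colab setup cell to actually install
    print(ensure_installed(["json", "sklearn"], pip_names={"sklearn": "scikit-learn"}))
```

Anything installed this way disappears again when the runtime resets, so the cell needs to run at the start of every session.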
To see which libraries are already available, use !pip freeze
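If you prefer to check from Python itself, the standard library's `importlib.metadata` can produce a rough equivalent of `!pip freeze`. This is a sketch, not a replacement: the output format differs from pip's for editable or VCS installs, and pip hides a few of its own packages by default while this lists everything it finds. `installed_packages` is a hypothetical helper name.

```python
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings, roughly like `pip freeze`."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    for line in installed_packages():
        print(line)
```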
Note that top deep learning libraries, including Keras, TensorFlow, and Theano, are already installed and ready to use, so you do not need to install them again and again.
Run them as usual in a Python notebook: import the libraries and use them right away. You do not need to add %matplotlib inline to display visualizations in Colab, because this is already configured in the Colab runtime.
Visualization from Colab Runtime
The shortcuts are similar to Jupyter Notebook's, with an additional CTRL/CMD + M as the prefix to the Jupyter Notebook shortcuts.
To see the full shortcut list, press CTRL/CMD + M + H.
Creating and running new cells
CTRL/CMD + M + A → Create a cell above the selection
CTRL/CMD + M + B → Create a cell below the selection
CTRL/CMD + M + M → Toggle to Markdown mode
CTRL/CMD + M + Y → Toggle to code edit mode
CTRL/CMD + M + D → Delete cells
There are also similar shortcuts for running cells:
SHIFT + ENTER → Run and move to the cell below (adding one if it does not exist)
ALT + ENTER → Run and insert a cell below
CTRL + ENTER → Run and stay at the selected cell
Colab and IPython both have “magics”, which allow you to mix Python with other languages (such as shell or Dremel SQL).
These are some of the magics which I use frequently to improve my efficiency.
One important caveat: cell magics (those starting with %%) must be the first non-comment line in a cell. This means you can't import and use a cell magic in a single cell.
Shell magic is useful for viewing the current server's environment specs. To get better server specs, you need to launch a Google Cloud Platform (GCP) Virtual Machine (VM) and redirect the connection to the Colab runtime (port forwarding).
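In a notebook, the shell version is a line prefixed with ! such as !nvidia-smi (on a GPU runtime) or !cat /proc/meminfo. As a portable alternative that also runs outside Colab, the sketch below gathers similar basic specs in plain Python with only the standard library. It swaps the shell commands for Python calls, so treat it as a rough equivalent; `server_specs` and `total_ram_gb` are hypothetical helper names.

```python
import os
import platform
import shutil


def total_ram_gb():
    """Physical RAM in GB on POSIX systems, or None where sysconf is unavailable."""
    try:
        return round(os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9, 1)
    except (AttributeError, ValueError, OSError):
        return None


def server_specs(path="/"):
    """Collect basic specs of the machine the notebook runtime is running on."""
    total, _used, free = shutil.disk_usage(path)
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),  # logical CPUs; may be None if unknown
        "ram_gb": total_ram_gb(),
        "disk_total_gb": round(total / 1e9, 1),
        "disk_free_gb": round(free / 1e9, 1),
    }


if __name__ == "__main__":
    for key, value in server_specs().items():
        print(f"{key}: {value}")
```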
Running HTML in Colab is handy for visualizations that are not available in Python libraries. It also lets us work directly with web technologies such as HTML and CSS. In this case, we use SVG to draw scalable vector graphics.
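To illustrate the idea, the sketch below builds a minimal SVG bar chart as a string in plain Python. The function name `bar_chart_svg`, its layout parameters, and the sample values are assumptions made up for this example; it expects a non-empty list of non-negative numbers.

```python
def bar_chart_svg(values, bar_width=40, height=200, gap=10):
    """Build a minimal SVG bar chart string from a non-empty list of non-negative numbers."""
    peak = max(values) or 1  # avoid dividing by zero when all values are 0
    width = len(values) * (bar_width + gap) + gap
    bars = []
    for i, v in enumerate(values):
        h = v / peak * (height - 20)  # scale to the tallest bar, leaving a top margin
        x = gap + i * (bar_width + gap)
        y = height - h  # SVG's y axis points down, so bars start from the bottom
        bars.append(
            f'<rect x="{x}" y="{y:.1f}" width="{bar_width}" height="{h:.1f}" fill="steelblue"/>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(bars)
        + "</svg>"
    )


if __name__ == "__main__":
    print(bar_chart_svg([3, 7, 5]))
```

In Colab, you can render the resulting string with IPython's display helpers, for example display(HTML(svg)) after importing HTML and display from IPython.display, or paste the SVG markup into a %%html cell.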
We should always start our exploration with small samples.
Before scaling up to our production volume, it makes sense to measure how long the algorithm runs. We can then optimize it and scale it for faster analytics on a large production dataset.
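In a notebook, IPython's %time and %timeit line magics (and their %%time / %%timeit cell versions) handle this kind of measurement. The sketch below does the same with the standard library's `timeit`, timing a function on growing slices of the data so you can see how runtime scales before moving to the full dataset. `dedupe_sorted` is a hypothetical stand-in for your own algorithm, and the random data is synthetic.

```python
import random
import timeit


def dedupe_sorted(values):
    """Toy workload: sort the values and drop duplicates."""
    return sorted(set(values))


def time_on_samples(func, data, sizes, repeat=3):
    """Return {sample_size: best time in seconds} for func applied to data[:size]."""
    results = {}
    for size in sizes:
        sample = data[:size]
        # Best of `repeat` single runs; the minimum is least affected by noise
        results[size] = min(timeit.repeat(lambda: func(sample), number=1, repeat=repeat))
    return results


if __name__ == "__main__":
    data = [random.randint(0, 10**6) for _ in range(200_000)]
    for size, seconds in time_on_samples(dedupe_sorted, data, [1_000, 10_000, 100_000]).items():
        print(f"{size:>7} rows: {seconds * 1000:.2f} ms")
```

If the time grows much faster than the sample size, that is a sign to optimize before scaling up.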
And there are many more magics.
Interactive forms enhance presentations and visualizations for your clients. Use forms to give your clients a way to interact with the data.
To do this, start your cell with the following syntax:
Form starter: #@title <form title>
Form item: variable = <value> #@param {}
The form starter renders an interactive form in Google Colab.
The form items are intuitive. Just by adding #@param with its attributes, you control how users interact with the cell. Google Colab injects the values into your variables and can then rerun the cell every time you change them.
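Here is a small sketch of a form cell. The #@title line and the #@param annotations (a dropdown list, a number field, and a slider) follow Colab's form syntax; outside Colab they are ordinary comments, so the cell still runs as plain Python. The field names, the `filter_rows` helper, and the rows of data are all made up for illustration.

```python
#@title Revenue filter (hypothetical example form)
# In Colab, the #@title line turns this cell into a form and each #@param
# comment renders a widget; elsewhere these are just comments.
region = "APAC"  #@param ["APAC", "EMEA", "AMER"]
min_revenue = 1000  #@param {type:"number"}
top_n = 2  #@param {type:"slider", min:1, max:5, step:1}

# Made-up toy rows purely for illustration: (customer, region, revenue)
ROWS = [
    ("Acme", "APAC", 5200),
    ("Globex", "EMEA", 3100),
    ("Initech", "APAC", 800),
    ("Umbrella", "APAC", 1500),
    ("Hooli", "AMER", 4700),
]


def filter_rows(rows, region, min_revenue, top_n):
    """Keep rows for one region at or above a revenue threshold, largest first."""
    hits = [r for r in rows if r[1] == region and r[2] >= min_revenue]
    return sorted(hits, key=lambda r: r[2], reverse=True)[:top_n]


for customer, _, revenue in filter_rows(ROWS, region, min_revenue, top_n):
    print(f"{customer}: {revenue}")
```

With the default values above, the cell prints the two largest APAC customers at or above 1000 in revenue: Acme and Umbrella.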
Versatile and easy to build interactive forms
There are 3 ways to export your notebook outside of Colab:
Download to .ipynb, .py, or even .html
Store in Google Drive and extract the share links
Connect it directly to your GitHub repository: this uploads the notebook to your GitHub repository and is an easy way to develop your portfolio.
Really handy for developing your data science portfolio
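The .ipynb file you download is JSON (the Jupyter nbformat), so it is easy to work with outside Colab. As a rough sketch of what a .py export involves, the hypothetical helper below concatenates the code cells of a notebook into one script. Unlike a proper converter, it copies lines starting with ! or % unchanged, so those will not run as plain Python.

```python
import json


def notebook_to_script(ipynb_text):
    """Concatenate the code cells of an .ipynb (nbformat 4 JSON) into a script."""
    nb = json.loads(ipynb_text)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__":
    # "my_notebook.ipynb" is a hypothetical file name
    with open("my_notebook.ipynb", encoding="utf-8") as f:
        print(notebook_to_script(f.read()))
```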
Reference: https://towardsdatascience.com