Collaborative Python Coding with Google Colab

Google Colab provides a hosted Jupyter Notebook service which makes collaborative coding with Python much easier. Here is a short introduction on how to use it with Google Drive and GitHub. A guest post by Francesco Ottaviani from Università degli Studi di Urbino “Carlo Bo”.

Jupyter Notebook is a simplified notebook authoring application, part of Project Jupyter, a large project centered around the goal of providing tools for interactive computing with computational notebooks. A computational notebook is a shareable document that combines computer code, plain language descriptions, data, 3D model visualization, charts, graphs, figures, and interactive controls. A notebook, along with an editor like Jupyter Notebook, provides a fast interactive environment for prototyping and explaining code, exploring and visualizing data, and sharing ideas with others.

Google Colab offers the possibility to run Python code in a ready-made environment, without installing software on a personal computer, and supports collaborative coding. It is also possible to use GitHub with Google Colab via Google Drive. We will discuss GitHub in this post; a complete guide is available here. Colab is a useful tool for all kinds of scientists.

We often need to access a data file or a file containing a function. Google Colab allows files to be uploaded, but they are deleted when the session is closed. To avoid this problem, users typically upload files to Google Drive and access them from Colab. In this post we will explore how to link Google Colab with Google Drive in order to read data files and files containing functions stored in a Google Drive folder, and how to use GitHub with Google Colab.

Load a data file

As an example, we use the file agedepth_1.txt provided with the book Python Recipes for Earth Sciences (Trauth, 2022). To load this data file from Google Drive, we first need to connect the two platforms, which can be done using the following code. Note that the code connects to the Drive of the account signed in to Colab, so you need to use the same account for both services.

from google.colab import drive
drive.mount('/content/gdrive')

When you connect the two platforms, a new folder called Colab Notebooks will appear in Google Drive; you can upload the agedepth_1.txt file there. Using this folder is not mandatory, since any folder can be linked. Once the data file has been uploaded to the folder, we can load it in Google Colab using the following code.

import numpy as np
agedepth = np.loadtxt('/content/gdrive/MyDrive/Colab Notebooks/agedepth_1.txt')

Now we can access the data contained in the file, and the file remains stored permanently in Google Drive.
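Since any Drive folder can be linked, it can be convenient to build such paths in one place rather than typing them by hand. The following is only a minimal sketch: the helper name drive_path and the constant MOUNT_POINT are hypothetical, and the sketch assumes the mount point used in drive.mount() above and that the user's files appear below MyDrive.

```python
import posixpath

MOUNT_POINT = "/content/gdrive"  # assumed to match the drive.mount() call


def drive_path(*parts, mount_point=MOUNT_POINT):
    """Build a path below 'MyDrive' of a mounted Google Drive (hypothetical helper)."""
    return posixpath.join(mount_point, "MyDrive", *parts)


data_file = drive_path("Colab Notebooks", "agedepth_1.txt")
print(data_file)
# /content/gdrive/MyDrive/Colab Notebooks/agedepth_1.txt
```

In Colab, the resulting string could then be passed to np.loadtxt() in place of the literal path.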

Load a file containing a function

In this part we will read the function canc() from the file canc.py. This function is an adaptive filter running for 20 iterations. The output variables from canc() are the filtered primary signal z, the extracted noise e, the mean squared error mer for the number of iterations it performed with step size u, and the filter weights w for each data point in yn1 and yn2. This function is used in Chapter 6.9 of the PRES book.

First, we upload the file to the Google Drive folder named Colab Notebooks. Now we can access the folder from Google Colab using the following code.

%reset -f
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
# The backslash escapes the space in the folder name for the %run magic.
libdir = r"/content/gdrive/MyDrive/Colab\ Notebooks/"

Then we can read the canc.py file.

%run {libdir}canc.py

without having to use

from canc import canc

and finally we can call the function defined in canc.py directly as canc().
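Roughly speaking, %run executes the file and makes the names it defines available in the notebook. Outside IPython, a comparable effect can be sketched with the standard library module runpy. The file and function used here (demo_filter.py and scale) are made-up stand-ins written to a temporary folder; they are not the actual canc.py from PRES.

```python
import os
import runpy
import tempfile

# Write a small stand-in module to a temporary folder (hypothetical content,
# playing the role of canc.py in the Drive folder).
libdir = tempfile.mkdtemp()
module_path = os.path.join(libdir, "demo_filter.py")
with open(module_path, "w") as f:
    f.write("def scale(values, factor):\n")
    f.write("    return [factor * v for v in values]\n")

# run_path executes the file and returns the names it defined as a dict.
namespace = runpy.run_path(module_path)
scale = namespace["scale"]

print(scale([1, 2, 3], 2))  # [2, 4, 6]
```

Unlike an import, this approach does not require the folder to be on sys.path, which is one reason running the file directly can be convenient for files kept on Drive.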

Use GitHub with Google Colab and Google Drive

Here we will discuss how to use GitHub in Google Colab. GitHub is a platform that allows users to collaborate on the same code and keep track of code changes without compromising the integrity of the original project. It is possible to use GitHub in Google Colab, with Google Drive storing the user’s data. To do this, we first need a new blank notebook in Colab, so click the New Notebook button in the menu on the website. Now we need to mount Google Drive.

from google.colab import drive
drive.mount('/content/gdrive')

Once Google Drive is mounted, it is possible to navigate through its directories. We create a folder named Github in the Drive and change to it from Colab using the following command.

%cd /content/gdrive/MyDrive/Github

Now we need to generate a GitHub token to access the GitHub API. To do that, we go to the GitHub website and log in to our account. Under Settings, Developer settings, clicking on Personal access tokens makes it possible to generate a new token: click on the Generate new token button in the top right corner of the page, select repo in the Select scopes section, and click on the Generate token button. The page will then display the token.

Create a new git repository

Before starting, it is necessary to create a new repository in our account on the GitHub website; call the repository “try”. We then need to initialize git in Google Colab. To do this, in the Github directory, use the following command to initialize a repository called “try”.

!git init try

Then move to that directory and list its files and folders.

%cd try
%ls -a

You should be able to see .git/.
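The same check can be done from Python instead of reading the directory listing. This is a minimal sketch assuming a plain repository as created by git init, where .git is a directory; the helper name is_git_repository is hypothetical, and the demonstration creates the .git folder by hand in a temporary directory only to mimic what git init produces.

```python
import os
import tempfile


def is_git_repository(path):
    """Return True if 'path' contains a .git directory (hypothetical helper)."""
    return os.path.isdir(os.path.join(path, ".git"))


# Demonstration in a temporary folder standing in for the "try" directory.
with tempfile.TemporaryDirectory() as workdir:
    before = is_git_repository(workdir)
    os.mkdir(os.path.join(workdir, ".git"))  # mimics the effect of 'git init'
    after = is_git_repository(workdir)

print(before, after)  # False True
```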

Local commits

Every time you modify files in the Google Drive folder, those changes are detected by git, so it is possible to trace them and save the new version of each file. To check the changes in the folder we use the command:

!git status

To stage the new changes for the next commit we use

!git add .

where the dot “.” means all new changes; if you want to add only one file, write its file name instead. Now you can commit the work using the following command, replacing <message> with a short sentence that explains the changes.

!git commit -m "<message>"

Upload to GitHub

To be able to push to GitHub from Google Colab we need to create three new variables:

    • username – your GitHub username;
    • repository – the created repository;
    • git_token – the token generated before.

These variables are needed in the command that adds the remote repository on GitHub, here called “try”.

!git remote add <remote-name> https://{git_token}@github.com/{username}/{repository}.git

To push to GitHub you can use the following command. In place of <remote-name> you can use “try” in this example, and in place of <branch-name> the name of the branch you want to push.

!git push -u <remote-name> <branch-name>
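The remote URL in the git remote add command is simply assembled from the three variables. As a minimal sketch, the assembly can be written as a small Python function; the name build_remote_url is hypothetical, the values are placeholders, and percent-encoding each part with urllib.parse.quote is an added precaution that is not part of the original commands.

```python
from urllib.parse import quote


def build_remote_url(username, repository, git_token):
    """Assemble an HTTPS remote URL that embeds a personal access token."""
    return (
        f"https://{quote(git_token, safe='')}@github.com/"
        f"{quote(username, safe='')}/{quote(repository, safe='')}.git"
    )


# Placeholder values only; a real token should not be stored in a shared notebook.
url = build_remote_url("some-user", "try", "TOKEN_PLACEHOLDER")
print(url)
# https://TOKEN_PLACEHOLDER@github.com/some-user/try.git
```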

If you want to clone a repository from GitHub you can use the code below, using your personal token as git_token.

!git clone https://{git_token}@github.com/{username}/{repository}

Now you can enter the folder by writing

%cd {repository}

and commit the changes as shown before.

In summary, this post describes how Google Colab, Google Drive, and GitHub can be combined into an environment for efficient Python programming without local software installation. This workflow keeps projects shareable and safe, and supports collaboration and version control.

References

Trauth, M.H. (2022) Python Recipes for Earth Sciences – First Edition. Springer International Publishing, 403 p., Supplementary Electronic Material, Hardcover, ISBN 978-3-031-07718-0. (PRES)