Google Summer of Code '18 with GraphSpace, NRNB

Friday, July 13, 2018

Writing Documentation on Read the Docs - Part 2

In the previous post, Writing Documentation on Read the Docs - Part 1, we installed Sphinx and configured it to use Markdown and sphinx-rtd-theme. In this post, we will see how to write docs and add content to them.

Writing the docs

Make sure you have set up your project to use Markdown and have configured conf.py to use sphinx-rtd-theme. If not, please have a look at Part 1 of this blog.

We will be using Atom IDE to write our documentation, though you can use any other text editor such as Notepad or Sublime Text.

Create a new page and add it to your project:

Add a new file in your docs/ directory and give it the extension .md. In the example shown below, I have created a file named Writing-Documentation.md.

Add new file in docs/ folder and give it extension *.md

To make this page appear in the Contents section of your project, add an entry in index.rst as shown in the figure above. The name must be exactly the same as your filename without the '.md' extension, and it is case-sensitive.
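As a small illustration of the naming rule above, the entry is simply the filename with its extension removed and its case left untouched. The helper name toctree_entry below is hypothetical and only used for this sketch.

```python
from pathlib import Path


def toctree_entry(filename):
    """Return the name to list in index.rst for a given docs file."""
    # Path.stem drops only the final extension and keeps the original case.
    return Path(filename).stem


print(toctree_entry("Writing-Documentation.md"))
```

For the file used in this post, the entry to add to index.rst is Writing-Documentation.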

Basic Formatting

You can use basic formatting options in Markdown, such as headers, lists, images and links.
GitHub has a list of common formatting options in its Markdown cheatsheet.

Markdown Cheatsheet - Img Courtesy guides.github.com

Adding images and gifs

All images and gifs need to be placed inside the docs/_static directory of your project.
Once the files have been placed in the _static directory, we can link the images in the documentation using the following syntax -

![CPU Close-up](_static/cpu-close-up.jpg)

Here I have added a close-up photograph of a modern Intel CPU.

Adding image in your documentation

Image of a Modern-day CPU

Adding Permalinks to Headers

Headers are automatically numbered by Sphinx, which makes it easy to divide pages into sections and sub-sections.

The topmost header (h1) corresponds to # in Markdown. To define sub-sections, add more headers using double hashes (##), and further sub-sections using ### and so on.

To add a permalink to a sub-section, use the Markdown link syntax and write the name of the heading inside the parentheses.

So if you have a header 'Introduction to Read the Docs' in your page, then to add a permalink to it you need to write

- [Introduction to Read the Docs](#introduction-to-read-the-docs)

NOTE: The text inside the parentheses must always be in lowercase, with words separated by hyphens (-). The text must always begin with a single #, whatever the level of the header you are linking to (h2, h3 or any other).
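The rule in the note can be sketched as a small Python function. This is only an approximation for plain headings made of letters and digits; the function name heading_anchor is made up for this example, and the Markdown renderer may treat punctuation differently.

```python
import re


def heading_anchor(heading):
    """Build the '#...' link target for a heading: lowercase, hyphen-separated."""
    # Keep runs of letters and digits; spaces and punctuation act as separators.
    words = re.findall(r"[a-z0-9]+", heading.lower())
    return "#" + "-".join(words)


heading = "Introduction to Read the Docs"
# Prints the Markdown list item that links to the heading.
print("- [{}]({})".format(heading, heading_anchor(heading)))
```

For the header used above, this produces the same link that was written by hand.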

Building your project as HTML Document

To build your project as an HTML document, run the following command from the docs/ directory of your project -

$ make html

All HTML files are generated in the docs/_build/html directory of your project.

You can view your generated documentation by opening index.html in the browser of your choice.

The image on the left shows what your docs/_build/html directory looks like after building your project.
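If you prefer to open the result from a script, here is a minimal sketch using only the Python standard library. It assumes the default docs/_build/html layout described above; the function name built_index is hypothetical.

```python
from pathlib import Path
import webbrowser


def built_index(docs_dir="docs"):
    """Return the expected location of index.html after 'make html'."""
    return Path(docs_dir) / "_build" / "html" / "index.html"


if __name__ == "__main__":
    index = built_index()
    if index.exists():
        # Open the generated page in the default browser.
        webbrowser.open(index.resolve().as_uri())
    else:
        print("{} not found; run 'make html' first".format(index))
```

Run it from the root of your project, or pass a different docs directory to built_index().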

Monday, July 9, 2018

Writing Documentation on Read the Docs - Part 1

Read the Docs is an Open-Source Project which makes hosting Documentation for the Open-Source Community easy and hassle-free. Read the Docs supports formatting documentation in reStructuredText and Markdown. It is possible to use reStructuredText and Markdown together in the same sphinx project.
The user manual for GraphSpace is hosted on Read the Docs. It uses Markdown and a custom theme called sphinx-rtd-theme.
In this tutorial, we will find out how to build docs using Markdown and the sphinx-rtd-theme.

GraphSpace uses the classic sphinx-rtd-theme

Installing Sphinx and other dependencies

Make sure you have the following prerequisites installed in your system -

*pip is already installed if you are using Python 2 >=2.7.9 or Python 3 >=3.4 downloaded from python.org

Now, we are ready to set up Sphinx. Sphinx is a tool that makes it easy to create beautiful documentation. Assuming you have Python already, install Sphinx:

$ pip install sphinx sphinx-autobuild

Install recommonmark to use Markdown with Sphinx.

$ pip install recommonmark

Install sphinx-rtd-theme. We will add this theme later in the configuration file.

$ pip install sphinx-rtd-theme

Create a directory inside your project to hold your docs:

$ cd /path/to/project
$ mkdir docs

Next, navigate to the docs directory just created and run sphinx-quickstart:

$ cd docs
$ sphinx-quickstart

Configuring sphinx

The quickstart will walk you through the basic configuration. You can accept the default values or change any specific setting you want. When it is done, a default project structure will be created along with these 2 files - index.rst and conf.py.

In order to enable Markdown support in our local build, we need to add the following lines to conf.py
and comment out any existing source_suffix in the config file -

from recommonmark.parser import CommonMarkParser

source_parsers = {
    '.md': CommonMarkParser,
}

source_suffix = ['.rst', '.md']

Next, we need to add the sphinx-rtd-theme we installed earlier to our project. To do this, add the following lines to the conf.py file -

import sphinx_rtd_theme

html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

Now your project has been successfully set up to use Markdown and sphinx-rtd-theme.
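A note for readers on newer versions: the source_parsers setting used above has been deprecated in later Sphinx releases. The excerpt below is a sketch of an equivalent configuration, assuming Sphinx 1.8 or later and recommonmark 0.5 or later. Check the versions you have installed before relying on it.

```python
# conf.py (excerpt) - a sketch assuming Sphinx 1.8+ and recommonmark 0.5+

# Registering recommonmark as an extension takes the place of source_parsers.
extensions = ['recommonmark']

# Map each file suffix to the parser that should handle it.
source_suffix = {
    '.rst': 'restructuredtext',
    '.md': 'markdown',
}

# The theme is selected by name.
html_theme = 'sphinx_rtd_theme'
```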

Check part 2 of this blog to find out how to add content and images to your project.

Sunday, June 17, 2018

Alembic migrations in Django with SQLAlchemy Database Toolkit

In order to understand Alembic migrations in Django, we first need to understand why we would need SQLAlchemy when Django already provides its own ORM. (If you only want to read about migrations using Alembic, please skip this section.)

Why use SQLAlchemy instead of Django's ORM?

Django ORM uses what is called the Active Record implementation: each record in the database is directly mapped to a Python object in the code and vice-versa. The Django ORM allows developers to quickly deploy business-level data relationships, and basic CRUD (Create, Read, Update and Delete) functions are very easy to implement with it.
SQLAlchemy, on the other hand, uses the Data Mapper implementation and allows writing complex queries within the application. SQLAlchemy makes it possible to write complex join queries between multiple tables based on foreign key relationships, and data filters can also be applied to the tables involved in the joins. It is possible to write such complex queries in Django ORM as well, but it requires some amount of coding to manipulate the Python objects to get the desired result (in this case, complex joins).
The project GraphSpace (under NRNB - National Resource for Network Biology), which I'm working on this summer as a GSoC '18 student, uses SQLAlchemy to carry out all the CRUD operations as well as many complex joins & filters.

What is Alembic?

Alembic is a lightweight database migration tool for usage with the SQLAlchemy Database Toolkit for Python. - Official description

If you are familiar with Django ORM, you might have noticed the migrations directory, which maintains a list of data model changes made over the entire development period. These migration files are used internally by Django to apply data model changes to the database. Alembic is the migration tool for SQLAlchemy. Similar to Django migrations, it maintains a list of data model changes. Alembic also supports upgrading and downgrading between different states of the data model using the migration files.

Creating an Alembic script

Creating a Alembic script

[I have assumed that the Alembic environment has been set up and that you have switched into the virtual environment you might be using for your project.]
To create a new Alembic script we use the alembic revision command -

$ alembic revision -m "create user table"
Generating /path/to/yourproject/alembic/versions/840db85c5bce_create_user_table.py...done;

The new file 840db85c5bce_create_user_table.py looks like this -

"""create_user_table

Revision ID: 840db85c5bce
Revises: f8f6ba9712df
Create Date: 2018-06-23 02:27:29.434000

"""
from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = '840db85c5bce'
down_revision = 'f8f6ba9712df'
branch_labels = None
depends_on = None


def upgrade():
    pass


def downgrade():
    pass

We will update the upgrade() and downgrade() functions to match the set of changes in our database.

CRUD changes in migration script

Let's say we have defined a table named app_version with the following columns:
Table app_version {
    id TYPE Integer, Primary Key,
    name TYPE String, Constraints[Not-Null, Unique],
    app_id TYPE Integer, Constraints[Not-Null], FOREIGN KEY references app.id,
    owner_email TYPE String, Constraints[Not-Null], FOREIGN KEY references user.email,
    json_data TYPE String, Constraints[Not-Null],
    description TYPE String
}

There are two foreign key references:

app_id references the id column of the app table
owner_email references the email column of the user table

The upgrade() and downgrade() functions for the above table will look as follows -

def upgrade():
    op.create_table(
        'app_version',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('name', sa.String, nullable=False, unique=True),
        sa.Column('app_id', sa.Integer, nullable=False),
        sa.Column('owner_email', sa.String, nullable=False),
        sa.Column('json_data', sa.String, nullable=False),
        sa.Column('description', sa.String, nullable=True),
    )
    op.add_column('app_version', sa.Column('created_at', sa.TIMESTAMP, server_default=sa.func.current_timestamp()))
    op.add_column('app_version', sa.Column('updated_at', sa.TIMESTAMP, server_default=sa.func.current_timestamp()))

    # Create new index
    op.create_index('app_version_idx_name', 'app_version', ['name'], unique=True)

    # Add new foreign key references
    # (adjacent string literals are joined into one SQL statement)
    op.execute(
        'ALTER TABLE app_version ADD CONSTRAINT app_version_id_fkey '
        'FOREIGN KEY (app_id) REFERENCES "app" (id) '
        'MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;'
    )
    op.execute(
        'ALTER TABLE app_version ADD CONSTRAINT app_version_owner_email_fkey '
        'FOREIGN KEY (owner_email) REFERENCES "user" (email) '
        'MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;'
    )


def downgrade():
    # Dropping the table also removes its index and foreign keys
    op.drop_table('app_version')

Now, let us break down the code to understand what is actually happening.

Adding the created_at and updated_at columns

op.add_column('app_version', sa.Column('created_at', sa.TIMESTAMP, server_default=sa.func.current_timestamp()))
op.add_column('app_version', sa.Column('updated_at', sa.TIMESTAMP, server_default=sa.func.current_timestamp()))

The lines above create the created_at and updated_at columns (which are created by default when using Django ORM). We have also created an index on the name column of the table, as follows:

Creating Index

op.create_index('app_version_idx_name', 'app_version', ['name'], unique=True)

If we need to create an index over multiple columns, we can just add the column names to the list passed as the 3rd parameter. For example, ['name', 'id'] will create an index over the name and id columns.

Creating Foreign Key References and other SQL operations

For more complex operations like adding foreign keys with one or many constraints we can use the op.execute() function and pass a custom SQL statement. In our case it looks like -

op.execute(
    'ALTER TABLE app_version ADD CONSTRAINT app_version_id_fkey '
    'FOREIGN KEY (app_id) REFERENCES "app" (id) '
    'MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;'
)
op.execute(
    'ALTER TABLE app_version ADD CONSTRAINT app_version_owner_email_fkey '
    'FOREIGN KEY (owner_email) REFERENCES "user" (email) '
    'MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;'
)

Here, we have added foreign key references with the following constraints - ondelete = CASCADE & onupdate = CASCADE. By passing custom SQL statements to the op.execute() function, we can execute complex operations on our table.

Wednesday, June 13, 2018

GSoC '18 with National Resource for Network Biology : GraphSpace

This year, 1264 students from 62 countries around the world were selected to participate in Google Summer of Code. My proposal for GraphSpace under the National Resource for Network Biology (NRNB) was accepted on April 23, 2018, and I joined the select group of GSoC students.

This summer, I will be implementing GIT for Graphs for GraphSpace, an in-application version control and sharing feature, under the guidance of my mentors, Aditya Bharadwaj & Jing Cui.

The coding period started on May 14th, 2018 and I've made good progress so far. My 1st milestone was to implement the 'Fork' feature in GraphSpace. Fork allows users to create a copy of a public or shared graph for themselves. I presented the 'Fork' feature for the first evaluation, and I received good feedback from my mentors & NRNB. Currently, I'm working on the Version Control feature, and I will update my blog again before the 2nd milestone.