Google Summer of Code '18 with GraphSpace, NRNB: Alembic migrations in Django with SQLAlchemy Database Toolkit

In order to understand Alembic migrations in Django, we first need to understand the need of SQLAlchemy when Django already provides its own ORM (If you only want to read about migrations using Alembic then please skip this section).

Why use SQLAlchemy instead of Django's ORM?

Django ORM uses what is called Active Record implementation, what it means is that each record in the database is directly mapped to an python object in the code and vice-versa. The Django ORM allows developers to quickly deploy business-level data relationships, basic CRUD (Create, Read, Update and Delete) functions are very easy to implement using Django ORM.
SQLAlchemy on the other uses Data Mapper implementation and allows writing complex queries within the application. SQLAlchemy makes it possible to write complex join queries between multiple tables based on foreign key relationships, data filters can also be applied on the tables involved in the joins. Now, it is possible to write such complex queries in Django ORM as well, but it requires some degree coding to manipulate the python objects to get the desired result (in this case complex joins).
The project GraphSpace (under NRNB - National Resource for Network Biology) which I'm working on this summer as a GSOC '18 Student makes use of SQLAlchemy to carry out the all the CRUD operations as well as many complex joins & filtering.

What is Alembic?

Alembic is a lightweight database migration tool for usage with the SQLAlchemy Database Toolkit for Python. - Official, description.

If you are familiar with Django ORM then you might have noticed the Migrations directory which maintains a list of data model changes made over the entire development period. These migration files are used internally by Django to apply data model changes in the database. Alembic is the migration tool for SQLAlchemy, similar to Django migrations it also maintains a list of data model changes. Alembic also supports upgrading and downgrading between different states of data model using the migration files.

Creating a Alembic script

[I have assumed that the Alembic environment has been set up and that you have switched into the virtual environment you might be using for your project].
To create a new alembic script we use the alembic revision

$ alembic revision -m "create user table"
Generating /path/to/yourproject/alembic/versions/840db85c5bce_create_user_table.py...done;

The new file 840db85c5bce_create_user_table.py

"""create_user_table

Revision ID: 840db85c5bce
Revises: 
Create Date: 2018-06-23 02:27:29.434000

"""
from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = '840db85c5bce'
down_revision = 'f8f6ba9712df'
branch_labels = None
depends_on = None


def upgrade():
    pass
 
def downgrade():
    pass

We will update the upgrade() and downgrade() functions to match the set of changes in our database.

CRUD changes in migration script

Lets say we have defined a table named app_version with the following columns :
Table app_version{
id TYPE Integer, Primary Key,
name TYPE String, Constraints[Not-Null, Unique],
version_id TYPE Integer, Constraints[Not-Null], FOREIGN KEY references app.id,
owner_email TYPE String, Constraints[Not-Null], FOREIGN KEY references user.email,
json_data TYPE String, Constraints[Not-Null],
description TYPE String
}

There are two foreign key references

version_id references id column of the app table
owner_email references email column of the user table

The upgrade() and downgrade() function for the above table will look as follows -

upgrade():
    op.create_table(
  'app_version',
  sa.Column('id', sa.Integer, primary_key=True),
  sa.Column('name', sa.String, nullable=False, unique=True),
  sa.Column('app_id', sa.Integer, nullable=False),
  sa.Column('owner_email', sa.String, nullable=False),
  sa.Column('json_data', sa.String, nullable=False),  
  sa.Column('description', sa.String, nullable=True),
 )
op.add_column('app_version', sa.Column('created_at', sa.TIMESTAMP, server_default=sa.func.current_timestamp()))
op.add_column('app_version', sa.Column('updated_at', sa.TIMESTAMP, server_default=sa.func.current_timestamp()))

# Create New Index
op.create_index('app_version_idx_name', 'app_version', ['name'], unique=True)

# Add new foreign key reference
op.execute('ALTER TABLE app_version ADD CONSTRAINT app_version_id_fkey 
            FOREIGN KEY (app_id) REFERENCES "app" (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;')
op.execute('ALTER TABLE app_version ADD CONSTRAINT app_version_owner_email_fkey 
            FOREIGN KEY (owner_email) REFERENCES "user" (email) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;')

downgrade():
    op.drop_table('app_version')

Now, let us breakdown the code to understand what is actually happening

Adding the created_at and updated_at columns

op.add_column('app_version', sa.Column('created_at', sa.TIMESTAMP, server_default=sa.func.current_timestamp()))
op.add_column('app_version', sa.Column('updated_at', sa.TIMESTAMP, server_default=sa.func.current_timestamp()))

The line above creates the created_at and updated_at columns (which are created by default when using Django ORM) We have created an Index on the name column of the table, as follows :

Creating Index

op.create_index('app_version_idx_name', 'app_version', ['name'], unique=True)

If we need to create an index over multiple column then we can just append the name of the column in the 3rd parameter like so - ['name', 'id'] will create an Index over columns Name and Id.

Creating Foreign Key References and other SQL operations

For more complex operations like adding foreign keys with one or many constraints we can use the op.execute() function and pass a custom SQL statement. In our case it looks like -

op.execute('ALTER TABLE app_version ADD CONSTRAINT app_version_id_fkey 
            FOREIGN KEY (app_id) REFERENCES "app" (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;')
op.execute('ALTER TABLE app_version ADD CONSTRAINT app_version_owner_email_fkey 
            FOREIGN KEY (owner_email) REFERENCES "user" (email) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;')

Here, we have added foreign key references with the following constraints - ondelete = CASCADE & onupdate = CASCADE By writing custom SQL statements to the op.execute() function we can execute complex operations on our table.

Google Summer of Code '18 with GraphSpace, NRNB

Sunday, June 17, 2018

Alembic migrations in Django with SQLAlchemy Database Toolkit

Why use SQLAlchemy instead of Django's ORM?

What is Alembic?

Creating a Alembic script

CRUD changes in migration script

Adding the created_at and updated_at columns

Creating Index

Creating Foreign Key References and other SQL operations

No comments:

Post a Comment

Writing Documentation on Read the Docs - Part 2

Search This Blog