2022-12-08

Level up your Python tooling - black, isort and other tools

Formatting and static analysis of Python code and its tooling. The lazy man’s approach to assuring Python code quality.

Based on this article I also did a presentation - Python (anti) Patterns 002 - Black, isort, bandit

Pipelines

What are they and why do we need them

Automating stuff? Pipelines to the rescue

When we care about the quality of our Python code, we usually care about things like formatting, consistent import patterns, security and keeping our standards up to date. If we want to do that automatically in our repo or in the cloud, we can use pipelines.

Pipelines are simply a set of steps that constitute our CI/CD process.

It’s more or less just a piece of code that does some steps for us. Usually a pipeline is defined as a YAML file that defines the steps/actions we want to take as part of our CI/CD process: analysing the code, checking its quality, formatting it, and building/deploying it.

In this presentation I’d like to focus on the steps related to automating the quality assurance process when developing Python apps.

The most commonly known tools for this in the cloud include: GitHub Actions, GitLab CI/CD, Bitbucket Pipelines, CircleCI, Azure DevOps.

Usually these are the things that fire up when we, for example, create a merge/pull request, push some code to the repo, or merge one branch into another. They trigger various checks, builds, tests and whatnot.

The flow is like so:

A trigger is received (e.g. a branch is pushed to the repo) -> the pipeline is fired -> various checks are made -> based on those checks the pipeline fails or succeeds
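The flow above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any of the CI services mentioned earlier work internally; `run_pipeline` is a made-up name, and the commands are harmless stand-ins for real checks such as a formatter or a security scan.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run (name, command) steps in order and stop at the first failure."""
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"step '{name}' failed with exit code {result.returncode}")
            return False
        print(f"step '{name}' passed")
    return True


# Stand-in commands so the sketch runs anywhere; a real pipeline would
# call the actual quality tools here instead.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

ok = run_pipeline([("format check", passing), ("import check", passing)])
broken = run_pipeline(
    [("format check", passing), ("security scan", failing), ("tests", passing)]
)
print(ok, broken)
```

The second run never reaches its "tests" step: a failed check fails the whole pipeline.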

Besides living up in the cloud, some parts of pipelines are, in my view, an integral part of local development too, mostly the parts related to quality control.

What makes good code?

Nowadays the trend in Python is to take care of certain things that, while not crucial, contribute over time to the project’s quality, readability and maintainability.

On a high level, in my book, any piece of Python code can use some of:

Consistent formatting
Ordered imports that are split by sections
Absolute imports
Use of modern syntax that complies with the latest standards
Lack of unused imports and variables
Security/vulnerability scans

Further on we will talk about how to handle this in Python.

Black

Few words on formatting and black

More often than not, in projects that are not so automated and could use some of dem good tooling, you can find people in pull requests arguing about formatting. How should the formatting be changed? Which style is better? Which one is more PEP 8 compliant?

It can be a nightmare that is as counterproductive as it gets.

To rid ourselves of such problems and have them handled for us, we use black. Black is a code formatter that, well, just formats the code for you. You can make black automatically format your code before you commit. This way you avoid any arguments about PEP 8 and the formatting preferences of reviewers/authors, and the whole project gets a consistent formatting pattern that is easier to read. The easier code is to read, the better. It’s the lazy man’s approach. If you know what to expect, you won’t be surprised. The less you have to take care of, the better.

def is_unique(
               s
               ):
    s = list(s
                )
    s.sort()


    for i in range(len(s) - 1):
        if s[i] == s[i + 1]:
            return 0
    else:
        return 1


if __name__ == "__main__":
    print(
          is_unique(input())
         )

Gets turned into this:

def is_unique(s):
    s = list(s)
    s.sort()

    for i in range(len(s) - 1):
        if s[i] == s[i + 1]:
            return 0
    else:
        return 1


if __name__ == "__main__":
    print(is_unique(input()))

Example from geeksforgeeks.org.
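A formatter is only trustworthy if it changes the layout and nothing else. The sketch below is not black itself; it uses only the standard library's `ast` module to check that the two versions of `is_unique` above parse to the same syntax tree, which is similar in spirit to the safety check black performs by default after reformatting. The variable names `before` and `after` are just for illustration.

```python
import ast

before = '''
def is_unique(
               s
               ):
    s = list(s
                )
    s.sort()


    for i in range(len(s) - 1):
        if s[i] == s[i + 1]:
            return 0
    else:
        return 1
'''

after = '''
def is_unique(s):
    s = list(s)
    s.sort()

    for i in range(len(s) - 1):
        if s[i] == s[i + 1]:
            return 0
    else:
        return 1
'''

# ast.dump() leaves out line and column numbers by default,
# so only the structure of the code is compared, not its layout.
same_structure = ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))
print(same_structure)
```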

' vs "

One thing worth noting is that Python as a language allows both ' and " to mark strings. By default, black prefers double quotes over single quotes. Why? Readability: the single quote is common in English text (as an apostrophe) and needs escaping every time it appears inside a single-quoted string, and a double quote is harder to mistake for another character.

And so on and so forth. One may argue here, but I stand united with the double quote crowd, as IMO it’s the better approach. Readability is king.
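To make the escaping argument concrete, here is the same sentence written with both quote styles. The sentence itself is made up for this example.

```python
# Single quotes force us to escape every apostrophe in the text.
single_quoted = 'It\'s the reviewer\'s call'

# Double quotes let ordinary English text through untouched.
double_quoted = "It's the reviewer's call"

# Both literals produce exactly the same string.
same_string = single_quoted == double_quoted
print(same_string)
```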

Isort

Have ya heard about imports sorting? It makes sense

Why you should sort your imports properly

The bigger the project we work on, the more stuff we usually import from other pieces of the code.

As time goes by, these imports can become messy. It’s often the case. Isort helps us with that by tidying up our imports: sorting them alphabetically, grouping them into sections and so on. I know this can look like a minor thing, but it’s these minor things that add up to overall code quality. Now look at the two snippets below: the first one is before isort, the second one is after it. Which one is more readable to you?

from my_lib import Object

import os

from my_lib import Object3

from my_lib import Object2

import sys

from third_party import lib15, lib1, lib2, lib3, lib4, lib5, lib6, lib7, lib8, lib9, lib10, lib11, lib12, lib13, lib14

import sys

from __future__ import absolute_import

from third_party import lib3

print("Hey")
print("yo")

Gets turned into:

from __future__ import absolute_import

import os
import sys

from third_party import (lib1, lib2, lib3, lib4, lib5, lib6, lib7, lib8,
                         lib9, lib10, lib11, lib12, lib13, lib14, lib15)

from my_lib import Object, Object2, Object3

print("Hey")
print("yo")
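To show what "sorting and grouping into sections" means mechanically, here is a deliberately naive, standard-library-only sketch. It is not isort's algorithm: `sort_imports` and its `first_party` parameter are hypothetical names, and the sketch ignores aliases, relative imports, comments, line wrapping and the natural ordering isort applies to names like `lib9` and `lib10`.

```python
import ast
import sys
from collections import defaultdict

# sys.stdlib_module_names exists on Python 3.10+; the fallback is a tiny
# stand-in that only covers the modules used in this example.
STDLIB = getattr(sys, "stdlib_module_names", frozenset({"os", "sys"}))
SECTION_ORDER = ("future", "stdlib", "third_party", "first_party")


def section_of(module, first_party):
    """Decide which section a module name belongs to."""
    top = module.split(".")[0]
    if top == "__future__":
        return "future"
    if top in STDLIB:
        return "stdlib"
    if top in first_party:
        return "first_party"
    return "third_party"


def sort_imports(source, first_party):
    """Return the imports found in source, deduplicated, sorted and grouped."""
    plain = defaultdict(set)                       # section -> module names
    froms = defaultdict(lambda: defaultdict(set))  # section -> module -> names
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                plain[section_of(alias.name, first_party)].add(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            section = section_of(node.module, first_party)
            froms[section][node.module].update(a.name for a in node.names)

    blocks = []
    for section in SECTION_ORDER:
        lines = [f"import {name}" for name in sorted(plain[section])]
        lines += [
            f"from {module} import {', '.join(sorted(names))}"
            for module, names in sorted(froms[section].items())
        ]
        if lines:
            blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


messy = """\
from my_lib import Object
import os
from my_lib import Object3
from my_lib import Object2
import sys
from third_party import lib2, lib1
import sys
"""

tidy = sort_imports(messy, first_party={"my_lib"})
print(tidy)
```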

Absolufy-imports

The new standard is to have absolute imports. Why that is you can read on your own. There were multiple debates about it, and the result is: when you can, prefer absolute imports. They leave less ambiguity and make it clearer what we are really using and from which package.

We also have a tool for that: absolufy-imports. It is especially useful when dealing with older projects where you might need to fix the imports in a lot of files to fit the new convention. The tool does that for you.

This:

from .notifications.some_important_file import SomeClass
from .another_important_file import AnotherClass

Gets turned into this:

from em.jobs.notifications.some_important_file import SomeClass
from em.jobs.notifications.another_important_file import AnotherClass
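The rewrite boils down to resolving each relative import against the package of the file it lives in. The standard library exposes that resolution as `importlib.util.resolve_name`. The sketch below assumes (my assumption, the example above does not say) that the first import sits in a module of the package `em.jobs` and the second in a module of the package `em.jobs.notifications`. It illustrates the idea and is not a description of how absolufy-imports is implemented.

```python
from importlib.util import resolve_name

# Import found in a module that belongs to the package em.jobs
first = resolve_name(".notifications.some_important_file", "em.jobs")

# Import found in a module that belongs to the package em.jobs.notifications
second = resolve_name(".another_important_file", "em.jobs.notifications")

print(first)
print(second)
```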

Bandit

Static analysis of our code for potential security threats.

Why sometimes you need a bandit in your life

When we write our code we should have security in mind. Unless you sometimes want to make your company vulnerable to potentially losing millions. I’m going overboard with this example, but still. Security is important.

Sometimes we make mistakes simply because of forgetfulness or negligence, mistakes that could otherwise have been prevented. To remind us of this, there are various tools that you can use.

Among them is bandit. Bandit is a static analysis tool that scans your code for potentially unsafe fragments and warns you about them. When you run bandit against your code you’ll get a report like this, along with a list of where in the code the potential problems are.

Code scanned:
    Total lines of code: 52868
    Total lines skipped (#nosec): 0

Run metrics:
    Total issues (by severity):
        Undefined: 0
        Low: 105
        Medium: 38
        High: 7
    Total issues (by confidence):
        Undefined: 0
        Low: 15
        Medium: 18
        High: 117
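To give a feel for what such a scanner looks for, here is a toy checker built on the standard library's `ast` module. It is a rough sketch in the same spirit, not bandit's code: `find_risky_calls` is a made-up name, and it only knows two patterns (calls to `eval` and calls made with `shell=True`), whereas bandit ships many more checks. The sample is only parsed, never executed.

```python
import ast


def find_risky_calls(source):
    """Return (line number, message) pairs for a couple of risky patterns."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Pattern 1: a direct call to the built-in eval().
        if isinstance(node.func, ast.Name) and node.func.id == "eval":
            findings.append((node.lineno, "use of eval"))
        # Pattern 2: any call that passes the literal shell=True.
        for keyword in node.keywords:
            if (
                keyword.arg == "shell"
                and isinstance(keyword.value, ast.Constant)
                and keyword.value.value is True
            ):
                findings.append((node.lineno, "call with shell=True"))
    return sorted(findings)


sample = '''\
import subprocess

def run(user_input):
    subprocess.call("ls " + user_input, shell=True)
    return eval(user_input)
'''

findings = find_risky_calls(sample)
for line, message in findings:
    print(f"line {line}: {message}")
```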

autoflake

The less you have…

Reducing waste

Sometimes we end up with unused import statements or unused variables in our code. Happens to the best of us. To take care of those automatically, we may want to include autoflake in our projects.

It’s a tool that simply takes care of that – removing unused imports and variables.

No magic here.
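For the curious, spotting an unused import can be approximated with the standard library alone. The sketch below is a simplification and not autoflake's implementation: `unused_imports` is a hypothetical helper that only compares imported names with the names used anywhere in the file, so it ignores scoping, `__all__` and other subtleties a real tool has to handle.

```python
import ast


def unused_imports(source):
    """Return the imported names that are never referenced in source."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a" unless an alias is given.
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


sample = '''\
import os
import sys
from json import dumps, loads

print(os.getcwd(), dumps({}))
'''

unused = unused_imports(sample)
print(unused)
```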

pyupgrade

This piece of software automatically upgrades some old syntax patterns to newer ones. That’s it.

-set(())
+set()
-set([])
+set()
-set((1,))
+{1}
-set((1, 2))
+{1, 2}
-set([1, 2])
+{1, 2}
-set(x for x in y)
+{x for x in y}
-set([x for x in y])
+{x for x in y}
-dict((a, b) for a, b in y)
+{a: b for a, b in y}
-dict([(a, b) for a, b in y])
+{a: b for a, b in y}
-'{0} {1}'.format(1, 2)
+'{} {}'.format(1, 2)
-'{0}' '{1}'.format(1, 2)
+'{}' '{}'.format(1, 2)

Examples can be found above.
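These rewrites are safe because the old and new spellings evaluate to the same values. A quick check of a few of the pairs above, with `y` being sample data made up for this example:

```python
y = [("a", 1), ("b", 2)]  # sample data, made up for this example

# Each tuple holds (old spelling, new spelling) from the diff above.
pairs = [
    (set(()), set()),
    (set([]), set()),
    (set((1, 2)), {1, 2}),
    (set(x for x in y), {x for x in y}),
    (dict((a, b) for a, b in y), {a: b for a, b in y}),
    ("{0} {1}".format(1, 2), "{} {}".format(1, 2)),
]

all_equivalent = all(old == new for old, new in pairs)
print(all_equivalent)
```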

bumpversion

There’s this thing we call semantic versioning or semver. It’s a convention that tells us to version our code according to the following pattern: MAJOR.MINOR.PATCH

For example: v0.2.12

The major part is incremented when we go for major rollouts that change A LOT, typically in ways that break backwards compatibility.

The minor part is incremented when we do normal releases, e.g. with bigger features that stay backwards compatible.

The patch part is used for smaller changes, patches, fixes etc. This one grows the fastest.

So that we don’t have to manage versions manually, we have a tool called bumpversion. It updates the version, creates a commit with the change, creates a git tag and so on, all automatically. It’s a neat little piece of tooling to have in your CI/CD.

This makes it easier to manage versions, create changelogs, filter commits, spot changes and bugs, and version your packages/APIs.
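The bumping rule itself is simple enough to sketch. The `bump` function below is a hypothetical helper written for this article, not bumpversion's code; it only shows how the three parts relate: bumping a part resets every part to its right.

```python
def bump(version, part):
    """Return version with the given part ('major', 'minor', 'patch') bumped."""
    prefix = "v" if version.startswith("v") else ""
    major, minor, patch = (int(x) for x in version[len(prefix):].split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part: {part!r}")
    return f"{prefix}{major}.{minor}.{patch}"


after_patch = bump("v0.2.12", "patch")
after_minor = bump("v0.2.12", "minor")
after_major = bump("v0.2.12", "major")
print(after_patch, after_minor, after_major)
```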

You can see example commit message history and bumpversion usage here, in my project's commit history - braindead

Do we run all of these by hand?

No. We want to be lazy.

Git hooks & pre-commit

Automate boring tasks.

Git hooks and pre-commit

If you want to make all of this happen automatically, you can create git hooks that are fired e.g. before or after you commit. One way is to create a pre-commit script in your .git/hooks folder and have it call e.g. a Makefile; another is to use something like the pre-commit tool.

It’s a nice handy tool that handles this for you. You need to install it and create a config file that tells it which things to do before the commit. No magic here.

I’ll let you google the details yourself ☺
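For the do-it-yourself route, a hook is just an executable file named `pre-commit` inside `.git/hooks`; if it exits with a non-zero code, git aborts the commit. The sketch below writes such a file. `install_pre_commit_hook` is a made-up helper, and the hook body assumes a Makefile with a `linters` target like the one in the example Makefile in this article. The demo runs against a throwaway directory rather than a real repository.

```python
import os
import stat
import tempfile
from pathlib import Path

HOOK_BODY = """#!/bin/sh
# Run the linters; a non-zero exit code makes git abort the commit.
exec make linters
"""


def install_pre_commit_hook(repo_root):
    """Write an executable pre-commit hook into repo_root/.git/hooks."""
    hook = Path(repo_root) / ".git" / "hooks" / "pre-commit"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(HOOK_BODY)
    # Git only runs hooks that are marked as executable.
    mode = hook.stat().st_mode
    hook.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook


# Demo against a temporary directory instead of a real repository.
with tempfile.TemporaryDirectory() as fake_repo:
    installed = install_pre_commit_hook(fake_repo)
    hook_text = installed.read_text()
    hook_is_executable = os.access(installed, os.X_OK)

print(hook_is_executable)
```

The pre-commit tool takes care of this installation step for you and adds version pinning of the tools on top, which is why I prefer it over hand-written hooks.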

Example Makefile

Below you can find an example of a slightly outdated Makefile that I use.

PATH  := $(PATH)
SHELL := /bin/bash

flake:
    flake8 -v ./

isort:
    isort --check-only --diff ./

isort-inplace:
    isort ./

bandit:
    bandit -x './styles/*' -r ./

black:
    black --check --line-length 120 --exclude "/(\.eggs|\.git|\.hg|\.mypy_cache|\.nox|\.tox|\.venv|_build|buck-out|build|dist|migrations|node_modules)/" ./

linters:
    make flake
    make isort
    make bandit
    make black

bumpversion:
    bumpversion --message '[skip ci] Bump version: {current_version} → {new_version}' --list --verbose $(part)

black-inplace:
    black --line-length 120 --exclude "/(\.eggs|\.git|\.hg|\.mypy_cache|\.nox|\.tox|\.venv|_build|buck-out|build|dist|migrations|node_modules)/" ./

autoflake-inplace:
    autoflake --remove-all-unused-imports --in-place --remove-unused-variables -r --exclude "styles" ./

format-inplace:
    make black-inplace
    make autoflake-inplace
    make isort-inplace

Summary

Black, isort, absolufy-imports, pyupgrade, autoflake, bandit, bumpversion are tools that will make your life a bit easier.

Maybe it's a good idea to include them in your local development flow and pipelines?