Packaging with Python

How to create and share your python package
Club Bioinfo
Author

Laurent Modolo

Published

September 3, 2020

cc-by-sa

This practical was made using various contents from:

  • https://python-packaging.readthedocs.io
  • https://setuptools.readthedocs.io
  • https://packaging.python.org/
  • https://github.com/pybind/pybind11
  • https://pybind11.readthedocs.io/
  • https://github.com/pypa/manylinux
  • https://opensource.com/article/19/2/manylinux-python-wheels
  • https://stackoverflow.com

Python pip

PyPI

The Python Package Index (PyPI) is a repository of software for the Python programming language.

You can install the pip package manager with one of the following commands.

Some outdated distributions still use Python 2 as the default python. In that case, use the python3 command explicitly.

On Ubuntu

apt install python3-pip

On OSX

brew install python3

With docker:

docker run -it python:3.8-alpine sh

installing packages with pip

You can then install any package available on https://pypi.org/

sudo pip3 install setuptools==50.0.2 # system wide install
pip3 install setuptools==50.0.2 --user # user install
pip3 install twine wheel --user

pip is just another python module

python3 -m pip install --user --upgrade pip
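
Because pip is a regular module, it can also be called for the exact interpreter that runs a script. The following is a minimal sketch, not part of the original practical; the helper name pip_command is hypothetical.

```python
import sys


def pip_command(*args):
    """Build a pip command line bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip"] + list(args)


# To actually run it (not executed here):
# import subprocess
# subprocess.check_call(pip_command("install", "--user", "--upgrade", "pip"))
print(pip_command("--version"))
```

Using sys.executable avoids installing a package for a different Python than the one you are working with.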

Packaging python projects

You can follow the official guide (https://packaging.python.org/tutorials/packaging-projects/)

Creating the package files

This is the recommended basic structure your project should have to easily build a pip package:

your_project/
├── LICENSE
├── README.md
├── example_pkg/
│ └── __init__.py
├── setup.py
└── tests/

But we can adapt it to follow the LBMC guide of good practices

your_project/
├── LICENSE
├── README.md
└── src/
    ├── example_pkg/
    │   └── __init__.py
    ├── setup.py
    └── tests/

All your Python code goes in the example_pkg/ folder. The most important file for the packaging is the setup.py file.

Here is a basic setup.py file:

import setuptools

with open("../README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="example-pkg-YOUR-USERNAME-HERE",  # Replace with your own username
    version="0.0.1",
    author="Example Author",
    author_email="author@example.com",
    description="A small example package",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/pypa/sampleproject",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: CEA CNRS Inria Logiciel Libre License, "
        "version 2.1 (CeCILL-2.1)",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
)

  • name is the distribution name of your package. This can be any name as long as it only contains letters, numbers, _ , and -. It also must not already be taken on pypi.org. Be sure to update this with your username, as this ensures you won’t try to upload a package with the same name as one that already exists.
  • version is the package version. See PEP 440 for more details on versions.
  • author and author_email are used to identify the author of the package.
  • description is a short, one-sentence summary of the package.
  • long_description is a detailed description of the package. This is shown on the package detail page on the Python Package Index. In this case, the long description is loaded from README.md, which is a common pattern.
  • long_description_content_type tells the index what type of markup is used for the long description. In this case, it’s Markdown.
  • url is the URL for the homepage of the project. For many projects, this will just be a link to GitHub, GitLab, Bitbucket, or a similar code hosting service.
  • packages is a list of all Python import packages that should be included in the Distribution Package. Instead of listing each package manually, we can use find_packages() to automatically discover all packages and subpackages. In this case, the list of packages will be example_pkg as that’s the only package present.
  • classifiers gives the index and pip some additional metadata about your package. In this case, the package is only compatible with Python 3, is licensed under the CeCILL-2.1 license, and is OS-independent. You should always include at least which version(s) of Python your package works on, which license your package is available under, and which operating systems your package will work on. For a complete list of classifiers, see https://pypi.org/classifiers/.
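
To make the naming rule concrete, here is a minimal sketch that checks a candidate name against the character set described above and computes the normalized form used to compare project names (PEP 503). The helper names is_valid_name and normalize_name are hypothetical, and the extra rule that a name starts and ends with a letter or digit is taken from PEP 508, not from this practical.

```python
import re

# Assumption: the character rule stated above (letters, digits, "_" and "-"),
# plus the PEP 508 requirement that a name starts and ends with a letter or digit.
_VALID_NAME = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9_-]*[A-Za-z0-9])?$")


def is_valid_name(name):
    """Return True if `name` only uses the allowed characters."""
    return _VALID_NAME.match(name) is not None


def normalize_name(name):
    """Normalize a distribution name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(is_valid_name("example-pkg-YOUR-USERNAME-HERE"))
print(normalize_name("example_pkg_YOUR_USERNAME_HERE"))
```

Because of this normalization, example_pkg and example-pkg designate the same project on the index.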

Creating distribution archives

Now you just have to run the following command in the same directory as the setup.py file:

python3 setup.py sdist bdist_wheel

It will create files in the dist/ directory

dist/
  example_pkg_YOUR_USERNAME_HERE-0.0.1-py3-none-any.whl
  example_pkg_YOUR_USERNAME_HERE-0.0.1.tar.gz

The tar.gz file is a Source Archive whereas the .whl file is a Built Distribution. Newer pip versions preferentially install built distributions, but will fall back to source archives if needed. You should always upload a source archive and provide built archives for the platforms your project is compatible with.
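
The name of a .whl file encodes what it is compatible with. As a rough sketch (the function name parse_wheel_filename is hypothetical, and the optional build tag allowed by the wheel specification is deliberately not handled), the fields can be split like this:

```python
from collections import namedtuple

WheelName = namedtuple(
    "WheelName", "distribution version python_tag abi_tag platform_tag"
)


def parse_wheel_filename(filename):
    """Split a wheel filename into its five dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected wheel filename: " + filename)
    return WheelName(*parts)


info = parse_wheel_filename("example_pkg_YOUR_USERNAME_HERE-0.0.1-py3-none-any.whl")
print(info.python_tag, info.abi_tag, info.platform_tag)  # py3 none any
```

A py3-none-any wheel is pure Python and not tied to a platform, whereas a wheel containing compiled code carries a specific interpreter, ABI and platform tag.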

What can you do with those two files?

Install them:

You can use the .whl or the .tar.gz file to install your package

pip3 install dist/example_pkg_YOUR_USERNAME_HERE-0.0.1.tar.gz --user

Upload them

You can upload your package to pypi, but first you can run tests on https://test.pypi.org/. As https://pypi.org is an archive, if you upload broken packages, they will stay there.

You first need to create an account https://test.pypi.org/account/register/

Then we use the twine tool that we installed earlier:

twine upload --skip-existing --repository testpypi dist/*

The output should look like this:

Uploading distributions to https://test.pypi.org/legacy/
Enter your username: [your username]
Enter your password:
Uploading example_pkg_YOUR_USERNAME_HERE-0.0.1-py3-none-any.whl
100%|█████████████████████| 4.65k/4.65k [00:01<00:00, 2.88kB/s]
Uploading example_pkg_YOUR_USERNAME_HERE-0.0.1.tar.gz
100%|█████████████████████| 4.25k/4.25k [00:01<00:00, 3.05kB/s]

To install your package from https://test.pypi.org you can use the following pip options:

pip install --index-url https://test.pypi.org/simple/ --no-deps example-pkg-YOUR-USERNAME-HERE --user

You should be able to open a python console anywhere and run:

>>> import example_pkg
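
The same check can be scripted. This is a small sketch using importlib from the standard library; the helper name is_importable is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the top-level module `module_name`."""
    return importlib.util.find_spec(module_name) is not None


print(is_importable("example_pkg"))
```

It prints True only if the package is installed for the interpreter running the script.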

When everything is OK, you can create an account on https://pypi.org and use the twine command without the --repository testpypi option.

Creating executable software

You can also use pip to distribute executable software. To do that, you have to declare in the setup.py file which function to execute when your software is called (here, the main function of the __main__.py file).

setuptools.setup(
    ...
    entry_points={
        'console_scripts': ['example_pkg=example_pkg.__main__:main'],
    },
    ...
)

You can declare several executables in this list, each with the format EXECUTABLE_NAME=LIBRARY.FILE:FUNCTION

After the installation, calling example_pkg will run your software if your $PATH is correctly configured.
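
For illustration, a __main__.py matching the entry point example_pkg=example_pkg.__main__:main could look like the sketch below. Its content is entirely hypothetical (the --name option and the greeting are placeholders); only the main function name comes from the entry point declaration.

```python
# example_pkg/__main__.py (hypothetical content)
import argparse


def main(argv=None):
    """Function referenced by 'example_pkg=example_pkg.__main__:main'."""
    parser = argparse.ArgumentParser(prog="example_pkg")
    parser.add_argument("--name", default="world", help="who to greet")
    # With argv=None, argparse reads the real command line arguments.
    args = parser.parse_args(argv)
    print("Hello, {}!".format(args.name))
    return 0  # used as the exit status of the command
```

The script generated at installation calls main() without arguments and uses its return value as exit status. Ending the file with the usual if __name__ == "__main__" guard calling sys.exit(main()) would also allow running python3 -m example_pkg.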

Adding dependencies to your package

As your project grows more complex, you will split it into different files for code clarity.

Your __init__.py file will need to contain a list of all the .py files in the example_pkg directory:

#!/usr/bin/env python3
# -*-coding:Utf-8 -*

"""
idr library
"""

name = "midr"
__all__ = ["__main__",
    "idr", "samic", "archimedean", "archimedean_plots",
    "log", "narrowpeak", "raw_matrix", "auxiliary"]

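If you would rather not maintain that list by hand, the module names can be derived from the files on disk. This is only a sketch under the assumption that every .py file of the package directory should be listed; the helper name list_modules is hypothetical.

```python
from pathlib import Path


def list_modules(package_dir):
    """Return the sorted names of the .py files in `package_dir`, except __init__."""
    return sorted(
        path.stem
        for path in Path(package_dir).glob("*.py")
        if path.stem != "__init__"
    )


# Example (the path is hypothetical):
# print(list_modules("src/example_pkg"))
```
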
As you don’t want to reinvent the wheel, you may also import other Python libraries (which could be installed with pip). You can specify a list of these libraries in the setup.py file:

setuptools.setup(
    ...
    install_requires=[
        'cmake>=3.18',
        'scipy>=1.3',
        'numpy>=1.16',
        'pynverse>=0.1',
        'pandas>=0.25.0',
        'mpmath>=1.1.0',
        'matplotlib>=3.0.0'
    ],
    ...
)

Don’t forget to specify the minimum version of each dependency to ensure that the functions you use are present in the installed library.
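
One pitfall worth checking in such a list: Python concatenates adjacent string literals, so a missing comma silently merges two requirements into a single invalid one. A short demonstration (the variable names broken and fixed are just for illustration):

```python
# Adjacent string literals are concatenated by Python, so a missing comma
# silently merges two requirements into one invalid entry.
broken = [
    'cmake>=3.18'   # <- no comma here
    'scipy>=1.3',
]
fixed = [
    'cmake>=3.18',
    'scipy>=1.3',
]

print(len(broken), broken[0])  # 1 cmake>=3.18scipy>=1.3
print(len(fixed))              # 2
```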

If some packages are required for the installation of your package (for example cmake here), you should also add them to the install_requires list.

Sometimes you’ll want to use packages that are properly arranged with setuptools, but aren’t published to PyPI. In those cases, you can specify a list of one or more dependency_links URLs where the package can be downloaded, along with some additional hints, and setuptools will find and install the package correctly.

setup(
    ...
    dependency_links=['http://github.com/user/repo/tarball/master#egg=package-1.0']
    ...
)

pybind11 and other unnecessary complications

Sometimes your code is slow and, instead of blaming yourself for your poor algorithm, you can blame Python. pybind11 allows you to do just that.

pybind11 is a lightweight header-only library that exposes C++ types in Python and vice versa, mainly to create Python bindings of existing C++ code. Its goals and syntax are similar to the excellent Boost.Python library by David Abrahams: to minimize boilerplate code in traditional extension modules by inferring type information using compile-time introspection.

Great, you now have a lightweight interface to rewrite some of your functions in C/C++. But what about packaging? setuptools almost only understands Python (it can compile simple C/C++ code).

The https://github.com/pybind/cmake_example repository gives an example of how to use cmake within a setup.py script.

Ideally we want to:

  • use setuptools to build standard PyPI packages
  • use cmake to compile complex C/C++ libraries
  • be able to include lots of C/C++ libraries (because writing C/C++ code is a pain, and some people do it better than ourselves)

Simple C/C++ code

Following the https://github.com/pybind/cmake_example repository, the idea is to derive a CMakeExtension class from the Extension class to override the default Extension attributes (we don’t want setuptools to try to do its own compilation on top of our cmake compilation), and then to use the information stored in CMakeExtension to run cmake as a subprocess within a CMakeBuild class.

import os
import re
import sys
import platform
import subprocess

from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext
from distutils.version import LooseVersion


class CMakeExtension(Extension):
    def __init__(self, name, sourcedir=''):
        Extension.__init__(self, name, sources=[])
        self.sourcedir = os.path.abspath(sourcedir)


class CMakeBuild(build_ext):
    def run(self):
        try:
            out = subprocess.check_output(['cmake', '--version'])
        except OSError:
            raise RuntimeError("CMake must be installed to build the following extensions: " +
                               ", ".join(e.name for e in self.extensions))

        if platform.system() == "Windows":
            cmake_version = LooseVersion(re.search(r'version\s*([\d.]+)', out.decode()).group(1))
            if cmake_version < '3.1.0':
                raise RuntimeError("CMake >= 3.1.0 is required on Windows")

        for ext in self.extensions:
            self.build_extension(ext)

    def build_extension(self, ext):
        extdir = os.path.abspath(os.path.dirname(self.get_ext_fullpath(ext.name)))
        # required for auto-detection of auxiliary "native" libs
        if not extdir.endswith(os.path.sep):
            extdir += os.path.sep

        cmake_args = ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=' + extdir,
                      '-DPYTHON_EXECUTABLE=' + sys.executable]

        cfg = 'Debug' if self.debug else 'Release'
        build_args = ['--config', cfg]

        if platform.system() == "Windows":
            cmake_args += ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{}={}'.format(cfg.upper(), extdir)]
            if sys.maxsize > 2**32:
                cmake_args += ['-A', 'x64']
            build_args += ['--', '/m']
        else:
            cmake_args += ['-DCMAKE_BUILD_TYPE=' + cfg]
            build_args += ['--', '-j2']

        env = os.environ.copy()
        env['CXXFLAGS'] = '{} -DVERSION_INFO=\\"{}\\"'.format(env.get('CXXFLAGS', ''),
                                                              self.distribution.get_version())
        if not os.path.exists(self.build_temp):
            os.makedirs(self.build_temp)
        subprocess.check_call(['cmake', ext.sourcedir] + cmake_args, cwd=self.build_temp, env=env)
        subprocess.check_call(['cmake', '--build', '.'] + build_args, cwd=self.build_temp)


setup(
    name='cmake_example',
    version='0.0.1',
    author='Dean Moldovan',
    author_email='dean0x7d@gmail.com',
    description='A test project using pybind11 and CMake',
    long_description='',
    ext_modules=[CMakeExtension('cmake_example')],
    cmdclass=dict(build_ext=CMakeBuild),
    zip_safe=False,
)
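
The version check in CMakeBuild.run relies on LooseVersion from distutils, a module that was removed from the standard library in Python 3.12. As a possible alternative, and only as a sketch, the same regular expression can feed a plain tuple comparison. The helper name parse_cmake_version is hypothetical and is not part of cmake_example.

```python
import re


def parse_cmake_version(version_output):
    """Extract the version printed by `cmake --version` as a tuple of ints."""
    match = re.search(r'version\s*([\d.]+)', version_output)
    if match is None:
        raise ValueError("no version found in: " + version_output)
    # Skip empty fields in case the match ends with a dot.
    return tuple(int(part) for part in match.group(1).split('.') if part)


version = parse_cmake_version("cmake version 3.18.2\n")
print(version >= (3, 1, 0))  # True
```

Tuples of integers compare field by field, so (3, 10) is correctly greater than (3, 9), which is not the case when comparing the raw strings.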

You just need to add your CMakeLists.txt to your src/ folder and put your .cpp file in a folder of your choice within the src/ folder (here src/).

cmake_minimum_required(VERSION 2.8.12)
project(cmake_example)

add_subdirectory(pybind11)
pybind11_add_module(cmake_example src/main.cpp)

Finally, you want your main.cpp file to be included in your package (by default only the .py files are going to be included). Therefore, you have to write a MANIFEST.in in your src/ folder:

include README.md LICENSE
global-include CMakeLists.txt *.cmake
recursive-include src *
recursive-include pybind11/include *.h
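
To verify that the MANIFEST.in rules had the intended effect, you can list the content of the generated source archive. A minimal sketch with the standard tarfile module follows; the helper name sdist_members and the archive path in the comment are hypothetical.

```python
import tarfile


def sdist_members(sdist_path, suffix):
    """Return the members of a .tar.gz archive whose name ends with `suffix`."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return [name for name in archive.getnames() if name.endswith(suffix)]


# Example (the path is hypothetical):
# print(sdist_members("dist/cmake_example-0.0.1.tar.gz", ".cpp"))
```

An empty list for the .cpp suffix would mean that the C++ sources were left out of the archive.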

From now on we will use the example of the midr project (https://gitbio.ens-lyon.fr/LBMC/sbdm/midr)

Built Distribution vs Source Archive

Built Distribution

A Distribution format containing files and metadata that only need to be moved to the correct location on the target system, to be installed. Wheel is such a format, whereas distutils’ Source Distribution is not, in that it requires a build step before it can be installed. This format does not imply that Python files have to be precompiled (Wheel intentionally does not include compiled Python files).

Advantages:

  • Quick to install

Disadvantages:

  • Can be system specific (especially with C/C++ dependencies)

Source Archive

An archive containing the raw source code for a Release, prior to creation of a Source Distribution or Built Distribution.

Advantages:

  • Easy to build on any system

Disadvantages:

  • You have to compile everything with each installation

Manylinux project

Linux comes in many variants and flavors, such as Debian, CentOS, Fedora, and Arch Linux. Each of these may use slight variations in shared libraries, such as libncurses, and core C libraries, such as glibc.

If you’re writing a C/C++ extension, then this could create a problem. A source file written in C and compiled on Ubuntu Linux isn’t guaranteed to be executable on a CentOS machine or an Arch Linux distribution. Do you need to build a separate wheel for each and every Linux variant?

The goal of the manylinux project is to provide a convenient way to distribute binary Python extensions as wheels on Linux. This effort has produced PEP 513 which is further enhanced by PEP 571 defining manylinux2010_x86_64 and manylinux2010_i686 platform tags.

PEP 513 defined manylinux1_x86_64 and manylinux1_i686 platform tags and the wheels were built on CentOS 5. CentOS 5 reached End of Life (EOL) on March 31st, 2017 and thus PEP 571 was proposed.

This means that instead of a Built Distribution file like midr-1.3.9-cp38-cp38-linux_x86_64.whl, which won’t be accepted by PyPI, we will get a midr-1.3.9-cp38-cp38-manylinux1_x86_64.whl file.
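
Before uploading, a quick check of the platform tag tells you whether a wheel still carries the generic linux tag. This is only a sketch; the helper names platform_tag and is_generic_linux_wheel are hypothetical.

```python
def platform_tag(wheel_filename):
    """Return the platform tag, the last dash-separated field of a wheel filename."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel: " + wheel_filename)
    return wheel_filename[:-len(".whl")].rsplit("-", 1)[1]


def is_generic_linux_wheel(wheel_filename):
    """True when the wheel has a plain linux_* tag instead of a manylinux one."""
    return platform_tag(wheel_filename).startswith("linux_")


print(is_generic_linux_wheel("midr-1.3.9-cp38-cp38-linux_x86_64.whl"))       # True
print(is_generic_linux_wheel("midr-1.3.9-cp38-cp38-manylinux1_x86_64.whl"))  # False
```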

For this, we will build the package within a manylinux container (hosted on quay.io):

docker run -it --volume $(pwd):/root/ quay.io/pypa/manylinux1_x86_64
cd /root/

The image has different versions of Python installed in /opt/python/

cd /root/
/opt/python/cp38-cp38/bin/pip3 install cmake
PATH=$PATH:/opt/_internal/cpython-3.8.5/bin/
/opt/python/cp38-cp38/bin/pip3 wheel ./ -w output

This will produce a binary wheel in output/. However, this will still not be a manylinux wheel, since it is possible to build wheels that accidentally depend on other libraries.

The auditwheel tool will take that wheel, audit it, and copy it to a manylinux name:

auditwheel repair output/midr*whl -w output
INFO:auditwheel.main_repair:Repairing midr-1.3.9-cp38-cp38-linux_x86_64.whl
INFO:auditwheel.wheeltools:Previous filename tags: linux_x86_64
INFO:auditwheel.wheeltools:New filename tags: manylinux1_x86_64
INFO:auditwheel.wheeltools:Previous WHEEL info tags: cp38-cp38-linux_x86_64
INFO:auditwheel.wheeltools:New WHEEL info tags: cp38-cp38-manylinux1_x86_64
INFO:auditwheel.main_repair:
Fixed-up wheel written to /root/output/midr-1.3.9-cp38-cp38-manylinux1_x86_64.whl

Then we can exit the container and fix the rights of the output folder (maybe use Singularity next time?):

sudo chown -R $USER:$USER output
mv output/* dist/
twine upload dist/midr-1.3.9-cp38-cp38-manylinux1_x86_64.whl --skip-existing --verbose

GL & HF