Talk proposal for Scipy 2017: Bringing the Generic Mapping Tools to Python

Thumbnail image for publication.

This is the proposal for talk that I co-authored with Paul Wessel and submitted to Scipy 2017. Fingers crossed that they'll accept it! If not, this post can serve as an example of what not to do for next time :)

The submission is on the GenericMappingTools/scipy2017 Github repository. It covers our initial ideas about the GMT Python interface.

UPDATE (8 May 2017): The proposal was accepted!

UPDATE (11 May 2017): I've posted the reviews we got for the proposal along with some comments and replies.


Abstract

The Generic Mapping Tools (GMT) is an open-source software package widely used in the geosciences to process and visualize time series and gridded data. Maps generated by GMT are ubiquitous in scientific publications in areas such as seismology and oceanography. We present a new GMT Python wrapper library built by the GMT team. We will show the design plans, internal implementations, and demonstrate an initial prototype of the library. Our wrapper connects to the GMT C API using ctypes and allows input and output using data from numpy ndarrays and xarray Datasets. The library is still in early stages of design and implementation and we are eager for contributions and feedback from the Scipy community.

Extended Abstract

The Generic Mapping Tools (GMT) is an open-source software package widely used in the geosciences to process and visualize time series and gridded data. GMT is a command-line tool written in C that is able to generate high quality figures and maps using the Postscript format. Maps generated by GMT are ubiquitous in scientific publications in areas such as seismology and oceanography. GMT has a large, feature rich, mature, and well tested code base. It has benefited from over 28 years of development and heavy usage within the scientific community. It is no wonder that there have been at least three attempts to bridge the gap between GMT and Python: gmtpy, pygmt, and PyGMT. Of the three, only gmtpy has had any development activity since 2014 and is the only project that has documentation. gmtpy interfaces with GMT through subprocesses. It pipes standard input and output to the GMT command-line application. Piping has it's limitations because all data are handled as text, making it difficult or impossible to pass binary data such as netCDF grids. On the Python side, the two main libraries for plotting data on maps are the matplotlib basemap toolkit and Cartopy. Both libraries rely on matplotlib as the backend for generating figures. Basemap is know to have its limitations (e.g., this post by Filipe Fernandes). Cartopy is a great improvement over basemap that fixes some of those limitations. However, Cartopy is still bound by the speed and memory usage constraints of matplotlib when it comes to very large and complex maps.

We present a new GMT Python wrapper library gmt-python built by the GMT team. We will show the design plans, internal implementations, and demonstrate an initial prototype of the library. Starting in version 5, GMT introduced a C API that is exposed through a shared library. The API allows access to GMT modules as C functions and provides mechanisms for input and output of binary data through shared memory. Our Python wrapper connects to this shared library using the ctypes standard library module. The wrapper code can thus be written in pure Python, which greatly simplifies packaging and distribution. Input and output will be handled using the GMT C API "virtual files" that allow access to shared memory between C and Python. We will implement a thin conversion layer between native scientific Python data types and the GMT internal data structures. Thus, we can accept input and produce output as numpy ndarrays and pandas DataFrames for tabular data and xarray Datasets for netCDF grids. Support for displaying figures inline in the Jupyter notebook is planned from the start by retrieving PNG previews of the Postscript figures. This will allow GMT to be seamlessly integrated into the rich scientific Python ecosystem.

Internally, the gmt-python library will contain low-level wrappers around the GMT C API functions and data structures. Users will interact with a higher-level API in which each GMT module is represented by a function. The module functions can take arguments as a single string representing the command-line arguments ("-Rg -JN180"), as keyword arguments (R='g', J='N180'), or as long-form aliases (region='g', projection='N180'). The Python interface relies on new features in GMT that are currently under development in the trunk of the SVN repository, mainly the "modern" execution mode, which greatly simplify the building of Postscript figures and maps. We are working in close collaboration with the rest of the GMT core developers to make changes on the GMT side as we exercise the new C API and discover bugs and missing features.

The capabilities of GMT go beyond the geosciences. It can produce high-quality line plots, bar graphs, histograms, and 3D surfaces. Significant barriers to entry for GMT have been the complexities of programming in bash and the many command-line options and their meanings. The Python wrapper library can serve as a backend for new and easier to use APIs, making GMT more accessible while retaining the high quality of figures. Work on gmt-python is still in early stages of design and implementation. A prototype is not yet available but is predicted in time for the conference in July. We are open and eager for contributions and feedback from the Scipy community.