
This paper is now published in Environment & Planning B: The cityseer Python package for pedestrian-scale network-based urban analysis.

cityseer-api emerged somewhat unintentionally. In 2013, obsessed with complex systems, I checked out of a decade in architecture and set off for London, where I enrolled in the MRes in Advanced Spatial Analysis and Visualisation, the precursor to the current line of course offerings at the Bartlett Centre for Advanced Spatial Analysis (CASA). My MRes dissertation (2014), A Computational Implementation of Jane Jacobs’ Generators of Diversity, was a first stab at using Python to calculate network centralities and mixed-uses en masse for towns and cities in the UK. With the MRes completed, I went to work for CASA, where I further embraced Python while creating an asynchronous geometry engine linking PostGIS (Postgres) databases to an interactive Unity3D interface (2014-2016). This later overlapped with a stint working for Woods Bagot’s Superspace team (2016-2017), where we developed workflows combining Python, PostGIS, and web dashboards for urban analysis. In essence, this meant pulling in various sources of open data, storing these in a PostGIS database where we could run varied forms of analysis, then figuring out how to display this information to designers and clients.

By this point, I’d developed a deep affinity for Python and PostGIS and their tremendous versatility in the realm of all things geospatial and data-sciency. Other than the adoption of PostGIS, my toolset was the same as that used initially for the MRes. My “stack” consisted of an off-the-shelf network analysis package, in the form of the performant graph-tool, combined with Numba JIT compilation for when I needed to implement customised loops by hand while retaining some measure of speed. This approach generally worked as intended, and I was able to muddle along until my PhD (2016-2021, also at CASA), when it became increasingly apparent that all was not what it seemed… Mixed-use methods that I had favoured at the time didn’t behave in a mathematically consistent manner on windowed graphs and therefore couldn’t be reliably compared from location to location. Likewise, the more I looked into centralities, the more it sank in that conventional formulations of closeness centrality also behaved counter-intuitively on windowed graphs. This added to a long-standing list of network-analysis-related questions, all of which seemed to have no clear-cut answers. Which centrality measures to use? Whether to use the primal or dual network? Shortest vs simplest-path heuristics? And at which distance thresholds? This was further compounded by a list of issues relating to algorithmic implementations of network analysis specific to urban systems: Do you window the graph using network distances or crow-flies distances (Cooper 2015)? How do you stop the algorithm from short-cutting sharp corners when using simplest-path methods (Turner 2007)? How do you assign land-uses to the network or overload network algorithms to calculate custom network centralities, land-use accessibilities, mixed-use measures, and spatially aggregated statistical measures? How do you reduce topological distortions and abstract the handling of network geometries and the ensuing calculation of shortest-path and simplest-path distances?

The PhD provided an opportunity to systematically work through these issues; yet, this also meant that piggy-backing on performant off-the-shelf network analysis packages while demanding high levels of customisability quickly became counterproductive. Shoehorning extraneous considerations into such packages leads to complex code. At the same time, repeatedly reaching into performant languages (e.g. C or C++) from the realm of Python sacrifices the speed advantages that led to their adoption in the first place. This conundrum led down the rabbit hole to direct experimentation with the underlying shortest-path algorithms, a strategy that allowed unique considerations to be addressed directly at the source: terminating a graph search when encountering a distance threshold, adding a parameter to trigger checks that prevent the side-stepping of sharp angular turns, calculating angular centralities on-the-fly, incorporating unique forms of centralities, and extending these workflows to the calculation of land-use or statistical measures.
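To make the first of these considerations concrete, here is a minimal sketch of a Dijkstra-style shortest-path search that stops expanding once a distance threshold is reached. It is an illustration only and not the cityseer-api implementation: the function name `shortest_paths_within`, the adjacency-dictionary layout, and the toy network are all assumptions made for the example, and it uses plain Python rather than Numba.

```python
import heapq


def shortest_paths_within(adjacency, source, max_dist):
    """Return shortest distances from `source` to every node within `max_dist`.

    `adjacency` is a hypothetical layout: {node: [(neighbour, length), ...]}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; a shorter route was already processed
        visited.add(node)
        for nbr, length in adjacency.get(node, ()):
            new_d = d + length
            if new_d > max_dist:
                continue  # beyond the threshold: do not search any further here
            if new_d < dist.get(nbr, float("inf")):
                dist[nbr] = new_d
                heapq.heappush(heap, (new_d, nbr))
    return dist


# Toy street network with edge lengths in metres (illustrative values only).
toy = {
    "a": [("b", 100.0), ("c", 250.0)],
    "b": [("a", 100.0), ("c", 100.0), ("d", 300.0)],
    "c": [("a", 250.0), ("b", 100.0), ("d", 150.0)],
    "d": [("b", 300.0), ("c", 150.0)],
}

reachable = shortest_paths_within(toy, "a", 300.0)
print(reachable)
```

With a 300 m threshold, node `d` is never reached because its shortest route from `a` is 350 m. Placing the cut-off inside the search loop, rather than filtering a full all-pairs result afterwards, is what keeps localised measures tractable on large graphs.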

This approach afforded customisability and like-for-like comparisons of workflows and algorithms while retaining the requisite performance for application to large graphs. At this point, the whole stack was being built “by hand” and from “first principles” with extensive reliance on Numba. This inevitably led to the next issue: an ever-growing level of code-base complexity that had begun to encumber further progress in and of itself. The data structures were still being created by hand, which involved loading Ordnance Survey data from databases, extracting information from roadway geometries, and converting this information into NumPy data structures that could be processed by the underlying algorithms. These workflows became challenging to maintain and were hard to touch without breaking something somewhere else or having to copy-and-paste bits here-and-there, only to realise something else needed tweaking… and that days’ worth of analysis had to be rerun. This prompted the formalisation of the code-base into what would become cityseer-api. It was only later that I appreciated more fully that the exercise of abstracting and formalising the logic of the code-base would substantially benefit downstream development because the code was now documented and tested. I could continue to tinker with the code with the confidence that I wasn’t breaking something else along the way, and could resort to the official documentation whenever I inevitably forgot what the parameters and logic for various workflows and functions had been.

So, with that introduction out of the way, I’ll point directly to the documentation where you can learn more about the package and how it can be used. Questions can be asked in the repo’s Discussions area, and the code can be found at cityseer-api. The broader background to some theoretical considerations can be found in the associated paper.

Cooper, C.H.V., 2015. Spatial localization of closeness and betweenness measures: a self-contradictory but useful form of network analysis. International Journal of Geographical Information Science, 29(8), pp.1293–1309. Available at: https://www.tandfonline.com/doi/pdf/10.1080/13658816.2015.1018834?needAccess=true.
Turner, A., 2007. From axial to road-centre lines: a new representation for space syntax and a new model of route choice for transport network analysis. Environment and Planning B: Planning and Design, 34, pp.539–555. Available at: http://journals.sagepub.com/doi/pdf/10.1068/b32067.
Copyright © 2014-present Gareth Simons