1st May, 2021 · Gareth Simons

## The cityseer Python package for pedestrian-scale network-based urban analysis

### Abstract

cityseer-api is a Python package consisting of computational tools for fine-grained street network and landuse analysis, useful for assessing the morphological precursors to vibrant neighbourhoods. It is underpinned by network-based methods that have been developed from the ground up for localised urban analysis at the pedestrian scale, with the intention of providing contextually specific metrics for any given streetfront location. It can be used to compute a variety of node-based or segment-based network centrality measures, landuse accessibility and mixed-use measures, and statistical aggregations. Aggregations are computed dynamically (directly over the street network while taking into account the direction of approach) and can optionally incorporate spatial impedances and network decomposition to further increase spatial precision. The use of Python facilitates interaction with popular computational tools for network manipulation and geospatial data processing, and with the numpy stack of scientific packages. The underlying algorithms are implemented in numba JIT-compiled code so that the methods can be applied to large networks. In-out convenience methods are provided for interfacing with NetworkX and OSMnx, and the provision of graph cleaning tools aids the incorporation of ‘messier’ network representations such as those derived from OpenStreetMap.

Online documentation is available from cityseer.benchmarkurbanism.com and the GitHub repository is available at github.com/benchmark-urbanism/cityseer-api.

### Introduction: Spatial aggregation: the elephant in the room?

Computational tools have dramatically increased the scale and depth of scientific analysis. Methods applied to spatial analysis have likewise been revolutionised, and hold tremendous potential for rigorous and scalable forms of urban analysis which may prove useful as benchmarking tools for principles espoused in urban theory and policy. However, computational constraints and coarse data sources have thus far tended to favour larger areal units of spatial aggregation combined with simplified proximity methods that do not fully exploit fine-scaled information and its potentially rich interpretation at the level of the street [1, 2]. This may undermine the relevance of such methods for the interpretation of urban morphological characteristics at the granular scale, because overarching gridded or zonal aggregations collapse the fine-scaled particularities which are otherwise present at the scale of the streetfront. Further, the bundling of data into spatially dissociated averages or rates discards the underlying prevalences and combinations of information in relation to individual data points, buildings, or plots. Attempts to project inferences from this larger, spatially blended scale of analysis onto a smaller and more specific context therefore invoke the Ecological Fallacy: correlations which are valid at the aggregate level can become misleading if interpreted in the disaggregated context [3]. Statistical observations for zones, neighbourhoods, cities, counties, or countries may therefore be far removed from the context of a local street corner.

As illustrated by Simpson’s Paradox, aggregations can furthermore mask confounding variables: a canonical example is the negative correlation that appears between smoking and the death rate unless the data are first stratified by age, which in this case is the confounding variable. Loss of information across ‘geography’ (space) and ‘history’ (time) may similarly confound spatially aggregated data [4]. This issue segues into the broader Modifiable Areal Unit Problem (MAUP): statistical observations derived from spatially aggregated data are sensitive to the scale of aggregation, to the arrangement of the data in relation to zonal extents, and to spatial autocorrelation in the variables. As a rule of thumb, larger aggregations increase sensitivity to MAUP because of a smoothing effect in the distribution of data and decreasing levels of variance, with the implication that correlation coefficients tend to strengthen as the unit of aggregation increases [5]. Variance is likewise affected by spatial autocorrelation of variables or by the movement of boundaries relative to the geographic locations of data points. Different spatial aggregations will therefore trigger fluctuating and sometimes questionable statistical inferences, particularly when applied across differing scales of analysis or between varied zonal configurations [6, 7]. These problems are inherent to the use of spatially aggregated data, yet no simple solutions exist and the issue has proved particularly intractable for multivariate analysis. Attempts to better define and manage the problem continue [8, 9].
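The scale effect described above can be illustrated with a small synthetic experiment. The sketch below is not drawn from the cityseer package: it assumes a hypothetical one-dimensional transect on which two variables share a smooth, spatially autocorrelated signal but carry independent noise, and the helper `zonal_correlation` is introduced purely for illustration. Averaging over larger contiguous zones smooths away the noise, so the correlation between zone means rises with zone size.

```python
import numpy as np


def zonal_correlation(x, y, zone_size):
    """Pearson correlation between the means of contiguous zones of a given size."""
    n_zones = len(x) // zone_size
    # Drop any trailing points that do not fill a complete zone.
    x_means = x[: n_zones * zone_size].reshape(n_zones, zone_size).mean(axis=1)
    y_means = y[: n_zones * zone_size].reshape(n_zones, zone_size).mean(axis=1)
    return float(np.corrcoef(x_means, y_means)[0, 1])


rng = np.random.default_rng(0)
n_points = 10_000

# Assumed setup: a smooth signal shared by both variables along a transect,
# with independent point-level noise added to each variable.
position = np.linspace(0, 20 * np.pi, n_points)
signal = np.sin(position)
x = signal + rng.normal(scale=2.0, size=n_points)
y = signal + rng.normal(scale=2.0, size=n_points)

# Zone size 1 is the disaggregated (point-level) case.
correlations = {size: zonal_correlation(x, y, size) for size in (1, 10, 100)}
for size, r in correlations.items():
    print(f"zone size {size:>3}: r = {r:.2f}")
```

The point-level relationship is weak, yet the same data appear strongly correlated once aggregated into zones of 100 points. Neither figure is wrong in itself; the difference lies in which scale the inference is then applied to.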

The increasing availability of spatially granular data sources, combined with increasing access to computational resources, has begun to tip the scales in favour of higher-resolution workflows that allow for more contextually precise analysis [10]. The confluence of rich data sources and street-network-based strategies heralds a shift from the aerial vantage point of the plan, traditionally the frame of reference for morphological analysis, to that of localised pedestrian-centric methods [11]: the pedestrian’s vantage point can literally become the anchor and point of departure for spatial analysis.
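To make the idea of a pedestrian-anchored analysis concrete, the following minimal sketch aggregates landuse counts outwards from a single street-network node using shortest-path distances. It uses plain NetworkX rather than the cityseer API, and the toy network, the `shops` counts, the `local_accessibility` helper, and the exponential distance decay with parameter `beta` are all assumptions made for illustration only.

```python
import math

import networkx as nx

# Toy street network: nodes are intersections, edge lengths are in metres.
G = nx.Graph()
G.add_edge("a", "b", length=100)
G.add_edge("b", "c", length=150)
G.add_edge("c", "d", length=300)
G.add_edge("a", "e", length=350)

# Hypothetical landuse counts (for example, shops) assigned to each node.
shops = {"a": 1, "b": 2, "c": 4, "d": 8, "e": 16}


def local_accessibility(graph, origin, values, max_dist, beta):
    """Sum values reachable within max_dist of origin along the network,
    weighting each by exp(-beta * distance) as a simple spatial impedance."""
    dists = nx.single_source_dijkstra_path_length(
        graph, origin, cutoff=max_dist, weight="length"
    )
    return sum(values[node] * math.exp(-beta * dist) for node, dist in dists.items())


# Unweighted count within 400 m of node "a": nodes a, b, c, and e are reachable.
print(local_accessibility(G, "a", shops, max_dist=400, beta=0.0))  # 23.0
# The same threshold with distance decay gives more weight to nearby landuses.
print(local_accessibility(G, "a", shops, max_dist=400, beta=0.005))
```

Because the result depends on the chosen origin and on distances measured along the network, each streetfront location receives its own metric rather than inheriting the average of a surrounding zone.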