1st May, 2021
Gareth Simons

Untangling urban data signatures: Unsupervised machine learning methods for detection of urban archetypes

Abstract

Urban morphological measures applied at a high resolution of analysis may yield a wealth of data describing varied characteristics of the urban environment in a substantial degree of detail; however, these forms of high-dimensional datasets are not immediately relatable to wider constructs rooted in conventional conceptions of urbanism. Data science and machine learning provide an opportunity to explore such data through unsupervised methods, which reduce the dimensionality of the data while recovering latent themes and identifying characteristic patterns that may resonate with urbanist discourse more generally.

Dimensionality reduction and clustering methods including Principal Component Analysis (PCA), Variational Autoencoders (β-VAE), and an autoencoder-based Gaussian Mixture Model (VaDE) are discussed and demonstrated for purposes of ‘untangling’ data: unveiling themes which may be used to bridge quantitative and qualitative descriptions of urbanism. The methods are applied to a morphological dataset for Greater London consisting of a combination of centrality, landuse accessibility, mixed-uses, and population density measures which have been computed at pedestrian walking thresholds ranging from 100m to 800m. The spatial aggregations and morphological measures are computed at a 20m network resolution using the cityseer-api Python package, which utilises a local windowing methodology with distances computed directly over the network and with aggregations performed dynamically and with respect to the direction of approach, thus preserving the relationships between the variables and retaining contextual precision.

Whereas the demonstrated methods hold tremendous potential, their power is difficult to convey or fully exploit using conventional lower-dimensional visualisation methods. This underscores the need for subsequent research into how such methods may be coupled to interactive visualisation methods to further elucidate the richness of the data and its potential implications.

Link to preprint paper pending.

Introduction: Detection of urban archetypes with unsupervised machine learning methods

Vibrant pedestrian districts manifest an affinity for complexity and its requisite diversity, as do complex systems more generally [1]. Yet, urban masterplans have historically demonstrated a proclivity towards reductionism. Cities were increasingly rearranged around motor-vehicles and were reconceived in the abstract on drawing boards: the more granular, dense, mixed-use, and visually ‘messy’ artefacts of evolved cities were steamrolled to make way for grandiose compositions that were ultimately too large, too homogeneous, and too resistant to change for pedestrian-based forms of urbanism to thrive [2, 3]. Though initially associated with high-modernism, aspects of these patterns are still prevalent in various forms of contemporary urbanism manifesting across the spectrum from suburbia to romanticised smart city masterplans, explicitly or implicitly emphasising idealised efficiencies at the expense of complexity [4, 5, 6]. This paradigm can stifle the oft unpredictable and chaotic forms of interaction aiding processes of discovery and diffusion within complex adaptive systems [7, 8].

The complex systems interpretation of cities, replete with dynamics from emergence to non-linearities to phase-changes [9, 10, 11, 12], resists simple averages and crude models [13]. This presents a challenge: architects, urban designers, and planners are faced with the dilemma of how to plan for inherently unpredictable processes at the urban scale [14, 15]. Whereas it is not possible to anticipate every last action of city citizens, or how these chains of interaction might bifurcate or coalesce through space, it is possible to gauge, more generally, how certain forms of urbanism may be more conducive to large numbers of permutations of complex interactions [16]. Complex systems derived methods, including network centralities and landuse diversity measures, are proxies for urban complexity: they foreshadow networks of potential interactions available to city citizens. Whereas our ability to model the full complexity of urban systems will always be constrained, not least by sensitivity to initial conditions, it remains possible to explore the spatial manifestations of complex processes present in historical cities and to compare these to new forms of development. The question then arises: if we apply localised centrality and mixed-use measures at a sufficiently precise and high resolution of analysis, then are emerging forms of data analysis and unsupervised machine learning methods useful for purposes of ‘untangling’ and ‘sifting-out’ signature patterns from the mass of ensuing data? These terms are used in the literal sense because large and high-dimensional datasets require teasing apart to reveal latent themes which may ultimately help bridge the gap from quantitative forms of urban analytics to qualitatively framed conceptions of cities [14].

At first glance, the use of data science or machine learning models to understand or even predict aspects of healthy cities may appear ironic and misguided given planning’s problematic past. Examples such as Robert Moses’ ruthless dismemberment of New York City, in which neighbourhoods were razed to make way for motorways [17], and failed housing schemes such as Pruitt-Igoe, grotesque reconceptions of urbanism devoid of human scale, are stark reminders that reliance on shallow or overly abstract interpretations of urbanism can lead to misguided decisions and problematic forms of urban policy. It is precisely these issues that provoked Jane Jacobs’ The Death and Life of Great American Cities [13], which would forever change perspectives on urbanism. Jacobs laments that planners had misconstrued the nature of cities and had mistakenly assumed that decisions made in the abstract were somehow sufficient to deal with the emergent complexity of healthy cities. Her thoughts were articulated in reference to Warren Weaver’s seminal paper Science and Complexity [18], which casts the nature of scientific problems into three classes: “problems of simplicity”, which can be described and modelled using simple sets of variables and equations that behave predictably; “problems of disorganised complexity”, where the behaviour of large quantities of elements such as gas molecules in a container can be modelled collectively using statistical methods even though the behaviour of each constituent part is chaotic; and, heralding the complexity sciences, “problems of organised complexity”, which do not adequately yield to either of the aforementioned approaches and present some difficulty in solving because they exhibit non-linear, emergent, and adaptive behaviours.
She places the nature of cities squarely in the last category, problems of organised complexity, entailing “a sizable number of factors which are interrelated into an organic whole” and presenting “situations in which a half-dozen or even several dozen quantities are all varying simultaneously and in subtly interconnected ways” [13]. She explains that attempts at planned settlements tended to eschew complexity in favour of, per Garden Cities, dumbed-down linear ratios between abstract quantities such as housing and jobs or, as may be symbolised by Le Corbusier’s Radiant City, conceptions of urbanism rooted in larger-scale abstractions wherein people are reduced to statistical aggregations that could, again, be treated as simpler linear combinations of variables. By recasting cities through a reductionist lens, planners had come to make decisions that were detached from the complexities of functioning neighbourhoods and treated inhabitants as simplistic aggregations tantamount to “grains of sand, or electrons or billiard balls” [13]. Jacobs proceeds to offer a prescription: our understanding of cities should instead be developed out of “the microscopic or detailed view … rather than on the less detailed, naked-eye view suitable for viewing problems of simplicity or the remote telescopic view suitable for viewing problems of disorganized complexity” [13]. She outlines three principles to this end. First, to think about city ‘processes’: elements within cities have different effects depending on their combinations with other elements and the varied interactions between them. Secondly, to reason from an inductive rather than deductive approach, meaning from particularities to generalisations instead of from generalities to particulars. Thirdly, to look for ‘unaverage’ clues such as peculiarities, outliers, or nascent trends that help elucidate the workings of cities rather than fixating on statistical methods rooted in large-scale ‘averages’ which may offer little explanation for how constituent elements may be operating within a complex system, particularly if applied at a more localised scale.

A knee-jerk reaction may be to reject machine learning out-of-hand for its links to mathematics and statistics more generally; however, on closer scrutiny the synthesis of locally precise urban morphological measures combined with machine learning methods affords the use of highly detailed datasets capable of capturing and preserving contextual particularities; facilitates use of high-dimensional datasets with significant assortments of variables and potentially complex and varied non-linear relationships between them; and, in the form of unsupervised methods combined with deep neural networks, allows for structures to be unearthed directly from within the data without the imposition of externally held theories or formulas. Compared to traditional statistical methods applied to larger spatial aggregations, the application of machine learning to high resolution spatial data resembles an approach that is more akin to proceeding from the particular to the general: in spite of the large volumes of information the data-space is (in effect) explored ‘line-by-line’ with model losses computed and updated over comparatively small batches of data. Patterns are ‘sniffed-out’ using exploratory and bottom-up-like procedures with the more prevalent of these congealing over successive iterations to reveal thematic patterns that have emerged from within the data. It must be emphasised that this reasoning only holds true if working with localised metrics gathered at a sufficiently high resolution of analysis and with the measures processed directly from each location. The use of intervening levels of spatial aggregation, interpolation from larger to smaller units of scale, or overly large units of analysis needs to be eschewed because these would otherwise result in the attrition of information and, critically, discard or otherwise mask local-scale inter-relationships between the variables.
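The batch-wise character of this process can be illustrated with a minimal sketch. It is not the paper's β-VAE or VaDE model: it is a plain linear autoencoder written in NumPy, trained on synthetic data that merely stands in for a table of localised morphological measures. The dataset shape, the two latent ‘themes’, the learning rate, and all variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a morphological dataset: 2,000 network
# locations, each with 12 measures driven by 2 latent "themes" plus noise.
n_samples, n_features, n_latent = 2000, 12, 2
themes = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = themes @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each measure

# Linear autoencoder: encode 12 measures to 2 dimensions, then decode.
W_enc = 0.1 * rng.normal(size=(n_features, n_latent))
W_dec = 0.1 * rng.normal(size=(n_latent, n_features))


def reconstruction_loss(data):
    """Mean squared reconstruction error over the given rows."""
    return float(np.mean((data @ W_enc @ W_dec - data) ** 2))


initial_loss = reconstruction_loss(X)
batch_size, learning_rate, n_epochs = 64, 0.01, 50

for epoch in range(n_epochs):
    order = rng.permutation(n_samples)  # visit locations in random order
    for start in range(0, n_samples, batch_size):
        batch = X[order[start:start + batch_size]]
        latent = batch @ W_enc
        error = latent @ W_dec - batch
        # Gradients of the batch squared error (up to a constant factor),
        # both computed before either weight matrix is updated.
        grad_dec = latent.T @ error / len(batch)
        grad_enc = batch.T @ (error @ W_dec.T) / len(batch)
        W_dec -= learning_rate * grad_dec
        W_enc -= learning_rate * grad_enc

final_loss = reconstruction_loss(X)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Each weight update sees only 64 locations at a time, yet the shared structure across batches accumulates in the weights over successive passes. In this sketch `X @ W_enc` gives each location's coordinates in the reduced two-dimensional space.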

Traditional forms of urban morphological analysis have been difficult to apply at scale because of reliance on manually collated observations and wearisome calculations. Geographic Information Systems (GIS) have permitted larger scales of quantitative analysis, but the lack of large and granular data sources combined with computational constraints meant that these methods have tended to be applied against larger units of spatial aggregation and relied on simplified distance metrics [19, 20]. More recently, however, the increased availability of detailed datasets has facilitated a finer scale of analysis while retaining the ability to process larger areal extents [21], thus prompting the adoption of multi-variable and multi-scalar workflows. The ensuing large and high-dimensional datasets can be combined with unsupervised exploratory methods, and have engendered interest in how urban morphological analysis can be applied not only to the exposition of existing cities, but also in the capacity of a rigorous design-aid for newly planned forms of development [22, 23, 24, 25].



  1. Page S. Diversity and Complexity. Princeton: Princeton University Press; 2011.
  2. Harvey D. The condition of postmodernity: an enquiry into the origins of cultural change. Oxford: Basil Blackwell; 1989.
  3. Lyon D. Postmodernity. Second Edi. Minnesota: University of Minnesota Press; 1999.
  4. Greenfield A. Against the smart city. 1.2. Author; 2013.
  5. Townsend AM. Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia. New York: W. W. Norton & Company; 2013.
  6. Sterling B. The Epic Struggle of the Internet of Things. First Edit. Strelka Institute for Media, Architecture; 2014.
  7. Kauffman SA, Johnsen S. Coevolution to the Edge of Chaos: Coupled Fitness Landscapes, Poised States, and Coevolutionary Avalanches. J theor Biol [Internet]. 1991;149:467–505. Available from: https://ac-els-cdn-com.libproxy.ucl.ac.uk/S0022519305800943/1-s2.0-S0022519305800943-main.pdf?%7B%5C_%7Dtid=11f38742-1339-11e8-9a6f-00000aacb35f%7B%5C&%7Dacdnat=1518799827%7B%5C_%7Df6a0ee5be10600e848f09bca22a07f9c
  8. Kauffman S. At Home in the Universe. Oxford: Oxford University Press; 1995.
  9. Batty M, Longley P. Fractal Cities: A Geometry of Form and Function. London: Academic Press; 1994.
  10. Batty M. Cities and Complexity: Understanding Cities with Cellular Automata, Agent-Based Models, and Fractals. Cambridge, MA: MIT Press; 2005.
  11. Batty M. The New Science of Cities. Cambridge, MA: MIT Press; 2013.
  12. Allen PM. Cities: The Visible Expression of Co-evolving Complexity. In: Portugali J, Meyer H, Stolk E, Tan E, editors. Complexity Theories of Cities Have Come of Age. London: Springer; 2012. p. 67–89.
  13. Jacobs J. The Death and Life of Great American Cities. Vintage Bo. New York: Random House; 1961.
  14. Portugali J. Complexity Theories of Cities: Achievements, Criticism and Potentials. Portugali J, Meyer H, Stolk E, Tan E, editors. London: Springer; 2012.
  15. Marshall S. Cities Design and Evolution. New York: Routledge; 2009.
  16. Alexander C. A City is Not a Tree. Ekistics [Internet]. 1967;23(139):344–8. Available from: http://www.jstor.org.libproxy.ucl.ac.uk/stable/pdf/43614532.pdf?refreqid=excelsior%7B%5C%25%7D3Ac511aa087d15f84bf0cf4e8714b25e4e
  17. Flint A. Wrestling with Moses: How Jane Jacobs Took on New York’s Master Builder and Transformed the American City. New York: Random House; 2011.
  18. Weaver W. Science and Complexity. American Scientist [Internet]. 1948;36(536). Available from: http://people.physics.anu.edu.au/%7B~%7Dtas110/Teaching/Lectures/L1/Material/WEAVER1947.pdf
  19. Logan TM, Williams TG, Nisbet AJ, Liberman KD, Zuo CT, Guikema SD. Evaluating urban accessibility: leveraging open-source data and analytics to overcome existing limitations. Environment and Planning B: Urban Analytics and City Science [Internet]. 2017 Nov;46(5):897–913. Available from: https://doi.org/10.1177/2399808317736528
  20. Araldi A, Fusco G. Urban Form from the Pedestrian Point of View: Spatial Patterns on a Street Network. In: SiTI, sui Sistemi Territoriali per l’Innovazione ISMB IS, di Torino ISMBDP, editors. 9th International Conference on Innovation in Urban and Regional Planning (INPUT 2016) [Internet]. Turin, Italy; 2016. p. 32–8. (E-agorà for the transition towards resilient communities - INPUT 2016, Conference Proceedings). Available from: https://hal.archives-ouvertes.fr/hal-01417484
  21. Araldi A, Fusco G. From the street to the metropolitan region: Pedestrian perspective in urban fabric analysis. Environment and Planning B: Urban Analytics and City Science [Internet]. 2019 Aug;46(7):1243–63. Available from: https://doi.org/10.1177/2399808319832612
  22. Serra M, Gil J, Pinho P. Towards an understanding of morphogenesis in metropolitan street-networks. Environment and Planning B: Urban Analytics and City Science [Internet]. 2016 Dec;44(2):272–93. Available from: https://doi.org/10.1177/0265813516684136
  23. Gil J, Montenegro N, Beirão J, Duarte J. On the Discovery of Urban Typologies: Data Mining the Multi-dimensional Character of Neighbourhoods. Computation: The New Realm of Architectural Design [27th eCAADe Conference Proceedings / ISBN 978-0-9541183-8-9] Istanbul (Turkey) 16-19 September 2009, pp. 269-278. 2009.
  24. Berghauser Pont M, Stavroulaki G, Gil J, Marcus L, Serra M, Hausleitner B, et al. Quantitative Comparison of Cities: Distribution of street and building types based on density and centrality measures. 2017.
  25. Berghauser Pont M, Stavroulaki G, Bobkova E, Gil J, Marcus L, Olsson J, et al. The spatial distribution and frequency of street, plot and building types across five European cities. Environment and Planning B: Urban Analytics and City Science [Internet]. 2019 Aug;46(7):1226–42. Available from: https://doi.org/10.1177/2399808319857450