Prediction of ‘artificial’ urban archetypes at the pedestrian-scale through a synthesis of domain expertise with machine learning methods
Abstract
The vitality of urban spaces has been steadily undermined by the pervasive adoption of car-centric forms of urban development, as characterised by lower densities, street networks offering poor connectivity for pedestrians, and a lack of accessible land-uses. Yet, even though these issues have been clearly framed for some time, the problem persists in new forms of planning. It is here posited that a synthesis of domain knowledge and machine learning methods allows for the creation of robust toolsets against which newly proposed developments can be benchmarked in a more rigorous manner, in the interest of greater accountability and better-evidenced decision-making.
A worked example develops a sequence of machine learning models generally capable of distinguishing ‘artificial’ towns from their more walkable and mixed-use ‘historical’ equivalents. The dataset is developed from morphological measures computed for pedestrian walking tolerances at a 20m network resolution for 931 towns and cities in Great Britain. It is computed using the cityseer-api Python package, which retains contextual precision and preserves relationships between the variables for any given point of analysis.
Using officially designated ‘New Towns’ as a departure point, a series of clues is developed. First, a supervised classifier (Extra-Trees) is trained through a process of iterative feedback, from which 185 ‘artificial’ locations are identified based on data aggregated to respective town or city boundaries. This information is then used to train supervised and semi-supervised (M2) deep neural network classifiers against the full-resolution dataset, where locations are assessed at a 20m network resolution using only the pedestrian-scale information available to each point of analysis. The models broadly align with intuitions expressed by urbanists and show strong potential for continued development.
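For concreteness, the first, supervised stage of such a pipeline might be sketched as follows. This is a minimal illustration rather than the study’s actual implementation: the feature matrix `X_towns`, the provisional labels `y_towns`, and the random placeholder data are hypothetical, standing in for the aggregated morphological measures and iteratively refined labels described above.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical inputs: one row per town, columns are morphological
# measures aggregated to town or city boundaries.
rng = np.random.default_rng(0)
X_towns = rng.normal(size=(931, 24))    # 931 towns, 24 placeholder measures
y_towns = rng.integers(0, 2, size=931)  # 1 = provisionally 'artificial'

# Extra-Trees: an ensemble of extremely randomised decision trees;
# 'balanced' class weights guard against label imbalance.
clf = ExtraTreesClassifier(n_estimators=500, class_weight="balanced", random_state=0)
print(cross_val_score(clf, X_towns, y_towns, cv=5, scoring="f1").mean())

# Iterative feedback (conceptual): fit, inspect confident disagreements
# between predictions and the current labels, relabel, and refit.
clf.fit(X_towns, y_towns)
disagreement = clf.predict_proba(X_towns)[:, 1] - y_towns
```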
Introduction: Prediction of urban archetypes with deep neural networks
In 2012, Krizhevsky, Sutskever, and Hinton (Krizhevsky et al. 2012) introduced a revolutionary machine learning model: it combined massive datasets, convolutional layers, and deep learning to attain best-in-class classification accuracies on the ImageNet database, consisting of more than 15 million images in over 22,000 categories. The ideas and techniques were not necessarily new, but the authors noted that the depth of their neural network, in this case five convolutional layers and three fully connected layers, was pivotal to the model’s performance. The subsequent upsurge in the use of ever-larger datasets combined with ever-deeper neural networks has led to machine learning models with human, or better than human, levels of performance. Deep learning has attained a near-mythical status and is a prevalent feature in the rapid development of AI, with Silver et al. (2017) going on to claim that AlphaGo, an AI underpinned by deep learning, had learned ‘superhuman proficiency’ in the game of Go from scratch, without the aid of human knowledge.
On closer scrutiny, claims that such systems are truly capable of developing human-like intelligence, especially from a ‘tabula rasa’, tend to be overstated, and further breakthroughs will likely be required before truly generalisable artificial intelligence can become a reality (Mitchell 2021). Marcus (2018) argues that AlphaGo’s intelligence is not truly ‘innate’: human knowledge has entered the system in the form of a Monte Carlo tree search algorithm, thus empowering the system with the techniques necessary to learn solutions specific to the challenge at hand. Further, the model’s intelligence is not generalisable: learning other games implies retraining, and the model cannot solve broader classes of problems that young children may trivially solve. The tremendous volumes of data and the great difficulty in generalising deep neural nets to other problems draw sharp contrasts to the human mind (Sinz et al. 2019), whose innate structures appear to facilitate an ability to form rapid and powerful abstractions that generalise well to varied forms of problem-solving. These challenges underscore an important and oft-understated reality: deep learning is a tremendously powerful, but also fickle, tool (Marcus 2018a). It is brittle by nature: data-hungry, narrowly focused, and easily fooled. Neural networks may learn patterns but cannot ‘see the forest for the trees’. If representative patterns are not present in the data or go undetected by a model’s structure or loss function, the model ‘does not know what it does not know’. Such models may consequently behave contrary to best intentions by being needlessly complex (Rudin & Radin 2019), biased or ignorant of unrepresented or unfairly represented classes within the data (Rudin 2019; Corbett-Davies 2018), and are generally difficult to develop or reproduce (Henderson 2017).
The proverbial notion that machine learning is an autonomous technology capable of magically conjuring meaning out of meaningless jumbles of data, and that deep-learning-infused AI and robotics technologies will soon usher in a utopian future, should therefore be met with scepticism. However, it is also important to note that nascent machine learning methods remain amongst the most powerful and valuable tools currently at the disposal of the scientific community, and that many of the perceived shortcomings are attributable to a disconnect between the hype associated with the models and an otherwise more realistic understanding of their nature and limitations. The contributions of humans to model development tend to be understated (Marcus 2018b): for these models to be meaningful and trustworthy, they require large amounts of domain-specific information imparted at various stages of the development process. In this sense, ML is a powerful sidekick, but one that is potentially prone to naive assumptions or misbehaviour if left entirely to its own devices. The models require interaction and oversight in a process akin to a ‘dance with data’. Datasets have to be selected and prepared in a manner that accurately represents the nature of the data that we want the algorithms to learn, targets and loss functions coerce models in the right direction, and regularisation methods and testing procedures are necessary to ensure that models are capable of generalising to unseen samples in a manner that is realistic and fair for the task at hand.
Urban scientists consequently need to be aware of how datasets, data science methods, and machine learning models may ultimately affect day-to-day decisions and policies (Duarte 2020), and how misinformed models may end up being used to justify courses of action affecting city-citizens and the urban environment for the worse. There is a danger in chasing misguided accuracy metrics or ‘buzz-friendly’ marketing pitches: models can be accurate, but meaningless. A simple and not uncommonly encountered example is the application of simple error or accuracy rates to unbalanced datasets. Class imbalances regularly arise in real-world data analysis when the labels for one class substantially outnumber those of another. Credit card fraud data provides an extreme example, where the minority class (fraudulent transactions) may be vanishingly small relative to the majority class. When training a classifier against an unbalanced dataset using simple accuracy rates, the algorithm may opt to completely ignore the minority class (e.g. inferring that all credit card transactions are not fraudulent) while claiming an accuracy approaching 100%.
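To make this accuracy paradox concrete, the following toy sketch (using synthetic data and scikit-learn as an illustrative stand-in, not this paper’s own code) shows how a classifier that never predicts the minority class can still report near-perfect accuracy:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

# Synthetic 'transactions': roughly 1% belong to the minority (fraud) class.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 5))
y = (rng.random(100_000) < 0.01).astype(int)

# A degenerate classifier that ignores the features entirely and always
# predicts the majority class ('not fraudulent')...
clf = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = clf.predict(X)

# ...reports an accuracy approaching 100% while detecting zero fraud.
print(f"accuracy: {accuracy_score(y, pred):.3f}")  # ~0.99
print(f"F1 score: {f1_score(y, pred):.3f}")        # 0.0
```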
Various strategies exist for tempering class imbalance problems: undersampling the majority class, oversampling the minority class, or adjusting the costs associated with losses from the respective classes (Chawla et al. 2004); the use of more nuanced accuracy metrics such as Receiver Operating Characteristic curves (the true positive rate plotted against the false positive rate) or F1 scores (the harmonic mean of precision and recall) (García et al. 2009); and calibration techniques for correcting the distributions of probabilistic classifications (Pozzolo et al. 2015). Nevertheless, the application of such techniques requires intervention through the role of an informed data scientist who, in turn, needs to be aware of the potential presence of such imbalances and how overlooking these may have far-reaching ramifications. This example reflects the broader issue: the development of predictive machine learning models may require a substantial degree of nurturing, testing, and oversight to understand how a model ‘thinks’ and ‘reacts’ to the data, and to guard against unintended forms of behaviour. Visualisation methods can thus be important because they facilitate comprehension of how the models work while allowing domain experts, who may not have direct knowledge of how these models function at a lower level, an opportunity to provide feedback on suspicious forms of predictive behaviour.
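A minimal sketch of several of these mitigation strategies, again on synthetic data with scikit-learn as an illustrative stand-in: class weighting for cost adjustment, ROC AUC and F1 as more nuanced metrics, and probability calibration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic unbalanced dataset: roughly 2% minority class.
X, y = make_classification(n_samples=50_000, weights=[0.98], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Cost adjustment: weight losses inversely to class frequencies.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)

# Nuanced metrics in place of raw accuracy: ROC AUC and F1.
proba = clf.predict_proba(X_te)[:, 1]
print(f"ROC AUC: {roc_auc_score(y_te, proba):.3f}")
print(f"F1:      {f1_score(y_te, clf.predict(X_te)):.3f}")

# Calibration: correct the distribution of predicted probabilities.
calibrated = CalibratedClassifierCV(clf, method="isotonic", cv=5).fit(X_tr, y_tr)
```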
While data science methods can be misused for any variety of problematic workflows or end-purposes, these methods also hold the potential for scalable and rigorous forms of sensible analysis if used with sufficient safeguards and rigorous oversight from those with detailed knowledge of the domain of interest. Conversely, it bears emphasis that throngs of architects, urban designers, planners, engineers, civic officials, and NIMBYs have, in turn, been directly responsible for a trail of ill-conceived urban interventions, and this cannot be blamed on statistics or models so much as a human proclivity towards reductionism and self-interest. Although humans are better than machines at generalising problems, they can also be susceptible to wistful narratives or easily waylaid by idealistic pursuits or profit-driven motives. Further, even where skilled and perceptive urban designers and planners are well aware of the implicit biases underpinning problematic planning proposals, they may be at a loss to bolster better-informed decision-making against hearsay or political pressures. Against this backdrop, an interesting question can be posed: can we connect the strong suits of domain experts, who may intuitively understand the issues at hand, to the strong suits of algorithms capable of exhaustively exploring and laying bare the solution space in a robust and scalable manner? How might tools that synthesise qualitative knowledge with quantitative approaches build an accountable evidence base within the context of politically wrangled decision-making processes?