City-scale simulations and visualization

Krishna Kumar, krishnak@utexas.edu

UT Austin. 26 April 2019

Smart city

Queries: Big, Fast and Rich

Relational database

Humidity? Vibration? 3D point cloud data? …

PostGIS

Geographic objects to the PostgreSQL object-relational database
Geometry types for Points, LineStrings, Polygons, MultiPoints, MultiLineStrings, MultipPolygons and GeometryCollections.
Spatial operators for determining geospatial measurements like area, distance, length and perimeter.
Spatial operators for determining geospatial set operations, like union, difference, symmetric difference and buffers (provided by GEOS).

v.to.db map=roads option=length type=line col=linelength units=me

Spatio-Temporal Query

Semantic analytics tools have primarily focused on thematic relationships, but spatial and temporal relationships are often critical components in analytical domains.
Current GIS and spatial database technology does not support complex thematic analytics operations.

GIS - modelling and analysing spatial and temporal relationships
Thematic entities and their relationships are not explicitly and independently represented, making analysis of these relationships difficult.

Smart cities standard

Smart cities standard (PAS 182) offers a handful of generic concepts (such as place, observation, metric etc.) to formulate a common language for linking data across organizations in a city.
The main goal is to seamlessly integrate information from multiple organisations using Linked Data (Semantic Web technology) that can be shared and edited.
Different from BIM, Smart Cities allow for the sharing of critical but less tangible aspects that relate to the data: such as related Assumptions and Objectives.

Linked Data - Semantic Web

Linked Data - example

Linked Data - Concept model

Linked Data - Time Slices

Temporal RDF

Temporal RDF extends the RDF statement from a triple to a quad where the fourth element is the valid time of the RDF statement

Graph processing

Graph optimisation

Big Data - City scale modelling

Gerry Casey, Bing Yu, Kenichi Soga, Peter Guthrie, Elisabete Silva, Krishna Kumar and Mietek Bak

SQL v NoSQL - Google directions API

London journey times

Traffic saturation curves

Agent based modelling (ABM)

Graph data structure

London graph network: Macroscopic view

London graph network: Microscopic view

London graph network: Edge weights

Geospatial conflation

Geospatial differences: Google (dark blue) versus ITN (light blue). Google use epsg:4326 and the ITN uses epsg:27700

ABM time step

Spatial v Temporal complexity

Spatial

Temporal

MapReduce v In-memory

MapReduce

Batch processing
Larger set of data to extract features and correlations

In-memory

Real-time predictions
GraphX: Graph parallel computing

ABM hadoop map-reduce

Spark RDD

The resilient distributed dataset (RDD) is an immutable, fault-tolerant distributed collection of objects that can be operated on in parallel. RDDs are divided into smaller chunks called Partitions, and when you execute some action, a task is launched per partition.

ABM system architecture

Decentralised data structures
Graph size: ~250k nodes and ~800k edges
~1.5 GB (.gml) to 43 MB (.json.gz)

ABM hadoop architecture

Agent based modelling (ABM) simulations with LSOA

ABM simulations with LSOA and Page Rank

Dynamic computations

Shortest path at time 't'
Shortest path considering emissions
Shortest path re-computation in response to an event
1 million agents in ~1 hour

Agent based modelling (ABM)

Agent based modelling

ABM: London PDF jouney distance

ABM: London PDF jouney time

Computational performance

HDInsight Spark Cluster.

6 nodes, where there were two head node D12v2 with 8 cores and 4 worker nodes D13v2 with 32 cores.
St Pancras, London City Airport and Heathrow Airport (4835 origins and 3 destinations) hub
run over 5 time slices, from 9am to 1pm and thus a total of 34 million agents were simulated.
significant time saving, from over 2 hours on a single node to around 15 minutes on the 6 node cluster.