City-scale simulations and visualization


Krishna Kumar, krishnak@utexas.edu



UT Austin. 26 April 2019

Smart city

Queries: Big, Fast and Rich

Relational database

Humidity? Vibration? 3D point cloud data? …

PostGIS

  • Geographic objects to the PostgreSQL object-relational database
  • Geometry types for Points, LineStrings, Polygons, MultiPoints, MultiLineStrings, MultipPolygons and GeometryCollections.
  • Spatial operators for determining geospatial measurements like area, distance, length and perimeter.
  • Spatial operators for determining geospatial set operations, like union, difference, symmetric difference and buffers (provided by GEOS).
v.to.db map=roads option=length type=line col=linelength units=me

Spatio-Temporal Query

  • Semantic analytics tools have primarily focused on thematic relationships, but spatial and temporal relationships are often critical components in analytical domains.
  • Current GIS and spatial database technology does not support complex thematic analytics operations.
    • GIS - modelling and analysing spatial and temporal relationships
    • Thematic entities and their relationships are not explicitly and independently represented, making analysis of these relationships difficult.

Smart cities standard

  • Smart cities standard (PAS 182) offers a handful of generic concepts (such as place, observation, metric etc.) to formulate a common language for linking data across organizations in a city.
  • The main goal is to seamlessly integrate information from multiple organisations using Linked Data (Semantic Web technology) that can be shared and edited.
  • Different from BIM, Smart Cities allow for the sharing of critical but less tangible aspects that relate to the data: such as related Assumptions and Objectives.

Linked Data - Semantic Web

Linked Data - example

Linked Data - Concept model

Linked Data - Time Slices

Temporal RDF

Temporal RDF extends the RDF statement from a triple to a quad where the fourth element is the valid time of the RDF statement

Graph processing

Graph optimisation

Big Data - City scale modelling

Gerry Casey, Bing Yu, Kenichi Soga, Peter Guthrie, Elisabete Silva, Krishna Kumar and Mietek Bak

SQL v NoSQL - Google directions API

London journey times

London journey times

London journey times

Traffic saturation curves

Agent based modelling (ABM)

Graph data structure

London graph network: Macroscopic view

London graph network: Microscopic view

London graph network: Edge weights

Geospatial conflation

Geospatial differences: Google (dark blue) versus ITN (light blue). Google use epsg:4326 and the ITN uses epsg:27700

ABM time step

Spatial v Temporal complexity

Spatial

Temporal

MapReduce v In-memory

MapReduce

  • Batch processing
  • Larger set of data to extract features and correlations

In-memory

  • Real-time predictions
  • GraphX: Graph parallel computing

ABM hadoop map-reduce

Spark RDD

The resilient distributed dataset (RDD) is an immutable, fault-tolerant distributed collection of objects that can be operated on in parallel. RDDs are divided into smaller chunks called Partitions, and when you execute some action, a task is launched per partition.

ABM system architecture

  • Decentralised data structures
  • Graph size: ~250k nodes and ~800k edges
  • ~1.5 GB (.gml) to 43 MB (.json.gz)

ABM hadoop architecture

Agent based modelling (ABM) simulations with LSOA

ABM simulations with LSOA and Page Rank

Dynamic computations

  • Shortest path at time 't'
  • Shortest path considering emissions
  • Shortest path re-computation in response to an event
  • 1 million agents in ~1 hour

Agent based modelling (ABM)

Agent based modelling

ABM: London PDF jouney distance

ABM: London PDF jouney time

Computational performance

HDInsight Spark Cluster.

  • 6 nodes, where there were two head node D12v2 with 8 cores and 4 worker nodes D13v2 with 32 cores.
  • St Pancras, London City Airport and Heathrow Airport (4835 origins and 3 destinations) hub
  • run over 5 time slices, from 9am to 1pm and thus a total of 34 million agents were simulated.
  • significant time saving, from over 2 hours on a single node to around 15 minutes on the 6 node cluster.

Computational performance: Shortest path

Computational performance

Parallelism v Concurrency

Shared v Distributed memory architectures

Cache

ABM runtime reduction (Ligra)

Dijkstra

Priority queue

Priority queue

Binary Heaps have average O(1) for findMin/findMax and O(logn) for insertion and deletion.

ABM scaling time

ABM scaling

ABM computational performance

C++ STL containers: Vector

Vectorization

Non-vectorized

Vectorized

C++ STL containers: Map

Big-O STL containers

Map hashing

Hash collisions

Robinhood hashing

Robinhood hashing performance

Big Data visualisation

Tile based visualisation

WebGL - GLSL (Shading language)


GLSL - MathBox WebGL

GPU

Super computing

Alien Isolation

Tokyo ABM: Deck.gl


Thank you!
and
Hook 'em!