pgRouting on IBM Cloud Databases for PostgreSQL Using Esri Shapefiles
We’re going to show you how to plan your travel around Seattle’s museums and sights, building on our introduction to pgRouting on IBM Cloud Databases for PostgreSQL.
Today, we are going to look at how to upload an Esri shapefile of Seattle streets, layer on a set of locations throughout Seattle, and then find the shortest paths to those points using the street network.
In the previous article, “Getting Started with pgRouting on IBM Cloud Databases for PostgreSQL,” we covered adding the extension to IBM Cloud Databases (ICD) for PostgreSQL, creating a dataset of nodes, connecting them together, and then finding the shortest path using the pgRouting routing functions pgr_dijkstra
and pgr_ksp
. Now, we’ll build on that.
Importing an Esri shapefile
If you don’t already have PostGIS installed and you’re using macOS with homebrew, you can install it using the terminal with:
Directions on how to install PostGIS on other platforms are found here.
We’ll also be using shapefiles to get the streets that we’ll be routing on and the tool we’ll be using is shp2pgsql, which comes with PostGIS out of the box. PostGIS has some good documentation on all its features for more advanced scenarios.
The shapefile we’ll be using is the City of Seattle street network. It’s a large ZIP file that includes all the streets within the Seattle area. Once you’ve downloaded it, open the ZIP file, and in it, you’ll have two folders named “StatePlain” and “WGS84.” We’ll use the shapefile included in the “WGS84” file. Why? Because it’s already set to SRID 4326, which is a frequently used SRID for geographic data.
Now, we’ll load the shapefile into PostgreSQL using the shp2pgsql
tool. The syntax we’ll use is the following:
Let’s go over the options we passed to shp2pgsql
. The two most common options that are frequently used are the -I
and -s
switches. The -I
switch automatically sets up the index for the geometry column, which is important for any geospatial data set, and the -s
switch sets the SRID to 4326. Our shapefile already is set to SRID 4326, but we’ll use the flag as a precaution.
The next two options focus on converting the geometry of the dataset. The -S
switch converts any multilinestrings to linestrings. pgRouting is known to have difficulties with multi-geometries and will only consider the first linestring in a multilinestring so it’s best to just do the conversion to avoid any issues that might arise. We’ll also use the -t
switch to set the dimensions of the streets to “2D,” or two dimensions. The reasoning behind this is that sometimes Z or M coordinates are added to indicate three-dimensional data for underpasses, intersections, etc. This can lead to data that’s not noded correctly leading to problems when creating a routing topology. Therefore, by setting the shapefile dimension to “2D”, we’re solving this problem.
The last parts of the command are the shapefile and an optional name of the table that will be created that’ll contain the shapefile data; we’ve called ours “streets.” That name is optional. By default, shp2pgsql
will create a table using the shapefile file name.
For the psql
command, you’ll have to substitute it by entering your ICD PostgreSQL credentials. You can find them by clicking on the Service Credentials link in the left panel and then generating a new credential by clicking on the New Credential button. Once you’ve done that, decode the certificate_base64
certificate and save that somewhere on your system. Then, within composed
, use that command to connect to PostgreSQL and substitute the location of your self-signed certificate with the default location of the PGSSLROOTCERT
.
Once we’ve executed shp2pgsql
, it will take a couple minutes for the shapefile to be converted and imported into PostgreSQL (depending on your internet connection). When it’s finished, you’ll have a table named “streets” in your database.
Now, let’s start exploring the table and preparing the data for pgRouting.
Preparing shapefiles for pgRouting
We have a table called “streets” inside our database, so we’ll now need to set up the table to use it with pgRouting. In the last article, we mentioned that tables containing routes, called edges, need to have “source,” “target,” and “cost” columns (or a column that will hold a cost value) in order to know where nodes begin and end and a meaningful cost to each edge.
Let’s set up three columns: “source,” “target,” and “length” (for cost). We can do that with these SQL commands:
Next, we’ll populate the “source” and “target” columns using pgr_createTopology
. In the last article, we populated these columns ourselves since we created the nodes and edges. With the imported street data, the edges and nodes have been created for us so we’ll let pgRouting figure out the source and target nodes by running:
This function takes the table name “streets,” a tolerance number, the name of the geometry column, and the name id field. We’re using the names geom
and gid
in the function because the default column names the function looks for are id
and the_geom
. As for the tolerance number, it should be as low as possible—the lower the better. The tolerance should be lower than the distance between nodes; otherwise, the topology function will think that some nodes are connected and they’ll be snapped together and we don’t want that otherwise some streets will be connected at the wrong intersections.
Running this function, the “source” and “target” columns will be populated with the source and target nodes for each edge. Additionally, a table called “streets_vertices_pgr” will be created which includes the id and geometry of each node in the “streets” table. This table serves as a lookup for each node that pgRouting goes to when finding the locations of the nodes.
The next column in the “streets” table that needs to be populated is the “length” column. This serves as the cost of each edge. For the length, we’ll use the edge length between the source and target nodes using the geom
column from streets. To do that, we’ll let PostGIS determine the length of each street segment using ST_Length
and casting the geom
column to the geography data type to get the street segment length in meters:
Once, we’ve done that, all the lengths will be inserted into the database and we’ll be set up to do the routing.
What about locations to route to?
Technically, we don’t need point locations to make pgRouting work; all you have to know are the source and target nodes for the origin and destination edges. That means we’ll have to figure out what the source and target nodes are for each street using, for instance, some SQL to find the street name from the “streets” table or by using a geospatial tool like QGIS to find the source and target node ids through a UI.
Instead of trying to figure out the nodes of each street, let’s overlay some locations and use those as source and target nodes. How will we do that?
First, let’s download a CSV file with the locations from a community-driven project by the City of Seattle called My Neighborhood Map. The CSV file has a lot of different points of interest, some of which will not relevant for our use case, such as traffic cameras. Next, create another table called “places.”
Use the following command in the shell to copy the CSV file into the “places” table:
Now we’ll populate the geom
column using PostGIS, set the SRID to 4326 like our streets, and add an index on the column:
Now we’ll update the place_id
column by using the “streets_vertices_pgr” which stored the locations of each node from “streets.” The place_id
column is where we’ll snap the locations to the street nodes using the by associating each street node with a location. Using some SQL, we can search for street nodes that are within a very minimal distance from each location using the PostGIS functionST_DWithin
and specifying an optional distance parameter with a distance of 0.001
meters. This function will populate the place_id
column with the ids of edges that are within 0.001 meters from a location.
You’ll find that some of the places in the table do not have an id associated with them. That’s because they are not within the specified distance to a street node. In this case, we can delete them or just ignore them. To delete them execute:
DELETE FROM places WHERE place_id IS NULL;
Routing between locations
We have streets and locations all set up for routing, and now we’ll use pgRouting to get the shortest routes between locations. If you’ve read the previous article “Getting Started with pgRouting” then you’ll be familiar with how to set up the routes. We’ll be using the same pgRouting routing functions here: pgr_dijkstra
and pgr_kps
.
Let’s say we want to go to the Museum of Flight and then we want to go to a concert at Benaroya Hall. Using pgr_dijkstra
, we’ll set up the query with:
In this query, we’re selecting the source and target nodes from their place_id
using the common_name
column to select them by name. Some names are repeated because they have two or more category names in city_feature
. Therefore, for this data set, you might have to specify the city_feature
, which we’ve done for Benaroya Hall; otherwise, you’ll get the following error when running the above query:
ERROR: more than one row returned by a subquery used as an expression
Executing this query in the shell will give us a long list of nodes with the edge geometries. However, to give this query a little more context, we’ll execute this query in QGIS using our “streets” and “place” tables giving us the route:
Using pgr_ksp
, the set up will be similar to that of pgr_dijkstra
within the function, but now we’ll add an extra parameter to indicate how many routes we want to be returned to the destination. In the following example, we’ll determine the three shortest paths from the Pacific Science Center to the Museum of History and Industry. Set up the query like this:
Again, giving this some context, we’ve put the query in QGIS, and the route looks something like:
You’ll notice that the majority of all three routes overlap. We’ve labeled all three routes in the image above. Number one is the fastest route, number two take another street but merges into number one, and number three takes a small side street off of number one, which is barely visible, but then merges back to the same route as number one.
To get an idea about the distances each route covers, we’ll use psql
. If we modify the query a little to get only the locations that each route goes through and the number of kilometers they have to travel to get from the Pacific Science Center to the Museum of History and Industry, we’d substitute everything in the query except what’s within the pgr_ksp
function:
We’ll be able to see each place that the route travels past and the total distance in kilometers:
Get routing!
In the two articles that have covered pgRouting, we’ve only scratched the surface of what’s available and how the extension can be used in your GIS applications. pgRouting is particularly interesting when using it with OpenStreetMap data, which we’ll cover in the near future. Nevertheless, we encourage you to explore more and do more with pgRouting on IBM Cloud Databases for PostgreSQL.
Learn more
-
Using OpenStreetMap Data with IBM Cloud Databases for PostgreSQL
-
Getting Started with pgRouting on IBM Cloud Databases for PostgreSQL
Enjoyed this article? Get started with Databases for PostgreSQL now.
Databases for PostgreSQL is a fully managed, enterprise-ready PostgreSQL service with built-in security, high availability, and backup orchestration. Learn more here.