IBM Analytics Engine and Watson Studio now support a wide variety of geospatial functions as native components of these services. 

The library was developed by IBM Research and already provides geospatial and temporal functions for various other IBM products. It is now available in these two cloud services as well. With this feature, Watson Studio (in its Spark environments) and Analytics Engine support complex geospatial functions and extend data science to location analytics.

Key features of spatiotemporal functions

  • Geodetic function support: all functions are accurate for all geometries without the need for projections, including large geometries such as entire countries or hemispheres and geometries near the poles or the anti-meridian.
  • Distributed execution: in Analytics Engine and the Watson Studio Spark environments, all geospatial functions are fully distributable and can take advantage of Spark’s native distributed processing, which improves overall performance by spreading work across the cluster.
  • Extensions of Spark distributed joins to perform spatial, temporal, and spatiotemporal joins.
  • Native geohash support for arbitrary geometries, which can be used for simple aggregations and for spatial indexing of object storage, improving cloud storage retrieval.
  • SQL/MM extensions to Spark SQL, providing native SQL support for geospatial functions.
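
To make the geohash idea above more concrete, here is a minimal sketch of the standard public geohash encoding for a single point, written in plain Python. It is not the IBM library's API, and it handles points only rather than arbitrary geometries; the function name geohash_encode and the sample coordinates are assumptions made for illustration. It shows why geohashes suit simple aggregations: nearby points share a common prefix, so grouping by prefix groups by location.

```python
from collections import Counter

# Standard geohash base-32 alphabet (no a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a latitude/longitude pair as a geohash string.

    Illustrative helper (hypothetical name), not part of any IBM API.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Simple aggregation: count sample points per 3-character geohash cell.
sample_points = [(42.6, -5.6), (42.61, -5.59), (57.64911, 10.40744)]
cell_counts = Counter(geohash_encode(lat, lon, 3) for lat, lon in sample_points)

print(geohash_encode(42.6, -5.6, 5))  # ezs42
print(cell_counts["ezs"])  # 2
```

A shorter prefix corresponds to a larger cell, so the precision argument controls how coarse the aggregation is.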

Why geospatial support?

Location sensors have become first-class citizens in a wide range of devices, spanning phones, connected cars, and IoT sensor feeds. Fusing geographic feature data (e.g., zip code polygons, address features) with this location sensor data is key to enriching it with contextual information. Adding such location context to most enterprise problems provides valuable business insights.

Geospatial information, either by itself or in combination with traditional relational data, can help institutions and businesses do things like decide in which areas to provide services or determine the locations of possible markets. For example:

  • The manager of a county welfare district can verify which welfare applicants and recipients actually live within the area that the district services. This can be done by analyzing the geometry of the service area and the addresses of the applicants and recipients.
  • The owner of a restaurant chain wants to open new restaurants in nearby cities and needs answers to questions such as:
    • Where in these cities are concentrations of the types of people who typically frequent restaurants like those in the chain?
    • Where are the major highways? 
    • Where is the crime rate lowest? 
    • Where are competing restaurants located?
  • A business analyst in a health insurance company wants to determine whether there are primary care providers within a 15-mile driving distance of each customer. A list of health care providers, customized to each customer's needs, can then be suggested proactively.
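
The welfare district example above reduces to a point-in-polygon test: is each applicant's address inside the service area? The sketch below uses the classic ray-casting algorithm in plain Python. It is a planar approximation suitable only for small areas, not the geodetic implementation described in this post, and the names point_in_polygon, service_area, and applicants are hypothetical, as are the coordinates.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


# Hypothetical service area (a simple rectangle) and applicant locations.
service_area: List[Point] = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
applicants: Dict[str, Point] = {
    "applicant_a": (2.0, 1.0),
    "applicant_b": (5.0, 1.0),
}

eligible = {
    name: point_in_polygon(location, service_area)
    for name, location in applicants.items()
}
print(eligible)  # {'applicant_a': True, 'applicant_b': False}
```

For large regions, or areas near the poles or the anti-meridian, a planar test like this breaks down, which is the case the geodetic functions are meant to handle.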

What does it do?

Our geospatial functions include, but are not limited to, the following:

  • Topological functions, per the DE-9IM standard
  • Metric functions such as distance and azimuth
  • SQL/MM extensions to Spark SQL
  • Spatial indexing
  • WKT and GeoJSON support
  • Spatial, temporal, and spatiotemporal joins
  • Distributed spatial functions
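
As a rough illustration of what metric functions such as distance and azimuth compute, the following plain-Python sketch uses the haversine formula and the initial-bearing formula on a sphere. This is an assumption-laden simplification: it treats the Earth as a sphere with a mean radius, it is not the IBM library's API, and the function names haversine_km and initial_azimuth_deg are made up for this example.

```python
import math

# Mean Earth radius in kilometers; assumes a spherical Earth (an approximation).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def initial_azimuth_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


# One degree of longitude along the equator, heading due east.
distance = haversine_km(0.0, 0.0, 0.0, 1.0)
azimuth = initial_azimuth_deg(0.0, 0.0, 0.0, 1.0)
print(round(distance, 1), round(azimuth, 1))  # 111.2 90.0
```

Both functions work directly on latitude and longitude, with no map projection step, which is the same general idea behind the geodetic support listed earlier.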

Get started

This functionality is available in the Spark Python and Spark Scala environments on Watson Studio and Analytics Engine clusters. You can get started by going over the sample notebooks for Spatial and Spatial Index.
