Data Management Best Practices
Published On: September 25, 2018
There have been remarkable changes in the volumes of data processed in the geospatial arena over the past several decades. In my thirty years of developing large GIS software applications, we have gone from processing hundreds of megabytes of data in hours to hundreds of gigabytes in seconds. While hardware and software have advanced to the point of consuming and processing these monster datasets, we have not always seen a matching investment in appropriate data management for what can amount to petabytes of data.
Managing Your Geospatial Data
Managing massive amounts of geospatial data can be an intimidating task. Today's scanners can collect more than 30 points of information per meter, and do so for miles and miles of a corridor. As these large datasets accumulate over time, the ability to know what you have and how it relates to any active or historic project can easily be lost. It quickly becomes clear that inadequate data management effectively wastes the data collection effort and creates a wilderness of unknown or unfindable data.
GIS platforms handle this problem routinely for feature data, using relational databases that carry both attribute and spatial metadata for each feature, are spatially indexed, and support spatial query operators.
Catalog Your Data
The same approach works perfectly well for cataloging geospatial coverage files. The catalog record built and inserted into the data management system consists of a geographically correct footprint for the file, appropriate metadata for the file, and the physical location of the file. With records containing this information inserted in a spatially enabled relational database and a performant, secure, scalable storage location, one has the primary components of a proper data management system for geospatial coverage data.
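As a minimal sketch of such a catalog record, the Python below stores a footprint, metadata, and file location in a relational table. Everything named here is an assumption made for illustration: the `CatalogRecord` class, the `catalog` table layout, the metadata keys, and the storage URI are hypothetical. SQLite with plain bounding-box columns stands in for a spatially enabled database, which would store the true footprint geometry and index it spatially.

```python
import json
import sqlite3
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    """One catalog entry: footprint, metadata, and physical file location."""
    footprint: list                      # polygon vertices as (lon, lat) pairs
    location: str                        # hypothetical storage URI for the file
    metadata: dict = field(default_factory=dict)

    def bbox(self):
        """Return (min_lon, min_lat, max_lon, max_lat) of the footprint."""
        lons = [p[0] for p in self.footprint]
        lats = [p[1] for p in self.footprint]
        return min(lons), min(lats), max(lons), max(lats)


def create_catalog(conn):
    """Create the catalog table; bbox columns approximate a spatial index."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS catalog (
               id INTEGER PRIMARY KEY,
               location TEXT NOT NULL,
               footprint TEXT NOT NULL,
               metadata TEXT NOT NULL,
               min_lon REAL, min_lat REAL, max_lon REAL, max_lat REAL)"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_catalog_bbox "
        "ON catalog (min_lon, max_lon, min_lat, max_lat)"
    )


def insert_record(conn, rec):
    """Insert one record and return its row id."""
    cur = conn.execute(
        "INSERT INTO catalog "
        "(location, footprint, metadata, min_lon, min_lat, max_lon, max_lat) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (rec.location, json.dumps(rec.footprint),
         json.dumps(rec.metadata), *rec.bbox()),
    )
    return cur.lastrowid


# Example usage with made-up values.
conn = sqlite3.connect(":memory:")
create_catalog(conn)
rec = CatalogRecord(
    footprint=[(-105.0, 39.0), (-104.0, 39.0), (-104.0, 40.0), (-105.0, 40.0)],
    location="s3://example-bucket/corridor/tile_001.laz",  # hypothetical path
    metadata={"originator": "Example Contractor", "acquired": "2018-03-01"},
)
row_id = insert_record(conn, rec)
```

Keeping the full footprint alongside the derived bounding box lets a coarse indexed filter run first, with an exact geometry test applied only to the candidates it returns.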
User-Friendly Data Management Portal
A necessary addition to the data management system is a portal for interrogating, selecting and delivering data from the system to the end user. Such a portal needs to provide a geographical (map) view of the database, as a primary criterion for selecting data is geographic location.
Obvious and necessary components of the metadata include identification of the originating entity (e.g., contractor, collection agency), the date of acquisition and some measure of accuracy, as well as some indication of the level of processing done.
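The kind of selection a portal performs behind its map view can be sketched as a geographic window test combined with metadata filters. This is an illustration only: the `CoverageEntry` fields, the `select_entries` function, and the sample values are hypothetical, and footprints are simplified to bounding boxes rather than true geometries.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CoverageEntry:
    """Hypothetical catalog entry carrying the metadata fields named above."""
    location: str
    bbox: tuple              # (min_lon, min_lat, max_lon, max_lat)
    originator: str
    acquired: date
    accuracy_m: float        # assumed accuracy measure, in meters
    processing_level: str


def bbox_intersects(a, b):
    """True when two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_entries(entries, window, originator=None,
                   acquired_since=None, max_accuracy_m=None):
    """Return entries overlapping the window that pass the optional filters."""
    hits = []
    for e in entries:
        if not bbox_intersects(e.bbox, window):
            continue
        if originator is not None and e.originator != originator:
            continue
        if acquired_since is not None and e.acquired < acquired_since:
            continue
        if max_accuracy_m is not None and e.accuracy_m > max_accuracy_m:
            continue
        hits.append(e)
    return hits


# Example usage with made-up entries.
entries = [
    CoverageEntry("tile_a.laz", (-105.0, 39.0, -104.0, 40.0),
                  "Contractor A", date(2018, 3, 1), 0.05, "classified"),
    CoverageEntry("tile_b.laz", (-103.0, 39.0, -102.0, 40.0),
                  "Agency B", date(2016, 7, 15), 0.10, "raw"),
    CoverageEntry("tile_c.laz", (-104.5, 39.5, -103.5, 40.5),
                  "Contractor A", date(2015, 5, 20), 0.20, "raw"),
]
window = (-104.8, 39.2, -104.2, 39.8)   # area the user drew on the map

by_location = select_entries(entries, window)
recent_only = select_entries(entries, window, acquired_since=date(2017, 1, 1))
```

In a production system the window test would normally be pushed down to the database's spatial query operators instead of being evaluated in application code.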
Protect Your Data
Once we have moved the data into a place where we can easily find it and retrieve it, we immediately become concerned about security. Source data and the derived analytics are key assets for an organization and must be protected from unauthorized access.
Immediately Access Your Data
Having immediate access to the single-source, authoritative version of data from any place at any time is no longer a luxury; it is a requirement. With the tremendous advances in Cloud-resident solutions for data storage, security and high-performance access, moving to Cloud storage as the primary data source is becoming more and more attractive. Many companies are beginning to look closely at the escalating costs they incur to support the data management, storage, access and security associated with these huge volumes of data. Migration to the Cloud for these functions has been on the rise for the past five years and is accelerating.
The Superior Solution
The intent of collecting this highly accurate and current data is, of course, to enable us to understand our business and our world more clearly. Those who are able to make full use of these valuable assets will be far ahead of their competition in providing superior solutions that are accurate, current, and well fit to the problem set.