GIS data

From Project-GC
Revision as of 11:07, 29 May 2021 by magma1447 (3305483) (talk | contribs) (Initial edit)

What's it used for?

A geocache's region and county are determined using GIS data. The GIS data consists of polygons representing the borders of those regions and counties. Not all countries are supported, due to lack of data. All countries that have region support from Geocaching HQ also have region support at Project-GC. Since at Geocaching.com the region is in most cases chosen by the cache owner, Project-GC overrides the information from Geocaching.com and uses the GIS data instead.
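As an illustration of how a coordinate can be matched against border polygons, here is a minimal sketch using the classic ray-casting test. It is not Project-GC's actual implementation; the names `point_in_ring`, `find_region` and `REGIONS`, as well as the square borders, are made up for the example, and coordinates are treated as flat (longitude, latitude) pairs.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar


def point_in_ring(point: Point, ring: List[Point]) -> bool:
    """Ray-casting test: count the ring edges crossed by a ray going east."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that latitude.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_region(point: Point, regions: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first region whose border contains the point."""
    for name, ring in regions.items():
        if point_in_ring(point, ring):
            return name
    return None


# Hypothetical, simplified borders; real ones have thousands of vertices.
REGIONS = {
    "Region A": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "Region B": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}

print(find_region((5.0, 5.0), REGIONS))
print(find_region((25.0, 5.0), REGIONS))
```

A geocache outside every known polygon yields `None`, which corresponds to a country or area without GIS data.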

The maps in Profile stats are computer-generated and based on data from OpenStreetMap. The same data is used to overlay other maps at Project-GC with polygons representing the regions and counties.

Source

Project-GC uses OpenStreetMap as the source of this data for all countries. The one exception is Canada, which uses a mix of OpenStreetMap and some Census data. OSM-Boundaries.com is used as a middle layer between the sites to ease the process.

The update process

Project-GC does not update its GIS data from OpenStreetMap automatically. It's an expensive process and it also requires some manual review. At times data in OpenStreetMap isn't reliable, and that needs to be detected before it's imported. OpenStreetMap doesn't always have data that fits the geocaching world straight out of the box either. In many cases polygons need to be created by joining others, by subtracting one from another, and so forth.

As mentioned, OSM-Boundaries.com was created to ease the process. It's a tool for getting a good overview of the OpenStreetMap data in a hierarchical way, meaning that states are inside the country, counties are inside the states, and so on. This relation data is not something that automatically exists in OpenStreetMap. Polygons can have tags hinting at their relations, but these are not trustworthy. OSM-Boundaries.com instead calculates the relations based on overlaps.
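To make the idea of deriving a hierarchy from geometry rather than from tags more concrete, here is a rough sketch. It is not how OSM-Boundaries.com is implemented: instead of computing true polygon overlaps it assumes, as a simplification, that a polygon belongs to the smallest larger polygon that contains the average of its vertices. All names and the square shapes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def ring_area(ring: Ring) -> float:
    """Absolute planar area via the shoelace formula."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def vertex_mean(ring: Ring) -> Point:
    """Average of the vertices; lies inside the ring for convex shapes."""
    return (sum(p[0] for p in ring) / len(ring),
            sum(p[1] for p in ring) / len(ring))


def point_in_ring(point: Point, ring: Ring) -> bool:
    """Ray-casting point-in-polygon test."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def build_hierarchy(rings: Dict[str, Ring]) -> Dict[str, Optional[str]]:
    """Map each ring to its parent: the smallest larger ring containing it."""
    parents: Dict[str, Optional[str]] = {}
    for name, ring in rings.items():
        probe = vertex_mean(ring)
        area = ring_area(ring)
        candidates = [
            (ring_area(other), other_name)
            for other_name, other in rings.items()
            if other_name != name
            and ring_area(other) > area
            and point_in_ring(probe, other)
        ]
        parents[name] = min(candidates)[1] if candidates else None
    return parents


# Hypothetical nested squares standing in for real borders.
RINGS = {
    "Country": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "State": [(0, 0), (50, 0), (50, 50), (0, 50)],
    "County": [(10, 10), (20, 10), (20, 20), (10, 20)],
}

print(build_hierarchy(RINGS))
```

Note that no tags are consulted at all; the parent of each polygon follows purely from the geometry, which is the point the paragraph above makes.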

Since Project-GC doesn't read data directly from OpenStreetMap, it needs to wait for the data to become available to it (or rather to OSM-Boundaries.com) in a convenient form. The tooling used to update OSM-Boundaries.com is based on OpenStreetMap's *planet.osm* dumps, which OpenStreetMap normally creates once per week. By the time a dump becomes available, the data in it is usually 2-3 days old, so it's assumed that compiling it takes that long.

OSM-Boundaries.com aims to import a new database once per month, but two databases per month can be imported when needed. Importing a new database into OSM-Boundaries.com requires a lot of resources, and a single import requires more than 21 days of processing, so databases can't be imported more often than twice per month. However, most of the data is accessible before the import is fully completed: after just a few days, most of it will be available for download.

An import in OSM-Boundaries.com is very resource-heavy. A RAID array of SSDs works under heavy load while the import consumes over 50 GB of RAM and maxes out 10 CPU cores. During the import close to 1 terabyte of data is handled, though most of it can be dropped once the import is complete.

Since OpenStreetMap is based on community contributions, it does happen that data in it breaks. This is another reason why the import process can't be fully automated. If, for example, a county polygon suddenly isn't a closed loop (it has a gap somewhere), it can't be imported into OSM-Boundaries.com and therefore won't end up in Project-GC. *Self-intersections* in polygons cause similar issues. For these reasons OSM-Boundaries.com doesn't only provide its users with the latest OpenStreetMap *planet.osm* extract; it makes multiple versions available. This way one can also look into the history of the polygons from OpenStreetMap.
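The two kinds of breakage mentioned above can be illustrated with a small validity check. This is only a sketch of the general technique, not the checks OSM-Boundaries.com actually runs; the function names and sample rings are invented, and the self-intersection test only detects edges that properly cross each other, not edges that merely touch or overlap.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def is_closed(ring: List[Point]) -> bool:
    """A ring is closed when it has enough points and ends where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def _orientation(a: Point, b: Point, c: Point) -> float:
    """Cross product: >0 counter-clockwise, <0 clockwise, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True when the two segments properly cross (their interiors meet)."""
    d1 = _orientation(q1, q2, p1)
    d2 = _orientation(q1, q2, p2)
    d3 = _orientation(p1, p2, q1)
    d4 = _orientation(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def has_self_intersection(ring: List[Point]) -> bool:
    """Compare every pair of edges; O(n^2), acceptable for a sketch.

    Adjacent edges share a vertex, which the strict test above never
    counts as a crossing, so they need no special handling.
    """
    edges = list(zip(ring, ring[1:]))
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False


def validate(ring: List[Point]) -> List[str]:
    """Return a list of human-readable problems; empty means importable."""
    problems = []
    if not is_closed(ring):
        problems.append("ring is not closed")
    if has_self_intersection(ring):
        problems.append("ring self-intersects")
    return problems


# Hypothetical sample rings.
SQUARE = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
OPEN = [(0, 0), (4, 0), (4, 4), (0, 4)]            # gap: never returns to start
BOWTIE = [(0, 0), (4, 4), (4, 0), (0, 4), (0, 0)]  # two edges cross at (2, 2)

for name, ring in [("square", SQUARE), ("open", OPEN), ("bowtie", BOWTIE)]:
    print(name, validate(ring))
```

A polygon that fails either check would be held back for manual review rather than imported.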

With all the above in mind, Project-GC usually won't update polygon data without a reason to. If you know of major changes in a country, you can inform Project-GC Support so they can look into updating the country's data. This does, however, require that OpenStreetMap already has the newer data. Please understand that there is some turnaround time. After OpenStreetMap is updated, it can take up to one week for the change to end up in the *planet.osm* dumps, then a few weeks until OSM-Boundaries.com imports it, and so on. It's unlikely that OpenStreetMap changes will end up in Project-GC in less than a month, even when the update is prioritized.