GIS data

From Project-GC
Revision as of 16:15, 29 May 2021 by pinkunicorn (254729) (grammar, subheading)

What's it used for?

Every geocache's region and county are determined using GIS data. The GIS data consists of polygons representing the borders of those regions and counties. Not all countries are supported, since map data is not yet available for every country. All countries that have region support at Geocaching HQ also have region support at Project-GC. Since the region of most geocaches is chosen by the cache owner, Project-GC overrides the information from Geocaching.com and uses the GIS data instead.
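The region lookup can be pictured as a point-in-polygon test: a cache's coordinates are checked against each region polygon until one contains them. The sketch below uses the even-odd ray-casting rule and assumes planar (longitude, latitude) coordinates, a single outer ring per region with no holes, and made-up region names. It illustrates the idea only and is not Project-GC's actual code.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar coordinates
Ring = Sequence[Point]       # outer ring only, no repeated closing vertex


def point_in_polygon(pt: Point, ring: Ring) -> bool:
    """Even-odd rule: cast a ray from pt towards +x and count edge crossings."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_region(pt: Point, regions: Dict[str, Ring]) -> Optional[str]:
    """Return the first region whose polygon contains pt, or None if none does."""
    for name, ring in regions.items():
        if point_in_polygon(pt, ring):
            return name
    return None


# Hypothetical regions: two adjacent squares.
regions = {
    "West": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "East": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
print(find_region((5, 5), regions))   # West
print(find_region((15, 5), regions))  # East
print(find_region((25, 5), regions))  # None
```

A cache outside every supported polygon gets no region, which matches the note that not all countries have support.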

The maps in Profile stats are computer-generated from OpenStreetMap data. The same data is used to overlay other maps at Project-GC with polygons representing the regions and counties.

Source

Project-GC uses OpenStreetMap as the source for this data for all countries except Canada, whose data is a mix of OpenStreetMap and some census data. OSM-Boundaries.com is used as a middle layer between the two sites to ease the process.

The update process

Project-GC does not update its GIS data from OpenStreetMap automatically. It's an expensive process, and it also requires some manual review. At times the data in OpenStreetMap isn't reliable, and that needs to be detected before it's imported. OpenStreetMap also doesn't always have data that fits the geocaching world straight out of the box. In many cases polygons need to be created by joining others, by subtracting one from another, and so forth.

As mentioned, OSM-Boundaries.com was created to ease this process. It's a tool that gives a good overview of the OpenStreetMap data in a hierarchical way: states lie within the country, counties within the states, and so on. This relation data does not automatically exist in OpenStreetMap. Polygons can have tags hinting at their relations, but those tags are not trustworthy. OSM-Boundaries.com instead calculates the relations based on overlaps.
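Deriving a hierarchy from overlaps can be sketched as follows, assuming each region is a single ring in planar coordinates: sample a grid of points inside each polygon, then pick as its parent the smallest larger polygon that contains at least a chosen share of those points. The 95% threshold, the 20 x 20 grid, and the sampling approach are illustrative assumptions, not a description of how OSM-Boundaries.com actually computes its relations.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]  # outer ring only, no repeated closing vertex


def point_in_polygon(pt: Point, ring: Ring) -> bool:
    """Even-odd ray casting towards +x."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def polygon_area(ring: Ring) -> float:
    """Planar area via the shoelace formula."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def interior_samples(ring: Ring, steps: int = 20) -> List[Point]:
    """Centres of a steps x steps grid over the bounding box that fall inside the ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, min_y = min(xs), min(ys)
    dx = (max(xs) - min_x) / steps
    dy = (max(ys) - min_y) / steps
    samples = []
    for i in range(steps):
        for j in range(steps):
            p = (min_x + (i + 0.5) * dx, min_y + (j + 0.5) * dy)
            if point_in_polygon(p, ring):
                samples.append(p)
    return samples


def build_hierarchy(polygons: Dict[str, Ring], threshold: float = 0.95) -> Dict[str, Optional[str]]:
    """Map each polygon name to the smallest larger polygon that mostly covers it (or None)."""
    parents: Dict[str, Optional[str]] = {}
    for name, ring in polygons.items():
        samples = interior_samples(ring)
        area = polygon_area(ring)
        best, best_area = None, float("inf")
        for cand, cand_ring in polygons.items():
            cand_area = polygon_area(cand_ring)
            # A parent must be a different, larger polygon.
            if cand == name or cand_area <= area or not samples:
                continue
            share = sum(point_in_polygon(p, cand_ring) for p in samples) / len(samples)
            if share >= threshold and cand_area < best_area:
                best, best_area = cand, cand_area
        parents[name] = best
    return parents


regions = {
    "Country": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "State": [(0, 0), (50, 0), (50, 100), (0, 100)],
    "County": [(0, 0), (25, 0), (25, 50), (0, 50)],
}
print(build_hierarchy(regions))
# {'Country': None, 'State': 'Country', 'County': 'State'}
```

Sampling interior points rather than the child's vertices avoids ambiguity where a county shares part of its border with its state, and the threshold tolerates small mismatches between borders that were mapped independently.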

Since Project-GC doesn't read data directly from OpenStreetMap, it has to wait until the data is available to it (or rather, to OSM-Boundaries.com) in a convenient form. The tech used to update OSM-Boundaries.com is based on OpenStreetMap's Planet.osm dumps, which OpenStreetMap normally creates once per week. By the time a dump is available, the data in it is usually 2-3 days old, presumably because compiling it takes that long.

OSM-Boundaries.com aims to import a new database once per month, but can import two per month when needed. Importing a new database into OSM-Boundaries.com requires a lot of resources, and a single import requires more than 21 days of processing, so imports can't happen more often than twice per month. However, most of the data is accessible before the import is fully completed; after only a few days most of it is available for download.

An import into OSM-Boundaries.com is very resource-heavy. A RAID of SSDs works under heavy load while the import consumes over 50 GB of RAM and maxes out 10 CPU cores. Close to 1 terabyte of data is handled during the import, though most of it can be dropped once the import is complete.

Since OpenStreetMap is based on community contributions, its data sometimes breaks. This is another reason why the import process can't be fully automated. If, for example, a county polygon suddenly isn't a closed loop (has a gap somewhere), it can't be imported into OSM-Boundaries.com and therefore won't end up in Project-GC. Self-intersections in polygons cause similar issues. For these reasons OSM-Boundaries.com doesn't only provide its users with the latest OpenStreetMap planet.osm extract; it also makes multiple versions available. This way one can also look into the history of the polygons from OpenStreetMap.
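Both failure cases can be detected with simple geometry checks. The sketch below assumes a ring given as a list of (x, y) vertices whose last vertex repeats the first when the ring is closed; it reports a gap (an unclosed ring) and any crossing between non-adjacent edges using orientation tests. It is a brute-force O(n²) illustration with hypothetical function names, not the validation OSM-Boundaries.com actually performs.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def orientation(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): 1 = left turn, -1 = right turn, 0 = collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def on_segment(a: Point, b: Point, p: Point) -> bool:
    """For p collinear with a-b: does p lie within the segment's bounding box?"""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 touches or crosses segment q1-q2."""
    o1, o2 = orientation(p1, p2, q1), orientation(p1, p2, q2)
    o3, o4 = orientation(q1, q2, p1), orientation(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and on_segment(p1, p2, q1)) or (o2 == 0 and on_segment(p1, p2, q2))
            or (o3 == 0 and on_segment(q1, q2, p1)) or (o4 == 0 and on_segment(q1, q2, p2)))


def validate_ring(ring: Sequence[Point]) -> List[str]:
    """Return a list of problems found in the ring (empty if it looks valid)."""
    if len(ring) < 4:
        return ["too few vertices"]
    if ring[0] != ring[-1]:
        return ["not closed"]
    edges = [(ring[i], ring[i + 1]) for i in range(len(ring) - 1)]
    n = len(edges)
    problems = []
    for i in range(n):
        for j in range(i + 1, n):
            # Neighbouring edges legitimately share a vertex, so skip them.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if segments_intersect(*edges[i], *edges[j]):
                problems.append(f"self-intersection between edges {i} and {j}")
    return problems


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
gap = [(0, 0), (1, 0), (1, 1), (0, 1)]
bowtie = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
print(validate_ring(square))  # []
print(validate_ring(gap))     # ['not closed']
print(validate_ring(bowtie))  # ['self-intersection between edges 0 and 2']
```

A polygon that fails either check would be held back for manual review rather than imported, which is one reason older versions of the data remain useful.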

Update process timeframe

With all of the above in mind, Project-GC usually won't update polygon data without a reason. If you know of major changes in a country, you can inform Project-GC Support so they can look into updating that country's data. This does, however, require that OpenStreetMap already has the newer data. Please understand that there is some turnaround time: after OpenStreetMap is updated, it takes up to one week for the change to end up in the Planet.osm dumps, then a few weeks until OSM-Boundaries.com imports it, and so on. It's unlikely that OpenStreetMap changes will end up in Project-GC in less than a month, even when the update is prioritized.
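The turnaround can be estimated by chaining the delays described above. In the sketch below, the weekday the weekly dump is taken, the monthly import start day, the number of days until most data is available, and the review time are all hypothetical values chosen for illustration; only the weekly dumps, the 2-3 day dump age, and the monthly import cadence come from the text.

```python
from datetime import date, timedelta

# Hypothetical parameters; only the cadences come from the text above.
SNAPSHOT_WEEKDAY = 0   # assumed: weekly dump is taken on Mondays (date.weekday() numbering)
PUBLISH_LAG_DAYS = 3   # a dump is usually 2-3 days old when it becomes available
IMPORT_DAY = 1         # assumed: the monthly OSM-Boundaries.com import starts on the 1st
DATA_READY_DAYS = 5    # assumed value for "after a few days most data will be available"
REVIEW_DAYS = 7        # assumed time for manual review and import at Project-GC


def next_snapshot(edit_day: date) -> date:
    """First weekly dump snapshot on or after the edit (optimistic: same-day edits count)."""
    return edit_day + timedelta(days=(SNAPSHOT_WEEKDAY - edit_day.weekday()) % 7)


def next_import_start(available: date) -> date:
    """First monthly import start on or after the day the dump becomes available."""
    start = available.replace(day=IMPORT_DAY)
    if start < available:
        if start.month == 12:
            start = start.replace(year=start.year + 1, month=1)
        else:
            start = start.replace(month=start.month + 1)
    return start


def estimate(edit_day: date) -> dict:
    """Chain the delays from an OpenStreetMap edit to Project-GC."""
    snapshot = next_snapshot(edit_day)
    dump_available = snapshot + timedelta(days=PUBLISH_LAG_DAYS)
    import_start = next_import_start(dump_available)
    boundaries_ready = import_start + timedelta(days=DATA_READY_DAYS)
    project_gc = boundaries_ready + timedelta(days=REVIEW_DAYS)
    return {
        "snapshot": snapshot,
        "dump_available": dump_available,
        "import_start": import_start,
        "boundaries_ready": boundaries_ready,
        "project_gc": project_gc,
        "total_days": (project_gc - edit_day).days,
    }


result = estimate(date(2021, 5, 29))
print(result["project_gc"], result["total_days"])  # 2021-07-13 45
```

Under these assumptions, an edit made just after a monthly import has started misses it and waits for the next one, which is why the estimate ends up well over a month.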