Difference between revisions of "GIS data"

From Project-GC
m (Changed "we" to "Project-GC". Changed assumed markdown formatting to corresponding wikiformatting.)
(grammar, subheading)
Line 1: Line 1:
 
== What's it used for? ==
Geocache's region/county is determined by using GIS data. The GIS data consists of polygons representing the borders of such regions and counties. Not all countries have support for this, due to lack of data. All countries which have region support by [[Geocaching HQ]] also has region support by [[Project-GC]]. Since most of the regions are chosen by the [[cache owner]] for the [[geocache]]s Project-GC is overriding the information from [[Geocaching.com]] and using the GIS data instead.
+
The region and county of every [[geocache]] is determined using GIS data. The GIS data consists of polygons representing the borders of these regions and counties. Not all countries are supported yet, since [[Project-GC]] doesn't have map data for every country. All countries that have region support at [[Geocaching HQ]] also have region support at [[Project-GC]]. Because most regions on [[Geocaching.com]] are chosen by the [[cache owner]], Project-GC overrides that information and uses the GIS data instead.
  
 
The [[Maps_tab|maps]] in [[Profile_Stats|Profile stats]] are computer-generated maps based on data from [[OpenStreetMap]]. The same data is used to overlay other maps at [[Project-GC]] with polygons representing the regions and counties.
Line 10: Line 10:
 
Project-GC does not update its GIS data from [[OpenStreetMap]] automatically. It's an expensive process, and it also requires some manual review. Data in [[OpenStreetMap]] is at times unreliable, and this needs to be detected before it's imported. [[OpenStreetMap]] also doesn't always have data that fits the geocaching world straight out of the box. In many cases polygons need to be created by joining others, by subtracting one from another, and so forth.
  
As mentioned [[OSM-Boundaries.com]] was created to ease the process. It's a tool to get a good overview of the [[OpenStreetMap]] data in an hierarchic way. Meaning that states are inside the country, counties within the states and so on. This relation data is not something that automatically exists in [[OpenStreetMap]]. Polygons can however have tags hinting something about their relations, but they are not trustworthy. [[OSM-Boundaries.com]] instead calculates the relations based on overlaps.
+
As mentioned, [[OSM-Boundaries.com]] was created to ease this process. It's a tool that gives a good overview of the [[OpenStreetMap]] data in a hierarchical way, meaning that states are inside the country, counties within the states, and so on. This relation data doesn't automatically exist in [[OpenStreetMap]]. Polygons can carry tags hinting at their relations, but these tags are not trustworthy, so [[OSM-Boundaries.com]] instead calculates the relations based on how the polygons overlap.
  
Since Project-GC doesn't read data directly from [[OpenStreetMap]] it needs to wait for the data to be available for it (or actually for OSM-Boundaries.com) in a convenient way. The tech used to updated [[OSM-Boundaries.com]] is based on [[OpenStreetMap]]s ''planet.osm'' dumps, which normally are created by [[OpenStreetMap]] once per week. Once the dump is available the data within it usually is 2-3 days old, so its assumed that it takes them that long time to compile it.
+
Since Project-GC doesn't read data directly from [[OpenStreetMap]], it has to wait until the data is available in a convenient form (strictly speaking, available to OSM-Boundaries.com). The tooling used to update [[OSM-Boundaries.com]] is based on [[OpenStreetMap]]'s [[Planet.osm]] dumps, which [[OpenStreetMap]] normally creates once per week. By the time a dump is available, the data in it is usually 2-3 days old, presumably because compiling the dump takes that long.
  
[[OSM-Boundaries.com]] aims to import a new database once per month, but two databases per month can be imported upon need. Importing a new database into [[OSM-Boundaries.com]] requires a lot of resources, and a single import requires more than 21 days of processing, therefore they can't be imported more often than twice per month. However, most of the data is accessible before the import is fully completed, already after a few days most data will be available for download.
+
[[OSM-Boundaries.com]] aims to import a new database once per month, but can import two per month when needed. Importing a new database into [[OSM-Boundaries.com]] requires a lot of resources, and a single import takes more than 21 days of processing, so databases can't be imported more often than twice per month. However, most of the data becomes available for download after just a few days, well before the import is fully completed.
  
An import in [[OSM-Boundaries.com]] is very resource heavy. A raid of SSDs are working under heavy load while it's consuming over 50 GB of RAM and having 10 CPU cores maxing out. During the import close to 1 terabyte of data is being handled, once the import is complete most of the data can be dropped though.
+
An import into [[OSM-Boundaries.com]] is very resource-heavy. A RAID of SSDs works under heavy load while the import consumes over 50 GB of RAM and maxes out 10 CPU cores. Close to 1 terabyte of data is handled during the import, though most of it can be dropped once the import is complete.
  
 
Since [[OpenStreetMap]] is based on community contributions, data in it does sometimes break. This is another reason why the import process can't be fully automated. If, for example, a county polygon suddenly isn't a closed loop (it has a gap somewhere), it can't be imported into [[OSM-Boundaries.com]] and therefore won't end up in [[Project-GC]]. ''Self-intersections'' in polygons cause similar issues. For these reasons, [[OSM-Boundaries.com]] doesn't only provide its users with the latest [[OpenStreetMap]] [[Planet.osm]] extract; it makes multiple versions available. This also lets users look into the history of the polygons from [[OpenStreetMap]].
  
With all the above in mind, [[Project-GC]] usually won't update polygon data without reason to. If you know of major changes in a country you can inform [[Project-GC Support]] about it they can look into updating the country's data. It does however require that [[OpenStreetMap]] has the newer data. Please understand that there is some turnaround time. From the fact that [[OpenStreetMap]] is updated it's up to one week for it to end up in the [[Planet.osm]] dumps, then a few weeks until [[OSM-Boundaries.com]] imports that, and so on. It's unlikely that [[OpenStreetMap]] changes will end up in [[Project-GC]] in less than a month, even though it's prioritized.
+
=== Update process timeframe ===
 +
With all the above in mind, [[Project-GC]] usually won't update polygon data without a reason to. If you know of major boundary changes in a country, you can inform [[Project-GC Support]] so they can look into updating that country's data. This does, however, require that [[OpenStreetMap]] already has the newer data. Please understand that there is some turnaround time: once [[OpenStreetMap]] has been updated, it can take up to one week for the change to end up in the [[Planet.osm]] dumps, then a few weeks until [[OSM-Boundaries.com]] imports that dump, and so on. It's unlikely that [[OpenStreetMap]] changes will end up in [[Project-GC]] in less than a month, even when the update is prioritized.

Revision as of 15:15, 29 May 2021

What's it used for?

The region and county of every geocache is determined using GIS data. The GIS data consists of polygons representing the borders of these regions and counties. Not all countries are supported yet, since Project-GC doesn't have map data for every country. All countries that have region support at Geocaching HQ also have region support at Project-GC. Because most regions on Geocaching.com are chosen by the cache owner, Project-GC overrides that information and uses the GIS data instead.
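Assigning a region to a geocache from this data comes down to a point-in-polygon test, where the cache's coordinates are checked against each region's border polygon. The sketch below uses the even-odd (ray-casting) rule and is only an illustration under simplifying assumptions. Polygons are single outer rings of (longitude, latitude) tuples treated as flat coordinates, and holes and multipolygons are ignored. `point_in_polygon` and `find_region` are hypothetical names, not Project-GC code.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as flat x/y here
Polygon = List[Point]        # outer ring, vertices in order (holes ignored)


def point_in_polygon(point: Point, ring: Polygon) -> bool:
    """Even-odd (ray casting) test: count how many ring edges a ray
    cast from the point towards +x crosses; an odd count means inside."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_region(point: Point, regions: Dict[str, Polygon]) -> Optional[str]:
    """Return the name of the first region whose polygon contains the point."""
    for name, ring in regions.items():
        if point_in_polygon(point, ring):
            return name
    return None
```

A point lying exactly on a shared border may be assigned to either neighbour depending on the rule used, so a real implementation needs an explicit tie-breaking policy.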

The maps in Profile stats are computer-generated maps based on data from OpenStreetMap. The same data is used to overlay other maps at Project-GC with polygons representing the regions and counties.

Source

Project-GC uses OpenStreetMap as the source for this data in every country except Canada, where it uses a mix of OpenStreetMap and census data. OSM-Boundaries.com serves as a middle layer between OpenStreetMap and Project-GC to ease the process.

The update process

Project-GC does not update its GIS data from OpenStreetMap automatically. It's an expensive process, and it also requires some manual review. Data in OpenStreetMap is at times unreliable, and this needs to be detected before it's imported. OpenStreetMap also doesn't always have data that fits the geocaching world straight out of the box. In many cases polygons need to be created by joining others, by subtracting one from another, and so forth.
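Joining and subtracting polygons can be illustrated with set logic. Real GIS tooling computes new polygon geometry (a union or difference of the shapes). This sketch takes a simpler route, which is an assumption and not necessarily how Project-GC or OSM-Boundaries.com does it. It composes point-membership tests instead, which is enough to show what a "joined" or "subtracted" region means for a single cache's coordinates. All names and coordinates are hypothetical.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]
Region = Callable[[Point], bool]  # "is this point inside the region?"


def point_in_polygon(point: Point, ring: Polygon) -> bool:
    """Even-odd (ray casting) point-in-polygon test on a simple outer ring."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def from_polygon(ring: Polygon) -> Region:
    """Wrap a single polygon as a membership test."""
    return lambda p: point_in_polygon(p, ring)


def join(*parts: Region) -> Region:
    """Union: a point is inside if it is inside any of the parts."""
    return lambda p: any(part(p) for part in parts)


def subtract(base: Region, removed: Region) -> Region:
    """Difference: inside `base` but not inside `removed`."""
    return lambda p: base(p) and not removed(p)


# Hypothetical example: two OSM districts form one geocaching county,
# minus a city that is treated as a county of its own.
district_a = from_polygon([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
district_b = from_polygon([(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)])
city = from_polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)])
county = subtract(join(district_a, district_b), city)
```

With these shapes, `county((3.0, 1.0))` is true (inside the second district) while `county((1.0, 1.0))` is false (inside the excluded city).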

As mentioned, OSM-Boundaries.com was created to ease this process. It's a tool that gives a good overview of the OpenStreetMap data in a hierarchical way, meaning that states are inside the country, counties within the states, and so on. This relation data doesn't automatically exist in OpenStreetMap. Polygons can carry tags hinting at their relations, but these tags are not trustworthy, so OSM-Boundaries.com instead calculates the relations based on how the polygons overlap.
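One way to derive a hierarchy from overlaps rather than tags can be sketched as follows. This is an assumption, since the actual method used by OSM-Boundaries.com isn't described here. For each polygon, estimate how much of it is covered by each larger polygon, then pick the smallest polygon that covers almost all of it as its parent. The grid-sampling overlap estimate, the 0.95 threshold (which tolerates borders that don't line up exactly) and all function names are hypothetical choices.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(point: Point, ring: Polygon) -> bool:
    """Even-odd (ray casting) point-in-polygon test on a simple outer ring."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def area(ring: Polygon) -> float:
    """Shoelace formula; absolute value so vertex order doesn't matter."""
    s = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def overlap_fraction(child: Polygon, other: Polygon, steps: int = 20) -> float:
    """Estimate the share of `child` that also lies inside `other` by
    sampling the centres of a steps x steps grid over the child's bounding box."""
    xs = [p[0] for p in child]
    ys = [p[1] for p in child]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    inside_child = inside_both = 0
    for i in range(steps):
        for j in range(steps):
            p = (min_x + (i + 0.5) * (max_x - min_x) / steps,
                 min_y + (j + 0.5) * (max_y - min_y) / steps)
            if point_in_polygon(p, child):
                inside_child += 1
                if point_in_polygon(p, other):
                    inside_both += 1
    return inside_both / inside_child if inside_child else 0.0


def find_parent(name: str, polygons: Dict[str, Polygon],
                threshold: float = 0.95) -> Optional[str]:
    """Immediate parent = the smallest larger polygon covering at least
    `threshold` of this one; None if nothing covers it."""
    child = polygons[name]
    candidates = [
        other for other, ring in polygons.items()
        if other != name
        and area(ring) > area(child)
        and overlap_fraction(child, ring) >= threshold
    ]
    return min(candidates, key=lambda o: area(polygons[o]), default=None)
```

Choosing the smallest covering polygon is what places a county under its state rather than directly under the country, since both of them contain it.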

Since Project-GC doesn't read data directly from OpenStreetMap, it has to wait until the data is available in a convenient form (strictly speaking, available to OSM-Boundaries.com). The tooling used to update OSM-Boundaries.com is based on OpenStreetMap's Planet.osm dumps, which OpenStreetMap normally creates once per week. By the time a dump is available, the data in it is usually 2-3 days old, presumably because compiling the dump takes that long.

OSM-Boundaries.com aims to import a new database once per month, but can import two per month when needed. Importing a new database into OSM-Boundaries.com requires a lot of resources, and a single import takes more than 21 days of processing, so databases can't be imported more often than twice per month. However, most of the data becomes available for download after just a few days, well before the import is fully completed.

An import into OSM-Boundaries.com is very resource-heavy. A RAID of SSDs works under heavy load while the import consumes over 50 GB of RAM and maxes out 10 CPU cores. Close to 1 terabyte of data is handled during the import, though most of it can be dropped once the import is complete.

Since OpenStreetMap is based on community contributions, data in it does sometimes break. This is another reason why the import process can't be fully automated. If, for example, a county polygon suddenly isn't a closed loop (it has a gap somewhere), it can't be imported into OSM-Boundaries.com and therefore won't end up in Project-GC. Self-intersections in polygons cause similar issues. For these reasons, OSM-Boundaries.com doesn't only provide its users with the latest OpenStreetMap Planet.osm extract; it makes multiple versions available. This also lets users look into the history of the polygons from OpenStreetMap.
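An import step can check for both failure modes mentioned above before accepting a polygon: a ring that isn't closed and a ring that crosses itself. The sketch below assumes rings are lists of (x, y) tuples where a closed ring repeats its first vertex at the end and has at least four positions, as in the GeoJSON linear-ring convention. The function names are hypothetical, and this is not OSM-Boundaries.com's code. The pairwise edge test is O(n²), fine for illustration but slow for large boundaries.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def is_closed(ring: List[Point]) -> bool:
    """A ring is closed when its last vertex repeats its first."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def _orientation(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): 1 left, -1 right, 0 collinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


def _on_segment(a: Point, b: Point, c: Point) -> bool:
    """For a point c collinear with a-b: is it within the segment's bounding box?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Classic orientation-based segment intersection test."""
    o1 = _orientation(p1, p2, q1)
    o2 = _orientation(p1, p2, q2)
    o3 = _orientation(q1, q2, p1)
    o4 = _orientation(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other
    return ((o1 == 0 and _on_segment(p1, p2, q1))
            or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1))
            or (o4 == 0 and _on_segment(q1, q2, p2)))


def has_self_intersection(ring: List[Point]) -> bool:
    """Test every pair of non-neighbouring edges of a closed ring."""
    edges = [(ring[i], ring[i + 1]) for i in range(len(ring) - 1)]
    m = len(edges)
    for i in range(m):
        for j in range(i + 1, m):
            # Neighbouring edges always share a vertex, so skip them
            if j == i + 1 or (i == 0 and j == m - 1):
                continue
            if segments_intersect(*edges[i], *edges[j]):
                return True
    return False


def validate_ring(ring: List[Point]) -> List[str]:
    """Return a list of problems; an empty list means the ring passed."""
    if not is_closed(ring):
        return ["not closed"]
    if has_self_intersection(ring):
        return ["self-intersection"]
    return []
```

A square passes both checks, a ring missing its closing vertex is reported as "not closed", and a "bow tie" ring whose diagonals cross is reported as a self-intersection.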

Update process timeframe

With all the above in mind, Project-GC usually won't update polygon data without a reason to. If you know of major boundary changes in a country, you can inform Project-GC Support so they can look into updating that country's data. This does, however, require that OpenStreetMap already has the newer data. Please understand that there is some turnaround time: once OpenStreetMap has been updated, it can take up to one week for the change to end up in the Planet.osm dumps, then a few weeks until OSM-Boundaries.com imports that dump, and so on. It's unlikely that OpenStreetMap changes will end up in Project-GC in less than a month, even when the update is prioritized.