<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://project-gc.com/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=6041005</id>
	<title>Project-GC - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://project-gc.com/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=6041005"/>
	<link rel="alternate" type="text/html" href="https://project-gc.com/w/Special:Contributions/6041005"/>
	<updated>2026-04-23T01:58:14Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.3</generator>
	<entry>
		<id>https://project-gc.com/wiki/index.php?title=Data_synchronization&amp;diff=857</id>
		<title>Data synchronization</title>
		<link rel="alternate" type="text/html" href="https://project-gc.com/wiki/index.php?title=Data_synchronization&amp;diff=857"/>
		<updated>2020-11-12T12:53:49Z</updated>

		<summary type="html">&lt;p&gt;6041005: some minor edits for British English style&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{FIXME|reason=Work in progress}}&lt;br /&gt;
&lt;br /&gt;
== Origin of data ==&lt;br /&gt;
[[Project-GC]]&#039;s statistics are based on data from [[Geocaching.com]]. The data is fetched via the [[Geocaching LIVE api]]. [[Project-GC]] uses a combination of the official API available to general [[Geocaching Partners]] and a private &#039;&#039;Enterprise API&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
The data is still owned by [[Geocaching HQ]] and [[Project-GC]] pays royalties to be allowed to fetch and use the data like it does.&lt;br /&gt;
&lt;br /&gt;
== Fetching of data ==&lt;br /&gt;
Geocaching data is continuously being updated in [[Project-GC]]&#039;s databases via the mentioned API. This is done using several parallel methods:&lt;br /&gt;
* Continuously fetching newly published [[Geocache]]s. Normal latency 0-35 seconds.&lt;br /&gt;
* Continuously updating finds for [[Geocacher]]s. See [[Data synchronization#Refreshing_profiles|Refreshing profiles]] for more details.&lt;br /&gt;
* Regular updates of [[Geocache]] information. See [[Data synchronization#Refreshing geocache data|Refreshing geocache data]] for more details.&lt;br /&gt;
&lt;br /&gt;
== Refreshing profiles ==&lt;br /&gt;
Geocaching profiles are continuously being updated based on a rule set in the background.&lt;br /&gt;
&lt;br /&gt;
When a Geocaching profile is refreshed, all new and updated logs are fetched from the API, along with metadata about the profile itself, such as the name of the [[Geocacher]].&lt;br /&gt;
=== Rules ===&lt;br /&gt;
* [[Geocacher]]s that haven&#039;t been refreshed in 24 hours and are [[Paid membership|paying members]] are being refreshed.&lt;br /&gt;
* [[Geocacher]]s that haven&#039;t been refreshed in 2 days but used [[Project-GC]]&#039;s website in the last week are being refreshed.&lt;br /&gt;
* [[Geocacher]]s that haven&#039;t been refreshed in 7 days but used [[Project-GC]]&#039;s website in the last 3 months are being refreshed.&lt;br /&gt;
* [[Geocacher]]s whose [[Profile stats]] page has been viewed recently are being refreshed.&lt;br /&gt;
* ... and so on, there are ~10 rules like this in place.&lt;br /&gt;
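&lt;br /&gt;
The first three rules above can be read as a simple predicate. The following is a minimal sketch only, not the actual implementation: the &#039;&#039;Profile&#039;&#039; class, its field names and the &#039;&#039;needs_refresh&#039;&#039; function are hypothetical, and &quot;3 months&quot; is assumed to mean 90 days.&lt;br /&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Profile:
    # All field names are hypothetical, chosen for this sketch only.
    last_refreshed: datetime
    paying_member: bool
    last_site_visit: Optional[datetime] = None


def needs_refresh(p: Profile, now: datetime) -> bool:
    """Return True if one of the three documented rules applies."""
    age = now - p.last_refreshed
    # Rule 1: paying members not refreshed in 24 hours.
    if p.paying_member and age >= timedelta(hours=24):
        return True
    if p.last_site_visit is None:
        return False
    idle = now - p.last_site_visit
    # Rule 2: not refreshed in 2 days, but used the website in the last week.
    if age >= timedelta(days=2) and timedelta(days=7) >= idle:
        return True
    # Rule 3: not refreshed in 7 days, but used the website in the
    # last 3 months (assumed here to be 90 days).
    if age >= timedelta(days=7) and timedelta(days=90) >= idle:
        return True
    return False
```

The remaining rules are not listed on this page, so they are left out of the sketch.&lt;br /&gt;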
&lt;br /&gt;
Also, whenever a user visits the [[Project-GC]] website, the system checks when that user was last updated. For a paying member, another update is triggered if the data is more than 1 hour old. However, this job is queued in the background and it might take a few minutes until it starts.&lt;br /&gt;
&lt;br /&gt;
A [[Geocache]] with new logs will also get scheduled to have its metadata (difficulty, terrain, size, country ...) updated.&lt;br /&gt;
&lt;br /&gt;
As a result, a [[Geocacher]] who is a [[Paid membership|paying member]] is more likely to have up-to-date data at [[Project-GC]] when coming back to the site, while freemium users might notice that their data isn&#039;t as fresh.&lt;br /&gt;
&lt;br /&gt;
== Refreshing geocache data ==&lt;br /&gt;
The refreshing of Geocaching profiles mentioned above makes sure that [[Project-GC]]&#039;s users have frequently updated data, but some [[Geocache]]s might be left out because they have only been logged by [[Geocacher]]s not very active with [[Project-GC]]. Therefore [[Project-GC]] also refreshes [[Geocache]]s.&lt;br /&gt;
&lt;br /&gt;
How often a [[Geocache]] is refreshed is based on a mix of several variables, for example:&lt;br /&gt;
* Last found&lt;br /&gt;
* Hidden date&lt;br /&gt;
* Disabled/Archived state&lt;br /&gt;
* Number of logs&lt;br /&gt;
&lt;br /&gt;
When a [[Geocache]] gets refreshed, its metadata is updated in the database. All new and updated logs are also stored in the system.&lt;br /&gt;
&lt;br /&gt;
== Log data ==&lt;br /&gt;
The log entries fetched are the same regardless of whether they are fetched from Geocaching profiles or from Geocache data. These are just different approaches to retrieving the same information. If Geocaching profile X has found Geocache GCX, then GCX automatically also has a log from Geocaching profile X.&lt;br /&gt;
&lt;br /&gt;
== Databases ==&lt;br /&gt;
At this point, most [[Geocache]] data and logs exist in the primary database cluster and are fairly up to date. Most of the data will only be hours old in the database, but a fair share is expected to be 24-36 hours old.&lt;br /&gt;
&lt;br /&gt;
However, [[Project-GC]] has more than one database cluster. Most statistics on the web are created from data in a secondary database cluster. The primary database cluster replicates its data to the secondary cluster every 4 hours, adding ~4 hours of additional latency. As an example, most top lists are based on this secondary cluster, while [[Profile stats]] are not.&lt;br /&gt;
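&lt;br /&gt;
To illustrate where the extra latency comes from, the sketch below computes when a change in the primary cluster would be picked up by the next replication run. It is illustrative only: the function name and the idea that runs start at fixed 4-hour boundaries counted from a known first run are assumptions, and the time needed to copy the data is ignored.&lt;br /&gt;

```python
from datetime import datetime, timedelta

REPLICATION_INTERVAL = timedelta(hours=4)  # interval stated in the text


def next_replication(changed_at: datetime, first_run: datetime) -> datetime:
    """Return the first replication run at or after changed_at.

    Assumes runs start at first_run + k * REPLICATION_INTERVAL, which is
    a hypothetical schedule used only for this sketch.
    """
    elapsed = changed_at - first_run
    # Ceiling division: number of intervals needed to reach changed_at.
    runs_needed = -(-elapsed // REPLICATION_INTERVAL)
    return first_run + runs_needed * REPLICATION_INTERVAL
```

Under these assumptions a change waits anywhere between 0 and 4 hours before it is replicated, depending on how close it lands to the next run.&lt;br /&gt;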
&lt;br /&gt;
As a technical note, the primary database cluster is a row-based relational database. The secondary cluster is built for data harvesting and is a column-oriented DBMS.&lt;br /&gt;
&lt;br /&gt;
== Statistics ==&lt;br /&gt;
As mentioned in [[Data synchronization#Databases|Databases]], most statistics are based on a secondary database cluster. Even if [[Project-GC]] itself has up-to-date profile data the secondary database cluster might still have the old data. This is due to the fact that a full replica of the data-set is copied every four hours.&lt;br /&gt;
&lt;br /&gt;
Most top lists are using this secondary cluster. Basically everything that gets heavily computed (data harvesting) uses the secondary cluster, while more raw fetches use the primary one. [[Profile stats]] is an exception.&lt;br /&gt;
&lt;br /&gt;
It&#039;s also worth mentioning that [[Lab caches]] generally aren&#039;t included in statistics. They aren&#039;t technically compatible and including them would be very complex. Again, [[Profile stats]] can be an exception.&lt;br /&gt;
&lt;br /&gt;
Finally, all generated statistics are [[Data caching|cached]]. The cache period varies, but anything between 5 minutes and 1 hour is very common. So if two people ask for the same statistic, it will only be computed for the first request, and the second person will receive the cached version. The downside is that caching can add further latency and show older data in some cases.&lt;br /&gt;
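&lt;br /&gt;
The caching behaviour described above is commonly implemented as a time-to-live (TTL) cache. The following is a minimal sketch of that general technique, not the code used by [[Project-GC]]; the class name, method name and the injectable clock are all hypothetical.&lt;br /&gt;

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Keep computed values for ttl_seconds, then recompute on demand."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        # Serve the cached value while it is younger than the TTL.
        if hit is not None and self.ttl > now - hit[0]:
            return hit[1]
        # Otherwise compute once and remember when it was computed.
        value = compute()
        self._store[key] = (now, value)
        return value
```

With a TTL of 300 seconds, a second request for the same key within 5 minutes returns the stored value without recomputing it, which is also why the result can be up to 5 minutes older than the underlying data.&lt;br /&gt;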
&lt;br /&gt;
== Profile stats ==&lt;br /&gt;
* Generated from &#039;&#039;current state&#039;&#039; at the date written in the header.&lt;br /&gt;
* Labs included depending on setting and paying/freemium.&lt;br /&gt;
* Cached for 7 days for freemium, 24h for paying.&lt;br /&gt;
* Caching is not shared between the user the data is about and other viewers.&lt;br /&gt;
* Based on the primary database cluster.&lt;br /&gt;
&lt;br /&gt;
== Challenge checkers ==&lt;br /&gt;
Challenge checkers use a mix of the different database clusters. If the [[Checker script]] fetches the user&#039;s finds from [[Project-GC]] it will use the primary database cluster. But some other more advanced API methods might use the secondary database cluster instead. This is for performance reasons.&lt;br /&gt;
&lt;br /&gt;
== Special numbers ==&lt;br /&gt;
Some statistics are special since they are based on pre-calculated values. This is usually because they would be extremely hard to calculate in real time. [[Project-GC]] calculates the following data for every [[Geocacher]] in the background on a regular basis. This normally happens daily, but there are some exceptions (usually that some users get calculated more often).&lt;br /&gt;
* D/T loops&lt;br /&gt;
* streaks&lt;br /&gt;
* calendar loops&lt;br /&gt;
&lt;br /&gt;
This does not affect [[Profile stats]]. If numbers like these are needed, [[Profile stats]] calculates them itself instead of using pre-calculated data. This is also needed since it may have [[Labcache]] data merged.&lt;/div&gt;</summary>
		<author><name>6041005</name></author>
	</entry>
</feed>