Advanced

Re: Rewrite of GetOldest()

Rewrite of GetOldest()
February 14, 2017 01:56PM
Let me first explain how GetOldest() work right now.

Simply put, it's a
SELECT * FROM geocaches ORDER BY hidden, cacheId LIMIT x
In other words, for those who don't speak SQL. It fetches limit geocaches ordered by the hidden date, and then the cache-id.

The problem here is that the max limit is 100 (actually it's 1000, newly changed). Since a challenge must include archived geocaches this causes an issue. If the challenge is to log 20 of the oldest 100, it might be that not all active caches are included. First for the reason that there might be too many archived caches, pushing the active ones outside the limit. It can be assumed that we actually want the 100 oldest that are active, and those older than the newest one. The second issue is that it might be expected that all geocaches from 2004-01-02 is included if one of them are. They are not, since we are cutting hard on the limit.


I have now made a new implementation, which is not released yet. I am looking for feedback.

The new approach is to first figure out the oldest hidden date that falls into the filter. In SQL it's more like
SELECT MAX (hidden) FROM geocaches ORDER BY hidden LIMIT x

And then we make a second run which is then
SELECT * FROM geocaches WHERE hidden <= maxHidden ORDER BY hidden
(no limit).


So, the question is. Is there anyone who think this approach will be a problem with any current scripts? It seems like some checkers are broken due to an API that has unexpected behaviours. As I see it it can only fix things, though it might change things as well.

If there are any objections, I would like to hear the argument. Based on that we can decide if we will move all current scripts to an obsolete API method or I can just overwrite the current one, or maybe keep both in parallell.



Edited 3 time(s). Last edit at 02/14/2017 02:35PM by ganja1447. (view changes)
Re: Rewrite of GetOldest()
February 14, 2017 02:31PM
I don't see an immediate problem (but I haven't written any checker with that method).
Re: Rewrite of GetOldest()
February 14, 2017 03:15PM
I don't see an obvious problem with that. The checkers I wrote should just loop through the data provided.

That bump from 100 to 1000ish may also help performance in some of my scripts. It will pull in more regions and or countries and keep me from looping as much to get more regions.

The nagging question is does this impact speed negatively. Some of my scripts are on the edge of timing things out.
Re: Rewrite of GetOldest()
February 14, 2017 03:52PM
sloth96 Wrote:
-------------------------------------------------------
> I don't see an obvious problem with that. The
> checkers I wrote should just loop through the data
> provided.
>
> That bump from 100 to 1000ish may also help
> performance in some of my scripts. It will pull
> in more regions and or countries and keep me from
> looping as much to get more regions.
>
> The nagging question is does this impact speed
> negatively. Some of my scripts are on the edge of
> timing things out.


It could definitely affect performance. Generally I would say that it's almost half the speed. We are going from one SQL query to two, and they are almost identical.

If you can find me an (or a few) example (link + name), I can make a few test runs to see the difference in time. It's easy for me to compare the development environment with the live servers. It's definitely interesting to know if this will become an issue.
Re: Rewrite of GetOldest()
February 14, 2017 04:22PM
It doesn't seem as bad as I thought. I ran a few test cases.

Old code: 13, 16, 9.8, 9.5, 9.8 seconds. Average: 11.62 seconds
New code: 18, 27, 14, 19, 18 seconds. Average: 19.2 seconds.

65-66% longer execution time for this particular case. On the other hand, the numbers are so varying that I think it could be a lot more accurate using 2*100 runs.
Re: Rewrite of GetOldest()
February 14, 2017 06:26PM
http://project-gc.com/Challenges/GC6K9P6/23464

This might timeout excessively with the new get oldest.

My username sloth96 or RetiredGuy shoukd take a while.
Re: Rewrite of GetOldest()
February 14, 2017 08:48PM
It seems to be extremely slow. About 100% slower with our new api method.

I don't exactly understand the approach though. I started with writing a simple pseudo code that does only this, but after I was done I started to run the SQL queries the api methods would run. Seems like it would take ~50*3.5*2 seconds = 350 seconds.

It's surprisingly slow to figure out the oldest geocache in a US state. 3-4 seconds for California, Texas and Ohio, per state. However, I can make a 10 fold optimization for this.

Not sure how familiar you are with databases, but Project-GC is using both row-based dbms and column based. A column based dbms is much much faster in certain queries, especially those not needing all the data for a row. This is such case. The downside is that our column based dbms is a replica from our row-based, created every 4th (iirc) hour. So the data is always older. But for this specific api method it feels like we don't care if it's even two days old.

Some timings for linked challenged, user sloth96:
Old code = 1.8e+2 seconds
New code = 3.1e+2 seconds
New code using columnar dbms = 5.0 seconds (after some reconfiguring of the engine as well)

With these changes I also assume the script could be rewritten to be extremely more straight forward?

PS! No changes are live yet.
Re: Rewrite of GetOldest()
February 14, 2017 09:04PM
I am not sure what you mean by extremely more straight forward. Are you saying it needs more comments?

The code is somewhat convoluted to try to get the runtime down. It has been a while since I wrote it so I don't remember all the nastiness at the time. It took a few tries to speed it up to where it is. Timeouts were quite common.Frankly I expected a nasty gram that the checker was too intensive.

I did put in some other requests to speed it up. They were going to be convoluted as well.

If the runtime is really 5 seconds then provide me a goal for a rewrite and I will try to meet it. On the flip side, part of me says it ain't broke so don't fix it. Part of me also realizes the reporting could improve. Sorts were eliminated and the like. Not all states get listed unless someone has visited them.

Just as an aside, there is a corresponding oldest in n country checker that is equally ugly.
Re: Rewrite of GetOldest()
February 14, 2017 10:02PM
I am sorry if I were to heavy on you. Using the word extremely probably sounded like I didn't like your code, which isn't the case. Most important, I didn't mean to offend you.

Since I took a look at your code I understand that it's a bit cumbersome because you have been fighting performance issues. It's of course great that you found a way around that and I appreciate that part.

My thought were more that if the new api method will be more than 10 times faster, maybe a simpler, dummer, but easier to read script would be better for the future, if/when script changes are required. It's definitely not necessary to change what's already in use, and as long as you understand your code I don't care.

But for myself, it would be easier to write it from scratch than understanding your optimizations, if I needed a script with slight modifications. I do not mean this as criticism, but IF the new api method will be fast enough, it would be quite fast to write a script with a fairly dumb approach. I am thinking something like this (pseudocode using mixed languages):
finds = GetFinds()
foundGccodes = []
foreach(finds as find) {
  if foundGccodes[find.gccode] == nil {
    foundGccodes[find.gccode] = find
  }
}

states = conf.states // all 50 of them
statesCompleted = 0
foreach(states as state) {
  params = {limit: 1, excludeDisabled: true, excludeArchived: true, country = 'US', region: state}
  oldest = GetOldest(params)
  foreach(oldest as old) {
	if(isset(foundGccodes[old])) {
	  statesCompleted++
	}
  }
}

if(statesCompleted >= 10) {
  ok = true
} else {
  ok = false
}

I would expect this approach to take 50*0.3*2 seconds. But that might be too long as well. Sounds like 30 seconds which would be much slower than your approach.

In fact, I never understood your approach really. But it doesn't matter, I don't need to. And I didn't really spend time enough to actually understand it either.

Anyway. I didn't mean for you to actually rewrite it. I more meant to present the possibility that a script without hard to understand optimizations might work out after the method changes. I have no intention to criticize any code from anyone, as long as it works as it should.
Re: Rewrite of GetOldest()
February 15, 2017 12:40AM
Outline for the existing approach for regions goes something like this:

Get all finds in the relevant area (USA)

Extract list of regions from finds needed to score the finds
No need to expend resources where the cacher has not not cached.

Get the oldest caches in the USA. (This will contain some number of states oldest caches. More than one.)
Mingo, GC12, Beverly, The spot, Beaver Cache, Camels Prairie cache are all in the oldest 100 non archived caches. This gets atleast 6 states without trying really hard. 1000 might have helped find more states. Maybe not.

Check for states where the oldest has not been found.

Loop over states missing from list of finds one at a time changing the filter to one of the states.
(This is where the request in the other board would have helped. The hope was again to multiple states in a single call by filtering with a list of regions instead of one at a time.)
Repeat until bored or we are not finding any more.

Build table and report results.

There are a whole bunch of details such as check for caches that have bogus placed dates and other configuration options.

Also LUA is not on my top 10 programming languages. You might see some style items from other languages. You might also see the thrashings of a madman trying to get something to work and when it started working well enough, leave it alone, I might break it.
Re: Rewrite of GetOldest()
February 15, 2017 09:33AM
Ah I get it. Generally most likely smart to use GetOldest() on every state there is. The downside is that those really hardcore geocachers won't benefit from that. They will most likely need almost every state anyway. But in average it will definitely be faster.
Re: Rewrite of GetOldest()
February 14, 2017 09:13PM
Btw, I hope this was the sort of feedback you were seeking.

Part of me feels that I am whining when there are bigger fish to fry.
Re: Rewrite of GetOldest()
February 14, 2017 09:50PM
I did not read a single line of wine (edit: I of course mean whine, though I didn't see any wine either), so we are all good.

This is exactly the form of feedback I am looking for. If a performance decrease will ruin most checkers using this API method we need to rethink it, and we rather catch it before it's done. Your feedback made me consider using our columnar based dbms instead, and that improvement feel quite good.



Edited 2 time(s). Last edit at 02/14/2017 10:03PM by ganja1447. (view changes)
Re: Rewrite of GetOldest()
February 15, 2017 12:03AM
Just a couple of other variants for your benchmarking pleasure.

http://project-gc.com/Challenges/GC6K6ZV/21974
Run with RetiredGuy or some other world traveller. It does a search over oldest by countries.

http://project-gc.com/Challenges/GC6JF6T/23191
No recommendation. It does a search over oldest by type. But only in a small part of the world.
Re: Rewrite of GetOldest()
February 15, 2017 09:46AM
sloth96 Wrote:
-------------------------------------------------------
> Just a couple of other variants for your
> benchmarking pleasure.
>
> http://project-gc.com/Challenges/GC6K6ZV/21974
> Run with RetiredGuy or some other world traveller.
> It does a search over oldest by countries.
Current live code: 21, 28, 22
Development code: 31, 20, 20
Seems to average about the same.

> http://project-gc.com/Challenges/GC6JF6T/23191
> No recommendation. It does a search over oldest
> by type. But only in a small part of the world.
Myself, current live code: 17, 16, 16
Myself, development code: 1.9, 4.1, 1.53
Landmaus (German with second most finds), current live code: 22, 24, 23
Landmaus (German with second most finds), development code: 3.0, 2.8, 8.5
Re: Rewrite of GetOldest()
February 15, 2017 03:21PM
Those column based databases seem to do wonders. I had never worked with any.

You have addressed all my scripts so I foresee no problems.

Thanks for the speedup. I will worry less about nasty grams from the server admins and cachers who are experiencing timeouts.
Re: Rewrite of GetOldest()
February 15, 2017 10:59PM
I will most likely release the new code tomorrow (~12 hours from now) if there hasn't been any objections before that. Then we will see what happens.
Re: Rewrite of GetOldest()
February 16, 2017 12:03PM
Wow. What a difference.
Re: Rewrite of GetOldest()
February 16, 2017 12:55PM
:)

I assume you noticed it was released.
Re: Rewrite of GetOldest()
February 18, 2017 05:36PM
I made another update today. Adding two new options (they are not inside the filters in params).
  1. dontCountArchivedTowardsLimit: (bool) defaults to true
  2. dontCountDisabledTowardsLimit: (bool) defaults to false

With default settings the server code will then figure out the oldest not archived geocache. Then it will return the geocache data for those matching the filters. Ie, it will exclude all archived if that filter-flag has been set.

This seems to be the expected behaviour, but I am not 100% certain either.


If setting dontCountArchivedTowardsLimit to false, the <limit> oldest caches will be returned only. It's then likely that almost all of them are archived today.
Sorry, you do not have permission to post/reply in this forum.