
I see a "timeout while calculating" error for some users in the new Log similarity stats.

Example: 

http://project-gc.com/ProfileStats/terabaitas

in Bug reports by vukisz (1.1k points)

This is normal. The calculation takes a lot of processing, so it may not finish in the first few runs; when it doesn't, you get this message. However, the progress made in an unfinished run is preserved, so each time the same stats are run the processing gets further, eventually completing. Once complete, it will probably stay complete for that user unless an extremely large number of logs is added in the same stats period.
by pinkunicorn (Moderator) (197k points)
Changes to old logs are what is most likely to make it start timing out again:
* Editing several old logs.
* Logging something far back in time.
* Deleting an old log (one that is not among the few hundred most recent).

The values are cached in chunks of 100 logs or so. If a log is deleted from the middle of the sequence, all cached chunks after that log become obsolete and useless.
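A toy model of why one deletion invalidates everything after it. This is a sketch, not Project-GC's code: it assumes, hypothetically, that chunks are fixed runs of 100 consecutive logs and that a cached chunk can only be reused if it contains exactly the same logs. `chunk_keys` and `stale_chunks` are illustrative names.

```python
CHUNK_SIZE = 100  # "chunks of 100 logs or so"; the exact size is an assumption


def chunk_keys(log_ids: list[int], size: int = CHUNK_SIZE) -> list[tuple[int, ...]]:
    """Split logs into consecutive fixed-size chunks; a chunk's key is its log IDs."""
    return [tuple(log_ids[i:i + size]) for i in range(0, len(log_ids), size)]


def stale_chunks(old_ids: list[int], new_ids: list[int], size: int = CHUNK_SIZE) -> int:
    """Count chunks of the new log list that have no matching cached chunk."""
    cached = set(chunk_keys(old_ids, size))
    return sum(1 for key in chunk_keys(new_ids, size) if key not in cached)


# 1,000 logs, then the one with ID 250 is deleted.
old = list(range(1000))
new = [log_id for log_id in old if log_id != 250]
print(stale_chunks(old, new))  # 8: only the first 2 of the 10 chunks survive
```

Deleting the most recent log instead would leave only the final chunk stale, which is why the answer singles out logs that are not among the most recent.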
My profile still shows a timeout. Is this still to be expected? Will it be announced when there should be no more timeouts?
A timeout will always be possible, though it is unlikely nowadays. I can try to explain how it's calculated technically, which might help your understanding.

The variance is calculated as the average difference in characters between log #1 and #2, #2 and #3, and so on. The simple solution is to recalculate this for every log text you have, each time. But if you have 10,000 logs of 500 characters each, that takes a LOT of time.
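A minimal sketch of the metric described above. The thread doesn't say how "difference in characters" is measured, so this assumes Levenshtein edit distance; `levenshtein` and `average_consecutive_difference` are hypothetical names. The per-pair cost also shows why the naive approach is slow: with this quadratic algorithm, 10,000 logs of 500 characters each need about 10,000 × 500 × 500 ≈ 2.5 billion inner-loop steps on every page load.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def average_consecutive_difference(logs: list[str]) -> float:
    """Average distance between log #1 and #2, #2 and #3, and so on."""
    if len(logs) < 2:
        return 0.0
    distances = [levenshtein(x, y) for x, y in zip(logs, logs[1:])]
    return sum(distances) / len(distances)


print(average_consecutive_difference(["TFTC", "TFTC!", "TFTC!!"]))  # 1.0
```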

We therefore calculate them in chunks of 100 and store the values in a memory cache, which is flushed to disk every now and then. This means a disruption in the service may result in loss of the cached data.
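The chunk cache could be sketched like this, under stated assumptions: chunks of 100 pairwise comparisons, a plain dict standing in for the memory cache (the periodic flush to disk is left out), and chunks keyed by a SHA-256 hash of the log texts they cover. `ChunkedSimilarityCache` is a hypothetical name, and the distance function is passed in so any character-difference measure can be plugged in.

```python
import hashlib
from typing import Callable

CHUNK_SIZE = 100  # assumed; the thread says "chunks of 100"


class ChunkedSimilarityCache:
    """Hypothetical in-memory cache of per-chunk distance sums."""

    def __init__(self, distance: Callable[[str, str], int], chunk_size: int = CHUNK_SIZE):
        self.distance = distance
        self.chunk_size = chunk_size
        self.store: dict[str, tuple[int, int]] = {}  # key -> (sum of distances, pair count)
        self.computed = 0  # how many chunks had to be (re)calculated

    @staticmethod
    def _key(texts: list[str]) -> str:
        h = hashlib.sha256()
        for text in texts:
            data = text.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))  # length prefix keeps boundaries unambiguous
            h.update(data)
        return h.hexdigest()

    def average(self, logs: list[str]) -> float:
        total, pairs = 0, 0
        for start in range(0, len(logs) - 1, self.chunk_size):
            # A chunk covers pairs (i, i+1) for i in [start, start + size),
            # so its window also includes the first log of the next chunk.
            window = logs[start:start + self.chunk_size + 1]
            key = self._key(window)
            if key not in self.store:
                d = [self.distance(a, b) for a, b in zip(window, window[1:])]
                self.store[key] = (sum(d), len(d))
                self.computed += 1
            chunk_sum, chunk_pairs = self.store[key]
            total += chunk_sum
            pairs += chunk_pairs
        return total / pairs if pairs else 0.0
```

Keying by content means an edited log only forces recalculation of the chunk containing it, while an inserted or deleted log shifts every later window and so changes all of their keys. Because each window overlaps the next by one log, editing the first log of a chunk also invalidates the chunk before it.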

When we implemented this, we had to build up the cached values for all the users. Each time the Profile Stats were loaded, the calculation started and was aborted after a set number of seconds (I don't remember how many; 5? 10?). If it managed to calculate, say, the first 15 chunks (1500 finds), it could continue from there the next time. After a few loads, it should then complete.
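The abort-and-resume behaviour might look roughly like this. It's a sketch: the real budget (5 or 10 seconds) isn't known, and `Progress` and `run_with_budget` are hypothetical names. The caller keeps the `Progress` object between page loads, and returning `None` corresponds to showing "timeout while calculating". The clock is injectable so the behaviour can be tested without actually waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Progress:
    """Hypothetical state preserved between page loads."""
    next_chunk: int = 0
    distance_sum: int = 0
    pair_count: int = 0


def run_with_budget(
    logs: list[str],
    progress: Progress,
    distance: Callable[[str, str], int],
    budget_seconds: float = 5.0,  # assumed; the thread guesses 5 or 10 seconds
    chunk_size: int = 100,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Resume from `progress`; return the average if finished, else None (a timeout)."""
    deadline = clock() + budget_seconds
    n_chunks = len(range(0, len(logs) - 1, chunk_size))
    while progress.next_chunk < n_chunks:
        if clock() >= deadline:
            return None  # out of time; the work done so far stays in `progress`
        start = progress.next_chunk * chunk_size
        window = logs[start:start + chunk_size + 1]
        progress.distance_sum += sum(distance(a, b) for a, b in zip(window, window[1:]))
        progress.pair_count += len(window) - 1
        progress.next_chunk += 1
    return progress.distance_sum / progress.pair_count if progress.pair_count else 0.0
```

The deadline is only checked between chunks, so a single slow chunk can overrun the budget by up to one chunk's worth of work.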

What can then make this stop working now, when it has been running for so long?
1) A disruption in our services causing loss of cached data. I don't think we have had this issue yet, but it will most likely happen sooner or later.
2) You add a log in the middle. This changes every chunk from that log onward, since the subsequent chunks will start and end with different logs.
3) Editing a log in the middle. This should only require recalculating that chunk, which shouldn't be a big deal. One chunk normally takes less than a second, so as long as you haven't added a few hundred new logs, it should normally be safe. But if you have edited many logs, it might have an effect.
4) Adding several hundred new logs might cause this effect, since there are quite a few new logs to do the math on.
5) Removing a log will have the same effect as #2.
6) Project-GC can sometimes lose a log or two, and then add them back again. This could potentially trigger the effect of #2 as well.
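To put rough numbers on these cases, here is a back-of-the-envelope estimate assuming non-overlapping chunks of exactly 100 logs (the thread says "or so"). `chunks_to_recompute` is a hypothetical helper, not part of Project-GC: "shift" covers an inserted or removed log (cases 2, 5 and 6), "edit" covers case 3, and "append" covers new logs at the end (case 4).

```python
import math

CHUNK_SIZE = 100  # assumed


def chunks_to_recompute(n_logs: int, change: str, index: int = 0, count: int = 1,
                        chunk_size: int = CHUNK_SIZE) -> int:
    """Estimate how many chunks must be recalculated after one change.

    n_logs: number of logs after the change.
    index:  position of the edited, inserted or removed log.
    count:  number of logs appended (only used for "append").
    """
    total_chunks = math.ceil(n_logs / chunk_size)
    if change == "edit":
        return 1  # only the chunk holding the edited log
    if change == "shift":
        # Every chunk from the changed position onward now holds different logs.
        return total_chunks - index // chunk_size
    if change == "append":
        old_n = n_logs - count
        # The last (possibly partial) old chunk grows, plus all brand-new chunks.
        return total_chunks - old_n // chunk_size
    raise ValueError(f"unknown change type: {change!r}")


print(chunks_to_recompute(10_000, "shift", index=2_000))  # 80
print(chunks_to_recompute(10_000, "edit", index=2_000))   # 1
print(chunks_to_recompute(10_300, "append", count=300))   # 3
```

If a chunk took close to a second (the upper figure given for case 3), recalculating 80 chunks would far exceed a 5 or 10 second budget, so the timeout could show on several consecutive loads before the cache is rebuilt.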

All in all, it comes down to keeping the loading time for the rest of the data down. It's not worth waiting 5 minutes for the Profile Stats just for this single, fairly unimportant value.
It has been over a year since this feature returned, and I still have not seen a single result for log similarity.

During this time there have been weeks when I have not added a log; my logging rate has generally been low.

So, looking at the list of points:
First, it has never worked for me, so the question "What can then make this stop working now, when it has been running for so long?" seems silly.
1) I cannot comment on that.
2) Lately, I have not added or deleted any log dated more than a month back.
3) No log editing.
4) I have not added several hundred logs since caching with CuteLilFuzzyMonkey and RetiredGuy, both of whom have computed values.
5) See my comment on #2.
6) This does not seem like something I have control over.

I suspect something funky in my cached data is causing this to go wonky, which might be considered an example of #1.

It is very frustrating that I have never seen this number; it seems like I am missing out on something that others are getting.
...