Return to Project-GC

Question

Log similarity

4 Answers

Answer 1 · 2017-05-31T12:08:57+0000

I guess the algorithm counts the number of occurrences of each word in your logs. If so, very common words such as "the" in English would significantly increase the log similarity.

For my part, I also love writing long and (I hope) interesting logs. But when I log a series of caches, I always add a "header" section (identical for all caches of the day) followed by a part dedicated to the cache itself. So my log similarity is very high (45%).
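To illustrate the guess above, here is a minimal sketch of a word-count comparison. It is not Project-GC's actual algorithm, which is not documented in this thread. The cosine measure, the function names and the sample logs are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample logs: a shared "header" plus a cache-specific part.
header = "Today we went on a long tour around the lake with the whole family."
log_1 = "Found the box under the old bridge after a short search."
log_2 = "Nice puzzle, solved it at home and picked up the final quickly."

plain = cosine_similarity(word_counts(log_1), word_counts(log_2))
with_header = cosine_similarity(
    word_counts(header + " " + log_1), word_counts(header + " " + log_2)
)
print(f"without header: {plain:.2f}")
print(f"with header:    {with_header:.2f}")
```

Under this assumed measure, the two cache-specific parts share only the word "the", yet they still score above zero, and prepending the same header to both logs raises the score considerably.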

Answer 2 · 2017-05-31T18:54:04+0000

I would not be surprised if details about the algorithm are withheld on purpose. Publishing them might make it too easy to fool the algorithm and achieve a good ranking.

I fully agree it is worth trying to write good and interesting logs. However, this is not an easy task, and log quality is very difficult for an algorithm to judge. My own logs (mostly written in German, plus the local language in most cases) also have a high similarity, which may be partly because a few words appear frequently in any normal text. I also have to admit that I use a 'day header' or 'trip header' section, which may add to the similarity. I would therefore assume the check may also be done on 'text similarity' (checking for identical sets or sentences), which is also easy to program.
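A 'text similarity' check of the kind assumed here could look roughly like the sketch below, which measures how many whole sentences two logs have in common. This is only one possible reading of the idea. The Jaccard index, the naive sentence splitting and the sample logs are assumptions, not anything confirmed about Project-GC.

```python
import re


def sentences(text):
    """Split text into a set of normalised sentences (naive split on . ! ?)."""
    parts = re.split(r"[.!?]+", text.lower())
    return {" ".join(p.split()) for p in parts if p.strip()}


def sentence_overlap(log_a, log_b):
    """Jaccard index: shared sentences divided by all distinct sentences."""
    a, b = sentences(log_a), sentences(log_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical logs that share a two-sentence trip header.
trip_header = "Day three of our trip to the coast. The weather was perfect."
log_1 = trip_header + " This one was hidden in a tree stump."
log_2 = trip_header + " Great view from the top of the hill!"

print(sentence_overlap(log_1, log_2))  # 0.5: 2 shared sentences out of 4 distinct
```

With a check like this, a header of two sentences followed by one individual sentence per cache already yields 50% overlap, regardless of how different the individual parts are.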

Answer 3 · 2017-06-02T16:37:16+0000

Perhaps shorter logs give more variance. I have an average log length of 41 words and a similarity score of 31%. Although I sometimes write full-length descriptions, I often post logs such as "Another quick find here - TFTC" (I'm not saying I should, and I will write more for a good cache, or for one that needed more effort than usual to find). Personally, I'd rather have a higher log length and lose out on the similarity score, since I get a better badge for the log length, whereas the similarity score doesn't seem to be used in any other statistics. However, with over 2000 logs, bringing the average length up is very slow!

Answer 4 · 2018-08-12T08:49:10+0000

I'm now at 68%, although I still write long and varied logs in multiple languages.

My guess is that the algorithm is based on histogram entropy and the number of different words used, but why does it give such a high value for me?
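For readers unfamiliar with the two quantities named in this guess, the following sketch computes them for a set of logs. It only shows what "histogram entropy" and "number of different words" could mean in practice. Whether Project-GC uses either of them is not known, and the function names and sample logs are made up for the example.

```python
import math
import re
from collections import Counter


def word_histogram(logs):
    """Count every word across all given logs."""
    counts = Counter()
    for log in logs:
        counts.update(re.findall(r"\w+", log.lower()))
    return counts


def shannon_entropy(counts):
    """Entropy in bits of the word frequency distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def distinct_word_ratio(counts):
    """Number of different words divided by the total number of words."""
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


# Hypothetical examples: one user repeats a log, the other varies it.
repetitive = ["Quick find, TFTC", "Quick find, TFTC", "Quick find, TFTC"]
varied = [
    "Quick find, TFTC",
    "Lovely walk along the river",
    "Tricky puzzle solved at home",
]

for name, logs in (("repetitive", repetitive), ("varied", varied)):
    hist = word_histogram(logs)
    print(name, round(shannon_entropy(hist), 2), round(distinct_word_ratio(hist), 2))
```

Lower entropy and a lower distinct-word ratio both indicate more repetition. Note that the distinct-word ratio tends to fall as the total amount of text grows, because common words inevitably repeat.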
