This is correct. Has it been a big change for you?
We are currently implementing a new algorithm for counting words, which we believe to be more accurate. Recalculating the data for 500M logs will probably take a few weeks.
So, why have we changed it then ...
Counting the number of words in a log might sound very simple, but it's not. I'll give some examples, and you (anyone reading this, not just the person who asked) can think about how many words each one contains:
- X X X X X X X
And we haven't even looked at Chinese or similar languages yet. Solving this in a fair and good way with code is quite hard. I am quite sure that if we posted an example log and let users count the words, we would get quite a few different results. Something with a dash in it might be counted as one word in some cases, two in others, and none in yet others.
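
To make the point concrete, here is a minimal Python sketch. It is not the algorithm we are implementing. The three counting rules, the function names and the sample texts are all made up for illustration. It only shows that equally reasonable definitions of "word" give different counts for the same log.

```python
import re

def count_whitespace(text):
    # Rule 1: anything separated by whitespace is a word,
    # so a lone dash counts as a word.
    return len(text.split())

def count_word_runs(text):
    # Rule 2: every run of letters/digits is a word,
    # so "well-hidden" and "don't" each count as two.
    return len(re.findall(r"\w+", text))

def count_joined(text):
    # Rule 3: runs joined by a hyphen or apostrophe count as one word,
    # and a lone dash counts as none.
    return len(re.findall(r"\w+(?:[-']\w+)*", text))

sample = "Found a well-hidden cache - don't miss it!"  # hypothetical log text
for rule in (count_whitespace, count_word_runs, count_joined):
    print(rule.__name__, rule(sample))

# Text written without spaces defeats all three rules:
# each of them sees this sentence as a single "word".
chinese = "今天天气很好"
print(count_whitespace(chinese), count_word_runs(chinese), count_joined(chinese))
```

With this sample, the three rules report 8, 9 and 7 words, and the Chinese sentence is counted as 1 word by all of them, even though a reader would count several.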