Max length script output
September 20, 2017 02:41PM
A while back we started cutting the HTML output from checker scripts at 50 kB. This might have caused some scripts to break. There are two main reasons for this change.

1) It makes data harvesting harder. This is actually not our biggest concern, since harvesting is still possible. We also know who has access to writing scripts, and any abuse of the data would be reported to Geocaching HQ.

2) Some scripts output huge, meaningless lists of data. It normally doesn't make sense to output a list of 5000 caches, for example. The problem here of course isn't the huge list in itself, but that all HTML output from the scripts is "purified" to prevent XSS and other exploits. The purification takes quite a lot of time/CPU power.

I found the issue itself while adding more proper support for script timeouts. I ran a script on myself; it was a simple challenge and I expected the Lua script to take 0.5 seconds. It timed out. After some analysis I realized that it was due to returning a huge list of HTML. It took ~100 times more CPU power to handle the HTML output than to validate the user.

Some scripts do, however, output proper feedback with tables and images. Some of those produce quite big blobs as well. Those are large mainly because they have to output things like "/images/gc-icons/traditional_16.gif" a hundred times, with an img tag around it.

I am not sure about the long-term solution for this. The current workaround is to output "better" HTML. I have created a proof of concept that should shorten the HTML a lot when it is repetitive.

<html>
        <body>
                <style type="text/css">
                        #cc_HtmlFeedback table {
                                border: none;
                                border-collapse: collapse;
                        }
                        #cc_HtmlFeedback td {
                                border: 1px solid black;
                        }
                        #cc_HtmlFeedback .ct-t {
                                background-image: url('https://project-gc.com/images/gc-icons/traditional_16.gif');
                                width: 16px;
                                height: 16px;
                                margin: 0;
                        }
                        #cc_HtmlFeedback .ct-m {
                                background-image: url('https://project-gc.com/images/gc-icons/multi_16.gif');
                                width: 16px;
                                height: 16px;
                                margin: 0;
                        }
                        #cc_HtmlFeedback .ct2-tm {
                                background-image: url('https://project-gc.com/images/gc-icons/traditional_16.gif'), url('https://project-gc.com/images/gc-icons/multi_16.gif');
                                width: 16px;
                                height: 32px;
                                background-repeat: no-repeat, no-repeat;
                                background-position: 0 0, 0 16px;
                        }
                        #cc_HtmlFeedback .ct2-tmu {
                                background-image: url('https://project-gc.com/images/gc-icons/traditional_16.gif'), url('https://project-gc.com/images/gc-icons/multi_16.gif'), url('https://project-gc.com/images/gc-icons/unknown_16.gif');
                                width: 16px;
                                height: 48px;
                                background-repeat: no-repeat, no-repeat, no-repeat;
                                background-position: 0 0, 0 16px, 0 32px;
                        }
                </style>

                <div id="cc_HtmlFeedback"> <!-- A div with this id already exists, your html-output will be added to that div. -->
                        <table>
                                <tr>
                                        <td>
                                                <p class="ct-t"></p>
                                                <p class="ct-m"></p>
                                        </td>
                                        <td>
                                                <p class="ct-m"></p>
                                        </td>
                                </tr>
                                <tr>
                                        <td class="ct2-tm">
                                        </td>
                                        <td class="ct2-tmu">
                                        </td>
                                </tr>
                        </table>
                </div>
        </body>
</html>

Note that the checker script doesn't need to output the html and body tags, since they already exist on the page. The div with id cc_HtmlFeedback also already exists.

Note that the example contains two different variants: the first row is easier to write, but saves fewer bytes. Also, this is only a good solution when there are a lot of images. I haven't done the math, but it might be worth it with 100 of them.
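For the math, a rough size comparison could look like the sketch below. It counts raw bytes before purification for one icon type only. The minified CSS rule and the helper names are assumptions made for illustration, not something the checker system provides.

```python
ICON_URL = "https://project-gc.com/images/gc-icons/traditional_16.gif"


def img_markup(n):
    """n inline <img> tags, each repeating the full icon URL."""
    return ('<img src="%s">' % ICON_URL) * n


def css_markup(n):
    """One CSS rule carrying the URL once, plus n short <p> tags."""
    # Assumed minified form of the .ct-t rule from the proof of concept.
    rule = (
        "#cc_HtmlFeedback .ct-t{background-image:url('%s');"
        "width:16px;height:16px;margin:0}" % ICON_URL
    )
    return "<style>" + rule + "</style>" + '<p class="ct-t"></p>' * n


def break_even():
    """Smallest icon count at which the CSS variant is strictly shorter."""
    n = 1
    while len(css_markup(n)) >= len(img_markup(n)):
        n += 1
    return n


if __name__ == "__main__":
    for count in (1, 10, 100):
        print(count, len(img_markup(count)), len(css_markup(count)))
    print("break-even at", break_even(), "icons")
```

Each additional icon type adds one more rule, so the break-even point moves up with the number of distinct icons used.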


As mentioned, I am not sure this will be the long-term, final solution. An alternative could be a predefined CSS for the checker scripts, for example.

Another variant is callbacks that produce trusted HTML, for example RenderTableFromAssociativeArray(data). I bet this will require quite a few different callbacks though, which might make it quite tiresome to build them, and also tiresome every time someone wants something that doesn't exist.

These approaches can of course be combined. We could still allow HTML output. That output could contain, for example, <div id="CC_1"></div>, and the script could return { trustedHtml: { 1: [ RenderTableFromAssociativeArray, data ] }}
We could then purify the script's own HTML and insert the trusted HTML blobs afterwards.
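A minimal sketch of that combination, in Python for illustration only. It assumes the placeholder div survives purification unchanged, and that trusted renderers are looked up by name in a whitelist. All function names here are hypothetical; nothing like this exists in the checker system today.

```python
import html
import re


def render_table_from_associative_array(data):
    """Hypothetical trusted renderer: every key and value is escaped."""
    rows = "".join(
        "<tr><td>%s</td><td>%s</td></tr>"
        % (html.escape(str(key)), html.escape(str(value)))
        for key, value in data.items()
    )
    return "<table>%s</table>" % rows


# Whitelist of callbacks whose output is trusted and therefore not purified.
TRUSTED_RENDERERS = {
    "RenderTableFromAssociativeArray": render_table_from_associative_array,
}

# Matches the empty placeholder divs the script left in its own HTML.
PLACEHOLDER = re.compile(r'<div id="CC_(\d+)"></div>')


def insert_trusted_html(purified_html, trusted):
    """Fill each placeholder with the output of its whitelisted renderer.

    `trusted` maps a slot number (as a string) to [renderer_name, data].
    Unknown slots or renderer names leave the placeholder untouched.
    """
    def fill(match):
        slot = match.group(1)
        if slot not in trusted:
            return match.group(0)
        name, data = trusted[slot]
        renderer = TRUSTED_RENDERERS.get(name)
        if renderer is None:
            return match.group(0)
        return '<div id="CC_%s">%s</div>' % (slot, renderer(data))

    return PLACEHOLDER.sub(fill, purified_html)
```

Only the (small) script-written HTML goes through the purifier in this scheme; the big repetitive table is generated by code the server already trusts.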

To be honest I am not sure what the best solution is.


The ultimate goal in my opinion is that all scripts/tags should output usable log examples and script output that is worth looking at.



Edited 1 time(s). Last edit at 09/20/2017 02:43PM by ganja1447.
Re: Max length script output
September 21, 2017 09:08AM
Quote

The ultimate goal in my opinion is that all scripts/tags should output usable log examples and script output that is worth looking at.

The HTML output is additional information, like debug output. In most cases it is not needed at all until there is some reason to check details. You could automatically hide any HTML that is too long until the user clicks the "show" button to see it. This way purification could be done on demand, and short HTML works without any modifications.
Re: Max length script output
September 22, 2017 12:58AM
The main problem with that idea is that the long HTML would still be generated and it's the generation of the HTML which takes computation time. The display of the HTML is insignificant to the server.
Re: Max length script output
September 22, 2017 01:01AM
sumbloke Wrote:
-------------------------------------------------------
> The main problem with that idea is that the long
> HTML would still be generated and it's the
> generation of the HTML which takes computation
> time. The display of the HTML is insignificant to
> the server.

That isn't true. What's taking time is verifying the html and making sure there isn't anything harmful in it.
Re: Max length script output
September 22, 2017 01:16AM
Quote
ganja1447
After some analyzing I realized that it was due to returning a huge list of HTML. It took ~100 times more CPU power to handle the HTML output than to validate the user.

Handling (purifying) of the HTML can happen only after the script has returned the HTML string at the very end of the process. If the returned string is too long, it could be stored temporarily and processed later on demand for display.
Re: Max length script output
September 22, 2017 01:16PM
So an update.

For now, I have removed the 50 kB limit again. There have been two changes, one of them based on what arisoft said.

1)
The auto challenge checkers now don't purify the HTML. Since they don't use it, we just clear it instead. This isn't a noticeable change, except for the CPUs.

2)
The html output is now stored for 5 minutes in a key/value store. Instead of returning the HTML on the ajax call we return a key to that store. There is then a second ajax call being made which fetches a purified version of that HTML.

The advantage is that the purification isn't included in the 30 second time limit; that limit now covers almost only Lua execution time. Also, even if the HTML is huge and complex, the user will at least see the result from the Lua script itself. 0-30 seconds later they might get the HTML output as well. If it's too complex, it will just throw an error (30 second timeout).
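The two-step flow could be sketched roughly like this. It is Python for illustration only, not the actual server code: the in-memory dict stands in for the real key/value store, purify() is a stand-in that just escapes everything, and the function and field names are assumptions.

```python
import html
import secrets
import time

TTL_SECONDS = 5 * 60  # the HTML is kept for 5 minutes

# Stand-in for the real key/value store: key -> (expires_at, raw_html)
_store = {}


def purify(raw_html):
    """Placeholder for the real purifier; here it simply escapes all markup."""
    return html.escape(raw_html)


def finish_script_run(result, raw_html, now=None):
    """First ajax call: return the script result plus a key, not the HTML."""
    now = time.time() if now is None else now
    key = secrets.token_urlsafe(16)
    _store[key] = (now + TTL_SECONDS, raw_html)
    return {"result": result, "htmlKey": key}


def fetch_html(key, now=None):
    """Second ajax call: purify on demand, outside the script's time limit."""
    now = time.time() if now is None else now
    entry = _store.get(key)
    if entry is None:
        return None
    expires_at, raw_html = entry
    if now > expires_at:
        del _store[key]  # expired entries are dropped
        return None
    return purify(raw_html)
```

The point of the split is that the expensive step only runs in fetch_html, so a slow purification can no longer make the script result itself time out.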


I still don't believe there are many cases where the HTML needs to be 50 kB. A reasonable limit could be discussed, or is something we should analyze; 50 kB wasn't a scientific choice. But with these changes, it feels like this task will be put on ice and considered lower priority.
Re: Max length script output
October 31, 2017 01:08AM
Is this the reason for some of the "not a checker" scripts now failing? For example, Target.'s shortest log script ( https://project-gc.com/Challenges//19085 ) or my list finds script ( https://project-gc.com/Challenges//26124 ) for users with lots of finds.

Is there any possibility of having a more meaningful error message than "Unknown error fetching html output"? It seems to be returning this error in about 10 seconds.
Re: Max length script output
November 13, 2017 03:23PM
sumbloke Wrote:
-------------------------------------------------------
> Is this the reason for some of the "not a checker"
> scripts now failing? For example, Target.'s
> shortest log script (
> https://project-gc.com/Challenges//19085 ) or my
> list finds script (
> https://project-gc.com/Challenges//26124 ) for
> users with lots of finds.
>
> Is there any possibility of having a more
> meaningful error message than "Unknown error
> fetching html output"? It seems to be returning
> this error in about 10 seconds.

I can't really say if it's the reason, but it's probably related to the changes.

A more meaningful error isn't possible, since the error definitely is unknown. The server side process doesn't answer. With a more manual approach I can of course tell why. It's running out of memory.

If I understand the scripts correctly, the fix is easy. Don't output a hundred thousand html tags. The system isn't built for handling this form of output, and it's quite resource heavy. For the shortest logs script I assume it's not of interest to actually output all log lengths.

For the second script (yours), I can understand that you don't want to cut it to a list of 100. But still, this is not why the system is in place. These aren't changes to actively prevent these scripts, it's just a side-effect. Giving the output as plain text (not html) will probably solve it though.
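As a rough illustration of the plain-text idea, shown in Python rather than Lua: one line of text per find instead of one table row with several tags. The input shape (a list of code/name pairs) and the optional cap are assumptions for the sketch, not how either script actually works.

```python
def plain_text_report(finds, limit=None):
    """Format finds as plain text, one per line, with no HTML tags.

    `finds` is a list of (gc_code, name) pairs. If `limit` is given,
    only the first `limit` entries are listed and the rest are summarized.
    """
    shown = finds if limit is None else finds[:limit]
    lines = ["%s  %s" % (code, name) for code, name in shown]
    hidden = len(finds) - len(shown)
    if hidden > 0:
        lines.append("... and %d more" % hidden)
    return "\n".join(lines)
```

With limit left at None the full list is still returned, just without any markup for the purifier to parse.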