Ratelimits RE: Scripting Tech Development

Bug reports, general help, ideas for improvements, and questions about how things are meant to work.
User avatar
Darcania
Envoy
 
Posts: 205
Founded: Dec 29, 2014
Civil Rights Lovefest

Ratelimits RE: Scripting Tech Development

Postby Darcania » Sat Oct 08, 2022 10:14 am

Hello, tech team, looking forward to seeing what you all do with the API going forward :)
I'll note real quick that I haven't personally touched the API for a while; the last time I had to change any of my NationStates-related projects was back when the Cards API was introduced, so apologies in advance for any inaccuracies in my request below.


Personally, what I'd love to see most is improvements to both the ratelimit headers and the ratelimit documentation. While following the ratelimit is entirely possible today, the current setup is very limiting for both API admins and API users.

As it is now, both the headers and the documentation provide woefully incomplete information. In order to follow the ratelimits, information beyond what the server returns, and even beyond what is documented, is required. This both makes it more difficult for clients to follow the ratelimits, and locks the API into its current ratelimits barring a long and grueling deprecation period from the admins. Providing more data points, such as those listed below, would give both the server and the client more flexibility in creating and following ratelimits.
  1. Ratelimit requests seen.
    While this is provided by the server via response header, this is not documented at all - no part of the API mentions the use of the X-Ratelimit-Requests-Seen header.
  2. Maximum request limit.
    While this is documented, this information is only provided via documentation, meaning any changes to ratelimits would require all clients to change their hardcoded values for this data point. This can be resolved by adding an X-Ratelimit-Limit header, by following the next bullet point, or both.
    1. Ratelimit requests remaining.
      As the limit above is only known to the client as a hardcoded value, there is also no way to determine the requests remaining based purely on information provided by the server. This can be resolved with an X-Ratelimit-Remaining header or similar.
  3. Time remaining in the bucket.
    This is also documented, but only that: no part of the API itself provides this information, once again leaving it to clients to hardcode both the time per bucket and to record the time of the first request. This can be resolved with an X-Ratelimit-Reset or X-Ratelimit-Reset-After header which list either an absolute or relative time that the bucket resets.
    Note that this assumes that the API is intended to follow the flush bucket algorithm that it currently does, rather than a leaky bucket algorithm like the documentation implies; see below for more on that point.
  4. The bucket being used.
    This is a big one, and one that has been discussed in the past. Currently, there are 3 ratelimit buckets, all nested inside of each other: the global API ratelimit, the telegram ratelimit, and the recruitment telegram ratelimit. While exceeding the telegram ratelimits doesn't result in a 429 by itself (and thank you whoever implemented that little detail!), it would be much appreciated if scripts could tell from the API which bucket the requests fall under, rather than depending on the user as the sole source of truth for, e.g., whether a telegram is a recruitment TG or not.

I'd also like to note that the documentation for the rate limit is incomplete in another fashion: namely, what kind of bucket the 30s bucket is. The documentation implies that it is a "leaky" bucket, where each request "leaks" out of the bucket 30 seconds after it is put in, resulting in a sliding 30s window along the request timeline that prevents any 30s span from containing more than 50 requests. In practice, however, the bucket is closer to a "flush" bucket, where 30 seconds after the first request the entire bucket is "flushed" and returns to an empty state. Rather than a sliding 30s window, this results in static 30s "blocks" which never exceed 50 requests; by carefully timing requests, as in the charts below, it is easily possible to reach upwards of 99 requests within a 30-second span.
Note that the data in the following charts is Gaussian noise, not from actual testing. However, using request timings similar to this example, I have repeatedly succeeded in hitting 80+ requests in the middle 15-45s block without being ratelimited.

[Image] Here, time periods 0-30s and 30-60s respect the 50-request limit.

[Image] However, time period 15-45s far exceeds the 50-request limit: there are 86 requests in that 30s period, yet no 429 is returned.

[Image] Chart mapping X-Ratelimit-Requests-Seen for this scenario.
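
To make the flush-bucket behaviour above concrete, here is a minimal Python simulation. It is purely illustrative: the 50-per-30s numbers come from the documented limit, while the request timings and function names are invented, not taken from live testing.

LIMIT = 50
WINDOW = 30.0

def flush_bucket_allows(request_times):
    """True if a flush bucket would accept every request without a 429."""
    window_start, count = None, 0
    for t in sorted(request_times):
        if window_start is None or t >= window_start + WINDOW:
            window_start, count = t, 0   # bucket flushed, a new window begins
        count += 1
        if count > LIMIT:
            return False
    return True

def max_in_sliding_window(request_times, width=WINDOW):
    """Largest number of requests inside any sliding window of `width` seconds."""
    times = sorted(request_times)
    return max(sum(1 for t in times if s <= t < s + width) for s in times)

# One request opens the first window, 49 land just before it flushes at t=30,
# and 50 more land just after the flush.
requests = [0.0] + [29.0 + i * 0.01 for i in range(49)] + [30.1 + i * 0.01 for i in range(50)]
print(flush_bucket_allows(requests))    # True  -> no 429 from a flush bucket
print(max_in_sliding_window(requests))  # 99    -> well over 50 inside a sliding 30s window

Each fixed window stays at or below 50 requests, yet a sliding 30-second window straddling the flush sees 99.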

I'm not sure which type of bucket the admins actually intend to use (most notably, mock-nationstates and trawler seem to use variations of a leaky bucket rather than a flush bucket like nationstates apparently uses), but I would appreciate the documentation at least making it clearer what the intended bucket type is.

User avatar
Roavin
Admin
 
Posts: 1778
Founded: Apr 07, 2016
Democratic Socialists

Postby Roavin » Sun Oct 09, 2022 11:42 am

Thank you for the detailed post (and graphs!), as well as for our productive and cordial discussion in NS Coders.

Unfortunately, there isn't an internet standard for how HTTP rate limiting is supposed to function. Different vendors have included and specified different custom rate limit HTTP headers and semantics, and NationStates is no exception. Fortunately, there is a draft specification heading towards standardization for this very thing. This draft includes pretty much everything you're requesting, or semantically equivalent functionality. I would be inclined to go with this, possibly with the X- prefix while it's not an official Internet Standard yet, since this is where things are generally heading.

As for the Telegram buckets, they aren't really buckets (in that sense). Quoth the API docs:
The Telegrams API Rate Limit works by checking the amount of time that has passed since your last successful request. For example, if you sent a telegram via the API 60 seconds ago, you can now successfully send a non-recruitment telegram (since 60 is greater than 30), but not a recruitment telegram (since 60 is less than 180).


This is a different (and in some ways simpler) approach. It's also incompatible with the rate limiting RFC above, but that's okay because TG rate limits are a separate thing entirely, and therefore should sit a layer above HTTP (which they do, except for the X-Retry-After weirdness).
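
For what it's worth, here is a minimal client-side sketch of that check in Python, assuming the documented intervals (30 seconds since the last successful API telegram for non-recruitment, 180 for recruitment); the class and method names are just for illustration.

import time

NON_RECRUITMENT_GAP = 30    # seconds required since the last successful API telegram
RECRUITMENT_GAP = 180

class TelegramPacer:
    """Sketch: only attempt a send once enough time has passed since the
    last successful API telegram of any kind, per the documented rule."""

    def __init__(self):
        self.last_success = None   # monotonic timestamp of the last successful TG

    def can_send(self, recruitment: bool) -> bool:
        if self.last_success is None:
            return True
        gap_needed = RECRUITMENT_GAP if recruitment else NON_RECRUITMENT_GAP
        return time.monotonic() - self.last_success >= gap_needed

    def record_success(self):
        self.last_success = time.monotonic()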

Regarding the 45-second burst, I'm not worried about this — while it's possible to exceed 50 in a 30 second window, it's still not possible to exceed 100 in a 60 second window, so on average the rate limit is held. I don't know whether this behavior (i.e. allowing short bursts like this) was intended or not, but I don't see an issue with either for typical NS applications.

Though you didn't mention it here, it did come up on Discord - 15 minutes is indeed a pretty long time, and I do think it's reasonable to come up with more forgiving rules. I haven't looked around yet to see what "typical" lockouts would be, but I imagine that spurious 429s are generally free-ish, and only when hitting hard do longer lockouts occur.

All of the above leads to other advantages that weren't explicitly stated:
  • NationStates could change the rates for any reason and conformant clients would automatically work out of the box. And the reasons are many - these could be policy changes, or load balancing, or a number of other things.
  • Two conformant clients on the same PC wouldn't lock each other out. This is a real scenario that happens to R/D officers, for example, when they're using, say, 20xx and Deadeye at the same time.

To summarize, I'm very much inclined to pursue something like this:
  • Revamp rate limiting as per the RFC-draft (and hopefully soon full Internet Standard RFC)
  • Reconsider lockouts, making spurious 429s "free-ish" but progressively harsher
  • Adjust docs accordingly ;)

The open questions, at a high level, are how to handle lockouts and whether to stick with the current bucketing behavior. I've not done research on the former and have no opinion on the latter, so I would appreciate comments and suggestions.
Helpful Resources: One Stop Rules Shop | API documentation | NS Coders Discord
About me: Longest serving Prime Minister in TSP | Former First Warden of TGW | aka Curious Observations

Feel free to TG me, but not about moderation matters.

User avatar
Window Land
Ambassador
 
Posts: 1047
Founded: Nov 02, 2016
Scandinavian Liberal Paradise

Postby Window Land » Sun Oct 09, 2022 4:41 pm

Regardless of what is chosen in terms of the rate limits, I would greatly appreciate them being properly documented. In the past, I've written code based on the leaky bucket assumption, and it results in an annoyingly high rate of spuriously tripping the rate limit when making large quantities of requests, even when running well below the max rate limit. For example, after the last z-day I wound up querying all the happenings for every nation in Forest, and still tripped the rate limit 2/3 of the way through with my code rate limited to 30 requests per 35 seconds, and ensuring every request has completed before the next one is sent.

I also didn't know that the X-Ratelimit-Remaining header existed. Better documentation around the rate limits would have saved me a fair bit of frustration. While I don't use the API that often, I don't think it matters too much exactly what strategy you go with for rate limiting, so long as it is clearly and correctly documented.
Last edited by Window Land on Sun Oct 09, 2022 4:42 pm, edited 1 time in total.
Bored college student who is probably supposed to be doing something important.
Woodie Flowers wrote:If you’re anti-science, you’re pro-stupid.

Evelyn Beatrice Hall wrote:I disapprove of what you say, but I will defend to the death your right to say it.

Winston Churchill wrote:Democracy is the worst form of government – except for all the others that have been tried.

Randall Munroe wrote: I can't remember where I heard this, but someone once said that defending a position by citing free speech is sort of the ultimate concession; you're saying that the most compelling thing you can say for your position is that it's not literally illegal to express.
Free Speech

User avatar
Roavin
Admin
 
Posts: 1778
Founded: Apr 07, 2016
Democratic Socialists

Postby Roavin » Sun Oct 09, 2022 5:26 pm

Window Land wrote:still tripped the rate limit 2/3 of the way through with my code rate limited to 30 requests per 35 seconds, and ensuring every request has completed before the next one is sent.


That shouldn't happen. Did you possibly have something else using the API, e.g. an auto-login script, running at the same time?
Helpful Resources: One Stop Rules Shop | API documentation | NS Coders Discord
About me: Longest serving Prime Minister in TSP | Former First Warden of TGW | aka Curious Observations

Feel free to TG me, but not about moderation matters.

User avatar
Window Land
Ambassador
 
Posts: 1047
Founded: Nov 02, 2016
Scandinavian Liberal Paradise

Postby Window Land » Sun Oct 09, 2022 5:59 pm

Roavin wrote:
Window Land wrote:still tripped the rate limit 2/3 of the way through with my code rate limited to 30 requests per 35 seconds, and ensuring every request has completed before the next one is sent.


That shouldn't happen. Did you possibly have something else using the API, e.g. an auto-login script, running at the same time?

Nope. I haven't ever used anything like that, and I didn't authenticate at all, either. I did have the script running in a Docker container somewhere in the cloud rather than on my local machine, so that might have affected things, too. Also, looking through that script's history, my previous statement was wrong: I started at max speed, tripped the rate limit pretty quickly, then brought it down to 30 requests per 30 seconds, tripped it again, restarted my script, tripped it a third time, and then changed it to 30 requests per 35 seconds and was able to complete it like that (at least I think so; I'm going off the changes I made as I went, and what I previously said was just based on the final iteration of my script).

Sorry if this was overly vague; the last time I ran it was a year ago.

edit:
I can come up with the code pretty easily if it helps (warning: it's fairly long, in Rust, and doesn't have its dependencies pinned properly).
Last edited by Window Land on Sun Oct 09, 2022 9:25 pm, edited 3 times in total.
Bored college student who is probably supposed to be doing something important.
Woodie Flowers wrote:If you’re anti-science, you’re pro-stupid.

Evelyn Beatrice Hall wrote:I disapprove of what you say, but I will defend to the death your right to say it.

Winston Churchill wrote:Democracy is the worst form of government – except for all the others that have been tried.

Randall Munroe wrote: I can't remember where I heard this, but someone once said that defending a position by citing free speech is sort of the ultimate concession; you're saying that the most compelling thing you can say for your position is that it's not literally illegal to express.
Free Speech

User avatar
Roavin
Admin
 
Posts: 1778
Founded: Apr 07, 2016
Democratic Socialists

Postby Roavin » Tue Feb 28, 2023 5:27 pm

What's described in draft-ietf-httpapi-ratelimit-headers-latest has already been implemented, in part or in full, in several places (see here), so I think it's viable to go this route. Not only would it be standard(ish), it also has several advantages: the limits could be changed statically or dynamically and correctly implemented API consumers would automatically adjust, and multiple API consumers could run from one IP address without risking long lockouts. I imagine NationStates' specific implementation looking something like the following.

When receiving an API request, regardless of whether the request succeeds (200) or is throttled (429), the server sends back the following response headers:
  • RateLimit-Policy: This describes the rate limiting policy of NationStates, and initially has the value 50;w=30, which means 50 requests per 30-second time window.
  • RateLimit-Limit: This has the value 50, which means that there are a total of 50 requests available in the current time window (bucket). API consumers should use this value rather than hardcoding 50.
  • RateLimit-Remaining: How many more requests can be made within the current time window.
  • RateLimit-Reset: Number of seconds remaining in the current time window (bucket).
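
If it helps to visualize the client side, here is a minimal sketch of a consumer that trusts these proposed headers rather than hardcoding the limits (Python and the requests library purely for illustration; the header names and semantics are as proposed above, everything else is invented).

import time
import requests  # any HTTP client works; used here only for illustration

USER_AGENT = {"User-Agent": "ExampleScript/0.1 (replace with your contact info)"}

def api_get(url):
    """Sketch: pace requests from the server's RateLimit-* headers instead of
    hardcoding '50 requests per 30 seconds' in the client."""
    resp = requests.get(url, headers=USER_AGENT)
    remaining = int(resp.headers.get("RateLimit-Remaining", "1"))
    reset_after = float(resp.headers.get("RateLimit-Reset", "0"))
    if remaining <= 0:
        # Budget for the current window is spent; wait until it resets.
        time.sleep(reset_after)
    return resp

A client written this way keeps working unchanged if the policy ever becomes, say, 25;w=15.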

The principal algorithm will remain flush bucket, as it's simple and works well with these headers.

Together with the above change, I want to change the lockout time mechanism. Making the time too short makes it meaningless, as scripts would simply not bother rate limiting and just retry on 429s. But making it too long makes it annoying for developers and makes using multiple API consumers on one computer essentially impossible (which, given that we want to move people to primarily use the API rather than hitting the HTML site, would be counterproductive).

My suggestion is that lockouts are, by default, only until the end of the time window, but "repeat violations" (either 3 successive time windows with a lockout, or 3 extraneous requests after the first 429 of a time window) are served with an additional lockout. The fancy option would be an exponentially rising lockout, starting at 30 seconds and capping out at an hour; the simple option would be to apply the 15 minutes as before whenever this condition is triggered. These numbers aren't set in stone, just a first rough idea of the magnitude of values I'm thinking about here.
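
Just to illustrate the magnitude of the exponential option, a sketch using the rough numbers above (the function name and exact figures are placeholders, not a commitment):

def lockout_seconds(repeat_violations: int) -> int:
    """Sketch of the exponential option: no repeat violation means a lockout only
    until the window resets; otherwise 30s, doubling each time, capped at an hour."""
    if repeat_violations <= 0:
        return 0  # default case: locked out only until the current window ends
    return min(30 * 2 ** (repeat_violations - 1), 3600)

# 1 -> 30s, 2 -> 60s, 3 -> 120s, ..., 8 or more -> 3600s (capped at one hour)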
Last edited by Roavin on Wed Mar 01, 2023 3:14 am, edited 3 times in total.
Reason: flush bucket rather than leaky bucket
Helpful Resources: One Stop Rules Shop | API documentation | NS Coders Discord
About me: Longest serving Prime Minister in TSP | Former First Warden of TGW | aka Curious Observations

Feel free to TG me, but not about moderation matters.

User avatar
Khronion
Chargé d'Affaires
 
Posts: 442
Founded: Dec 07, 2013
Ex-Nation

Postby Khronion » Tue Feb 28, 2023 6:46 pm

+1 to these proposed changes. While not necessarily needed for tools like Spyglass, I'd certainly want to see it take advantage of these potential headers for the sake of future-proofing and playing nice with other tools.

No opinion on a flush bucket vs a leaky one, but your proposal sounds okay to me. The suggested lockout behavior also sounds good - I agree that the magnitude would be less annoying for developers.

User avatar
Esfalsa
Spokesperson
 
Posts: 132
Founded: Aug 07, 2015
Civil Rights Lovefest

Postby Esfalsa » Wed Mar 01, 2023 1:43 am

I also don't hold much of an opinion either way on rate-limiting methods, but to briefly clarify this sentence (and apologies for the nitpick here)…
Roavin wrote:The principal algorithm will remain leaky bucket, as it's simple and works well with these headers.

…I thought that…
Roavin wrote:the API actually has a flush bucket system rather than a leaky bucket system

…so to clarify, would these changes be paired with a new rate-limiting algorithm or would they keep (the general contours of) the current algorithm?

I'm also wondering what the headers returned along with a throttled (429) response would be. Will the current X-Retry-After header be kept, or would the API return something like 'Ratelimit-Remaining: 0' with RateLimit-Reset providing the number of seconds until the API consumer can make another request?

For what it's worth, draft-ietf-httpapi-ratelimit-headers would say that RateLimit-Reset and Retry-After should reference the same point in time if both are present. Somewhat confusingly, it also provides an example where Retry-After indicates the length of time a client should pause requests for, and RateLimit-Reset would then indicate the nominal (and different) value that would apply after the rate limit reset — but it appears that example refers to dynamic rate limits, so keeping RateLimit-Reset and Retry-After the same would be more compliant... unless or until NationStates implements dynamic rate limiting?

(Do let me know if I'm reading the spec(-ish) wrong, by the way >_>)

User avatar
Roavin
Admin
 
Posts: 1778
Founded: Apr 07, 2016
Democratic Socialists

Postby Roavin » Wed Mar 01, 2023 3:39 am

Esfalsa wrote:…so to clarify, would these changes be paired with a new rate-limiting algorithm or would they keep (the general contours of) the current algorithm?


Oops. No, it remains a flush bucket. Fixed. Thanks!

Esfalsa wrote:I'm also wondering what the headers returned along with a throttled (429) response would be. Will the current X-Retry-After header be kept, or would the API return something like 'Ratelimit-Remaining: 0' with RateLimit-Reset providing the number of seconds until the API consumer can make another request?

For what it's worth, draft-ietf-httpapi-ratelimit-headers would say that RateLimit-Reset and Retry-After should reference the same point in time if both are present. Somewhat confusingly, it also provides an example where Retry-After indicates the length of time a client should pause requests for, and RateLimit-Reset would then indicate the nominal (and different) value that would apply after the rate limit reset — but it appears that example refers to dynamic rate limits, so keeping RateLimit-Reset and Retry-After the same would be more compliant... unless or until NationStates implements dynamic rate limiting?

(Do let me know if I'm reading the spec(-ish) wrong, by the way >_>)


The key is the use of SHOULD rather than MUST in the spec - Reset and Retry-After should usually be the same, but don't have to be for certain scenarios. The given example (which doesn't keep them the same) makes more sense if you think of the rate limit headers' information being lower in priority than Retry-After. So it's "please chill 20 seconds, but since we're already here, here's the data for the current time window (which expires later), so once you've chilled out a bit, you can go ham with this policy for the rest of the time window".
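
In client terms, one way to read that is sketched below (header names per the draft; the function and its logic are just an illustration, not prescribed behaviour):

import time

def wait_after_429(headers):
    """Sketch: honour Retry-After first; whatever budget the RateLimit-* headers
    still advertise only applies to what is left of the window afterwards."""
    retry_after = float(headers.get("Retry-After", 0))
    reset_after = float(headers.get("RateLimit-Reset", 0))
    time.sleep(max(retry_after, 0.0))   # "please chill" comes first
    # Any RateLimit-Remaining budget can then be spent over the remaining
    # reset_after - retry_after seconds of the current window.
    return max(reset_after - retry_after, 0.0)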

NS would initially just implement a static rate limit; it's nice to have the option to implement dynamic limits that can be presumed to "just work", though.
Helpful Resources: One Stop Rules Shop | API documentation | NS Coders Discord
About me: Longest serving Prime Minister in TSP | Former First Warden of TGW | aka Curious Observations

Feel free to TG me, but not about moderation matters.

User avatar
Esfalsa
Spokesperson
 
Posts: 132
Founded: Aug 07, 2015
Civil Rights Lovefest

Postby Esfalsa » Wed Mar 01, 2023 12:16 pm

My bad; thanks for the explanation!

Just to clarify, does this mean the current play is for the API to return a Retry-After header on 429s along with the RateLimit-* headers, or was that just an explanation in response to my confusion surrounding draft-ietf-httpapi-ratelimit-headers?

User avatar
Roavin
Admin
 
Posts: 1778
Founded: Apr 07, 2016
Democratic Socialists

Postby Roavin » Wed Mar 01, 2023 12:26 pm

Both!
Helpful Resources: One Stop Rules Shop | API documentation | NS Coders Discord
About me: Longest serving Prime Minister in TSP | Former First Warden of TGW | aka Curious Observations

Feel free to TG me, but not about moderation matters.

User avatar
Roavin
Admin
 
Posts: 1778
Founded: Apr 07, 2016
Democratic Socialists

Postby Roavin » Mon Apr 03, 2023 3:40 pm

I've now implemented this, and documented it. In the course of this, I've also changed it so that the 15 minute lockout only applies if the rate limit is exceeded excessively. Comments are appreciated!
Helpful Resources: One Stop Rules Shop | API documentation | NS Coders Discord
About me: Longest serving Prime Minister in TSP | Former First Warden of TGW | aka Curious Observations

Feel free to TG me, but not about moderation matters.

User avatar
Darcania
Envoy
 
Posts: 205
Founded: Dec 29, 2014
Civil Rights Lovefest

Postby Darcania » Tue Apr 04, 2023 7:30 pm

Roavin wrote:I've now implemented this, and documented it. In the course of this, I've also changed it so that the 15 minute lockout only applies if the rate limit is exceeded excessively. Comments are appreciated!

Thank you!

Will there be any further ratelimit info provided by the server for the Telegrams API? At the very least, knowing which type of TG (recruitment or non-recruitment) would be useful for library authors.

User avatar
Roavin
Admin
 
Posts: 1778
Founded: Apr 07, 2016
Democratic Socialists

Postby Roavin » Wed Apr 05, 2023 11:14 am

Not entirely sure I understand what you mean by type of TG - do you mean what type of TG was last sent?
Helpful Resources: One Stop Rules Shop | API documentation | NS Coders Discord
About me: Longest serving Prime Minister in TSP | Former First Warden of TGW | aka Curious Observations

Feel free to TG me, but not about moderation matters.

User avatar
Darcania
Envoy
 
Posts: 205
Founded: Dec 29, 2014
Civil Rights Lovefest

Postby Darcania » Wed Apr 05, 2023 11:54 am

Roavin wrote:Not entirely sure I understand what you mean by type of TG - do you mean what type of TG was last sent?

Unless something's changed since the last time I had a TG API key to work with, it hasn't been possible for a library to determine what type of TG it just sent via the API - only the script author could determine that, since the API response doesn't tell the library. I'm not sure what other TG it's possible to refer to here, so I don't really understand the question...

User avatar
Roavin
Admin
 
Posts: 1778
Founded: Apr 07, 2016
Democratic Socialists

Postby Roavin » Wed Apr 05, 2023 12:24 pm

Ah, gotcha. Well, error messages are currently readable HTML, rather than a programmatically parseable payload; that has advantages and disadvantages. Specifically for TGs, an XML delivery report would probably be more convenient, and it could include such a thing. I'd have to check backstage if we want to do that (as it's rather more involved), but in principle I think that'd be a good idea.

Or the quick and hacky way of adding another HTTP header. Ugly, but easy. Hm. Your thoughts welcome!
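
To illustrate the header route: the header name below is entirely hypothetical, purely to show what a library could do with such a signal.

RECRUITMENT_DELAY, NON_RECRUITMENT_DELAY = 180, 30

def delay_for(response_headers) -> int:
    # "X-Telegram-Type" is a made-up header name, purely for illustration.
    tg_type = response_headers.get("X-Telegram-Type", "").lower()
    return RECRUITMENT_DELAY if tg_type == "recruitment" else NON_RECRUITMENT_DELAY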
Helpful Resources: One Stop Rules Shop | API documentation | NS Coders Discord
About me: Longest serving Prime Minister in TSP | Former First Warden of TGW | aka Curious Observations

Feel free to TG me, but not about moderation matters.

User avatar
Darcania
Envoy
 
Posts: 205
Founded: Dec 29, 2014
Civil Rights Lovefest

Postby Darcania » Mon Apr 10, 2023 3:18 pm

Roavin wrote:Ah, gotcha. Well, error messages are currently readable HTML, rather than a programmatically parseable payload; that has advantages and disadvantages. Specifically for TGs, an XML delivery report would probably be more convenient, and it could include such a thing. I'd have to check backstage if we want to do that (as it's rather more involved), but in principle I think that'd be a good idea.

Or the quick and hacky way of adding another HTTP header. Ugly, but easy. Hm. Your thoughts welcome!

I don't use the TG API, but I think an HTTP header would be the best way to go, just to give time to focus on other areas of the API. I'm not sure of the best approach for putting it in an HTTP header since, as you said above, it's not a "ratelimit" per se, but even if it were a string appended to "RateLimit-Policy" it'd still be something library authors could work with.

Unless, of course, you plan on adding other features where a delivery report would be useful, though I haven't seen any feature requests that would make much use of that. I know there's a request for API templates that scripts could fill with URL parameters, in which case the filled text and other info could be returned in a delivery report (for debugging purposes, for example, since for a lot of regions the telegram author isn't the same person as the telegram scripter).

