PostPosted: Thu Dec 03, 2020 3:52 pm
by SherpDaWerp
Yet another bump on the request to have "amount unbid" available on the deck info query.
i.e. this query would also have an additional field: <UNBID> ... </UNBID> which would have the information available here, at the bottom of the page.

If this is an unfeasible addition for whatever reason, just tell me so I stop bumping the request.

Encoding

PostPosted: Thu Dec 10, 2020 7:39 pm
by Imperium Anglorum
In this API call: https://www.nationstates.net/cgi-bin/ap ... resolution. Can the escapes (eg &#147;) be put into Unicode instead of whatever encoding is currently used?

PostPosted: Thu Dec 10, 2020 11:29 pm
by Merni
Imperium Anglorum wrote:In this API call: https://www.nationstates.net/cgi-bin/ap ... resolution. Can the escapes (eg &#147;) be put into Unicode instead of whatever encoding is currently used?

If it helps, at least the decimal 133 (ellipsis … ) and 145-148 (‘ ’ “ ” smart quotes) appear to be from Windows-1252. Weirdly, there are also 39 (' straight apostrophe) and &quot; (" straight quotes).

PostPosted: Fri Dec 11, 2020 6:25 am
by Imperium Anglorum
Merni wrote:
Imperium Anglorum wrote:In this API call: https://www.nationstates.net/cgi-bin/ap ... resolution. Can the escapes (eg &#147;) be put into Unicode instead of whatever encoding is currently used?

If it helps, at least the decimal 133 (ellipsis … ) and 145-148 (‘ ’ “ ” smart quotes) appear to be from Windows-1252. Weirdly, there are also 39 (' straight apostrophe) and &quot; (" straight quotes).

Sadly, I can't find anything in Java which will decode HTML entities to Windows-1252 instead of Unicode.

Both

Code:
org.apache.commons.text.StringEscapeUtils
org.jsoup.parser.Parser

will only decode to Unicode, and after looking through how they do this, it seems to be written into their numeric handling.



Looking into it further: a number of code points should be encoded using named HTML entities rather than Windows-1252 numeric values. E.g.:

Code:
            "&#145;", "&apos;",
            "&#146;", "&apos;",
            "&#147;", "&ldquo;",
            "&#148;", "&rdquo;",
            "&#133;", "&hellip;",
            "&#150;", "&ndash;",
            "&#151;", "&mdash;"

PostPosted: Fri Dec 11, 2020 9:36 am
by Trotterdam
Imperium Anglorum wrote:In this API call: https://www.nationstates.net/cgi-bin/ap ... resolution. Can the escapes (eg &#147;) be put into Unicode instead of whatever encoding is currently used?
Merni wrote:If it helps, at least the decimal 133 (ellipsis … ) and 145-148 (‘ ’ “ ” smart quotes) appear to be from Windows-1252. Weirdly, there are also 39 (' straight apostrophe) and &quot; (" straight quotes).
This looks to be an issue with the contents of that particular resolution as submitted, not with the API in general. The API is correctly reporting the data on the server; that data just happens to be wrong, which is nominally the fault of the player who submitted the resolution rather than of NationStates (although really, it's more that neither of them properly sanitized text made by shoddy Microsoft products).

Imperium Anglorum wrote:Sadly, I can't find anything in Java which will decode HTML entities to Windows-1252 instead of Unicode.
HTML entities are supposed to be in Unicode, and usually are (even on NationStates). However, because using Windows-1252 characters as though they were Unicode characters is a common mistake, and because the actual Unicode codepoints that they would replace are used by basically no-one ever, most modern browsers recognize them anyway.

There's really only a few Windows-1252 characters in common use you need to worry about:
&#133; / &#x85; -> &#x2026; (...)
&#145; / &#x91; -> &#x2018; (')
&#146; / &#x92; -> &#x2019; (')
&#147; / &#x93; -> &#x201C; (")
&#148; / &#x94; -> &#x201D; (")
&#150; / &#x96; -> &#x2013; (-)
&#151; / &#x97; -> &#x2014; (-)
So you can just implement a search-and-replace for these codepoints (it's best to perform this AFTER the parsing of XML/HTML entities into raw Unicode data). There are more than these, but they aren't seen as often and you can find them on Wikipedia if you need to.

The basic reason behind this confusion is that the Windows-1252 encoding is a Microsoft-made extension of the ISO-8859-1 (AKA Latin-1) official international standard, which has the exact same meanings for codepoints 0xA0-0xFF, while only differing in the meanings of codepoints 0x80-0x9F, the ISO-8859-1 readings of which are obsolete junk which nobody ever used anyway. This means a lot of people mistakenly treat Windows-1252 basically as if it was actually the official ISO-8859-1 (blame Microsoft). The thing is, the Unicode standard is defined as sharing its codepoints up to 0xFF with ISO-8859-1, meaning that ISO-8859-1 data can be trivially converted to proper Unicode. But that's ISO-8859-1, not Windows-1252, and Unicode has its own different codepoints for those characters that Windows-1252 assigns to the 0x80-0x9F range.

I understand if that previous paragraph goes over your head. Most people don't know or care about such details, which is why browser makers eventually threw up their hands and just started recognizing the "wrong" formatting.

PostPosted: Sat Dec 12, 2020 8:29 am
by Trotterdam
If you want, here's some code. (Motivated by a post from Imperium Anglorum that got deleted, but it wasn't really the point.)

If you're storing strings in UTF-32 or UTF-16 format, fixing Windows-1252 litter is very easy:
Code:
#include <wchar.h>

/* wchar_t can be replaced with uint32_t or whatever else you're using for UTF-32 or UTF-16 data. */
void fix_windows_1252_in_unicode(wchar_t *str)
{
   /* Unicode codepoints for Windows-1252 bytes 0x80-0x9F, indexed by the low five bits. */
   static const wchar_t remap[0x20] =
   {0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};
   /* Note: 0x81, 0x8D, 0x8F, 0x90, 0x9D are undefined in the Windows-1252 standard.  They can be left unmodified (as above), or replaced with 0xFFFD. */
   size_t i;

   /* (c | 0x1F) == 0x9F is true exactly when c is in the range 0x80-0x9F. */
   for(i = 0; str[i]; i++)
      if((str[i] | 0x1F) == 0x9F)
         str[i] = remap[str[i] & 0x1F];
}
Again, you apply this AFTER decoding HTML character entities. Note that some websites will also feed you broken Unicode in raw binary rather than character entities.

This is short enough that it should be trivial to port into any other programming language you're using.
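
As an illustration of such a port, here is a rough Java sketch of the same remapping. Java strings are UTF-16, so the table approach carries over directly. The class and method names (Cp1252Cleanup, fix) are made up for this example and are not from any library; like the C version, it leaves the five undefined codepoints unmodified.

```java
final class Cp1252Cleanup {
    // Unicode code points for Windows-1252 bytes 0x80-0x9F (same table as the C version).
    private static final char[] REMAP = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
    };

    /** Call this AFTER HTML/XML entities have been decoded into a plain string. */
    static String fix(String decoded) {
        StringBuilder sb = new StringBuilder(decoded.length());
        for (int i = 0; i < decoded.length(); i++) {
            char c = decoded.charAt(i);
            // Only the C1 range 0x80-0x9F is remapped; everything else passes through.
            sb.append(c >= 0x80 && c <= 0x9F ? REMAP[c - 0x80] : c);
        }
        return sb.toString();
    }
}
```

Usage would be something like Cp1252Cleanup.fix(StringEscapeUtils.unescapeHtml4(rawText)), assuming the decoder hands back the numeric values unchanged.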

If you're storing strings in UTF-8, the conversion becomes more complicated because you might need to increment the number of bytes in a character, memmove()ing the rest of the string accordingly. Still, the theory is simple even if the implementation is annoying. Here's a version implemented as a sed script, which is not ideal for most applications but at least shows the basic structure that can be used with any other search-and-replace functionality:
Code:
s/\xC2\x80/\xE2\x82\xAC/g
# \xC2\x81 undefined
s/\xC2\x82/\xE2\x80\x9A/g
s/\xC2\x83/\xC6\x92/g
s/\xC2\x84/\xE2\x80\x9E/g
s/\xC2\x85/\xE2\x80\xA6/g
s/\xC2\x86/\xE2\x80\xA0/g
s/\xC2\x87/\xE2\x80\xA1/g
s/\xC2\x88/\xCB\x86/g
s/\xC2\x89/\xE2\x80\xB0/g
s/\xC2\x8A/\xC5\xA0/g
s/\xC2\x8B/\xE2\x80\xB9/g
s/\xC2\x8C/\xC5\x92/g
# \xC2\x8D undefined
s/\xC2\x8E/\xC5\xBD/g
# \xC2\x8F undefined
# \xC2\x90 undefined
s/\xC2\x91/\xE2\x80\x98/g
s/\xC2\x92/\xE2\x80\x99/g
s/\xC2\x93/\xE2\x80\x9C/g
s/\xC2\x94/\xE2\x80\x9D/g
s/\xC2\x95/\xE2\x80\xA2/g
s/\xC2\x96/\xE2\x80\x93/g
s/\xC2\x97/\xE2\x80\x94/g
s/\xC2\x98/\xCB\x9C/g
s/\xC2\x99/\xE2\x84\xA2/g
s/\xC2\x9A/\xC5\xA1/g
s/\xC2\x9B/\xE2\x80\xBA/g
s/\xC2\x9C/\xC5\x93/g
# \xC2\x9D undefined
s/\xC2\x9E/\xC5\xBE/g
s/\xC2\x9F/\xC5\xB8/g
Note that this isn't parsing Windows-1252, it's parsing broken Unicode with Windows-1252 contamination.

PostPosted: Sat Dec 12, 2020 9:01 am
by Imperium Anglorum
I'm writing something to add the functionality into Apache Commons Text; there weren't proper escapes for Windows-1252. Once they are added, it's a non-problem. While the following snippet currently produces erroneous output, post-patch it shouldn't.

Code:
import java.util.stream.Collectors;
import org.apache.commons.text.StringEscapeUtils;
...

String w1252 = "&#126;&#151;&#161;";
String output = StringEscapeUtils.unescapeHtml4(w1252);
System.out.println(output);
// Print the UTF-16 code units as numbers to inspect what each escape decoded to.
System.out.println(output.chars().boxed().collect(Collectors.toList()));




In the interim, for the application I was writing earlier, I just rewrote all the numeric escapes into normal quotes and apostrophes, which side-stepped the problem entirely. Re-encoding then just isn't necessary. A later version of the same code just translates it into HTML entities, which also side-steps the encoding problem.
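
A minimal sketch of that kind of escape rewriting, for anyone who wants to try it: it rewrites decimal numeric references in the 128-159 range into the Unicode codepoint that Windows-1252 assigns to that byte, so that any standard decoder can handle the result. This is not the code linked in this post; the class and method names are hypothetical, it only handles decimal references (not &#x..; hex ones), and it assumes the JVM ships the windows-1252 charset, which mainstream JDKs do.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Cp1252EntityRewriter {
    private static final Charset CP1252 = Charset.forName("windows-1252");
    // Decimal numeric character references only, e.g. &#147;
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    /** Rewrites references such as &#147; to their Unicode equivalent (&#8220;). */
    static String rewrite(String text) {
        Matcher m = DECIMAL_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int value = Integer.parseInt(m.group(1));
            String replacement = m.group(); // default: leave the reference untouched
            if (value >= 0x80 && value <= 0x9F) {
                // Let the JDK's own Windows-1252 table do the lookup.
                int cp = new String(new byte[] {(byte) value}, CP1252).codePointAt(0);
                // Bytes that Windows-1252 leaves undefined are kept as they were.
                if (cp != 0xFFFD) {
                    replacement = "&#" + cp + ";";
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The output of rewrite() can then be passed to StringEscapeUtils.unescapeHtml4 or any other decoder that treats numeric references as Unicode.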

EDIT. See https://github.com/ifly6/RexisQuexis/bl ... P1252.java.

PostPosted: Mon Dec 14, 2020 10:56 am
by Imperium Anglorum
There is a data inconsistency in the NS API.

This resolution declares its name to be <Repeal "Anti-Cyberterrorism Act "> (with space before terminating quote). The title of the repeal target is given as <Anti-Cyberterrorism Act> without ending space.

PostPosted: Mon Dec 14, 2020 3:46 pm
by Trotterdam
Imperium Anglorum wrote:There is a data inconsistency in the NS API.

This resolution declares its name to be <Repeal "Anti-Cyberterrorism Act "> (with space before terminating quote). The title of the repeal target is given as <Anti-Cyberterrorism Act> without ending space.
I'm seeing the space on both. The original issue has "<NAME>Anti-Cyberterrorism Act </NAME>", while the repeal has "<NAME>Repeal &quot;Anti-Cyberterrorism Act &quot;</NAME>". Or with the spaces marked as ␣: "<NAME>Anti-Cyberterrorism Act␣</NAME>" and "<NAME>Repeal &quot;Anti-Cyberterrorism Act␣&quot;</NAME>".

However, note that this is the raw XML. Some XML parsers might consider spaces at the beginning or end of a field to not count, so your program retrieves the field with the spaces trimmed out.
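
To check what your own parser does, here is a small Java sketch using the JDK's built-in DOM parser, which keeps leading and trailing whitespace in text content. The class and method names are invented for this example, and the one-element documents in the usage note stand in for a real API response. Trimming both titles before comparing sidesteps the problem whichever way the parser behaves.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

final class NameFieldCheck {
    /** Returns the text of the first NAME element exactly as the DOM parser reports it. */
    static String rawName(String xml) {
        try {
            Node name = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("NAME").item(0);
            if (name == null) {
                throw new IllegalArgumentException("no NAME element");
            }
            return name.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read NAME", e);
        }
    }

    /** Extracts the quoted target title from a name like: Repeal "Some Title". */
    static String repealTarget(String repealName) {
        int open = repealName.indexOf('"');
        int close = repealName.lastIndexOf('"');
        if (open < 0 || close <= open) {
            throw new IllegalArgumentException("no quoted title in: " + repealName);
        }
        return repealName.substring(open + 1, close);
    }

    /** Compares two titles while ignoring leading and trailing whitespace. */
    static boolean sameTitle(String a, String b) {
        return a.trim().equals(b.trim());
    }
}
```

For example, rawName("<R><NAME>Anti-Cyberterrorism Act </NAME></R>") keeps the trailing space, and sameTitle(repealTarget(repealName), targetName) matches whether or not either side was trimmed along the way.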

PostPosted: Wed Dec 30, 2020 12:45 pm
by Elest Adra
Is it possible to vote on World Assembly resolutions through the API?

PostPosted: Wed Dec 30, 2020 6:27 pm
by SherpDaWerp
Elest Adra wrote:Is it possible to vote on World Assembly resolutions through the API?

Currently, no. There was a short "discussion" of that a while back (from this post on) but nothing eventuated.

PostPosted: Wed Dec 30, 2020 6:39 pm
by That Crazy Casbah Sound
SherpDaWerp wrote:
Elest Adra wrote:Is it possible to vote on World Assembly resolutions through the API?

Currently, no. There was a short "discussion" of that a while back (from this post on) but nothing eventuated.

How does Stately do it then?

PostPosted: Thu Dec 31, 2020 4:55 am
by Trotterdam
That Crazy Casbah Sound wrote:How does Stately do it then?
Presumably by using the non-API site, which is perfectly legal so long as you stick to the "one server request per user action" rule, which you really shouldn't have any reason for wanting to exceed when it comes to World Assembly votes.

PostPosted: Thu Dec 31, 2020 5:12 am
by Elest Adra
That Crazy Casbah Sound wrote:
SherpDaWerp wrote:Currently, no. There was a short "discussion" of that a while back (from this post on) but nothing eventuated.

How does Stately do it then?


Okay, after some time looking at Stately's source code, it seems like Stately:
1) Fetches the general assembly page with pin/autologin cookies, and scrapes the 'localid' from there.
2) Uses said localid to do a POST on the page to submit the form

Pretty clever, but I am worried that this could easily break in the future.
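
For illustration only, step 1 could look roughly like the Java sketch below. This is not Stately's code. The markup it looks for (a hidden input with name="localid" followed by value="...") is an assumption about how the page is written, which is exactly the kind of detail that could change and break the approach; the class name is made up.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LocalIdScraper {
    // ASSUMPTION: the page contains something like
    //   <input type="hidden" name="localid" value="...">
    // with the name attribute directly before the value attribute.
    private static final Pattern LOCALID =
            Pattern.compile("name=\"localid\"\\s+value=\"([^\"]*)\"");

    /** Returns the localid found in the page HTML, or empty if the markup differs. */
    static Optional<String> find(String html) {
        Matcher m = LOCALID.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Returning an Optional means a markup change shows up as an empty result that the caller has to handle, rather than as a bogus POST.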

PostPosted: Thu Apr 01, 2021 7:41 am
by Imperium Anglorum
How do I tell if a nation has a WA authorship badge using the API? The wabadges shard (eg https://www.nationstates.net/cgi-bin/ap ... q=wabadges) deals only with commends etc.

PostPosted: Tue Apr 13, 2021 11:30 am
by Valentine Z
Hi there! So I have been looking through the API list a little, and unless I missed something, it would seem like currently there is no way to get the Government Expenditure Budget from the API?

For example, you can see the amount alongside % of GDP. Is it possible to get these two values other than just the "govt" call?

Thanks in advance! ♥

PostPosted: Tue Apr 27, 2021 2:05 pm
by Libertarian Technocrats
Valentine Z wrote:Hi there! So I have been looking through the API list a little, and unless I missed something, it would seem like currently there is no way to get the Government Expenditure Budget from the API?

For example, you can see the amount alongside % of GDP. Is it possible to get these two values other than just the "govt" call?

Thanks in advance! ♥

You can get the % from the publicsector call https://www.nationstates.net/cgi-bin/ap ... blicsector

Just call the gdp and multiply by the percentage.
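
As a quick sketch of that arithmetic in Java, assuming the publicsector value comes back as a percentage (e.g. 25 for 25%) and that both numbers have already been parsed out of the API response. The class name, method name and figures are hypothetical.

```java
final class GovernmentBudget {
    /** Government expenditure = GDP multiplied by the public sector share of GDP. */
    static double budget(double gdp, double publicSectorPercent) {
        // The share is a percentage, so divide by 100 to turn it into a fraction.
        return gdp * publicSectorPercent / 100.0;
    }
}
```

For instance, GovernmentBudget.budget(1000.0, 25.0) gives 250.0, in whatever currency unit the GDP figure uses.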

PostPosted: Sat May 01, 2021 9:33 pm
by Valentine Z
Libertarian Technocrats wrote:
Valentine Z wrote:Hi there! So I have been looking through the API list a little, and unless I missed something, it would seem like currently there is no way to get the Government Expenditure Budget from the API?

For example, you can see the amount alongside % of GDP. Is it possible to get these two values other than just the "govt" call?

Thanks in advance! ♥

You can get the % from the publicsector call https://www.nationstates.net/cgi-bin/ap ... blicsector

Just call the gdp and multiply by the percentage.

Ohhh, I never thought of that. Then this should be closed and simple for me, since I have those stats!

Thanks a bunch! :D

PostPosted: Mon May 03, 2021 11:34 pm
by [violet]
SherpDaWerp wrote:Yet another bump on the request to have "amount unbid" available on the deck info query.
i.e. this query would also have an additional field: <UNBID> ... </UNBID> which would have the information available here, at the bottom of the page.

If this is an unfeasible addition for whatever reason, just tell me so I stop bumping the request.

Is this not already available via the "asksbids" shard? Admittedly you do need to total up the individual bids.

I'd prefer not to stick that number into the "info" shard as well, because looking up trades is a relatively heavy query, which we want to perform only when people are actually seeking that data.

PostPosted: Tue May 04, 2021 1:00 am
by SherpDaWerp
[violet] wrote:
SherpDaWerp wrote:Yet another bump on the request to have "amount unbid" available on the deck info query.
i.e. this query would also have an additional field: <UNBID> ... </UNBID> which would have the information available here, at the bottom of the page.

If this is an unfeasible addition for whatever reason, just tell me so I stop bumping the request.

Is this not already available via the "asksbids" shard? Admittedly you do need to total up the individual bids.

I'd prefer not to stick that number into the "info" shard as well, because looking up trades is a relatively heavy query, which we want to perform only when people are actually seeking that data.

Requesting asksbids definitely works for my usage, but I figured "amount available to spend" was "deck info" and better to be on the "info" query - less API calls for me to make, less time spent waiting for the ratelimit. That said, my reasoning there was "makes sense to a user" - I didn't realise the only way the server knows a player's available bank was to check all their existing bids.

(So every time you make a bid, the server has to look up all your existing bids to check if you've got the money? oof)

PostPosted: Thu May 13, 2021 7:39 am
by Eluvatar
Due to recent spammers we've had to add ratelimiting to the posting of dispatches. This applies equally to the dispatches API.

Script authors may need to add delays that were previously unnecessary to their scripts.

As failed attempts count toward the rate limit, I would also advise script authors to update their scripts to back off and wait if they get the error.
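
One possible shape for that back-off logic, sketched in Java. The delays and attempt counts are placeholders, since the actual limit is not stated here; the class name is made up, and the attempt callback stands for whatever code posts the dispatch and reports whether it succeeded. The waiting function is passed in so the logic can be tested without actually sleeping.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

final class BackoffRetry {
    /**
     * Runs attempt until it returns true or maxAttempts is reached,
     * waiting between tries and doubling the wait each time.
     */
    static boolean retry(BooleanSupplier attempt, int maxAttempts,
                         long initialDelayMs, LongConsumer sleeper) {
        long delay = initialDelayMs;
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            // No point waiting after the final failed attempt.
            if (i < maxAttempts - 1) {
                sleeper.accept(delay);
                delay *= 2;
            }
        }
        return false;
    }

    /** Real sleeper for production use; restores the interrupt flag if interrupted. */
    static void sleepMillis(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A script might call BackoffRetry.retry(() -> postDispatch(), 5, 60_000, BackoffRetry::sleepMillis), where postDispatch() is your own (hypothetical) method that returns false when the response contains the rate limit error. Because failed attempts count toward the limit, erring on the side of longer delays seems safer.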

PostPosted: Thu May 13, 2021 7:45 am
by Valentine Z
Eluvatar wrote:Due to recent spammers we've had to add ratelimiting to the posting of dispatches. This applies equally to the dispatches API.

Script authors may need to add delays that were previously unnecessary to their scripts.

As failed attempts count toward the rate limit, I would also advise script authors to update their scripts to back off and wait if they get the error.

That is a welcome change, thank you very much!

PostPosted: Thu May 13, 2021 7:52 am
by The Northern Light
Eluvatar wrote:Due to recent spammers we've had to add ratelimiting to the posting of dispatches. This applies equally to the dispatches API.

Script authors may need to add delays that were previously unnecessary to their scripts.

As failed attempts count toward the rate limit, I would also advise script authors to update their scripts to back off and wait if they get the error.

Could you provide some more details on how exactly the rate limit works? In particular:

1. What is the exact limit?
2. Does it apply to only posting, or also editing dispatches?

It looks like this nation has not been able to post any dispatches through the API for the past day---there have been a few attempts spread out over the past 24 hours---with the following error popping up (presumably due to the new ratelimit):
Code:
<NATION id="the_northern_light">
<ERROR>Your nation is attempting to issue many announcements in a short period of time. Please wait for the international press to catch their breath, then try again.</ERROR>
</NATION>

The strange thing is that the nation can post dispatches through the standard gameside interface.

PostPosted: Thu May 13, 2021 7:58 am
by Eluvatar
The exact limit will vary based on a nation's population.

I'm not looking at the code right this second but as best as I understand the limit is shared between posting dispatches, telegrams, and RMB messages. If your nation is sending telegrams concurrently with creating dispatches, that could be an issue.

PostPosted: Fri May 14, 2021 5:58 am
by Porde
Will a nation population ever hit 100 billion?