[Q] Sending Unicode characters such as ❖ over dispatch API

Posted: Sun Apr 19, 2020 2:09 pm
by Bowzin
I'm working on an automated dispatch posting tool, but any Unicode characters I send to the API come through as boxes and other weird characters. I am not encoding them in any way; in fact, when I send them via cURL to my own web pages, the ❖ comes through fine. Is there some server-side encoding that we aren't able to get around?

Posted: Sun Apr 19, 2020 4:15 pm
by Frisbeeteria
There are a number of Unicode encodings: UTF-8 and several others. Since this site started in 2002 and has been added to on an irregular basis ever since, we do not have a single UTF standard across the entire site. It would take a massive rebuild of the older portions to make it uniformly compliant, and [violet] decided there were better uses for her time.

In any given segment or element of the site, you will simply have to see what works and what doesn't. Sorry.

Posted: Sun Apr 19, 2020 5:05 pm
by Bowzin
Frisbeeteria wrote: There are a number of Unicode encodings: UTF-8 and several others. Since this site started in 2002 and has been added to on an irregular basis ever since, we do not have a single UTF standard across the entire site. It would take a massive rebuild of the older portions to make it uniformly compliant, and [violet] decided there were better uses for her time.

In any given segment or element of the site, you will simply have to see what works and what doesn't. Sorry.

Hmm... it's just weird, considering ❖ works everywhere I've tried, including dispatches, but I can't send it over the API. Oh well, thanks for the response.

Posted: Sun Apr 19, 2020 6:17 pm
by [violet]
This is now fixed. I hadn't added UTF8 support to the API because until recently you couldn't upload any content to it. But now you can post Dispatches, so it's needed.

There still may be a few oddities because, as Fris says, our character encoding is a bit of a mess. But the API should now be consistent with the rest of the site.

With this change, I have also bumped the API version number to 11. If you require the old method (i.e. no UTF8 support), you should request version 10 or earlier via the API's "v" parameter: https://www.nationstates.net/pages/api.html#versions
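For anyone who needs to pin the old behaviour, here is a minimal Python sketch of adding the "v" parameter to a request URL. The endpoint and the `nation=rsca` value are taken from elsewhere in this thread; the helper name `api_url` is made up for illustration and is not part of any library.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the curl examples in this thread.
API_ENDPOINT = "https://www.nationstates.net/cgi-bin/api.cgi"


def api_url(params, version=None):
    """Build an API URL (hypothetical helper), optionally pinning the API version via "v"."""
    query = dict(params)
    if version is not None:
        query["v"] = str(version)
    return API_ENDPOINT + "?" + urlencode(query)


# Request version 10, i.e. the behaviour from before the UTF8 change.
legacy_url = api_url({"nation": "rsca"}, version=10)
# Leave "v" off to get whatever the current default version is.
default_url = api_url({"nation": "rsca"})

print(legacy_url)
print(default_url)
```

Leaving `version` unset simply omits "v", so the request gets the default version.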

Posted: Sun Apr 19, 2020 6:19 pm
by Bowzin
[violet] wrote: This is now fixed. I hadn't added UTF8 support to the API because until recently you couldn't upload any content to it. But now you can post Dispatches, so it's needed.

There still may be a few oddities because, as Fris says, our character encoding is a bit of a mess. But the API should now be consistent with the rest of the site.

With this change, I have also bumped the API version number to 11. If you require the old method (i.e. no UTF8 support), you should request version 10 or earlier via the API's "v" parameter: https://www.nationstates.net/pages/api.html#versions

Thanks <3

Posted: Sun Apr 19, 2020 7:35 pm
by Bowzin
So I am still having issues, except now it's just posting ?s.

There is definitely a chance this is on my end right now, but thought I'd put it out there while I troubleshoot just to see.


EDIT: Pretty sure I am sending the UTF-8 characters properly. Still trying the diamond thing and getting ?s. I guess it's progress from the boxes, but something is still up.

Posted: Sun Apr 19, 2020 10:44 pm
by [violet]
Post an example of what's not working, if you can.

Posted: Sun Apr 19, 2020 11:55 pm
by Bowzin
https://www.nationstates.net/page=dispatch/id=1349078
Each question mark is a separate one in that example.

Here's editing in one: https://www.nationstates.net/page=dispatch/id=1349080

Posted: Wed Apr 22, 2020 10:13 pm
by Bowzin
Any updates or anything else you need me to do?

Posted: Wed Apr 22, 2020 10:20 pm
by [violet]
At the moment I don't know what you're attempting to post. Can you please do this:

1. Create a dispatch with your desired text via the regular website. So this presumably looks right. (If it doesn't, it isn't an API issue.)

2. Create a duplicate dispatch with the exact same text via the API. This presumably looks wrong.

Posted: Thu Apr 23, 2020 12:36 am
by Racoda
(Not OP)

I did a few tests from the command line/curl.

Code:
curl -H "X-Pin: ####" -A "CLI test" "https://www.nationstates.net/cgi-bin/api.cgi" --data "nation=rsca&c=dispatch&dispatch=add&title=U2756%20UrlEncoded&category=1&subcategory=105&mode=execute&token=0123456abcdef" --data-urlencode "text=Test: ❖"

Publishing a dispatch with ❖ results in the character becoming a question mark: Test: ?


Code:
curl -H "X-Pin: ####" -A "CLI test" "https://www.nationstates.net/cgi-bin/api.cgi" --data "nation=rsca&c=dispatch&dispatch=add&title=U2756%20escaped%20UrlEncoded&category=1&subcategory=105&mode=execute&token=0123456abcdef" --data-urlencode "text=Test: &#10070;"

However, escaping ❖ to be &#10070; does work (bug? feature?): the result is Test: ❖
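To make the difference between the two requests concrete, here is a small Python sketch of how each "text" value looks once it is form-encoded. It uses Python's own `urlencode`, so details such as how spaces are written may differ from what curl's `--data-urlencode` actually sends; treat it as an illustration, not a capture of the real requests.

```python
from urllib.parse import urlencode

# First test: the literal character, sent as percent-encoded UTF-8 bytes.
raw_body = urlencode({"text": "Test: ❖"})

# Second test: the ASCII-only numeric character reference for U+2756 (decimal 10070).
escaped_body = urlencode({"text": "Test: &#10070;"})

print(raw_body)      # text=Test%3A+%E2%9D%96
print(escaped_body)  # text=Test%3A+%26%2310070%3B
```

The first body carries the three UTF-8 bytes E2 9D 96 and so depends on the server decoding them as UTF-8. The second is plain ASCII throughout, which would explain why it comes through unchanged.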

Posted: Thu Apr 23, 2020 1:20 am
by Bowzin
Racoda wrote: (Not OP)

I did a few tests from the command line/curl.

Code:
curl -H "X-Pin: ####" -A "CLI test" "https://www.nationstates.net/cgi-bin/api.cgi" --data "nation=rsca&c=dispatch&dispatch=add&title=U2756%20UrlEncoded&category=1&subcategory=105&mode=execute&token=0123456abcdef" --data-urlencode "text=Test: ❖"

Publishing a dispatch with ❖ results in the character becoming a question mark: Test: ?


Code:
curl -H "X-Pin: ####" -A "CLI test" "https://www.nationstates.net/cgi-bin/api.cgi" --data "nation=rsca&c=dispatch&dispatch=add&title=U2756%20escaped%20UrlEncoded&category=1&subcategory=105&mode=execute&token=0123456abcdef" --data-urlencode "text=Test: &#10070;"

However, escaping ❖ to be &#10070; does work (bug? feature?): the result is Test: ❖

Hmm... I'll give that a shot.
The original dispatch that triggered this:
https://www.nationstates.net/page=dispatch/id=1345798
Here's the API posted version:
https://www.nationstates.net/page=dispatch/id=1347428

Posted: Sun Apr 26, 2020 11:56 pm
by Bowzin
Any updates on this? Let me know if you need anything else. Escaping the characters to their numeric value isn't easy to do when they're in a huge block of text with PHP.
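As a stopgap, escaping a whole block of text can be done in one pass rather than character by character. Below is a minimal sketch in Python (not PHP) that replaces every non-ASCII character with its decimal numeric reference, the same form as the &#10070; workaround above. The function name `to_numeric_refs` is made up for illustration. PHP's mbstring extension has `mb_encode_numericentity`, which looks like the closest equivalent, but that is not tested here. Whether the site renders every such reference the way it renders &#10070; is also an assumption.

```python
def to_numeric_refs(text):
    """Replace each non-ASCII character with a decimal reference such as &#10070;."""
    # The "xmlcharrefreplace" error handler is built into Python's codecs:
    # ASCII characters pass through, everything else becomes &#NNNN;.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


sample = "Test: ❖"
escaped = to_numeric_refs(sample)
print(escaped)  # Test: &#10070;
```

ASCII text, including any BBCode tags, passes through untouched, so this can be applied to the full dispatch body just before it is sent.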