Grabbing flags by HTML.

Bug reports, general help, ideas for improvements, and questions about how things are meant to work.
Valentine Z
Postmaster-General
 
Posts: 13034
Founded: Nov 08, 2015
Scandinavian Liberal Paradise

Postby Valentine Z » Sun Sep 06, 2020 7:16 pm

Hello to you!

So this is just one of the other things I have been doing as a hobby. As a shameless plug, I am the same person who uses API requests to get all the data that I need for my NS Stats Analysis Thread. One of the things that I have managed to get, and am finally using now, is the flag URLs. That is, I am using this shard (https://www.nationstates.net/cgi-bin/api.cgi?nation=valentine%20z&q=flag) on top of others to get the URL.

Here's a sample of the data that I have obtained, out of 200,000+ nations. Some of the links may be dead given the age of the data, but that's not my main concern.
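
For readers who want to see what using that shard might look like, here is a minimal sketch using only the Python standard library. It assumes the API answers with XML in which the flag URL sits in a FLAG element directly under the root; the function names and the sample XML in the usage note are hypothetical and only there for illustration.

```python
import urllib.parse
import xml.etree.ElementTree as ET

API_BASE = "https://www.nationstates.net/cgi-bin/api.cgi"

def flag_shard_url(nation):
    """Build the API URL that asks for one nation's flag shard."""
    # quote_via=quote encodes spaces as %20, matching the URL in the post.
    query = urllib.parse.urlencode(
        {"nation": nation, "q": "flag"}, quote_via=urllib.parse.quote
    )
    return API_BASE + "?" + query

def parse_flag_url(xml_text):
    """Return the text of the FLAG element, or None if it is missing.

    Assumption: the FLAG element is a direct child of the root element.
    """
    root = ET.fromstring(xml_text)
    return root.findtext("FLAG")
```

Calling `flag_shard_url("valentine z")` gives back the same URL quoted above, and `parse_flag_url` can be tried on a made-up response such as `<NATION id="example"><FLAG>https://example.invalid/flag.png</FLAG></NATION>`.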

Image


So I am only using these URLs, with no more API-side requests for flag downloading. I have a small piece of code that accepts these URLs and downloads the flags to my computer. May I ask if there is a limit to how fast I can download these flags? I was directed to OSRS regarding these special HTML scripts, but I just want to make sure and get a clarification.

In short, am I restricted to 10 requests per minute with a user agent? I did a short test run with 600 nations (going through 600 URLs) while exceeding the limit and without a user agent, and nothing seems to have stopped me. But I don't want to take risks, and would rather be on the safe side.

Thanks!
Last edited by Valentine Z on Mon Sep 07, 2020 12:24 am, edited 4 times in total.
Val's Stuff. ♡ ^_^ ♡ For You
If you are reading my sig, I want you to have the best day ever ! You are worth it, do not let anyone get you down !
Glory to De Geweldige Sierlijke Katachtige Utopia en Zijne Autonome Machten ov Valentine Z !
(✿◠‿◠) ☆ \(^_^)/ ☆

Issues Thread Photography Stuff Project: Save F7. Stats Analysis

The Sixty! Valentian Stories! Gwen's Adventures!

• Never trouble trouble until trouble troubles you.
• World Map is a cat playing with Australia.
Let Fate sort it out.

[violet]
Executive Director
 
Posts: 16206
Founded: Antiquity

Postby [violet] » Sun Sep 06, 2020 10:52 pm

In general, any bot that hits the HTML site has to have an appropriate UserAgent and stay within the harsh 10 requests per minute limit. But images are mostly served up by our CloudFlare CDN, and we don't even see the request. When this is happening, you get this among the response headers:

Code:
CF-Cache-Status: HIT


So do apply an informative UserAgent, because that's always important, but if you're getting CloudFlare cache HITs (which you should), you don't need to count those toward the rate limit.
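
A minimal sketch of how a script might look for that header, using only the Python standard library. The names `USER_AGENT`, `is_cloudflare_hit`, and `fetch_flag` are hypothetical, and the User-Agent text is a placeholder to be replaced with something that identifies you.

```python
import urllib.request

# Hypothetical placeholder; use a string that says who you are.
USER_AGENT = "Example flag downloader, in use by <your nation>"

def is_cloudflare_hit(headers):
    """Return True if a CF-Cache-Status header is present and equals HIT.

    Both the header name and its value are compared case-insensitively.
    """
    for key, value in headers.items():
        if key.lower() == "cf-cache-status":
            return value.strip().upper() == "HIT"
    return False

def fetch_flag(url):
    """Download one flag and return (image_bytes, served_from_cache)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return response.read(), is_cloudflare_hit(response.headers)
```

The header check is kept separate from the download so it can be tried on a plain dictionary such as `{"CF-Cache-Status": "HIT"}` without touching the network.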

Valentine Z wrote: I did a short test run with 600 nations (going through 600 URLs) while exceeding the limit and without a user agent, and nothing seems to have stopped me.

There isn't much regulation of the HTML site because it's meant for humans. When bad bots go there, we tend to track them down afterward, not as it's happening.

Postby Valentine Z » Mon Sep 07, 2020 6:19 am

Seems like the flag images are from Cloudflare. I did this for maybe 6 of the flags in the list using this site.

Code:
CF-Cache-Status: Hit
The resource requested is in Cloudflare's CDN cache, and was served from there.

CF-Ray: [Redacted just in case] (Toronto, ON, Canada)
Cache-Control: max-age=2592000


Reppy's flag was a miss, but that was because the URL I used was obsolete (as I said, old data! ^^). With that said, I will still add a header if that's okay, and it seems like there's no limit to the rate then? This is part of the code for reference's sake:

Code:
import requests

main_nation = "Valentine Z"
headers = {"User-Agent": "Valentine Z's Flag Retriever for NationStates, in use by " + main_nation}

def retrieve_and_save_flag(index, name, url):
    # Index and name are just for me to categorise and make sure that I don't lose track of the flags.
    # One request per flag, always sent with the User-Agent header.
    response = requests.get(url, headers=headers)
    response.raise_for_status()

    # Saves under a folder (which must already exist), named as a JPEG file.
    with open("./Flags/" + str(index) + "_" + str(name) + ".jpg", "wb") as f:
        f.write(response.content)
EDIT: Written in Python.

Please do advise if the underlined statement is good to go. Thank you once again! :D
Last edited by Valentine Z on Mon Sep 07, 2020 6:26 am, edited 4 times in total.

Trotterdam
Postmaster-General
 
Posts: 10543
Founded: Jan 12, 2012
Left-Leaning College State

Postby Trotterdam » Mon Sep 07, 2020 12:35 pm

Best would be to have the code check for "CF-Cache-Status: HIT" in the response headers. If found, go right ahead with the next flag without worrying about ratelimits. If not found, wait ten seconds, or however long [violet] thinks is appropriate.

Just because 99% of flags are cached on CloudFlare isn't a guarantee all of them are (very-newly-uploaded flags, for example), and it's possible some unusual circumstance might cause the cache to be out of commission, so it's good to be prepared just in case.
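
The approach described above could be sketched roughly like this. It is only an illustration: `download_all` is a hypothetical name, and `fetch` stands for whatever function you already have that downloads one flag and reports whether the response carried "CF-Cache-Status: HIT".

```python
import time

# Ten seconds is the wait suggested above; adjust it to whatever
# the site admins consider appropriate.
MISS_DELAY_SECONDS = 10.0

def download_all(urls, fetch, sleep=time.sleep, miss_delay=MISS_DELAY_SECONDS):
    """Fetch every URL in order, pausing after any response that missed the cache.

    `fetch` is a hypothetical callable that takes a URL and returns
    (image_bytes, served_from_cache). `sleep` can be swapped out for testing.
    """
    results = {}
    for url in urls:
        data, served_from_cache = fetch(url)
        results[url] = data
        if not served_from_cache:
            # This request reached the site itself, so slow down before the next one.
            sleep(miss_delay)
    return results
```

Cache hits go straight on to the next flag, while anything else (a miss, or a response with no cache header at all) triggers the wait, so the script stays polite even if the cache is unavailable.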

Postby Valentine Z » Mon Sep 07, 2020 6:20 pm

Trotterdam wrote: Best would be to have the code check for "CF-Cache-Status: HIT" in the response headers. If found, go right ahead with the next flag without worrying about ratelimits. If not found, wait ten seconds, or however long [violet] thinks is appropriate.

Just because 99% of flags are cached on CloudFlare isn't a guarantee all of them are (very-newly-uploaded flags, for example), and it's possible some unusual circumstance might cause the cache to be out of commission, so it's good to be prepared just in case.

Sounds like a plan! Thanks! ^^