Access Denied

Bug reports, general help, ideas for improvements, and questions about how things are meant to work.
NewTexas
Spokesperson
 
Posts: 174
Founded: Antiquity
Civil Rights Lovefest

Access Denied

Postby NewTexas » Sat Jun 13, 2009 5:43 pm

We just got this:

Access Denied
Your IP address has been banned from NationStates.

OK
You don't have permission to access /cgi-bin/index.cgi/region=texas on this server.


:?: :?: :?:
Big Tex
Governor of Texas

Author: NSDossier

[violet]
Site Admin
 
Posts: 15581
Founded: Antiquity

Re: Access Denied

Postby [violet] » Sat Jun 13, 2009 5:54 pm

I just banned an IP address which a script is using to hit the server particularly hard (hundreds of requests per minute). It's coming from Austin, I believe. Is that you?

NewTexas
Spokesperson
 
Posts: 174
Founded: Antiquity
Civil Rights Lovefest

Re: Access Denied

Postby NewTexas » Sat Jun 13, 2009 6:28 pm

Sorry, yes that was us. Was checking out a problem in the new feature - The North Pacific, amongst others, do not show up on the list of World's Largest Woodchipping regions.

If you don't want us doing that, we will stop. It has never come up in the last 6 years.

:bow:
Big Tex
Governor of Texas

Author: NSDossier

[violet]
Site Admin
 
Posts: 15581
Founded: Antiquity

Re: Access Denied

Postby [violet] » Sat Jun 13, 2009 7:25 pm

I'm glad to support external sites that do neat things with NS data, but it needs to happen in a server-friendly way. The page your script was loading 50-100 times per minute (the alphabetical list of nations) is one of the most disk-intensive in the game. Or was, until five minutes ago, when I finally fixed the code. But still, your script needs to slow down.

If there is a particular data set you'd like to grab on a regular basis, and if you're using that data to compile some kind of public offering that NS players can use, let me know and I might be able to fix you an API, for a more efficient transfer.

I'll unblock your IP addy on the proviso that your script doesn't hit us so often.
Last edited by [violet] on Sat Jun 13, 2009 7:26 pm, edited 1 time in total.

[violet]
Site Admin
 
Posts: 15581
Founded: Antiquity

Re: Access Denied

Postby [violet] » Sat Jun 13, 2009 7:46 pm

Actually, I have a bugfix for you, too. Your script makes requests like this:
Code: Select all
/page=list_nations/nation=/start=17400

The "nation=" part makes our server not just compile a list of nations but also sort through it to exclude nations whose names don't match, in this case, nothing. If you don't actually want to match on names, don't include the "nation=" parameter.
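
A minimal sketch of that fix on the script side, assuming the script assembles its request path from a set of key/value pairs. The helper name `build_path` is hypothetical and not part of anything NationStates provides; it simply leaves out any parameter whose value is empty, so "nation=" is never sent unless there is a name to match.

```python
def build_path(params):
    """Join key=value pairs into a request path, skipping empty values.

    `params` is a plain dict; the function name and shape are illustrative only.
    """
    parts = [
        f"{key}={value}"
        for key, value in params.items()
        if value not in ("", None)  # drop parameters with nothing to match on
    ]
    return "/" + "/".join(parts)


# The empty "nation" entry is omitted from the resulting path.
print(build_path({"page": "list_nations", "nation": "", "start": 17400}))
```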

NewTexas
Spokesperson
 
Posts: 174
Founded: Antiquity
Civil Rights Lovefest

Re: Access Denied

Postby NewTexas » Sat Jun 13, 2009 8:33 pm

Thank you [violet]. :bow:

We will be good. We usually only do all the regions of the world and nations of the world once a week. Thank you for the tip on the "nation=" option.

All we are doing is scraping The World pages for the nation and regions data on there. Order is irrelevant.

We make great use of the xml feeds with our NSDossier Tool.

If you are offering perhaps another option to screen-scraping, then all we are looking for is the data that is on The World pages.

Nations could look like:
Code: Select all
<nations>
  <nation>
    <name>newtexas</name>
    <motto>Home of Big Tex!</motto>
    <wa>WA</wa>
  </nation>
  <nation>
    <name>frisbeeteria</name>
    <motto>Death to all fanatics!</motto>
    <wa></wa>
  </nation>
</nations>
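
For anyone consuming a feed shaped like the proposal above, here is a rough sketch of how it could be read with Python's standard library. The format is only the suggestion from this post, not an existing NationStates feed, and the function name `parse_nations` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the proposed layout above (not a real feed).
SAMPLE = """<nations>
  <nation>
    <name>newtexas</name>
    <motto>Home of Big Tex!</motto>
    <wa>WA</wa>
  </nation>
  <nation>
    <name>frisbeeteria</name>
    <motto>Death to all fanatics!</motto>
    <wa></wa>
  </nation>
</nations>"""


def parse_nations(xml_text):
    """Return one dict per <nation> with name, motto and a WA membership flag."""
    root = ET.fromstring(xml_text)
    return [
        {
            "name": nation.findtext("name", default=""),
            "motto": nation.findtext("motto", default=""),
            # An empty <wa></wa> element is treated as "not a WA member".
            "wa": (nation.findtext("wa") or "") == "WA",
        }
        for nation in root.findall("nation")
    ]


for record in parse_nations(SAMPLE):
    print(record)
```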


Regions could look like:
Code: Select all
<regions>
  <region>
    <name>texas</name>
    <count>219</count>
    <delegate>newtexas</delegate>
  </region>
  <region>
    <name>wysteria</name>
    <count>229</count>
    <delegate>quintessence_of_dust</delegate>
  </region>
</regions>


Just two files available like the other xml feeds.

We are working on a new program titled NSHistory that will be a search engine across that very data we have been scraping for 6 years. It will tell you things like every Delegate for any given region and what dates, when nations started, what their mottos have been and what variations exist on a name. Of course, it will be limited to the 3 fields for each entity described above plus a date since that is all we scraped off the site.

Anything you can do would be awesome.

And, thank you for unbanning us. We would hate to lose Texas and our elected 1719 days ago status (remember the big hard drive crash 1719 days ago?).

Big Tex
President of Texas
Big Tex
Governor of Texas

Author: NSDossier

Eluvatar
Site Admin
 
Posts: 2518
Founded: Mar 31, 2006
New York Times Democracy

Re: Access Denied

Postby Eluvatar » Sun Jun 14, 2009 9:21 am

While we're discussing the scraping of data from nationstates.net, I've often run into confusion as to how I should organize my own script's data-gathering.

I have been given directives not to query more often than once every 3 seconds or once every 5, but I'm also now curious whether I might be able to minimize load another way:

If my script were to group requests into batches, where it would make several requests sequentially (recycling the TCP connection) and make a greater pause between such batches, might that not be better?

I'm pondering setting it up so that it batches into up to 10 requests, separated by 5-10 seconds. Would that be reasonable? Or should I adjust that? :bow:
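
To make the idea concrete, here is a minimal sketch of what I mean, assuming batches of up to 10 requests with a pause between batches. The names `chunked` and `run_batches` are hypothetical, and `fetch` stands in for whatever function actually performs the request (for example, one that reuses a single open connection). The sleep function is passed in so the pacing can be checked without really waiting.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def run_batches(urls, fetch, batch_size=10, pause=5.0, sleep=time.sleep):
    """Fetch URLs in batches, pausing `pause` seconds between batches.

    `fetch` is any callable taking a URL; it is an assumption of this sketch.
    """
    results = []
    for index, batch in enumerate(chunked(list(urls), batch_size)):
        if index > 0:
            sleep(pause)  # wait before every batch except the first
        for url in batch:
            results.append(fetch(url))
    return results


# Dry run with stand-ins: no network traffic and no real sleeping.
pauses = []
pages = run_batches([f"/page{n}" for n in range(25)], fetch=str.upper, sleep=pauses.append)
print(len(pages), pauses)
```

With 25 URLs and a batch size of 10, the dry run makes three batches and records two pauses.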



In terms of useful interfaces, is there any chance of information like National Events, Endorsements, Regional Events, WA events, etc. making their way into an API? That could, if done right, help to reduce the impact of these things.
Last edited by Eluvatar on Sun Jun 14, 2009 9:26 am, edited 1 time in total.
To Serve and Protect: UDL

Eluvatar - Taijitu member

New South Hell
Spokesperson
 
Posts: 161
Founded: Feb 15, 2008
Ex-Nation

Re: Access Denied

Postby New South Hell » Sun Jun 14, 2009 1:44 pm

Speaking as someone who is pretty much a beginner at scripting NS, I'd like to request that any existing APIs which cannot be discovered by monitoring the input and output of the game's player interfaces be publicly documented. Or, if such documentation already exists, that its location be placed in FAQs, Forum links, or just about any place that search engines can find it, for reference by those like me who are just getting started. And that any new APIs be treated similarly.

Many thanks for considering it.

Eluvatar
Site Admin
 
Posts: 2518
Founded: Mar 31, 2006
New York Times Democracy

Re: Access Denied

Postby Eluvatar » Mon Jun 15, 2009 10:58 pm

I'm aware of National and Regional XML feeds and 3 kinds of National RSS feeds. I don't believe there are any APIs beyond that, and I think those are fairly self-explanatory.

Are there any APIs I missed?
To Serve and Protect: UDL

Eluvatar - Taijitu member

Tsrill
Bureaucrat
 
Posts: 59
Founded: Antiquity
Ex-Nation

Re: Access Denied

Postby Tsrill » Thu Jun 18, 2009 3:39 am

I have a script making a request about once every 5 seconds scanning TSP, which I run at irregular intervals (generally about once a week), I guess I should be fine? Also, I use a custom user-agent with a link to an information page so that I could be found and contacted in case of issues, I hope that is useful as well...

EDIT:

[violet] wrote:Actually, I have a bugfix for you, too. Your script makes requests like this:
Code: Select all
/page=list_nations/nation=/start=17400

The "nation=" part makes our server not just compile a list of nations but also sort through it to exclude nations whose names don't match, in this case, nothing. If you don't actually want to match on names, don't include the "nation=" parameter.

I think I have the same issue...will fix that before my next scan, thanks for the info.
Last edited by Tsrill on Thu Jun 18, 2009 3:45 am, edited 1 time in total.

[violet]
Site Admin
 
Posts: 15581
Founded: Antiquity

Re: Access Denied

Postby [violet] » Thu Jun 18, 2009 5:08 pm

The "nationdata.cgi" and "regiondata.cgi" XML feeds mentioned above are the only ones we have at the moment (not counting RSS feeds).

New Texas: If you want those three fields in particular (nation name, type, motto), then continuing to scrape the list as you've been doing (only slower) is probably best, since I've reworked that page to be much more efficient. Your request should be formatted like this:
Code: Select all
http://www.nationstates.net/page=list_nations/sort=alpha/start=X

... where X is a number.
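
As a rough illustration of that request format, the sketch below generates the sequence of URLs a script might walk through. The URL pattern is the one given above; the function name and the `step` argument are assumptions, since the number of nations shown per page is not stated here and would need to be checked against the actual page.

```python
BASE = "http://www.nationstates.net/page=list_nations/sort=alpha/start={start}"


def list_nations_urls(first, last, step):
    """Build list URLs for start offsets first, first+step, ... up to last.

    `step` (nations per page) is a placeholder value the caller must supply.
    """
    return [BASE.format(start=offset) for offset in range(first, last + 1, step)]


# Example with a made-up step of 15; replace it with the real page size.
for url in list_nations_urls(0, 30, 15):
    print(url)
```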

If you don't require all that info, though, we could do things a better way. For example, if all you need is a list of every region and nation in the game, we can generate that without having to actually load all the nations and regions in question. This strikes me as complementary to the nation/regiondata APIs, in that you could have, say, worlddata.cgi, which gives you a master list of nations and regions, and then you use nationdata.cgi and regiondata.cgi when you want detail on any of those individual nations or regions. This is only going to be more efficient, though, if you intend on calling nation/regiondata.cgi on some of the world's nation/regions, not all of them. Otherwise it would just mean 10x the number of HTTP connections.

Batching to recycle the connection: I'm not sure it makes a great difference either way.

Tsrill wrote:I have a script making a request about once every 5 seconds scanning TSP, which I run at irregular intervals (generally about once a week), I guess I should be fine?

That's good spacing, but it's an inefficient page for us. What information are you trying to extract? Names of region members, or more detail?

Also, I use a custom user-agent with a link to an information page so that I could be found and contacted in case of issues, I hope that is useful as well...

Yes, definitely. That provides a way for me to contact you; otherwise, as happened with New Texas, my only way of dealing with the situation is blocking the IP address.
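
For script authors who have not set one before, a minimal sketch of a custom user-agent using Python's standard `urllib.request` follows. The user-agent string and the example.com address are placeholders; substitute your own script name and a page where you can actually be reached. The request is only constructed here, not sent.

```python
import urllib.request

# Placeholder identity: use your own script name and contact page.
USER_AGENT = "ExampleScanner/0.1 (+http://example.com/scanner-info)"


def make_request(url):
    """Build a request object that carries the identifying User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = make_request("http://www.nationstates.net/page=list_nations/sort=alpha/start=0")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing the result to `urllib.request.urlopen(req)` would send it with that header attached.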

NewTexas
Spokesperson
 
Posts: 174
Founded: Antiquity
Civil Rights Lovefest

Re: Access Denied

Postby NewTexas » Fri Jun 19, 2009 4:21 pm

[violet], we are sure someone would find value in a raw list of all the regions and all the nations. It would be quite complementary to the nationdata and regiondata XML feeds. We could make use of it if it helps offload some traffic from the main site. If it doesn't, you might consider putting the feeds on a different server.

Like Tsrill, we dialed back our hits to a page every 5 seconds. We can scrape all the regions in a little less than 1 hour and all the nations in about 7 hours. We ran on the 17th if you want to see if we stressed out the system.
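
As a rough sanity check on those run times, the arithmetic can be sketched like this. The function name is made up for illustration, and the results are only what a 5-second spacing implies, not measured page counts.

```python
def requests_in(hours, seconds_per_request=5):
    """Number of requests that fit in `hours` at one request per interval."""
    return int(hours * 3600 / seconds_per_request)


print(requests_in(1))  # upper bound for the regions run (a little under 1 hour)
print(requests_in(7))  # approximate size of the nations run (about 7 hours)
```

At that spacing, one hour holds at most 720 requests and seven hours about 5040.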

The custom user agent is a nice idea. We will incorporate that. We certainly do not want to get banned again. :)
Big Tex
Governor of Texas

Author: NSDossier

Tsrill
Bureaucrat
 
Posts: 59
Founded: Antiquity
Ex-Nation

Re: Access Denied

Postby Tsrill » Sat Jun 20, 2009 10:26 pm

[violet] wrote:
Tsrill wrote:I have a script making a request about once every 5 seconds scanning TSP, which I run at irregular intervals (generally about once a week), I guess I should be fine?

That's good spacing, but it's an inefficient page for us. What information are you trying to extract? Names of region members, or more detail?


What I need is endorsement data: not only the number of endorsements WA nations have received, but also the endorsements they have given. So what I do is go through the list of nations, detect which nations have WA status, and access those individual nation pages to find out who endorses whom. I couldn't use regiondata.cgi because for large regions it does not contain a list of nations. Also, as far as I know, regiondata.cgi has no indication of WA status, so even if the XML feed had the nations listed, using it would result in considerably more requests (more than 2000, instead of the 300-400 I need now). The nations list does not have to be sorted for me.

