Extreme update lag

Ever-Wandering Souls
Negotiator
 
Posts: 7267
Founded: Jan 01, 2014
Father Knows Best State

Postby Ever-Wandering Souls » Mon Jan 10, 2022 5:17 pm

Eluvatar wrote:
[violet] wrote:At this point, I suspect the problem lies in the Activity log, which became backlogged attempting to record a very large number of entries. We haven't identified the root cause, but so far the behavior hasn't repeated.


I have some hypotheses for which data on happenings that happened more than 6 days ago would be helpful. As a gameplayer, I used to log all happenings for purposes of analyzing update, game mechanics, etc. If any players happen to have been doing that recently, I would much appreciate the submission of a log of happenings since the start of the year over GHR. If nobody has it, that's okay; there are other avenues for analysis, they just might take longer.


I'm fairly certain I don't have anything even close to all happenings, but I may have something related to certain types of happenings going back a little bit, as part of the daemon that feeds our TG recruitment targeting. Long shot, but - is there a specific sort of data you're looking for?
Proud Raider; General of The Black Hawks, Ret.
TG me anytime; I'm always happy to talk about anything!

The Alicorns (Equestria) wrote:Let them stay, no need to badmouth them...From our view a bunch of nations just came in, seized the delegate position, and changed a few superficial things...we play NationStates differently...there's really no reason for us to be butthurt.
http://www.nationstates.net/page=rmb/postid=8944227
http://www.nationstates.net/page=rmb/postid=8951258

Misley wrote:
Hobbesistan wrote:Don't think I understand the question.
The color or what?..

Jesus, Hobbes, it's 2015. You can't just call someone "the color".

Reploid Productions wrote:Raiders are endlessly creative

How Do I Telegram API?

Omnis delenda est.

Eluvatar
Director of Technology
 
Posts: 3086
Founded: Mar 31, 2006
New York Times Democracy

Postby Eluvatar » Mon Jan 10, 2022 6:39 pm

Ever-Wandering Souls wrote:
Eluvatar wrote:
I have some hypotheses for which data on happenings that happened more than 6 days ago would be helpful. As a gameplayer, I used to log all happenings for purposes of analyzing update, game mechanics, etc. If any players happen to have been doing that recently, I would much appreciate the submission of a log of happenings since the start of the year over GHR. If nobody has it, that's okay; there are other avenues for analysis, they just might take longer.


I'm fairly certain I don't have anything even close to all happenings, but I may have something related to certain types of happenings going back a little bit, as part of the daemon that feeds our TG recruitment targeting. Long shot, but - is there a specific sort of data you're looking for?

The timestamp of each happening (and, ideally, what that happening was) for every individual happening going back to 2022-01-01T00:00:00-05:00.
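
For anyone who does have a local log, a rough sketch of the kind of extract that would fit this request is below. It assumes, purely for illustration, that each logged happening is stored as a `(unix_timestamp, text)` pair; the function name and that storage format are hypothetical, so adapt it to whatever your own logger actually writes.

```python
from datetime import datetime, timezone

# Cutoff taken from the request above: midnight Eastern on 1 Jan 2022.
CUTOFF = datetime.fromisoformat("2022-01-01T00:00:00-05:00")


def happenings_since(entries, cutoff=CUTOFF):
    """Return (ISO-8601 UTC timestamp, text) pairs at or after cutoff, oldest first.

    entries is assumed to be an iterable of (unix_timestamp, text) pairs;
    that layout is a hypothetical stand-in for a real happenings log.
    """
    cutoff_ts = cutoff.timestamp()
    recent = sorted((ts, text) for ts, text in entries if ts >= cutoff_ts)
    return [
        (datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(), text)
        for ts, text in recent
    ]
```

Keeping both the timestamp and the text of each happening matters here, since the request is for what each happening was as well as when it occurred.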
To Serve and Protect: UDL

Eluvatar - Taijitu member

Quebecshire
Ambassador
 
Posts: 1911
Founded: Mar 17, 2017
Democratic Socialists

Postby Quebecshire » Tue Jan 11, 2022 10:53 am

This was just pointed out to me, but apparently 2,300+ (2,300 as of like 20 minutes ago) nations have been founded recently, and update is running 30 seconds late at max (so effectively, update is running pretty close to normal).

So continuing to what Roavin said about China, we'd think the feed would be damaging here. Personally, I still haven't seen substantial lag of any kind since our last siege.
Last edited by Quebecshire on Tue Jan 11, 2022 10:53 am, edited 1 time in total.
PATRIOT OF THE LEAGUE REDEEMER OF CONCORD
Defender Moralist | Consul of the LDF | Warden-Lieutenant Emeritus | Commended
Benevolent Thomas wrote:I founded a defender organization out of my dislike of invaders, what invading represents, and my desire to see them suffer.
Pergamon wrote:I must say, you are truly what they deserve.

WayNeacTia
Senator
 
Posts: 4330
Founded: Aug 01, 2014
Ex-Nation

Postby WayNeacTia » Fri Jan 14, 2022 3:34 pm

[violet] wrote:I've also contacted a few bot authors, because those have gotten out of control again. We have four in particular at the moment that are either illegal or broken (i.e. sending meaningless requests at maximum speed).

There's also something that identifies as "<Nation: NAME> Recruitment Tool." This looks legal and above board, but it's pretty inefficient, since it's being operated by multiple different nations, and they all look up the exact same Activity log data over and over every second or two. I don't know what this is or what it does, but it's a fair bit of load on that part of update.

Is it possible someone is running 4 nations on VPNs, trying to hide their IPs so they can send recruitment messages four times faster?
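
Whatever the tool turns out to be, the inefficiency described in the quote (several copies fetching identical Activity log data every second or two) is the sort of thing a small shared cache avoids. Here is a minimal sketch, assuming a hypothetical `fetch` callable that stands in for whatever request the tool makes. Nothing in it reflects how the actual recruitment tool or the NationStates API is implemented.

```python
import time


class TTLCache:
    """Reuse a fetched value for ttl_seconds instead of re-requesting it."""

    def __init__(self, fetch, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical callable: key -> data
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache can be tested
        self._store = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]        # still fresh: no new request is made
        value = self._fetch(key)
        self._store[key] = (now + self._ttl, value)
        return value
```

If every copy of a tool went through one cache like this, identical lookups inside the TTL window would collapse into a single request.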
Sarcasm dispensed moderately.
RiderSyl wrote:You'd really think that defenders would communicate with each other about this. I know they're not a hivemind, but at least some level of PR skill would keep Quebecshire and Quebecshire from publically contradicting eac

wait

Alfonzo
Envoy
 
Posts: 223
Founded: Dec 27, 2020
Ex-Nation

Postby Alfonzo » Sun Jan 16, 2022 8:00 pm

Hello again,

I think it's also worth pointing out that last major there was a brief lag at the beginning of update. It got to the point where we got server errors due to how long the load times were, as they went as long as 18248 ms (about 18 seconds) in a particular case. It disappeared as update went on. However, it is worth noting this was a smaller group of 20+ people moving into a jump point, which is nowhere near the total number of people involved in the last big operation. Hopefully this isn't a sign of the issue getting worse, if 20 people staging seem to be causing severe effects on the servers.
Last edited by Alfonzo on Sun Jan 16, 2022 8:02 pm, edited 1 time in total.
✯ ✯ ✯ In War, Victory. In Peace, Vigilance. In Death, Sacrifice: TGW ✯ ✯ ✯
Made ya look!

Garchyland
Lobbyist
 
Posts: 18
Founded: Oct 12, 2007
Left-wing Utopia

Postby Garchyland » Mon Jan 17, 2022 3:38 pm

Alfonzo wrote:Hello again,

I think it's also worth pointing out that last major there was a brief lag at the beginning of update. It got to the point where we got server errors due to how long the load times were, as they went as long as 18248 ms (about 18 seconds) in a particular case. It disappeared as update went on. However, it is worth noting this was a smaller group of 20+ people moving into a jump point, which is nowhere near the total number of people involved in the last big operation. Hopefully this isn't a sign of the issue getting worse, if 20 people staging seem to be causing severe effects on the servers.


This would also essentially mean that, although raiding/defending is legal in the game, there is no way to liberate a region like mine effectively. A great win for raiders.

Kanta Hame
Civil Servant
 
Posts: 10
Founded: Jun 27, 2012
Inoffensive Centrist Democracy

Anything

Postby Kanta Hame » Tue Jan 18, 2022 3:48 am

I guess I'll break my forum radio silence.

It would be nice to hear some kind of update, but things seem to be swept under the rug.

So what do we know?

"Since then, update has more or less hit target length, but it did experience an unusual stall on Jan 7. At this point, I suspect the problem lies in the Activity log, which became backlogged attempting to record a very large number of entries. We haven't identified the root cause, but so far the behavior hasn't repeated.

Update is the single most load-intensive point we have, so it is sensitive to other load-generating events that would otherwise go unnoticed."

Whatever normally causes these did not happen. Those causes are visible in the feed, or would have been visible there. There was no large update-bending region. That would have been visible in the feed, since such a region would have required a large influx of nations updating in the region, with the resulting influence changes showing up in the feed. Nor would it slow down the rate at which NS itself updates nations; it would just bend how long it takes from the trigger updating to the target updating.

That option is counted out.

There was no notably larger number of moves and endos to cause this. Thus far we have not been able to move a large enough number of nations to have an effect on update. Over a hundred nations have moved multiple times before, prior to ALH, without any issue. Also, update started slowing BEFORE nations moved in. Triggers were delayed, and there was discussion of how update froze before the trigger and how there was no time to adapt. The slowdown started before the move, ergo moving is not the issue. Besides, on the first jump, when the password was still on, everyone had to cross, enter the password, move AND endorse, aka more clicking and endorsing than in previous attempts, yet that update ran completely fine.

The same actions cause the same results. Nothing was done differently from how libs have been done previously, besides the lib target not having been ALH, since it has been password protected. 20-40 more jumpers per update than in recent ops since November and a new location to jump to have been the only differences on our part. Increases in numbers and defending or liberating new regions have not caused a notable difference before, so there is no reason they would now either.

There is no large number of new nations. Or rather, there was no large number of new nations. As soon as we stopped jumping in, a few thousand nations were created in the hour before / during the first half of update, before the spawn regions started to update. And update did not care. Everything ran completely normally and small operations could be done without issues. The next update there were 10000 new nations waiting to update. Once again it was not enough to cause the slightest issue, and update ran on time as it should, with only more variation around the feeders due to the high influx of new nations, then catching up and running as it should. So thousands more new nations popping in than normal is not enough to cause the slightest effect.

In none of these updates has there been anything interesting or out of the ordinary in the feed. Some might have heard I have won some awards for spotting, which is because I stare at the feed a lot. Technically every day, nearly every update, for the past... 6 years or so. If there were anything out of the ordinary visible to the eye, it could be pointed out and then looked at, tested, and counteracted. Instead, as far as the feed is concerned, it goes like this: things are going normally. Nothing of interest happens, except that as we wait for the trigger to go, update slows to a crawl, seemingly without any reason, so that this teeny tiny section of update takes anywhere between a minute and a half and seven minutes instead of 10-15 seconds, and then continues normally as if nothing had happened.

So whatever those requests are, they are not anything that comes from activity visible in the feed. As far as I understand, that limits the options to something like TGs, Challenges, etc. that do not leave anything visible. If you try to send TGs too fast, NS will prevent it as a spam protection method. Could repeated attempts still potentially cause it?

Also, I do not know how well Challenges is patched. I have not really played with it since it was added. But as I assume it is patched, it now asks for confirmation that I am not a bot. However, after I do that, it will not ask again, so from that point onward I could do as many as I want in a row without being asked again. So if I understand correctly, one could, for example, manually pass the check and then let a bot take over. Alternatively, I noticed that when you start, you could, for example, repeatedly click the OK button, and doing so can also prevent the challenge from progressing, as it will not continue until you stop and wait for it to start the challenge properly. So both of those could still be abused to cause lag. Can those be tested to see if they would cause the same effect?

The biggest concern here is how targeted this issue seemingly was. We have had numerous issues before where update was borked as an unintended side effect of other actions: a massive influx of players, or someone using bots to do massive amounts of actions at the same time. The common denominator is that said actions were done when it was convenient for such players and were not specifically targeted to goof up update; update goofed up as a result of the intended action. As a result, either the whole update lagged or it lagged at random points, which hurts everyone equally and runs for longer periods in a row instead of a few minutes at a very specific time.

Even if I agree there is a small theoretical chance this was not done intentionally to prevent the liberation, but was some random player doing random actions, the timing is too convenient to be random chance, since everything happens in a VERY short time window. Anyone can look at their clock, but for someone to do X thing at 12-hour intervals by chance is quite unlikely. And not at exactly 12 hours, but seemingly taking into account the 8-minute difference in update time that ALH has between minor and major, and the small variation it has with itself when comparing one major to another major and one minor to another minor. And, like I said, for it to take effect JUST before the supposed move and end after ALH has started to update, repeatedly for 4 updates in a row, without ever happening before and without reappearing after new attempts were stopped, just looks too targeted not to be intentional. I could be more accepting of the unintentional option if it had any random features, like happening at some other times too and not just before the jump into ALH, or not stopping immediately after the attempts ended, or starting from the second attempt instead of before the operation started. Excuse me if it looks more like intention than random chance, but that is exactly how it looks.

We fight against every possible disadvantage already. We are a reactive force, so we do not choose when action happens but react to it. Raiders can call people to show up in advance for a specific day, but we have to hope we have folk randomly popping up, look for signs, and then ping people in the hope that they have time. We cannot move folk in freely over a 12-24 hour period to click move and endo at whatever point of the day is convenient for each of them, but have to get more people online at exactly the same time doing a coordinated move. Time is not on our side but against us, as every passing update makes it harder. Add to that, we have fought against systematic cheating before. Years of illegal script users telling us to get gut rolf lol lfmao. And when something smells this bad, I would not really want to walk again into a situation where we need to wait until someone comes along and blows the cover, exposing the cheaters, and we get a "sorry, it won't happen again, promise". It would be just the same old history repeating, so it is a more likely option than random chance. The only unknown variables are who, how many, and how. That is simply how it looks and what makes the most sense given how it happened.

I know that looking for causes can be time-consuming, especially when you do not know where to look. But when everything smells rotten, it is really frustrating to sit and wait without a clue whether anything is being done anymore, or whether it is just that we do not know where to look, ergo can't progress. Or are options still being looked at, yet nothing matches?

Sedgistan
Site Director
 
Posts: 35471
Founded: Oct 20, 2006
Anarchy

Postby Sedgistan » Tue Jan 18, 2022 4:29 am

It's not being swept under the rug. Admin is looking into it - there hasn't been an update from them because there hasn't been anything definitive yet to report back with. We're limited in that there's two staff members capable of looking into this who are active on the site currently (Violet, Elu), and one of them has been on holiday the last week completely without access.

This is a priority. Admin isn't twiddling their thumbs uncaring, or working on other stuff over and above this. The time they have for admin work on NS is focused on fixing this, because none of us want the current situation to continue. A substantial amount of time has already been put into looking into this, and setting up further tools to investigate it.

Vando0sa
Chargé d'Affaires
 
Posts: 367
Founded: Mar 08, 2014
Mother Knows Best State

Postby Vando0sa » Tue Jan 18, 2022 4:44 am

Kanta Hame wrote:I guess I'll break my forum radio silence.

It would be nice to hear some kind of update, but things seem to be swept under the rug.

So what do we know?

"Since then, update has more or less hit target length, but it did experience an unusual stall on Jan 7. At this point, I suspect the problem lies in the Activity log, which became backlogged attempting to record a very large number of entries. We haven't identified the root cause, but so far the behavior hasn't repeated.

Update is the single most load-intensive point we have, so it is sensitive to other load-generating events that would otherwise go unnoticed."

Whatever normally causes these did not happen. Those causes are visible in the feed, or would have been visible there. There was no large update-bending region. That would have been visible in the feed, since such a region would have required a large influx of nations updating in the region, with the resulting influence changes showing up in the feed. Nor would it slow down the rate at which NS itself updates nations; it would just bend how long it takes from the trigger updating to the target updating.

That option is counted out.

There was no notably larger number of moves and endos to cause this. Thus far we have not been able to move a large enough number of nations to have an effect on update. Over a hundred nations have moved multiple times before, prior to ALH, without any issue. Also, update started slowing BEFORE nations moved in. Triggers were delayed, and there was discussion of how update froze before the trigger and how there was no time to adapt. The slowdown started before the move, ergo moving is not the issue. Besides, on the first jump, when the password was still on, everyone had to cross, enter the password, move AND endorse, aka more clicking and endorsing than in previous attempts, yet that update ran completely fine.

The same actions cause the same results. Nothing was done differently from how libs have been done previously, besides the lib target not having been ALH, since it has been password protected. 20-40 more jumpers per update than in recent ops since November and a new location to jump to have been the only differences on our part. Increases in numbers and defending or liberating new regions have not caused a notable difference before, so there is no reason they would now either.

There is no large number of new nations. Or rather, there was no large number of new nations. As soon as we stopped jumping in, a few thousand nations were created in the hour before / during the first half of update, before the spawn regions started to update. And update did not care. Everything ran completely normally and small operations could be done without issues. The next update there were 10000 new nations waiting to update. Once again it was not enough to cause the slightest issue, and update ran on time as it should, with only more variation around the feeders due to the high influx of new nations, then catching up and running as it should. So thousands more new nations popping in than normal is not enough to cause the slightest effect.

In none of these updates has there been anything interesting or out of the ordinary in the feed. Some might have heard I have won some awards for spotting, which is because I stare at the feed a lot. Technically every day, nearly every update, for the past... 6 years or so. If there were anything out of the ordinary visible to the eye, it could be pointed out and then looked at, tested, and counteracted. Instead, as far as the feed is concerned, it goes like this: things are going normally. Nothing of interest happens, except that as we wait for the trigger to go, update slows to a crawl, seemingly without any reason, so that this teeny tiny section of update takes anywhere between a minute and a half and seven minutes instead of 10-15 seconds, and then continues normally as if nothing had happened.

So whatever those requests are, they are not anything that comes from activity visible in the feed. As far as I understand, that limits the options to something like TGs, Challenges, etc. that do not leave anything visible. If you try to send TGs too fast, NS will prevent it as a spam protection method. Could repeated attempts still potentially cause it?

Also, I do not know how well Challenges is patched. I have not really played with it since it was added. But as I assume it is patched, it now asks for confirmation that I am not a bot. However, after I do that, it will not ask again, so from that point onward I could do as many as I want in a row without being asked again. So if I understand correctly, one could, for example, manually pass the check and then let a bot take over. Alternatively, I noticed that when you start, you could, for example, repeatedly click the OK button, and doing so can also prevent the challenge from progressing, as it will not continue until you stop and wait for it to start the challenge properly. So both of those could still be abused to cause lag. Can those be tested to see if they would cause the same effect?

The biggest concern here is how targeted this issue seemingly was. We have had numerous issues before where update was borked as an unintended side effect of other actions: a massive influx of players, or someone using bots to do massive amounts of actions at the same time. The common denominator is that said actions were done when it was convenient for such players and were not specifically targeted to goof up update; update goofed up as a result of the intended action. As a result, either the whole update lagged or it lagged at random points, which hurts everyone equally and runs for longer periods in a row instead of a few minutes at a very specific time.

Even if I agree there is a small theoretical chance this was not done intentionally to prevent the liberation, but was some random player doing random actions, the timing is too convenient to be random chance, since everything happens in a VERY short time window. Anyone can look at their clock, but for someone to do X thing at 12-hour intervals by chance is quite unlikely. And not at exactly 12 hours, but seemingly taking into account the 8-minute difference in update time that ALH has between minor and major, and the small variation it has with itself when comparing one major to another major and one minor to another minor. And, like I said, for it to take effect JUST before the supposed move and end after ALH has started to update, repeatedly for 4 updates in a row, without ever happening before and without reappearing after new attempts were stopped, just looks too targeted not to be intentional. I could be more accepting of the unintentional option if it had any random features, like happening at some other times too and not just before the jump into ALH, or not stopping immediately after the attempts ended, or starting from the second attempt instead of before the operation started. Excuse me if it looks more like intention than random chance, but that is exactly how it looks.

We fight against every possible disadvantage already. We are a reactive force, so we do not choose when action happens but react to it. Raiders can call people to show up in advance for a specific day, but we have to hope we have folk randomly popping up, look for signs, and then ping people in the hope that they have time. We cannot move folk in freely over a 12-24 hour period to click move and endo at whatever point of the day is convenient for each of them, but have to get more people online at exactly the same time doing a coordinated move. Time is not on our side but against us, as every passing update makes it harder. Add to that, we have fought against systematic cheating before. Years of illegal script users telling us to get gut rolf lol lfmao. And when something smells this bad, I would not really want to walk again into a situation where we need to wait until someone comes along and blows the cover, exposing the cheaters, and we get a "sorry, it won't happen again, promise". It would be just the same old history repeating, so it is a more likely option than random chance. The only unknown variables are who, how many, and how. That is simply how it looks and what makes the most sense given how it happened.

I know that looking for causes can be time-consuming, especially when you do not know where to look. But when everything smells rotten, it is really frustrating to sit and wait without a clue whether anything is being done anymore, or whether it is just that we do not know where to look, ergo can't progress. Or are options still being looked at, yet nothing matches?



I really hope this is not foul play causing this and just a coincidence. If it turns out someone involved with our raid is responsible for it I'm going to be 10 shades of livid. Especially with me as the raider delegate.

From what I've seen in the super secret op chat we are all just as clueless about what happened with the lag. None of us want to see another Predator leveled disaster.

I want to see another large liberation attempt this weekend to see if it happens again.
Kevät itkee talven töitä Käy hyinen tuulen henki Kevät itkee talven töitä Virta kantaa luita rantaan

Kanta Hame
Civil Servant
 
Posts: 10
Founded: Jun 27, 2012
Inoffensive Centrist Democracy

Postby Kanta Hame » Tue Jan 18, 2022 6:23 am

Sedgistan wrote:It's not being swept under the rug. Admin is looking into it - there hasn't been an update from them because there hasn't been anything definitive yet to report back with. We're limited in that there's two staff members capable of looking into this who are active on the site currently (Violet, Elu), and one of them has been on holiday the last week completely without access.

This is a priority. Admin isn't twiddling their thumbs uncaring, or working on other stuff over and above this. The time they have for admin work on NS is focused on fixing this, because none of us want the current situation to continue. A substantial amount of time has already been put into looking into this, and setting up further tools to investigate it.


Thanks for the update. That is a relief to hear.

Reploid Productions
Director of Moderation
 
Posts: 30507
Founded: Antiquity
Democratic Socialists

Postby Reploid Productions » Tue Jan 18, 2022 3:14 pm

Vando0sa wrote:I want to see another large liberation attempt this weekend to see if it happens again.

In all honesty, another large-scale raid or defense event would probably provide some useful data for the techies. Right now we don't know for sure if those bots were the single biggest source of the issue, and we don't necessarily have a good way to test "was the problem a whole lot of random scripts" versus "was the problem so many people hammering the activity page in preparation to jump" without... well, a bunch of people preparing to jump again.
Forum mod since May 8, 2003 -- Game mod since May 19, 2003 -- Nation turned 20 on March 23, 2023!
Sunset's DoGA FAQ - For those using DoGA to make their NS military and such.
One Stop Rules Shop -- Reppy's Sig Workshop -- Getting Help Page
[violet] wrote:Maybe we could power our new search engine from the sexual tension between you two.
Char Aznable/Giant Meteor 2024! - Forcing humanity to move into space and progress whether we goddamn want to or not!

[violet]
Executive Director
 
Posts: 16205
Founded: Antiquity

Postby [violet] » Tue Jan 18, 2022 4:39 pm

So let's clarify what we're talking about here!

My main interest in the daily update is its length: major update should run for 90 minutes and minor update for 60 minutes -- no more, no less. In the distant past, updates used to run for all kinds of lengths depending on various factors, but that unreliability made life difficult for R/D, because you couldn't tell in advance how long you'd need your people to stick around.

Since Jan 3rd, when we tamed some bots, daily updates are back to hitting their target times every day. So at that level, everything continues to be fine.

Now there was an isolated server blowout on Jan 7th, triggering this thread, which was about a strange stall that occurred midway through an update. This caused update to run a couple of minutes over, chiefly because of a period during which it ran very very slowly, due to (probably) the overloading of one particular service. This service overload hasn't occurred since then.

However, we now seem to be talking about the issue of daily update not always proceeding at exactly the same speed throughout. But it's important to note that this happens by design. Daily updates share system resources with whatever else is going on in the server at the same time, and they deliberately slow down to make room for other processes when things are busy, then speed up during idle times, as necessary in order to hit the target finish time.

Any kind of major system-wide hang or stall is, of course, a bug. But daily updates aren't intended to run at a constant speed, and update slowdowns/speedups aren't bugs. I haven't ever looked into tracking or managing that.

I don't think R/D has ever asked for assurances regarding the intra-process timing of updates, i.e. limits on how much update can speed up or slow down within a single run. If that's something we need to look at, we can do so, but it would be a new feature.
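
To illustrate the pacing idea in the abstract (slow down when the server is busy, speed up later so the run still finishes on target), here is a toy calculation. It is only a sketch under stated assumptions: the function names, the one-second tick model, and the numbers are all hypothetical, and it says nothing about how the real update process is written.

```python
def required_rate(remaining_nations, seconds_left):
    """Nations per second needed from now on to finish exactly on target."""
    if seconds_left <= 0:
        raise ValueError("target finish time has already passed")
    return remaining_nations / seconds_left


def simulate(total_nations, target_seconds, capacity_per_second):
    """Step through the run one second at a time.

    capacity_per_second[t] is the most nations the (hypothetical) server
    can spare for update during second t, e.g. low while other load is high.
    Returns the number of nations processed and the rate used each second.
    """
    done = 0.0
    rates = []
    for t in range(target_seconds):
        wanted = required_rate(total_nations - done, target_seconds - t)
        actual = min(wanted, capacity_per_second[t])  # yield to other load
        done += actual
        rates.append(actual)
    return done, rates
```

With 100 nations, a 10 second target, and capacity `[10, 10, 2, 2, 20, 20, 20, 20, 20, 20]`, this toy run drops to 2 per second while capacity is tight, then runs at roughly 12.7 per second for the remainder and still finishes on target. That is the shape of behaviour being described: uneven speed inside a run, constant total length.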

Wymondham
Chargé d'Affaires
 
Posts: 401
Founded: Apr 03, 2017
Libertarian Police State

Postby Wymondham » Tue Jan 18, 2022 5:05 pm

[violet] wrote:So let's clarify what we're talking about here! Now there was an isolated server blowout on Jan 7th, triggering this thread, which was about a strange stall that occurred midway through an update. This caused update to run a couple of minutes over, chiefly because of a period during which it ran very very slowly, due to (probably) the overloading of one particular service. This service overload hasn't occurred since then.

However, we now seem to be talking about the issue of daily update not always proceeding at exactly the same speed throughout. But it's important to note that this happens by design. Daily updates share system resources with whatever else is going on in the server at the same time, and they deliberately slow down to make room for other processes when things are busy, then speed up during idle times, as necessary in order to hit the target finish time.

I think the main issue at play in the thread continues to be why, for example, the update was going fast on the Saturday minor (the 8th), but then the update hit A Liberal Haven, slowed down by 2 mins, then sped up again and finished 1 second before it had on the day before. You've noted that the update slowed down due to a server blowout the day before, and people are generally wondering why that server blowout took place. Everyone expects variance, but what happened with A Liberal Haven doesn't look like variance as R/Ders have experienced it over the past few years. I think the suggestion that intra-process timing be range-limited is perhaps more verbosity than anything else, and the main question in the thread for most people is whether the slowdowns, which seemed to solely hit A Liberal Haven and hit the region on days other than the 7th while the siege was ongoing, were due to variance, natural server load, or the malicious actions of one player or a group of players.

(my apologies that the above is somewhat of a ramble, it's midnight and I should probably be asleep)
Last edited by Wymondham on Tue Jan 18, 2022 5:05 pm, edited 1 time in total.
Doer of the things and the stuffs.
That British dude who does the charity fundraiser.

Merni
Ambassador
 
Posts: 1800
Founded: May 03, 2016
Democratic Socialists

Postby Merni » Tue Jan 18, 2022 9:03 pm

[violet] wrote:So let's clarify what we're talking about here!

My main interest in the daily update is its length: major update should run for 90 minutes and minor update for 60 minutes -- no more, no less. In the distant past, updates used to run for all kinds of lengths depending on various factors, but that unreliability made life difficult for R/D, because you couldn't tell in advance how long you'd need your people to stick around.

Since Jan 3rd, when we tamed some bots, daily updates are back to hitting their target times every day. So at that level, everything continues to be fine.

Now there was an isolated server blowout on Jan 7th, triggering this thread, which was about a strange stall that occurred midway through an update. This caused update to run a couple of minutes over, chiefly because of a period during which it ran very very slowly, due to (probably) the overloading of one particular service. This service overload hasn't occurred since then.

However, we now seem to be talking about the issue of daily update not always proceeding at exactly the same speed throughout. But it's important to note that this happens by design. Daily updates share system resources with whatever else is going on in the server at the same time, and they deliberately slow down to make room for other processes when things are busy, then speed up during idle times, as necessary in order to hit the target finish time.

Any kind of major system-wide hang or stall is, of course, a bug. But daily updates aren't intended to run at a constant speed, and update slowdowns/speedups aren't bugs. I haven't ever looked into tracking or managing that.

I don't think R/D has ever asked for assurances regarding the intra-process timing of updates, i.e. limits on how much update can speed up or slow down within a single run. If that's something we need to look at, we can do so, but it would be a new feature.
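To illustrate the pacing idea in the quote above (slow down while the server is busy, speed up later so the run still ends on time), here is a minimal Python sketch. It is not the actual update code: the function name `required_rate` and the nation counts are hypothetical, picked only to make the arithmetic visible.

```python
def required_rate(nations_left: int, seconds_left: float) -> float:
    """Rate (nations/sec) needed to finish exactly at the target time."""
    if seconds_left <= 0:
        raise ValueError("target finish time has already passed")
    return nations_left / seconds_left


# Hypothetical major update: 270,000 nations in 90 minutes (assumed numbers).
TOTAL_NATIONS = 270_000
TARGET_SECONDS = 90 * 60

baseline = required_rate(TOTAL_NATIONS, TARGET_SECONDS)

# Suppose the first 30 minutes were busy and only 60,000 nations got processed.
done, elapsed = 60_000, 30 * 60
catch_up = required_rate(TOTAL_NATIONS - done, TARGET_SECONDS - elapsed)

print(f"baseline {baseline:.1f}/s, catch-up {catch_up:.1f}/s")
```

Under these assumed numbers, a slow stretch early on simply raises the rate needed later, which is consistent with an update that varies in speed but still finishes on schedule.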

I think commanders are experienced enough to know that update speed does vary to a reasonable extent. However, the massive slowdowns were not limited to a single update, as is illustrated in Roavin's post:
Roavin wrote:Thanks for the feedback, [v]! I do have some additional data to consider which might help you in your search:

Nakari's graphs only focus on the worst offender of the updates. However, of the four updates in which jumps into the region were performed, three were affected by this mysterious lag.

  • Jan 7 at 12am EST (major): A nominal 14 second trigger became 66 seconds long. Update rate slowed down from over 30 to about 3-9 nations per second at the point of the trigger.
  • Jan 7 at 12pm EST (minor): This update did not show any irregularities.
  • Jan 8 at 12am EST (major): This update is the bad one shown in Nakari's graphs.
  • Jan 8 at 12pm EST (minor): A nominal 12 second trigger became 60 seconds long.

Collecting this data is a lot of work and it's very very difficult to do for minor updates (due to lack of a data dump), so here's additional data for that first major update.
The start of the Rejected Realms updating was our estimated 2.5 minute warning for the target (by watching the activity feed until influence changes for TRR started). The nation Bladetress-22 was placed and estimated to update about 14 seconds before the target.

Relevant select happenings:
(1) 1/7/2022, 6:27:37 AM GMT+1: Bladetress-22's influence in The Rejected Realms rose from "Zero" to "Unproven".


Relevant data extracted from the region dump generated during that update:
  • TRR began updating at 06:24:51 GMT+1
  • Bernadina, the region immediately after TRR, began updating at 06:27:45 GMT+1
  • Bladetress-22 was at position 5687/6146 in TRR
  • 174 nations in 69 regions resided between TRR and A Liberal Haven
  • A Liberal Haven (which I'll abbreviate as ALH going forward) began updating at 06:28:41 GMT+1

TRR's start occurred 2 minutes and 46 seconds before the designated trigger. This is within normal range (we estimated 2.5 minutes). However, the time from the trigger to the first ALH happening was 66 seconds, much more than the estimated 14 seconds. Doing a bit of math:
  • 2 minutes and 46 seconds to update the first 5687 nations in TRR => 34.3 nations per second
  • 66 seconds for the 632 nations from trigger to ALH => 9.6 nations per second
  • 8 seconds for the 458 nations from trigger to end of TRR => 57.3 nations per second (actually faster!)
  • 58 seconds for the 174 nations between TRR and ALH => 3 nations per second
Had the rate remained at 34.3 nations per second, the trigger length would have been 18 seconds, which is within normal parameters. The only abnormality I can see is that there were 28 new single-nation regions between TRR and ALH, but that seems an unlikely culprit because then the constant overhead for starting a region update would have to be nearly a second, which is clearly not the case.
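The arithmetic above can be reproduced with a short Python sketch. The nation counts and durations are the ones stated in the post; the helper name `rate` and the segment labels are only illustrative.

```python
def rate(nations: int, seconds: float) -> float:
    """Average update speed in nations per second."""
    return nations / seconds


# (nations, seconds) for each segment, as stated in the post.
segments = {
    "TRR start to trigger": (5687, 2 * 60 + 46),
    "trigger to ALH": (632, 66),
    "trigger to end of TRR": (458, 8),
    "end of TRR to ALH": (174, 58),
}

rates = {name: rate(n, s) for name, (n, s) in segments.items()}
for name, r in rates.items():
    print(f"{name}: {r:.2f} nations/sec")

# Trigger length if the pre-trigger rate had held for all 632 nations.
expected_trigger = 632 / rates["TRR start to trigger"]
print(f"expected trigger length: {expected_trigger:.1f} s")
```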

----

Also, to provide some context as to why some individuals think this may be foul play: This is a kind of spike that we haven't seen before, and it only happens right around the time a jump into an occupied region takes place. Even more suspiciously, of the four successive updates where a jump took place, only one was not affected: the weekday minor update, which is traditionally the one where the fewest gameplayers are available (and, if this is a malicious actor, they would have to be an R/D gameplayer due to the timing). I hope it's just server gremlins, but I can't fault suspicions that it's not. At this point we have no proof either way, and should probably drop both speculation and undue celebration until [v] and Elu know more.
2024: the year of democracy. Vote!
The Labyrinth | Donate your free time, help make free ebooks | Admins: Please let us block WACC TGs!
RIP Residency 3.5.16-18.11.21, killed by simplistic calculation
Political Compass: Economic -9.5 (Left) / Social -3.85 (Liberal)
Wrote issue 1523, GA resolutions 532 and 659
meth
When the people are being beaten with a stick, they are not much happier if it is called 'the People’s Stick.' — Mikhail Bakunin (to Karl Marx)
You're supposed to be employing the arts of diplomacy, not the ruddy great thumping sledgehammers of diplomacy. — Ardchoille
The West won the world not by the superiority of its ideas or values or religion [...] but rather by its superiority in applying organised violence. — Samuel P. Huntington (even he said that!)

Roavin
Admin
 
Posts: 1777
Founded: Apr 07, 2016
Democratic Socialists

Postby Roavin » Wed Jan 19, 2022 5:38 pm

[v], TL;DR: The issue isn't the normal variability in update, but rather three specific slowdowns in nations/sec by multiple orders of magnitude, i.e. the kind of major stall you referred to.

[violet] wrote:My main interest in the daily update is its length: major update should run for 90 minutes and minor update for 60 minutes -- no more, no less. In the distant past, updates used to run for all kinds of lengths depending on various factors, but that unreliability made life difficult for R/D, because you couldn't tell in advance how long you'd need your people to stick around.

Since Jan 3rd, when we tamed some bots, daily updates are back to hitting their target times every day. So at that level, everything continues to be fine.


Yep, on that front, everything seems to work as intended, even for the updates in question.

[violet] wrote:Now there was an isolated server blowout on Jan 7th, triggering this thread, which was about a strange stall that occurred midway through an update. This caused update to run a couple of minutes over, chiefly because of a period during which it ran very very slowly, due to (probably) the overloading of one particular service. This service overload hasn't occurred since then.


It occurred three times - Jan 7 major, Jan 8 major, and Jan 8 minor, under very specific and suspiciously-timed circumstances. See my post here for details.

[violet] wrote:However, we now seem to be talking about the issue of daily update not always proceeding at exactly the same speed throughout. But it's important to note that this happens by design. Daily updates share system resources with whatever else is going on in the server at the same time, and they deliberately slow down to make room for other processes when things are busy, then speed up during idle times, as necessary in order to hit the target finish time.

Any kind of major system-wide hang or stall is, of course, a bug. But daily updates aren't intended to run at a constant speed, and update slowdowns/speedups aren't bugs. I haven't ever looked into tracking or managing that.

I don't think R/D has ever asked for assurances regarding the intra-process timing of updates, i.e. limits on how much update can speed up or slow down within a single run. If that's something we need to look at, we can do so, but it would be a new feature.


No, [v], the issue is not the typical variability of update, but rather three specific instances where the variability far exceeded anything that has been seen before by literally orders of magnitude.

Assume two regions A and B, where I (as a R/Der) predict that A will update approximately 14 seconds before B. Such a prediction these days is usually based on a statistical analysis of previous major update(s). Even in the absolute worst of cases, I can confidently assume, based on the past half decade of experience, that the actual time between A and B, given my 14 second prediction, will be anything between 8 and 24 seconds. But on Jan 7, this slowed down to a whopping 66 seconds, a slowdown of over an order of magnitude (see data here). Even worse was Jan 8 major, where the number of nations processed per minute dipped from about 2000 to about 30, by two orders of magnitude (see data by Budgie Snugglers here). The reason that suspicions of foul play arose here is that all three instances happened in such a way as to sabotage a large-scale R/D operation.
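For anyone who wants to check the scale of these numbers, here is a small Python sketch. It uses only figures quoted in this thread; the helper name `slowdown` is illustrative, and "orders of magnitude" is taken to mean the base-10 logarithm of the ratio.

```python
import math


def slowdown(before: float, after: float) -> tuple[float, float]:
    """Return (factor, orders of magnitude) for a drop from before to after."""
    factor = before / after
    return factor, math.log10(factor)


# Jan 7 major: predicted 14 s gap, worst-case range 8 to 24 s, observed 66 s.
predicted, low, high, observed = 14, 8, 24, 66
in_range = low <= observed <= high
gap_factor = observed / predicted

# Update rates quoted in the thread.
jan7 = slowdown(34.3, 3.0)   # nations/sec: before the trigger vs. between TRR and ALH
jan8 = slowdown(2000, 30)    # nations/min: approximate figures for Jan 8 major

print(f"66 s within 8-24 s range: {in_range}, gap grew {gap_factor:.1f}x")
print(f"Jan 7 rate: {jan7[0]:.1f}x slower ({jan7[1]:.2f} orders of magnitude)")
print(f"Jan 8 rate: {jan8[0]:.1f}x slower ({jan8[1]:.2f} orders of magnitude)")
```

Measured as a gap, Jan 7 is several times the prediction; measured as update rate, it is a bit over one order of magnitude, and Jan 8 is close to two.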
Last edited by Roavin on Wed Jan 19, 2022 5:38 pm, edited 1 time in total.
Helpful Resources: One Stop Rules Shop | API documentation | NS Coders Discord
About me: Longest serving Prime Minister in TSP | Former First Warden of TGW | aka Curious Observations

Feel free to TG me, but not about moderation matters.

[violet]
Executive Director
 
Posts: 16205
Founded: Antiquity

Postby [violet] » Wed Jan 19, 2022 6:13 pm

Roavin wrote:No, [v], the issue is not the typical variability of update, but rather three specific instances where the variability far exceeded anything that has been seen before by literally orders of magnitude.

We looked at Jan 7/8 and added some logging after the fact, but there isn't much more analysis we can do there unless it happens again.

What I'm a little confused about is the posts that have continued to come in since then, when it looks to me like everything is operating within normal parameters. If people still aren't happy with how daily update is running, then either they don't know that daily update isn't supposed to run at a constant speed, or I'm unaware of some major (recent) stalls.

Roavin
Admin
 
Posts: 1777
Founded: Apr 07, 2016
Democratic Socialists

Postby Roavin » Wed Jan 19, 2022 6:29 pm

[violet] wrote:
Roavin wrote:No, [v], the issue is not the typical variability of update, but rather three specific instances where the variability far exceeded anything that has been seen before by literally orders of magnitude.

We looked at Jan 7/8 and added some logging after the fact, but there isn't much more analysis we can do there unless it happens again.


Would it help if such an operation (i.e. 100+ nations staging for a jump) is repeated?

[violet] wrote:What I'm a little confused about is the posts that have continued to come in since then, when it looks to me like everything is operating within normal parameters. If people still aren't happy with how daily update is running, then either they don't know that daily update isn't supposed to run at a constant speed, or I'm unaware of some major (recent) stalls.


The only post I'm seeing along those lines is Alfonzo's, and I suspect that this isn't the same thing - on Jan 7/8, the site itself wasn't lagging, just update itself was all of a sudden orders of magnitude slower.

Eluvatar
Director of Technology
 
Posts: 3086
Founded: Mar 31, 2006
New York Times Democracy

Postby Eluvatar » Wed Jan 19, 2022 8:02 pm

On a related note, users of the extension known as "Storm" may wish to be careful about using it as it has a serious bug.
To Serve and Protect: UDL

Eluvatar - Taijitu member

FNR Minister of Defense
Civilian
 
Posts: 1
Founded: Jul 01, 2021
Ex-Nation

Postby FNR Minister of Defense » Wed Jan 19, 2022 8:33 pm

Eluvatar wrote:On a related note, users of the extension known as "Storm" may wish to be careful about using it as it has a serious bug.

This browser extension was operated by Codpiece for the Free Nations Defense Association. We were not aware of this bug, and all members of FNDA have been urged to stop using Storm immediately.

- Apatosaurus II, Delegate and Deputy Minister of Defense of the Free Nations Region
Nation owned by Apatosaurus II and Burgertopian Empire. Please contact them with any inquiries.

Thousand Branches
Diplomat
 
Posts: 754
Founded: Jun 03, 2021
Inoffensive Centrist Democracy

Postby Thousand Branches » Wed Jan 19, 2022 9:35 pm

Eluvatar wrote:On a related note, users of the extension known as "Storm" may wish to be careful about using it as it has a serious bug.

Is that possibly what’s been causing the spikes? I know FNDA folks were definitely present for that big lib as well as the other day’s incredible lag spikes iirc
|| Aramantha Calendula ||
○•○ Writer, editor, and World Assembly fanatic ○•○
•○• Proud member of House Elegarth •○•
○•○ Telegram or message me on discord at QueenAramantha for writing or editing help ○•○
•○• Failed General Assembly Resolutions Archive || The Grand (Newspaper Archive) •○•
○•○ Have an awesome day you! ○•○

Sedgistan
Site Director
 
Posts: 35471
Founded: Oct 20, 2006
Anarchy

Postby Sedgistan » Thu Jan 20, 2022 12:05 am

Thousand Branches wrote:Is that possibly what’s been causing the spikes?

I believe the only answer we can give at present is "possibly". As per Violet's post above, additional logging was added after this topic was raised, but unless we have a repeat of the situation, it's unlikely we can come to a more definite conclusion.

Roavin wrote:
[violet] wrote:We looked at Jan 7/8 and added some logging after the fact, but there isn't much more analysis we can do there unless it happens again.


Would it help if such an operation (i.e. 100+ nations staging for a jump) is repeated?

Yes.

Zizou
Diplomat
 
Posts: 564
Founded: Aug 23, 2018
Ex-Nation

Postby Zizou » Thu Jan 20, 2022 1:47 am

Thousand Branches wrote:
Eluvatar wrote:On a related note, users of the extension known as "Storm" may wish to be careful about using it as it has a serious bug.

Is that possibly what’s been causing the spikes? I know FNDA folks were definitely present for that big lib as well as the other day’s incredible lag spikes iirc

Obviously I'm not an admin or anything like that, but I was messing around with Storm and found some issues relating to the cross endorse feature. Storm will start generating excessive requests if:

1. You switch from cross endorsing on one nation to cross endorsing on another
2. After this, pressing the cross endorse key again multiple times on the same nation will cause it to generate too many requests

The cross endorse functionality is meant to send an API request every 700ms, but if conditions like the above occur, then it will start issuing API requests in parallel instead of one at a time. So if I pressed the C key twice instead of once on accident, it would be sending two requests every 700ms. And if I pressed the C key three times, it would be sending three requests every 700ms, etc. etc.

Bearing in mind that you need to press C each time to endorse a nation, the following scenario could have occurred:
- You start cross endorsing off of a nation and Storm starts making API calls
- For whatever reason, you stop cross endorsing off of that nation and start cross endorsing off of another one
- After this point, Storm is running in a bugged state where pressing C will begin to issue API requests simultaneously
- Every subsequent press of the C key will result in increasing the rate at which API requests are sent
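The scenario above can be modelled with a short Python simulation. This is not Storm's code (Storm is a browser extension); it only assumes, as described in this post, that each extra press of C in the bugged state starts another independent loop firing every 700 ms. The function names and the example session are hypothetical.

```python
INTERVAL_MS = 700  # request interval stated in the post


def request_times(loop_starts_ms: list[int], until_ms: int) -> list[int]:
    """Timestamps (ms) of every request fired by all loops before until_ms.

    Each entry in loop_starts_ms models one press of C in the bugged state:
    it starts an independent loop that fires every INTERVAL_MS.
    """
    times: list[int] = []
    for start in loop_starts_ms:
        times.extend(range(start, until_ms, INTERVAL_MS))
    return sorted(times)


def requests_per_second(loop_count: int) -> float:
    """Steady-state request rate with loop_count loops running in parallel."""
    return loop_count * 1000 / INTERVAL_MS


# Hypothetical session: three presses of C, one second apart.
fired = request_times([0, 1000, 2000], until_ms=7000)
print(len(fired), "requests in the first 7 s")
print(f"{requests_per_second(3):.2f} requests/sec once all three loops run")
```

The point of the model is that the rate grows linearly with the number of stacked loops and never falls back on its own.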
Last edited by Zizou on Thu Jan 20, 2022 3:49 am, edited 4 times in total.
Zizou Vytherov-Skollvaldr
LTN in The Black Hawks
Meishu of the former Red Sun Army
Parxland wrote:It might somehow give me STDs through the computer screen with how often you hop between different groups of people.

[violet]
Executive Director
 
Posts: 16205
Founded: Antiquity

Postby [violet] » Thu Jan 20, 2022 3:58 am

This might be about right. There was a Storm going haywire yesterday that started sending 2 requests per second, then a couple of minutes later it was sending 4 requests per second, then 10, and eventually 130 requests per second, which is roughly equivalent to 100% of total normal site traffic.

All these requests were for the endos of a non-existent nation ("undefined"), so it was totally broken and doing nothing but keeping the server busy.
Last edited by [violet] on Thu Jan 20, 2022 4:01 am, edited 1 time in total.

Merni
Ambassador
 
Posts: 1800
Founded: May 03, 2016
Democratic Socialists

Postby Merni » Thu Jan 20, 2022 4:06 am

[violet] wrote:This might be about right. There was a Storm going haywire yesterday that started sending 2 requests per second, then a couple of minutes later it was sending 4 requests per second, then 10, and eventually 130 requests per second, which is roughly equivalent to 100% of total normal site traffic.

All these requests were for the endos of a non-existent nation ("undefined"), so it was totally broken and doing nothing but keeping the server busy.

This is only speculation, but considering that the lag on 7-8 January started after the 2-minute warning (when people don't generally continue to cross) and indeed, after the "go" trigger, I don't see why people would have been using a cross feature at that time.

Zizou
Diplomat
 
Posts: 564
Founded: Aug 23, 2018
Ex-Nation

Postby Zizou » Thu Jan 20, 2022 4:17 am

[violet] wrote:This might be about right. There was a Storm going haywire yesterday that started sending 2 requests per second, then a couple of minutes later it was sending 4 requests per second, then 10, and eventually 130 requests per second, which is roughly equivalent to 100% of total normal site traffic.

All these requests were for the endos of a non-existent nation ("undefined"), so it was totally broken and doing nothing but keeping the server busy.

That would be because someone was cross endorsing, and then switched to cross endorsing off of another nation that had fewer endorsements than the script had already made API calls. If that happens, the script will just send bogus requests to undefined forever. If the person was also spamming C to try and get the endorsements done as fast as possible, combining that with the behavior described in my other post, the script will just keep hitting the server with requests at ridiculous rates until something is done to stop it.

Merni wrote:
[violet] wrote:This might be about right. There was a Storm going haywire yesterday that started sending 2 requests per second, then a couple of minutes later it was sending 4 requests per second, then 10, and eventually 130 requests per second, which is roughly equivalent to 100% of total normal site traffic.

All these requests were for the endos of a non-existent nation ("undefined"), so it was totally broken and doing nothing but keeping the server busy.

This is only speculation, but considering that the lag on 7-8 January started after the 2-minute warning (when people don't generally continue to cross) and indeed, after the "go" trigger, I don't see why people would have been using a cross feature at that time.

The problem is that the API requests don't stop when people stop crossing. The API requests are made in the background while the person is crossing. If something like I described above happened, the requests would not stop until the person did something to make them stop, such as closing their browser or uninstalling the extension.
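As a sketch of how this class of bug is usually guarded against (this is not Storm's actual code, and every name below is hypothetical), the idea is to allow only one active loop at a time and to stop as soon as the list of targets is exhausted, rather than requesting a target that does not exist. Shown in Python for brevity:

```python
from typing import Optional


class CrossEndorser:
    """Hypothetical single-loop endorser; NOT Storm's actual implementation."""

    def __init__(self) -> None:
        self.targets: list[str] = []
        self.index = 0
        self.generation = 0  # bumped on every start/stop to invalidate old loops

    def start(self, targets: list[str]) -> int:
        """Begin crossing on a new endorser list, replacing any earlier run."""
        self.generation += 1
        self.targets = list(targets)
        self.index = 0  # reset so a shorter list cannot be overrun
        return self.generation

    def stop(self) -> None:
        """Invalidate every running loop."""
        self.generation += 1
        self.targets = []
        self.index = 0

    def next_target(self, generation: int) -> Optional[str]:
        """Next nation to query, or None if this loop is stale or finished."""
        if generation != self.generation or self.index >= len(self.targets):
            return None  # caller should cancel its timer instead of requesting
        target = self.targets[self.index]
        self.index += 1
        return target


endorser = CrossEndorser()
old = endorser.start(["a", "b", "c", "d"])
endorser.next_target(old)
endorser.next_target(old)
new = endorser.start(["x"])  # switch to a nation with fewer endorsers
results = [
    endorser.next_target(old),  # stale loop gets nothing
    endorser.next_target(new),  # current loop gets its one target
    endorser.next_target(new),  # list exhausted, loop should stop
]
print(results)
```

With a guard like this, switching nations mid-cross leaves the old loop with nothing to request, and a finished list ends the loop instead of producing requests for "undefined".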
Last edited by Zizou on Thu Jan 20, 2022 4:23 am, edited 2 times in total.
