Compromised Moderator Login; your data is safe though

Okay, short update: Besides some minor issues, the logic and code for moving all the posts which could be identified via the Internet Archive are working.
The second challenge now lies in identifying the rest and where they should go. A mixture of the time stamps of the posts and the users should make it possible to separate a good part of them, I would think. Will be an ugly hack though.
 
A big thanks to you and the rest of the team for all the work you’re doing, and for keeping us updated! And good luck with all the remaining work.
 
I'm right now looking at the data. Overall there are 22625 posts.
There are 10490 which are from current or former moderators, so they probably belong to the internal threads which got merged. The exception is Leoreth, who has 1447 of these 10490 posts, because his modding thread got merged.
From the remaining 12135 posts only 3499 can be assigned to a thread based on the internet archive.
I'll now check whether the time stamps of the rest are unambiguous.
 
I currently have an assignment for 18963 out of 22625 posts.
Of the remaining 3663, 1062 belong to Leoreth, 2319 belong to 599 other users, and the last 282 seem to be deleted posts which I also pulled into the analysis by accident :blush:.
This is under the assumption that the 4690 posts which are assigned based on a unique username come from users who did not post in any of the 21 threads which were not archived in the Internet Archive. As an assumption, that is a bit shaky.
I'll now see if I can put things together based on quotes in the thread.
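
To illustrate the unique-username step described above, here is a minimal sketch. It is not the actual script; the function name and the data layout (plain tuples of usernames, threads and post IDs) are assumptions made for the example. A post is only assigned if its author has so far been seen in exactly one thread.

```python
from collections import defaultdict

def assign_by_unique_username(assigned, unassigned):
    """assigned: iterable of (username, thread) pairs for posts already placed.
    unassigned: iterable of (post_id, username) pairs.
    Returns {post_id: thread} for authors seen in exactly one thread."""
    threads_per_user = defaultdict(set)
    for user, thread in assigned:
        threads_per_user[user].add(thread)

    result = {}
    for post_id, user in unassigned:
        # .get avoids creating empty entries for unknown users
        threads = threads_per_user.get(user, set())
        if len(threads) == 1:
            result[post_id] = next(iter(threads))
    return result
```

Users who posted in two or more known threads, or in none, are skipped. The sketch cannot detect the weak spot mentioned above: a user who also posted in one of the non-archived threads still looks unique here.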

And then I need to let the bot itself run and split the monster thread... based on a rough calculation this should take.... ugh... 55 hours, as there are waiting times each time the forum gets loaded.
 
I hate you lol.
I'll start running the bot later tonight probably without the quote chains, as I want to get some of the stuff finally recovered. The remainder I can still sort out later.
 
The bot moved 5387 posts overnight, of which 3143 are for the Enhanced User Interface by @bc1 (158 pages out of 175), and so far 1889 are for @Leoreth's "Dawn Of Civ" (95 pages out of 290). At this point I stopped the bot because I have to use this computer right now :lol:, I'll restart it before I go to bed.
 
Import MSN auto-moderator algorithms, and you're on the way to self-obsolescence. "The forums practically run themselves!" :P
 
The bot ran through the night and didn't report any issues, but I see it didn't split a part.
I'll let it run again overnight, maybe it had to do with the connectivity of the forum, you never know.

It also seems I misidentified a thread, so instead of 69 pages which need to be recovered for one thread, it is only 1. That one is not archived, so I'll need to guess at the end.

I've already made one thread public again, as that was an ongoing game and only 3 pages. For the other threads I'll need to do some spot-checks first before I do anything with them.
 
I've just moved 4 threads with 450 pages in total back to the public.
Right now I have internally 40 pages of the Civ7 steam stats thread, which is far from being complete. The others were at least 2/3 complete or more.

As the remainder includes a very big thread from the moderator forum, I've programmed the bot right now to separate the posts from regular members from the posts of (former) moderators. I don't have any idea how many that will still be, but the thread with the separated posts is now at 60 pages (~1200 posts). After that, we'll need to see how many threads are worth putting back together. E.g. there was a short new thread about the tariffs, and I'll not put much time into getting that one back. There was also one about a sale, that might not be worth the time either. It seems there are at least 2 modding threads in there, which I hope to get back.
Gotta see, also depends on the final amount of posts to still separate.
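
The separation of member posts from (former) moderator posts boils down to a simple partition by author. A minimal sketch, assuming the posts are available as (post_id, username) pairs and the moderator names as a list; both the function name and the data layout are made up for illustration and are not the bot's real code:

```python
def split_by_author_role(posts, moderator_names):
    """posts: iterable of (post_id, username) pairs.
    moderator_names: usernames of current or former moderators.
    Returns (member_post_ids, moderator_post_ids), keeping input order."""
    mods = set(moderator_names)
    member_posts, moderator_posts = [], []
    for post_id, user in posts:
        if user in mods:
            moderator_posts.append(post_id)
        else:
            member_posts.append(post_id)
    return member_posts, moderator_posts
```

Special cases such as a moderator whose own public thread got merged would need an explicit exception list on top of this.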
 
The progress on the restoration was somewhat upended by the currently ongoing issues with the server and the forum, where we're still trying to figure out what's going on.
I've still put some more time into this today, to see if I can get a check based on quote chains implemented, and it's getting there.

quotechain-example.png


This is an example. The dark blue posts are in the Dawn of Civilization thread, the red ones are in the Enhanced User Interface (EUI) thread, and the light blue ones are currently not assigned. All posts which are connected are quoting each other in some way. You can see that e.g. in the case on the right, at the bottom, there is post 1616009, which is currently not assigned to a thread but is quoting, or is being quoted by (direction does not matter), a post in the EUI thread, which is in turn quoted by another post in the EUI thread etc. So... good chance that post 1616009 belongs in the EUI thread.
In some cases, like on the left, there's nothing to be gained, all quoted posts are already in the right thread.
In the case 2nd to the left, none of the posts is assigned, but for longer chains I should check if I can manually see that, assign one, and then get the rest automatically assigned.
Now I need to iteratively go through the data, expand the chains, then check again to collect usernames per thread, see if a username can be clearly assigned to a thread, see if this adds a new chain, expand the chain, repeat.
This will only work for posts up to and including 2024, because after that we have one of the big Civ7 threads, which is not assigned and attracted users from everywhere. Up to 2024 we have only the 2 threads, Dawn Of Civ and EUI, so that is easy. Well, "easy".

In total still 2778 posts need to be somehow assigned (or they might stay in our recycle bin).
1004 are from up to and including 2024, of which approx. 400 can already be assigned based on username. I hope to maybe get to 700, which will leave us with a leftover of about 2000 posts.
Then... no clue :dunno:.
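
The quote-chain idea described above can be sketched as a connected-components search on an undirected graph: every quote is an edge, and each chain inherits a thread only if all of its already assigned posts agree. This is a minimal sketch under assumptions, not the real script. The function name and the input format are hypothetical, and apart from 1616009 the post IDs in the usage example are invented.

```python
from collections import defaultdict, deque

def propagate_by_quotes(quote_edges, known):
    """quote_edges: iterable of (post_id, quoted_post_id); direction is ignored.
    known: {post_id: thread} for posts that are already assigned.
    Returns {post_id: thread} for posts that can be newly assigned."""
    neighbours = defaultdict(set)
    for a, b in quote_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    new_assignments = {}
    for start in list(neighbours):
        if start in seen:
            continue
        # Breadth-first search collects one whole quote chain.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            post = queue.popleft()
            component.append(post)
            for other in neighbours[post]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        # Assign only if the chain points to exactly one thread.
        threads = {known[p] for p in component if p in known}
        if len(threads) == 1:
            thread = next(iter(threads))
            for p in component:
                if p not in known:
                    new_assignments[p] = thread
    return new_assignments

# Hypothetical usage: 1616009 hangs on a chain of EUI posts.
edges = [(1616009, 100), (100, 101), (200, 201)]
known = {100: "EUI", 101: "EUI"}
print(propagate_by_quotes(edges, known))
```

Chains with no assigned post (200 and 201 here) and chains that touch more than one thread are left alone, so they can be checked by hand. For the iterative part, the newly assigned posts would be merged into `known`, the usernames per thread recollected, and the function run again until nothing changes.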
 
It almost sounds like an interesting technical challenge to resolve at this point. If only it was something you would be tackling as a fun project for your own amusement instead of being forced into it by past events.
 
Writing specific code to do this automatically would be too risky, I suppose (?)
At any rate, you are obviously very commendable for cleaning the mess, like a new Herakles with the Augean stables :/
 
He just diverted a river, though. Such a simple and brute force solution would be counter-intuitive for The Jay. Like a major data flush.
 
Just?
 
Well, the main difference is certainly that the content of these challenges is very different, I hope you all agree ;).

It almost sounds like an interesting technical challenge to resolve at this point. If only it was something you would be tackling as a fun project for your own amusement instead of being forced into it by past events.

I'd actually agree.
I'm considering making a "case report" over at Xenforo to explain what we did in the aftermath and to attach some of the scripts, as I think this could be useful to some people.

Writing specific code to do this automatically would be too risky, I suppose (?)

It'll be semi-automatic. I'll write the code to do all of that, but will have multiple stopping points in between to check that the logic works etc.
Might still take a while though, my June is pretty full with travels :/.
 