OT Crawling

Truthy

Chatbot
Joined
Oct 9, 2010
Messages
2,203
I didn't have much going on this weekend, so I did a little information retrieval project: I wrote a short scraper that traverses the Off-Topic section and grabs metadata. I don't want it to be a burden on the website, so I added a random delay of several seconds between requests. My plan is to do one traversal and then make a thread showing my findings (long-term posting/activity trends and community detection).
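For anyone curious what "a random delay between every request" looks like in practice, here is a minimal sketch, not the actual scraper. The function name `polite_crawl`, the `fetch` callable, and the 3 to 8 second bounds are assumptions made up for illustration; the only part taken from the post is the idea of sleeping a random few seconds between requests.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Tuple


def polite_crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 3.0,   # assumed lower bound, in seconds
    max_delay: float = 8.0,   # assumed upper bound, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, str]]:
    """Fetch each URL in order, pausing a random interval between requests."""
    for i, url in enumerate(urls):
        if i > 0:
            # Wait a random number of seconds before every request after the first.
            sleep(random.uniform(min_delay, max_delay))
        yield url, fetch(url)
```

Passing `fetch` and `sleep` in as arguments keeps the pacing logic separate from the HTTP client, so it can be tried out with stand-ins before pointing it at a real site.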

I can't find anything in the rules prohibiting or discouraging crawling (unless it involves spam), nor can I find a robots.txt file that lays out the policy. I'd argue this isn't any more invasive or disruptive than search engine indexing, which you guys have no problems with, I'm sure. Regardless, I worry site staff might be miffed if I just went ahead and did it, or that my IP would get banned, or that I'd annoy mods by using a proxy. So is this ok? Does the site have a robots/scraping/crawling policy?
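For reference, when a site does publish a robots.txt, checking a URL against it takes only a few lines with Python's standard `urllib.robotparser`. This is a rough sketch under assumptions: the helper name `allowed`, the user agent `ot-stats-bot`, the `forum.example` URLs, and the sample rules are all made up for illustration and are not this site's policy.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, only to show the shape of the check.
sample_rules = "User-agent: *\nDisallow: /private/\n"
print(allowed(sample_rules, "ot-stats-bot", "https://forum.example/threads/1"))
print(allowed(sample_rules, "ot-stats-bot", "https://forum.example/private/x"))
```

An empty or missing robots.txt is conventionally read as "nothing is disallowed", which is not the same thing as the staff saying yes, hence the question.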

To be clear, I'm not trying to create bots to spam or DoS the site or anything like that. I'm just interested in gathering publicly available metadata purely out of curiosity, without overloading the site with excessive requests. I assume many OTers would find the results interesting.
 
I doubt this proposal would burden the server. However, please PM me examples of the data that you would collect and post. Thanks!
 
Alas, I was asked not to due to potential privacy or GDPR stuff.

Is the OT composed of "roving like gangs" and cliques? How have posting trends changed over time? How severe is posting inequality? What percentage of all likes were given by @hobbsyoyo?

We may never know :p
 
Aww that's too bad.

Yeah I can't think of a way that you could anonymize that data and still have it be fun/meaningful for posters.
 
I'm just interested in gathering publicly available metadata purely out of curiosity
Tell me so I can have my account deleted and prosecute you if I ever find out who you are.
 
Actually, the one who's not being nice is you.
 
This is why we can't have nice graphs
I believe that the answer is ‘you don't have the right to make such a graph because my metadata is mine. That is why I block Facebook and similar companies from gathering my data.’

Ergo, you're the one who's not being nice.
 