Truthy
Chatbot
Joined: Oct 9, 2010
Messages: 2,203
I didn't have much going on this weekend, so I did a little information retrieval project: I wrote a short scraper that can traverse the Off-Topic section and grab metadata. I don't want it to be a burden on the website, so I added a random delay of several seconds between requests. My plan is to do one traversal and then make a thread showing my findings (long-term posting/activity trends and community detection).
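For anyone curious what the throttling looks like, here is a minimal sketch of the idea, not the actual scraper. It assumes a breadth-first traversal; the names `polite_crawl`, `fetch`, and `extract_links`, and the 3 to 8 second delay window, are hypothetical and only there for illustration.

```python
import random
import time
from collections import deque


def polite_crawl(start_url, fetch, extract_links,
                 min_delay=3.0, max_delay=8.0, max_pages=100,
                 sleep=time.sleep):
    """Breadth-first traversal that pauses a random interval between requests.

    fetch(url) returns a page and extract_links(page) returns an iterable of
    URLs; both are supplied by the caller (hypothetical helpers).
    """
    seen = {start_url}
    queue = deque([start_url])
    results = {}
    while queue and len(results) < max_pages:
        url = queue.popleft()
        if results:
            # Wait before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        page = fetch(url)
        results[url] = page
        for link in extract_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results
```

Passing `fetch` and `sleep` in as arguments keeps the loop easy to dry-run against fake pages, so nothing hits the real site until the traversal logic is known to behave.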
I can't find anything in the rules prohibiting or discouraging crawling (unless it involves spam), nor can I find a robots.txt file that lays out the policy. I'd argue this isn't any more invasive or disruptive than search engine indexing, which I'm sure you guys have no problem with. Regardless, I worry that site staff might be miffed if I just went ahead and did it, that my IP would get banned, or that I'd annoy mods by using a proxy. So is this OK? Does the site have a robots/scraping/crawling policy?
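If the site did publish a robots.txt, a scraper could check it before each request. The sketch below uses Python's standard `urllib.robotparser`; the rules in `SAMPLE_ROBOTS`, the `my-scraper` user agent, and the `forum.example` URLs are all made up for illustration and say nothing about this site's actual policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 5
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)
# Ask whether a given user agent may fetch a given URL.
allowed = parser.can_fetch("my-scraper", "https://forum.example/off-topic/")
blocked = parser.can_fetch("my-scraper", "https://forum.example/admin/")
# Crawl-delay is a non-standard directive; crawl_delay() returns None if absent.
delay = parser.crawl_delay("my-scraper")
```

With no rules at all, `can_fetch` allows everything, which is why a missing file leaves the question of policy open rather than answering it.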
To be clear, I'm not trying to create bots to spam or DoS the site or anything like that. I'm just interested in gathering publicly available metadata purely out of curiosity, without overloading the site with excessive requests. I assume many OTers would find the results interesting.