October 24, 2012 at 12:00 PM
I put together a scraping bot for a website I visit from time to time, but they got wise to my Python shenanigans after I released the source code (who'd have thought?), so I had to step up my bot code a bit.
After some tinkering I found that they had just blacklisted the user agent header I was using for the script, instead of doing something more effective like setting access timers between page requests, or getting strict on referer headers, or just banning my account.
But I digress...
Since all they did was ban the user agent string I was using before, all I had to do was change it, and I was back in business. But in the long run this isn't really the best solution, since they could always just ban the new user agent string too. So instead I decided to throw together a quick Python function that generates a randomized, realistic-looking user agent header.
Gaze and behold!
    import random

    def get_random_useragent():
        # Fill a Firefox-on-Windows-XP template with random version numbers.
        base_agent = ("Mozilla/%.1f (Windows; U; Windows NT 5.1; en-US; rv:%.1f.%.1f) "
                      "Gecko/%d0%d Firefox/%.1f.%.1f")
        return base_agent % (
            random.random() + 5,                      # Mozilla version
            random.random() + random.randint(1, 8),   # first half of rv
            random.random(),                          # second half of rv
            random.randint(2000, 2100),               # Gecko build, leading digits
            random.randint(92215, 99999),             # Gecko build, trailing digits
            random.random() + random.randint(3, 9),   # first half of Firefox version
            random.random())                          # second half of Firefox version

    >>> print(get_random_useragent())
    Mozilla/5.2 (Windows; U; Windows NT 5.1; en-US; rv:18.104.22.168) Gecko/2009098692 Firefox/22.214.171.124
    >>> print(get_random_useragent())
    Mozilla/5.5 (Windows; U; Windows NT 5.1; en-US; rv:126.96.36.199) Gecko/2006095233 Firefox/188.8.131.52
    >>> print(get_random_useragent())
    Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:184.108.40.206) Gecko/2064093484 Firefox/220.127.116.11
    >>> print(get_random_useragent())
    Mozilla/5.6 (Windows; U; Windows NT 5.1; en-US; rv:18.104.22.168) Gecko/2063099117 Firefox/22.214.171.124
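For anyone wondering how this plugs into an actual request, here is a rough sketch. It assumes Python 3's standard `urllib.request` rather than whatever the bot itself uses, and both the `build_request` helper and the `http://example.com/` URL are placeholders made up for illustration. The request is only built, never sent.

```python
import random
import urllib.request


def get_random_useragent():
    # Same template and random ranges as the function above.
    base_agent = ("Mozilla/%.1f (Windows; U; Windows NT 5.1; en-US; rv:%.1f.%.1f) "
                  "Gecko/%d0%d Firefox/%.1f.%.1f")
    return base_agent % (
        random.random() + 5,
        random.random() + random.randint(1, 8),
        random.random(),
        random.randint(2000, 2100),
        random.randint(92215, 99999),
        random.random() + random.randint(3, 9),
        random.random())


def build_request(url):
    # Hypothetical helper: attach a freshly generated User-Agent to each request.
    headers = {"User-Agent": get_random_useragent()}
    return urllib.request.Request(url, headers=headers)


# Placeholder URL; nothing is sent over the network here.
request = build_request("http://example.com/")

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

Calling `build_request` once per page means every request goes out with a different string, so a ban on any single one of them no longer matters.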
I hope someone out there besides me can find a use for this.