Posts

Showing posts from July, 2011

Use Hive to catch grabbers

Get the logs from the farm via Flume and syslog, then MapReduce them in Hive to extract, per IP, the requests per second, the bytes transferred and the items requested, and compare the result with "human" profiles. The data is also picked up on the fly via SQLstream and processed back into Oracle; from there a load balancer could fetch the IPs for a smooth redirect, and I feed the data into a graphing system (connections from that IP). Hourly I check geolocation, whois and provider, using Pig Latin. Ready for first testing in our labs. And, of course, not a really performant task (yet) ;-)
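To make the "compare with human profiles" step a bit more concrete, here is a minimal sketch in plain Python rather than Hive. It is not the actual job: the record layout `(ip, timestamp, bytes, item)`, the function names and both thresholds are assumptions made up for illustration, and real limits would have to come from observed traffic.

```python
from collections import defaultdict

# Hypothetical "human" profile limits, chosen only for this example.
HUMAN_MAX_REQ_PER_SEC = 2.0
HUMAN_MAX_DISTINCT_ITEMS = 50


def profile_by_ip(records):
    """Aggregate (ip, timestamp_seconds, bytes, item) records per IP."""
    stats = defaultdict(
        lambda: {"hits": 0, "bytes": 0, "items": set(), "first": None, "last": None}
    )
    for ip, ts, nbytes, item in records:
        s = stats[ip]
        s["hits"] += 1
        s["bytes"] += nbytes
        s["items"].add(item)
        # Track the first and last time this IP was seen.
        s["first"] = ts if s["first"] is None else min(s["first"], ts)
        s["last"] = ts if s["last"] is None else max(s["last"], ts)
    return stats


def find_grabbers(records):
    """Return the sorted IPs whose profile exceeds the assumed human limits."""
    suspects = []
    for ip, s in profile_by_ip(records).items():
        # Treat very short sessions as one second to avoid dividing by zero.
        duration = max(s["last"] - s["first"], 1)
        rate = s["hits"] / duration
        if rate > HUMAN_MAX_REQ_PER_SEC or len(s["items"]) > HUMAN_MAX_DISTINCT_ITEMS:
            suspects.append(ip)
    return sorted(suspects)
```

In Hive the same idea would roughly be a `GROUP BY` on the IP column with `COUNT`, `SUM` and `COUNT(DISTINCT ...)` aggregates; the resulting suspect list is what would be written back for the load balancer to pick up.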