Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
Created attachment 230043
In my overlay...
As with apache-hadoop, I've reworked this package a bit to make it more straightforward to use in Gentoo.
It is in my overlay as dev-lang/apache-pig-bin
Please check it out if you can.
Closing obsolete proposal. Clearly, packaging the Hadoop ecosystem is too much for the manpower we have, I'm afraid.