Bug 523412 - =sys-cluster/spark-bin-2.3.1: a fast and general engine for large-scale data processing
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: All Linux
Importance: Normal enhancement with 2 votes
Assignee: Java team
URL: http://spark.apache.org/
Whiteboard:
Keywords: EBUILD
Depends on:
Blocks: 599282
 
Reported: 2014-09-22 03:00 UTC by James Horton
Modified: 2018-11-05 09:05 UTC
CC List: 6 users

See Also:
Package list:
Runtime testing required: ---


Attachments
spark-1.1.0.ebuild: Apache-spark "in-memory" cluster engine (1.37 KB, text/plain), 2014-09-22 12:44 UTC, James Horton
spark-1.1.0.ebuild: compiles now; finer-grain options need to be explored (1.20 KB, text/plain), 2014-09-23 00:54 UTC, James Horton
spark-1.5.2.ebuild: sys-cluster/spark-1.5.2 (1.28 KB, text/plain), 2016-08-08 05:16 UTC, James
spark-bin-2.3.1.ebuild: Ebuild for 2.3.1 (1.16 KB, text/plain), 2018-09-21 00:35 UTC, Alec Ten Harmsel

Description James Horton 2014-09-22 03:00:34 UTC
Apache Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing (http://spark.apache.org/)

I have an ebuild, but it is not building the Scala components. If one of the
Java herd members could look it over, it should be easy to fix.

After unpacking (using my ebuild), this file contains the build
instructions:

/var/tmp/portage/sys-cluster/spark-1.1.0/work/spark-1.1.0/README.md

It says:
You can find the latest Spark documentation, including a programming
guide, on the project webpage at <http://spark.apache.org/documentation.html>.
This README file only contains basic setup instructions.

## Building Spark

Spark is built on Scala 2.10. To build Spark and its example programs, run:

    ./sbt/sbt assembly


From the "work" dir I can manually runs this command and compile the 
sources:

"./sbt/sbt assembly"  and "./make-distribution.sh"

but my (newly hacked) ebuild needs a wee bit o' fixin':


Here it is:

# Copyright 1999-2014 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $

EAPI=5
PYTHON_COMPAT=( python2_7 )
JAVA_PKG_IUSE="doc examples source"
JAVA_ANT_DISABLE_ANT_CORE_DEP="yes"

inherit eutils  autotools versionator multilib flag-o-matic 

MY_PV=${PV/_/}

DESCRIPTION="Spark supports cyclic data flows and in-memory computing"
HOMEPAGE="http://spark.apache.org/"

SRC_URI="http://www.apache.org/dist/spark/spark-1.1.0/${P}.tgz" 

LICENSE="Apache-2.0"
SLOT="0"
KEYWORDS="~amd64 ~x86"
IUSE="java python scala"

DEPEND="net-misc/curl
        dev-libs/cyrus-sasl
                python? ( dev-lang/python dev-python/boto )
                java? ( virtual/jdk )
                scala? ( dev-lang/scala )
                dev-java/maven-bin "

RDEPEND=" python? ( dev-lang/python )
                  >=virtual/jdk-1.6
                  scala? ( dev-lang/scala )
                  dev-java/maven-bin
                ${DEPEND}"

S="${WORKDIR}/${P}"

ECONF_SOURCE="${S}"

src_prepare() {
    mkdir "${S}/build" || die " making dir problem"
}
src_configure() {
    cd "${S}" || die "source configure problem"
    econf \
        $(use_enable python) \
        $(use_enable java) \
        $(use_enable scala) 
}

src_compile() {
    cd "${S}/build"
        ./make-distribution.sh || die "problem with make-distribution.sh"
        ./sbt/sbt assembly || die "assembly build failed"
}

src_install() {
    cd "${S}/build"
    emake DESTDIR="${D}" install || die "emake install failed"
}


Any help is greatly appreciated. I'm learning java, python and scala
and intend to build more packages!


James
Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2014-09-22 08:24:05 UTC
Could you attach that ebuild as a file, please?
Comment 2 James Horton 2014-09-22 12:44:14 UTC
Created attachment 385280 [details]
spark-1.1.0.ebuild  Apache-spark "in-memory" cluster engine.

It's new, rough and ugly!
Comment 3 James Horton 2014-09-23 00:54:40 UTC
Created attachment 385318 [details]
spark-1.1.0.ebuild   compiles now. Finer grain options need to be explored.

It compiles and installs now.

Need testers.
Comment 4 Rick Moritz 2015-01-07 19:52:39 UTC
Compilation error under ~amd64 with hardened profile, with USE="scala java -python"

 * Messages for package sys-cluster/spark-1.1.0:

 * ERROR: sys-cluster/spark-1.1.0::gentoo failed (compile phase):
 *   problem with make-distribution.sh
 *
 * Call stack:
 *     ebuild.sh, line  93:  Called src_compile
 *   environment, line 3054:  Called die
 * The specific snippet of code:
 *       ./make-distribution.sh || die "problem with make-distribution.sh"
 *

maybe linked to java version "1.7.0_71" ?

It got as far as

[info] Done packaging.
[success] Total time: 1557 s, completed Jan 7, 2015 5:57:32 PM


It seems that this interactive bit around line 135 in make-distribution.sh is the culprit:

 read -p "Would you like to continue anyways? [y,n]: " -r
  if [[ ! $REPLY =~ ^[Yy]$ ]]; then
    echo "Okay, exiting."
    exit 1
  fi

I'll see if I can remove that bit (a rough sed sketch follows below). Also, potentially adding a Java 1.6 dependency to the ebuild might be a good idea for the time being.
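
For illustration, something like this in src_prepare might drop the prompt non-interactively (untested sketch; the pattern and line count are assumptions based on the snippet quoted above):

    src_prepare() {
        # Hypothetical: delete the interactive "continue anyways?" prompt
        # (the read plus its 4-line if-block quoted above) so the build
        # never stops to wait for keyboard input.
        sed -i '/Would you like to continue anyways/,+4d' make-distribution.sh \
            || die "failed to patch make-distribution.sh"
    }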
Comment 5 Rick Moritz 2015-01-08 15:20:34 UTC
Sorry for all the typos in the previous comment.

Anyway, adding --skip-java-test to the make-distribution.sh call has allowed me to get through to the install phase, where make died (see the sketch at the end of this comment):

>>> Source compiled.
>>> Test phase [not enabled]: sys-cluster/spark-1.1.0

>>> Install spark-1.1.0 into /var/tmp/portage/sys-cluster/spark-1.1.0/image/ category sys-cluster
make -j3 DESTDIR=/var/tmp/portage/sys-cluster/spark-1.1.0/image/ install
make: *** No rule to make target 'install'.  Stop.
 * ERROR: sys-cluster/spark-1.1.0::gentoo failed (install phase):
 *   emake failed


I will see if I can find a workaround for that as well...
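
As a rough sketch of that workaround (untested; it assumes make-distribution.sh is run from ${S} rather than the empty build directory, and the install phase still needs rework):

    src_compile() {
        cd "${S}" || die
        # --skip-java-test suppresses the interactive Java version prompt
        ./make-distribution.sh --skip-java-test || die "make-distribution.sh failed"
    }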
Comment 6 Rick Moritz 2015-01-08 15:34:47 UTC
There appears to be no Makefile anywhere within the work directory, the build directory is completely empty.
I'm not sure where the Makefile that emake uses to install is supposed to come from, but it's just not there.
Comment 7 James Horton 2015-01-14 15:48:04 UTC
(In reply to Rick Moritz from comment #6)
> There appears to be no Makefile anywhere within the work directory, the
> build directory is completely empty.
> I'm not sure where the Makefile that emake uses to install is supposed to
> come from, but it's just not there.


Background: yes, I am still learning about ebuild creation and optimization, so
do not be surprised by any mistakes you encounter. I think it may be missing
the definition for the Makefile or whatever. I compiled it from within
the sources, and probably changed into the dir where the various make files
are. It took me days to get it to compile, so I'd be surprised if I did not
miss something fundamental in the ebuild.

hth,
James
Comment 8 James Horton 2015-01-14 16:07:43 UTC
(In reply to James Horton from comment #7)
> (In reply to Rick Moritz from comment #6)
> > There appears to be no Makefile anywhere within the work directory, the
> > build directory is completely empty.
> > I'm not sure where the Makefile that emake uses to install is supposed to
> > come from, but it's just not there.


Ok, here are the basic instructions:

spark-1.1.0/README.md:

"## Online Documentation

You can find the latest Spark documentation, including a programming
guide, on the project webpage at <http://spark.apache.org/documentation.html>.
This README file only contains basic setup instructions.

## Building Spark

Spark is built on Scala 2.10. To build Spark and its example programs, run:

    ./sbt/sbt assembly

(You do not need to do this if you downloaded a pre-built package.) "


I thought I had it configured to run the "./sbt/sbt assembly" command, but
that may be what I missed, in addition to your other points?

hth,
James
Comment 9 Rick Moritz 2015-01-23 10:08:41 UTC
sbt assembly is being called and succeeds in your ebuild already.

The install phase is responsible for moving the assembled bits and pieces into the system tree, so that the spark binary (or a script calling it) ends up in a directory that is included in the PATH variable, in this case for example /usr/bin/.

Currently the ebuild calls
emake DESTDIR="${D}" install || die "emake install failed"

where it appears that neither D is defined nor a Makefile is present which would evaluate it.

The build step, where sbt is called, completes (once the Java issue is taken care of).
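
To make that concrete, a minimal src_install sketch could replace the emake call - assuming make-distribution.sh leaves the assembled tree in ${S}/dist, which may differ between versions:

    src_install() {
        # Copy the assembled distribution into the image...
        insinto /usr/lib/spark
        doins -r dist/*
        # ...restore the executable bit and expose the launcher on PATH.
        fperms 0755 /usr/lib/spark/bin/spark-shell
        dosym ../lib/spark/bin/spark-shell /usr/bin/spark-shell
    }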
Comment 10 Rick Moritz 2015-02-13 14:16:30 UTC
Having looked at this again, the best way to get a working ebuild would probably be to start from the Debian build scripts in the Maven profile that is delivered with Spark.

Although I am not entirely convinced that building from binaries (Maven jars...) is really compatible with the Gentoo way, it would probably be a good starting point in this case, since they should provide some hints with regard to install locations etc.

You could probably use maven flags to download dependency sources and build those too, and map portage actions to maven actions and vice versa, to obtain a working ebuild.

As soon as I find some more time, I'll also have a look at this myself.
Comment 11 James Horton 2015-02-13 18:34:38 UTC
> Having looked at this again, the best way to get a working ebuild would 
> probably be to start from the Debian build scripts in the Maven profile 
> that is delivered with Spark.

I just posted to gentoo-user about using deb or rpm packages to test
new code. It is a prelude to converting those packages to ebuilds:

Subject: rpm or deb package installs


> Although I am not entirely convinced that building from binaries 
> (Maven jars...) is really compatible with the Gentoo way, it would probably 
> be a good starting point in this case, since they should provide some 
> hints with regard to install locations etc.

Yep, my apache-mesos ebuild_hack is pretty much stuck on this too: BGO 510912.

> You could probably use maven flags to download dependency sources and build
> those too, and map portage actions to maven actions and vice versa, to obtain
> a working ebuild. As soon as I find some more time, I'll also have a look at this

Please? I'm just an old hack, working on comprehending the devmanual and gentoo-dev. When I get more confidence (moxy?) I'll probably be a proxy_maintainer_wannabe......

gentoo wiki has this page:
http://wiki.gentoo.org/wiki/RPM

but not this page:

http://wiki.gentoo.org/wiki/dpkg


to match what is in portage. Defining a methodology for hacks to
use deb or rpm packages to end up with ebuilds would make it a viable
path to more proxy maintainers, imho. Keep me posted.

Another old reference link: 
http://www.gentoo-wiki.info/TIP_install_programs_without_portage

James
Comment 12 Patrice Clement gentoo-dev 2015-07-13 13:57:03 UTC
Hi James

I've looked into this yesterday, out of curiosity, as I maintain a bunch of Scala ebuilds already and Spark "sparked" my interest. Thanks for the ebuild you've put together. Unfortunately, we can't afford to hand the compile process over to a shell script (I'm referring to make-distribution.sh). IMHO, Spark being a serious competitor to Hadoop, it's a tragedy that such a big open source project relies on a shell script to compile, especially when you have tools such as Ant, Gradle, Maven and sbt to do the grunt work for you. Anyway.. We should try to work out what's going on in this shell script and either come up with our own Ant build.xml (we already have eclasses to control Ant - but not Maven yet, unfortunately) or rewrite the shell script logic in the ebuild directly.

@cluster: Anyone from your team interested in adding Spark to the tree?
Comment 13 Rick Moritz 2015-07-13 15:25:02 UTC
> Unfortunately, we can't afford handing the compiling
> process over to a shell script (I'm refering to make-distribution.sh)

I built my most recent Spark distribution directly via Maven - so I assume that make-distribution.sh doesn't really do a whole lot (anymore). Most of the variables it sets should be set already, others ought to be set in the ebuild instead.

One more thing: with Spark 1.4 they added a build option to link against external Hadoop libs (-Phadoop-provided), which we should probably set as the default, and either link against an already installed Hadoop or pull in Hadoop dependencies via portage. That way most of the build options for the Hadoop version can also be inferred from the installation environment.
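
A hedged example of such an invocation (profile names are assumptions and depend on the Spark and Hadoop versions targeted):

    # build without bundling Hadoop, expecting the system to provide it
    mvn -Phadoop-provided -DskipTests package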

For a first draft, we can probably do without cluster-installation (i.e. push the image to HDFS, in case of YARN-based setup), but simply put the spark shell scripts and jars and conf in the proper places for
Comment 14 James Horton 2015-07-13 19:48:59 UTC
@Patrice. I'm a miserable hack when it comes to all things java/scala/maven....

So you are totally encouraged to take the build and do something correct with
it. Just drop me a line where the work is and I'll test it and provide feedback.
Once you are done I'll proxy maintain it, if you like as I do follow the 
mesos/spark/tachyon/zookeeper codes.....


hth,
James
Comment 15 James Horton 2015-07-13 19:53:25 UTC
@ rick


For spark, my goal is to run on top of a mesos cluster. Each system will have
btrfs and cephfs for the DFS. I'm trying to completely avoid HDFS and the
limitations therein. Do keep this in mind as you extend the ebuilds so that
an apache-spark ebuild can be flexible for a variety of configurations.

I have mesos running on amd64 in bgo too, if you are interested in that.

If you are investigating spark, then you definitely should read up on
tachyon and storm (both apache projects now). It would be wonderful for folks
that actually know java/scala to take over these ebuilds. I will help
and test as you like.

James
Comment 16 Patrice Clement gentoo-dev 2015-07-15 09:07:48 UTC
(In reply to Rick Moritz from comment #13)
> I built my most recent Spark distribution directly via Maven - so I assume
> that make-distribution.sh doesn't really do a whole lot (anymore). Most of
> the variables it sets should be set already, others ought to be set in the
> ebuild instead.
> 

Thanks for the hint. Indeed, Spark can be built directly using Maven. Which means we can ask Maven to generate Ant build.xml using the command: mvn ant:ant

And it does work!

$ ls -lsd maven-build.xml */maven-build.xml
  4 -rw-r--r-- 1 monsieurp monsieurp   3057 Jul 15 10:04 assembly/maven-build.xml
380 -rw-r--r-- 1 monsieurp monsieurp 385800 Jul 15 10:03 bagel/maven-build.xml
472 -rw-r--r-- 1 monsieurp monsieurp 479891 Jul 15 10:03 core/maven-build.xml
740 -rw-r--r-- 1 monsieurp monsieurp 754969 Jul 15 10:04 examples/maven-build.xml
384 -rw-r--r-- 1 monsieurp monsieurp 390980 Jul 15 10:03 graphx/maven-build.xml
200 -rw-r--r-- 1 monsieurp monsieurp 201938 Jul 15 10:02 launcher/maven-build.xml
 12 -rw-r--r-- 1 monsieurp monsieurp  12168 Jul 15 10:02 maven-build.xml
460 -rw-r--r-- 1 monsieurp monsieurp 470763 Jul 15 10:03 mllib/maven-build.xml
452 -rw-r--r-- 1 monsieurp monsieurp 461160 Jul 15 10:04 repl/maven-build.xml
464 -rw-r--r-- 1 monsieurp monsieurp 472567 Jul 15 10:03 streaming/maven-build.xml
376 -rw-r--r-- 1 monsieurp monsieurp 383608 Jul 15 10:03 tools/maven-build.xml
 44 -rw-r--r-- 1 monsieurp monsieurp  43365 Jul 15 10:02 unsafe/maven-build.xml

The bad news is.. Maven pulls down a trillion jars from the Web and I bet half of them are not in Portage. :/

Thanks Rick, that's a good start.
Comment 17 Rick Moritz 2015-07-15 09:39:30 UTC
I wouldn't be quite so pessimistic - most of the dependencies are pretty standard and should be available on systems where hadoop, hive and scala build, at least for hadoop-centric builds. Although there are a few slightly esoteric dependencies, and obviously the usual versioning hell.

Additionally, the different profiles are going to be a challenge.

We might be able to reduce the scope somewhat, by e.g. not building examples or some of the external connectors, if that helps reduce the delta between portage and what's required to build.

Thanks for the heads-up Patrice.

PS: I just looked at the dependency list/tree for the profile I recently built, and despaired - is there a decent automated way to diff out what would need to be added, and what can be pulled from Portage already?
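
One rough starting point (hypothetical paths; the artifact-to-ebuild name mapping would still have to be done by hand):

    # dump every Maven artifact Spark pulls in into one aggregated file
    mvn dependency:list -DoutputFile=/tmp/spark-deps.txt -DappendOutput=true
    # list what is already installed from dev-java/ for manual comparison
    ls /var/db/pkg/dev-java/ | sort > /tmp/dev-java-installed.txt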
Comment 18 Alec Ten Harmsel 2015-07-20 18:04:51 UTC
make-distribution.sh is a wrapper around Maven that builds the distribution tarballs, documentation and all that jazz. For some insane reason, the developers maintain both the Maven build and the sbt build.

Rick Moritz:
> Additionally, the different profiles are going to be a challenge.

Building with Hadoop support could be avoided for initial versions. Running `mvn dependency:tree` for Spark gives ~3550 lines of deps (of which probably 500 are 'pretty printing' around the dependencies, and some of those are duplicates in other modules), and running `mvn dependency:tree` for Hadoop gives >4000 lines (of which, as with Spark, a fair amount is not deps). I piped it into less and counted over 40 pages of dependencies.
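
Those counts can be reproduced with nothing fancier than (numbers will differ per profile and version):

    mvn dependency:tree | wc -l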

I use/administer Hadoop and Spark on a daily basis and can answer questions about them, but I probably will not have time to help with ebuilds and testing ebuilds until September.
Comment 19 James Horton 2015-08-25 16:17:26 UTC
GOOD NEWS::
Spark downloads are at:: http://spark.apache.org/downloads.html 
Spark-1.5 may be a much better organized set of code; at least that
is what I read. So::

"It's not yet a final release, but there's a preview:

http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-ANNOUNCE-Spark-1-5-0-preview-package-td13683.html

Building Spark from sources isn't too hard, there's a `make-distribution.sh` script in the root directory. There are a few parameters (like the dependency Hadoop version), but it should be fairly straight forward. More info here:

http://spark.apache.org/docs/latest/building-spark.html  "

Just might make supporting apache-spark on gentoo easier?

James
Comment 20 James Horton 2015-09-10 20:33:03 UTC
Spark-1.5 has just been released:: 
https://spark.apache.org/releases/spark-release-1-5-0.html

Spark-1.5  boasts a new build script using maven that is
better organized for building from sources::
http://spark.apache.org/docs/latest/building-spark.html


James
Comment 21 James Horton 2015-11-17 17:22:02 UTC
In an email, Patrice Clement (java dev) suggested to me:: 

"As to Spark, since Apache Spark devs have opted for Maven, we can turn Maven pom.xml files into Ant build.xml files and that would make compiling Spark a cakewalk."

I like this statement very, very much. I also have no clue as to how to do this.
Hopefully, some 'java-capable' person at least outlines how to go about doing this?


hth,
James
Comment 22 James 2016-08-08 05:16:43 UTC
Created attachment 442748 [details]
sys-cluster/spark-1.5.2.ebuild

Ok,

So sys-cluster/spark-1.5.2.ebuild  compiles but fails the install phase. Here is the latest error::


>>> Source compiled.
>>> Test phase [not enabled]: sys-cluster/spark-1.5.2

>>> Install spark-1.5.2 into /var/tmp/portage/sys-cluster/spark-1.5.2/image/ category sys-cluster
!!! doexe: /usr/local/portage/sys-cluster/spark/files/1.5.2/beeline does not exist
 * ERROR: sys-cluster/spark-1.5.2::jackslap failed (install phase):
 *   doexe failed
 * 
 * If you need support, post the output of `emerge --info '=sys-cluster/spark-1.5.2::jackslap'`,
 * the complete build log and the output of `emerge -pqv '=sys-cluster/spark-1.5.2::jackslap'`.
 * The complete build log is located at '/var/log/portage/sys-cluster:spark-1.5.2:20160808-060340.log'.
 * For convenience, a symlink to the build log is located at '/var/tmp/portage/sys-cluster/spark-1.5.2/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/sys-cluster/spark-1.5.2/temp/environment'.
 * Working directory: '/var/tmp/portage/sys-cluster/spark-1.5.2/work/spark-1.5.2'
 * S: '/var/tmp/portage/sys-cluster/spark-1.5.2/work/spark-1.5.2'
 * QA Notice: file does not exist:
 * 
 * 	doexe: /usr/local/portage/sys-cluster/spark/files/1.5.2/beeline does not exist



But it is using bundled code, so it needs a bit of additional work too. My next step, after it installs, is to get the latest stable release of Apache Spark (2.0) at least compiling, installing, and properly vetted against Gentoo standards for tree acceptance. Then make all of this proper for tree inclusion.

Guidance? Anyone?

James
Comment 23 Benda Xu gentoo-dev 2016-08-30 22:33:51 UTC
(In reply to James from comment #22)

> But it is using bundled code, so it needs a bit of additional work too. My
> next step, after it installs, is to get the latest stable release of
> Apache Spark (2.0) at least compiling, installing, and properly vetted
> against Gentoo standards for tree acceptance. Then make all of this proper
> for tree inclusion.
> 
> Guidance? Anyone?

Thanks for your work. What is your plan to deal with the bundled code?

Last time I tried, maven downloaded 2.5GiB of artifacts into my $HOME/.m2 during build.
Comment 24 James 2016-09-01 19:59:44 UTC
@Benda

> What is your plan to deal with the bundled code? Last time I tried, maven
> downloaded 2.5GiB of artifacts into my $HOME/.m2 during build.

Spark is up to version 2.0 (https://spark.apache.org/). It's pretty easy to
hack at the ebuild that exists already. Probably much has changed, and it may be easier to create a new ebuild. Java is just not my area of expertise. You are encouraged to take apache/spark and modify the ebuild I posted to get version 2.0 working.

Much has changed with the Java setup on Gentoo since I last hacked at this package. If I remember correctly, I had a problem with the (S) source directory parameter on Gentoo not being defined consistently with how the Apache folks package up Spark, or something to that effect.

I do not have time to work on apache-spark; besides it really needs a java hacker to work on it.


Good luck!
hth,
James
Comment 25 Benda Xu gentoo-dev 2016-10-08 02:33:03 UTC
FYI, the 0.1 release of java-ebuilder, able to generate the dependency
tree of Maven artifacts (of course including org.apache.spark:spark-core), is in the Gentoo repository as "app-portage/java-ebuilder".

You are invited to give it a try.

The generated maven overlay is at

  https://github.com/heroxbd/maven-overlay
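
To give it a try, installation is just the usual (sketch; the tool's own invocation is documented upstream):

    emerge --ask app-portage/java-ebuilder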
Comment 26 Benda Xu gentoo-dev 2018-07-02 04:34:37 UTC
Any updates on this bug?
Comment 27 Alec Ten Harmsel 2018-07-03 10:57:24 UTC
I can take a look at updating this ebuild to the latest version (spark 2.3.1 was released June 8, 2018) if there's interest in having this in the tree. However, I don't use Spark anymore on a regular basis.

Using https://github.com/heroxbd/maven-overlay looks interesting - always cool stuff going on in Gentoo.
Comment 28 James 2018-07-31 00:11:35 UTC
Spark 2.3.1 released (Jun 08, 2018)

Hello Alec,

Sure I'd appreciate an update to Spark. 

James
Comment 29 Alec Ten Harmsel 2018-09-21 00:35:18 UTC
Created attachment 547449 [details]
Ebuild for 2.3.1

Here's an ebuild that works - I tested a few examples in the pyspark shell and they completed fine.

There are some repoman warnings that I'm not sure how to take care of:

RepoMan scours the neighborhood...
  ebuild.absdosym               6
   sys-cluster/spark-bin/spark-bin-2.3.1.ebuild: dosym '/usr/lib/spark/bin/beeline'... could use relative path on line: 43
   sys-cluster/spark-bin/spark-bin-2.3.1.ebuild: dosym '/usr/lib/spark/bin/pyspark'... could use relative path on line: 44
   sys-cluster/spark-bin/spark-bin-2.3.1.ebuild: dosym '/usr/lib/spark/bin/spark-class'... could use relative path on line: 45
   sys-cluster/spark-bin/spark-bin-2.3.1.ebuild: dosym '/usr/lib/spark/bin/spark-shell'... could use relative path on line: 46
   sys-cluster/spark-bin/spark-bin-2.3.1.ebuild: dosym '/usr/lib/spark/bin/spark-sql'... could use relative path on line: 47
   sys-cluster/spark-bin/spark-bin-2.3.1.ebuild: dosym '/usr/lib/spark/bin/spark-submit'... could use relative path on line: 48

I could not get the symlinks to work without absolute paths.
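
For what it's worth, repoman's absdosym warning is usually silenced by making the target relative to the directory holding the link, e.g. (untested against this ebuild):

    # /usr/bin/beeline -> ../lib/spark/bin/beeline
    dosym ../lib/spark/bin/beeline /usr/bin/beeline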

The env.d file is just "SPARK_HOME=/usr/lib/spark" - these files are also at https://github.com/trozamon/overlay/tree/master/sys-cluster/spark-bin.
Comment 30 Patrice Clement gentoo-dev 2018-11-03 23:28:37 UTC
Hi Alec

Thank you so much for writing this ebuild. I gave it a try. Using your ebuild, Spark installs just fine. However I run into a weird segfault upon starting Spark with "spark-shell". How did you manage to get Spark running? Your help is much appreciated.
Comment 31 Alec Ten Harmsel 2018-11-03 23:45:06 UTC
(In reply to Patrice Clement from comment #30)
> Hi Alec
> 
> Thank you so much for writing this ebuild. I gave it a try. Using your
> ebuild, Spark installs just fine. However I run into a weird segfault upon
> starting Spark with "spark-shell". How did you manage to get Spark running?
> Your help is much appreciated.

You'll need to make sure to run `env-update && source /etc/profile` - I just realized that's not in the ebuild, so I'll have to look up how to add that.
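
One common way to surface that from the ebuild itself is a pkg_postinst message (a sketch; the wording is arbitrary):

    pkg_postinst() {
        elog "Run 'env-update && source /etc/profile' (or log in again) so that"
        elog "SPARK_HOME from /etc/env.d is set before starting spark-shell."
    }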

What version of Java are you using (I have icedtea-3.9.0)? Was there a stack trace and/or what was the text of the error?

For reference, here's the snippet I run in spark-shell to test that it's working:

    val passwd = sc.textFile("/etc/passwd")
    passwd.map(line => line.split(":")(2).toInt).reduce((a, b) => a + b)
Comment 32 Patrice Clement gentoo-dev 2018-11-04 11:18:26 UTC
Thanks Alec! I finally got Spark working. I've made a few tweaks here and there to the ebuild. If you concur, I'm going to rename the ebuild "spark-bin". Last but not least: would you like to co-maintain Spark with the Java team?
Comment 33 Alec Ten Harmsel 2018-11-04 20:23:17 UTC
(In reply to Patrice Clement from comment #32)
> I finally got Spark working. I've made a few tweaks here and
> there to the ebuild. If you concur, I'm going rename the ebuild "spark-bin".

Great to hear! Yes, tweaks and the name change are great.

> Last but not least: would you like to co-maintain Spark with the Java team?

Yes, that sounds wonderful.
Comment 34 Larry the Git Cow gentoo-dev 2018-11-05 09:05:24 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4ba0088e91ae8fcdc0d1f6cb4d499ba5ddabcb3c

commit 4ba0088e91ae8fcdc0d1f6cb4d499ba5ddabcb3c
Author:     Patrice Clement <monsieurp@gentoo.org>
AuthorDate: 2018-11-05 08:43:55 +0000
Commit:     Patrice Clement <monsieurp@gentoo.org>
CommitDate: 2018-11-05 09:05:10 +0000

    sys-cluster/spark-bin: new package.
    
    Apache Spark is a unified analytics engine for large-scale data
    processing.
    
    Closes: https://bugs.gentoo.org/523412
    Signed-off-by: Patrice Clement <monsieurp@gentoo.org>
    Package-Manager: Portage-2.3.49, Repoman-2.3.11

 sys-cluster/spark-bin/Manifest               |  1 +
 sys-cluster/spark-bin/files/99spark          |  1 +
 sys-cluster/spark-bin/metadata.xml           | 16 ++++++++
 sys-cluster/spark-bin/spark-bin-2.3.1.ebuild | 60 ++++++++++++++++++++++++++++
 4 files changed, 78 insertions(+)