Package Details: apache-spark 3.5.1-1

Git Clone URL: https://aur.archlinux.org/apache-spark.git (read-only)
Package Base: apache-spark
Description: A unified analytics engine for large-scale data processing
Upstream URL: http://spark.apache.org
Keywords: spark
Licenses: Apache
Submitter: huitseeker
Maintainer: aakashhemadri
Last Packager: aakashhemadri
Votes: 57
Popularity: 0.015223
First Submitted: 2015-10-04 09:31 (UTC)
Last Updated: 2024-05-07 17:40 (UTC)

Latest Comments

ryukinix commented on 2020-01-17 23:12 (UTC) (edited on 2020-01-17 23:13 (UTC) by ryukinix)

Updating to Spark 3.0.0-preview2 makes it work with Python 3.8. I'm using this modified version of the PKGBUILD: https://github.com/ryukinix/apache-spark-pkgbuild. So far it's working fine.

The v2.4.5 tag was published on GitHub 5 days ago, but the compiled binaries are not yet available in the Apache Spark archive, which is why I used 3.0.0-preview2, the most recent version available there.

Both v2.4.5 and v3.0.0 contain the commit that fixes the Python 3.8 problems: https://github.com/apache/spark/commit/811d563fbf60203377e8462e4fad271c1140b4fa
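
A minimal sketch of that version bump, assuming a pkgver/_srcver split (makepkg forbids hyphens in pkgver) and that the archive lists a -bin-without-hadoop tarball for the preview release:

pkgver=3.0.0.preview2   # makepkg does not allow '-' in pkgver
_srcver=3.0.0-preview2  # the upstream tag keeps the hyphen
source=("https://archive.apache.org/dist/spark/spark-${_srcver}/spark-${_srcver}-bin-without-hadoop.tgz")
# regenerate sha1sums after changing source, e.g. with updpkgsums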

YaLTeR commented on 2019-12-13 08:25 (UTC) (edited on 2019-12-13 08:45 (UTC) by YaLTeR)

Doesn't seem to work with Python 3 (the default)?

└─ pyspark
Picked up _JAVA_OPTIONS: -Dawt.useSystemAAFontSettings=on -Dswing.aatext=true
Python 3.8.0 (default, Oct 23 2019, 18:51:26)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "/opt/apache-spark/python/pyspark/shell.py", line 31, in <module>
    from pyspark import SparkConf
  File "/opt/apache-spark/python/pyspark/__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "/opt/apache-spark/python/pyspark/context.py", line 31, in <module>
    from pyspark import accumulators
  File "/opt/apache-spark/python/pyspark/accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/opt/apache-spark/python/pyspark/serializers.py", line 71, in <module>
    from pyspark import cloudpickle
  File "/opt/apache-spark/python/pyspark/cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/opt/apache-spark/python/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)

Update: looks like it's Python 3.8; installing python37 from the AUR and using that seems to work.
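
If you go the python37 route, pyspark can be pointed at it explicitly through Spark's standard PYSPARK_PYTHON variables (a sketch; assumes the AUR package puts a python3.7 binary on PATH):

# use Python 3.7 for both the driver and the executors
export PYSPARK_PYTHON=python3.7
export PYSPARK_DRIVER_PYTHON=python3.7
pyspark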

tir commented on 2019-07-17 10:35 (UTC) (edited on 2019-07-17 10:39 (UTC) by tir)

I had success with the following PKGBUILD (patch: https://git.io/fj1CT).

# Maintainer: François Garillot ("huitseeker") <francois [at] garillot.net>
# Contributor: Christian Krause ("wookietreiber") <kizkizzbangbang@gmail.com>

pkgname=apache-spark
pkgver=2.4.3
pkgrel=1
pkgdesc="fast and general engine for large-scale data processing"
arch=('any')
url="http://spark.apache.org"
license=('APACHE')
depends=('java-environment>=6' 'java-environment<9')
optdepends=('python2: python2 support for pyspark'
            'ipython2: ipython2 support for pyspark'
            'python: python3 support for pyspark'
            'ipython: ipython3 support for pyspark'
            'r: support for sparkR'
            'rsync: support rsync hadoop binaries from master'
            'hadoop: support for running on YARN')
install=apache-spark.install
source=("https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=spark/spark-${pkgver}/spark-${pkgver}-bin-without-hadoop.tgz"
        'apache-spark-master.service'
        'apache-spark-slave@.service'
        'spark-env.sh'
        'spark-daemon-run.sh'
        'run-master.sh'
        'run-slave.sh')
sha1sums=('54bf6a19eb832dc0cf2d7a7465b785390d00122b'
          'ac71d12070a9a10323e8ec5aed4346b1dd7f21c6'
          'a191e4f8f7f8bbc596f4fadfb3c592c3efbc4fc0'
          '3fa39d55075d4728bd447692d648053c9f6b07ec'
          '08557d2d5328d5c99e533e16366fd893fffaad78'
          '323445b8d64aea0534a2213d2600d438f406855b'
          '65b1bc5fce63d1fa7a1b90f2d54a09acf62012a4')
backup=('etc/apache-spark/spark-env.sh')

PKGEXT=${PKGEXT:-'.pkg.tar.xz'}

prepare() {
  cd "$srcdir/spark-${pkgver}-bin-without-hadoop"
}

package() {
        cd "$srcdir/spark-${pkgver}-bin-without-hadoop"

        install -d "$pkgdir/usr/bin" "$pkgdir/opt" "$pkgdir/var/log/apache-spark" "$pkgdir/var/lib/apache-spark/work"
        chmod 2775 "$pkgdir/var/log/apache-spark" "$pkgdir/var/lib/apache-spark/work"

        cp -r "$srcdir/spark-${pkgver}-bin-without-hadoop" "$pkgdir/opt/apache-spark/"

        cd "$pkgdir/usr/bin"
        for binary in beeline pyspark sparkR spark-class spark-shell find-spark-home spark-sql spark-submit load-spark-env.sh; do
                binpath="/opt/apache-spark/bin/$binary"
                ln -s "$binpath" "$binary"
                sed -i 's|^export SPARK_HOME=.*$|export SPARK_HOME=/opt/apache-spark|' "$pkgdir/$binpath"
                sed -i -Ee 's/\$\(dirname "\$0"\)/$(dirname "$(readlink -f "$0")")/g' "$pkgdir/$binpath"
        done

        mkdir -p "$pkgdir/etc/profile.d"
        echo '#!/bin/sh' > "$pkgdir/etc/profile.d/apache-spark.sh"
        echo 'SPARK_HOME=/opt/apache-spark' >> "$pkgdir/etc/profile.d/apache-spark.sh"
        echo 'export SPARK_HOME' >> "$pkgdir/etc/profile.d/apache-spark.sh"
        chmod 755 "$pkgdir/etc/profile.d/apache-spark.sh"

        install -Dm644 "$srcdir/apache-spark-master.service" "$pkgdir/usr/lib/systemd/system/apache-spark-master.service"
        install -Dm644 "$srcdir/apache-spark-slave@.service" "$pkgdir/usr/lib/systemd/system/apache-spark-slave@.service"
        install -Dm644 "$srcdir/spark-env.sh" "$pkgdir/etc/apache-spark/spark-env.sh"
        for script in run-master.sh run-slave.sh spark-daemon-run.sh; do
            install -Dm755 "$srcdir/$script" "$pkgdir/opt/apache-spark/sbin/$script"
        done
        install -Dm644 "$srcdir/spark-${pkgver}-bin-without-hadoop/conf"/* "$pkgdir/etc/apache-spark"

        cd "$pkgdir/opt/apache-spark"
        mv conf conf-templates
        ln -sf "/etc/apache-spark" conf
        ln -sf "/var/lib/apache-spark/work" .
}
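
To try a PKGBUILD like this one, the usual flow is to drop it (and the patched support files) into a checkout of the AUR repo and rebuild; this is the standard makepkg workflow, sketched:

git clone https://aur.archlinux.org/apache-spark.git
cd apache-spark
# replace PKGBUILD and the support files with the versions above, then:
makepkg -si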

There is another, unrelated issue, probably with JLine (as noted here: https://github.com/sanori/spark-sbt/issues/4#issuecomment-401621777), for which the workaround is:

$ TERM=xterm-color spark-shell
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 10.0.2)
Type in expressions to have them evaluated.
Type :help for more information.

lukaszimmermann commented on 2019-07-15 09:05 (UTC)

Hi, sorry for being unresponsive the last couple of months. Emanuel, I added you as co-maintainer; can you perhaps update this package with the new PKGBUILD for Spark 2.4.3?

emanuelfontelles commented on 2019-07-13 17:56 (UTC) (edited on 2019-07-13 17:57 (UTC) by emanuelfontelles)

Hey guys, here is a full PKGBUILD for Spark 2.4.3 with the Hadoop binaries; it totally works.

# Maintainer: François Garillot ("huitseeker") <francois [at] garillot.net>
# Contributor: Christian Krause ("wookietreiber") <kizkizzbangbang@gmail.com>
# Contributor: Emanuel Fontelles ("emanuelfontelles") <emanuelfontelles@hotmail.com>

pkgname=apache-spark
pkgver=2.4.3
pkgrel=1
pkgdesc="fast and general engine for large-scale data processing"
arch=('any')
url="http://spark.apache.org"
license=('APACHE')
depends=('java-environment>=6' 'java-environment<9')
optdepends=('python2: python2 support for pyspark'
            'ipython2: ipython2 support for pyspark'
            'python: python3 support for pyspark'
            'ipython: ipython3 support for pyspark'
            'r: support for sparkR'
            'rsync: support rsync hadoop binaries from master'
            'hadoop: support for running on YARN')

install=apache-spark.install
source=("https://archive.apache.org/dist/spark/spark-${pkgver}/spark-${pkgver}-bin-hadoop2.7.tgz"
        'apache-spark-master.service'
        'apache-spark-slave@.service'
        'spark-env.sh'
        'spark-daemon-run.sh'
        'run-master.sh'
        'run-slave.sh')
sha1sums=('7b2f1be5c4ccec86c6d2b1e54c379b7af7a5752a'
          'ac71d12070a9a10323e8ec5aed4346b1dd7f21c6'
          'a191e4f8f7f8bbc596f4fadfb3c592c3efbc4fc0'
          '3fa39d55075d4728bd447692d648053c9f6b07ec'
          '08557d2d5328d5c99e533e16366fd893fffaad78'
          '323445b8d64aea0534a2213d2600d438f406855b'
          '65b1bc5fce63d1fa7a1b90f2d54a09acf62012a4')
backup=('etc/apache-spark/spark-env.sh')

PKGEXT=${PKGEXT:-'.pkg.tar.xz'}

prepare() {
  cd "$srcdir/spark-${pkgver}-bin-hadoop2.7"
}

package() {
        cd "$srcdir/spark-${pkgver}-bin-hadoop2.7"

        install -d "$pkgdir/usr/bin" "$pkgdir/opt" "$pkgdir/var/log/apache-spark" "$pkgdir/var/lib/apache-spark/work"
        chmod 2775 "$pkgdir/var/log/apache-spark" "$pkgdir/var/lib/apache-spark/work"

        cp -r "$srcdir/spark-${pkgver}-bin-hadoop2.7" "$pkgdir/opt/apache-spark/"

        cd "$pkgdir/usr/bin"
        for binary in beeline pyspark sparkR spark-class spark-shell find-spark-home spark-sql spark-submit load-spark-env.sh; do
                binpath="/opt/apache-spark/bin/$binary"
                ln -s "$binpath" "$binary"
                sed -i 's|^export SPARK_HOME=.*$|export SPARK_HOME=/opt/apache-spark|' "$pkgdir/$binpath"
                sed -i -Ee 's/\$\(dirname "\$0"\)/$(dirname "$(readlink -f "$0")")/g' "$pkgdir/$binpath"
        done

        mkdir -p "$pkgdir/etc/profile.d"
        echo '#!/bin/sh' > "$pkgdir/etc/profile.d/apache-spark.sh"
        echo 'SPARK_HOME=/opt/apache-spark' >> "$pkgdir/etc/profile.d/apache-spark.sh"
        echo 'export SPARK_HOME' >> "$pkgdir/etc/profile.d/apache-spark.sh"
        chmod 755 "$pkgdir/etc/profile.d/apache-spark.sh"

        install -Dm644 "$srcdir/apache-spark-master.service" "$pkgdir/usr/lib/systemd/system/apache-spark-master.service"
        install -Dm644 "$srcdir/apache-spark-slave@.service" "$pkgdir/usr/lib/systemd/system/apache-spark-slave@.service"
        install -Dm644 "$srcdir/spark-env.sh" "$pkgdir/etc/apache-spark/spark-env.sh"
        for script in run-master.sh run-slave.sh spark-daemon-run.sh; do
            install -Dm755 "$srcdir/$script" "$pkgdir/opt/apache-spark/sbin/$script"
        done
        install -Dm644 "$srcdir/spark-${pkgver}-bin-hadoop2.7/conf"/* "$pkgdir/etc/apache-spark"

        cd "$pkgdir/opt/apache-spark"
        mv conf conf-templates
        ln -sf "/etc/apache-spark" conf
        ln -sf "/var/lib/apache-spark/work" .
}
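
If your mirror serves a tarball that no longer matches the sha1sums above, the checksums can be regenerated before building (updpkgsums is in the pacman-contrib package):

updpkgsums   # rewrites the checksum array from the downloaded sources
makepkg -si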

marcinn commented on 2019-06-14 20:46 (UTC) (edited on 2019-06-14 20:46 (UTC) by marcinn)

PKGBUILD patch for 2.4.3:

diff --git a/PKGBUILD b/PKGBUILD
index 3540d3e..e31fec9 100644
--- a/PKGBUILD
+++ b/PKGBUILD
@@ -2,7 +2,7 @@
 # Contributor: Christian Krause ("wookietreiber") <kizkizzbangbang@gmail.com>

 pkgname=apache-spark
-pkgver=2.4.0
+pkgver=2.4.3
 pkgrel=1
 pkgdesc="fast and general engine for large-scale data processing"
 arch=('any')
@@ -17,14 +17,14 @@ optdepends=('python2: python2 support for pyspark'
             'rsync: support rsync hadoop binaries from master'
             'hadoop: support for running on YARN')
 install=apache-spark.install
-source=("https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=spark/spark-${pkgver}/spark-${pkgver}-bin-without-hadoop.tgz"
+source=("https://archive.apache.org/dist/spark/spark-${pkgver}/spark-${pkgver}-bin-without-hadoop.tgz"
         'apache-spark-master.service'
         'apache-spark-slave@.service'
         'spark-env.sh'
         'spark-daemon-run.sh'
         'run-master.sh'
         'run-slave.sh')
-sha1sums=('ce6fe98272b78a5c487d16f0f5f828908b65d7fe'
+sha1sums=('54bf6a19eb832dc0cf2d7a7465b785390d00122b'
           'ac71d12070a9a10323e8ec5aed4346b1dd7f21c6'
           'a191e4f8f7f8bbc596f4fadfb3c592c3efbc4fc0'
           '3fa39d55075d4728bd447692d648053c9f6b07ec'
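
A sketch of applying the diff to a checkout of this AUR repo (the patch filename here is hypothetical):

cd apache-spark         # clone of https://aur.archlinux.org/apache-spark.git
git apply 2.4.3.patch   # hypothetical name for the diff above
makepkg -si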

sabitmaulanaa commented on 2019-05-27 14:23 (UTC)

Will this package be updated?

lukaszimmermann commented on 2019-04-26 09:29 (UTC)

Works, thanks!

blurbdust commented on 2019-04-26 00:31 (UTC)

Here is my current PKGBUILD, which works for the latest Spark as of 2019-04-23:

https://gist.github.com/blurbdust/3e03531d890c8bcf1da3c3d1192ce4d5

lukaszimmermann commented on 2019-04-25 09:14 (UTC)

I would be happy to become maintainer of this package, then I can take care of some open issues here.