Package Details: hadoop 3.3.2-1

Git Clone URL: (read-only)
Package Base: hadoop
Description: Hadoop - MapReduce implementation and distributed filesystem
Upstream URL:
Licenses: Apache
Submitter: sjakub
Maintainer: severach (12eason)
Last Packager: severach
Votes: 80
Popularity: 0.48
First Submitted: 2009-04-07 16:39 (UTC)
Last Updated: 2022-04-18 03:37 (UTC)

Dependencies (4)

Sources (9)

Latest Comments

severach commented on 2022-04-20 16:06 (UTC)

I don't use hadoop so I can't help make it better. I only grabbed it because at one time there was a threat that orphaned packages would get deleted. Become a co-maintainer or take the package.

gnaggnoyil commented on 2022-04-19 15:05 (UTC)

I agree with @siavoshkc that the current behavior of /usr/bin/hadoop directly sourcing /etc/profile.d/*.sh does not fit Arch Linux well. I think AUR/hadoop should therefore carry some patches to eliminate the resulting errors.

lllf commented on 2021-05-05 01:53 (UTC)

ERROR: Failure while downloading

URL no longer exists.

sicalxy commented on 2021-04-22 16:42 (UTC) (edited on 2021-04-22 16:44 (UTC) by sicalxy)

[hadoop-conf]: EnvironmentFile for *.service doesn't work

EnvironmentFile should not source /etc/profile.d/ scripts. It expects just a key-value text file (although it has some shell script features), as mentioned in the man page [systemd.exec(5)]:

The text file should contain new-line-separated variable assignments. Empty lines, lines without an "=" separator, or lines starting with ; or # will be ignored.
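As a sketch of the distinction (the file names here are hypothetical, not the package's actual paths): a profile.d script uses shell syntax such as export, while EnvironmentFile= needs flat KEY=VALUE lines. One workaround is to flatten the script in a subshell:

```shell
# hadoop-profile.sh stands in for a /etc/profile.d/*.sh drop-in.
# Sourcing it in a subshell and dumping the environment yields the plain
# KEY=VALUE lines that systemd's EnvironmentFile= can actually parse.
sh -c '. ./hadoop-profile.sh; env' | grep '^HADOOP_' > hadoop.env
```

The generated file can then be pointed to from the unit with EnvironmentFile= instead of sourcing shell scripts at runtime.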

sicalxy commented on 2021-04-22 16:41 (UTC)

[hadoop-profile]: old option should be updated

- export HADOOP_SLAVES=/etc/hadoop/slaves
+ export HADOOP_WORKERS=/etc/hadoop/workers

siavoshkc commented on 2021-01-22 19:26 (UTC)

There seems to be a bug in PKGBUILD and consequently in /usr/bin/hadoop.

When a .sh file is placed in /etc/profile.d, /etc/profile itself should be sourced to put the .sh files in /etc/profile.d into effect.

In the current hadoop 3.3.0-1, there is a loop that tries to source each .sh file in the profile.d directory. Because the files in profile.d depend on the /etc/profile script, this leads to errors such as 'append_path: command not found'.

Resolution: Change the loop which starts at line 111 and ends at line 113 to: . /etc/profile
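A minimal sketch of that resolution, assuming the loop in /usr/bin/hadoop is the usual profile.d iteration (the exact loop body is an assumption):

```shell
# Before: sourcing the drop-ins directly fails, because helpers such as
# append_path are defined in /etc/profile itself, not in the drop-ins:
#   for f in /etc/profile.d/*.sh; do . "$f"; done

# After: source /etc/profile, which defines those helpers and then sources
# the /etc/profile.d drop-ins itself:
. /etc/profile
```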

Musikolo commented on 2020-04-12 16:39 (UTC)

Hi @qsdrqs,

I don't know how to help you with your question about Yarn, but if you want to find the systemd services available, you can do as follows:

[musikolo@MyPc ~]$ pacman -Ql hadoop | grep 'service$'
hadoop /usr/lib/systemd/system/hadoop-datanode.service
hadoop /usr/lib/systemd/system/hadoop-jobtracker.service
hadoop /usr/lib/systemd/system/hadoop-namenode.service
hadoop /usr/lib/systemd/system/hadoop-secondarynamenode.service
hadoop /usr/lib/systemd/system/hadoop-tasktracker.service

I hope it helps.

qsdrqs commented on 2020-04-01 05:57 (UTC) (edited on 2020-04-01 05:57 (UTC) by qsdrqs)

Hello! How can I start the yarn service through this package? The scripts in hadoop/sbin may not recognize my config in /etc, and I can't find any systemd service on my computer to start it.

Looking forward to your reply!

takaomag commented on 2019-12-04 08:33 (UTC) (edited on 2019-12-04 08:34 (UTC) by takaomag)

When I installed this package with yay, I received the following message in the terminal:

yay -S --needed --noconfirm --noprogressbar hadoop
==> Removing existing $srcdir/ directory...
==> Extracting sources...
  -> Extracting hadoop-3.2.1.tar.gz with bsdtar
==> Sources are ready.
removing Untracked AUR files from cache...
:: Cleaning (1/1): /var/lib/x-aur-helper/.cache/yay/hadoop
Removing hadoop-3.2.1.tar.gz
Can not find package name : []

I did not modify the PKGBUILD. Does anyone know a solution?

dxxvi commented on 2017-06-07 05:08 (UTC) (edited on 2017-06-07 06:07 (UTC) by dxxvi)

How do I start this hadoop? I try:

sudo systemctl start hadoop-datanode hadoop-jobtracker hadoop-namenode hadoop-secondarynamenode hadoop-tasktracker

then check their status:

systemctl status hadoop-datanode hadoop-jobtracker hadoop-namenode hadoop-secondarynamenode hadoop-tasktracker

All of them failed. The jobtracker has this line: Error: JAVA_HOME is not set and could not be found.

Self-answer:
JAVA_HOME error: see the Hadoop ArchWiki.
Unable to start namenode and datanode: see the Hadoop ArchWiki on formatting a new distributed filesystem and on editing core-site.xml and hdfs-site.xml.
jobtracker and tasktracker cannot start: running the commands in hadoop-jobtracker.service and hadoop-tasktracker.service under the hadoop account shows the reasons (12eason also mentioned that).

12eason commented on 2017-03-14 22:45 (UTC)

First thing, hdfs, mapred, container-executor, rcc and yarn all need to be linked to /usr/bin along with hadoop. hdfs especially has a lot of the functions previously done by hadoop. Secondly, the hadoop package provides shell scripts under sbin/ to start and stop instances, and these would be less prone to breakage if used in the systemd scripts. As it is, many commands systemd uses are deprecated.
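A hypothetical PKGBUILD sketch of the first suggestion; the install prefix /usr/lib/hadoop/bin is an assumption based on the layout discussed in these comments:

```shell
# package() fragment: link each remaining tool from the install prefix
# into /usr/bin inside the packaging root ($pkgdir).
package() {
  install -dm755 "${pkgdir}/usr/bin"
  for tool in hadoop hdfs mapred yarn rcc container-executor; do
    ln -sf "/usr/lib/hadoop/bin/${tool}" "${pkgdir}/usr/bin/${tool}"
  done
}
```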

nmiculinic commented on 2017-03-11 17:47 (UTC)

There's a mirror problem for hadoop:

==> Making package: hadoop 2.7.3-1 (Sat Mar 11 18:48:07 CET 2017)
==> Retrieving sources...
  -> Downloading hadoop-2.7.3.tar.gz...
Warning: Transient problem: HTTP error. Will retry in 3 seconds. 3 retries left.

flipflop97 commented on 2016-11-21 13:36 (UTC)

Can you symlink /usr/lib/hadoop/bin/mapred to /usr/bin/mapred?

severach commented on 2016-09-13 19:07 (UTC) (edited on 2016-09-13 19:18 (UTC) by severach)

I'm looking to save time for others, not myself. The problem is that xz is very useful in the repos, where traffic reduction is worth any cost. xz is counterproductive on the AUR.

petronny commented on 2016-09-13 03:40 (UTC)

Hi, I found your discussion about the PKGEXT. But have you ever tried compressing the package in parallel (by setting 'xz -T0' in /etc/makepkg.conf)? I got:

.pkg.tar.xz: 530% cpu, 26.731s with a CPU E5-2660 0 @ 2.20GHz

I think it may take much less time on an i3/i5/i7 CPU.

ael commented on 2016-07-18 08:47 (UTC)

`hadoop-jobtracker.service` makes use of the command `/usr/bin/hadoop jobtracker`, which is deprecated. The output of that command suggests using the new yarn command.

severach commented on 2016-02-19 18:22 (UTC)

time makepkg -scCf # E3-1245v1

.pkg.tar: 5 seconds, 326MB
.pkg.tar.gz: 13 seconds, 207MB
.pkg.tar.xz: 88 seconds, 188MB

Saving 120MB is worth 8 seconds. Saving 20MB is not worth 75 seconds. I didn't measure decompression time. gz is so fast that it is often faster than not compressing.
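For reference, these are the makepkg.conf knobs behind this tradeoff; the values shown are examples, not this package's settings:

```shell
# /etc/makepkg.conf excerpt (example values):
PKGEXT='.pkg.tar.gz'            # fast compression for throwaway local builds
#PKGEXT='.pkg.tar.xz'           # smallest package, slowest to create
COMPRESSXZ=(xz -c -z - -T0)     # parallel xz, as petronny suggested
```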

jsivak commented on 2016-02-19 13:17 (UTC)

The build worked; I'll submit the issue to the pacaur dev. Thanks

severach commented on 2016-02-19 10:49 (UTC)

If that works, file a bug with pacaur. pacaur can't handle packages made with PKGEXT.

jsivak commented on 2016-02-19 02:51 (UTC) (edited on 2016-02-19 13:24 (UTC) by jsivak)

Edit: Just re-read the Changelog... I think pacaur 4.5.3 fixes/addresses this issue.

I'm using pacaur and am getting these messages when trying to install the package:

==> Checking for packaging issue...
==> WARNING: backup entry file not in package : etc/hadoop/fair-scheduler.xml
==> WARNING: backup entry file not in package : etc/hadoop/mapred-queue-acls.xml
==> WARNING: backup entry file not in package : etc/hadoop/mapred-site.xml
==> WARNING: backup entry file not in package : etc/hadoop/masters
==> WARNING: backup entry file not in package : etc/hadoop/taskcontroller.cfg
==> WARNING: backup entry file not in package : etc/hadoop/
==> Creating package "hadoop"...
  -> Generating .PKGINFO file...
  -> Generating .BUILDINFO file...
  -> Adding install file...
  -> Generating .MTREE file...
  -> Compressing package...
==> Leaving fakeroot environment.
==> Finished making: hadoop 2.7.2-1 (Thu Feb 18 21:45:22 EST 2016)
==> Cleaning up...
:: Installing hadoop package(s)...
:: hadoop package(s) failed to install. Check .SRCINFO for mismatching data with PKGBUILD.

Any suggestions? Thanks

monksy commented on 2015-08-13 05:30 (UTC)

From starting the jobtracker and tasktracker services I'm getting the following error. What has happened to those services?

sudo -u hadoop hadoop jobtracker
DEPRECATED: Use of this script to execute mapred command is deprecated. Instead use the mapred command for it.
Sorry, the jobtracker command is no longer supported. You may find similar functionality with the "yarn" shell command.
Usage: mapred [--config confdir] [--loglevel loglevel] COMMAND
       where COMMAND is one of:
  pipes                run a Pipes job
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  classpath            prints the class path needed for running mapreduce subcommands
  historyserver        run job history servers as a standalone daemon
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  hsadmin              job history server admin interface
Most commands print help when invoked w/o parameters.

Jesse2004 commented on 2014-11-15 01:40 (UTC)

@roheim Hi, thanks for fixing the /etc issue! Now there's another problem: I installed the package but get an "Error: JAVA_HOME is not set and could not be found" when the hadoop command is run. It seems that JAVA_HOME is no longer set by /etc/profile.d/.

skywalker1993 commented on 2014-11-02 13:52 (UTC)

The url is not found, as are the others in the PKGBUILD file.

roheim commented on 2014-10-29 02:25 (UTC)

@confusedfla: done

confusedfla commented on 2014-10-29 02:10 (UTC)

you could add polkit as dependency - it is needed for the systemd services :)

roheim commented on 2014-09-29 10:49 (UTC)

I will do the change ASAP.

contradictioned commented on 2014-09-24 13:08 (UTC)

@Jesse2004: In my work with hadoop I formed the habit of having folders like /etc/hadoop/hadoop-0.22, /etc/hadoop/hadoop-1.0.4, /etc/hadoop/hadoop-2.0.2-testing, etc., so that you can easily mount different alternative configurations. This carried over to this package.

jakebailey commented on 2014-09-24 12:54 (UTC)

@Jesse2004: There isn't a reason for it. It shouldn't be there, but the package maintainer hasn't fixed it (read their comment after mine after I mentioned this 8 months ago).

Jesse2004 commented on 2014-09-24 12:51 (UTC)

Why are the configuration files located at /etc/hadoop/hadoop/* and not /etc/hadoop/*? What is the extra level for?

roheim commented on 2014-01-19 09:52 (UTC)

@zikaeroh: I have a src file ready for upload that fixes your problem. But I am getting an error when uploading.

jakebailey commented on 2014-01-13 06:52 (UTC)

Ignore the comment about the /lib/hadoop directory, it's just a symlink. Should have checked that beforehand. However, moving everything from /etc/hadoop/hadoop to /etc/hadoop fixes all the issues I have with running pig with hadoop (as opposed to the local mode).

jakebailey commented on 2014-01-13 06:28 (UTC)

According to the wiki (and the environment variables the package sets), all of the configs and other directories should be at /etc/hadoop/, but it's actually at /etc/hadoop/hadoop/. Should this be different? Also, there are two hadoop folders in /lib/, one as "hadoop" and the other as "hadoop-2.20". Is that intentional?

contradictioned commented on 2013-05-24 11:00 (UTC)

Indeed the problem was an old pacman. Now installation works again. Regarding the second hint: Serious cleanup would be nice, I think a major rewrite even better. But I'm not quite sure how much this would break old installations on updates.

xgdgsc commented on 2013-05-24 03:32 (UTC)

@contradictioned Please follow the suggestions here: I think you may not be using the latest pacman.

contradictioned commented on 2013-05-22 19:05 (UTC)

Hi, sorry but I cannot confirm the "permission denied" problem. Directory /usr/lib/hadoop-1.1.2 is owned and only writable by 'root', but the installation works (tested on a snapshot of a fresh Arch installation in a VM).

commented on 2013-05-22 13:17 (UTC)

==> Starting package()... cp: cannot create directory ‘/usr/lib/hadoop-1.1.2’: Permission denied

xgdgsc commented on 2013-05-21 01:05 (UTC)

Sorry, clicked at the wrong place.

xgdgsc commented on 2013-05-21 01:03 (UTC)

cp: cannot create directory ‘/usr/lib/hadoop-1.1.2’: Permission denied

commented on 2013-05-08 19:03 (UTC)

The current PKGBUILD fails with:

==> Starting package()...
cp: cannot create directory ‘/usr/lib/hadoop-1.1.2’: Permission denied

commented on 2013-04-05 08:04 (UTC)

Please update this package to 1.1.2

contradictioned commented on 2013-02-12 16:25 (UTC)

So far... I updated the package to 1.1.1, added systemd support (so much thanks to MarkusH!) and added apache-ant as dependency. If you have more input for me, please ask :)

contradictioned commented on 2013-02-11 21:39 (UTC)

MarkusH: Looks nice, but you seem to have forgotten to update the MD5 sum of your conf.diff. Others: Sorry for forgetting my duty here. I'm on it right now.

MarkusH commented on 2013-02-06 22:16 (UTC)

I added a bunch of improvements regarding the Systemd unit files. See

contradictioned commented on 2012-12-05 11:02 (UTC)

@EiyuuZack: I'm gonna update that. (And for current reasons I will add systemd compatibility.)

commented on 2012-12-04 09:17 (UTC)

New 1.1.1 release is out, any chance you can update it? Also the current source link is dead.

ytj commented on 2012-10-13 17:51 (UTC)

@alperkanat Could you explain why should I place the binaries into /usr/share/hadoop/bin rather than /usr/bin?

karabaja4 commented on 2012-04-01 02:30 (UTC)

dodolee, and others having similar problems: try adding options=(!strip) to the PKGBUILD. This seems to be an issue only with i686 builds.
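For anyone hitting this, the workaround is a one-line PKGBUILD addition (a sketch; the rest of the PKGBUILD is unchanged):

```shell
# PKGBUILD fragment: tell makepkg not to run strip over the packaged files,
# since the prebuilt task-controller binary is not in a format strip accepts.
options=('!strip')
```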

commented on 2012-01-27 14:06 (UTC)

The following error occurs while running makepkg:

==> Tidying install...
  -> Purging unwanted files...
  -> Compressing man and info pages...
  -> Stripping unneeded symbols from binaries and libraries...
strip: ./usr/share/hadoop/bin/task-controller: File format not recognized
==> ERROR: Makepkg was unable to build hadoop.

cgueret commented on 2012-01-18 14:28 (UTC)

Please replace the jre/jdk dependencies by "java-runtime"/"java-environment"
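A sketch of that change in PKGBUILD terms; any other entries in these arrays are omitted here:

```shell
# PKGBUILD fragment: depend on the generic virtual packages so any JRE/JDK
# provider satisfies the dependency, rather than one specific jre/jdk package.
depends=('java-runtime')
makedepends=('java-environment')
```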

karabaja4 commented on 2012-01-09 16:04 (UTC)

Adopted and updated.

krevedko commented on 2011-08-16 13:50 (UTC)

--- PKGBUILD 2010-09-02 08:03:56.000000000 +0400
+++ PKGBUILD.fixed 2011-08-16 17:48:42.000000000 +0400
@@ -13,7 +13,7 @@
 makedepends=('jdk' 'apache-ant')
 optdepends=('kfs')
 conflicts=('hadoop-svn')
 install=hadoop.install
-source=("${pkgver}/hadoop-${pkgver}.tar.gz")
+source=("${pkgver}/hadoop-${pkgver}.tar.gz")
 md5sums=('ec0f791f866f82a7f2c1319a54f4db97')

krevedko commented on 2011-08-16 12:47 (UTC)

403: Forbidden.

alperkanat commented on 2011-04-18 11:50 (UTC)

In fact you're right. The hadoop project doesn't seem to provide a redirecting (generic) URL that directs you to the closest mirror, so in this case you may prefer a more central server instead of a US one.

alperkanat commented on 2011-04-18 11:49 (UTC)

how about this one:

sjakub commented on 2011-04-17 22:51 (UTC)

And what url would you prefer?

alperkanat commented on 2011-04-17 22:51 (UTC)

why do you prefer to place the binaries into /usr/share/hadoop/bin rather than /usr/bin or at least /usr/local/bin ?

alperkanat commented on 2011-04-17 22:47 (UTC)

can you please change the package url to a more generic link since it's very slow for european and asian users?

nitralime commented on 2010-12-19 13:53 (UTC)

Isn't it better to change the dependency on "lzo" to "lzo2"?