Package Details: hadoop 2.8.0-1

Git Clone URL: (read-only)
Package Base: hadoop
Description: Hadoop - MapReduce implementation and distributed filesystem
Upstream URL:
Licenses: Apache
Submitter: sjakub
Maintainer: severach (12eason)
Last Packager: severach
Votes: 63
Popularity: 2.136873
First Submitted: 2009-04-07 16:39
Last Updated: 2017-04-29 20:13

Latest Comments

dxxvi commented on 2017-06-07 05:08

How do I start this hadoop? I tried:
sudo systemctl start hadoop-datanode hadoop-jobtracker hadoop-namenode hadoop-secondarynamenode hadoop-tasktracker
then check their status:
systemctl status hadoop-datanode hadoop-jobtracker hadoop-namenode hadoop-secondarynamenode hadoop-tasktracker
All of them failed. The jobtracker has this line:
Error: JAVA_HOME is not set and could not be found.
JAVA_HOME error:
Unable to start namenode and datanode: see the Hadoop ArchWiki page to format a new distributed filesystem and to edit core-site.xml and hdfs-site.xml
jobtracker and tasktracker cannot start: running the commands in hadoop-jobtracker.service and hadoop-tasktracker.service under the hadoop account shows the reasons (12eason also mentioned that).
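
The JAVA_HOME error above usually means the variable is not visible to the service. One possible way to find a candidate value is to resolve the `java` binary back to its installation directory. This is only a sketch: `java_home_from_binary` is a hypothetical helper, not something the package ships, and it assumes GNU `readlink -f` and the usual `<home>/bin/java` layout.

```shell
#!/bin/bash
# Sketch: derive a JAVA_HOME candidate from the path of a java binary.
# java_home_from_binary is a hypothetical helper, not part of the hadoop package.
java_home_from_binary() {
  local java_bin=$1 resolved
  # Follow every symlink to the real binary (GNU readlink).
  resolved=$(readlink -f "$java_bin") || return 1
  # <home>/bin/java -> <home>: strip the last two path components.
  dirname "$(dirname "$resolved")"
}
```

It could be used as `export JAVA_HOME="$(java_home_from_binary "$(command -v java)")"`. The value still has to be placed wherever the service units read their environment from; this thread does not say which file that is.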

12eason commented on 2017-03-14 22:45

First, hdfs, mapred, container-executor, rcc and yarn all need to be linked to /usr/bin along with hadoop. hdfs in particular now handles many of the functions previously done by hadoop.

Secondly, the hadoop package provides shell scripts under sbin/ to start and stop instances, and these would be less prone to breakage if used in the systemd units. As it is, many of the commands the systemd units use are deprecated.
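
The linking requested in the first point could be done roughly as follows. This is a sketch, not the package's actual install step: `link_hadoop_tools` is a hypothetical helper, and the tool list is simply the one named in the comment above.

```shell
#!/bin/bash
# Sketch: symlink Hadoop's command-line tools into a directory on PATH.
# link_hadoop_tools is a hypothetical helper, not part of the hadoop package.
link_hadoop_tools() {
  local src_dir=$1 dest_dir=$2 tool
  for tool in hadoop hdfs mapred container-executor rcc yarn; do
    # Skip tools that this Hadoop build does not ship.
    [ -e "$src_dir/$tool" ] || continue
    # -f replaces an existing link so the function can be re-run safely.
    ln -sf "$src_dir/$tool" "$dest_dir/$tool"
  done
}
```

Run as root, something like `link_hadoop_tools /usr/lib/hadoop/bin /usr/bin` would match the paths mentioned elsewhere in this thread. Inside a PKGBUILD the destination would be under `$pkgdir` instead.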

nmiculinic commented on 2017-03-11 17:47

There are mirror problems for hadoop:

==> Making package: hadoop 2.7.3-1 (Sat Mar 11 18:48:07 CET 2017)
==> Retrieving sources...
-> Downloading hadoop-2.7.3.tar.gz...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
Warning: Transient problem: HTTP error Will retry in 3 seconds. 3 retries
Warning: left.

flipflop97 commented on 2016-11-21 13:36

Can you symlink /usr/lib/hadoop/bin/mapred to /usr/bin/mapred?

severach commented on 2016-09-13 19:07

I'm looking to save time for others, not myself. The problem is that xz is very useful in the repos, where traffic reduction is worth any cost. xz is counterproductive on the AUR.

petronny commented on 2016-09-13 03:40

Hi, I found your discussion about PKGEXT.
But have you ever tried compressing the package in parallel (by setting 'xz -T0' in /etc/makepkg.conf)?

I got
.pkg.tar.xz: 530% cpu 26.731s
with CPU E5-2660 0 @ 2.20GHz

And I think it may take much less time on an i3/i5/i7 CPU.
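
For readers who want to try the parallel setting, the relevant makepkg.conf line might look like this. It is a sketch that assumes the stock `COMPRESSXZ` array; `--threads=0` is the long form of `-T0`.

```shell
# /etc/makepkg.conf (fragment)
# --threads=0 (same as -T0) lets xz use as many threads as there are CPU cores.
COMPRESSXZ=(xz -c -z - --threads=0)
```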

ael commented on 2016-07-18 08:47

`hadoop-jobtracker.service` uses the command `/usr/bin/hadoop jobtracker`, which is deprecated. The output of that command suggests using the new yarn command instead.
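
In Hadoop 2.x, the YARN ResourceManager and NodeManager daemons take over the roles of the old JobTracker and TaskTracker. The mapping can be sketched as below. `yarn_equivalent` is a hypothetical helper for illustration only, and whether the service units should call `yarn` directly or the sbin/ scripts is left open here.

```shell
#!/bin/bash
# Sketch: map a deprecated Hadoop 1.x daemon subcommand to its YARN counterpart.
# yarn_equivalent is a hypothetical helper, not part of the hadoop package.
yarn_equivalent() {
  case "$1" in
    jobtracker)  echo "yarn resourcemanager" ;;
    tasktracker) echo "yarn nodemanager" ;;
    *)           return 1 ;;  # no known replacement in this sketch
  esac
}
```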

Spyhawk commented on 2016-02-19 19:36

@severach> I see, this makes sense. Guess I'll have to find a workaround here. This issue is directly related to the removal of the --pkg option of makepkg in pacman 5.0.0.

severach commented on 2016-02-19 18:22

time makepkg -scCf # E3-1245v1
.pkg.tar: 5 seconds 326MB
.pkg.tar.gz: 13 seconds 207MB
.pkg.tar.xz: 88 seconds 188MB

Saving 120MB is worth 8 seconds. Saving 20MB is not worth 75 seconds. I didn't measure decompression time. gz is so fast that it is often faster than not compressing.
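
The rounded figures in that comparison can be checked against the measurements above with a little shell arithmetic. `tradeoff` is a hypothetical helper that takes seconds and MB for two formats.

```shell
#!/bin/bash
# Sketch: size saved versus extra build time, using the figures quoted above.
# tradeoff is a hypothetical helper; arguments are seconds and MB.
tradeoff() {
  local base_s=$1 base_mb=$2 new_s=$3 new_mb=$4
  echo "saves $((base_mb - new_mb)) MB for $((new_s - base_s)) extra seconds"
}

tradeoff 5 326 13 207    # .pkg.tar    -> .pkg.tar.gz
tradeoff 13 207 88 188   # .pkg.tar.gz -> .pkg.tar.xz
```

This prints 119 MB for 8 extra seconds (tar to gz) and 19 MB for 75 extra seconds (gz to xz).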

Spyhawk commented on 2016-02-19 15:44

@jsivak> Thanks for the report.
@severach> Actually, pacaur handles it correctly if the extension is manually changed to ".pkg.tar" inside the PKGBUILD. This only tars the package without compressing it. It is often used with very big packages, such as wps-office, to save installation time.

I do not understand the rationale behind using "pkg.tar.gz" instead of "pkg.tar" if the objective is to save time. The gain would be minimal compared to the default compression settings, and still much slower than no compression at all.
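
The two options under discussion correspond to the following PKGEXT values. This is only a fragment for illustration; which one the hadoop PKGBUILD should use is the open question in this exchange.

```shell
# PKGBUILD (fragment): pick one.
PKGEXT='.pkg.tar'        # tar only, no compression
# PKGEXT='.pkg.tar.gz'   # gzip-compressed
```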
