Package Details: ocrmypdf 13.6.0-1
Git Clone URL: | https://aur.archlinux.org/ocrmypdf.git (read-only) |
---|---|
Package Base: | ocrmypdf |
Description: | A tool to add an OCR text layer to scanned PDF files, allowing them to be searched |
Upstream URL: | https://github.com/jbarlow83/OCRmyPDF |
Licenses: | MPL2 |
Submitter: | dreuter |
Maintainer: | fbrennan (pigmonkey) |
Last Packager: | pigmonkey |
Votes: | 73 |
Popularity: | 1.78 |
First Submitted: | 2014-01-27 11:36 (UTC) |
Last Updated: | 2022-07-12 17:21 (UTC) |
Dependencies (19)
- ghostscript
- img2pdf (img2pdf-git)
- pngquant
- python (python38, python37, nogil-python, python39, python36, python32, python311)
- python-coloredlogs
- python-importlib_resources
- python-packaging
- python-pdfminer (pdfminer)
- python-pikepdf
- python-pillow (python-pillow-git, python-pillow-simd)
- python-pluggy
- python-reportlab
- python-tqdm
- tesseract (tesseract-ocr-git, tesseract-git)
- unpaper
- python-pip (make)
- python-setuptools (make)
- python-setuptools-scm-git-archive (make)
- jbig2enc (jbig2enc-git, jbig2enc) (optional) – Better compression algorithm; results in smaller PDF files
Required by (4)
- docspell-joex (optional)
- paperless-ng
- paperless-ngx
- phoronix-test-suite-git (optional)
Latest Comments
pigmonkey commented on 2022-08-07 23:11 (UTC)
This package cannot be updated until python-setuptools-scm in community reaches 7.0.5.
zcc2xj commented on 2022-06-04 00:42 (UTC)
@marco.righi
By the way, when downgrade asks "add python-cryptography to IgnorePkg? [y/N]", answer y;
otherwise the next pacman -Syu will upgrade it to 37.0.0.
marco.righi commented on 2022-05-29 21:31 (UTC)
@zcc2xj it works! Thx a lot!
zcc2xj commented on 2022-05-28 02:19 (UTC) (edited on 2022-05-28 02:25 (UTC) by zcc2xj)
@marco.righi
try this.
sudo pacman -S downgrade
sudo DOWNGRADE_FROM_ALA=1 downgrade python-cryptography
choose python-cryptography-36.0.0
type number, maybe 33
fbrennan commented on 2022-05-24 22:16 (UTC)
...Did you make sure the file exists? That package is now up to 37.0.0-1 in [extra] and has been flagged OOD again so is likely to be updated again soon.
marco.righi commented on 2022-05-24 09:12 (UTC) (edited on 2022-05-24 09:13 (UTC) by marco.righi)
Can you please resolve?
has result
mb720 commented on 2022-05-16 18:17 (UTC)
After upgrading ocrmypdf to version 13.4.4, I needed to downgrade the cryptography package with
sudo pacman -U /var/cache/pacman/pkg/python-cryptography-36.0.0-1-x86_64.pkg.tar.zst
to get rid of the error
hooregi commented on 2022-05-12 01:12 (UTC)
Run:
to downgrade python-pdfminer to the 20220319 version.
android_aur commented on 2022-05-10 11:53 (UTC) (edited on 2022-05-10 11:56 (UTC) by android_aur)
@allexj
I also get this error:
pkg_resources.DistributionNotFound: The 'pdfminer.six!=20200720,<=20220319,>=20191110' distribution was not found and is required by ocrmypdf
Downgrading python-pdfminer via
sudo downgrade python-pdfminer
to version 20220319-1 "fixed" the issue for me (until there is a real fix I guess)
allexj commented on 2022-05-08 07:28 (UTC) (edited on 2022-05-08 07:33 (UTC) by allexj)
pkg_resources.DistributionNotFound: The 'pdfminer.six!=20200720,<=20220319,>=20191110' distribution was not found and is required by ocrmypdf
free(): invalid pointer
Aborted (core dumped)
Even if I add the line "sed -i "s|20220319|20220506|g" setup.cfg" before setup.py
drik commented on 2022-05-08 00:32 (UTC)
The line is now:
sed -i "s|20220319|20220506|g" setup.cfg
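The sed line above is a plain text substitution in setup.cfg. As a rough illustration only, here is the same edit as a minimal Python sketch; the helper name `bump_pin` and the sample requirement line are assumptions made for the demo, not the real contents of the upstream setup.cfg. Returning the match count makes it easy to confirm that the number really occurs only once.

```python
from pathlib import Path
import tempfile


def bump_pin(cfg_path, old, new):
    """Replace every literal occurrence of old with new; return how many were found."""
    path = Path(cfg_path)
    text = path.read_text()
    count = text.count(old)
    path.write_text(text.replace(old, new))
    return count


# Demo on a throwaway file. The requirement line below is a hypothetical
# stand-in for the real setup.cfg contents.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "setup.cfg"
    cfg.write_text("pdfminer.six >= 20191110, != 20200720, <= 20220319\n")
    replaced = bump_pin(cfg, "20220319", "20220506")
    print(replaced, cfg.read_text().strip())
```

A count other than 1 would suggest the pin moved or the file changed upstream, in which case the substitution should be checked by hand.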
NickJolly commented on 2022-04-04 08:24 (UTC)
@frankspace: Yes, it did help. Thank you once again for sharing. It was too logical and clear not to work. I just retried and successfully compiled it manually after implementing the fix in the PKGBUILD. There might have been a typo on the first try. Sorry for taking up your time, but at least it helped me start learning how to fix this kind of annoyance by myself. Have a nice day and stay safe.
PS: still out of date though, hence a workaround by the end user is still needed, unfortunately. At least it has not yet crashed on me, under heavy usage.
frankspace commented on 2022-04-01 14:56 (UTC) (edited on 2022-04-01 14:59 (UTC) by frankspace)
@NickJolly: Sorry about that.
The purpose of the fix was to implement the upstream commit that fixed pdfminer compatibility: https://github.com/ocrmypdf/OCRmyPDF/commit/04996caac34a418cf233c0f3c8ac436b6f2b5920
I unfortunately don't have any idea how to do that with a python package by way of stuff like a git patch or whatever, but the only functional part of that commit is very simple: changing a version number in setup.cfg. Although sed's syntax occasionally ranges from opaque to outright insane, that's a pretty simple fix, because no special characters are involved and it's a unique number that occurs only once in a single file.
For context, here is my entire (as amended) package() section:
I just double-checked that it does compile for me, and work afterwards, in a clean chroot. I should point out that I only use AUR helpers to check for packages that need updating, I always compile stuff with makepkg. Also, I use Artix, but that really shouldn't make a difference.
Does that help?
EDIT: I see upstream is claiming their test suite fails here: https://github.com/ocrmypdf/OCRmyPDF/issues/937#issuecomment-1082721212 -- so it's possible this fix works for my (rather simple) use-cases but won't work for everyone. That, I wouldn't have a clue about.
NickJolly commented on 2022-04-01 14:20 (UTC)
Hi there @frankspace. The fix you kindly shared did not work for me. Would you mind elaborating on it? There must be something I am missing. Thank you
pigmonkey commented on 2022-03-30 03:07 (UTC)
This pkgbuild tracks the upstream package from PyPi, so it will not update to 13.4.2 until upstream pushes the new release there.
https://github.com/ocrmypdf/OCRmyPDF/issues/937
frankspace commented on 2022-03-24 06:39 (UTC) (edited on 2022-03-24 06:39 (UTC) by frankspace)
The latest update to python-pdfminer breaks ocrmypdf. Until upstream puts out a new version, the fix is pretty simple: just add a line with
sed -i "s|20211012|20220319|g" setup.cfg
to the package() section before the line with setup.py.
malacology commented on 2022-02-13 12:10 (UTC)
@allexj, you need to install python-setuptools to solve it. img2pdf already relies on this package, so I am a little worried about your dependencies.
allexj commented on 2022-02-12 10:36 (UTC)
$ ocrmypdf
/usr/lib/python3.10/site-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: 2.0.5-build-libtorrent-rasterbar-src-libtorrent-rasterbar-2.0.5-bindings-python is an invalid version and will not be supported in a future release
  warnings.warn(
Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 33, in <module>
    sys.exit(load_entry_point('ocrmypdf==13.3.0', 'console_scripts', 'ocrmypdf')())
  File "/usr/lib/python3.10/site-packages/ocrmypdf/__main__.py", line 35, in run
    _parser, options, plugin_manager = get_parser_options_plugins(args=args)
  File "/usr/lib/python3.10/site-packages/ocrmypdf/_plugin_manager.py", line 116, in get_parser_options_plugins
    plugin_manager = get_plugin_manager(pre_options.plugins)
  File "/usr/lib/python3.10/site-packages/ocrmypdf/_plugin_manager.py", line 104, in get_plugin_manager
    pm = OcrmypdfPluginManager(
  File "/usr/lib/python3.10/site-packages/ocrmypdf/_plugin_manager.py", line 45, in __init__
    self.setup_plugins()
  File "/usr/lib/python3.10/site-packages/ocrmypdf/_plugin_manager.py", line 73, in setup_plugins
    module = importlib.import_module(name)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/lib/python3.10/site-packages/ocrmypdf/builtin_plugins/ghostscript.py", line 11, in <module>
    from ocrmypdf._exec import ghostscript
  File "/usr/lib/python3.10/site-packages/ocrmypdf/_exec/ghostscript.py", line 21, in <module>
    from PIL import Image, UnidentifiedImageError
ImportError: cannot import name 'UnidentifiedImageError' from 'PIL' (/home/allexj/.local/lib/python3.10/site-packages/PIL/__init__.py)
hirunatan commented on 2022-01-26 15:57 (UTC)
Perhaps it would be good to notify users after installation that they need to install the tesseract-data language packages in order to use it.
https://ocrmypdf.readthedocs.io/en/latest/installation.html#arch-linux-aur
marco.righi commented on 2022-01-11 15:58 (UTC) (edited on 2022-01-11 16:00 (UTC) by marco.righi)
@bsdiceRobert, thanks a lot for your suggestion. I wrote the following code, which should recompile the packages one by one. The script may rebuild some packages more than once, but it avoids errors that could stop the entire rebuild process.
bsdice commented on 2022-01-11 15:37 (UTC) (edited on 2022-01-11 15:37 (UTC) by bsdice)
FYI the script snippet will not rebuild anything by itself but only check for directories older than the most current /usr/lib/python3.* directory. If you have python3.10 + python3.9 + python3.8 it will look at only 3.9 and 3.8 and then list all packages referencing these obsolete directories. If you reinstall these packages they should be installed for the most recent 3.10 in this example and while doing so, get removed from 3.9 or 3.8. So if you run the snippet again, the number of packages shown will shrink. In theory you could add
after the "pacman" command before the "done", but better do it manually.
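The snippet itself did not survive on this page, so the following is only a sketch of the check as described in the comment, not bsdice's script. It assumes the input is a list of directory names such as those under /usr/lib; the function name `obsolete_python_dirs` is made up for illustration.

```python
import re


def obsolete_python_dirs(dir_names):
    """Return the python3.X names that are older than the newest python3.X present."""
    versioned = []
    for name in dir_names:
        match = re.fullmatch(r"python3\.(\d+)", name)
        if match:
            versioned.append((int(match.group(1)), name))
    if not versioned:
        return []
    newest_minor = max(minor for minor, _ in versioned)
    # Sort numerically so that python3.9 comes before python3.10.
    return [name for minor, name in sorted(versioned) if minor < newest_minor]


print(obsolete_python_dirs(["python3.10", "python3.9", "python3.8", "pkgconfig"]))
```

On a real system the names could come from `os.listdir("/usr/lib")`; packages that still own files under the returned directories are the candidates for a rebuild.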
bsdice commented on 2022-01-11 11:31 (UTC)
@marco.righi You can try this within a script:
Then use yay, pikaur, or whatever to rebuild anything found.
marco.righi commented on 2022-01-11 09:16 (UTC)
Do you know a script to rebuild all AUR Python dependencies?
nottoday commented on 2021-12-24 14:30 (UTC)
@jbarlow python-pikepdf is on version 4.2.0-1. I've tried updating it to 4.2.0-2 (from the arch repo). But that still gives the same error.
jbarlow commented on 2021-12-24 00:06 (UTC)
@nottoday Python-pikepdf is likely out of date.
nottoday commented on 2021-12-23 16:49 (UTC) (edited on 2021-12-24 13:35 (UTC) by nottoday)
I have a problem that has started since 13.0.0.
The following command
gives the following error output.
I'm on Manjaro in case that makes a difference.
Thanks in advance.
malacology commented on 2021-12-14 00:59 (UTC)
okay, thanks it is solved
pigmonkey commented on 2021-12-14 00:50 (UTC)
Python AUR packages need to be rebuilt after Python upgrades.
The version bump I just pushed for 13.1.1 will cause this package to get rebuilt, however you will need to manually rebuild any AUR Python dependencies which have not incremented their pkgrel for the new Python (python-coloredlogs, python-humanfriendly). There's nothing we can do about those from this package.
malacology commented on 2021-12-13 22:44 (UTC)
After upgrade to python 3.10
https://github.com/ocrmypdf/OCRmyPDF/issues/872#issuecomment-992025153
jvn01 commented on 2021-10-02 13:53 (UTC)
It gives me the error "python-distlib-0.3.2-1-any.pkg.tar.zst failed to download".
bot198042362134 commented on 2021-09-23 07:48 (UTC)
There are two missing dependencies: tesseract-data-eng and python-sortedcontainers
To solve this issue simply do:
lightsaber commented on 2021-08-24 18:31 (UTC)
Got this traceback:
pigmonkey commented on 2021-08-08 22:37 (UTC)
This looks like a problem with the pdfminer package.
The latest version of the Arch package removed the dependency on python-sortedcontainers. Upstream does not actually need sortedcontainers and has removed the dependency, but that change has not been tagged in a release yet. So the Arch python-pdfminer needs to either incorporate that unreleased patch, or re-add the python-sortedcontainers dependency in their PKGBUILD.
In the meantime, downgrading to python-pdfminer version 20201018-2 will fix the problem.
alkaid commented on 2021-08-08 21:30 (UTC)
Missing dependency: python-sortedcontainers
The original traceback from python is
fbrennan commented on 2021-07-26 05:09 (UTC)
Thanks @Lucki… I think most users would have pip installed, so we missed that one. Or, it wasn't required until recently. Either way, 12.3.0 pkgrel 3 has it, and it will be a make dependency going forwards.
Lucki commented on 2021-07-25 01:35 (UTC)
Python complains about pip not being available: /usr/bin/python: No module named pip.
pigmonkey commented on 2021-05-29 18:02 (UTC)
I'm not getting that error. Perhaps you need to do a clean rebuild of python-coloredlogs for some reason.
Sproid commented on 2021-05-29 15:19 (UTC)
It is giving me this error:
$ ocrmypdf
Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 33, in <module>
    sys.exit(load_entry_point('ocrmypdf==12.0.3', 'console_scripts', 'ocrmypdf')())
  File "/usr/bin/ocrmypdf", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.9/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 855, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/usr/lib/python3.9/site-packages/ocrmypdf/__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/lib/python3.9/site-packages/ocrmypdf/helpers.py", line 22, in <module>
    import pikepdf
  File "/usr/lib/python3.9/site-packages/pikepdf/__init__.py", line 19, in <module>
    from ._version import __version__
  File "/usr/lib/python3.9/site-packages/pikepdf/_version.py", line 7, in <module>
    from pkg_resources import DistributionNotFound
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3243, in <module>
    def _initialize_master_working_set():
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3226, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3255, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 568, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 886, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 772, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'coloredlogs>=14.0' distribution was not found and is required by ocrmypdf
I do have "python-coloredlogs 15.0-1" installed.
mmberlin commented on 2021-04-29 17:36 (UTC) (edited on 2021-04-29 17:39 (UTC) by mmberlin)
missing (make) dependency: python-setuptools-scm-git-archive
aslakstubsgaard commented on 2021-02-08 15:35 (UTC) (edited on 2021-02-08 15:36 (UTC) by aslakstubsgaard)
Using yay, I did a clean build of python-humanfriendly first, then python-coloredlogs, and finally ocrmypdf, and it now works again. I hope this helps others with the same issue.
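That rebuild order follows the dependency chain mentioned in this thread (ocrmypdf depends on python-coloredlogs, which depends on python-humanfriendly). As a small illustration, assuming only those two edges and ignoring every other dependency, the order can be derived with the standard-library `graphlib` module (Python 3.9 or later):

```python
from graphlib import TopologicalSorter

# Each key maps to the set of packages that must be built before it.
# Only the edges named in the comments are included here.
deps = {
    "ocrmypdf": {"python-coloredlogs"},
    "python-coloredlogs": {"python-humanfriendly"},
    "python-humanfriendly": set(),
}

# static_order() yields dependencies before the packages that need them.
build_order = list(TopologicalSorter(deps).static_order())
print(build_order)
```

With more AUR Python dependencies the same dictionary can simply be extended; the sorter raises an error if the graph contains a cycle.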
pigmonkey commented on 2021-02-04 21:15 (UTC)
I can't recreate that error. This package has a dependency for python-coloredlogs, which in turn is dependent on python-humanfriendly. Whatever AUR helper you're using should pick all that up.
Perhaps you are running an old version of python-humanfriendly. It looks like that AUR package was updated to 1.9 on 2020-12-10.
aslakstubsgaard commented on 2021-02-04 16:51 (UTC) (edited on 2021-02-04 16:52 (UTC) by aslakstubsgaard)
did a fresh build but getting the error:
ginkel commented on 2020-10-26 10:56 (UTC)
ocrmypdf currently fails to work with the recently updated python-pdfminer package. Downgrading the package to python-pdfminer-20200726-1 works around the issue for now.
pigmonkey commented on 2020-10-19 12:42 (UTC)
I still use the package, so I'm happy to continue updating or to step back. No preference.
fbrennan commented on 2020-10-18 23:02 (UTC)
Hello all.
I'm back to using Arch if pigmonkey no longer wants to maintain this package. :-)
But I think they've done a good job, so I can also just give them the package. I can also just do nothing, but now that I'm back, it can be confusing who is responsible for pushing the update.
Which would you prefer?
pigmonkey commented on 2020-10-14 22:36 (UTC)
tesseract-data-osd is included with the standard tesseract Arch package.
Looking at the "Required By" section of the tesseract-data-eng package, it does not appear that it is common for other Arch packages to list it as a dependency.
If this is confusing for users, I think it would be acceptable to add it as an optional dependency, so that there is an indication at the end of the install that another package might be needed. But it may be weird for non-English speakers if the package has an optional dependency on the English language pack, but not whatever data pack is needed for the user's native language. I don't really want a 106 item optdepends array for every possible language pack.
jbarlow commented on 2020-10-14 07:07 (UTC)
OCRmyPDF assumes English unless a language is specified with -l fra for example. So strictly speaking it works, but you have to issue the option every time. The test suite also assumes English is installed. I believe most package managers have added an explicit dependency on tesseract-data-eng or whatever it's called in the system, but some have not.
I did poll users whether to default to the system language based on locale, but surprisingly non-English users didn't like the idea.
OCRmyPDF does assume tesseract-data-osd is installed so that should be a dependency if Arch breaks that out as a separate package.
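For anyone wrapping ocrmypdf in a script, here is a minimal sketch of passing the language explicitly. The helper name `ocrmypdf_command` is hypothetical; the `-l` option and the `ocrmypdf input output` call form are the ones quoted in this thread. It only builds the argument list and does not run anything.

```python
def ocrmypdf_command(input_pdf, output_pdf, language=None):
    """Build an ocrmypdf argument list; omit -l to get the English default."""
    cmd = ["ocrmypdf"]
    if language is not None:
        cmd += ["-l", language]
    cmd += [input_pdf, output_pdf]
    return cmd


print(ocrmypdf_command("test.pdf", "test2.pdf", language="fra"))
```

The resulting list could be handed to `subprocess.run`, provided the matching tesseract-data package for that language is installed.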
pigmonkey commented on 2020-10-13 16:51 (UTC)
Tesseract does require a data package to be installed, but it does not have to be English. If a language is not specified, Tesseract does assume English, hence the error.
I don't think it's appropriate to include tesseract-data-eng as a dependency since that might not be the user's language.
ioan commented on 2020-10-13 13:45 (UTC)
ocrmypdf test.pdf test2.pdf
Tesseract failed to report available languages. Output from Tesseract:
Error opening data file /usr/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
List of available languages (1):
osd
looks like it needs eng data by default
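The error above comes down to a missing `eng.traineddata` file. A minimal sketch of checking for language files before running OCR follows; the default directory is the one from the error message, the function name is made up, and a system that sets TESSDATA_PREFIX may keep its data elsewhere.

```python
from pathlib import Path


def missing_traineddata(languages, tessdata_dir="/usr/share/tessdata"):
    """Return the language codes that have no <code>.traineddata file in tessdata_dir."""
    base = Path(tessdata_dir)
    return [lang for lang in languages if not (base / f"{lang}.traineddata").is_file()]


# On the system from the comment above, this would be expected to report "eng".
print(missing_traineddata(["eng", "osd"]))
```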
jorges commented on 2020-08-05 19:49 (UTC)
Thanks for the explanation! I just got rid of python-pdfminer.six from AUR and downgraded python-pdfminer to 20200517-1. OCRMyPDF works and all is well!
pigmonkey commented on 2020-07-29 17:39 (UTC)
It's a little convoluted, but here is what I think is happening:
The confusingly-named python-pdfminer from community that we use is in fact python-pdfminer.six. You can verify that by looking at its PKGBUILD. The AUR python-pdfminer.six is basically the same package, except it pulls from PyPi instead of Github and is on an outdated version (20200124 instead of community's 20200720).
OCRMyPDF claims to support 20200720, but that version of python-pdfminer{,.six} dropped PDFTextExtractionNotAllowed. This apparently was unintentional and has been reversed in 20200726. But as of now 20200726 has not been officially tagged.
So, we need to wait for upstream python-pdfminer.six to make 20200726 official, and then wait for the community maintainer to update the python-pdfminer package to 20200726. And then we need to wait for upstream OCRMyPDF to release a new version that notes support for 20200726. Then I can update this package and everything will be copacetic.
In the meantime, you can downgrade the community python-pdfminer package to the previous version, or run the much older version provided by the AUR python-pdfminer.six package.
jorges commented on 2020-07-29 11:18 (UTC) (edited on 2020-07-29 11:22 (UTC) by jorges)
I was getting the traceback shown below with python-pdfminer. I was able to solve the problem by removing that package and installing python-pdfminer.six. If other people can confirm this, maybe the package dependencies have to be changed?
bsdice commented on 2020-07-23 12:03 (UTC) (edited on 2020-07-23 12:03 (UTC) by bsdice)
Anybody else getting tracebacks when using --threshold?
marlemion commented on 2020-07-16 06:47 (UTC) (edited on 2020-07-16 12:30 (UTC) by marlemion)
Never mind the below. For some reason, some files were missing from my system.
Fully updated arch and updated ocrmypdf to the latest via AUR:
Packages:
What is the problem?
xuanruiqi commented on 2020-07-03 02:28 (UTC)
Now that python-pillow in community has been updated to 7.2.0, the block on updating this should no longer exist.
pigmonkey commented on 2020-06-14 18:58 (UTC)
I pinged the python-pillow packager. The package had simply fallen through the cracks and he will be updating it today, but 7.0 introduced some API breakage so the upgraded package will probably hang out in the testing repo for a bit.
fbrennan commented on 2020-06-13 01:03 (UTC)
It might make sense to put in an orphan request for python-pillow-git, then update that, then temporarily require it, @pigmonkey, given how long the community package has been out of date. Though, it's of course up to you, as it might be too much work.
jbarlow commented on 2020-06-13 00:41 (UTC)
Upstream here. I noticed python-pillow in AUR is quite old so this could be a blocker for some time.
ocrmypdf does work with pillow 6.2.1, with all tests passing. You could override the requirement and permit the earlier pillow. (I'd rather not change this upstream, so that upstream reflects the configuration that is being tested.)
On another note, I strongly doubt that pillow-simd would yield any measurable change in performance so it would not be worth the effort to integrate this.
pigmonkey commented on 2020-06-12 21:45 (UTC)
This package is stuck on 9.8.2 until the community python-pillow package is upgraded to >=7.0.0.
pigmonkey commented on 2020-05-28 22:17 (UTC)
Thanks for identifying the issue. It looks like v9.8.1 fixes this and is in the process of being pushed to pypi.
brianmercer commented on 2020-05-28 21:18 (UTC)
Temporary workaround is to roll back python-pdfminer to the prior version:
pacman -U /var/cache/pacman/pkg/python-pdfminer-20200402-1-any.pkg.tar.zst
and optionally add
IgnorePkg = python-pdfminer
to the /etc/pacman.conf file to keep it from upgrading for now.
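To double-check that the pin is in place, the IgnorePkg entries can be read back from the configuration text. This is a simplified sketch: it assumes plain space-separated package names on `IgnorePkg =` lines, treats `#` as starting a comment, and does not handle glob patterns or included files. The function name and the sample text are illustrative.

```python
def ignored_packages(conf_text):
    """Collect package names from every active (uncommented) IgnorePkg line."""
    ignored = []
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition("=")
        if sep and key.strip() == "IgnorePkg":
            ignored.extend(value.split())
    return ignored


# Hypothetical excerpt of /etc/pacman.conf after adding the line above.
sample = """\
[options]
#IgnorePkg   =
IgnorePkg = python-pdfminer
"""
print(ignored_packages(sample))
```

Once a fixed ocrmypdf release is available, the entry should be removed again so that python-pdfminer resumes receiving updates.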
chrisberkhout commented on 2020-05-28 21:05 (UTC)
Last line of the error message is
That's from the python-pdfminer package, which is in the dependencies, it's just that the current version is python-pdfminer-20200517-1 and ocrmypdf apparently needs an earlier version.
It seems this has happened before: https://github.com/jbarlow83/OCRmyPDF/issues/457
I added a new issue: https://github.com/jbarlow83/OCRmyPDF/issues/566
oriba commented on 2020-05-28 20:26 (UTC)
ocrmypdf, built from this package, does not work anymore. Some days ago it worked. (Sidenote: I also had issues with matplotlib, some ugly things may happen these days in the python field).
I got a quite long message, and one of the things mentioned was "pdfminer.six" together with ContextualVersionConflict.
Looking at the package-dependencies, pdfminer.six is not in there. So it should be added. Also certain versions seem to be needed. Let me know if you want the complete error message, then I may paste it somewhere.
rharish commented on 2020-05-01 12:51 (UTC)
Does there exist a way to avoid using the egg files? Or somehow removing the dependency checks altogether? Installing from the AUR should ensure that the package has its dependencies met, so I don't think that the checks are needed.
I already tried installing it through pip in a virtualenv, along with Pillow-SIMD, and it ignores the checks and directly works with Pillow-SIMD. So those checks can be skipped IMHO.
pigmonkey commented on 2020-04-29 17:55 (UTC)
I'm not sure how to go about supporting Pillow-SIMD in the package.
The PKGBUILD installs the package via setuptools, which results in an egg. The egg includes a requirement of Pillow>=6.2.0. You can see this at /usr/lib/python3.8/site-packages/ocrmypdf-9.7.2-py3.8.egg-info/requires.txt (or a similar path, depending on your version). That results in the error you're seeing.
I think this would be avoided if the package were installed via pip, but the wiki discourages that. And I think even then you'd end up just moving the problem to a different level: the python-reportlab package is also installed via setuptools so is going to have an egg that looks for that same Pillow package. You'd get the same error, but it would be thrown by reportlab rather than ocrmypdf.
rharish commented on 2020-04-29 05:41 (UTC) (edited on 2020-04-29 05:41 (UTC) by rharish)
This does not work with Pillow-SIMD. This is the issue that I created upstream. Here are the logs when I run ocrmypdf --help:
pigmonkey commented on 2020-04-22 17:58 (UTC)
Thanks. It looks like the confusingly named python-pdfminer package in community does indeed provide the needed python-pdfminer.six library rather than the abandoned python-pdfminer library.
That was the last AUR dependency, so maybe there's a chance of this getting adopted into community now.
petRUShka commented on 2020-04-22 10:04 (UTC) (edited on 2020-04-22 10:07 (UTC) by petRUShka)
Dependency aur/python-pdfminer.six should possibly be replaced with community/python-pdfminer.
fbrennan commented on 2020-04-10 23:20 (UTC)
No, the computer was broken in transit. I still have it, just due to the pandemic the parts to fix it are arriving slowly. And it's made for 240V, and I now live in a 120V country.
bsdice commented on 2020-04-10 22:41 (UTC)
@fbrennan Does that mean some Philippine police jockey can now upload trojaned PKGBUILDs in your name? If true we should summon the help of an AUR admin to delete your key.
fbrennan commented on 2020-04-10 22:29 (UTC)
You've been added @pigmonkey, thank you.
I had to flee the country I was living in, and my desktop computer got broken. It was my Arch install and had my AUR SSH key. I don't know when I'm going to be able to get back to AUR maintenance.
https://www.vice.com/en_us/article/y3mqzb/the-philippines-wants-to-arrest-8chan-founder-fredrick-brennan-its-basically-a-death-sentence
pigmonkey commented on 2020-04-10 21:40 (UTC)
I'd be happy to help maintain this package, if needed.
rany commented on 2020-04-10 09:00 (UTC)
@AlexParkhomenko also done!
AlexParkhomenko commented on 2020-04-10 08:49 (UTC) (edited on 2020-04-10 08:49 (UTC) by AlexParkhomenko)
conflicts=("ocrmypdf" "python-pdfminer")
rany commented on 2020-04-10 08:40 (UTC)
@pescepalla Done!
pescepalla commented on 2020-04-10 06:31 (UTC)
Please add conflicts=("ocrmypdf") to the PKGBUILD
rien333 commented on 2020-03-31 12:23 (UTC)
Now two versions out of date, see https://github.com/jbarlow83/OCRmyPDF/releases.
pigmonkey commented on 2020-02-22 01:26 (UTC)
Done. https://github.com/jbarlow83/OCRmyPDF/pull/494
jbarlow commented on 2020-02-22 00:19 (UTC)
Could someone please submit a pull request updating the OCRmyPDF documentation (ocrmypdf.readthedocs.io) with directions for installing this package?
jbarlow commented on 2020-02-10 09:57 (UTC)
@brianmercer ocrmypdf 9.6.0 fixes the pdfminer.six version
brianmercer commented on 2020-02-04 15:24 (UTC) (edited on 2020-02-04 15:29 (UTC) by brianmercer)
ocrmypdf won't work if you updated the aur package of pdfminer.six to version 20200124.
ocrmypdf hasn't been updated for the new version of pdfminer.six.
https://github.com/jbarlow83/OCRmyPDF/blob/fd991a2380f1803924b1b8192e42e67a80998dde/setup.py
'pdfminer.six >= 20181108, <= 20200104',
So we're waiting on an update from ocrmypdf, or if necessary you can install an older version of pdfminer.six.
grunix commented on 2020-02-04 12:17 (UTC)
Please bump the version to 9.5.0.
commented on 2020-01-31 11:48 (UTC)
Please, use stable src url:
https://files.pythonhosted.org/packages/source/${_name::1}/$_name/$_name-$pkgver.tar.gz
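In that pattern, `${_name::1}` is Bash substring expansion and yields the first character of `_name`. As an illustration of how the URL expands, here is a small Python sketch; the function name is hypothetical and the version number is just an example taken from elsewhere in this thread.

```python
def pypi_source_url(name, version):
    """Expand the stable PyPI sdist URL pattern for a package name and version."""
    return (
        "https://files.pythonhosted.org/packages/source/"
        f"{name[0]}/{name}/{name}-{version}.tar.gz"
    )


print(pypi_source_url("ocrmypdf", "9.1.1"))
```

Unlike the hash-based URLs, this form changes only with the version, so a PKGBUILD using it needs just `pkgver` and the checksum updated per release.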
blabred commented on 2020-01-30 16:35 (UTC)
I get this whenever I'm trying to ocr a document:
Traceback (most recent call last):
  File "/home/adros/anaconda3/bin/ocrmypdf", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/home/adros/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3252, in <module>
    @_call_aside
  File "/home/adros/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3236, in _call_aside
    f(*args, **kwargs)
  File "/home/adros/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3265, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/adros/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 584, in _build_master
    ws.require(__requires__)
  File "/home/adros/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 901, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/adros/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 787, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pikepdf<2,>=1.8.1' distribution was not found and is required by ocrmypdf
brianmercer commented on 2020-01-13 00:01 (UTC)
Version 9.2.0 release notes state that qpdf is no longer required as a dependency.
I built it without qpdf and it seemed to compile and run fine.
sagittarius commented on 2019-12-09 14:30 (UTC) (edited on 2019-12-11 13:47 (UTC) by sagittarius)
Please update to 9.1.1. Just change the source link and the SHA256 and it works as is.
$ ocrmypdf --version
9.1.1
Edit PKGBUILD
pkgver=9.1.1
source=("https://files.pythonhosted.org/packages/af/7f/234e357557233d618c5b40d066389de6203c48a1697653285af541fff582/ocrmypdf-9.1.1.tar.gz")
sha256sums=('656dd9cec46b2c3a8a1a4b98e7bb00dd95adb98229d63d397b422679cfbbb88e')
If necessary, rebuild python-pdfminer.six
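When editing `sha256sums` by hand as above, it helps to compute the digest of the downloaded archive independently and compare. A minimal sketch using only the standard library; the function name is arbitrary and the file path in the comment is a placeholder.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example (placeholder file name):
# print(sha256_of("ocrmypdf-9.1.1.tar.gz"))
```

The printed value should match the string placed in `sha256sums`; a mismatch means either a corrupted download or a wrong checksum in the PKGBUILD.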
brianmercer commented on 2019-12-02 14:58 (UTC)
Current version 9.0.5-1 is broken because of an update of python-pdfminer.six in the aur to version 20191110-1.
Error: "The 'pdfminer.six<=20191020,>=20181108' distribution was not found and is required by ocrmypdf"
Please update, thanks.
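The error quoted above is a version-range check failing: the installed pdfminer.six (20191110) is newer than the upper bound in the requirement. Below is a deliberately simplified sketch of that check. It handles only integer, date-style versions and the comparison operators shown, not the full Python version-specifier rules, and the names are made up for illustration.

```python
import re

# Comparison operators supported by this simplified checker.
_OPS = {
    "<=": lambda a, b: a <= b,
    ">=": lambda a, b: a >= b,
    "!=": lambda a, b: a != b,
    "==": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}


def satisfies(version, spec):
    """Check an integer date version against a comma-separated specifier string."""
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(<=|>=|!=|==|<|>)\s*(\d+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.group(1), int(match.group(2))
        if not _OPS[op](version, bound):
            return False
    return True


print(satisfies(20191110, "<=20191020,>=20181108"))
```

This is why every pdfminer.six update past the pinned upper bound breaks the installed ocrmypdf until the bound is raised upstream or patched in the PKGBUILD.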
fbrennan commented on 2019-11-09 05:20 (UTC) (edited on 2019-11-09 05:20 (UTC) by fbrennan)
Github archives are unusable without hacks for Python AUR packages. That's because they don't include the .git directory, required by python-setuptools. The PyPI archive must be used.
jbarlow commented on 2019-11-09 05:12 (UTC)
@brianmercer CI is set up to build wheels and deploy to PyPI whenever a git tag is pushed, so tags should always be consistent with PyPI. But it's probably cleaner to install the .tar.gz from PyPI than Git - smaller download since it's not pulling the whole history.
brianmercer commented on 2019-11-08 18:23 (UTC)
I'm no expert on PKGBUILD.
Is there any downside to rewriting the PKGBUILD to use git tags instead of versions from pypi? It looks like @jbarlow is pretty diligent with the github tags. https://github.com/jbarlow83/OCRmyPDF/tags
And PKGBUILD supports git tags. https://wiki.archlinux.org/index.php/VCS_package_guidelines#The_pkgver.28.29_function
Does an ocrmypdf-git package need to be by commit or can it be by tag or release? Do the aur managers like yay (which I use) check devel versions by commit or can it check by tags?
pigmonkey commented on 2019-11-06 17:11 (UTC)
I'd be happy to help maintain the package if fbrennan is no longer interested.
rien333 commented on 2019-11-06 09:50 (UTC) (edited on 2019-11-06 09:50 (UTC) by rien333)
This is several versions out of date: https://github.com/jbarlow83/OCRmyPDF/releases. An update would be great, especially because there is a fix for a fatal ocrmypdf crash (see https://github.com/jbarlow83/OCRmyPDF/issues/448)
jbarlow commented on 2019-11-05 10:31 (UTC)
@brianmercer That is completely correct. ruffus, defusedxml, lxml should all be removed. -Upstream
brianmercer commented on 2019-11-05 02:47 (UTC)
It looks like version 9.0 removed the dependency on ruffus. And version 7.0 removed the dependency on defusedxml. And version 3.0 removed the dependency on lxml.
bsdice commented on 2019-10-25 13:23 (UTC)
@brianmercer et al. Package needs to be updated to 9.0.3 which fixes https://github.com/jbarlow83/OCRmyPDF/commit/17ac9d7a9a296ae3d50146fbefad5281e2851b0f
The backstory is that ghostscript tightened security after taviso took a stab at it security-wise back in summer of 2018. You can fix it yourself in the meantime, by:
1) Download the raw PKGBUILD file into a temp directory
2) Edit these lines to say
pkgver=9.0.3
source=("https://files.pythonhosted.org/packages/6b/8c/d8a9132e050ac25ea5da63fabc1a1fc0246beee72701b372c35221a40237/ocrmypdf-9.0.3.tar.gz")
sha256sums=('3d9b92f6a01d0711e4156c6b36638d9d946d010e2925ec473ec7f666096cceeb')
3) makepkg -Ccfi
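Step 2 can also be scripted. A rough sketch in Python, assuming the PKGBUILD keeps pkgver, source and sha256sums each on a single line, as in the snippet above; bump_pkgbuild is a hypothetical helper, not part of any AUR tooling:

```python
import re


def bump_pkgbuild(text, pkgver, source_url, sha256):
    """Return PKGBUILD text with pkgver, source and sha256sums replaced.

    Assumes each of the three assignments sits on one line; every other
    line is passed through unchanged.
    """
    replacements = {
        "pkgver": f"pkgver={pkgver}",
        "source": f'source=("{source_url}")',
        "sha256sums": f"sha256sums=('{sha256}')",
    }
    lines = []
    for line in text.splitlines():
        match = re.match(r"^(pkgver|source|sha256sums)=", line)
        lines.append(replacements[match.group(1)] if match else line)
    return "\n".join(lines) + "\n"
```

The result would be written back to the PKGBUILD before running makepkg. Multi-line source arrays are deliberately not handled here.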
brianmercer commented on 2019-10-21 18:37 (UTC)
I started getting a set of errors with the 9.0.1 version. I edited the PKGBUILD to upgrade to version 9.0.3 of ocrmypdf and they went away.
These are the errors:
ERROR - GPL Ghostscript 9.50: Setting Overprint Mode to 1 not permitted in PDF/A-2, overprint mode not set
Error: /invalidfileaccess in --file--
Operand stack:
--nostringval-- --nostringval-- (/usr/lib/python3.7/site-packages/ocrmypdf/data/sRGB.icc) (r)
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1 3 %oparray_pop 1833 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
Dictionary stack:
--dict:737/1123(ro)(G)-- --dict:1/20(G)-- --dict:76/200(L)--
Current allocation mode is local
Last OS error: Permission denied
Current file position is 580
GPL Ghostscript 9.50: Unrecoverable error, exit code 1
ERROR - SubprocessOutputError: Ghostscript PDF/A rendering failed
Fifis commented on 2019-07-31 13:03 (UTC)
For the latest ocrmypdf 8.3.2, I had to update pikepdf to 1.5.0.post0. I had a bit of trouble overwriting old package versions and needed, e.g.,
sudo pacman -S --overwrite="*" python-ply python-pycparser img2pdf python-cffi python-defusedxml python-lxml python-reportlab
to get ocrmypdf 8.3.2 to work.
john-soda commented on 2019-02-10 23:36 (UTC)
@fbrennan I really don't know why it can't reach setuptools_scm_git_archive. I downloaded the package manually and edited the PKGBUILD so that it points to my locally downloaded version. Now it works! Thanks for your help.
fbrennan commented on 2019-02-09 05:38 (UTC)
@john-soda I can install the latest version just fine. It seems to me you have a DNS resolution problem for the PyPI domain.
john-soda commented on 2019-02-02 11:04 (UTC)
Whenever I try to install ocrmypdf, I get this error:
Could not find suitable distribution for Requirement.parse('setuptools_scm_git_archive')
Here is the full log: https://pastebin.com/xsqzeqr0
How can I install the newest version?
fbrennan commented on 2019-01-17 08:18 (UTC)
My apologies to all stakeholders waiting on me. I came down with a serious illness. Rest assured this is not forgotten or abandoned. I will update it in due time. Thanks
jbarlow commented on 2019-01-12 08:49 (UTC) (edited on 2019-01-12 08:52 (UTC) by jbarlow)
@fbrennan
v8 makes pdfminer.six "technically optional". setup.py still lists it as required, but downstream maintainers at their option may delete pdfminer.six from setup.py in their scripts, at the cost of the --redo-ocr feature. I will support this arrangement until the packaging situation for pdfminer.six improves. (I am doing it this way because "pip install ocrmypdf" works fine with pdfminer.six.)
v8 also drops python-xmp-toolkit because of the difficulties some downstream consumers had with it.
Thanks again for maintaining ocrmypdf for the ArchLinux community.
-Upstream
fbrennan commented on 2018-11-27 06:19 (UTC)
I thought of that @bsdice but it breaks the AUR Rules of Submission. https://wiki.archlinux.org/index.php/Arch_User_Repository#Rules_of_submission
The more AUR dependencies that get added, the more difficult this gets to maintain and the more people we need to rely on. Fortunately I maintain python-pikepdf, python-ruffus and python-xmp-toolkit, the major AUR deps of this package before the recent update. I don't think it's kosher for me to make a metapackage which would install every dependency and have a replaces/conflicts/provides either.
For now I recommend users ensure that python-sortedcontainers is installed before attempting to build ocrmypdf & deps. I'm sure the maintainer of python-pdfminer.six will add it to the manifest as soon as they can.
bsdice commented on 2018-11-27 05:40 (UTC)
@fbrennan: A workaround could have been to create a package called "python-pdfminer-six" and use the following statements:
Unfortunately python-pdfminer.six in AUR is missing a dependency. Workaround is to still use http://termbin.com/k46k.
Harvey commented on 2018-11-26 15:44 (UTC)
python-pdfminer.six has been updated to version 20181108-1 ;)
fbrennan commented on 2018-11-26 11:50 (UTC) (edited on 2018-11-26 11:52 (UTC) by fbrennan)
Unfortunately my friends, we've hit a snag. Someone else is using the name python-pdfminer.six :-(
https://aur.archlinux.org/pkgbase/python-pdfminer.six/#news
I put a working PKGBUILD there. But unfortunately I cannot upload my new ocrmypdf, which works fine, until this user makes a decision. That is because ocrmypdf requires a higher version than theirs.
https://github.com/jbarlow83/OCRmyPDF/blob/0f5c484b626632aa68259eda16ff2c1b87a42104/requirements/main.txt#L7
I sincerely apologize for the long wait. If you are good with makepkg and pacman, you can use these two PKGBUILDS:
If not, you will just have to wait for python-pdfminer.six to be updated, by either ishitatsuyuki or me if he orphans.
fbrennan commented on 2018-11-23 23:08 (UTC)
I hope to release the upgrade to 7.3.1 today (GMT+8).
I apologize for the wait after the orphan notification.
Harvey commented on 2018-11-22 11:58 (UTC)
Version 7.3.1 looks very promising, depending on the release notes https://github.com/jbarlow83/OCRmyPDF/blob/master/docs/release_notes.rst Is there any chance for an update? I see there is a new dependency to pdfminer.six 20181108...
marlemion commented on 2018-11-06 09:18 (UTC) (edited on 2018-11-06 09:32 (UTC) by marlemion)
@bsdice: Thanks, but that did not help. Same error. On another machine, ocrmypdf is working. So it must be some issue on that machine...
Btw. ocrmypdf was working for ages on that machine, but I had to hold back leptonica for other reasons, so it was stuck to a certain version for some time....
Found the Problem: I had installed python2-jmespath-0.9.3-2. This package installs /usr/bin/jp2.py. For some reason, python looked at this jp2.py instead of /usr/lib/python3.x/site-packages/jp2.py. After removing python2-jmespath-0.9.3-2, it works. However, such a behaviour is irritating.
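A likely explanation, offered as an assumption rather than something confirmed in this thread: when Python runs a script, it puts the script's directory at the front of sys.path, so launching /usr/bin/ocrmypdf makes /usr/bin searched before site-packages, and any /usr/bin/jp2.py wins. A small sketch for checking which file a module name would resolve to; module_origin is a hypothetical helper:

```python
import importlib
import importlib.util
import sys


def module_origin(name, script_dir=None):
    """Return the file a top-level module name would be imported from.

    script_dir emulates the directory of the launching script, which
    Python places at the front of sys.path. Returns None if the module
    cannot be found. The module itself is not imported.
    """
    saved_path = list(sys.path)
    try:
        if script_dir is not None:
            sys.path.insert(0, script_dir)
        importlib.invalidate_caches()
        spec = importlib.util.find_spec(name)
        return spec.origin if spec is not None else None
    finally:
        # Restore the search path so the check has no lasting effect.
        sys.path[:] = saved_path
```

On an affected machine, module_origin("jp2", script_dir="/usr/bin") would be expected to point at /usr/bin/jp2.py, while module_origin("jp2") would point into site-packages.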
bsdice commented on 2018-11-06 09:16 (UTC)
@marlemion: Replace aur/img2pdf-git 0.2.1.r8.geedf73e-1 with normal img2pdf 0.3.1-1 and see what happens. pacman -Rd img2pdf-git ; pacman -S --asdeps img2pdf ; or something like that.
marlemion commented on 2018-11-06 08:57 (UTC)
I would like to update to the most recent version of ocrmypdf. Builds fine, but throws this error:
Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 11, in <module>
    load_entry_point('ocrmypdf==7.2.1', 'console_scripts', 'ocrmypdf')()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 484, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2725, in load_entry_point
    return ep.load()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2343, in load
    return self.resolve()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2349, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python3.7/site-packages/ocrmypdf/__main__.py", line 36, in <module>
    from ._pipeline import build_pipeline
  File "/usr/lib/python3.7/site-packages/ocrmypdf/_pipeline.py", line 26, in <module>
    import img2pdf
  File "/usr/lib/python3.7/site-packages/img2pdf.py", line 28, in <module>
    from jp2 import parsejp2
ImportError: cannot import name 'parsejp2' from 'jp2' (/usr/bin/jp2.py)
img2pdf-git has been rebuilt. No effect.
fbrennan commented on 2018-10-02 03:52 (UTC)
I think lossy mode should still be selectable because it's only dangerous in certain situations and leads to really small files otherwise. It just shouldn't be default.
jbarlow commented on 2018-10-01 18:33 (UTC)
@bsdice: I'm aware of the JBIG2 6/8 issue. However, I never intended to enable lossy mode. I attribute the issue to the help text of jbig2enc being misleading. I had to inspect the jbig2enc source to confirm it would indeed select lossy encoding.
In any case it is an easy fix to switch to lossless JBIG2, which still gets better results than CCITT G4, so I will do that in the next release. I haven't decided if I will keep lossy mode.
Generally it is ideal to report upstream issues to upstream since users other than ArchLinux are affected. It so happens I subscribe to the AUR comments, but ocrmypdf is deployed in a lot of places I don't follow.
@fbrennan: I recommend just waiting till the next version.
fbrennan commented on 2018-10-01 10:42 (UTC)
Should the PKGBUILD be changed to reflect the possible danger of jbig2enc?
bsdice commented on 2018-09-29 22:03 (UTC)
Here is a cautionary note for people using this AUR for archival purposes:
The default of ocrmypdf is --optimize 1 ("do safe, lossless optimizations"). If you have jbig2enc installed, this means b/w documents will be re-encoded from CCITT G4 to JBIG2 in so-called "symbol mode", see https://github.com/jbarlow83/OCRmyPDF/blob/master/src/ocrmypdf/exec/jbig2enc.py#L42
Unfortunately it has been shown by D. Kriesel that JBIG2 is able to alter the contents of documents, e.g. by changing a "6" into an "8" due to their similarity at low resolution. In the aftermath German BSI (https://www.bsi.bund.de/DE/Publikationen/TechnischeRichtlinien/tr03138/index_htm.html), Swiss KOST (https://kost-ceco.ch/cms/index.php?id=312,569,0,0,1,0), and maybe others have issued statements forbidding JBIG2 altogether for archival purposes of legally relevant documents. Instead it is recommended to keep using lossless CCITT G4 compression.
Users of this package should therefore use this tool with "--optimize 0" (do not optimize) until further notice. Upstream should use jbig2 only at "--optimize 4" ("do dangerous aggressive lossy optimizations"), which does not exist at this point.
The drawback of G4 is of course larger file sizes, but I prefer that over having to doubt, for every document scanned, whether the numbers or letters are really what was printed in the original document.
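For scripted archival runs, the recommendation above amounts to always passing --optimize 0. A minimal sketch in Python, assuming ocrmypdf is on PATH and takes the input and output paths as positional arguments; archival_command, run_archival_ocr and the file names are hypothetical:

```python
import subprocess


def archival_command(input_pdf, output_pdf):
    """Build an ocrmypdf command line with optimization disabled."""
    return ["ocrmypdf", "--optimize", "0", str(input_pdf), str(output_pdf)]


def run_archival_ocr(input_pdf, output_pdf):
    """Run ocrmypdf without optimization; raises on a non-zero exit code."""
    subprocess.run(archival_command(input_pdf, output_pdf), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.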
sagittarius commented on 2018-09-06 08:52 (UTC) (edited on 2018-09-06 08:54 (UTC) by sagittarius)
@fbrennan No worries. We're sure you're doing your best and I'm very glad of it. And the least I can do is to report some issues as a user. My very little contribution. So thank you fbrennan. BTW, problem solved: v7.04 works great ;-)
fbrennan commented on 2018-09-04 14:33 (UTC)
@sagittarius Sorry, I am doing my best. I am new at this. I updated python-pikepdf -- updating that package should solve your problem. I'll make sure that this never happens again, I forgot how strict it is about package versions.
@jbehmel Your question has already been answered. Github archives are unusable without hacks for AUR packages. That's because they don't include the .git directory, required by python-setuptools.
jbehmel commented on 2018-09-03 13:55 (UTC)
Hey, I've just asked myself why you are not using this link: https://github.com/jbarlow83/OCRmyPDF/archive/v7.0.4.tar.gz
sagittarius commented on 2018-09-01 09:44 (UTC)
For v7.04,
$ ocrmypdf gives:
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 578, in _build_master
    ws.require(requires)
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 895, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (pikepdf 0.3.1 (/usr/lib/python3.7/site-packages), Requirement.parse('pikepdf<0.4,>=0.3.2'), {'ocrmypdf'})
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3105, in <module>
    @_call_aside
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3089, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3118, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 580, in _build_master
    return cls._build_from_requirements(requires)
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 593, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 781, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pikepdf<0.4,>=0.3.2' distribution was not found and is required by ocrmypdf
Seems there is an issue with pikepdf<0.4,>=0.3.2 for only 0.3.1 is available
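The conflict can be reproduced by hand. A simplified sketch of the comparison in Python, assuming plain dotted numeric versions only (pkg_resources follows the fuller PEP 440 rules, which this does not attempt); satisfies is a hypothetical helper:

```python
import operator
import re

_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def _parse(version):
    """Turn '0.3.1' into (0, 3, 1); only numeric components are handled."""
    return tuple(int(part) for part in version.split("."))


def _pad(left, right):
    """Pad the shorter tuple with zeros so that 1.0 compares equal to 1.0.0."""
    width = max(len(left), len(right))
    return (left + (0,) * (width - len(left)),
            right + (0,) * (width - len(right)))


def satisfies(version, specifier):
    """Check a version against a comma-separated specifier such as '<0.4,>=0.3.2'."""
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|!=|<|>)\s*([0-9.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, wanted = match.groups()
        installed, required = _pad(_parse(version), _parse(wanted))
        if not _OPS[op](installed, required):
            return False
    return True
```

With this, satisfies("0.3.1", "<0.4,>=0.3.2") is False, which matches the error above: the installed pikepdf 0.3.1 is below the required minimum.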
jbarlow commented on 2018-08-26 18:53 (UTC)
Most Linux distributions don't have jbig2enc packaged, so jbig2enc is technically optional to ease packaging. But since Arch Linux already has jbig2enc, it should be required, because there's no reason not to use it when it's available.
Further details: https://ocrmypdf.readthedocs.io/en/latest/jbig2.html
bsdice commented on 2018-08-25 20:43 (UTC)
Popular package ;-) (also great software, using it a lot)
As per comment by jbarlow on 2018-08-13 22:33 "jbig2enc should be added"
Should jbig2enc be added as a dependency or optional dependency or not at all?
fbrennan commented on 2018-08-24 07:15 (UTC)
python-pytz is a dependency of python-xmp-toolkit which is a dependency of ocrmypdf.
Using an AUR helper, such as yay, might help with packages that have many AUR dependencies. Updating one AUR package in isolation will not work.
connaisseur commented on 2018-08-24 05:52 (UTC)
Just found out that an additional dependency has arisen: python-pytz
Could you please verify / check - and possibly update the PKGBUILD?
bsdice commented on 2018-08-21 15:37 (UTC)
@sleeping Yes, as commented by me at 2018-08-13 04:55
Compare https://github.com/jbarlow83/OCRmyPDF/blob/1e23ea5364f1f39850d72e7a73d233067993dd4a/setup.py#L257
sleeping commented on 2018-08-21 12:38 (UTC)
Missing dependency: python-reportlab
sagittarius commented on 2018-08-18 14:42 (UTC) (edited on 2018-08-18 14:45 (UTC) by sagittarius)
Just for information, I had to rebuild a few packages in this order to be able to launch ocrmypdf 7.0.3:
yaourt -S python-ruffus
yaourt -S python-pikepdf
yaourt -S python-xmp-toolkit
yaourt -S img2pdf-git
$ ocrmypdf --version
7.0.3
jbarlow commented on 2018-08-18 06:48 (UTC)
@fbrennan I was using setuptools_scm_git_archive a few years ago, but it was causing some problems (details of which I don't remember), so I removed it. Maybe I should try it again.
Either way, easiest thing to do is wait for PyPI. Normally it's only 20 minutes behind Github.
fbrennan commented on 2018-08-18 06:13 (UTC)
@jbarlow At least with the way we're building it right now, we can't build Github tarballs due to a known issue in setuptools (or in Github?). Apparently setuptools puts its metadata inside the Git repository, and because Github's tarballs don't include a git repository, the package won't build from Github.
Last night I found out that I could patch your setup.py (see https://pypi.org/project/setuptools_scm_git_archive/ for details) and force it to build anyway, but I thought by that point I must be doing something wrong and you never intended for downstream packagers to do something like that, so decided to wait for PyPI.
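The difference between the archives can be checked before building. A sketch in Python, under the assumption (my understanding of the tools, not confirmed in this thread) that setuptools_scm takes the version from .git in a checkout or from PKG-INFO in a PyPI sdist, and that setuptools_scm_git_archive relies on a .git_archival.txt file; version_metadata is a hypothetical helper:

```python
from pathlib import Path


def version_metadata(source_tree):
    """List which version-bearing files or directories a source tree contains.

    The candidate names reflect an assumption about what setuptools_scm
    and setuptools_scm_git_archive look for, not a complete list.
    """
    root = Path(source_tree)
    candidates = (".git", "PKG-INFO", ".git_archival.txt")
    return [name for name in candidates if (root / name).exists()]
```

An empty list for an unpacked Github tarball would be consistent with the build failure described above, while an unpacked PyPI sdist should at least report PKG-INFO.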
jbarlow commented on 2018-08-17 18:49 (UTC)
Aww, thanks everyone. :)
v7.0.3 has been on PyPI for a few days. Normally Github and PyPI are nearly in lockstep, but Travis was having network problems last weekend and failed to deploy v7.0.3 to PyPI (which it does for me). PyPI releases are just distributions of the tagged releases on Github. It's a little better to use PyPI's sdist since it is smaller than a Github checkout.
bsdice commented on 2018-08-17 16:44 (UTC)
Check https://pypi.org/project/ocrmypdf/#files
fbrennan commented on 2018-08-17 16:36 (UTC)
Thank you everyone. Maintaining the package is the least I could do because I use ocrmypdf a lot, and I found the developer extremely cordial and helpful when I had a problem with it while OCR'ing an Esperanto PDF.
Regarding version 7.0.3, which someone flagged the package over: that version is not yet on PyPI. As far as I know, what's on Github is development, while what's on PyPI is stable. So I'm assuming 7.0.3 is beta since it's not yet on PyPI. As soon as it is on PyPI I will update the PKGBUILD.
If my understanding of this is wrong, feel free to enlighten me.
sagittarius commented on 2018-08-17 10:24 (UTC) (edited on 2018-08-17 14:33 (UTC) by sagittarius)
Thanks to the maintainers and jbarlow for this utility is clearly ULTIMATE (necessary, indispensable, decisive for manipulating PDF files).
I've used the git version of img2pdf, rebuilt some AUR packages (python-pikepdf, pybind11, pngquant...) and it works great :D
jbarlow commented on 2018-08-13 20:33 (UTC)
I'm the author of ocrmypdf and pikepdf - great to see the community here working away on the update. Several changes here are due to deprecated features being removed in Python 3.7.
As of pikepdf 0.3.1, just released today, pybind11.patch will be unnecessary.
A few comments on dependencies compared to https://pastebin.com/84Tb6K6S:
- jbig2enc should be added
- leptonica should be added explicitly (>= 1.76.0, implied by tesseract)
- qpdf should be added explicitly (>= 8.1.0, implied by pikepdf)
bsdice commented on 2018-08-13 02:55 (UTC) (edited on 2018-08-13 02:58 (UTC) by bsdice)
@fbrennan Thanks for adopting it! Glad I could help out the community.
Here are two fixes that escaped my attention:
(1) PKGBUILD of ocrmypdf is missing one depends=( ... 'python-reportlab>=3.3.0' ... )
(2) PKGBUILD of python-xmp-toolkit similarly is missing one depends=(... 'python-pytz')
Everything should be checked with namcap -i <PKGBUILD|final .xz> anyhow.
May I also suggest to you to ask the pikepdf guy on Github why pybind11.patch is needed and also if that is the correct fix.
fbrennan commented on 2018-08-13 02:22 (UTC) (edited on 2018-08-13 07:20 (UTC) by fbrennan)
Thank you for the guide @bsdice ...
I adopted the package and will push a revised package for 7.0.2. (Unfortunately, have to wait for the python-ruffus package to either be disowned or updated. Will update as soon as that's done.)
mutantmonkey commented on 2018-08-12 19:34 (UTC)
Unfortunately, I haven't had much time to maintain this package as of late. I'm orphaning it so that someone with more time can take over.
bsdice commented on 2018-08-07 23:42 (UTC)
Finally, another new package called "python-xmp-toolkit" is needed, PKGBUILD: https://pastebin.com/xcngPwUq
I have based the PKGBUILD on the git-package: https://aur.archlinux.org/packages/python-xmp-toolkit-git/
In the end, the software will work again:
$ ocrmypdf --version
7.0.2
bsdice commented on 2018-08-07 23:36 (UTC) (edited on 2018-08-07 23:45 (UTC) by bsdice)
Next, update package "python-ruffus" https://aur.archlinux.org/pkgbase/python-ruffus/ and install the updated package python-ruffus-2.7.0-1-any.pkg.tar.xz (you can imho skip the python2 package).
PKGBUILD diff: https://pastebin.com/z9Zs1wZ7
bsdice commented on 2018-08-07 23:32 (UTC)
Next, you need a new package called "python-pikepdf". PKGBUILD: https://pastebin.com/hYRVaiqT
Patch file "pybind11.patch": https://pastebin.com/7BqByXUa
This package does not yet exist in Arch, feel free to create it.
The patch will throw out install_requires and introduce pybind11 as a setup_requires item. Otherwise, this library somehow can't find pybind11 and will quit with a backtrace.
bsdice commented on 2018-08-07 23:27 (UTC)
Then, rebuild this package https://aur.archlinux.org/packages/pybind11 for Python 3.7. The fix got in just a couple of hours ago.
pacaur -S pybind11 or whatever AUR tool you prefer. If you have none, download the PKGBUILD and rebuild/install manually.
bsdice commented on 2018-08-07 23:24 (UTC)
Next, you will need the python package "img2pdf" as a dependency.
PKGBUILD: https://pastebin.com/QwptWeRZ
Create and install as usual with makepkg -Ccfi
bsdice commented on 2018-08-07 23:21 (UTC) (edited on 2018-08-07 23:22 (UTC) by bsdice)
Here is the diff for PKGBUILD: https://pastebin.com/84Tb6K6S
bsdice commented on 2018-08-07 23:18 (UTC)
This package is broken, as of today 2018/08/08. I am going to post here some information on how to build the latest 7.0.2. It will be somewhat difficult.
Here is the error message:
$ ocrmypdf
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 570, in _build_master
    ws.require(requires)
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 888, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 779, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (ruffus 2.7.0 (/usr/lib/python3.7/site-packages), Requirement.parse('ruffus==2.6.3'), {'ocrmypdf'})
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3095, in <module>
    @_call_aside
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3079, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3108, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 572, in _build_master
    return cls._build_from_requirements(requires)
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 585, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 774, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'ruffus==2.6.3' distribution was not found and is required by ocrmypdf
sagittarius commented on 2018-08-05 09:31 (UTC)
It would be great to have v7.0.2
vfrico commented on 2018-05-04 16:25 (UTC) (edited on 2018-05-15 17:54 (UTC) by vfrico)
This PKGBUILD compiles today:
https://0bin.net/paste/WhJ6LfPKH3N0YDpJ#cxC8Le-REOl/guW9WQIxb9/+daxT0a5ufp9o9Z4fXLJ
Also available as a gist:
https://gist.github.com/vfrico/9709e013f01c9a8bc384e3ea85f66c3f
john-soda commented on 2018-04-11 16:43 (UTC)
For everybody, if you want to install the newest ocrmypdf change pkgbuild file to this.
# Maintainer: mutantmonkey <aur@mutantmonkey.in>
# Contributor: Daniel Reuter <daniel.robin.reuter@googlemail.com>
pkgname=ocrmypdf
pkgver=6.1.3
pkgrel=1
pkgdesc="A tool to add an OCR text layer to scanned PDF files, allowing them to be searched"
url="https://github.com/jbarlow83/OCRmyPDF"
arch=('any')
license=('custom')
depends=('python>=3.5' 'python-cffi>=1.9.1' 'python-pillow>=4.0.0' 'python-pypdf2>=1.26' 'python-reportlab>=3.3.0' 'python-ruffus>=2.6.3' 'ghostscript>=9.15' 'qpdf>=7.0.0' 'tesseract>=3.04' 'unpaper>=6.1' 'img2pdf>=0.2.3' 'python-setuptools_scm' 'python-defusedxml>=0.5.0' 'python-pytest-runner>=3.5.0')
makedepends=('python-setuptools')
source=("https://pypi.python.org/packages/8c/6b/fbd6d134ffa0acd14ba0323d8e4acd739c27f6b1296c5983dfbe86fe821c/ocrmypdf-${pkgver}.tar.gz")
sha256sums=('9320a3913df54d94fce8db4b1ece32e557e313dc0f1a423ab4c533f49771e6c5')
package() {
  cd "${srcdir}/${pkgname}-${pkgver}"
  python setup.py install --root="$pkgdir/" --optimize=1
  install -Dm644 LICENSE "$pkgdir/usr/share/licenses/$pkgname/LICENSE"
}
john-soda commented on 2018-04-10 13:11 (UTC) (edited on 2018-04-10 13:19 (UTC) by john-soda)
I was able to install ocrmypdf 5.4.x. But when I try to update, or uninstall the old version and install the new one, I always get the following error message:
  File "/usr/lib/python3.6/site-packages/setuptools/command/easy_install.py", line 667, in easy_install
    raise DistutilsError(msg)
distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('pytest-runner')
When I install the newest ocrmypdf via git and pip,
pip3 install git+https://github.com/jbarlow83/OCRmyPDF.git
it works.
john-soda commented on 2017-11-18 11:16 (UTC)
jbarlow commented on 2017-11-09 08:15 (UTC)
mutantmonkey commented on 2017-08-12 18:20 (UTC)
rabarrett commented on 2017-08-12 17:20 (UTC)
rabarrett commented on 2017-08-08 18:46 (UTC)
rabarrett commented on 2017-08-08 18:29 (UTC)
sagittarius commented on 2017-07-26 09:46 (UTC)
mutantmonkey commented on 2017-07-24 02:39 (UTC)
sagittarius commented on 2017-01-23 19:11 (UTC) (edited on 2017-01-23 19:12 (UTC) by sagittarius)
hason commented on 2016-10-21 08:27 (UTC) (edited on 2016-10-21 08:28 (UTC) by hason)
mutantmonkey commented on 2016-02-23 05:02 (UTC)
martimcfly commented on 2016-01-08 12:44 (UTC)
mutantmonkey commented on 2016-01-08 02:57 (UTC)
martimcfly commented on 2016-01-04 00:15 (UTC)
OlafLostViking commented on 2015-10-22 14:42 (UTC)
Falkenber9 commented on 2015-10-10 09:13 (UTC)
sagittarius commented on 2015-08-12 10:51 (UTC)
Falkenber9 commented on 2015-07-27 13:45 (UTC)
allspark commented on 2015-02-08 13:20 (UTC)
dbrgn commented on 2014-10-01 05:36 (UTC)
dbrgn commented on 2014-09-23 12:59 (UTC)
sagittarius commented on 2014-09-17 23:43 (UTC)
dreuter commented on 2014-09-16 17:31 (UTC)
Chais commented on 2014-09-16 12:56 (UTC)
dreuter commented on 2014-03-09 17:44 (UTC)
p3t3r commented on 2014-03-05 17:39 (UTC)