Package Details: ocrmypdf 16.2.0-1

Git Clone URL: https://aur.archlinux.org/ocrmypdf.git (read-only)
Package Base: ocrmypdf
Description: A tool to add an OCR text layer to scanned PDF files, allowing them to be searched
Upstream URL: https://github.com/ocrmypdf/OCRmyPDF
Licenses: MPL2
Submitter: dreuter
Maintainer: fbrennan (pigmonkey)
Last Packager: pigmonkey
Votes: 110
Popularity: 2.08
First Submitted: 2014-01-27 11:36 (UTC)
Last Updated: 2024-04-19 19:30 (UTC)

Pinned Comments

fbrennan commented on 2023-05-12 22:54 (UTC)

The out-of-date flag was invalid and has been removed: no new version has been released, so there is nothing to update in this package. Rebuild, as @eclairevoyant has said.

Latest Comments

allexj commented on 2022-05-08 07:28 (UTC) (edited on 2022-05-08 07:33 (UTC) by allexj)

pkg_resources.DistributionNotFound: The 'pdfminer.six!=20200720,<=20220319,>=20191110' distribution was not found and is required by ocrmypdf
free(): invalid pointer
Aborted (core dumped)

I get this even if I add the line sed -i "s|20220319|20220506|g" setup.cfg before the setup.py line.
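
One plausible reading of that error: the requirement string caps pdfminer.six at 20220319, so any newer installed release falls outside the range and ocrmypdf cannot be loaded. The sketch below reproduces that range check with plain integer comparison. The helper name satisfies_pin is made up for illustration, and treating the installed version as 20220506 is an assumption based on the sed line quoted in this thread.

```shell
# Hypothetical helper: succeed if VERSION lies in [MIN, MAX] and is not EXCLUDED.
# pdfminer.six uses date-style versions (YYYYMMDD), so integer comparison is enough here.
satisfies_pin() {
  local version=$1 min=$2 max=$3 excluded=$4
  [ "$version" -ge "$min" ] && [ "$version" -le "$max" ] && [ "$version" -ne "$excluded" ]
}

# Bounds copied from the error message: !=20200720,<=20220319,>=20191110
# The installed version (20220506) is an assumption.
if satisfies_pin 20220506 20191110 20220319 20200720; then
  echo "20220506 satisfies the pin"
else
  echo "20220506 is outside the pin"
fi
```

If the error still quotes <=20220319 after the sed line was added, the installed metadata may come from a build that did not include the change, though that is only a guess.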

drik commented on 2022-05-08 00:32 (UTC)

The line is now: sed -i "s|20220319|20220506|g" setup.cfg

NickJolly commented on 2022-04-04 08:24 (UTC)

@frankspace: Yes, it did help. Thank you once again for sharing. It was too logical and clear not to work. I just retried and successfully compiled it manually after implementing the fix in the PKGBUILD. There might have been a typo on the first try. Sorry for taking up your time, but at least it helped me start learning how to fix these kinds of annoyances by myself. Have a nice day and stay safe.

PS: the package is still out of date, so a workaround by the end user is still needed, unfortunately. At least it has not yet crashed on me under heavy usage.

frankspace commented on 2022-04-01 14:56 (UTC) (edited on 2022-04-01 14:59 (UTC) by frankspace)

@NickJolly: Sorry about that.

The purpose of the fix was to implement the upstream commit that fixed pdfminer compatibility: https://github.com/ocrmypdf/OCRmyPDF/commit/04996caac34a418cf233c0f3c8ac436b6f2b5920

I unfortunately don't have any idea how to do that with a python package by way of stuff like a git patch or whatever, but the only functional part of that commit is very simple: changing a version number in setup.cfg. Although sed's syntax occasionally ranges from opaque to outright insane, that's a pretty simple fix, because no special characters are involved and it's a unique number that occurs only once in a single file.

For context, here is my entire (as amended) package() section:

package() {
  cd "${srcdir}/${pkgname}-${pkgver}"
  # Until upstream pushes a new version, this is needed to work with the current pdfminer.
  sed -i "s|20211012|20220319|g" setup.cfg
  python setup.py install --root="$pkgdir/" --optimize=1
  install -Dm644 LICENSE "$pkgdir/usr/share/licenses/$pkgname/LICENSE.rst"
}

I just double-checked that it does compile for me, and works afterwards, in a clean chroot. I should point out that I only use AUR helpers to check for packages that need updating; I always compile stuff with makepkg. Also, I use Artix, but that really shouldn't make a difference.

Does that help?

EDIT: I see upstream is claiming their test suite fails here: https://github.com/ocrmypdf/OCRmyPDF/issues/937#issuecomment-1082721212 -- so it's possible this fix works for my (rather simple) use-cases but won't work for everyone. That, I wouldn't have a clue about.
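
As a sketch of the same edit with a safety check: PKGBUILD conventions put source modifications in prepare() rather than package(), and since the fix relies on the number occurring exactly once, the helper below verifies that before substituting. The name bump_pin is hypothetical, and the prepare() function is untested against the real package.

```shell
# Hypothetical helper: replace OLD with NEW in FILE only if OLD occurs exactly once.
bump_pin() {
  local file=$1 old=$2 new=$3 count
  # Count every occurrence of OLD in the file (gsub returns the number of matches per line).
  count=$(awk -v pat="$old" '{ n += gsub(pat, "&") } END { print n + 0 }' "$file")
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one '$old' in $file, found $count" >&2
    return 1
  fi
  sed -i "s|$old|$new|" "$file"
}

# How it might be called from a PKGBUILD (a sketch only).
prepare() {
  cd "${srcdir}/${pkgname}-${pkgver}"
  bump_pin setup.cfg 20211012 20220319
}
```

If the number is missing or appears more than once, the helper returns a non-zero status without touching the file, so a change in upstream's setup.cfg would not go unnoticed.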

NickJolly commented on 2022-04-01 14:20 (UTC)

Hi there @frankspace. The fix you kindly shared did not work for me. Would you mind elaborating on it? There must be something I am missing. Thank you

pigmonkey commented on 2022-03-30 03:07 (UTC)

This PKGBUILD tracks the upstream package from PyPI, so it will not update to 13.4.2 until upstream pushes the new release there.

https://github.com/ocrmypdf/OCRmyPDF/issues/937

frankspace commented on 2022-03-24 06:39 (UTC) (edited on 2022-03-24 06:39 (UTC) by frankspace)

The latest update to python-pdfminer breaks ocrmypdf. Until upstream puts out a new version, the fix is pretty simple: just add a line with sed -i "s|20211012|20220319|g" setup.cfg to the package() section before the line with setup.py.
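
To illustrate what that substitution does, here is a minimal sketch run against a throwaway file. The sample requirement line is an assumption modeled on the error message quoted elsewhere in this thread, not a copy of the real setup.cfg.

```shell
# Throwaway stand-in for setup.cfg; the requirement line below is hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/setup.cfg" <<'EOF'
[options]
install_requires =
    pdfminer.six!=20200720,<=20211012,>=20191110
EOF

# Same substitution as in the comment: raise the upper bound of the pin.
sed -i "s|20211012|20220319|g" "$workdir/setup.cfg"

# Show the resulting requirement line.
grep 'pdfminer.six' "$workdir/setup.cfg"
```

Only the upper bound changes; the other two constraints on the line are left alone because the pattern matches nothing else.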

malacology commented on 2022-02-13 12:10 (UTC)

@allexj, you need to install python-setuptools to solve it. img2pdf already relies on this package, so I am a little worried about your dependencies.

allexj commented on 2022-02-12 10:36 (UTC)

$ ocrmypdf
/usr/lib/python3.10/site-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: 2.0.5-build-libtorrent-rasterbar-src-libtorrent-rasterbar-2.0.5-bindings-python is an invalid version and will not be supported in a future release
  warnings.warn(
Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 33, in <module>
    sys.exit(load_entry_point('ocrmypdf==13.3.0', 'console_scripts', 'ocrmypdf')())
  File "/usr/lib/python3.10/site-packages/ocrmypdf/__main__.py", line 35, in run
    _parser, options, plugin_manager = get_parser_options_plugins(args=args)
  File "/usr/lib/python3.10/site-packages/ocrmypdf/_plugin_manager.py", line 116, in get_parser_options_plugins
    plugin_manager = get_plugin_manager(pre_options.plugins)
  File "/usr/lib/python3.10/site-packages/ocrmypdf/_plugin_manager.py", line 104, in get_plugin_manager
    pm = OcrmypdfPluginManager(
  File "/usr/lib/python3.10/site-packages/ocrmypdf/_plugin_manager.py", line 45, in __init__
    self.setup_plugins()
  File "/usr/lib/python3.10/site-packages/ocrmypdf/_plugin_manager.py", line 73, in setup_plugins
    module = importlib.import_module(name)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/lib/python3.10/site-packages/ocrmypdf/builtin_plugins/ghostscript.py", line 11, in <module>
    from ocrmypdf._exec import ghostscript
  File "/usr/lib/python3.10/site-packages/ocrmypdf/_exec/ghostscript.py", line 21, in <module>
    from PIL import Image, UnidentifiedImageError
ImportError: cannot import name 'UnidentifiedImageError' from 'PIL' (/home/allexj/.local/lib/python3.10/site-packages/PIL/__init__.py)
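
The final line of the traceback shows PIL being imported from a path under ~/.local, which suggests that a per-user copy of Pillow is shadowing the system package. A minimal sketch for spotting that situation follows. The helper name is_user_site is hypothetical, and the check is a plain path-prefix comparison rather than anything Python itself provides.

```shell
# Hypothetical helper: succeed if PATH_TO_CHECK lives under HOME_DIR/.local/.
is_user_site() {
  local path_to_check=$1 home_dir=$2
  case "$path_to_check" in
    "$home_dir"/.local/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Path modeled on the one in the traceback.
if is_user_site "/home/allexj/.local/lib/python3.10/site-packages/PIL/__init__.py" "/home/allexj"; then
  echo "PIL is coming from the user site, not from the system package"
fi
```

On a real system the path to test could be obtained with python -c 'import PIL; print(PIL.__file__)'.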

hirunatan commented on 2022-01-26 15:57 (UTC)

It might be good to notify users after installation that they need to install the tesseract-data language packages in order to use it.

https://ocrmypdf.readthedocs.io/en/latest/installation.html#arch-linux-aur
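
A sketch of how such a notice could be delivered: pacman runs a post_install function from a package's .install file after installation. The file name ocrmypdf.install and the message wording below are assumptions, not something the current package is known to ship.

```shell
# ocrmypdf.install (hypothetical file name); a PKGBUILD would reference it
# with a line such as: install=ocrmypdf.install
post_install() {
  echo "ocrmypdf needs Tesseract language data to recognize text."
  echo "Install the tesseract-data package for each language you need, e.g. tesseract-data-eng."
}

# pacman would call post_install itself; it is invoked here only to show the output.
post_install
```

The message is printed once at install time, so it would not replace documenting the requirement in the package's optional dependencies.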