AUR (en) - ocrmypdf


Package Details: ocrmypdf 16.10.1-1


Git Clone URL:	https://aur.archlinux.org/ocrmypdf.git (read-only)
Package Base:	ocrmypdf
Description:	A tool to add an OCR text layer to scanned PDF files, allowing them to be searched
Upstream URL:	https://github.com/ocrmypdf/OCRmyPDF
Licenses:	MPL2
Submitter:	dreuter
Maintainer:	fbrennan (pigmonkey)
Last Packager:	pigmonkey
Votes:	129
Popularity:	1.98
First Submitted:	2014-01-27 11:36 (UTC)
Last Updated:	2025-04-25 03:37 (UTC)

Dependencies (21)

ghostscript
img2pdf (img2pdf-git^AUR)
pngquant
python (python37^AUR, python311^AUR, python310^AUR)
python-deprecation
python-importlib_resources
python-packaging
python-pdfminer
python-pikepdf
python-pillow (python-pillow-simd-git^AUR)
python-pluggy
python-reportlab
python-rich
python-tqdm
tesseract (tesseract-git^AUR)
unpaper (unpaper-git^AUR)
python-build (make)
python-hatch-vcs (make)
python-installer (make)
python-wheel (make)
(1 more dependency not shown)

Required by (5)

Sources (1)

https://files.pythonhosted.org/packages/source/o/ocrmypdf/ocrmypdf-16.10.1.tar.gz

Pinned Comments

fbrennan commented on 2023-05-12 22:54 (UTC)

The out-of-date flag was invalid and has been removed without any action, because no new version has been released. There is nothing to change in this package; just rebuild it, as @eclairevoyant said.

Latest Comments


pigmonkey commented on 2021-02-04 21:15 (UTC)

I can't recreate that error. This package has a dependency for python-coloredlogs, which in turn is dependent on python-humanfriendly. Whatever AUR helper you're using should pick all that up.

Perhaps you are running an old version of python-humanfriendly. It looks like that AUR package was updated to 9.1 on 2020-12-10.

aslakstubsgaard commented on 2021-02-04 16:51 (UTC) (edited on 2021-02-04 16:52 (UTC) by aslakstubsgaard)

I did a fresh build, but I'm getting this error:

pkg_resources.DistributionNotFound: The 'humanfriendly>=9.1' distribution was not found and is required by coloredlogs

ginkel commented on 2020-10-26 10:56 (UTC)

ocrmypdf currently fails to work with the recently updated python-pdfminer package. Downgrading the package to python-pdfminer-20200726-1 works around the issue for now.

pkg_resources.DistributionNotFound: The 'pdfminer.six!=20200720,<=20200726,>=20191110' distribution was not found and is required by ocrmypdf
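Both this error and the humanfriendly error above come from a version requirement that the installed package does not satisfy: every comma-separated clause must hold at once, so any pdfminer.six release newer than 20200726 is rejected by the `<=20200726` cap. The sketch below illustrates that evaluation. It is a deliberately simplified, hypothetical helper (`satisfies`, `parse_version`), not the full PEP 440 logic used by pip or pkg_resources, and it only handles plain dotted numeric versions. The version 20201018 is a made-up example of a newer release.

```python
# Simplified sketch of how a specifier like "!=20200720,<=20200726,>=20191110"
# is evaluated: the version must satisfy EVERY comma-separated clause.
# Hypothetical helper; not pip/pkg_resources' full PEP 440 implementation.
import operator

_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text: str) -> tuple:
    """Turn '9.1' into (9, 1) and '20200726' into (20200726,)."""
    parts = [int(p) for p in text.strip().split(".")]
    # Drop trailing zeros so that '9.1' and '9.1.0' compare equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version: str, specifier: str) -> bool:
    """Return True if `version` meets every clause in `specifier`."""
    v = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        # Two-character operators are checked before '<' and '>'.
        for op in ("==", "!=", "<=", ">=", "<", ">"):
            if clause.startswith(op):
                bound = parse_version(clause[len(op):])
                if not _OPS[op](v, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


spec = "!=20200720,<=20200726,>=20191110"
print(satisfies("20200726", spec))  # True: the version the comment downgrades to
print(satisfies("20200720", spec))  # False: explicitly excluded
print(satisfies("20201018", spec))  # False: hypothetical newer release, above the cap
print(satisfies("1.9", ">=9.1"))    # False: too old for coloredlogs
print(satisfies("9.1", ">=9.1"))    # True
```

This is why downgrading python-pdfminer works around the failure: the downgrade brings the installed version back inside the range that ocrmypdf declared.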

pigmonkey commented on 2020-10-19 12:42 (UTC)

I still use the package, so I'm happy to continue updating or to step back. No preference.

fbrennan commented on 2020-10-18 23:02 (UTC)

Hello all.

I'm using Arch again, so I can take the package back if pigmonkey no longer wants to maintain it. :-)

But I think they've done a good job, so I could also just hand the package over to them. Or I could do nothing, but since we are both on the package again, it may be confusing who is responsible for pushing updates.

Which would you prefer?

pigmonkey commented on 2020-10-14 22:36 (UTC)

tesseract-data-osd is included with the standard tesseract Arch package.

Looking at the "Required By" section of the tesseract-data-eng package, it does not appear that it is common for other Arch packages to list it as a dependency.

If this is confusing for users, I think it would be acceptable to add it as an optional dependency, so that there is an indication at the end of the install that another package might be needed. But it may seem odd to non-English speakers if the package has an optional dependency on the English language pack, but not on whatever data pack is needed for the user's native language. I don't really want a 106-item optdepends array covering every possible language pack.

jbarlow commented on 2020-10-14 07:07 (UTC)

OCRmyPDF assumes English unless a language is specified, for example with -l fra. So strictly speaking it works, but you have to pass the option every time. The test suite also assumes English is installed. I believe most package managers have added an explicit dependency on tesseract-data-eng (or whatever it's called on that system), but some have not.

I did poll users whether to default to the system language based on locale, but surprisingly non-English users didn't like the idea.

OCRmyPDF does assume tesseract-data-osd is installed so that should be a dependency if Arch breaks that out as a separate package.
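The two requirements described in this comment (English unless `-l` is given, and osd always present) can be checked before running OCRmyPDF. The sketch below is a hypothetical helper, not part of OCRmyPDF: it assumes the Arch layout seen in the error report further down, `/usr/share/tessdata/<lang>.traineddata`, unless `TESSDATA_PREFIX` points elsewhere. Joining several languages with `+` (e.g. `-l eng+fra`) follows OCRmyPDF's documented `-l` syntax.

```python
# Hypothetical pre-flight check for OCRmyPDF's Tesseract language data.
# Assumes Arch's layout: /usr/share/tessdata/<lang>.traineddata,
# unless the TESSDATA_PREFIX environment variable names another directory.
from __future__ import annotations

import os
from pathlib import Path

DEFAULT_TESSDATA = "/usr/share/tessdata"


def installed_languages(tessdata_dir: str | None = None) -> set[str]:
    """Return language codes such as {'eng', 'osd'} found in the tessdata directory."""
    directory = Path(tessdata_dir or os.environ.get("TESSDATA_PREFIX", DEFAULT_TESSDATA))
    if not directory.is_dir():
        return set()
    # 'eng.traineddata' -> 'eng'
    return {p.stem for p in directory.glob("*.traineddata")}


def build_command(infile: str, outfile: str, available: set[str],
                  languages: list[str] | None = None) -> list[str]:
    """Build an ocrmypdf argv, failing early if required language data is missing."""
    # Without -l, OCRmyPDF falls back to English.
    wanted = languages or ["eng"]
    # OCRmyPDF also expects the orientation and script detection data (osd).
    missing = [lang for lang in [*wanted, "osd"] if lang not in available]
    if missing:
        raise FileNotFoundError(f"missing Tesseract data: {', '.join(missing)}")
    cmd = ["ocrmypdf"]
    if languages:
        # Several languages are joined with '+', e.g. -l eng+fra.
        cmd += ["-l", "+".join(languages)]
    return cmd + [infile, outfile]


# Only French and osd installed: passing -l fra works.
print(build_command("test.pdf", "test2.pdf", {"fra", "osd"}, ["fra"]))

# Only osd installed and no -l: the same situation as the 'eng' error below.
try:
    build_command("test.pdf", "test2.pdf", {"osd"})
except FileNotFoundError as exc:
    print(exc)
```

In practice `available` would come from `installed_languages()`; explicit sets are used here so the example runs on machines without Tesseract installed.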

pigmonkey commented on 2020-10-13 16:51 (UTC)

Tesseract does require a data package to be installed, but it does not have to be English. If a language is not specified, Tesseract does assume English, hence the error.

I don't think it's appropriate to include tesseract-data-eng as a dependency since that might not be the user's language.

ioan commented on 2020-10-13 13:45 (UTC)

ocrmypdf test.pdf test2.pdf
Tesseract failed to report available languages. Output from Tesseract:

Error opening data file /usr/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
List of available languages (1):
osd

It looks like it needs the English (eng) data by default.