Package Details: ocrmypdf 16.2.0-1

Git Clone URL: https://aur.archlinux.org/ocrmypdf.git (read-only)
Package Base: ocrmypdf
Description: A tool to add an OCR text layer to scanned PDF files, allowing them to be searched
Upstream URL: https://github.com/ocrmypdf/OCRmyPDF
Licenses: MPL2
Submitter: dreuter
Maintainer: fbrennan (pigmonkey)
Last Packager: pigmonkey
Votes: 110
Popularity: 1.54
First Submitted: 2014-01-27 11:36 (UTC)
Last Updated: 2024-04-19 19:30 (UTC)

Pinned Comments

fbrennan commented on 2023-05-12 22:54 (UTC)

The flag was invalid and has been removed with no action taken as no new version was released. There's nothing to do for this package; no new release has been made. Rebuild, as @eclairevoyant has said.

Latest Comments

jbarlow commented on 2018-10-01 18:33 (UTC)

@bsdice: I'm aware of the JBIG2 6/8 issue. However, I never intended to enable lossy mode. I attribute the issue to the misleading help text of jbig2enc. I had to inspect the jbig2enc source to confirm that it would indeed select lossy encoding.

In any case, it is an easy fix to switch to lossless JBIG2, which still gets better results than CCITT G4, so I will do that in the next release. I haven't decided whether I will keep lossy mode.

Generally it is best to report upstream issues to upstream, since users on distributions other than ArchLinux are affected too. It so happens that I subscribe to the AUR comments, but ocrmypdf is deployed in a lot of places I don't follow.

@fbrennan: I recommend just waiting till the next version.

fbrennan commented on 2018-10-01 10:42 (UTC)

Should the PKGBUILD be changed to reflect the possible danger of jbig2enc?

bsdice commented on 2018-09-29 22:03 (UTC)

Here is a cautionary note for people using this AUR for archival purposes:

The default of ocrmypdf is --optimize 1 ("do safe, lossless optimizations"). If you have jbig2enc installed, this means b/w documents will be re-encoded from CCITT G4 to JBIG2 in so-called "symbol mode", see https://github.com/jbarlow83/OCRmyPDF/blob/master/src/ocrmypdf/exec/jbig2enc.py#L42

Unfortunately, D. Kriesel has shown that JBIG2 is able to alter the contents of documents, e.g. by changing a "6" into an "8" due to their similarity at low resolution. In the aftermath, the German BSI (https://www.bsi.bund.de/DE/Publikationen/TechnischeRichtlinien/tr03138/index_htm.html), the Swiss KOST (https://kost-ceco.ch/cms/index.php?id=312,569,0,0,1,0), and maybe others have issued statements forbidding JBIG2 altogether for archiving legally relevant documents. Instead, it is recommended to keep using lossless CCITT G4 compression.

Users of this package should therefore use this tool with "--optimize 0" (do not optimize) until further notice. Upstream should use jbig2 only at "--optimize 4" ("do dangerous aggressive lossy optimizations"), which does not exist at this point.

The drawback of G4 is of course larger file sizes, but I prefer that over having to doubt, for every scanned document, whether the numbers and letters are really what was printed in the original.
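The "--optimize 0" recommendation above could be scripted so that it is never forgotten. What follows is only a minimal sketch: the helper name archival_command and the file names are hypothetical, and the only ocrmypdf option it relies on is the "--optimize" flag quoted in this comment.

```python
import shlex


def archival_command(input_pdf, output_pdf):
    """Build an ocrmypdf argument list with optimization disabled.

    "--optimize 0" is the setting recommended in the comment above;
    the function name and the file names are illustrative only.
    """
    return ["ocrmypdf", "--optimize", "0", input_pdf, output_pdf]


# Show the command line that would be run. Pass the list to
# subprocess.run(..., check=True) to actually execute it.
print(shlex.join(archival_command("scan.pdf", "scan-ocr.pdf")))
```

Building the command as a list rather than a string avoids shell quoting problems with file names that contain spaces.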

sagittarius commented on 2018-09-06 08:52 (UTC) (edited on 2018-09-06 08:54 (UTC) by sagittarius)

@fbrennan No worries. We're sure you're doing your best and I'm very glad of it. The least I can do as a user is to report some issues. My very little contribution. So thank you, fbrennan. BTW, problem solved: v7.0.4 works great ;-)

fbrennan commented on 2018-09-04 14:33 (UTC)

@sagittarius Sorry, I am doing my best. I am new at this. I updated python-pikepdf; updating that package should solve your problem. I'll make sure that this never happens again. I forgot how strict ocrmypdf is about package versions.

@jbehmel Your question has already been answered. GitHub archives are unusable for AUR packages without hacks. That's because they don't include the .git directory, which python-setuptools requires.

jbehmel commented on 2018-09-03 13:55 (UTC)

Hey, I was just wondering why you are not using this link: https://github.com/jbarlow83/OCRmyPDF/archive/v7.0.4.tar.gz

sagittarius commented on 2018-09-01 09:44 (UTC)

For v7.0.4,

$ ocrmypdf gives:

    Traceback (most recent call last):
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 578, in _build_master
        ws.require(__requires__)
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 895, in require
        needed = self.resolve(parse_requirements(requirements))
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 786, in resolve
        raise VersionConflict(dist, req).with_context(dependent_req)
    pkg_resources.ContextualVersionConflict: (pikepdf 0.3.1 (/usr/lib/python3.7/site-packages), Requirement.parse('pikepdf<0.4,>=0.3.2'), {'ocrmypdf'})

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/bin/ocrmypdf", line 6, in <module>
        from pkg_resources import load_entry_point
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3105, in <module>
        @_call_aside
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3089, in _call_aside
        f(*args, **kwargs)
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 3118, in _initialize_master_working_set
        working_set = WorkingSet._build_master()
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 580, in _build_master
        return cls._build_from_requirements(__requires__)
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 593, in _build_from_requirements
        dists = ws.resolve(reqs, Environment())
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 781, in resolve
        raise DistributionNotFound(req, requirers)
    pkg_resources.DistributionNotFound: The 'pikepdf<0.4,>=0.3.2' distribution was not found and is required by ocrmypdf

It seems the requirement pikepdf<0.4,>=0.3.2 cannot be satisfied, because only pikepdf 0.3.1 is available.
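To see why pikepdf 0.3.1 fails the requirement 'pikepdf<0.4,>=0.3.2', the range check can be reproduced by hand. This is a simplified sketch, not how pkg_resources implements it: the helpers parse_version and satisfies are hypothetical and handle only plain dotted numeric versions, not pre-releases or other special version forms.

```python
def parse_version(text):
    """Turn a plain dotted version such as "0.3.1" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, minimum, below):
    """Return True if minimum <= installed < below (numeric versions only)."""
    version = parse_version(installed)
    return parse_version(minimum) <= version < parse_version(below)


# The installed pikepdf from the traceback is below the required minimum.
print(satisfies("0.3.1", "0.3.2", "0.4"))  # False
# The lowest version that would satisfy the requirement.
print(satisfies("0.3.2", "0.3.2", "0.4"))  # True
```

Any 0.3.x release from 0.3.2 onward passes the check, which matches the later report that updating python-pikepdf solved the problem.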

jbarlow commented on 2018-08-26 18:53 (UTC)

Most Linux distributions don't have jbig2enc packaged, so jbig2enc is technically optional to ease packaging. But since ArchLinux already has jbig2enc, it should be required, because there's no reason not to use it when it's available.

Further details: https://ocrmypdf.readthedocs.io/en/latest/jbig2.html

bsdice commented on 2018-08-25 20:43 (UTC)

Popular package ;-) (also great software, using it a lot)

As per comment by jbarlow on 2018-08-13 22:33 "jbig2enc should be added"

Should jbig2enc be added as a dependency or optional dependency or not at all?

fbrennan commented on 2018-08-24 07:15 (UTC)

python-pytz is a dependency of python-xmp-toolkit which is a dependency of ocrmypdf.

Using an AUR helper, such as yay, might help with packages that have many AUR dependencies. Updating one AUR package in isolation will not work.
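The reason a helper is useful here is that dependencies have to be built before the packages that need them. As a rough illustration, and not a description of how yay works internally, the chain named above can be ordered with Python's standard graphlib module (Python 3.9 or later). The graph contains only the two dependency edges mentioned in this comment.

```python
from graphlib import TopologicalSorter

# Map each package to the set of packages it depends on.
# Only the edges named in the comment above are listed.
dependencies = {
    "ocrmypdf": {"python-xmp-toolkit"},
    "python-xmp-toolkit": {"python-pytz"},
    "python-pytz": set(),
}

# static_order() yields every package after all of its dependencies.
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order)
# ['python-pytz', 'python-xmp-toolkit', 'ocrmypdf']
```

Updating ocrmypdf alone skips the first two steps of this order, which is why an isolated update can fail.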