Bug 891833 - media-libs/leptonica-1.83.0 breaks app-text/tesseract, which breaks app-text/pdfsandwich
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: James Le Cuirot
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-23 13:48 UTC by Stefan Schmiedl
Modified: 2023-01-27 23:20 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Attachments
proposed patch (i659.patch, 328 bytes, patch)
2023-01-24 08:14 UTC, Stefan Schmiedl

Description Stefan Schmiedl 2023-01-23 13:48:16 UTC
After running `emerge @world -DuU` over the weekend, tesseract stopped processing PDF files from the scanner.

Reproducible: Always

Steps to Reproduce:
1. emerge leptonica-1.83.0
2. run pdfsandwich -verbose (fails)
3. emerge leptonica-1.82.0-r1
4. run pdfsandwich -verbose (succeeds)
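
Whether a system is in the failing state of step 2 or the working state of step 4 can be read from `tesseract -v`, which prints the Leptonica version it runs against. The following is only a sketch for a POSIX shell. The function names `leptonica_version` and `is_broken_leptonica` are invented for illustration and are not part of tesseract or Portage.

```shell
# Hypothetical helper: extract the Leptonica version from `tesseract -v`
# output supplied on stdin (the relevant line looks like " leptonica-1.83.0").
leptonica_version() {
    sed -n 's/^[[:space:]]*leptonica-\([0-9][0-9.]*\).*$/\1/p' | head -n 1
}

# Hypothetical helper: succeed only for 1.83.0, the release this bug
# reports as broken.
is_broken_leptonica() {
    [ "$1" = "1.83.0" ]
}

# Usage (assumes tesseract is installed; 2>&1 is there in case a build
# prints its version information on stderr):
#   v=$(tesseract -v 2>&1 | leptonica_version)
#   is_broken_leptonica "$v" && echo "leptonica $v is affected by this bug"
```

The check compares against the single version named in this report; it makes no claim about other Leptonica releases.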
Actual Results:  
Output from step 2 where leptonica 1.83.0 fails:

# pdfsandwich -lang deu -gray -verbose -o 'test.pdf' 20230116_095121_3.pdf
pdfsandwich version 0.1.7
Checking for convert:
convert -version
Version: ImageMagick 7.1.0-48 Q16 x86_64 20449 https://imagemagick.org
Copyright: (C) 1999 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC Modules OpenMP(4.5)
Delegates (built-in): bzlib cairo fftw fontconfig freetype gslib gvc jng jp2 jpeg ltdl lzma png ps rsvg tiff wmf x xml zip zlib
Compiler: gcc (12.2)
Checking for unpaper:
unpaper -V
7.0.0
Checking for tesseract:
tesseract -v
tesseract 5.3.0
 leptonica-1.83.0
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.3) : libpng 1.6.39 : libtiff 4.5.0 : zlib 1.2.13 : libopenjp2 2.5.0
 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.6.1 zlib/1.2.13 liblzma/5.2.9 bz2lib/1.0.8
 Found libcurl/7.87.0 OpenSSL/1.1.1s zlib/1.2.13 libidn2/2.3.4 nghttp2/1.51.0
Checking for gs:
gs -v
GPL Ghostscript 10.00.0 (2022-09-21)
Copyright (C) 2022 Artifex Software, Inc.  All rights reserved.
Checking for pdfinfo:
pdfinfo -v
pdfinfo version 23.01.0
Copyright 2005-2023 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
Checking for pdfunite:
pdfunite -v
pdfunite version 23.01.0
Copyright 2005-2023 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
Input file: "20230116_095121_3.pdf"
Output file: "test.pdf"
Number of pages in inputfile: 1
More threads than pages. Using 1 threads instead.
Processing page 1.
identify -format "%w\n%h\n"  "/tmp/pdfsandwich_tmp7852e4/pdfsandwich_inputfileb08248.pdf[0]"
convert -units PixelsPerInch  -colorspace gray -depth 8 -background white -flatten -alpha Off -density 300x300  "/tmp/pdfsandwich_tmp7852e4/pdfsandwich_inputfileb08248.pdf[0]" /tmp/pdfsandwich_tmp7852e4/pdfsandwich214f21.pgm
unpaper --overwrite  --no-grayfilter --layout none /tmp/pdfsandwich_tmp7852e4/pdfsandwich214f21.pgm /tmp/pdfsandwich_tmp7852e4/pdfsandwich20e86d_unpaper.pgm
Processing sheet #1: /tmp/pdfsandwich_tmp7852e4/pdfsandwich214f21.pgm -> /tmp/pdfsandwich_tmp7852e4/pdfsandwich20e86d_unpaper.pgm
[pgm_pipe @ 0x55b31216f9c0] Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x55b31216f9c0] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55b31216f9c0] Encoder did not produce proper pts, making some up.
convert -units PixelsPerInch -density 300x300 /tmp/pdfsandwich_tmp7852e4/pdfsandwich20e86d_unpaper.pgm /tmp/pdfsandwich_tmp7852e4/pdfsandwich2d5f64.tif
OMP_THREAD_LIMIT=1 tesseract /tmp/pdfsandwich_tmp7852e4/pdfsandwich2d5f64.tif /tmp/pdfsandwich_tmp7852e4/pdfsandwich60ad08  -l deu pdf
Error in l_generateCIDataForPdf: cid not made from file
Error during processing.
ERROR: Command "OMP_THREAD_LIMIT=1 tesseract /tmp/pdfsandwich_tmp7852e4/pdfsandwich2d5f64.tif /tmp/pdfsandwich_tmp7852e4/pdfsandwich60ad08  -l deu pdf " failed.
Terminating pdfsandwich. All temporary files are kept.
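
The log above shows that the failure does not need pdfsandwich to reproduce: the last command it runs is a plain tesseract call with the `pdf` output config, and the `l_generateCIDataForPdf` message originates in Leptonica. A minimal sketch for spotting that signature in a log follows. The helper name `has_cid_error` and the file names in the usage comment are placeholders, not anything shipped by tesseract or pdfsandwich.

```shell
# Hypothetical helper: read a tesseract or pdfsandwich log on stdin and
# succeed if it contains the Leptonica error reported in this bug.
has_cid_error() {
    grep -q 'Error in l_generateCIDataForPdf'
}

# Usage (page.tif and the output base "page" are placeholders; the tesseract
# arguments mirror the failing command in the log above):
#   OMP_THREAD_LIMIT=1 tesseract page.tif page -l deu pdf 2>&1 | has_cid_error \
#       && echo "hit the leptonica PDF generation error"
```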


Expected Results:  
Output from step 4 where leptonica-1.82.0 succeeds:
# pdfsandwich -lang deu -gray -verbose -o 'test.pdf' 20230116_095121_3.pdf
pdfsandwich version 0.1.7
Checking for convert:
convert -version
Version: ImageMagick 7.1.0-48 Q16 x86_64 20449 https://imagemagick.org
Copyright: (C) 1999 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC Modules OpenMP(4.5)
Delegates (built-in): bzlib cairo fftw fontconfig freetype gslib gvc jng jp2 jpeg ltdl lzma png ps rsvg tiff wmf x xml zip zlib
Compiler: gcc (12.2)
Checking for unpaper:
unpaper -V
7.0.0
Checking for tesseract:
tesseract -v
tesseract 5.3.0
 leptonica-1.82.0
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.3) : libpng 1.6.39 : libtiff 4.5.0 : zlib 1.2.13 : libopenjp2 2.5.0
 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.6.1 zlib/1.2.13 liblzma/5.2.9 bz2lib/1.0.8
 Found libcurl/7.87.0 OpenSSL/1.1.1s zlib/1.2.13 libidn2/2.3.4 nghttp2/1.51.0
Checking for gs:
gs -v
GPL Ghostscript 10.00.0 (2022-09-21)
Copyright (C) 2022 Artifex Software, Inc.  All rights reserved.
Checking for pdfinfo:
pdfinfo -v
pdfinfo version 23.01.0
Copyright 2005-2023 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
Checking for pdfunite:
pdfunite -v
pdfunite version 23.01.0
Copyright 2005-2023 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
Input file: "/var/schnupp/shares/lieferscheine/Eingangslieferscheine_neu/20230116_095121_3.pdf"
Output file: "test.pdf"
Number of pages in inputfile: 1
More threads than pages. Using 1 threads instead.
Processing page 1.
identify -format "%w\n%h\n"  "/tmp/pdfsandwich_tmp93c02c/pdfsandwich_inputfile09c03e.pdf[0]"
convert -units PixelsPerInch  -colorspace gray -depth 8 -background white -flatten -alpha Off -density 300x300  "/tmp/pdfsandwich_tmp93c02c/pdfsandwich_inputfile09c03e.pdf[0]" /tmp/pdfsandwich_tmp93c02c/pdfsandwich09012e.pgm
unpaper --overwrite  --no-grayfilter --layout none /tmp/pdfsandwich_tmp93c02c/pdfsandwich09012e.pgm /tmp/pdfsandwich_tmp93c02c/pdfsandwich29d08d_unpaper.pgm
Processing sheet #1: /tmp/pdfsandwich_tmp93c02c/pdfsandwich09012e.pgm -> /tmp/pdfsandwich_tmp93c02c/pdfsandwich29d08d_unpaper.pgm
[pgm_pipe @ 0x562b4dcf59c0] Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x562b4dcf59c0] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x562b4dcf59c0] Encoder did not produce proper pts, making some up.
convert -units PixelsPerInch -density 300x300 /tmp/pdfsandwich_tmp93c02c/pdfsandwich29d08d_unpaper.pgm /tmp/pdfsandwich_tmp93c02c/pdfsandwich038143.tif
OMP_THREAD_LIMIT=1 tesseract /tmp/pdfsandwich_tmp93c02c/pdfsandwich038143.tif /tmp/pdfsandwich_tmp93c02c/pdfsandwich91968e  -l deu pdf
OCR pdf generated. Renaming output file to /tmp/pdfsandwich_tmp93c02c/pdfsandwich42b301.pdf

OCR done. Writing "test.pdf"
mv "/tmp/pdfsandwich_tmp93c02c/pdfsandwich42b301.pdf" "test.pdf"

test.pdf generated.

Done.


# emerge --info
Portage 3.0.44 (python 3.9.16-final-0, !../../usr/portage/profiles/default/linux/amd64/17.1, gcc-12, glibc-2.36-r6, 5.15.80-gentoo-x86_64 x86_64)
=================================================================
System uname: Linux-5.15.80-gentoo-x86_64-x86_64-Intel-R-_Xeon-R-_Gold_5217_CPU_@_3.00GHz-with-glibc2.36
KiB Mem:     8157972 total,   2801880 free
KiB Swap:   15624188 total,  15611900 free
Timestamp of repository gentoo: Mon, 23 Jan 2023 03:31:59 +0000
Head commit of repository gentoo: 609edc8785aab21b88bd51eb48dd978a1e4ab1f2

sh bash 5.2_p15-r1
ld GNU ld (Gentoo 2.40 p1) 2.40
app-misc/pax-utils:        1.3.6-r1::gentoo
app-shells/bash:           5.2_p15-r1::gentoo
dev-lang/perl:             5.36.0-r2::gentoo
dev-lang/python:           3.9.16::gentoo, 3.11.1::gentoo
dev-lang/rust-bin:         1.66.1::gentoo
dev-util/cmake:            3.25.2::gentoo
dev-util/meson:            1.0.0::gentoo
sys-apps/baselayout:       2.9::gentoo
sys-apps/openrc:           0.46::gentoo
sys-apps/sandbox:          2.30-r1::gentoo
sys-devel/autoconf:        2.71-r5::gentoo
sys-devel/automake:        1.16.5::gentoo
sys-devel/binutils:        2.40::gentoo
sys-devel/binutils-config: 5.5::gentoo
sys-devel/clang:           15.0.7-r1::gentoo
sys-devel/gcc:             12.2.1_p20221231::gentoo
sys-devel/gcc-config:      2.10::gentoo
sys-devel/libtool:         2.4.7-r1::gentoo
sys-devel/llvm:            15.0.7::gentoo
sys-devel/make:            4.4::gentoo
sys-kernel/linux-headers:  6.1::gentoo (virtual/os-headers)
sys-libs/glibc:            2.36-r6::gentoo
Repositories:

gentoo
    location: /var/db/repos/gentoo
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/gentoo.git
    priority: -1000
    volatile: True
    sync-git-verify-commit-signature: yes

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=native"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe -march=native"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--with-bdeps=y --nospinner --keep-going --autounmask-write=y --quiet-build=y --jobs=5"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs binpkg-multi-instance buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms splitdebug strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://ftp.uni-erlangen.de/pub/mirrors/gentoo"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LEX="flex"
MAKEOPTS="-j2 -l4"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
SHELL="/bin/zsh"
USE="X acl addns ads amd64 bcmath bzip2 cairo calendar caps cgi cli crypt cups curl dri erlang exif extensions extraengine extras fam fftw fontconfig fortran ftp gd gdbm gif gmp graphviz gs headless hwdb iconv idn imap iptables ipv6 javascript jpeg jpeg2k kpathsea ldap ldb libev libevent libglvnd libtirpc libxml2 lua lzma memcache mhash mssql multilib mysqli ncurses nls nptl odbc openmp pam pbxt pcntl pcre pdo perl plotutils png profiling python rar readline ruby savedconfig seccomp smbclient smbtav2 soap sockets split-usr sqlite sqlite3 ssl suhosin svg swig syslog tcl tcpd threads tidy tiff tools truetype unicode utils vim vim-syntax vmware_guest_linux wddx webdav wide-int winbind wmf xattr xetex xml xmlreader xmlrpc xmlwriter xtradb zip zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-3 php7-4" POSTGRES_TARGETS="postgres12 postgres13" PYTHON_SINGLE_TARGET="python3_9" PYTHON_TARGETS="python3_9" RUBY_TARGETS="ruby27 ruby30 ruby31" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LANG, LC_ALL, LD, LFLAGS, LIBTOOL, LINGUAS, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS
Comment 1 James Le Cuirot gentoo-dev 2023-01-23 22:11:43 UTC
I don't use Tesseract and I don't want to be the middleman in diagnosing this, so please take the issue upstream. Tesseract is one of the main consumers of Leptonica, so I'd suggest starting with Leptonica upstream.
Comment 2 Stefan Schmiedl 2023-01-24 00:18:28 UTC
Done. See https://github.com/DanBloomberg/leptonica/issues/659
Comment 3 Stefan Schmiedl 2023-01-24 08:14:27 UTC
Created attachment 849127
proposed patch

Proposed patch; see the GitHub issue. It works for me.
Comment 4 Stefan Schmiedl 2023-01-26 21:20:40 UTC
... and as of 30 minutes ago there is a new upstream release, 1.83.1, on GitHub.
Comment 5 Larry the Git Cow gentoo-dev 2023-01-27 23:20:11 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=584602b41e3bc62cf9180fddf1c509d1d9415bd6

commit 584602b41e3bc62cf9180fddf1c509d1d9415bd6
Author:     James Le Cuirot <chewi@gentoo.org>
AuthorDate: 2023-01-27 23:19:46 +0000
Commit:     James Le Cuirot <chewi@gentoo.org>
CommitDate: 2023-01-27 23:19:46 +0000

    media-libs/leptonica: Bump to 1.83.1, drop broken 1.83.0
    
    Closes: https://bugs.gentoo.org/891833
    Signed-off-by: James Le Cuirot <chewi@gentoo.org>

 media-libs/leptonica/Manifest                                           | 2 +-
 .../leptonica/{leptonica-1.83.0.ebuild => leptonica-1.83.1.ebuild}      | 0
 2 files changed, 1 insertion(+), 1 deletion(-)