CMap issue in PDF file generated with recent TeX Live + Ghostscript
shunsaku.hirata74 at gmail.com
Sat Oct 16 06:16:22 CEST 2021
Probably this is a bug in Ghostscript?
The difference between chartest5a-2020.pdf and chartest5a-2021.pdf
is that a ToUnicode CMap is attached in the 2021 version, while in the
2020 version it is not.
The reason for the difference in the pdftotext outputs seems to be that
the ToUnicode mapping information is modified in some way when the files
are processed by Ghostscript.
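To see exactly which characters are affected, the two pdftotext output
lines from the quoted report below can be compared mechanically. This is
only a rough sketch using Python's difflib; the function and variable
names are mine, and the two strings are copied from the report rather
than extracted from the PDF files.

```python
import difflib

# pdftotext output lines copied from the quoted report.
expected = "Test: « don’t finite float offer affine »."  # out-2020.pdf
actual = "Test: ń donŠt Ąnite Ćoat offer affine ż."      # out-2021.pdf


def changed_spans(a, b):
    """Return (old, new) pairs for every span where a and b differ."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    for old, new in changed_spans(expected, actual):
        codes = " ".join(f"U+{ord(c):04X}" for c in new)
        print(f"{old!r} -> {new!r} ({codes})")
```

Besides the two guillemets, the apostrophe and the fi and fl ligatures
are affected as well; each ligature comes out as a single wrong character.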
In fact, the following character mapping can be found in out-2021.pdf,
where characters <13> and <14> ("guillemot left" and "guillemot right")
are mapped to:
U+0144: LATIN SMALL LETTER N WITH ACUTE
U+017C: LATIN SMALL LETTER Z WITH DOT ABOVE
In chartest5a-2021.pdf, the PDF generated by pdflatex, they are correctly
mapped to U+00AB and U+00BB.
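For anyone who wants to check this kind of mapping themselves, here is a
minimal sketch in Python that reads the bfchar entries of a ToUnicode
CMap and names the resulting characters. It assumes the CMap stream has
already been extracted and uncompressed. The two excerpts below are
written by hand for illustration: only the codes <13>/<14> and the code
points come from the observation above, and a real CMap may also use
bfrange sections, which this sketch ignores.

```python
import re
import unicodedata

# Hypothetical excerpts, NOT copied from the actual PDF files.
CMAP_PDFLATEX = """\
2 beginbfchar
<13> <00AB>
<14> <00BB>
endbfchar
"""

CMAP_GHOSTSCRIPT = """\
2 beginbfchar
<13> <0144>
<14> <017C>
endbfchar
"""


def parse_bfchar(cmap_text):
    """Return {character code: Unicode string} for all bfchar entries."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        pairs = re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block)
        for src, dst in pairs:
            # The destination is a UTF-16BE string written in hexadecimal.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def describe(mapping):
    """Replace each mapped string by the Unicode names of its characters."""
    return {
        code: " + ".join(unicodedata.name(ch) for ch in text)
        for code, text in mapping.items()
    }


if __name__ == "__main__":
    for label, cmap in (("pdflatex", CMAP_PDFLATEX),
                        ("ps2pdf", CMAP_GHOSTSCRIPT)):
        for code, name in describe(parse_bfchar(cmap)).items():
            print(f"{label}: <{code:02X}> -> {name}")
```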
On Thu, Oct 14, 2021 at 20:30, Vincent Lefevre <vincent at vinc17.net> wrote:
> After generating a PDF file with pdflatex, I usually run ps2pdf
> (from Ghostscript) to make the PDF much smaller (thanks to the
> font conversion from Type 1 to Type 1C).
> While there were no issues with TeX Live up to 2020, CMap gets
> broken when the original PDF file has been obtained with a recent
> TeX Live version.
> For instance, consider the following .tex file:
> Test: « don't finite float offer affine ».
> I've attached 4 PDF files:
> * chartest5a-2020.pdf generated by pdflatex with
> texlive 2020.20210202-3 under Debian/unstable;
> * chartest5a-2021.pdf generated by pdflatex with
> texlive 2021.20210921-1 under Debian/unstable;
> * out-2020.pdf: generated from chartest5a-2020.pdf with ps2pdf;
> * out-2021.pdf: generated from chartest5a-2021.pdf with ps2pdf.
> The ps2pdf script actually runs:
> /usr/bin/gs -P- -dSAFER -dCompatibilityLevel=1.4 -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=out-2020.pdf -P- -dSAFER -dCompatibilityLevel=1.4 chartest5a-2020.pdf
> /usr/bin/gs -P- -dSAFER -dCompatibilityLevel=1.4 -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=out-2021.pdf -P- -dSAFER -dCompatibilityLevel=1.4 chartest5a-2021.pdf
> While the chartest5a-*.pdf files seem fine, pdftotext gives the
> following output on the out-*.pdf files:
> Test: « don’t finite float offer affine ».
> Test: ń donŠt Ąnite Ćoat offer affine ż.
> So out-2020.pdf is correct, but out-2021.pdf is not.
> Note that in both cases, the Ghostscript version is the same (9.53.3,
> but there is the same issue for this testcase with Ghostscript 9.54).
> Thus the different behavior comes from the difference between the
> TeX Live versions.
> What is causing this difference? Is this a bug in TeX Live, or only
> in Ghostscript?
> Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
> 100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)