[EXT] Re: different hyphenation between pdftex and luatex

Philip Taylor P.Taylor at Hellenic-Institute.Uk
Sat Sep 11 13:06:20 CEST 2021

David Carlisle wrote:
> The hyphenation in luatex is different in many ways to that of classic
> tex even when they use the same \patterns data, so it's not that
> surprising that you get different results for some constructs.
> Here the main issue is that, apart from some legacy compatibility,
> luatex has moved away from the (frankly weird) reliance on lowercase
> codes for determining which characters take part in hyphenation. So
> luatex sees the whole construct as a single word and looks it up using
> the patterns; pdftex sees the * as a word boundary and doesn't start
> the next word until after it sees some white space, so skips most of
> this.
> If you set the lccode of * to itself then you get the same result as luatex.

Ah yes, /mea culpa/: it is \lccode, not \catcode, that matters here
(see below).  But I would, with respect, beg to differ with your
assertion that TeX has a "frankly weird" reliance on \lccodes; when
TeX was written, every last byte counted, and if overloading the
significance of \lccode allowed other more important features to be
included, then I would suggest that the decision was a wise one at the time.
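
By way of illustration, here is a minimal plain TeX sketch of the suggestion quoted above (setting the \lccode of * to itself). The test string is hypothetical, not the construct from the original report, and no expected log output is given because the break points depend on the patterns and the format in use.

```tex
% Sketch only: run with pdftex and compare the two log entries.
% "foo*hyphenation" is an invented test string, not the one from the thread.
\showhyphens{foo*hyphenation}   % * still has \lccode 0 here

\begingroup
  \lccode`\*=`\*                % give * a nonzero \lccode, equal to itself
  \showhyphens{foo*hyphenation} % * is now an admissible character of the word
\endgroup
\bye
```

The assignment is kept inside a group so that the changed \lccode does not leak into the rest of the document.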

> TeX looks for potentially hyphenatable words by searching ahead from
> each glue item that is not in a math formula. The search bypasses
> characters whose \lccode is zero, or ligatures that begin with such
> characters; it also bypasses whatsits and implicit kern items, i.e.,
> kerns that were inserted by TeX itself because of information stored
> with the font. If the search finds a character with nonzero \lccode,
> or if it finds a ligature that begins with such a character, that
> character is called the starting letter. But if any other type of
> item occurs before a suitable starting letter is found, hyphenation
> is abandoned (until after the next glue item). Thus, a box or rule or
> mark, or a kern that was explicitly inserted by \kern or \/, must not
> intervene between glue and a hyphenatable word. If the starting
> letter is not lowercase (i.e., if it doesn’t equal its own \lccode),
> hyphenation is abandoned unless \uchyph is positive.
>
> If a suitable starting letter is found, let it be in font f.
> Hyphenation is abandoned unless the \hyphenchar of f is a number
> between 0 and 255, inclusive. If this test is passed, TeX continues
> to scan forward until coming to something that’s not one of the
> following three “admissible items”: (1) a character in font f whose
> \lccode is nonzero; (2) a ligature formed entirely from characters of
> type (1); (3) an implicit kern. The first inadmissible item
> terminates this part of the process; the trial word consists of all
> the letters found in admissible items. Notice that all of these
> letters are in font f.
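
Two of the rules quoted above can be tried out with a short plain TeX sketch, assuming a classic engine such as pdftex (luatex behaves differently, as discussed earlier in the thread). The test words are arbitrary. By the quoted rules, the second and third cases should be left unhyphenated, but the log is the thing to check.

```tex
% Sketch only: each \showhyphens call reports its result in the log.
\showhyphens{hyphenation}            % lowercase word directly after glue

% An explicit \kern between the glue and the word: by the quoted rule,
% hyphenation is abandoned until after the next glue item.
\showhyphens{\kern0pt hyphenation}

% Starting letter not lowercase and \uchyph not positive:
% by the quoted rule, hyphenation is abandoned.
{\uchyph=0 \showhyphens{Hyphenation}}
\bye
```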

/** Phil./