# [EXT] Re: different hyphenation between pdftex and luatex

Philip Taylor P.Taylor at Hellenic-Institute.Uk
Sat Sep 11 13:06:20 CEST 2021

David Carlisle wrote:
> The hyphenation in luatex differs in many ways from that of classic
> tex, even when they use the same \patterns data, so it's not that
> surprising that you get different results for some constructs.
>
> Here the main issue is that, apart from some legacy compatibility,
> luatex has moved away from the (frankly weird) reliance on lowercase
> codes for determining which characters take part in hyphenation. So
> luatex sees the whole construct as a single word and looks it up using
> the patterns; pdftex sees the * as a word boundary and doesn't start
> the next word until after it sees some white space, so it skips most
> of this.
>
> If you set the \lccode of * to itself then you get the same result as luatex.
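
A minimal plain TeX sketch of the \lccode change suggested above, for anyone who wants to try it with pdftex. The test word is hypothetical and not taken from the original report; \showhyphens writes the hyphenation points it finds to the log, so nothing about the expected breaks is asserted here.

```tex
% Plain TeX sketch; run with pdftex. The test word below is hypothetical.
\lccode`\*=`\*   % give * a nonzero \lccode (equal to itself), so that
                 % classic TeX no longer ends the word at the *
\showhyphens{alpha*hyphenation}
\bye
```

Removing the \lccode line and rerunning should show how classic TeX treats the * as the end of the word.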

Ah yes, /mea culpa/: it is \lccode, not \catcode, that matters here
(see below). But I would, with respect, beg to differ with your
assertion that TeX has a "frankly weird" reliance on \lccodes; at the
time that TeX was written, every last byte counted, and if overloading
the significance of \lccode allowed other more important features to be
included, then I would suggest that the decision was a wise one at the time.

> TeX looks for potentially hyphenatable words by searching ahead from
> each glue item that is not in a math formula. The search bypasses
> characters whose \lccode is zero, or ligatures that begin with such
> characters; it also bypasses whatsits and implicit kern items, i.e.,
> kerns that were inserted by TeX itself because of information stored
> with the font. If the search finds a character with nonzero \lccode,
> or if it finds a ligature that begins with such a character, that
> character is called the starting letter. But if any other type of item
> occurs before a suitable starting letter is found, hyphenation is
> abandoned (until after the next glue item). Thus, a box or rule or
> mark, or a kern that was explicitly inserted by \kern or \/, must not
> intervene between glue and a hyphenatable word. If the starting letter
> is not lowercase (i.e., if it doesn’t equal its own \lccode),
> hyphenation is abandoned unless \uchyph is positive.
>
> If a suitable starting letter is found, let it be in font f.
> Hyphenation is abandoned unless the \hyphenchar of f is a number
> between 0 and 255, inclusive. If this test is passed, TeX continues to
> scan forward until coming to something that’s not one of the following
> three “admissible items”: (1) a character in font f whose \lccode is
> nonzero; (2) a ligature formed entirely from characters of type (1);
> (3) an implicit kern. The