# [tex-live] Hyphenation patterns, Unicode, XeTeX, and language.dat

Jonathan Kew jonathan_kew at sil.org
Thu Aug 17 16:59:20 CEST 2006

On 17 Aug 2006, at 2:17 pm, Morten Høgholm wrote:

> On Thu, 17 Aug 2006 14:51:35 +0200, Jonathan Kew
> <jonathan_kew at sil.org> wrote:
>
>> Then we modify the "language.__.dat" files in texmf/tex/generic/
>> config to refer to the xu- wrapper files (in the cases where one is
>> necessary), and the pre-built "language.dat" will change similarly.
>>
>> The net result will be that standard 8-bit TeX will load exactly the
>> same patterns as it currently does (it'll just do some extra \input
>> operations during format creation, but this is insignificant), and
>> XeTeX will load the same set of patterns, but encoded for use with
>> Unicode text.
>>
>> Before actually making changes to something as central as
>> language.dat, however, I'd like to hear any concerns or objections to
>> this proposed strategy, or alternative suggestions that could make
>> things simpler for us all.
>
> Rather than changing language.dat, wouldn't it be easier to add a
> hook like \patternfileprefix in hyphen.cfg and change
> \process@language accordingly:
>
> \def\process@language#1 #2 #3/{%
>   ...
>   \input \patternfileprefix#2\relax
>   ...}
>
> Then a format like XeTeX can just define it in the ini file or
> elsewhere.

That would certainly be a possible solution. (Actually, I already
looked at hyphen.cfg to see if such a hook happened to exist, as I
wondered about using such an approach.)

However, it would mean providing a prefixed "wrapper" for *every*
pattern file, even those like hyphen.tex or ukhyphen.tex which are
pure ASCII (or Latin-1 letters represented with ^^xx codes, which
equate directly to Unicode codepoints). Close to half the pattern
files in TL's texmf/tex/generic/hyphen are currently "safe" files of
this nature. So I'm not sure this is really easier/better than
changing the (single-line) language.__.dat files.
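
To make the wrapper idea concrete, here is a rough sketch of what one such
file might look like. The names xu-xxhyph.tex and xxhyph.tex are
hypothetical, and the single remapping shown (the T1 slot "FF for ß, mapped
to U+00DF) is only an illustration; a real wrapper would need to cover every
non-ASCII slot that the underlying pattern file uses, and would have to
cooperate with whatever catcode setup that file does itself.

```tex
% xu-xxhyph.tex -- hypothetical wrapper around a hypothetical
% T1-encoded pattern file xxhyph.tex (both names are assumptions).
\begingroup
\expandafter\ifx\csname XeTeXrevision\endcsname\relax
  % Not XeTeX: load the patterns exactly as before.
\else
  % XeTeX: while the patterns are read, make the T1 code for
  % sharp s ("FF) expand to the Unicode character U+00DF.
  \catcode"FF=13
  \def^^ff{^^df}%
  % A character needs a nonzero \lccode to be accepted in \patterns.
  % This setting is local to the group; the format must also set it
  % for hyphenation at run time.
  \lccode"DF="DF
\fi
\input xxhyph.tex
\endgroup
```

The language.__.dat entry for that language would then name xu-xxhyph.tex
in place of xxhyph.tex, while entries for the "safe" files stay untouched.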

(One slightly more elaborate approach, then, would be to test whether
a file named \patternfileprefix#2 exists, and if it does not, to load
the original file without a prefix.)
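
A minimal sketch of that fallback test, assuming the \newread allocation
macro from plain TeX or the LaTeX kernel is available at that point. The
names \patterntest and \loadpatterns are invented for illustration and are
not part of hyphen.cfg; in practice the logic would sit inside
\process@language.

```tex
% Default to an empty prefix unless the format (e.g. XeTeX's ini
% file) has already defined one, such as \def\patternfileprefix{xu-}.
\ifx\patternfileprefix\undefined \def\patternfileprefix{}\fi

\newread\patterntest % hypothetical stream used only for the test

% #1 is the pattern file name as given in language.dat.
\def\loadpatterns#1{%
  \openin\patterntest=\patternfileprefix#1\relax
  \ifeof\patterntest
    % No prefixed wrapper found: load the original file.
    \closein\patterntest
    \input #1\relax
  \else
    % A prefixed wrapper exists: load it instead.
    \closein\patterntest
    \input \patternfileprefix#1\relax
  \fi}
```

With this in place, wrappers would only be needed for the pattern files
that are not already safe for Unicode, and language.dat could stay as it is.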

The other factor is that I've primarily been looking at solutions
that could be implemented entirely at the TeX Live (or other
distribution) level, without touching any of the canonical LaTeX or
Babel files. But if there's agreement that it would be preferable to
add such a hook, I'd be happy to go that way.

Further thoughts, anyone?

JK