Typesetting Ethiopic

Remarks on the typesetting of Ethiopic using a TeX variant.

I needed to typeset some text including fragments in Arabic, Hebrew and Ethiopic. There is good support for Arabic and Hebrew, but it took some effort to get a working Ethiopic setup.

Setup

First the instructions that worked for me, on a recent Linux system.

Font

First get a font. The first pointer to an Ethiopic font I encountered was to the font CODE2000.TTF found at home.att.net/~jameskass. It is a shareware font, of good quality.

I installed it in /usr/X11R6/lib/X11/fonts/truetype/ and did

# ln -s /usr/X11R6/lib/X11/fonts/truetype/CODE2000.TTF /usr/share/texmf/fonts/truetype/CODE2000.ttf
# cd /usr/share/texmf/fonts
# ttf2tfm truetype/CODE2000.ttf tfm/code2k@Unicode@
# echo 'code2k@Unicode@ CODE2000.ttf' >> /etc/ttf2pk/ttfonts.map
# texhash

Other fonts are available, both commercial and free. May report later on some of those.

Omega and OTPs

Typesetting Arabic requires support for finding the right shapes for the letters, depending on the context. This is done beautifully by the system Omega, an extension of TeX that can handle large fonts and transformations depending on context. There is also Lambda, which is to Omega what LaTeX is to TeX. I used Lambda.

Not having a keyboard with Arabic / Hebrew / Ethiopic keys, it is easiest to type a transliteration of the required text, where the language support package transforms the transliteration into the correct symbols. Several systems for the transliteration of Ethiopic exist, each in several flavours. I chose SERA.

The file ethin.otp converts transliterated text:

% input: 8-bit
input: 1;

% output: 16-bit
output: 2;

expressions:
% Sorted for decreasing length, then increasing Unicode
%
``' `1' `0' `0' `0' `0' => "\CharLet{7C}";
``' `s' `W' `a' => "\CharLET{27}";
``' `1' `0' `0' => "\CharLet{7B}";
`l' `W' `a' => "\CharLET{0F}";
`H' `W' `a' => "\CharLET{17}";
`m' `W' `a' => "\CharLET{1F}";
``' `s' `e' => "\CharLET{20}";
...
`P' => "\CharLet{35}";
`S' => "\CharLet{3D}";
`f' => "\CharLet{4D}";
`p' => "\CharLet{55}";
`:' => "\CharLet{61}";
`,' => "\CharLet{63}";
`;' => "\CharLet{64}";
. => \1;

It is the composition of two transformations: eth2uni.otp that converts SERA to Unicode:

...
``' `1' `0' `0' `0' `0' => @"137C;
``' `s' `W' `a' => @"1227;
``' `1' `0' `0' => @"137B;
...

and uni2font.otp that converts Unicode to font positions, via

@"1200 => "\CharLET{00}";
@"1201 => "\CharLET{01}";
...
@"12FF => "\CharLET{FF}";
@"1300 => "\CharLet{00}";
...
@"137C => "\CharLet{7C}";

(Probably there is a more compact way to write this, but something like

@"1200-@"12FF => "\CharLET{" #((\1 div: @"10)-@"F0) #((\1 mod: @"10)+@"30) "}";

is not good enough because one has to test for hex digits larger than 9. One could redefine the macros to work in decimal.)

Given these *.otp files, one converts them to *.ocp and installs:

# otp2ocp ethin.otp
# cp ethin.otp /usr/share/texmf/omega/otp/oethiopic
# cp ethin.ocp /usr/share/texmf/omega/ocp/oethiopic
# texhash

It would be cleaner to use eth2uni.otp and uni2font.otp separately, but that works only on small pages, and lambda segfaults on big pages. Folding the two transformations into ethin.otp makes things work on the (still rather small) files I have tried so far.

The resulting mapping is shown in the two tables below.

An input file

Stuff belongs in a style file, but that is for later. Here the input that makes things work.

\documentclass[a4paper]{article}
\usepackage[english]{babel}
\usepackage{omega}
\usepackage[LET,Let,T1,OT1]{fontenc}

\ocp\EthIn=ethin
\ocplist\EthiopicOCP=
  \addbeforeocplist 1 \EthIn
  \nullocplist

\def\CharLET#1{\fontencoding{LET}\selectfont\char"#1 }
\def\CharLet#1{\fontencoding{Let}\selectfont\char"#1 }

\newenvironment{ethiopic}{\fontencoding{LET}\fontfamily{c2000}\selectfont%
                          \pushocplist\EthiopicOCP}%
                         {\popocplist}
\newcommand\geez[1]{\Large\begin{ethiopic}#1\end{ethiopic}}

\begin{document}
\geez{ fedel }
\end{document}

The resulting output is

Ethiopic

Ethiopic uses a syllabary with more than 256 glyphs. This means that one needs two 256-char fonts to cover it. In other words, there will be font changes inside words, and it is most convenient to have a setup where each character says from what font it must be taken.

(That is the reason for the CharLET and CharLet macros above.)

Characters are numbered by the Unicode range U+1200 up to U+137C.

The Unicode book has in version 3.0 the wrong glyph for U+125C. In version 4.0 this is corrected.

Characters are named in the Unicode book by names like ETHIOPIC SYLLABLE HA.

Characters are often represented using the SERA System for representing Ethiopic in ASCII. This system uses transliterations of 1-6 bytes per symbol. See the tables above.

Syllables come in series of length 7 or 8 or 12.

The vowels of the syllables in a series of length 7 (or 8) are indicated by A, U, I, AA, EE, E, O (and WA) in the Unicode names. SERA uses in these cases e, u, i, a, E, -, o, Wa, respectively, where - stands for the empty string. When no written consonant precedes, that is, for U+12A5, the ETHIOPIC SYLLABLE GLOTTAL E, the SERA system uses I.

The vowels of the syllables in a series of length 12 are indicated by A, U, I, AA, EE, E, O, WA, WI, WAA, WEE, WE in the Unicode names. SERA uses in these cases e, u, i, a, E, -, o, We, Wi, Wa, WE, W, respectively.

(Thus: -Wa corresponds to WA in a series of 8, for example lWa is ETHIOPIC SYLLABLE LWA, but -Wa corresponds to WAA in a series of 12, for example kWa is ETHIOPIC SYLLABLE KWAA.)

Bugs

I used the Hebrew package makor. It works beautifully, but as given in CTAN there is one bug: the file makor/fonts/type1/makor2/ezra/Ezra.pfb should be called ezra.pfb in lower case, when one is on a case-sensitive filesystem

There is a LaTeX package called ethiop. On my system this doesn't work. Indeed, the documentation that it comes with (the file ethiodoc.tex) is no longer accepted by current LaTeX/babel setup. I don't know who is to blame. (One can get that file through LaTeX by enclosing all fragments \selectlanguage{ethiop}...\selectlanguage{english} in braces, and turning {\eth :} into {\eth :{}}.)

Bug in lambda: With two tables in the same file it crashes: Segmentation fault.

Thanks

Useful information and remarks were given by Mike Fabian, Behdad Esfahbod, Benjamin Bayart, Javier Bezos.