Shaping Tamil Fonts with AAT for macOS

I presented the paper below at the 8th Tamil Internet Conference held in Koeln, Germany in 2009. Mac OS X, as it was called at that time, did not support OpenType shaping until Lion (10.7). A lot has changed since. The OpenType implementation in macOS today is very good, especially for complex scripts. However, AAT is still actively supported, and I enjoy working with AAT as much as I do with OpenType. The paper below is useful for understanding the basic elements of AAT shaping.


Building Tamil Unicode Fonts for Mac OS X

(Tamil Internet 2009, Koeln, Germany. October 2009)

1.0 Introduction

The Tamil script, like all other Indic scripts, is a syllabic script.  It has 12 independent vowels (உயிர் எழுத்துகள்), 18 consonants (மெய் எழுத்துகள்), Aytham (ஆய்த எழுத்து) and compound forms (உயிர்மெய் எழுத்துகள்) that represent combinations of a consonant and a vowel¹.  In addition, there are grantha letters that are used to write words of non-Tamil origin.

The Unicode Standard (Unicode) encodes the following groups of characters²:

  a) Letters: independent vowels, consonants and aytham
  b) Dependent vowel signs
  c) Tamil numerals
  d) Various signs and symbols

Having only these encoded characters in a Tamil Unicode font (Tamil font) is not sufficient to render readable Tamil text. The compound forms are also required.

The compound forms are not encoded with single (atomic) code points. Thus, a Tamil Unicode font will contain glyphs that do not have a character code. In other words, there will be more glyphs in the font than there are Tamil characters encoded in Unicode.
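The point can be checked with a short Python sketch. It only inspects Unicode data and is not part of any font tool; the helper name describe is made up for this illustration. It shows that the compound form கி is stored as two code points, so the font must supply its single glyph through a shaping rule.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The compound form "ki" written with escapes: KA followed by VOWEL SIGN I.
for code_point, name in describe("\u0b95\u0bbf"):
    print(code_point, name)
# U+0B95 TAMIL LETTER KA
# U+0BBF TAMIL VOWEL SIGN I
```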

In addition to the glyphs, the font should contain rules that dictate the formation of compound forms from the respective consonant-vowel pairs. These rules are called shaping rules.

In a Tamil font designed for Microsoft Windows platforms, the shaping rules are defined in OpenType (OT) tables. In Mac OS X, these rules are defined with AAT tables.

Windows XP and later versions of the Windows platform include a Tamil font called Latha.  Mac OS X has included one called InaiMathi since version 10.4 (Tiger).

This paper presents the steps required to build a Tamil font for Mac OS X.

2.0 Prerequisites

In the interest of space and time, this paper assumes that the reader is familiar with the following:

  1. How Unicode defines characters (http://unicode.org)
  2. The difference between a character and a glyph
  3. Code-point order vs presentation order
  4. Using the Terminal application in Mac OS X

3.0 Glyphs in a typical Tamil font

The figure below shows the glyphs in a typical Tamil font.

Figure 1: Glyphs in a Tamil font.

The choice of glyphs may vary from font to font.  Some may have the pure consonants (base+pulli) pre-composed as with the font above.  Others may just have one pulli glyph and use kerning tables to position it above the base glyphs.  The font above has all possible combinations pre-composed for simplicity.

4.0 Adding shaping rules

Once the required glyphs are drawn, the only remaining step is to add the shaping rules.

Two shaping frameworks are popular: OpenType (OT) used in Windows (and some Linux platforms) and Apple Advanced Typography (AAT) used in Mac OS X.

The two frameworks differ in design philosophy. OT attempts to do some of the common processing in an external engine called Uniscribe. Uniscribe eliminates the need to define glyph reordering and two-part vowel handling in a Tamil font. While this appears to make things easier for the developer, it does limit flexibility. Since most font developers use a common template for shaping rules across all their fonts, the real benefit Uniscribe provides, at the expense of complexity and performance, is hard to realise.

AAT, on the other hand, provides complete freedom to the developer.  There is no script processor.  Therefore, all of the shaping rules are defined by the developer and are included in the font. Once a working font has been built, the same rules can be applied to all other fonts by simply compiling the definitions into the font. AAT is a simple and elegant framework with less rendering overhead.

4.1 Shaping rules with AAT

The rules are defined in a text file, called the Morph Input File (MIF) and compiled into the font file using Apple’s font tools³.

The command to add the rules from a MIF file is:

ftxenhancer -m <MIF filename> <TTF filename>

For example:

ftxenhancer -m my_tamil.mif my_font.ttf

5.0 Creating a MIF file

A MIF file can be created with any text editor.  The rules are defined in tables arranged in the order in which they should be executed. For example, the letter க்ஷ is not assigned a code-point in Unicode.  Since this letter has compound forms when combined with vowel signs, it is best to define the shaping rule for it beforehand.

Shaping rules are defined using glyph names and not character codes. While it is desirable to use Adobe Standard names for glyphs, the common practice is to use friendly names or adopt a naming convention that clearly describes how the glyphs are formed. There are no rules for defining glyph names. It is entirely up to the developer.

The convention used for the font in Figure 1 is as below:

  • Consonants: tgc_<unicode name>. Example: tgc_ka, tgc_mi, tgc_ttoo etc.
  • Vowels: tgv_<unicode name>. Example: tgv_a, tgv_au etc.
  • Grantha: tgg_<unicode name>. Example: tgg_sha, tgg_juu, tgg_sri etc.
  • Vowel Signs: tgm_<unicode name>. Example: tgm_a, tgm_au etc.

The following are the shaping rules that are needed for the font in Figure 1.

  1. Substitute the combinations BASE+I, BASE+II, BASE+U, BASE+UU and BASE+PULLI with their respective compound forms. The feature used for this purpose is called Ligature Substitution.
  2. Rearrange E, EE and AI vowel signs so that they appear before the BASE glyph. The feature used for this purpose is Rearrangement.
  3. Place the left and right marks of O, OO and AU vowel signs on either side of the BASE glyph. The feature used for this purpose is Insertion.

The sections below describe each of the features.

5.1 Ligature Substitution

Ligature substitution is done in two steps.

First, the க்ஷ ligature is formed by substituting the characters that make up this glyph: KA + PULLI + SSA (க + புள்ளி + ஷ ). This ensures that the glyph is already available when vowel combinations are substituted. Likewise, SRI can be substituted for SHA + PULLI + RA + VOWEL_SIGN_II  (ஶ + புள்ளி + ர +  ீ).

Second, all compound forms for I, II, U, UU and PULLI signs are substituted.

Examples of these tables are given below:

Figure 2: First table in a Tamil MIF file

Figure 3: Ligature substitution for compound forms. This table follows the earlier table so that tgg_xa is already available from the first substitution.
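The two-step ordering can be illustrated with a small Python simulation. This is a sketch of the logic only, not MIF syntax and not how AAT is implemented. Apart from tgc_ka and tgg_xa, the glyph names (tgm_pulli, tgg_ssa, tgm_i, tgc_ki, tgc_k, tgg_xi) are hypothetical names that merely follow the convention above.

```python
def apply_ligatures(glyphs, table):
    """Replace the longest matching glyph sequence at each position, left to right."""
    out, i = [], 0
    max_len = max(len(key) for key in table)
    while i < len(glyphs):
        for n in range(min(max_len, len(glyphs) - i), 0, -1):
            key = tuple(glyphs[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            # No sequence starting here matched: copy the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out

# Table 1 (illustrative): KA + PULLI + SSA forms the unencoded letter first.
CONJUNCTS = {("tgc_ka", "tgm_pulli", "tgg_ssa"): "tgg_xa"}

# Table 2 (illustrative): compound forms, including those built on tgg_xa.
COMPOUNDS = {
    ("tgc_ka", "tgm_i"): "tgc_ki",
    ("tgc_ka", "tgm_pulli"): "tgc_k",
    ("tgg_xa", "tgm_i"): "tgg_xi",
}

def shape(glyphs):
    """Run the two tables in the order in which they appear in the MIF file."""
    return apply_ligatures(apply_ligatures(glyphs, CONJUNCTS), COMPOUNDS)

print(shape(["tgc_ka", "tgm_pulli", "tgg_ssa", "tgm_i"]))  # ['tgg_xi']
```

If the two tables were run in the opposite order, KA + PULLI would be consumed by the compound-form table first and the three-glyph sequence would never match, which is why the order of tables matters.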

5.2 Rearrangement

In a string of Tamil text, vowel signs are stored after the base character. This is the same with any Indic script. For example, the word மலைநாடே is stored in memory as below:

Figure 4: Memory representation of the word மலைநாடே.

When this word is rendered, the AI and EE vowel signs need to be re-ordered so that they appear before the base glyph.  In Windows OpenType, this process is not necessary, as the Uniscribe engine will do the reordering internally.  In Mac OS X, this rule needs to be defined.  It can be easily done with the Rearrangement feature in AAT.

Rearrangement and Insertion features use state tables to mark the positions and perform a defined action when desired glyphs are seen together. Unlike rearrangements in other Indic scripts where conjuncts and consonant signs may be involved, Tamil rearrangement is a simple act of swapping the positions of the base glyph and vowel sign.

Figure 5 shows the state table required for this feature.  When a base glyph, defined in the Cons group, is seen, it is marked and the next glyph is examined. If the next glyph is a member of the RVowel group, the base and vowel sign are swapped. Since reordering is a required feature, and involves all of the base glyphs in Tamil, this table can be used in any Tamil font created for Mac OS X.

Figure 5: Rearrangement Table
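The mark-and-swap behaviour can be sketched in Python. This simulates the logic only and is not MIF syntax. For simplicity it works on Unicode characters rather than glyph names, and the CONS and RVOWEL sets are assumptions modelled on the Cons and RVowel groups named above.

```python
# Assumed stand-in for the Cons group: TAMIL LETTER KA..HA.
# The range also covers unassigned code points, which is harmless here.
CONS = {chr(cp) for cp in range(0x0B95, 0x0BBA)}

# Assumed stand-in for the RVowel group: vowel signs E, EE and AI.
RVOWEL = {"\u0bc6", "\u0bc7", "\u0bc8"}

def rearrange(glyphs):
    """Swap each base glyph with an immediately following E, EE or AI sign."""
    out = list(glyphs)
    marked = None  # index of the base glyph marked most recently
    for i, glyph in enumerate(out):
        if glyph in CONS:
            marked = i                      # mark the base and look at the next glyph
        elif glyph in RVOWEL and marked == i - 1:
            out[marked], out[i] = out[i], out[marked]
            marked = None                   # action done, return to the start state
        else:
            marked = None
    return out

# Stored order of the example word: MA, LA, AI, NA, AA, TTA, EE.
stored = list("\u0bae\u0bb2\u0bc8\u0ba8\u0bbe\u0b9f\u0bc7")
visual = rearrange(stored)
print([f"U+{ord(ch):04X}" for ch in visual])
```

In the result, AI sits before LA and EE sits before TTA, while the AA sign stays after NA because it is not in the RVOWEL set.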

5.3 Insertion

Insertion involves base glyphs with vowel signs O, OO and AU. The word கோ is represented in memory as shown below:

Figure 6: The word கோ in memory

Although three glyphs are needed to present the compound form, there are only two characters in memory. A glyph needs to be inserted somewhere in order to get the required three.

There are many techniques to do this. One of them is to draw the vowel sign OO in the font as just the kaal and insert vowel sign EE before KA.  This will provide the desired effect.  However, the glyph for vowel sign OO becomes misleading.

A more efficient technique is described in Figure 7.

Figure 7: Technique to perform insertion without changing the shape of vowel sign OO.

The steps can be performed with the following MIF file entries:

Step 1:  Insertion.  This is done with state tables:

Figure 8: State table to perform insertion of right side glyph for a two part vowel sign. Vowel sign AA is inserted for O and OO. AU Length mark is inserted for AU.

Step 2: Replacing O, OO and AU vowel sign glyphs with their respective left side signs.  This can be done with substitution as shown below:

Figure 9: Substituting O, OO and AU vowel signs with their respective left side signs.

Step 3: Rearrangement. After steps 1 and 2, the left side sign still follows the base glyph and must be moved before it. Since this is the same as the rearrangement discussed in 5.2, steps 1 and 2 can be placed just before the reordering table in the MIF file. The rearrangement for step 3 is then done by the same table that handles E, EE and AI.
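The three steps can be traced with a small Python simulation. As before, this is a sketch of the logic on Unicode characters, not MIF syntax. The table and function names are made up for the illustration. The left and right parts follow the Unicode decompositions of the O, OO and AU vowel signs.

```python
KA = "\u0b95"
AA, E, EE, AI = "\u0bbe", "\u0bc6", "\u0bc7", "\u0bc8"
O, OO, AU, AU_MARK = "\u0bca", "\u0bcb", "\u0bcc", "\u0bd7"

RIGHT_PART = {O: AA, OO: AA, AU: AU_MARK}   # glyph inserted in step 1
LEFT_PART = {O: E, OO: EE, AU: E}           # replacement used in step 2
LEFT_VOWELS = {E, EE, AI}                   # signs moved before the base in step 3

def is_consonant(ch):
    """Assumed stand-in for the Cons group: TAMIL LETTER KA..HA."""
    return 0x0B95 <= ord(ch) <= 0x0BB9

def insert_right(glyphs):
    """Step 1: insert the right side glyph after each two-part vowel sign."""
    out = []
    for glyph in glyphs:
        out.append(glyph)
        if glyph in RIGHT_PART:
            out.append(RIGHT_PART[glyph])
    return out

def substitute_left(glyphs):
    """Step 2: replace each two-part vowel sign with its left side sign."""
    return [LEFT_PART.get(glyph, glyph) for glyph in glyphs]

def rearrange(glyphs):
    """Step 3: swap a base glyph with a following E, EE or AI sign."""
    out, i = list(glyphs), 1
    while i < len(out):
        if out[i] in LEFT_VOWELS and is_consonant(out[i - 1]):
            out[i - 1], out[i] = out[i], out[i - 1]
            i += 2  # skip past the base glyph that was just moved
        else:
            i += 1
    return out

def shape_two_part(glyphs):
    return rearrange(substitute_left(insert_right(glyphs)))

# KA + OO is stored as two characters and is rendered as EE, KA, AA.
print([f"U+{ord(ch):04X}" for ch in shape_two_part([KA, OO])])
# ['U+0BC7', 'U+0B95', 'U+0BBE']
```

Because step 2 turns the two-part signs into E or EE, the final swap needs no rules of its own beyond those already required for E, EE and AI.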

6.0 Putting it all together

The completed MIF file will have tables in the following order:

  1. Substitutions for க்ஷ and ஶ்ரீ. (Fig 2)
  2. Substitutions for I, II, U, UU and Pulli forms (Fig 3)
  3. Inserting right side vowel sign for O, OO and AU (Fig 8)
  4. Substituting O, OO and AU with their respective left side signs (Fig 9)
  5. Rearranging E, EE and AI vowel signs with their base glyphs (Fig 5)

Once the MIF file is created, it can be added to the font with ftxenhancer as described in section 4.1.

References:

  1. http://en.wikipedia.org/wiki/Tamil_script
  2. http://www.unicode.org/charts/PDF/U0B80.pdf
  3. http://developer.apple.com/fonts
  4. Refer to Apple font tools documentation for details.
