Updated: 2/02/2003; 11:19:27 PM
Stephen Rapley
    notes

Wednesday, 22 January 2003

Arabic letters on screen (part 2)

This is a test of how the browser handles Unicode Arabic letters from the lower part of the Arabic code chart, i.e. starting at hex 060C. In this case they are the letters that make up the word مستقبل, or mustaqbil, a word for a radio receiver.

They are miim (hex 0645), siin (0633), taa (062A), qaaf (0642), baa (0628) and laam (0644). These letters correspond to the decimal HTML character entities: &#1605; = م | &#1587; = س | &#1578; = ت | &#1602; = ق | &#1576; = ب | &#1604; = ل
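The decimal entity is just the hex code point rewritten in base 10 (0645 hex = 1605). A small sketch using Python's standard unicodedata module lists each letter's code point, entity and official Unicode name; the function name describe() is my own, and note that Unicode's names (SEEN, TEH, BEH) use a different transliteration from the one above.

```python
import unicodedata

WORD = "\u0645\u0633\u062A\u0642\u0628\u0644"  # مستقبل, written as escapes

def describe(word):
    """Return one row per letter: hex code point, decimal entity, Unicode name."""
    rows = []
    for ch in word:
        cp = ord(ch)
        rows.append((f"U+{cp:04X}", f"&#{cp};", unicodedata.name(ch)))
    return rows

for hex_cp, entity, name in describe(WORD):
    print(hex_cp, entity, name)  # e.g. U+0645 &#1605; ARABIC LETTER MEEM
```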

With a Unicode font, the browser automatically lays the Arabic letters out right-to-left, even within a line of English running the other way.
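The right-to-left behaviour comes from the Unicode Bidirectional Algorithm, which the browser applies at display time: every character carries a bidirectional class, and runs of right-to-left characters are drawn reversed relative to the surrounding left-to-right text. The text itself stays in reading order. This sketch only reports the classes via unicodedata.bidirectional(); it does not implement the reordering, and the sample sentence is made up.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs, skipping spaces for readability."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text if not ch.isspace()]

# A made-up line of English containing our Arabic word.
line = "a radio receiver is \u0645\u0633\u062A\u0642\u0628\u0644 in Arabic"

for ch, cls in bidi_classes(line):
    # 'L' = strong left-to-right (Latin), 'AL' = strong right-to-left Arabic letter
    print(repr(ch), cls)

# In memory the word is in logical (reading) order: meem comes first,
# even though the browser draws it at the right-hand end of the word.
print(line.index("\u0645") < line.index("\u0644"))
```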

Sidebar: One of the fascinating structural features of Arabic (and presumably Hebrew) is its "consonantal root system". Combinations of consonants carry core concepts, which are modified by adding other letters or changing the vowels. The classic example is k-t-b, which carries the idea of writing: kataba = to write, kitaab = book, maktab = office, maktaba = library, kaatib = clerk, etc.

Interestingly, the same written word مستقبل also means "the future" (mustaqbal), while مستعمل (mustaʿmil), "user", is built on the same musta- pattern from a different root, ʿ-m-l.
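The root-and-pattern idea can be modelled as a toy: treat a pattern as a template in which the digits 1, 2 and 3 stand for the three root consonants. This notation and the apply_pattern() helper are my own invention for illustration; real Arabic morphology has many complications (weak roots, assimilation) that this ignores. The last two lines apply the unvowelled prefix مست to the roots q-b-l and ʿ-m-l.

```python
def apply_pattern(root, pattern):
    """Slot the consonants of a three-letter root into a pattern.

    Digits 1, 2 and 3 in `pattern` mark where the first, second and third
    root consonants go; every other character is copied unchanged.
    """
    slots = {"1": root[0], "2": root[1], "3": root[2]}
    return "".join(slots.get(ch, ch) for ch in pattern)

# k-t-b, in the simplified transliteration used above
for pattern, gloss in [("1a2a3a", "to write"), ("1i2aa3", "book"),
                       ("ma12a3", "office"), ("ma12a3a", "library"),
                       ("1aa2i3", "clerk")]:
    print(apply_pattern("ktb", pattern), "=", gloss)  # kataba, kitaab, ...

# The same idea in unvowelled Arabic script: prefix مست + three root letters
print(apply_pattern("\u0642\u0628\u0644", "\u0645\u0633\u062A123"))  # q-b-l -> مستقبل
print(apply_pattern("\u0639\u0645\u0644", "\u0645\u0633\u062A123"))  # ʿ-m-l -> مستعمل
```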

When these Unicode Arabic letters are separated by spaces, each one is displayed in its isolated form. In these examples the font size has been increased to 36 point to aid legibility for those (like me) unused to the script, and the text is coded for a font called "Traditional Arabic". Earlier examples of Arabic in this document rely on the browser's default font, usually Times New Roman, being able to handle Unicode.

م س ت ق ب ل
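A sketch of the kind of markup involved: the spaced-out word as decimal entities, wrapped in a span set to Traditional Arabic at 36 point. The helper names, and the dir="rtl" and lang="ar" attributes, are my assumptions, not necessarily the markup actually used in this post.

```python
WORD = "\u0645\u0633\u062A\u0642\u0628\u0644"  # مستقبل

def entities(text):
    """Encode every non-ASCII character as a decimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

def sample_span(text, font="Traditional Arabic", size="36pt"):
    """Wrap text in a right-to-left span styled with the given font and size."""
    style = f"font-family: '{font}'; font-size: {size}"
    return f'<span dir="rtl" lang="ar" style="{style}">{entities(text)}</span>'

# Letters separated by spaces, so each is drawn in its isolated form
print(sample_span(" ".join(WORD)))
```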

When the same entities are written without spaces between them, each letter takes the appropriate form (initial, medial, final or isolated) according to its position in the word and the orthographic rules of the language. The characters are unchanged; only the shapes the browser draws differ.

مستقبل
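Unicode does also encode the individual letter shapes, as compatibility characters in the Arabic Presentation Forms-B block (U+FE70 to U+FEFF), and NFKC normalization folds each one back to its base letter. This sketch uses that mapping, via Python's unicodedata module, to list the four shapes of each letter in مستقبل. It performs only lookups, not shaping; in practice it is the base letters that get stored, with shaping left to the browser.

```python
import unicodedata

WORD = "\u0645\u0633\u062A\u0642\u0628\u0644"  # مستقبل

def presentation_forms(letter):
    """List the Presentation Forms-B characters (U+FE70..U+FEFF) whose
    NFKC compatibility mapping is exactly `letter`."""
    forms = []
    for cp in range(0xFE70, 0xFF00):
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" for unassigned code points
        if name and unicodedata.normalize("NFKC", ch) == letter:
            forms.append((f"U+{cp:04X}", name))
    return forms

for letter in WORD:
    print(unicodedata.name(letter))
    for cp, name in presentation_forms(letter):
        print("   ", cp, name)  # isolated, final, initial and medial shapes
```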

Of course, the simplest way to create Arabic text in Windows is to install the appropriate regional support via the Control Panel and your install disks. A cumbersome workaround is to use Character Map with a Unicode font; common ones are Times New Roman, Lucida Sans Unicode, Tahoma, Microsoft Sans Serif and Courier New.

Here's our word of the day in each of them, assuming they're installed on your system:

Times New Roman:
مستقبل

Lucida Sans Unicode:
مستقبل

Tahoma:
مستقبل

Microsoft Sans Serif:
مستقبل

Courier New:
مستقبل
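A comparison like the one above can be generated rather than typed by hand. A minimal sketch, assuming plain HTML with inline font-family styles; the helper names are hypothetical and the output is not necessarily the markup this page used. If a font isn't installed, the browser substitutes another, so results vary from system to system.

```python
WORD = "\u0645\u0633\u062A\u0642\u0628\u0644"  # مستقبل

FONTS = ["Times New Roman", "Lucida Sans Unicode", "Tahoma",
         "Microsoft Sans Serif", "Courier New"]

def entities(text):
    """Encode each non-ASCII character as a decimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

def font_samples(word, fonts):
    """Return one HTML paragraph per font, showing the word in that font."""
    lines = []
    for font in fonts:
        style = f"font-family: '{font}'"
        lines.append(f'<p>{font}:<br><span style="{style}">{entities(word)}</span></p>')
    return "\n".join(lines)

print(font_samples(WORD, FONTS))
```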

What's eluding me now is how to encode vowel markings to aid pronunciation. Again, this probably depends on the Control Panel regional settings and the shortcuts for adding diacritics to the main letters.
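For what it's worth, at the Unicode level the vowel marks are separate combining characters (fatha U+064E, damma U+064F, kasra U+0650 and sukun U+0652, among others), placed after the letter they sit on, so they can also be written as entities such as &#1615; for damma. The sketch below builds a vowelled mustaqbil this way. It does not address typing the marks from a Windows keyboard, and the vowelling shown is my own reading of the word.

```python
import unicodedata

FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
WORD = "\u0645\u0633\u062A\u0642\u0628\u0644"  # مستقبل, unvowelled

# mu-s-ta-q-bi-l: each letter followed by its mark ("" = no mark on final laam)
pairs = [("\u0645", DAMMA), ("\u0633", SUKUN), ("\u062A", FATHA),
         ("\u0642", SUKUN), ("\u0628", KASRA), ("\u0644", "")]

vowelled = "".join(letter + mark for letter, mark in pairs)

for ch in vowelled:
    # combining() is non-zero for marks that attach to the preceding letter
    print(f"U+{ord(ch):04X} &#{ord(ch)}; {unicodedata.name(ch)} "
          f"combining={unicodedata.combining(ch)}")

# Dropping the combining marks gives back the plain six-letter word
stripped = "".join(ch for ch in vowelled if not unicodedata.combining(ch))
print(stripped == WORD)
```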

 
11:27:02 PM
See these topics too: Arabic writing 


Very long-term projects

Century-long scholarly projects. "Begun in 1894, finished sometime in the 21st century: some projects occupy scholars over several generations."
The SZ writes about long-term projects such as the Grimm brothers' dictionary or the Thesaurus Linguae Latinae, which is set to become the most comprehensive dictionary of the Latin language.
Impressive, on the one hand, that the Germans lead the way in their traditional fields of the humanities. But, as the article also points out, nobody would start projects like these today.
Privatarchiv
|ö| = KerLone [Mosaikum 1.0]
12:01:21 PM
See these topics too: archives collections history writing language 


Copyright 2003 © Stephen Rapley