Languages and Scripts

Background

TitleFactory supports several languages and scripts. This section describes the supported scripts and the degree of support that one can expect from the product.

The TitleFactory user interface supports both ASCII and non-ASCII text as of TitleFactory Release 2.0.2 BLDA. ASCII is a character encoding in which each character is represented by a single byte. In fact, ASCII defines only 128 characters. Unfortunately, a single byte cannot represent all of the world's scripts, which together contain far more than 128 characters.
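
The 128-character limit is easy to see in practice. The short Python sketch below is for illustration only and is not part of TitleFactory. It tries to encode a few strings as ASCII. Anything outside code points 0 through 127 has no ASCII representation.

```python
def ascii_bytes(text: str):
    """Return the ASCII bytes for text, or None if any character is outside ASCII."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None


# Every ASCII character fits in one byte, with a code point from 0 to 127.
print(ascii_bytes("Title"))                # b'Title'
print(len(ascii_bytes("Title")))           # 5 (one byte per character)
print(all(ord(c) < 128 for c in "Title"))  # True

# Greek, Cyrillic and Chinese text cannot be encoded as ASCII at all,
# so ascii_bytes returns None for each of these samples.
for sample in ["Ελληνικά", "Русский", "中文"]:
    print(sample, ascii_bytes(sample))
```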

TitleFactory supports non-ASCII scripts if the text is saved in UTF-8 format. Every script that we have tried, from ASCII to Greek, Russian, Chinese, and Czech, can be saved in UTF-8 format with Notepad, Word, or many other text editors. In fact, TitleFactory is used all over the world to create image text files in a variety of scripts for DVD production.

If you have a script that you believe cannot be saved in UTF-8 format, please contact us. We will see if we can provide an alternative conversion method for your particular script.

How to Process Non-ASCII Text in TitleFactory

To process text other than ASCII or extended ASCII (Latin-1, also called ISO-8859-1), encode the input text file in Unicode UTF-8 format. Many common word and text processing packages, such as Microsoft Word and Notepad, can save text in UTF-8 format.

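Unicode is a standard that assigns a unique number (a code point) to every character in the world's scripts. Those numbers can be stored in several different encodings, and UTF-8 is one of them. UTF-8 represents each character with a variable number of bytes: one byte for ASCII characters and up to four bytes for others. Note, however, that many programs (Notepad, for example) use the label "Unicode" for UTF-16, which is a different encoding. Saving a file as "Unicode" is therefore not the same as saving it as UTF-8.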
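
The variable length of UTF-8 can be checked directly. The following Python sketch is for illustration only. It prints how many bytes UTF-8 uses for single characters from different scripts. It also shows that UTF-16, which some editors label "Unicode", stores the same text with different bytes.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to store a single character."""
    return len(ch.encode("utf-8"))


samples = {
    "A": "Latin (ASCII)",
    "é": "Latin-1 accented letter",
    "Ж": "Cyrillic",
    "א": "Hebrew",
    "中": "Chinese",
    "😀": "emoji (outside the Basic Multilingual Plane)",
}

for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {description}: {utf8_length(ch)} byte(s)")

# The same letter saved as UTF-16 (little-endian) is two bytes, not one.
print("A".encode("utf-8"))      # b'A'
print("A".encode("utf-16-le"))  # b'A\x00'
```

The printed byte counts run from 1 for "A" to 4 for the emoji. A UTF-8 file containing only ASCII characters is therefore byte-for-byte identical to a plain ASCII file.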

To create images with TitleFactory using scripts other than Western European scripts (which include those of North and South America), the following settings must be correct:

  1. Within TitleFactory, the Encoding setting on the input settings window must be set to UTF-8.
  2. On the TitleFactory output settings window, the Encoding setting must be set to the appropriate script. For Western scripts, Eastern European scripts, Cyrillic scripts (such as Russian), and East Asian scripts (such as Chinese and Japanese), set this parameter to UTF-8. This tells TitleFactory that no special encoding is required.
  3. For right-to-left languages (such as Arabic and Hebrew), set the output Encoding setting to the corresponding script.
  4. Specify a font that supports the script selected on the output settings window. Many commonly used fonts, such as Arial, Times Roman, and Courier, support Unicode. Many fonts also cover the Western European and Eastern European scripts, including Cyrillic. Some fonts even support right-to-left languages such as Arabic and Hebrew, to some degree. To make sure that your font contains the characters your script needs, you can use a free program such as BabelMap to view the characters in each font.
  5. Do not use the 'UnWrapped' input text mode with scripts that have no sentence-ending punctuation. In that mode TitleFactory relies on ending punctuation to parse the text, so such scripts may cause problems.
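
Step 1 only works if the input file really is UTF-8. A quick way to check a file before loading it is sketched below in Python. This is a standalone helper and not a TitleFactory feature. The script name and the file names you pass on the command line are hypothetical.

```python
import sys
from pathlib import Path


def find_invalid_utf8(data: bytes):
    """Return (line_number, byte_offset) of the first invalid UTF-8 byte, or None if valid."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded;
        # counting the newlines before it gives a 1-based line number.
        line_number = data.count(b"\n", 0, err.start) + 1
        return line_number, err.start


def check_file(path: str) -> None:
    """Print whether the file at path is valid UTF-8."""
    problem = find_invalid_utf8(Path(path).read_bytes())
    if problem is None:
        print(f"{path}: valid UTF-8")
    else:
        line, offset = problem
        print(f"{path}: not valid UTF-8 (line {line}, byte offset {offset})")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        check_file(name)
```

For example, running it as check_utf8.py titles.txt reports the first line that contains bytes that are not valid UTF-8. When that happens, the file was often saved in another encoding, such as Latin-1 or UTF-16.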

Note that some fonts reuse the Latin-1 code positions to represent a non-Latin script. One example is the Hindi font DV-TTNandan, whose Devanagari characters occupy the positions normally reserved for Latin-1 characters. TitleFactory may work with such a file even when the Encoding setting on the input settings window is set to ASCII. It is still better to convert the file to UTF-8 and specify UTF-8 on the input settings window.
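
Converting such a file to UTF-8 can be sketched in a few lines of Python. This is a minimal example and not a TitleFactory tool. It assumes the file was saved as Latin-1. If the file was actually saved in another single-byte encoding, such as Windows code page 1252, pass that encoding name instead. The function names are hypothetical.

```python
from pathlib import Path


def to_utf8(data: bytes, source_encoding: str = "latin-1") -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8.

    Decoding with latin-1 never fails: each byte 0-255 becomes the code point
    with the same number. A font that draws Devanagari glyphs at those
    positions (as in the DV-TTNandan example) therefore still receives the
    same code points. Line endings are preserved because this works on raw bytes.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str = "latin-1") -> None:
    """Write a UTF-8 copy of the file src to dst."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))
```

For example, convert_file("titles_latin1.txt", "titles_utf8.txt") would write a UTF-8 copy of the original file. Both file names are placeholders. After converting, set the input Encoding to UTF-8.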

Also, when using a non-Western script with the Textmode set to Un-Wrapped or Wrapped, make sure that 'Ending Punctuation' is set to the characters that the script uses to end a sentence. This is necessary because TitleFactory must parse the text in these modes.
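
The reason this setting matters can be illustrated with a simplified sentence splitter in Python. This is only a model and not TitleFactory's actual parser. It splits text after any character listed as ending punctuation. If the script's sentence terminators are missing from the list, the whole text is treated as one sentence. Examples of such terminators are the Chinese full stop 。 and the Devanagari danda ।.

```python
import re


def split_sentences(text: str, ending_punctuation: str) -> list[str]:
    """Split text just after any character in ending_punctuation.

    A simplified model: it ignores abbreviations, quotes and decimal numbers.
    """
    # Build a character class from the ending punctuation, escaping each character.
    pattern = "[" + "".join(re.escape(ch) for ch in ending_punctuation) + "]"
    # Split at the zero-width position right after each terminator,
    # so the punctuation stays attached to its sentence.
    pieces = re.split(f"(?<={pattern})", text)
    return [p.strip() for p in pieces if p.strip()]


print(split_sentences("Hello. How are you? Fine!", ".?!"))
# ['Hello.', 'How are you?', 'Fine!']

print(split_sentences("你好。今天很好。", "。！？"))
# ['你好。', '今天很好。']

# With only Western punctuation configured, the Chinese text is never split.
print(split_sentences("你好。今天很好。", ".?!"))
# ['你好。今天很好。']
```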

Special Notes on Script Support

ImageMagick is a set of low-level routines that TitleFactory uses to annotate text onto images. Although ImageMagick supports UTF-8, it does not perform any special script encoding, so TitleFactory performs this encoding itself for certain scripts. At present, special encoding is performed only for right-to-left scripts.
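
This documentation does not describe how TitleFactory encodes right-to-left scripts. As a rough illustration of why such a step is needed, the Python sketch below assumes a renderer that draws characters left to right in the order they are stored. Text files store Hebrew and Arabic in logical (reading) order, so such a renderer would draw them backwards unless the characters are reordered first. The function names are hypothetical. The sketch deliberately ignores the full Unicode Bidirectional Algorithm and Arabic letter shaping.

```python
import unicodedata


def is_rtl(ch: str) -> bool:
    """True for characters whose Unicode bidirectional class is right-to-left."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def visual_order(line: str) -> str:
    """Naively reverse a line that contains right-to-left characters.

    Limitations: numbers and Latin words embedded in the line are reversed
    too, brackets are not mirrored, and Arabic letters are not joined into
    their contextual forms.
    """
    if any(is_rtl(ch) for ch in line):
        return line[::-1]
    return line


print(visual_order("שלום"))   # םולש
print(visual_order("Hello"))  # Hello (no right-to-left characters, unchanged)
```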

What do we know works?  Only those scripts that have been tested. 

Successfully tested scripts include:               

Untested scripts include:                 

Note also that because the user interface does not support non-ASCII characters, the use of TitleFactory for non-Latin scripts is extremely limited.

Language Tree
Tower of Babel: Everyone spoke the same language
Family Branch Language Speakers (millions) Region Script Computer encoding
1 Indo-European Germanic English 430 Global Latin utf-8 or ASCII
German 120 Germany Latin utf-8 or ASCII
Yiddish - Hebrew -
Dutch 22 Netherlands Latin utf-8
Afrikaans South Africa Latin utf-8
Swedish 9 North Europe Latin utf-8
Danish 5.4 Latin utf-8
Norwegian 4.5 Latin utf-8
Icelandic 0.3 Latin utf-8
Latin (Romance)
Spanish 310 Global Latin utf-8
Portuguese 175 Brazil Latin utf-8
French 115 Global Latin utf-8
Italian 63 Italy Latin utf-8
Romanian 22 Romania Latin utf-8
Hellenic Greek 11 Greece Greek utf-8
Slavic Russian 280 Russia Cyrillic utf-8
Bulgarian 7.6 Bulgaria Cyrillic utf-8
Polish 39 Poland Latin utf-8
Ukrainian 48 Ukraine Cyrillic utf-8
Czech 10 Czech Republic Latin utf-8
Indic Hindi 320 India Devanagari -
Bengali 185 Bangladesh Bengali -
Urdu 88 Pakistan Nastaliq (Arabic) -
Punjabi 75 Pakistan, India Shahmukhi, Gurmukhi -
Konkani - India Devanagari, Latin -
Other Nepali, Assamese, Oriya, Kashmiri, Sindhi, Gujarati, Sinhalese, Maldivian, Romany
2 Altaic Japonic Japanese 125 Japan Japanese utf-8
Korean Korean 68 Korea Hangul utf-8
Mongolian Mongolian, Buryat, Kalmyk
Tungusic Evenki, Lamut, Manchu, Nanai, Sibo
Turkic Turkish 83 Turkey Latin -
Other Azeri, Turkmen, Kazakh, Kirghiz, Tatar, Bashkir, Uzbek, Uigur, Chuvash, Balkar, Nogai, Salar
3 Sino-Tibetan Sinitic Mandarin 900 China Chinese utf-8
Other Wu, Gan, Min, Hakka, Xiang, Cantonese, Yue
Tibeto-Burman Burmese 42 Burma Burmese (Myanmar) -
Other Tibetan, Yi, Lisu, Moso, Lahu, Karen, Kachin, Chin, Bodo, Garo, Meithei, Lushei, Newari, Murmi, Jonkha, Mizo, Lepcha, Manipuri
Tai Thai 62 Thailand Thai iso-8859-11, tis-620
Other Lao, Chuang, Puyi, Tung, Nung, Shan, Kam-Sui, Zhuang, Li, Be
Southern Miao, Yao, She
4 Afro-Asiatic Semitic Arabic 185 Middle East Arabic utf-8
Hebrew - Israel Hebrew utf-8
Maltese 0.4 Malta Latin utf-8
Aramaic - Middle East Latin utf-8
Other Amharic, Tigrinya, Tigre, Aramaic, Gurage, Harari, Geez
Berber Shluh, Tamazight, Riffian, Kabyle, Shawia, Tuareg
Cushitic Somali, Galla, Sidamo, Beja, Afar, Saho
Chadic Hausa
5 Austro-Asiatic Viet-Muong Vietnamese 81 Vietnam Latin utf-8
Muong -
Mon-khmer Khmer, Mon, Palaung, Wa, Bahnar, Sedang, Khasi, Nicobarese, So, Nancowry, Sengoi, Temiar
Munda Santali, Mundari, Ho, Savara, Korku
6 Uralic Finnic Finnish, Estonian, Mordvin, Udmurt, Mari, Votyak, Komi, Sami
Ugric Hungarian, Ostyak, Vogul
Samoyed Nenets, Selkup, Nganasan, Enets, Kamas
Yukaghir Yukaghir - Eastern Siberia Pictogram -
7 Malayo-Polynesian (Austronesian)
Formosan Amis, Atayal, Paiwan, Tsou
Western Indonesian 140 Indonesia Latin utf-8
Malay Indonesia Latin -
Tagalog 85 Philippines Latin utf-8
Other Javanese, Sundanese, Madurese, Visayan, Malagasy, Achinese, Batak, Buginese, Balinese, Ilocano, Bikol, Igorot, Maranao, Pampangan, Pangasinan, Jarai, Rhade, Cham
Micronesian Marshallese, Gilbertese, Chamorro, Ponapean, Yapese, Palau, Trukese, Nauruan
Melanesian Fijian, Motu, Yabim
Polynesian Maori, Uvea, Samoan, Tongan, Niuean, Rarotongan, Tahitian, Tuamotu, Marquesan, Hawaiian, Rapa, Nui
8 Caucasian Kartvelian Georgian, Laz, Svan, Chan, Mingrelian
Abkhaz-Adyghean Abaza, Abkhaz, Adyghe, Kabardian, Circassian
Nakh Chechen, Ingush, Tsova-Tush
Daghestanian Tsez, Hunzib, Beshta, Avar, Andi, Chamali, Lak, Dargwa, Lezgian, Tabasaran, Tsakhur
9 Dravidian Southern Tamil 66 South India Tamil -
Other Telugu, Kannada, Malayalam, Tulu
Central Brahui, Gondi, Kurukh, Kui
10 Niger-Congo English and French appear to be the official languages of most of these nations.
Mande Mende, Malinke, Bambara, Dyula, Soninke, Susu, Kpelle, Vai, Loma
West Atlantic Fulani, Wolof, Serer, Dyola, Temne, Kissi, Gola, Balante
Voltaic Mossi, Gurma, Dagomba, Kabre, Senufo, Bariba
Kwa Yoruba, Ibo, Ewe, Twi, Fanti, Ga, Adangme, Fon, Edo, Urhobo, Idoma, Nupe, Agni, Baule, Kru, Grebo, Bassa
Bantu Luba, Kongo, Lingala, Mongo, Ruanda, Rundi, Kikuyu, Kamba, Sukuma, Nyamwezi, Hehe, Chagga, Makonde, Yao, Ganda, Nkole, Chiga, Gisu, Toro, Nyoro, Nyanja, Tumbuka, Bemba, Tonga, Lozi, Lwena, Lunda, Shona, Fang, Bulu, Yaundé, Duala, Bubi, Mbundu, Chokwe, Ambo, Herero, Makua, Thonga, Sotho, Tswana, Pedi, Swazi, Zulu, Matebele, Xhosa, Venda
Swahili
Efik Efik, Ibibio, Tiv
Adamawa Mbum
Eastern Zande, Sango, Gbaya, Banda
Ijo Ijo
11 Other There are over 100 language families

from http://www.teachinghearts.org/

Copyright © 2002-2009 . All rights reserved.