Author Topic: Punjabi Machine Transliteration  (Read 908 times)

Offline Mર. ◦[ß]гคг રừlểz™

Punjabi Machine Transliteration
« on: August 08, 2010, 07:46:27 PM »
Punjabi Machine Transliteration
M. G. Abbas Malik
Department of Linguistics
Denis Diderot, University of Paris 7
Paris, France
abbas.malik@gmail.com

Abstract
Machine transliteration transcribes a
word written in one script into another
language with approximate phonetic equivalence.
It is useful for machine translation,
cross-lingual information retrieval, and
multilingual text and speech processing.
Punjabi Machine Transliteration (PMT)
is a special case of machine transliteration:
it converts a word
from Shahmukhi (based on the Arabic script)
to Gurmukhi (derived from Landa,
Shardha and Takri, old scripts of the Indian
subcontinent), the two scripts of Punjabi, irrespective
of the type of word.
The Punjabi Machine Transliteration
System uses transliteration rules (character
mappings and dependency rules) for
transliteration of Shahmukhi words into
Gurmukhi. The PMT system can transliterate
every word written in Shahmukhi.
1 Introduction
Punjabi is the mother tongue of more than 110
million people in Pakistan (66 million) and India (44
million), and of many millions more in America, Canada
and Europe. It has been written in two mutually
incomprehensible scripts, Shahmukhi and Gurmukhi,
for centuries. Punjabis from Pakistan are
unable to comprehend Punjabi written in Gurmukhi,
and Punjabis from India are unable to
comprehend Punjabi written in Shahmukhi. In
contrast, they have no difficulty understanding
each other's speech. The Punjabi
Machine Transliteration (PMT) system is an
effort to bridge the written communication gap
between the two scripts for the benefit of the millions
of Punjabis around the globe.
Transliteration refers to phonetic translation
across two languages with different writing systems
(Knight & Graehl, 1998), such as Arabic to
English (Nasreen & Leah, 2003). Most prior
work addresses Machine Translation
(MT) (Knight & Leah, 97; Paola & Sanjeev,
2003; Knight & Stall, 1998) from English to
other major languages of the world such as Arabic
and Chinese, cross-lingual information retrieval
(Pirkola et al., 2003), the development
of multilingual resources (Yan et al., 2003; Kang
& Kim, 2000) and the development of cross-lingual
applications.
PMT is a special kind of machine transliteration.
It converts a Shahmukhi word into a Gurmukhi
word irrespective of the type constraints
of the word. It not only preserves the phonetics
of the transliterated word but, in contrast to usual
transliteration, also preserves its meaning.
The two scripts are discussed and compared.
Based on this comparison and analysis, character
mappings between Shahmukhi and Gurmukhi are
drawn and transliteration rules are discussed.
Finally, the architecture and process of the PMT system
are discussed. The system was applied to Punjabi
Unicode-encoded text especially designed for
testing, and the results were compiled and analyzed.
The PMT system will provide a basis for
Cross-Scriptural Information Retrieval (CSIR) and
Cross-Scriptural Application Development
(CSAD).
2 Punjabi Machine Transliteration
According to Paola (2003), “When writing a foreign
name in one’s native language, one tries to
preserve the way it sounds, i.e. one uses an orthographic
representation which, when read
aloud by the native speaker of the language,
sounds as it would when spoken by a speaker of
the foreign language – a process referred to as
Transliteration”. Usually, transliteration refers
to the phonetic translation of a word of some
specific type (proper nouns, technical terms, etc.)
across languages with different writing systems.
Native speakers may not understand the meaning
of the transliterated word.
PMT is a special type of Machine Transliteration
in which a word is transliterated across two
different writing systems used for the same language.
It is independent of the type constraint of
the word. It preserves both the phonetics and
the meaning of the transliterated word.
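The abstract describes the transliteration rules as character mappings plus dependency rules. The following is only a minimal sketch of that idea, not the PMT system's actual rule tables: the six consonant mappings, the alef rule and the function name transliterate are all assumptions chosen for illustration.

```python
# Illustrative sketch only: a tiny, hand-picked subset of Shahmukhi -> Gurmukhi
# mappings. The real PMT rule tables are not given in this text.

# Context-free character mappings (assumed subset).
CHAR_MAP = {
    "\u0628": "\u0A2C",  # ARABIC LETTER BEH   -> GURMUKHI LETTER BA
    "\u06A9": "\u0A15",  # ARABIC LETTER KEHEH -> GURMUKHI LETTER KA
    "\u0644": "\u0A32",  # ARABIC LETTER LAM   -> GURMUKHI LETTER LA
    "\u0645": "\u0A2E",  # ARABIC LETTER MEEM  -> GURMUKHI LETTER MA
    "\u0631": "\u0A30",  # ARABIC LETTER REH   -> GURMUKHI LETTER RA
    "\u0633": "\u0A38",  # ARABIC LETTER SEEN  -> GURMUKHI LETTER SA
}

ALEF = "\u0627"             # ARABIC LETTER ALEF
GURMUKHI_A = "\u0A05"       # independent vowel A
GURMUKHI_SIGN_AA = "\u0A3E" # dependent vowel sign AA


def transliterate(word: str) -> str:
    """Transliterate one Shahmukhi word using the toy rules above."""
    out = []
    for i, ch in enumerate(word):
        if ch == ALEF:
            # Dependency rule (assumed): the output for alef depends on its
            # position. Word-initial alef becomes the independent vowel,
            # any other alef becomes the vowel sign attached to the
            # preceding consonant.
            out.append(GURMUKHI_A if i == 0 else GURMUKHI_SIGN_AA)
        else:
            # Characters outside the toy table pass through unchanged.
            out.append(CHAR_MAP.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    # seen + alef + lam ("saal")
    print(transliterate("\u0633\u0627\u0644"))
```

Because Unicode text is stored in logical (reading) order for both scripts, the sketch walks the input from first to last character and never has to reverse anything, even though Shahmukhi is displayed right to left.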
3 Scripts of Punjabi
3.1 Shahmukhi
Shahmukhi derives its character set from the
Arabic alphabet. It is a right-to-left script and the
shape assumed by a character in a word is context
sensitive, i.e. the shape of a character differs
depending on whether the
character is at the beginning, in the middle or at
the end of the word. Normally, it is written in
Nastalique, a highly complex writing system that
is cursive and context-sensitive. A sentence illustrating
Shahmukhi is given below:
[Shahmukhi example sentence; the characters are mis-encoded in this copy]
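The context-sensitive shapes described above can be observed in Unicode, assuming the text is Unicode encoded as in the test data mentioned in the introduction. The sketch below uses Python's standard unicodedata module; the helper names direction and base_letter are hypothetical. It shows that the four positional shapes of one letter (taken from the Arabic Presentation Forms-B block) all normalize to a single abstract letter, and that Arabic-script letters carry a right-to-left directional class while Gurmukhi letters carry a left-to-right one.

```python
import unicodedata


def direction(ch: str) -> str:
    """Classify a character by its Unicode bidirectional class."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):  # AL = Arabic Letter
        return "right-to-left"
    if bidi == "L":
        return "left-to-right"
    return "other"


def base_letter(ch: str) -> str:
    """Fold a positional presentation form back to the abstract letter."""
    return unicodedata.normalize("NFKC", ch)


if __name__ == "__main__":
    beh = "\u0628"  # ARABIC LETTER BEH, the abstract letter stored in text
    # U+FE8F..U+FE92: isolated, final, initial and medial forms of BEH.
    for cp in range(0xFE8F, 0xFE93):
        form = chr(cp)
        print(unicodedata.name(form), "->", unicodedata.name(base_letter(form)))

    print(direction(beh))       # Shahmukhi uses Arabic-script letters
    print(direction("\u0A2C"))  # GURMUKHI LETTER BA
```

In practice, text normally stores only the abstract letter and the rendering engine selects the positional shape, so a transliteration rule can match on the letter itself rather than on its displayed form.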
