
Mishra Panini Paper

This document proposes a model for simulating the Pāninian system of Sanskrit grammar on a computer. It defines key data structures like sound sets and language components to represent the fundamental units of the language based on their phonemic attributes. Grammatical rules are represented as process strips that link rules to language components. Basic operations are defined to manipulate these data structures and simulate the generative process defined in Pānini's grammar.


Simulating the Pāṇinian System of Sanskrit Grammar

Anand Mishra
Department of Computational Linguistics
Ruprecht Karls University, Heidelberg
http://sanskrit.sai.uni-heidelberg.de

Abstract

We propose a model for the computer representation of the Pāṇinian system of Sanskrit grammar. Based on this model, we render the grammatical data and simulate the rules of Aṣṭādhyāyī on a computer. We then employ these rules to generate the morpho-syntactic components of the language. We store these generated components in a p-subsequential automaton, which we use to develop a lexicon on Pāṇinian principles. We briefly report on its implementation.

1 A Representation of Aṣṭādhyāyī

The general grammatical process of Aṣṭādhyāyī (Katre, 1989) can be viewed as consisting of the following three basic steps:

1. Prescription of the fundamental components which constitute the language.
2. Characterization of these fundamental components by assigning them a number of attributes.
3. Specification of grammatical operations based on the fundamental components and their attributes.

1.1 Fundamental Components

In his grammar Pāṇini furnishes a number of elements (phonemes/morphemes/lexemes) which constitute the building blocks of the language. We assign each of them a unique key in our database. Thus the phoneme /a/ has the key a_0, the kṛt suffix /a/ has the key a_3 and the taddhita suffix /a/ is represented by the key a_4. Given such a collection of unique keys, we define the set of fundamental components as follows:

Definition 1 The collection of unique keys corresponding to the basic constituents of the language, we define as the set F of fundamental components.

Further, we decompose this set F into two disjoint sets P and M, where P is the set of keys corresponding to the phonemes and M contains the keys of the remaining constituent elements (morphemes/lexemes).

P = {a_0, i_0, u_0, ...}
M = {bhU_a, tip_0, laT_0, ...}

F = P ∪ M (1)
P ∩ M = ∅ (2)

1.2 Attributes

The fundamental units of the language are given an identity by assigning a number of attributes to them. The various technical terms introduced in the grammar come under this category, as do the it-markers and the sigla or pratyāhāras. For example, the attributes hrasva, guṇa and ac characterize the element a_0 as the short vowel /a/, and attributes like pratyaya, prathama and ekavacana tell that tip (= tip_0) is a third-person singular suffix. Again, each attribute is assigned a unique key in our database.

Definition 2 The collection of unique keys corresponding to the terms, which characterize a fundamental component, we define as the set A of attributes.
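Definitions 1 and 2 can be illustrated with ordinary Python sets. The following is a minimal sketch and not the authors' database: only the keys a_0, a_3, a_4, i_0, u_0, bhU_a, tip_0 and laT_0 come from the text, while the placement of a_3 and a_4 in M, the ASCII spellings of the attribute keys and the helper name `kind` are assumptions made for illustration.

```python
# Keys of phonemes (P) and of morphemes/lexemes (M), as listed in the text.
P = {"a_0", "i_0", "u_0"}
M = {"bhU_a", "tip_0", "laT_0", "a_3", "a_4"}  # a_3, a_4 in M: assumption

# Equations (1) and (2): F is the union of two disjoint sets.
F = P | M
assert P.isdisjoint(M)

# Attributes per fundamental component, following the examples of Section 1.2.
# The ASCII spellings of the attribute keys are hypothetical.
ATTRIBUTES = {
    "a_0": {"hrasva", "guNa", "ac"},
    "tip_0": {"pratyaya", "prathama", "ekavacana"},
}

# Definition 2: A collects every key that characterizes some component.
A = set().union(*ATTRIBUTES.values())


def kind(key):
    """Classify a key as phoneme or morpheme by set membership."""
    if key in P:
        return "phoneme"
    if key in M:
        return "morpheme"
    raise KeyError(f"{key!r} is not a fundamental component")


print(kind("a_0"), kind("a_3"), sorted(ATTRIBUTES["tip_0"]))
```

Keeping the suffix in the key (a_0 versus a_3) is what lets the same surface sound /a/ be looked up unambiguously as a phoneme or as a particular suffix.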
Corresponding to the sets P and M we can decompose the set A into two disjoint sets Aπ and Aµ, Aπ being the set of unique keys of the attributes to the elements of P and Aµ to the elements of M.

Aπ = {hrasva_0, udAtta_0, it_0, ...}
Aµ = {dhAtu_0, pratyaya_0, zit_9, ...}

A = Aπ ∪ Aµ (3)

We note that any two of the four sets P, M, Aπ, Aµ are mutually disjoint.

2 Basic Data Structures

Given the set of fundamental components (F = P ∪ M) and the set of attributes (A = Aπ ∪ Aµ), we now define our data structure for representing the Pāṇinian process.

2.1 Sound Set √

Definition 3 A sound set √ is a collection of elements from sets P, M and A having exactly one element from the set P.

√ = {π_p, µ_i, α_j | π_p ∈ P, µ_i ∈ M, α_j ∈ A, i, j ≥ 0} (4)

This is an abstract data structure. Although it corresponds to a phoneme or one sound unit, it represents more than just a phoneme.

2.2 Language Component ∏

Definition 4 A language component ∏ is an ordered collection of one or more sound sets.

∏ = [√_0, √_1, √_2, ..., √_n] such that ‖∏‖ > 0 (5)

Convention: We use square brackets [ ] to represent an ordered collection and curly brackets { } for an unordered collection.

Language expressions at every level (phonemes, morphemes, lexemes, words, sentences) can now be represented as a language component.

Example 1: We represent the verbal root bhū as a language component ∏.

∏ = [√_1, √_2] where
√_1 = {bh, bhū, dhātu, ...}
√_2 = {u, bhū, dhātu, udātta, dīrgha, ac, ...}

Corresponding to the two phonemes bh and ū in bhū, we have an ordered collection of two sound sets √_1 and √_2. Consider the first one, √_1: Its first element bh is from the phoneme set P.¹ The second element bhū tells that the phoneme of this sound set is a part of the fundamental unit bhū. The third element stores the attribute dhātu (verbal root) to this sound set. Similarly, the second sound set √_2 has phoneme attributes which tell it to be an udātta (high pitched) dīrgha (long) ac (vowel).

¹ Actually, it is the unique key bh_0 corresponding to the phoneme bh which is stored in the sound set.

Example 2: Similarly, the language component corresponding to the morpheme laṭ is:

∏ = [√] where
√ = {l, laṭ, pratyaya, ait, ṭit, ...}

Attribute ṭit says that it has ṭ as it-marker.

Example 3: The morphemes bhū followed by laṭ can now be represented together by the language component

∏ = [√_1, √_2, √_3] where
√_1 = {bh, bhū, dhātu, ...}
√_2 = {u, bhū, dhātu, udātta, dīrgha, ac, ...}
√_3 = {l, laṭ, pratyaya, ait, ṭit, ...}

Different linguistic units can now be identified by carrying out intersection with the appropriate subsets of P, M and A. For example, to get the verbal root or dhātu in ∏ we take the intersection of an identity set ∂ = {dhātu} with each of the √_i's in ∏ and store the index i when the intersection set is not empty. In this case we get the index list [1,2]. The list of √_i's corresponding to these indices then gives the searched morpheme. Thus, the verbal root is given by the language component [√_1, √_2].

2.3 Process Strip σ

Definition 5 A process strip σ is an ordered collection of pairs, where the first element of the pair is the number of a particular grammatical rule (e.g. rule_p) and the second element is a language component ∏.

σ = [(rule_p, ∏_p), (rule_q, ∏_q), ...] (6)

The rule number corresponds to the Aṣṭādhyāyī order and binds the process
strip with a function implementing that rule in the actual program. Thus, the process strip simulates the Pāṇinian process by storing in the language component ∏_p the effect of applying the rule rule_p.

3 Basic Operations

Having defined our data structures, we now introduce the basic operations on them.

3.1 Attribute Addition

Let α ⊂ A ∪ M and √ be a sound set. Then attribute addition is defined as

h_a√(√, α) = √ ∪ α (7)

This operation can be applied to a number of sound sets given by indices [i, i+1, ..., j] in a given language component ∏

h_a∏(∏, α, [i, ..., j]) = [√_1, ..., √_i ∪ α, ..., √_j ∪ α, ..., √_n] (8)

Example 4: Consider the language component corresponding to the morpheme śap

∏ = [√] where
√ = {a, śap, pratyaya, śit, pit, ...}

Rule tiṅ śit sārvadhātukam (3.4.113) says that affixes in the siglum tiṅ and those having ś as it-marker are assigned the attribute sārvadhātuka. We implement this rule by checking if there are sound sets with attributes pratyaya together with tiṅ or śit and adding the attribute sārvadhātuka if the condition is fulfilled. In this case, we get:

∏ = [√] where
√ = {a, śap, pratyaya, śit, pit, sārvadhātuka}

3.2 Augmentation

Let

∏ = [√_1, ..., √_i, √_i+1, ..., √_n]
∏_k = [√_1k, √_2k, √_3k, ..., √_mk]

and i be an integer index such that i ≤ ‖∏‖, then augmentation of ∏ by ∏_k at index i is defined as

h_g(∏, ∏_k, i) = [√_1, ..., √_i, √_1k, √_2k, √_3k, ..., √_mk, √_i+1, ..., √_n] (9)

Example 5: Consider the language component ∏ corresponding to the verbal root bhū.

∏ = [√_1, √_2] where
√_1 = {bh, bhū, dhātu, ...}
√_2 = {u, bhū, dhātu, udātta, dīrgha, ac, ...}

Rule vartamāne laṭ (3.2.123) says that the morpheme laṭ is added after a dhātu if the present action is to be expressed. To implement this rule, we first look for the indices of sound sets which have the attribute dhātu and then append the sound set corresponding to laṭ after the last index. We get,

∏ = [√_1, √_2, √_3] where
√_1 = {bh, bhū, dhātu, ...}
√_2 = {u, bhū, dhātu, udātta, dīrgha, ac, ...}
√_3 = {l, laṭ, pratyaya, ait, ṭit, ...}

3.3 Substitution

We define substitution in terms of the above two operations.

Let [i, i+1, i+2, ..., j] be the indices of sound sets to be replaced in the language component ∏ = [√_1, ..., √_i, √_i+1, ..., √_n]. Let ∏_k = [√_1k, √_2k, √_3k, ..., √_mk] be the replacement, then the substitution is defined as

h_s(∏, ∏_k, [i, ..., j]) = h_g(h_a∏(∏, {δ}, [i, ..., j]), ∏_k, j) (10)

where δ ∈ A is the attribute which says that this sound set is no more active and has been replaced by some other sound set.

Example 6: Consider the language component corresponding to the verbal root ṇīñ

∏ = [√_1, √_2] where
√_1 = {ṇ, ṇīñ, dhātu, ñit}
√_2 = {i, ṇīñ, dhātu, ñit, dīrgha, ac}

Rule ṇaḥ naḥ (6.1.065) says that the initial retroflex ṇ of a dhātu is replaced by dental n. To implement this rule we first search the sound sets corresponding to dhātu, check whether the first one has a retroflex ṇ and if the conditions are fulfilled, add the attribute δ in that sound set and append the sound set corresponding to n after it. Further we
transfer all attributes (except the phoneme attributes) from the ṇ-sound set to the n-sound set for sthānivadbhāva. We get,

∏ = [√_1, √_2, √_3] where
√_1 = {ṇ, ṇīñ, dhātu, ñit, δ}
√_2 = {n, ṇīñ, dhātu, ñit}
√_3 = {i, ṇīñ, dhātu, ñit, dīrgha, ac}

4 Grammatical Process

4.1 Representing a Rule of Grammar

We represent a rule of grammar through a function f_q, which takes a process strip σ_p and adds a new pair (rule_q, ∏_q) to it, where rule_q is the number of the present rule and ∏_q is the new modified language component after application of one or more of the three operations defined above on the input language component ∏_p.

f_q(σ_p) = σ_q where
σ_p = [..., (rule_p, ∏_p)]
σ_q = [..., (rule_p, ∏_p), (rule_q, ∏_q)]
∏_q = h_a, h_g, h_s(∏_p, ...)

4.2 Structure of a rule

The general structure of a rule is as follows:

-------------------------------------------------
Function f_q with input strip:
σ_p = [..., (rule_p, ∏_p)]
-------------------------------------------------
check applicability conditions
if conditions not fulfilled then
    return unextended σ_p
else
    create new modified ∏_q
    return extended σ_q
-------------------------------------------------

Thus, given a particular state (represented by σ_p) in the process of generation, the system provides for checking the applicability of a rule f_q, and if the conditions are fulfilled, the rule is applied and the changed language component together with the rule number is stored in the modified state (represented by σ_q).

As the rule numbers are also stored, we can implement the rules of tripādī and make their effects invisible for subsequent applications. The order in which rules are applied is provided manually through templates.

5 Example

We take a verbal root bhū and generate the final word bhavati meaning "he/she/it becomes". We initialize the process strip σ_0 by loading the language component corresponding to the verbal root and adding a00000 as the rule number.

σ_0 = [(a00000, ∏_0)] where
∏_0 = [√_0a, √_0b] with
√_0a = {bh, bhū, dhātu}
√_0b = {u, bhū, dhātu, dīrgha, udātta}

Rule vartamāne laṭ (3.2.123) says that the morpheme laṭ is added after a dhātu if the present action is to be expressed. The application now involves the following steps: Look in the last language component ∏ of the process strip σ. If there are sound sets √ with the identity set ∂ = {dhātu} in it, get their indices in an index list. This returns the index list [1,2]. If the index list is non-empty then augment the language component ∏_0 by attaching the language component corresponding to the morpheme laṭ. This is attached in this case at index 2 as the new morpheme comes after dhātu. Extend the process strip σ_0 accordingly.

f_a32123(σ_0) = σ_1

σ_1 = [(a00000, ∏_0), (a32123, ∏_1)] where
∏_1 = [√_0a, √_0b, √_1a] with
√_0a = {bh, bhū, dhātu}
√_0b = {u, bhū, dhātu, dīrgha, udātta}
√_1a = {l, laṭ, pratyaya, ait, ṭit}

Rule tip tas jhi sip thas tha mip vas mas ta ātām jha thās āthām dhvam iṭ vahi mahiṅ (3.4.078) provides for substitution of laṭ. We take the first suffix tip for replacement. The sound sets to be replaced are determined by taking intersection with the set {laṭ, liṭ, loṭ, ...} which has the morphemes having cover term l. In this case it is at the index 3. We replace this sound set with tip, i.e. add the attribute δ to the sound set at index 3 and augment the language component at this index.

f_a34078(σ_1) = σ_2
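The three operations of Section 3 and the two rule applications described so far can be sketched in Python as follows. This is an illustrative sketch under stated assumptions and not the PaSSim code: sound sets are modelled as frozensets of plain strings, indices are 1-based as in the text, and the helper names, the ASCII spellings of the elements and the string "delta" standing for δ are all hypothetical. Only the rule-function names follow the a32123 pattern used in the paper.

```python
DELTA = "delta"  # stands for the attribute δ ("replaced, no longer active")


def find(component, identity):
    """1-based indices of sound sets sharing an element with the identity set."""
    return [i for i, s in enumerate(component, start=1) if s & identity]


def add_attributes(component, attrs, indices):
    """Attribute addition, eq. (8): unite attrs with the sound sets at indices."""
    return [s | attrs if i in indices else s
            for i, s in enumerate(component, start=1)]


def augment(component, new, i):
    """Augmentation, eq. (9): insert the sound sets of `new` after index i."""
    return component[:i] + new + component[i:]


def substitute(component, new, indices):
    """Substitution, eq. (10): mark the old sound sets with δ, then augment."""
    marked = add_attributes(component, {DELTA}, indices)
    return augment(marked, new, indices[-1])


BHU = [frozenset({"bh", "bhU", "dhAtu"}),
       frozenset({"u", "bhU", "dhAtu", "dIrgha", "udAtta"})]
LAT = [frozenset({"l", "laT", "pratyaya", "ait", "Tit"})]
TIP = [frozenset({"t", "tip", "pratyaya", "sArvadhAtuka", "pit"}),
       frozenset({"i", "tip", "pratyaya", "sArvadhAtuka", "pit"})]


def a32123(strip):
    """vartamāne laṭ: add laṭ after the last sound set of the dhātu."""
    component = strip[-1][1]
    indices = find(component, {"dhAtu"})
    if not indices:
        return strip  # conditions not fulfilled: strip stays unextended
    return strip + [("a32123", augment(component, LAT, indices[-1]))]


def a34078(strip):
    """Replace the l-morpheme by tip (the first suffix of rule 3.4.078)."""
    component = strip[-1][1]
    indices = find(component, {"laT", "liT", "loT"})
    if not indices:
        return strip
    return strip + [("a34078", substitute(component, TIP, indices))]


sigma_0 = [("a00000", BHU)]
sigma_2 = a34078(a32123(sigma_0))
for rule, component in sigma_2:
    print(rule, [sorted(s) for s in component])
```

Because every operation returns a new list instead of modifying its input, each pair in the strip keeps the language component exactly as it was after that rule, which is what allows intermediate results to be displayed later.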
σ_2 = [..., (a32123, ∏_1), (a34078, ∏_2)]
∏_2 = [√_0a, √_0b, √_1a, √_2a, √_2b]
√_0a = {bh, bhū, dhātu}
√_0b = {u, bhū, dhātu, dīrgha, udātta}
√_1a = {l, laṭ, pratyaya, ait, ṭit, δ}
√_2a = {t, tip, pratyaya, sārvadhātuka, pit}
√_2b = {i, tip, pratyaya, sārvadhātuka, pit}

Rule kartari śap (3.1.068) says that the morpheme śap is added after dhātu but before a sārvadhātuka suffix and denotes agent. Check if a sound set with sārvadhātuka follows one with dhātu. If yes then augment the language component for śap after dhātu.

f_a31068(σ_2) = σ_3

σ_3 = [..., (a34078, ∏_2), (a31068, ∏_3)]
∏_3 = [√_0a, √_0b, √_3a, √_1a, √_2a, √_2b]
√_0a = {bh, bhū, dhātu}
√_0b = {u, bhū, dhātu, dīrgha, udātta}
√_3a = {a, śap, pratyaya, hrasva, śit, pit}
√_1a = {l, laṭ, pratyaya, ait, ṭit, δ}
√_2a = {t, tip, pratyaya, sārvadhātuka, pit}
√_2b = {i, tip, pratyaya, sārvadhātuka, pit}

Rule yasmāt pratyaya vidhis tad ādi pratyaye aṅgam (1.4.013) makes the part before the suffix śap an aṅga with respect to it.

f_a14013(σ_3) = σ_4

σ_4 = [..., (a31068, ∏_3), (a14013, ∏_4)]
∏_4 = [√_0a, √_0b, √_3a, √_1a, √_2a, √_2b]
√_0a = {bh, bhū, dhātu, aṅga}
√_0b = {u, bhū, dhātu, aṅga, dīrgha, udātta}
√_3a = {a, śap, pratyaya, hrasva, śit, pit}
√_1a = {l, laṭ, pratyaya, ait, ṭit, δ}
√_2a = {t, tip, pratyaya, sārvadhātuka, pit}
√_2b = {i, tip, pratyaya, sārvadhātuka, pit}

Rule sārvadhātuka ārdhadhātukayoḥ (7.3.084) says that before sārvadhātuka or ārdhadhātuka replace the ik vowels by guṇa vowels. As śap is sārvadhātuka, we get

f_a73084(σ_4) = σ_5

σ_5 = [..., (a14013, ∏_4), (a73084, ∏_5)]
∏_5 = [√_0a, √_0b, √_5a, √_3a, √_1a, √_2a, √_2b]
√_0a = {bh, bhū, dhātu, aṅga}
√_0b = {u, bhū, dhātu, aṅga, dīrgha, udātta, δ}
√_5a = {o, bhū, dhātu, aṅga}
√_3a = {a, śap, pratyaya, hrasva, śit, pit}
√_1a = {l, laṭ, pratyaya, ait, ṭit, δ}
√_2a = {t, tip, pratyaya, sārvadhātuka, pit}
√_2b = {i, tip, pratyaya, sārvadhātuka, pit}

Rule ec aḥ ay av āy āv aḥ (6.1.078) says that before ac (vowel) e, o, ai, au are respectively replaced by ay, av, āy, āv.

f_a61078(σ_5) = σ_6

σ_6 = [..., (a73084, ∏_5), (a61078, ∏_6)]
∏_6 = [√_0a, √_0b, √_5a, √_6a, √_6b, √_3a, √_1a, √_2a, √_2b]
√_0a = {bh, bhū, dhātu, aṅga}
√_0b = {u, bhū, dhātu, aṅga, dīrgha, udātta, δ}
√_5a = {o, bhū, dhātu, aṅga, δ}
√_6a = {a, av, bhū, dhātu, aṅga, hrasva}
√_6b = {v, av, bhū, dhātu, aṅga}
√_3a = {a, śap, pratyaya, hrasva, śit, pit}
√_1a = {l, laṭ, pratyaya, ait, ṭit, δ}
√_2a = {t, tip, pratyaya, sārvadhātuka, pit}
√_2b = {i, tip, pratyaya, sārvadhātuka, pit}

Finally we collect all √_i's not having a δ, i.e. which are not already replaced. This gives us the desired form bhavati.

6 PaSSim (Pāṇinian Sanskrit Simulator)

In the following we give a brief description of PaSSim (Pāṇinian Sanskrit Simulator), which we are developing at the University of Heidelberg.² The program aims towards developing a lexicon on Pāṇinian principles. The user enters an inflected word or pada and the system furnishes a detailed, step by step process of its generation. It is written in Python™ and consists of the following modules (See Figure 1):

² http://sanskrit.sai.uni-heidelberg.de

6.1 Database

This module is for inputting, updating, enhancing and organizing the primary database
of fundamental components and attributes. The organization of the database serves the purpose of incorporating static information of Pāṇinian formulations. For example, uṅ is stored with static attributes dhātu, bhvādi, aniṭ and that its second phoneme is it-marker etc. Thus, the effects of many definition rules of Aṣṭādhyāyī are stored in the database. The database is in ASCII and each fundamental component or attribute has a unique key, corresponding to which is a hash.

Figure 1: PaSSim (Pāṇinian Sanskrit Simulator)

6.2 Grammar

This is the main module. It contains abstract classes corresponding to SoundSets, LanguageComponents and ProcessStrips. Further it has a number of functions like a61065(), which simulate the individual rules of Aṣṭādhyāyī.

6.3 Templates

This module is to organize the prakriyā. A template prescribes the rules in order of applicability for a group of primary verbs or nominal stems. Templates are specified manually, taking into account the prakriyā texts, e.g. Siddhānta-Kaumudī (Vasu, 1905).³ It uses Grammar to generate the morpho-syntactic word forms or padas.

6.4 FSA

This module is for the sake of efficient representation of generated words together with the initializing fundamental component(s) and list of rule numbers. These are stored as a p-subsequential transducer (Mohri, 1996).⁴ The output string associated with a word thus provides the initializing fundamental components and a list of rules. Grammar applies these rules one after another and outputs the final as well as intermediate results.

6.5 Display

This module provides HTML⁵ / LaTeX output. It outputs the content according to the given style sheet for conventions regarding script, color-scheme etc. The phonological, morphological, syntactical and semantical information gathered during the process of generation is rendered in modern terms through a mapping of Pāṇinian attributes corresponding to it.

³ We would like to acknowledge two texts in Hindi, Vyākaraṇacandrodaya (Śāstrī, 1971) and Aṣṭādhyāyī sahajabodha (Dīkṣita, 2006), which have been very beneficial to us.

⁴ Sequential transducers can be extended to allow a single additional output string (subsequential transducers) or a finite number p of output strings (p-subsequential transducers) at final states. These allow one to deal with the ambiguities in natural language processing (Mohri, 1996).

⁵ See: http://sanskrit.sai.uni-heidelberg.de

References

Böhtlingk, Otto von. 1887. Pāṇini's Grammatik. Olms, Hildesheim. Primary source text for our database.

Dīkṣita, Puṣpā. 2006-07. Aṣṭādhyāyī sahajabodha. Vols. 1-4. Pratibha Prakashan, Delhi, India.

Katre, Sumitra M. 1989. Aṣṭādhyāyī of Pāṇini. Motilal Banarsidass, Delhi, India.

Mohri, Mehryar. 1996. On some Applications of Finite-State Automata Theory to Natural Language Processing. Journal of Natural Language Engineering, 2:1-20.

Śāstrī, Cārudeva. 1971. Vyākaraṇacandrodaya. Vols. 1-5. Motilal Banarsidass, Delhi, India.

Vasu, Srisa Chandra and Vasu, Vaman Dasa. 1905. The Siddhānta-Kaumudī of Bhaṭṭojī Dīkṣita. Vols. 1-3. Panini Office, Bhuvanesvara Asrama, Allahabad, India. Primary source text for prakriyā.
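The rule structure of Section 4.2 and the manually ordered templates of Section 6.3 might be combined as in the following sketch. It is only an illustration under assumptions, not the Grammar or Templates module of PaSSim: the rule shown is tiṅ śit sārvadhātukam (3.4.113) from Example 4, the function name a34113 merely follows the a61065() naming pattern, `run_template`, the ASCII spellings and the 0-based indexing are hypothetical, and the extra check that the attribute is not yet present is our own addition so that a repeated call leaves the strip unextended.

```python
def a34113(strip):
    """tiṅ śit sārvadhātukam: mark tiṅ or śit affixes as sārvadhātuka."""
    component = strip[-1][1]
    # Check applicability: a pratyaya that is also tiṅ or śit and that
    # does not carry the attribute yet (the last condition is an assumption).
    hits = [i for i, s in enumerate(component)
            if "pratyaya" in s and s & {"tiG", "zit"}
            and "sArvadhAtuka" not in s]
    if not hits:
        return strip  # conditions not fulfilled: return the unextended strip
    # Create the new modified language component (attribute addition).
    modified = [s | {"sArvadhAtuka"} if i in hits else s
                for i, s in enumerate(component)]
    return strip + [("a34113", modified)]  # return the extended strip


def run_template(template, strip):
    """Apply the rule functions of a template in the prescribed order."""
    for rule in template:
        strip = rule(strip)
    return strip


ZAP = [frozenset({"a", "zap", "pratyaya", "zit", "pit"})]
TEMPLATE = [a34113, a34113]  # the second call finds nothing left to do

sigma = run_template(TEMPLATE, [("a00000", ZAP)])
for rule, component in sigma:
    print(rule, [sorted(s) for s in component])
```

A template in this reading is nothing more than an ordered list of rule functions, so the manual ordering mentioned in Section 6.3 stays separate from the rules themselves.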

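The final step of Section 5, collecting all sound sets that do not carry δ, can be illustrated on the language component ∏_6. Again this is a hedged sketch rather than the authors' implementation: PHONEMES is a tiny stand-in for the set P containing only the phonemes needed here, and the function names, the ASCII spellings and the string "delta" for δ are hypothetical.

```python
DELTA = "delta"  # stands for the attribute δ
PHONEMES = {"bh", "u", "o", "a", "v", "l", "t", "i"}  # tiny stand-in for P


def phoneme_of(sound_set):
    """Return the single element of the sound set that belongs to P."""
    (phoneme,) = sound_set & PHONEMES  # Definition 3: exactly one such element
    return phoneme


def surface_form(component):
    """Concatenate the phonemes of all sound sets not marked with δ."""
    return "".join(phoneme_of(s) for s in component if DELTA not in s)


# The language component ∏_6 from Section 5, in ASCII spelling.
PI_6 = [
    {"bh", "bhU", "dhAtu", "aGga"},
    {"u", "bhU", "dhAtu", "aGga", "dIrgha", "udAtta", DELTA},
    {"o", "bhU", "dhAtu", "aGga", DELTA},
    {"a", "av", "bhU", "dhAtu", "aGga", "hrasva"},
    {"v", "av", "bhU", "dhAtu", "aGga"},
    {"a", "zap", "pratyaya", "hrasva", "zit", "pit"},
    {"l", "laT", "pratyaya", "ait", "Tit", DELTA},
    {"t", "tip", "pratyaya", "sArvadhAtuka", "pit"},
    {"i", "tip", "pratyaya", "sArvadhAtuka", "pit"},
]

print(surface_form(PI_6))  # prints "bhavati"
```

The replaced sound sets stay in the component and are only filtered out at the end, so the full derivation history remains available for display.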