Hypertext Sanskrit Tools I

Introduction to lexicon, stems and surface morphology

We shall start with an invocation of Gaṇeśa.


Clicking on the above link invokes our Sanskrit segmenter, which returns the padapāṭha of our Gaṇeśa invocation. We now have two Web windows: the present one, which is a digest of the course transcript, and a new window, where I can demonstrate the tools progressively. In this new window, you first see the input sentence, in blue devanāgarī script, then a graphic where the sentence is written again, now decomposed as a stream of phonemes represented in the IAST alphabet. This horizontal line is then analysed as a sequence of padas represented as colored rectangles. These padas are the steps that we follow when we read the sentence pade pade as a list of words.

We see that the tool has segmented the continuous sentence नमोगणेशायविघ्नेश्वराय into a sequence of 3 padas: namaḥ, gaṇeśāya, and vighneśvarāya, that is, in devanāgarī: नमः, गणेशाय and विघ्नेश्वराय. However, the sentence is NOT the concatenation of the three padas. The end "aḥ" of namaḥ, in contact with the beginning g of gaṇeśāya, has turned into "o". This phenomenon is called sandhi or junction, and segmentation is called sandhi-viccheda or undoing sandhi at word junction. In this particular case, the sandhi rule may be expressed as a rewrite rule on phonemic streams: ⟨aḥ|g → og⟩. More generally, the rule is that final aḥ turns into o in front of a voiced consonant, but turns into as in front of a voiceless dental such as t (think of namaste).
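The rewrite rule above can be tried out with a minimal Python sketch. It is only an illustration of the two cases just mentioned, not the segmenter's actual algorithm: the function names `join_ah` and `first_phoneme` are hypothetical, and all other sandhi contexts are deliberately left unmodelled.

```python
import unicodedata

def nfc(text):
    """Normalize so that letters such as ḥ are single code points."""
    return unicodedata.normalize("NFC", text)

# Voiced consonants in IAST (aspirates such as "gh" count as one phoneme).
VOICED = {nfc(c) for c in (
    "g", "gh", "ṅ", "j", "jh", "ñ", "ḍ", "ḍh", "ṇ",
    "d", "dh", "n", "b", "bh", "m", "y", "r", "l", "v", "h",
)}

def first_phoneme(word):
    """Return the initial phoneme, reading digraphs like 'gh' or 'th' as one unit."""
    if len(word) >= 2 and word[1] == "h":
        return word[:2]
    return word[:1]

def join_ah(left, right):
    """Join a word ending in aḥ with the next word (two simplified cases only)."""
    left, right = nfc(left), nfc(right)
    ending = nfc("aḥ")
    if not left.endswith(ending):
        raise ValueError("left word must end in aḥ")
    base = left[: -len(ending)]
    first = first_phoneme(right)
    if first in VOICED:
        return base + "o" + right    # aḥ|g -> og
    if first in {"t", "th"}:
        return base + "as" + right   # aḥ|t -> ast
    return left + right              # other contexts: not modelled, left as is

print(join_ah("namaḥ", "gaṇeśāya"))
print(join_ah("namaḥ", "te"))
```

Segmentation is the inverse problem: given the glued stream, find the padas and the rule that joined them, which is much harder because several analyses may compete.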

We shall come back to sandhi later, but first we show basics of our user interface.

First we notice, on our invocation window, that the color rectangles have internal structure, accessible by clicking on them. Thus blue gaṇeśāya, when clicked, reveals that it is a form of gaṇeśa, namely

[gaṇeśa]{m. sg. dat.}
where the stem gaṇeśa bears a link to its meaning in the dictionary, and the tag {m. sg. dat.} indicates its declension (vibhakti) in the masculine (पुंस्), singular (एकवचन), and dative (चतुर्थी). The dative case is the correct vibhakti, which may be used to represent the semantic role (कारक) of giving (सम्प्रदान). It is through this vibhakti that we can discriminate the role of the blue pada as expressing a gift "to Gaṇeśa". This is consistent with our homage namaḥ, which requires a recipient expressed in the dative.
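To make the shape of such a tag concrete, here is a small Python sketch that splits the displayed string into its stem and its morphological features. The regular expression and the gloss table are assumptions made for illustration; only the abbreviations m., sg. and dat. come from the example above, and the tools' internal representation is certainly richer.

```python
import re

# Matches strings of the form [stem]{feature feature ...}
TAG_PATTERN = re.compile(r"\[(?P<stem>[^\]]+)\]\{(?P<features>[^}]+)\}")

# Glosses for the three abbreviations seen in the example (illustrative table).
GLOSS = {"m.": "masculine", "sg.": "singular", "dat.": "dative"}

def parse_tag(tag):
    """Split '[stem]{features}' into the stem and a list of feature abbreviations."""
    match = TAG_PATTERN.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a [stem]{{features}} tag: {tag!r}")
    return match.group("stem"), match.group("features").split()

stem, features = parse_tag("[gaṇeśa]{m. sg. dat.}")
print(stem, [GLOSS.get(f, f) for f in features])
```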

Now we click on the link gaṇeśa in order to find its dictionary definition. And we show the active link to its declension m. where we check the dative singular form. We also notice our invocation, in unsandhied form.

Actually, there is also a possibility to reach the Sanskrit Heritage dictionary, which has a richer hypertext structure, as we show by demonstrating the Index tool. And now the quotation is an active link to the corpus repository.

Let us complete our understanding of this Gaṇeśa invocation by discussing the attribute vighneśvarāya, dative form of vighneśvara, i.e. vighna-īśvara, the "Lord of obstacles". So we are praying to Gaṇeśa to remove obstacles to our understanding of Sanskrit. Remember: when Vyāsa recited the Mahābhārata and Gaṇeśa was the scribe, Vyāsa composed tricky verses with knots (granthi) to slow him down. In a similar way, we are hoping that our software (gaṇakayantra) will remove the complex sandhi knots that are hampering our comprehension.

So here we have a mystery: how come Monier-Williams was invoked at the beginning of my presentation, instead of the Heritage dictionary? To answer this, I now show the various entry points to the tools.

If you access https://sanskrit.inria.fr, you see (fr) indicating that it accesses by default the Sanskrit-to-French Heritage dictionary. Whereas if you access https://sanskrit.inria.fr/index.en.html, you see (en) indicating that it accesses by default the Sanskrit-to-English Monier-Williams dictionary. The tools are the same; only this parameter differs. It is remembered throughout the session, but it is reset every time you explicitly click a "Lexicon access" field in one of the menus. I now show an example of switching from English to French.

Let me take this opportunity to discuss the nature of our tools. They are Web services, which does not mean that they store a bunch of Web pages that you may inspect with your Web browser, like a hypertext database. They are rather processes that span the Internet and invoke Web servers that dynamically deliver data to you, formatted as HTML pages. I demonstrate it with the site counter.

Here are a few more useful URLs:
The present lesson is at : https://sanskrit.inria.fr/COURSE/Lessons/HST_1.html.
You may access a mirror of the Saṃsādhanī site at :
You may find the Reference manual of the Heritage tools at this site, in English, at : https://sanskrit.inria.fr/manual.html.
But I give this link for reference rather than to suggest that you read it now, since I shall explain its facilities during the course in a different order.
Finally I show here how to access the Table of transliteration schemes.

Now we quickly demonstrate the Grammar and Stemmer tool. We point out that the input to our various tools allows multiple notations, while the output font may be set to IAST romanization or to devanāgarī.

Returning to our invocation, we see that gaṇeśāya and īśvarāya are blue rectangles, whereas namaḥ is mauve and vighna is yellow. Clicking on namaḥ shows that it is an indeclinable pada, and the lexicon entry shows that its stem, namas, may be used as a neuter noun, meaning homage, as well as a preposition, in the sense of "glory to", when associated with a pada in the dative case. This dual nature may be demonstrated on the Stemmer. Thus we understand that nominal padas are represented as blue segments when declined from a stem having a semantic content, and as mauve segments when they are contentless words used as invariant grammatical tools, in various roles (adverb, preposition, conjunction, discourse particle, etc.). Compound words like vighneśvarāya are represented as a yellow initial stem segment followed by a blue inflected word.

Here we shall have a little grammatical diversion, in order to understand the difference between the stem gaṇeśa and the pada gaṇeśaḥ. Both are "words", but the word "word" is ambiguous. We should distinguish words as they appear in the sentence (pada) from their root stems that may be looked up in the dictionary, called in Sanskrit prātipadika. These stems carry the denotation of the universal notion they are naming. They are subject to vibhakti in order to be declined as the instance of this notion decorated by morphological parameters that indicate their role in a sentence: gender, number and case for nominal padas.

This distinction is hardly visible in English, where gender has been relegated to pronouns, and case is expressed by syntactic means with the help of prepositions. Thus for instance "cat" has only two variations: plural "cats" and genitive "cat's", which furthermore are homophones! So as laymen we use the word "word" for the two notions, without even thinking of the ambiguity, if we are not professional linguists. In Sanskrit, morphology is taken seriously. There are three genders, three numbers, and eight cases (vibhakti), giving potentially 3 × 3 × 8 = 72 pada variations for a given stem. It is essential to master these variations, since understanding them is crucial for understanding the sentences where they appear.
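The count of 72 is simply the size of the product of the three parameter sets. As a quick check, here is a Python sketch that enumerates the slots; the English names for the eight cases are the usual Western labels, given here in their traditional order.

```python
from itertools import product

GENDERS = ("masculine", "feminine", "neuter")
NUMBERS = ("singular", "dual", "plural")
CASES = (
    "nominative", "accusative", "instrumental", "dative",
    "ablative", "genitive", "locative", "vocative",
)

# Every (gender, number, case) triple is one potential pada slot for a stem.
SLOTS = list(product(GENDERS, NUMBERS, CASES))

print(len(SLOTS))
print(("masculine", "singular", "dative") in SLOTS)
```

In practice a noun stem usually has a fixed gender, and several slots share the same surface form, so the number of distinct padas for a given stem is smaller than this upper bound.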

Sanskrit is not just any old language. It is a scientific medium of communication, refined by professional grammarians such as Pāṇini (25 centuries ago) from a North Indian vernacular spoken at the time. So learning how to decipher Sanskrit, either oral or written, is a serious endeavour. In Pañcatantra, it is stated: It takes 12 years to learn grammar. In the West, Sanskrit is still taught with 19th century methods, using bulky grammars with page after page of declension and conjugation tables that you are supposed to learn by heart. This is crazy. Now we have efficient software, built on Pāṇinian principles, that allows you to delay acquisition of surface details, and access texts directly in their interesting structure. You will pick up the frequent paradigms progressively, by reading texts (and furthermore by reading them aloud, this is important).

Thus don't be afraid of the complexity of Sanskrit grammar. Using our tools, it will take you 12 months to learn Sanskrit, instead of the 12 years required by the traditional methods using massive memorization.

We mentioned above the Sanskrit name for nominal stem: prātipadika. This looks like a complicated term. Let us look in our lexicon. Please note the etymology of this entry: [*pratipada-ika]. This notation explains the derivation of the adjective prātipadika as affixing suffix "-ika" to a simpler stem pratipada "matured" to the vṛddhi phonetic level (this is the meaning of the * notation, that in this case lengthens initial short vowel a to ā). By clicking on this inner stem we find that it is obtained by prefixing the word pada with the preposition prati (against). So prātipadika means literally "relevant to the pada analysis". Here we learn 2 things. Firstly, inner morphology gives you the etymological meaning of a word as a compositional process. By learning two generic morphological composition rules, you see that it was enough to know pada in order to understand prātipadika. So this means you should not be afraid of seemingly complex words, which may be analyzed rather than memorized. Secondly, it indicates that the ancient grammarians placed the pada notion as prior to the stem notion. The stem is an analysis of the word in the context of a sentence, which is a meaningful linguistic unit.
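The derivation [*pratipada-ika] can be mimicked with a small Python sketch. It assumes a simplified recipe: drop the final a of the stem, raise its first vowel to the vṛddhi grade, and append -ika. The function names are hypothetical, and the real derivational morphology has many more cases than this.

```python
import unicodedata

# vṛddhi grade of each simple vowel (IAST); ā is already at that grade.
VRDDHI = {"a": "ā", "i": "ai", "ī": "ai", "u": "au", "ū": "au",
          "e": "ai", "o": "au", "ṛ": "ār"}
VRDDHI = {unicodedata.normalize("NFC", k): unicodedata.normalize("NFC", v)
          for k, v in VRDDHI.items()}
VOWELS = set(unicodedata.normalize("NFC", "aāiīuūṛṝḷeo"))

def vrddhi_first_vowel(stem):
    """Raise the first vowel of the stem to its vṛddhi grade."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            # The diphthongs ai and au are already at the vṛddhi grade.
            if ch == "a" and stem[i + 1:i + 2] in ("i", "u"):
                return stem
            return stem[:i] + VRDDHI.get(ch, ch) + stem[i + 1:]
    return stem

def derive_ika(stem):
    """Sketch of the [*stem-ika] derivation shown in the lexicon entry."""
    stem = unicodedata.normalize("NFC", stem)
    if stem.endswith("a"):
        stem = stem[:-1]          # the final a is dropped before the suffix
    return vrddhi_first_vowel(stem) + "ika"

print(derive_ika("pratipada"))
print(derive_ika("veda"))
```

The same recipe gives vaidika from veda and dhārmika from dharma, which illustrates how one generic rule accounts for a whole family of derived adjectives.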

Now that we have completed this quick survey of various morphological tools, let us return to our Reader interface. First we must explain the colors of the segment rectangles. We saw in our Gaṇeśa invocation three colors, blue for nominal padas (nouns and adjectives), mauve for indeclinable ones (grammatical tool words) and yellow for stems as initial segments of compounds. In the grammatical example above, we saw a red pada śrūyate, which by clicking reveals its verbal nature: [śru]{pr. ps. sg. 3}, meaning passive form of root śru, conjugated in the present tense (vartamāna) in the singular third person (prathama).

Aside: compare the conjugation tables of śru Western style and Indian style. The persons are not listed in the same order! This is also the occasion to contemplate the richness of the verbal morphology. Again, there is no reason to try to memorize all those forms. They will be recognized by the reader software when they occur in sentences, and progressively you will memorize the frequently occurring ones. Many of these forms, like śuśuśrūṣuṣīṣu, are unlikely to occur in Sanskrit texts, but occasionally a poet may indulge in coining such rarities for their alliterative effect. Not only do you not have to memorize these forms, but even my computer does not memorize them. The morphological generators of the declension and conjugation services do not look up the forms in some precomputed database. The forms are generated on the fly, for every query, using the internal Pāṇinian glueing processes.
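To give a feeling for generation on the fly, here is a deliberately naive Python sketch that computes singular forms of a masculine stem in -a from a table of endings, instead of storing the forms themselves. This is not how the Heritage generators work (they use the Pāṇinian glueing processes mentioned above); the ending table and the name `decline` are illustrative assumptions. The instrumental is omitted because its ending -ena is subject to retroflexion of n, which this sketch does not model.

```python
import unicodedata

def nfc(text):
    return unicodedata.normalize("NFC", text)

# Singular endings replacing the final -a of a masculine a-stem.
A_STEM_MASC_SG = {
    "nominative": nfc("aḥ"),
    "accusative": "am",
    "dative": nfc("āya"),
    "ablative": nfc("āt"),
    "genitive": "asya",
    "locative": "e",
    "vocative": "a",
}

def decline(stem, case):
    """Compute one singular form of a masculine a-stem on demand."""
    stem = nfc(stem)
    if not stem.endswith("a"):
        raise ValueError("this sketch only handles stems ending in short a")
    return stem[:-1] + A_STEM_MASC_SG[case]

print(decline("gaṇeśa", "dative"))
print(decline("gaṇeśa", "nominative"))
```

Nothing is stored per word: the same function yields gaṇeśāya, vighneśvarāya or any other form for as many stems as one cares to query.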

We return to our grammatical subhāṣita. One color remains mysterious, the yellow of dvā. dvā is not a pada, it is only the pūrvapada (iic.) of the samāsa (compound) dvā-daśabhiḥ. Thus this sentence has only 4 padas, even though it is shown in 5 segments. Yellow is the color of stems (prātipadika), since in general a samāsa of two padas (P1,P2) is formed by sandhi-glueing the stem (prātipadika) of P1, obtained by erasing (lopa) its vibhakti, with pada P2.
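The formation of a two-component samāsa can be sketched in Python as glueing a bare stem to an inflected pada, with vowel sandhi at the junction. The sandhi table below covers only final a or ā meeting an initial simple vowel or diphthong ai/au, and the function name `glue_compound` is hypothetical; everything else is plain concatenation in this sketch.

```python
import unicodedata

def nfc(text):
    return unicodedata.normalize("NFC", text)

# Result of final a/ā meeting an initial simple vowel (guṇa or lengthening).
MERGE = {}
for final in ("a", nfc("ā")):
    for initial, result in (("a", "ā"), ("ā", "ā"), ("i", "e"), ("ī", "e"),
                            ("u", "o"), ("ū", "o")):
        MERGE[(final, nfc(initial))] = nfc(result)

def glue_compound(stem, pada):
    """Glue the stem of P1 (vibhakti erased) to the inflected pada P2."""
    stem, pada = nfc(stem), nfc(pada)
    if stem[-1] in ("a", nfc("ā")) and pada[:2] in ("ai", "au"):
        return stem[:-1] + pada                       # a + ai -> ai, a + au -> au
    key = (stem[-1], pada[0])
    if key in MERGE:
        return stem[:-1] + MERGE[key] + pada[1:]      # e.g. a + ī -> e
    return stem + pada                                # no vowel sandhi modelled

print(glue_compound("vighna", "īśvarāya"))
print(glue_compound("dvā", "daśabhiḥ"))
```

Only the last component carries a vibhakti, which is why a compound counts as a single pada even though the reader displays it as a yellow segment followed by a blue one.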

We have seen 4 colors of boxes so far. Let us survey all the colors that may appear in the analysis of a text. These colors are properly the parts of speech of Sanskrit. We thus turn to Hoisting the colors of Sanskrit in the Corpus repository.

Here I comment on the corresponding mini-corpus, which presents the parts-of-speech categories of Sanskrit, and is an introduction to morphology.

N.B. Pāṇini says (I,4,14): सुप्तिङन्तम् पदम्, i.e. sup-tiṅ-antam padam: A pada is that which ends in sup or in tiṅ. In this formulation, sup is a condensed (pratyāhāra) formulation for the declension suffixes, and tiṅ similarly designates the conjugation suffixes. In our color codes, red is tiṅ (verbal forms) and blue is sup (nominal forms). Thus nominal forms and verbal forms are two modes of formation of padas. However, given a word enunciation (śabda) out of its context of use in a text, it does not make sense to ask whether it is a subanta or a tiṅanta, since actually it may be both, e.g. gacchati. Although such ambiguities are not frequent, they may occur. That is, a pada is not just a piece of speech, it is a structured entity, summarized in its tag.
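The point that a pada is a structured entity, and that one śabda may carry several analyses, can be pictured with a small Python sketch. The `Analysis` record and the tiny table are hypothetical illustrations, not the tools' data model. The two readings of gacchati are the third person singular present of the root gam, and the locative singular of its present participle gacchat.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Analysis:
    stem: str        # root or nominal stem
    features: tuple  # morphological parameters
    kind: str        # "sup" (nominal form) or "tiṅ" (verbal form)

# A toy table; real analyses are computed, not listed by hand.
ANALYSES = {
    "gacchati": [
        Analysis("gam", ("present", "active", "singular", "third person"), "tiṅ"),
        Analysis("gacchat", ("masculine", "singular", "locative"), "sup"),
    ],
    "gaṇeśāya": [
        Analysis("gaṇeśa", ("masculine", "singular", "dative"), "sup"),
    ],
}

def kinds(form):
    """Return the set of formation modes under which a form can be read."""
    return {analysis.kind for analysis in ANALYSES.get(form, [])}

print(sorted(kinds("gacchati")))
print(sorted(kinds("gaṇeśāya")))
```

Only the context of the sentence decides which analysis is the intended one, which is why the reader displays such a segment with alternative colors to choose from.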

After this short overview of some of the platform tools, we are now ready to start reading a simple text in the so-called corpus mode of our software. So please continue reading at the Crash course on reading Sanskrit corpus. When you are done with this little Vikramacarita story, you may turn to the second lesson, which will discuss sandhi and will introduce the use of our interactive Sanskrit reader.

© Gérard Huet 2023