Edit distance-based search approach for retrieving element-wise prosody/rhymes in Hindi-Urdu poetry

Background: Prosody (rhyming words) is an inherent element of poetry across thousands of the world's languages. Since the medieval era, Indic poetry (principally Hindi/Urdu poetry) has shown remarkable flamboyance in its subjects, styles, and other creative aspects. Besides the message of the poem, the Qafiya (i.e., rhyming words) is the core element, without which a text is not considered Hindi/Urdu poetry but merely a piece of writing; alongside it, the Radif (i.e., a phrasal suffix to the qafiya) is regarded as a nearly intrinsic part of ghazals. In this regard, the contributions of this paper are, first, the development of a technique for prosodic (qafiya) suggestion/retrieval in Hindi/Urdu poetry; and second, qafiya suggestion based on the attached subsequent radif. Methods: The work uses a tri-script poetry corpus of 13.46 M tokens. Instead of phonetic value matching, the proposed methodology employs four distance metrics (Levenshtein, Damerau-Levenshtein, Jaro-Winkler, and Hamming distance) as the comparison measures for prosodic suggestions. Findings: The proposed work shows better results than the 'Qaafiya Dictionary' powered by rekhta.org. Moreover, with respect to inter-metric similarity and running time, Jaro-Winkler appears to be the most suitable algorithm for rhyme suggestion, whereas Levenshtein distance is the slowest. Novelty/Applications: This work benefits researchers of Indic natural language processing for lexical look-ups and the analysis of creative literature, especially poetry.


Introduction
Throughout the history of the world and its literature, the impression left by poetry has been captivating and profound. Lengthy texts and thick volumes compare poetry with the other sub-fields of literature, but Maugham (1) closes the debate by eulogizing poetry as: "The crown of literature is poetry. It is the end and aim. It is the sublimest activity of the human mind. It is the achievement of beauty." Comparing poetry with other fields in the arts and humanities, we see that humans can learn to deliver sermons, can perfect their dancing and singing skills with practice, and can become excellent painters, but the art of composing poetry is divine. What is poetry? Rafi (2) cites Ibn Rachik's definition as " ‫مقفی‬ ‫موزون‬ ‫الکالم‬ ‫ھو‬ ‫و‬ ‫الشعر‬ ‫بالقصد‬ " i.e., poetry is a narration that is balanced, rhyming, and said with intention. Among these aspects intention is arguable, but the other two, i.e., balance (w.r.t. poetic meters) and prosody/rhyme, are indispensable. Crowe et al. (3) discussed the importance of prosody as: "…metered language is a double medium, with two systems of effect-a steady music… [and] a sense of a ritualistic occasion." Al-Beruni (4) maintained that, aside from making a poem easier to memorize, "the soul yearns for anything that has symmetry and regularity and feels disgusted for that which has not regularity".
On rhyming, Rampuri (5) translates a famous critique made by Avicenna " ‫نہیں‬ ‫شعر‬ ‫نزدیک‬ ‫ہمارے‬ ‫وہ‬ ‫نہیں‬ ٰ ‫مقفی‬ ‫جو‬ " (it is not poetry to us which is not rhyming). Verily, the musicality in poetry increases if it is balanced on the metrical grounds, accompanying the rhythm with properly addressed rhymes. Hence, the prosody is collectively considered as the core which defines the whole of poetry, irrespective of languages, in all genres of poems.
The fields of Computational Linguistics (CL) and Natural Language Processing (NLP) focus on the analysis and development of tools and techniques for human languages, often called Natural Languages (NL) (6). Research in these fields has contributed phenomenal work in information retrieval (7,8), language translation/transliteration (9,10), text generation (11,12), and text/document classification (13,14). These are only a few examples from the long list of NLP research and applications. Such research generally revolves around data that is broadly prose: articles, blogs, stories, customer reviews, and feeds from social-media streams. All of these, irrespective of type, lack poeticism or show very little of it. Hence, we maintain that substantial work has been done on prose, compared with very little on poetry.
For novice and aspiring creative writers, it is very difficult to remember the whole range of rhyming words, and a conventional dictionary lookup does not help them either. This paper presents an interdisciplinary study that combines literary studies with the computational treatment of NLs and contributes to the analysis of poetry by building a system that retrieves and suggests appropriate rhymes/prosodies. The work is done for the Indic languages Hindi and Urdu, which are morphologically rich (15), have astounding and vivid contributions in literature (16)(17)(18), and rank high in the list of world languages by number of native speakers (19); however, they have seen very little work on the computational treatment of text. For this reason, these languages are nowadays termed resource-poor and resource-scarce (20,21). This paper makes the following main contributions:
• It presents an edit-distance based technique for prosody (rhyming words, i.e., qafiya) searching and suggestion. Four well-known distance metrics are used, i.e., Levenshtein, Damerau-Levenshtein, Jaro-Winkler, and Hamming distance.
• The resulting lists of prosodies are compared in a pair-wise manner to evaluate the inter-similarity of the edit distances.
• For thematic prosody suggestion (i.e., in the case of ghazals, radif-based qafiya suggestion), a probabilistic approach based on a language modeling technique is also presented.
• Lastly, a running time analysis is made for both of these techniques.
The rest of the paper is organized as follows. §2 describes the historical background of Hindi/Urdu as a language, the influence of other languages on it, the mutual connection between the two, a brief survey of the poetry system, and the importance of prosody in Indic poetry. §3 gives the details of the data and the proposed methodology, followed by a discussion of the results in §4 and the conclusions in §5.

Urdu and Hindi as a language
Hindi and Urdu are the most prominent languages in the Indic branch of the language family. On colloquial grounds (22), the two languages are mutually intelligible (23); hence they can be considered the same language under the name 'Hindustani' or 'Hindi-Urdu' (henceforth HU) (24). Historically, Urdu evolved in medieval India as a creole language (20) under the influence of many languages: initially Persian, Arabic, and Turkic languages (15); and later, under colonial rule, English and Portuguese (25). Today, Urdu and Hindi are, respectively, the Persianised and Sanskritised registers of the Hindustani language (24,26). The quantified similarity between the two languages is high, both in parts-of-speech and in lexical similarity based on phonetic/articulatory features (27,28). Hindi-Urdu is a victim of digraphia: Urdu is written in the modified Perso-Arabic script, whereas Hindi is written in the Devanagari script. Together, the two languages rank 3rd among the most spoken languages of the world, with 329.1M native speakers and 697.4M total speakers (19).

Poetry and prosody in Hindi-Urdu: A brief survey
As in various other languages, Hindi-Urdu has a compendium of meters (5,29). The development of the Hindi-Urdu poetry system can be understood under the influence of ancient Indian languages, i.e., Vedic and Paninian Sanskrit, and the Prakrits. These languages span over 2500 years, mainly divided into two ages, i.e., Old Indo-Aryan, ca. 1500-300 BCE, and Middle Indo-Aryan (MIA), ca. 300 BCE-1500 CE (30). Further, all varieties of MIA languages that evolved from Sanskrit are subsumed under the term Prakrit (31). Alongside Sanskrit, the Persian and Arabic languages played the most instrumental role in the development of the poetry system in the 18th-20th centuries (32,33). A comparison of the symmetries and characteristics of Indic and non-Indic poetry would become an extensive debate. Thus, we summarize only the points that are essential w.r.t. rhythmic structure and meter and that have been broadly practiced in Hindi-Urdu poetry since the medieval era (the relevant details of ancient Indic prosody, aligned with Vedic and Paninian Sanskrit, are given in footnote 1). We conclude by accounting for the rule of the Ghazal (see footnote 2) ([ɣəzəl], ‫غزل‬ / ग़ज़ल) and related Perso-Arabic genres of poetry in the medieval and post-medieval era in India. Also, under the influence of Perso-Arabic culture, classical Hindi-Urdu poetry principally followed Arabic poetry meters (2,5).
Along with the meter, prosody in Hindi-Urdu poetry includes the Qafiya ( ‫قافیہ‬ / क़ाफ़िया; pl. Qawafee [qəvɑ:fi:] ‫قوافی‬ / क़वाफ़ी), i.e., the rhyming words, and the Radif ([rəd̪i:f], ‫ردیف‬ / रदीफ़), i.e., the phrasal suffix to the qafiya; both are intrinsic parts of the ghazal. The qafiya equally matches the concept of rhyming words, tukant तुकांत [t̪ʊ.käːn̪t̪], in Sanskrit or Hindi poetry, which draws heavily on Sanskrit vocabulary. Similarly, the radif corresponds to the charnant चरणांत [t͡ʃə.ɾəɳ̃äːn̪t̪] in the same context. As a rule, the radif (preceded by the qafiya) has to appear at the end of every second hemistich of a ghazal (5,37). The radif in any ghazal brings focus to the thematic mood of the poem and sets its philosophical disposition. Furthermore, the radif not only adds an alluring poetic character but also poses a challenge for poets, who must end every couplet in the same contextual ambience. The radif is very common in Hindi-Urdu poetry and is often observed in Persian and Turkic poetry; however, it is absent in Arabic poetry.
It should be kept in mind that the Urdu language allows the syntactic construction of two words through Iẓāfats, which are used exclusively in Hindi-Urdu poetry. This Perso-Arabic stylistic construction replaces the postposition ‫کا‬ / का [kɑ:] and its derived forms ( ‫کے‬ / के [kae:]; and ‫کی‬ / की [ki:]) by reordering the surrounding words to bring rhythmic beauty to the text/poetry. The qafiya/rhyme then relates to the final word of the (syntactically constructed) compound. The reordering is performed on two (or more) words in the following three manners (although the following text appears complex, it is in reality a sublime characteristic of Hindi-Urdu poetry):

1. Zer-e-Izafat (ZI): Appends the diacritic symbol 'ِ (zer). For example, ‫امید‬ ‫کی‬ ‫حر‬ ‫س‬ / सहर की उम्मीद (/sahar ki umīd/; [literal] 'hope of dawn') is rewritten as ِ ‫ّید‬ ‫ام‬ ‫حر‬ ‫س‬ / उम्मीद-ए-सहर /umīd-e-sahar/ by appending zer at the end of the first word ‫امید‬ . The transliteration systems for romanizing Urdu-Hindi render ZI in two different manners, '-i' and '-e' (38); we count both as correct and valid for interchangeable use. For Devanagari script, we can use the vowel matra ए or its modified inherent form. ZI also relates adjectives to the surrounding nouns; however, adjectives (with ZI) come after nouns, e.g., ‫سیاہ‬ ٰ ‫زلف‬ / ज़ुल्फ़-ए-सियाह /zulf-e-siyah/ (black tresses); ‫نم‬ ٰ ‫چشم‬ / चश्म-ए-नम /chashm-e-num/ (damp eye).

… shown with the blue colour). Thus, we can observe the intent of the poet throughout the ghazal: in each couplet he asks the reader to look at or observe a different topic.

Table 1. Prosody depicted in a famous ghazal by 'Iqbal' (40). Red and blue colours mark the qafiya and radif, respectively. The English translation is excerpted from 'Khalil' (41).

(Table 1 columns: Hindi, scripted in Devanagari; Romanized transliteration.)

The radif in this ghazal consists of a single word; however, radifs of more than one word do exist in Hindi-Urdu poetry. For better comprehension, consider a line from the famous ghazal of 'Momin' (42), where, as before, the qafiya/rhyming word and the radif are shown in red and blue, respectively.

Related Work and Research Gap
The computational treatment of poetic prosody for the Hindi-Urdu language has received very little attention. Speech prosody, in contrast, has received comparatively more focus than poetic prosody. Coverage of speech prosody is beyond the scope of this paper; however, the most recent work on the phonological and phonetic aspects of Urdu has been contributed by Hussain et al. (43,44) and Jabeen et al. (45)(46)(47). The phonological aspects of Urdu poetry, compared against moraic weight, were studied by Hussain (48), who also evaluates the possibility of mapping Urdu poetry onto the rules that mask Urdu prose. In the same regard, there exist many detailed texts (e.g., Rampuri (5), Pritchett and Khaliq (49), Abidi (29), and Rafi (2)) for comprehending Urdu poetry, prosody, and metrical rules; however, they cover the linguistic and theoretical aspects of the analysis, whereas computational approaches are missing.
The computational treatment of Urdu poetry is mainly offered by two forums: Aruuz.com (50) and Rekhta.org (51). Both are web-based solutions. The following two paragraphs discuss them in turn.
Aruuz (50) offers metrical tagging of Urdu poems, which technically means splitting a couplet for syntactic parsing according to the grammatical rules/meters. Given an input of at most 4 hemistichs in Nastalique script, Aruuz finds the closest meter of the couplet and tags its words with the corresponding metrical units. Since version 2.0, Aruuz also reports a fluency score for the couplet. Figure 1 shows the output of Aruuz (upon inputting the 2 opening couplets of a ghazal penned by veteran Indian lyricist Javed Akhter). Aruuz reports that the ghazal is written in 'Beher-e-Hindi' with a fluency score of 8. In addition, Aruuz suggests similar works composed in the same meter.
Rekhta (51) is making a tremendous effort by maintaining the repository of Hindi-Urdu literature, which covers poetry in almost all genres and preservation of classical Urdu text in the form of e-books. It also offers an online lookup tool, namely, 'Qafiya Dictionary' (QD) for qafiya searching. On giving a word, Rekhta shows the rhyming words in terms of Exact and Close categories. However, the technique applied for searching and retrieval of rhyming words is unknown.
Besides the two aforementioned web forums, we found no work done specifically on the retrieval of rhyming words. Thus, we assume that the work presented in this paper is novel. Moreover, the authors humbly maintain that the proposed methodology is fairly straightforward and produces a baseline result, which at this moment cannot be compared with any previous work. Nevertheless, we have compared our results with QD and found that QD not only lacks entries for many words but also falls behind in showing appropriate rhyming words.

Data
In recent times, the use of resources available on the Internet has become common practice. In the same fashion, the dataset for this research is prepared from websites dedicated to Urdu poetry. We opted for Rekhta as the primary source for data scraping and for the subsequent construction of a parallel corpus. As mentioned in §2.3, the website is not only a leading repository of Urdu poetry; it also offers poetry in several scripts to help its readers, i.e., Nastalique/modified Perso-Arabic, Devanagari, and Roman Urdu. The scraper is built with BeautifulSoup-4 (52), a well-known Python package for this sort of task. A consolidated overview of the scraped data used in this research is given in table 2. Besides ghazals, it covers all poetry genres available at Rekhta, such as nazam, dohe, qita, marsiya, mukhammas, manqabat, masnavi, naat, qasida, sehra, salaam, and rubai. Since not every genre of poetry consists of paired hemistichs (i.e., couplets), the count of hemistichs is reported.

The statistics for the compounds are calculated from the list of tri-grams: the whole corpus is processed to work out lists of n-grams (for both words and characters, n ∈ {1, ..., 4}). For ZI, the second entity of each tri-gram is checked for 'ए' (for Hindi) and 'e' (for Roman-Urdu), which form a ZI with the surrounding tokens. Similarly, from the list of bi-grams of the Urdu subset, we count every item whose first entity ends with the diacritic 'zer'. Mathematically, let $^{n}W_g$ be the list of word n-grams for language g; the ZI count over this list is then given by equation 1. The counting for WI follows the same process for Urdu, Hindi, and Roman Urdu. The second entity in the tri-gram list is checked for 'ओ' (for Hindi) and 'o' (for Roman-Urdu); for Urdu, we use the list of tri-grams whose second entity is ‫'و'‬. These letters can be substituted into the criterion of equation 1 for the respective language to obtain the count of WI.
Moreover, for Hindi and Roman-Urdu, compounds having both ZI and WI are counted where the entities at the odd indices of every tri-gram are the letters that delimit WI or ZI. Counting compounds having both ZI and WI for Urdu in Nastalique script is a complex task; the authors find two ways of dealing with it: a) using tri-grams in which the last character of the first entity is zer and the last entity is ‫'و'‬, and b) using tri-grams in which the first entity is ‫'و'‬ and the last character of the second entity is zer. Another insight concerns the final character n-grams: the dataset occasionally shows irregularities; after processing, we saw a few non-Perso-Arabic letters, numbers, and symbols alongside Perso-Arabic letters. The n-grams with these irregularities are removed before applying the proposed methodology.
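The tri-gram check described above can be sketched as follows for the Roman-Urdu subset. This is only an illustration under stated assumptions: hemistichs are taken to be whitespace-tokenized with the connective as a separate token, and the names `word_ngrams`, `count_compounds`, `ZI_MARK`, and `WI_MARK` are hypothetical, not taken from the original implementation.

```python
# Hypothetical connective tokens for the Roman-Urdu subset (assumed to be
# separate whitespace-delimited tokens in the corpus).
ZI_MARK = "e"   # connective checked for ZI
WI_MARK = "o"   # connective checked for WI


def word_ngrams(tokens, n):
    """Return all word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_compounds(hemistichs, mark):
    """Count tri-grams whose second entity equals the connective `mark`."""
    total = 0
    for line in hemistichs:
        tokens = line.split()
        total += sum(1 for gram in word_ngrams(tokens, 3) if gram[1] == mark)
    return total


# Toy input, not corpus data.
lines = ["umid e sahar", "zulf e siyah", "sham o sahar ka rang"]
zi_count = count_compounds(lines, ZI_MARK)
wi_count = count_compounds(lines, WI_MARK)
```

For the Nastalique subset the same loop would instead test whether the first entity of a bi-gram ends with the zer diacritic, as the text describes.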
While scraping poems from Rekhta.org, we saved every poem P, in all three scripts (i.e., Nastalique, Devanagari, and Roman-Urdu), as a dictionary in a separate JSON file. Thus, let $H_U$, $H_R$, and $H_D$ be the lists of hemistichs in P corresponding to the Nastalique, Romanized, and Devanagari scripts, respectively; more formally, $P \leftarrow \{H_U, H_R, H_D\}$. The whole corpus C is a collection of poems such that $C \leftarrow \{P_1, \cdots, P_n\}$, where n is the total number of poems. For C, algorithm 1 shows the preprocessing steps, in which we build a dictionary I whose keys are character n-grams; against each key we retain the set of words that end with that character n-gram.
Steps 7 and 8 in algorithm 1 show the work with the Urdu vocabulary. In practice, these steps were also executed for the Devanagari and Roman-Urdu subsets.
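A minimal sketch of the dictionary I described above is given here: each final character n-gram (n from 1 to 4) maps to the set of words ending with it. Algorithm 1 itself is not reproduced in this text, so the function name `build_suffix_index`, the `max_n` parameter, and the toy Roman-Urdu vocabulary are assumptions made for illustration.

```python
from collections import defaultdict


def build_suffix_index(vocabulary, max_n=4):
    """Map each final character n-gram (1 <= n <= max_n) to the words ending with it."""
    index = defaultdict(set)
    for word in vocabulary:
        for n in range(1, min(max_n, len(word)) + 1):
            index[word[-n:]].add(word)
    return index


# Toy vocabulary, not corpus data.
vocab = {"samjhe", "uljhe", "bujhe", "dekhe", "raat"}
index = build_suffix_index(vocab)
```

Looking up `index["jhe"]` then returns every word sharing that three-character ending, which is the candidate pool the search step ranks.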

Proposed methodology
This section is divided into two subsections: §3.2.1 presents simple qafiya searching, and §3.2.2 presents radif-based qafiya searching.

Top-t Qafiya Search (QS)
After preprocessing, the top-t relevant words are retrieved. The result is a list of qawafee sorted in ascending order of distance. To calculate the distance between two words, we used widely employed distance metrics: Hamming (Ham.) (53), Levenshtein (Lev.) (54), Damerau-Levenshtein (D-L) (55), and Jaro-Winkler (J-W) (56) distance. Since these algorithms are well known and well understood, we skip the technical details. Algorithm 2 shows the procedure for QS. In steps 3 and 4, the symbol \ indicates the set-minus operation. In steps 5-7, one of the aforementioned metrics is represented by the function ∆(c, W), where c and W are the candidate and target words. In line 11, the operator ⌢ denotes the sequential concatenation of two tuples; for example, if $\lambda_1 = \langle x_1, x_2, x_3 \rangle$ and $\lambda_2 = \langle y_1, y_3, y_2 \rangle$, then $\lambda_1 \frown \lambda_2 = \langle x_1, x_2, x_3, y_1, y_3, y_2 \rangle$.
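Since algorithm 2 is not reproduced in this text, the following is only a hedged sketch of how such a top-t search could look. It assumes candidates are gathered from the longest shared suffix to the shortest, with already-seen words removed by set minus and each ranked batch concatenated onto the result. Levenshtein distance stands in for ∆; the names `qafiya_search` and `build_suffix_index` and the alphabetical tie-break are assumptions.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def build_suffix_index(vocabulary, max_n=4):
    """Map each final character n-gram to the set of words ending with it."""
    index = defaultdict(set)
    for word in vocabulary:
        for n in range(1, min(max_n, len(word)) + 1):
            index[word[-n:]].add(word)
    return index


def qafiya_search(target, index, t=5, max_n=4, distance=levenshtein):
    """Return up to t rhyme candidates, longest shared suffix first."""
    seen = {target}
    result = ()
    for n in range(min(max_n, len(target)), 0, -1):
        candidates = index.get(target[-n:], set()) - seen  # set minus
        # Rank by distance; ties are broken alphabetically (an assumption).
        ranked = tuple(sorted(candidates, key=lambda c: (distance(c, target), c)))
        result = result + ranked                           # tuple concatenation
        seen |= candidates
        if len(result) >= t:
            break
    return list(result[:t])


# Toy vocabulary, not corpus data.
index = build_suffix_index({"samjhe", "uljhe", "bujhe", "dekhe", "raat"})
top3 = qafiya_search("samjhe", index, t=3)
```

Any of the other three metrics could be passed through the `distance` parameter without changing the search loop.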

Radif-based Qafiya Search (RBQS)
We employ a widely used method for next-word prediction through language modeling (57). We calculate the chance of the next word ($w_i$) appearing via its conditional probability (P) given its previous words (P is expandable through the chain rule, as shown in equation 4).
Since a radif can contain many combinations of words (depending on its size, i.e., the number of tokens in it), we remodel P as P′ with the Markov assumption and limit the sequence of words to n-grams (see equation 5). Because the qafiya appears before the radif, and we are interested in finding suitable qawafee w.r.t. a target qafiya Q for a provided radif R, the use of bi-grams in the Markov Model (MM) is appropriate (57): the probability of a candidate qafiya is calculated given the first token in R.
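Equations 4 and 5 are not reproduced in this text. For orientation, the standard textbook forms of the chain rule and of the n-gram Markov approximation are written out below; the paper's own notation may differ, so the symbols here should be read as assumptions.

```latex
% Chain rule over a word sequence (standard form).
P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})

% Markov assumption: condition only on the previous n-1 words.
P'(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1})

% Bi-gram case (n = 2).
P'(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-1})
```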
Equation 6 renders the MM for bi-grams with an add-1 smoothed scoring function Φ(·), which takes a candidate qafiya c, the first token r of the radif R, and the dictionaries of uni- and bi-grams D1 and D2; algorithm 3 shows the overall procedure for RBQS.
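Equation 6 and algorithm 3 are not reproduced in this text, so the sketch below only illustrates one plausible add-1 smoothed bi-gram score. It assumes the score is (count(c, r) + 1) / (count(r) + |V|), i.e., the candidate is scored given the first radif token, with |V| the number of distinct uni-grams. The function names, the normalization, and the toy hemistichs are all assumptions rather than the authors' implementation.

```python
from collections import Counter


def build_counts(hemistichs):
    """Build uni-gram (D1) and bi-gram (D2) count dictionaries."""
    d1, d2 = Counter(), Counter()
    for line in hemistichs:
        tokens = line.split()
        d1.update(tokens)
        d2.update(zip(tokens, tokens[1:]))
    return d1, d2


def phi(c, r, d1, d2):
    """Add-1 smoothed score of candidate qafiya c directly before radif token r."""
    vocab_size = len(d1)
    return (d2[(c, r)] + 1) / (d1[r] + vocab_size)


def rank_by_radif(candidates, radif, d1, d2):
    """Sort candidates by descending score against the first token of the radif."""
    r = radif.split()[0]
    return sorted(candidates, key=lambda c: (-phi(c, r, d1, d2), c))


# Toy hemistichs, not corpus data.
corpus = ["dil ko samjhe the", "ham bhi uljhe the",
          "tum na samjhe the", "raat bhar jaage hain"]
D1, D2 = build_counts(corpus)
ranking = rank_by_radif(["jaage", "uljhe", "samjhe"], "the", D1, D2)
```

With add-1 smoothing an unseen pair such as ("jaage", "the") still receives a small non-zero score, so it is ranked last rather than dropped.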

Proposed methodology vs Rekhta.org
The presented work appears to be unique in both methodology and results. The proposed method was thoroughly tested with various words and compared with the Qaafiya Dictionary (QD) powered by Rekhta. We found that QD often fails to produce good results when the target word is long. Surprisingly, many common words are absent from QD. Although the entire lists of retrieved rhyming words were manually examined, to save space only the top-3 rhyming words are reported for each distance metric.
To compare QD with the proposed work, we selected the target word ‫سمجھے‬ /समझे [smd3 ʱe:] (verb; 'understand'). QD returned no results in its Exact category, while the results in the Close category are not convincing enough for prosody (see footnote 3). Figures 2a and 2b show the output of QD for the Exact and Close categories, respectively, for the given word. At times QD appears to produce derivational morphology of a word w.r.t. the future tense or the honorific style of communication, for instance, pahnaaoge ([literal] 'make dressed') (see figure 2b, the first word under section '2222'); we cannot count it as a good rhyming word. In comparison to QD, the method proposed in this work retrieves better rhyming words, as shown in table 3.

Agreement between distance metrics
We are also interested in quantifying the inter-agreement, or inter-similarity, between the word lists produced by different metrics. Hence, the two ordered sets of rhyming words (A and B) produced with different edit distances are tested in a pair-wise fashion. Since the similarity between any two metrics is symmetric, we show only the lower triangle in table 4. Table 4 shows that the least similarity is found between the J-W and D-L distances, i.e., 0.1%; in contrast, D-L and Levenshtein distances behave roughly alike compared with the rest of the distance metrics. The remaining figures in table 4 (except the diagonal) are markedly low, which intuitively means that the word lists contain a variety of different words.
Consider table 4 (the left portion, before the vertical bar) as a square matrix D. From this matrix, the averaged similarity of a distance metric (D[·]) is calculated via equation 8. We have ensured that the value of the metric with itself (the values on the diagonal) is excluded from the computation.
Here n is the number of metrics, i is the index of the corresponding metric, and r and c denote the row and column indices, respectively.
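Neither the pair-wise similarity measure nor equation 8 is reproduced in this text. The sketch below therefore uses Jaccard overlap purely as a stand-in (it ignores the ordering of the lists) and takes the averaged similarity of metric i to be the mean of the off-diagonal entries in row i of a full symmetric matrix. Both choices, and all names and toy word lists, are assumptions.

```python
def jaccard(a, b):
    """Stand-in similarity between two word lists (order is ignored)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def similarity_matrix(lists):
    """Return metric names and the full symmetric similarity matrix."""
    names = list(lists)
    matrix = [[jaccard(lists[r], lists[c]) for c in names] for r in names]
    return names, matrix


def averaged_similarity(matrix, i):
    """Mean of row i, excluding the diagonal entry (metric versus itself)."""
    n = len(matrix)
    return sum(matrix[i][c] for c in range(n) if c != i) / (n - 1)


# Toy word lists, not results from the paper.
lists = {"Lev": ["a", "b", "c"], "D-L": ["a", "b", "d"], "J-W": ["x", "y", "z"]}
names, D = similarity_matrix(lists)
avg_lev = averaged_similarity(D, names.index("Lev"))
```

If only the lower triangle is stored, as in table 4, it would first be mirrored across the diagonal before the row mean is taken.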

Radif-based Qafiya Search
The results for radif-based qafiya search are also positive. They are shown in table 5, where row 1 gives the suggested rhymes for a single-word radif and row 2 gives the rhymes for a multi-word radif. We notice that not only are the rhymes appropriate, but the formed phrases (radif+qafiya) are also semantically correct. This is due to the language modeling technique, which (unlike supervised learning techniques) is very quick at next-word prediction; the technique elicits candidate rhymes according to the probability of their usage w.r.t. the previous words available in the corpus.

Running time analysis of distance metrics
The overall time taken to preprocess the datasets and produce the dictionaries is not critical, but for the sake of reporting, it is ≈36 minutes on a machine with a Core i7 processor and Ubuntu as the operating system. Rather than discussing the preprocessing time further, we share insights into the running time of the distance metrics. We performed a full-throttle experiment that involves retrieving the top 1500 rhyming words for the 1000 most frequent distinct words (excluding stopwords) in the poetry corpus. Keeping the symmetric behavior of the distance metrics in mind, we implemented two specific checks to ensure that the input words are not the same and that the distance for the same pair of words is not re-calculated. Figure 3 shows the running time for all distance metrics under the system limits, experiment settings, and criteria defined above. The left subplot shows the overall running time, whereas the right subplot shows the running time for a single word pair. For (1000 × 1500 =) 1.5 M comparisons, the fastest distance metric is Jaro-Winkler, followed by Hamming distance. The slowest metric for calculating differences among 1.5 M distinct pairs of words is vanilla Levenshtein distance. The running times for a single word pair follow the same pattern, but on the scale of microseconds.
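The two checks mentioned above (skip identical words; never recompute a pair) can be sketched as a small timing harness. This is an illustration rather than the authors' benchmark code. The harness name `timed_pairwise` is hypothetical, and the Hamming variant left-pads the shorter word so that word endings align, which is an assumption because Hamming distance is only defined for equal lengths.

```python
import time


def hamming(a, b):
    """Hamming distance; the shorter word is left-padded (an assumption)."""
    width = max(len(a), len(b))
    a, b = a.rjust(width), b.rjust(width)
    return sum(x != y for x, y in zip(a, b))


def timed_pairwise(words, candidates, distance):
    """Compute distances for word pairs, honouring the two checks."""
    cache = {}
    start = time.perf_counter()
    for w in words:
        for c in candidates:
            if w == c:
                continue                      # check 1: identical words
            key = (w, c) if w < c else (c, w)  # symmetric pair key
            if key not in cache:              # check 2: no re-calculation
                cache[key] = distance(w, c)
    elapsed = time.perf_counter() - start
    return cache, elapsed


# Toy input, not the 1000 x 1500 experiment.
cache, elapsed = timed_pairwise(["samjhe", "uljhe"],
                                ["uljhe", "samjhe", "bujhe"], hamming)
```

Dividing `elapsed` by `len(cache)` gives a rough per-pair time comparable to the right subplot of figure 3.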

Conclusion and Future Work
A good assembly of rhyming words is not only the core of poetry but also adds poetic flavor to prose. Regular expression-based methods can guarantee quick lookups, but the retrieved results are not ranked. In contrast, this paper shows the utility of n-gram suffixes and edit-distance metrics for the retrieval of ranked rhyme suggestions. Among all distance metrics, Jaro-Winkler distance is found to be the most favorable for rhyme suggestion in terms of running time and variety of rhymes. The work is done for Hindi/Urdu poetry by exploiting the tri-script poetry corpus. A natural question arises: why was a monolingual prose corpus not used instead of the poetry corpus? In answer, the authors maintain that although a monolingual prose corpus would offer a greater variety of words, the poetic quality of its n-grams diminishes. Thus, we recommend enriching the poetry corpus instead of exploiting a prose corpus.
We have noticed that for Hindi, character n-grams should be used where … In the future, the existing work can be extended by considering phonetic value comparison, specifically for standard Urdu, where human readers easily pronounce words with or without diacritics (harakats), yet in practice words may sound different even when they end with the same final character n-gram. Another potential direction is the retrieval of suggestions w.r.t. the metrical weights of the Hindi/Urdu poetry system.