Inflection 85: Updating Malayalam grammar.xml and following files #138
Open
deonajulary06 wants to merge 37 commits into main from update-new-language-doc
Commits
e359163 Add Malayalam dictionary support (deonajulary06)
a109f4e Add Malayalam language support in LocaleUtils.hpp (deonajulary06)
9fd9e01 Add Malayalam locale support to LocaleUtils (deonajulary06)
72e7c28 Add Malayalam language to the tests (deonajulary06)
95b483d Add Malayalam locale group ml_IN (deonajulary06)
681b568 ADD: Malayalam tokenizer configuration file (deonajulary06)
bfa7ae6 Inflection-85: Add Git LFS config for Malayalam dictionary and XML files (deonajulary06)
5aa7ec4 Add Malayalam inflection and pronoun tests (deonajulary06)
f20350f Updated copyright line (deonajulary06)
9cc0a44 Updated copyright message (deonajulary06)
4bb9964 Updated copright message (deonajulary06)
a46c301 Updated language grammar to include Malayalam (deonajulary06)
df82a8b Added pronouns for Malayalam (deonajulary06)
8fcfb16 Add ll GrammarSynthesizer files (deonajulary06)
0c50978 Add Malayalam grammar synthesizer (deonajulary06)
9523646 Add Malayalam-specific CommonConceptFactory with lists and quantities (deonajulary06)
a8a7f2d Update document on how to add a new language, fixed errors (deonajulary06)
e60ea6f Updated grammar.xml for Malayalam (deonajulary06)
80af5bf Update pronoun_ml.csv (deonajulary06)
a70ad5e Updated all grammar synthesizer component for Malayalam (deonajulary06)
6a191b6 Update Common Concept Factory files (deonajulary06)
421d0e4 Updated tests for Malayalam (deonajulary06)
11d35cc Fix Malayalam grammar synthesis and remove count lookup function (deonajulary06)
de22098 Updated Grammeme Constants files to include sociative case (deonajulary06)
367f745 Temporary fix for GitHub (deonajulary06)
4786b2d Modified files to fix more test errors (deonajulary06)
f2b59c6 Update files to fix errors (deonajulary06)
b7528a8 Same file as before but with corrected indentations (deonajulary06)
300383e Update test files to reflect tokenization (deonajulary06)
fd0be77 Added tokenizer files (deonajulary06)
fa63e94 Added feedback on how to add a new language (deonajulary06)
8e7cf94 Modified files (deonajulary06)
f72a7ed New changes to fix errors (deonajulary06)
513e79e Updated Grammar Synthesizer file (deonajulary06)
a4a0ec1 Fixed errors (deonajulary06)
b4f8f51 Made changes to fix errors (deonajulary06)
66fc08e Fix Common Concept Factory (deonajulary06)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
inflection/resources/org/unicode/inflection/dictionary/dictionary_ml.lst
748,744 changes: 748,744 additions & 0 deletions
Large diffs are not rendered by default.
inflection/resources/org/unicode/inflection/dictionary/inflectional_ml.xml
7,716 changes: 7,716 additions & 0 deletions
Large diffs are not rendered by default.
inflection/resources/org/unicode/inflection/inflection/pronoun_ml.csv
83 changes: 83 additions & 0 deletions
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
അവൻ,third,singular,nominative,masculine,personal,distal | ||
എനിക്ക്,first,singular,dative,personal | ||
ഞാൻ,first,singular,nominative,exclusive,personal | ||
എന്നെ,first,singular,accusative,exclusive,personal | ||
എന്റെ,first,singular,genitive,determination=dependent,exclusive,personal | ||
എന്റേത്,first,singular,genitive,determination=independent,exclusive,personal | ||
നമ്മെ,first,plural,accusative,inclusive,personal | ||
നമുക്ക്,first,plural,dative,inclusive,personal | ||
ഞങ്ങൾ,first,plural,nominative,exclusive,personal | ||
ഞങ്ങളെ,first,plural,accusative,exclusive,personal | ||
ഞങ്ങൾക്ക്,first,plural,dative,exclusive,personal | ||
ഞങ്ങളുടെ,first,plural,genitive,exclusive,determination=dependent,personal | ||
ഞങ്ങളുടേത്,first,plural,genitive,exclusive,determination=independent,personal | ||
നമ്മുടെ,first,plural,genitive,inclusive,determination=dependent,personal | ||
നമ്മുടേതു്,first,plural,genitive,inclusive,determination=independent,personal | ||
നിനക്ക്,second,singular,dative,informal,personal | ||
നീ,second,singular,nominative,informal,personal | ||
നിനെ,second,singular,accusative,informal,personal | ||
നിന്റെ,second,singular,genitive,informal,determination=dependent,personal | ||
നിന്റേതു്,second,singular,genitive,informal,determination=independent,personal | ||
താങ്കൾ,second,singular,nominative,formal,personal | ||
താങ്കളെ,second,singular,accusative,formal,personal | ||
താങ്കൾക്ക്,second,singular,dative,formal,personal | ||
താങ്കളുടെ,second,singular,genitive,formal,determination=dependent,personal | ||
താങ്കളുടേതു്,second,singular,genitive,formal,determination=independent,personal | ||
നിങ്ങൾ,second,plural,nominative,formal,personal | ||
നിങ്ങളെ,second,plural,accusative,formal,personal | ||
നിങ്ങൾക്ക്,second,plural,dative,formal,personal | ||
നിങ്ങളുടെ,second,plural,genitive,formal,determination=dependent,personal | ||
നിങ്ങളുടേതു്,second,plural,genitive,formal,determination=independent,personal | ||
അവനെ,third,singular,accusative,masculine,personal,distal | ||
അവന്റെ,third,singular,genitive,masculine,determination=dependent,personal,distal | ||
അവന്റെത്,third,singular,genitive,masculine,determination=independent,personal,distal | ||
അവൾ,third,singular,nominative,feminine,personal,distal | ||
അവളെ,third,singular,accusative,feminine,personal,distal | ||
അവളുടെ,third,singular,genitive,feminine,determination=dependent,personal,distal | ||
അവളുടേതു്,third,singular,genitive,feminine,determination=independent,personal,distal | ||
അത്,third,singular,nominative,neuter,personal,distal | ||
അതിനെ,third,singular,accusative,neuter,personal,distal | ||
അതിന്റെ,third,singular,genitive,neuter,determination=dependent,personal,distal | ||
അതിന്റേതു്,third,singular,genitive,neuter,determination=independent,personal,distal | ||
അവർ,third,plural,nominative,personal,distal | ||
അവരെ,third,plural,accusative,personal,distal | ||
അവരുടെ,third,plural,genitive,determination=dependent,personal,distal | ||
അവരുടേതു്,third,plural,genitive,determination=independent,personal,distal | ||
എന്നിൽ,first,singular,locative,personal | ||
എന്നാൽ,first,singular,instrumental,personal | ||
എന്നോടു്,first,singular,sociative,personal | ||
ഞങ്ങളിലു്,first,plural,locative,exclusive,personal | ||
ഞങ്ങളാൽ,first,plural,instrumental,exclusive,personal | ||
ഞങ്ങളോടു്,first,plural,sociative,exclusive,personal | ||
നിനിൽ,second,singular,locative,informal,personal | ||
നിനാൽ,second,singular,instrumental,informal,personal | ||
നിനോടു്,second,singular,sociative,informal,personal | ||
താങ്കളിൽ,second,singular,locative,formal,personal | ||
താങ്കളാൽ,second,singular,instrumental,formal,personal | ||
താങ്കളോടു്,second,singular,sociative,formal,personal | ||
നിങ്ങളിൽ,second,plural,locative,formal,personal | ||
നിങ്ങളാൽ,second,plural,instrumental,formal,personal | ||
നിങ്ങളോടു്,second,plural,sociative,formal,personal | ||
അവനിൽ,third,singular,locative,masculine,personal,distal | ||
അവനാൽ,third,singular,instrumental,masculine,personal,distal | ||
അവനോടു്,third,singular,sociative,masculine,personal,distal | ||
അവളിൽ,third,singular,locative,feminine,personal,distal | ||
അവളാൽ,third,singular,instrumental,feminine,personal,distal | ||
അവളോടു്,third,singular,sociative,feminine,personal,distal | ||
അതിൽ,third,singular,locative,neuter,personal,distal | ||
അതാൽ,third,singular,instrumental,neuter,personal,distal | ||
അതോടു്,third,singular,sociative,neuter,personal,distal | ||
അവരിൽ,third,plural,locative,personal,distal | ||
അവരാൽ,third,plural,instrumental,personal,distal | ||
അവരോടു്,third,plural,sociative,personal,distal | ||
താൻ,third,singular,nominative,reflexive,personal | ||
തങ്ങൾ,third,plural,nominative,formal,reflexive,personal | ||
ഇവൻ,third,singular,nominative,masculine,proximal,personal | ||
ഇവൾ,third,singular,nominative,feminine,proximal,personal | ||
ഇത്,third,singular,nominative,neuter,proximal,personal | ||
ഇവർ,third,plural,nominative,proximal,personal | ||
എവൻ,third,singular,nominative,masculine,interrogative | ||
എവൾ,third,singular,nominative,feminine,interrogative | ||
എവർ,third,plural,nominative,interrogative | ||
ഏത്,third,singular,nominative,neuter,interrogative | ||
നാം,first,plural,nominative,inclusive,personal |
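Each row of pronoun_ml.csv above appears to hold a surface form followed by comma-separated grammemes, with some fields written as key=value (for example determination=dependent). The following is a minimal sketch of reading one such row, assuming that layout. PronounRow and parsePronounRow are hypothetical names used only for illustration; this is not the loader the inflection library actually uses.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical holder for one parsed row; not a type from the inflection library.
struct PronounRow {
    std::string form;                                // first field: the surface form
    std::set<std::string> grammemes;                 // bare values such as "first" or "singular"
    std::map<std::string, std::string> constraints;  // key=value fields such as determination=dependent
};

// Splits one comma-separated line into the surface form and its grammemes.
PronounRow parsePronounRow(const std::string& line)
{
    PronounRow row;
    std::istringstream fields(line);
    std::string field;
    bool first = true;
    while (std::getline(fields, field, ',')) {
        if (first) {
            row.form = field;  // assumed: the first column is the inflected pronoun
            first = false;
            continue;
        }
        const auto eq = field.find('=');
        if (eq == std::string::npos) {
            row.grammemes.insert(field);
        } else {
            row.constraints[field.substr(0, eq)] = field.substr(eq + 1);
        }
    }
    return row;
}
```

Under this reading, a field written as a bare value would land in grammemes, while a field with the determination= prefix would land in constraints under the key determination.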
inflection/resources/org/unicode/inflection/tokenizer/config_ml.properties
7 changes: 7 additions & 0 deletions
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# | ||
# Copyright 2025 Unicode Incorporated and others. All rights reserved. | ||
# | ||
tokenizer.implementation.class=DefaultTokenizer | ||
tokenizer.nonDecompound.file=/org/unicode/inflection/tokenizer/ml/nondecompound.tok | ||
tokenizer.decompound=^(ശ്രീ)(.+?)(ഗുരു|സര്ക്കാര്)$|^(.+?)(ഗുരു|സര്ക്കാര്)$|^(.+?)(ഉണ്ട്|ആണ്|ഇല്ല)$|^(.+?)(ഒടൊപ്പം|ഉടൻ|ഓടെ|ഓട്|ഒപ്പം|തന്നെ|പോലും|പോലെ|ഉം|യ്)$|^(.+?)(കളുടെ|ങ്ങളുടെ|ത്തിന്റെ|ൻ്റെ|ന്റെ|യുടേ|യുടെ|യാൽ|യിൽ|ഇൽ|ല്|ൽ|ക്ക്|മാർ|ങ്ങൾ|കൾ|നെ|യെ)$ | ||
|
inflection/resources/org/unicode/inflection/tokenizer/ml/nondecompound.tok
35 changes: 35 additions & 0 deletions
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
അമ്മ | ||
അച്ഛൻ | ||
അച്ഛി | ||
അമ്മൻ | ||
മകൻ | ||
മകൾ | ||
കുട്ടി | ||
കുട്ടികൾ | ||
ആൺകുട്ടി | ||
ആൺകുട്ടികൾ | ||
പെൺകുട്ടി | ||
പെൺകുട്ടികൾ | ||
കഥ | ||
ചിത്രം | ||
ചിത്രങ്ങൾ | ||
ഗ്രന്ഥം | ||
ഗ്രന്ഥങ്ങൾ | ||
മക്കൾ | ||
ഞാൻ | ||
നീ | ||
നിങ്ങൾ | ||
അവൻ | ||
അവൾ | ||
അവ | ||
അവർ | ||
ഇത് | ||
അത് | ||
ഇവ | ||
അവ | ||
ശ്രീ | ||
നാരായണ | ||
ഗുരു | ||
കേരളം | ||
സര്ക്കാര് | ||
കേരളസര്ക്കാര് |
inflection/src/inflection/dialog/language/MlCommonConceptFactory.cpp
69 changes: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
/*
 * Copyright 2025 Unicode Incorporated and others. All rights reserved.
 */

#include <inflection/dialog/language/MlCommonConceptFactory.hpp>
#include <inflection/dialog/SemanticFeatureConceptBase.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <inflection/dialog/SemanticConceptList.hpp>
#include <inflection/dialog/Plurality.hpp>

namespace inflection::dialog::language {

MlCommonConceptFactory::MlCommonConceptFactory(const ::inflection::util::ULocale& language)
    : super(language)
{
}

MlCommonConceptFactory::~MlCommonConceptFactory()
{
}

// Malayalam-specific conjunction for OR
::inflection::dialog::SemanticConceptList* MlCommonConceptFactory::createOrList(
    const ::std::vector<const ::inflection::dialog::SemanticFeatureConceptBase*>& concepts) const
{
    auto list = super::createOrList(concepts);
    if (list) {
        list->setBeforeLast(::inflection::dialog::SpeakableString(u" അല്ലെങ്കിൽ "));
    }
    return list;
}

// Malayalam-specific conjunction for AND
::inflection::dialog::SemanticConceptList* MlCommonConceptFactory::createAndList(
    const ::std::vector<const ::inflection::dialog::SemanticFeatureConceptBase*>& concepts) const
{
    auto list = super::createAndList(concepts);
    if (list) {
        list->setBeforeLast(::inflection::dialog::SpeakableString(u"യും "));
        list->setItemDelimiter(::inflection::dialog::SpeakableString(u", "));
    }
    return list;
}

// For a count of one, the number follows the noun; otherwise the number precedes the noun
::inflection::dialog::SpeakableString
MlCommonConceptFactory::quantifiedJoin(const ::inflection::dialog::SpeakableString& formattedNumber,
                                       const ::inflection::dialog::SpeakableString& nounPhrase,
                                       const ::std::u16string& /*measureWord*/,
                                       Plurality::Rule countType) const
{
    ::inflection::dialog::SpeakableString space(u" ");
    if (countType == Plurality::Rule::ONE) {
        return nounPhrase + space + formattedNumber;
    }
    return formattedNumber + space + nounPhrase;
}

// Fallback to base implementation for now
::inflection::dialog::SpeakableString
MlCommonConceptFactory::quantifyType(const ::inflection::dialog::SpeakableString& formattedNumber,
                                     const ::inflection::dialog::SemanticFeatureConceptBase& semanticConcept,
                                     bool useDefault,
                                     ::inflection::dialog::Plurality::Rule countType) const
{
    return super::quantifyType(formattedNumber, semanticConcept, useDefault, countType);
}

} // namespace inflection::dialog::language
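The word order that quantifiedJoin produces can be checked in isolation. The sketch below mirrors its branch using plain UTF-16 strings in place of SpeakableString. CountRule and joinQuantity are hypothetical stand-ins for illustration, and CountRule lists only the two values this sketch needs rather than the full set of Plurality::Rule.

```cpp
#include <string>

// Stand-in for Plurality::Rule; only the values this sketch needs.
enum class CountRule { ONE, OTHER };

// Mirrors the branch in quantifiedJoin, using std::u16string instead of SpeakableString.
std::u16string joinQuantity(const std::u16string& number,
                            const std::u16string& noun,
                            CountRule rule)
{
    const std::u16string space(u" ");
    if (rule == CountRule::ONE) {
        return noun + space + number;  // count of one: noun first, then the number
    }
    return number + space + noun;      // every other count: number first, then the noun
}
```

Calling joinQuantity with CountRule::ONE places the noun before the number, and any other rule places the number first, which is the behavior the diff implements regardless of what the comment above the function says about general Malayalam word order.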
inflection/src/inflection/dialog/language/MlCommonConceptFactory.hpp
41 changes: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
/*
 * Copyright 2025 Unicode Incorporated and others. All rights reserved.
 */
#pragma once

#include <inflection/dialog/language/fwd.hpp>
#include <inflection/dialog/CommonConceptFactoryImpl.hpp>
#include <inflection/grammar/synthesis/fwd.hpp>
#include <inflection/dialog/Plurality.hpp>

namespace inflection::dialog::language {

class MlCommonConceptFactory : public CommonConceptFactoryImpl {
    using super = CommonConceptFactoryImpl;

public:
    explicit MlCommonConceptFactory(const ::inflection::util::ULocale& language);
    ~MlCommonConceptFactory() override;

    // Malayalam-specific conjunction handling
    ::inflection::dialog::SemanticConceptList* createOrList(
        const ::std::vector<const ::inflection::dialog::SemanticFeatureConceptBase*>& concepts) const override;

    ::inflection::dialog::SemanticConceptList* createAndList(
        const ::std::vector<const ::inflection::dialog::SemanticFeatureConceptBase*>& concepts) const override;

protected:
    ::inflection::dialog::SpeakableString quantifiedJoin(
        const ::inflection::dialog::SpeakableString& formattedNumber,
        const ::inflection::dialog::SpeakableString& nounPhrase,
        const ::std::u16string& measureWord,
        ::inflection::dialog::Plurality::Rule countType) const override;

    ::inflection::dialog::SpeakableString quantifyType(
        const ::inflection::dialog::SpeakableString& formattedNumber,
        const ::inflection::dialog::SemanticFeatureConceptBase& semanticConcept,
        bool useDefault,
        ::inflection::dialog::Plurality::Rule countType) const override;
};

} // namespace inflection::dialog::language
The determination= is unnecessary. You can remove it.