Contribute to MbazaNLP
MbazaNLP is an open-source community. Every contribution — code, data, documentation, or community support — moves Kinyarwanda NLP forward.
Contribute Data
High-quality labelled data is the foundation of our models. You can contribute by:
- Translating or validating sentence pairs
- Recording or transcribing speech
- Donating text corpora for training
Contribute Code
Help improve our models, tools, and infrastructure:
- Fix bugs or open issues on GitHub
- Improve model training scripts
- Add demos, APIs, or integrations
Improve Documentation
Good documentation makes models accessible to more people:
- Update model and dataset cards on HuggingFace
- Write tutorials and usage examples
- Translate documentation into Kinyarwanda
Join the Community
The community is where collaboration starts:
- Answer questions in #help on Slack
- Share opportunities in #opportunities
- Participate in community sprints and events
Contribution guides
Pick a task below. Each guide has everything you need — what to install, what to do, and how to submit your work. No prior experience required for most tasks.
What you'll do: Run one of the MbazaNLP models against real inputs, score the outputs, and report what you find. Results go into the model card on HuggingFace and help the community understand where each model performs well and where it falls short.
Skills needed: Kinyarwanda fluency. A laptop. No machine learning background required.
Models available for evaluation
| Model | Task | HuggingFace ID |
|---|---|---|
| Nllb_finetuned_general_en_kin | English → Kinyarwanda (general) | mbazaNLP/Nllb_finetuned_general_en_kin |
| Nllb_finetuned_education_en_kin | Education-domain translation | mbazaNLP/Nllb_finetuned_education_en_kin |
| Nllb_finetuned_tourism_en_kin | Tourism-domain translation | mbazaNLP/Nllb_finetuned_tourism_en_kin |
| Whisper-Small-Kinyarwanda | Speech recognition (ASR) | mbazaNLP/Whisper-Small-Kinyarwanda |
| kinyarwanda-tts-model | Text-to-speech | mbazaNLP/kinyarwanda-tts-model |
Step 1 — install dependencies
pip install transformers torch soundfile
# For ASR and TTS also install:
pip install datasets torchaudio
Step 2 — load and run a model
Translation example:
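from transformers import pipeline

# Load the general-domain English → Kinyarwanda model from the Hub
translator = pipeline(
    "translation",
    model="mbazaNLP/Nllb_finetuned_general_en_kin",
)

# NLLB models identify languages with codes such as eng_Latn and kin_Latn
result = translator(
    "The student reads the book carefully.",
    src_lang="eng_Latn",
    tgt_lang="kin_Latn",
)
print(result[0]["translation_text"])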
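To evaluate a model on a whole list of inputs, the single call above can be wrapped in a small loop. The sketch below is one possible way to do that, not official MbazaNLP tooling: `run_batch`, `save_rows`, and the column names are hypothetical, and the translation function is passed in as an argument so the loop can be tried without downloading a model.

```python
import csv
from typing import Callable, Iterable


def run_batch(translate: Callable[[str], str], sentences: Iterable[str]) -> list[dict]:
    """Translate each sentence and keep it next to its source text."""
    rows = []
    for source in sentences:
        source = source.strip()
        if not source:  # skip blank lines in the input list
            continue
        rows.append({"source": source, "output": translate(source)})
    return rows


def save_rows(rows: list[dict], path: str) -> None:
    """Write source/output pairs to a CSV file for later scoring."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["source", "output"])
        writer.writeheader()
        writer.writerows(rows)


# With the pipeline from the translation example, the callable could be:
#   translate = lambda s: translator(
#       s, src_lang="eng_Latn", tgt_lang="kin_Latn"
#   )[0]["translation_text"]
#   save_rows(run_batch(translate, my_sentences), "outputs.csv")
```

Keeping the source and output side by side in one file makes the scoring step easier, because each row can be rated without re-running the model.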
Step 3 — score each output
Run the model on 20 test inputs of your own. For each one, rate the output on three dimensions using a 1–5 scale:
- Fluency — does the output read like natural Kinyarwanda?
- Adequacy — does it convey the same meaning as the source?
- Domain fit — does the vocabulary match the expected domain (education, tourism, etc.)?
Flag any clear errors: mistranslations, hallucinated content, or unintelligible audio output.
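If you keep your ratings in a script rather than directly in a spreadsheet, a few lines of Python can catch out-of-range scores and give a quick per-dimension average. This is only a sketch: the field names (`fluency`, `adequacy`, `domain_fit`, `error_note`) are assumptions chosen to mirror the three dimensions above, not a format required by the evaluation spreadsheet.

```python
from statistics import mean

# Hypothetical field names mirroring the three rating dimensions
DIMENSIONS = ("fluency", "adequacy", "domain_fit")


def validate_score(record: dict) -> None:
    """Raise ValueError if any rating is not an integer from 1 to 5."""
    for name in DIMENSIONS:
        value = record[name]
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{name} must be an integer from 1 to 5, got {value!r}")


def summarise(records: list[dict]) -> dict:
    """Return the mean rating per dimension and the number of flagged outputs."""
    for record in records:
        validate_score(record)
    summary = {name: round(mean(r[name] for r in records), 2) for name in DIMENSIONS}
    # An output counts as flagged when its error note is non-empty
    summary["flagged"] = sum(1 for r in records if r.get("error_note"))
    return summary
```

The averages are a convenience for your own notes; the individual scores and error notes are still what gets submitted.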
Step 4 — submit your results
Copy your scores into the community evaluation spreadsheet (link shared in #models on Slack). Include: the model name, your input sentences, the outputs, your scores, and any error notes. A maintainer will aggregate the results and update the model card on HuggingFace.
No laptop? TTS outputs can be rated by ear using the offline evaluation pack — ask in #help on Slack.
What you'll do: Translate English sentences into Kinyarwanda (or review existing translations for accuracy) and submit them as new rows in the NMT Education parallel dataset. The dataset currently has around 10,000 sentence pairs and is the primary training data for the education-domain translation model.
Skills needed: Bilingual fluency in Kinyarwanda and English or French. No coding required.
What you can contribute
| Type | Description |
|---|---|
| New pairs | Translate English education sentences into Kinyarwanda |
| Review | Check existing pairs for accuracy and flag errors |
| Domain expansion | Contribute pairs from a new domain: health, agriculture, or legal |
Step 1 — find source sentences
Use any of the following as source material:
- The curated list of English education sentences shared in #data on Slack
- Primary school textbooks, government education documents, or curriculum materials in PDF or Word format (licensed for reuse)
- Open educational resources published under CC-BY
Step 2 — format your pairs
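Each contribution is a two-line pair in plain text: the English sentence on a line starting with en:, followed by its Kinyarwanda translation on a line starting with kin:. Translate by hand, not with a machine translation tool: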
en: The teacher writes on the blackboard.
kin: Umwarimu yandika ku kibaho.
en: Students should study every day.
kin: Abanyeshuri bagomba kwiga buri munsi.
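Before pasting pairs into the contribution sheet, it can help to confirm that every en: line has a matching kin: line. The function below is a minimal sketch of such a check, assuming the two-line format shown above; `parse_pairs` is a hypothetical helper, not part of the dataset tooling.

```python
def parse_pairs(text: str) -> list[tuple[str, str]]:
    """Parse 'en:' / 'kin:' lines into (english, kinyarwanda) tuples."""
    pairs = []
    pending_en = None  # English sentence still waiting for its translation
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines between pairs are allowed
        if line.startswith("en:"):
            if pending_en is not None:
                raise ValueError(f"line {number}: previous 'en:' line has no 'kin:' line")
            pending_en = line[len("en:"):].strip()
        elif line.startswith("kin:"):
            if pending_en is None:
                raise ValueError(f"line {number}: 'kin:' line without a preceding 'en:' line")
            pairs.append((pending_en, line[len("kin:"):].strip()))
            pending_en = None
        else:
            raise ValueError(f"line {number}: expected a line starting with 'en:' or 'kin:'")
    if pending_en is not None:
        raise ValueError("last 'en:' line has no 'kin:' translation")
    return pairs
```

A misplaced or missing line raises an error that names the line number, so the problem can be fixed in the text file before submission.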
Step 3 — quality check before submitting
- Both sides must be complete sentences, not fragments.
- The Kinyarwanda translation must be grammatically correct — have at least one other fluent speaker review it before submitting.
- Do not use machine-generated translations. The dataset is human reference data.
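Some of these checks can be partly automated as a first pass. The sketch below flags pairs that look like fragments or that appear untranslated; the heuristics (terminal punctuation, a two-word minimum) are assumptions made for illustration, and `check_pair` is a hypothetical helper. It cannot judge grammar or detect machine translation, so review by a second fluent speaker is still required.

```python
TERMINATORS = (".", "?", "!")


def check_pair(english: str, kinyarwanda: str) -> list[str]:
    """Return a list of likely problems; an empty list means none were found."""
    problems = []
    for label, sentence in (("English", english), ("Kinyarwanda", kinyarwanda)):
        text = sentence.strip()
        if not text:
            problems.append(f"{label} side is empty")
            continue
        # Rough fragment heuristics: complete sentences usually end with
        # punctuation and contain more than one word
        if not text.endswith(TERMINATORS):
            problems.append(f"{label} side does not end with sentence punctuation")
        if len(text.split()) < 2:
            problems.append(f"{label} side has fewer than two words")
    if english.strip() and english.strip().lower() == kinyarwanda.strip().lower():
        problems.append("both sides are identical; the sentence may be untranslated")
    return problems
```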
Step 4 — submit
Add your pairs to the community contribution sheet (link in #data on Slack). Pairs are batched by a maintainer into a pull request against the dataset at the end of each month. You will be credited in the dataset changelog.
What you'll do: Write or improve the documentation card for a MbazaNLP model on HuggingFace. A complete model card tells users what the model does, what it should not be used for, how it was trained, and how well it performs. These details matter for responsible AI use and make the model citable in academic work.
Skills needed: Clear technical writing. Familiarity with the model's task is helpful but not required — you can write documentation from the training repository and paper alone.
What a complete model card includes
| Section | What to write | Common gap? |
|---|---|---|
| YAML frontmatter | Language tags (e.g. language: rw), licence, task tags | Yes — missing tags make the model invisible in HuggingFace search |
| Model description | What it is, architecture, base model, training data summary | Sometimes |
| Intended Use | Primary use case + what it should NOT be used for | Yes — was missing from 10 of 11 MbazaNLP models before May 2026 |
| Limitations & Bias | Known failure modes, language variety coverage, demographic gaps | Yes |
| Evaluation results | At least one metric (BLEU, WER, MOS) on a named test set | Yes |
| How to use | Working code example using the current Transformers API | Sometimes |
| Citation | BibTeX block so researchers can cite the model | Yes — was missing from all 13 models |
Step 1 — pick a model
Browse the mbazaNLP organisation on HuggingFace and find a model whose card is incomplete. Look for blank sections, missing YAML frontmatter, or no code example. The #models channel on Slack has a list of current gaps.
Step 2 — use the community template
Fork the mbazaNLP/model-card-template repository on HuggingFace as your starting point. Fill in each section for the model you have chosen. Check the community model card standards for the required fields and format.
Step 3 — test the code example
Before submitting, run the code example in the card and confirm it works with the current version of the Transformers library. Outdated examples are one of the most common issues found in MbazaNLP model cards.
pip install transformers --upgrade
# Then paste and run the code example from your draft card
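When a card contains several code examples, pulling them out of the draft README.md automatically saves copying each one by hand. The following is a rough sketch under the assumption that examples are written as fenced blocks tagged python or py; the function names are hypothetical. It only checks that each block parses as valid Python and does not run it, so a full run against the current Transformers version is still needed.

```python
import re

# Three backticks, built this way so the sketch can itself sit inside a code fence
FENCE = "`" * 3
BLOCK_PATTERN = re.compile(
    re.escape(FENCE) + r"(?:python|py)[ \t]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def python_blocks(card_text: str) -> list[str]:
    """Return the body of every fenced Python block in the card."""
    return BLOCK_PATTERN.findall(card_text)


def syntax_errors(card_text: str) -> list[str]:
    """Compile each block without executing it and report syntax errors."""
    errors = []
    for index, code in enumerate(python_blocks(card_text), start=1):
        try:
            compile(code, f"<card block {index}>", "exec")
        except SyntaxError as exc:
            errors.append(f"block {index}: {exc.msg} (line {exc.lineno})")
    return errors
```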
Step 4 — open a pull request
On HuggingFace, navigate to the model repository and open a community PR with your
updated README.md. Title it: docs: improve model card — add [sections you added].
Post the PR link in #models on Slack so a maintainer can review it.
Checklist before submitting
[ ] YAML frontmatter: language, licence, task tags present
[ ] Model description: architecture and base model named
[ ] Intended Use: primary use + out-of-scope uses (2 paragraphs)
[ ] Limitations and Bias: at least one paragraph
[ ] Training data: dataset name, size, and source
[ ] Evaluation results: at least one metric on a named test set
[ ] How to use: code example tested and working on current Transformers
[ ] Citation: BibTeX block present
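Parts of this checklist can be checked mechanically before asking for review. The sketch below looks for the frontmatter keys and section headings in a draft README.md. The heading names are assumptions taken from the checklist wording, and the frontmatter keys follow the HuggingFace metadata spelling (`license`, `pipeline_tag`, `tags`). It is a rough text search, not a YAML parser, so a line starting with # inside a code example can be mistaken for a heading.

```python
import re

REQUIRED_KEYS = ("language", "license")
# Hypothetical heading names, matched case-insensitively as substrings
REQUIRED_SECTIONS = (
    "intended use", "limitations", "training data",
    "evaluation", "how to use", "citation",
)


def frontmatter_keys(card: str) -> set[str]:
    """Return the top-level keys of the YAML block at the top of the card."""
    match = re.match(r"---\n(.*?)\n---", card, re.DOTALL)
    if match is None:
        return set()
    keys = set()
    for line in match.group(1).splitlines():
        # Top-level keys start in column 0 and are followed by a colon
        found = re.match(r"([A-Za-z_][\w-]*):", line)
        if found:
            keys.add(found.group(1))
    return keys


def missing_items(card: str) -> list[str]:
    """List the checklist items that could not be found in the card text."""
    missing = []
    keys = frontmatter_keys(card)
    for key in REQUIRED_KEYS:
        if key not in keys:
            missing.append(f"frontmatter: {key}")
    if not keys & {"pipeline_tag", "tags"}:
        missing.append("frontmatter: pipeline_tag or tags")
    headings = [h.lower() for h in re.findall(r"^#+[ \t]+(.+)$", card, re.MULTILINE)]
    for section in REQUIRED_SECTIONS:
        if not any(section in heading for heading in headings):
            missing.append(f"section: {section}")
    return missing
```

An empty result means the required pieces are present, not that they are well written; the content of each section still needs a human read.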
What you'll do: Make changes to the aimbaza.org Django site — add an event, add an opportunity, fix a broken link, or write a blog post. The site is open-source and accepts pull requests from any contributor.
Skills needed: Python basics and git familiarity. For content-only changes (events, opportunities), you only need to edit a YAML or JSON file — no Django knowledge required.
Good first tasks
- Add an upcoming community event to the events calendar
- Add a grant, fellowship, or computing opportunity to the Opportunities page
- Fix a broken link or typo anywhere on the site
- Write a short blog post about your NLP work or research
Step 1 — set up the site locally
git clone https://github.com/MBAZA-NLP/aimbaza.org.git
cd aimbaza.org
python -m venv .venv
# Mac / Linux:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate
pip install -r requirements/development.txt
cp .env.example .env
python manage.py migrate
python manage.py loaddata apps/events/fixtures/initial_events.json \
apps/blog/fixtures/initial_posts.json
python manage.py runserver
The site will be available at http://127.0.0.1:8000.
Step 2 — make your change
Adding an event: Open apps/events/fixtures/initial_events.json
and add a new entry following the format of the existing ones. Run
python manage.py loaddata apps/events/fixtures/initial_events.json
to reload and check it appears on the Events page.
Adding an opportunity: Open
apps/opportunities/fixtures/initial_opportunities.json
and add your entry. Each opportunity needs a title, description, category
(computing / grant / fellowship), deadline, and link.
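A quick sanity check on the fixture file before running loaddata can catch a duplicate pk or a missing field. The sketch below assumes the standard Django JSON fixture layout (a list of objects with model, pk, and fields) and uses the field names from the description above; the real field names in the repository may differ, and `check_fixture` is a hypothetical helper.

```python
import json

# Field names taken from the description above; the actual model may differ
REQUIRED_FIELDS = ("title", "description", "category", "deadline", "link")
CATEGORIES = {"computing", "grant", "fellowship"}


def check_fixture(text: str) -> list[str]:
    """Return a list of problems found in a JSON fixture of opportunities."""
    problems = []
    entries = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    seen = set()
    for index, entry in enumerate(entries):
        pk = entry.get("pk")
        if pk in seen:
            problems.append(f"entry {index}: duplicate pk {pk}")
        seen.add(pk)
        fields = entry.get("fields", {})
        for name in REQUIRED_FIELDS:
            if not fields.get(name):
                problems.append(f"entry {index}: missing {name}")
        category = fields.get("category")
        if category and category not in CATEGORIES:
            problems.append(f"entry {index}: unknown category {category!r}")
    return problems
```

To use it, read the fixture file as text, pass the contents to `check_fixture`, and fix anything it reports before reloading.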
Writing a blog post: Add a new entry to
apps/blog/fixtures/initial_posts.json with a unique pk,
a slug, title, excerpt, body, author, and published date. Reload with
python manage.py loaddata apps/blog/fixtures/initial_posts.json.
Step 3 — run the checks
black . --check
flake8 .
python manage.py test
All three must pass before you open a pull request.
Step 4 — open a pull request
git checkout -b feat/add-your-change-description
git add .
git commit -m "feat: add [what you added]"
git push origin feat/add-your-change-description
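Open a PR against the develop branch (not main). If you cloned MBAZA-NLP/aimbaza.org directly and do not have write access, fork the repository on GitHub first and push your branch to the fork.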
A maintainer will review within 5 working days. See
CONTRIBUTING.md for the full workflow.
What you'll do: Set up the tools every contributor needs and make a small, real contribution to a MbazaNLP repository. By the end you will have a GitHub account, a HuggingFace account, and your first pull request open for review.
Skills needed: None. Bring a laptop and 90 minutes.
Step 1 — create accounts
- GitHub: go to github.com/join and create a free account. GitHub is where MbazaNLP source code and documentation live.
- HuggingFace: go to huggingface.co/join and create a free account. HuggingFace is where MbazaNLP publishes its models and datasets.
- Slack: join the community workspace at aimbaza.slack.com. Introduce yourself in #introductions and join the channels relevant to what you want to work on.
Step 2 — install Git
Git is the version control tool used by every contributor.
- Windows: download from git-scm.com and run the installer.
- Mac: run xcode-select --install in Terminal.
- Linux: run sudo apt install git (Ubuntu/Debian) or sudo dnf install git (Fedora).
Confirm it is installed:
git --version
Step 3 — configure Git with your name
git config --global user.name "Your Name"
git config --global user.email "your@email.com"
Step 4 — fork and clone a repository
Go to a MbazaNLP repository on GitHub (for example, MBAZA-NLP/aimbaza.org). Click the Fork button in the top right corner — this creates your own copy of the repository under your account. Then clone it to your machine:
git clone https://github.com/YOUR-USERNAME/aimbaza.org.git
cd aimbaza.org
Step 5 — create a branch and make a change
# Always create a branch — never commit directly to main
git checkout -b docs/fix-my-first-typo
Open any file and make a small, real change: fix a spelling error, improve a sentence in the README, or add a missing full stop. Save the file.
Step 6 — commit and push
git add .
git commit -m "docs: fix typo in README"
git push origin docs/fix-my-first-typo
Step 7 — open a pull request
Go to your forked repository on GitHub. You will see a banner saying your branch is ahead of the original. Click Compare & pull request. Write a short description of what you changed and why, then click Create pull request.
A maintainer will review your PR and either merge it or leave feedback. Post the PR link in #github on Slack if you would like someone to take a look sooner.
That is it. You are now a contributor to an open-source African language AI project.
Pull Request Workflow
- Fork the relevant repository on GitHub.
- Create a feature branch: git checkout -b feat/your-feature-name
- Make your changes and write or update tests where applicable.
- Run linting before pushing: black . && flake8 .
- Open a pull request against main (or develop for the website repo).
- A maintainer will review within 5 working days. Address any feedback and the PR will be merged.
Read the full CONTRIBUTING.md for code style, commit message conventions, and branching standards.
Engineering Standards
All MbazaNLP code and model contributions follow our community engineering standards: