Contribute to MbazaNLP

MbazaNLP is an open-source community. Every contribution — code, data, documentation, or community support — moves Kinyarwanda NLP forward.

Contribute Data

High-quality labelled data is the foundation of our models. You can contribute by:

Translating or validating sentence pairs
Recording or transcribing speech
Donating text corpora for training

Browse data repos →

Contribute Code

Help improve our models, tools, and infrastructure:

Fix bugs or open issues on GitHub
Improve model training scripts
Add demos, APIs, or integrations

Browse code repos →

Improve Documentation

Good documentation makes models accessible to more people:

Update model and dataset cards on HuggingFace
Write tutorials and usage examples
Translate documentation into Kinyarwanda

Browse HuggingFace org →

Join the Community

The community is where collaboration starts:

Answer questions in #help on Slack
Share opportunities in #opportunities
Participate in community sprints and events

Join Slack →

Contribution guides

Pick a task below. Each guide has everything you need — what to install, what to do, and how to submit your work. No prior experience required for most tasks.

Evaluate a model: test how well our models handle real Kinyarwanda

What you'll do: Run one of the MbazaNLP models against real inputs, score the outputs, and report what you find. Results go into the model card on HuggingFace and help the community understand where each model performs well and where it falls short.

Skills needed: Kinyarwanda fluency. A laptop. No machine learning background required.

Models available for evaluation

Model	Task	HuggingFace ID
Nllb_finetuned_general_en_kin	English → Kinyarwanda (general)	mbazaNLP/Nllb_finetuned_general_en_kin
Nllb_finetuned_education_en_kin	Education-domain translation	mbazaNLP/Nllb_finetuned_education_en_kin
Nllb_finetuned_tourism_en_kin	Tourism-domain translation	mbazaNLP/Nllb_finetuned_tourism_en_kin
Whisper-Small-Kinyarwanda	Speech recognition (ASR)	mbazaNLP/Whisper-Small-Kinyarwanda
kinyarwanda-tts-model	Text-to-speech	mbazaNLP/kinyarwanda-tts-model

Step 1 — install dependencies

pip install transformers torch soundfile
# For ASR and TTS also install:
pip install datasets torchaudio

Step 2 — load and run a model

Translation example:

from transformers import pipeline

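# Load the general-domain English-to-Kinyarwanda model from the HuggingFace Hub.
translator = pipeline(
    "translation",
    model="mbazaNLP/Nllb_finetuned_general_en_kin"
)
# NLLB models identify languages by code: eng_Latn (English), kin_Latn (Kinyarwanda).
result = translator(
    "The student reads the book carefully.",
    src_lang="eng_Latn",
    tgt_lang="kin_Latn"
)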
print(result[0]["translation_text"])
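
To cover a whole test set rather than one sentence, the same call can be wrapped in a loop. The following is a minimal sketch, assuming `translate` is the pipeline object from the example above (or any callable with the same return shape). The function name `collect_outputs` and the row keys are illustrative and not part of any MbazaNLP tooling.

```python
from typing import Callable, Dict, List


def collect_outputs(
    translate: Callable[..., List[Dict[str, str]]],
    sentences: List[str],
    model_name: str,
) -> List[Dict[str, str]]:
    """Run each source sentence through the translator, keeping input and output together."""
    rows = []
    for sentence in sentences:
        text = sentence.strip()
        if not text:
            continue  # skip blank entries in the input list
        # Same call shape as the single-sentence example above.
        result = translate(text, src_lang="eng_Latn", tgt_lang="kin_Latn")
        rows.append(
            {
                "model": model_name,
                "input": text,
                "output": result[0]["translation_text"],
            }
        )
    return rows
```

Calling `collect_outputs(translator, my_sentences, "mbazaNLP/Nllb_finetuned_general_en_kin")` would return one row per non-empty sentence, ready to be scored in Step 3.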

Step 3 — score each output

Run the model on a set of 20 test inputs. For each input, rate the output on three dimensions using a 1–5 scale:

Fluency — does the output read like natural Kinyarwanda?
Adequacy — does it convey the same meaning as the source?
Domain fit — does the vocabulary match the expected domain (education, tourism, etc.)?

Flag any clear errors: mistranslations, hallucinated content, or unintelligible audio output.
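
If you keep your scores in a script rather than on paper, a small record type can catch out-of-range values early. This is only a sketch: the `Rating` class, its field names, and `summarise` are illustrative names chosen to mirror the three dimensions above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

DIMENSIONS = ("fluency", "adequacy", "domain_fit")


@dataclass
class Rating:
    """One human judgement of a single model output (names are illustrative)."""

    fluency: int
    adequacy: int
    domain_fit: int
    error_note: str = ""

    def __post_init__(self) -> None:
        # Reject anything outside the 1-5 scale used in this guide.
        for name in DIMENSIONS:
            value = getattr(self, name)
            if not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(f"{name} must be an integer from 1 to 5, got {value!r}")


def summarise(ratings: List[Rating]) -> Dict[str, float]:
    """Return the mean score per dimension across all rated outputs."""
    if not ratings:
        raise ValueError("no ratings to summarise")
    return {name: mean(getattr(r, name) for r in ratings) for name in DIMENSIONS}
```

The per-dimension means are a convenience for your own notes; the individual scores and error notes are still what gets submitted.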

Step 4 — submit your results

Copy your scores into the community evaluation spreadsheet (link shared in #models on Slack). Include: the model name, your input sentences, the outputs, your scores, and any error notes. A maintainer will aggregate the results and update the model card on HuggingFace.
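
One way to prepare rows for pasting is to write them out as CSV text first. The sketch below assumes a column layout based on the fields listed above; the column names are hypothetical and should be matched to the actual spreadsheet.

```python
import csv
import io
from typing import Dict, List

# Column names are illustrative; match them to the community spreadsheet before pasting.
COLUMNS = ["model", "input", "output", "fluency", "adequacy", "domain_fit", "error_note"]


def rows_to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialise scored rows as CSV text, failing on rows with missing fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for index, row in enumerate(rows, start=1):
        missing = [column for column in COLUMNS if column not in row]
        if missing:
            raise ValueError(f"row {index} is missing: {', '.join(missing)}")
        writer.writerow({column: row[column] for column in COLUMNS})
    return buffer.getvalue()
```

Failing on a missing field is a deliberate choice here: an incomplete row is easier to fix before submission than after a maintainer has aggregated it.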

No laptop? TTS outputs can be rated by ear using the offline evaluation pack — ask in #help on Slack.

Add parallel sentence pairs: grow the training data for translation models

What you'll do: Translate English sentences into Kinyarwanda (or review existing translations for accuracy) and submit them as new rows in the NMT Education parallel dataset. The dataset currently has around 10,000 sentence pairs and is the primary training data for the education-domain translation model.

Skills needed: Bilingual fluency in Kinyarwanda and English or French. No coding required.

What you can contribute

Type	Description
New pairs	Translate English education sentences into Kinyarwanda
Review	Check existing pairs for accuracy and flag errors
Domain expansion	Contribute pairs from a new domain: health, agriculture, or legal

Step 1 — find source sentences

Use any of the following as source material:

The curated list of English education sentences shared in #data on Slack
Primary school textbooks, government education documents, or curriculum materials in PDF or Word format (licensed for reuse)
Open educational resources published under CC-BY

Step 2 — format your pairs

Each contribution is a simple two-line pair. Use plain text — no machine translation:

en: The teacher writes on the blackboard.
kin: Umwarimu yandika ku kibaho.

en: Students should study every day.
kin: Abanyeshuri bagomba kwiga buri munsi.
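
If you draft your pairs in a text file, a short script can confirm that the file follows the two-line format before you copy it anywhere. This is a sketch under the assumption that pairs are separated by blank lines, as in the example above; `parse_pairs` is an illustrative name.

```python
from typing import List, Tuple


def parse_pairs(text: str) -> List[Tuple[str, str]]:
    """Parse 'en:' / 'kin:' line pairs into (en, kin) tuples, ignoring blank lines."""
    pairs = []
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("every 'en:' line needs a matching 'kin:' line")
    for en_line, kin_line in zip(lines[0::2], lines[1::2]):
        if not en_line.startswith("en:") or not kin_line.startswith("kin:"):
            raise ValueError(f"expected an en:/kin: pair, got {en_line!r} and {kin_line!r}")
        en = en_line[len("en:"):].strip()
        kin = kin_line[len("kin:"):].strip()
        if not en or not kin:
            raise ValueError(f"empty side in pair starting with {en_line!r}")
        pairs.append((en, kin))
    return pairs
```

The function raises on the first malformed pair rather than skipping it, so a misplaced line does not silently shift every pair after it.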

Step 3 — quality check before submitting

Both sides must be complete sentences, not fragments.
The Kinyarwanda translation must be grammatically correct — have at least one other fluent speaker review it before submitting.
Do not use machine-generated translations. The dataset is human reference data.
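
A few of these checks can be approximated mechanically before a human review. The sketch below is a heuristic only: it cannot judge grammar or detect machine translation, and the rules (terminal punctuation, identical sides, duplicates) are assumptions about what a fragment or copy-paste slip usually looks like.

```python
from typing import List, Tuple

TERMINATORS = (".", "?", "!")


def precheck(pairs: List[Tuple[str, str]]) -> List[str]:
    """Return human-readable warnings; an empty list means no obvious problems."""
    warnings = []
    seen = set()
    for number, (en, kin) in enumerate(pairs, start=1):
        for label, side in (("en", en), ("kin", kin)):
            # Heuristic: a full sentence normally ends with terminal punctuation.
            if not side.strip().endswith(TERMINATORS):
                warnings.append(f"pair {number}: {label} side may be a fragment")
        if en.strip().lower() == kin.strip().lower():
            warnings.append(f"pair {number}: both sides are identical")
        if en in seen:
            warnings.append(f"pair {number}: duplicate English sentence")
        seen.add(en)
    return warnings
```

A clean result from `precheck` does not replace the review by a second fluent speaker; it only removes the mistakes a script can see.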

Step 4 — submit

Add your pairs to the community contribution sheet (link in #data on Slack). Pairs are batched by a maintainer into a pull request against the dataset at the end of each month. You will be credited in the dataset changelog.

Review or write a model card: make models discoverable and trustworthy

What you'll do: Write or improve the documentation card for a MbazaNLP model on HuggingFace. A complete model card tells users what the model does, what it should not be used for, how it was trained, and how well it performs. These details matter for responsible AI use and make the model citable in academic work.

Skills needed: Clear technical writing. Familiarity with the model's task is helpful but not required — you can write documentation from the training repository and paper alone.

What a complete model card includes

Section	What to write	Common gap?
YAML frontmatter	Language tags (e.g. `language: rw`), licence (the metadata key is spelled `license`), task tags	Yes: models without tags do not show up under HuggingFace's language and task filters
Model description	What it is, architecture, base model, training data summary	Sometimes
Intended Use	Primary use case + what it should NOT be used for	Yes — was missing from 10 of 11 MbazaNLP models before May 2026
Limitations & Bias	Known failure modes, language variety coverage, demographic gaps	Yes
Evaluation results	At least one metric (BLEU, WER, MOS) on a named test set	Yes
How to use	Working code example using the current Transformers API	Sometimes
Citation	BibTeX block so researchers can cite the model	Yes — was missing from all 13 models
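
The evaluation-results row names WER (word error rate) as the metric for speech recognition. As a rough sketch of what that number means, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The version below only lowercases and splits on whitespace; evaluation libraries normalise text in other ways, so their figures may differ.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j] is the edit distance between the ref words seen so far and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion (reference word missing)
                    current[j - 1] + 1,      # insertion (extra hypothesis word)
                    previous[j - 1] + cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref)
```

When reporting a WER in a model card, name the test set and describe any text normalisation applied, since both change the result.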

Step 1 — pick a model

Browse the mbazaNLP organisation on HuggingFace and find a model whose card is incomplete. Look for blank sections, missing YAML frontmatter, or no code example. The #models channel on Slack has a list of current gaps.

Step 2 — use the community template

Fork the mbazaNLP/model-card-template repository on HuggingFace as your starting point. Fill in each section for the model you have chosen. Check the community model card standards for the required fields and format.

Step 3 — test the code example

Before submitting, run the code example in the card and confirm it works with the current version of the Transformers library. Outdated examples are one of the most common issues found in MbazaNLP model cards.

pip install transformers --upgrade
# Then paste and run the code example from your draft card
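
Instead of copying the example by hand, you can pull the fenced code blocks out of your draft README.md and run them one by one. This is a minimal sketch that assumes the card marks its examples with standard Markdown fences tagged `python`; `extract_code_blocks` is an illustrative helper, not an existing tool.

```python
from typing import List

FENCE = "`" * 3  # a Markdown code fence, built this way so the snippet is easy to embed


def extract_code_blocks(markdown: str, language: str = "python") -> List[str]:
    """Return the contents of every fenced block tagged with the given language."""
    blocks: List[str] = []
    current: List[str] = []
    inside = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not inside and stripped == FENCE + language:
            inside, current = True, []
        elif inside and stripped == FENCE:
            blocks.append("\n".join(current))
            inside = False
        elif inside:
            current.append(line)
    return blocks
```

Each returned string can be saved to a file and run in a fresh virtual environment, which is closer to what a new user of the model will experience.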

Step 4 — open a pull request

On HuggingFace, navigate to the model repository and open a community PR with your updated README.md. Title it: docs: improve model card — add [sections you added]. Post the PR link in #models on Slack so a maintainer can review it.

Checklist before submitting

[ ] YAML frontmatter: language, licence, task tags present
[ ] Model description: architecture and base model named
[ ] Intended Use: primary use + out-of-scope uses (2 paragraphs)
[ ] Limitations and Bias: at least one paragraph
[ ] Training data: dataset name, size, and source
[ ] Evaluation results: at least one metric on a named test set
[ ] How to use: code example tested and working on current Transformers
[ ] Citation: BibTeX block present
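
Parts of this checklist can be approximated with a script before you ask for review. The sketch below assumes the card is a README.md with YAML frontmatter between `---` markers. `language`, `license`, `pipeline_tag`, and `tags` are real HuggingFace metadata keys, but the section titles it searches for are assumed wording, so adjust them to the community template.

```python
from typing import List, Tuple

REQUIRED_KEYS = ("language", "license")
TASK_KEYS = ("pipeline_tag", "tags")  # either one counts as a task tag here
# Assumed heading wording; adjust to the community template.
REQUIRED_SECTIONS = ("Intended Use", "Limitations", "Evaluation", "How to use", "Citation")


def split_front_matter(readme: str) -> Tuple[List[str], List[str]]:
    """Split a README into (frontmatter lines, body lines)."""
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], lines
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return lines[1:index], lines[index + 1:]
    return [], lines  # opening marker without a closing one


def check_card(readme: str) -> List[str]:
    """Return a list of problems found in a model card; empty means none found."""
    problems = []
    front_matter, body = split_front_matter(readme)
    if not front_matter:
        problems.append("missing or empty YAML frontmatter")
    else:
        for key in REQUIRED_KEYS:
            if not any(line.startswith(key + ":") for line in front_matter):
                problems.append(f"frontmatter has no '{key}' key")
        if not any(line.startswith(k + ":") for line in front_matter for k in TASK_KEYS):
            problems.append("frontmatter has no task tag")
    headings = [line.lstrip("#").strip().lower() for line in body if line.startswith("#")]
    for section in REQUIRED_SECTIONS:
        if not any(section.lower() in heading for heading in headings):
            problems.append(f"no heading containing '{section}'")
    return problems
```

The script only checks that sections exist. Whether the Intended Use text is accurate or the code example runs still needs a human reader.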

Contribute to the website: add content or fix issues on aimbaza.org

What you'll do: Make changes to the aimbaza.org Django site: add an event, add an opportunity, fix a broken link, or write a blog post. The site is open-source and accepts pull requests from any contributor.

Skills needed: Python basics and git familiarity. For content-only changes (events, opportunities), you only need to edit a YAML or JSON file — no Django knowledge required.

Good first tasks

Add an upcoming community event to the events calendar
Add a grant, fellowship, or computing opportunity to the Opportunities page
Fix a broken link or typo anywhere on the site
Write a short blog post about your NLP work or research

Step 1 — set up the site locally

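# Fork MBAZA-NLP/aimbaza.org on GitHub first, then clone your fork
# (replace YOUR-USERNAME) so that you can push your branch in Step 4:
git clone https://github.com/YOUR-USERNAME/aimbaza.org.git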
cd aimbaza.org

python -m venv .venv
# Mac / Linux:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate

pip install -r requirements/development.txt
cp .env.example .env
python manage.py migrate
python manage.py loaddata apps/events/fixtures/initial_events.json \
                          apps/blog/fixtures/initial_posts.json
python manage.py runserver

The site will be available at http://127.0.0.1:8000.

Step 2 — make your change

Adding an event: Open apps/events/fixtures/initial_events.json and add a new entry following the format of the existing ones. Run python manage.py loaddata apps/events/fixtures/initial_events.json to reload and check it appears on the Events page.

Adding an opportunity: Open apps/opportunities/fixtures/initial_opportunities.json and add your entry. Each opportunity needs a title, description, category (computing / grant / fellowship), deadline, and link.

Writing a blog post: Add a new entry to apps/blog/fixtures/initial_posts.json with a unique pk, a slug, title, excerpt, body, author, and published date. Reload with python manage.py loaddata apps/blog/fixtures/initial_posts.json.
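
Django fixtures are JSON lists of objects with `model`, `pk`, and `fields` keys, so a new entry can also be built with a short script instead of being typed by hand. The sketch below makes two assumptions that must be checked against the existing entries in the file: the model label `blog.post` and the field name `published` are hypothetical.

```python
import json
from typing import Any, Dict, List


def next_pk(entries: List[Dict[str, Any]]) -> int:
    """Pick a primary key one higher than any already used in the fixture."""
    return max((entry["pk"] for entry in entries), default=0) + 1


def add_post(
    entries: List[Dict[str, Any]],
    *,
    slug: str,
    title: str,
    excerpt: str,
    body: str,
    author: str,
    published: str,
    model: str = "blog.post",  # assumed label; copy the value used by existing entries
) -> List[Dict[str, Any]]:
    """Return a new list with one blog entry appended in Django's fixture layout."""
    entry = {
        "model": model,
        "pk": next_pk(entries),
        "fields": {
            "slug": slug,
            "title": title,
            "excerpt": excerpt,
            "body": body,
            "author": author,
            "published": published,  # assumed field name for the published date
        },
    }
    return entries + [entry]


def dump_fixture(entries: List[Dict[str, Any]]) -> str:
    """Serialise the fixture as indented JSON, keeping non-ASCII text readable."""
    return json.dumps(entries, indent=2, ensure_ascii=False) + "\n"
```

After writing the result back to the fixture file, reload it with the `loaddata` command above and check the post in the browser before committing.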

Step 3 — run the checks

black . --check
flake8 .
python manage.py test

All three must pass before you open a pull request.
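
If you prefer to see every failure in one go rather than stopping at the first, the three commands can be driven from a small script. This is a sketch, not part of the repository: `run_checks` and `all_passed` are illustrative names, and the script assumes it is started from the repository root inside the activated virtual environment.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# The three checks from this step, in order. sys.executable is the current
# interpreter, which matches `python` inside the activated virtual environment.
CHECKS: List[Sequence[str]] = [
    ("black", ".", "--check"),
    ("flake8", "."),
    (sys.executable, "manage.py", "test"),
]


def run_checks(checks: List[Sequence[str]]) -> List[Tuple[str, int]]:
    """Run every command even if an earlier one fails; return (command, exit code) pairs."""
    results = []
    for command in checks:
        try:
            code = subprocess.run(list(command)).returncode
        except FileNotFoundError:
            code = 127  # conventional "command not found" exit code
        results.append((" ".join(command), code))
    return results


def all_passed(results: List[Tuple[str, int]]) -> bool:
    """True only when every command exited with code 0."""
    return all(code == 0 for _, code in results)
```

A typical use would be `sys.exit(0 if all_passed(run_checks(CHECKS)) else 1)` at the bottom of the script, so the overall exit code reflects whether all three checks passed.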

Step 4 — open a pull request

git checkout -b feat/add-your-change-description
git add .
git commit -m "feat: add [what you added]"
git push origin feat/add-your-change-description

Open a PR against the develop branch (not main). A maintainer will review within 5 working days. See CONTRIBUTING.md for the full workflow.

Make your first pull request: complete beginner walkthrough

What you'll do: Set up the tools every contributor needs and make a small, real contribution to a MbazaNLP repository. By the end you will have a GitHub account, a HuggingFace account, and at least one merged pull request.

Skills needed: None. Bring a laptop and 90 minutes.

Step 1 — create accounts

GitHub: go to github.com/join and create a free account. GitHub is where MbazaNLP source code and documentation live.
HuggingFace: go to huggingface.co/join and create a free account. HuggingFace is where MbazaNLP publishes its models and datasets.
Slack: join the community workspace at aimbaza.slack.com. Introduce yourself in #introductions and join the channels relevant to what you want to work on.

Step 2 — install Git

Git is the version control tool used by every contributor.

Windows: download from git-scm.com and run the installer.
Mac: run xcode-select --install in Terminal.
Linux: run sudo apt install git (Ubuntu/Debian) or sudo dnf install git (Fedora).

Confirm it is installed:

git --version

Step 3 — configure Git with your name

git config --global user.name "Your Name"
git config --global user.email "your@email.com"

Step 4 — fork and clone a repository

Go to a MbazaNLP repository on GitHub (for example, MBAZA-NLP/aimbaza.org). Click the Fork button in the top right corner — this creates your own copy of the repository under your account. Then clone it to your machine:

git clone https://github.com/YOUR-USERNAME/aimbaza.org.git
cd aimbaza.org

Step 5 — create a branch and make a change

# Always create a branch — never commit directly to main
git checkout -b docs/fix-my-first-typo

Open any file and make a small, real change: fix a spelling error, improve a sentence in the README, or add a missing full stop. Save the file.

Step 6 — commit and push

git add .
git commit -m "docs: fix typo in README"
git push origin docs/fix-my-first-typo

Step 7 — open a pull request

Go to your forked repository on GitHub. You will see a banner saying your branch is ahead of the original. Click Compare & pull request. Write a short description of what you changed and why, then click Create pull request.

A maintainer will review your PR and either merge it or leave feedback. Post the PR link in #github on Slack if you would like someone to take a look sooner.

That is it. You are now a contributor to an open-source African language AI project.

Pull Request Workflow

Fork the relevant repository on GitHub.
Create a feature branch: git checkout -b feat/your-feature-name
Make your changes and write or update tests where applicable.
Run linting before pushing: black . && flake8 .
Open a pull request against main (or develop for the website repo).
A maintainer will review within 5 working days. Address any feedback and the PR will be merged.

Read the full CONTRIBUTING.md for code style, commit message conventions, and branching standards.

Engineering Standards

All MbazaNLP code and model contributions follow our community engineering standards:

Python Style

Black + Flake8, PEP 8

Git Workflow

feat/fix/docs branch prefixes, descriptive commits

Model Cards

All HuggingFace releases require a complete model card
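
The Git workflow standard above can be checked mechanically before you push. The patterns below are inferred from the examples in this guide (`docs/fix-my-first-typo`, `feat: add ...`) and are assumptions; the authoritative rules are in CONTRIBUTING.md.

```python
import re

PREFIXES = ("feat", "fix", "docs")

# e.g. "docs/fix-my-first-typo": prefix, slash, lowercase words joined by hyphens
BRANCH_RE = re.compile(r"^(?:%s)/[a-z0-9]+(?:-[a-z0-9]+)*$" % "|".join(PREFIXES))
# e.g. "docs: fix typo in README": prefix, colon, space, non-empty summary
COMMIT_RE = re.compile(r"^(?:%s): \S.*$" % "|".join(PREFIXES))


def valid_branch(name: str) -> bool:
    """Check a branch name against the assumed prefix/slug pattern."""
    return BRANCH_RE.match(name) is not None


def valid_commit_subject(subject: str) -> bool:
    """Check the first line of a commit message against the assumed pattern."""
    return COMMIT_RE.match(subject) is not None
```

For example, `valid_branch("docs/fix-my-first-typo")` is true, while `valid_branch("main")` and `valid_commit_subject("fixed stuff")` are both false.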
