Contribute to MbazaNLP

MbazaNLP is an open-source community. Every contribution — code, data, documentation, or community support — moves Kinyarwanda NLP forward.

Contribute Data

High-quality labelled data is the foundation of our models. You can contribute by:

  • Translating or validating sentence pairs
  • Recording or transcribing speech
  • Donating text corpora for training
Browse data repos →
Contribute Code

Help improve our models, tools, and infrastructure:

  • Fix bugs or open issues on GitHub
  • Improve model training scripts
  • Add demos, APIs, or integrations
Browse code repos →
Improve Documentation

Good documentation makes models accessible to more people:

  • Update model and dataset cards on HuggingFace
  • Write tutorials and usage examples
  • Translate documentation into Kinyarwanda
Browse HuggingFace org →
Join the Community

The community is where collaboration starts:

  • Answer questions in #help on Slack
  • Share opportunities in #opportunities
  • Participate in community sprints and events
Join Slack →

Contribution guides

Pick a task below. Each guide has everything you need — what to install, what to do, and how to submit your work. No prior experience required for most tasks.

What you'll do: Run one of the MbazaNLP models against real inputs, score the outputs, and report what you find. Results go into the model card on HuggingFace and help the community understand where each model performs well and where it falls short.

Skills needed: Kinyarwanda fluency. A laptop. No machine learning background required.

Models available for evaluation
Model | Task | HuggingFace ID
Nllb_finetuned_general_en_kin | English → Kinyarwanda (general) | mbazaNLP/Nllb_finetuned_general_en_kin
Nllb_finetuned_education_en_kin | Education-domain translation | mbazaNLP/Nllb_finetuned_education_en_kin
Nllb_finetuned_tourism_en_kin | Tourism-domain translation | mbazaNLP/Nllb_finetuned_tourism_en_kin
Whisper-Small-Kinyarwanda | Speech recognition (ASR) | mbazaNLP/Whisper-Small-Kinyarwanda
kinyarwanda-tts-model | Text-to-speech | mbazaNLP/kinyarwanda-tts-model
Step 1 — install dependencies
pip install transformers torch soundfile
# For ASR and TTS also install:
pip install datasets torchaudio
Step 2 — load and run a model

Translation example:

from transformers import pipeline

translator = pipeline(
    "translation",
    model="mbazaNLP/Nllb_finetuned_general_en_kin"
)
result = translator(
    "The student reads the book carefully.",
    src_lang="eng_Latn",
    tgt_lang="kin_Latn"
)
print(result[0]["translation_text"])
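To run the same test sentences through the education or tourism model, one option is to look up the model ID by domain. This is a minimal sketch, assuming the domain models accept the same language codes as the general model. The helper names model_for and translate_batch are illustrative and not part of any MbazaNLP package.

```python
# Model IDs copied from the table above; the dictionary keys are our own labels.
TRANSLATION_MODELS = {
    "general": "mbazaNLP/Nllb_finetuned_general_en_kin",
    "education": "mbazaNLP/Nllb_finetuned_education_en_kin",
    "tourism": "mbazaNLP/Nllb_finetuned_tourism_en_kin",
}


def model_for(domain):
    """Return the HuggingFace ID for a domain, or raise a readable error."""
    try:
        return TRANSLATION_MODELS[domain]
    except KeyError:
        known = ", ".join(sorted(TRANSLATION_MODELS))
        raise ValueError(f"Unknown domain {domain!r}; choose one of: {known}") from None


def translate_batch(sentences, domain="general"):
    """Translate a list of English sentences with the model for `domain`."""
    # Imported here so the lookup above works even before transformers is installed.
    from transformers import pipeline

    translator = pipeline("translation", model=model_for(domain))
    # Assumption: the domain models use the same language codes as the general one.
    results = translator(list(sentences), src_lang="eng_Latn", tgt_lang="kin_Latn")
    return [item["translation_text"] for item in results]
```

Calling translate_batch(["The student reads the book carefully."], domain="education") downloads the model on first use, so expect the first call to take a while.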
Step 3 — score each output

Prepare 20 test inputs for the model you are evaluating. For each one, rate the output on three dimensions using a 1–5 scale:

  • Fluency — does the output read like natural Kinyarwanda?
  • Adequacy — does it convey the same meaning as the source?
  • Domain fit — does the vocabulary match the expected domain (education, tourism, etc.)?

Flag any clear errors: mistranslations, hallucinated content, or unintelligible audio output.
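If you keep your ratings in a script rather than on paper, a short helper can catch out-of-range scores and give you a per-dimension average. This is a rough sketch: the dictionary keys (fluency, adequacy, domain_fit, flag) are our own naming, and the two ratings are made up for illustration.

```python
from statistics import mean

DIMENSIONS = ("fluency", "adequacy", "domain_fit")


def validate_rating(rating):
    """Check that every dimension is present and scored as an integer from 1 to 5."""
    for dim in DIMENSIONS:
        score = rating.get(dim)
        if not isinstance(score, int) or not 1 <= score <= 5:
            raise ValueError(f"{dim} must be an integer from 1 to 5, got {score!r}")


def summarise(ratings):
    """Return the mean score per dimension across all rated outputs."""
    for rating in ratings:
        validate_rating(rating)
    return {dim: round(mean(r[dim] for r in ratings), 2) for dim in DIMENSIONS}


# Two made-up ratings, for illustration only.
ratings = [
    {"fluency": 4, "adequacy": 5, "domain_fit": 3, "flag": ""},
    {"fluency": 2, "adequacy": 3, "domain_fit": 4, "flag": "mistranslation"},
]
print(summarise(ratings))
```

The averages are only a convenience for your own notes; the individual scores and error flags are what the maintainers aggregate.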

Step 4 — submit your results

Copy your scores into the community evaluation spreadsheet (link shared in #models on Slack). Include: the model name, your input sentences, the outputs, your scores, and any error notes. A maintainer will aggregate the results and update the model card on HuggingFace.
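One way to get results into a spreadsheet without retyping them is to export them as CSV first. The sketch below assumes column names of our own choosing; check them against the headers of the community spreadsheet before pasting. The output value is a placeholder, not a real model output.

```python
import csv
import io

# Column names are an assumption; rename them to match the spreadsheet's headers.
COLUMNS = ["model", "input", "output", "fluency", "adequacy", "domain_fit", "error_notes"]


def results_to_csv(rows):
    """Serialise rated outputs to CSV text, one row per test input."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {
        "model": "mbazaNLP/Nllb_finetuned_general_en_kin",
        "input": "The student reads the book carefully.",
        "output": "<model output goes here>",
        "fluency": 4,
        "adequacy": 5,
        "domain_fit": 3,
        "error_notes": "",
    }
]
print(results_to_csv(rows))
```

Most spreadsheet tools can import CSV directly, and the csv module handles the quoting of commas and quotation marks inside sentences.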

No laptop? TTS outputs can be rated by ear using the offline evaluation pack — ask in #help on Slack.

What you'll do: Translate English sentences into Kinyarwanda (or review existing translations for accuracy) and submit them as new rows in the NMT Education parallel dataset. The dataset currently has around 10,000 sentence pairs and is the primary training data for the education-domain translation model.

Skills needed: Bilingual fluency in Kinyarwanda and English or French. No coding required.

What you can contribute
Type | Description
New pairs | Translate English education sentences into Kinyarwanda
Review | Check existing pairs for accuracy and flag errors
Domain expansion | Contribute pairs from a new domain: health, agriculture, or legal
Step 1 — find source sentences

Use any of the following as source material:

  • The curated list of English education sentences shared in #data on Slack
  • Primary school textbooks, government education documents, or curriculum materials in PDF or Word format (licensed for reuse)
  • Open educational resources published under CC-BY
Step 2 — format your pairs

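Each contribution is a two-line pair in plain text: the English sentence on a line starting with en:, followed by your own Kinyarwanda translation on a line starting with kin:. Translate by hand; do not paste machine translation output.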

en: The teacher writes on the blackboard.
kin: Umwarimu yandika ku kibaho.

en: Students should study every day.
kin: Abanyeshuri bagomba kwiga buri munsi.
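Before submitting a large batch, it can help to confirm that every en: line has a matching kin: line. The parser below is a minimal sketch of the two-line format shown above; the function name parse_pairs and the error messages are our own, and the maintainers' tooling may differ.

```python
def parse_pairs(text):
    """Parse 'en:' / 'kin:' lines into a list of (english, kinyarwanda) tuples."""
    pairs = []
    english = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines separate pairs
        if line.startswith("en:"):
            if english is not None:
                raise ValueError(f"line {number}: previous 'en:' line has no 'kin:' line")
            english = line[len("en:"):].strip()
        elif line.startswith("kin:"):
            if english is None:
                raise ValueError(f"line {number}: 'kin:' line has no 'en:' line before it")
            pairs.append((english, line[len("kin:"):].strip()))
            english = None
        else:
            raise ValueError(f"line {number}: expected an 'en:' or 'kin:' prefix")
    if english is not None:
        raise ValueError("last 'en:' line has no 'kin:' translation")
    return pairs


sample = """\
en: Students should study every day.
kin: Abanyeshuri bagomba kwiga buri munsi.
"""
print(parse_pairs(sample))
```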
Step 3 — quality check before submitting
  • Both sides must be complete sentences, not fragments.
  • The Kinyarwanda translation must be grammatically correct — have at least one other fluent speaker review it before submitting.
  • Do not use machine-generated translations. The dataset is human reference data.
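A script can pre-screen for obvious fragments, though it cannot judge grammar or detect machine translation, so the human review above is still required. This is a heuristic sketch under our own assumptions: a side is flagged if it is empty, lacks closing punctuation, or is identical to the other side.

```python
# Heuristic only: closing punctuation is a weak signal that a line is a full sentence.
SENTENCE_END = (".", "?", "!")


def check_pair(english, kinyarwanda):
    """Return a list of problems found in one pair; an empty list means no automatic flags."""
    problems = []
    for label, text in (("en", english), ("kin", kinyarwanda)):
        stripped = text.strip()
        if not stripped:
            problems.append(f"{label}: empty")
        elif not stripped.endswith(SENTENCE_END):
            problems.append(f"{label}: no closing punctuation, may be a fragment")
    if english.strip() and english.strip() == kinyarwanda.strip():
        problems.append("kin: identical to the English side")
    return problems


print(check_pair("Students should study every day.", "Abanyeshuri bagomba kwiga buri munsi."))
print(check_pair("every day", "buri munsi"))
```

An empty list means only that nothing was flagged automatically, not that the translation is correct.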
Step 4 — submit

Add your pairs to the community contribution sheet (link in #data on Slack). Pairs are batched by a maintainer into a pull request against the dataset at the end of each month. You will be credited in the dataset changelog.

What you'll do: Write or improve the documentation card for a MbazaNLP model on HuggingFace. A complete model card tells users what the model does, what it should not be used for, how it was trained, and how well it performs. These details matter for responsible AI use and make the model citable in academic work.

Skills needed: Clear technical writing. Familiarity with the model's task is helpful but not required — you can write documentation from the training repository and paper alone.

What a complete model card includes
Section | What to write | Common gap?
YAML frontmatter | Language tags (e.g. language: rw), licence, task tags | Yes: missing tags make the model invisible in HuggingFace search
Model description | What it is, architecture, base model, training data summary | Sometimes
Intended Use | Primary use case + what it should NOT be used for | Yes: was missing from 10 of 11 MbazaNLP models before May 2026
Limitations & Bias | Known failure modes, language variety coverage, demographic gaps | Yes
Evaluation results | At least one metric (BLEU, WER, MOS) on a named test set | Yes
How to use | Working code example using the current Transformers API | Sometimes
Citation | BibTeX block so researchers can cite the model | Yes: was missing from all 13 models
Step 1 — pick a model

Browse the mbazaNLP organisation on HuggingFace and find a model whose card is incomplete. Look for blank sections, missing YAML frontmatter, or no code example. The #models channel on Slack has a list of current gaps.

Step 2 — use the community template

Fork the mbazaNLP/model-card-template repository on HuggingFace as your starting point. Fill in each section for the model you have chosen. Check the community model card standards for the required fields and format.

Step 3 — test the code example

Before submitting, run the code example in the card and confirm it works with the current version of the Transformers library. Outdated examples are one of the most common issues found in MbazaNLP model cards.

pip install transformers --upgrade
# Then paste and run the code example from your draft card
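Copying code out of a draft by hand is error-prone, so it can help to extract the example from the card's README.md programmatically. The sketch below assumes the card marks its Python examples with Markdown fences tagged python or py; the function name is our own.

```python
import re

# Three backticks, built by repetition so this example can itself sit inside a fenced block.
FENCE = "`" * 3
PYTHON_BLOCK = re.compile(FENCE + r"(?:python|py)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python_blocks(markdown):
    """Return the body of every fenced Python block in a Markdown string."""
    return [body.strip() for body in PYTHON_BLOCK.findall(markdown)]


# A tiny made-up card, for illustration only.
card = "\n".join(["# Demo card", "", FENCE + "python", "print('hello')", FENCE, ""])
for block in extract_python_blocks(card):
    print(block)
```

To check a real draft, read the file with pathlib.Path("README.md").read_text(encoding="utf-8"), save each extracted block to a .py file, and run it in a fresh virtual environment.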
Step 4 — open a pull request

On HuggingFace, navigate to the model repository and open a community PR with your updated README.md. Title it: docs: improve model card — add [sections you added]. Post the PR link in #models on Slack so a maintainer can review it.

Checklist before submitting
[ ] YAML frontmatter: language, licence, task tags present
[ ] Model description: architecture and base model named
[ ] Intended Use: primary use + out-of-scope uses (2 paragraphs)
[ ] Limitations and Bias: at least one paragraph
[ ] Training data: dataset name, size, and source
[ ] Evaluation results: at least one metric on a named test set
[ ] How to use: code example tested and working on current Transformers
[ ] Citation: BibTeX block present
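
Part of this checklist can be pre-screened with a script before you ask for review. The sketch below makes several assumptions: that the licence and task tag are stored under the HuggingFace metadata keys license and pipeline_tag, and that section headings contain the keywords listed in REQUIRED_HEADINGS. The community template may use different titles, so adjust the lists to match it. The sample card and its licence value are made up.

```python
# Assumed metadata keys and heading keywords; align these with the community template.
REQUIRED_KEYS = ("language", "license", "pipeline_tag")
REQUIRED_HEADINGS = (
    "intended use",
    "limitations",
    "training data",
    "evaluation",
    "how to use",
    "citation",
)


def split_frontmatter(card):
    """Split a card into (frontmatter, body); frontmatter is '' if the card has none."""
    if not card.startswith("---\n"):
        return "", card
    end = card.find("\n---", 3)
    if end == -1:
        return "", card
    return card[4:end], card[end + 4:]


def missing_items(card):
    """List checklist items that look absent from a model card."""
    frontmatter, body = split_frontmatter(card)
    keys = {line.split(":", 1)[0].strip() for line in frontmatter.splitlines() if ":" in line}
    missing = [f"frontmatter key: {key}" for key in REQUIRED_KEYS if key not in keys]
    headings = [
        line.lstrip("#").strip().lower() for line in body.splitlines() if line.startswith("#")
    ]
    for wanted in REQUIRED_HEADINGS:
        if not any(wanted in heading for heading in headings):
            missing.append(f"section: {wanted}")
    return missing


# Made-up card for illustration only.
card = (
    "---\nlanguage: rw\nlicense: apache-2.0\n---\n"
    "# Demo model\n\n## Intended Use\n\n## Citation\n"
)
for item in missing_items(card):
    print(item)
```

A clean result only shows that the headings and keys exist. Whether each section is accurate and complete still needs a human reader.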

What you'll do: Make changes to the aimbaza.org Django site — add an event, add an opportunity, fix a broken link, or write a blog post. The site is open-source and accepts pull requests from any contributor.

Skills needed: Python basics and git familiarity. For content-only changes (events, opportunities), you only need to edit a YAML or JSON file — no Django knowledge required.

Good first tasks
  • Add an upcoming community event to the events calendar
  • Add a grant, fellowship, or computing opportunity to the Opportunities page
  • Fix a broken link or typo anywhere on the site
  • Write a short blog post about your NLP work or research
Step 1 — set up the site locally
git clone https://github.com/MBAZA-NLP/aimbaza.org.git
cd aimbaza.org

python -m venv .venv
# Mac / Linux:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate

pip install -r requirements/development.txt
cp .env.example .env
python manage.py migrate
python manage.py loaddata apps/events/fixtures/initial_events.json \
                          apps/blog/fixtures/initial_posts.json
python manage.py runserver

The site will be available at http://127.0.0.1:8000.

Step 2 — make your change

Adding an event: Open apps/events/fixtures/initial_events.json and add a new entry following the format of the existing ones. Run python manage.py loaddata apps/events/fixtures/initial_events.json to reload and check it appears on the Events page.

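Adding an opportunity: Open apps/opportunities/fixtures/initial_opportunities.json and add your entry. Each opportunity needs a title, description, category (computing / grant / fellowship), deadline, and link. Reload with python manage.py loaddata apps/opportunities/fixtures/initial_opportunities.json and check it appears on the Opportunities page.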
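
A quick check of a new opportunity entry can catch a missing field or a misspelled category before you reload the fixture. This sketch assumes the field names match the lowercase labels listed above and that the deadline is stored as a YYYY-MM-DD string; confirm both against the existing entries in the fixture. The sample entry is made up.

```python
from datetime import date

# Assumed field names and date format; check them against the existing fixture entries.
REQUIRED_FIELDS = ("title", "description", "category", "deadline", "link")
CATEGORIES = ("computing", "grant", "fellowship")


def opportunity_problems(fields):
    """Return a list of problems with one opportunity's fields; empty means it looks complete."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    category = fields.get("category")
    if category and category not in CATEGORIES:
        problems.append(f"unknown category: {category!r}")
    deadline = fields.get("deadline")
    if deadline:
        try:
            date.fromisoformat(deadline)
        except (TypeError, ValueError):
            problems.append(f"deadline is not a YYYY-MM-DD date: {deadline!r}")
    return problems


# Made-up entry with two deliberate mistakes, for illustration only.
entry = {
    "title": "Example compute grant",
    "description": "Made-up entry for illustration.",
    "category": "scholarship",
    "deadline": "2026-13-01",
    "link": "https://example.org/apply",
}
for problem in opportunity_problems(entry):
    print(problem)
```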

Writing a blog post: Add a new entry to apps/blog/fixtures/initial_posts.json with a unique pk, a slug, title, excerpt, body, author, and published date. Reload with python manage.py loaddata apps/blog/fixtures/initial_posts.json.
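
Because loaddata saves each entry by primary key, a reused pk is likely to overwrite an existing post rather than raise an error. The sketch below looks for repeated (model, pk) pairs in a Django JSON fixture, which is a list of objects with model, pk, and fields keys. The model label blog.post and the field values in the sample are assumptions for illustration.

```python
import json
from collections import Counter


def duplicate_pks(fixture_text):
    """Return (model, pk) pairs that appear more than once in a Django JSON fixture."""
    entries = json.loads(fixture_text)  # raises json.JSONDecodeError on invalid JSON
    counts = Counter((entry["model"], entry["pk"]) for entry in entries)
    return sorted(key for key, count in counts.items() if count > 1)


# Made-up fixture for illustration; "blog.post" and the field values are assumptions.
fixture = json.dumps([
    {"model": "blog.post", "pk": 1, "fields": {"slug": "first-post", "title": "First post"}},
    {"model": "blog.post", "pk": 1, "fields": {"slug": "second-post", "title": "Second post"}},
])
print(duplicate_pks(fixture))
```

Running json.loads on the file also confirms that the edited fixture is still valid JSON, since a trailing comma is a common slip when adding an entry by hand.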

Step 3 — run the checks
black . --check
flake8 .
python manage.py test

All three must pass before you open a pull request.

Step 4 — open a pull request
git checkout -b feat/add-your-change-description
git add .
git commit -m "feat: add [what you added]"
git push origin feat/add-your-change-description

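Open a PR against the develop branch (not main). If you do not have write access to MBAZA-NLP/aimbaza.org, fork the repository on GitHub first and push your branch to your fork instead. A maintainer will review within 5 working days. See CONTRIBUTING.md for the full workflow.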

What you'll do: Set up the tools every contributor needs and make a small, real contribution to a MbazaNLP repository. By the end you will have a GitHub account, a HuggingFace account, and your first pull request open for review.

Skills needed: None. Bring a laptop and 90 minutes.

Step 1 — create accounts
  1. GitHub: go to github.com/join and create a free account. GitHub is where MbazaNLP source code and documentation live.
  2. HuggingFace: go to huggingface.co/join and create a free account. HuggingFace is where MbazaNLP publishes its models and datasets.
  3. Slack: join the community workspace at aimbaza.slack.com. Introduce yourself in #introductions and join the channels relevant to what you want to work on.
Step 2 — install Git

Git is the version control tool used by every contributor.

  • Windows: download from git-scm.com and run the installer.
  • Mac: run xcode-select --install in Terminal.
  • Linux: run sudo apt install git (Ubuntu/Debian) or sudo dnf install git (Fedora).

Confirm it is installed:

git --version
Step 3 — configure Git with your name
git config --global user.name "Your Name"
git config --global user.email "your@email.com"
Step 4 — fork and clone a repository

Go to a MbazaNLP repository on GitHub (for example, MBAZA-NLP/aimbaza.org). Click the Fork button in the top right corner — this creates your own copy of the repository under your account. Then clone it to your machine:

git clone https://github.com/YOUR-USERNAME/aimbaza.org.git
cd aimbaza.org
Step 5 — create a branch and make a change
# Always create a branch — never commit directly to main
git checkout -b docs/fix-my-first-typo

Open any file and make a small, real change: fix a spelling error, improve a sentence in the README, or add a missing full stop. Save the file.

Step 6 — commit and push
git add .
git commit -m "docs: fix typo in README"
git push origin docs/fix-my-first-typo
Step 7 — open a pull request

Go to your forked repository on GitHub. You will see a banner noting that your branch had recent pushes. Click Compare & pull request. Write a short description of what you changed and why, then click Create pull request.

A maintainer will review your PR and either merge it or leave feedback. Post the PR link in #github on Slack if you would like someone to take a look sooner.

That is it. You are now a contributor to an open-source African language AI project.

Pull Request Workflow

  1. Fork the relevant repository on GitHub.
  2. Create a feature branch: git checkout -b feat/your-feature-name
  3. Make your changes and write or update tests where applicable.
  4. Run linting before pushing: black . && flake8 .
  5. Open a pull request against main (or develop for the website repo).
  6. A maintainer will review within 5 working days. Address any feedback and the PR will be merged.

Read the full CONTRIBUTING.md for code style, commit message conventions, and branching standards.
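
The branch and commit examples in this guide follow a prefix pattern (feat, fix, docs) that is easy to check before pushing. The patterns below are inferred from those examples and are an assumption; CONTRIBUTING.md remains the authority on the exact rules.

```python
import re

# Patterns inferred from the examples in this guide, not copied from CONTRIBUTING.md.
BRANCH = re.compile(r"(feat|fix|docs)/[a-z0-9]+(-[a-z0-9]+)*")
COMMIT = re.compile(r"(feat|fix|docs): \S.*")


def valid_branch(name):
    """True if the branch looks like prefix/lowercase-words-with-hyphens."""
    return BRANCH.fullmatch(name) is not None


def valid_commit(subject):
    """True if the commit subject starts with a known prefix, a colon, and a space."""
    return COMMIT.fullmatch(subject) is not None


print(valid_branch("docs/fix-my-first-typo"))
print(valid_commit("docs: fix typo in README"))
print(valid_branch("my-branch"))
```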

Engineering Standards

All MbazaNLP code and model contributions follow our community engineering standards:

  • Python Style: Black + Flake8, PEP 8
  • Git Workflow: feat/fix/docs branch prefixes, descriptive commits
  • Model Cards: all HuggingFace releases require a complete model card