Projects & Models

Open-source NLP models and datasets for Kinyarwanda and other East African languages, built by the MbazaNLP community and freely available on HuggingFace.

Machine Translation

NLLB Fine-tuned — General (En↔Kin)

NLLB-200 1.3B fine-tuned on general-purpose English–Kinyarwanda data.

View on HuggingFace →

NLLB Fine-tuned — Education (En↔Kin)

NLLB-200 1.3B fine-tuned on education-domain data (Coursera, Atingi).

View on HuggingFace →

NLLB Fine-tuned — Tourism (En↔Kin)

NLLB-200 1.3B fine-tuned on tourism-domain English–Kinyarwanda data.

View on HuggingFace →
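
As an illustration of how one of the fine-tuned NLLB checkpoints above might be used, here is a minimal sketch with the Hugging Face `transformers` library. It assumes the checkpoints keep the standard NLLB-200 tokenizer and its FLORES-200 language codes (`eng_Latn`, `kin_Latn`). `model_id` is a placeholder for the repository name shown on the model page, and `resolve_direction` and `translate` are hypothetical helper names, not part of any MbazaNLP package.

```python
# FLORES-200 language codes used by NLLB-200 tokenizers.
LANG_CODES = {"en": "eng_Latn", "rw": "kin_Latn"}


def resolve_direction(src: str, tgt: str) -> tuple[str, str]:
    """Map short language tags to the NLLB codes for one translation direction."""
    if src == tgt:
        raise ValueError("source and target languages must differ")
    try:
        return LANG_CODES[src], LANG_CODES[tgt]
    except KeyError as exc:
        raise ValueError(f"unsupported language tag: {exc.args[0]}") from None


def translate(text: str, model_id: str, src: str = "en", tgt: str = "rw") -> str:
    """Translate one sentence with a fine-tuned NLLB checkpoint (assumed layout)."""
    # Imported lazily so resolve_direction works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    src_code, tgt_code = resolve_direction(src, tgt)
    tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang=src_code)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        # NLLB selects the output language through the first generated token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_code),
        max_new_tokens=128,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling `translate("Good morning.", model_id)` would download the checkpoint on first use; swapping `src` and `tgt` gives the Kinyarwanda-to-English direction.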

NLLB Education (full card)

Education-domain translation model with a complete model card.

View on HuggingFace →

NLLB Tourism 8-bit Quantized

Tourism translation model quantized to 8-bit for CPU inference.

View on HuggingFace →

Quantized NLLB — Education 8-bit

Education translation model quantized to 8-bit via CTranslate2.

View on HuggingFace →

Quantized NLLB — Tourism 8-bit

Tourism translation model quantized to 8-bit via CTranslate2.

View on HuggingFace →
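
The 8-bit CTranslate2 models are aimed at CPU inference. The sketch below follows the usual CTranslate2 pattern for NLLB models. It assumes the converted model has been downloaded to a local directory (`model_dir`) and that a compatible NLLB tokenizer is available under `tokenizer_id`; both names, and the helper functions themselves, are hypothetical. The small `weight_memory_gb` helper shows the back-of-the-envelope reason quantization helps: counting weights only, and treating 1.3B as a round parameter count, 32-bit storage needs about 5.2 GB while 8-bit storage needs about 1.3 GB.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Rough size of the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def translate_ct2(
    text: str,
    model_dir: str,      # assumption: local path to the converted CTranslate2 model
    tokenizer_id: str,   # assumption: repository or path holding an NLLB tokenizer
    src_code: str = "eng_Latn",
    tgt_code: str = "kin_Latn",
) -> str:
    """Translate one sentence on CPU with an int8 CTranslate2 model."""
    # Imported lazily so weight_memory_gb works without these packages installed.
    import ctranslate2
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, src_lang=src_code)
    translator = ctranslate2.Translator(model_dir, device="cpu", compute_type="int8")

    # CTranslate2 works on token strings rather than token ids.
    source_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
    results = translator.translate_batch([source_tokens], target_prefix=[[tgt_code]])

    # The first output token is the target language code, so drop it.
    target_tokens = results[0].hypotheses[0][1:]
    return tokenizer.decode(tokenizer.convert_tokens_to_ids(target_tokens))
```

The memory figure ignores activations and any tensors that stay in higher precision, so actual usage will be somewhat higher.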

Speech Recognition (ASR)

Kinyarwanda Coqui STT

Coqui STT model for Kinyarwanda automatic speech recognition.

View on HuggingFace →

Kinyarwanda NeMo Conformer

NeMo Conformer-CTC-Large ASR fine-tuned on Kinyarwanda speech.

View on HuggingFace →

Whisper Small — Kinyarwanda

OpenAI Whisper-small fine-tuned on Common Voice Kinyarwanda.

View on HuggingFace →
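
A fine-tuned Whisper checkpoint can typically be run through the `transformers` speech-recognition pipeline. The sketch below assumes that applies to this model. `model_id` is a placeholder for the repository name on the model page, decoding an audio file path requires `ffmpeg` on the system, and `transcribe` and `word_error_rate` are hypothetical helper names. The second function computes word error rate, a common way to compare a transcript against a reference.

```python
def transcribe(audio_path: str, model_id: str) -> str:
    """Transcribe one audio file with a Whisper checkpoint (assumed compatible)."""
    # Imported lazily so word_error_rate works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Lower is better: a value of 0.0 means the transcript matches the reference word for word, and values above 1.0 are possible when the hypothesis contains many extra words.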

Multilingual ASR (Rw / Sw / Lg)

NeMo Conformer-CTC-Large for Kinyarwanda, Swahili, and Luganda.

View on HuggingFace →

Text-to-Speech & Other

Kinyarwanda TTS

Text-to-speech model that produces natural-sounding Kinyarwanda speech.

View on HuggingFace →

Bakame Chatbot

Rasa-based Kinyarwanda chatbot. Access is gated; request it on the model page.

View on HuggingFace →

Datasets

NMT Education Parallel Corpus (En–Kin)

61,767 sentence pairs for education-domain English–Kinyarwanda translation (Coursera, Atingi, Wikipedia).

View on HuggingFace →
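
The corpus can presumably be loaded with the Hugging Face `datasets` library. In this sketch `dataset_id` is a placeholder for the repository name on the dataset page, and `load_corpus`, `split_sizes`, and `matches_expected` are hypothetical helpers. `split_sizes` reports how many rows each split holds, which makes it easy to compare the total against the 61,767 pairs stated above. The split layout is not described here, so no split names are assumed.

```python
EXPECTED_PAIRS = 61_767  # figure quoted in the dataset description


def load_corpus(dataset_id: str):
    """Download the corpus; returns a mapping from split name to rows."""
    # Imported lazily so the helpers below work without datasets installed.
    from datasets import load_dataset

    return load_dataset(dataset_id)


def split_sizes(splits) -> dict:
    """Number of rows per split, for any mapping of split name to a sized object."""
    return {name: len(rows) for name, rows in splits.items()}


def matches_expected(splits, expected: int = EXPECTED_PAIRS) -> bool:
    """Check whether the rows across all splits add up to the expected total."""
    return sum(split_sizes(splits).values()) == expected
```

A mismatch is not necessarily an error: the published count may refer to a single split or to the data before filtering.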

PIQA-kin

Physical commonsense reasoning benchmark in Kinyarwanda (binary QA).

View on HuggingFace →
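
Binary QA benchmarks of this kind are usually scored by accuracy: for each goal, a model scores both candidate solutions and the higher-scoring one is taken as its answer. The sketch below assumes PIQA-kin keeps the field names of the original English PIQA (`goal`, `sol1`, `sol2`, `label`). `score` stands in for any model-specific scoring function, and `piqa_accuracy` is a hypothetical helper name.

```python
from typing import Callable, Iterable, Mapping


def piqa_accuracy(
    examples: Iterable[Mapping],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of examples where the higher-scoring solution is the labeled one."""
    correct = 0
    total = 0
    for example in examples:
        score_1 = score(example["goal"], example["sol1"])
        score_2 = score(example["goal"], example["sol2"])
        # Label 0 means sol1 is correct, label 1 means sol2; ties go to sol1.
        predicted = 0 if score_1 >= score_2 else 1
        correct += int(predicted == example["label"])
        total += 1
    if total == 0:
        raise ValueError("no examples to evaluate")
    return correct / total
```

In practice `score` might return a language model's log-likelihood of the solution given the goal; since there are two choices per example, random guessing gives roughly 50% accuracy.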

All models and datasets are openly licensed. Browse the full collection at huggingface.co/mbazaNLP and github.com/MBAZA-NLP.