Jisa - AI Manga Translator
End-to-end manga translation system for OCR, inpainting, and Thai typesetting.
Overview
Jisa is an end-to-end manga translation system that accepts uploaded comic pages, detects speech bubbles, extracts their text with OCR, translates it into Thai, removes the original text, and renders the translation back onto the page. The workflow combines a Python backend with a Next.js frontend, giving users a local-first dashboard for uploading pages, monitoring progress, and reviewing the original, cleaned, and translated results.
Problem & Solution
The Problem
Translating manga pages manually is slow and inconsistent. A good translation alone is not enough: the text must be extracted accurately, the bubble regions must be cleaned, and the translated dialogue must be typeset back into the page without breaking the layout or tone.
The Solution
Jisa automates the full workflow by detecting speech bubbles, OCRing each bubble, translating with page-level context, inpainting the original text out of the image, and typesetting the Thai translation back into the bubble regions while preserving the artwork and page structure.
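At a high level, the stages above form a sequential pipeline. The sketch below is illustrative only: the function names and `Bubble` structure are assumptions, not Jisa's actual API, and each stage is passed in as a pluggable callable.

```python
from dataclasses import dataclass

@dataclass
class Bubble:
    box: tuple              # (x, y, w, h) region of the speech bubble
    text: str = ""          # OCR result
    translation: str = ""   # Thai translation

def translate_page(page, detect, ocr, translate, inpaint, typeset):
    """Run one page through the full pipeline. Each argument is a
    pluggable stage: detection, OCR, translation, inpainting, typesetting."""
    bubbles = [Bubble(box) for box in detect(page)]
    for b in bubbles:
        b.text = ocr(page, b.box)
    # Translate all bubbles in one call so the model sees page-level context.
    translations = translate([b.text for b in bubbles])
    for b, t in zip(bubbles, translations):
        b.translation = t
    cleaned = inpaint(page, [b.box for b in bubbles])
    return typeset(cleaned, bubbles)
```

Keeping each stage as a plain callable makes it easy to swap in fallbacks (for example, a full-page OCR pass when per-bubble OCR returns nothing).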
Tech Stack
- Provides the Python backend HTTP API and job orchestration layer for the image processing pipeline.
- Handles image manipulation, cleaning, and masking tasks across the manga page workflow.
- Supports the segmentation models used for bubble detection and refinement.
- Drives the frontend dashboard for uploads, status tracking, and reviewing processed pages.
- Provides the styling and motion system for the upload and review interface.
- Supports local OCR inference and OpenAI-compatible translation workflows.
Key Features
- Drag-and-drop upload for one or many manga pages
- Background job processing with live status polling
- Bubble segmentation using a detection-first pipeline
- OCR extraction per bubble with page-level fallback when needed
- Page-context translation for more consistent dialogue
- Inpainting that removes original text while preserving artwork
- Bubble-aware typesetting for translated Thai text
- Side-by-side review of original, cleaned, and final translated images
- Progress and error reporting for each uploaded file
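The background-job pattern behind the status polling and per-file error reporting can be sketched as a minimal in-memory job store. The `JOBS` dictionary, field names, and `submit`/`poll` helpers below are illustrative assumptions, not Jisa's actual schema:

```python
import threading
import uuid

JOBS = {}  # job_id -> {"status": ..., "progress": ..., "error": ...}

def submit(process, page):
    """Start a background job for one page and return its id for polling."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "queued", "progress": 0, "error": None}

    def run():
        JOBS[job_id]["status"] = "processing"
        try:
            for pct in process(page):          # each stage yields progress 0-100
                JOBS[job_id]["progress"] = pct
            JOBS[job_id]["status"] = "done"
        except Exception as e:                 # surface per-file errors to the UI
            JOBS[job_id].update(status="error", error=str(e))

    threading.Thread(target=run, daemon=True).start()
    return job_id

def poll(job_id):
    """What a status endpoint would return to the frontend's polling loop."""
    return JOBS[job_id]
```

The frontend then polls `poll(job_id)` on an interval and renders the status, progress, and any error for each uploaded file.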
Challenges & Learnings
Managing VRAM across multiple models
Several vision models and the OCR engine may share a single GPU, so the backend unloads each model between pipeline stages to stay within a practical VRAM budget.
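The load/unload discipline can be captured in a small context manager, so only one heavyweight model is resident at a time. This is a sketch of the pattern, not Jisa's code; `load` and `unload` are whatever the framework requires (for PyTorch, `unload` would typically delete the model and call `torch.cuda.empty_cache()`):

```python
from contextlib import contextmanager

@contextmanager
def staged_model(load, unload):
    """Load a model for one pipeline stage, then free its VRAM before
    the next stage's model is loaded."""
    model = load()
    try:
        yield model
    finally:
        unload(model)  # e.g. del model + torch.cuda.empty_cache() in PyTorch

# Usage sketch: each stage loads, runs, and releases its own model.
# with staged_model(load_detector, free) as det:
#     boxes = det(page)
# with staged_model(load_ocr, free) as ocr:
#     texts = [ocr(page, b) for b in boxes]
```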
Cleaning text without destroying artwork
Inpainting needs a text-only mask instead of just a bubble mask, or the cleaned image loses too much of the original page structure.
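One simple way to build a text-only mask is to threshold the dark ink pixels inside each detected bubble rather than masking the whole bubble. The NumPy sketch below assumes dark text on light bubbles; a production mask would also need dilation around glyphs and handling of inverted (light-on-dark) text:

```python
import numpy as np

def text_mask(gray_page, bubble_boxes, thresh=80):
    """Mask only the dark text pixels inside each bubble, not the whole
    bubble region, so inpainting leaves the surrounding artwork intact."""
    mask = np.zeros_like(gray_page, dtype=bool)
    for x, y, w, h in bubble_boxes:
        region = gray_page[y:y + h, x:x + w]
        # Text is usually dark ink on a light bubble interior.
        mask[y:y + h, x:x + w] = region < thresh
    return mask
```

The resulting boolean mask can be fed to an inpainting routine (e.g. OpenCV's `cv2.inpaint` after converting to `uint8`) so only glyph pixels are reconstructed.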
Thai typesetting inside tight bubbles
Naive wrapping is not enough for Thai dialogue, so the renderer uses language-aware layout logic to fit translated text back into constrained speech bubbles.
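Because written Thai has no spaces between words, line breaks must fall on word-segmented token boundaries rather than arbitrary characters. A minimal sketch of greedy wrapping over pre-segmented tokens (the tokens would come from a Thai word segmenter such as a dictionary-based tokenizer; that step is assumed here):

```python
def wrap_tokens(tokens, max_chars):
    """Greedy line wrap over word-segmented tokens. Breaking mid-token
    would split Thai words, so each token stays on a single line."""
    lines, line = [], ""
    for tok in tokens:
        if line and len(line) + len(tok) > max_chars:
            lines.append(line)
            line = tok
        else:
            line += tok
    if line:
        lines.append(line)
    return lines
```

A bubble-aware renderer would additionally shrink the font or re-wrap at a narrower width until the measured text block fits inside the bubble's interior.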
Coordinating OCR, translation, and fallbacks
OCR and translation work best when they share page context, and the pipeline also needs fallback paths for missing detections, unavailable credentials, and inpainting failures.
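The fallback paths can share one generic helper that tries alternatives in order, e.g. per-bubble OCR before full-page OCR, or a remote translator before a passthrough when credentials are missing. This helper is an illustrative sketch, not Jisa's actual code:

```python
def with_fallback(*steps):
    """Try each zero-argument step in order; return the first result that
    is non-empty and raised no exception. Raise only if every step fails."""
    last_error = None
    for step in steps:
        try:
            result = step()
            if result:
                return result
        except Exception as e:  # remember the failure and try the next path
            last_error = e
    raise RuntimeError(f"all fallbacks failed: {last_error}")
```

For example, `with_fallback(lambda: ocr_bubbles(page), lambda: ocr_full_page(page))` keeps the pipeline moving when a detection is missing, at the cost of coarser text regions.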