Jisa - AI Manga Translator
End-to-end manga translation system for OCR, inpainting, and Thai typesetting.
Overview
Jisa is an end-to-end manga translation system that accepts uploaded comic pages, detects speech bubbles, extracts their text with OCR, translates it into Thai, removes the original text, and renders the translation back onto the page. The workflow combines a Python backend with a Next.js frontend, giving users a local-first dashboard for uploading pages, monitoring progress, and reviewing the original, cleaned, and translated results.
Problem & Solution
The Problem
Translating manga pages manually is slow and inconsistent. A good translation alone is not enough: the text must be extracted accurately, the bubble regions must be cleaned, and the translated dialogue must be typeset back into the page without breaking the layout or tone.
The Solution
Jisa automates the full workflow by detecting speech bubbles, OCRing each bubble, translating with page-level context, inpainting the original text out of the image, and typesetting the Thai translation back into the bubble regions while preserving the artwork and page structure.
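At a high level, the stages above form a sequential pipeline. The sketch below is illustrative only: the function names and `Bubble` structure are assumptions, not Jisa's actual API, and each stage is passed in as a pluggable callable.

```python
from dataclasses import dataclass

@dataclass
class Bubble:
    box: tuple              # (x, y, w, h) region of the speech bubble
    text: str = ""          # OCR result
    translation: str = ""   # Thai translation

def translate_page(page, detect, ocr, translate, inpaint, typeset):
    """Run one page through the full pipeline. Each argument is a
    pluggable stage: detection, OCR, translation, inpainting, typesetting."""
    bubbles = [Bubble(box) for box in detect(page)]
    for b in bubbles:
        b.text = ocr(page, b.box)
    # Translate all bubbles in one call so the model sees page-level context.
    translations = translate([b.text for b in bubbles])
    for b, t in zip(bubbles, translations):
        b.translation = t
    cleaned = inpaint(page, [b.box for b in bubbles])
    return typeset(cleaned, bubbles)
```

Keeping each stage as a plain callable makes it easy to swap in fallbacks (for example, a full-page OCR pass when per-bubble OCR returns nothing).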
Tech Stack
- Provides the Python backend HTTP API and job orchestration layer for the image processing pipeline.
- Handles image manipulation, cleaning, and masking tasks across the manga page workflow.
- Supports the segmentation models used for bubble detection and refinement.
- Drives the frontend dashboard for uploads, status tracking, and reviewing processed pages.
- Provides the styling and motion system for the upload and review interface.
- Supports local OCR inference and OpenAI-compatible translation workflows.
Key Features
- Drag-and-drop upload for one or many manga pages
- Background job processing with live status polling
- Bubble segmentation using a detection-first pipeline
- OCR extraction per bubble with page-level fallback when needed
- Page-context translation for more consistent dialogue
- Inpainting that removes original text while preserving artwork
- Bubble-aware typesetting for translated Thai text
- Side-by-side review of original, cleaned, and final translated images
- Progress and error reporting for each uploaded file
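The background-job pattern behind the status polling and per-file error reporting can be sketched as a minimal in-memory job store. The `JOBS` dictionary, field names, and `submit`/`poll` helpers below are illustrative assumptions, not Jisa's actual schema:

```python
import threading
import uuid

JOBS = {}  # job_id -> {"status": ..., "progress": ..., "error": ...}

def submit(process, page):
    """Start a background job for one page and return its id for polling."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "queued", "progress": 0, "error": None}

    def run():
        JOBS[job_id]["status"] = "processing"
        try:
            for pct in process(page):          # each stage yields progress 0-100
                JOBS[job_id]["progress"] = pct
            JOBS[job_id]["status"] = "done"
        except Exception as e:                 # surface per-file errors to the UI
            JOBS[job_id].update(status="error", error=str(e))

    threading.Thread(target=run, daemon=True).start()
    return job_id

def poll(job_id):
    """What a status endpoint would return to the frontend's polling loop."""
    return JOBS[job_id]
```

The frontend then polls `poll(job_id)` on an interval and renders the status, progress, and any error for each uploaded file.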
Challenges & Learnings
Managing VRAM across multiple models
Several vision models and the OCR engine may share a single GPU, so the backend unloads each model between pipeline stages to stay within a practical VRAM budget.
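The load/unload discipline can be captured in a small context manager, so only one heavyweight model is resident at a time. This is a sketch of the pattern, not Jisa's code; `load` and `unload` are whatever the framework requires (for PyTorch, `unload` would typically delete the model and call `torch.cuda.empty_cache()`):

```python
from contextlib import contextmanager

@contextmanager
def staged_model(load, unload):
    """Load a model for one pipeline stage, then free its VRAM before
    the next stage's model is loaded."""
    model = load()
    try:
        yield model
    finally:
        unload(model)  # e.g. del model + torch.cuda.empty_cache() in PyTorch

# Usage sketch: each stage loads, runs, and releases its own model.
# with staged_model(load_detector, free) as det:
#     boxes = det(page)
# with staged_model(load_ocr, free) as ocr:
#     texts = [ocr(page, b) for b in boxes]
```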
Cleaning text without destroying artwork
Inpainting needs a text-only mask instead of just a bubble mask, or the cleaned image loses too much of the original page structure.
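One simple way to build a text-only mask is to threshold the dark ink pixels inside each detected bubble rather than masking the whole bubble. The NumPy sketch below assumes dark text on light bubbles; a production mask would also need dilation around glyphs and handling of inverted (light-on-dark) text:

```python
import numpy as np

def text_mask(gray_page, bubble_boxes, thresh=80):
    """Mask only the dark text pixels inside each bubble, not the whole
    bubble region, so inpainting leaves the surrounding artwork intact."""
    mask = np.zeros_like(gray_page, dtype=bool)
    for x, y, w, h in bubble_boxes:
        region = gray_page[y:y + h, x:x + w]
        # Text is usually dark ink on a light bubble interior.
        mask[y:y + h, x:x + w] = region < thresh
    return mask
```

The resulting boolean mask can be fed to an inpainting routine (e.g. OpenCV's `cv2.inpaint` after converting to `uint8`) so only glyph pixels are reconstructed.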
Thai typesetting inside tight bubbles
Naive wrapping is not enough for Thai dialogue, so the renderer uses language-aware layout logic to fit translated text back into constrained speech bubbles.
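Because written Thai has no spaces between words, line breaks must fall on word-segmented token boundaries rather than arbitrary characters. A minimal sketch of greedy wrapping over pre-segmented tokens (the tokens would come from a Thai word segmenter such as a dictionary-based tokenizer; that step is assumed here):

```python
def wrap_tokens(tokens, max_chars):
    """Greedy line wrap over word-segmented tokens. Breaking mid-token
    would split Thai words, so each token stays on a single line."""
    lines, line = [], ""
    for tok in tokens:
        if line and len(line) + len(tok) > max_chars:
            lines.append(line)
            line = tok
        else:
            line += tok
    if line:
        lines.append(line)
    return lines
```

A bubble-aware renderer would additionally shrink the font or re-wrap at a narrower width until the measured text block fits inside the bubble's interior.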
Coordinating OCR, translation, and fallbacks
OCR and translation work best when they share page context, and the pipeline also needs fallback paths for missing detections, unavailable credentials, and inpainting failures.
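The fallback paths can share one generic helper that tries alternatives in order, e.g. per-bubble OCR before full-page OCR, or a remote translator before a passthrough when credentials are missing. This helper is an illustrative sketch, not Jisa's actual code:

```python
def with_fallback(*steps):
    """Try each zero-argument step in order; return the first result that
    is non-empty and raised no exception. Raise only if every step fails."""
    last_error = None
    for step in steps:
        try:
            result = step()
            if result:
                return result
        except Exception as e:  # remember the failure and try the next path
            last_error = e
    raise RuntimeError(f"all fallbacks failed: {last_error}")
```

For example, `with_fallback(lambda: ocr_bubbles(page), lambda: ocr_full_page(page))` keeps the pipeline moving when a detection is missing, at the cost of coarser text regions.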