BasicCAT — Computer-Aided Translation (CAT) Tools

Offline AI Manga Translator

Sat, 05 Apr 2025 06:14:50 +0000

Thanks to the current AI technologies, we can now run our offline AI manga translator on personal computers. In this article, we are going to talk about how to use ImageTrans, a computer-aided image translation tool, to translate Japanese manga. Since the entire process is done locally, it has no extra cost, no usage limits and no privacy problems.

Software Used

ImageTrans: an integrated workshop for image translation. It can extract the text with OCR, call AI translation, remove the text and reinject the translation.
A large language model tool, like LM Studio and Ollama, to call models like Deepseek, qwen and sakuraLLM. We are going to use Ollama in this article.

Start Local Services

Start a local manga OCR servce which ImageTrans uses for Japanese recognition (guide). If you need to translation images in other languages, you can choose other services.
Install and run Ollama. Execute the following command to download the qwen model.
```
ollama run qwen2.5
```

Use ImageTrans to Translate Manga

New Project

Create a new project based on the Japanese -> Chinese manga template.

Using a template has many benefits. The text styles, the language pair, OCR and translation engines to use are all configured. If it requires a specific balloon detection model, it will also prompt you to download it.

Configuration

Configure Ollama.

In ImageTrans’s preferences, configure the ChatGPT plugin to let it call the local Ollama server.

We need to modify two params:
- host: change to http://localhost:11434/v1
- model: change to qwen2.5
Configure custom workflow.

Through menu->project->batch->custom worflow to open the custom worflow form. Set the translation engine to ChatGPT and save the settings.

Import Images and Start Translation

Import the images and right click on the image. Through current image -> one-click translation (custom workflow), we can translate the current image. We can translate all the images through batch operations.

Demo video:

Sample Translation Result

Source	Target
あの子ヒンメル様の仲間なんだって?	听说那孩子是辛美尔大人的同伴？
悲しい顔一つしないなんて、	连一个悲伤的表情都没有，
薄情だね.	真是薄情啊。
おやおや、私達もしていませんよ.	哎呀哎呀，我们也没有啊。
司教はまじめにやれ!	主教认真点干！
この薄情者!	这个薄情的人！
ほっはっは.手痛いですな.	哈哈哈哈。真让人痛心啊。
…だって私、この人の事何も知らないし…	…可是我，对这个人的事一无所知啊…
たった１０年一緒に旅しただけだし…	只是一起旅行了仅仅十年而已…

ヒンメル will be translated as 希梅尔 by default. Here, we can specify it as a term with the right translation (辛美尔) to regulate the translation of proper nouns.

Vertical Japanese and Chinese Text Recognition

Sat, 29 Mar 2025 06:18:50 +0000

Unlike other languages, Japanese and Chinese can be arranged both vertically and horizontally. Most Japanese books and manga still use vertical text while Chinese is used horizontally now. You can only see vertical Chinese in ancient books and comics.

Japanese arranged both vertically and horizontally:

Vertical Traditional Chinese:

Here are several ways to recognize vertical text.

Word Detection

A straightforward way to recognize vertical text is to detect the position of each word, recognize them, and then merge them into a line or a paragraph.

There are many OCRs that return single-word coordinates, such as RapidOCR.

Recognition results:

Text Line Detection

Most popular deep learning-based OCRs can only detect text lines. Special training is required to give the OCR the ability to distinguish between horizontal and vertical.

Currently, manga-image-translator is the best open-source OCR in this area.

Recognition results:

If the OCR only recognizes horizontal text, we can first do a processing on the image to convert the vertical text into horizontal text.

Recognize Entire Image

With Transformer OCR, you can directly input images to get text results.

manga-OCR and Large models such as ChatGPT have the vertical text recognition capability. But they often need to be used in conjunction with other text detection methods.

All of the above features are integrated into ImageTrans and can be used after purchase.

Hardcoded Subtitle Extraction

Sat, 08 Mar 2025 11:09:50 +0000

Hardcoded (burned-in) subtitles are subtitles that blend directly into the video frames. This type of subtitle cannot be extracted directly. If we can’t find the original subtitle file and we want to extract the subtitle text and the corresponding timestamps, we need to use optical character recognition technology (OCR) for extraction.

Here is a simple extraction process of hardcoded subtitles: directly recognize the subtitles in each frame, and then determine which frames belong to the same subtitle line based on the position and content of the text to calculate the timeline.

But in practice, because OCR is a time-consuming operation, for 1 minute of a 25 FPS video, we need to process 1500 pictures, which may take tens of minutes. There is also a recognition rate problem, which can affect the results. So we need to optimize the process to improve the performance.

One approach is to use less time-consuming image processing methods to determine which frames contain subtitles, and then use an accurate OCR to recognize the text. There are already tools like VideoSubFinder and esrXP. However, this type of software uses traditional image processing methods, and the accuracy rate is not high enough. In this case, we can directly use the text detection method of an OCR software to determine which frames contain subtitles.

OCR is generally divided into two steps: detecting the text and recognizing the text. OCR’s text detection method is more accurate than traditional approaches, and since we skip the recognition step, it can also save time.

ImageTrans provides a hardcoded subtitle extraction tool that provides support for the above process. Next, we will introduce the use of the software with Empresses in the Palace as an example. Bilingual subtitles are provided in the Empresses in the Palace. Based on this bilingual subtitles, we can create a parallel corpus for language learning and research.

Subtitle Analysis

First, let’s take a look at what the subtitles in Empresses in the Palace look like. Here are a few screenshots:

We can see that the subtitles will have two lines, three lines, etc. The translated text may be scattered in multiple subtitles of the same original text.

Extract Video Frames

Open Silhouette and use its frame extractor to extract video frames:

Here we can set FPS. When FPS is set to 3, only 3 frames are extracted per second. If we want to have accurate timestamps, we can make the FPS bigger, but it will take more time to process. If we just need the text and don’t need the time to be accurate, FPS can be set smaller.

Recognize Subtitles in the Video

Next, open ImageTrans and import the video frames we just extracted.

Open the video subtitle extractor through menu -> tools.

Set the boundary of the region to be recognized, select the detection engine as “detect only (PaddleOCR)”, set the number of threads as 4, and click “Detect subtitles in all images” to start detection. Here, we process a 54-second video. The FPS for extraction is set to 3 so there are 164 images to detect.

After the operation is completed, we can see that the subtitle lines in the image are detected.

After that, we click “OCR all keyframes”, which recognizes the text in the subtitle images. Since we only recognize keyframes here, the number of images to be processed becomes 21.

We can see that an additional text box containing the recognized text is added.

After that, we can export the subtitles as an SRT file.

Because it’s bilingual, there are some extra steps. First, before “OCR all keyframes”, uncheck “Auto remove line breaks”. Then in the video subtitle extractor, click “Keep only the last line break”. In this way, we can make the text into one line of source text, one line of target text.

Then tick the checkbox: “merge multiple targets belonging to the same source”. In this way, the scattered target text will be merged together.

Here are the extracted subtitles:

1
00:00:00,999 --> 00:00:02,664
- Shichu. - Huan.
实初哥哥 嬛妹妹

2
00:00:03,663 --> 00:00:04,995
I just checked up on your family.
刚刚我去府上请脉

3
00:00:05,328 --> 00:00:07,659
Your mother told me you'd come here to offer incense.
听甄伯母说你来这里进香了

4
00:00:07,992 --> 00:00:09,657
Simply for a stroll and-to-pass an idle hour.
出来走走 也是散心

5
00:00:11,322 --> 00:00:13,320
Huan, don't try to hide it from me.
嬛妹妹 你就不要再瞒我了

6
00:00:14,319 --> 00:00:17,982
I know you' ve been worried about the audition for many days.
我知道为了殿选之事 你已经烦恼多日了

7
00:00:19,647 --> 00:00:22,644
U may only do what I'm allowed. The rest I leave to fate.
嬛儿是尽人事以听天命

8
00:00:23,643 --> 00:00:26,640
Huan, when my father lived, he often said,
嬛妹妹 家父在世的时候常说

9
00:00:26,973 --> 00:00:28,971
"A jade vessel is the symbol of a pure heart.
一片冰心在玉壶

10
00:00:29,304 --> 00:00:32,301
He wanted me to give this to my future-
他让我把此壶 交予我们温家未来的

11
00:00:33,300 --> 00:00:34,965
It is my own wish as well.
其实这也是我一直以来的心意

12
00:00:35,298 --> 00:00:38,295
If you accept this, you won't be called to the audition.
你若接受的话 就不用再去宫中殿选了

13
00:00:40,626 --> 00:00:43,290
In the time of the Shunzhi Emperor, it was decreed
顺治爷在世的时候就定下定例

14
00:00:43,623 --> 00:00:46,953
that girls qualified to join the harem may not marry before the audition.
所有未经选看的秀女 断不可私下结亲

15
00:00:47,619 --> 00:00:51,948
Though you intend to help, you need not give me such a valuable item.
实初哥哥想一时救急 也不必拿出这么贵重的东西来

16
00:00:52,614 --> 00:00:53,946
I'm profoundly flattered.
嬛儿受不起

The advantage of using ImageTrans to extract hardcoded subtitles is that we can intervene and modify the whole process, and different OCR engines can be selected for different languages.

After the subtitles are extracted, they can also be imported into Silhouette, a computer-aided audio-video translation software, and adjusted with the help of the waveform.

Video tutorials:

Traditional approach (VideoSubFinder + ImageTrans): https://www.youtube.com/watch?v=__MTiAtqrTs
OCR approach (Silhouette + ImageTrans): https://www.youtube.com/watch?v=5zEDfrXUAdI

How to Transcribe and Translate Japanese Video

Sun, 23 Feb 2025 04:19:50 +0000

We may need to transcribe and translate some Japanese videos. The desktop tool Silhouette has made it easy to be done.

The processing can be done entirely offline on your own computer.

Recognize the speech using an ASR model like Whisper.
Adjust the recognition result in the program with the help of the waveform and various controls.
Translate all the lines with an LLM model like ChatGPT or DeepSeek.

Based on my test, the processing only takes 20 minutes translating a 180 minutes Japanese video on a M4 Mac Mini device.

How to Align Text with Audio

Sun, 23 Feb 2025 02:01:50 +0000

You’ve got an audio or a video and the script of the speech. Now, you would like to align the text with the speech. Instead of manually creating the timelines, we can use a computer-aided approach:

First, recognize the speech to get the timelines with the recognized words.
Then, align the recognized text with the existing text.
Finally, determining the alignment of the aligned text and the timelines based on the text length.

The computer-aided video/audio translation tool Silhouette is made for such a purpose.

Recognized result:

Aligner:

Aligned result:

PS: if the recognized speech is accurate, we do not have to do this. This is for cases where the audio quality is not good, which leads to bad recognition results.

E-Commerce Image Translation

Mon, 18 Nov 2024 12:01:50 +0000

ImageTrans is a computer-aided image translation tool. We can use it to translate e-commerce images.

There are a wide variety of images for cross-border e-commerce. Some images are used as the main image for search results and some are used in the details. It poses a number of challenges for translators:

Some text has to be translated, while some does not need translation or has to be removed.
Text may have complex backgrounds such as patterns and gradients. We need to remove the text and restore the background.
The target text might be longer and occupy a larger area than the source text (e.g. English to Chinese).
Text has to be aligned precisely in the image.

ImageTrans provides the following functions to handle the translation of cross-border e-commerce images.

It can use OCR technology to automatically generate text boxes and remove text, eliminating manual selection and erasure of text. Adding, deleting, and editing text boxes manually is also supported.

Original image:

Text mask:

Text-removed image:
It can pretranslate the text with machine translation. Multiple machine translation engines (Ali e-commerce, ChatGPT, DeepL, Baidu, etc.) can be used to provide references for translation. Although e-commerce translation is a creative translation, machine translation can still provide some help.
It can unify the text styles of multiple selected text boxes.
It can align multiple selected text boxes. They can also be aligned with the source text boxes.
It supports displaying alignment lines when moving a text box.
It can export the results as Photoshop images, or directly process existing PSD files for images that require complex modifications.
It has built-in search and replace, which can be used to unify text case and perform other operations.

Here are some examples of translated images. Lato is used here as the font for English. This font has a variety of font weights and small spacing, which can meet the needs of Chinese-English translation of complex pictures. In order to ensure readable text, all the text boxes’ font size is set larger than 15px.

Example #1:

Example #2:

Example #3:

Example #4:

Click here for more image translation examples

How to Write a Plugin for ImageTrans

Sun, 15 Sep 2024 03:23:50 +0000

ImageTrans uses the ABPlugin library to provide plugin functionality. We can develop five types of plugins: text recognition, machine translation, custom action, mask generation, and text removal.

Here we are going to write a Google machine translation plugin for demonstration.

Environments

Download B4J 8.9: B4J.zip.
Download additional libraries: b4jlib.zip.
Download OpenJDK+OpenJFX: jdk-14.0.1.zip

Configure the library path and the JDK path in the software.

New Project

Create a new UI project.

In Build Configuration, change the package toorg.xulihang.imagetrans, which is the same as ImageTrans.

Then create a newgoogleMTPlugin.basclass, containing the following template content:

Sub Class_Globals
	Private fx As JFX
End Sub

'Initializes the object. You can NOT add parameters to this method!
Public Sub Initialize() As String
	Log("Initializing plugin " & GetNiceName)
	' Here return a key to prevent running unauthorized plugins
	Return "MyKey"
End Sub

' must be available
public Sub GetNiceName() As String
	Return "googleMT"
End Sub

' must be available
public Sub Run(Tag As String, Params As Map) As ResumableSub
	Select Tag
		Case "getParams"
			Dim paramsList As List
			paramsList.Initialize
			paramsList.Add("key")
			Return paramsList
		Case "translate"
			Return ""
		Case "batchtranslate"
			Return Array()
		Case "supportBatchTranslation"
			Return True
	End Select
	Return ""
End Sub

The plugin name can be obtained throughGetIceName.

The types of plugins are distinguished based on the suffixes of their names, and currently there are several types of suffixes:

Machine translation: MT
Text recognition: OCR
Image inpainting (text removal): Inpaint
Text mask generation: MaskGen
Cutsom action: Action

Plugin Implementation

ImageTrans will pass the name of the operation to be executed and the corresponding parameters to the plugin. The plugin performs the corresponding operation based on the specified tag.

Here are the common operations:

getParams: get the parameters that need to be configured.
getDefaultParamValues: get default params
getSetupParams: get params for installation
getIsInstalledOrRunning: check if the plugin is installed or running
translate: translate a single sentence
batchtranslate: translate multiple sentences
supportBatchTranslation: whether it supports multiple sentence translation
getText: recognize text in a single area
getTextWithLocation: recognize the text of the entire image and return coordinate information
inpaint: generate text-removed images
genMask: generate text mask
byTextArea: designed to process by text area
process: process text in text areas

Here, let’s implement translation using Google.

Select Tag
	Case "translate"
		wait for (translate(Params.Get("source"),Params.Get("sourceLang"),Params.Get("targetLang"),Params.Get("preferencesMap"))) complete (result As String)
		Return result
	Case "batchtranslate"
		wait for (batchTranslate(Params.Get("source"),Params.Get("sourceLang"),Params.Get("targetLang"),Params.Get("preferencesMap"))) complete (targetList As List)
		Return targetList
End Select

Firstly, implement single sentence translation with a simple HTTP request:

Sub translate(source As String,sourceLang As String,targetLang As String,preferencesMap As Map) As ResumableSub
	Dim target As String
	Dim su As StringUtils
	Dim job As HttpJob
	job.Initialize("job",Me)
	Dim params As String
	Dim key As String
	key=getMap("google",getMap("api",preferencesMap)).GetDefault("key","")
	If key="" Then
		Return ""
	End If
	params="key="&key& _
	"&q="&su.EncodeUrl(source,"UTF-8")&"&format=text&source="&sourceLang&"&target="&targetLang
	job.PostString("https://translation.googleapis.com/language/translate/v2",params)
	wait For (job) JobDone(job As HttpJob)
	If job.Success Then
		Try
			Dim result,data As Map
			Dim json As JSONParser
			json.Initialize(job.GetString)
			result=json.NextObject
			data=result.Get("data")
			Dim translations As List
			translations=data.Get("translations")
			Dim map1 As Map
			map1=translations.Get(0)
			target=map1.Get("translatedText")
		Catch
			target=""
			Log(LastException)
		End Try
	Else
		target=""
	End If
	job.Release
	Return target
End Sub


Sub getMap(key As String,parentmap As Map) As Map
	Return parentmap.Get(key)
End Sub

Then handle multiple sentence translation. Google does not support multiple sentence translation by default. We can separate multiple sentences with segmentation symbols and send them to Google Translate in one request.

Sub batchTranslate(sourceList As List,sourceLang As String,targetLang As String,preferencesMap As Map) As ResumableSub
	Dim targetList As List
	targetList.Initialize
	Dim sb As StringBuilder
	sb.Initialize
	For Each source As String In sourceList
		sb.Append(source.Replace(CRLF,"<br/>"))
		sb.Append(CRLF)
	Next
	wait for (translate(sb.ToString,sourceLang,targetLang,preferencesMap)) Complete (target As String)
	Dim targetList As List
	targetList.Initialize
	For Each result As String In Regex.Split(CRLF,target)
		result = result.Replace("<br/>",CRLF)
		targetList.Add(result)
	Next
	Return targetList
End Sub

Test

In main, run the following code for testing:

Dim map1 As Map
map1.Initialize
map1.Put("api",CreateMap("google":CreateMap("key":"api key")))
Dim n As googleMTPlugin
n.Initialize
wait for (n.translate("BasicCAT Documentation","en","zh",map1)) complete (result As String)
Log(result)
wait for (n.batchTranslate(Array("Sentence 1","Sentence 2"),"en","zh",map1)) complete (targetList As List)
Log(targetList)

Pack

After the implementation, we need to package it.

Perform the following compile to library operation and save the files to the plugins folder of ImageTrans.

More Plugin Examples

You can find more plugin examples here: https://github.com/xulihang/ImageTrans_plugins

How to Localize a Desktop Application written in B4J

Wed, 21 Aug 2024 13:11:50 +0000

ImageTrans is a desktop program written in B4J which can be displayed in multiple languages.

The Localizator localization library is used behind it.

It needs to store keys and text in different languages in an Excel spreadsheet like the following.

key	zh	en
Hello {1}	你好 {1}	Hello {1}

Then in the code, find the corresponding language version according to the key:

lblHello.Text = loc.LocalizeParams("Hello {1}!", Array(edtName.Text))

It’s easy to add a new language by simply creating a new column and using the ISO-639-1 standard language code as the header. For example, the table below has added a column for French.

key	zh	en	fr
Hello {1}	你好 {1}	Hello {1}	Bonjour {1}

ImageTrans has an integrated localization feature that allows you to export the above table and import back translations from the table. See this issue for details: issue544

Localize ImageTrans with BasicCAT

Next, let’s talk about how to use BasicCAT to translate the exported xlsx file for localizing ImageTrans.

Hide columns that do not require translation.

Suppose we currently have a table like the one below, with French as the column to be translated:

key	zh	en	fr
Hello {1}	你好 {1}	Hello {1}	Hello {1}

We need to hide other columns to get the following table:

fr
Hello {1}

After that, BasicCAT will only process the column that needs to be translated.

Use BasicCAT to translate and generate the translated xlsx to be imported back to the ImageTrans software. We can use BasicCAT’s pre-translation function to automatically invoke machine translation to translate.
Process newly added text to be translated.

Each version update of the software may add new text that needs to be translated.

We can prepare the xlsx file according to step 1, and later use BasicCAT’s reimport feature to regenerate the project file based on the new file and the existing translations.

Afterwards, use search and replace to find segments with empty translations. Click “Filter Segments” in the bottom left corner to display only these fragments in the editor, making it easier for us to translate new text.

Comic Reader App That Translates

Sun, 28 Jul 2024 12:04:50 +0000

ImageTrans is a computer-aided image translation software. It has a good image viewer and can recognize the text in the images and translate. We can use it as a comic reader to read foreign comics (or manga, webtoon, manhua, doujin, etc.).

Reading related features:

Dragging support with the mouse
Zoom and navigation with hotkeys support
Support of a variety of common file formats: JPG, PNG, BMP, WebP, PDF, ZIP, CBZ, EPUB, MOBI, etc.
Support of detecting panels and reading by panels. It can also convert comics to webtoon.
Voice reading using TTS
Support of exporting results as web pages for reading on cell phones

Translation related features:

Support of most languages in the world
Support of calling multiple machine translation engines and large language models
Support of improving the translation with terms and strings substitution.

How to Convert Traditional Comics to Webtoon

Sun, 28 Jul 2024 10:57:50 +0000

Traditional comics (manga) are usually drawn in a format suitable for printing on A5 or A4 paper. With the popularity of e-reading devices such as mobile phones, webtoon has comes up. It arranges comic panels on a long image in a size suitable for mobile phone reading. Readers can just keep reading by swiping down.

Converting an existing traditional comics to webtoon usually involves rearranging subplots and redrawing speech bubbles. You can refer to the video by comic artist Jason Brubaker.

As readers, if we want to convert traditional comics to webtoon for easy reading on our mobile phones without too much effort, we can use ImageTrans for automatic conversion. This software can detect the panels and export them in a long image in the webtoon format. We can edit the panels and adjust the order of them in the software to ensure the results are correct before exporting.

Original image:

Image source: “Tie Ji Gang Bing”.

Exported image:

BasicCAT — Computer-Aided Translation (CAT) Tools

Offline AI Manga Translator

Software Used

Start Local Services

Use ImageTrans to Translate Manga

New Project

Configuration

Import Images and Start Translation

Sample Translation Result

Vertical Japanese and Chinese Text Recognition

Word Detection

Text Line Detection

Recognize Entire Image

Hardcoded Subtitle Extraction

Subtitle Analysis

Extract Video Frames

Recognize Subtitles in the Video

How to Transcribe and Translate Japanese Video

How to Align Text with Audio

Further Reading

E-Commerce Image Translation

How to Write a Plugin for ImageTrans

Environments

New Project

Plugin Implementation

Test

Pack

More Plugin Examples

How to Localize a Desktop Application written in B4J

Localize ImageTrans with BasicCAT

Comic Reader App That Translates

How to Convert Traditional Comics to Webtoon