How to Add Pronunciation (Furigana) to Japanese Kanji

Japanese kanji are logographic characters that originated in China. Their pronunciation can be annotated with romaji or kana; kana used for this purpose are called furigana.

Below are some examples, each shown first as raw text and then with pronunciation annotations:

Example 1:

一匹の子犬。

一匹(いっぴき)の子犬(こいぬ)。

Example 2:

もちろんです以下にランダムな日本語の文章を生成しました。

もちろんです以下(いか)にランダムな日本語(にほんご)の文章(ぶんしょう)を生成(せいせい)しました。

Example 3:

王都が見えてきた。

王都(おうと)が見(み)えてきた。

Example 4:

でかいのは態度だけで後は何もかもが小さい。

でかいのは態度(たいど)だけで後(あと)は何(なに)もかもが小(ちい)さい。
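The annotated examples attach a reading to each kanji word and leave kana untouched, including okurigana such as the え in 見えて. Below is a minimal Python sketch of that rendering step. It assumes a segmenter has already produced (surface, reading) pairs with hiragana readings; the names `annotate_token` and `annotate` are hypothetical, and stripping kana shared at the start and end of a word is a simple heuristic, not the method of any particular tool.

```python
import re

# Kanji in the CJK Unified Ideographs block, plus the iteration mark 々.
KANJI = re.compile(r"[\u4e00-\u9fff\u3005]")


def annotate_token(surface: str, reading: str) -> str:
    """Attach the reading in parentheses to the kanji part of one token."""
    if not reading or surface == reading or not KANJI.search(surface):
        return surface  # kana, punctuation, or no reading: leave as-is
    # Count kana shared at the end (okurigana), e.g. 見え / みえ share え.
    tail = 0
    while (tail < len(surface) - 1 and tail < len(reading) - 1
           and surface[-1 - tail] == reading[-1 - tail]):
        tail += 1
    # Count kana shared at the start, e.g. お茶 / おちゃ share お.
    head = 0
    while (head < len(surface) - tail - 1 and head < len(reading) - tail - 1
           and surface[head] == reading[head]):
        head += 1
    core = surface[head:len(surface) - tail]
    core_reading = reading[head:len(reading) - tail]
    return surface[:head] + f"{core}({core_reading})" + surface[len(surface) - tail:]


def annotate(tokens: list[tuple[str, str]]) -> str:
    """Join annotated tokens back into a sentence."""
    return "".join(annotate_token(s, r) for s, r in tokens)


# Hypothetical segmentation of Example 3.
tokens = [("王都", "おうと"), ("が", "が"), ("見え", "みえ"),
          ("て", "て"), ("き", "き"), ("た", "た"), ("。", "。")]
print(annotate(tokens))  # 王都(おうと)が見(み)えてきた。
```

The heuristic only strips kana that the surface and the reading share at their ends, so a word with kana between kanji (for example 取り扱い) gets one reading for its whole middle part.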

Adding pronunciation annotations to Japanese kanji is not a simple task. It requires correctly segmenting a Japanese sentence into words and then looking up the correct pronunciation of each word. Kanji readings are complex: most characters have both on'yomi (readings derived from Chinese) and kun'yomi (native Japanese readings), and sound changes often occur when characters are combined. For example, 一 is read いち and 匹 is read ひき on their own, but 一匹 is read いっぴき.
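To see why segmentation has to come before reading lookup, here is a toy greedy longest-match segmenter over a tiny, hypothetical dictionary (`DICT` and `segment` are illustrative names). Kuromoji does not work this way: it builds a lattice of candidate words and chooses the lowest-cost path with the Viterbi algorithm. Treat this only as an illustration of word-level versus character-level lookup.

```python
# Tiny hypothetical dictionary mapping words to hiragana readings.
DICT = {
    "今日": "きょう", "今": "いま", "日": "ひ",
    "日本語": "にほんご", "日本": "にほん", "語": "ご",
    "は": "は", "を": "を", "話す": "はなす",
}


def segment(text: str, dictionary: dict[str, str]) -> list[tuple[str, str]]:
    """Greedy longest-match segmentation; returns (word, reading) pairs."""
    longest = max(map(len, dictionary))
    pos, result = 0, []
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(text) - pos), 0, -1):
            word = text[pos:pos + size]
            if word in dictionary:
                result.append((word, dictionary[word]))
                pos += size
                break
        else:
            # No entry at all: emit the character with an empty reading.
            result.append((text[pos], ""))
            pos += 1
    return result


print(segment("今日は日本語を話す", DICT))
```

Looking up 今日 character by character would give いま + ひ, whereas the word-level entry gives the correct きょう.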

Kuromoji is a Japanese morphological analyzer that performs word segmentation, part-of-speech tagging, and reading lookup based on dictionaries such as ipadic and unidic, so it can be used to annotate kanji with their pronunciations.
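Kuromoji returns readings in katakana (for example ニホンゴ rather than にほんご), while furigana is normally written in hiragana. A small conversion sketch follows; `kata_to_hira` is a hypothetical helper name. It relies on the fact that the katakana range ァ to ヶ (U+30A1 to U+30F6) sits exactly 0x60 code points above the matching hiragana. Characters outside that range, such as the long-vowel mark ー, are left unchanged.

```python
KATAKANA_FIRST, KATAKANA_LAST = 0x30A1, 0x30F6  # ァ ... ヶ
HIRAGANA_OFFSET = 0x60  # katakana code point minus matching hiragana


def kata_to_hira(text: str) -> str:
    """Convert katakana in text to hiragana; leave everything else alone."""
    return "".join(
        chr(ord(ch) - HIRAGANA_OFFSET)
        if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST else ch
        for ch in text
    )


print(kata_to_hira("ニホンゴ"))  # にほんご
print(kata_to_hira("ラーメン"))  # らーめん (ー is kept)
```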

However, it has some drawbacks.

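The default ipadic dictionary has a limited vocabulary, which leads to inaccurate annotations. Unidic, on the other hand, segments too finely: for example, it splits “日本語” (Japanese) into “日本” and “語,” which results in annotation errors. Kuromoji also handles counters such as “一匹” (one small animal) poorly: because the numeral and the counter are segmented as separate tokens, their standalone readings (いち + ひき) may be returned instead of the combined いっぴき.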
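One way to handle counters is a post-processing pass that merges a numeral token with the counter token after it and takes the combined reading from a table of sound changes. The sketch below covers only 匹 with the numerals 一 to 十. The table, the name `fix_counters`, and the (surface, hiragana reading) token format are all hypothetical; this is not ImageTrans's actual implementation.

```python
# Hypothetical reading table for numeral + 匹, including sound changes.
# Some numbers also have variants (はちひき, じっぴき); one common form is listed.
HIKI_READINGS = {
    "一": "いっぴき", "二": "にひき", "三": "さんびき", "四": "よんひき",
    "五": "ごひき", "六": "ろっぴき", "七": "ななひき", "八": "はっぴき",
    "九": "きゅうひき", "十": "じゅっぴき",
}
COUNTER_TABLES = {"匹": HIKI_READINGS}


def fix_counters(tokens: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Merge numeral + counter tokens and replace their reading."""
    result = []
    i = 0
    while i < len(tokens):
        surface, reading = tokens[i]
        if i + 1 < len(tokens):
            counter = tokens[i + 1][0]
            table = COUNTER_TABLES.get(counter)
            if table is not None and surface in table:
                result.append((surface + counter, table[surface]))
                i += 2  # consumed both the numeral and the counter
                continue
        result.append((surface, reading))
        i += 1
    return result


# Standalone readings as a segmenter might return them.
print(fix_counters([("一", "いち"), ("匹", "ひき"), ("の", "の"), ("子犬", "こいぬ")]))
```

Keeping one table per counter makes it straightforward to add other counters with their own sound-change patterns.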

ImageTrans integrates Kuromoji and combines the results from ipadic and unidic to produce more accurate annotations. It also handles counters automatically. The results can assist with manga reading, Japanese language learning, and automatic text annotation. Where higher accuracy is required, large language models such as GPT or DeepSeek can also be used for annotation.