We sometimes need to transcribe and translate Japanese videos. The desktop tool Silhouette makes this easy.

The processing can be done entirely offline on your own computer.

  1. Recognize the speech using an ASR model like Whisper.
  2. Adjust the recognition result in the program with the help of the waveform and various controls.
  3. Translate all the lines with an LLM such as ChatGPT or DeepSeek.
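
The three steps above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Silhouette is implemented: the `Segment` type, the helper names, and the `fake_translate` stand-in are all hypothetical. In a real run the segments would come from the ASR model and the translator would call an LLM; the subtitle output here assumes the SRT format.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Segment:
    """One recognized line: start/end in seconds plus its text (hypothetical type)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def batches(items: List[Segment], size: int) -> Iterable[List[Segment]]:
    """Yield consecutive chunks so each translation request stays small."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def translate_segments(
    segments: List[Segment],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 20,
) -> List[Segment]:
    """Translate the text of every segment while keeping its timing."""
    out: List[Segment] = []
    for chunk in batches(segments, batch_size):
        translated = translate_batch([s.text for s in chunk])
        if len(translated) != len(chunk):
            raise ValueError("translator must return one line per input line")
        out.extend(replace(s, text=t) for s, t in zip(chunk, translated))
    return out


def to_srt(segments: List[Segment]) -> str:
    """Render segments as SRT subtitle text."""
    blocks = [
        f"{i}\n{srt_timestamp(s.start)} --> {srt_timestamp(s.end)}\n{s.text}\n"
        for i, s in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def fake_translate(lines: List[str]) -> List[str]:
    """Stand-in for an LLM call (assumption): just tags each line."""
    return [f"[en] {line}" for line in lines]


# Step 1 would produce these from the audio; they are hard-coded here.
recognized = [
    Segment(0.0, 2.5, "こんにちは"),
    Segment(2.5, 5.0, "ありがとう"),
]
# Step 2 (manual adjustment) is skipped; step 3 translates line by line.
print(to_srt(translate_segments(recognized, fake_translate)))
```

Sending the lines in batches keeps each request small, and checking that the translator returns one line per input line guards against timings drifting out of sync with the text.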

In my test, processing a 180-minute Japanese video took only 20 minutes on an M4 Mac Mini.
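
To put that figure in perspective, here is a small calculation of the speed relative to real time. The `speed_factor` helper is a hypothetical name used only for this illustration, and the numbers are the ones from the test above.

```python
def speed_factor(media_minutes: float, processing_minutes: float) -> float:
    """Minutes of video handled per minute of processing."""
    if processing_minutes <= 0:
        raise ValueError("processing_minutes must be positive")
    return media_minutes / processing_minutes


factor = speed_factor(180, 20)
print(f"{factor:.1f}x faster than real time")
```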