In most cases, translators deliver only the translated document to the client, so the source text and the target text are not aligned at the sentence level. That is why I made this aligner.
It is simple to use, just like BasicCAT. Use Enter to split segments and Delete to merge segments.
How it works
The text is first split into paragraphs, which the user has to align at the paragraph level. The paragraphs can then be broken into sentences. If the source text and the target text have different numbers of sentences, empty text areas are created as placeholders.
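The placeholder behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the Aligner's actual code: the function name `pad_segments` and the use of an empty string as the placeholder are assumptions.

```python
from itertools import zip_longest


def pad_segments(source, target, placeholder=""):
    """Pair source and target sentences row by row.

    When one side has fewer sentences, the missing cells are filled
    with a placeholder so that both columns have the same length.
    """
    return [list(pair) for pair in zip_longest(source, target, fillvalue=placeholder)]


source = ["Hello.", "How are you?", "Goodbye."]
target = ["Bonjour.", "Comment allez-vous ?"]

for row in pad_segments(source, target):
    print(row)
```

The third row ends up with an empty target cell, which the user can then fill by splitting or merging neighbouring segments.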
More about text alignment: Parallel Text Alignment
- Build parallel corpora.
- Re-import sentence-level translations into CAT tools from translated files.
- Create bilingual files from separate source files and target files.
Recommended aligning procedure
Import text -> align at the paragraph level with LF Aligner and correct manually -> split paragraphs into sentences -> align at the sentence level with LF Aligner or bleualign and correct manually
- Text can be extracted from files in various formats using Apache Tika or BasicCAT (based on Okapi Tikal).
- Segmentation rules can influence the results significantly. Make sure you have the right rules.
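To see why segmentation rules matter, here is a rough Python sketch. It is not the SRX-based segmenter the Aligner uses; the function `split_sentences` and its `no_break_after` parameter are made up for illustration, loosely imitating the break and no-break rules of an SRX file.

```python
import re


def split_sentences(text, no_break_after=()):
    """Split after '.', '!' or '?' followed by whitespace.

    A break is skipped when the text before it ends with one of the
    strings in no_break_after (for example an abbreviation).
    """
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.start() + 1]
        if no_break_after and candidate.endswith(tuple(no_break_after)):
            continue  # exception rule: do not break here
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


text = "Dr. Smith arrived. He was late."
print(split_sentences(text))                          # naive rules
print(split_sentences(text, no_break_after=("Dr.",)))  # with an exception
```

With the naive rules the text yields three segments, with the exception rule only two. If the source and the target are segmented with rules that disagree like this, the sentence counts drift apart and more manual correction is needed.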
Java 8, Python 3 (needed by bleualign), LF Aligner
Cross-platform LF Aligner (unzip into Aligner’s root directory): LFAligner.zip
Changes: added Linux and macOS support for LF Aligner
- Integrated LF Aligner
- Check the target text when removing duplicates
- Separate BasicCAT workfile reading
- Append blank cells context menu
- Update segmentation rules after a new srx file is chosen
- Other bugfixes
- Export to XLIFF
- Bleualign results in which some segments were deleted are shown with a red border
- Bleualign results that look inaccurate (judged by comparing the target-to-source word count ratio of the segment with that of the whole text) are shown with a yellow border
- Add keyboard shortcuts for “go to the next empty segment” and “go to the next segment with issues”
- New segment operations via context menu (right click on the empty area of the editor to call the menu)
- Word count
- Segments Remover (remove segments by range, by whether they are empty or duplicated, or by a list of specific texts)
- Go to the next empty segment
- Clean tags menu
- Export to XLSX
- Other bugfixes
- Support reading TMX, XLIFF files and BasicCAT work files
- Read from Clipboard
- TMX export now supports tags
- New project format (not compatible with previous versions)
- Bleualign can be used at the paragraph level
- Updated text segmentation method
- Automatic alignment with Bleualign
How to use Bleualign:
- Ensure that you have Python installed.
- Export the source text to txt and translate the txt file with machine translation.
- “Edit” -> “Bleualign” -> “Align”. Align after choosing the translated txt file.
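For readers who want to try Bleualign outside the Aligner, the steps above roughly correspond to the following Python sketch. The helper names and file names are hypothetical, and how the Aligner itself calls Bleualign is not documented here. The `-s`, `-t`, `--srctotarget` and `-o` options are the ones shown in Bleualign's README as far as I recall, so check them against your copy before relying on them.

```python
from pathlib import Path


def normalize_segment(segment):
    """Collapse whitespace so that each segment fits on one line."""
    return " ".join(segment.split())


def export_segments(segments, path):
    """Write one segment per line to a UTF-8 text file."""
    lines = [normalize_segment(segment) for segment in segments]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def bleualign_command(source_txt, target_txt, translated_txt, output_prefix,
                      script="bleualign.py"):
    """Build the argument list for a Bleualign run (assumed options)."""
    return [
        "python3", script,
        "-s", str(source_txt),                 # source text, one segment per line
        "-t", str(target_txt),                 # human translation, one segment per line
        "--srctotarget", str(translated_txt),  # machine translation of the source
        "-o", str(output_prefix),              # prefix of the aligned output files
    ]


cmd = bleualign_command("source.txt", "target.txt", "source_mt.txt", "aligned")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the three text files exist. `source_mt.txt` is the machine translation of the exported source text mentioned in the second step.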
Change: Strip empty lines.
- Read bilingual ASS subtitle files with timestamp and filename info
- Faster TMX Exporting
- Read bilingual files (one paragraph of source text and one paragraph of target text)
- Support docx files
- Allow text editing
- Move segment up or down
- Delete segment
- Export to TMX
- Drag and drop to add text files