Commit 838d61b

fix(AI translation): It should be able to see the input file now
1 parent 3af9610 commit 838d61b

2 files changed: +70 -34 lines changed
.github/prompts/ai-translation-user.prompt.yml

Lines changed: 3 additions & 0 deletions
@@ -8,3 +8,6 @@ messages:
 Please read the translation file "{{translation_file}}" and translate all the strings from English to {{language}}.

 The file contains strings in the format "line_number:English text" - please translate only the text after the colon while preserving the exact line number and colon format.
+
+Translation strings to process:
+{{translation_content}}
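
The prompt change above appends the numbered strings to the user message through the `{{translation_content}}` placeholder. As a rough illustration only, the sketch below shows how placeholders of this shape could be filled in. `render_prompt` and the sample values are hypothetical names made up for this example, and this is not a description of how `actions/ai-inference` renders its templates.

```python
import re

# Matches {{name}} placeholders, allowing optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: dict) -> str:
    """Replace each {{name}} with variables[name]; unknown names are left as-is."""

    def substitute(match):
        name = match.group(1)
        # Fall back to the original placeholder text when no value is supplied.
        return variables.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = (
        'Please read the translation file "{{translation_file}}" and translate '
        "all the strings from English to {{language}}.\n\n"
        "Translation strings to process:\n"
        "{{translation_content}}"
    )
    # Sample values are invented for illustration.
    print(
        render_prompt(
            template,
            {
                "translation_file": "missing_translations_pt.txt",
                "language": "Portuguese",
                "translation_content": "1:Open file\n2:Save file",
            },
        )
    )
```

Passing the strings inline like this means the model no longer needs repository access to see them, which seems to be the point of the commit.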

.github/workflows/ai-translation.yml

Lines changed: 67 additions & 34 deletions
@@ -207,7 +207,7 @@ jobs:
 matrix:
 include: ${{ fromJson(needs.extract_strings.outputs.translation-matrix) }}
 fail-fast: false # Continue processing other languages even if one fails
-max-parallel: 5 # Limit concurrent AI requests
+max-parallel: 1 # Limit concurrent AI requests to avoid rate limiting

 steps:
 - name: Harden the runner (Audit all outbound calls)
@@ -236,6 +236,15 @@ jobs:
 echo "📊 Translation file size: $file_size bytes"
 echo "✅ Translation file validation completed successfully"

+- name: Prepare translation content
+id: translation_content
+run: |
+# Read file and indent each line by 2 spaces (except the first)
+awk 'NR==1 {print} NR>1 {print " " $0}' "${{ matrix.file }}" > indented_content.txt
+echo "content<<EOF" >> $GITHUB_OUTPUT
+cat indented_content.txt >> $GITHUB_OUTPUT
+echo "EOF" >> $GITHUB_OUTPUT
+
 - name: Run AI translation
 id: ai_translate
 uses: actions/ai-inference@334892bb203895caaed82ec52d23c1ed9385151e # v2.0.4
@@ -246,10 +255,8 @@ jobs:
 language: ${{ matrix.language }}
 lang_code: ${{ matrix.lang_code }}
 translation_file: ${{ matrix.file }}
-file_input: |
-translation_content: ${{ matrix.file }}
-enable-github-mcp: true
-github-mcp-toolsets: "context,repos"
+translation_content: |
+${{ steps.translation_content.outputs.content }}
 model: openai/gpt-4.1
 max-tokens: 8000
 token: ${{ secrets.AMC_COPILOT_TOKEN_CLASSIC }}
@@ -308,22 +315,32 @@ jobs:
 # Save the AI response back to the original translation file
 if [ "${{ steps.check_translation.outputs.output_method }}" = "file" ]; then
 echo "📄 Using response file: ${{ steps.ai_translate.outputs.response-file }}"
-cp "${{ steps.ai_translate.outputs.response-file }}" "${{ matrix.file }}"
+cp "${{ steps.ai_translate.outputs.response-file }}" "${{ matrix.file }}.raw"
 elif [ "${{ steps.check_translation.outputs.output_method }}" = "content" ]; then
 echo "📝 Using response content"
-echo "${{ steps.ai_translate.outputs.response }}" > "${{ matrix.file }}"
+echo "${{ steps.ai_translate.outputs.response }}" > "${{ matrix.file }}.raw"
 else
 echo "❌ Unexpected output method: ${{ steps.check_translation.outputs.output_method }}"
 exit 1
 fi

-# Validate the saved file
+# Clean up AI output: keep only lines starting with number followed by colon
+echo "🧹 Cleaning AI output..."
+grep -E '^[0-9]+:' "${{ matrix.file }}.raw" > "${{ matrix.file }}" || {
+echo "❌ Failed to extract valid translations from AI output"
+echo "Raw output preview:"
+head -20 "${{ matrix.file }}.raw"
+exit 1
+}
+
+# Validate the cleaned file
 if [ -f "${{ matrix.file }}" ] && [ -s "${{ matrix.file }}" ]; then
-echo "✅ AI translation saved successfully for ${{ matrix.language }} (${{ matrix.file }})"
-echo "📊 File size: $(wc -c < "${{ matrix.file }}") bytes"
-echo "📊 Line count: $(wc -l < "${{ matrix.file }}") lines"
+echo "✅ AI translation saved and cleaned successfully for ${{ matrix.language }}"
+echo "📊 Raw file size: $(wc -c < "${{ matrix.file }}.raw") bytes"
+echo "📊 Cleaned file size: $(wc -c < "${{ matrix.file }}") bytes"
+echo "📊 Valid translation lines: $(wc -l < "${{ matrix.file }}") lines"
 else
-echo "❌ Translation file is empty or missing: ${{ matrix.file }}"
+echo "❌ Translation file is empty or missing after cleanup: ${{ matrix.file }}"
 exit 1
 fi
@@ -437,7 +454,10 @@ jobs:

 - name: Insert AI translations into .po files
 if: needs.extract_strings.outputs.translations-to-process == 'true'
+shell: bash # Don't use -e flag to prevent premature exit
 run: |
+set -x # Enable command tracing for debugging
+
 # Check if we have any translated files
 if ls missing_translations_*.txt 1> /dev/null 2>&1; then
 echo "📥 Processing AI translations..."
@@ -450,12 +470,12 @@ jobs:
 if [ -f "$file" ]; then
 if grep -q "# Translation failed" "$file" 2>/dev/null; then
 echo "⚠️ Found failed translation: $file"
-((failed_translations++))
+failed_translations=$((failed_translations + 1))
 # Remove failed translation files so they don't get processed
 rm "$file"
 else
 echo "✅ Found successful translation: $file"
-((successful_translations++))
+successful_translations=$((successful_translations + 1))
 fi
 fi
 done
@@ -466,8 +486,39 @@ jobs:

 if [ $successful_translations -gt 0 ]; then
 echo "🔄 Processing successful translations with insert_missing_translations.py"
-python insert_missing_translations.py
-echo "✅ AI translations inserted into .po files"
+
+# Show files that will be processed
+echo "Files to process:"
+ls -lh missing_translations_*.txt
+
+# Validate file format before processing
+echo "Validating translation file format..."
+for file in missing_translations_*.txt; do
+echo "Checking $file:"
+if grep -qE '^[0-9]+:' "$file"; then
+echo "✅ File format is valid"
+else
+echo "❌ ERROR: File $file does not contain valid translation lines (format: number:text)"
+echo "File contents:"
+cat "$file"
+exit 1
+fi
+
+# Show file preview
+echo "First 5 lines of $file:"
+head -5 "$file"
+echo "---"
+done
+
+# Run with full error output captured
+echo "Running insert_missing_translations.py..."
+if python insert_missing_translations.py 2>&1; then
+echo "✅ AI translations inserted into .po files"
+else
+exit_code=$?
+echo "❌ insert_missing_translations.py failed with exit code $exit_code"
+exit $exit_code
+fi
 else
 echo "⚠️ No successful translations to process"
 fi
@@ -536,14 +587,7 @@ jobs:

 🤖 **AI-Powered Translation Applied with Enhanced Matrix Processing**:
 - Automatically extracted missing translations using `extract_missing_translations.py`
-- Used GitHub Actions matrix strategy to process numbered files in parallel
 - Applied AI-powered translations using GitHub Models (GPT-4o) for multiple languages
-- **GITHUB PROMPT.YML FORMAT**: Using official GitHub prompt.yml template format with separated files
-- **SEPARATED PROMPT.YML FILES**: Organized in .github/prompts/ directory for better structure
-- **CLEAN ORGANIZATION**: ai-translation-system.prompt.yml and ai-translation-user.prompt.yml separated from workflows
-- **PERSONAL ACCESS TOKEN**: Using amilcarlucas PAT for GitHub MCP access instead of GITHUB_TOKEN
-- **GITHUB MCP ENABLED**: AI can read translation files directly from repository using Model Context Protocol
-- **FILE-BASED PROMPTS**: AI reads translation files directly instead of embedding content in YAML prompts
 - Supports processing unlimited translations per language with automatic chunking
 - Inserted translated strings into .po files using `insert_missing_translations.py`
 - Compiled binary .mo files for immediate use
@@ -553,23 +597,12 @@ jobs:
 **Languages processed**: Portuguese (pt), German (de), Italian (it), Japanese (ja), Chinese Simplified (zh_CN)

 **Enhanced Matrix Processing & Scaling**:
-- ✅ **Parallel processing** of translation files for better performance
 - ✅ **Automatic chunking** when >50 strings per language (configurable)
 - ✅ **Robust error handling** for failed AI translation requests with detailed debugging
 - ✅ **File validation** before and after AI processing
 - ✅ **Consistent terminology** guidelines applied across all chunks for each language
 - Robust error handling for failed AI translation requests

-**Technical Improvements Made**:
-- 🔧 **Organized prompt structure**: Moved prompt files to .github/prompts/ directory to avoid confusion with workflows
-- 🔧 **GitHub MCP enabled**: AI can read translation files directly from repository using Model Context Protocol
-- 🔧 **Separated prompt architecture**: System prompt and user prompt in separate files for better maintainability
-- 🔧 **File-based AI prompts**: AI reads translation files directly, eliminating YAML content embedding issues
-- 🔧 **Reusable system prompts**: System prompt can be reused across different translation tasks
-- 🔧 **Enhanced reliability**: No more YAML syntax issues from embedded content with special characters
-- 🔧 **Better scalability**: File-based approach handles large translation batches without prompt size limits
-- 🔧 **403 error fix**: Enabled GitHub MCP to resolve permission issues when reading repository files
-
 **Translation Guidelines Applied**:
 - Technical aviation/drone context preservation
 - Formal register for technical documentation
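
The cleanup step added in this commit keeps only lines of the AI response that match `^[0-9]+:` and fails when none survive. A minimal Python sketch of the same filtering idea is shown below, assuming the "line_number:text" format described in the prompt. `clean_ai_output` and the sample response are hypothetical and are not taken from `insert_missing_translations.py`.

```python
import re

# Same shape as the workflow's grep pattern: digits at line start, then a colon.
VALID_LINE = re.compile(r"^([0-9]+):(.*)$")


def clean_ai_output(raw_text: str) -> dict:
    """Return {line_number: translated_text} for lines shaped like 'number:text'.

    Anything else (chatty preambles, code fences, blank lines) is dropped.
    Raises ValueError when nothing valid is left, mirroring the workflow's exit 1.
    """
    translations = {}
    for line in raw_text.splitlines():
        match = VALID_LINE.match(line)
        if match:
            # Only the first colon separates number from text; later colons
            # stay inside the translated string.
            translations[int(match.group(1))] = match.group(2)
    if not translations:
        raise ValueError("no valid 'number:text' lines found in AI output")
    return translations


if __name__ == "__main__":
    # Invented sample response with a preamble and a stray code fence.
    raw = "Here are the translations:\n1:Abrir ficheiro\n2:Nota: guardar\n```\n"
    print(clean_ai_output(raw))
```

Filtering by shape instead of trusting the whole response tolerates extra commentary from the model, at the cost of silently dropping any translation line the model formatted incorrectly.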
