German synthetic data generation #403

Open
manalilatkar wants to merge 3 commits into IBM:main from manalilatkar:german_sdg
manalilatkar (Member) commented Mar 3, 2026

  • Added German prompts and a German flow YAML
  • Added 2 more parsing blocks to the last LLM block
  • Fixed a bug in the chunking process
  • Fixed the JSONL-to-CSV conversion script
  • Added the newly created German golden dataset

Signed-off-by: Manali Latkar <manali.latkar@ibm.com>

dharaneeshvrd (Member) left a comment

Apart from the minor comment below, I have no other comments.
@iv1111, please review the German prompt and the generated golden dataset.

- answer = item.get('answer', [{}])
- if not type(answer) is list or len(answer) == 0:
-     answer = [{}]
+ answer = item.get('answer', '')
With the default '', this never returns None for a missing key; it returns '' instead.

Suggested change
- answer = item.get('answer', '')
+ answer = item.get('answer')

this will
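
To illustrate the point of the suggestion, here is a minimal sketch of how `dict.get` behaves with and without a default. The record contents and the helper names `get_answer_with_default` and `get_answer_without_default` are hypothetical and are not taken from the PR; only the two `item.get(...)` calls mirror the lines under review.

```python
# Hypothetical records standing in for parsed JSONL items (not from the PR).
item_missing = {"question": "Was ist SDG?"}
item_present = {"question": "Was ist SDG?", "answer": "Synthetic data generation"}


def get_answer_with_default(item):
    # Mirrors the line under review: a missing key falls back to ''.
    return item.get("answer", "")


def get_answer_without_default(item):
    # Mirrors the suggested change: a missing key yields None.
    return item.get("answer")


# A missing key is only distinguishable from an empty answer without the default.
print(repr(get_answer_with_default(item_missing)))
print(repr(get_answer_without_default(item_missing)))

# When the key is present, both forms return the stored value.
print(get_answer_with_default(item_present) == get_answer_without_default(item_present))
```

If downstream code checks `answer is None` to detect a missing answer, only the second form would trigger that check. Whether the surrounding code does so is not visible in this thread.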


2 participants