Commit 6d4d51a

updated trimming prompt and removes double slashes from output
1 parent 06cd767 commit 6d4d51a

1 file changed

+31
-27
lines changed

conversion2025/mathpix_to_llm_with_lines_to_api.ipynb

Lines changed: 31 additions & 27 deletions
Original file line number | Diff line number | Diff line change
@@ -1291,20 +1291,21 @@
12911291
" end: str = Field(..., description=\"The end position of the trim.\")\n",
12921292
"\n",
12931293
"llm_task_trim_content = f\"\"\"\n",
1294-
" You will be given the full text of a question, extracted from a markdown file by line numbers.\n",
1294+
" You will be given the full text of a question, extracted from a markdown file using line numbers.\n",
1295+
" Assuming the extracted text is correct, then only the start of the first and the end of last lines may contain unwanted text.\n",
12951296
" The first and last lines may contain unwanted text, such as:\n",
12961297
" - Question numbering (e.g. \"1.\", \"2.\", \"(a)\", \"(b)\", \"i.\", \"ii.\", etc.)\n",
12971298
" - Text from the previous or next question.\n",
1299+
" We want to remove this unwanted text.\n",
12981300
"\n",
12991301
" Focus only on the actual stem (content) of the question.\n",
13001302
"\n",
13011303
" Your task is to, using the full question as guidance:\n",
1302-
" - From the first line, identify the exact substring where the stem begins, without the unwanted text, and put it in `start`.\n",
1303-
" - From the last line, identify the exact substring where the stem ends, without the unwanted text, and put it in `end`.\n",
1304-
" - Ensure that the substrings are taken verbatim from the original text, so they can be located precisely in Python code.\n",
1305-
" - Try to output as little as possible.\n",
1306-
"\n",
1307-
" We assume that the middle of the stem is always correct, so only the start and end may need trimming.\n",
1304+
" - First and Last line of the stem may be the same if the stem is only one line.\n",
1305+
" - From the first line, identify the exact substring where the wanted text begins, and put it in `start`.\n",
1306+
" - From the last line, identify the exact substring where the wanted text ends and put it in `end`.\n",
1307+
" - These two substrings will be used to find the start and end index using regex afterwards, so try to use as few words as possible.\n",
1308+
" - Overlapping between start and end is allowed.\n",
13081309
"\n",
13091310
" Example #1:\n",
13101311
" first line: \"1. A man is going up hill at 1m/s\"\n",
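Review note: the revised prompt asks the model for two short verbatim substrings, `start` and `end`, which are then located in the text. Below is a minimal sketch of that lookup. It assumes plain substring search (as the `solution_text.index(...)` calls later in this diff do) rather than regex, and it assumes the end marker begins at or after the start marker. The name `trim_by_markers` and the sample strings are hypothetical and do not come from the notebook.

```python
def trim_by_markers(text: str, start_marker: str, end_marker: str) -> str:
    """Return the slice of `text` from the start marker through the end marker.

    Hypothetical helper for illustration only.
    """
    # str.index raises ValueError if the marker is not found verbatim.
    start = text.index(start_marker)
    # Search from `start` so an end marker that overlaps the start marker
    # (or a one-line stem) is still found.
    end = text.index(end_marker, start) + len(end_marker)
    return text[start:end]


# Multi-word stem with numbering and text from the next question around it.
line = "1. A man walks uphill at 1 m/s. 2. A car"
print(trim_by_markers(line, "A man", "1 m/s."))

# Overlapping markers: `end` lies inside the span matched by `start`.
print(trim_by_markers("(a) x = 5 (b)", "x = 5", "= 5"))
```

Keeping the markers short, as the prompt requests, lowers the chance that the model alters a character and the verbatim lookup fails.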
@@ -1315,20 +1316,21 @@
13151316
" \"\"\"\n",
13161317
"\n",
13171318
"llm_task_trim_part = f\"\"\"\n",
1318-
" You will be given the full text of a question, extracted from a markdown file by line numbers.\n",
1319+
" You will be given the full text of a question, extracted from a markdown file using line numbers.\n",
1320+
" Assuming the extracted text is correct, then only the start of the first and the end of last lines may contain unwanted text.\n",
13191321
" The first and last lines may contain unwanted text, such as:\n",
13201322
" - Question numbering (e.g. \"1.\", \"2.\", \"(a)\", \"(b)\", \"i.\", \"ii.\", etc.)\n",
13211323
" - Text from the previous or next question.\n",
1324+
" We want to remove this unwanted text.\n",
13221325
"\n",
13231326
" Focus only on one sub-question (part) of the question, specified later.\n",
13241327
"\n",
13251328
" Your task is to, using the full question as guidance:\n",
1326-
" - Identify the exact substring where the sub-question begins, without the unwanted text, and put it in `start`.\n",
1327-
" - Identify the exact substring where the sub-question ends, without the unwanted text, and put it in `end`.\n",
1328-
" - Ensure that the substrings are taken verbatim from the original text, so they can be located precisely in Python code.\n",
1329-
" - Try to output as little as possible.\n",
1330-
"\n",
1331-
" We assume that the middle of the sub-question is always correct, so only the start and end may need trimming.\n",
1329+
" - First and Last line of the sub-question may be the same if the sub-question is only one line.\n",
1330+
" - From the first line, identify the exact substring where the wanted text begins, and put it in `start`.\n",
1331+
" - From the last line, identify the exact substring where the wanted text ends and put it in `end`.\n",
1332+
" - These two substrings will be used to find the start and end index using regex afterwards, so try to use as few words as possible.\n",
1333+
" - Overlapping between start and end is allowed.\n",
13321334
"\n",
13331335
" Example #1:\n",
13341336
" first line: \"answer the following question: (a) what is his speed?\"\n",
@@ -1339,20 +1341,21 @@
13391341
" \"\"\"\n",
13401342
"\n",
13411343
"llm_task_trim_part_solution = f\"\"\"\n",
1342-
" You will be given the full text of a question, extracted from a markdown file by line numbers.\n",
1344+
" You will be given the full text of a question, extracted from a markdown file using line numbers.\n",
1345+
" Assuming the extracted text is correct, then only the start of the first and the end of last lines may contain unwanted text.\n",
13431346
" The first and last lines may contain unwanted text, such as:\n",
13441347
" - Question numbering (e.g. \"1.\", \"2.\", \"(a)\", \"(b)\", \"i.\", \"ii.\", etc.)\n",
13451348
" - Text from the previous or next question.\n",
1349+
" We want to remove this unwanted text.\n",
13461350
"\n",
1347-
" Focus only on one part-solution of a sub-question of the question, specified later.\n",
1351+
" Focus only on one part-solution of a sub-question (part) of the question, specified later.\n",
13481352
"\n",
13491353
" Your task is to, using the full question as guidance:\n",
1350-
" - Identify the exact substring where the part-solution begins, without the unwanted text, and put it in `start`.\n",
1351-
" - Identify the exact substring where the part-solution ends, without the unwanted text, and put it in `end`.\n",
1352-
" - Ensure that the substrings are taken verbatim from the original text, so they can be located precisely in Python code.\n",
1353-
" - Try to output as little as possible.\n",
1354-
"\n",
1355-
" We assume that the middle of the part-solution is always correct, so only the start and end may need trimming.\n",
1354+
" - First and Last line of the part-solution may be the same if the part-solution is only one line.\n",
1355+
" - From the first line, identify the exact substring where the wanted text begins, and put it in `start`.\n",
1356+
" - From the last line, identify the exact substring where the wanted text ends and put it in `end`.\n",
1357+
" - These two substrings will be used to find the start and end index using regex afterwards, so try to use as few words as possible.\n",
1358+
" - Overlapping between start and end is allowed.\n",
13561359
"\n",
13571360
" Example #1:\n",
13581361
" first line: \"A: (a) 2 + 3 = 5\"\n",
@@ -1384,7 +1387,7 @@
13841387
" Full question:\n",
13851388
" {question}\n",
13861389
"\n",
1387-
" Stem (content) of the question:\n",
1390+
" Stem (content) of the question to extract from:\n",
13881391
" {content_text}\n",
13891392
"\n",
13901393
" Return the JSON now.\n",
@@ -1428,7 +1431,7 @@
14281431
" Full question:\n",
14291432
" {question}\n",
14301433
"\n",
1431-
" specific sub-question (part) of the question:\n",
1434+
" specific sub-question (part) of the question to extract from:\n",
14321435
" {part_text}\n",
14331436
"\n",
14341437
" Return the JSON now.\n",
@@ -1472,7 +1475,7 @@
14721475
" Full question:\n",
14731476
" {question}\n",
14741477
"\n",
1475-
" Specific part-solution of the question:\n",
1478+
" Specific part-solution of the question to extract from:\n",
14761479
" {solution_text}\n",
14771480
"\n",
14781481
" Return the JSON now.\n",
@@ -1484,13 +1487,14 @@
14841487
"\n",
14851488
" try:\n",
14861489
" parsed_output = solution_parser.parse(response.content)\n",
1487-
" start = solution_text.index(parsed_output.start)\n",
1488-
" end = solution_text.index(parsed_output.end) + len(parsed_output.end)\n",
1490+
" start = solution_text.index(parsed_output.start.replace('\\\\\\\\', '\\\\'))\n",
1491+
" end = solution_text.index(parsed_output.end.replace('\\\\\\\\', '\\\\')) + len(parsed_output.end)\n",
14891492
" print(f\"Successfully trimmed part-solution for question {question_number}, part {part_number}.\")\n",
14901493
"\n",
14911494
" return improve_trim(solution_text, start, end)\n",
14921495
" except Exception as e:\n",
14931496
" print(f\"Error parsing LLM response as JSON for trimming solution part for question {question_number}, part {part_number}\")\n",
1497+
" print(response.content)\n",
14941498
" print(f\"Retrying... Attempt No.{attempt_idx + 1}\")\n",
14951499
" time.sleep(2)\n",
14961500
"\n",

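Review note on the last hunk: the added lines collapse doubled backslashes in `parsed_output.start` and `parsed_output.end` before calling `index`, but the end offset is still computed with `len(parsed_output.end)`, the un-normalized string. If a replacement happened, the end index overshoots by the number of removed backslashes. One possible fix, sketched below, is to normalize each marker once and reuse the normalized value for both the search and the length. The function name `locate_span` and the sample text are hypothetical.

```python
def locate_span(text: str, start_marker: str, end_marker: str) -> tuple[int, int]:
    """Return (start, end) offsets of the span delimited by the two markers.

    Hypothetical helper for illustration only.
    """
    # Collapse each doubled backslash to a single one, mirroring the
    # .replace(...) calls added in the diff.
    start_marker = start_marker.replace("\\\\", "\\")
    end_marker = end_marker.replace("\\\\", "\\")

    start = text.index(start_marker)
    # Use the normalized marker for the length as well, so the end offset
    # matches the text that was actually found.
    end = text.index(end_marker) + len(end_marker)
    return start, end


text = r"A: (a) $\frac{1}{2}$ B:"
# The model output is assumed to contain doubled backslashes.
start, end = locate_span(text, r"$\\frac", r"\\frac{1}{2}$")
print(text[start:end])
```

With the un-normalized length, the same call would return an end offset one character too far and include the trailing space.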