|
404 | 404 | " questions: list[QuestionModelLines] = Field(..., description=\"A list of questions.\")\n", |
405 | 405 | "\n", |
406 | 406 | "llm_task_seperate_questions = \"\"\"\n", |
407 | | - " Your task is to extract the line numbers for the start and end of each question and solution from the markdown file, then format it as a JSON object.\n", |
| 407 | + " Your task is to extract the line numbers for the start and end of all the questions and solutions from the markdown file, then format it as a JSON object.\n", |
| 408 | + " Note that a question and its solution may not be located near each other in the markdown file.\n", |
408 | 409 | " These line numbers will be used later to extract the content of the questions and solutions procedurally.\n", |
409 | 410 | " \n", |
410 | 411 | " 1. **Content Extraction:**\n", |
|
413 | 414 | " - Begin by identifying all the questions in the markdown file, and for each question:\n", |
414 | 415 | " - Identify the start and end line numbers of the full question content, and place them in `question_content_start` and `question_content_end`.\n", |
415 | 416 | " - Identify the start and end line numbers of the full relevant solution content, and place them in `solution_content_start` and `solution_content_end`.\n", |
416 | | - " - Be careful to ensure that everything related to the question and solution is included, including any math delimiters and LaTeX formatting.\n", |
| 417 | + " - Be careful to ensure that everything related to the question and solution is included, including any math delimiters ($, $$) and LaTeX formatting.\n", |
417 | 418 | " - Do not forget to include any images or figures that are part of the question or solution.\n", |
418 | 419 | " \n", |
419 | 420 | " 2. **Output Format:**\n", |
|
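The prompt above asks for start and end line numbers that are later used to cut the question and solution out of the markdown procedurally, and it warns that a solution may sit far from its question. The minimal sketch below shows that slicing step. It assumes 0-based line numbers and inclusive end indices, which is what the `example_seperate_parts_question` string added in this commit implies (`content_start: 0, content_end: 0` selects line 0). The helper names `number_lines`, `slice_lines` and `extract_question` and the sample markdown are hypothetical; only the four `*_start`/`*_end` field names come from the prompt.

```python
def number_lines(markdown: str) -> list[tuple[int, str]]:
    """Pair each markdown line with a 0-based index, like the (index, text) tuples in the parts example."""
    return list(enumerate(markdown.splitlines()))


def slice_lines(lines: list[str], start: int, end: int) -> str:
    """Return lines start..end inclusive; an end before start selects nothing."""
    if end < start:
        return ""
    return "\n".join(lines[start : end + 1])


def extract_question(markdown: str, spans: dict[str, int]) -> dict[str, str]:
    """Cut a question and its (possibly distant) solution out of the markdown."""
    lines = markdown.splitlines()
    return {
        "question": slice_lines(lines, spans["question_content_start"], spans["question_content_end"]),
        "solution": slice_lines(lines, spans["solution_content_start"], spans["solution_content_end"]),
    }


# Hypothetical markdown in which all solutions are collected at the end.
doc = "\n".join([
    "Q1. Solve $x + 1 = 2$.",
    "![diagram](fig1.png)",
    "Q2. Solve $2x = 6$.",
    "## Solutions",
    "S1. $x = 1$",
    "S2. $x = 3$",
])

# Spans an LLM might return for Q1 (field names taken from the prompt).
spans = {
    "question_content_start": 0,
    "question_content_end": 1,
    "solution_content_start": 4,
    "solution_content_end": 4,
}

print(number_lines(doc)[4])  # (4, 'S1. $x = 1$')
print(extract_question(doc, spans))
```

Keeping the figure line inside the question span is exactly what the "do not forget to include any images or figures" instruction protects; if the LLM returned `question_content_end: 0`, the image would be silently dropped.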
568 | 569 | " \"\"\"\n", |
569 | 570 | " Initialize the Set_Question_With_Solution with a question and its solution.\n", |
570 | 571 | " \n", |
571 | | - " Args:\n", |
| 572 | + " Args: \n", |
572 | 573 | " question (Set_Question): The question object.\n", |
573 | 574 | " solution (Set_Solution): The solution object.\n", |
574 | 575 | " \"\"\"\n", |
|
625 | 626 | "llm_task_seperate_parts_question = r\"\"\"\n", |
626 | 627 | " 1. **Content Extraction:**\n", |
627 | 628 | " - You may choose the `title` for the question.\n", |
628 | | - " - From the input `Full Question Content`, identify the start line and end line for the main introductory text (the stem), place them in `content_start` and `content_end`. \n", |
| 629 | + " - From the input `Full Question Content`, identify the start line and end line for the main introductory text (the stem), place them in `content_start` and `content_end`.\n", |
629 | 630 | " - From the input `Full Question Content`, identify and separate all the `parts` (sub-questions). They may be explicit (e.g. using \"(a)\", \"(b)\", \"i.\", \"ii.\", etc.) but may also be implied. For each identified sub-question:\n", |
630 | 631 | " - Place the start line going into `part_start` and the end line going into `part_end`.\n", |
631 | 632 | " - If the question has no sub-questions, leave `part_start` as 0 and `part_end` as -1.\n", |
632 | 633 | " - You may use the `Full Solution Content` to help with identifying the parts.\n", |
633 | | - " - Be careful to ensure that everything related to the question stem/parts is included, including any math delimiters and LaTeX formatting.\n", |
| 634 | + " - Be careful to ensure that everything related to the question stem/parts is included, including any math delimiters ($, $$) and LaTeX formatting.\n", |
634 | 635 | " - Do not forget to include any images or figures that are part of the question stem, parts or solution.\n", |
635 | 636 | " - Ensure no solution content is included in the `content` or `parts` fields.\n", |
636 | 637 | " \n", |
|
639 | 640 | " - Do NOT include any explanations, comments, or markdown code blocks (like ```json).\n", |
640 | 641 | " \"\"\"\n", |
641 | 642 | "\n", |
| 643 | + "example_seperate_parts_question = r\"\"\"\n", |
| 644 | + " example:\n", |
| 645 | + " [(0, \"Q1. find value of $x$ in the following equation:\"),\n", |
| 646 | + " (1, \"i. $x + 1 = 2$\"),\n", |
| 647 | + " (2, \"ii. $x - 1 = 5$\")]\n", |
| 648 | + "\n", |
| 649 | + " should be converted to:\n", |
| 650 | + " {\n", |
| 651 | + " \"title\": \"suitable title\",\n", |
| 652 | + " \"content_start\": 0,\n", |
| 653 | + " \"content_end\": 0,\n", |
| 654 | + " \"parts\": [\n", |
| 655 | + " {\n", |
| 656 | + " \"part_start\": 1,\n", |
| 657 | + " \"part_end\": 1\n", |
| 658 | + " },\n", |
| 659 | + " {\n", |
| 660 | + " \"part_start\": 2,\n", |
| 661 | + " \"part_end\": 2\n", |
| 662 | + " }\n", |
| 663 | + " ]\n", |
| 664 | + " }\n", |
| 665 | + " \"\"\"\n", |
| 666 | + "\n", |
642 | 667 | "llm_task_seperate_parts_solution = r\"\"\"\n", |
643 | 668 | " 1. **Content Extraction:**\n", |
644 | 669 | " - From the input `full solution content`, identify the specific solution part that corresponds to the `target question part`, and place the start line and end line into `part_solution_start` and `part_solution_end`.\n", |
645 | 670 | " - If the `target question part` is empty, identify the specific solution part that corresponds to the `full question stem`.\n", |
646 | 671 | " - Use the `full question stem` and `full question parts` to help identify the specific solution part.\n", |
647 | 672 | " - Ensure that the `target question part` is used to extract the specific solution part.\n", |
648 | | - " - Be careful to ensure that everything related to the solution part is included, including any math delimiters and LaTeX formatting.\n", |
| 673 | + " - Be careful to ensure that everything related to the solution part is included, including any math delimiters ($, $$) and LaTeX formatting.\n", |
649 | 674 | " - Do not forget to include any images or figures that are part of the solution.\n", |
650 | 675 | "\n", |
651 | 676 | " 2. **Output Format:**\n", |
|
672 | 697 | "\n", |
673 | 698 | " {llm_task_seperate_parts_question}\n", |
674 | 699 | "\n", |
| 700 | + " {example_seperate_parts_question}\n", |
| 701 | + "\n", |
675 | 702 | " Full Solution Content:\n", |
676 | 703 | " {solution_input}\n", |
677 | 704 | "\n", |
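The `example_seperate_parts_question` string pairs a line-numbered input with the JSON the LLM should return. A minimal sketch of turning that JSON back into text, using the example's own data, is shown below. The helper names `span_text` and `rebuild_question` are hypothetical and not taken from the notebook; the sketch assumes inclusive end indices, which the example implies and which also makes the "no sub-questions" convention (`part_start` 0, `part_end` -1) select an empty range.

```python
import json

# Numbered input, copied from example_seperate_parts_question.
numbered = [
    (0, "Q1. find value of $x$ in the following equation:"),
    (1, "i. $x + 1 = 2$"),
    (2, "ii. $x - 1 = 5$"),
]

# The JSON the prompt asks the LLM to return for that input.
llm_output = json.loads("""
{
  "title": "suitable title",
  "content_start": 0,
  "content_end": 0,
  "parts": [
    {"part_start": 1, "part_end": 1},
    {"part_start": 2, "part_end": 2}
  ]
}
""")


def span_text(numbered, start, end):
    """Join the text of every line whose index lies in [start, end] (inclusive)."""
    return "\n".join(text for i, text in numbered if start <= i <= end)


def rebuild_question(numbered, spans):
    """Hypothetical helper: map the LLM's line spans back to stem and part text."""
    return {
        "title": spans["title"],
        "content": span_text(numbered, spans["content_start"], spans["content_end"]),
        "parts": [span_text(numbered, p["part_start"], p["part_end"]) for p in spans["parts"]],
    }


question = rebuild_question(numbered, llm_output)
print(question["parts"])  # ['i. $x + 1 = 2$', 'ii. $x - 1 = 5$']

# The "no sub-questions" convention (part_start 0, part_end -1) selects no lines.
print(repr(span_text(numbered, 0, -1)))  # ''
```

Filtering on the index rather than slicing a list means the sketch still works if the numbered input does not start at 0.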
|
782 | 809 | " ).model_dump()" |
783 | 810 | ] |
784 | 811 | }, |
| 812 | + { |
| 813 | + "cell_type": "markdown", |
| 814 | + "id": "24", |
| 815 | + "metadata": {}, |
| 816 | + "source": [ |
| 817 | + "# Remove duplicated text for single-part questions" |
| 818 | + ] |
| 819 | + }, |
785 | 820 | { |
786 | 821 | "cell_type": "code", |
787 | 822 | "execution_count": null, |
788 | | - "id": "24", |
| 823 | + "id": "25", |
| 824 | + "metadata": {}, |
| 825 | + "outputs": [], |
| 826 | + "source": [ |
| 827 | + "# class NoPartsQuestionModel(BaseModel):\n", |
| 828 | + "# \"\"\"\n", |
| 829 | + "# Represents a question without parts.\n", |
| 830 | + "# \"\"\"\n", |
| 831 | + "# hasParts: bool = Field(False, description=\"Indicates if the question has parts.\")\n", |
| 832 | + "\n", |
| 833 | + "# llm_task_remove_dupe = \"\"\"\n", |
| 834 | + "# 1. **Task:**\n", |
| 835 | + "# - Check whether the question's single part is identical to the full question content.\n", |
| 836 | + "# - If it is, remove the part and set `hasParts` to `False`.\n", |
| 837 | + "# - If it is not, set `hasParts` to `True`.\n", |
| 838 | + " \n", |
| 839 | + "# 2. **Output Format:**\n", |
| 840 | + "# - You MUST output ONLY a single, raw, valid JSON string that matches the provided schema.\n", |
| 841 | + "# - Do NOT include any explanations, comments, or markdown code blocks (like ```json).\n", |
| 842 | + "# \"\"\"\n", |
| 843 | + "# def llm_remove_dupe_part(content: str, part: str) -> bool:\n", |
| 844 | + "# return content == part\n", |
| 845 | + "\n", |
| 846 | + "def dupe_text_reduce(questions_dict: dict) -> dict:\n", |
| 847 | + " \"\"\"\n", |
| 848 | + " Blanks out the only part of a question when that part duplicates the question content.\n", |
| 849 | + " \"\"\"\n", |
| 850 | + " for question in questions_dict[\"questions\"]:\n", |
| 851 | + " parts = question[\"parts\"]\n", |
| 852 | + " if len(parts) == 1 and parts[0] == question[\"content\"]:\n", |
| 853 | + " # If the only part is identical to the content, blank it out so the text is not shown twice.\n", |
| 854 | + " question[\"parts\"][0] = \"\"\n", |
| 855 | + " \n", |
| 856 | + " return questions_dict" |
| 857 | + ] |
| 858 | + }, |
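A quick, self-contained check of `dupe_text_reduce`. The function body mirrors the cell above (with a comment that matches what the code does); the `sample` dict is hypothetical and assumes each entry in `parts` is a plain string, which is what the `parts[0] == question["content"]` comparison implies.

```python
def dupe_text_reduce(questions_dict: dict) -> dict:
    """Blank out the only part of a question when that part duplicates the question content."""
    for question in questions_dict["questions"]:
        parts = question["parts"]
        if len(parts) == 1 and parts[0] == question["content"]:
            # Mutates the list in place; the single (now empty) part is kept.
            parts[0] = ""
    return questions_dict


sample = {
    "questions": [
        # Single part identical to the stem: the duplicate is blanked.
        {"content": "Find $x$ if $x + 1 = 2$.", "parts": ["Find $x$ if $x + 1 = 2$."]},
        # Two genuine parts: left unchanged.
        {"content": "Solve for $x$:", "parts": ["i. $x + 1 = 2$", "ii. $x - 1 = 5$"]},
    ]
}

result = dupe_text_reduce(sample)
print(result["questions"][0]["parts"])  # ['']
print(result["questions"][1]["parts"])  # ['i. $x + 1 = 2$', 'ii. $x - 1 = 5$']
```

Because the function mutates `questions_dict` in place and returns the same object, `extracted_dict` in the calling cell is modified as well. The check is exact string equality, so a part that differs from the stem only in whitespace is not treated as a duplicate.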
| 859 | + { |
| 860 | + "cell_type": "code", |
| 861 | + "execution_count": null, |
| 862 | + "id": "26", |
789 | 863 | "metadata": {}, |
790 | 864 | "outputs": [], |
791 | 865 | "source": [ |
|
815 | 889 | "\n", |
816 | 890 | " extracted_dict = extract_parts_question(questions_dict)\n", |
817 | 891 | " print(\"successfully extracted the parts from the questions.\")\n", |
818 | | - " print(json.dumps(extracted_dict, indent=2))\n", |
| 892 | + " print(json.dumps(extracted_dict))\n", |
819 | 893 | " print(\"Now validating the content...\")\n", |
820 | 894 | "\n", |
821 | | - " return extracted_dict" |
| 895 | + " return dupe_text_reduce(extracted_dict)" |
822 | 896 | ] |
823 | 897 | }, |
824 | 898 | { |
825 | 899 | "cell_type": "code", |
826 | 900 | "execution_count": null, |
827 | | - "id": "25", |
| 901 | + "id": "27", |
828 | 902 | "metadata": {}, |
829 | 903 | "outputs": [], |
830 | 904 | "source": [ |
|
833 | 907 | }, |
834 | 908 | { |
835 | 909 | "cell_type": "markdown", |
836 | | - "id": "26", |
| 910 | + "id": "28", |
837 | 911 | "metadata": {}, |
838 | 912 | "source": [ |
839 | 913 | "# Displaying questions" |
|
842 | 916 | { |
843 | 917 | "cell_type": "code", |
844 | 918 | "execution_count": null, |
845 | | - "id": "27", |
| 919 | + "id": "29", |
846 | 920 | "metadata": {}, |
847 | 921 | "outputs": [], |
848 | 922 | "source": [ |
|
869 | 943 | }, |
870 | 944 | { |
871 | 945 | "cell_type": "markdown", |
872 | | - "id": "28", |
| 946 | + "id": "30", |
873 | 947 | "metadata": {}, |
874 | 948 | "source": [ |
875 | 949 | "# in2lambda to JSON" |
|
878 | 952 | { |
879 | 953 | "cell_type": "code", |
880 | 954 | "execution_count": null, |
881 | | - "id": "29", |
| 955 | + "id": "31", |
882 | 956 | "metadata": {}, |
883 | 957 | "outputs": [], |
884 | 958 | "source": [ |
|