lambda-feedback
diff --git a/conversion2025/README.md b/conversion2025/README.md
Lines changed: 28 additions & 15 deletions
diff --git a/conversion2025/assumptions.txt b/conversion2025/assumptions.txt
Lines changed: 7 additions & 9 deletions
diff --git a/conversion2025/mathpix_to_llm_with_lines_to_api.ipynb b/conversion2025/converter.ipynb
conversion2025/mathpix_to_llm_with_lines_to_api.ipynb renamed to conversion2025/converter.ipynb
Lines changed: 9 additions & 5 deletions
@@ -1,33 +1,46 @@
 # README
 
 ## Overview
-This Jupyter Notebook (`file_name.ipynb`) is designed for processing scientific documents, extracting mathematical expressions, and formatting them in Markdown. It leverages Mathpix and OpenAI's LLM capabilities for text transformation.
+This Jupyter Notebook (`converter.ipynb`) processes question sets and tutorials: it extracts mathematical expressions, formats them in Markdown, and then converts the result into Lambda Feedback compatible JSON files. It uses Mathpix, OpenAI's LLM capabilities, and post-processing for text transformation.
 
 ## Requirements
 Ensure you have the following installed:
 - Python 3.8+
 - `pip install -r requirements.txt`
 
 ## Setup
-1. Create a `.env` file in the root directory and add your OpenAI API keys:
+1. Create a `.env` file in the root directory and add your OpenAI and MathPix API keys:
    ```env
    OPENAI_API_KEY=<your-openai-api-key>
-   OPENAI_MODEL=<your-openai-model>
    MATHPIX_API_KEY=<your-mathpix-key>
    MATHPIX_APP_ID=<your-mathpix-id>
    ```
-4. Open `file_name.ipynb` and execute the cells to process your documents.
+2. Open `converter.ipynb` and execute the cells to process your documents.
 
-## Notes
-- Ensure your API key and endpoint are correct, as they are required for LLM functionality.
-- The notebook is designed for scientific documents, but can be extended to other text formats.
+#### Notes
+- Ensure your API keys are correct, as they are required for both the Mathpix and the LLM steps.
 
 ## How to use
-Place a pdf of your choice into the folder, `/conversion_content`. Name the pdf file as `example.pdf`.
-Run the converter in Jupiter. A folder with all the convertion content will be produced.
-for `mathpix_to_llm_to_in2lambda_to_JSON.ipynb`, it will produce a folder called `/mathpix_to_llm_to_in2lambda_to_JSON_out`.
-This will contain all the output of the converter.
-
-There is a markdown file called `example.md` inside `/mathpix_to_llm_to_in2lambda_to_JSON_out`, this is the markdown version of the pdf.
-As Mathpix rather reliably generates a consistent markdown version of the pdf, the converter will simply start from `example.md`.
-Meaning that if you wish to convert a different pdf, you must delete `example.md` first.
+1. Place the pdf containing your set of questions into the folder `/conversion_content/input`.
+
+2. Ensure that only one pdf is in the `input` folder, as the converter picks a single pdf from the folder in an undefined manner (likely alphabetically).
+
+3. Run the converter in Jupyter. The folder `/conversion_content/converter` will be created if it does not exist yet.
+
+#### Conversion process
+1. Within `/conversion_content/converter`, the converter creates another folder, `/conversion_content/converter/media`, which holds all the images that Mathpix extracted.
+
+2. A file called `example.md` is created within `/conversion_content/converter` if it does not exist yet. This is the markdown file produced by Mathpix after scanning the pdf.
+
+3. Note that the current program keeps using the same `example.md` unless it is deleted. This reduces Mathpix usage, as Mathpix almost always produces an identical markdown file for the same pdf.
+This means that to convert a different pdf file, you must also delete `example.md`.
+
+#### Notes
+Please read `assumptions.txt` for things that the converter assumes. If these assumptions are not obeyed, the converter may struggle and produce odd results.
+
+## Evaluation
+I believe this converter should be able to be integrated with the platform's API.
+
+The converter does have its flaws, and there are definitely areas where it can still improve, such as reliably handling extremely messy inputs or producing more than just questions and solutions (answer box).
+
+Within the boundary of its assumptions, however, it works very reliably and well.
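The input and caching rules described in the README (exactly one pdf in the input folder, and `example.md` reused until it is deleted) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the notebook's actual code: `pick_input_pdf`, `load_or_convert` and the `convert` callback are hypothetical names, and the real Mathpix call is replaced by whatever function the caller passes in.

```python
from pathlib import Path
from typing import Callable


def pick_input_pdf(input_dir: Path) -> Path:
    """Return the only pdf in input_dir, failing loudly if there are 0 or 2+."""
    pdfs = sorted(input_dir.glob("*.pdf"))
    if len(pdfs) != 1:
        raise ValueError(
            f"expected exactly 1 pdf in {input_dir}, found {len(pdfs)}"
        )
    return pdfs[0]


def load_or_convert(
    pdf: Path, output_dir: Path, convert: Callable[[Path], str]
) -> str:
    """Reuse output_dir/example.md if it exists; otherwise convert and cache.

    `convert` is a hypothetical stand-in for the Mathpix step.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    cached = output_dir / "example.md"
    if cached.exists():
        # Cached markdown wins, so delete example.md before switching pdfs.
        return cached.read_text(encoding="utf-8")
    markdown = convert(pdf)
    cached.write_text(markdown, encoding="utf-8")
    return markdown
```

Raising on anything other than a single pdf is a stricter choice than the "undefined manner" behaviour the README describes; it trades convenience for not silently converting the wrong file.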
@@ -1,10 +1,8 @@
 assumptions:
-    - only 1 set of questions
-    - the set of question contains the questions AND solutions
-    - parts are only 1 level deep (i.e. no Q1, part a), i)
-    - individual questions and solutions are seperatable by using just lines
-    - all parts are explicitly enumerated
-    - Chunky Independent Maths are deperated (otherwise Mathpix will not be able to seperate them)
-
-
-parts needs to be ordered
+    - there is only 1 set of questions in the pdf.
+    - the set of questions contains the questions AND solutions.
+    - parts are only 1 level deep (i.e. no subquestions within a subquestion).
+    - individual questions and solutions are separable using just lines (the chunk of text holding a question/solution contains all of it and nothing else, including images).
+    - all parts are explicitly enumerated (the converter may struggle with implied subquestions).
+    - independent math equations are separated (if equations belonging to different questions are clustered together, MathPix may treat them as one and put them under the same math delimiter; this rarely happens).
+    - dollar signs are used as math delimiters only.
@@ -38,7 +38,7 @@
     "from langchain_openai import ChatOpenAI\n",
     "from langchain.output_parsers import PydanticOutputParser\n",
     "\n",
-    "from in2lambda.api.module import Module\n",
+    "from in2lambda.api.set import Set\n",
     "from in2lambda.api.question import Question\n",
     "from in2lambda.api.part import Part\n",
     "\n",
@@ -126,7 +126,7 @@
     "# location of the output folder and media folder.\n",
     "folder_path = \"conversion_content\"\n",
     "input_path = f\"{folder_path}/input\"\n",
-    "output_path = f\"{folder_path}/mathpix_to_llm_with_lines_to_api\"\n",
+    "output_path = f\"{folder_path}/converter\"\n",
     "media_path = f\"{output_path}/media\"\n",
     "\n",
     "# Create output and media directories if they do not exist.\n",
@@ -469,7 +469,9 @@
     "    classes = []\n",
     "\n",
     "    while index < len(md_content):\n",
+    "\n",
     "        # no need to check index range since there is always at least 2 characters left\n",
+    "        # display math indicator\n",
     "        if md_content[index: index+2] == \"$$\":\n",
     "            display_math = []\n",
     "            index += 2\n",
@@ -479,6 +481,7 @@
     "            classes.append(DisplayMath(\"\".join(display_math)))\n",
     "            index += 2\n",
     "        \n",
+    "        # inline math indicator\n",
     "        elif md_content[index] == \"$\":\n",
     "            inline_math = []\n",
     "            index += 1\n",
@@ -488,6 +491,7 @@
     "            classes.append(InlineMath(\"\".join(inline_math)))\n",
     "            index += 1\n",
     "\n",
+    "        # otherwise just regular text\n",
     "        else:\n",
     "            regular_text = []\n",
     "            while index < len(md_content) and md_content[index] != \"$\":\n",
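The loop in this hunk splits the markdown into display math (`$$...$$`), inline math (`$...$`) and regular text. The standalone sketch below shows the same idea, and why `assumptions.txt` requires dollar signs to be used as math delimiters only. It is not the notebook's code: `Segment` and `split_math` are hypothetical names, it uses `str.find` instead of character-by-character loops, and it raises on an unclosed delimiter, which the notebook may not do.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    kind: str      # "display", "inline" or "text"
    content: str   # text between the delimiters, or the plain text itself


def split_math(md: str) -> List[Segment]:
    """Split markdown into display-math, inline-math and text segments."""
    segments: List[Segment] = []
    i = 0
    while i < len(md):
        if md.startswith("$$", i):
            # Display math: everything up to the next "$$".
            end = md.find("$$", i + 2)
            if end == -1:
                raise ValueError(f"unclosed $$ at offset {i}")
            segments.append(Segment("display", md[i + 2:end]))
            i = end + 2
        elif md[i] == "$":
            # Inline math: everything up to the next "$".
            end = md.find("$", i + 1)
            if end == -1:
                raise ValueError(f"unclosed $ at offset {i}")
            segments.append(Segment("inline", md[i + 1:end]))
            i = end + 1
        else:
            # Regular text: everything up to the next "$" or end of input.
            end = md.find("$", i)
            if end == -1:
                end = len(md)
            segments.append(Segment("text", md[i:end]))
            i = end
    return segments
```

For example, `split_math("Cost $x$ and $$y=1$$ end")` yields text, inline, text, display and text segments in that order, while a stray currency sign such as `"costs $5"` raises `ValueError` because the `$` is never closed.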
@@ -1616,7 +1620,7 @@
    "source": [
     "questions = full_json_question_set[\"questions\"]\n",
     "\n",
-    "in2lambda_questions = []\n",
+    "in2lambda_set = []\n",
     "\n",
     "# Loop over all questions and question_answers and use in2lambda API to create a JSON.\n",
     "for question_idx, question_dict in enumerate(questions, start=1):\n",
@@ -1647,10 +1651,10 @@
     "        parts=parts,\n",
     "        images=image_paths\n",
     "    )\n",
-    "    in2lambda_questions.append(question)\n",
+    "    in2lambda_set.append(question)\n",
     "\n",
     "try:\n",
-    "    Module(in2lambda_questions).to_json(f\"{output_path}/out\")\n",
+    "    Set(questions=in2lambda_set).to_json(f\"{output_path}/out\")\n",
     "    print(\"JSON output successfully created.\")\n",
     "except Exception as e:\n",
     "    print(f\"Error creating JSON output: {e}\")"