|
1 | 1 | # README |
2 | 2 |
|
3 | 3 | ## Overview |
4 | | -This Jupyter Notebook (`file_name.ipynb`) is designed for processing scientific documents, extracting mathematical expressions, and formatting them in Markdown. It leverages Mathpix and OpenAI's LLM capabilities for text transformation. |
| 4 | +This Jupyter Notebook (`converter.ipynb`) is designed for processing question sets and tutorials: it extracts mathematical expressions, formats them in Markdown, and then converts the result into Lambda Feedback-compatible JSON files. It leverages Mathpix, OpenAI's LLM capabilities, and post-processing for text transformation. |
5 | 5 |
|
6 | 6 | ## Requirements |
7 | 7 | Ensure you have the following installed: |
8 | 8 | - Python 3.8+ |
9 | 9 | - `pip install -r requirements.txt` |
10 | 10 |
|
11 | 11 | ## Setup |
12 | | -1. Create a `.env` file in the root directory and add your OpenAI API keys: |
| 12 | +1. Create a `.env` file in the root directory and add your OpenAI and Mathpix API keys: |
13 | 13 | ```env |
14 | 14 | OPENAI_API_KEY=<your-openai-api-key> |
15 | | - OPENAI_MODEL=<your-openai-model> |
16 | 15 | MATHPIX_API_KEY=<your-mathpix-key> |
17 | 16 | MATHPIX_APP_ID=<your-mathpix-id> |
18 | 17 | ``` |
19 | | -4. Open `file_name.ipynb` and execute the cells to process your documents. |
| 18 | +2. Open `converter.ipynb` and execute the cells to process your documents. |
20 | 19 |
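The Setup steps above can be checked before running the notebook. The following is a minimal sketch, not the notebook's own loading code: `parse_env` and `missing_keys` are hypothetical helper names, and the sketch assumes the `.env` file uses plain `KEY=VALUE` lines as shown in the Setup section.

```python
from pathlib import Path

# Key names taken from the Setup section of this README.
REQUIRED_KEYS = ("OPENAI_API_KEY", "MATHPIX_API_KEY", "MATHPIX_APP_ID")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still a <placeholder>."""
    return [k for k in required
            if not values.get(k) or values[k].startswith("<")]


if __name__ == "__main__":
    env_path = Path(".env")  # assumed location: the repository root
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    print("Missing keys:", missing_keys(parse_env(text)))
```

An empty list means all three values are present and no longer look like the `<your-...>` placeholders from the template.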
|
21 | | -## Notes |
22 | | -- Ensure your API key and endpoint are correct, as they are required for LLM functionality. |
23 | | -- The notebook is designed for scientific documents, but can be extended to other text formats. |
| 20 | +#### Notes |
| 21 | +- Ensure your API keys are correct, as they are required for LLM functionality. |
24 | 22 |
|
25 | 23 | ## How to use |
26 | | -Place a pdf of your choice into the folder, `/conversion_content`. Name the pdf file as `example.pdf`. |
27 | | -Run the converter in Jupiter. A folder with all the convertion content will be produced. |
28 | | -for `mathpix_to_llm_to_in2lambda_to_JSON.ipynb`, it will produce a folder called `/mathpix_to_llm_to_in2lambda_to_JSON_out`. |
29 | | -This will contain all the output of the converter. |
30 | | - |
31 | | -There is a markdown file called `example.md` inside `/mathpix_to_llm_to_in2lambda_to_JSON_out`, this is the markdown version of the pdf. |
32 | | -As Mathpix rather reliably generates a consistent markdown version of the pdf, the converter will simply start from `example.md`. |
33 | | -Meaning that if you wish to convert a different pdf, you must delete `example.md` first. |
| 24 | +1. Place the PDF containing your question set into the folder `/conversion_content/input`. |
| 25 | + |
| 26 | +2. Ensure the `./input` folder contains only one PDF, because the converter picks a single PDF from the folder in an unspecified order (likely alphabetical). |
| 27 | + |
| 28 | +3. Run the converter in Jupyter. The folder `/conversion_content/converter` will be created if it does not exist yet. |
| 29 | + |
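Because the converter's choice among several PDFs is unspecified, it can help to verify the input folder first. This is a hedged sketch rather than the converter's actual selection logic: `find_single_pdf` is a hypothetical helper, and the folder path in the usage line is assumed to be relative to the repository root.

```python
import tempfile
from pathlib import Path


def find_single_pdf(input_dir):
    """Return the only PDF in input_dir, or raise if there is not exactly one."""
    pdfs = sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    if len(pdfs) != 1:
        names = ", ".join(p.name for p in pdfs) or "none"
        raise ValueError(f"Expected exactly one PDF in {input_dir}, found: {names}")
    return pdfs[0]


if __name__ == "__main__":
    # Demonstration in a throwaway directory, so no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "questions.pdf").write_bytes(b"")
        print(find_single_pdf(tmp).name)
```

In practice this would be called as `find_single_pdf("conversion_content/input")` before executing the notebook cells.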
| 30 | +#### Conversion process |
| 31 | +1. Within `/conversion_content/converter`, the converter creates another folder, `converter/conversion_content/media`, which holds all the images that Mathpix extracted. |
| 32 | + |
| 33 | +2. A file called `example.md` will be created within `/conversion_content/converter` if it does not exist yet. This is the Markdown file produced by Mathpix after scanning the PDF. |
| 34 | + |
| 35 | +3. Note that the converter keeps reusing the same `example.md` until it is deleted. This saves Mathpix tokens, since Mathpix almost always produces an identical Markdown file for the same PDF. |
| 36 | +This means that to convert a different PDF file, you must first delete `example.md`. |
| 37 | + |
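The reuse of `example.md` described above amounts to a simple file cache. The sketch below illustrates that behaviour under stated assumptions and is not the notebook's implementation: `load_or_create_markdown` is a hypothetical name, and `convert` is a placeholder callable standing in for whatever Mathpix request the notebook actually makes.

```python
import tempfile
from pathlib import Path


def load_or_create_markdown(out_dir, convert, name="example.md"):
    """Reuse out_dir/name if it exists; otherwise call convert() and save the result."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path = out_dir / name
    if md_path.exists():
        # Cached result: the (paid) conversion step is skipped entirely.
        return md_path.read_text(encoding="utf-8")
    text = convert()  # placeholder for the real PDF-to-Markdown call
    md_path.write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = load_or_create_markdown(tmp, lambda: "# Question 1")
        second = load_or_create_markdown(tmp, lambda: "# A different PDF")
        print(first == second)  # the second call returns the cached file
```

The demonstration shows why a stale `example.md` must be deleted: as long as the file exists, a new PDF is never scanned.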
| 38 | +#### Notes |
| 39 | +Please read `assumptions.txt` for the assumptions the converter makes. If these assumptions are not met, the converter may struggle and produce odd results. |
| 40 | + |
| 41 | +## Evaluation |
| 42 | +I believe this converter can be integrated with the platform's API. |
| 43 | + |
| 44 | +The converter does have its flaws, and there are definitely areas where it can still improve, such as reliably handling extremely messy inputs or producing more than just questions and solutions (answer box). |
| 45 | + |
| 46 | +Within the bounds of its assumptions, however, it works reliably and well. |