Added compute script, removed unused scripts, cleaned up script forma…#10
Added compute script, removed unused scripts, cleaned up script forma…#10lintyfresh wants to merge 5 commits intomainfrom
Conversation
…tting, updated README
natalya-patrikeeva
left a comment
There was a problem hiding this comment.
Now that the scripts are correct, I made a few code suggestions to make it more elegant, and performant (avoid loop, use vectorized ops, doc your functions) but looks great!
|
|
||
| # what name should the investigate_df be saved under? change each time you | ||
| # run this script to avoid overriding | ||
| investigate_json = "investigate_2" |
There was a problem hiding this comment.
a better way is to check if file exists, and remove and save with same name? We can probably use one log file and append to it. Or are you saying you want to keep several copies like investigate_1, investigate_2?
| # run this script to avoid overriding | ||
| investigate_json = "investigate_2" | ||
|
|
||
| # what name should the investigate_df be saved under? change each time you |
There was a problem hiding this comment.
not sure about this comment and below file metrics_2? is there metrics_1?
|
|
||
| # what name should the investigate_df be saved under? change each time you | ||
| # run this script to avoid overriding | ||
| metrics_json = "metrics_2" |
There was a problem hiding this comment.
same idea as above, let's not make new files unless we can't avoid it
scripts/compute.py
Outdated
|
|
||
|
|
||
| def sort_outputs(s): | ||
| # opens all JSON files within a file path |
There was a problem hiding this comment.
if we want to keep all results in one json, we could have a key for error and filter that later instead of having two files
| # Functions | ||
|
|
||
|
|
||
| def sort_outputs(s): |
There was a problem hiding this comment.
I would pick a more meaningful arg than "s"
There was a problem hiding this comment.
Let's also add doc strings to all functions
scripts/compute.py
Outdated
|
|
||
|
|
||
| def lookup_truth(row): | ||
| # looks up ground truth for a given row based on the benchmark name and |
There was a problem hiding this comment.
doc string here please
scripts/compute.py
Outdated
|
|
||
|
|
||
| def compute_accuracy(row): | ||
| # compares accuracy of llm output to ground_truth based on benchmark_name |
There was a problem hiding this comment.
don't love the if /elses and the repeated return 1 and 0. Can rewrite with less code using functions
scripts/compute_metrics.py
Outdated
| with open("/zfs/projects/students/ltdarc-usf-intern-2025/LLM_benchmarks/inputs/ground_truth.json", "r") as f: | ||
| truth = json.load(f) | ||
| image_index = list(truth.keys()) | ||
| for image in image_index: |
There was a problem hiding this comment.
I think we can load all images at once avoiding the loop
scripts/compute.py
Outdated
| return full_investigate_df | ||
|
|
||
|
|
||
| def lookup_truth(row): |
There was a problem hiding this comment.
I wonder if this function should take in df not a row and run on the whole df not row by row
|
|
||
| full_investigate_df.to_json( | ||
| f"/zfs/projects/students/ltdarc-usf-intern-2025/LLM_benchmarks/outputs/metrics/{investigate_json}.json", | ||
| orient='records') |
There was a problem hiding this comment.
Does this overwrite each run?
…tting, updated README