Starter repository for the HackerRank Orchestrate 24-hour hackathon.
Build a system that verifies visual evidence for damage claims across three object types: cars, laptops, and packages.
Your system will receive claim conversations, one or more submitted images, user claim history, and minimum evidence requirements. It must decide whether the submitted images support the claim, contradict it, or do not provide enough information.
Read problem_statement.md for the full task spec, input/output schema, and allowed values.
- Repository layout
- What you need to build
- Where your code goes
- Quickstart
- Evaluation
- Chat transcript logging
- Submission
- Judge interview
.
├── AGENTS.md                     # Rules for AI coding tools + transcript logging
├── problem_statement.md          # Full task description and I/O schema
├── README.md                     # You are here
├── code/                         # Build your solution here
│   ├── main.py                   # Suggested terminal entry point
│   └── evaluation/
│       └── main.py               # Suggested evaluation entry point
└── dataset/
    ├── sample_claims.csv         # Inputs + expected outputs for development
    ├── claims.csv                # Inputs only; run your system on these rows
    ├── user_history.csv          # Historical claim counts and risk context
    ├── evidence_requirements.csv # Minimum image evidence requirements
    └── images/
        ├── sample/               # Images referenced by sample_claims.csv
        └── test/                 # Images referenced by claims.csv
A system that, for each row in dataset/claims.csv, produces one row in output.csv.
Input fields:
| Column | Meaning |
|---|---|
| user_id | User submitting the claim; use this to look up dataset/user_history.csv |
| image_paths | One or more submitted image paths, separated by semicolons |
| user_claim | Chat transcript describing the issue |
| claim_object | car, laptop, or package |
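
As a rough illustration of the input fields above, the sketch below splits image_paths on semicolons and indexes user history by user_id. The helper names (parse_claim_row, index_history), the column order, the sample values, and the past_claims column mentioned in the usage note are assumptions made for this example; problem_statement.md and the CSV headers remain the source of truth.

```python
import csv
import io


def parse_claim_row(row):
    """Return a copy of one claims row with image_paths split into a list."""
    parsed = dict(row)
    # image_paths holds one or more paths separated by semicolons.
    parsed["image_paths"] = [
        p.strip() for p in row["image_paths"].split(";") if p.strip()
    ]
    return parsed


def index_history(history_rows):
    """Map user_id -> history row so each claim lookup is a dict access."""
    return {r["user_id"]: r for r in history_rows}


# Tiny in-memory stand-in for dataset/claims.csv (values are made up).
sample = io.StringIO(
    "user_id,image_paths,user_claim,claim_object\n"
    "u1,images/test/a.jpg;images/test/b.jpg,Screen cracked after a drop,laptop\n"
)
claims = [parse_claim_row(r) for r in csv.DictReader(sample)]
print(claims[0]["image_paths"])
```

With real data you would pass an open file for dataset/claims.csv to csv.DictReader instead of the in-memory sample, and build the history index once from dataset/user_history.csv, for example index_history([{"user_id": "u1", "past_claims": "3"}]).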
Required output fields:
| Column | Meaning |
|---|---|
| evidence_standard_met | Whether the image set is sufficient to evaluate the claim |
| evidence_standard_met_reason | Short reason for the evidence decision |
| risk_flags | Semicolon-separated risk flags, or none |
| issue_type | Visible issue type |
| object_part | Relevant object part |
| claim_status | supported, contradicted, or not_enough_information |
| claim_status_justification | Concise explanation grounded in the image evidence |
| supporting_image_ids | Image IDs supporting the decision, or none |
| valid_image | Whether the image set is usable for automated review |
| severity | none, low, medium, high, or unknown |
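
Model output can drift from the allowed values, so it may help to validate each prediction before writing it. The sketch below checks only what this README states: the output field names, and the allowed values for claim_status and severity. The column order is copied from the list above and is an assumption; the exact schema, and the allowed values for the other fields, are defined in problem_statement.md. validate_prediction is a hypothetical helper name.

```python
CLAIM_STATUSES = {"supported", "contradicted", "not_enough_information"}
SEVERITIES = {"none", "low", "medium", "high", "unknown"}

# Assumed order, taken from this README; confirm against problem_statement.md.
OUTPUT_COLUMNS = [
    "evidence_standard_met",
    "evidence_standard_met_reason",
    "risk_flags",
    "issue_type",
    "object_part",
    "claim_status",
    "claim_status_justification",
    "supporting_image_ids",
    "valid_image",
    "severity",
]


def validate_prediction(pred):
    """Return a list of problems found in one prediction dict (empty if none)."""
    problems = []
    missing = [c for c in OUTPUT_COLUMNS if c not in pred]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if pred.get("claim_status") not in CLAIM_STATUSES:
        problems.append(f"bad claim_status: {pred.get('claim_status')!r}")
    if pred.get("severity") not in SEVERITIES:
        problems.append(f"bad severity: {pred.get('severity')!r}")
    return problems


# Placeholder prediction: every field set to "none", then two fields overridden.
example = dict.fromkeys(OUTPUT_COLUMNS, "none")
example.update(claim_status="supported", severity="low")
print(validate_prediction(example))  # prints []
```

A non-empty result is a signal to retry the model call or fall back to not_enough_information, depending on how you choose to handle malformed output.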
Hard requirements:
- Must read the provided CSV files and local images.
- Must produce output.csv with the exact schema in problem_statement.md.
- Must include an evaluation workflow.
- Must avoid hardcoded test labels or file-specific answers.
Beyond that you are free to bring your own approach: VLMs, LLMs, structured prompting, rule layers, batching, caching, evaluation pipelines, model comparison, or anything else.
All of your work belongs in code/. The repo ships with empty starter files that you can grow into your full solution.
Suggested conventions:
- Put your main runnable solution in code/main.py, or document your own entry point clearly.
- Put evaluation code under code/evaluation/ or an evaluation/ folder included in your final code.zip.
- Write final predictions to output.csv.
Clone this repository:
git clone git@github.com:interviewstreet/hackerrank-orchestrate-june26.git
cd hackerrank-orchestrate-june26

You are free to use any language or runtime. Python, JavaScript, and TypeScript are all reasonable choices.
The evaluation report should include:
- metrics on dataset/sample_claims.csv
- at least two strategies, prompts, or model configurations compared
- the final strategy used for output.csv
- operational analysis covering model calls, token usage, image usage, approximate cost, runtime, and TPM/RPM considerations
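
One simple way to compare strategies on dataset/sample_claims.csv is per-field exact-match accuracy against the expected outputs. The sketch below is only an illustration: this README does not say how submissions are scored, so exact match after trimming and lowercasing is an assumption, and field_accuracy and the made-up rows are hypothetical.

```python
from collections import defaultdict


def field_accuracy(expected_rows, predicted_rows, fields):
    """Fraction of rows where the prediction matches the expected label, per field."""
    if len(expected_rows) != len(predicted_rows):
        raise ValueError("expected and predicted row counts differ")
    correct = defaultdict(int)
    for exp, pred in zip(expected_rows, predicted_rows):
        for f in fields:
            # Case-insensitive exact match; an assumed metric, not the official one.
            if exp[f].strip().lower() == pred[f].strip().lower():
                correct[f] += 1
    n = len(expected_rows)
    return {f: correct[f] / n for f in fields} if n else {}


fields = ["claim_status", "severity"]
expected = [
    {"claim_status": "supported", "severity": "low"},
    {"claim_status": "contradicted", "severity": "none"},
]
# Two hypothetical strategies evaluated on the same rows.
strategy_a = [
    {"claim_status": "supported", "severity": "low"},
    {"claim_status": "supported", "severity": "none"},
]
strategy_b = [dict(row) for row in expected]

for name, preds in [("strategy_a", strategy_a), ("strategy_b", strategy_b)]:
    print(name, field_accuracy(expected, preds, fields))
```

The same loop is a natural place to record the number of model calls, tokens, and wall-clock time per strategy for the operational analysis.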
This repo ships with an AGENTS.md that modern AI coding tools may read. It instructs the tool to append conversation turns to a shared log file:
| Platform | Path |
|---|---|
| macOS / Linux | $HOME/hackerrank_orchestrate/log.txt |
| Windows | %USERPROFILE%\hackerrank_orchestrate\log.txt |
You will upload this log as your chat transcript at submission time. The chat transcript means your conversation with the AI coding tool you used to build the system. It is not the runtime logs, reasoning trace, or conversation history produced by the claim-verification agent you are building.
If you use multiple AI tools, include the relevant conversation logs from all of them in the same transcript file. Separate each tool's section with a clear divider and label it with the tool name.
Never paste secrets into the chat. If secrets are needed, use environment variables.
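
A minimal sketch of reading a key from the environment instead of hardcoding it or pasting it into a chat. The variable name VLM_API_KEY and the helper load_api_key are placeholders chosen for this example; use whatever name your model provider documents.

```python
import os


def load_api_key(var_name="VLM_API_KEY"):
    """Read an API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key
```

Failing at startup with a clear message is usually friendlier than hitting an authentication error partway through a long batch run.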
Submit the following files as instructed by HackerRank:
- Code zip: zip your runnable solution, README, prompts/configs, and evaluation folder. Exclude virtualenvs, node_modules, build artifacts, and unnecessary generated files.
- Predictions CSV: your final output.csv for all rows in dataset/claims.csv.
- Chat transcript: the log.txt from the path in Chat transcript logging.
Before submitting, confirm:
- output.csv has one row per row in dataset/claims.csv.
- output.csv has the exact required columns in the exact required order.
- Your evaluation files are included in code.zip.
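
The first two checks can be automated. The sketch below compares row counts and the header order of two CSV streams; check_submission is a hypothetical helper, and required_columns must be filled in from problem_statement.md, since the exact list is not reproduced here. The in-memory data is made up for the demonstration.

```python
import csv
import io


def check_submission(claims_csv, output_csv, required_columns):
    """Compare two open CSV text streams and return a list of problems."""
    claims_rows = list(csv.DictReader(claims_csv))
    reader = csv.DictReader(output_csv)
    output_rows = list(reader)
    problems = []
    if len(output_rows) != len(claims_rows):
        problems.append(
            f"expected {len(claims_rows)} rows, found {len(output_rows)}"
        )
    # fieldnames is None for an empty file, so fall back to an empty list.
    header = reader.fieldnames or []
    if list(header) != list(required_columns):
        problems.append("columns differ from the required order")
    return problems


# In-memory stand-ins for dataset/claims.csv and output.csv.
claims = io.StringIO("user_id,claim_object\nu1,car\nu2,laptop\n")
output = io.StringIO("claim_status,severity\nsupported,low\n")
print(check_submission(claims, output, ["claim_status", "severity"]))
# prints ['expected 2 rows, found 1']
```

For the real files, open each with open(path, newline="", encoding="utf-8") and pass the file objects in place of the StringIO buffers.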
After submission, the AI Judge may ask about your approach, implementation decisions, model usage, evaluation strategy, and how you used AI while building the solution.
Be prepared to explain your solution in detail.