@ygtangg
Contributor

@ygtangg ygtangg commented Nov 26, 2024

Why are these changes needed?

This Data Explorer provides a visually engaging and interactive tool that allows users to explore and draw insights from the leaderboard (conversation) data. It fosters transparency in the ranking process and enhances users’ trust in our leaderboard.
(link to the explorer website: link)
Screenshot 2024-11-25 at 17 58 20
Screenshot 2024-11-25 at 17 58 40

Related issue number (if applicable)

Checks

  • I've run format.sh to lint the changes in this PR.
  • I've included any doc changes needed.
  • I've made sure the relevant tests are passing (if applicable).

@aangelopoulos
Collaborator

You're amazing! So fast!

@aangelopoulos
Collaborator

Really wonderful job :) will review

@aangelopoulos
Collaborator

Looks wonderful. One bug: Dark mode causes issues because the text doesn't turn white. Can you fix? @ygtangg

image

@aangelopoulos aangelopoulos self-requested a review November 26, 2024 20:21
@ygtangg
Contributor Author

ygtangg commented Nov 27, 2024

I fixed the dark mode issue on the website. No change to the iframe link is needed.

@aangelopoulos
Collaborator

Where is the fix? I don't see another commit or PR.

@ygtangg
Contributor Author

ygtangg commented Nov 27, 2024

I changed the HTML website, whose files are in the arena-leaderboard-v2 repo. Do you think we should move it in here as well?

@aangelopoulos
Collaborator

Oh got it!
Good question. Let's separate for now, and we can merge at some point later.

@aangelopoulos
Collaborator

LGTM. @infwinston ?

@lisadunlap
Collaborator

this is suuuuper cool! @aangelopoulos @infwinston lmk if yall need any help to get this merged, this will be an amazing feature

Comment on lines 99 to 104
model_keys = ['chatgpt-4o-latest', 'gemini-1.5-pro-exp-0827','gpt-4o-mini-2024-07-18','claude-3-5-sonnet-20240620','gemini-1.5-flash-exp-0827','llama-3.1-405b-instruct','gemini-1.5-pro-api-0514','mistral-large-2407','reka-core-20240722','gemini-1.5-flash-api-0514', 'deepseek-coder-v2-0724','yi-large','llama-3-70b-instruct','qwen2-72b-instruct','claude-3-haiku-20240307','llama-3.1-8b-instruct','mistral-large-2402','command-r','mixtral-8x22b-instruct-v0.1','gpt-3.5-turbo-0613']
output_tokens_per_USD = [66.66666667000001,200.0,1666.666667,66.66666667000001,3333.333333,333.3333333,200.0,166.6666667,166.6666667,3333.333333,3333.333333,333.3333333,1265.8227849999998,1111.111111,800.0,11111.11111,166.6666667,666.6666667,166.6666667,500.0]
score=[1316.1559008799543,1300.8583398843484,1273.6004783067303,1270.113546648134,1270.530573909608,1266.244657076764,1259.2844314017723,1249.8268751367714,1229.2148108171098,1226.8769924152105,1214.5634252743123,1212.4668382698005,1206.3236747009742,1186.7832147344182,1178.5484948812955,1167.8793593807711,1157.271872307139,1148.6665817312062,1147.0325504217642,1117.0289441863001]
# x and y are passed as plain lists, so Plotly Express names them "x" and "y"; the labels keys must match those names.
fig = px.scatter(x=output_tokens_per_USD, y=score, title="Quality vs. Cost Effectiveness", labels={
"x": "# of output tokens per USD (in thousands)",
"y": "Arena Score"}, log_x=True, text=model_keys)
Member

Let's probably not push this info here. It'll be hard to maintain/update.

Contributor

@sophie200 sophie200 Dec 18, 2024

Yes, will upload the Plotly graph to Google storage and then embed it with an iframe.
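
A minimal sketch of that approach, for illustration only: the data lives in a separate file instead of being hardcoded, the figure is exported as standalone HTML, and the page embeds the hosted copy with an iframe. The JSON layout, the function names (`load_points`, `export_figure`, `iframe_snippet`), and the URL are assumptions and not part of this PR. The upload step is left out because the storage details aren't given in this thread.

```python
import json
from html import escape
from pathlib import Path


def load_points(path):
    """Read model names, cost-effectiveness values, and scores from a JSON file.

    Assumed (hypothetical) layout: a list of objects with the keys
    "model", "output_tokens_per_usd", and "score".
    """
    rows = json.loads(Path(path).read_text(encoding="utf-8"))
    models = [row["model"] for row in rows]
    tokens_per_usd = [float(row["output_tokens_per_usd"]) for row in rows]
    scores = [float(row["score"]) for row in rows]
    return models, tokens_per_usd, scores


def export_figure(data_path, html_path):
    """Build the scatter plot from the data file and write it as standalone HTML."""
    import plotly.express as px  # imported lazily; requires plotly to be installed

    models, tokens_per_usd, scores = load_points(data_path)
    fig = px.scatter(
        x=tokens_per_usd,
        y=scores,
        text=models,
        log_x=True,
        title="Quality vs. Cost Effectiveness",
        # Keys are "x" and "y" because the data is passed as plain lists.
        labels={"x": "# of output tokens per USD (in thousands)", "y": "Arena Score"},
    )
    # include_plotlyjs="cdn" keeps the file small by loading plotly.js from a CDN.
    fig.write_html(html_path, include_plotlyjs="cdn")


def iframe_snippet(url, height=600):
    """Return the HTML that embeds the hosted figure at `url` (a placeholder here)."""
    safe_url = escape(url, quote=True)
    return (
        f'<iframe src="{safe_url}" width="100%" '
        f'height="{int(height)}" frameborder="0"></iframe>'
    )
```

With this split, updating prices or scores means regenerating and re-uploading the HTML file, and the code in this repo only needs the string returned by `iframe_snippet`.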

Member

do we need this?

Contributor Author

@ygtangg ygtangg Jan 12, 2025

No, I don't think so. It's added when I run the server. I can remove it.

@infwinston infwinston merged commit 1ffd4a6 into lm-sys:main Jan 14, 2025
1 check passed
adityamittal13 pushed a commit to adityamittal13/FastChat that referenced this pull request Feb 4, 2025
Co-authored-by: Sophie Xie <sxie2@berkeley.edu>
5 participants