This repository contains companion data sets for LLM publications.
Companion data to Randomness Is All You Need: Semantic Traversal of Problem-Solution Spaces with Large Language Models
Please cite that paper if you make use of this data, for example with the following BibTeX snippet:
@article{sandholm2024,
title={{Randomness Is All You Need: Semantic Traversal of Problem-Solution Spaces with Large Language Models}},
author={Thomas Sandholm and Sayandev Mukherjee and Bernardo A. Huberman},
journal={arXiv preprint arXiv:2402.06053},
year={2024}
}

The generated data dumps are available in /aidea.
They are organized by the original problem statement that generated them:
| Data | Problem Statement |
|---|---|
| timeline | Software project timelines are often underestimated, which leads to high costs. |
| employee | It is difficult to measure employee satisfaction in an unbiased way. |
| startup | It is not easy for early startups to find a customer base willing to try new technology. |
| data | Companies struggle with gaining insights from large volumes and high velocity of data. |
| satisfaction | It is hard to track and measure customer satisfaction across large geographies. |
| invest | It is difficult to plan investments in an uncertain economy. |
| innovation | It is difficult to create innovation opportunities without introducing too much process and hampering creativity. |
| talent | Retaining high-performing talent is hard in competitive emerging markets. |
| ml | Large machine learning models are expensive and time consuming to train. |
| privacy | Ensuring privacy of customers is difficult while leveraging their data for business insights. |
Problems generated from solutions have the prefix gprob.
The solution they are generated from has the prefix gsol.
The file naming convention is:
<type>.<index>.<temperature>.txt
where type is one of gsol, gprob, sol, or prob, denoting solutions to generated
problems, generated problems, solutions, and problems, respectively. The
index denotes the order in which the solution was generated, starting
with solution 1 which is the solution to the original problem. The ordering
is determined by a depth-first search of the related problem and generated
problem tree. The temperature is the LLM temperature set during solution
and problem generation from a prompt. The temperatures used include 0.5, 0.6, 0.7, 0.8, 0.9,
1.0, and 1.1. The actual temperature fed into the LLM is a uniform random number
in the interval [temp, temp + 0.1].
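
As a minimal sketch of how the naming convention and temperature rule above could be handled in Python: the helper names `parse_dump_name` and `sample_temperature` are hypothetical and not part of this repository, and the pattern assumes the temperature is written in the file name as a plain decimal such as 0.5.

```python
import random
import re
from typing import NamedTuple

# Assumed file name pattern: <type>.<index>.<temperature>.txt
# The decimal form of the temperature (e.g. 0.5) is an assumption.
NAME_RE = re.compile(r"^(gsol|gprob|sol|prob)\.(\d+)\.(\d+(?:\.\d+)?)\.txt$")


class DumpName(NamedTuple):
    kind: str           # gsol, gprob, sol or prob
    index: int          # generation order, starting at 1
    temperature: float  # base temperature from the file name


def parse_dump_name(filename: str) -> DumpName:
    """Split a dump file name into its type, index and temperature."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    kind, index, temperature = match.groups()
    return DumpName(kind, int(index), float(temperature))


def sample_temperature(temp: float) -> float:
    """Draw a uniform random temperature from [temp, temp + 0.1]."""
    return random.uniform(temp, temp + 0.1)


if __name__ == "__main__":
    print(parse_dump_name("sol.1.0.5.txt"))
    print(sample_temperature(0.5))
```

Because the temperature itself contains a dot, splitting the name on every dot would give the wrong fields, so the sketch matches the whole name with a regular expression instead.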