It is possible to reduce these costs by optimizing the number of tokens in the prompt. We call this "minimizing the token complexity". Informally speaking, token complexity is the estimated number of input-prompt tokens required to accomplish a task. I understand that more theory is needed to describe token complexity formally, and we are working on a paper on it.
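Because API pricing is usually per token, the savings are simple arithmetic. Here is a small sketch. The price of $0.002 per 1K prompt tokens and the token counts are hypothetical numbers chosen for illustration, not real provider prices or measured results:

```python
# Hypothetical price for illustration only; real prices vary by provider and model.
ASSUMED_PRICE_PER_1K_PROMPT_TOKENS = 0.002  # USD

def prompt_cost_usd(n_tokens: int, price_per_1k: float) -> float:
    """Cost of sending n_tokens prompt tokens at a per-1K-token price."""
    return n_tokens / 1000 * price_per_1k

def savings_usd(original_tokens: int, optimized_tokens: int,
                n_calls: int, price_per_1k: float) -> float:
    """Money saved by shrinking every prompt from original_tokens to optimized_tokens."""
    saved_per_call = original_tokens - optimized_tokens
    return prompt_cost_usd(saved_per_call * n_calls, price_per_1k)

# Example: trimming a 1,500-token prompt to 1,200 tokens across 1,000,000 calls.
print(savings_usd(1500, 1200, 1_000_000, ASSUMED_PRICE_PER_1K_PROMPT_TOKENS))
```

With these assumed numbers, 300 saved tokens × 1,000,000 calls = 300M tokens, i.e. 300,000 × $0.002 = $600, for a 20% cut in prompt tokens.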
PromptOptimizer provides a set of optimizers (https://promptoptimizer.readthedocs.io/en/latest/reference.h...) and metrics to evaluate optimizations. The framework is also easy to extend by writing custom optimizers (https://promptoptimizer.readthedocs.io/en/latest/extend/cust...).
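To illustrate the general idea of a pluggable optimizer plus a metric, here is a minimal sketch. It is NOT PromptOptimizer's actual API: the `Optimizer` base class, `FillerPhraseOptim`, and `token_reduction` are hypothetical names, and whitespace splitting is a crude stand-in for a real tokenizer. See the linked docs for the library's real extension interface.

```python
import re
from abc import ABC, abstractmethod

class Optimizer(ABC):
    """Hypothetical base class (not the library's): subclasses implement optimize()."""

    @abstractmethod
    def optimize(self, prompt: str) -> str: ...

    def __call__(self, prompt: str) -> str:
        return self.optimize(prompt)

class FillerPhraseOptim(Optimizer):
    """Hypothetical custom optimizer that deletes a fixed list of filler phrases."""

    def __init__(self, phrases=("please", "kindly", "I would like you to")):
        # Try longer phrases first so multi-word phrases match as a whole.
        ordered = sorted(phrases, key=len, reverse=True)
        alternatives = "|".join(re.escape(p) for p in ordered)
        # Whole-word match, plus any whitespace that follows the phrase.
        self.pattern = re.compile(r"\b(?:" + alternatives + r")\b\s*", re.IGNORECASE)

    def optimize(self, prompt: str) -> str:
        stripped = self.pattern.sub("", prompt)
        # Collapse any leftover runs of whitespace.
        return re.sub(r"\s{2,}", " ", stripped).strip()

def token_reduction(original: str, optimized: str, count=lambda s: len(s.split())) -> float:
    """Fraction of tokens removed, using whitespace words as a rough token proxy."""
    n = count(original)
    return 0.0 if n == 0 else 1 - count(optimized) / n

opt = FillerPhraseOptim()
before = "I would like you to summarize this article."
after = opt(before)
print(after)                          # summarize this article.
print(token_reduction(before, after))  # 0.625
```

Keeping the metric separate from the optimizer means any optimizer, built-in or custom, can be scored the same way.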
There is a lot more that could be done, but for now we provide a set of preliminary optimizers along with their performance evaluations here: https://github.com/vaibkumr/prompt-optimizer#evaluations
There is also a lot that can't be done, because we don't assume access to model weights, logits, or the decoding process. We are trying to do the best we can by optimizing only the number of tokens in input prompts.
My personal favorite is EntropyOptim: https://promptoptimizer.readthedocs.io/en/latest/reference.h... It uses a masked language model to remove low-entropy words, since a good LLM can easily infer them. Of course, sometimes we don't want to lose important tokens; for those, the framework provides ignore tags.
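The core idea can be sketched in a few lines. This is an illustrative sketch, not EntropyOptim's implementation: the `predict` callable stands in for a masked language model, `toy_predict` is a fake predictor so the example runs without downloading a model, and the `<keep>`/`</keep>` tags are hypothetical stand-ins for the framework's real ignore tags. Each word is scored by the entropy of the model's prediction for its position; the fraction `p` of unprotected words with the lowest entropy (the most predictable ones) is dropped.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical predictor interface standing in for a masked language model:
# given the words and a position i, return a probability distribution over
# candidate fillers for position i.
Predictor = Callable[[Sequence[str], int], Dict[str, float]]

def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in dist.values() if q > 0)

def split_protected(text: str, open_tag: str = "<keep>",
                    close_tag: str = "</keep>") -> Tuple[List[str], Set[int]]:
    """Split on whitespace and record indices of words inside the hypothetical tags.
    Tags must appear as standalone whitespace-separated tokens."""
    words: List[str] = []
    protected: Set[int] = set()
    inside = False
    for tok in text.split():
        if tok == open_tag:
            inside = True
        elif tok == close_tag:
            inside = False
        else:
            if inside:
                protected.add(len(words))
            words.append(tok)
    return words, protected

def entropy_prune(words: Sequence[str], predict: Predictor, p: float = 0.1,
                  protected: Set[int] = frozenset()) -> List[str]:
    """Drop the fraction p of unprotected words whose prediction has the lowest entropy."""
    scored = sorted((entropy(predict(words, i)), i)
                    for i in range(len(words)) if i not in protected)
    n_drop = int(len(words) * p)
    drop = {i for _, i in scored[:n_drop]}
    return [w for i, w in enumerate(words) if i not in drop]

def toy_predict(words: Sequence[str], i: int) -> Dict[str, float]:
    """Toy stand-in for an MLM. It peeks at the true word (a real MLM would
    see it masked) and treats a few function words as easy to guess."""
    if words[i].lower() in {"the", "a", "of", "to"}:
        return {words[i]: 0.9, "<other>": 0.1}  # confident -> low entropy
    return {c: 0.25 for c in ("w1", "w2", "w3", "w4")}  # uniform -> high entropy

words, protected = split_protected(
    "Summarize the report of <keep> the CEO </keep> for the board")
print(" ".join(entropy_prune(words, toy_predict, p=0.35, protected=protected)))
# prints: Summarize report the CEO for board
```

Note that the protected "the" survives while the other two are removed. With a real MLM, this per-word scoring needs one masked forward pass per word, which is part of the optimizer's own cost.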
No free lunch: the reduction in cost often comes with a loss in LLM performance, as can be seen here: https://github.com/vaibkumr/prompt-optimizer#cost-performanc...
The eventual goals of this project are:
1. A one-stop solution for saving LLM API costs. Most existing work in this direction focuses on vector stores and semantic caching. This project is fully compatible with those techniques and might include its own implementations of them in the future.
2. A formal extension of the "token complexity" theory.
3. Open, leaderboard-style evaluations of how few prompt and continuation tokens are needed to solve a task.
4. Openness to contributions and discussions about more diverse future directions.
The project is inspired by Kevin from the TV show The Office: "Me think, why waste time say lot word, when few word do trick."
what you think?