GenderBench is an open-source evaluation benchmark that measures gender biases in large language models. It is my attempt to decompose this complex and difficult topic into interpretable measures. My goal was to systematize the evaluation of unfair behavior in LLMs and to help other developers and researchers run their own tests. The report linked here is generated from GenderBench logs and quantifies how LLMs behave in various situations where gender can play a role.
Links:
Repository - https://github.com/matus-pikuliak/genderbench
Report - https://genderbench.readthedocs.io/latest/_static/reports/ge...