This resulted in a Python package I call OnPrem.LLM.
The documentation includes examples of how to use it for information extraction, text generation, retrieval-augmented generation (i.e., chatting with documents on your computer), and text-to-code generation: https://amaiya.github.io/onprem/
Enjoy!