To do that, AI needs to plan its actions and work on code incrementally, one piece at a time.
The challenge is the context size: the entire code base plus the plan does not fit into the context window.
My approach: Add only relevant parts of the code base to the context.
Specifically, the generation engine implements two distinct phases: planning and coding.
In the planning phase, GPT-4 receives a tree of tasks and a summary of the code base (a list of files and their descriptions). It replies with updated tasks (i.e. it can create sub-tasks as needed), the task it wants to work on in the next step, and a list of relevant code fragments for that task.
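To make the planning input concrete, here is a minimal Python sketch of how a task tree and a file summary could be combined into one prompt. The `Task` class, the function names, and the prompt layout are all hypothetical and are not taken from the actual script.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """One node in the task tree (hypothetical shape, for illustration only)."""
    title: str
    status: str = "todo"
    subtasks: list[Task] = field(default_factory=list)


def render_tasks(task: Task, depth: int = 0) -> list[str]:
    """Render the tree as indented bullet lines, one line per task."""
    lines = [f"{'  ' * depth}- [{task.status}] {task.title}"]
    for sub in task.subtasks:
        lines.extend(render_tasks(sub, depth + 1))
    return lines


def build_planning_prompt(root: Task, file_summaries: dict[str, str]) -> str:
    """Combine the task tree and the file descriptions into a single prompt."""
    files = [f"{path}: {desc}" for path, desc in sorted(file_summaries.items())]
    return "\n".join(
        [
            "TASKS:",
            *render_tasks(root),
            "",
            "FILES:",
            *files,
            "",
            # Assumed instruction wording; the real prompt may differ.
            "Reply with updated tasks, the next task, and relevant code fragments.",
        ]
    )
```

The point of the summary is that only file names and short descriptions go into the planning prompt, never the file contents.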
In the coding phase, it receives the task description (as a tree, in YAML) and the relevant code fragments. It replies with new or updated files, code fragments, and a status: Was the task done? Do we need to break it into subtasks?
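One way to keep the coding prompt within the context limit is to include only the files the planner asked for, and only while they fit a budget. The sketch below is an assumption about how that could be done, not the script's actual logic: it includes whole files rather than fragments, uses character count as a rough stand-in for tokens, and `select_fragments` is a hypothetical name.

```python
def select_fragments(
    requested: list[str],
    files: dict[str, str],
    budget_chars: int,
) -> dict[str, str]:
    """Pick requested files, in order, that fit within a character budget.

    Characters are only a rough proxy for tokens (an assumption made here
    to keep the sketch dependency-free).
    """
    selected: dict[str, str] = {}
    used = 0
    for path in requested:
        content = files.get(path)
        if content is None:
            continue  # the model asked for a file that does not exist
        if used + len(content) > budget_chars:
            continue  # skip anything that would overflow the budget
        selected[path] = content
        used += len(content)
    return selected
```

Skipping an oversized file instead of stopping lets smaller files later in the list still make it into the context.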
In both cases the bot can also show its "observations" before the output, as I believe this helps with planning and code generation.
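If the observations come before the structured output, the reply has to be split before it is parsed. A minimal sketch, assuming the reply separates the two parts with a marker line; the `OUTPUT:` marker and the `split_reply` name are hypothetical and not the format the script uses.

```python
def split_reply(reply: str, marker: str = "OUTPUT:") -> tuple[str, str]:
    """Split a model reply into (observations, output).

    The marker is an assumed convention. If it is missing, the whole
    reply is treated as output and the observations are empty.
    """
    observations, sep, output = reply.partition(marker)
    if not sep:
        return "", reply.strip()
    return observations.strip(), output.strip()
```

Only the output part would then be parsed as tasks or files; the observations can be logged or discarded.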
Results: Currently I have only tested extremely basic scenarios. It needs a lot of work to be usable in practice. But I'd say it seems to work more-or-less as expected.
Example 1: "Write a reddit-like backend in Kotlin, using Ktor. Start by planning and creating subtasks."
This was the entire task the bot received, with no other data.
Results:
Link to output: https://gist.github.com/killerstorm/dd6e26dc80064b7fc731d583f8d740c1#file-ktor_reddit-txt-L9
In short, it formulated reasonable-sounding subtasks and started generating code, e.g. it made a Post model. The run was aborted at that step due to a GPT-4 API failure; the API is not reliable yet.
Example 2: "Write a reddit clone in TypeScript. Start by planning and creating subtasks."
Link to output: https://gist.github.com/killerstorm/e3c50bea3ca3463c8b2d947dcfd80b84
You can see more work here, but I expect that it's less interesting.
Challenges: I'd say it can work pretty well in file-at-once mode. Making _fragments_ of the file is more challenging because it's not a well-defined concept. FWIW GPT-4 largely ignored what I wrote about file fragments and made entire files at once, which was the right decision.
I will post a link to the script in a comment on this post.