Ah, in my experiments it writes like > 90% of the code correctly. I got the best...

Ah, in my experiments it writes like > 90% of the code correctly.

I got the best results with prompts like:

Given the following python code:

``` Few hundreds python loc here ```

Write tests for the function name_of_function maximizing coverage.

The function in this example had a bit of read/dumps from disk and everything. The code returned correctly created mocks, set up setup and teardown methods and came up with 4 test cases. I only needed to fix the imports, but that's because I just dumped python code without preserving the file structure.

I am amazed how fast these models are evolving.