I'm writing a Flutter AI client app that integrates with llama.cpp. I built a PoC with llama.cpp running in WASM (I'm desperate to signal that the app is agnostic to the AI provider), but it was horrifically slow, so I ended up backing out to WebMLC.
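For what it's worth, the provider-agnostic part can be as small as a single interface that the UI depends on. A minimal Dart sketch, with all names (LlmProvider, LlamaCppProvider) hypothetical rather than from any real package:

```dart
// Hypothetical provider-agnostic interface: widgets depend only on this,
// so llama.cpp / WebMLC / remote backends can be swapped freely.
abstract class LlmProvider {
  /// Streams generated tokens for a prompt.
  Stream<String> generate(String prompt);
}

class LlamaCppProvider implements LlmProvider {
  @override
  Stream<String> generate(String prompt) async* {
    // Bridge to a llama.cpp backend here (e.g. FFI or a local server).
    yield 'stub token';
  }
}

Future<void> main() async {
  // The caller never sees which backend is behind the interface.
  final LlmProvider provider = LlamaCppProvider();
  await for (final token in provider.generate('Hello')) {
    print(token);
  }
}
```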
What are you doing underneath, here? If that's secret sauce, I'm at least curious what you're seeing in tokens/sec on, e.g., a phone vs. an M-series MacBook.
Correct, I think so too; it seemed that update must be doing exactly this. tl;dr: for Llama function-calling reliability, you don't need to reach for training; in fact, you can do the training and still have the same problem.
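For anyone landing here later: one training-free way to make function calls reliable with llama.cpp specifically is constrained decoding via its GBNF grammars, which prevent the sampler from emitting anything outside the grammar. A toy sketch (the function and argument names are made up for illustration):

```
# Toy GBNF grammar forcing output into a fixed function-call shape.
# "get_weather" and "city" are hypothetical, not from a real API.
root ::= "{\"name\": \"get_weather\", \"arguments\": {\"city\": \"" city "\"}}"
city ::= [a-zA-Z ]+
```

You'd run it with something like `llama-cli -m model.gguf --grammar-file fncall.gbnf -p "..."`; the model then can't produce a malformed call, no fine-tuning involved.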