You are in the right place if you are looking to integrate with OpenAI (ChatGPT) APIs.


More Reliable

The API retries exactly once, when instructed to do so, for any error OpenAI might return. In my experience, this improves the overall reliability of the API by roughly 25%.
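The retry-once behaviour can be sketched as a small wrapper. This is a hypothetical illustration, not the module's actual API; the `call_with_retry` and `should_retry` names are assumptions.

```python
def call_with_retry(request, should_retry=lambda err: True):
    """Call `request`; on failure, retry exactly once if `should_retry`
    approves the error, then give up. Hypothetical sketch of the
    retry-once policy described above."""
    try:
        return request()
    except Exception as err:
        if not should_retry(err):
            raise
        return request()  # the single retry; a second failure propagates

# Simulate an endpoint that fails once, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise TimeoutError("transient upstream error")
    return "ok"

print(call_with_retry(flaky))  # succeeds on the retry
```

A single retry is a deliberately conservative policy: it absorbs transient failures without compounding load on an already-struggling upstream.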

Multiple Models Running at the Same Time

At this time, OpenAI provides two main models for chat completion: GPT-3.5 and GPT-4.

- GPT-3.5 is 20x cheaper than GPT-4.
- gpt-3.5-turbo-0125 costs $0.50 / 1M input tokens and $1.50 / 1M output tokens.
- gpt-4-turbo-2024-04-09 costs $10 / 1M input tokens and $30 / 1M output tokens.

This has generally been the case: the previous-generation model is more restricted in input/output token limits but cheaper. For many use cases, you want to keep costs low while avoiding the size limitations of the cheaper model. The AI module uses a default model that you can define (in our case gpt-3.5-turbo-0125) and a secondary model (gpt-4-turbo-2024-04-09).

When requests to the default model start erroring out due to timeouts or rate limiting, the module automatically falls back to the (more expensive) secondary model. It then keeps using the secondary model for that user for the next 5 minutes, giving the OpenAI key time to return to a good state.
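The fallback-with-cooldown logic can be sketched as a small state machine. This is a hypothetical sketch of the behaviour described above, not the module's real implementation; the `ModelSelector` class and its method names are assumptions.

```python
import time

DEFAULT_MODEL = "gpt-3.5-turbo-0125"
SECONDARY_MODEL = "gpt-4-turbo-2024-04-09"
COOLDOWN_SECONDS = 5 * 60  # stick with the secondary model for 5 minutes

class ModelSelector:
    """Pick the default model normally; after a timeout or rate-limit
    failure, serve the secondary model until the cooldown elapses."""

    def __init__(self, now=time.monotonic):
        self._now = now  # injectable clock, so the window is testable
        self._fallback_until = 0.0

    def pick(self):
        if self._now() < self._fallback_until:
            return SECONDARY_MODEL
        return DEFAULT_MODEL

    def record_failure(self):
        """Default model timed out or was rate limited: open the window."""
        self._fallback_until = self._now() + COOLDOWN_SECONDS

# Drive the selector with a fake clock to show the 5-minute window.
clock = {"t": 0.0}
sel = ModelSelector(now=lambda: clock["t"])
assert sel.pick() == DEFAULT_MODEL
sel.record_failure()
assert sel.pick() == SECONDARY_MODEL   # inside the 5-minute window
clock["t"] = 301.0
assert sel.pick() == DEFAULT_MODEL     # window elapsed, back to the default
```

Injecting the clock (`now=`) keeps the cooldown deterministic in tests; in production the `time.monotonic` default applies.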


Keep track of errors and latency

Each request is logged (input and output). Additionally, if you adopt the Analytics module, a metric event is emitted for each response status code, so you can track them remotely.

Additionally, request execution times are also logged in local development.
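The logging and metrics flow can be sketched as a timing wrapper. This is a hypothetical illustration of the behaviour described above; the `timed_request` helper, the `emit_metric` hook, and the metric name format are assumptions, not the module's or the Analytics module's real API.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-module")

def timed_request(send, prompt, emit_metric=None):
    """Log the request input, output, and latency; optionally emit a
    metric event per response status code via `emit_metric`."""
    start = time.perf_counter()
    status, output = send(prompt)
    elapsed_ms = (time.perf_counter() - start) * 1000
    log.info("prompt=%r output=%r status=%s took=%.1fms",
             prompt, output, status, elapsed_ms)
    if emit_metric:
        emit_metric(f"openai.response.{status}")  # one event per status code
    return output

# Exercise the wrapper with a stubbed-out API call.
events = []
result = timed_request(lambda p: (200, p.upper()),
                       "hello",
                       emit_metric=events.append)
assert result == "HELLO"
assert events == ["openai.response.200"]
```

Measuring with `time.perf_counter` gives a monotonic, high-resolution latency reading regardless of wall-clock adjustments.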