LLM Pricing
Discover the best LLM API models for your budget with our free comparison tool. Quick, up-to-date pricing from top providers at your fingertips!
Source | Model | Context | Input ($/1M tokens) | Output ($/1M tokens) | Updated | |
---|---|---|---|---|---|---|
OpenAI | gpt-4-32k | 32K | $60 | $120 | March 16, 2024 | |
OpenAI | gpt-4 | 8K | $30 | $60 | March 16, 2024 | |
OpenAI | gpt-4o | 128K | $5 | $15 | May 16, 2024 | |
OpenAI | gpt-4o-2024-08-06 | 128K | $2.5 | $10 | August 16, 2024 | |
OpenAI | gpt-4o-mini | 128K | $0.15 | $0.6 | July 19, 2024 | |
OpenAI | gpt-4o-mini-2024-07-18 | 128K | $0.15 | $0.6 | July 19, 2024 | |
OpenAI | o1-preview | 128K | $15 | $60 | September 12, 2024 | |
OpenAI | o1-preview-2024-09-12 | 128K | $15 | $60 | September 12, 2024 | |
OpenAI | o1-mini | 128K | $3 | $12 | September 12, 2024 | |
OpenAI | o1-mini-2024-09-12 | 128K | $3 | $12 | September 12, 2024 | |
OpenAI | gpt-4-turbo-2024-04-09 | 128K | $10 | $30 | April 11, 2024 | |
OpenAI | gpt-4-0125-preview | 128K | $10 | $30 | March 16, 2024 | |
OpenAI | gpt-4-1106-preview | 128K | $10 | $30 | March 16, 2024 | |
OpenAI | gpt-4-vision-preview | 128K | $10 | $30 | March 16, 2024 | |
OpenAI | gpt-3.5-turbo-0125 | 16K | $0.5 | $1.5 | March 16, 2024 | |
OpenAI | gpt-3.5-turbo-instruct | 4K | $1.5 | $2 | March 16, 2024 | |
OpenAI | gpt-3.5-turbo-1106 | 4K | $1 | $2 | March 16, 2024 | |
OpenAI | gpt-3.5-turbo-0613 | 4K | $1.5 | $2 | March 16, 2024 | |
OpenAI | gpt-3.5-turbo-16k-0613 | 4K | $3 | $4 | March 16, 2024 | |
OpenAI | gpt-3.5-turbo-0301 | 4K | $1.5 | $2 | March 16, 2024 | |
Azure | gpt-4-32k | 32K | $60 | $120 | March 16, 2024 | |
Azure | gpt-4 | 8K | $30 | $60 | March 16, 2024 | |
Azure | gpt-4-turbo | 128K | $10 | $30 | March 16, 2024 | |
Azure | gpt-4-turbo-vision | 128K | $10 | $30 | March 16, 2024 | |
Azure | gpt-3.5-turbo-0125 | 16K | $0.5 | $1.5 | March 16, 2024 | |
Azure | gpt-3.5-turbo-instruct | 4K | $1.5 | $2 | March 16, 2024 | |
Anthropic | claude-3.5-sonnet | 200K | $3 | $15 | June 26, 2024 | |
Anthropic | claude-3-opus | 200K | $15 | $75 | March 16, 2024 | |
Anthropic | claude-3-sonnet | 200K | $3 | $15 | March 16, 2024 | |
Anthropic | claude-3-haiku | 200K | $0.25 | $1.25 | March 16, 2024 | |
Anthropic | claude-2.1 | 200K | $8 | $24 | March 16, 2024 | |
Anthropic | claude-2.0 | 100K | $8 | $24 | March 16, 2024 | |
Anthropic | claude-instant-1.2 | 100K | $0.8 | $2.4 | March 16, 2024 | |
AWS | jurassic-2-ultra | 32K | $18.8 | $18.8 | March 16, 2024 | |
AWS | jurassic-2-mid | 32K | $12.5 | $12.5 | March 16, 2024 | |
AWS | titan-text-lite | 32K | $0.3 | $0.4 | March 16, 2024 | |
AWS | titan-text-express | 32K | $0.8 | $1.6 | March 16, 2024 | |
AWS | claude-instant | 32K | $0.8 | $2.4 | March 16, 2024 | |
AWS | claude-2.0/2.1 | 32K | $8 | $24 | March 16, 2024 | |
AWS | claude-3-sonnet | 32K | $3 | $15 | March 16, 2024 | |
AWS | claude-3-haiku | 32K | $0.25 | $1.25 | March 16, 2024 | |
AWS | command | 32K | $1.5 | $2 | March 16, 2024 | |
AWS | command-light | 32K | $0.3 | $0.6 | March 16, 2024 | |
AWS | llama-2-chat-13B | 32K | $0.75 | $1 | March 16, 2024 | |
AWS | llama-2-chat-70B | 32K | $1.95 | $2.56 | March 16, 2024 | |
AWS | mistral-7b | 32K | $0.15 | $0.2 | March 16, 2024 | |
AWS | mistral-8x7b | 32K | $0.45 | $0.7 | March 16, 2024 | |
Google | gemini-1.0-pro | 32K | $0.5 | $1.5 | September 16, 2024 | |
Google | gemini-1.5-pro | 128K | $1.25 | $5 | October 4, 2024 | |
Google | gemini-1.5-pro | 2M | $2.5 | $10 | October 4, 2024 | |
Google | gemini-1.5-flash | 128K | $0.08 | $0.3 | August 11, 2024 | |
Google | gemini-1.5-flash | 1M | $0.15 | $0.6 | October 11, 2024 | |
Google | gemini-1.5-flash-8B | 128K | $0.04 | $0.15 | October 11, 2024 | |
Google | gemini-1.5-flash-8B | 1M | $0.08 | $0.3 | October 11, 2024 | |
Google | palm-2-for-chat | 8K | $0.25 | $0.5 | March 16, 2024 | |
Google | palm-2-for-chat-32k | 32K | $0.25 | $0.5 | March 16, 2024 | |
Google | palm-2-for-text | 8K | $2.5 | $7.5 | March 16, 2024 | |
Google | palm-2-for-text-32k | 32K | $2.5 | $5 | March 16, 2024 | |
Mistral | mistral-large | 32K | $8 | $24 | March 16, 2024 | |
Mistral | mistral-medium | 32K | $2.7 | $8.1 | March 16, 2024 | |
Mistral | mistral-small | 32K | $2 | $6 | March 16, 2024 | |
Mistral | mixtral-8x7b | 32K | $0.7 | $0.7 | March 16, 2024 | |
Mistral | mixtral-8x22b | 64K | $2 | $6 | April 19, 2024 | |
Mistral | mistral-7b | 32K | $0.25 | $0.25 | March 16, 2024 | |
Cohere | command-r-plus | 128K | $3 | $15 | April 9, 2024 | |
Cohere | command-r | 4K | $0.5 | $1.5 | March 16, 2024 | |
Cohere | command-light | 4K | $0.3 | $0.6 | March 16, 2024 | |
Cohere | command-light-fine-tuned | 4K | $0.3 | $0.6 | March 16, 2024 | |
Groq | llama-2-70b | 4K | $0.7 | $0.8 | March 16, 2024 | |
Groq | llama-2-7b | 2K | $0.1 | $0.1 | March 16, 2024 | |
Groq | mixtral-8x7b | 32K | $0.27 | $0.27 | March 16, 2024 | |
Groq | gemma-7b | 8K | $0.1 | $0.1 | March 16, 2024 | |
Databricks | DBRX | 32K | $2.25 | $6.75 | April 1, 2024 | |
Databricks | llama-2-70b | 4K | $2 | $6 | April 1, 2024 | |
Databricks | mixtral-8x7b | 32K | $1.5 | $1.5 | April 1, 2024 | |
Databricks | mpt-30b | 32K | $1 | $1 | April 1, 2024 | |
Databricks | mpt-30b | 8K | $1 | $1 | April 1, 2024 | |
Databricks | llama-2-13b | 4K | $0.95 | $0.95 | April 1, 2024 | |
Databricks | mpt-7b | 8K | $0.5 | $0.5 | April 1, 2024 | |
Databricks | mpt-7b | 512 | $0.5 | $0.5 | April 1, 2024 | |
Cloudflare | llama-2-7b-chat-fp16 | 2K | $0.56 | $6.66 | April 19, 2024 | |
Cloudflare | llama-2-7b-chat-int8 | 2K | $0.16 | $0.24 | April 19, 2024 | |
Cloudflare | mistral-7b-instruct | 32K | $0.11 | $0.19 | April 19, 2024 |
LLM Pricing: Quick Overview
Hey there! Let's dive into the fascinating world of AI and the different flavors of Large Language Models (LLMs) offered by the big players like OpenAI, Anthropic, Google, Cohere, and Meta. If you're thinking about incorporating these brainy bots into your projects, getting a handle on their pricing is pretty essential. So, let's break it down, shall we?
The Lowdown on Tokens
First off, the pricing for these AI wonders usually revolves around something called "tokens." Imagine a token as a tiny slice of a word. To put it in perspective, 1,000 tokens works out to roughly 750 words. For example, the sentence "This paragraph is 5 tokens" is itself exactly 5 tokens.
A handy rule of thumb is that in English, a token is about four characters long, which works out to roughly three-quarters of a word. If you're working with languages other than English, like Japanese, the math changes a bit.
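Those rules of thumb are easy to turn into a quick back-of-the-envelope estimator. Here's a minimal sketch using the ~4 characters-per-token heuristic (for exact counts you'd use the provider's own tokenizer, such as OpenAI's tiktoken library):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text using the
    ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))


def estimate_words(tokens: int) -> int:
    """Rough word count: ~750 words per 1,000 tokens."""
    return round(tokens * 0.75)


print(estimate_tokens("Hello, how are you today?"))  # prints 6
print(estimate_words(1000))                          # prints 750
```

Handy for ballparking a prompt's cost before you send it, but remember it's only an approximation; tokenization varies by model and by language.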
What's the Deal with Context Length?
When we talk about LLMs, especially those from OpenAI, you'll often hear about "context length." This is a key concept because it affects how well the model performs, what it can do, and, yep, how much it costs.
So, What Exactly is Context Length?
Think of context length as the model's short-term memory for the task at hand. It's the amount of info (or number of tokens) the model can juggle at any given moment. Say a model has a context length of 8,000 tokens; it means it can consider up to 8,000 tokens from what you feed it in one go.
Why Should You Care About Context Length?
- Task Complexity: Bigger context lengths let the model tackle more complex stuff, like summarizing a long read or digging into detailed documents.
- Smooth Conversations: For chatbots, a longer context means the model can remember more of the chat, leading to replies that make more sense and are more on point.
- Price Tag: Generally, the longer the context length, the pricier the model because it needs more computing oomph.
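To see why context length matters in practice, here's a minimal sketch of the sliding-window trimming a chatbot might do to keep a conversation inside the model's context window. The message list and the crude chars/4 counter are illustrative assumptions, not any provider's API:

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose combined token count
    fits within max_tokens, dropping the oldest first.
    `messages` is a list of strings; `count_tokens` is any
    token-counting function (an exact tokenizer or an estimate)."""
    kept, total = [], 0
    for msg in reversed(messages):          # walk newest -> oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:       # next message won't fit
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore original order


history = ["first message", "second message", "the latest message"]
# Using a crude chars/4 estimate as the counter, with a tiny 7-token budget:
recent = trim_history(history, max_tokens=7, count_tokens=lambda s: len(s) // 4)
print(recent)  # prints ['second message', 'the latest message']
```

The oldest message gets dropped once the budget is exhausted; with a real 128K or 200K window the same logic applies, just with far more headroom.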
Different Models for Different Needs
The big names in AI have cooked up a variety of models, each with its own strengths and price points, typically billed per million tokens (that's how the table above lists prices), with input and output tokens often priced differently.
- OpenAI GPT-4: This one's a bit of a know-it-all, great at following complex instructions and solving tough problems. It's pricier and not the fastest kid on the block. The newer GPT-4 Turbo, though, is three times cheaper on input and can handle a whopping 128K tokens at once. You can also access it through Microsoft's Azure OpenAI Service.
- OpenAI GPT-3.5 Turbo: Optimized for chat, making it a go-to for chatbots and conversational interfaces. It's speedy and won't break the bank. Also available through Microsoft's Azure OpenAI Service.
- Anthropic's Claude 3: Known for its impressive 200K-token context length, making it a champ at summarizing or handling Q&As on hefty documents. The trade-off? The top-end models are on the slower and pricier side.
- Llama 2: Meta's gift to the world, Llama 2 is an open-source model that's pretty much on par with GPT-3.5 Turbo in performance and can even give GPT-4 a run for its money in English text summarization, at 30x less cost. The catch? It's English-only.
- Gemini: Google's latest, split into Gemini Ultra, Gemini Pro, and Gemini Nano, announced on December 6, 2023. Gemini Ultra is eyeing the throne currently held by OpenAI's GPT-4, while Gemini Pro is more akin to GPT-3.5 in performance.
- PaLM 2: An older model from Google that shines in multilingual, reasoning, and coding tasks. Trained on text in over 100 languages, it's a whiz at navigating complex language nuances and boasts impressive logic and coding skills.
- Mistral: A newcomer on the scene, Mistral AI has released some nifty open-source models that are both fast and affordable. Mistral 7B and Mixtral 8x7B are standout options, offering performance comparable to GPT-3.5 Turbo at 2.5x less cost. Mistral Large, though proprietary, is showing promise in reasoning tasks across several languages.
- DBRX: A general-purpose LLM created by Databricks. Across a range of standard benchmarks, DBRX sets a new state of the art for established open LLMs, surpassing GPT-3.5 and competing with Gemini 1.0 Pro by Databricks' measurements, and it gives the open community and enterprises building their own LLMs capabilities previously limited to closed model APIs. It's an especially capable code model, beating specialized models like CodeLLaMA-70B on programming benchmarks while remaining strong as a general-purpose LLM.
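To make the per-million-token pricing concrete, here's a small cost estimator seeded with a few rates from the table above. The model choice and call sizes in the example are purely illustrative:

```python
# Prices in USD per 1M tokens, taken from the pricing table above.
PRICES = {
    "gpt-4o":         {"input": 5.00, "output": 15.00},
    "gpt-4o-mini":    {"input": 0.15, "output": 0.60},
    "claude-3-haiku": {"input": 0.25, "output": 1.25},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call: tokens times the per-token rate,
    with input and output priced separately."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# A 2,000-token prompt with a 500-token reply on gpt-4o-mini:
print(f"${estimate_cost('gpt-4o-mini', 2000, 500):.6f}")  # prints $0.000600
```

Swapping the same call onto gpt-4o makes it roughly 30x more expensive, which is exactly why it pays to match the model to the task rather than defaulting to the flagship.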
And there you have it—a whirlwind tour of the LLM pricing landscape. Whether you're building the next great app or just dabbling in AI, there's a model out there that fits the bill. Happy coding!