Cache System
The cache feature is highly recommended. ProxAI provides a way to cache provider API calls, which is very useful for speeding up repeated calls and reducing costs.
- By default, the cache feature is disabled.
Simple Cache Usage
Setting the cache_path parameter in the px.connect() function enables the cache.
px.connect(cache_path='~/proxai-cache')
- This will create the necessary cache files in the ~/proxai-cache directory.
Let’s start with a simple example. We want to check the square root of the numbers from 1 to 10 with the following code:
import proxai as px
import datetime

px.connect(cache_path='~/proxai-cache')

for i in range(1, 11):
    start_time = datetime.datetime.now()
    answer = px.generate_text(
        prompt=f'What is the square root of {i}?')
    end_time = datetime.datetime.now()
    print(f'{(end_time - start_time).total_seconds()} seconds: {answer}')
We can run this script in the shell and see the output:
$ python ai_square_root_experiment.py
0.590122 seconds: The square root of 1 is 1.
0.375481 seconds: The square root of 2 is approximately 1.41421356.
0.425706 seconds: The square root of 3 is approximately 1.732.
0.462752 seconds: The square root of 4 is 2.
0.359494 seconds: The square root of 5 is approximately 2.2360679775.
0.374919 seconds: The square root of 6 is approximately 2.449.
0.405291 seconds: The square root of 7 is approximately 2.646.
0.31904 seconds: The square root of 8 is approximately 2.83.
0.328825 seconds: The square root of 9 is 3.
0.372298 seconds: The square root of 10 is approximately 3.16227766017.
This code makes 10 API calls to the provider and caches the results. Now, let’s say we want to run the same experiment but with 20 numbers, from 1 to 20.
...
for i in range(1, 21):
...
$ python ai_square_root_experiment.py
0.001944 seconds: The square root of 1 is 1.
0.001516 seconds: The square root of 2 is approximately 1.41421356.
0.001444 seconds: The square root of 3 is approximately 1.732.
0.001411 seconds: The square root of 4 is 2.
0.001459 seconds: The square root of 5 is approximately 2.2360679775.
0.001516 seconds: The square root of 6 is approximately 2.449.
0.001484 seconds: The square root of 7 is approximately 2.646.
0.001488 seconds: The square root of 8 is approximately 2.83.
0.001566 seconds: The square root of 9 is 3.
0.001579 seconds: The square root of 10 is approximately 3.16227766017.
0.457652 seconds: The square root of 11 is approximately 3.31662479036.
0.300274 seconds: The square root of 12 is approximately 3.464.
0.409981 seconds: The square root of 13 is approximately 3.60555.
0.281187 seconds: The square root of 14 is approximately 3.74.
0.3315 seconds: The square root of 15 is approximately 3.87298.
0.321044 seconds: The square root of 16 is 4.
0.394377 seconds: The square root of 17 is approximately 4.123105625617661.
0.819696 seconds: The square root of 18 is approximately 4.2426.
0.411953 seconds: The square root of 19 is approximately 4.36.
0.305057 seconds: The square root of 20 is approximately 4.47.
This time, the code gets the results for the numbers from 1 to 10 from the cache. The results for the numbers from 11 to 20 are computed by the provider.
Generate Text Cache Options
The px.generate_text() function has two cache-related parameters (see all parameters in Generate Text):
- use_cache: By default, this parameter is set to True. If set to False, the function will not use the cache and will always make the provider API call.
- unique_response_limit: This parameter overrides the unique_response_limit parameter in the px.connect() function. See the Unique Response Limit section below for more details. It has the same logic for unique responses, but it overrides and sets the limit for this specific px.generate_text() call.
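For instance, both parameters can be set per call. This is an illustrative sketch; it assumes px.connect() has already been configured with valid provider credentials, and the prompt and limit values are arbitrary:

```python
import proxai as px

px.connect(cache_path='~/proxai-cache')

# Bypass the cache entirely for this one call; the provider is always hit.
fresh = px.generate_text(
    prompt='Give me a random number between 1 and 1000.',
    use_cache=False)

# Allow up to 3 distinct cached responses for this specific query,
# overriding whatever unique_response_limit px.connect() configured.
varied = px.generate_text(
    prompt='Give me a random number between 1 and 1000.',
    unique_response_limit=3)
```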
Advanced Cache Options
For more control over the cache, you can use the px.connect() function with the cache_options parameter.
px.connect(
    cache_path='~/proxai-cache',
    cache_options=px.CacheOptions(
        unique_response_limit=3,
        retry_if_error_cached=True,
        clear_query_cache_on_connect=True,
        clear_model_cache_on_connect=True,
        model_cache_duration=1200))
Query Signature
Before moving on to the options, let’s understand how the cache works. Each provider query has a unique signature. This signature is generated from the parameters of the px.generate_text() function call, such as prompt, model, temperature, etc. If the same query is made with different parameters, the signature will be different. This signature is used to identify the query in the cache.
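ProxAI’s actual signature format is internal, but the idea can be sketched in plain Python: serialize the call parameters deterministically and hash them, so any change in prompt, model, or temperature yields a different key. The function below is an illustrative stand-in, not ProxAI’s implementation:

```python
import hashlib
import json

def query_signature(**params):
    # Sort keys so identical parameter sets always serialize the same way.
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()

sig_a = query_signature(prompt='What is 2+2?', model='gpt-4', temperature=0.0)
sig_b = query_signature(prompt='What is 2+2?', model='gpt-4', temperature=0.7)
sig_c = query_signature(temperature=0.0, model='gpt-4', prompt='What is 2+2?')

print(sig_a != sig_b)  # different temperature -> different signature
print(sig_a == sig_c)  # same parameters in any order -> same signature
```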
Unique Response Limit
By default, the cache stores the response of a query immediately under its query signature. If the same query is made again, the cache returns the stored response.
However, if the unique_response_limit parameter is set, the cache will store multiple responses for the same query. The number of responses is limited by the unique_response_limit parameter. This allows the user to make the same query multiple times and get different responses. After the limit is reached, the responses are returned in a round-robin manner.
px.connect(
    cache_path='~/proxai-cache',
    cache_options=px.types.CacheOptions(unique_response_limit=5))

for i in range(15):
    answer = px.generate_text(
        prompt='Give me a random number between 1 and 1000.')
    print(f'{i}: {answer}')
$ python ai_unique_response_limit.py
0: 473
1: 553
2: 764
3: 417
4: 567
5: 473
6: 553
7: 764
8: 417
9: 567
10: 473
11: 553
12: 764
13: 417
14: 567
The first 5 responses come from actual API calls. The rest of the responses come from the cache in a round-robin manner.
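The round-robin behavior described above can be modeled with a small sketch (not ProxAI’s internal implementation): store responses until the limit is reached, then replay them in order.

```python
class RoundRobinCache:
    """Stores up to `limit` responses per signature, then replays them in order."""

    def __init__(self, limit):
        self.limit = limit
        self.responses = {}   # signature -> list of stored responses
        self.cursor = {}      # signature -> next index to replay

    def get_or_call(self, signature, call_provider):
        stored = self.responses.setdefault(signature, [])
        if len(stored) < self.limit:
            # Below the limit: make a real provider call and store the result.
            response = call_provider()
            stored.append(response)
            return response
        # Limit reached: replay stored responses round-robin.
        i = self.cursor.get(signature, 0)
        self.cursor[signature] = (i + 1) % self.limit
        return stored[i]

# Simulate 7 identical queries with a limit of 3 "provider calls".
counter = iter(range(100))
cache = RoundRobinCache(limit=3)
results = [cache.get_or_call('sig', lambda: next(counter)) for _ in range(7)]
print(results)  # first 3 are fresh, the rest cycle: [0, 1, 2, 0, 1, 2, 0]
```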
Retry If Error Cached
Sometimes, providers return an error. This “query, error” pair is also stored in the cache. When the same query with the same signature is made again, the cache returns the metadata and error of the last query. However, if the retry_if_error_cached parameter is set to True, the cache will retry the query and make the provider API call.
Let’s say we run the following code and it gives us a budget-exceeded error from the provider.
px.connect(cache_path='~/proxai-cache')
px.generate_text(prompt='What is the square root of 100?')
$ python ai_retry_if_error_cached.py
QueryResponseRecord(
    error=(
        'Error: Budget exceeded. Please login to your account and increase '
        'your budget.'))
Let’s say we log in to our account and increase the budget. Now, if we run the same query again, the cache will still return the error and metadata of the last query without making the provider API call.
px.connect(cache_path='~/proxai-cache')
px.generate_text(prompt='What is the square root of 100?')
$ python ai_retry_if_error_cached.py
QueryResponseRecord(
    error=(
        'Error: Budget exceeded. Please login to your account and increase '
        'your budget.'))
To fix this, we can set the retry_if_error_cached parameter to True. This makes the cache retry the query against the provider and return the result.
px.connect(
    cache_path='~/proxai-cache',
    cache_options=px.types.CacheOptions(retry_if_error_cached=True))
px.generate_text(prompt='What is the square root of 100?')
$ python ai_retry_if_error_cached.py
The square root of 100 is 10.
Note: The reason this flag is set to False by default is that it is very common to get errors from some providers or models, and the user often just wants to skip these for now. Retrying broken queries again and again during the experimentation phase can be very annoying.
Model Cache vs Query Cache
ProxAI provides two distinct types of caching mechanisms:
- Query Cache: This caches the results of AI queries made through px.generate_text() function calls.
  - Stores prompts, responses, and query parameters
  - Helps reduce repeated API calls for identical queries
  - Controlled by options like unique_response_limit and clear_query_cache_on_connect
  - Directly impacts your costs by preventing duplicate provider API calls
- Model Cache: This caches the availability and metadata of models from different providers.
  - Stores which models are available and which have failed
  - Helps speed up functions like px.models.list_models() and px.models.list_providers()
  - Controlled by options like disable_model_cache, clear_model_cache_on_connect, and model_cache_duration
  - Prevents unnecessary checks to provider APIs during model discovery
For example, if you need to regularly check available models but have changed your API keys:
# Clear the model cache to force a fresh check of available models
px.connect(
    cache_path='~/proxai-cache',
    cache_options=px.CacheOptions(clear_model_cache_on_connect=True))

# Now this will re-evaluate all models with the new API keys
available_models = px.models.list_models()
You can also set a duration for how long the model cache should be valid:
# Set model cache to expire after 20 minutes (1200 seconds)
px.connect(
    cache_path='~/proxai-cache',
    cache_options=px.CacheOptions(model_cache_duration=1200))
Note: The model cache works even without specifying a cache_path, while the query cache requires a cache_path to be set.
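The duration check can be pictured as a simple timestamp comparison. This is an illustrative sketch of the expiry logic, not ProxAI’s internals:

```python
import time

class ModelCacheEntry:
    """Caches a model list with a timestamp; stale entries force a re-check."""

    def __init__(self, models, duration_seconds):
        self.models = models
        self.duration = duration_seconds
        self.fetched_at = time.monotonic()

    def is_fresh(self):
        # Fresh only while the entry is younger than model_cache_duration.
        return (time.monotonic() - self.fetched_at) < self.duration

entry = ModelCacheEntry(models=['provider-a/model-x'], duration_seconds=1200)
print(entry.is_fresh())  # True: well within the 20-minute window

entry.fetched_at -= 1300  # pretend the entry is 1300 seconds old
print(entry.is_fresh())  # False: older than model_cache_duration
```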
Cache Options Parameters
| Option | Type | Default Value | Description |
|---|---|---|---|
| cache_path | Optional[str] | None | Path of the root cache directory. If cache_path is not set in px.connect(), this path is used as the root cache path. cache_path in px.connect() and cache_path in px.CacheOptions() cannot both be set at the same time; an error is raised if both are set. |
| unique_response_limit | Optional[int] | 1 | Maximum number of unique responses to store for the same query. |
| retry_if_error_cached | Optional[bool] | False | If True, the cache retries the queries for previously returned errors. |
| clear_query_cache_on_connect | Optional[bool] | False | If True, clears all cached queries when connecting. Useful for starting with a fresh, empty cache. |
| disable_model_cache | Optional[bool] | False | If True, disables the model cache. Forces px.models.list_models() to re-evaluate all providers and models on the next call. |
| clear_model_cache_on_connect | Optional[bool] | False | If True, clears the model cache when connecting. Forces px.models.list_models() to re-evaluate all providers and models on the next call. |
| model_cache_duration | Optional[int] | None | Duration of the model cache in seconds. If None, the model cache will be disabled. |
Privacy
To use caching, queries and responses need to be stored in the cache directory. This means the cache directory will contain sensitive information, such as the prompt, response, etc. from your most recent queries.
px.connect(cache_path='~/proxai-cache')
When setting the cache_path parameter, be careful about the privacy and security of your cache directory.
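One way to reduce exposure is to keep the cache directory readable only by your own user account. This is a general filesystem precaution, not a ProxAI feature, and the permission bits apply on POSIX systems:

```python
import os
import stat

cache_dir = os.path.expanduser('~/proxai-cache')
os.makedirs(cache_dir, exist_ok=True)

# Restrict the directory to the owner only (rwx------).
os.chmod(cache_dir, stat.S_IRWXU)

mode = stat.S_IMODE(os.stat(cache_dir).st_mode)
print(oct(mode))  # 0o700 on POSIX systems
```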