Cache System

⭐️ The cache feature is highly recommended.

ProxAI provides a way to cache API calls. This is very useful for speeding up provider API calls and reducing costs.

  • By default, the cache feature is disabled.

Simple Cache Usage

Setting the cache_path parameter in the px.connect() function enables the cache.

px.connect(cache_path='~/proxai-cache')
  • This will create necessary cache files in the ~/proxai-cache directory.

Let’s start with a simple example. We want to check the square root of the numbers from 1 to 10 with the following code:

import proxai as px
import datetime
 
px.connect(cache_path='~/proxai-cache')
 
for i in range(1, 11):
  start_time = datetime.datetime.now()
  answer = px.generate_text(
      prompt=f'What is the square root of {i}?')
  end_time = datetime.datetime.now()
  print(f'{(end_time - start_time).total_seconds()} seconds: {answer}')

We can run this script in a shell and see the output:

$ python ai_square_root_experiment.py
0.590122 seconds: The square root of 1 is 1.
0.375481 seconds: The square root of 2 is approximately 1.41421356.
0.425706 seconds: The square root of 3 is approximately 1.732.
0.462752 seconds: The square root of 4 is 2.
0.359494 seconds: The square root of 5 is approximately 2.2360679775.
0.374919 seconds: The square root of 6 is approximately 2.449.
0.405291 seconds: The square root of 7 is approximately 2.646.
0.31904 seconds: The square root of 8 is approximately 2.83.
0.328825 seconds: The square root of 9 is 3.
0.372298 seconds: The square root of 10 is approximately 3.16227766017.

This code makes 10 API calls to the provider and caches the results. Now, let’s say we want to run the same experiment but with 20 numbers, from 1 to 20.

...
for i in range(1, 21):
...
$ python ai_square_root_experiment.py
0.001944 seconds: The square root of 1 is 1.
0.001516 seconds: The square root of 2 is approximately 1.41421356.
0.001444 seconds: The square root of 3 is approximately 1.732.
0.001411 seconds: The square root of 4 is 2.
0.001459 seconds: The square root of 5 is approximately 2.2360679775.
0.001516 seconds: The square root of 6 is approximately 2.449.
0.001484 seconds: The square root of 7 is approximately 2.646.
0.001488 seconds: The square root of 8 is approximately 2.83.
0.001566 seconds: The square root of 9 is 3.
0.001579 seconds: The square root of 10 is approximately 3.16227766017.
0.457652 seconds: The square root of 11 is approximately 3.31662479036.
0.300274 seconds: The square root of 12 is approximately 3.464.
0.409981 seconds: The square root of 13 is approximately 3.60555.
0.281187 seconds: The square root of 14 is approximately 3.74.
0.3315 seconds: The square root of 15 is approximately 3.87298.
0.321044 seconds: The square root of 16 is 4.
0.394377 seconds: The square root of 17 is approximately 4.123105625617661.
0.819696 seconds: The square root of 18 is approximately 4.2426.
0.411953 seconds: The square root of 19 is approximately 4.36.
0.305057 seconds: The square root of 20 is approximately 4.47.

This time, the code gets the results for the numbers from 1 to 10 from the cache. The results for the numbers from 11 to 20 are fetched from the provider.

Generate Text Cache Options

The px.generate_text() function has two cache related parameters (see all parameters in Generate Text):

  • use_cache: By default, this parameter is set to True. If set to False, the function will not use the cache and always make the provider API call.
  • unique_response_limit: Overrides the unique_response_limit parameter set in px.connect() for this specific px.generate_text() call. It follows the same logic for unique responses; see the Unique Response Limit section below for more details.

Advanced Cache Options

For more control over the cache, you can use the px.connect() function with the cache_options parameter.

px.connect(
  cache_path='~/proxai-cache',
  cache_options=px.CacheOptions(
      unique_response_limit=3,
      retry_if_error_cached=True,
      clear_query_cache_on_connect=True,
      clear_model_cache_on_connect=True,
      model_cache_duration=1200))

Query Signature

Before moving on to the options, let’s understand how the cache works. Each provider query has a unique signature. This signature is generated from the parameters of the px.generate_text() call, such as prompt, model, and temperature. If the same query is made with different parameters, the signature will be different. This signature is used to identify the query in the cache.
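Conceptually, this works like hashing a deterministic serialization of the call parameters. The sketch below is purely illustrative (it is not ProxAI’s actual implementation), but it shows why identical parameters map to the same cache entry while any change produces a new one:

```python
import hashlib
import json

def query_signature(prompt, model=None, temperature=None):
    # Illustrative only: serialize the call parameters deterministically,
    # then hash them into a stable cache key.
    payload = json.dumps(
        {'prompt': prompt, 'model': model, 'temperature': temperature},
        sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

sig_a = query_signature('What is the square root of 2?')
sig_b = query_signature('What is the square root of 2?')
sig_c = query_signature('What is the square root of 2?', temperature=0.7)
print(sig_a == sig_b)  # True: identical parameters, identical signature
print(sig_a == sig_c)  # False: any parameter change yields a new signature
```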

Unique Response Limit

By default, the cache stores the response of a query under its query signature. If the same query is made again, the cache returns the stored response.

However, if the unique_response_limit parameter is set, the cache stores multiple responses for the same query, up to the limit. This allows the user to make the same query multiple times and get different responses. After the limit is reached, the cached responses are returned in a round-robin manner.

px.connect(
  cache_path='~/proxai-cache',
  cache_options=px.types.CacheOptions(unique_response_limit=5))
 
for i in range(15):
  answer = px.generate_text(
      prompt='Give me a random number between 1 and 1000.')
  print(f'{i}: {answer}')
$ python ai_unique_response_limit.py
0: 473
1: 553
2: 764
3: 417
4: 567
5: 473
6: 553
7: 764
8: 417
9: 567
10: 473
11: 553
12: 764
13: 417
14: 567

The first 5 responses come from actual API calls. The rest of the responses are served from the cache in a round-robin manner.
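The behavior above can be pictured with a small simulation. This is an illustrative sketch, not ProxAI’s implementation: a cache fills up to the limit with real calls, then cycles through the stored responses:

```python
import itertools

class RoundRobinCache:
    """Illustrative sketch of unique_response_limit behavior."""
    def __init__(self, unique_response_limit):
        self.limit = unique_response_limit
        self.responses = {}   # signature -> list of cached responses
        self.cursor = {}      # signature -> next index to return

    def get_or_call(self, signature, provider_call):
        cached = self.responses.setdefault(signature, [])
        if len(cached) < self.limit:
            cached.append(provider_call())  # still filling: real API call
            return cached[-1]
        # Limit reached: cycle through stored responses round-robin.
        i = self.cursor.get(signature, 0)
        self.cursor[signature] = (i + 1) % self.limit
        return cached[i]

# Simulated provider that returns a new number on every real call.
counter = itertools.count(1)
cache = RoundRobinCache(unique_response_limit=5)
results = [cache.get_or_call('random-number', lambda: next(counter))
           for _ in range(15)]
print(results)  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
```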

Retry If Error Cached

Sometimes, providers return an error. This “query, error” pair is also stored in the cache. When a query with the same signature is made again, the cache returns the metadata and error of the last attempt. However, if the retry_if_error_cached parameter is set to True, the cache retries the query with a fresh provider API call.

Let’s say we run the following code and get a budget-exceeded error from the provider.

px.connect(cache_path='~/proxai-cache')
px.generate_text(prompt='What is the square root of 100?')
$ python ai_retry_if_error_cached.py
QueryResponseRecord(
  error=(
    'Error: Budget exceeded. Please login to your account and increase '
    'your budget.'))

Let’s say we log in to our account and increase the budget. If we now run the same query again, the cache still returns the cached error and metadata of the last attempt without making a provider API call.

px.connect(cache_path='~/proxai-cache')
px.generate_text(prompt='What is the square root of 100?')
$ python ai_retry_if_error_cached.py
QueryResponseRecord(
  error=(
    'Error: Budget exceeded. Please login to your account and increase '
    'your budget.'))

To fix this, we can set the retry_if_error_cached parameter to True. The cache then retries the query against the provider and returns the result.

px.connect(
  cache_path='~/proxai-cache',
  cache_options=px.types.CacheOptions(retry_if_error_cached=True))
 
px.generate_text(prompt='What is the square root of 100?')
$ python ai_retry_if_error_cached.py
The square root of 100 is 10.

Note: This flag is False by default because it is very common for some providers or models to return errors that the user simply wants to skip for now. Retrying broken queries over and over during the experimentation phase can be very annoying.

Model Cache vs Query Cache

ProxAI provides two distinct types of caching mechanisms:

  1. Query Cache: This caches the results of AI queries made through px.generate_text() function calls.

    • Stores prompts, responses, and query parameters
    • Helps reduce repeated API calls for identical queries
    • Controlled by options like unique_response_limit and clear_query_cache_on_connect
    • Directly impacts your costs by preventing duplicate provider API calls
  2. Model Cache: This caches the availability and metadata of models from different providers.

    • Stores which models are available and which have failed
    • Helps speed up functions like px.models.list_models() and px.models.list_providers()
    • Controlled by options like disable_model_cache, clear_model_cache_on_connect, and model_cache_duration
    • Prevents unnecessary checks to provider APIs during model discovery

For example, if you need to regularly check available models but have changed your API keys:

# Clear the model cache to force a fresh check of available models
px.connect(
  cache_path='~/proxai-cache',
  cache_options=px.CacheOptions(clear_model_cache_on_connect=True))
 
# Now this will re-evaluate all models with new API keys
available_models = px.models.list_models()

You can also set a duration for how long the model cache should be valid:

# Set model cache to expire after 20 minutes (1200 seconds)
px.connect(
  cache_path='~/proxai-cache',
  cache_options=px.CacheOptions(model_cache_duration=1200))

Note: Model cache works even without specifying a cache_path, while query cache requires a cache_path to be set.
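The model_cache_duration option can be pictured as a time-based expiry check. The sketch below is illustrative only (not ProxAI’s implementation); it swaps the real clock for a fake one so the expiry logic is easy to follow:

```python
import time

class ModelCache:
    """Illustrative sketch of model_cache_duration: cached model lists
    expire after `duration` seconds and are then re-fetched."""
    def __init__(self, duration, clock=time.time):
        self.duration = duration
        self.clock = clock
        self._models = None
        self._stored_at = None

    def list_models(self, refresh_fn):
        now = self.clock()
        expired = (self._models is None
                   or now - self._stored_at > self.duration)
        if expired:
            self._models = refresh_fn()   # re-check the providers
            self._stored_at = now
        return self._models

fake_now = [0.0]
cache = ModelCache(duration=1200, clock=lambda: fake_now[0])
calls = []
def refresh():
    calls.append(fake_now[0])
    return ['provider-a/model-x']

cache.list_models(refresh)      # first call: hits the providers
fake_now[0] = 600
cache.list_models(refresh)      # within 20 minutes: served from cache
fake_now[0] = 1800
cache.list_models(refresh)      # cache expired: hits the providers again
print(calls)  # [0.0, 1800.0]
```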

Cache Options Parameters

px.types.CacheOptions():

  • cache_path (Optional[str], default: None): Path of the root cache directory.
      • If cache_path is not set in px.connect(), this path is used as the root cache path.
      • cache_path cannot be set in both px.connect() and px.CacheOptions() at the same time; doing so raises an error.
  • unique_response_limit (Optional[int], default: 1): Maximum number of unique responses to store for the same query.
  • retry_if_error_cached (Optional[bool], default: False): If True, the cache retries queries that previously returned errors.
  • clear_query_cache_on_connect (Optional[bool], default: False): If True, clears all cached queries when connecting. Useful for starting with a fresh, empty cache.
  • disable_model_cache (Optional[bool], default: False): If True, disables the model cache. Forces px.models.list_models() to re-evaluate all providers and models on the next call.
  • clear_model_cache_on_connect (Optional[bool], default: False): If True, clears the model cache when connecting. Forces px.models.list_models() to re-evaluate all providers and models on the next call.
  • model_cache_duration (Optional[int], default: None): Duration of the model cache in seconds. If None, the model cache will be disabled.

Privacy

To use caching, queries and responses must be stored in the cache directory. This means the cache directory contains sensitive information, such as the prompts and responses from your most recent queries.

px.connect(cache_path='~/proxai-cache')

When setting the cache_path parameter, be mindful of the privacy and security of your cache directory.