
ACP Performance Benchmarks: Sub-200ms Across 10K Storefronts

We benchmarked the ACP query pipeline end-to-end across 10,000 storefronts. The result: p99 latencies under 200ms for product discovery, the most latency-sensitive operation in agent commerce. This post details the benchmark methodology, the architectural decisions that make these numbers possible, and the tradeoffs we made along the way.

Why latency matters in agent commerce

Agent commerce operates within conversational latency budgets. When a user asks an AI agent to find a product, the agent has roughly 2-3 seconds to query storefronts, rank results, and compose a natural language response. The commerce query is one step in a multi-step pipeline that includes intent parsing, query formulation, result ranking, and response generation.

If the commerce query takes 500ms, that is 20-25% of the total latency budget consumed by a single step. At 1 second, the agent either exceeds its latency budget or has to cut corners on other steps like result ranking and response quality. At 2 seconds, the user experience is noticeably degraded.

Our target was p99 under 200ms for the full query pipeline: receive an agent's query, fan out to relevant storefronts, collect responses, merge results, and return a ranked product list. This gives agents 90% of their latency budget for everything else.

Benchmark methodology

Test environment

The benchmark ran against ACP's production infrastructure during a representative traffic period (Tuesday 2:00-4:00 PM UTC, our peak traffic window). We did not use synthetic benchmarks or isolated test environments - these numbers reflect real production conditions.

Query corpus

We used 10,000 unique product discovery queries sampled from actual agent traffic. Queries were deduplicated and anonymized. The distribution matched production patterns:

  • 42% broad category queries ("running shoes under $120")
  • 31% specific product queries ("Nike Pegasus 41 size 10")
  • 18% attribute-filtered queries ("wireless headphones with ANC, battery life > 30 hours")
  • 9% comparison queries ("compare AirPods Pro 3 vs Sony WF-1000XM6")
Storefront coverage

All 10,000 registered storefronts were included. Storefront sizes range from 50 products (small specialty brands) to 2.3 million products (large marketplaces). The median storefront has 4,200 products.

Measurement

Latency was measured from the moment ACP's edge proxy received the agent's HTTP request to the moment the complete HTTP response was sent. This includes TLS termination, request parsing, storefront fan-out, response aggregation, ranking, and serialization. It does not include network transit time between the agent and ACP's edge, which varies by geography.
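For readers who want to replicate the measurement boundary, here is a minimal Go sketch of handler-level timing. The endpoint path and operation name are hypothetical, and a production probe would hook in ahead of TLS termination rather than inside the handler:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// measureLatency wraps a handler and times the window from request receipt
// to completion of the response write: parsing, fan-out, aggregation,
// ranking, and serialization all fall inside it. TLS termination happens
// before a handler runs, so this sketch under-counts slightly relative to
// the benchmark's edge-proxy measurement point.
func measureLatency(op string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		// A real pipeline would feed a latency histogram; logging keeps
		// the sketch self-contained.
		log.Printf("op=%s latency=%s", op, time.Since(start))
	})
}

func main() {
	discovery := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"products":[]}`)) // placeholder discovery handler
	})
	// "/v1/discovery" is a hypothetical path, not a documented ACP route.
	http.Handle("/v1/discovery", measureLatency("product_discovery", discovery))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```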

Results

Product discovery (the primary benchmark)

| Percentile | Latency |
|------------|---------|
| p50 | 47ms |
| p75 | 82ms |
| p90 | 134ms |
| p95 | 162ms |
| p99 | 191ms |
| p99.9 | 287ms |

The p99 of 191ms meets our sub-200ms target. The p50 of 47ms means that half of all queries resolve in under 50ms - fast enough that the commerce query is negligible in the agent's overall latency budget.

Other operations

| Operation | p50 | p95 | p99 |
|-----------|-----|-----|-----|
| Product discovery | 47ms | 162ms | 191ms |
| Storefront trust score | 8ms | 15ms | 22ms |
| Negotiation create | 89ms | 198ms | 274ms |
| Negotiation respond | 72ms | 165ms | 231ms |
| Transaction create | 112ms | 224ms | 298ms |
| Webhook delivery | 34ms | 89ms | 142ms |

Transaction creation is the slowest operation because it involves payment authorization, which requires a synchronous call to the payment processor. The 298ms p99 is well within acceptable bounds for a non-conversational operation (users are not waiting in real time for transaction creation to complete).

How we achieved these numbers

Tiered caching

ACP uses a three-tier caching architecture:

  • L1: Edge cache. Deployed at 28 points of presence (PoPs) globally. Handles exact-match queries with a TTL of 60 seconds. Cache hit rate: 34%. When an agent asks the exact same query that another agent asked 30 seconds ago, the response is served from the edge in under 5ms.
  • L2: Regional cache. Deployed in 6 regional data centers. Handles semantic query matching - queries that are not exact matches but would produce the same result set. For example, "running shoes under $120" and "running shoes, max price 120 USD" map to the same canonical query (a sketch of that canonical form follows this list). Cache hit rate: 28%. Responses served from L2 in under 15ms.
  • L3: Storefront response cache. Caches individual storefront responses (before aggregation and ranking). When a query fans out to 50 storefronts and 45 of them have cached responses, only 5 require live API calls. Cache hit rate: 89% at the individual storefront level.
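To make the L2 matching concrete, here is a minimal sketch of a canonical query form. The struct, field names, and key format are assumptions for illustration, not the ACP schema - the point is only that different phrasings collapse to one cache key:

```go
package main

import (
	"fmt"
	"strings"
)

// CanonicalQuery is an illustrative normalized query form. Two surface
// queries that normalize to the same CanonicalQuery share one L2 entry.
// Field names are assumptions for this sketch, not the ACP schema.
type CanonicalQuery struct {
	CategoryPath  string // e.g. "footwear/running"
	MaxPriceCents int    // price ceiling in minor units
	Currency      string
}

// Key renders a stable L2 cache key for the canonical form.
func (q CanonicalQuery) Key() string {
	return strings.ToLower(fmt.Sprintf("%s|max=%d|%s", q.CategoryPath, q.MaxPriceCents, q.Currency))
}

func main() {
	// "running shoes under $120" and "running shoes, max price 120 USD"
	// are parsed upstream into the same struct, hence the same cache key.
	a := CanonicalQuery{"footwear/running", 12000, "USD"}
	b := CanonicalQuery{"footwear/running", 12000, "USD"}
	fmt.Println(a.Key(), a.Key() == b.Key()) // identical keys -> one cache entry
}
```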

The combined effect: 34% of queries are served entirely from L1. Of the remaining 66%, about 28% are served from L2. Of the ~48% of queries that reach L3, 89% of individual storefront lookups are served from cache. The actual fan-out to live storefront APIs happens for only a small fraction of the total query volume.
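Put together, the lookup path behaves roughly like the sketch below, where plain maps stand in for the three tiers and TTLs, eviction, and concurrency are elided:

```go
package main

import "fmt"

// Result stands in for a serialized product list. All names here are
// assumptions for this sketch, not ACP internals.
type Result string

type tieredCache struct {
	l1 map[string]Result // edge: exact match, 60s TTL (TTL elided here)
	l2 map[string]Result // regional: keyed by canonical query
	l3 map[string]Result // per-storefront responses, keyed storefront+query
}

// lookup walks the tiers in order. Only queries that miss L1 and L2 fan
// out at all, and only storefronts that also miss L3 trigger live calls.
func (c *tieredCache) lookup(key string, storefronts []string, live func(string) Result) Result {
	if r, ok := c.l1[key]; ok {
		return r
	}
	if r, ok := c.l2[key]; ok {
		return r
	}
	var merged Result
	for _, s := range storefronts {
		sk := s + "|" + key
		r, ok := c.l3[sk]
		if !ok {
			r = live(s) // live API call for the few L3 misses
			c.l3[sk] = r
		}
		merged += r + " "
	}
	c.l1[key], c.l2[key] = merged, merged // warm the upper tiers
	return merged
}

func main() {
	c := &tieredCache{map[string]Result{}, map[string]Result{}, map[string]Result{}}
	out := c.lookup("footwear/running|max=12000|usd",
		[]string{"store-a", "store-b"},
		func(s string) Result { return Result(s + ":pegasus-41") })
	fmt.Println(out) // a second call with the same key would hit L1
}
```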

Adaptive fan-out

Not every query needs to reach every storefront. ACP's query planner determines which storefronts are relevant to a given query and only fans out to those.

The query planner uses three signals:

  • Category affinity. Each storefront declares which product categories it carries. A query for "running shoes" skips storefronts that only sell electronics. This typically reduces fan-out from 10,000 to 200-500 storefronts.
  • Historical relevance. The query planner tracks which storefronts have returned results for similar queries in the past. Storefronts that have never returned relevant results for a query pattern are deprioritized (but not eliminated - they are checked periodically to detect catalog changes).
  • Trust score threshold. If the agent specifies a minimum trust score (common for premium agents), storefronts below the threshold are excluded from fan-out entirely. This is a pure optimization - it reduces fan-out without affecting result quality, because the agent would have filtered those results anyway.

After these filters, a typical query fans out to 50-200 storefronts instead of 10,000. This reduces both latency (fewer responses to wait for) and cost (fewer API calls to process).
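A simplified version of those filters might look like the following; the Storefront fields are illustrative assumptions, and ordering by historical relevance is elided:

```go
package main

import "fmt"

// Storefront carries the three planner signals. Field names are
// illustrative assumptions, not the ACP registration schema.
type Storefront struct {
	ID         string
	Categories map[string]bool // declared category affinity
	Relevance  float64         // historical relevance for this query pattern
	TrustScore float64
}

// planFanOut applies category affinity and the agent's optional trust
// floor as hard filters. Ordering (and periodic re-probing) by the
// Relevance signal is elided here.
func planFanOut(all []Storefront, category string, minTrust float64) []Storefront {
	var out []Storefront
	for _, s := range all {
		if !s.Categories[category] {
			continue // wrong category: never fan out
		}
		if s.TrustScore < minTrust {
			continue // the agent would filter these results anyway
		}
		out = append(out, s)
	}
	return out
}

func main() {
	all := []Storefront{
		{ID: "runfast", Categories: map[string]bool{"footwear": true}, TrustScore: 0.9},
		{ID: "gadgets", Categories: map[string]bool{"electronics": true}, TrustScore: 0.95},
		{ID: "cheapkicks", Categories: map[string]bool{"footwear": true}, TrustScore: 0.4},
	}
	for _, s := range planFanOut(all, "footwear", 0.7) {
		fmt.Println(s.ID) // only "runfast" survives both filters
	}
}
```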

Connection pooling

ACP maintains persistent HTTP/2 connections to every active storefront. Connection setup (DNS resolution, TCP handshake, TLS negotiation) typically adds 50-150ms of latency. By keeping connections warm, the first byte of a storefront response arrives within 5-10ms of sending the request.

The connection pool is managed per region. Each regional instance maintains up to 100 concurrent connections per storefront, with idle connections kept alive for 5 minutes. Storefronts that have not been queried in 5 minutes have their connections closed and must be re-established on the next query - this adds latency to the first query after a cold period but keeps resource usage bounded.
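Using Go's standard net/http as an analogue, a pool with these parameters can be configured as below. Our pool manager is internal, so treat this as a sketch of equivalent settings rather than our implementation; the timeouts marked as assumed are not from this post:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newStorefrontClient configures a client whose transport keeps
// connections warm, mirroring the pool parameters described above.
func newStorefrontClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			ForceAttemptHTTP2:   true,            // persistent HTTP/2 where the storefront supports it
			MaxConnsPerHost:     100,             // up to 100 concurrent connections per storefront
			MaxIdleConnsPerHost: 100,             // keep all of them warm between queries
			IdleConnTimeout:     5 * time.Minute, // close connections idle for 5 minutes
			TLSHandshakeTimeout: 2 * time.Second, // assumed bound on cold re-establishment
		},
		Timeout: 1 * time.Second, // assumed per-request ceiling, not from the post
	}
}

func main() {
	client := newStorefrontClient()
	fmt.Println("idle timeout:", client.Transport.(*http.Transport).IdleConnTimeout)
}
```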

Parallel fan-out with early termination

When ACP fans out to 100 storefronts, it does not wait for all 100 to respond. The query engine uses a parallel fan-out with early termination strategy:

  • Send requests to all relevant storefronts simultaneously.
  • As responses arrive, add products to the result buffer.
  • When the result buffer contains enough high-quality results to satisfy the query (typically 2-3x the requested limit), start a termination countdown.
  • The countdown gives remaining storefronts 50ms to respond. Any results that arrive within the countdown are included; the rest are abandoned.

This strategy means that a slow storefront does not hold up the entire query. If 95 out of 100 storefronts respond within 100ms and 5 are slow, the query completes at ~150ms instead of waiting for the slowest storefront.

Abandoned responses are still processed asynchronously and added to the L3 cache, so subsequent queries benefit from the data even though the original query did not wait for it.
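The whole strategy condenses to a select loop over a response channel plus a countdown timer. The sketch below uses the numbers from this post (a 3x buffer threshold from the 2-3x range, a 50ms grace window); everything else is illustrative:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// fanOut queries every storefront in parallel and applies the early
// termination rule: once the buffer holds 3x the requested limit,
// remaining storefronts get a 50ms grace window before being abandoned.
func fanOut(storefronts []string, query func(string) []string, limit int) []string {
	ch := make(chan []string, len(storefronts)) // buffered so stragglers never block
	for _, s := range storefronts {
		go func(s string) { ch <- query(s) }(s)
	}

	var buffer []string
	var deadline <-chan time.Time // nil channel blocks: countdown not yet started
	for pending := len(storefronts); pending > 0; {
		select {
		case products := <-ch:
			pending--
			buffer = append(buffer, products...)
			if deadline == nil && len(buffer) >= 3*limit {
				deadline = time.After(50 * time.Millisecond) // start termination countdown
			}
		case <-deadline:
			// Grace window expired: abandon stragglers. A production
			// pipeline would still drain ch asynchronously into L3.
			return buffer
		}
	}
	return buffer
}

func main() {
	stores := []string{"a", "b", "c", "d", "e", "f"}
	results := fanOut(stores, func(s string) []string {
		time.Sleep(time.Duration(rand.Intn(200)) * time.Millisecond) // simulated storefront latency
		return []string{s + ":pegasus-41"}
	}, 1)
	fmt.Println(len(results), "storefront results collected before termination")
}
```

The nil-channel trick keeps the countdown case inert until the buffer threshold is reached, so the loop only races against the clock once enough results are in hand.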

Result ranking

Once responses are collected, ACP ranks products using a lightweight scoring function that considers:

  • Relevance to the query (based on attribute matching, not text similarity)
  • Storefront trust score
  • Price competitiveness (relative to the result set average)
  • Availability confidence (in-stock, high quantity, no restock uncertainty)

The ranking function is intentionally simple - it runs in under 2ms even for 500 results. More sophisticated ranking can be layered on by the agent platform; ACP provides a reasonable default sort that agents can override.
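A default sort of this shape reduces to a weighted sum over the four signals. The weights and field names below are illustrative assumptions, not our production scorer:

```go
package main

import (
	"fmt"
	"sort"
)

// Product carries the four ranking signals. Field names are illustrative
// assumptions.
type Product struct {
	Name        string
	Relevance   float64 // attribute-match score in [0,1]
	Trust       float64 // storefront trust score in [0,1]
	Price       float64
	InStockConf float64 // availability confidence in [0,1]
}

// rank sorts by a weighted sum. Price competitiveness is computed relative
// to the result-set average, per the post; the weights are assumptions.
func rank(products []Product) {
	if len(products) == 0 {
		return
	}
	var sum float64
	for _, p := range products {
		sum += p.Price
	}
	avg := sum / float64(len(products))
	score := func(p Product) float64 {
		priceComp := avg / (p.Price + avg) // cheaper than average -> above 0.5
		return 0.4*p.Relevance + 0.25*p.Trust + 0.2*priceComp + 0.15*p.InStockConf
	}
	sort.Slice(products, func(i, j int) bool { return score(products[i]) > score(products[j]) })
}

func main() {
	ps := []Product{
		{"generic-runner", 0.70, 0.60, 59.99, 0.8},
		{"pegasus-41", 0.95, 0.90, 129.99, 1.0},
	}
	rank(ps)
	fmt.Println(ps[0].Name) // highest composite score first
}
```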

Tradeoffs we made

Every performance optimization involves tradeoffs. Here are the ones we consciously accepted:

  • Eventual consistency over strong consistency. Cached product data might be up to 60 seconds stale at L1, and up to 15 minutes stale at L3 in the worst case. For most agent commerce queries, this is acceptable. For price-sensitive operations (negotiation, transaction creation), we bypass caches and query live APIs.
  • Incomplete results over slow results. The early termination strategy means that some queries might miss products from slow storefronts. We accept this tradeoff because a fast, slightly incomplete result set is more useful to agents than a slow, complete one. Abandoned storefronts' data is still cached for future queries.
  • Memory cost over latency cost. The L3 cache holds individual storefront responses for every recent query pattern. This requires significant memory per region (approximately 120GB per regional cache). We accepted the memory cost to achieve the cache hit rates that make sub-200ms p99 possible.

Reproducing these benchmarks

We publish our benchmark methodology and a subset of our query corpus so that anyone running ACP-compatible infrastructure can reproduce similar measurements. The benchmark toolkit is available at github.com/cresva/acp-bench.

We encourage the community to run benchmarks against their own infrastructure and share results. Performance is a feature, and we want the entire ACP ecosystem to benefit from the techniques described in this post.


Questions about ACP performance? Reach out at developers@cresva.ai or join our GitHub discussions.