Anthropic uses transactions and unique constraints in the database

This is an excellent, deep-dive technical guideline on Idempotency, likely extracted from an article or a senior engineer's internal monologue (resembling the style of "The Pragmatic Programmer" or similar systems engineering deep dives). It moves beyond the simple "store the key" implementation into the complex reality of distributed systems, business logic, and recovery.

Here is a structured synthesis of the key concepts, best practices, and architectural implications found in your text:

The most critical takeaway is that idempotency is about memory, not just logic.
- Contract: An idempotent request must return the exact same result (or a documented equivalent) upon replay.
- Storage Requirement: To honor this, the server must store the original response (or enough metadata to reconstruct it), not just the request.
- The Trap: Simply returning the current state of the resource (e.g., a payment now showing SETTLED instead of PENDING) breaks the contract. Clients expecting the creation result will see a state change.
- Solution: Store the response body (or schema + data) associated with the operation ID. Treat this storage like sensitive data, not a cache.
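The trap and its fix can be sketched in a few lines. This is a minimal illustration using Python's standard `sqlite3` module, not a definitive implementation; the table names, the column names, and the `create_payment` helper are assumptions made for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE idempotency_records (
    idem_key      TEXT PRIMARY KEY,  -- client-supplied Idempotency-Key
    payment_id    TEXT NOT NULL,
    response_body TEXT NOT NULL      -- the exact JSON returned the first time
);
""")

def create_payment(idem_key: str, payment_id: str) -> dict:
    """Create a payment once; on replay, return the stored original response."""
    row = conn.execute(
        "SELECT response_body FROM idempotency_records WHERE idem_key = ?",
        (idem_key,),
    ).fetchone()
    if row is not None:
        # Replay: return what was sent originally, not the resource's current state.
        return json.loads(row[0])

    response = {"paymentId": payment_id, "status": "PENDING"}
    with conn:  # one transaction: the payment and its stored response commit together
        conn.execute("INSERT INTO payments VALUES (?, ?)", (payment_id, "PENDING"))
        conn.execute(
            "INSERT INTO idempotency_records VALUES (?, ?, ?)",
            (idem_key, payment_id, json.dumps(response)),
        )
    return response

first = create_payment("key-1", "pay_789")

# The payment settles between the original request and the retry.
with conn:
    conn.execute("UPDATE payments SET status = 'SETTLED' WHERE id = 'pay_789'")

replay = create_payment("key-1", "pay_789")  # still reports PENDING, as first did
```

The retry sees the same body as the first call even though the payment row has moved on, which is the contract described above.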

Idempotency at the API gateway (HTTP level) is insufficient if the consequences (side effects) are duplicated downstream.
- The Outbox Pattern: Always use the Outbox pattern.
  1. Insert business data and the Outbox event in the same transaction.
  2. Publish the event asynchronously.
  3. Consumers deduplicate by a unique business key (e.g., ledger_payment_pay_789), not just the event ID.
- Unique Constraints: Every consumer that performs a side effect must have a UNIQUE constraint on the business key. The database handles duplicate prevention more reliably than application code.
- Recovery Logic: If a message is delivered again because the consumer crashed, the retry should not simply redo the work. It should first check whether the entry already exists and, if it does, return success before doing anything else.
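The three outbox steps and the consumer-side UNIQUE constraint could look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in for both the producer and the consumer store; the table layout and the function names (`record_payment`, `publish_pending`, `apply_ledger_event`) are hypothetical, and a real system would publish through a message broker rather than a direct function call.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE payments (id TEXT PRIMARY KEY, amount_cents INTEGER NOT NULL);
CREATE TABLE outbox (
    event_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    business_key TEXT NOT NULL,
    payload      TEXT NOT NULL,
    published    INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE ledger_entries (
    business_key TEXT NOT NULL UNIQUE,  -- the database rejects duplicates
    amount_cents INTEGER NOT NULL
);
""")

def record_payment(payment_id: str, amount_cents: int) -> None:
    """Step 1: business row and outbox event commit in the same transaction."""
    with db:
        db.execute("INSERT INTO payments VALUES (?, ?)", (payment_id, amount_cents))
        db.execute(
            "INSERT INTO outbox (business_key, payload) VALUES (?, ?)",
            (f"ledger_payment_{payment_id}", json.dumps({"amount_cents": amount_cents})),
        )

def apply_ledger_event(business_key: str, payload: str) -> bool:
    """Step 3 (consumer): True if work was done, False if it was a duplicate."""
    try:
        with db:
            db.execute(
                "INSERT INTO ledger_entries VALUES (?, ?)",
                (business_key, json.loads(payload)["amount_cents"]),
            )
        return True
    except sqlite3.IntegrityError:
        # Entry already exists: report success without repeating the side effect.
        return False

def publish_pending() -> list:
    """Step 2: deliver unpublished events, then mark them as published."""
    rows = db.execute(
        "SELECT event_id, business_key, payload FROM outbox WHERE published = 0"
    ).fetchall()
    results = []
    for event_id, business_key, payload in rows:
        results.append(apply_ledger_event(business_key, payload))
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,))
    return results
```

If the publisher crashes after delivery but before marking the event as published, the event is delivered again. The UNIQUE constraint turns that redelivery into a no-op instead of a second ledger entry.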

Idempotency records have a lifecycle. You cannot treat them all equally.
- States:
  - IN_PROGRESS: Locked. Do not allow new writes. If the record is stale (e.g., its worker has exceeded the timeout), handle it as a recovery case (retry with care or fail).
  - COMPLETED: The operation succeeded. Replay returns the stored response.
  - FAILED_RETRYABLE: The operation failed, but the failure was temporary (e.g., network). Allow retry.
  - FAILED_REPLAYABLE: The operation failed deterministically (e.g., INSUFFICIENT_FUNDS). Do not retry without a new key.
  - EXPIRED: The window to retry has passed. A new request with the same key should create a new operation (if business rules allow) or fail.
- Cleanup: Do not blindly delete IN_PROGRESS rows after expiry. They might represent stuck workers or pending reconciliations. Delete in batches or archive them while keeping metadata.
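One way to make the lifecycle explicit is a single decision function that maps a stored record to the next action. The sketch below is an assumption-laden illustration: the `Action` names, the `STALE_AFTER_SECONDS` threshold, and the choice to replay the stored failure for FAILED_REPLAYABLE are not from the source text and would need to match your own contract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class State(Enum):
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    FAILED_RETRYABLE = "FAILED_RETRYABLE"
    FAILED_REPLAYABLE = "FAILED_REPLAYABLE"
    EXPIRED = "EXPIRED"

class Action(Enum):  # hypothetical action names for this sketch
    EXECUTE = "execute"
    REPLAY_STORED_RESPONSE = "replay_stored_response"
    REJECT_IN_FLIGHT = "reject_in_flight"
    RECOVER = "recover"
    REJECT_EXPIRED = "reject_expired"

@dataclass
class Record:
    state: State
    updated_at: float  # seconds since epoch

STALE_AFTER_SECONDS = 30.0  # assumed threshold; tune per operation

def decide(record: Optional[Record], now: float,
           allow_reuse_after_expiry: bool = False) -> Action:
    """Map the stored record (if any) to what the handler should do next."""
    if record is None:
        return Action.EXECUTE  # first time this key is seen
    if record.state is State.IN_PROGRESS:
        stale = now - record.updated_at > STALE_AFTER_SECONDS
        return Action.RECOVER if stale else Action.REJECT_IN_FLIGHT
    if record.state in (State.COMPLETED, State.FAILED_REPLAYABLE):
        # Both a success and a deterministic failure are answered from storage.
        return Action.REPLAY_STORED_RESPONSE
    if record.state is State.FAILED_RETRYABLE:
        return Action.EXECUTE
    # Remaining case: EXPIRED. Business rules decide whether the key may be reused.
    return Action.EXECUTE if allow_reuse_after_expiry else Action.REJECT_EXPIRED
```

Keeping the branching in one place makes it harder for a handler to forget a state, for example treating a stale IN_PROGRESS row as if it were absent.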

A major pitfall is API schema changes.
- Scenario: The response to Client A's request is stored in schema v2 (status: PENDING).
- Event: The schema changes to v3 (state: PENDING, adds createdAt).
- Problem: If Client A retries and receives a v3-shaped response, it no longer matches the stored v2 response, which breaks the idempotency contract.
- Resolution:
  - Store the response schema version alongside the body.
  - Accept that replaying a v2 response to a v3 client might be a contract violation; either reject the replay or ensure the response is robust to versioning.
- Recommendation: For endpoints requiring exact replay, store the full body. For others, reconstruct from current state if the contract allows.
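A rough sketch of a version-aware replay decision follows. The `StoredResponse` type, the `replay_decision` function, and the three outcome labels are hypothetical; the policy shown (replay on a version match, reject on a mismatch for exact-replay endpoints, reconstruct otherwise) is one reading of the resolution above, not the only valid one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class StoredResponse:
    schema_version: int  # version the body was serialized with
    body: dict

def replay_decision(stored: StoredResponse, client_version: int,
                    exact_replay_required: bool) -> Tuple[str, Optional[dict]]:
    """Return (outcome, body). Outcome is REPLAY, REJECT, or RECONSTRUCT."""
    if stored.schema_version == client_version:
        return ("REPLAY", stored.body)  # safe: same schema as the original call
    if exact_replay_required:
        # A v2 body sent to a v3 client may violate the contract, so refuse.
        return ("REJECT", None)
    # The contract allows it: the caller rebuilds a response from current state.
    return ("RECONSTRUCT", None)

stored_v2 = StoredResponse(schema_version=2, body={"status": "PENDING"})
same_version = replay_decision(stored_v2, client_version=2, exact_replay_required=True)
strict_mismatch = replay_decision(stored_v2, client_version=3, exact_replay_required=True)
lenient_mismatch = replay_decision(stored_v2, client_version=3, exact_replay_required=False)
```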

The Client's Key vs. The Server's ID: The Idempotency-Key sent by the client is often different from the internal paymentId generated by the server.
- Map Client Key -> Internal Operation ID -> Resource ID.
Randomness: Clients should generate random keys (UUIDs), not timestamps.
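- Timestamp keys generated at send time differ between retries of the same intent (each retry lands in a different millisecond), so the server treats every retry as a new request and duplicates slip through. Two distinct requests created in the same millisecond can also collide. Generate the key once per intent and reuse it on every retry.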
- Counter keys require coordination and are prone to wrap-around bugs.
Scope: Scope the key to (tenant_id, operation_name, key_value) to prevent cross-tenant or cross-operation collisions.
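The key mapping and scoping rules can be sketched together. In this illustration the `idempotency_keys` table, the `op_` prefix, and the `claim` helper are assumptions; only `uuid.uuid4()` and the `(tenant_id, operation_name, key_value)` scope come from the points above.

```python
import sqlite3
import uuid

keys_db = sqlite3.connect(":memory:")
keys_db.execute("""
CREATE TABLE idempotency_keys (
    tenant_id      TEXT NOT NULL,
    operation_name TEXT NOT NULL,
    key_value      TEXT NOT NULL,   -- client-supplied Idempotency-Key
    operation_id   TEXT NOT NULL,   -- internal operation ID
    resource_id    TEXT,            -- filled in once the resource exists
    UNIQUE (tenant_id, operation_name, key_value)
)
""")

def new_client_key() -> str:
    """Client side: a random key, generated once per intent and reused on retry."""
    return str(uuid.uuid4())

def claim(tenant_id: str, operation_name: str, key_value: str):
    """Return (operation_id, created). created is False when the key was seen."""
    operation_id = f"op_{uuid.uuid4().hex}"
    try:
        with keys_db:
            keys_db.execute(
                "INSERT INTO idempotency_keys "
                "(tenant_id, operation_name, key_value, operation_id) "
                "VALUES (?, ?, ?, ?)",
                (tenant_id, operation_name, key_value, operation_id),
            )
        return operation_id, True
    except sqlite3.IntegrityError:
        # Same scoped key already claimed: hand back the existing operation ID.
        row = keys_db.execute(
            "SELECT operation_id FROM idempotency_keys "
            "WHERE tenant_id = ? AND operation_name = ? AND key_value = ?",
            (tenant_id, operation_name, key_value),
        ).fetchone()
        return row[0], False
```

Because the constraint covers the full scope, the same key value sent by a different tenant, or for a different operation, claims its own row instead of colliding.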

Not all failures are created equal.
- Syntax Errors: Do not store these. Repeating the request will always fail.
- Business Rejections:
  - INSUFFICIENT_FUNDS: Deterministic? If the balance doesn't change, yes. If the balance updates between retries, the first rejection might be replayed incorrectly. Decide if the first decision is "binding."
- Auth Failures: Do not create idempotency records for auth errors. If authorized later, a retry should work.
- Rate Limits: Do not treat these as idempotent outcomes. A retry after a rate limit is usually valid.
- Server Errors (5xx):
  - Before side effect: Safe to retry.
  - After side effect: Dangerous. If you processed a payment but crashed before sending the response, a retry must not create a duplicate payment. You need a local lock or a "did we process this?" check.
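The failure taxonomy above can be condensed into a small classifier. The `FailureKind` and `Handling` enums and the two boolean parameters are hypothetical names for this sketch; how a real service detects each kind (status codes, exception types) is left open.

```python
from enum import Enum

class FailureKind(Enum):
    SYNTAX = "syntax"
    AUTH = "auth"
    RATE_LIMIT = "rate_limit"
    BUSINESS_REJECTION = "business_rejection"
    SERVER_ERROR = "server_error"

class Handling(Enum):  # hypothetical handling labels
    DO_NOT_RECORD = "do_not_record"        # no idempotency record is written
    RECORD_AND_REPLAY = "record_and_replay"  # store the outcome, replay it later
    ALLOW_RETRY = "allow_retry"            # same key may run the operation again
    RECOVER_FIRST = "recover_first"        # check whether the effect happened

def classify(kind: FailureKind, *, rejection_is_binding: bool = True,
             side_effect_may_have_happened: bool = False) -> Handling:
    """Decide how a failed request interacts with the idempotency layer."""
    if kind in (FailureKind.SYNTAX, FailureKind.AUTH, FailureKind.RATE_LIMIT):
        return Handling.DO_NOT_RECORD
    if kind is FailureKind.BUSINESS_REJECTION:
        # "Binding" means the first decision stands even if the balance changes.
        return Handling.RECORD_AND_REPLAY if rejection_is_binding else Handling.ALLOW_RETRY
    # SERVER_ERROR: safe to retry only if no side effect could have been applied.
    if side_effect_may_have_happened:
        return Handling.RECOVER_FIRST
    return Handling.ALLOW_RETRY
```

The `RECOVER_FIRST` branch corresponds to the "did we process this?" check: the handler inspects durable state before deciding whether to run the operation again.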

Based on the text, here is a hardened checklist before shipping an idempotency layer:

[ ] Unique Constraint: Is there a unique constraint on the scoped idempotency key in the database?
[ ] Replay Contract: Does the stored response match the canonical command used to create it? (Handling schema changes).
[ ] Side Effect Durability: Are outbox/inbox patterns used? Are side effects written to durable storage before marking the operation as processed?
[ ] State Definition: Have we defined IN_PROGRESS, COMPLETED, FAILED_RETRYABLE, FAILED_REPLAYABLE, UNKNOWN, and EXPIRED states clearly?
[ ] Recovery Logic: When an IN_PROGRESS record is stale, does the system attempt recovery (check external state) rather than ignoring it or crashing?
[ ] Monitoring: Are we tracking idempotency.replay.count, idempotency.expired_retry.count, and idempotency.unknown_state.count?
[ ] Test Coverage: Have we tested concurrent identical requests, timeouts after downstream success, and partial failures?
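For the concurrency item, a test might look like the sketch below: several threads submit the same scoped key at once, and exactly one should perform the side effect. `InMemoryKeyStore` is a hypothetical stand-in for a table with a UNIQUE constraint, used here only so the example runs without a database; a real test would exercise the actual store.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class InMemoryKeyStore:
    """Stand-in for a table with a UNIQUE constraint on the scoped key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = set()

    def try_claim(self, scoped_key) -> bool:
        # Atomic check-and-insert, like an INSERT guarded by a UNIQUE constraint.
        with self._lock:
            if scoped_key in self._claimed:
                return False
            self._claimed.add(scoped_key)
            return True

def handle(store: InMemoryKeyStore, scoped_key, side_effects: list) -> str:
    """Perform the side effect only if this request claimed the key."""
    if store.try_claim(scoped_key):
        side_effects.append(scoped_key)
        return "executed"
    return "duplicate"

def run_concurrent(n: int):
    """Fire n identical requests at the same moment; return results and effects."""
    store = InMemoryKeyStore()
    side_effects = []
    scoped_key = ("tenant_a", "create_payment", "key-1")
    barrier = threading.Barrier(n)

    def worker():
        barrier.wait(timeout=5)  # release all threads together to force a race
        return handle(store, scoped_key, side_effects)

    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(worker) for _ in range(n)]
        results = [f.result() for f in futures]
    return results, side_effects
```

The same shape extends to the other two test cases by replacing `handle` with a version that fails after the downstream call succeeds, or partway through.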

"The second request is not a repeat until proven."

Idempotency is not a "set it and forget it" cache. It is a recovery mechanism. The system must be able to distinguish between "The user clicked twice" (duplicate side effect risk) and "The network failed, retrying the same intent" (safe replay). The cost of building this is the durable memory and the complexity of recovery logic; the value is preventing the silent corruption of financial ledgers and inventory counts.