Laravel Queue Reliability in 2026: Idempotency, Retry Policies, and Dead-Letter Design

March 27, 2026

Laravel queues look simple on the surface — dispatch a job, process it later. But in production, “later” introduces an entire class of problems: duplicate deliveries, silent failures, retry storms, and jobs that vanish without a trace. If you’ve ever woken up to a support ticket about a customer charged twice or an email that never sent, you know how quickly queue issues erode trust.

This guide covers the patterns and guardrails that make Laravel queues behave reliably under real conditions.

Failure Modes in Real Queues

Before building defenses, it helps to understand what actually goes wrong. These are the failure modes that hit production Laravel queues most often:

Duplicate delivery. At-least-once semantics mean your queue driver can deliver the same job more than once. This happens after timeouts, worker restarts, or network partitions between your app and Redis/SQS. If your job charges a credit card or sends a notification, duplicates are visible to users.

Silent job loss. A worker picks up a job, crashes mid-execution, and the job disappears. With Redis, this happens when a job is popped but never acknowledged. SQS handles this better with visibility timeouts, but misconfigured timeouts cause the same problem.

Retry storms. A job fails because a downstream API is down. It retries immediately, fails again, retries again — hammering the already-struggling service and filling your queue with backed-up work. Other jobs behind it starve.

Poison pills. A job that can never succeed — bad payload, missing dependency, schema mismatch — sits in the retry loop forever, consuming worker capacity and generating noise in your logs.

Ordering violations. If job B depends on job A completing first, parallel workers can easily process them out of order. This is especially common with event-driven architectures where multiple jobs are dispatched in quick succession.
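Where ordering matters, one defense is to express the dependency explicitly instead of relying on dispatch order. A minimal sketch using Laravel's `Bus::chain` (the job classes here are hypothetical):

```php
use Illuminate\Support\Facades\Bus;

// SendWelcomeEmail runs only after ProvisionAccount completes successfully;
// if ProvisionAccount fails, the rest of the chain is skipped.
Bus::chain([
    new ProvisionAccount($userId),
    new SendWelcomeEmail($userId),
])->dispatch();
```

Chains trade parallelism for ordering, so reserve them for jobs that genuinely depend on each other.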

Understanding these modes shapes everything below.

Idempotency Patterns

The single most important property for reliable queue jobs is idempotency: running the same job twice (or five times) produces the same result as running it once.

Use an Idempotency Key

Attach a unique key to each job at dispatch time. Before executing, check if that key has already been processed:

class ProcessPayment implements ShouldQueue
{
    public function __construct(
        private int $orderId,
        private string $idempotencyKey,
    ) {}

    public function handle(): void
    {
        $lockKey = "payment:idempotent:{$this->idempotencyKey}";

        // Cache::lock($name, $seconds) — hold the lock for up to an hour
        $acquired = Cache::lock($lockKey, 3600)->get();

        if (! $acquired) {
            // Already processed or currently processing
            return;
        }

        // Check if this payment was already completed
        if (Payment::where('idempotency_key', $this->idempotencyKey)->exists()) {
            return;
        }

        $this->chargeCustomer();
    }
}

The combination of a distributed lock (to prevent concurrent execution) and a database check (to prevent re-execution after completion) covers both race conditions and replayed jobs.
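Laravel also offers a framework-level guard worth knowing: the `ShouldBeUnique` contract prevents a second instance of a job from being queued while one is already pending. It guards against duplicate dispatch, not duplicate delivery by the driver, so it complements rather than replaces the checks above. A sketch:

```php
use Illuminate\Contracts\Queue\ShouldBeUnique;
use Illuminate\Contracts\Queue\ShouldQueue;

class ProcessPayment implements ShouldQueue, ShouldBeUnique
{
    // Release the uniqueness lock after an hour even if the job stalls
    public int $uniqueFor = 3600;

    public function __construct(private int $orderId) {}

    // Only one ProcessPayment per order can be queued at a time
    public function uniqueId(): string
    {
        return (string) $this->orderId;
    }
}
```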

Design for Upserts, Not Inserts

Where possible, write jobs so their database operations are naturally idempotent:

// Fragile: duplicate execution creates duplicate records
Order::create(['user_id' => $userId, 'product_id' => $productId]);

// Idempotent: same input always produces same state
Order::updateOrCreate(
    ['user_id' => $userId, 'product_id' => $productId, 'reference' => $ref],
    ['status' => 'confirmed'],
);
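Application-level checks can still race when two workers run the same logic concurrently, so back them with a unique index and let the database be the final arbiter. A migration sketch (column names follow the example above):

```php
Schema::table('orders', function (Blueprint $table) {
    // Guarantees at most one row per (user, product, reference),
    // even if two workers race past the application-level check
    $table->unique(['user_id', 'product_id', 'reference']);
});
```

With the index in place, a racing duplicate fails loudly with a constraint violation instead of silently creating a second record.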

Separate Side Effects

External side effects (API calls, emails, webhooks) are the hardest to make idempotent. Isolate them into their own jobs with their own idempotency tracking rather than burying them inside a larger job that also does database work.
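One way to sketch this separation, with hypothetical job names: the database write and the email live in different jobs, so a flaky mail provider retries only the email and never re-runs the write.

```php
use Illuminate\Support\Facades\Bus;

Bus::chain([
    new RecordOrderConfirmation($orderId),               // idempotent DB write
    new SendOrderConfirmationEmail($orderId, $emailKey), // own key, own retry policy
])->dispatch();
```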

Retry and Backoff Policy Design

Laravel gives you $tries, $backoff, and $retryUntil — but picking values for these requires thinking about what your job is actually doing.

Match Retry Strategy to Failure Type

Not every failure deserves a retry. Categorize your failures:

  • Transient (network timeout, 503 from an API): retry with backoff.
  • Permanent (validation error, missing record, 400 response): fail immediately and route to dead-letter handling.
  • Unknown (unexpected exception): retry a limited number of times, then dead-letter.

class SyncToExternalApi implements ShouldQueue
{
    public int $tries = 5;

    public function backoff(): array
    {
        // Exponential: 10s, 30s, 90s, 270s, 810s
        return [10, 30, 90, 270, 810];
    }

    public function handle(): void
    {
        try {
            $response = Http::timeout(15)->post('https://api.example.com/sync', [
                'data' => $this->payload,
            ]);

            if ($response->clientError()) {
                // 4xx: permanent failure, don't retry
                $this->fail(new PermanentJobFailure(
                    "API returned {$response->status()}"
                ));
                return;
            }

            $response->throw(); // 5xx will throw and trigger retry
        } catch (ConnectionException $e) {
            // Transient: let it retry via normal mechanism
            throw $e;
        }
    }
}

Use retryUntil for Time-Sensitive Work

When a job only makes sense within a time window, use retryUntil instead of a flat retry count:

public function retryUntil(): DateTime
{
    // Stop retrying after 4 hours — the data window has closed
    return now()->addHours(4);
}

Add Jitter to Backoff

When many jobs fail at the same time (a downstream outage), deterministic backoff means they all retry at the same time too. Add randomness:

public function backoff(): array
{
    return array_map(
        fn ($seconds) => $seconds + random_int(0, $seconds),
        [10, 30, 90, 270]
    );
}

Dead-Letter Queue Handling

When a job exhausts its retries, Laravel calls its failed() method and stores it in the failed_jobs table. This is your dead-letter queue — but the default setup is barely functional for production.

Structure Your Failed Method

The failed() method is where you capture context for debugging. Don’t just log a generic message:

public function failed(Throwable $exception): void
{
    Log::error('SyncToExternalApi permanently failed', [
        'job_id' => $this->job->getJobId(),
        'payload_id' => $this->payload['id'] ?? null,
        'attempts' => $this->attempts(),
        'exception' => $exception->getMessage(),
        'trace' => $exception->getTraceAsString(),
    ]);

    // Notify the team through your alerting channel
    FailedJobAlert::dispatch($this, $exception);
}

Make Failed Jobs Replayable

The default failed_jobs table stores a serialized payload, but replaying jobs with php artisan queue:retry only works if the job class and its constructor signature haven’t changed since the failure. For critical jobs, store enough context to reconstruct the job independently:

public function failed(Throwable $exception): void
{
    FailedJobRecord::create([
        'job_class' => static::class,
        'payload' => json_encode($this->buildReplayPayload()),
        'exception' => $exception->getMessage(),
        'failed_at' => now(),
        'can_retry' => ! $exception instanceof PermanentJobFailure,
    ]);
}

This lets you build a simple admin panel or CLI command that can replay jobs based on structured data rather than relying on PHP serialization.
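A replay command over that table might look like this sketch, where `fromReplayPayload` is a hypothetical factory method each replayable job implements as the counterpart to `buildReplayPayload`:

```php
use Illuminate\Console\Command;

class ReplayFailedJobs extends Command
{
    protected $signature = 'queue:replay-failed';

    public function handle(): void
    {
        FailedJobRecord::where('can_retry', true)->each(function ($record) {
            // Rebuild the job from structured data, not PHP serialization,
            // so old failures survive constructor changes
            $job = $record->job_class::fromReplayPayload(
                json_decode($record->payload, true)
            );

            dispatch($job);
            $record->delete();
        });
    }
}
```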

Prune the Failed Jobs Table

Left unchecked, failed_jobs grows indefinitely. Schedule regular pruning:

// In your console kernel (or routes/console.php on Laravel 11+)
$schedule->command('queue:prune-failed --hours=168')->daily();

Keep failures for a week (or whatever your incident-response window is), then clean them out.

Monitoring and Alerting

Queues fail silently by default. Without active monitoring, you won’t know something is wrong until users tell you.

Key Metrics to Track

  • Queue depth over time — a growing queue means workers aren’t keeping up. This is the earliest signal of trouble.
  • Job processing latency — time from dispatch to completion. Spikes here mean either slow jobs or contention.
  • Failure rate — track per job class. A sudden spike in one class usually points to a specific downstream issue.
  • Worker health — are all expected workers running? Supervisor and Horizon both have ways to check this, but neither alerts you by default.
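Queue depth is cheap to collect yourself. A sketch using a scheduled closure, where `Metrics::gauge` stands in for whatever metrics client you use and the queue names are examples:

```php
$schedule->call(function () {
    foreach (['default', 'payments', 'notifications'] as $queue) {
        // Queue::size() returns the number of pending jobs on that queue
        Metrics::gauge('queue.depth', Queue::size($queue), ['queue' => $queue]);
    }
})->everyMinute();
```

Alert on the trend, not a single reading: a queue that has been growing for ten minutes is a problem; a momentary spike usually isn't.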

Laravel Horizon

If you’re on Redis, Horizon gives you a dashboard and built-in metrics. The key configuration is setting up notifications:

// In HorizonServiceProvider
Horizon::routeMailNotificationsTo('ops@example.com');
Horizon::routeSlackNotificationsTo('https://hooks.slack.com/...', '#alerts');

Horizon’s LongWaitDetected notification is especially useful — it fires when jobs wait longer than a threshold you define, catching capacity problems early.
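The thresholds live in config/horizon.php under the waits key, keyed by connection:queue:

```php
// config/horizon.php — LongWaitDetected fires when jobs wait longer than this
'waits' => [
    'redis:default' => 60,  // seconds
    'redis:payments' => 30, // tighter threshold for a critical queue
],
```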

Custom Health Checks

For critical queues, add a health check that dispatches a canary job and verifies it completes within an expected window:

class QueueHealthCheck
{
    public function check(): bool
    {
        $token = Str::uuid()->toString();

        dispatch(new CanaryJob($token));

        // In practice, check this asynchronously
        // and alert if the token doesn't appear within threshold
        return true;
    }
}

If the canary job doesn’t complete within your threshold, your queue infrastructure has a problem — regardless of what your worker process list looks like.
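A sketch of the asynchronous half: the canary job marks its token in the cache, and a scheduled check run a few minutes after dispatch alerts if the mark never appears. `Alert::queueUnhealthy` is a hypothetical stand-in for your alerting channel:

```php
class CanaryJob implements ShouldQueue
{
    public function __construct(private string $token) {}

    public function handle(): void
    {
        // Mark completion; the checker looks for this key
        Cache::put("canary:{$this->token}", now()->toIso8601String(), 600);
    }
}

// In the scheduled checker:
if (! Cache::has("canary:{$token}")) {
    Alert::queueUnhealthy($token);
}
```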

Rollout Checklist

Before shipping queue changes to production, walk through this list:

Job design:

  • Every job that writes data or triggers side effects has an idempotency mechanism
  • Jobs that call external services distinguish transient from permanent failures
  • Side effects (emails, webhooks, API calls) are in separate, independently retriable jobs
  • Job payloads contain only serializable IDs/references, not full models or large objects

Retry configuration:

  • $tries or retryUntil is set explicitly on every job (never rely on the global default)
  • Backoff values match the expected recovery time of the downstream dependency
  • Jobs that should never retry (one-shot operations) use $tries = 1 with proper failed() handling

Dead-letter handling:

  • Every job with $tries > 1 has a meaningful failed() method
  • Failed jobs are stored with enough context to replay or investigate
  • failed_jobs table has scheduled pruning

Monitoring:

  • Queue depth is tracked and has an alert threshold
  • Failure rate is tracked per job class
  • Worker process count is monitored (Supervisor/Horizon)
  • There is a notification channel for long-wait and failure-spike events

Deployment:

  • Workers are restarted after deploy (php artisan queue:restart)
  • Job class renames or constructor changes account for jobs already in the queue
  • New jobs are deployed to workers before the code that dispatches them

Queue reliability isn’t a feature you ship once. It’s a set of habits: every new job gets idempotency by default, every failure path gets thought through before it happens, and monitoring tells you when reality diverges from your assumptions. The patterns here aren’t exotic — they’re the baseline that keeps your queues boring, which is exactly what you want.


Published by Artiphp, who lives and works in San Francisco building useful things.