The S3 Cache-Control Trap - Why Misconfigured CDN Behaviour Quietly Inflates Costs

Many teams rely on S3 and CloudFront to serve static files. When configured correctly, this pairing performs well and scales predictably. Yet a recurring issue appears across deployments: unusually low CloudFront cache hit rates and unexpectedly high S3 request charges, even for sites with mostly unchanging assets.

A pattern often emerges during investigation — the absence of explicit Cache-Control metadata on S3 objects. This small oversight can lead to a cascade of revalidation requests, edge evictions, and origin fetches. The same workload behaves very differently when served through CDN-first platforms that apply broader caching defaults.

This article examines why this happens, how to verify the configuration, and how AWS and Cloudflare differ in their handling of static content.

Understanding CloudFront’s Behaviour When Metadata Is Missing

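CloudFront respects origin caching headers whenever they exist. When objects lack Cache-Control and Expires headers, CloudFront falls back to the TTL settings on the cache behaviour (or on the cache policy attached to it):

  • Default TTL, which applies whenever the origin sends no caching headers (86,400 seconds, or 24 hours, unless changed)
  • Min TTL and Max TTL, which bound any lifetime the origin does specify

CloudFront does not ignore these settings, but they only govern the edge cache. A single Default TTL is applied to every object regardless of how often it changes, and nothing in the response tells browsers how long they may keep their copy.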

This creates several downstream effects.

Browser Revalidation

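Browsers tend to revalidate objects that have no clear caching instructions. Without Cache-Control, a browser can only apply a heuristic lifetime based on Last-Modified, and once that lapses it sends a conditional request (If-None-Match or If-Modified-Since). CloudFront answers these itself while its own copy is fresh, but revalidates with S3 once that copy has expired or been evicted.

Even when S3 replies with 304 Not Modified, the conditional request is still billed as an S3 request.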
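Without Cache-Control or Expires, the HTTP caching specification (RFC 9111, section 4.2.2) lets a browser assign a heuristic freshness lifetime, and a typical choice is 10% of the time since Last-Modified. A minimal sketch of that decision follows; the function name is illustrative, the 10% factor is the commonly used value rather than a guarantee, and individual browsers differ in the details.

```javascript
// Approximate freshness lifetime, in seconds, that a browser cache might
// assign to a response. Sketch only: it follows the RFC 9111 precedence
// (max-age, then Expires, then a heuristic) and uses the commonly cited
// 10% Last-Modified heuristic, which browsers are free to vary.
// Header names are expected in lower case.
function freshnessLifetimeSeconds(headers) {
  const cacheControl = (headers['cache-control'] || '').toLowerCase();

  // no-cache / no-store: the stored copy must be revalidated before every use.
  if (/(?:^|,)\s*(?:no-cache|no-store)\b/.test(cacheControl)) return 0;

  // An explicit max-age wins (s-maxage only applies to shared caches).
  const maxAge = cacheControl.match(/(?:^|,)\s*max-age=(\d+)/);
  if (maxAge) return Number(maxAge[1]);

  const date = Date.parse(headers['date']);

  // Next, Expires relative to Date.
  if (headers['expires'] !== undefined) {
    const expires = Date.parse(headers['expires']);
    // An unparseable Expires value means "already expired".
    if (Number.isNaN(expires) || Number.isNaN(date)) return 0;
    return Math.max(0, Math.floor((expires - date) / 1000));
  }

  // No explicit lifetime: 10% of the age since Last-Modified.
  // Dividing milliseconds by 10,000 gives 10% of the age in seconds.
  const lastModified = Date.parse(headers['last-modified']);
  if (!Number.isNaN(lastModified) && !Number.isNaN(date)) {
    return Math.floor(Math.max(0, date - lastModified) / 10000);
  }

  // Nothing to go on: treat the response as stale straight away.
  return 0;
}

// An S3 object last modified ten days before the response, with no Cache-Control.
console.log(freshnessLifetimeSeconds({
  'date': 'Sat, 11 Jan 2025 00:00:00 GMT',
  'last-modified': 'Wed, 01 Jan 2025 00:00:00 GMT',
}));
```

For that example the result is 86,400 seconds: the browser reuses its copy for about a day and then falls back to a conditional request.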

Object Eviction

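A TTL indicates how long an object may remain fresh, but it does not guarantee its place in cache. CloudFront may remove an object before its TTL expires, mainly when it is not requested often enough to justify the space. Eviction decisions consider:

  • Object popularity
  • Object size
  • Edge node capacity

Objects without explicit metadata are more exposed to this churn: they all expire on the distribution's fixed Default TTL schedule, and every expiry or eviction sends the next request back to S3.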

Low Cache Hit Rates

When both browsers and CloudFront revalidate frequently, the cache hit rate often drops into the 10–30% range for static workloads that should achieve far higher numbers.

Because every cache miss becomes an S3 request, this can multiply S3 request volume many times over.
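The multiplication follows from the hit rate alone, because each miss becomes an origin request. A quick sketch, in which the request volume, the two hit rates, and the S3 GET price are illustrative assumptions rather than measurements (check current pricing for your region):

```javascript
// Origin (S3) requests implied by a CDN hit rate.
// Assumption: every cache miss results in exactly one S3 GET.
function originRequests(totalRequests, hitRate) {
  return Math.round(totalRequests * (1 - hitRate));
}

// Illustrative S3 GET price in USD per 1,000 requests (assumed; varies by region).
const ASSUMED_GET_PRICE_PER_1000 = 0.0004;

function compareHitRates(totalRequests, healthyHitRate, degradedHitRate) {
  const healthy = originRequests(totalRequests, healthyHitRate);
  const degraded = originRequests(totalRequests, degradedHitRate);
  return {
    healthy,
    degraded,
    multiplier: degraded / healthy,
    extraCostUsd: ((degraded - healthy) / 1000) * ASSUMED_GET_PRICE_PER_1000,
  };
}

console.log(compareHitRates(1000000, 0.95, 0.30));
```

With a million requests, falling from a 95% to a 30% hit rate raises origin requests from 50,000 to 700,000, a fourteenfold increase.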

How to Confirm Whether S3 Metadata Is Missing

Verifying the metadata on a single object:

aws s3api head-object \
  --bucket your-bucket \
  --key path/to/file.js

If the output contains no CacheControl field, the object carries no Cache-Control header, and its caching relies entirely on the CDN's distribution settings and the browser's own heuristics.
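When many objects need checking, the JSON that head-object prints can be collected and scanned in one pass. A minimal sketch, assuming the outputs have already been parsed into an object keyed by S3 key; auditCacheControl and the immutable-HTML check are illustrative choices, not AWS features. head-object includes a CacheControl field only when the header is set:

```javascript
// Scan parsed `aws s3api head-object` outputs, keyed by object key.
// Reports objects with no CacheControl field, plus HTML entry points that
// were marked immutable by mistake (an illustrative extra check).
function auditCacheControl(headResults) {
  const missing = [];
  const immutableHtml = [];
  for (const [key, head] of Object.entries(headResults)) {
    if (!head.CacheControl) {
      missing.push(key);
    } else if (key.endsWith('.html') && head.CacheControl.includes('immutable')) {
      immutableHtml.push(key);
    }
  }
  return { missing, immutableHtml };
}

// Sample input shaped like head-object output (other fields omitted).
const report = auditCacheControl({
  'assets/app.3f9a2c1b.js': { ContentType: 'application/javascript', CacheControl: 'public, max-age=31536000, immutable' },
  'assets/logo.png': { ContentType: 'image/png' },
  'index.html': { ContentType: 'text/html', CacheControl: 'public, max-age=31536000, immutable' },
});
console.log(report);
```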

Checking CloudFront behaviour TTLs:

aws cloudfront get-distribution \
  --id YOUR_DISTRIBUTION_ID \
  --query 'Distribution.DistributionConfig.DefaultCacheBehavior'

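For behaviours that use legacy cache settings, this shows the configured MinTTL, DefaultTTL, and MaxTTL. If the behaviour references a cache policy, the output includes a CachePolicyId, and the TTLs that apply are those in the policy (readable with aws cloudfront get-cache-policy --id followed by that ID). Either way, these values control the edge cache only. CloudFront does not turn them into response headers, so they cannot compensate for missing origin metadata: browsers still receive no caching instructions and keep revalidating.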
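How the three TTLs combine with origin headers at the edge can be modelled in a few lines. This sketch follows CloudFront's documented rules in simplified form (s-maxage takes precedence over max-age, the origin value is clamped between Min TTL and Max TTL, and Default TTL applies only when the origin sends nothing); it ignores Expires, no-cache, no-store, and private, and effectiveEdgeTtl is a hypothetical name:

```javascript
// Simplified model of how long CloudFront keeps an object at the edge.
// TTL values are in seconds, as in the distribution or cache policy config.
function effectiveEdgeTtl(cacheControl, { minTtl, defaultTtl, maxTtl }) {
  const directives = (cacheControl || '').toLowerCase();
  const sMaxAge = directives.match(/(?:^|,)\s*s-maxage=(\d+)/);
  const maxAge = directives.match(/(?:^|,)\s*max-age=(\d+)/);

  // s-maxage (for shared caches) wins over max-age when both are present.
  const originTtl = sMaxAge ? Number(sMaxAge[1]) : maxAge ? Number(maxAge[1]) : null;

  if (originTtl === null) {
    // No caching signal from the origin: the behaviour's Default TTL applies.
    return defaultTtl;
  }
  // The origin value is honoured, but clamped to the behaviour's bounds.
  return Math.min(Math.max(originTtl, minTtl), maxTtl);
}

const behaviour = { minTtl: 0, defaultTtl: 86400, maxTtl: 31536000 };
console.log(effectiveEdgeTtl(undefined, behaviour)); // 86400
console.log(effectiveEdgeTtl('public, max-age=31536000, immutable', behaviour)); // 31536000
```

Nothing in this calculation reaches the browser, which is why a generous Default TTL still leaves browsers without caching instructions.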

Correcting the Issue: Setting Explicit Cache-Control Metadata

S3 does not support default metadata inheritance. Every object must be uploaded with correct metadata or updated afterwards.

For New Deployments

aws s3 sync ./build s3://your-bucket/ \
  --cache-control "public, max-age=31536000, immutable"

This is appropriate for hashed or fingerprinted assets that will never change after publication.

For Existing Objects

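aws s3 cp s3://your-bucket/ s3://your-bucket/ --recursive \
  --exclude "*.html" \
  --metadata-directive REPLACE \
  --cache-control "public, max-age=31536000, immutable"

Replacing metadata rewrites each object in full. Every object counts as a COPY request, billed at the PUT rate, and on a versioned bucket the previous version is retained and billed until it expires. REPLACE also keeps only the metadata given on the command line, so confirm Content-Type afterwards with head-object, and limit the command to prefixes that hold only fingerprinted files, because it applies the one-year immutable policy to everything it copies.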

HTML or Entry Files

Entry points (such as index.html) usually require far shorter lifetimes:

aws s3 cp ./dist/index.html s3://your-bucket/ \
  --cache-control "public, max-age=3600"
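Choosing between the two policies can be automated by recognising fingerprinted filenames. A minimal sketch, assuming the build tool inserts a hexadecimal content hash of at least eight characters before the extension (as in app.3f9a2c1b.js); the pattern, the cacheControlFor name, and the one-hour lifetime for unhashed non-HTML files are assumptions to adapt:

```javascript
// Assumed convention: a hex content hash of 8+ characters before the
// extension, e.g. "app.3f9a2c1b.js" or "chunk-3f9a2c1b.css".
const FINGERPRINT = /[.-][0-9a-f]{8,}\.[a-z0-9]+$/i;

function cacheControlFor(key) {
  if (key.endsWith('.html')) {
    // Entry documents must pick up new deployments quickly.
    return 'public, max-age=3600';
  }
  if (FINGERPRINT.test(key)) {
    // A changed file gets a new name, so the old name can be cached for a year.
    return 'public, max-age=31536000, immutable';
  }
  // Unhashed assets such as favicon.ico keep a short lifetime (assumed value).
  return 'public, max-age=3600';
}

for (const key of ['index.html', 'assets/app.3f9a2c1b.js', 'favicon.ico']) {
  console.log(`${key} -> ${cacheControlFor(key)}`);
}
```

Build tools that use other hash alphabets (base64-style hashes, for instance) need a different pattern.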

Automation

For teams with multiple upload sources, an S3 event-triggered Lambda can enforce metadata on every new object.
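Such a function has to avoid triggering itself: rewriting an object to change its metadata raises another ObjectCreated event. The sketch below shows the guard, skipping objects whose Cache-Control already matches. It assumes hypothetical head and copy helpers that wrap the S3 HeadObject and CopyObject calls (for example through the AWS SDK), a caller-supplied desiredFor policy function, and the standard S3 event record shape; it is not a complete deployment.

```javascript
// Enforce a Cache-Control policy for one S3 event record.
// `head` and `copy` are hypothetical wrappers around S3 HeadObject and
// CopyObject; `desiredFor` maps an object key to the wanted header value.
async function enforceCacheControl(record, { head, copy, desiredFor }) {
  const bucket = record.s3.bucket.name;
  // Keys in S3 event notifications are URL-encoded, with spaces as '+'.
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

  const desired = desiredFor(key);
  const current = await head(bucket, key);

  // Guard against recursion: the copy below raises a new ObjectCreated
  // event, and that second invocation must find nothing to do.
  if (current.CacheControl === desired) {
    return 'unchanged';
  }

  await copy({
    Bucket: bucket,
    Key: key,
    // CopySource must be URL-encoded; encode each path segment separately.
    CopySource: `${bucket}/${key.split('/').map(encodeURIComponent).join('/')}`,
    MetadataDirective: 'REPLACE',
    CacheControl: desired,
    // REPLACE drops metadata that is not resent, so carry these fields over.
    ContentType: current.ContentType,
    Metadata: current.Metadata,
  });
  return 'rewritten';
}
```

A real handler would call this once for each entry in event.Records.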

How Cloudflare Handles Static Files by Default

Cloudflare’s behaviour differs because its design originated as a CDN rather than a storage layer.

Key differences include:

  • Broader default caching for common static file types, matched by file extension (HTML is not cached by default)
  • Edge retention that does not rely heavily on origin metadata
  • Revalidation handled at the edge rather than forwarded to the origin
  • Integrated asset fingerprinting in Cloudflare Pages
  • Cache Rules for structured overrides

This does not make Cloudflare universally superior; rather, its defaults favour workloads where content is static and deployments are atomic.

Cloudflare Pages

Pages applies predictable headers such as:

Cache-Control: public, max-age=0, must-revalidate

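Assets are retained on the edge until a new deployment replaces them. Because max-age=0 makes browsers revalidate on every visit, this policy relies on those conditional requests being answered by Cloudflare's edge (typically with 304 Not Modified) rather than by a separately billed origin.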
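Keeping max-age=0, must-revalidate cheap depends on how quickly an edge can answer a conditional request. The core of that exchange is an ETag comparison, sketched here for a generic HTTP cache; it illustrates standard If-None-Match handling (RFC 9110) rather than Cloudflare's internal implementation, and answerConditionalGet is an illustrative name:

```javascript
// How a cache answers a conditional GET carrying If-None-Match.
// Uses weak comparison (W/"abc" matches "abc"), as RFC 9110 requires for
// If-None-Match. Splitting on commas assumes ETags contain no commas,
// which holds for S3-style hex ETags.
function answerConditionalGet(ifNoneMatch, cachedEtag) {
  if (!ifNoneMatch) return 200; // not a conditional request: send the full body

  const opaque = (tag) => tag.trim().replace(/^W\//, '');
  const current = opaque(cachedEtag);
  const candidates = ifNoneMatch.split(',').map(opaque);

  if (candidates.includes('*') || candidates.includes(current)) {
    return 304; // Not Modified: headers only, the browser reuses its copy
  }
  return 200; // the resource changed: send the new representation
}

console.log(answerConditionalGet('W/"3f9a2c"', '"3f9a2c"')); // 304
```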

Cloudflare R2 With CDN

When used with R2:

  • Static file extensions are cached readily
  • Cache Rules allow granular control
  • R2 has no egress charges, which removes one cost vector present in AWS deployments, although reads that miss the cache are still billed as Class B operations

These defaults often work well for static sites without requiring additional steps.

Cache Hit Rate Benchmarks

The disparity between observed and expected cache hit rates is often the clearest indicator of configuration issues.

Scenario | Expected CloudFront Hit Rate | Expected Cloudflare Hit Rate
Static assets with correct headers | 95–99% | 95–99%
Static assets without headers | 10–35% | 75–95% (depending on defaults)
HTML with short TTL | 50–70% | 60–80%

These numbers represent common operational ranges rather than guaranteed values.

AWS vs Cloudflare: Neutral Architectural Comparison

Caching Defaults

Aspect | AWS S3 + CloudFront | Cloudflare CDN
Behaviour when metadata is absent | Falls back to the cache behaviour's Default TTL | Broader caching of static extensions
Browser revalidation handling | Forwarded to origin once the edge copy expires or is evicted | Usually resolved at the edge
Edge eviction sensitivity | Higher for objects lacking metadata | Lower, due to broader defaults

Configuration Effort

Task | AWS | Cloudflare
Enforcing correct metadata | Required per object | Usually optional for static workloads
Asset fingerprinting | Manual | Built into Pages
Invalidation model | Requires API or console action | Automatic per deployment (Pages)

Cost Considerations

Cost Factor | AWS | Cloudflare
Origin request cost | Applies (S3 request pricing) | Applies with R2 (Class B operation pricing)
Egress | Applies (CloudFront rates) | Included for most plans / zero for R2
Cache miss effect | Direct impact on cost | Smaller impact (no egress; R2 reads still billed)

The goal is not to claim one platform is superior, but to highlight that their default assumptions differ. CloudFront expects explicit control. Cloudflare assumes common static file types are cacheable unless instructed otherwise.

Production Implementation Examples

AWS Deployment Script

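#!/bin/bash
# deploy.sh
# Abort on the first failure so a partial upload is never invalidated
set -euo pipefail

# Upload fingerprinted assets first and cache them for one year
aws s3 sync ./dist/assets s3://my-app/assets/ \
  --cache-control "public, max-age=31536000, immutable"

# Then upload the entry document with a short TTL, so it never
# references assets that are not yet in the bucket
aws s3 cp ./dist/index.html s3://my-app/ \
  --cache-control "public, max-age=3600"

# Invalidate only the entry document; hashed assets keep their cached copies
aws cloudfront create-invalidation \
  --distribution-id E123456789 \
  --paths "/" "/index.html"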
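A wildcard invalidation such as "/*" also discards every cached fingerprinted asset, which must then be fetched from S3 again. Since fingerprinted files get new names with each release, only reused names need invalidating. A minimal sketch of picking those paths from the keys a deployment uploaded, assuming a hexadecimal fingerprint convention and that index.html is the distribution's default root object (invalidationPaths is a hypothetical helper):

```javascript
// Assumed convention: a hex content hash of 8+ characters before the extension.
const FINGERPRINTED = /[.-][0-9a-f]{8,}\.[a-z0-9]+$/i;

// Given the keys uploaded in this deployment, return CloudFront invalidation
// paths for files whose names are reused across releases.
function invalidationPaths(uploadedKeys) {
  const paths = new Set();
  for (const key of uploadedKeys) {
    // A fingerprinted file has a brand-new name, so nothing stale is cached.
    if (FINGERPRINTED.test(key)) continue;
    paths.add('/' + key);
    // Assumption: index.html is the default root object, so "/" is cached too.
    if (key === 'index.html') paths.add('/');
  }
  return [...paths];
}

console.log(invalidationPaths(['index.html', 'assets/app.3f9a2c1b.js', 'robots.txt']));
```

The returned list can be passed to the --paths option of create-invalidation.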

Cloudflare Pages Deployment

wrangler pages deploy ./dist --project-name=my-app

Cloudflare Worker Example for Custom Cache Logic

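// Module Worker syntax: bindings such as MY_BUCKET arrive on `env`
export default {
  async fetch(request, env) {
    const url = new URL(request.url)
    // R2 keys are usually stored without a leading slash
    const key = url.pathname.slice(1)
    const object = await env.MY_BUCKET.get(key)

    if (object === null) {
      return new Response('Not found', { status: 404 })
    }

    // Copy stored metadata (e.g. Content-Type) and the ETag onto the response
    const headers = new Headers()
    object.writeHttpMetadata(headers)
    headers.set('etag', object.httpEtag)
    // Browser-facing lifetime for everything served by this Worker
    headers.set('Cache-Control', 'public, max-age=86400')

    return new Response(object.body, { headers })
  }
}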

Additional AWS Considerations

Origin Shield

AWS offers Origin Shield to reduce origin fetches further, but it adds cost and does not replace the need for correct metadata.

S3 Replication Rules

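Replication rules copy rewritten objects as well. Updating metadata creates a new version of every object it touches, and live replication sends each of those versions to the destination bucket, adding replication requests and, for cross-region rules, inter-region transfer charges.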

When correcting metadata for large buckets, verify replication behaviour before running bulk updates.

Metadata Replacement Cost Impact

Bulk metadata replacement rewrites entire objects, which should be accounted for in cost planning.
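A rough estimate needs only the object count and, for versioned buckets, the total size. This sketch treats each object as one COPY request billed at the PUT rate; the per-request and per-GB prices are assumed S3 Standard figures for illustration, so substitute current pricing for your region and storage class:

```javascript
// Assumed S3 Standard prices in USD, for illustration only.
const ASSUMED_PUT_PRICE_PER_1000 = 0.005;
const ASSUMED_STORAGE_PRICE_PER_GB_MONTH = 0.023;

// Estimate the cost of rewriting every object to change its metadata.
function estimateMetadataRewrite({ objectCount, totalGb, versioned }) {
  // Each object is one COPY request, billed at the PUT rate.
  const requestCostUsd = (objectCount / 1000) * ASSUMED_PUT_PRICE_PER_1000;
  // On a versioned bucket the previous versions stay (and are billed) until a
  // lifecycle rule expires them, so storage roughly doubles in the meantime.
  const extraStorageUsdPerMonth = versioned ? totalGb * ASSUMED_STORAGE_PRICE_PER_GB_MONTH : 0;
  return { requestCostUsd, extraStorageUsdPerMonth };
}

console.log(estimateMetadataRewrite({ objectCount: 200000, totalGb: 50, versioned: true }));
```

For 200,000 objects totalling 50 GB on a versioned bucket, this comes to about $1 in requests plus about $1.15 per month for the retained old versions until a lifecycle rule removes them.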

Best Practice Summary

On AWS

  • Upload all assets with explicit Cache-Control metadata.
  • Use long lifetimes for fingerprinted assets and short lifetimes for entry documents.
  • Monitor cache hit rate in CloudWatch (the CacheHitRate metric requires enabling CloudFront's additional metrics) and investigate drops promptly.
  • Confirm metadata via head-object and avoid assuming defaults will suffice.
  • Use CloudFront behaviours to supplement, not replace, origin metadata.

On Cloudflare

  • Rely on defaults for static workloads unless special logic is needed.
  • Use Cache Rules to override behaviour cleanly.
  • Confirm behaviour through Cache Analytics.
  • Apply Workers sparingly for exceptions.

Final Notes

The root cause of unexpectedly high origin traffic is often not the platform itself but how caching signals are defined. AWS expects precise metadata to guide CloudFront’s behaviour. Cloudflare applies broader assumptions that suit static sites but still benefit from explicit headers when present.

Static workloads on either platform can achieve hit rates above 95% when configured correctly. The absence of metadata is what creates the gap — not a fault in AWS or an inherent advantage in Cloudflare.

For teams operating on AWS, Cache-Control metadata should be treated as a required part of deployment infrastructure, not a secondary optimisation. For teams on Cloudflare, defaults tend to offer a stronger baseline but still benefit from clear rules.

Monitoring cache hit rate remains the fastest way to detect misconfiguration across both systems.
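Where CloudFront standard access logs are enabled, the hit rate can also be computed from the x-edge-result-type field, which records Hit, RefreshHit (the cached copy had expired and was revalidated with the origin) or Miss for each request. A minimal sketch that locates the column from the #Fields header; counting RefreshHit as an origin request reflects the revalidation cost described earlier, and summariseEdgeResults is an illustrative name:

```javascript
// Summarise CloudFront standard log lines (tab-separated, with a "#Fields:" header).
// Only Hit, RefreshHit and Miss are counted; Error, Redirect, etc. are ignored.
function summariseEdgeResults(logText) {
  let column = -1;
  const counts = { Hit: 0, RefreshHit: 0, Miss: 0 };

  for (const line of logText.split('\n')) {
    if (line.startsWith('#Fields:')) {
      // Field names are space-separated in the header line.
      column = line.slice('#Fields:'.length).trim().split(/\s+/).indexOf('x-edge-result-type');
      continue;
    }
    if (line.startsWith('#') || line.trim() === '' || column < 0) continue;

    const result = line.split('\t')[column];
    if (Object.prototype.hasOwnProperty.call(counts, result)) counts[result] += 1;
  }

  const total = counts.Hit + counts.RefreshHit + counts.Miss;
  return {
    counts,
    // Share of requests answered without contacting the origin at all.
    edgeOnlyRate: total === 0 ? 0 : counts.Hit / total,
    // A RefreshHit still reached S3 (usually as a 304), so it costs a request.
    originRequests: counts.Miss + counts.RefreshHit,
  };
}
```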