Sync Where There Should Be Async



System Design Scenario


When your API waits for hours while users wait for timeouts


It’s Tuesday afternoon. Alex clicks “Upload Video” and selects his 200MB product demo. The browser shows a progress bar. 30% uploaded… 60%… 90%… complete! Then nothing. The API is processing the video synchronously - transcoding, thumbnail generation, metadata extraction. 25 seconds pass. 28 seconds. 30 seconds. Timeout.

The user sees “Upload failed. Please try again.” Alex doesn’t know the video actually uploaded successfully and is being processed right now. He clicks retry. Another 200MB upload starts. Another 30-second processing window begins. Another timeout. Now you have two identical videos processing simultaneously.

It’s like a restaurant where customers have to stand at the counter while the chef prepares their meal from scratch. The line backs up. Customers leave. Orders get duplicated when hungry people try to order again. This is synchronous processing where asynchronous processing should be used.

Why This Happens

The root cause is treating long-running operations like short-running operations. Most web frameworks are designed around the request-response cycle: client sends request, server processes immediately, server returns response. This pattern works perfectly for database lookups or simple calculations but breaks catastrophically for heavyweight processing.

Smart engineers assume that if they can make individual operations fast enough, the synchronous model will scale - but processing time and file sizes grow faster than optimization can keep up.

user uploads 200MB video
  -> API starts synchronous processing
    -> transcoding takes 45 seconds
      -> HTTP request times out at 30 seconds
        -> client shows error and retries
          -> duplicate processing starts
            -> server resources exhausted

Key Insight

HTTP timeouts are fundamentally incompatible with variable-length processing jobs - the network stack can’t wait indefinitely, but video processing can take arbitrarily long.

The Naive Solution (and where it breaks)

Most engineers reach for two approaches: increase the timeout values, or optimize processing speed. Both seem reasonable but both fail at scale.

Increasing timeouts sounds logical - if processing takes 45 seconds, set the timeout to 60 seconds. Add some buffer, make it 120 seconds. This creates the illusion of solving the problem.

Figure: Naive approach showing increased timeouts and processing optimization

Here’s where it breaks:

Small files: 10MB video -> processes in 8 seconds -> works fine
Large files: 500MB video -> processes in 3 minutes -> connection pools exhausted

Long-running HTTP connections consume server resources for their entire duration. A server that can handle 1000 concurrent short requests might only handle 10 concurrent video processing requests. Your capacity plummets by 99%.
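
The capacity drop can be made concrete with Little's law: the throughput a fixed pool of connections can sustain is roughly the pool size divided by how long each request holds a connection. The sketch below is illustrative only; the 1000-connection pool and the 0.1-second and 180-second durations are assumed numbers, not measurements.

```python
def max_throughput(max_connections: int, seconds_per_request: float) -> float:
    """Little's law: requests per second a fixed connection pool can sustain."""
    return max_connections / seconds_per_request


# Assumed figures: the same 1000-connection pool under two workloads
short_requests = max_throughput(1000, 0.1)    # typical lookup-style request
video_requests = max_throughput(1000, 180.0)  # 3-minute synchronous processing

print(f"short requests: {short_requests:.0f}/s")
print(f"video requests: {video_requests:.2f}/s")
```

Under these assumptions the same pool drops from thousands of requests per second to a handful, before CPU contention from the processing itself is even considered.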

Watch Out

Increasing timeouts makes the problem worse by keeping connections open longer, which reduces server capacity and makes timeouts more likely for subsequent requests.

The Better Solution

Here’s what actually fixes this. You need to separate upload from processing, and processing from response. The client should get an immediate response acknowledging the upload, then poll for processing status separately.

Layer 1: Immediate Upload Response

The first step is acknowledging the upload immediately without waiting for processing to complete. Store the uploaded file and return a job ID that the client can use to track progress.

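from datetime import datetime
from uuid import uuid4

from fastapi import FastAPI, HTTPException, UploadFile

app = FastAPI()

# Immediate response after upload.
# store_uploaded_file, job_repository, video_queue, and VideoProcessingJob
# are application-defined helpers that are not shown here.
@app.post("/videos/upload")
async def upload_video(file: UploadFile):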
    # Store file immediately - no processing yet
    file_path = await store_uploaded_file(file)
    
    # Create job record with PENDING status  
    job = VideoProcessingJob(
        id=str(uuid4()),
        file_path=file_path,
        status="PENDING",
        created_at=datetime.utcnow()
    )
    await job_repository.create(job)
    
    # Queue processing for background worker
    await video_queue.enqueue(
        process_video_job,
        job_id=job.id,
        file_path=file_path
    )
    
    # Immediate response - no waiting
    return {
        "job_id": job.id,
        "status": "PENDING",
        "message": "Video uploaded successfully. Processing started."
    }

This takes processing out of the request path: once the upload itself finishes, the API responds in milliseconds no matter how long processing will take. The upload still scales with file size, which Layer 4 addresses.

Figure: Immediate upload response with job ID and background processing queue

Layer 2: Background Job Processing

The heavy lifting happens in background workers that aren’t bound by HTTP timeouts. These workers can process videos for minutes or hours without affecting API responsiveness.

# Background worker processes videos asynchronously
async def process_video_job(job_id: str, file_path: str):
    job = await job_repository.get(job_id)
    
    try:
        # Update status to IN_PROGRESS
        job.status = "IN_PROGRESS"
        job.started_at = datetime.utcnow()
        await job_repository.update(job)
        
        # Heavy processing happens here - can take minutes
        transcoded_path = await transcode_video(file_path)
        thumbnail_path = await generate_thumbnail(file_path)
        metadata = await extract_metadata(file_path)
        
        # Store results and update status
        job.status = "COMPLETED" 
        job.completed_at = datetime.utcnow()
        job.transcoded_path = transcoded_path
        job.thumbnail_path = thumbnail_path
        job.metadata = metadata
        await job_repository.update(job)
        
        # Clean up original upload
        await cleanup_file(file_path)
        
    except Exception as e:
        job.status = "FAILED"
        job.error_message = str(e)
        job.failed_at = datetime.utcnow()
        await job_repository.update(job)

Background workers run independently of web requests. If a worker crashes, a queue that acknowledges jobs only after they complete can redeliver the work instead of losing it. If processing fails, the job status records the failure.
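
To see the hand-off in isolation, here is a minimal in-process sketch using only asyncio. It is a stand-in, not a production design: the `jobs` dict replaces the job table, `asyncio.Queue` replaces a durable queue (so nothing here survives a crash), and `asyncio.sleep` replaces transcoding. All names are hypothetical.

```python
import asyncio
import uuid

jobs: dict[str, str] = {}  # job_id -> status (stand-in for a job table)


async def submit(queue: asyncio.Queue, payload: str) -> str:
    """Record the job, enqueue it, and return without waiting for processing."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = "PENDING"
    await queue.put((job_id, payload))
    return job_id


async def worker(queue: asyncio.Queue) -> None:
    """Drain the queue forever, updating job status as work progresses."""
    while True:
        job_id, payload = await queue.get()
        jobs[job_id] = "IN_PROGRESS"
        try:
            await asyncio.sleep(0.05)  # stand-in for transcoding `payload`
            jobs[job_id] = "COMPLETED"
        except Exception:
            jobs[job_id] = "FAILED"
        finally:
            queue.task_done()


async def demo() -> tuple[str, str]:
    queue: asyncio.Queue = asyncio.Queue()
    worker_task = asyncio.create_task(worker(queue))
    job_id = await submit(queue, "demo.mp4")
    status_at_response = jobs[job_id]  # what the HTTP response would report
    await queue.join()                 # wait only so the demo can show the end state
    worker_task.cancel()
    return status_at_response, jobs[job_id]
```

Running `asyncio.run(demo())` returns the status seen at response time and the status after the worker finishes, which shows that the caller gets its answer before any processing has started.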

Real World

Large video platforms such as YouTube follow a similar pattern from the user’s point of view: the upload is confirmed first, and processing status is reported separately while encoding continues.

Layer 3: Status Polling API

The client needs a way to check processing status without blocking. A dedicated polling endpoint provides real-time progress updates.

# Status polling endpoint - fast, cacheable
@app.get("/videos/jobs/{job_id}/status")
async def get_job_status(job_id: str):
    job = await job_repository.get(job_id)
    if not job:
        raise HTTPException(status_code=404, detail="Job not found")
    
    response = {
        "job_id": job.id,
        "status": job.status,
        "created_at": job.created_at.isoformat(),
    }
    
    # Include progress details based on status
    if job.status == "IN_PROGRESS":
        response["progress"] = await get_processing_progress(job.id)
        response["estimated_completion"] = await estimate_completion(job.id)
    elif job.status == "COMPLETED":
        response["video_url"] = job.transcoded_path
        response["thumbnail_url"] = job.thumbnail_path
        response["metadata"] = job.metadata
    elif job.status == "FAILED":
        response["error"] = job.error_message
        response["retry_allowed"] = True
    
    return response

Status endpoints are designed for high frequency polling - they’re fast, lightweight, and easily cacheable. The client can poll every few seconds without impacting server performance.
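
On the client side, the polling loop can be sketched as follows. This is an assumption-laden illustration: `fetch_status` is a hypothetical callable that performs the GET against the status endpoint and returns the parsed JSON, and the 2-second interval and attempt budget are placeholder defaults.

```python
import time
from typing import Callable

TERMINAL_STATES = {"COMPLETED", "FAILED"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 2.0,
    max_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job reaches a terminal state."""
    for _ in range(max_attempts):
        response = fetch_status()
        if response["status"] in TERMINAL_STATES:
            return response
        sleep(interval_seconds)
    raise TimeoutError("job did not finish within the polling budget")
```

Injecting `sleep` keeps the loop testable without real delays, and the attempt budget ensures a stuck job eventually surfaces as an error instead of polling forever.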

Layer 4: Presigned Upload URLs

For large files, even the upload itself can time out. Presigned URLs let clients upload directly to cloud storage, bypassing your API entirely for the actual file transfer.

# Generate presigned URL for direct upload
@app.post("/videos/upload-url")
async def generate_upload_url(filename: str, content_type: str):
    # Generate unique key for storage
    file_key = f"uploads/{uuid4()}/{filename}"
    
    # Create job record before upload
    job = VideoProcessingJob(
        id=str(uuid4()),
        file_key=file_key,
        status="UPLOADING",
        created_at=datetime.utcnow()
    )
    await job_repository.create(job)
    
    # Generate presigned URL for S3 upload
    upload_url = s3_client.generate_presigned_url(
        'put_object',
        Params={
            'Bucket': 'video-uploads',
            'Key': file_key,
            'ContentType': content_type
        },
        ExpiresIn=3600  # 1 hour to complete upload
    )
    
    return {
        "job_id": job.id,
        "upload_url": upload_url,
        "file_key": file_key
    }

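# Called once the S3 upload has finished, either by the client or by a
# handler for an S3 event notification (S3 delivers those to SNS, SQS,
# Lambda, or EventBridge rather than directly to an HTTP endpoint)
@app.post("/videos/upload-complete")
async def upload_complete(file_key: str, job_id: str):
    job = await job_repository.get(job_id)
    # Reject unknown jobs and keys that don't belong to this job
    if not job or job.file_key != file_key:
        raise HTTPException(status_code=404, detail="Job not found")
    
    # A repeated notification must not enqueue the job twice
    if job.status != "UPLOADING":
        return {"status": "already_processing"}
    
    # Update job status and start processing
    job.status = "PENDING"
    job.file_path = f"s3://video-uploads/{file_key}"
    await job_repository.update(job)
    
    # Queue processing job
    await video_queue.enqueue(
        process_video_job,
        job_id=job.id,
        file_path=job.file_path
    )
    
    return {"status": "processing_started"}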

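Presigned uploads take your API out of the file transfer path: the client uploads directly to S3, so the transfer is no longer bound by your API’s request timeouts. The presigned URL itself still expires (after one hour in the example above), and your API only handles small metadata operations.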
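
The client half of this flow is a plain HTTP PUT to the returned URL. The sketch below builds that request with the standard library and does not send it. The URL is a made-up placeholder, and `build_presigned_put` is a hypothetical helper. One detail worth noting: because the server included `ContentType` when generating the URL, the client is generally expected to send the same `Content-Type` header, otherwise S3 is likely to reject the upload with a signature mismatch.

```python
import urllib.request


def build_presigned_put(
    upload_url: str, data: bytes, content_type: str
) -> urllib.request.Request:
    """Build (but do not send) the PUT request for a presigned upload URL."""
    return urllib.request.Request(
        upload_url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Placeholder URL; a real one comes from the upload-url endpoint's response
request = build_presigned_put(
    "https://example.invalid/uploads/demo.mp4?X-Amz-Signature=abc",
    b"video bytes",
    "video/mp4",
)
# urllib.request.urlopen(request) would perform the actual upload
```

After the PUT succeeds, the client (or a storage event handler) still has to tell the API that the upload finished so the job can move from UPLOADING to PENDING.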

Figure: Presigned URL upload flow with direct client-to-storage transfer

The Full Architecture

Figure: Complete async processing architecture with upload, queue, workers, and polling

The complete architecture separates every slow operation from the HTTP request cycle. File uploads go directly to storage. Processing happens in background workers. Status updates come from fast polling endpoints. No HTTP connection waits for heavyweight processing.

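The client experience is dramatically improved - immediate confirmation of uploads, real-time progress tracking, and no mysterious timeouts. Server capacity recovers as well: in the earlier example, a server limited to 10 concurrent processing requests can go back to handling around 1000 short ones, a 100x increase, because connections aren’t held open during processing.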

Key Insight

The critical design principle is never blocking HTTP responses on unbounded operations - anything that can take variable time should happen asynchronously.

Component Deep Dive

Job Queue System

The job queue’s role is to store work reliably until a worker can run it, with retry mechanisms and failure handling.

# Redis-based job queue with Celery
from celery import Celery

app = Celery('video_processor', broker='redis://localhost:6379/0')

app.conf.update(
    task_serializer='json',
    accept_content=['json'],
    result_serializer='json',
    timezone='UTC',
    enable_utc=True,
    task_routes={
        'process_video_job': {'queue': 'video_processing'},
        'generate_thumbnail': {'queue': 'thumbnails'},
        'extract_metadata': {'queue': 'metadata'}
    },
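    task_acks_late=True,
    # Re-queue the task if the worker process dies mid-run; without this,
    # Celery acknowledges the message even when acks_late is enabled
    task_reject_on_worker_lost=True,
    worker_prefetch_multiplier=1
)

# An explicit name lets the task_routes keys above match this task;
# by default Celery names tasks after their module path
@app.task(bind=True, max_retries=3, name='process_video_job')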
def process_video_job(self, job_id, file_path):
    try:
        # Actual processing logic here
        result = perform_video_processing(job_id, file_path)
        return result
    except Exception as exc:
        # Retry with exponential backoff
        raise self.retry(exc=exc, countdown=60 * (2 ** self.request.retries))

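The queue configuration acknowledges a job only after the worker finishes it, so a job interrupted by a worker failure can be redelivered rather than lost, and failed jobs retry with exponential backoff. Because redelivery means a job may run more than once, the processing logic should be idempotent. Different job types use different queues so thumbnail generation doesn’t block video transcoding.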
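
The backoff expression in the task is easy to check by hand. This small sketch reproduces the same arithmetic outside Celery; `retry_countdown` is a hypothetical helper that mirrors `60 * (2 ** self.request.retries)`.

```python
def retry_countdown(retries: int, base_seconds: int = 60) -> int:
    """Delay before the next attempt, doubling with each retry already made."""
    return base_seconds * (2 ** retries)


# With max_retries=3, retries are scheduled when the counter is 0, 1, and 2
delays = [retry_countdown(n) for n in range(3)]
print(delays)  # [60, 120, 240]
```

So a job that keeps failing is retried after 1, 2, and 4 minutes, and gives up roughly 7 minutes after the first failure, not counting processing time.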

Idempotency Keys

To prevent duplicate processing when users retry uploads, every job needs an idempotency key based on file content.

import hashlib

async def generate_idempotency_key(file: UploadFile) -> str:
    # Hash file content + filename, reading in 1 MB chunks so a large
    # upload is never held in memory all at once
    hasher = hashlib.sha256()
    while chunk := await file.read(1024 * 1024):
        hasher.update(chunk)
    hasher.update((file.filename or "").encode())
    await file.seek(0)  # rewind so the file can still be stored afterwards
    return hasher.hexdigest()

@app.post("/videos/upload")  
async def upload_video(file: UploadFile):
    idempotency_key = await generate_idempotency_key(file)
    
    # Check if we've already processed this exact file
    existing_job = await job_repository.get_by_idempotency_key(idempotency_key)
    if existing_job:
        return {
            "job_id": existing_job.id,
            "status": existing_job.status,
            "message": "File already uploaded"
        }
    
    # Continue with new upload...

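Idempotency keys prevent wasted processing when users retry uploads. The same file always maps to the same job ID, so duplicate uploads just return the existing job status. The retried file still has to reach the server before it can be hashed, so this saves processing rather than upload bandwidth.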
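
The lookup-or-create behaviour can be sketched without a web framework. In this illustration the `jobs_by_key` dict is a hypothetical stand-in for the job repository, and `content_key` and `get_or_create_job` are made-up names.

```python
import hashlib
import io
import uuid
from typing import BinaryIO

jobs_by_key: dict[str, str] = {}  # idempotency key -> job_id


def content_key(stream: BinaryIO, filename: str, chunk_size: int = 1024 * 1024) -> str:
    """SHA-256 over file content plus filename, read in fixed-size chunks."""
    hasher = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)
    hasher.update(filename.encode())
    return hasher.hexdigest()


def get_or_create_job(stream: BinaryIO, filename: str) -> tuple[str, bool]:
    """Return (job_id, created); a repeated upload reuses the existing job."""
    key = content_key(stream, filename)
    if key in jobs_by_key:
        return jobs_by_key[key], False
    job_id = str(uuid.uuid4())
    jobs_by_key[key] = job_id
    return job_id, True


first_id, first_created = get_or_create_job(io.BytesIO(b"video bytes"), "demo.mp4")
retry_id, retry_created = get_or_create_job(io.BytesIO(b"video bytes"), "demo.mp4")
```

With a real database, the check and the insert should be made atomic, for example with a unique constraint on the key, so that two simultaneous retries cannot both create a job.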

Chunked Upload Support

For extremely large files, even presigned uploads can time out. Chunked uploads let clients resume interrupted transfers.

import math

@app.post("/videos/start-chunked-upload")
async def start_chunked_upload(filename: str, total_size: int, chunk_size: int):
    upload_id = str(uuid4())
    total_chunks = math.ceil(total_size / chunk_size)
    
    # Initialize multipart upload in S3
    response = s3_client.create_multipart_upload(
        Bucket='video-uploads',
        Key=f'uploads/{upload_id}/{filename}'
    )
    
    upload_session = ChunkedUploadSession(
        upload_id=upload_id,
        s3_upload_id=response['UploadId'],
        filename=filename,
        total_size=total_size,
        chunk_size=chunk_size,
        total_chunks=total_chunks,  # read back by the chunk endpoint
        uploaded_chunks=[]
    )
    await session_repository.create(upload_session)
    
    return {
        "upload_id": upload_id,
        "chunk_size": chunk_size,
        "total_chunks": total_chunks
    }

@app.post("/videos/upload-chunk/{upload_id}/{chunk_number}")
async def upload_chunk(upload_id: str, chunk_number: int, chunk: UploadFile):
    session = await session_repository.get(upload_id)
    
    # Upload chunk to S3 with part number
    response = s3_client.upload_part(
        Bucket='video-uploads',
        Key=f'uploads/{upload_id}/{session.filename}',
        PartNumber=chunk_number,
        UploadId=session.s3_upload_id,
        Body=await chunk.read()
    )
    
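    # Track completed chunk; a retried chunk replaces its earlier entry
    # so the completion check below doesn't count it twice
    session.uploaded_chunks = [
        part for part in session.uploaded_chunks
        if part['PartNumber'] != chunk_number
    ]
    session.uploaded_chunks.append({
        'PartNumber': chunk_number,
        'ETag': response['ETag']
    })
    await session_repository.update(session)
    
    # Check if upload is complete
    if len(session.uploaded_chunks) == session.total_chunks:
        await complete_multipart_upload(session)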
    
    return {"chunk_uploaded": chunk_number}

Chunked uploads with resume capability make large file uploads reliable even over unstable network connections.
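
Resuming depends on the client knowing which byte range belongs to which part. A rough sketch of that bookkeeping follows; `plan_chunks` and `missing_parts` are hypothetical helpers. It relies on two S3 multipart rules: part numbers start at 1, and every part except the last must be at least 5 MiB.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last


def plan_chunks(total_size: int, chunk_size: int) -> list[tuple[int, int, int]]:
    """Return (part_number, start_byte, end_byte_exclusive) for each chunk."""
    if chunk_size < MIN_PART_SIZE:
        raise ValueError("chunk_size is below the S3 multipart minimum of 5 MiB")
    total_chunks = math.ceil(total_size / chunk_size)
    return [
        (n + 1, n * chunk_size, min((n + 1) * chunk_size, total_size))
        for n in range(total_chunks)
    ]


def missing_parts(
    plan: list[tuple[int, int, int]], uploaded_part_numbers: set[int]
) -> list[int]:
    """Part numbers the client still has to send after an interruption."""
    return [part for part, _, _ in plan if part not in uploaded_part_numbers]


MIB = 1024 * 1024
plan = plan_chunks(total_size=200 * MIB, chunk_size=64 * MIB)
to_resend = missing_parts(plan, uploaded_part_numbers={1, 3})
```

For the 200MB file from the opening scenario with 64 MiB chunks, the plan has four parts, and a client that already sent parts 1 and 3 only needs to resend parts 2 and 4.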

Progress Tracking

Background jobs should report progress so clients can show meaningful status updates instead of generic “processing” messages.

# Progress tracking within video processing job
async def process_video_job(job_id: str, file_path: str):
    await update_job_progress(job_id, 0, "Starting transcoding...")
    
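    # Transcoding with progress callbacks. The callback is a coroutine
    # function, so transcode_video is assumed to await it; a plain lambda
    # would create coroutines that never run
    async def on_transcode_progress(pct: float) -> None:
        await update_job_progress(
            job_id, int(pct * 0.7), f"Transcoding... {int(pct)}%"
        )
    
    await transcode_video(file_path, progress_callback=on_transcode_progress)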
    
    await update_job_progress(job_id, 70, "Generating thumbnail...")
    thumbnail_path = await generate_thumbnail(file_path)
    
    await update_job_progress(job_id, 90, "Extracting metadata...")
    metadata = await extract_metadata(file_path)
    
    await update_job_progress(job_id, 100, "Completed")

async def update_job_progress(job_id: str, progress: int, message: str):
    # Store progress in fast cache for polling endpoint
    await redis_client.hset(
        f"job_progress:{job_id}",
        mapping={
            "progress": progress,
            "message": message,
            "updated_at": datetime.utcnow().isoformat()
        }
    )
    await redis_client.expire(f"job_progress:{job_id}", 3600)  # 1 hour TTL

Progress tracking with meaningful messages creates a much better user experience than binary “processing” vs “complete” states.
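
The 0/70/90/100 checkpoints above amount to giving each stage a share of the overall progress bar. One way to express that mapping is sketched here; the `STAGE_WEIGHTS` table and `overall_progress` function are hypothetical names, and the weights are simply read off the checkpoints in the example.

```python
# Share of the overall bar per stage, in execution order (sums to 100)
STAGE_WEIGHTS = {"transcode": 70, "thumbnail": 20, "metadata": 10}


def overall_progress(stage: str, stage_pct: float) -> int:
    """Map a stage-local percentage (0-100) to overall job progress (0-100)."""
    completed = 0
    for name, weight in STAGE_WEIGHTS.items():
        if name == stage:
            return int(completed + weight * stage_pct / 100)
        completed += weight
    raise ValueError(f"unknown stage: {stage}")


halfway_transcoding = overall_progress("transcode", 50)
thumbnail_started = overall_progress("thumbnail", 0)
```

Keeping the weights in one table means a new stage, or a re-balanced estimate of how long each stage takes, changes a single definition instead of several hard-coded percentages.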

Comparison Table

| Approach | Response Time | Server Capacity | Failure Recovery | User Experience | Implementation Complexity |
|---|---|---|---|---|---|
| Synchronous Processing | Hours | Very Low | Poor - lost work | Timeouts and retries | Low |
| Increased Timeouts | Hours | Very Low | Poor - still lost work | Fewer timeouts, same slowness | Low |
| Simple Background Jobs | Milliseconds | High | Good - queue persistence | Great with polling | Medium |
| Presigned + Chunked Upload | Milliseconds | Very High | Excellent - resumable | Great with progress | High |
| Full Async Pipeline | Milliseconds | Very High | Excellent - all recoverable | Excellent - real-time updates | High |

The full async pipeline requires significantly more upfront development but provides the best user experience and server efficiency. You’re trading implementation complexity for operational simplicity.

Key Takeaways

  • HTTP timeouts are incompatible with variable-length processing - separate upload from processing entirely
  • Background job queues provide reliability and scalability that synchronous processing cannot match
  • Presigned URLs eliminate upload timeouts by removing your API from the file transfer path
  • Idempotency keys prevent duplicate processing when users inevitably retry failed uploads
  • Status polling gives users real-time feedback without holding connections open
  • Chunked uploads make large file transfers resumable and reliable over unreliable networks
  • Progress tracking transforms user experience from anxiety-inducing silence to confidence-building updates
  • Job queue persistence ensures work survives server crashes and deployment rollouts

The counter-intuitive lesson is that making operations asynchronous often makes them feel faster to users, even when the actual processing time is unchanged. Immediate acknowledgment with progress tracking feels faster than synchronous processing with timeouts.

Frequently Asked Questions

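Q: What happens if the background worker crashes while processing a video? A: Job queues like Celery use acknowledgment-based delivery. When jobs are acknowledged only after they finish (task_acks_late=True in the configuration above), a job whose worker crashes can be redelivered and picked up by another worker. Until that happens the job status in your database may still read IN_PROGRESS, so a stale started_at timestamp is one way to spot stalled jobs.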

Q: How do you handle partial failures in multi-step processing (transcode succeeds, thumbnail fails)? A: Design jobs as state machines with granular status tracking. If thumbnail generation fails, mark that specific step as failed but allow retry without re-transcoding. Store intermediate results separately from final job status.
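
A minimal sketch of that step-wise approach, assuming the job is a plain dict and each step is a zero-argument callable (both simplifications for illustration; `run_steps` is a hypothetical helper):

```python
from typing import Callable


def run_steps(job: dict, steps: list[tuple[str, Callable[[], str]]]) -> dict:
    """Run each step once; skip steps whose result is already stored."""
    results = job.setdefault("results", {})
    for name, step in steps:
        if name in results:
            continue  # finished on an earlier attempt, so don't redo it
        try:
            results[name] = step()
        except Exception as exc:
            job["status"] = "FAILED"
            job["failed_step"] = name
            job["error"] = str(exc)
            return job
    job["status"] = "COMPLETED"
    job.pop("failed_step", None)
    job.pop("error", None)
    return job
```

Calling `run_steps` again on a failed job resumes at the failed step, because the stored transcode result causes that step to be skipped.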

Q: Is it worth implementing chunked uploads for smaller files under 50MB? A: No - chunked uploads add significant complexity. Use simple presigned URLs for files under 100MB. Only implement chunked uploads when you regularly see files over 500MB that face network reliability issues.

Q: How do you prevent the status polling endpoint from becoming a bottleneck? A: Use aggressive caching with short TTLs (5-10 seconds) and store job status in fast storage like Redis. The polling endpoint should be a simple key lookup, not a database query with joins.
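
As a rough in-process illustration of the short-TTL idea (a shared cache such as Redis would play this role across several API servers), the sketch below caches status lookups for a few seconds. `TTLCache` and the `loader` callable are hypothetical names.

```python
import time
from typing import Callable


class TTLCache:
    """Cache values per key and reload them once they are older than the TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], dict]) -> dict:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, so skip the backing store
        value = loader(key)
        self._entries[key] = (now, value)
        return value
```

With a 5-second TTL, many clients polling the same job every 2 seconds trigger at most one backing-store read per 5 seconds per server, at the cost of status that can be up to 5 seconds stale.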

Q: What’s the right polling frequency for the client? A: Start with 2-second intervals for active jobs, back off to 10-second intervals after 30 seconds, and cap at 30-second intervals for long-running jobs. Use exponential backoff on errors to prevent thundering herd.
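
That schedule can be written down as two small functions. The answer does not say when a job counts as long-running, so the 5-minute threshold below is an assumption, as are the function names and the 2-second error base.

```python
def poll_interval(elapsed_seconds: float, long_running_after: float = 300.0) -> float:
    """Seconds to wait before the next status poll, based on job age."""
    if elapsed_seconds < 30:
        return 2.0
    if elapsed_seconds < long_running_after:  # threshold is an assumed value
        return 10.0
    return 30.0


def error_backoff(consecutive_errors: int, base: float = 2.0, cap: float = 30.0) -> float:
    """Exponential backoff after failed polls, capped at the maximum interval."""
    return min(cap, base * (2 ** consecutive_errors))
```

In practice a small random jitter is usually added to each delay so that clients that failed at the same moment do not all retry at the same moment.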

Interview Questions

Q: Design an async file processing system that can handle 10,000 concurrent uploads with 99.9% reliability. Expected depth: Discuss horizontal scaling of job workers, queue partitioning strategies, circuit breakers around external services, monitoring and alerting for job lag, and graceful degradation modes.

Q: How would you migrate an existing synchronous file processing API to asynchronous without breaking existing clients? Expected depth: Feature flags, dual-mode operation, client SDK updates, backwards compatibility, gradual rollout strategy, and rollback plans. Mention API versioning and deprecation timelines.

Q: What are the tradeoffs between using a managed queue service (SQS) vs self-hosted (Redis/RabbitMQ)? Expected depth: Compare cost, operational overhead, feature sets, scaling characteristics, and failure modes. Discuss durability guarantees, message ordering, and integration complexity.

Q: How do you ensure exactly-once processing in an async job system? Expected depth: Idempotency keys, database transactions, at-least-once delivery with deduplication, distributed locks, and the challenges of exactly-once semantics in distributed systems.

Q: Design a rate limiting strategy for async job processing to prevent resource exhaustion. Expected depth: Per-user job limits, priority queues, job scheduling algorithms, resource quotas, back-pressure mechanisms, and circuit breaker patterns for downstream services.
