wallcrawler

Created: June 10, 2025
Last commit: October 3, 2025
Languages: Go 68.3%, TypeScript 23.6%, Shell 5.4%, Makefile 1.1%, Dockerfile 0.8% (+1 more)
README.md

Wallcrawler Monorepo

Self‑hosted, AWS‑backed remote browser platform with Stagehand LLM browsing, compatible with Browserbase APIs. This monorepo contains the infrastructure, backend services, SDK, and UI components to run Wallcrawler in your own AWS account.

Quick links

Packages

Prerequisites

  • Node.js >= 18 and pnpm >= 8
  • Go >= 1.21 (for backend)
  • AWS CLI configured for your target account
  • Docker (for local builds and multi‑arch images)

Getting started

# 1) Initialize submodules (if any)
pnpm install:submodules

# 2) Install dependencies
pnpm install

# 3) Build everything
pnpm build

# 4) Generate local env (CDK helpers)
pnpm generate-env

# 5) Deploy (see Deployment Guide for environments/config)
pnpm deploy

Additional scripts:

  • Lint: pnpm lint
  • Tests: pnpm test
  • Dev (package‑scoped): pnpm -r dev
  • CDK Toolkit: pnpm cdk

Configuration

The backend reads several environment variables at runtime:

  • WALLCRAWLER_MAX_SESSION_TIMEOUT — Maximum allowed session duration in seconds (defaults to 3600).
  • PROJECTS_TABLE_NAME, API_KEYS_TABLE_NAME, CONTEXTS_TABLE_NAME — Automatically injected by the CDK stack for the Lambda functions.
  • CONTEXTS_BUCKET_NAME — S3 bucket that stores browser context archives for persisted sessions.
  • SESSIONS_TABLE_NAME — Sessions table (wallcrawler-sessions by default).

Access scoping notes:

  • Contexts (browser profiles) are project-scoped. If you expose contexts to end users, make sure your application filters by both projectId and your own user identifier before forwarding requests to Wallcrawler.
  • API keys can be associated with multiple projects. When a key has more than one project, include the x-wc-project-id header on each request to select the target project; the authorizer denies access if the requested project is not in the key's allowlist.
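
As an illustration of the environment variables listed above, here is a minimal Go sketch of how a service could read them and apply the documented defaults (3600 seconds and wallcrawler-sessions). The Config struct, LoadConfig, and ClampTimeout are hypothetical names, not taken from the Wallcrawler backend. Whether the backend clamps or rejects an over-long session request is not stated in this README, so the clamping shown is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config mirrors the documented environment variables.
// The struct and its field names are hypothetical.
type Config struct {
	MaxSessionTimeout  int // seconds
	SessionsTableName  string
	ProjectsTableName  string
	APIKeysTableName   string
	ContextsTableName  string
	ContextsBucketName string
}

// LoadConfig reads settings through lookup (os.LookupEnv in a real process)
// and falls back to the defaults documented in the README.
func LoadConfig(lookup func(string) (string, bool)) (Config, error) {
	cfg := Config{
		MaxSessionTimeout: 3600,
		SessionsTableName: "wallcrawler-sessions",
	}
	if v, ok := lookup("WALLCRAWLER_MAX_SESSION_TIMEOUT"); ok && v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n <= 0 {
			return Config{}, fmt.Errorf("WALLCRAWLER_MAX_SESSION_TIMEOUT must be a positive integer, got %q", v)
		}
		cfg.MaxSessionTimeout = n
	}
	if v, ok := lookup("SESSIONS_TABLE_NAME"); ok && v != "" {
		cfg.SessionsTableName = v
	}
	// These are injected by the CDK stack, so no defaults are assumed here.
	cfg.ProjectsTableName, _ = lookup("PROJECTS_TABLE_NAME")
	cfg.APIKeysTableName, _ = lookup("API_KEYS_TABLE_NAME")
	cfg.ContextsTableName, _ = lookup("CONTEXTS_TABLE_NAME")
	cfg.ContextsBucketName, _ = lookup("CONTEXTS_BUCKET_NAME")
	return cfg, nil
}

// ClampTimeout caps a requested session duration at the configured maximum.
// Treating a non-positive request as "use the maximum" is an assumption.
func (c Config) ClampTimeout(requested int) int {
	if requested <= 0 || requested > c.MaxSessionTimeout {
		return c.MaxSessionTimeout
	}
	return requested
}

func main() {
	cfg, err := LoadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("max session timeout: %ds, sessions table: %s\n",
		cfg.MaxSessionTimeout, cfg.SessionsTableName)
}
```

Passing the lookup function as a parameter keeps the loader testable without touching the real process environment.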

Data Stores

  • DynamoDB
    • wallcrawler-sessions — Session metadata, lifecycle history, and connection info.
    • wallcrawler-projects — Project configuration (default timeout, concurrency limits, billing tier).
    • wallcrawler-api-keys — SHA-256 hashed API keys mapped to one or more projects (projectIds attribute) with status flags.
    • wallcrawler-contexts — Browser context metadata and S3 object keys. Add per-user ownership metadata in your app if you need user-level isolation.
  • S3
    • wallcrawler-contexts-* — Stores compressed Chrome user data directories for persisted contexts.
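
The wallcrawler-api-keys entry above, together with the x-wc-project-id rule, can be sketched as a small authorizer check in Go. This is a sketch under stated assumptions, not the project's actual authorizer. APIKeyRecord, HashAPIKey, ResolveProject, and ErrDenied are hypothetical names. Hex encoding of the digest, the absence of a salt, the single Active flag, and the behavior when the header is missing are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// APIKeyRecord is a hypothetical in-memory view of a wallcrawler-api-keys item.
type APIKeyRecord struct {
	ProjectIDs []string // corresponds to the projectIds attribute
	Active     bool     // stands in for the table's status flags
}

// ErrDenied is returned whenever the request must be rejected.
var ErrDenied = errors.New("access denied")

// HashAPIKey returns the hex-encoded SHA-256 digest of a raw API key,
// so the raw key never has to be stored or used as a lookup key.
func HashAPIKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// ResolveProject picks the target project for a request.
// requested is the value of the x-wc-project-id header ("" if absent).
func ResolveProject(rec APIKeyRecord, requested string) (string, error) {
	if !rec.Active || len(rec.ProjectIDs) == 0 {
		return "", ErrDenied
	}
	if requested == "" {
		// Assumption: a single-project key needs no header,
		// while a multi-project key without one is ambiguous and denied.
		if len(rec.ProjectIDs) == 1 {
			return rec.ProjectIDs[0], nil
		}
		return "", ErrDenied
	}
	// Deny unless the requested project is in the key's allowlist.
	for _, id := range rec.ProjectIDs {
		if id == requested {
			return id, nil
		}
	}
	return "", ErrDenied
}

func main() {
	// A map stands in for the DynamoDB table, keyed by the hashed API key.
	store := map[string]APIKeyRecord{
		HashAPIKey("wc_example_key"): {ProjectIDs: []string{"proj_a", "proj_b"}, Active: true},
	}
	rec := store[HashAPIKey("wc_example_key")]
	project, err := ResolveProject(rec, "proj_b")
	fmt.Println(project, err)
}
```

Looking records up by digest means a leaked table does not expose usable keys. A real deployment might also use a constant-time comparison or a keyed hash.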

API compatibility

Wallcrawler provides Browserbase‑compatible APIs and Stagehand endpoints. For exact routes, request/response shapes, and streaming behavior, see:

  • docs/api/api-endpoints-reference.md
  • docs/api/sdk-integration-guide.md

Architecture overview

High‑level design, event flows, and data models are covered in the API docs referenced above. For an architecture diagram, see docs/infra/wallcrawler-aws-architecture.png.