1. Introduction

This Phase 2 addendum has been prepared in response to the real-time data-warehouse requirements raised by Aoife Duna, Chief Data and AI Officer at CompuCycle. It builds directly on Section 4.5 (Phase 2 — Dual Destination Delivery) of our April 2026 proposal, now that the target platform has been confirmed as Azure Event Hub → Databricks (replacing the earlier Snowflake/Lambda placeholder) and the outbox-pattern architecture has been agreed.

This addendum supersedes the one-line Phase 2 reference in the original proposal and scopes the full streaming pipeline as a standalone, costed engagement. It covers:

  • An outbox table as the durable source of truth for every bid event
  • Refactoring the existing HubSpot integration to consume from the outbox (rather than firing directly)
  • A second destination — an Azure Event Hub producer feeding Databricks structured streaming
  • Two independent workers with separate retry / status tracking per destination
  • Monitoring infrastructure — heartbeats, per-auction sequence numbers, auction-closed events, total bid count, and a pipeline-health endpoint
  • A wrapped event schema with dedicated metadata, optional historical backfill, and the reconciliation pull API

Why not a simple “second URL”?

A lightweight dual-webhook (the existing webhook also POSTing to a second URL) was considered, but it cannot meet the stated requirements — no-loss guarantees at auction close, strict per-auction ordering, deduplication, sequence numbers, and heartbeat monitoring. The outbox architecture below is the correct design for those guarantees and is therefore scoped as a proper addendum.

2. Confirmed Requirements & Decisions

The following decisions have been confirmed by CompuCycle and form the basis of this scope and estimate.

Item                 Confirmed Decision
Target platform      Azure Event Hub → Databricks (structured streaming)
Authentication       SAS token (Send permission)
Environments         Separate dev and production Event Hubs
Partition strategy   By auction_id (preserves per-auction order)
Ordering guarantee   Strict per-auction order
Delivery guarantee   At-least-once with event_id deduplication
Payload              HubSpot field set + metadata, wrapped as { data, metadata }
Heartbeat            Adaptive: every 60 s as standard, every 5 s for the last 5 minutes before close
Sequence numbers     Per-auction, from 1 at auction.opened, reset each auction
Latency target       500 ms p95 / 2 s p99 (best-effort; see Section 9)
Outbox retention     30 days
Failure escalation   Email to start; Microsoft Teams channel likely later
Backfill             All historical data (optional add-on, component B2)
Reconciliation       Pull API (Section 6.2) retained (component B3)
Hosting egress       Pressable permits outbound HTTPS:443 to *.servicebus.windows.net (confirmed)
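
For reference, the sketch below shows how a SAS token is typically derived from a shared access policy, in case the client supplies a policy name and key rather than a pre-generated token. It is illustrative only: the function name, namespace, hub name, and policy name are hypothetical, and the production producer would run in PHP on the WordPress host rather than in Python.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def build_sas_token(resource_uri: str, policy_name: str, policy_key: str,
                    ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Build a SharedAccessSignature token for an Event Hub resource URI."""
    issued_at = time.time() if now is None else now
    expiry = str(int(issued_at + ttl_seconds))
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # The signature is an HMAC-SHA256 over "<url-encoded-uri>\n<expiry>".
    string_to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    digest = hmac.new(policy_key.encode("utf-8"), string_to_sign,
                      hashlib.sha256).digest()
    signature = urllib.parse.quote(base64.b64encode(digest), safe="")
    return (f"SharedAccessSignature sr={encoded_uri}&sig={signature}"
            f"&se={expiry}&skn={policy_name}")


# Hypothetical dev namespace, hub, and Send-only policy.
token = build_sas_token(
    "https://example-ns.servicebus.windows.net/bids-dev",
    "send-only", "not-a-real-key", ttl_seconds=60, now=1000)
```

The resulting string is sent in the Authorization header of each HTTPS request to the hub.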

3. Answers to Your Questions

3.1 Is bid_id deterministic across retries?

Yes. The bid identifier is the auto-increment row id of the auction bid-log table (woo_ua_auction_log), assigned once by the database when the bid is recorded. It is a stable integer for the same logical bid — not a per-attempt UUID — and remains identical across every delivery retry. We will derive the event_id deterministically from it (for example, bid.placed-<bid_id>), so downstream deduplication on event_id behaves exactly as expected.
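
As a minimal sketch of that derivation and of the downstream behaviour it enables: the function names below are hypothetical, and the deduplication step stands in for whatever the Databricks side actually implements.

```python
def event_id_for_bid(bid_id: int) -> str:
    """Derive the event_id from the bid-log row id (stable across retries)."""
    return f"bid.placed-{bid_id}"


def deduplicate(events: list) -> list:
    """Keep the first occurrence of each event_id, preserving arrival order."""
    seen = set()
    unique = []
    for event in events:
        event_id = event["metadata"]["event_id"]
        if event_id in seen:
            continue  # a redelivery of an event already processed
        seen.add(event_id)
        unique.append(event)
    return unique


# Two delivery attempts for the same bid carry the same event_id.
first_attempt = {"metadata": {"event_id": event_id_for_bid(3891)}}
retry_attempt = {"metadata": {"event_id": event_id_for_bid(3891)}}
unique_events = deduplicate([first_attempt, retry_attempt])
```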

3.2 How does the near real-time latency target impact scope?

The outbox pattern decouples bid acceptance from delivery: the bid is recorded and acknowledged to the bidder immediately, and delivery to HubSpot and Event Hub happens in the worker layer. This keeps the bidder experience fast regardless of downstream speed. The 500 ms p95 / 2 s p99 target for the producer-to-Event Hub leg is achievable on a healthy host, but end-to-end latency at peak concurrency depends on how many workers the hosting platform has available at auction close (see Section 9). We will design and tune for the target, but it is a best-effort target tied to host capacity rather than a hard guarantee.
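
To make the target concrete, the sketch below checks a set of measured latencies against the two thresholds using the nearest-rank percentile method. The function names and sample values are hypothetical; the actual measurement approach would be agreed during load testing.

```python
def percentile(samples_ms: list, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample covering pct% of the data."""
    ordered = sorted(samples_ms)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def meets_target(samples_ms: list, p95_ms: int = 500, p99_ms: int = 2000) -> bool:
    """True when both the p95 and p99 thresholds from this addendum hold."""
    return (percentile(samples_ms, 95) <= p95_ms
            and percentile(samples_ms, 99) <= p99_ms)


# Illustrative samples: mostly fast, a few slow, one outlier beyond p99.
healthy = [120] * 95 + [900] * 4 + [2500]
degraded = [120] * 90 + [900] * 10
```

A single outlier above 2 s does not breach the p99 threshold in a 100-sample window, whereas ten slow deliveries out of 100 breach p95.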

3.3 How does full historical backfill impact cost / scope?

Backfilling all historical bids is a separate, one-time effort (component B2): a throttled batch job reads historical bid-log records, wraps each in the event envelope, and produces them to Event Hub without saturating live workers. It is priced as an optional add-on because effort scales with historical volume and it is independent of the live pipeline. The live pipeline (B1) can go live first; backfill can run afterward.
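
The throttling and resumability described here could look roughly like the sketch below. It assumes historical rows are read in ascending bid_id order and that a checkpoint of the last produced id is persisted between runs; the function, parameter, and checkpoint names are all hypothetical.

```python
import time


def run_backfill(rows, send, checkpoint, batch_size=100, pause_seconds=1.0,
                 sleep=time.sleep):
    """Replay historical rows, pausing after each batch to spare live workers.

    rows: iterable of dicts with a 'bid_id', in ascending order (assumption).
    send: callable that produces one row to the destination.
    checkpoint: dict holding 'last_bid_id' so an interrupted run can resume.
    """
    sent = 0
    for row in rows:
        if row["bid_id"] <= checkpoint.get("last_bid_id", 0):
            continue  # already produced in an earlier run
        send(row)
        checkpoint["last_bid_id"] = row["bid_id"]
        sent += 1
        if sent % batch_size == 0:
            sleep(pause_seconds)  # throttle between batches
    return sent


# Resume a run that previously stopped after bid 3.
history = [{"bid_id": i} for i in range(1, 8)]
produced, pauses = [], []
state = {"last_bid_id": 3}
count = run_backfill(history, produced.append, state,
                     batch_size=2, pause_seconds=0, sleep=pauses.append)
```

Because event_id is deterministic, a row re-sent after a crash between send and checkpoint is deduplicated downstream rather than double-counted.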

3.4 Is the Inception pull API (Section 6.2) still in scope?

Yes. The reconciliation pull API from Section 6.2 of the original proposal is retained (component B3). It supports validation, reconciliation, and verification of backfilled data — i.e. confirming that what landed in Databricks matches the source of truth.
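
A reconciliation pass of this kind reduces to a set comparison on event_id. The sketch below assumes the pull API and the warehouse can each yield the list of event_ids for a given auction or date range; the function name and result keys are hypothetical.

```python
def reconcile(source_event_ids, warehouse_event_ids):
    """Compare event_ids from the source of truth with those that landed."""
    source = set(source_event_ids)
    landed = set(warehouse_event_ids)
    return {
        "missing": sorted(source - landed),     # in source, never landed
        "unexpected": sorted(landed - source),  # landed, unknown to source
    }


report = reconcile(
    ["bid.placed-3890", "bid.placed-3891", "bid.placed-3892"],
    ["bid.placed-3890", "bid.placed-3892"],
)
```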

4. Architecture Overview

The design uses the transactional outbox pattern so that HubSpot and the data warehouse stay in sync without either destination slowing the other down, and so no event is lost during high-volume moments such as auction close.

4.1 Flow
  • On each bid (and on auction.opened / auction.closed), an event row is written to a durable outbox table — the single source of truth.
  • HubSpot worker: consumes the outbox and delivers to HubSpot (a refactor of the existing integration so it reads from the outbox instead of firing directly), with its own retry and status tracking.
  • Event Hub worker (producer): consumes the same outbox and produces to the Azure Event Hub using SAS authentication, partitioned by auction_id, with its own retry, dead-letter, and status tracking.
  • Databricks subscribes to the Event Hub via structured streaming.
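
The first step of this flow, writing the bid and its outbox row in one transaction, can be sketched as follows. SQLite is used only so the example is self-contained; the table and column names are hypothetical, and the production implementation would use the WordPress database from PHP.

```python
import json
import sqlite3


def open_outbox_db() -> sqlite3.Connection:
    """Create an in-memory database with a bid log and an outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bid_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            auction_id INTEGER NOT NULL,
            amount REAL NOT NULL
        );
        CREATE TABLE outbox (
            event_id TEXT PRIMARY KEY,
            auction_id INTEGER NOT NULL,
            payload TEXT NOT NULL,
            hubspot_status TEXT NOT NULL DEFAULT 'pending',
            eventhub_status TEXT NOT NULL DEFAULT 'pending'
        );
    """)
    return conn


def record_bid(conn: sqlite3.Connection, auction_id: int, amount: float) -> str:
    """Insert the bid and its outbox event atomically; return the event_id."""
    with conn:  # commits both inserts together, or rolls both back on error
        cursor = conn.execute(
            "INSERT INTO bid_log (auction_id, amount) VALUES (?, ?)",
            (auction_id, amount))
        bid_id = cursor.lastrowid
        event_id = f"bid.placed-{bid_id}"
        payload = json.dumps(
            {"auction_id": auction_id, "bid_id": bid_id, "bid_amount": amount})
        conn.execute(
            "INSERT INTO outbox (event_id, auction_id, payload) VALUES (?, ?, ?)",
            (event_id, auction_id, payload))
    return event_id


db = open_outbox_db()
new_event_id = record_bid(db, 1042, 875.00)
```

Each worker then reads rows whose own status column is still pending, so a slow destination never blocks the other.
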
4.2 Guarantees
  • Delivery: at-least-once, with a deterministic event_id for downstream deduplication.
  • Ordering: strict per-auction order, achieved by using auction_id as the partition key.
  • Durability: every event persists in the outbox until both destinations confirm; rows are pruned after 30 days.
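
The durability rule can be read as a simple predicate over each outbox row. The sketch below assumes one status column per destination and assumes that undelivered rows are kept beyond 30 days for investigation; the field names and status values are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)


def is_prunable(row: dict, now: datetime) -> bool:
    """A row may be pruned once both destinations confirmed and it is 30+ days old."""
    both_confirmed = (row["hubspot_status"] == "delivered"
                      and row["eventhub_status"] == "delivered")
    return both_confirmed and (now - row["created_at"]) >= RETENTION


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
old_delivered = {"hubspot_status": "delivered", "eventhub_status": "delivered",
                 "created_at": datetime(2026, 4, 1, tzinfo=timezone.utc)}
old_pending = dict(old_delivered, eventhub_status="pending")
recent_delivered = dict(old_delivered,
                        created_at=datetime(2026, 5, 20, tzinfo=timezone.utc))
```
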
4.3 Monitoring
  • Adaptive heartbeats — 60 seconds normally, tightening to every 5 seconds for the last 5 minutes of any auction approaching its scheduled end.
  • Per-auction sequence numbers (monotonic from 1 at auction.opened) to detect gaps.
  • auction.closed events carrying total_bid_count for end-of-auction reconciliation.
  • A pipeline-health endpoint and email alerts on repeated delivery failure.
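
Two of these checks are simple enough to sketch directly: choosing the heartbeat interval from the time remaining, and finding gaps in the sequence numbers a consumer has received. The function names are hypothetical, and reverting to 60 seconds after the scheduled end is an assumption.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_interval(now: datetime, scheduled_end: datetime) -> int:
    """Return the heartbeat interval in seconds for an auction."""
    remaining = scheduled_end - now
    if timedelta(0) <= remaining <= timedelta(minutes=5):
        return 5   # final five minutes before close
    return 60      # standard cadence (also used after close, by assumption)


def find_sequence_gaps(sequences) -> list:
    """Return sequence numbers missing between 1 and the highest one seen."""
    seen = set(sequences)
    if not seen:
        return []
    return [n for n in range(1, max(seen) + 1) if n not in seen]


end = datetime(2026, 4, 15, 18, 0, tzinfo=timezone.utc)
gaps = find_sequence_gaps([1, 2, 4, 5, 7])
```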

5. Event Schema & Sample Payload

Events use a wrapped envelope, { data, metadata }, as confirmed. The data block matches the existing HubSpot field set so the Phase 1 mapping can be reused; the metadata block carries streaming/monitoring fields without polluting the bid payload, allowing cleaner schema evolution.

Sample (bid.placed event):

{
  "metadata": {
    "event_id": "bid.placed-3891",
    "event_type": "bid.placed",
    "schema_version": "1.0",
    "producer_timestamp": "2026-04-09T14:23:11.412Z",
    "bid_sequence": 12
  },
  "data": {
    "auction_id": 1042,
    "product_name": "Dell PowerEdge R740 Server",
    "sku": "PE-R740-2024",
    "auction_type": "Proxy",
    "starting_price": 500.00,
    "current_price": 875.00,
    "start_date": "2026-04-01T09:00:00Z",
    "end_date": "2026-04-15T18:00:00Z",
    "bid_id": 3891,
    "bid_amount": 875.00,
    "bid_timestamp": "2026-04-09T14:23:11Z",
    "winning_status": "Winning",
    "proxy_bid": true,
    "proxy_bid_low_end": 800.00,
    "proxy_bid_high_end": 1200.00,
    "bidder_user_id": 587,
    "bidder_name": "Emily Hollingsworth",
    "bidder_email": "[email protected]"
  }
}

Sample (auction.closed event, carries total_bid_count):

{
  "metadata": {
    "event_id": "auction.closed-1042",
    "event_type": "auction.closed",
    "schema_version": "1.0",
    "producer_timestamp": "2026-04-15T18:00:01.004Z",
    "bid_sequence": 48
  },
  "data": {
    "auction_id": 1042,
    "total_bid_count": 47,
    "current_price": 1180.00
  }
}
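
A small builder keeps the envelope shape consistent across event types. The sketch below reproduces the auction.closed sample above; the function name and signature are hypothetical, and the real builder would live in the PHP producer.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_envelope(event_type: str, event_id: str, bid_sequence: int,
                   data: dict, now: Optional[datetime] = None) -> dict:
    """Wrap a payload as { metadata, data } with a UTC millisecond timestamp."""
    produced_at = now or datetime.now(timezone.utc)
    timestamp = produced_at.isoformat(timespec="milliseconds")
    return {
        "metadata": {
            "event_id": event_id,
            "event_type": event_type,
            "schema_version": "1.0",
            "producer_timestamp": timestamp.replace("+00:00", "Z"),
            "bid_sequence": bid_sequence,
        },
        "data": data,
    }


closed = build_envelope(
    "auction.closed", "auction.closed-1042", 48,
    {"auction_id": 1042, "total_bid_count": 47, "current_price": 1180.00},
    now=datetime(2026, 4, 15, 18, 0, 1, 4000, tzinfo=timezone.utc))
serialized = json.dumps(closed)
```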

6. Data Dictionary (Additions)

The data block reuses the Phase 1 field set (see the April 2026 proposal). The following new fields are added in the metadata block (unless noted).

Field                Type      Description
event_id             String    Deterministic per logical event; used for downstream dedup (derived from the bid-log id)
event_type           String    bid.placed / auction.opened / auction.closed / producer.heartbeat
schema_version       String    e.g. "1.0"; bumped on breaking changes
producer_timestamp   DateTime  When the producer enqueued the event (distinct from bid_timestamp)
bid_sequence         Integer   Per-auction monotonic counter from 1 at auction.opened, reset each auction
total_bid_count      Integer   On auction.closed events only; carried in the data block, not in metadata

7. Design & Development (Scope)

The work is organised into three components so the optional items can be selected independently.

7.1 B1 — Core Streaming Pipeline (required)
  • Outbox table schema and the transactional write on each bid / auction event
  • Refactor of the existing HubSpot integration to consume from the outbox, with per-destination retry and status
  • Azure Event Hub producer worker: SAS auth, partition by auction_id, wrapped envelope, at-least-once with event_id, retry and dead-letter handling
  • Event taxonomy: bid.placed, auction.opened, auction.closed, producer.heartbeat
  • Per-auction sequence numbers and total_bid_count on close
  • Adaptive heartbeat (60s → 5s in the final 5 minutes) and a pipeline-health endpoint
  • Email alerting on repeated failure; 30-day outbox retention/pruning
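
The retry and dead-letter handling listed above might follow the shape below: bounded attempts with exponential backoff, after which the event is marked for dead-letter review and alerting. The attempt limit, delays, and status strings are hypothetical and would be tuned during development.

```python
import time


def deliver_with_retry(event, send, max_attempts=5, base_delay=0.5,
                       sleep=time.sleep) -> str:
    """Try to deliver an event; return 'delivered' or 'dead_letter'."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(event)
            return "delivered"
        except Exception:  # sketch only: a real worker would catch narrower errors
            if attempt == max_attempts:
                break
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
    return "dead_letter"


# A destination that times out twice and then accepts the event.
attempts = {"count": 0}


def flaky_send(event):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")


delays = []
outcome = deliver_with_retry({"event_id": "bid.placed-3891"}, flaky_send,
                             sleep=delays.append)
```
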
7.2 B2 — Historical Backfill (optional add-on)
  • Throttled one-time batch job that replays historical bids into Event Hub using the same envelope, without saturating live workers
  • Progress tracking and resumability; verification against the reconciliation API
7.3 B3 — Reconciliation Pull API (Section 6.2)
  • Confirm and retain the pull API for validation, reconciliation, and backfill verification
  • Filtering by Auction (Product ID), Bidder (User ID), and date range
7.4 Documentation
  • Updated payload/schema documentation (envelope + metadata) delivered alongside the codebase to support dbt model maintenance

8. Testing & Quality Assurance

  • Unit tests for the outbox writer, envelope builder, and both workers
  • End-to-end test against a dev Event Hub → confirm events arrive in Databricks structured streaming
  • Ordering test — verify strict per-auction order via the auction_id partition key
  • Deduplication test — force a retry and confirm event_id dedup yields a single logical record
  • Failure/retry/dead-letter test — simulate timeouts and non-2xx responses
  • Load test at auction close — simulate a high-frequency burst (≈ 1 bid/second) to validate behaviour and latency under peak concurrency
  • Monitoring test — heartbeat cadence (including the 5-minute tightening), sequence-gap detection, and health endpoint

9. Assumptions & Risk Factors

Hosting / latency risk (important)

The producer and workers run on the WordPress/Pressable host. Pressable is a shared, managed platform optimised for largely static or cached sites; auction close is the opposite: uncached, high-frequency, and highly concurrent. This pipeline adds an outbox write to every bid path, plus background workers that need a cron/worker tick, processing time, and database queries for every external delivery.

Under heavy concurrent load at close, this is additive load on the same limited PHP-worker pool that previously caused auction-close latency. Page/edge caching cannot help because the data is real-time, and front-end optimisation has limited effect under heavy load, so real-time delivery may degrade at the exact moment (auction close) when it matters most.

If Pressable cannot sustain the producer and workers at peak concurrency, a dedicated server or VPS is recommended as a last-resort contingency: it gives full control over PHP workers and removes shared load-balancer request caps. This is presented as a fallback to plan for, not a requirement of this engagement.

  • The 500 ms p95 / 2 s p99 latency target is best-effort and dependent on host capacity, Event Hub responsiveness, and concurrent load at close.
  • Each external destination (HubSpot and Event Hub) is a separate outbound API call; at high volume these are additive and consume worker time and DB queries per event.
  • Pressable permits outbound HTTPS on port 443 to *.servicebus.windows.net (your team confirmed with their support), so the direct producer path is viable; any non-standard port would require the client to add an egress firewall rule.
  • Event Hub namespace/entity names and the SAS token are to be provided by the client; separate dev and production hubs are assumed.
  • Azure Event Hub and Databricks consumption, plus any Azure-side costs, are the client’s responsibility.
  • Third-party (HubSpot / Azure) retry and callback behaviour depends on those platforms’ native capabilities.
  • Effort estimates are subject to final confirmation once Event Hub access is provided and the design is locked.

10. Effort & Commercials

10.1 Effort Breakdown

Component  Deliverable                                                                                      Hours
B1         Outbox foundation + HubSpot worker refactor                                                      20
B1         Azure Event Hub producer worker (SAS, partition, envelope, retry/dead-letter)                    24
B1         Event taxonomy & monitoring (event types, sequence, adaptive heartbeat, health endpoint, alerts) 20
B1         QA, load & failover testing                                                                      12
B1         Subtotal (core, required)                                                                        76
B2         Historical backfill (optional add-on)                                                            16
B3         Reconciliation pull API (confirm / retain)                                                       6

10.2 Project Cost

Component      Description                           Hours  Cost (USD)
B1             Core Streaming Pipeline (required)    76     $3,420
B2             Historical Backfill (optional)        16     $720
B3             Reconciliation Pull API (optional)    6      $270
B1             Core total                            76     $3,420
B1 + B2 + B3   All components                        98     $4,410

Hourly rate: USD 45 / hour.

Note: This fee does not include any third-party costs such as Azure Event Hub, Databricks, hosting, domain, theme, or plugin purchases, which are to be borne by the Client.