This Phase 2 addendum has been prepared in response to the real-time data-warehouse requirements raised by Aoife Duna, Chief Data and AI Officer at CompuCycle. It builds directly on Section 4.5 (Phase 2 — Dual Destination Delivery) of our April 2026 proposal, now that the target platform has been confirmed as Azure Event Hub → Databricks (replacing the earlier Snowflake/Lambda placeholder) and the outbox-pattern architecture has been agreed.
This addendum supersedes the one-line Phase 2 reference in the original proposal and scopes the full streaming pipeline as a standalone, costed engagement. It covers:
Why not a simple “second URL”?
A lightweight dual-webhook (the existing webhook also POSTing to a second URL) was considered, but it cannot meet the stated requirements: no-loss guarantees at auction close, strict per-auction ordering, deduplication, sequence numbers, and heartbeat monitoring. The outbox architecture below is the correct design for those guarantees and is therefore scoped as a proper addendum.
The following decisions have been confirmed by CompuCycle and form the basis of this scope and estimate.
| Item | Confirmed Decision |
|---|---|
| Target platform | Azure Event Hub → Databricks (structured streaming) |
| Authentication | SAS token (Send permission) |
| Environments | Separate dev and production Event Hubs |
| Partition strategy | By auction_id (preserves per-auction order) |
| Ordering guarantee | Strict per-auction order |
| Delivery guarantee | At-least-once with event_id deduplication |
| Payload | HubSpot field set + metadata, wrapped as { data, metadata } |
| Heartbeat | Adaptive: every 60 s as standard, every 5 s during the last 5 minutes before close |
| Sequence numbers | Per-auction, from 1 at auction.opened, reset each auction |
| Latency target | 500 ms p95 / 2 s p99 (best-effort — see Section 9) |
| Outbox retention | 30 days |
| Failure escalation | Email to start; Microsoft Teams channel likely later |
| Backfill | All historical data (optional add-on — component B2) |
| Reconciliation | Pull API (Section 6.2) retained (component B3) |
| Hosting egress | Pressable permits outbound HTTPS:443 to *.servicebus.windows.net (confirmed) |
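The heartbeat and partition decisions in the table above can be illustrated with a minimal Python sketch. It is illustrative only, not the production (WordPress-side) implementation; the function and constant names are hypothetical, and the assumption that the cadence returns to 60 s once the auction has closed is ours.

```python
from datetime import datetime, timedelta, timezone

STANDARD_INTERVAL_S = 60               # confirmed: 60 s standard cadence
FINAL_INTERVAL_S = 5                   # confirmed: every 5 s near close
FINAL_WINDOW = timedelta(minutes=5)    # confirmed: last 5 minutes before close


def heartbeat_interval(now: datetime, auction_end: datetime) -> int:
    """Return the number of seconds to wait before the next heartbeat."""
    remaining = auction_end - now
    # Inside the final window (and not yet past close) use the fast cadence.
    if timedelta(0) <= remaining <= FINAL_WINDOW:
        return FINAL_INTERVAL_S
    # Assumption: outside the window, including after close, use the standard cadence.
    return STANDARD_INTERVAL_S


def partition_key(auction_id: int) -> str:
    """Use the auction id as the partition key so that all events for one
    auction land on the same partition and keep their relative order."""
    return str(auction_id)
```

For example, with a close time of 18:00:00 UTC, a call at 17:50 yields 60 and a call at 17:56 yields 5.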
Yes. The bid identifier is the auto-increment row id of the auction bid-log table (woo_ua_auction_log), assigned once by the database when the bid is recorded. It is a stable integer for the same logical bid — not a per-attempt UUID — and remains identical across every delivery retry. We will derive the event_id deterministically from it (for example, bid.placed-<bid_id>), so downstream deduplication on event_id behaves exactly as expected.
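As a rough illustration of why a deterministic event_id makes at-least-once delivery safe, the sketch below derives the id from the bid-log id and drops repeats on the consuming side. The names make_event_id and Deduplicator are hypothetical, and the in-memory set stands in for whatever deduplication mechanism is used downstream.

```python
def make_event_id(event_type: str, source_id: int) -> str:
    """Build the id from stable inputs only, so every retry of the same
    logical event produces the same string."""
    return f"{event_type}-{source_id}"


class Deduplicator:
    """Illustrative consumer-side dedup keyed on metadata.event_id."""

    def __init__(self) -> None:
        self._seen = set()

    def accept(self, event: dict) -> bool:
        """Return True the first time an event_id is seen, False on repeats."""
        event_id = event["metadata"]["event_id"]
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True
```

A retried delivery of bid 3891 carries the same id, bid.placed-3891, so the second copy is rejected by accept().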
The outbox pattern decouples bid acceptance from delivery: the bid is recorded and acknowledged to the bidder immediately, and delivery to HubSpot and Event Hub happens afterwards in the worker layer. This keeps the bidder experience fast regardless of downstream speed. The 500 ms p95 / 2 s p99 target for producer→Event Hub is achievable on a healthy host, but end-to-end latency at peak concurrency depends on how many workers the hosting platform has available at auction close (see Section 9). We design and tune for the target, but it is a best-effort target tied to host capacity rather than a hard guarantee.
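One way the latency target could be checked during load testing is sketched below. It assumes producer→Event Hub latencies have been collected in milliseconds and uses the nearest-rank percentile definition; both the helper names and the choice of percentile method are assumptions, not agreed parts of the scope.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no latency samples collected")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_target(samples_ms, p95_ms=500, p99_ms=2000):
    """Check the collected samples against the 500 ms p95 / 2 s p99 target."""
    return (percentile(samples_ms, 95) <= p95_ms
            and percentile(samples_ms, 99) <= p99_ms)
```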
Backfilling all historical bids is a separate, one-time effort (component B2): a throttled batch job reads historical bid-log records, wraps each in the event envelope, and produces them to Event Hub without saturating live workers. It is priced as an optional add-on because effort scales with historical volume and it is independent of the live pipeline. The live pipeline (B1) can go live first; backfill can run afterward.
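The throttling idea can be sketched as follows. This is a simplified illustration: the batch size, the pause length, the bid_id row field, and the send_batch callback are all assumptions, and the real job would wrap each record in the full event envelope.

```python
import time


def batched(rows, size):
    """Yield lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def wrap(row):
    """Minimal envelope for a historical bid (assumed row field: bid_id)."""
    return {
        "metadata": {"event_id": f"bid.placed-{row['bid_id']}",
                     "event_type": "bid.placed"},
        "data": row,
    }


def run_backfill(rows, send_batch, batch_size=100, pause_s=1.0, sleep=time.sleep):
    """Send historical rows in small batches, pausing after each batch so the
    job does not compete with live workers. Returns the number of events sent."""
    sent = 0
    for batch in batched(rows, batch_size):
        events = [wrap(row) for row in batch]
        send_batch(events)
        sent += len(events)
        sleep(pause_s)  # throttle between batches
    return sent
```

Because the event ids are derived from the bid-log id, re-running the job after an interruption produces the same ids and is absorbed by downstream deduplication.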
Yes. The reconciliation pull API from Section 6.2 of the original proposal is retained (component B3). It supports validation, reconciliation, and verification of backfilled data — i.e. confirming that what landed in Databricks matches the source of truth.
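A reconciliation check of this kind could look roughly like the sketch below. It assumes the pull API can return the set of bid ids for a period and that the same ids can be queried from Databricks; the function name and the shape of the report are hypothetical.

```python
def reconcile(source_bid_ids, warehouse_bid_ids):
    """Compare bid ids from the source of truth with ids found in the warehouse."""
    source = set(source_bid_ids)
    warehouse = set(warehouse_bid_ids)
    return {
        "matched": len(source & warehouse),
        "missing_in_warehouse": sorted(source - warehouse),
        "unexpected_in_warehouse": sorted(warehouse - source),
    }
```

An empty missing_in_warehouse list for a given window indicates that every bid recorded at source has landed.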
The design uses the transactional outbox pattern so that HubSpot and the data warehouse stay in sync without either destination slowing the other down, and so no event is lost during high-volume moments such as auction close.
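The core of the transactional outbox can be sketched in a few lines. The example uses Python with an in-memory SQLite database purely for illustration; the production implementation runs on the WordPress database, and the table names, column names, and destination labels below are assumptions.

```python
import json
import sqlite3


def init_db(conn):
    conn.executescript("""
        CREATE TABLE bid_log (id INTEGER PRIMARY KEY AUTOINCREMENT,
                              auction_id INTEGER, amount REAL);
        CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                             destination TEXT, event_id TEXT, payload TEXT,
                             delivered INTEGER DEFAULT 0);
    """)


def record_bid(conn, auction_id, amount):
    """Write the bid and one outbox row per destination in ONE transaction:
    either everything commits or nothing does, so no event can be lost."""
    with conn:
        cur = conn.execute(
            "INSERT INTO bid_log (auction_id, amount) VALUES (?, ?)",
            (auction_id, amount))
        bid_id = cur.lastrowid
        payload = json.dumps({"auction_id": auction_id, "bid_id": bid_id,
                              "bid_amount": amount})
        for destination in ("hubspot", "event_hub"):
            conn.execute(
                "INSERT INTO outbox (destination, event_id, payload) VALUES (?, ?, ?)",
                (destination, f"bid.placed-{bid_id}", payload))
    return bid_id


def deliver_pending(conn, destination, send):
    """Worker tick: deliver undelivered rows for one destination, oldest first."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE destination = ? AND delivered = 0 "
        "ORDER BY id", (destination,)).fetchall()
    delivered = 0
    for row_id, payload in rows:
        try:
            send(json.loads(payload))
        except Exception:
            break  # stop at the first failure to preserve order; retry next tick
        with conn:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered
```

Because each destination has its own outbox rows, a slow or failing Event Hub delivery leaves the HubSpot rows untouched, and vice versa.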
Events use a wrapped envelope, { data, metadata }, as confirmed. The data block matches the existing HubSpot field set so the Phase 1 mapping can be reused; the metadata block carries streaming/monitoring fields without polluting the bid payload, allowing cleaner schema evolution.
Sample — bid.placed event:
{
  "metadata": {
    "event_id": "bid.placed-3891",
    "event_type": "bid.placed",
    "schema_version": "1.0",
    "producer_timestamp": "2026-04-09T14:23:11.412Z",
    "bid_sequence": 12
  },
  "data": {
    "auction_id": 1042, "product_name": "Dell PowerEdge R740 Server",
    "sku": "PE-R740-2024", "auction_type": "Proxy",
    "starting_price": 500.00, "current_price": 875.00,
    "start_date": "2026-04-01T09:00:00Z", "end_date": "2026-04-15T18:00:00Z",
    "bid_id": 3891, "bid_amount": 875.00,
    "bid_timestamp": "2026-04-09T14:23:11Z", "winning_status": "Winning",
    "proxy_bid": true, "proxy_bid_low_end": 800.00, "proxy_bid_high_end": 1200.00,
    "bidder_user_id": 587, "bidder_name": "Emily Hollingsworth",
    "bidder_email": "[email protected]"
  }
}
Sample — auction.closed event (carries total_bid_count):
{
  "metadata": {
    "event_id": "auction.closed-1042",
    "event_type": "auction.closed",
    "schema_version": "1.0",
    "producer_timestamp": "2026-04-15T18:00:01.004Z",
    "bid_sequence": 48
  },
  "data": {
    "auction_id": 1042,
    "total_bid_count": 47,
    "current_price": 1180.00
  }
}
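For illustration, an envelope of the shape shown in the samples above could be assembled as in the following Python sketch. The helper name build_envelope and its parameters are hypothetical; the field names and the "1.0" schema version are taken from the samples.

```python
import json
from datetime import datetime, timezone


def build_envelope(event_type, source_id, data, bid_sequence, now=None):
    """Wrap a payload as { data, metadata } with a deterministic event_id."""
    now = now or datetime.now(timezone.utc)
    # ISO 8601 in UTC with millisecond precision and a trailing "Z".
    timestamp = (now.astimezone(timezone.utc)
                    .isoformat(timespec="milliseconds")
                    .replace("+00:00", "Z"))
    return {
        "metadata": {
            "event_id": f"{event_type}-{source_id}",
            "event_type": event_type,
            "schema_version": "1.0",
            "producer_timestamp": timestamp,
            "bid_sequence": bid_sequence,
        },
        "data": data,
    }


if __name__ == "__main__":
    envelope = build_envelope("bid.placed", 3891,
                              {"auction_id": 1042, "bid_id": 3891}, 12)
    print(json.dumps(envelope, indent=2))
```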
The data block reuses the Phase 1 field set (see the April 2026 proposal). The following new fields are added in the metadata block (unless noted).
| Field (metadata) | Type | Description |
|---|---|---|
| event_id | String | Deterministic per logical event; used for downstream dedup (derived from the bid-log id) |
| event_type | String | bid.placed / auction.opened / auction.closed / producer.heartbeat |
| schema_version | String | e.g. “1.0”; bumped on breaking changes |
| producer_timestamp | DateTime | When the producer enqueued the event (distinct from bid_timestamp) |
| bid_sequence | Integer | Per-auction monotonic counter from 1 at auction.opened, reset each auction |
| total_bid_count | Integer | Total number of bids in the auction; present only on auction.closed events (carried in the data block, not in metadata) |
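On the consuming side, bid_sequence and total_bid_count allow simple completeness checks for a single auction. The sketch below is one possible approach and rests on two assumptions that are ours rather than confirmed decisions: sequence numbers are contiguous across the events that carry them, and total_bid_count equals the number of distinct bid.placed events.

```python
def find_sequence_gaps(events):
    """Return the bid_sequence values missing between the lowest and
    highest sequence numbers seen for one auction."""
    present = {e["metadata"]["bid_sequence"] for e in events}
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


def bid_count_matches(events):
    """Compare distinct bid.placed events with total_bid_count on auction.closed.
    Returns None if no auction.closed event has arrived yet."""
    bid_ids = {e["metadata"]["event_id"] for e in events
               if e["metadata"]["event_type"] == "bid.placed"}
    closed = [e for e in events if e["metadata"]["event_type"] == "auction.closed"]
    if not closed:
        return None
    return len(bid_ids) == closed[-1]["data"]["total_bid_count"]
```

A non-empty gap list or a count mismatch would be a trigger to run the reconciliation pull for that auction.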
The work is organised into three components so the optional items can be selected independently.
Hosting / latency risk (important)
The producer and workers run on the WordPress/Pressable host. Pressable is a shared, managed platform optimised for largely static or cached sites; auction close is the opposite: uncached, high-frequency, and highly concurrent. This pipeline adds work to each bid path (the outbox write) plus background workers that, for every external delivery, need a cron/worker tick, processing time, and database queries per API call. Under heavy concurrent load at close, this is additional load on the same limited PHP-worker pool that previously caused auction-close latency. Page and edge caching cannot help because the data is real-time, and front-end optimisation has limited effect under heavy load, so real-time delivery may degrade at the exact moment (auction close) when it matters most.
If Pressable cannot sustain the producer and workers at peak concurrency, a dedicated server or VPS is recommended as a last-resort contingency: it gives full control over PHP workers and removes shared load-balancer request caps. This is presented as a fallback to plan for, not a requirement of this engagement.
| Deliverable | Hours |
|---|---|
| B1 — Outbox foundation + HubSpot worker refactor | 20 |
| B1 — Azure Event Hub producer worker (SAS, partition, envelope, retry/dead-letter) | 24 |
| B1 — Event taxonomy & monitoring (event types, sequence, adaptive heartbeat, health endpoint, alerts) | 20 |
| B1 — QA, load & failover testing | 12 |
| B1 Subtotal (Core — required) | 76 |
| B2 — Historical backfill (optional add-on) | 16 |
| B3 — Reconciliation pull API (confirm / retain) | 6 |
| Component | Hours | Cost (USD) |
|---|---|---|
| B1 — Core Streaming Pipeline (required) | 76 | $3,420 |
| B2 — Historical Backfill (optional) | 16 | $720 |
| B3 — Reconciliation Pull API (optional) | 6 | $270 |
| Core total (B1) | 76 | $3,420 |
| All components (B1 + B2 + B3) | 98 | $4,410 |
Hourly rate: USD 45 / hour.
Note: This fee does not include any third-party costs such as Azure Event Hub, Databricks, hosting, domain, theme, or plugin purchases, which are to be borne by the Client.