Our team hit a familiar Snowflake paradox: slower ETL runs arrived at the same time as FinOps alerts about rising credits.
The warehouse in question was WH_ETL_BRONZE_01, a multi-cluster warehouse dedicated to Bronze layer ingestion and merge workloads. What looked like a simple cost problem turned out to be a workload-isolation and concurrency problem.
The Problem
We started with this setup:
```sql
ALTER WAREHOUSE WH_ETL_BRONZE_01 SET
  WAREHOUSE_SIZE = 'XSMALL',
  MAX_CLUSTER_COUNT = 10,
  MIN_CLUSTER_COUNT = 1,
  SCALING_POLICY = 'STANDARD',
  AUTO_SUSPEND = 60,
  MAX_CONCURRENCY_LEVEL = 8; -- default behavior
```
And we saw both of these at once:
- Higher queue times (queued_overload_time)
- Higher daily credits
The critical issue was workload mix. Long-running MERGE statements (30 to 60 minutes) were running alongside tiny queries. When too many heavy MERGE statements landed on the same node/cluster, they competed for resources and all slowed down.
In other words, even with multi-cluster enabled, assignment patterns could create unstable performance if too many heavyweight queries were packed together.
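One way to see this packing in the data is to look for heavy MERGE statements whose execution windows overlap. The query below is a sketch against ACCOUNT_USAGE.QUERY_HISTORY; the 10-minute "heavy" threshold and the 1-day window are illustrative choices, not values from our analysis:

```sql
-- Pairs of heavy MERGE statements (>10 min; illustrative threshold) whose
-- execution windows overlapped on the same warehouse.
WITH heavy_merges AS (
    SELECT query_id, start_time, end_time
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE warehouse_name = 'WH_ETL_BRONZE_01'
      AND query_type = 'MERGE'
      AND total_elapsed_time > 10 * 60 * 1000  -- total_elapsed_time is in ms
      AND start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP)
)
SELECT a.query_id,
       COUNT(*) AS overlapping_heavy_merges
FROM heavy_merges a
JOIN heavy_merges b
  ON a.query_id <> b.query_id
 AND a.start_time < b.end_time
 AND b.start_time < a.end_time  -- classic interval-overlap test
GROUP BY a.query_id
ORDER BY overlapping_heavy_merges DESC;
```

Queries that show up here with high overlap counts are the ones most likely to be contending for the same cluster's resources.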
What We Changed
This was not a single before/after discovery from history. We ran controlled experiments in sequence.
Phase 1: Baseline
```sql
ALTER WAREHOUSE WH_ETL_BRONZE_01 SET
  WAREHOUSE_SIZE = 'XSMALL',
  MAX_CLUSTER_COUNT = 10,
  MIN_CLUSTER_COUNT = 1,
  SCALING_POLICY = 'STANDARD',
  AUTO_SUSPEND = 60,
  MAX_CONCURRENCY_LEVEL = 8;
```
Phase 2: First Experiment
```sql
ALTER WAREHOUSE WH_ETL_BRONZE_01 SET
  WAREHOUSE_SIZE = 'XSMALL',
  MAX_CLUSTER_COUNT = 10,
  MIN_CLUSTER_COUNT = 1,
  SCALING_POLICY = 'STANDARD',
  AUTO_SUSPEND = 60,
  MAX_CONCURRENCY_LEVEL = 2;
```
Goal: force scale-out behavior earlier and reduce the chance that many heavy MERGE jobs share the same node.
Phase 3: Final Optimization
```sql
ALTER WAREHOUSE WH_ETL_BRONZE_01 SET
  WAREHOUSE_SIZE = 'SMALL',
  MAX_CLUSTER_COUNT = 10,
  MIN_CLUSTER_COUNT = 1,
  SCALING_POLICY = 'STANDARD',
  AUTO_SUSPEND = 60,
  MAX_CONCURRENCY_LEVEL = 3;
```
This became the best balance for our workload profile.
Why This Helped
From deeper analysis of the workload behavior:
- The number of queries and output volume could look similar across runs, but bytes scanned at table level could still rise.
- Heavy MERGE overlap was a key instability driver.
- On slow runs, many heavy MERGE statements could get assigned to the same node, causing resource contention.
- On fast runs, fewer heavy MERGE statements shared a node, leaving room for short-running queries.
Lowering concurrency reduced the tendency to over-pack heavyweight merges. Then moving to SMALL provided enough per-query resources to cut elapsed time and queueing further.
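One way to verify that lowering MAX_CONCURRENCY_LEVEL actually shifted work onto more clusters rather than just queueing it is WAREHOUSE_LOAD_HISTORY. This is a sketch; the 7-day window is an illustrative choice:

```sql
-- Running vs. queued load over time. After the change, rising avg_running
-- alongside falling avg_queued_load suggests earlier scale-out is working.
SELECT start_time,
       avg_running,
       avg_queued_load,         -- queued because the warehouse was busy
       avg_queued_provisioning  -- queued while clusters were starting up
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_LOAD_HISTORY
WHERE warehouse_name = 'WH_ETL_BRONZE_01'
  AND start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP)
ORDER BY start_time;
```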
Configuration Summary
| Phase | Size | Max Clusters | Max Concurrency | Intent |
|---|---|---|---|---|
| Baseline | XSMALL | 10 | 8 | Initial setup |
| Experiment | XSMALL | 10 | 2 | Force earlier scale-out / reduce heavy-query packing |
| Final | SMALL | 10 | 3 | Balance throughput, queueing, and cost |
The two intentional changes vs. baseline were:
- WAREHOUSE_SIZE: XSMALL -> SMALL
- MAX_CONCURRENCY_LEVEL: 8 -> 3
How We Measured
We used Snowflake ACCOUNT_USAGE views for both performance and cost, comparing baseline and final optimization windows.
Query Performance (Before/After)
```sql
SELECT
  CASE
    WHEN start_time < '2026-05-05' THEN 'XSMALL Period'
    ELSE 'SMALL Period'
  END AS period,
  COUNT(*) AS total_queries,
  ROUND(AVG(total_elapsed_time)/1000, 2) AS avg_elapsed_sec,
  ROUND(AVG(queued_overload_time)/1000, 2) AS avg_queued_sec,
  ROUND(MAX(total_elapsed_time)/1000, 2) AS max_elapsed_sec
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE warehouse_name = 'WH_ETL_BRONZE_01'
  AND start_time >= '2026-04-30'
GROUP BY period
ORDER BY period;
```
Credit Consumption (Daily)
```sql
SELECT
  CASE
    WHEN start_time::DATE < '2026-05-05' THEN 'XSMALL Period'
    ELSE 'SMALL Period'
  END AS period,
  ROUND(SUM(credits_used_compute) / COUNT(DISTINCT start_time::DATE), 2) AS daily_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE warehouse_name = 'WH_ETL_BRONZE_01'
  AND start_time::DATE >= '2026-04-30'
GROUP BY period
ORDER BY period;
```
Results
Cost
| Period | Size | Daily Credits |
|---|---|---|
| Baseline (5 days) | XSMALL | 15.12/day |
| Optimized (3 days) | SMALL | 13.50/day |
Daily credits dropped by about 11% after upsizing.
Performance
| Metric | XSMALL | SMALL | Improvement |
|---|---|---|---|
| Avg query time | 26.15s | 24.13s | 8% faster |
| Max query time | 3,475s | 2,850s | 18% faster |
| Total queue time | 2.20 min | 0.28 min | 87% less queuing |
Note: The intermediate XSMALL plus concurrency 2 phase was used to validate behavior and direction. The published KPI table above compares the stable baseline period against the final tuned period.
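The percentages quoted above follow directly from the raw numbers in the tables; a quick arithmetic check:

```python
# Recompute the improvement percentages from the raw before/after values
# reported in the cost and performance tables.
def pct_change(before: float, after: float) -> float:
    """Relative improvement of `after` vs. `before`, as a percentage."""
    return (before - after) / before * 100

daily_credits = pct_change(15.12, 13.50)  # daily credits, XSMALL vs SMALL
avg_query = pct_change(26.15, 24.13)      # average query time, seconds
max_query = pct_change(3475, 2850)        # max query time, seconds
queue_time = pct_change(2.20, 0.28)       # total queue time, minutes

# -> roughly 10.7%, 7.7%, 18.0%, 87.3%, matching the rounded figures above
print(f"{daily_credits:.1f}% {avg_query:.1f}% {max_query:.1f}% {queue_time:.1f}%")
```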
The Counterintuitive Lesson
In some workloads, a bigger warehouse can cost less.
1. Faster execution means earlier suspend
Queries completed faster on SMALL, so the warehouse reached AUTO_SUSPEND = 60 sooner. Less runtime plus less idle overhead translated to fewer daily credits.
2. Concurrency matched to workload shape reduces contention
Concurrency must match workload shape. In our case, moving from the default of 8 down to 2 first improved isolation for heavy MERGE jobs; landing at 3 with SMALL then gave the right balance of throughput and stability.
3. Less queueing improves throughput and utilization
With 87% less queue time, work finished in tighter windows. The warehouse did useful work and went to sleep sooner.
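To make point 1 concrete, here is a toy cost model. The credit rates are real Snowflake pricing (XSMALL bills 1 credit per cluster-hour, SMALL bills 2); every other number is purely illustrative, not from our environment:

```python
# Toy model: billed credits = rate * clusters * (busy hours + idle-before-suspend).
# Snowflake's per-cluster rate doubles with each warehouse size step.

def daily_credits(rate_per_hour: float, clusters: int,
                  busy_hours: float, idle_hours: float) -> float:
    """Credits billed per day for `clusters` clusters at `rate_per_hour`."""
    return rate_per_hour * clusters * (busy_hours + idle_hours)

# XSMALL (1 credit/hr): contention forces 4 clusters, each busy 3h
# plus 0.5h of idle time before auto-suspend kicks in. Illustrative numbers.
before = daily_credits(rate_per_hour=1, clusters=4, busy_hours=3.0, idle_hours=0.5)

# SMALL (2 credits/hr): the same work fits on 2 clusters, each busy only 2.5h
# because queries run faster, with the same idle tail. Illustrative numbers.
after = daily_credits(rate_per_hour=2, clusters=2, busy_hours=2.5, idle_hours=0.5)

print(before, after)  # 14.0 vs 12.0: the bigger size bills fewer credits
```

The point is not the specific numbers but the shape: when faster execution shrinks both busy time and cluster sprawl, the higher per-hour rate can still come out cheaper.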
Key Takeaways
- Do not assume smaller is always cheaper.
- Monitor queued_overload_time closely. Persistent queueing often means the warehouse is undersized or its concurrency setting does not match the workload.
- Tune MAX_CONCURRENCY_LEVEL together with warehouse size and workload type; they should be optimized as a pair.
- Separate or schedule heavyweight workflows when possible to reduce overlap and contention.
- Use both QUERY_HISTORY (performance) and WAREHOUSE_METERING_HISTORY (cost) for before/after decisions.
- Keep aggressive auto-suspend for bursty workloads. Faster queries plus short suspend windows compound savings.
Closing Thought
Warehouse right-sizing is a performance and FinOps exercise, not just a cost-control exercise. In many real workloads, a slightly larger warehouse with slightly higher concurrency can win on both speed and spend.
Disclosure and Scope
- Findings were validated with input from Snowflake Support and Professional Services.
- Configuration decisions, measurements, and outcomes reflect our specific environment and workload profile.
- Results may vary for other teams depending on query patterns, data shape, and scheduling behavior.
- Warehouse and workload identifiers in this post are anonymized.
This analysis used Snowflake built-in ACCOUNT_USAGE views only. No third-party monitoring stack required.