When I see people trying to test the performance of queries, I typically see them making a few key mistakes that may affect their results.
To properly benchmark queries on Snowflake, we should first understand the different levels of caching on Snowflake:
- Results cache, where Snowflake caches the results of queries by query text, so that if you send the same query twice, it will only be evaluated once. This in the condition that the query does not contain non deterministic functions, that the underlying objects have not changed, and that the query text hashes to the exact same value. This type of cache was less common in legacy data warehouses, because they were not built like LAMP websites.
- SSD cache, where each Snowflake worker caches read partitions to speed up queries, as this speeds up IO considerably. This is more common in other data warehouses, and is highly dependant on the state of system, which is typically used by many users simultaneously.
Now going back to benchmarking, we should apply some rigour such that results are reproducible, and the margin of error is quantified. As some randomness effect occur in a complex system like a data warehouse, we want to execute our query multiple times, and take the mean execution time as well as the standard deviation.
So first, we should disable the query results cache.
Second, we should decide what we want to measure. Across the lifetime of a warehouse, query performance will not look the same depending on the state of the system. We should either measure query execution time at cold start, or we should report a stable execution time after warm up. Warm state can simply be measured as the time when the execution time stabilises. For cold start, we can use the fact that snowflake warehouses are designed to be transient. We could create a new warehouse every time to be sure to get a clean cache, but a better way to invalidate the cache is to simply suspend the warehouse. Regardless of whether we resume it in 2 seconds, the Snowflake engine will have to invalidate the cache, thus we can trigger a cold start on demand.
So in summary, to benchmark a query we should:
- Disable the results cache
- Compute the mean execution time over a significant enough sample
- Separate cold start vs warm up