BigQuery performance best practice: use semi joins when possible
SQL is an amazing language: it lets you declaratively say what you want, and the engine figures out the best way to return it to you. Or should I say, it figures out the best way it can, given the information it has and the capabilities of the engine itself.
In this post, we'll discuss a performance optimization technique for BigQuery: using semi joins. (Other sufficiently advanced enterprise data warehouses and databases also support semi joins, but I'll focus on BigQuery since it's the one I use the most these days.)