馃摐 blog post

Avoiding select * in BigQuery

A quick workaround to avoid people doing "SELECT *" on your tables (which I consider harmful, see stackoverflow) in BigQuery is the following:

  1. Create a policy tag in the policy tags page. I create it under a new taxonomy "sql_rules", and call it "no_select_star"
  2. Once created, in the policy tag taxonomy page I enable the "enforce access control" option. If you need users to bypass this policy (e.g. needs to run bigquery copy operations) you can add the "fine graded reader" role to the principal associated with the policy tag in the same page.
  3. In the table you want to protect, add a dummy column (I call it "do_not_select_star") and assign it the policy tag
  4. Done! From now on, SELECT * will throw an error.

For more info see Google Cloud docs

馃摐 blog post

BigQuery performance best practice: use semi joins when possible

SQL is an amazing language, it lets you declaratively say what you want, and the engine figures out for you the best way to return it to you. Or should I say, it figures out the best way to return it to you given the information it has and the capabilities of the engine itself.

In this post, we鈥檒l discuss a performance optimization technique for BigQuery (also other advanced enough Enteprise Data Warehouses and databases support SEMI JOINS, but I'll focus on BigQuery since it's the one I use the most these days): using semi joins.

Continue reading