Authentication on GCP - The binary with a thousand faces

July 11, 2020
gcp authentication golang

Authentication is, unfortunately, a pain in the keister. You need to do it securely, and it’s better when it’s straightforward, but at the end of the day, all it does is allow multiple parties to agree on the identity your code is running as. Once identity is communicated, services can evaluate things like permissions and authorization.

There are a lot of ways to authenticate. You can navigate the oauth2 protocol directly. Tools like the Cloud SDK can build and store credentials for reuse. Various GCP services provide compute abstractions for running arbitrary processes, and each can implement its own way to associate an identity with something like a service account.

For those unfamiliar, service accounts are essentially a shorthand way of saying “I want to associate this program or workload with an identity when it talks to other services”.

So, what’s a developer to do to support all these possible authentication mechanisms? The answer is generally…as little as possible.

The World of Application Default Credentials (ADC)

In GCP, tools and libraries support a method called Application Default Credentials (ADC) that helps automatically pick an appropriate mechanism for getting an identity. Think of ADC as a meta-protocol for discovering the most appropriate source of credentials for your code. Rather than forcing you, as a developer, to explicitly choose or configure the mechanisms, the ADC meta-protocol is built into the libraries provided by Google.

ADC knows how to find identity using a variety of sources, checked roughly in this order:

- The GOOGLE_APPLICATION_CREDENTIALS environment variable, which points at a credential file (typically a service account key).
- The well-known credential file managed by the gcloud SDK (e.g. ~/.config/gcloud/application_default_credentials.json).
- The environment itself, when running on a GCP compute platform such as Compute Engine, GKE, App Engine, Cloud Run, or Cloud Functions, via the metadata server.

If you want to know more by seeing the implementations directly, check some of them out; in Go, for example, the logic lives in the golang.org/x/oauth2/google package (see FindDefaultCredentials).
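As a concrete illustration, here’s a minimal Go sketch that invokes ADC resolution directly via that package. Normally the client libraries do this for you behind the scenes, so treat this as illustration rather than something you’d typically write:

package main

import (
    "context"
    "log"

    "golang.org/x/oauth2/google"
)

func main() {
    ctx := context.Background()
    // FindDefaultCredentials walks the ADC sources in order:
    // GOOGLE_APPLICATION_CREDENTIALS, the gcloud well-known file, and
    // finally the metadata server when running on GCP compute platforms.
    creds, err := google.FindDefaultCredentials(ctx,
        "https://www.googleapis.com/auth/cloud-platform")
    if err != nil {
        log.Fatalf("could not find default credentials: %v", err)
    }
    log.Printf("found credentials (project: %q)", creds.ProjectID)
}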

Showing ADC in action: One binary, multiple identities

Here’s a simple CLI written in Go that talks to two endpoints: the oauth2 userinfo endpoint (which accepts a token and responds with the identity as an email-like identifier), and the BigQuery service.

Why both? Pragmatism and laziness:

- The userinfo endpoint reports back which identity ADC actually resolved, which makes for a good demonstration.
- BigQuery hosts public datasets, so we can exercise a real GCP service without provisioning any data of our own.
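The post doesn’t reproduce the source, but here’s a minimal sketch of what a CLI like this might look like in Go. The structure and names are my own assumption, shaped to match the output in the runs below:

package main

import (
    "context"
    "fmt"
    "log"

    "cloud.google.com/go/bigquery"
    "google.golang.org/api/iterator"
    oauth2api "google.golang.org/api/oauth2/v2"
)

func main() {
    ctx := context.Background()

    // Ask the oauth2 userinfo endpoint who we are. The client library
    // resolves credentials via ADC; no explicit configuration here.
    if svc, err := oauth2api.NewService(ctx); err != nil {
        log.Printf("Couldn't resolve identity: %v", err)
    } else if info, err := svc.Userinfo.Get().Do(); err != nil {
        log.Printf("Couldn't resolve identity: %v", err)
    } else {
        log.Printf("Using identity: %s", info.Email)
    }

    // List a handful of datasets from the public BigQuery data project,
    // again relying on ADC for credentials.
    client, err := bigquery.NewClient(ctx, "bigquery-public-data")
    if err != nil {
        log.Fatalf("bigquery.NewClient: %v", err)
    }
    defer client.Close()

    fmt.Println("Listing up to 5 datasets:")
    it := client.Datasets(ctx)
    for i := 0; i < 5; i++ {
        ds, err := it.Next()
        if err == iterator.Done {
            break
        }
        if err != nil {
            log.Fatalf("iterator failed: %v", err)
        }
        fmt.Printf("dataset: %s\n", ds.DatasetID)
    }
}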

So, let’s get started.

Build a binary

I’ve used the go toolchain (e.g. go build) to generate a static binary that we can use to demonstrate ADC working in the background. The binary is simply called simple_cli. I’ll be starting this while running on a local macOS environment, but we’ll need to revisit this down the line as we move to a cloud VM later.
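For reference, the local build is nothing exotic; something like the following produces the binary:

% go build -o simple_cli .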

Example: local environment, no credentials

Let’s first destroy possible local sources of credentials for the binary. One possible source is the GOOGLE_APPLICATION_CREDENTIALS environment variable, and another is the application default credentials I’ve defined through the gcloud SDK tools installed on my system.

% unset GOOGLE_APPLICATION_CREDENTIALS
% rm ~/.config/gcloud/application_default_credentials.json

By destroying local sources of credentials, I should have no latent credentials on my system. So let’s try running the program and see what it reports:

% ./simple_cli

2020/07/11 19:28:21 Couldn't resolve identity: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.

2020/07/11 19:28:22 bigquery.NewClient: bigquery: constructing client: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.

Recap: We can see in this case that ADC didn’t find credentials, and a (hopefully useful) error was generated pointing the caller towards the ADC concept documentation.

In both requests, we’ve demonstrated that we don’t have any credentials getting resolved from ADC. The oauth2 endpoint failed to generate a success response, and BigQuery is similarly flummoxed. It also indirectly demonstrates an important GCP concept: “Public” often maps to allAuthenticatedUsers, i.e. you need an identity but the identity doesn’t matter. In this case, without an identity I’m granted no access to enumerate public resources in BigQuery.

Example: local environment, user’s gcloud credentials

Now, let’s use gcloud to set up some application default credentials (which will invoke a browser flow and pick up a credential I already have present in the browser). Then, let’s invoke the sample program again.

% gcloud auth application-default login
(output elided)

% ./simple_cli

2020/07/11 19:39:27 Using identity: shollyman.demo@gmail.com

Listing up to 5 datasets:
dataset: austin_311
dataset: austin_bikeshare
dataset: austin_crime
dataset: austin_incidents
dataset: austin_waste

Recap: ADC detected that I have default credentials stored as part of my use of gcloud, and used those transparently for me. The userinfo endpoint correctly identifies my gmail account as the identity, and BigQuery is now allowing me to enumerate datasets.

Example: local environment, service account credentials

Now, let’s indicate we prefer to use a local credential file, which should trump the gcloud credential choice. I’ve created a service account in the Cloud Console (via the IAM & Admin -> Service Accounts menu) and downloaded the JSON credentials to my local system.
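If you prefer the CLI to the console, roughly the same setup can be done with gcloud (the account and file names here just mirror the ones used in this walkthrough):

% gcloud iam service-accounts create demo-service-account

% gcloud iam service-accounts keys create /Users/myuserid/sample-service-cert.json \
    --iam-account=demo-service-account@throwaway-project123.iam.gserviceaccount.com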

% export GOOGLE_APPLICATION_CREDENTIALS=/Users/myuserid/sample-service-cert.json

% ./simple_cli

2020/07/11 19:41:55 Using identity: demo-service-account@throwaway-project123.iam.gserviceaccount.com

Listing up to 5 datasets:
dataset: austin_311
dataset: austin_bikeshare
dataset: austin_crime
dataset: austin_incidents
dataset: austin_waste

Recap: I’ve indicated I want to explicitly use the credentials found in that JSON file. ADC by default treats the presence of GOOGLE_APPLICATION_CREDENTIALS as the priority place to find credentials, so it used the service account credentials instead of the gcloud credentials, which are still present.

Another thing to note: I’ve named my service account demo-service-account, but I get a qualified domain based on the project I was in when I created the service account. Put another way, the service account is a resource in the throwaway-project123 project. The iam.gserviceaccount.com component is an idiom used by service accounts created in this way, but it’s entirely possible your service account has a different suffix.

Example: remote environment (compute engine), various configurations

So now, let’s try using one of the GCP compute services to run our code, and see how ADC interacts there. I’m intentionally going to do some wrong things to show how even with ADC things can go off the rails. I’ll be using Compute Engine here, but the experience should be similar with other compute-like services.

But first, I need to solve a quick problem I alluded to earlier. My simple_cli binary is built for my local macOS environment, but I want to run a linux image. Fortunately, I can just recompile for another platform, and we’ll call this binary simple_cli_linux:

CGO_ENABLED=0 GOOS=linux go build -mod=readonly -v -o simple_cli_linux

Now, I have the simple_cli binary for my local use, and simple_cli_linux which I can copy to remote linux machines and run there.

Attempt 1: no scopes

Now, let’s try running inside a Compute Engine VM instance. I’m going to create this via the gcloud compute subcommands of the SDK, because I think it’s more explicit than some UI screenshots from the cloud console. More info: gcloud compute instances create

% gcloud compute instances create demo-node \
  --zone=us-central1-a \
  --machine-type=f1-micro \
  --no-scopes 
Created [https://www.googleapis.com/compute/v1/projects/throwaway-project123/zones/us-central1-a/instances/demo-node].

Now, let’s copy the binary over from my local machine to the VM:

% gcloud compute scp simple_cli_linux demo-node:

And finally, we can login to the VM and run our binary:

% gcloud compute ssh demo-node --zone=us-central1-a
Linux demo-node 4.19.0-9-cloud-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Jul 12 02:56:28 2020 from 97.113.140.231

$ ./simple_cli_linux

2020/07/12 02:58:35 Couldn't resolve identity: googleapi: Error 401: Request is missing required authentication credential. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project., unauthorized

Listing up to 5 datasets:
2020/07/12 02:58:35 iterator failed: googleapi: Error 403: Request had insufficient authentication scopes.
More details:
Reason: insufficientPermissions, Message: Insufficient Permission

$ exit

Recap: Something has gone quite wrong. We logged into the VM, but the code wasn’t able to successfully run. Why? Even though the VM has an identity, I intentionally spun up my VM with the --no-scopes flag. Authentication scopes are essentially just the list of services we want this VM to be able to communicate with, and they affect how identity tokens are constructed. For the curious, here are all the public scopes at Google.
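As an aside, if you ever want to verify which scopes a VM’s default service account actually carries, you can query the metadata server from inside the instance; on this --no-scopes VM, you shouldn’t see much of anything come back:

$ curl -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"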

Attempt 2: add the cloud-platform scope

So, with no scopes I have an identity, but a service it’s not scoped for can’t consume it. Let’s widen our scopes and include the default “cloud-platform” scope.

But first, we’ll destroy our existing VM and start over.

% gcloud compute instances delete demo-node
 
% gcloud compute instances create demo-node \
  --zone=us-central1-a \
  --machine-type=f1-micro \
  --scopes=https://www.googleapis.com/auth/cloud-platform

% gcloud compute scp simple_cli_linux demo-node:

% gcloud compute ssh demo-node

myuserid@demo-node:~$ ./simple_cli_linux

2020/07/12 03:11:05 Couldn't resolve identity: googleapi: Error 401: Request is missing required authentication credential. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project., unauthorized

Listing up to 5 datasets:
dataset: austin_311
dataset: austin_bikeshare
dataset: austin_crime
dataset: austin_incidents
dataset: austin_waste

myuserid@demo-node:~$ exit

Recap: Interestingly, we were able to talk to BigQuery, but not the oauth2 userinfo endpoint. Why? It’s back to scopes again: the cloud-platform scope allows the token to work with the GCP services, but it does NOT contain the userinfo service.

Attempt 3: bigquery and userinfo scopes

So, let’s adjust our scopes slightly, just to show a least-privilege solution. We can fix this by explicitly including just the userinfo scope and the bigquery scope via the --scopes flag.

% gcloud compute instances delete demo-node

% gcloud compute instances create demo-node \
  --zone=us-central1-a \
  --machine-type=f1-micro \
  --scopes=https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/userinfo.email

% gcloud compute scp simple_cli_linux demo-node:

% gcloud compute ssh demo-node

myuserid@demo-node:~$ ./simple_cli_linux

2020/07/12 04:01:08 Using identity: 406558270276-compute@developer.gserviceaccount.com

Listing up to 5 datasets:
dataset: austin_311
dataset: austin_bikeshare
dataset: austin_crime
dataset: austin_incidents
dataset: austin_waste

myuserid@demo-node:~$ exit

% gcloud compute instances delete demo-node

Recap: We’ve specified our VM with a sufficient set of scopes for talking to both the services we’re interested in, and now we’re getting a different identity (406558270276-compute@developer.gserviceaccount.com). Where’s it coming from? It’s actually the Compute Engine default service account for my project, which is the documented behavior of the gcloud command:

If not provided, the instance will use the project’s default service account.

As an aside, you can also spin up the VM using a specific service account (via the --service-account flag), which is an exercise I’ll leave to the reader.
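For instance (reusing the demo service account from earlier; substitute your own), that might look like:

% gcloud compute instances create demo-node \
  --zone=us-central1-a \
  --machine-type=f1-micro \
  --service-account=demo-service-account@throwaway-project123.iam.gserviceaccount.com \
  --scopes=https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/userinfo.email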

Conclusion

Hopefully this was a useful exploration of some of the identity functionality you may encounter when working with GCP services and libraries.

My goal was to explain and demonstrate how Application Default Credentials work, by having it resolve several different kinds of credentials with no changes necessary to the application code. I also threw some minor landmines in my path (fighting with Compute Engine’s ability to limit scopes).