Exfiltrating Go current goroutine ID

by Ciprian Dorin Craciun (https://volution.ro/ciprian) on 

About obtaining the current Go goroutine ID, and why sometimes we should trust the developer to do the right thing... Otherwise the developer is forced to embark on a journey that looks more like a Mission Impossible data-exfiltration movie than a day-to-day job...

Summary

A few days ago I needed to obtain the current Go goroutine identifier, and because the process seems to border on insanity, I thought others might benefit from my torment; thus I'll try to describe the hoops I had to jump through to make it happen...

For the impatient, the bottom line is this:

The context

(This section can be safely skipped if one already knows the topic.)

First of all, what is a "goroutine"? A "goroutine" is Go's answer to the concurrency / multi-core problem; as described by Effective Go and the Go Language Specification, it is an embodiment of the green threads concept. While almost all other programming languages out there -- like Rust, Python, Ruby, Java, not to mention C/C++ -- allow the developer to create OS-backed threads, Go -- and for that matter Erlang -- have taken the road of multiplexing many (even thousands of) synthetic threads, managed and scheduled by the language's runtime itself, over a limited number of OS-backed threads. (I don't want to go into a debate about which approach is better; suffice it to say that each has its own merits and issues; used properly, each model can do wonders...)

What is a "goroutine identifier"? Starting from the UNIX/POSIX model of concurrency, each process has one or more threads, and each process and thread is identified by a number; for example, on Linux these are exposed through the getpid and gettid syscalls. (As a side note, these numbers, especially those of processes, are only "temporarily unique": they are unique while the process / thread is alive, but after it exits the identifier can be reused once the counter wraps around, although the wrap-around point can be raised through the /proc/sys/kernel/pid_max sysctl knob.) Similarly, a goroutine identifier is a number that uniquely identifies a goroutine within a Go process.

The problem

Now, the Go development team explicitly states that they don't want developers to get their hands on any "special" information that could "deanonymize" a goroutine.

This stems mainly from the fear that one will misuse it to "destroy" the magic behind this wonderful piece of technology... Granted, they might be right, as many of those who have asked about it were actually after thread-local storage, which ultimately leads to APIs that are "tied" to certain goroutines.

However, there are legitimate use-cases, for example checking that functions run on the correct goroutine, which is exactly what Go's own http2 package does. The irony... :)

Anyway, my use-case involves a custom logging library that uses the goroutine identifier to put log messages into the correct context in a heavily concurrent application.

Solution no. 1 -- Go's own dirty little secret

Unsurprisingly (the irony again), the first solution can be found in Go's own source code, and it involves roughly 150 lines of code that snapshot the current stack trace, format it as text, and then parse that text to extract the identifier...

All I can say is that it's a real engineering marvel; it uses the perfect abstractions, and achieves optimal performance... :)

[...]

func curGoroutineID() uint64 {
	bp := littleBuf.Get().(*[]byte)
	defer littleBuf.Put(bp)
	b := *bp
	b = b[:runtime.Stack(b, false)]
	// Parse the 4707 out of "goroutine 4707 ["
	b = bytes.TrimPrefix(b, goroutineSpace)
	i := bytes.IndexByte(b, ' ')
	if i < 0 {
		panic(fmt.Sprintf("No space found in %q", b))
	}
	b = b[:i]
	n, err := parseUintBytes(b, 10, 64)
	if err != nil {
		panic(fmt.Sprintf("Failed to parse goroutine ID out of %q: %v", b, err))
	}
	return n
}

var goroutineSpace = []byte("goroutine ")

[...]
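For experimenting outside the http2 package, here is a self-contained sketch of the same technique. It assumes the first line of runtime.Stack output has the form "goroutine N [state]:" (an undocumented format, which is exactly the fragility being mocked above), and it replaces gotrack.go's littleBuf pool and parseUintBytes helper with a stack-allocated buffer and strconv.ParseUint. The names parseGoroutineID and currentGoroutineID are hypothetical.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"runtime"
	"strconv"
)

var goroutinePrefix = []byte("goroutine ")

// parseGoroutineID extracts N from a stack-trace header such as
// "goroutine 4707 [running]:".
func parseGoroutineID(header []byte) (uint64, error) {
	if !bytes.HasPrefix(header, goroutinePrefix) {
		return 0, errors.New("missing \"goroutine \" prefix")
	}
	b := header[len(goroutinePrefix):]
	i := bytes.IndexByte(b, ' ')
	if i < 0 {
		return 0, fmt.Errorf("no space found in %q", b)
	}
	return strconv.ParseUint(string(b[:i]), 10, 64)
}

// currentGoroutineID formats only the current goroutine's stack (all=false);
// 64 bytes are enough for the header line, the rest is simply truncated.
func currentGoroutineID() uint64 {
	var buf [64]byte
	n := runtime.Stack(buf[:], false)
	id, err := parseGoroutineID(buf[:n])
	if err != nil {
		panic(err)
	}
	return id
}

func main() {
	id, _ := parseGoroutineID([]byte("goroutine 4707 [running]:"))
	fmt.Println(id) // 4707
	fmt.Println(currentGoroutineID())
}
```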

Solution no. 2 -- if only it were that simple

Someone once proposed writing a simple C function that uses the Go C runtime to read the goid field from the goroutine's runtime structure.

Unfortunately, it doesn't work anymore... The underlying Go C runtime has in the meantime been rewritten (in Go), weeding out any misfeatures... :)

#include "runtime.h"

// func Id() int64
int64 ·Id(void) {
	return g->goid;
}

Solution no. 3 -- something is better than nothing

Someone else wrote (as part of a thread-local storage library, the irony) a piece of assembly that gets the pointer to the current goroutine's runtime structure and reuses it as a unique identifier.

However, just like with UNIX process identifiers, that piece of memory could be reused after the goroutine dies, and thus the obtained identifier is just "temporarily unique"...

TEXT ·GoID(SB),NOSPLIT,$0-8
	get_tls(CX)
	MOVQ g(CX), AX
	MOVQ AX, goid+0(FP)
	RET

Solution no. 4 -- just throw everything at the problem

So, after digging a little through the Go runtime source code, it turns out there is a goid field in the goroutine's runtime structure.

type g struct {
	[...]
	goid int64
	[...]
}

This field appears to be initialized (in fact, in several different ways) from a monotonically increasing 64-bit counter, which provides "practical uniqueness" within the same Go process.

	gp.goid = int64(atomic.Xadd64(&sched.goidgen, 1))
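The principle can be mimicked in user code with a shared atomic counter: each Add hands out the next value, so concurrent callers never receive the same number. This is only an illustration, not the runtime's code (which, as noted, assigns IDs in more than one way); nextID is a hypothetical name, and atomic.Uint64 requires Go 1.19 or newer. As for "practical uniqueness": even at one billion new goroutines per second, exhausting 2^64 values would take about 1.8e10 seconds, roughly 584 years.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// idGen plays the role of the runtime's goroutine ID generator.
var idGen atomic.Uint64

// nextID atomically increments the counter and returns the new value,
// so the first call returns 1 and no two calls return the same value.
func nextID() uint64 { return idGen.Add(1) }

func main() {
	const workers = 1000
	ids := make(chan uint64, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ids <- nextID()
		}()
	}
	wg.Wait()
	close(ids)

	// Every concurrently obtained ID must be distinct.
	seen := make(map[uint64]bool)
	for id := range ids {
		if seen[id] {
			panic(fmt.Sprintf("duplicate ID %d", id))
		}
		seen[id] = true
	}
	fmt.Println(len(seen)) // 1000
}
```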

Now, given that solution no. 3 gave us a way to access the pointer to the current goroutine's runtime structure, if only we knew the offset of the goid field within this structure, we would have an efficient solution to our problem.

However, given that the structure's layout can change from one Go release to another, it's impractical to hard-code that offset into our source code.

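Luckily we can "throw everything at the problem" by using the assembly from solution no. 3 to obtain the current goroutine's runtime structure pointer, using the slow solution no. 1 to obtain the identifier, scanning the first 32 64-bit words of the structure for that value, and, if exactly one word matches, remembering its offset and reading the field directly from then on.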

In fact there are two other projects on GitHub that try to achieve something similar:

The final source code

goroutineid.go

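// NOTE:  The package name below is only an example.

package goroutineid

import (
	"sync/atomic"
	"unsafe"
)


// GoRoutineId returns the current goroutine's identifier, reading the goid
// field directly once its offset within the runtime structure is known.
func GoRoutineId () (uint64) {
	
	// Pointer to the current goroutine's runtime structure (see the assembly below),
	// viewed as an array of its first 32 64-bit words.
	_ptr := uintptr (goRoutinePtr ())
	_struct := (*[32]uint64) (unsafe.Pointer (_ptr))
	
	// Word index of the goid field, or 0 while it has not been discovered yet.
	_offset := atomic.LoadUint64 (&goRoutineIdOffset)
	
	if _offset != 0 {
		
		// Fast path: read the goid field directly.
		return _struct[int (_offset)]
		
	} else {
		
		// Slow path: obtain the identifier by parsing the stack trace...
		_slow := goRoutineIdSlow ()
		
		// ... then look for the words holding that same value.
		_matchedCount := 0
		_matchedOffset := 0
		
		for _offset, _value := range _struct[:] {
			if _value == _slow {
				_matchedOffset = _offset
				_matchedCount += 1
				if _matchedCount >= 2 {
					break
				}
			}
		}
		
		// Cache the offset only if it is unambiguous.
		if _matchedCount == 1 {
			atomic.StoreUint64 (&goRoutineIdOffset, uint64 (_matchedOffset))
		}
		
		return _slow
	}
}


// Implemented in goroutineid_amd64.s; returns the current goroutine's runtime structure pointer.
func goRoutinePtr () (uint64)


// Cached word index of the goid field (0 means "not yet discovered").
var goRoutineIdOffset uint64 = 0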

goroutineid_amd64.s

#include "go_asm.h"
#include "textflag.h"
#include "../../src/runtime/go_tls.h"


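// func goRoutinePtr () (uint64)
// Returns the pointer to the current goroutine's runtime structure (g);
// the unnamed result is referred to as ret+0(FP), as go vet expects.
TEXT ·goRoutinePtr(SB),NOSPLIT,$0-8
	get_tls(CX)
	MOVQ g(CX), AX
	MOVQ AX, ret+0(FP)
	RET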

goroutineid_slow.go

// NOTE:  Just embed the code from Go's own source code.
//        https://github.com/golang/net/blob/master/http2/gotrack.go

func goRoutineIdSlow () (uint64) {
	[...]
}

[...]