Go 1.26
This post is not complete.
The new release is not headline-grabbing, but I have been playing with the beta and release candidate (esp. new SIMD stuff) since before Christmas and there is some very cool stuff.
The new garbage collector, which I discussed in detail last time, is now the default. It is good (but if you suspect it is causing issues you can still revert to the previous one).
I’ll also look at the following:
- the built-in function new() has a useful overload
- generic types can now refer to themselves
- goroutine leaks can be found using a new profiler
- go fix has been rewritten (in preparation for Go 2.0? ;)
- standard library additions for performance and security
But my favourite addition is the archsimd package, which makes SIMD instructions easy to use. I did a lot of tests of this and found that it can boost the performance of code by up to 20 times!
Unfortunately, archsimd currently only supports amd64 (Intel/AMD). I’ll undoubtedly look at it again when it supports other architectures (like ARM) and hopefully some sort of high-level “architecture independent” package.
Standard Library
SIMD
SIMD (single instruction, multiple data) refers to a special class of CPU instructions that have appeared over time in microprocessor instruction sets. They let you perform the same operation on an array or “vector” of values (integer or float) in about the same time it would normally take to operate on a single value.
If you can find a way to use them you can expect a large speed-up, often around an order of magnitude (10X). This sort of improvement can be crucial for low-level “hot” functions.
SIMD Uses
SIMD has esoteric uses that have become important in areas such as 3D graphics, AI, encryption, modelling, etc. In particular, a lot of scientific and technical software requires vector and matrix processing.
SIMD instructions are now ubiquitous in modern CPUs. They are often used in unexpected areas. For example, the new Green Tea GC (see below) uses SIMD instructions to improve speed of garbage collection.
Most modern compilers (or their code-generation back end) will make use of SIMD instructions if supported by the instruction set they are generating for. When you build code in Go you are, depending on the architecture, perhaps already using SIMD instructions.
Why do I need it?
So if the compiler already uses SIMD then why do you need to worry about it?
Unfortunately, compilers aren’t clever enough to understand what you are trying to do. It may be possible to translate very simple loops into SIMD instruction(s) but this is a difficult problem. The Go compiler doesn’t even try, which (I believe) is the right decision.
It really is up to you to understand how you can use SIMD in your code. I’ll give a few examples to show what is possible.
Note that, until now, you had to use assembler to use SIMD instructions in Go. (Some C/C++ compilers have supported SIMD intrinsics for at least a decade, but I have found these cumbersome to use.)
Using SIMD in Go 1.26
The good news is that the approach taken in Go (as usual :) makes things simple. You import the simd/archsimd package. Note that in Go 1.26 the package is experimental, so you need to build with GOEXPERIMENT=simd.
Although this just appears to be a package there is a lot of support built into the compiler. This allows multiple operations to be performed directly in SIMD registers without involving memory.
As a simple example, here is a function that adds three arrays (of 32 bytes each) element by element and returns the result as a new array. On my machine this loop takes about 27 nsecs to run.
func Add3(x, y, z [32]int8) (r [32]int8) {
	for i := range x {
		r[i] = x[i] + y[i] + z[i]
	}
	return
}
Doing the same thing with archsimd takes about 7 nsecs.
import "simd/archsimd"
...
func Add3(x, y, z [32]int8) (r [32]int8) {
	xvec := archsimd.LoadInt8x32(&x)
	yvec := archsimd.LoadInt8x32(&y)
	zvec := archsimd.LoadInt8x32(&z)
	result := xvec.Add(yvec).Add(zvec)
	result.Store(&r)
	return
}
Moreover, the actual line that does the work (using the two calls to the Add() method) generates just 2 SIMD instructions and runs in less than 2 nsecs.
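Real data is rarely exactly 32 elements long. One common way to apply a fixed-width function like Add3 to slices of any length is to process full 32-element blocks and then finish the tail one element at a time. The sketch below shows that pattern; the names Add3Slice and add3Block are my own (hypothetical), and add3Block uses a scalar body as a stand-in so that it builds anywhere. On amd64 its body could be swapped for the archsimd calls shown above.

```go
package main

import "fmt"

const lanes = 32 // elements per 256-bit vector of int8

// add3Block adds one 32-element block. The scalar body is a portable
// stand-in (an assumption for this sketch); on amd64 it could be replaced
// by the archsimd Load/Add/Store sequence from the article.
func add3Block(x, y, z, r *[lanes]int8) {
	for i := range x {
		r[i] = x[i] + y[i] + z[i]
	}
}

// Add3Slice stores x[i]+y[i]+z[i] in r[i] for slices of any equal length.
func Add3Slice(r, x, y, z []int8) {
	n := len(r)
	if len(x) != n || len(y) != n || len(z) != n {
		panic("Add3Slice: length mismatch")
	}
	i := 0
	// Full blocks: view each 32-element window as a pointer to an array.
	for ; i+lanes <= n; i += lanes {
		add3Block(
			(*[lanes]int8)(x[i:i+lanes]),
			(*[lanes]int8)(y[i:i+lanes]),
			(*[lanes]int8)(z[i:i+lanes]),
			(*[lanes]int8)(r[i:i+lanes]),
		)
	}
	// Tail: fewer than 32 elements remain, so finish them one by one.
	for ; i < n; i++ {
		r[i] = x[i] + y[i] + z[i]
	}
}

func main() {
	const n = 70 // two full blocks plus a tail of 6 elements
	x, y, z, r := make([]int8, n), make([]int8, n), make([]int8, n), make([]int8, n)
	for i := range x {
		x[i], y[i], z[i] = 1, 2, int8(i)
	}
	Add3Slice(r, x, y, z)
	fmt.Println(r[0], r[33], r[69]) // prints "3 36 72"
}
```

The slice-to-array-pointer conversions do not copy any data, so the block function works directly on the caller's memory.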
Types
All SIMD operations work with “vector” registers. The first SIMD instructions performed integer operations on 128-bit registers, treating them as vectors of 8-bit, 16-bit, 32-bit or 64-bit integers. Later instructions used 256-bit and even 512-bit registers and also performed floating point operations on 32-bit and 64-bit (IEEE) floats.
For each register size Go has a type that supports storing eight different integer types (int8, uint8, int16, …, uint64) and two floating point types (float32 and float64). That is, there are 30 SIMD “vector” types: 3 different register sizes which can store 10 different Go types.
| Bits | Signed integer | Unsigned integer | Float |
|---|---|---|---|
| 128 | Int8x16, Int16x8, Int32x4, Int64x2 | Uint8x16, ..., Uint64x2 | Float32x4, Float64x2 |
| 256 | Int8x32, Int16x16, Int32x8, Int64x4 | Uint8x32, ..., Uint64x4 | Float32x8, Float64x4 |
| 512 | Int8x64, Int16x32, Int32x16, Int64x8 | Uint8x64, ..., Uint64x8 | Float32x16, Float64x8 |
As we saw in the above example, using the Int8x32 type we can perform the same operation on 32 bytes simultaneously.
The package also has other types for masks (see below).
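The number in each type name is just the register width divided by the element width. As a quick check of that arithmetic, here is a small sketch; the lanes helper is my own (hypothetical) and is not part of archsimd.

```go
package main

import (
	"fmt"
	"unsafe"
)

// lanes reports how many elements of type T fit in a vector register
// that is bits wide. It is a hypothetical helper for illustration only.
func lanes[T any](bits int) int {
	var zero T
	return bits / (8 * int(unsafe.Sizeof(zero)))
}

func main() {
	for _, bits := range []int{128, 256, 512} {
		fmt.Printf("%d bits: %d x int8, %d x int32, %d x float64\n",
			bits, lanes[int8](bits), lanes[int32](bits), lanes[float64](bits))
	}
}
```

For 256 bits this gives 32 int8 lanes, 8 int32 lanes and 4 float64 lanes, matching the 256-bit row of the table.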
Operations
TODO: masks, saturated ops
Build Considerations
Since archsimd currently only supports Intel/AMD instructions, it is recommended that you use build tags to ensure that the compiler only builds the code for the amd64 architecture. You can do this by including the build tag at the start of your source file:
//go:build amd64

package ...
or, preferably, using a file name that ends in _amd64.go such as simd_test_amd64.go.
See the relevant section in my blog on build tags: Targeting OS/ARCH.
Of course, if you also build for other architecture(s) you need the equivalent code in your package for those architecture(s). You could use a separate file for each architecture such as no_simd_arm.go, no_simd_wasm.go, etc. More likely, you would just have a single file (perhaps called no_simd.go) with a !amd64 build tag:
//go:build !amd64

package ...
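Once you have two implementations of the same function, it is worth checking that they agree. Below is a minimal sketch of such a check using random inputs. The names add3Ref and checkAgree are hypothetical, and here Add3 is only a scalar stand-in (adding in a different order) so that the sketch builds on any architecture; in a real package Add3 would be whichever version the build tags select.

```go
package main

import (
	"fmt"
	"math/rand"
)

// add3Ref is the plain-Go reference, as it might appear in no_simd.go.
func add3Ref(x, y, z [32]int8) (r [32]int8) {
	for i := range x {
		r[i] = x[i] + y[i] + z[i]
	}
	return
}

// Add3 stands in for the implementation selected by the build tags.
// It is scalar here (an assumption made so the sketch runs everywhere).
func Add3(x, y, z [32]int8) (r [32]int8) {
	for i := range x {
		r[i] = z[i] + y[i] + x[i]
	}
	return
}

// checkAgree compares the two implementations on random inputs and
// returns an error describing the first mismatch, if any.
func checkAgree(trials int, seed int64) error {
	rng := rand.New(rand.NewSource(seed))
	for t := 0; t < trials; t++ {
		var x, y, z [32]int8
		for i := range x {
			// Converting 0..255 to int8 wraps, covering the full int8 range.
			x[i] = int8(rng.Intn(256))
			y[i] = int8(rng.Intn(256))
			z[i] = int8(rng.Intn(256))
		}
		if got, want := Add3(x, y, z), add3Ref(x, y, z); got != want {
			return fmt.Errorf("trial %d: got %v, want %v", t, got, want)
		}
	}
	return nil
}

func main() {
	if err := checkAgree(1000, 1); err != nil {
		fmt.Println("mismatch:", err)
		return
	}
	fmt.Println("implementations agree")
}
```

Random values near the limits of int8 matter here because plain Go integer addition wraps on overflow, and the two versions need to behave the same way in that case.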
Runtime Considerations
TODO: checking for op support
Language
New new() overload
Go has really nice support for creating literals of just about any type. All languages have numeric literals but Go lets you create literals for structs, maps, etc.
num := 42                      // integer literal
loc := point{x: 1, y: 2}       // struct literal
lut := dict{"a": 10, "b": 20}  // map literal
You can also create pointers to literals, but only for composite types.
pLoc := &point{x: 1, y: 2}      // ptr to struct literal
pLut := &dict{"a": 10, "b": 20} // ptr to map literal
pNum := &42                     // ERROR
I always thought it was a bit of an oversight that you can’t similarly create a pointer to a basic type without first creating a temporary. In Go 1.26 there is an overload of the new() function that allows you to do just that.
pNum := new(42)
In other words, the argument to new() can now be a type (as before) or an expression. This is particularly useful for structs with pointer fields, which are often used to distinguish a missing (null) value from a zero value when dealing with databases, serialisation or anything that has optional values.
p1 := new(string)                  // type argument - new returns ptr to ""
p2 := new(strings.Repeat("x", 10)) // expression argument - returns ptr to "xxxxxxxxxx"
p3 := new(nil)                     // ERROR - nil has no concrete type
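To make the optional-value case concrete, here is a small sketch using encoding/json. The user type and the ptr helper are hypothetical; ptr is the sort of generic helper people wrote before Go 1.26, and I use it here only so that the sketch also builds with older toolchains. With Go 1.26 you could write new(0) instead of ptr(0).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ptr is a hypothetical pre-1.26 helper; in Go 1.26 new(v) does the same job.
func ptr[T any](v T) *T { return &v }

// user has an optional age: a nil pointer means "not provided".
type user struct {
	Name string `json:"name"`
	Age  *int   `json:"age,omitempty"`
}

// encode returns the JSON form of u, or the error text if marshalling fails.
func encode(u user) string {
	b, err := json.Marshal(u)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(user{Name: "a"}))              // {"name":"a"}
	fmt.Println(encode(user{Name: "a", Age: ptr(0)})) // {"name":"a","age":0}
}
```

The pointer field is what lets the encoder tell "no age given" apart from "age is zero", which a plain int field cannot do.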
Don’t forget that, although new() always returns a pointer, the value pointed to is not necessarily placed on the (garbage collected) heap. It’s up to the Go compiler’s escape analysis to determine whether the object lives on the heap or somewhere else (stack).
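One way to see escape analysis at work is to count allocations with testing.AllocsPerRun, which can be called from an ordinary program. In the sketch below (all names are my own), the first function keeps its pointer local while the second stores it in a package-level variable. The exact counts depend on the compiler, so treat the numbers as indicative only.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// sink is package-level, so a pointer stored here outlives the call.
var sink *point

// local uses new() but the pointer never leaves the function,
// so the compiler is free to keep the value on the stack.
func local() int {
	p := new(point)
	p.x, p.y = 1, 2
	return p.x + p.y
}

// escaping stores the pointer in sink, which forces a heap allocation.
func escaping() {
	sink = new(point)
}

func main() {
	fmt.Println("local allocs per call:   ", testing.AllocsPerRun(1000, func() { _ = local() }))
	fmt.Println("escaping allocs per call:", testing.AllocsPerRun(1000, escaping))
}
```

On my understanding of the current compiler, the first line typically reports 0 and the second reports 1, even though both functions call new().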
Runtime
Green Tea GC
Last time I talked about the new heap manager.
The change was given the code name Green Tea GC though the changes are more about the data structures used for the heap than changes to the garbage collection strategy.
Nothing much has changed with GTGC except that it is now the default.
If you think the change is causing issues then, in Go 1.26, you can go back to the old collector by building with GOEXPERIMENT=nogreenteagc. Of course, you should also file an issue, with a code example if possible.
The GTGC was a huge change, so I was expecting something impressive from it, but I did a lot of tests in Go 1.25 (with and without the greenteagc experiment turned on) and was not able to get any measurable performance improvements.
Was GTGC worth the effort?
The only difference I could see is that when a garbage collection runs (in the background) the runtime was grabbing fewer goroutines to do its work.
However, now it’s running on production servers I have seen a reduction in the size of the maximum spikes in latency of requests.
GC Latency Issues
About 10 years ago (Go 1.8) Go’s GC STW (stop the world) pauses were reduced to sub-msec times. This should have meant that you could stop worrying about GC pauses, but there were still problems. First, a badly behaved goroutine would (under, admittedly rare, circumstances) not yield to the goroutine scheduler. This was cleverly addressed about 5 years ago (Go 1.14) by making the goroutine scheduler preemptive.
But there was still an issue that I (and others) noticed: the software would slow down when the GC goroutines were doing their cleanup in the background. The effect was much larger than would be expected due to the reduced number of goroutines available to the rest of the code.
I never heard of anyone getting to the very bottom of this. I suspect there were a lot of interacting reasons. But someone must have been working on it as later releases reduced the problem, at least for me.
My suspicion was always that it was mainly a cache problem since the heap would allocate memory all over the place. It seems that the GTGC has reduced latency problems further by allocating memory in a more cache friendly way.
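If you want to look at GC pauses in your own program, the runtime already records them. The sketch below (function names are my own) forces a collection and reads the recorded stop-the-world pause times using debug.ReadGCStats. Note that this only shows the STW pauses, not the background slowdown described above, which needs request-level latency measurements to observe.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// keep holds the most recent buffer so the allocations are not optimised away.
var keep [][]byte

// churn creates n short-lived 64 KiB buffers to give the GC some work.
func churn(n int) {
	for i := 0; i < n; i++ {
		keep = append(keep[:0], make([]byte, 64<<10))
	}
}

// maxPause forces a collection, then returns the number of collections
// so far and the longest stop-the-world pause in the recorded history.
func maxPause() (numGC int64, longest time.Duration) {
	runtime.GC()
	var st debug.GCStats
	debug.ReadGCStats(&st)
	for _, p := range st.Pause {
		if p > longest {
			longest = p
		}
	}
	return st.NumGC, longest
}

func main() {
	churn(10000)
	n, longest := maxPause()
	fmt.Printf("collections: %d, longest STW pause: %v\n", n, longest)
}
```

Running the same program built with and without the new collector is one rough way to compare pause behaviour, though a synthetic load like this is unlikely to reflect a production server.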
Heap Address Randomisation
Speaking of the heap – the runtime no longer uses a fixed heap base address. This makes it harder for external malware to know where to look.
Tools
Go fix
Go fix is a tool that you can use to automatically upgrade your code when a new Go release makes language and library changes.
In the early days of Go (before Go 1.0) there were often breaking language changes and go fix was a big help to update code.
Nowadays, go fix does not get a lot of use, due to Go’s backward compatibility. However, it can be useful to convert code to use new features. This can make your code simpler, faster, or simply in line with the latest practices.
As it’s pretty ancient, and not used a lot, my impression was that go fix was fading away. I was surprised to find it has been completely rewritten to use the new Go analysis framework.
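To give a feel for the "convert code to use new features" case, here is the kind of before and after that a modernising rewrite produces. This is a sketch of the general idea only: the function names are hypothetical, and whether go fix performs exactly this rewrite is an assumption on my part.

```go
package main

import (
	"fmt"
	"slices"
)

// hasOld is a hand-written search loop, as found in a lot of older code.
func hasOld(xs []string, want string) bool {
	for _, x := range xs {
		if x == want {
			return true
		}
	}
	return false
}

// hasNew does the same thing with slices.Contains (added in Go 1.21).
func hasNew(xs []string, want string) bool {
	return slices.Contains(xs, want)
}

func main() {
	langs := []string{"go", "c", "rust"}
	fmt.Println(hasOld(langs, "go"), hasNew(langs, "go"))     // true true
	fmt.Println(hasOld(langs, "java"), hasNew(langs, "java")) // false false
}
```

The behaviour is identical; the newer form is simply shorter and makes the intent obvious.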
Goroutine Leak Profiler
Go makes concurrency easy, sometimes too easy, and a lot of Go software has undiagnosed goroutine leaks. This is one reason I was very happy to find that the synctest package can expose this issue (see Synctest - Detecting Goroutine Leaks). Unfortunately, you can only use synctest in tests, but luckily there is now a new leak profiler.
Conclusion
TODO