Building Go binaries with TensorFlow on arm64: Summary of steps

Photo by Roman Mager on Unsplash

TensorFlow is a very powerful computing platform, particularly for working with matrices, linear algebra and machine learning. Such computations can be very verbose and difficult to express in lower-level languages such as C and Go, so we need the ability to design them in a high-level language. TensorFlow, with its Python interface, lets us express complex computations while also providing a path to seamless integration with lower-level languages for production use cases.

The real beauty of TensorFlow is the ability to export a computation graph that is language agnostic. This makes it possible to express complex computation sequences in Python and then export those steps for integration in production code written in other programming languages such as Go and Java.

This post is a quick summary of the steps I needed to take to build TensorFlow for the arm64 architecture and then integrate a computation graph using the Go programming language. The final goal is a binary that can be easily packaged in a container. Let’s first define a sample computation problem:

0.6047 0.9405 0.6646 0.4377 0.4246 
0.6868 0.0656 0.1565 0.0970 0.3009
0.5152 0.8136 0.2143 0.3807 0.3181
0.4689 0.2830 0.2931 0.6791 0.2186
0.2032 0.3609 0.5707 0.8625 0.2931

Let’s say we have a matrix of data as shown above and we need to compute its inverse. It is trivial to perform this computation in Python using TensorFlow:

import tensorflow as tf

x = [
    [0.6047, 0.9405, 0.6646, 0.4377, 0.4246],
    [0.6868, 0.0656, 0.1565, 0.0970, 0.3009],
    [0.5152, 0.8136, 0.2143, 0.3807, 0.3181],
    [0.4689, 0.2830, 0.2931, 0.6791, 0.2186],
    [0.2032, 0.3609, 0.5707, 0.8625, 0.2931],
]
y = tf.linalg.inv(x)
print(y)

which outputs the inverse of x as follows:

tf.Tensor(
[[ 1.3268485 -0.13446523 -1.5245299 3.4717727 -2.7188532 ]
[ 0.56479335 -1.2117027 0.9216604 0.4177374 -0.88607126]
[ 2.958812 -0.5985456 -3.399865 1.1653783 -0.8511223 ]
[-1.0250404 -0.5553139 0.57051307 1.3094081 0.45926073]
[-4.360103 4.384766 4.863161 -9.043573 6.693542 ]], shape=(5, 5), dtype=float32)

Now that the hello-world works in Python, the goal is to do the same in Go using the TensorFlow dynamic library and its C API. Let’s fast-forward to the point where we are able to build such a binary. Inspecting the binary (say, main) shows its dependency on various system libraries and on the TensorFlow dynamic library in /usr/local/lib/:

└─ $ ▶ ldd main
	(0x00007ffe9d769000)
	=> /usr/local/lib/ (0x00007f0e1e6ea000)
	=> /lib64/ (0x00007f0e1e6b9000)
	=> /lib64/ (0x00007f0e1e4ea000)
	=> /usr/local/lib/ (0x00007f0e1c6a9000)
	=> /lib64/ (0x00007f0e1c565000)
	=> /lib64/ (0x00007f0e1c55e000)
	=> /lib64/ (0x00007f0e1c551000)
	=> /lib64/ (0x00007f0e1c332000)
	=> /lib64/ (0x00007f0e1c317000)
	/lib64/ (0x00007f0e2cbae000)
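The direct dependencies that ldd starts from are recorded as DT_NEEDED entries in the binary's ELF dynamic section. Go's standard debug/elf package can read them. The sketch below uses hypothetical helper names neededLibraries and linksTensorFlow. It lists those entries and checks for a libtensorflow entry. Unlike ldd, it does not resolve library paths or follow transitive dependencies.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// neededLibraries returns the DT_NEEDED entries of the ELF binary at path,
// i.e. the direct shared-library dependencies (names only, no paths).
func neededLibraries(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return f.ImportedLibraries()
}

// linksTensorFlow reports whether any library name starts with "libtensorflow".
func linksTensorFlow(libs []string) bool {
	for _, lib := range libs {
		if strings.HasPrefix(lib, "libtensorflow") {
			return true
		}
	}
	return false
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "usage: %s <binary>\n", os.Args[0])
		return
	}
	libs, err := neededLibraries(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, lib := range libs {
		fmt.Println(lib)
	}
	fmt.Println("links libtensorflow:", linksTensorFlow(libs))
}
```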

Executing the binary prints the original data followed by its inverse:

└─ $ ▶ ./main
input data:
0.6047 0.9405 0.6646 0.4377 0.4246
0.6868 0.0656 0.1565 0.0970 0.3009
0.5152 0.8136 0.2143 0.3807 0.3181
0.4689 0.2830 0.2931 0.6791 0.2186
0.2032 0.3609 0.5707 0.8625 0.2931
1.3263 -0.1338 -1.5241 3.4710 -2.7183
0.5647 -1.2114 0.9217 0.4173 -0.8857
2.9600 -0.5993 -3.4010 1.1674 -0.8529
-1.0257 -0.5548 0.5711 1.3083 0.4602
-4.3596 4.3834 4.8627 -9.0423 6.6930

This also worked on arm64, which allows us to use TensorFlow on a Raspberry Pi running a 64-bit OS. To build this binary, we need three things:

  • TensorFlow dynamic library and its C API
  • A Go library to interface with TensorFlow
  • A Go wrapper code for data input and output (shown below)

Building TensorFlow library for arm64

The TensorFlow C library is available as a prebuilt tarball for various operating systems; however, arm64 is currently not supported. Fortunately, it is fairly easy to build the library yourself.

Start with a large enough compute instance in the cloud (or use your laptop if it is powerful enough). Building the library takes a while and compiles thousands of targets, so I found it easiest to configure a cloud virtual machine for the build. The virtual machine configuration looked as follows:

  • 16 vCPU, 64GB memory
  • Ubuntu 21.04 OS with 100GB disk
  • Bazel v3.7.2
  • TensorFlow v2.5.0
  • Python 3.8.8

Bazel is the build system for TensorFlow, and each TensorFlow version strictly requires a particular Bazel version; for TensorFlow v2.5.0 that is Bazel v3.7.2.
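A mismatched Bazel version wastes a long build, so it can help to check the installed version first. This hypothetical Go helper runs bazel --version and compares the result with the version from the list above. It assumes the command prints a line of the form bazel 3.7.2.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// requiredBazel is the Bazel version paired with TensorFlow v2.5.0 in this post.
const requiredBazel = "3.7.2"

// parseBazelVersion extracts the version from output such as "bazel 3.7.2".
func parseBazelVersion(out string) (string, error) {
	fields := strings.Fields(out)
	if len(fields) < 2 || fields[0] != "bazel" {
		return "", fmt.Errorf("unexpected bazel --version output: %q", out)
	}
	return fields[1], nil
}

func main() {
	out, err := exec.Command("bazel", "--version").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not run bazel:", err)
		return
	}
	v, err := parseBazelVersion(string(out))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if v != requiredBazel {
		fmt.Printf("found bazel %s, but this build expects %s\n", v, requiredBazel)
		return
	}
	fmt.Println("bazel version OK:", v)
}
```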

Install build tools:

sudo apt-get update && \
sudo apt-get install -y zip build-essential

Download bazel:

chmod 755
sudo ./

Download Python:

chmod 755
exit # and login again

At this point Python is set up but might not be activated yet, so it’s good to log out, log back in, and confirm the version:

$ which python
$ python --version
Python 3.8.8

Download TensorFlow code:

git clone
cd tensorflow
git checkout tags/v2.5.0

Configure the build parameters:

./configure # accept the defaults for most prompts

The build can now be started for the default arch as follows:

bazel build -c opt //tensorflow/tools/lib_package:libtensorflow

To build for arm64 instead:

bazel build -c opt --config=elinux_aarch64 //tensorflow/tools/lib_package:libtensorflow

This can take some time, but it should produce a tarball at bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz.

We are now ready to integrate with Go code, but first we need to install the library on the target machine, which in my case is a Raspberry Pi:

# on a Raspberry Pi running a 64-bit OS
sudo tar -C /usr/local -xzf libtensorflow.tar.gz
sudo ldconfig
export LD_LIBRARY_PATH="/usr/local/lib"

Preparing Computation Graph

Now that the TensorFlow library is installed, we can prepare a computation graph in Python and export it for integration with Go. The computation graph is essentially a declarative manifest that defines the inputs, the outputs and the operations on the data. The TensorFlow guide on graphs and tf.function covers graphs in more detail.

In our case, the graph needs to express matrix inversion. To do this, we start with the following Python code:

import tensorflow as tf

# define python function over input array
# reshape array into matrix
def inv(x, dim):
    y = tf.reshape(x, shape=dim)
    return tf.linalg.inv(y)

# wrap python function
tfFuncInv = tf.function(inv)

# get graph implying that input data would be of
# arbitrary length and will be reshaped into
# a matrix; the input names x and dim are the
# names the Go code feeds
g = tfFuncInv.get_concrete_function(
    tf.TensorSpec(shape=[None], dtype=tf.float64, name="x"),
    tf.TensorSpec(shape=[2], dtype=tf.int32, name="dim"),
)

# print graph manifest to visually inspect it
print(g.graph.as_graph_def())

# finally export the graph as a protobuf file
tf.io.write_graph(g.graph, "./", "graph.pb", as_text=False)
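The graph takes a flat vector plus a shape, so the caller and tf.reshape must agree on element order. tf.reshape fills the matrix in row-major (C) order, which means element (i, j) of the matrix is element i*cols+j of the flat input. The hypothetical Go helper below shows that mapping on its own, without TensorFlow.

```go
package main

import "fmt"

// reshapeRowMajor turns a flat slice into a rows x cols matrix in
// row-major order: element (i, j) comes from flat[i*cols+j].
// The returned rows share memory with flat.
func reshapeRowMajor(flat []float64, rows, cols int) ([][]float64, error) {
	if rows < 0 || cols < 0 || rows*cols != len(flat) {
		return nil, fmt.Errorf("cannot reshape %d values into %dx%d", len(flat), rows, cols)
	}
	m := make([][]float64, rows)
	for i := range m {
		m[i] = flat[i*cols : (i+1)*cols]
	}
	return m, nil
}

func main() {
	m, err := reshapeRowMajor([]float64{1, 2, 3, 4, 5, 6}, 2, 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(m) // [[1 2 3] [4 5 6]]
}
```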

The file graph.pb now contains the workflow description of what needs to happen to the input data. We can now integrate it in Go and feed it data dynamically:

package main

import (
	_ "embed"
	"fmt"
	"log"
	"math/rand"

	tf "" // Go bindings for the TensorFlow C API
)

//go:embed graph.pb
var def []byte

func main() {
	// import the graph
	g := tf.NewGraph()
	if err := g.Import(def, ""); err != nil {
		log.Fatal(err)
	}

	// print available operations in the graph
	for i, operation := range g.Operations() {
		fmt.Println(i, operation.Name())
	}

	// fill a flat vector with 25 values for a 5x5 matrix
	data := make([]float64, 25)
	for i := range data {
		data[i] = rand.Float64()
	}

	// prepare input data
	x, err := tf.NewTensor(data)
	if err != nil {
		log.Fatal(err)
	}

	// prepare shape of the matrix
	shape, err := tf.NewTensor([]int32{5, 5})
	if err != nil {
		log.Fatal(err)
	}

	// prepare data feed specifying names of the operation
	feeds := map[tf.Output]*tf.Tensor{
		g.Operation("x").Output(0):   x,
		g.Operation("dim").Output(0): shape,
	}

	// prepare data outputs from tensorflow run; the name must match one of
	// the operations printed above (tf.linalg.inv creates an op named
	// MatrixInverse by default)
	fetches := []tf.Output{
		g.Operation("MatrixInverse").Output(0),
	}

	// start new session
	sess, err := tf.NewSession(g, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()

	// run session feeding feeds and fetching fetches
	out, err := sess.Run(feeds, fetches, nil)
	if err != nil {
		log.Fatal(err)
	}

	// reshape output data as vector
	y := out[0]
	if err := y.Reshape([]int64{25}); err != nil {
		log.Fatal(err)
	}
	yRaw, ok := y.Value().([]float64)
	if !ok {
		log.Fatal("type assertion error")
	}

	// print input, one matrix row per line
	fmt.Println("input data:")
	for i := 0; i < 5; i++ {
		for j := 0; j < 5; j++ {
			fmt.Printf("%.4f ", data[i*5+j])
		}
		fmt.Println()
	}

	// print output, one matrix row per line
	for i := 0; i < 5; i++ {
		for j := 0; j < 5; j++ {
			fmt.Printf("%.4f ", yRaw[i*5+j])
		}
		fmt.Println()
	}
}

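The code can now be built using go build main.go. The //go:embed directive requires Go 1.16 or later and expects graph.pb in the same directory as main.go. The resulting binary will run as shown earlier in this post.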

Have fun!




Software engineer and entrepreneur currently building Kubernetes infrastructure and cloud native stack for edge/IoT and ML workflows.

Saurabh Deoras
