Getting Swift to run on NVIDIA Jetson TX2

YND Team
Published on:
December 11, 2019

One of the core challenges in developing embedded AI/ML processing systems is balancing processing power, portability and development speed. While AI/ML problems are traditionally solved in the cloud, some of our special projects involving live video processing and computer vision call for an on-site, low-latency & power-efficient solution.

The recently released NVIDIA Jetson TX2 supercomputer board has been a game-changer among the platforms available for this kind of work. At the time of writing, the TX2 remains the fastest, most power-efficient embedded AI computing device.

With the TX2, NVIDIA managed to create a very powerful package which is ideal for our requirements in live video processing, object/face detection and deep neural network analysis. With 4 standard CPU cores (plus 2 auxiliary Denver cores), it delivers enough power for data-intensive processing, and the additional 256 CUDA cores on the GPU give our data scientists something to run their complex deep learning networks on.

Why Swift?

One practical challenge we encountered when working with NVIDIA hardware (and embedded systems in general) is choosing a programming language/platform to build our end-user solutions. Different dev teams have experience with different technology stacks, and the programming languages commonly used by data scientists differ greatly from the ones used by application programmers to implement business logic or hardware integrations. The problems we’re trying to solve, however, require close cooperation between the two departments.

That’s why we decided to integrate Swift, the latest programming language from Apple, which has taken the iPhone/iOS development world by storm since its introduction in 2014. From our perspective, Swift has multiple advantages:

  • Performance: as a statically-typed language compiled to machine code, it delivers performance comparable to native C/C++. This is critical for applications like video processing and computer vision.
  • High-level programming: Swift is a modern language with multiple features that promote fast & correct programming. Paired with a relatively complete standard library and useful third-party packages, it gives us development efficiency on par with traditional scripting/interpreted languages.
  • Integration: being a natively-compiled language with its roots in Objective-C means we can easily integrate with C/C++ code, which is traditionally used to develop the components of an ML/vision processing pipeline (see the short sketch after this list).
  • A wide pool of dev talent: it’s much easier to find talented Swift developers to help us with embedded app development without major on-boarding/tech-switch hurdles.
  • Future Python integration: Chris Lattner, the creator of Swift, recently hinted at upcoming Python interoperability, which will make it even easier for us to integrate the upper software layers with low-level data-science code that is often only available in Python. Kotlin Native offers similar advantages, but we haven’t explored that path yet.

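To illustrate how direct that C integration is, here is a minimal sketch of our own (not code from the project): on the Jetson’s Ubuntu-based Linux, Swift can call plain C APIs such as libc’s clock_gettime without any wrapper code.

import Foundation
#if canImport(Glibc)
import Glibc   // the Linux C library module; on macOS the equivalent comes in via Darwin/Foundation
#endif

// C structs are imported as Swift structs and C pointers map to inout arguments.
var now = timespec()
clock_gettime(CLOCK_MONOTONIC, &now)
print("monotonic clock: \(now.tv_sec)s \(now.tv_nsec)ns")

The same mechanism extends to C libraries exposed through module maps, which is how the C/C++ parts of a vision pipeline get pulled into Swift code.
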
Getting Swift to run on the TX2 wasn’t straightforward, and we were clearly pioneers in this area. Below, we’d like to share the specific steps & fixes we had to apply to get our full stack (including the Swift Package Manager) running on the NVIDIA Jetson system-on-chip.

Build issues

“o” memory constraint issue in LLVM

The Swift version open-sourced by Apple and openly developed on GitHub is mostly tweaked to run on the x86_64 platform (Intel 64-bit). The TX2 is based on the aarch64 architecture (ARM 64-bit, the less popular variant). The LLVM compiler infrastructure has a rare issue when generating aarch64 code in which the “o” inline assembly memory constraint cannot be resolved properly (it surfaces during libdispatch compilation). The issue is documented in the LLVM Bugzilla but was still unresolved at the moment of publishing this post.

With a little help from Saleem at Facebook/LLVM.org we were able to map the “o” constraint to the more generic “m” constraint and get the build working properly. With the patch applied, LLVM is able to build libdispatch, which is required for the Core Foundation library to work properly.

diff --git a/lib/Target/AArch64/AArch64ISelLowering.h b/lib/Target/AArch64/AArch64ISelLowering.h
index d5154fb..49fca3f 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/lib/Target/AArch64/AArch64ISelLowering.h
@@ -619,8 +619,11 @@ private:
SelectionDAG &DAG) const override;
unsigned getInlineAsmMemConstraint(StringRef ConstraintCode) const override {
- if (ConstraintCode == "Q")
+ if (ConstraintCode == "Q") {
return InlineAsm::Constraint_Q;
+ } else if (ConstraintCode == "o") {
+ return InlineAsm::Constraint_m;
+ }
// FIXME: clang has code for 'Ump', 'Utf', 'Usa', and 'Ush' but these are
// followed by llvm_unreachable so we'll leave them unimplemented in
// the backend for now.


ld.gold linker issue in NVIDIA JetPack

NVIDIA ships a slightly older version of binutils which has relocation issues when applying TLSDESC relocations with no TLS segment. Fortunately, the issue has been fixed in newer versions and a patch is available, so we repacked binutils for JetPack with the patch applied.

Swift Package Manager hardcoded x86_64 paths

Even though the Swift compiler itself is fully cross-platform ready (after all, it’s used to compile all the ARM-based iPhone apps!), the Swift Package Manager is not quite there yet: a lot of things are hard-coded. The Package Manager has a basic notion of 3 platforms (Darwin, Linux, Android), but it hard-codes the assumption that Linux means x86_64-unknown-linux. This can be seen, for example, in the Triple.swift source:

public static let macOS = try! Triple("x86_64-apple-macosx10.10")
public static let linux = try! Triple("x86_64-unknown-linux")
public static let android = try! Triple("armv7-unknown-linux-androideabi")


We worked around it by adding the proper aarch64 enum cases to the class and replacing x86_64 with aarch64 where necessary. This is a temporary workaround; a proper fix would be to add true cross-platform support to the Swift Package Manager (which, as we learned, is work in progress).

diff --git a/Sources/Build/Triple.swift b/Sources/Build/Triple.swift
index 1ce1556..3e0b0f0 100644
--- a/Sources/Build/Triple.swift
+++ b/Sources/Build/Triple.swift
@@ -34,6 +34,7 @@ public struct Triple {
case x86_64
case armv7
case s390x
+ case aarch64
}
public enum Vendor: String {
@@ -104,7 +105,7 @@ public struct Triple {
}
public static let macOS = try! Triple("x86_64-apple-macosx10.10")
- public static let linux = try! Triple("x86_64-unknown-linux")
+ public static let linux = try! Triple("aarch64-unknown-linux")
public static let android = try! Triple("armv7-unknown-linux-androideabi")
#if os(macOS)
diff --git a/Utilities/bootstrap b/Utilities/bootstrap
index ea3d293..8b979ec 100755
--- a/Utilities/bootstrap
+++ b/Utilities/bootstrap
@@ -878,7 +878,7 @@ def main():
elif platform.processor() == 's390x':
build_target = "s390x-unknown-linux"
else:
- build_target = 'x86_64-unknown-linux'
+ build_target = 'aarch64-unknown-linux'
build_path = os.path.join(g_project_root, args.build_path)
sandbox_path = os.path.join(build_path, ".bootstrap")


Variadic arguments issue in NSString Foundation implementation

Finally, the last problem surfaced during CoreFoundation compilation. Variadic argument support on aarch64 seems to be “almost finished” in LLVM/Swift, but it still lacks the last VaListBuilder piece required for proper conversion when calling into C. This results in build errors like:

cannot convert value of type 'CVaListPointer' to expected argument type 'va_list' (aka '__va_list')


inside Swift itself, wherever native C/Swift varargs conversion is involved. We weren’t able to find a fully reliable solution to this problem yet, so we simply replaced all the varargs-based initialisers of NSString with assertions. Those constructors appear to be very rarely used, and in our usage we never hit a code path where the assert would be triggered. The issue definitely requires a proper fix, though; hopefully one will land soon, as there seems to be a lot of active development in this area.
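
For context, this is the kind of call that goes through those va_list-based initialisers. The snippet below is our own illustration (not taken from the Foundation sources); on our patched build it would hit exactly the assert described above.

import Foundation

// withVaList builds a CVaListPointer from an array of CVarArg values;
// NSString(format:arguments:) then bridges it to the C-level va_list API,
// which is the conversion that failed to compile on aarch64.
let greeting = withVaList([2] as [CVarArg]) { args in
    NSString(format: "hello Jetson TX%d", arguments: args)
}
print(greeting)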

Summary

After resolving all of the issues above, we ended up with a fully working Swift compiler + Foundation library, and a Swift Package Manager able to build integrated Swift/C code in one go.
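
As a rough sketch of what “Swift/C in one go” looks like in practice (an illustrative manifest with made-up target names, not one of our actual projects), a single Package.swift can declare a C target next to the Swift target that consumes it:

// swift-tools-version:5.1
import PackageDescription

let package = Package(
    name: "VisionPipeline",
    targets: [
        // C sources and public headers live in Sources/CFilters.
        .target(name: "CFilters"),
        // Swift code in Sources/VisionPipeline simply does `import CFilters`.
        .target(name: "VisionPipeline", dependencies: ["CFilters"]),
    ]
)

A single swift build then compiles both targets and links them together in one step.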

This setup allows us to use Swift, a modern, high-level systems programming language, to quickly develop and iterate on the business/app/model layers of our projects. We also benefit from the wide pool of senior Swift/iOS developers on board to review & improve our code. And we don’t lose the ability to integrate the industry-standard C/C++ ML/AI/data-science tools & libraries that make full use of NVIDIA GPU superpowers.

Hopefully, the remaining aarch64 bootstrapping hurdles will be resolved in the LLVM/Swift repositories soon. In the coming weeks/months, we will be working with the upstream developers to make sure this works out of the box, without any compromises, on all arm64 boards.

We think this represents a top-of-the-line development ecosystem for building modern, reliable embedded AI/ML vision processing applications.


This post was written by Michał Dominik Kostrzewa, Head Of Special Projects. In need of some brain power? Reach out to us via hello@ynd.co with your questions about ML/AI projects.

Work with our experienced team to develop a customized software solution tailor-made for your business needs.