CUDA-based implementations of Softassign and EM-ICP
Toru Tamaki, Miho Abe, Bisser Raytchev, Kazufumi Kaneda, Marcos Slomp
Hiroshima University, Japan
Contact address:


This demonstration provides CUDA-based implementations of two 3D point sets registration algorithms: Softassign [Gold 1998] and EM-ICP [Granger 2002]. Both algorithms are known for being time demanding, even on modern multi-core CPUs. Our GPU-based implementations vastly outperform CPU ones. For instance, our CUDA EM-ICP aligns 5000 points in less than 7 seconds on a GeForce 8800GT, while the same implementation in OpenMP on an Intel Core 2 Quad would take 7 minutes.

Registration (alignment) of 3D point sets is one of the most important problems in computer vision and several methods have been developed over the last two decades. The widely used Iterative Closest Point (ICP) algorithm [Besl 1992] provides quick registration, but requires a good initial alignment in order to prevent local minima and produce a plausible match. Softassign and EM-ICP represent efforts to overcome such limitations: instead of looking for "hard" correspondences between points (each point in one of the sets has to uniquely map to another point in the other set), the latter two algorithms focus on "soft" correspondences (each point in one of the sets corresponds somehow to every point in the other set by some weight). Although these algorithms can handle any initial arrangement, their associated computational cost has been preventing their practical usefulness even for moderately large number of points.

Recent advances in graphics hardware and software have motivated us to implement Softassign and EM-ICP on a GPU and evaluate their corresponding behavior and performance. Our contribution is twofold: we introduce the GPU implementations and also demonstrate that most steps of these algorithms are GPU-friendly, consisting of either vector-matrix multiplications or element-wise operations. The registration process is iteratively tracked in a window with interactive manipulation.

Softassign and EM-ICP on GPU from Toru Tamaki
SSII2012 2D&3Dレジストレーション ~画像と3次元点群の合わせ方~ 第1部 from Toru Tamaki

Connection to the main conference

This demonstration deals with 3D geometry, making it well-suited to the main conference, workshops, and tutorials:

This demonstration is also related to the following workshops:


The videos presented below consist of the registration of sets of 1000, 3000 and 5000 points, sequentially spaced in each video. The starting point-set (white) is progressively matched (cyan) to the target one (yellow).

All movies were captured in real-time via recordmydesktop on a regular desktop PC running Fedora 11, equipped with an Intel Core 2 Quad Q9650 (max 4 threads) with 8GB RAM and a single NVIDIA GeForce 8800GT with 512MB VRAM.

Registration of 3D point sets with the same number of points


ICP registrations finish very quickly (usually less than 10 milliseconds), but fail to provide a satisfactory alignment. This video is presented solely for comparison purposes with Softassign and EM-ICP.


EM-ICP provides a much more reliable match than the regular ICP, but also requires much more time: about 3 seconds for 1000 points and 30 seconds for 3000 points registration. For 5000 points, registration was stopped manually before proper convergence due to the extensive computation time.

Softassign (GPU/CUDA)

Softassign produces a very satisfactory match, but it is still vary time demanding even in a GPU-driven approach: about 6 seconds for 1000 points and over 30 seconds for 3000 points. As in EM-ICP/CPU, the registration was aborted for the 5000 points case.


EM-ICP maps particularly well to GPU and is significantly faster than its equivalent OpenMP/CPU implementation and our GPU-based Softassign: a third of a second for 1000 points, 2.98 seconds for 3000 points, and 6.8 seconds for 5000 points, which is about 60 times faster than those of EM-ICP/CPU.

Registration of 3D point sets of different densities using EM-ICP (GPU/CUDA)

The dense set contains 40097 points, while the sparse one consists of 1459 points. Our CUDA implementation quickly and accurately aligns the two sets, notwithstanding the wide difference in density among the two sets Two setups are presented in the video: the dense set is aligned to the sparse one and vice-versa. Results slightly differ due to the one-way correspondence condition inherited from the EM-ICP approach.

Building a bunny in one minute by EM-ICP (GPU/CUDA)

Reconstruction of the bunny geometry from 18 disperse point sets (taken at view points of 0,20,...,340 degrees) in about a minute on GPU by means of our CUDA-based EM-ICP implementation. Aligned pairs are incrementally merged and later assembled in a final registration.


  • ICP
    • Paul J. Besl, Neil D. McKay, "A Method for Registration of 3-D Shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 2, pp. 239-256, 1992. online article
  • Softassign
    • Steven Gold, Anand Rangarajan, Chien-Ping Lu, Suguna Pappu, Eric Mjolsness, "New algorithms for 2D and 3D point matching: pose estimation and correspondence," Pattern Recognition, Vol. 31, No. 8, pp. 1019-1031, 1998. online article
  • EM-ICP
    • Sebastien Granger, Xavier Pennec, "Multi-scale EM-ICP: A Fast and Robust Approach for Surface Registration," in Proc. of 7th European Conference on Computer Vision (ECCV2002), Vol. 4, pp. 69-73, 2002. online article
    • Guillaume Dewaele, Frederic Devernay, Radu Horaud, "Hand motion from 3D point trajectories and a smooth surface model," in Proc. of 8th European Conference on Computer Vision (ECCV2004), Vol. 1, pp. 495-507, 2004. online article
    • Yonghuai Liu, "Automatic registration of overlapping 3D point clouds using closest points," Image and Vision Computing, Vol. 24, No. 7, pp. 762-781, 2006. online article
  • Misc