PointNet Introduction

Posted by Packy on June 1, 2019

Deep learning on 2D images has made great progress and is being applied ever more widely; with the development of 3D sensing devices, 3D deep learning has also attracted a lot of attention.

I recently started working on 3D deep learning, beginning with PointNet. Having gained a basic understanding, I would like to write it down and share it; if anything is wrong, corrections are welcome. To be updated continuously.

All of the discussion below is based on point cloud classification.

1. Overview of 3D Deep Learning

2. Challenges of Point Clouds

3. The PointNet Architecture in Detail

4. PointNet Code Walkthrough

The passage below, quoted from a later paper on point cloud segmentation, gives a concise overview of the field:

> Deep Learning on Point Clouds. To take advantage of the strong representation capability of classic CNNs, a 3D point cloud is first projected into multiview rendering images in [33, 31, 27, 9], on which the well-designed CNNs for 2D images can be applied. But part of the contextual information in the point cloud is left behind during the projection process. Another popular representation for point cloud data is voxelized volumes. The works of [37, 23, 12, 30] convert point cloud data into regular volumetric occupancy grids, then train 3D CNNs or their varieties to perform voxel-level predictions. A drawback of volumetric representations is being both computationally and memory intensive, due to the sparsity of point clouds and the heavy computation of 3D convolutions. Therefore those methods are limited in dealing with large-scale 3D scenes. To process raw point clouds directly, PointNet [26] is proposed to yield point-level predictions, achieving strong performance on 3D classification and segmentation tasks. The following works PointNet++ [28], RSNet [13], DGCNN [36] and PointCNN [17] further focus on exploring the local context and hierarchical learning architectures. In this work, we build a novel framework to associatively segment instances and semantics in point clouds, and demonstrate that it is effective and general on different backbone networks.

1. Overview of 3D Deep Learning

  1. Multi-view: a 3D object is represented by 2D images rendered from multiple viewpoints. Classic CNNs are applied to each view image, and the per-view features are aggregated into a representation of the 3D object by a view-pooling procedure;

Papers applying multi-view methods to point cloud problems:

  • H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view convolutional neural networks for 3D shape recognition. In Proc. IEEE Int. Conf. Comp. Vis., 2015.

  • B. Shi, S. Bai, Z. Zhou, and X. Bai. DeepPano: Deep panoramic representation for 3-D shape recognition. IEEE Signal Processing Letters, 2015.

  • C. R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, and L. J. Guibas. Volumetric and multi-view CNNs for object classification on 3D data. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2016.

  • J. Guerry, A. Boulch, B. Le Saux, J. Moras, A. Plyer, and D. Filliat. SnapNet-R: Consistent 3D multi-view semantic labeling for robotics. In Proc. Workshop of Int. Conf. Computer Vision, 2017.

  2. Volumetric: the object is represented as voxels in space and 3D convolutions analogous to the 2D ones are applied (e.g., with a 5x5x5 kernel). The representation is regular and carries over easily from the 2D case, but the extra dimension makes both the time and memory cost very high, so this is no longer the mainstream approach;

Voxel-based methods for point cloud problems:

  • Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2015.

  • D. Maturana and S. Scherer. VoxNet: A 3D convolutional neural network for real-time object recognition. In Proc. IEEE/RSJ Int. Conf. Intelligent Robots & Systems, 2015.

  • J. Huang and S. You. Point cloud labeling using 3D convolutional neural network. In Proc. Int. Conf. Patt. Recogn., 2016.

  • G. Riegler, A. O. Ulusoy, and A. Geiger. OctNet: Learning deep 3D representations at high resolutions. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2017.

  3. Point clouds: the raw 3D point cloud is fed directly into the network for training, keeping the data volume small. The main tasks are classification, segmentation, and semantic segmentation of large-scale scenes;

  • PointNet: C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2017.

  • PointNet++: C. R. Qi, L. Yi, H. Su, and L. J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Proc. Advances in Neural Inf. Process. Syst., 2017.

  • RSNet: Q. Huang, W. Wang, and U. Neumann. Recurrent slice networks for 3D segmentation of point clouds. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2018.

  • DGCNN: Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon. Dynamic graph CNN for learning on point clouds. arXiv: Comp. Res. Repository, 2018.

  • PointCNN: Y. Li, R. Bu, M. Sun, and B. Chen. PointCNN. arXiv: Comp. Res. Repository, 2018.

  4. Non-Euclidean (manifold, graph): convolution is performed on a manifold or graph structure. A 3D point cloud can be represented as a mesh, or as a graph built from the adjacency relations between pairs of points. The manifold formulation is rather abstract and involves things like Laplacian eigen-decomposition, which I do not fully understand myself…


2. Challenges of Point Clouds

1. Unordered-ness: a point cloud is essentially a long list of points (an n×3 matrix, where n is the number of points). Geometrically, the order of the points does not affect the overall shape they represent in space; for example, the same point cloud can be represented by two entirely different matrices, as shown on the left of the figure below:

What we want is the effect shown on the right of the figure below: N is the number of points and D is the feature dimension of each point. No matter how the points are ordered, the network should produce the same extracted features.

*(figure: the same cloud written as two different matrices; the desired order-invariant N×D feature map)*
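To make the ambiguity concrete, here is a minimal NumPy sketch (synthetic points, all names hypothetical) showing that two different matrices can encode the same cloud:

```python
import numpy as np

rng = np.random.default_rng(0)
cloud_a = rng.standard_normal((5, 3))    # an n x 3 point cloud (n = 5 here)
cloud_b = cloud_a[::-1].copy()           # the same points, reversed row order

# As matrices the two representations differ...
print(np.array_equal(cloud_a, cloud_b))                              # False

# ...but as *sets* of points they are identical:
print(sorted(map(tuple, cloud_a)) == sorted(map(tuple, cloud_b)))    # True
```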

Recall the typical structure of a network: feature extraction → feature mapping → feature map compression (dimensionality reduction) → fully connected layers.

In the figure below, x denotes a point in the cloud, h is the feature extraction layer, g is called a symmetric function, and r extracts higher-level features before a final softmax classifier. g can be max pooling or sum pooling; that is, for each of the D feature dimensions we take the maximum (or the sum) of that feature over all N points, so g resolves the ordering problem. PointNet adopts the max-pooling strategy.

*(figure: the symmetric-function architecture h → g → r)*
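The order-invariance of the h → g pipeline can be sketched in a few lines of NumPy. This is a toy stand-in, not PointNet itself: `W` is a random weight matrix playing the role of the shared per-point mapping h, and max pooling plays the role of g:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((1024, 3))    # N x 3 point cloud
W = 0.1 * rng.standard_normal((3, 64))     # weights of a toy shared per-point h (hypothetical)

def global_feature(pts):
    per_point = np.maximum(pts @ W, 0.0)   # h: the same mapping applied to every point
    return per_point.max(axis=0)           # g: max pooling over the N points

f1 = global_feature(points)
f2 = global_feature(points[rng.permutation(len(points))])  # same cloud, shuffled rows
print(np.allclose(f1, f2))                 # True: g is symmetric in its inputs
```

Because the maximum over a set does not depend on the order of its elements, any row permutation of the input yields the identical D-dimensional global feature.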

2. Rigid transformations: when the same point cloud undergoes a rigid transformation in space (rotation or translation), its coordinates change, as shown below:

We want the network to recognize the object correctly no matter which coordinate frame the cloud is presented in. This problem can be solved with a spatial transformer network (STN). The 2D version of the transform is described here; the 3D case differs in that a point cloud is an irregular structure (unordered, no grid), so no resampling step is needed. PointNet learns a transformation matrix that aligns the input in the way most effective for the task.

*(figure: the same object under different rigid transformations)*
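The effect of such an alignment can be illustrated with plain NumPy. In PointNet the 3x3 matrix is predicted by a small sub-network (the T-Net) from the cloud itself; in this sketch we cheat and use the known inverse rotation, just to show what "undoing" a rigid transform looks like:

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.standard_normal((1024, 3))      # N x 3 cloud

# A rigid rotation about the z-axis changes every coordinate:
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
rotated = points @ R.T

# Multiplying by the inverse re-aligns the cloud exactly
# (R is orthogonal, so its inverse is R^T):
aligned = rotated @ R
print(np.allclose(aligned, points))          # True
```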

3. The PointNet Architecture in Detail

Let us first look at the network's two highlights:

Spatial transformer networks to handle rotations: the 3D STN learns, from the pose information of the point cloud itself, a DxD transformation matrix that is most favorable for classification or segmentation (D is the feature dimension; PointNet uses D = 3 and D = 64). As for why this works, my understanding is that the transformation matrix is adjusted through the final loss; PointNet does not care what transformation is actually performed, as long as it benefits the final result. PointNet applies the STN twice: the first, the input transform, adjusts the point cloud in space, which can be understood intuitively as rotating it to an angle more favorable for classification or segmentation, e.g. turning the object to face front; the second, the feature transform, aligns the extracted 64-dimensional features, i.e. transforms the cloud at the feature level.

Max pooling to handle unordered-ness: after the network has extracted features for each point, max pooling produces a global feature for the cloud as a whole.

*(figure: the full PointNet architecture)*

Here, each mlp is implemented as convolutions with shared weights. The kernel of the first convolution layer is 1x3 (because each point has three coordinates x, y, z), and every subsequent layer uses 1x1 kernels; in other words, the feature-extraction layers process each point on its own. After two spatial transformer networks and two mlps, a 1024-dimensional feature is extracted for every point, and max pooling turns these into a single 1x1024 global feature. A final mlp (implemented with fully connected layers in the code) produces k scores. The classification network is trained with a softmax loss.
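The classification path described above can be sketched in NumPy. This is a minimal forward pass with random weights, under simplifying assumptions: the two T-Nets, batch norm, and the intermediate layer widths are omitted, the "1x3 conv followed by 1x1 convs" is written as the equivalent per-point matrix multiplies, and `k = 10` classes is a hypothetical choice:

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 1024, 10                              # points per cloud; k classes (hypothetical)
points = rng.standard_normal((N, 3))

def relu(x):
    return np.maximum(x, 0.0)

# Shared-weight convolutions == the same weight matrix applied to every point:
W1 = 0.1 * rng.standard_normal((3, 64))      # the 1x3 kernel -> 64 channels
W2 = 0.1 * rng.standard_normal((64, 1024))   # the 1x1 kernels, collapsed to one layer here
Wfc = 0.1 * rng.standard_normal((1024, k))   # the fully connected head

per_point = relu(relu(points @ W1) @ W2)     # N x 1024 per-point features
global_feat = per_point.max(axis=0)          # max pooling -> 1x1024 global feature
logits = global_feat @ Wfc                   # k scores
scores = np.exp(logits - logits.max())
scores /= scores.sum()                       # softmax over the k classes
print(scores.shape)
```

Note that every weight matrix is applied identically to all N points, which is exactly what makes a 1x1 convolution "shared": the per-point branch never mixes information between points; only the max pooling does.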

---

Author: 痛并快乐着呦西

Source: CSDN

Original post: https://blog.csdn.net/qq_15332903/article/details/80224387

Copyright notice: this is an original post by the blogger; please include a link to the original when reposting!