Deep Polynomial Neural Networks

파송송 2022. 9. 12. 20:41

https://arxiv.org/pdf/2006.13026.pdf

Abstract

Deep Convolutional Neural Networks (DCNNs)
- used for both discriminative and generative learning
- their success is attributed to the careful selection of their building blocks
The paper proposes \( \Pi \)-Net
- a new class of function approximators based on polynomial expansions
- the output is a high-order polynomial of the input
  - the parameters of the high-order polynomial are high-order tensors, estimated through a collective tensor factorization with factor sharing
\( \Pi \)-Net uses three tensor decompositions (tensor factorizations)
- advantages of the tensor decompositions
  - they reduce the number of parameters
  - they can be implemented efficiently as hierarchical neural networks
\( \Pi \)-Net gives good results even without non-linear activation functions; combined with activation functions, it achieves SOTA in image generation, face verification, and 3D mesh representation learning

1. Introduction

DCNN
- the choice of architectural pipeline matters, but the core structure depends on how the operators are composed
- both theoretical and empirical studies point to limitations of this structure
- theoretical and empirical results show that multiplicative interactions expand the class of functions that can be approximated
  - this is the motivation for studying \( \Pi \)-Net

\( \Pi \)-Net

\( G(z) : \mathbb{R} ^{d} \longrightarrow \mathbb{R} ^{o} \)
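- \( G(z) \) : a high-order multivariate polynomial function
- \( z \) : the input vector, \( z \in \mathbb{R} ^{d} \)
- \( \mathbb{R} ^{o} \) : the output space of dimension \( o \); the high-order tensors are the parameters of the polynomial, not its input or output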
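To make the role of the parameter tensors concrete, a generic polynomial of degree \( N \) can be written out per output coordinate. The symbols \( \beta_j \) (bias) and \( w^{[n]} \) (coefficients of the order-\( n \) term) are notation assumed here for illustration, not taken from the notes above:

\[ G(z)_j = \beta_j + \sum_{i=1}^{d} w^{[1]}_{j,i} z_i + \sum_{i,k=1}^{d} w^{[2]}_{j,i,k} z_i z_k + \cdots + \sum_{i_1,\ldots,i_N=1}^{d} w^{[N]}_{j,i_1,\ldots,i_N} z_{i_1} \cdots z_{i_N}, \qquad j = 1, \ldots, o \]

The coefficients \( w^{[n]} \) form a tensor with \( o \cdot d^{n} \) entries, so the parameter count grows exponentially with the order \( n \).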
Without further structure, the number of parameters (needed to capture all high-order correlations of the input) explodes with the degree of the multivariate polynomial
- to address this, the polynomial parameter tensors are factorized (tensor factorization)
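The gap between the two parameter counts can be checked with a few lines of arithmetic. The sketch below is only an illustration: the factorized layout (one `d x rank` factor per order, one `o x rank` output matrix, one bias) is an assumption made here, and the function names and the `rank` parameter are hypothetical.

```python
def full_polynomial_params(d: int, o: int, degree: int) -> int:
    """Parameter count of an unfactorized polynomial R^d -> R^o.

    The order-n term needs a tensor with o * d**n entries; the bias adds o.
    """
    return o + sum(o * d ** n for n in range(1, degree + 1))


def shared_factor_params(d: int, o: int, degree: int, rank: int) -> int:
    """Parameter count of an ASSUMED factorized layout (not taken from the paper):
    one (d x rank) factor per order, one (o x rank) output matrix, one bias.
    """
    return degree * d * rank + o * rank + o


if __name__ == "__main__":
    d, o, rank = 128, 10, 64  # illustrative sizes only
    for degree in (1, 2, 3, 4):
        print(degree,
              full_polynomial_params(d, o, degree),
              shared_factor_params(d, o, degree, rank))
```

Under these assumptions the unfactorized count grows like \( d^{N} \), while the factorized count grows linearly in the degree.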
Concept introduced
- higher-order expansions for both generative and discriminative networks
- that is, higher-order expansions are added to DCNN-based generative and discriminative networks
- three improvements
  1. new intuitions about the concept that help in devising new models
  2. the experimental results are extended to more challenging tasks
  3. the use of \( \Pi \)-Net on challenging tasks is discussed
Contributions of the paper
- introduces the idea that the output is a high-order polynomial of the input
- introduces tensor factorization with shared factors to avoid the parameter explosion
- the proposed architectures apply to generative models (e.g., GANs) and to discriminative networks
- they can learn high-dimensional distributions without non-linear activation functions
- \( \Pi \)-Net achieves SOTA on a range of tasks

2. Related work and notation

Deep neural networks have been applied widely with impressive results
Hardware, ML libraries, optimizers, and regularization have improved steadily, but the paradigm of each layer has not changed
The conventional layer paradigm
- each layer is a linear transformation followed by an element-wise activation function
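As a point of reference for what \( \Pi \)-Net departs from, the conventional layer can be sketched in a few lines of plain Python. The function name `dense_layer` and the default `tanh` activation are choices made here for illustration.

```python
import math


def dense_layer(x, weights, bias, activation=math.tanh):
    """One conventional layer: affine map, then an element-wise activation.

    x: list of d floats; weights: list of rows (each of length d); bias: one float per row.
    """
    pre_activation = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]
    return [activation(p) for p in pre_activation]
```

The only interaction between input elements is additive (inside the weighted sum); any non-linearity comes from the activation applied to each coordinate separately.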
Hierarchical models show stellar performance in generative modeling

Polynomial networks

Polynomial relationships have been studied in two kinds of networks
- self-organizing networks with hard-coded feature selection
- pi-sigma networks
The idea of learnable polynomial features comes from GMDH (Group Method of Data Handling)
- GMDH: learns partial descriptors that capture quadratic correlations between two predefined input elements
- earlier extensions to higher-order polynomials required more input elements, and because the input to each partial descriptor is predefined, these methods could not scale to high-dimensional data
Pi-sigma network: a single hidden layer
- learns multiple affine transformations of the data
- multiplies all the resulting features to obtain the output
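The two steps just described (several affine transformations, then a product over all of them) can be sketched as follows. This is a minimal illustration: the name `pi_sigma` is hypothetical, and any output activation that a particular pi-sigma variant might apply is left out.

```python
from math import prod


def pi_sigma(x, weights, biases):
    """Single-hidden-layer pi-sigma unit (sketch).

    "Sigma" part: each hidden unit is an affine transformation of x.
    "Pi" part: the output is the product of all hidden units.
    """
    hidden = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    return prod(hidden)
```

With `k` hidden units the output is a polynomial of degree `k` in the input, since `k` affine functions of `x` are multiplied together.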
SPSNN (sigma-pi-sigma neural network): an extension of pi-sigma networks
- sums the outputs of several pi-sigma networks to obtain the output
- does not perform well on high-dimensional signals
- has only been used on signals with 3-dimensional input/output
ConvACs (Convolutional Arithmetic Circuits)
- an arithmetic circuit has two kinds of nodes
  - sum nodes (weighted sum of their inputs)
  - product nodes (product of their inputs)
  - these two node types are sufficient to express a polynomial expansion
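A toy circuit shows why sum and product nodes are enough to build a polynomial. The sketch below is an illustration only; the function names are hypothetical and the circuit computes the arbitrarily chosen polynomial `x*y + 2*x`.

```python
from math import prod


def sum_node(inputs, weights):
    """Weighted sum of the node's inputs."""
    return sum(w * v for w, v in zip(weights, inputs))


def product_node(inputs):
    """Product of the node's inputs."""
    return prod(inputs)


def toy_circuit(x, y):
    """Computes x*y + 2*x with one product node feeding one sum node."""
    xy = product_node([x, y])
    return sum_node([xy, x], [1.0, 2.0])
```

Product nodes raise the degree, sum nodes combine monomials with learned weights; stacking the two yields higher-degree polynomials.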
The referenced paper uses the polynomial expansion to study the depth efficiency of DCNNs
- CP decomposition: used to factorize the weights of a shallow convolutional network
- hierarchical Tucker decomposition: used to factorize the weights of a deep network
In contrast, this paper uses the polynomial expansion to approximate the target function
Recently there has been a surge of work achieving strong performance through multiplicative interactions

2.1 Notation

3. Method

3.1 Single polynomial

Tensor decomposition of the parameters reduces the parameter count and lends itself naturally to a neural network implementation
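One way such a factorized polynomial could map onto a network is a recursion in which each step multiplies a fresh linear projection of the input with the running feature vector. The sketch below is modelled on a coupled-CP-style recursion and should be read as an assumption, since the notes above do not spell out the model; `polynomial_net`, `factors`, `out_matrix`, and `bias` are hypothetical names.

```python
def matvec(matrix, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def polynomial_net(z, factors, out_matrix, bias):
    """Polynomial of degree len(factors) in z, with no activation functions.

    factors: list of (rank x d) matrices, one per polynomial order (assumed layout).
    out_matrix: (o x rank) matrix; bias: list of o floats.
    """
    x = matvec(factors[0], z)  # first-order features
    for factor in factors[1:]:
        projection = matvec(factor, z)
        # The Hadamard product raises the degree by one; adding x back keeps lower-order terms.
        x = [p * xi + xi for p, xi in zip(projection, x)]
    return [y + b for y, b in zip(matvec(out_matrix, x), bias)]
```

For scalar input `t` with all factors set to 1, three factors give `t**3 + 2*t**2 + t`, a degree-3 polynomial obtained purely from multiplications and additions. The learnable parameters are the per-order factors and the output matrix rather than a full `o * d**n` tensor per order.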
