What is the difference between pooling and convolution in CV?
Both seem to serve the same purpose: fusing features of different scales so that a set of feature maps carries both strong semantics and high resolution.
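
For concreteness, here is a minimal sketch of how I understand the two operations, in plain Python. The function names `max_pool2d` and `conv2d` are my own and not from any library, and the averaging kernel is just an illustrative choice. Both shrink a 4x4 map to 2x2, but max pooling applies a fixed rule with no parameters, while the convolution output depends on kernel weights that a network would normally learn.

```python
def max_pool2d(x, k=2):
    """k x k max pooling with stride k: a fixed rule, no learnable parameters."""
    rows, cols = len(x) // k, len(x[0]) // k
    return [
        [max(x[i * k + di][j * k + dj] for di in range(k) for dj in range(k))
         for j in range(cols)]
        for i in range(rows)
    ]


def conv2d(x, kernel, stride=2):
    """Valid cross-correlation; in a network the kernel weights are learned."""
    kh, kw = len(kernel), len(kernel[0])
    rows = (len(x) - kh) // stride + 1
    cols = (len(x[0]) - kw) // stride + 1
    return [
        [sum(x[i * stride + di][j * stride + dj] * kernel[di][dj]
             for di in range(kh) for dj in range(kw))
         for j in range(cols)]
        for i in range(rows)
    ]


# A toy 4x4 single-channel feature map with values 0..15.
x = [[float(4 * r + c) for c in range(4)] for r in range(4)]

pooled = max_pool2d(x)                               # keeps the largest value per window
averaged = conv2d(x, [[0.25, 0.25], [0.25, 0.25]])   # weighted sum per window

print(pooled)
print(averaged)
```

With this particular kernel the convolution behaves like 2x2 average pooling, so average pooling can be read as a convolution with fixed, non-learned weights; max pooling is nonlinear and has no such equivalent kernel.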