Vision Transformer (ViT) is a Transformer-based neural network architecture for image processing. ViT has achieved state-of-the-art performance on various computer vision tasks. This study attempts to improve the input layer of ViT by changing how positional embedding is applied. We propose ViT with pre-positional embedding, which adds constants to each pixel before the input image is divided into patches. This method assumes the following image features: vertical asymmetry, horizontal symmetry, and a distribution of similar features that extends concentrically from the center of the image. Experimental results demonstrate that the proposed method achieves the same image recognition accuracy as the conventional method with positional embedding while reducing the number of trainable parameters.
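The abstract does not give the exact constants, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It builds a hypothetical fixed offset map, `pre_positional_map`, whose values grow from the top row to the bottom row (vertically asymmetric) and with distance from the image center (left-right symmetric and constant on rings around the center). The weights `alpha` and `beta` are illustrative. The map is added to every pixel before a hypothetical `patchify` helper splits the image into flattened, non-overlapping patches. For comparison, `learned_pos_embed_params` counts the parameters of a conventional learnable ViT positional embedding, assumed here to hold one D-dimensional vector per patch plus one for the class token.

```python
import numpy as np


def pre_positional_map(height: int, width: int,
                       alpha: float = 0.1, beta: float = 0.1) -> np.ndarray:
    """Hypothetical fixed (non-learned) per-pixel offset map of shape (H, W).

    alpha and beta are illustrative weights, not values from the paper.
    """
    rows = np.arange(height, dtype=np.float64)[:, None]  # shape (H, 1)
    cols = np.arange(width, dtype=np.float64)[None, :]   # shape (1, W)
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    # Vertical term: 0 at the top row, 1 at the bottom row (vertically asymmetric).
    vertical = rows / max(height - 1, 1)
    # Radial term: 0 at the center, 1 at the corners (left-right symmetric, concentric).
    radius = np.sqrt((rows - cy) ** 2 + (cols - cx) ** 2)
    radial = radius / max(radius.max(), 1e-12)
    return alpha * vertical + beta * radial  # broadcasts to (H, W)


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened patches of shape (N, patch*patch*C)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows of patches, cols of patches, patch, patch, C)
    return x.reshape(-1, patch * patch * c)


def learned_pos_embed_params(image_size: int, patch: int, embed_dim: int,
                             cls_token: bool = True) -> int:
    """Parameter count of a conventional learnable positional embedding table."""
    n_patches = (image_size // patch) ** 2
    return (n_patches + (1 if cls_token else 0)) * embed_dim


if __name__ == "__main__":
    img = np.random.default_rng(0).random((224, 224, 3))
    pos = pre_positional_map(224, 224)
    # Offsets are added to every channel of every pixel BEFORE patching.
    tokens = patchify(img + pos[..., None], patch=16)
    print(tokens.shape)                            # (196, 768)
    print(learned_pos_embed_params(224, 16, 768))  # 151296
```

If the offsets are held fixed, the pre-positional step itself adds no trainable parameters, whereas the learned table in this example (224×224 input, 16×16 patches, D = 768) holds 151,296. The abstract does not state whether the paper's constants are fixed or learned, so treat this comparison as illustrative.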
Eguchi et al. studied this question.
www.synapsesocial.com/papers/68e6bd48b6db64358763dea6 — DOI: https://doi.org/10.1117/12.3018012
Takuro Eguchi
Yoshimitsu Kuroki
National Institute of Technology, Kurume College