Gastric Histopathology Image Classification by Transformer, GasHis-Transformer

CNN and Transformer Fusion for Gastric Histopathology Image Classification with Few False Negatives and False Positives

Makoto TAKAMATSU
Geek Culture

--

GasHis-Transformer is a model for gastric histopathological image classification (GHIC): it automatically classifies microscopic images of the stomach as normal or abnormal for gastric cancer diagnosis, as shown in the figure.

Points

GasHis-Transformer is a multi-scale image classification model that combines the strengths of the Vision Transformer (ViT), which excels at global information, and the CNN, which excels at local information.

GasHis-Transformer consists of two important modules, the Global Information Module (GIM) and the Local Information Module (LIM), as shown in the figure below.

GasHis-Transformer achieves high classification performance on the test data of the gastric histopathology dataset, with estimated precision, recall, F1-score, and accuracy of 98.0%, 100.0%, 96.0%, and 98.0%, respectively.

Implementation

GasHis-Transformer consists of two modules: the Global Information Module (GIM) and the Local Information Module (LIM). The GIM is based on BoTNet-50 and extracts global information from gastric histopathological images. The LIM is based on the parallel structure of Inception-V3 and extracts local information from the images in a multi-scale manner. The input size of the LIM (Inception-V3) is changed from 299x299 to 224x224 so that its feature maps match those of the GIM. The GIM and the LIM each output 2048-dimensional features, which are concatenated into a 4096-dimensional feature vector and passed through an FC layer and Softmax to diagnose gastric cancer.
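As a rough illustration of this fusion, here is a minimal PyTorch sketch. It is not the authors' code: torchvision has no BoTNet-50, so a plain ResNet-50 stands in for the GIM backbone (a real implementation would swap in BoTNet-50), the LIM uses torchvision's Inception-V3 with its classifier head removed, and the class name GasHisTransformerSketch and the two-class head are my own choices.

```python
import torch
import torch.nn as nn
from torchvision import models


class GasHisTransformerSketch(nn.Module):
    """Minimal sketch of the GIM + LIM fusion described above.

    NOTE: torchvision has no BoTNet-50, so a plain ResNet-50 stands in for the
    Global Information Module here; a real implementation would replace it with
    BoTNet-50 (ResNet-50 with multi-head self-attention in its last stage).
    """

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # GIM: BoTNet-50 in the paper; ResNet-50 used as a placeholder backbone.
        gim = models.resnet50(weights=None)
        gim.fc = nn.Identity()          # expose the 2048-d pooled features
        self.gim = gim

        # LIM: Inception-V3, fed 224x224 inputs to match the GIM input size.
        lim = models.inception_v3(weights=None, aux_logits=False)
        lim.fc = nn.Identity()          # expose the 2048-d pooled features
        self.lim = lim

        # Fusion head: concatenated 4096-d vector -> FC -> class logits.
        self.classifier = nn.Linear(2048 + 2048, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gim(x)                     # (N, 2048) global features
        l = self.lim(x)                     # (N, 2048) multi-scale local features
        fused = torch.cat([g, l], dim=1)    # (N, 4096) fused features
        return self.classifier(fused)       # Softmax is applied by the loss / at inference


model = GasHisTransformerSketch()
logits = model(torch.randn(2, 3, 224, 224))   # both branches share the 224x224 input
probs = logits.softmax(dim=1)                 # normal vs. abnormal probabilities
print(probs.shape)                            # torch.Size([2, 2])
```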

Experimental Results

From the experimental results, it is very interesting that the authors successfully exploit the strengths of two different mechanisms: Vision Transformer (ViT)-style self-attention (BoTNet-50, a ResNet-50 with attention in its final stage), which is strong at global information, and the CNN (Inception-V3), which is strong at local information.
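To see how a convolutional backbone gains this global view, the sketch below shows the core idea behind a BoTNet-style block as used in the GIM: the 3x3 convolution of a ResNet bottleneck is replaced by multi-head self-attention over all spatial positions, so every pixel can attend to every other pixel. This is my own illustrative sketch, not the authors' implementation; BoTNet additionally uses relative position encodings, which are omitted here, and the channel sizes are arbitrary.

```python
import torch
import torch.nn as nn


class BoTBlockSketch(nn.Module):
    """Illustrative bottleneck block in the spirit of BoTNet: the 3x3 convolution
    of a ResNet bottleneck is replaced by multi-head self-attention over all
    spatial positions (global receptive field). Relative position encodings,
    used in the real BoTNet, are omitted here for brevity."""

    def __init__(self, channels: int = 512, heads: int = 4):
        super().__init__()
        self.reduce = nn.Conv2d(channels, channels // 4, kernel_size=1)
        self.mhsa = nn.MultiheadAttention(channels // 4, heads, batch_first=True)
        self.expand = nn.Conv2d(channels // 4, channels, kernel_size=1)
        self.norm = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        y = self.reduce(x)                              # 1x1 conv, reduce channels
        seq = y.flatten(2).transpose(1, 2)              # (N, H*W, C/4): each pixel is a token
        attn, _ = self.mhsa(seq, seq, seq)              # global self-attention over positions
        y = attn.transpose(1, 2).reshape(n, -1, h, w)   # back to a feature map
        y = self.expand(y)                              # 1x1 conv, restore channels
        return torch.relu(self.norm(y) + x)             # residual connection


block = BoTBlockSketch()
out = block(torch.randn(1, 512, 14, 14))
print(out.shape)   # torch.Size([1, 512, 14, 14])
```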

In addition, the confusion matrices below show that the convergence and generalization ability of GasHis-Transformer are very high, sufficient for practical use. In the validation set, 206 abnormal and 209 normal images were classified correctly, 4 abnormal images were misclassified as normal (false positives), and 1 normal image was misclassified as abnormal (false negative). In the test set, 403 abnormal and 420 normal images were classified correctly; only 17 abnormal images were misclassified as normal (false positives), and there were no false negatives.
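As a quick sanity check, the 98.0% test accuracy quoted earlier can be recovered directly from these counts. The snippet below only recomputes accuracy, since precision and recall depend on which class is treated as positive.

```python
# Recompute test-set accuracy from the confusion-matrix counts quoted above.
correct_abnormal, correct_normal = 403, 420   # correctly classified test images
wrong_abnormal, wrong_normal = 17, 0          # abnormal->normal and normal->abnormal errors

total = correct_abnormal + correct_normal + wrong_abnormal + wrong_normal   # 840 test images
accuracy = (correct_abnormal + correct_normal) / total                      # 823 / 840
print(f"test accuracy = {accuracy:.1%}")                                    # 98.0%
```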

References

[GasHis-Transformer] Chen, H., Li, C., Li, X., Wang, G., Hu, W., Li, Y., Liu, W., Sun, C., Yao, Y., Teng, Y., Grzegorzek, M., 2021. GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathology Image Classification. arXiv preprint arXiv:2104.14528.

[BoTNet] Srinivas, A., Lin, T.Y., Parmar, N., Shlens, J., Abbeel, P., Vaswani, A., 2021. Bottleneck transformers for visual recognition. arXiv preprint arXiv:2101.11605.

[Inception-V3] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826.

[ViT] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 .

--

Makoto TAKAMATSU
Geek Culture

Medical AI Engineer. My main interest is improving the efficiency of medical diagnosis. My LinkedIn: https://www.linkedin.com/in/makoto-takamatsu-8b1884141/