the large variation in poses, backgrounds, illumination, camera conditions, and view angles
Because body parts such as faces, hands, and feet are unstable as the view angle changes, the CNN tends to focus on the main body part, while the other discriminative body parts, i.e., some attentive local features, are consequently suppressed.
Pose-based works seek to localize different body parts and align their associated features.
Part-based works use coarse partitions or attention selection networks to improve feature learning.
Pose-based networks usually require additional body pose or segment information.
These networks are designed around specific partition mechanisms, such as a horizontal partition, which fits person re-ID but is hard to generalize to other metric learning tasks.
The global branch encodes the global feature representations while the feature dropping branch learns local detailed features.
Specifically, Batch DropBlock randomly drops the same region of all the feature maps, namely the same semantic body parts, in a batch during training and reinforces the attentive feature learning of local regions, i.e., the remaining parts.
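The batch-level dropping described above can be sketched as follows; this is a minimal NumPy illustration (the hyper-parameter names and default ratios are illustrative, not the paper's exact values), where one block position is sampled per batch and applied to every feature map:

```python
import numpy as np

def batch_drop_block(x, h_ratio=0.3, w_ratio=1.0, rng=None):
    """Zero the same randomly chosen region on every feature map in the
    batch. x has shape (N, C, H, W). With w_ratio=1.0 a full-width
    horizontal stripe is dropped, matching the re-ID setting where the
    stripe covers the same semantic body part across the batch."""
    rng = rng or np.random.default_rng()
    n, c, h, w = x.shape
    rh, rw = round(h_ratio * h), round(w_ratio * w)
    # One top-left corner is sampled per batch, not per image.
    sy = int(rng.integers(0, h - rh + 1))
    sx = int(rng.integers(0, w - rw + 1))
    out = x.copy()
    out[:, :, sy:sy + rh, sx:sx + rw] = 0.0
    return out
```

During training this layer is applied only on the dropping branch, so every image in the batch loses the same body region and the remaining parts must carry the discrimination.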
Batch DropBlock can also be adopted in different metric learning schemes.
Metric Learning Problem:
aims to find embedding features for input images in order to measure their semantic similarity
common losses include contrastive loss, triplet loss, lifted structure loss, quadruplet loss, and histogram loss
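Of the losses above, the triplet loss is the one the BDB Network relies on; a minimal sketch on single embedding vectors (the margin value 0.3 is a common choice, not one fixed by these notes):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss on embedding vectors: push the
    anchor-positive distance below the anchor-negative distance by at
    least `margin`, otherwise incur a linear penalty."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)
```

The loss is zero once the negative is sufficiently farther from the anchor than the positive, which is exactly the semantic-similarity ordering a metric learning embedding should encode.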
Global max pooling (GMP) encourages the network to identify comparatively weak salient features after the most discriminative part is dropped, i.e., it encourages the network to strengthen the weak features.
Without the stack of convolution layers appended after feature map T, the global average pooling layer and the global max pooling layer would be applied simultaneously to T, which makes the network hard to converge.
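The two pooling heads can be sketched as below; a NumPy illustration only, where a fixed top-third stripe stands in for the randomly dropped block and no convolution stack is modeled (function and variable names are assumptions, not the paper's):

```python
import numpy as np

def two_branch_pool(t):
    """Sketch of the two heads on a shared feature map t of shape
    (N, C, H, W): the global branch applies global average pooling
    (GAP), while the dropping branch zeroes a stripe and applies
    global max pooling (GMP) to the remaining activations."""
    # Global branch: GAP over the spatial dimensions.
    gap = t.mean(axis=(2, 3))
    # Dropping branch: zero a horizontal stripe (fixed at the top
    # third here for illustration), then GMP.
    dropped = t.copy()
    dropped[:, :, : t.shape[2] // 3, :] = 0.0
    gmp = dropped.max(axis=(2, 3))
    return gap, gmp
```

Both heads yield one vector per channel; concatenating them gives the combined global-plus-local representation the two branches are designed to learn.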
The Batch DropBlock Layer is parameter-free and does not increase the network size.
The Batch DropBlock Layer can be easily adopted in other metric learning tasks beyond person re-ID.
The triplet loss is a vital part of the BDB Network, since the Batch DropBlock layer takes effect only when relationships between images are considered.
The Batch DropBlock hyper-parameters are tunable without changing the network structure, allowing adaptation to different tasks such as image retrieval.
BDB Network requires input images to be roughly aligned.
BDB Network can still benefit from image erasing data augmentation methods such as Cutout and Random Erasing, because the BDB Network contains a global branch that sees the complete feature map.
Comparison with DropBlock:
Batch DropBlock is an attentive feature learning module for metric learning tasks while DropBlock is a regularization method for classification tasks.
Batch DropBlock drops the same block for the whole batch of images during a single iteration, while DropBlock erases randomly located regions independently for each image.
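The mask-sampling difference can be made concrete with a small illustration (the function and its parameters are hypothetical helpers, not from either paper): with `shared=True` one Batch DropBlock region is reused across the batch, while `shared=False` mimics DropBlock's per-image random regions.

```python
import numpy as np

def drop_masks(n, h, w, rh, rw, shared, rng=None):
    """Generate n binary masks of shape (h, w), each zeroing one
    rh-by-rw block. shared=True samples the block position once for
    the whole batch; shared=False resamples it for every image."""
    rng = rng or np.random.default_rng(0)
    masks = np.ones((n, h, w))
    sy = sx = 0
    for i in range(n):
        if not shared or i == 0:
            sy = int(rng.integers(0, h - rh + 1))
            sx = int(rng.integers(0, w - rw + 1))
        masks[i, sy:sy + rh, sx:sx + rw] = 0.0
    return masks
```

In the shared case every image loses the same semantic region, which is what forces the batch-wide attentive learning of the remaining parts; per-image masks instead act as a generic regularizer.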
 Terrance DeVries and Graham W. Taylor. Improved regularization of convolutional neural networks with Cutout. arXiv preprint arXiv:1708.04552, 2017. [link]
 Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V. Le. DropBlock: A regularization method for convolutional networks. arXiv preprint arXiv:1810.12890, 2018. [link]