AlphaFold3 protein_dataset module: ProteinDataset
The _patch method of the ProteinDataset class crops the protein data around anchor residues and extracts a local patch to use as model input.
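To make the cropping idea concrete, here is a minimal, dependency-free sketch of the selection step: compute each residue's distance to its nearest anchor, push residues with missing coordinates to infinity, and keep the closest k. The function name select_patch, its parameters, and the toy coordinates are hypothetical and are not part of the module; the method itself does this with torch.cdist and torch.topk on tensors.

```python
import math


def select_patch(coords, valid, anchors, patch_size):
    """Return indices of the residues closest to any anchor, nearest first.

    coords     : list of (x, y, z) tuples, one per residue (illustrative positions)
    valid      : list of bools, False for residues with missing coordinates
    anchors    : list of residue indices used as anchor points
    patch_size : maximum number of residues to keep
    """
    dist = []
    for i, xyz in enumerate(coords):
        if not valid[i]:
            # Missing residues get an infinite distance so they sort last.
            dist.append(math.inf)
        else:
            # Distance to the nearest anchor, analogous to cdist(...).min(dim=1).
            dist.append(min(math.dist(xyz, coords[a]) for a in anchors))
    # Never ask for more residues than exist.
    k = min(patch_size, len(coords))
    # Ascending sort by distance plays the role of topk(largest=False, sorted=True).
    return sorted(range(len(coords)), key=lambda i: dist[i])[:k]


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (1.5, 0.0, 0.0), (9.0, 0.0, 0.0)]
valid = [True, True, True, False, True]
patch = select_patch(coords, valid, anchors=[0], patch_size=3)
print(patch)  # [0, 1, 2]
```

Residue 3 lies close to the anchor but is excluded because it is marked invalid, which mirrors how the method sets the distance of unmasked-out residues to +inf before taking the top k.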
Source code:
def _patch(self, data):
    """Cut the data around the anchor residues."""
    # adapted from diffab
    pos_alpha = data["X"][:, 2]
    if self.mask_whole_chains:
        mask_ = (data["mask"] * data["masked_res"]).bool()
        anchor_points = pos_alpha[mask_].mean(0).unsqueeze(0)
        anchor_ind = []
    else:
        anchor_ind = self.get_anchor_ind(data["masked_res"], data["mask"])
        anchor_points = torch.stack([pos_alpha[ind] for ind in anchor_ind], dim=0)
    dist_anchor = torch.cdist(pos_alpha, anchor_points, p=2).min(dim=1)[0]  # (L, )
    dist_anchor[~data["mask"].bool()] = float("+inf")
    initial_patch_idx = torch.topk(
        dist_anchor,
        k=min(self.initial_patch_size, dist_anchor.size(0)),
        largest=False,
        sorted=True,
    )[
        1
    ]  # (initial_patch_size, )
    patch_mask = data["masked_res"].bool().clone()
    patch_mask[[int(x) for x in anchor_ind]] = True
    patch_mask[initial_patch_idx] = True
    if self.sabdab:
        antibody_mask = self._get_antibody_mask(data)
        antigen_mask = ~antibody_mask
        dist_anchor_antigen = dist_anchor.masked_fill(