Refactor patch and pos embed resampling based on feedback from https://github.com/stas-sl #2518

rwightman · 2025-06-13T21:17:33Z

Refactoring patch embed resampling, and adding grid sampling pos embed alternative...

HuggingFaceDocBuilderDev · 2025-06-13T21:19:32Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


stas-sl · 2025-06-18T21:17:34Z

timm/models/naflexvit.py

```diff
+            padding_mode='border',
+        ).to(dtype=x.dtype)  # (B, C, H_out, W_out)
+
+        # NOTE if we bring in patch_valid, can explicitly mask padding tokens
```


Not sure I’m fully getting this part of the code, but if the goal is to explicitly mask padded tokens, can’t you just write it like this:

```python
x += pos_embed[bi, :, patch_coord[..., 0], patch_coord[..., 1]] * patch_valid[..., None]
```

Though, IMHO it shouldn’t really make a difference, as these tokens will be masked in the attention layers anyway.
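Here is a toy, self-contained version of that one-liner. The wrapper name `add_pos_embed_masked` is hypothetical. It assumes `pos_embed` is already the per-sample resampled `(B, C, H_out, W_out)` tensor and `patch_valid` is a boolean `(B, N)` mask. It mainly shows why the gather comes out as `(B, N, C)` with no transpose needed.

```python
import torch

def add_pos_embed_masked(x, pos_embed, patch_coord, patch_valid):
    """Add per-token pos embeds gathered from a resampled grid, zeroing padding tokens.

    x:           (B, N, C) patch tokens
    pos_embed:   (B, C, H_out, W_out) per-sample resampled pos embed (assumed)
    patch_coord: (B, N, 2) long tensor of (row, col) per token
    patch_valid: (B, N) bool, False for padding tokens
    """
    B = x.shape[0]
    bi = torch.arange(B, device=x.device)[:, None]  # (B, 1), broadcasts against (B, N)
    # The advanced indices on dims 0, 2, 3 are separated by a slice, so the broadcast
    # index dims (B, N) go first and the sliced C dim goes last -> (B, N, C).
    pe = pos_embed[bi, :, patch_coord[..., 0], patch_coord[..., 1]]
    return x + pe * patch_valid[..., None].to(pe.dtype)


B, N, C, H, W = 2, 6, 4, 2, 3
x = torch.zeros(B, N, C)
pos_embed = torch.randn(B, C, H, W)
rows, cols = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
coord = torch.stack([rows.flatten(), cols.flatten()], dim=-1)  # (6, 2), row-major
patch_coord = coord[None].expand(B, -1, -1).clone()
patch_valid = torch.ones(B, N, dtype=torch.bool)
patch_valid[1, 4:] = False  # last two tokens of sample 1 act as padding
out = add_pos_embed_masked(x, pos_embed, patch_coord, patch_valid)
```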

rwightman · 2025-06-18

Yes, it could be done that way. I was also going to explore whether the different indexing approach had any impact on throughput (in combination with whether the masking has any impact). The masking shouldn't make a difference due to the attention and pooling masks...

I doubt there is much point in keeping the more complicated indexing or the masking. Validation didn't show any noteworthy throughput difference, but I thought I might check at least one training run.
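A small sketch of why the extra masking is redundant, assuming a key-padding mask in attention and a masked mean pool (single head, no projections; the function name is hypothetical and this is not the naflexvit attention code). Perturbing the padding tokens, for example by adding or not adding a pos embed to them, leaves the valid tokens' outputs and the pooled result unchanged.

```python
import torch
import torch.nn.functional as F

def masked_attn_and_pool(x, patch_valid):
    """Single-head self-attention with a key padding mask, then masked mean pooling.

    x: (B, N, C); patch_valid: (B, N) bool, True for real tokens.
    """
    q = k = v = x[:, None]  # (B, 1, N, C): add a head dim
    # Boolean attn_mask: True means "may attend". Shape (B, 1, 1, N) masks keys only.
    attn_mask = patch_valid[:, None, None, :]
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)[:, 0]  # (B, N, C)
    w = patch_valid[..., None].to(out.dtype)
    pooled = (out * w).sum(1) / w.sum(1).clamp(min=1)  # (B, C), padding excluded
    return out, pooled


torch.manual_seed(0)
x = torch.randn(2, 5, 8)
valid = torch.tensor([[True] * 5, [True, True, True, False, False]])
out_a, pool_a = masked_attn_and_pool(x, valid)
x_b = x.clone()
x_b[1, 3:] += 100.0  # change only the padding tokens of sample 1
out_b, pool_b = masked_attn_and_pool(x_b, valid)
```

Padding queries still produce (different) outputs, but the key mask keeps them out of every valid token's attention and the pooling weights drop them.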

stas-sl · 2025-06-18T22:36:53Z

Would you also be interested in a similar implementation using grid_sample with factorized embeddings? No pressure at all, I just thought it might be nice for completeness, since you already have this approach for the learned embeddings. In fact, for 1D interpolation there is no antialias=True, so the results of F.interpolate and F.grid_sample are numerically very close.

It might look like this:

```python
import torch
import torch.nn.functional as F

def _apply_factorized_naflex_pos_embed_grid_sample(self, x, patch_coord):
    # x: (B, N, C) tokens; patch_coord: (B, N, 2) long tensor of (row, col) per token
    B, _, C = x.shape
    shapes = patch_coord.amax(dim=1) + 1  # (B, 2) per-sample grid (h, w)
    # (1, L, C) -> (B, C, 1, L): treat each 1D table as a height-1 image
    pe_x = self.pos_embed_x.permute(0, 2, 1).unsqueeze(2).expand(B, -1, -1, -1)
    pe_y = self.pos_embed_y.permute(0, 2, 1).unsqueeze(2).expand(B, -1, -1, -1)
    grid_size = shapes.amax(0).tolist()  # Python ints [h_max, w_max] for affine_grid sizes
    # Per-sample scale/shift so the first w (or h) outputs span the whole table resized to w (h)
    theta_x = torch.zeros(B, 2, 3, device=x.device, dtype=pe_x.dtype)
    theta_x[:, 0, 0] = grid_size[1] / shapes[:, 1]
    theta_x[:, 0, 2] = theta_x[:, 0, 0] - 1
    theta_x[:, 1, 1] = 1
    theta_x[:, 1, 2] = 0
    theta_y = torch.zeros(B, 2, 3, device=x.device, dtype=pe_y.dtype)
    theta_y[:, 0, 0] = grid_size[0] / shapes[:, 0]
    theta_y[:, 0, 2] = theta_y[:, 0, 0] - 1
    theta_y[:, 1, 1] = 1
    theta_y[:, 1, 2] = 0
    grid_x = F.affine_grid(theta_x, [B, C, 1, grid_size[1]], align_corners=False)
    grid_y = F.affine_grid(theta_y, [B, C, 1, grid_size[0]], align_corners=False)
    pe_x = F.grid_sample(pe_x, grid_x, mode=self.pos_embed_interp_mode, align_corners=False, padding_mode='border')
    pe_y = F.grid_sample(pe_y, grid_y, mode=self.pos_embed_interp_mode, align_corners=False, padding_mode='border')
    # Gather per-token embeds, each (B, N, C): col indexes pe_x, row indexes pe_y
    bi = torch.arange(B, device=x.device)[:, None]
    x += pe_x[bi, :, 0, patch_coord[..., 1]] + pe_y[bi, :, 0, patch_coord[..., 0]]
```
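A standalone sketch for sanity-checking the affine_grid math above. The function and helper names are hypothetical (not timm API), and the mode is fixed to 'bilinear'. For each sample, the first h rows and w cols of the batched grid_sample output should match a plain 1D linear F.interpolate of each table to (h, w). That also illustrates the point that the 1D results are numerically very close.

```python
import torch
import torch.nn.functional as F

def factorized_pe_grid_sample(pos_embed_y, pos_embed_x, patch_coord):
    """Batched factorized pos embed lookup via affine_grid + grid_sample.

    pos_embed_y: (1, H_pe, C), pos_embed_x: (1, W_pe, C)
    patch_coord: (B, N, 2) long tensor of (row, col); padding tokens may reuse (0, 0).
    Returns per-token pos embeds of shape (B, N, C).
    """
    B, C = patch_coord.shape[0], pos_embed_x.shape[-1]
    shapes = patch_coord.amax(dim=1) + 1  # (B, 2) per-sample (h, w)
    h_max, w_max = shapes.amax(0).tolist()
    resampled = []
    for pe, size, n_max in ((pos_embed_y, shapes[:, 0], h_max), (pos_embed_x, shapes[:, 1], w_max)):
        img = pe.permute(0, 2, 1).unsqueeze(2).expand(B, -1, -1, -1)  # (B, C, 1, L)
        theta = torch.zeros(B, 2, 3, dtype=pe.dtype, device=pe.device)
        theta[:, 0, 0] = n_max / size  # stretch so the first `size` outputs span the whole table
        theta[:, 0, 2] = theta[:, 0, 0] - 1  # shift so output index 0 lines up with the table start
        theta[:, 1, 1] = 1
        grid = F.affine_grid(theta, [B, C, 1, n_max], align_corners=False)
        resampled.append(F.grid_sample(img, grid, mode='bilinear', padding_mode='border', align_corners=False))
    pe_y, pe_x = resampled  # (B, C, 1, h_max), (B, C, 1, w_max)
    bi = torch.arange(B, device=patch_coord.device)[:, None]
    return pe_x[bi, :, 0, patch_coord[..., 1]] + pe_y[bi, :, 0, patch_coord[..., 0]]


def grid_coords(h, w, n):
    # Row-major (row, col) coords for an h x w grid, padded to n tokens with (0, 0).
    r, c = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    coord = torch.stack([r.flatten(), c.flatten()], dim=-1)
    return torch.cat([coord, coord[:1].expand(n - h * w, 2)])


def reference(h, w):
    # Plain 1D linear F.interpolate per table, then broadcast-sum rows + cols (row-major).
    py = F.interpolate(pos_embed_y.transpose(1, 2), size=h, mode='linear', align_corners=False)[0].T
    px = F.interpolate(pos_embed_x.transpose(1, 2), size=w, mode='linear', align_corners=False)[0].T
    return (py[:, None] + px[None]).reshape(h * w, -1)


torch.manual_seed(0)
pos_embed_y, pos_embed_x = torch.randn(1, 8, 4), torch.randn(1, 8, 4)
coords = torch.stack([grid_coords(3, 5, 20), grid_coords(4, 2, 20)])  # (2, 20, 2)
pe = factorized_pe_grid_sample(pos_embed_y, pos_embed_x, coords)
```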

rwightman · 2025-06-19T00:00:06Z

@stas-sl yeah, thanks! I was going to look at the 1d factorized impl, as I figured it wouldn't be much extra work once the 2d one is working well.
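For the simplest fixed-grid case, where every image in the batch shares one (H, W) patch grid, a factorized embed only needs each 1D table resampled once and then broadcast-summed. This is a minimal sketch that assumes `pos_embed_y`/`pos_embed_x` tables of shape (1, L, C) and linear interpolation. The function name is hypothetical.

```python
import torch
import torch.nn.functional as F

def apply_factorized_pos_embed_fixed(x, pos_embed_y, pos_embed_x, grid_size):
    """Add a factorized pos embed to tokens that all share one (H, W) grid.

    x: (B, H*W, C) tokens in row-major order
    pos_embed_y: (1, H_pe, C), pos_embed_x: (1, W_pe, C)
    """
    H, W = grid_size
    # Resample each 1D table once for the whole batch: (1, C, L) -> (1, C, H) / (1, C, W).
    py = F.interpolate(pos_embed_y.transpose(1, 2), size=H, mode='linear', align_corners=False)
    px = F.interpolate(pos_embed_x.transpose(1, 2), size=W, mode='linear', align_corners=False)
    # Broadcast-sum rows and cols into an (H, W, C) grid, flatten row-major to (H*W, C).
    pe = (py[0].T[:, None, :] + px[0].T[None, :, :]).reshape(H * W, -1)
    return x + pe.unsqueeze(0)


torch.manual_seed(0)
pos_embed_y, pos_embed_x = torch.randn(1, 6, 4), torch.randn(1, 6, 4)
out = apply_factorized_pos_embed_fixed(torch.zeros(2, 12, 4), pos_embed_y, pos_embed_x, (3, 4))
# With the grid equal to the table size, the resample is an identity.
same = apply_factorized_pos_embed_fixed(torch.zeros(1, 36, 4), pos_embed_y, pos_embed_x, (6, 6))
```

The factorized form stores H_pe + W_pe vectors instead of H_pe * W_pe, and each token's embed is simply pe_y[row] + pe_x[col].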

And yeah, there should be less of a difference. Though worth noting, flipping between grid_sample and interpolate didn't make much difference (especially with 'border' padding) for the models I had pretrained in PyTorch with the 2d learned pos embed code while testing the naflexvit impl. I need to revisit the siglip-2 weights.
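A quick way to see this for a 2D learned embed, as a sketch under simple assumptions: one (1, C, H, W) table, bilinear mode, and hypothetical helper names. Here, grid_sample over an identity affine grid with align_corners=False and 'border' padding matches F.interpolate with antialias=False for both upsampling and downsampling. A visible difference only appears once antialias=True is used for downsampling.

```python
import torch
import torch.nn.functional as F

def resample_interpolate(pe, size, antialias=False):
    # pe: (1, C, H, W) learned 2D pos embed
    return F.interpolate(pe, size=size, mode='bilinear', align_corners=False, antialias=antialias)

def resample_grid_sample(pe, size):
    # Identity affine transform -> sampling grid at the output pixel centers.
    theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]], dtype=pe.dtype)
    grid = F.affine_grid(theta, [1, pe.shape[1], *size], align_corners=False)
    return F.grid_sample(pe, grid, mode='bilinear', padding_mode='border', align_corners=False)


torch.manual_seed(0)
pe = torch.randn(1, 8, 16, 16)
up_i, up_g = resample_interpolate(pe, (24, 24)), resample_grid_sample(pe, (24, 24))
down_i, down_g = resample_interpolate(pe, (5, 5)), resample_grid_sample(pe, (5, 5))
down_aa = resample_interpolate(pe, (5, 5), antialias=True)
```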


rwightman · 2025-06-19T17:50:04Z

@stas-sl okay, had some time to go through and clean this up a bit. Added the 1d factorized + grid_sample, and also the missing fixed grid for factorized. Made the interface more consistent, and there's no more calculation of grid size arrays when using grid_sample (removes a graph break in compiled mode).
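One way to skip building a padded grid altogether is to sample the pos embed directly at each token's coordinate, keeping every size as a tensor. This avoids `.tolist()`/`.item()`, which are the kind of data-dependent Python-int conversions that tend to cause graph breaks under torch.compile. This is a sketch under assumptions (learned 2D table of shape (1, C, H, W), bilinear mode, hypothetical function name), not necessarily how the PR implements it.

```python
import torch
import torch.nn.functional as F

def sample_pos_embed_at_coords(pos_embed, patch_coord):
    """Resample a learned 2D pos embed directly at each token's (row, col).

    pos_embed: (1, C, H_pe, W_pe); patch_coord: (B, N, 2) long (row, col).
    Returns (B, N, C). Per-sample sizes stay tensors; no Python ints are extracted.
    """
    B = patch_coord.shape[0]
    shapes = (patch_coord.amax(dim=1) + 1).to(pos_embed.dtype)  # (B, 2) per-sample (h, w)
    rc = patch_coord.to(pos_embed.dtype)
    # Pixel centers of an (h, w) output grid in normalized [-1, 1] coords (align_corners=False).
    y = (2 * rc[..., 0] + 1) / shapes[:, None, 0] - 1
    x = (2 * rc[..., 1] + 1) / shapes[:, None, 1] - 1
    grid = torch.stack([x, y], dim=-1).unsqueeze(1)  # (B, 1, N, 2); grid_sample expects (x, y)
    pe = pos_embed.expand(B, -1, -1, -1)
    out = F.grid_sample(pe, grid, mode='bilinear', padding_mode='border', align_corners=False)
    return out[:, :, 0].transpose(1, 2)  # (B, C, 1, N) -> (B, N, C)


def grid_coords(h, w, n):
    # Row-major (row, col) coords for an h x w grid, padded to n tokens with (0, 0).
    r, c = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    coord = torch.stack([r.flatten(), c.flatten()], dim=-1)
    return torch.cat([coord, coord[:1].expand(n - h * w, 2)])


torch.manual_seed(0)
pos_embed = torch.randn(1, 4, 8, 8)
coords = torch.stack([grid_coords(3, 5, 20), grid_coords(4, 4, 20)])  # (2, 20, 2)
pe = sample_pos_embed_at_coords(pos_embed, coords)
# Reference: resize the whole table per sample, then flatten row-major to (h*w, C).
ref0 = F.interpolate(pos_embed, size=(3, 5), mode='bilinear', align_corners=False).flatten(2).transpose(1, 2)[0]
ref1 = F.interpolate(pos_embed, size=(4, 4), mode='bilinear', align_corners=False).flatten(2).transpose(1, 2)[0]
```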

stas-sl · 2025-06-19T19:22:14Z

@rwightman, awesome! Thanks for taking the time - everything looks good to me now 👍

Refactor patch resampling based on feedback from https://github.com/stas-sl

4471cad

rwightman mentioned this pull request Jun 13, 2025

Naflex performance #2514

Closed

rwightman added 4 commits June 17, 2025 14:31

Add grid_sample pos_embed interpolation option

98c6a4a

Slight tweak

c2ba04c

Fix silly shape bug, and fix issue with pad_sequence when none of the shapes in batch use full seq len.

4e3cba8

Fix up grid_sample, did not make sense to rebuild patch coords, duh

ab0c06c
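Some context on the pad_sequence part of 4e3cba8: `torch.nn.utils.rnn.pad_sequence` pads only to the longest sequence in the list. When no sample in a batch uses the full sequence length, the result is therefore shorter than the model's max sequence length. Below is a hedged sketch of one way to pad up to a fixed length. The helper name is hypothetical, and this is not necessarily the fix made in that commit.

```python
import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence

def pad_to_max_len(seqs, max_seq_len, padding_value=0.0):
    """Pad a list of (L_i, C) tensors to a (B, max_seq_len, C) batch.

    pad_sequence alone only pads to max(L_i), which is shorter than max_seq_len
    whenever no sample in the batch uses the full sequence length.
    """
    batch = pad_sequence(seqs, batch_first=True, padding_value=padding_value)  # (B, max(L_i), C)
    extra = max_seq_len - batch.shape[1]
    if extra > 0:
        # F.pad pads the last dim first: (C_left, C_right, L_left, L_right)
        batch = F.pad(batch, (0, 0, 0, extra), value=padding_value)
    return batch


seqs = [torch.ones(3, 2), torch.ones(5, 2)]
short = pad_sequence(seqs, batch_first=True)  # only padded to the longest sequence (5)
out = pad_to_max_len(seqs, max_seq_len=8)
```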

stas-sl reviewed Jun 18, 2025


Refactor and cleanup NaFlex pos embed methods. Make interface more consistent, add factorized grid_sample and fixed grid size methods.

41058e9

rwightman merged commit 996c149 into main Jun 19, 2025
26 checks passed
