Question

Why does PyTorch's grid_sample return an incorrect value for an exact integer point?

Answer and Explanation

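The apparently incorrect value that PyTorch's grid_sample returns at an exact integer point usually comes from a mismatch between the pixel coordinates you have in mind and the normalized coordinates the function actually receives, combined with bilinear interpolation. grid_sample never performs a direct pixel lookup: it converts each normalized coordinate to a fractional pixel position and interpolates between the neighboring pixels. When that position lands exactly on a pixel center, bilinear interpolation returns that pixel's value; when the conversion is off by a fraction of a pixel, the neighbors are blended in.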

Here's a breakdown of why this happens:

1. Normalization and Pixel Boundaries:

- PyTorch's grid_sample expects grid coordinates normalized to the range [-1, 1]. What -1 and 1 refer to depends on the align_corners argument. With align_corners=True, -1 and 1 are the centers of the corner pixels, so for an input of height H and width W the pixel (0, 0) sits at (-1, -1) and the pixel (W-1, H-1) sits at (1, 1). With align_corners=False, which is the default, -1 and 1 are the outer edges of the corner pixels, so the center of the top-left pixel is at (-1 + 1/W, -1 + 1/H), half a pixel inside (-1, -1). A grid built with one convention and sampled with the other lands between pixel centers.
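The two conventions can be written down as a pair of conversion helpers. This is a minimal sketch in plain Python; the names normalize and unnormalize are hypothetical, not part of PyTorch, and the sketch assumes size > 1.

```python
def normalize(pixel: float, size: int, align_corners: bool = False) -> float:
    """Map a pixel index (0 .. size-1) to a coordinate in [-1, 1]."""
    if align_corners:
        # -1 and 1 are the centers of the first and last pixels.
        return 2.0 * pixel / (size - 1) - 1.0
    # -1 and 1 are the outer edges of the first and last pixels.
    return (2.0 * pixel + 1.0) / size - 1.0


def unnormalize(coord: float, size: int, align_corners: bool = False) -> float:
    """Map a coordinate in [-1, 1] back to a (possibly fractional) pixel index."""
    if align_corners:
        return (coord + 1.0) / 2.0 * (size - 1)
    return ((coord + 1.0) * size - 1.0) / 2.0


width = 4
for px in range(width):
    # Print the normalized coordinate of each pixel center under both conventions.
    print(px, normalize(px, width, True), normalize(px, width, False))
```

Round-tripping a pixel index through normalize and unnormalize with the same align_corners value returns the index; mixing the two values returns a fractional position, which is where the interpolation comes from.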

2. Bilinear Interpolation:

- With bilinear interpolation, grid_sample computes each output value as a weighted sum of the four input pixels nearest to the sampling position, with weights set by the distance to each of them. If the position coincides with a pixel center, one weight is 1 and the other three are 0, so that pixel's value is returned unchanged. If the normalized coordinate you supplied does not map to a pixel center, even though you computed it from an integer pixel index, the result is a blend of the neighbors.
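To make the weighting concrete, here is a small pure-Python model of bilinear sampling at a fractional pixel position. It is an illustrative sketch, not PyTorch's implementation; the function name bilinear_sample is hypothetical, and out-of-range neighbors are assumed to contribute zero.

```python
import math


def bilinear_sample(img, x, y):
    """Interpolate img (a list of rows) at fractional pixel position (x, y)."""
    height, width = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional offsets in [0, 1)

    def at(ix, iy):
        # Neighbors outside the image are treated as zero.
        inside = 0 <= ix < width and 0 <= iy < height
        return img[iy][ix] if inside else 0.0

    return (
        (1 - fx) * (1 - fy) * at(x0, y0)
        + fx * (1 - fy) * at(x0 + 1, y0)
        + (1 - fx) * fy * at(x0, y0 + 1)
        + fx * fy * at(x0 + 1, y0 + 1)
    )


# 4x4 image whose pixel (row r, column c) holds the value 4*r + c.
img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(bilinear_sample(img, 1.0, 1.0))      # on a pixel center: 5.0
print(bilinear_sample(img, 5 / 6, 5 / 6))  # off center: about 4.1667
```

At (1.0, 1.0) both fractional offsets are zero, so only pixel (1, 1) contributes. At (5/6, 5/6) all four neighbors contribute and the result drifts away from 5.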

3. Floating-Point Precision:

- The conversion from pixel coordinates to normalized coordinates and back to pixel positions can introduce small floating-point rounding errors, which shift the sampling position slightly off the pixel center. These errors are tiny, so they account for discrepancies in the last few digits of the result, not for large ones.

4. Edge Cases:

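- Points exactly on the image border behave differently because some of the four neighbors fall outside the input. Those neighbors are filled according to padding_mode, which defaults to zeros, so with the default settings a sample at (-1, -1) mixes the corner pixel with padded zeros.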
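The border effect can be checked with a constant image, where any value other than the constant must come from padding. A minimal sketch, assuming a 4x4 single-channel input filled with 8:

```python
import torch
import torch.nn.functional as F

img = torch.full((1, 1, 4, 4), 8.0)      # N, C, H, W; every pixel is 8
grid = torch.tensor([[[[-1.0, -1.0]]]])  # N, H_out, W_out, 2 holding (x, y)

# Default convention: (-1, -1) is the outer corner of the top-left pixel.
zeros = F.grid_sample(img, grid, mode="bilinear",
                      padding_mode="zeros", align_corners=False)
# Same position, but out-of-range positions are clamped to the border.
border = F.grid_sample(img, grid, mode="bilinear",
                       padding_mode="border", align_corners=False)
# (-1, -1) is the center of the top-left pixel.
aligned = F.grid_sample(img, grid, mode="bilinear",
                        padding_mode="zeros", align_corners=True)

print(zeros.item(), border.item(), aligned.item())
```

With zeros padding and align_corners=False, three of the four neighbors are padded zeros, so the result is a quarter of the pixel value. The other two calls return the pixel value itself.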

5. Example

Let's say your image is of size 4x4 and you want to sample pixel (1, 1). grid_sample needs normalized coordinates. A commonly used formula is x = 2 * pixel_x / (width - 1) - 1 and y = 2 * pixel_y / (height - 1) - 1, which in this case gives x = 2 * 1 / (4 - 1) - 1 = -1/3 and y = 2 * 1 / (4 - 1) - 1 = -1/3. This formula matches align_corners=True. Under the default align_corners=False, the coordinate -1/3 maps back to the pixel position ((-1/3 + 1) * 4 - 1) / 2 = 5/6, so the value is interpolated from the surrounding pixels (0,0), (0,1), (1,0), (1,1) and is not read exactly at (1,1). To hit the center of pixel (1, 1) under the default, use x = (2 * pixel_x + 1) / width - 1 = -0.25 and likewise y = -0.25.
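The 4x4 example can be reproduced directly. This is a sketch under the assumption that the image holds the values 0 to 15 in row-major order, so pixel (1, 1) has the value 5; the helper name sample is hypothetical.

```python
import torch
import torch.nn.functional as F

# 4x4 image with values 0..15, so img[0, 0, 1, 1] == 5.
img = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)


def sample(x, y, align_corners):
    """Sample img at one normalized (x, y) point with bilinear interpolation."""
    grid = torch.tensor([[[[x, y]]]], dtype=torch.float32)
    out = F.grid_sample(img, grid, mode="bilinear",
                        padding_mode="zeros", align_corners=align_corners)
    return out.item()


mismatched = sample(-1 / 3, -1 / 3, align_corners=False)  # about 4.1667
aligned = sample(-1 / 3, -1 / 3, align_corners=True)      # about 5.0
centered = sample(-0.25, -0.25, align_corners=False)      # about 5.0

print(mismatched, aligned, centered)
```

Only the first call mixes the (W - 1) formula with the default align_corners=False, and it is the only one that does not return the value of pixel (1, 1).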

How to Get Direct Pixel Lookup

- If you need exact pixel values without interpolation, use basic indexing in PyTorch. For a 2-D image img, the pixel at (y, x) is img[y, x]; for a batched N x C x H x W tensor, it is img[n, :, y, x]. If you have to go through normalized coordinates, make sure they map to pixel centers under the align_corners setting you pass to grid_sample, which means doing the normalization and denormalization yourself with the matching formula.
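Both routes can be sketched side by side: plain indexing, and grid_sample with mode="nearest" fed a coordinate that targets the pixel center. The second route is an alternative not described above, shown here on the assumption that the grid_sample call itself has to stay in the pipeline.

```python
import torch
import torch.nn.functional as F

img = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)  # N, C, H, W
y, x = 1, 1

# Route 1: plain indexing, no interpolation involved.
direct = img[0, 0, y, x]

# Route 2: grid_sample with nearest-neighbor mode and a pixel-center
# coordinate computed for align_corners=False.
height, width = img.shape[-2:]
gx = (2 * x + 1) / width - 1
gy = (2 * y + 1) / height - 1
grid = torch.tensor([[[[gx, gy]]]], dtype=torch.float32)
nearest = F.grid_sample(img, grid, mode="nearest", align_corners=False)

print(direct.item(), nearest.item())
```

Indexing is the simpler choice whenever the coordinates are already integers; the nearest-mode call is only worth it when the coordinates come from a grid that is produced elsewhere.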

In Summary:

The "incorrect" value arises from bilinear interpolation at a sampling position that is not exactly on a pixel center, most often because the normalization formula does not match the align_corners setting. grid_sample is designed for continuous transformations, not discrete pixel lookups, hence the interpolation. Standard tensor indexing bypasses the issue when you only need direct pixel access. Understanding the coordinate normalization and how bilinear interpolation works in grid_sample is the key to avoiding confusion.
