Can someone explain how the attention shape `(H*W) x (H+W-1)` is derived in the Python code? The energy tensor there has shape `(H*W) x (H+W)`, so the two widths differ by one. Thank you @speedinghzl and @honghuis for sharing this amazing repo.
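
To make the two widths concrete, here is a minimal counting sketch in plain Python. It assumes the attention for a pixel `(i, j)` is taken over its full column (`H` positions) and its full row (`W` positions); the helper `criss_cross_positions` and the sizes `H = 5`, `W = 7` are hypothetical and not taken from the repo. Under that assumption, concatenating the column and row gives `H+W` entries, while the number of distinct positions is `H+W-1`, because the pixel itself appears in both. Whether and how the repo's code drops or masks the duplicate entry is not shown here.

```python
def criss_cross_positions(i, j, H, W):
    """Return the column and row positions crossing pixel (i, j).

    Hypothetical helper for illustration only; not part of the repo.
    """
    column = [(r, j) for r in range(H)]  # H positions sharing column j
    row = [(i, c) for c in range(W)]     # W positions sharing row i
    return column, row


H, W = 5, 7  # arbitrary example sizes

concat_lengths = set()  # widths when column and row are simply concatenated
unique_counts = set()   # widths when the duplicated centre pixel counts once

for i in range(H):
    for j in range(W):
        column, row = criss_cross_positions(i, j, H, W)
        concat_lengths.add(len(column) + len(row))
        unique_counts.add(len(set(column) | set(row)))

print(concat_lengths)  # every pixel gives H + W
print(unique_counts)   # every pixel gives H + W - 1
```

With `H*W` pixels in total, this counting gives one row of width `H+W` (concatenated) or `H+W-1` (distinct) per pixel, which matches the two shapes in the question.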