 The `cp.async.mbarrier.arrive` Op makes the *mbarrier object* track
 all prior cp.async operations initiated by the executing thread.
 The `addr` operand specifies the address of the *mbarrier object*
-in generic address space. The `noinc` attr impacts how the
-mbarrier\'s state is updated.
+in generic or shared::cta address space. When it is generic, the
+underlying memory should fall within the shared::cta space;
+otherwise the behavior is undefined. The `noinc` attr impacts
+how the mbarrier\'s state is updated.

 [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)
 """
@@ -1878,37 +1880,6 @@ function cp_async_mbarrier_arrive(addr::Value; noinc=nothing, location=Location(
 )
 end

-"""
-`cp_async_mbarrier_arrive_shared`
-
-The `cp.async.mbarrier.arrive.shared` Op makes the *mbarrier object*
-track all prior cp.async operations initiated by the executing thread.
-The `addr` operand specifies the address of the *mbarrier object* in
-shared memory. The `noinc` attr impacts how the mbarrier\'s state
-is updated.
-
-[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)
@@ -3334,8 +3305,10 @@ a result of this operation. The operation returns an opaque value that
 captures the phase of the *mbarrier object* prior to the arrive-on operation.

 The operation takes the following operands:
-- `addr`: A pointer to the memory location of the *mbarrier object*. Uses generic
-addressing, but the address must still be in the shared memory space.
+- `addr`: A pointer to the memory location of the *mbarrier object*. The `addr`
+must be a pointer to generic or shared::cta memory. When it is generic, the
+underlying address must be within the shared::cta memory space; otherwise
+the behavior is undefined.
 - `count`: Integer specifying the count argument to the arrive-on operation.
 Must be in the valid range as specified in the *mbarrier object* contents.

@@ -3362,35 +3335,6 @@ function mbarrier_arrive_nocomplete(
 )
 end

-"""
-`mbarrier_arrive_nocomplete_shared`
-
-This Op is the same as `nvvm.mbarrier.arrive.nocomplete` except that the *mbarrier object*
-should be accessed using a shared-memory pointer instead of a generic-memory pointer.
-
-[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive)
@@ -3408,8 +3352,10 @@ The operation returns an opaque value that captures the phase of the
 value are implementation-specific.

 The operation takes the following operand:
-- `addr`: A pointer to the memory location of the *mbarrier object*. Uses generic
-addressing, but the address must still be in the shared memory space.
+- `addr`: A pointer to the memory location of the *mbarrier object*. The `addr`
+must be a pointer to generic or shared::cta memory. When it is generic, the
+underlying address must be within the shared::cta memory space; otherwise
+the behavior is undefined.

 [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive)
 """
@@ -3432,33 +3378,6 @@ function mbarrier_arrive(addr::Value; res::IR.Type, location=Location())
 )
 end

-"""
-`mbarrier_arrive_shared`
-
-This Op is the same as `nvvm.mbarrier.arrive` except that the *mbarrier object*
-should be accessed using a shared-memory pointer instead of a generic-memory pointer.
-
-[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive)
@@ -3607,35 +3526,6 @@ function mbarrier_test_wait(addr::Value, state::Value; res::IR.Type, location=Lo
 )
 end

-"""
-`mbarrier_test_wait_shared`
-
-This Op is the same as `nvvm.mbarrier.test.wait` except that the *mbarrier object*
-should be accessed using a shared-memory pointer instead of a generic-memory pointer.
-
-[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-mbarrier-test-wait-try-wait)
@@ -3793,6 +3683,34 @@ function match_sync(thread_mask::Value, val::Value; res::IR.Type, kind, location
 )
 end

+"""
+`memory_barrier`
+
+`membar` operation guarantees that prior memory accesses requested by this
+thread are performed at the specified `scope`, before later memory
+operations requested by this thread following the membar instruction.
+
+[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar)