Hi, thanks so much for your documentation. I'm running deepseek-r1:1.5b with Ollama on an AMD 8845HS with a 780M GPU, following your document, but I get a GPU hang error after several rounds of conversation:
HW Exception by GPU node-1 (Agent handle: 0x7e6eb7d0bb40) reason :GPU Hang
Hardware:
- CPU: AMD Ryzen 7 8845HS
- GPU: Radeon 780M with 16 GB VRAM
- Memory: 48 GB DDR5-5600
- OS: LXC container on PVE 8.3
docker-compose:
services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    restart: unless-stopped
    devices:
      - "/dev/kfd"
      - "/dev/dri"
    volumes:
      - ./data:/root/.ollama
    environment:
      - OLLAMA_ORIGINS='chrome-extension://*,moz-extension://*'
      - HSA_OVERRIDE_GFX_VERSION=11.0.0
      - HCC_AMDGPU_TARGETS=gfx1103
      - OLLAMA_LLM_LIBRARY=rocm_v60002
      - OLLAMA_DEBUG=1
    ports:
      - "11434:11434"
Ollama debug log around the hang:
ollama | time=2025-02-24T09:45:48.152Z level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
ollama | time=2025-02-24T09:45:48.152Z level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
ollama | llama_model_loader: loaded meta data with 26 key-value pairs and 339 tensors from /root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc (version GGUF V3 (latest))
ollama | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
ollama | llama_model_loader: - kv 0: general.architecture str = qwen2
ollama | llama_model_loader: - kv 1: general.type str = model
ollama | llama_model_loader: - kv 2: general.name str = DeepSeek R1 Distill Qwen 1.5B
ollama | llama_model_loader: - kv 3: general.basename str = DeepSeek-R1-Distill-Qwen
ollama | llama_model_loader: - kv 4: general.size_label str = 1.5B
ollama | llama_model_loader: - kv 5: qwen2.block_count u32 = 28
ollama | llama_model_loader: - kv 6: qwen2.context_length u32 = 131072
ollama | llama_model_loader: - kv 7: qwen2.embedding_length u32 = 1536
ollama | llama_model_loader: - kv 8: qwen2.feed_forward_length u32 = 8960
ollama | llama_model_loader: - kv 9: qwen2.attention.head_count u32 = 12
ollama | llama_model_loader: - kv 10: qwen2.attention.head_count_kv u32 = 2
ollama | llama_model_loader: - kv 11: qwen2.rope.freq_base f32 = 10000.000000
ollama | llama_model_loader: - kv 12: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
ollama | llama_model_loader: - kv 13: general.file_type u32 = 15
ollama | llama_model_loader: - kv 14: tokenizer.ggml.model str = gpt2
ollama | llama_model_loader: - kv 15: tokenizer.ggml.pre str = qwen2
ollama | llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
ollama | llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
ollama | llama_model_loader: - kv 18: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
ollama | llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 151646
ollama | llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151643
ollama | llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151643
ollama | llama_model_loader: - kv 22: tokenizer.ggml.add_bos_token bool = true
ollama | llama_model_loader: - kv 23: tokenizer.ggml.add_eos_token bool = false
ollama | llama_model_loader: - kv 24: tokenizer.chat_template str = {% if not add_generation_prompt is de...
ollama | llama_model_loader: - kv 25: general.quantization_version u32 = 2
ollama | llama_model_loader: - type f32: 141 tensors
ollama | llama_model_loader: - type q4_K: 169 tensors
ollama | llama_model_loader: - type q6_K: 29 tensors
ollama | llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
ollama | llm_load_vocab: special tokens cache size = 22
ollama | llm_load_vocab: token to piece cache size = 0.9310 MB
ollama | llm_load_print_meta: format = GGUF V3 (latest)
ollama | llm_load_print_meta: arch = qwen2
ollama | llm_load_print_meta: vocab type = BPE
ollama | llm_load_print_meta: n_vocab = 151936
ollama | llm_load_print_meta: n_merges = 151387
ollama | llm_load_print_meta: vocab_only = 1
ollama | llm_load_print_meta: model type = ?B
ollama | llm_load_print_meta: model ftype = all F32
ollama | llm_load_print_meta: model params = 1.78 B
ollama | llm_load_print_meta: model size = 1.04 GiB (5.00 BPW)
ollama | llm_load_print_meta: general.name = DeepSeek R1 Distill Qwen 1.5B
ollama | llm_load_print_meta: BOS token = 151646 '<|begin▁of▁sentence|>'
ollama | llm_load_print_meta: EOS token = 151643 '<|end▁of▁sentence|>'
ollama | llm_load_print_meta: PAD token = 151643 '<|end▁of▁sentence|>'
ollama | llm_load_print_meta: LF token = 148848 'ÄĬ'
ollama | llm_load_print_meta: EOG token = 151643 '<|end▁of▁sentence|>'
ollama | llm_load_print_meta: max token length = 256
ollama | llama_model_load: vocab only - skipping tensors
ollama | time=2025-02-24T09:45:48.455Z level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="You are a professional, authentic machine translation engine.\n\nYou are about to translate text from an article. Title: “OLLAMA_ORIGINS=chrome-extension://etc does not work · Issue #1686 · ollama/ollama”, Summary: {{imt_theme}}\n\nThis content may include the following terms {{imt_terms}}. Please handle these terms carefully.<|User|>; 把下一行文本作为纯文本输入,并将其翻译为简体中文,, if the text contains html tags, please consider after translate, where the tags should be in translated result, meanwhile keep the result fluently.仅输出翻译。如果某些内容无需翻译(如专有名词、代码等),则保持原文不变。不要解释,输入文本:\nOLLAMA_ORIGINS=chrome-extension://etc does not work #1686<|Assistant|>"
ollama | time=2025-02-24T09:45:48.457Z level=DEBUG source=routes.go:1457 msg="chat request" images=0 prompt="You are a professional, authentic machine translation engine.\n\nYou are about to translate text from an article. Title: “OLLAMA_ORIGINS=chrome-extension://etc does not work · Issue #1686 · ollama/ollama”, Summary: {{imt_theme}}\n\nThis content may include the following terms {{imt_terms}}. Please handle these terms carefully.<|User|>; 把下一行文本作为纯文本输入,并将其翻译为简体中文,, if the text contains html tags, please consider after translate, where the tags should be in translated result, meanwhile keep the result fluently.仅输出翻译。如果某些内容无需翻译(如专有名词、代码等),则保持原文不变。不要解释,输入文本:\nOLLAMA_ORIGINS=chrome-extension://etc does not work · Issue #1686 · ollama/ollama<|Assistant|>"
ollama | time=2025-02-24T09:45:48.458Z level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=0 prompt=174 used=0 remaining=174
// ... part of chat message
ollama | time=2025-02-24T09:45:48.776Z level=DEBUG source=cache.go:99 msg="loading cache slot" id=1 cache=0 prompt=183 used=0 remaining=183
ollama | time=2025-02-24T09:45:48.776Z level=DEBUG source=cache.go:99 msg="loading cache slot" id=2 cache=0 prompt=190 used=0 remaining=190
ollama | time=2025-02-24T09:45:48.776Z level=DEBUG source=cache.go:99 msg="loading cache slot" id=3 cache=0 prompt=168 used=0 remaining=168
ollama | HW Exception by GPU node-1 (Agent handle: 0x7e6eb7d0bb40) reason :GPU Hang
ollama | time=2025-02-24T09:45:50.472Z level=DEBUG source=sched.go:407 msg="context for request finished"
ollama | time=2025-02-24T09:45:50.472Z level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc refCount=14
ollama | time=2025-02-24T09:45:50.472Z level=DEBUG source=sched.go:407 msg="context for request finished"
ollama | time=2025-02-24T09:45:50.472Z level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc refCount=13
ollama | time=2025-02-24T09:45:50.472Z level=DEBUG source=sched.go:407 msg="context for request finished"
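For what it's worth, the traffic that triggers this is just repeated short generate/chat calls from a browser extension. A loop like the sketch below (using Ollama's documented /api/generate endpoint; the prompt text is arbitrary) should exercise the same path if you want to try reproducing it:

#!/bin/sh
# Send sequential generate requests; on my setup the GPU hangs
# after several rounds. Adjust host/port and model name as needed.
for i in $(seq 1 20); do
  curl -s http://localhost:11434/api/generate \
    -d '{"model": "deepseek-r1:1.5b", "prompt": "Translate hello to Chinese.", "stream": false}' \
    > /dev/null
  echo "request $i done"
done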
rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 5137
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32638500(0x1f20624) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 32638500(0x1f20624) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32638500(0x1f20624) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32638500(0x1f20624) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1103
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 2048(0x800) KB
Chip ID: 6400(0x1900)
ASIC Revision: 12(0xc)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2700
BDFID: 50432
Internal Node ID: 1
Compute Unit: 12
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties: APU
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 40
SDMA engine uCode:: 21
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16319248(0xf90310) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 16319248(0xf90310) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1103
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
uname -a
Linux dev 6.8.12-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-4 (2024-11-06T15:04Z) x86_64 x86_64 x86_64 GNU/Linux
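In case the container setup matters: the GPU is passed into the LXC container with the usual device entries in its config on the PVE host, roughly like the sketch below. The /dev/kfd cgroup major number is not fixed across kernels, so verify both majors with ls -l /dev/kfd /dev/dri on the host rather than copying mine:

# /etc/pve/lxc/<CTID>.conf (illustrative sketch, not verbatim)
lxc.cgroup2.devices.allow: c 226:* rwm   # /dev/dri card/render nodes (major 226)
lxc.cgroup2.devices.allow: c 236:* rwm   # /dev/kfd -- dynamically assigned, check yours
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir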
Did you run into the same issue, or can you give me any pointers on how to fix it? Thank you so much.