arm64: dts: qcom: qcm6490-shift-otter: reduce rmtfs_mem from 6MiB to 2.5MiB by MatthewCroughan · Pull Request #12 · sc7280-mainline/linux

MatthewCroughan · 2025-10-07T15:05:46Z

The downstream dts still has this at 2.5MiB unlike counterparts such as the Fairphone 5 who set it to 6MiB

https://github.com/SHIFTPHONES/android_kernel_shift_qcm6490/blob/5b43ded8bf11d1d0571ade9979ffee03f9a0b973/arch/arm64/boot/dts/vendor/qcom/yupik.dtsi#L4680

…2.5MiB The downstream dts still has this at 2.5MiB unlike counterparts such as the Fairphone 5 who set it to 6MiB

MatthewCroughan · 2025-10-07T15:06:33Z

Worth noting that I haven't tested the changes don't break something like the modem yet.

amartinz · 2025-10-07T17:38:09Z

This may change with the A16 BSP updates, which also bump the downstream kernel to 6.12.

z3ntu · 2025-10-14T10:08:43Z

@amartinz So you say we wait until at least A16 is known what it needs? I don't know yet either from Fairphone side.

amartinz · 2025-10-14T15:48:42Z

@amartinz So you say we wait until at least A16 is known what it needs? I don't know yet either from Fairphone side.

I would prefer to wait, yes.

3.5 MB more or less is neglectable for a device with 12 GB of RAM in this stage of development, if you ask me 😄

commit 0031241 upstream. On systems using the hash MMU, there is a software SLB preload cache that mirrors the entries loaded into the hardware SLB buffer. This preload cache is subject to periodic eviction — typically after every 256 context switches — to remove old entry. To optimize performance, the kernel skips switch_mmu_context() in switch_mm_irqs_off() when the prev and next mm_struct are the same. However, on hash MMU systems, this can lead to inconsistencies between the hardware SLB and the software preload cache. If an SLB entry for a process is evicted from the software cache on one CPU, and the same process later runs on another CPU without executing switch_mmu_context(), the hardware SLB may retain stale entries. If the kernel then attempts to reload that entry, it can trigger an SLB multi-hit error. The following timeline shows how stale SLB entries are created and can cause a multi-hit error when a process moves between CPUs without a MMU context switch. CPU 0 CPU 1 ----- ----- Process P exec swapper/1 load_elf_binary begin_new_exc activate_mm switch_mm_irqs_off switch_mmu_context switch_slb /* * This invalidates all * the entries in the HW * and setup the new HW * SLB entries as per the * preload cache. */ context_switch sched_migrate_task migrates process P to cpu-1 Process swapper/0 context switch (to process P) (uses mm_struct of Process P) switch_mm_irqs_off() switch_slb load_slb++ /* * load_slb becomes 0 here * and we evict an entry from * the preload cache with * preload_age(). We still * keep HW SLB and preload * cache in sync, that is * because all HW SLB entries * anyways gets evicted in * switch_slb during SLBIA. * We then only add those * entries back in HW SLB, * which are currently * present in preload_cache * (after eviction). */ load_elf_binary continues... setup_new_exec() slb_setup_new_exec() sched_switch event sched_migrate_task migrates process P to cpu-0 context_switch from swapper/0 to Process P switch_mm_irqs_off() /* * Since both prev and next mm struct are same we don't call * switch_mmu_context(). This will cause the HW SLB and SW preload * cache to go out of sync in preload_new_slb_context. Because there * was an SLB entry which was evicted from both HW and preload cache * on cpu-1. Now later in preload_new_slb_context(), when we will try * to add the same preload entry again, we will add this to the SW * preload cache and then will add it to the HW SLB. Since on cpu-0 * this entry was never invalidated, hence adding this entry to the HW * SLB will cause a SLB multi-hit error. */ load_elf_binary continues... START_THREAD start_thread preload_new_slb_context /* * This tries to add a new EA to preload cache which was earlier * evicted from both cpu-1 HW SLB and preload cache. This caused the * HW SLB of cpu-0 to go out of sync with the SW preload cache. The * reason for this was, that when we context switched back on CPU-0, * we should have ideally called switch_mmu_context() which will * bring the HW SLB entries on CPU-0 in sync with SW preload cache * entries by setting up the mmu context properly. But we didn't do * that since the prev mm_struct running on cpu-0 was same as the * next mm_struct (which is true for swapper / kernel threads). So * now when we try to add this new entry into the HW SLB of cpu-0, * we hit a SLB multi-hit error. */ WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62 assert_slb_presence+0x2c/0x50(48 results) 02:47:29 [20157/42149] Modules linked in: CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12 VOLUNTARY Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected) 0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000 REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty) MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000 CFAR: c0000000001543b0 IRQMASK: 3 <...> NIP [c00000000015426c] assert_slb_presence+0x2c/0x50 LR [c0000000001543b4] slb_insert_entry+0x124/0x390 Call Trace: 0x7fffceb5ffff (unreliable) preload_new_slb_context+0x100/0x1a0 start_thread+0x26c/0x420 load_elf_binary+0x1b04/0x1c40 bprm_execve+0x358/0x680 do_execveat_common+0x1f8/0x240 sys_execve+0x58/0x70 system_call_exception+0x114/0x300 system_call_common+0x160/0x2c4 >From the above analysis, during early exec the hardware SLB is cleared, and entries from the software preload cache are reloaded into hardware by switch_slb. However, preload_new_slb_context and slb_setup_new_exec also attempt to load some of the same entries, which can trigger a multi-hit. In most cases, these additional preloads simply hit existing entries and add nothing new. Removing these functions avoids redundant preloads and eliminates the multi-hit issue. This patch removes these two functions. We tested process switching performance using the context_switch benchmark on POWER9/hash, and observed no regression. Without this patch: 129041 ops/sec With this patch: 129341 ops/sec We also measured SLB faults during boot, and the counts are essentially the same with and without this patch. SLB faults without this patch: 19727 SLB faults with this patch: 19786 Fixes: 5434ae7 ("powerpc/64s/hash: Add a SLB preload cache") cc: stable@vger.kernel.org Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/0ac694ae683494fe8cadbd911a1a5018d5d3c541.1761834163.git.ritesh.list@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a355eef upstream. Currently, the initialization of loongarch_jump_ops does not contain an assignment to its .free field. This causes disasm_line__free() to fall through to ins_ops__delete() for LoongArch jump instructions. ins_ops__delete() will free ins_operands.source.raw and ins_operands.source.name, and these fields overlaps with ins_operands.jump.raw_comment and ins_operands.jump.raw_func_start. Since in loongarch_jump__parse(), these two fields are populated by strchr()-ing the same buffer, trying to free them will lead to undefined behavior. This invalid free usually leads to crashes: Process 1712902 (perf) of user 1000 dumped core. Stack trace of thread 1712902: #0 0x00007fffef155c58 n/a (libc.so.6 + 0x95c58) sc7280-mainline#1 0x00007fffef0f7a94 raise (libc.so.6 + 0x37a94) sc7280-mainline#2 0x00007fffef0dd6a8 abort (libc.so.6 + 0x1d6a8) sc7280-mainline#3 0x00007fffef145490 n/a (libc.so.6 + 0x85490) sc7280-mainline#4 0x00007fffef1646f4 n/a (libc.so.6 + 0xa46f4) sc7280-mainline#5 0x00007fffef164718 n/a (libc.so.6 + 0xa4718) sc7280-mainline#6 0x00005555583a6764 __zfree (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x106764) sc7280-mainline#7 0x000055555854fb70 disasm_line__free (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x2afb70) sc7280-mainline#8 0x000055555853d618 annotated_source__purge (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x29d618) sc7280-mainline#9 0x000055555852300c __hist_entry__tui_annotate (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x28300c) sc7280-mainline#10 0x0000555558526718 do_annotate (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x286718) sc7280-mainline#11 0x000055555852ed94 evsel__hists_browse (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x28ed94) sc7280-mainline#12 0x000055555831fdd0 cmd_report (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x7fdd0) sc7280-mainline#13 0x000055555839b644 handle_internal_command (/home/csmantle/dist/linux-arch/tools/perf/perf + 0xfb644) sc7280-mainline#14 0x00005555582fe6ac main (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x5e6ac) sc7280-mainline#15 0x00007fffef0ddd90 n/a (libc.so.6 + 0x1dd90) sc7280-mainline#16 0x00007fffef0ddf0c __libc_start_main (libc.so.6 + 0x1df0c) sc7280-mainline#17 0x00005555582fed10 _start (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x5ed10) ELF object binary architecture: LoongArch ... and it can be confirmed with Valgrind: ==1721834== Invalid free() / delete / delete[] / realloc() ==1721834== at 0x4EA9014: free (in /usr/lib/valgrind/vgpreload_memcheck-loongarch64-linux.so) ==1721834== by 0x4106287: __zfree (zalloc.c:13) ==1721834== by 0x42ADC8F: disasm_line__free (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== by 0x429B737: annotated_source__purge (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== by 0x42811EB: __hist_entry__tui_annotate (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== by 0x42848D7: do_annotate (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== by 0x428CF33: evsel__hists_browse (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== Address 0x7d34303 is 35 bytes inside a block of size 62 alloc'd ==1721834== at 0x4EA59B8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-loongarch64-linux.so) ==1721834== by 0x6B80B6F: strdup (strdup.c:42) ==1721834== by 0x42AD917: disasm_line__new (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== by 0x42AE5A3: symbol__disassemble_objdump (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== by 0x42AF0A7: symbol__disassemble (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== by 0x429B3CF: symbol__annotate (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== by 0x429C233: symbol__annotate2 (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== by 0x42804D3: __hist_entry__tui_annotate (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== by 0x42848D7: do_annotate (in /home/csmantle/dist/linux-arch/tools/perf/perf) ==1721834== by 0x428CF33: evsel__hists_browse (in /home/csmantle/dist/linux-arch/tools/perf/perf) This patch adds the missing free() specialization in loongarch_jump_ops, which prevents disasm_line__free() from invoking the default cleanup function. Fixes: fb7fd2a ("perf annotate: Move raw_comment and raw_func_start fields out of 'struct ins_operands'") Cc: stable@vger.kernel.org Cc: WANG Rui <wangrui@loongson.cn> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: loongarch@lists.linux.dev Signed-off-by: Rong Bao <rong.bao@csmantle.top> Tested-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: dts: qcom: qcm6490-shift-otter: reduce rmtfs_mem from 6MiB to …

f9a2018

…2.5MiB The downstream dts still has this at 2.5MiB unlike counterparts such as the Fairphone 5 who set it to 6MiB

MatthewCroughan force-pushed the otter-rmtfs branch from 70f24d1 to f9a2018 Compare October 7, 2025 15:06

MatthewCroughan changed the title ~~arm64: dts: qcom: reduce rmtfs_mem from 6MiB to 2.5MiB~~ arm64: dts: qcom: qcm6490-shift-otter: reduce rmtfs_mem from 6MiB to 2.5MiB Oct 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

arm64: dts: qcom: qcm6490-shift-otter: reduce rmtfs_mem from 6MiB to 2.5MiB#12

arm64: dts: qcom: qcm6490-shift-otter: reduce rmtfs_mem from 6MiB to 2.5MiB#12
MatthewCroughan wants to merge 1 commit into
sc7280-mainline:sc7280-6.17.yfrom
MatthewCroughan:otter-rmtfs

MatthewCroughan commented Oct 7, 2025

Uh oh!

MatthewCroughan commented Oct 7, 2025

Uh oh!

amartinz commented Oct 7, 2025

Uh oh!

z3ntu commented Oct 14, 2025

Uh oh!

amartinz commented Oct 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

MatthewCroughan commented Oct 7, 2025

Uh oh!

MatthewCroughan commented Oct 7, 2025

Uh oh!

amartinz commented Oct 7, 2025

Uh oh!

z3ntu commented Oct 14, 2025

Uh oh!

amartinz commented Oct 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants