I recently installed Debian 12 (bookworm) on my ThinkPad T14s and started getting random crashes that threw me back to the login screen. Here’s what I learned about the issue.
My configuration
- Hardware: ThinkPad T14s AMD Gen4
- CPU: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
- OS: Debian 12 (bookworm)
- Drivers: open-source Mesa drivers, with non-free firmware installed
The issue
The crash only occurred when the laptop was undocked and using the built-in display, most often while scrolling quickly through text in Firefox.
I was sometimes able to reproduce the bug with FurMark, but not reliably. glmark2, however, triggered the crash very reliably after a couple of seconds.
The journalctl output contained:
Apr 24 17:19:51 guillaumehw kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, si>
Apr 24 17:19:51 guillaumehw kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: proce>
Apr 24 17:19:51 guillaumehw kernel: amdgpu 0000:c3:00.0: amdgpu: GPU reset begin!
Apr 24 17:19:51 guillaumehw kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *>
Apr 24 17:19:51 guillaumehw kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap >
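The lines above are cut off at the `>` because of the pager. A minimal sketch for pulling the same kind of lines out of the journal in full is shown below. The function name `amdgpu_errors` and the grep pattern are my own choices for illustration, so adjust them to whatever your log actually shows.

```shell
#!/bin/sh
# Filter kernel log text for amdgpu timeouts, errors and GPU resets.
# Reads from stdin, so it works on journalctl output or on a saved file.
# The name amdgpu_errors and the pattern are illustrative assumptions.
amdgpu_errors() {
    grep -E 'amdgpu.*(timeout|GPU reset|\*ERROR\*)'
}

# Typical use (kernel messages of the current boot, no pager):
#   journalctl -k -b --no-pager | amdgpu_errors
```

Since the crash drops back to the login screen rather than rebooting, the current boot (`-b`) should normally contain the relevant messages.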
What I tried (that didn’t work)
- Disabling hardware acceleration in Firefox
- Upgrading the Mesa drivers, and upgrading Firefox using a backport
- Upgrading the kernel, up to 6.6.13+bpo-amd64
Solution
The more I read reports from people with this kind of error, the more I suspected some kind of undervoltage or other hardware issue. I never overclocked my machine, but perhaps the issue never occurred when docked because the power level was more stable while the computer was plugged in?
What ended up working was setting the performance level of the card to low:
echo low | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
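Before and after writing the value, it can help to check what the setting currently is and which card actually exposes it, since the card index is not guaranteed to be 0 on every machine. The following is a small sketch. The function name `show_dpm_levels` is made up for this post, and the optional argument only exists so the function can be pointed at a copy of the sysfs tree.

```shell
#!/bin/sh
# Print the current DPM performance level for every DRM card that has one.
# show_dpm_levels is a hypothetical helper name; the optional first argument
# overrides the sysfs root and defaults to the real /sys/class/drm.
show_dpm_levels() {
    root="${1:-/sys/class/drm}"
    for f in "$root"/card*/device/power_dpm_force_performance_level; do
        # Skip entries (such as connector directories) without the file.
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}

# Typical use:
#   show_dpm_levels
```

On my understanding of the amdgpu driver, the level is `auto` unless something has changed it, so seeing `low` after the `echo` confirms that the write went through.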
I then made this change persistent by creating the file /etc/systemd/system/power-dpm.service containing:
[Unit]
Description=set the parameters power_dpm_force_performance_level
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level'
[Install]
WantedBy=multi-user.target
and then running:
sudo systemctl daemon-reload
sudo systemctl enable power-dpm.service
I didn’t notice any drop in performance. If you do, finer-grained performance adjustments are available in /sys/class/drm/card0/device/ that may suit your needs. Another solution would be to use hooks to set the performance level to low or auto depending on whether the laptop is plugged in.
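I have not set up such a hook myself, but the core of it could look like the sketch below. The function name `set_dpm_for_power_source` is hypothetical, and the default adapter path `/sys/class/power_supply/AC/online` is an assumption: the adapter is named differently on different machines (AC, ACAD, ADP1, ...), so check what yours is called.

```shell
#!/bin/sh
# Choose the DPM level from the AC adapter state:
#   adapter online (1) -> auto, otherwise -> low.
# Both paths are parameters with assumed defaults; the adapter name and
# the card index vary between machines.
set_dpm_for_power_source() {
    online_file="${1:-/sys/class/power_supply/AC/online}"
    level_file="${2:-/sys/class/drm/card0/device/power_dpm_force_performance_level}"
    if [ "$(cat "$online_file")" = "1" ]; then
        level=auto
    else
        level=low
    fi
    # Writing to the real sysfs file requires root.
    printf '%s\n' "$level" > "$level_file"
}

# Typical use (as root):
#   set_dpm_for_power_source
```

A script like this could be called as root from whatever reacts to power events on your system, for example a udev rule on the power_supply subsystem. If the adapter file cannot be read, the function falls back to low, which is the setting that stopped the crashes for me.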