Guillaume Matheron

Data scientist, PhD in computer science

amdgpu freezes on Thinkpad T14s AMD Gen 4

I recently installed Debian 12 (bookworm) on my ThinkPad T14s and started getting random crashes that threw me back to the login screen. Here’s what I learned about this issue.

My configuration

  • Hardware: ThinkPad T14s AMD Gen 4
  • CPU: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
  • OS: Debian 12 (bookworm)
  • Drivers: open-source Mesa drivers, with non-free firmware installed

The issue

The crash only occurred when the laptop was undocked and using the built-in display, most often when scrolling fast through text in Firefox.

I was sometimes able to reproduce the bug with FurMark, but not reliably. However, glmark2 triggered the crash very reliably after a couple of seconds.

The journalctl output contained:

Apr 24 17:19:51 guillaumehw kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, si>
Apr 24 17:19:51 guillaumehw kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: proce>
Apr 24 17:19:51 guillaumehw kernel: amdgpu 0000:c3:00.0: amdgpu: GPU reset begin!
Apr 24 17:19:51 guillaumehw kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *>
Apr 24 17:19:51 guillaumehw kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap >
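
To check whether a machine shows the same symptoms, the kernel log can be filtered for these messages. The following is a minimal sketch: the function name amdgpu_errors and the exact pattern are my own choices for illustration, based only on the lines quoted above. Passing --no-pager to journalctl also avoids the truncation marked by ">" at the end of each line.

```shell
#!/bin/sh
# Keep only amdgpu ring timeouts and GPU reset messages from the
# kernel log lines read on stdin. The pattern is an assumption derived
# from the excerpt above and may miss other amdgpu error messages.
amdgpu_errors() {
    grep -E 'amdgpu.*(ring .* timeout|GPU reset)'
}

# Usage (kernel messages of the current boot, untruncated):
#   journalctl -k --no-pager | amdgpu_errors
```

If the function prints nothing, the log contains no matching lines, and the crashes probably have a different cause.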

What I tried (and what didn’t work)

  • Disabling hardware acceleration in Firefox
  • Upgrading the Mesa drivers and upgrading Firefox from backports
  • Upgrading the kernel up to 6.6.13+bpo-amd64

Solution

The more I looked at reports from people with this kind of error, the more I suspected some kind of undervoltage or other hardware issue. I never overclocked my machine, but maybe the issue never occurred when docked because the power level was more stable when the computer was plugged in?

What ended up working was setting the performance level of the card to low:

echo low | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
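
Before and after writing to that file, it can help to check which card actually exposes the setting and what its current value is, since the GPU is not guaranteed to be card0 on every machine. Here is a small sketch; the function name show_dpm_levels and its optional directory argument are hypothetical and only there for illustration and testing.

```shell
#!/bin/sh
# Print the current DPM performance level of every DRM card that
# exposes power_dpm_force_performance_level.
# The optional first argument overrides the sysfs directory to scan
# (an assumption made so the function can run against a fake tree).
show_dpm_levels() {
    root="${1:-/sys/class/drm}"
    for f in "$root"/card*/device/power_dpm_force_performance_level; do
        # Skip entries without the attribute (connectors, unmatched glob).
        [ -r "$f" ] || continue
        card=${f#"$root"/}   # strip the leading directory
        card=${card%%/*}     # keep only the cardN component
        printf '%s: %s\n' "$card" "$(cat "$f")"
    done
}

# Usage:
#   show_dpm_levels
```

On my setup I would expect a single line for card0, showing low once the command above has been run.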

I then made this change persistent by creating a file /etc/systemd/system/power-dpm.service containing:

[Unit]
Description=Set amdgpu power_dpm_force_performance_level to low

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level'

[Install]
WantedBy=multi-user.target

and then running:

sudo systemctl daemon-reload
sudo systemctl enable power-dpm.service

I didn’t notice any drop in performance. If you do, finer-grained performance settings are available in /sys/class/drm/card0/device/ and may suit your needs. Another solution would be to use hooks to set the performance level to low or auto depending on whether the laptop is plugged in.
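
I have not set up such a hook myself, but the decision logic could look like the sketch below. It assumes the AC adapter appears under /sys/class/power_supply with type "Mains", which may not hold for every charger or dock. The function name pick_dpm_level and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Decide which DPM level to use from the AC adapter state.
# Prints "auto" if any supply of type "Mains" reports online=1,
# otherwise prints "low".
# The optional first argument overrides the power_supply directory
# (an assumption made so the function can run against a fake tree).
pick_dpm_level() {
    ps_root="${1:-/sys/class/power_supply}"
    for supply in "$ps_root"/*; do
        # Batteries have no "online" attribute, so they are skipped here.
        [ -r "$supply/type" ] && [ -r "$supply/online" ] || continue
        if [ "$(cat "$supply/type")" = "Mains" ] &&
           [ "$(cat "$supply/online")" = "1" ]; then
            echo auto
            return 0
        fi
    done
    echo low
}

# Usage (same target file as the manual command earlier in this post):
#   pick_dpm_level | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
```

Something would still need to call this when the power source changes, for example a udev rule on the power_supply subsystem. I leave that part out because I have not tested it.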

