[qubes-users] Help me test fixes for Intel IGD graphical artifacts on Qubes R4.0

When using some Intel integrated graphic cards on Qubes R4.0, screen
glitches may manifest after switching VTs or entering suspend mode.

A known workaround does exist for this bug, which is to add a
configuration file with the following contents within
/etc/X11/xorg.conf.d:

Section "Device"
  Identifier "Intel Graphics"
  Driver "modesetting"
  Option "AccelMethod" "glamor"
  Option "DRI" "3"
EndSection

However, the X11 modesetting driver version in Fedora 25 has its own
drawbacks:

* It freezes briefly when re-configuring monitors (e.g. plugging in an
  external monitor or changing screen resolution)
* XRandR keystone support is buggy

To remediate this, I've patched the Linux i915 driver and it has been
working fine for months. Only the patch for Linux 4.19 has been tested.

If anyone is affected by the issue, please feel free to test the
follow-up patches and give some feedback here.

See also:

* https://github.com/QubesOS/qubes-issues/issues/5244
* https://github.com/QubesOS/qubes-issues/issues/5377
* https://github.com/QubesOS/qubes-issues/issues/5460

Cheers,
Jinoh Kang

If GUP-ineligible pages are passed to a GEM userptr object, -EFAULT is
returned only when the object is actually bound.

The xf86-video-intel userspace driver cannot differentiate this
condition, and marks the GPU as wedged. This disables graphics
acceleration and may cripple other functionalities such as VT switch.

Solve this by "prefaulting" user pages on GEM object creation, testing
whether all pages are eligible for get_user_pages() in the process.
On failure, return -EFAULT so that userspace can fallback to software
bliting.

This behavior can be controlled by using a new modparam
"gem_userptr_prefault", which is true by default.

The only known use for this patch is Qubes OS's GUI virtualization.
Userspace is expected to resort to DMA-BUF whenever possible.

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>

If GUP-ineligible pages are passed to a GEM userptr object, -EFAULT is
returned only when the object is actually bound.

The xf86-video-intel userspace driver cannot differentiate this
condition, and marks the GPU as wedged. This disables graphics
acceleration and may cripple other functionalities such as VT switch.

Solve this by "prefaulting" user pages on GEM object creation, testing
whether all pages are eligible for get_user_pages() in the process.
On failure, return -EFAULT so that userspace can fallback to software
bliting.

This behavior can be controlled by using a new modparam
"gem_userptr_prefault", which is true by default.

The only known use for this patch is Qubes OS's GUI virtualization.
Userspace is expected to resort to DMA-BUF whenever possible.

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>

If GUP-ineligible pages are passed to a GEM userptr object, -EFAULT is
returned only when the object is actually bound.

The xf86-video-intel userspace driver cannot differentiate this
condition, and marks the GPU as wedged. This disables graphics
acceleration and may cripple other functionalities such as VT switch.

Solve this by "prefaulting" user pages on GEM object creation, testing
whether all pages are eligible for get_user_pages() in the process.
On failure, return -EFAULT so that userspace can fallback to software
bliting.

This behavior can be controlled by using a new modparam
"gem_userptr_prefault", which is true by default.

The only known use for this patch is Qubes OS's GUI virtualization.
Userspace is expected to resort to DMA-BUF whenever possible.

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>

If GUP-ineligible pages are passed to a GEM userptr object, -EFAULT is
returned only when the object is actually bound.

The xf86-video-intel userspace driver cannot differentiate this
condition, and marks the GPU as wedged. This disables graphics
acceleration and may cripple other functionalities such as VT switch.

Solve this by "prefaulting" user pages on GEM object creation, testing
whether all pages are eligible for get_user_pages() in the process.
On failure, return -EFAULT so that userspace can fallback to software
bliting.

This behavior can be controlled by using a new modparam
"gem_userptr_prefault", which is true by default.

The only known use for this patch is Qubes OS's GUI virtualization.
Userspace is expected to resort to DMA-BUF whenever possible.

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>

Thanks for the path. I've tried this one on 5.10.5, and it needs a
couple fixes to build, see below.

But then, it seems to fix the issue with VT switch and disabling
acceleration.

Independently, starting with 5.9.x (without the patch) the system freezes
with specific monitor configuration (two monitors side by side, but one
scaled x1.5). It happens immediately after configuring the second
display (and I still got the message about it in Xorg.0.log). Not
setting scaling on the monitor workarounds the issue.
Switching to modesetting driver fixes it. This issue wasn't here on
5.8.x or earlier and looks to be unrelated to this patch. I'm not sure
whether it's purely kernel driver issue, or some kernel<->Xorg
incompatibility.

---
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 36 +++++++++++++++++++++
drivers/gpu/drm/i915/i915_params.c | 3 ++
drivers/gpu/drm/i915/i915_params.h | 1 +
3 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index f2eaed6aca3d..65596d2b284f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -712,6 +712,34 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
   .release = i915_gem_userptr_release,
};

+static int i915_gem_userptr_prefault(unsigned long start,
+ unsigned long nr_pages,
+ bool readonly)
+{
+ struct mm_struct *mm = current->mm;
+ unsigned int gup_flags = (readonly ? 0 : FOLL_WRITE) | FOLL_NOWAIT;
+ int err = 0;
+
+ down_read(&mm->mmap_sem);

mmap_read_lock(mm);

+ while (nr_pages) {
+ long ret;
+
+ ret = get_user_pages(start, nr_pages, gup_flags, NULL, NULL);
+ if (ret < 0) {
+ err = (int)ret;
+ break;
+ }
+ if (ret == 0)
+ ret = 1; /* skip this page */
+
+ start += ret << PAGE_SHIFT;
+ nr_pages -= ret;
+ }
+ up_read(&mm->mmap_sem);

mmap_read_unlock(mm);

+
+ return err;
+}
+
/*
  * Creates a new mm object that wraps some normal memory from the process
  * context - user memory.
@@ -796,6 +824,14 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
   if (!access_ok((char __user *)(unsigned long)args->user_ptr, args->user_size))
     return -EFAULT;

+ if (READ_ONCE(i915_modparams.gem_userptr_prefault)) {
+ ret = i915_gem_userptr_prefault((unsigned long)args->user_ptr,
+ args->user_size >> PAGE_SHIFT,
+ args->flags & I915_USERPTR_READ_ONLY);
+ if (ret)
+ return ret;
+ }
+
   if (args->flags & I915_USERPTR_READ_ONLY) {
     /*
      * On almost all of the older hw, we cannot tell the GPU that
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 7f139ea4a90b..c39766adeda2 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -197,6 +197,9 @@ i915_param_named_unsafe(fake_lmem_start, ulong, 0400,
   "Fake LMEM start offset (default: 0)");
#endif

+i915_param_named(gem_userptr_prefault, bool, 0600,
+ "Prefault pages when userptr GEM object is created (default:true)");
+
static __always_inline void _print_param(struct drm_printer *p,
            const char *name,
            const char *type,
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index 330c03e2b4f7..1169a610a73c 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -79,6 +79,7 @@ struct drm_printer;
   param(bool, disable_display, false, 0400) \
   param(bool, verbose_state_checks, true, 0) \
   param(bool, nuclear_pageflip, false, 0400) \
+ param(bool, gem_userptr_prefault, true) \

param(bool, gem_userptr_prefault, true, 0600)

   param(bool, enable_dp_mst, true, 0600) \
   param(bool, enable_gvt, false, 0400)

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Q: Why is top-posting such a bad thing?

When using some Intel integrated graphic cards on Qubes R4.0, screen
glitches may manifest after switching VTs or entering suspend mode.

A known workaround does exist for this bug, which is to add a
configuration file with the following contents within
/etc/X11/xorg.conf.d:

> Section "Device"
> Identifier "Intel Graphics"
> Driver "modesetting"
> Option "AccelMethod" "glamor"
> Option "DRI" "3"
> EndSection

However, the X11 modesetting driver version in Fedora 25 has its own
drawbacks:

* It freezes briefly when re-configuring monitors (e.g. plugging in an
  external monitor or changing screen resolution)
* XRandR keystone support is buggy

To remediate this, I've patched the Linux i915 driver and it has been
working fine for months. Only the patch for Linux 4.19 has been tested.

If anyone is affected by the issue, please feel free to test the
follow-up patches and give some feedback here.

So, I can confirm the (fixed) 5.10 patch also improves the situation.
Have you sent it upstream? I do consider including it in our standard
kernel package, but I'd like to see i915 driver maintainer opinion
first.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Q: Why is top-posting such a bad thing?

So, I can confirm the (fixed) 5.10 patch also improves the situation.

Sounds good. Thanks for testing!

Have you sent it upstream?

No, qubes-users and qubes-devel are the only mailing list where I
posted this.

I guess chances of these patches being merged upstream would not be
that great.

After all, we're not going to need it with Qubes R4.1.

I do consider including it in our standard
kernel package, but I'd like to see i915 driver maintainer opinion
first.

If you mean you'd prefer to have it upstreamed, I'd appreciate some
Tested-by: and/or Reviewed-by: lines for the trailer from you.
I'm fine as well if you'd rather just submit it yourself.

Otherwise I suppose I shall only CC' the maintainers and not the list?

Cheers,
Jinoh Kang

> So, I can confirm the (fixed) 5.10 patch also improves the situation.

Sounds good. Thanks for testing!

> Have you sent it upstream?

No, qubes-users and qubes-devel are the only mailing list where I
posted this.

I guess chances of these patches being merged upstream would not be
that great.

If that bug indeed affects only Qubes OS, there is a greater chance to
accept the patch, if the option defaults to false.

After all, we're not going to need it with Qubes R4.1.

Are you sure? The issue affects dom0 windows, which suggests it still
may be necessary. On the other hand, your patch description suggests
it's just any VM-mapped window triggers the faulty path in the
xf86-video-intel driver, that later affects all of the output.

> I do consider including it in our standard
> kernel package, but I'd like to see i915 driver maintainer opinion
> first.

If you mean you'd prefer to have it upstreamed, I'd appreciate some
Tested-by: and/or Reviewed-by: lines for the trailer from you.

Can you send a fixed patch (that builds), rebased on top of recent Linux
(5.11-rc3, or recent 5.10)? I'll re-test and add my Tested-by:.

I'm fine as well if you'd rather just submit it yourself.

Otherwise I suppose I shall only CC' the maintainers and not the list?

Generally, Linux patches should be sent to whoever MAINTAINERS file
lists, which do include some mailing lists. I highly recommend using
scripts/get_maintainer.pl script for that purpose (if you use git
send-email, that's as easy as --cc-cmd=scripts/get_maintainer.pl).

PS The other (independent) issue I mentioned seems to be
https://bugzilla.suse.com/show_bug.cgi?id=1180543, which is supposed to
be already fixed in >=5.10.6. I've already uploaded 5.10.7, but haven't
tested it on this particular machine yet.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Q: Why is top-posting such a bad thing?

So, I can confirm the (fixed) 5.10 patch also improves the situation.

Sounds good. Thanks for testing!

Have you sent it upstream?

No, qubes-users and qubes-devel are the only mailing list where I
posted this.

I guess chances of these patches being merged upstream would not be
that great.

If that bug indeed affects only Qubes OS, there is a greater chance to
accept the patch, if the option defaults to false.

After all, we're not going to need it with Qubes R4.1.

Are you sure?

Now that I think about it, I just realised that we are still going to
this patch in Qubes R4.1, while migrating old VMs still using MFNDUMP.

But then, Fedora has moved away from xf86-video-intel to modesetting, so
we aren't going to need this patch either way (unless we are going to
switch to Wayland tomorrow)?

I believe everything else has been covered in qubes-issues#5909 [1].

The issue affects dom0 windows, which suggests it still
may be necessary.

Does it? I personally haven't experienced this case yet.

On the other hand, your patch description suggests
it's just any VM-mapped window triggers the faulty path in the
xf86-video-intel driver, that later affects all of the output.

Yes, but for privcmd-backed pages only. gntdev pages are unaffected.

I do consider including it in our standard
kernel package, but I'd like to see i915 driver maintainer opinion
first.

If you mean you'd prefer to have it upstreamed, I'd appreciate some
Tested-by: and/or Reviewed-by: lines for the trailer from you.

Can you send a fixed patch (that builds), rebased on top of recent Linux
(5.11-rc3, or recent 5.10)? I'll re-test and add my Tested-by:.

OK, I'll send it upstream.

I'm fine as well if you'd rather just submit it yourself.

Otherwise I suppose I shall only CC' the maintainers and not the list?

Generally, Linux patches should be sent to whoever MAINTAINERS file
lists, which do include some mailing lists. I highly recommend using
scripts/get_maintainer.pl script for that purpose (if you use git
send-email, that's as easy as --cc-cmd=scripts/get_maintainer.pl).

PS: due to Google SMTP OAuth2 issues, I have this weird workflow where I
first stage my changes in my local IMAP daemon's mbox and bounce it via
Thunderbird. This is admittedly very annoying, and perhaps I should
tinker with (neo)mutt + gmail-oauth2-tools some time. Or maybe switch
to a "real" mail account. This will also explain my past embedded PGP
signing mistake. Sorry about that...;(

PS The other (independent) issue I mentioned seems to be
https://bugzilla.suse.com/show_bug.cgi?id=1180543, which is supposed to
be already fixed in >=5.10.6. I've already uploaded 5.10.7, but haven't
tested it on this particular machine yet.

Thanks, good to know.

- --

[1] qubes-gui: backport window pixmap grant references to R4.0 · Issue #5909 · QubesOS/qubes-issues · GitHub

- --
Sincerely,
Jinoh Kang

Is qubes-xorg-x11-drv-intel an option? Upstream hasn't released for years after all...

Is qubes-xorg-x11-drv-intel an option? Upstream hasn't released for years after all...

Something like this. In fact the current (Fedora) package is already
built from git snapshot.
We do backport this package from newer Fedora already:

But I would prefer to get it upstream anyway (and then possibly build
xorg-x11-drv-intel from newer git snapshot).

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Q: Why is top-posting such a bad thing?

Is qubes-xorg-x11-drv-intel an option? Upstream hasn't released for years after all...

Something like this. In fact the current (Fedora) package is already
built from git snapshot.

Here's the catch: Fedora hasn't been bumping gitdate for almost a year,
as seen in Pagure [1].

We do backport this package from newer Fedora already:
GitHub - QubesOS/qubes-linux-dom0-updates: Qubes component: linux-dom0-updates

That one from Fedora 28 is a bit behind, too.

But I would prefer to get it upstream anyway (and then possibly build
xorg-x11-drv-intel from newer git snapshot).

Something like this? (haven't built it yet, will fix later)

diff --git a/src/sna/kgem.c b/src/sna/kgem.c
index 6a35067c..8a7af809 100644
- --- a/src/sna/kgem.c
+++ b/src/sna/kgem.c
@@ -7023,6 +7023,8 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
   struct kgem_bo *bo;
   uintptr_t first_page, last_page;
   uint32_t handle;
+ struct drm_i915_gem_set_domain set_domain;
+ bool move_to_gtt = false;

   assert(MAP(ptr) == ptr);

@@ -7043,20 +7045,10 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
            read_only);
   if (handle == 0) {
     if (read_only && kgem->has_wc_mmap) {
- - struct drm_i915_gem_set_domain set_domain;
- -
       handle = gem_userptr(kgem->fd,
                (void *)first_page, last_page-first_page,
                false);
- -
- - VG_CLEAR(set_domain);
- - set_domain.handle = handle;
- - set_domain.read_domains = I915_GEM_DOMAIN_GTT;
- - set_domain.write_domain = 0;
- - if (do_ioctl(kgem->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &set_domain)) {
- - gem_close(kgem->fd, handle);
- - handle = 0;
- - }
+ move_to_gtt = true;
     }
     if (handle == 0) {
       DBG(("%s: import failed, errno=%d\n", __FUNCTION__, errno));
@@ -7064,6 +7056,21 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
     }
   }

+ VG_CLEAR(set_domain);
+ set_domain.handle = handle;
+ if (move_to_gtt) {
+ set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+ set_domain.write_domain = 0;
+ } else {
+ set_domain.read_domains = I915_GEM_DOMAIN_CPU;
+ set_domain.write_domain = I915_GEM_DOMAIN_CPU;
+ }
+ if (do_ioctl(kgem->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &set_domain)) {
+ gem_close(kgem->fd, handle);
+ DBG(("%s: set_domain in import failed, errno=%d\n", __FUNCTION__, errno));
+ return NULL;
+ }

>> Is qubes-xorg-x11-drv-intel an option? Upstream hasn't released for years after all...
>
> Something like this. In fact the current (Fedora) package is already
> built from git snapshot.

Here's the catch: Fedora hasn't been bumping gitdate for almost a year,
as seen in Pagure [1].

> We do backport this package from newer Fedora already:
> GitHub - QubesOS/qubes-linux-dom0-updates: Qubes component: linux-dom0-updates

That one from Fedora 28 is a bit behind, too.

>
> But I would prefer to get it upstream anyway (and then possibly build
> xorg-x11-drv-intel from newer git snapshot).

Something like this? (haven't built it yet, will fix later)

I guess, yes.

diff --git a/src/sna/kgem.c b/src/sna/kgem.c
index 6a35067c..8a7af809 100644
--- a/src/sna/kgem.c
+++ b/src/sna/kgem.c
@@ -7023,6 +7023,8 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
   struct kgem_bo *bo;
   uintptr_t first_page, last_page;
   uint32_t handle;
+ struct drm_i915_gem_set_domain set_domain;
+ bool move_to_gtt = false;

   assert(MAP(ptr) == ptr);

@@ -7043,20 +7045,10 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
            read_only);
   if (handle == 0) {
     if (read_only && kgem->has_wc_mmap) {
- struct drm_i915_gem_set_domain set_domain;
-
       handle = gem_userptr(kgem->fd,
                (void *)first_page, last_page-first_page,
                false);
-
- VG_CLEAR(set_domain);
- set_domain.handle = handle;
- set_domain.read_domains = I915_GEM_DOMAIN_GTT;
- set_domain.write_domain = 0;
- if (do_ioctl(kgem->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &set_domain)) {
- gem_close(kgem->fd, handle);
- handle = 0;
- }
+ move_to_gtt = true;
     }
     if (handle == 0) {
       DBG(("%s: import failed, errno=%d\n", __FUNCTION__, errno));
@@ -7064,6 +7056,21 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
     }
   }

+ VG_CLEAR(set_domain);
+ set_domain.handle = handle;
+ if (move_to_gtt) {
+ set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+ set_domain.write_domain = 0;
+ } else {
+ set_domain.read_domains = I915_GEM_DOMAIN_CPU;
+ set_domain.write_domain = I915_GEM_DOMAIN_CPU;
+ }
+ if (do_ioctl(kgem->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &set_domain)) {
+ gem_close(kgem->fd, handle);
+ DBG(("%s: set_domain in import failed, errno=%d\n", __FUNCTION__, errno));
+ return NULL;
+ }
+
   bo = __kgem_bo_alloc(handle, (last_page - first_page) / PAGE_SIZE);
   if (bo == NULL) {
     gem_close(kgem->fd, handle);

---

[1] Tree - rpms/xorg-x11-drv-intel - src.fedoraproject.org

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Q: Why is top-posting such a bad thing?