Last week we touched upon a bug that was occurring with X11
on Arch Linux. The bug appeared specifically when creating a window for a test: the
window would fail to create and not show up at all, failing the test. I was only
modifying the unit tests at the time, but I took time this week to look more
deeply into the actual code to see what could be going wrong.
One of the first things I decided to do was enable the
macros inside of GWindow to see the X11 debug output. This would allow me to
watch in real time what was failing and help me track down the
problem child in the code.
As you can see, the errors being reported are talking about
a bad window. This normally means that the pixel map for the internal window is
bad, or the window itself failed to create and can’t have anything done to it.
I knew immediately the first place to check was the create function for GWindow.
Here we see two functions are called. The first function
just initializes some basic fields; the second function, however, is the one
we’re interested in. `OpenWindow()` is the function responsible for creating
the window and initializing all the information that GWindow needs to
properly manage it.
Remember, we can’t run this under a debugger, or the error won’t occur. So,
instead, I decided to inspect some documentation and play around with the code,
probing it to see which function was giving problems. Upon inspection, I
noticed that any time I changed something around the `XInitThreads()` call, the
behavior of the window creation changed with it. It seems this might be the problem child I mentioned
earlier. I decided to put a `std::this_thread::sleep_for()` call right after
the call that initializes threads. The sleep time was small, only 10 ms,
but this alone resulted in all of my tests passing consistently.
I had found where the bug was occurring. So, how do I fix it?
Well, the moment I discovered this, I made a commit with the sleep call change
as the fix. However, relying on a sleep call to fix the bug isn’t a good solution. It raises several concerns:
- How would this code react on faster or slower
hardware? We can’t be certain that this sleep fix will work on all systems; it
might even only work on mine.
- Is this truly a threading issue, or a mutex
issue? The X11 display can be used from multiple threads, so if I don’t lock it,
other X11 functions could be modifying, or trying to modify, the data I’m
about to read.
- Lastly, what if this fix is temporary like the
event system bug months prior? This is a real concern; it would not be the
first time I “fixed” a bug only to have it come back spontaneously because I
wasn’t using the idiomatic approach.
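
To illustrate the first concern, one alternative I could test is waiting on a condition with a deadline instead of sleeping for a fixed time. The sketch below uses only the C++ standard library. The helper name `wait_until_ready` and the `ready` predicate are hypothetical placeholders; I have not yet determined what the right readiness check for X11 would be, so this is not the fix itself.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `ready` until it returns true or `timeout`
// has elapsed. Returns true if the condition was met, false on timeout.
bool wait_until_ready(
    const std::function<bool()>& ready,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds poll_interval = std::chrono::milliseconds(1)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;

    while (!ready()) {
        // Give up once the deadline has passed instead of waiting forever.
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        // Yield briefly so the loop does not spin at full speed.
        std::this_thread::sleep_for(poll_interval);
    }
    return true;
}
```

Unlike a fixed 10 ms sleep, this returns as soon as the condition holds on fast hardware and keeps waiting, up to the timeout, on slow hardware. If the root cause turns out to be unsynchronized access to the display, Xlib's `XLockDisplay()` and `XUnlockDisplay()` would be the more direct thing to try.
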
So, what I have to do is test different solutions and
see what I can find. This was, unfortunately, all I could do for now. I wanted to
do more but was constrained on time with due dates and graduation fast
approaching. While this is going to be my last blog post, it will not be the
last of the changes I make to Gateware. I fully intend to continue working on
Gateware, not only to fix this bug but also to see what I can do to fix,
implement, and update features for the project. I hope to keep learning
from it over time, and I simply enjoy working at a low level. So,
with that, this is my last blog post.