Thursday, November 12, 2020

BlueScreen of Death

 Not that blue screen. 

BlueScreen is one of the templates that we maintain for users of Gateware to learn from or build off of. It's a cross-platform implementation of an OpenGL renderer. It uses Gateware libraries to create a window and then renders a colored triangle inside it. It runs on Linux and Windows. It's also supposed to change color when you resize the window. For some reason, on Linux only, the resize event wasn't being received properly, so the lambda passed into the create function never got hit.

I found this issue when updating and testing the 1.2 release candidate on all the templates on all platforms and so I was assigned to fix it. I spent most of last Thursday, Friday, Monday, and Tuesday only trying to solve this problem. 

First of all, the main hurdle was trying to debug using CodeLite on Linux. For some reason, if you build the project with CMake, you get no debugging information unless you specify that you want it. To solve this issue, I had to copy over the LinuxSetup script from the devops folder in the main Gateware dev repo and modify it to work in a different directory.

Once that was sorted, I got to work. I set breakpoints everywhere I could think of that might be useful. No luck. I put assert statements everywhere I could think of. No luck. I added print statements for debug information. No luck. 

Countless hours of research into X Window and X11 on Linux, with nothing to show for it. Wednesday came around and we had a release team meeting (that I missed, unfortunately). Ozzie and Lari worked on it, and Lari came up with a very clever way to debug the problem. He put a print statement in the lambda, then put a return at the beginning of the Create() function, ran the program, then moved the return down a few lines and tested again. They knew they had found the line causing the problem once it stopped printing from the lambda.
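Lari's trick of moving an early return down the function can be modeled in a few lines. This is only a rough sketch of the idea, not the actual Gateware code: FakeWindow, Step, RunFirstSteps, CallbackFiresOnResize, and FindBreakingStep are all made-up names, and each "line" of Create() is modeled as a step that may or may not break resize delivery.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the window; it starts out delivering resize events.
struct FakeWindow {
    bool resizeEventsEnabled = true;
};

// One "line" of a Create()-style function, modeled as a step that touches the window.
using Step = std::function<void(FakeWindow&)>;

// Runs only the first `count` steps, like placing an early return after them.
FakeWindow RunFirstSteps(const std::vector<Step>& steps, std::size_t count) {
    assert(count <= steps.size());
    FakeWindow window;
    for (std::size_t i = 0; i < count; ++i) {
        steps[i](window);
    }
    return window;
}

// Simulates a resize and reports whether the callback (the lambda with the
// print statement, in the story above) actually ran.
bool CallbackFiresOnResize(const FakeWindow& window) {
    bool fired = false;
    auto onResize = [&fired] { fired = true; };
    if (window.resizeEventsEnabled) {
        onResize();
    }
    return fired;
}

// Moves the "return" down one step at a time and reports the index of the
// first step after which the callback stops firing. Returns steps.size()
// if no step breaks it.
std::size_t FindBreakingStep(const std::vector<Step>& steps) {
    for (std::size_t count = 1; count <= steps.size(); ++count) {
        if (!CallbackFiresOnResize(RunFirstSteps(steps, count))) {
            return count - 1;
        }
    }
    return steps.size();
}
```

Calling FindBreakingStep with a list of steps where the third one clears resizeEventsEnabled would report index 2. In the real session this was done by hand, rebuilding and resizing the window after each move of the return.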

Apparently, the issue was in X11. OpenGL was overriding settings that GWindow originally sets up and some of those settings had to do with whether the event got received. 
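I don't have the exact settings to show here, so the following is just an illustration of how this kind of bug can happen, and it is an assumption rather than the confirmed cause. In X11, a window's event mask decides which events get delivered to the program, so if a later setup step replaces the mask instead of adding to it, events that an earlier step asked for quietly stop arriving. The sketch uses made-up bit values and made-up names (MaskedWindow, OverwriteEvents, AddEvents, ReceivesResize), not the real X11 constants or calls.

```cpp
#include <cstdint>

// Hypothetical event bits for illustration; these are NOT the real X11 values.
constexpr std::uint32_t kExposeEvents = 1u << 0;
constexpr std::uint32_t kResizeEvents = 1u << 1;
constexpr std::uint32_t kKeyEvents    = 1u << 2;

// Hypothetical window that only tracks which events it has asked for.
struct MaskedWindow {
    std::uint32_t eventMask = 0;
};

// Replaces the whole mask: anything the caller did not list is dropped.
void OverwriteEvents(MaskedWindow& window, std::uint32_t mask) {
    window.eventMask = mask;
}

// Adds to the existing mask: earlier requests survive.
void AddEvents(MaskedWindow& window, std::uint32_t mask) {
    window.eventMask |= mask;
}

// True if a resize would still be delivered to this window.
bool ReceivesResize(const MaskedWindow& window) {
    return (window.eventMask & kResizeEvents) != 0;
}
```

If the window code first asks for kResizeEvents and kKeyEvents, and the rendering code then calls OverwriteEvents with only kExposeEvents, ReceivesResize goes false and the resize lambda never runs. Using AddEvents for that second step keeps it true.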

Definitely not an obvious fix.
