Sunday, February 18, 2024

Month 3 - Week 2 - Into the Storm

 The title of this blog post for this week might be a bit farther out from my normal naming, but by no means am I exaggerating it. This week was chaotic and swamped with different issues that formed into twists and turns, turned into other issues, and outright formed into chaos towards the end. So, strap in, because there is a lot to cover.

In my last article, I discussed the work I was doing to get the Gateware foundation up-to-spec with its CMake. This was primarily focused on Linux as I felt it needed to be the most capable. This is primarily because CMake is built for the Linux family. It has tight integration and support for many of its architectures and strives to support a wide range of purely and wholly Linux-specific options. Because of this, I wanted most, if not all, of the CMake code for Linux to be tuned and optimized for what Linux is all about, options.

I went into further detail about this last week, so I won't linger on the likely tiring topic. However, I will cover some of the changes I made to this.

It was with happy notes that two major branches got merged of the many I made last week. The first one which I'm quite happy with is the general tags. This little change was one I was looking very much forward to and was happy to see it incorporated.


Another merge request, and really, the primary one, was the overhaul to the CMake for Linux.


All of the changes for it are listed concisely on the post as it covers a lot of different changes that were made which I found to be a fair bit useful. Something rather enjoyable of note was the different merge requests that were created from this one.

When this merge request was originally made, it involved mainly bringing the support listed as well as a plethora of refactoring aimed to make the CMake functionally more readable and concise, but it also spawned many new things as well. One in particular was Clang support.

Back in the early days of Gateware, it was primarily developed using Clang, specifically Clang 6 from what the history of commits tells me. At some point during development, around the time the project was facing a problem involving auto-linking, Clang was dropped in turn to support GCC instead. This put the project in a stasis on Linux where you have to use GCC only. This was something I was particularly questionable of as I like Linux's philosophy of giving the user options. So, I questioned if Clang could be brought back. The current version of Clang today is Clang 16, quite a bit further ahead than it was back then, so I figured it would support much of the features Gateware was originally waiting on. So, I did what developers do best, and started prototyping.


The first real edit I did was modifying the flags passed to the compiler. This was really the only major breaking change preventing Clang from being used. Once this change was done, it was off to compile. There were some immediate issues, one in particular was rather simple.


Some of the code inside of GAudio was using `nullptr_t` without the `std::` namespace added. In GCC, this wouldn't cause any problems, however, Clang is a little more picky about usages like this and makes it clear from the start that this should be added. Those changes were made, and then we encountered our first real issue, one that slipped passed even the seasoned devs.


This little piece of code was rather interesting. At first, I questioned how GCC was even compiling it, to begin with, but that led to a deeper rabbit hole. So, in case you happen to not understand what's happening here, let me explain. This code initializes a stack array of a certain size, stashing the width and height in the first two indexes before storing the rest of the space with pixel information. In total, this code takes pixel information stored in a certain way and then repacks it in another way. A great example is taking a BGRA image and then turning it into an ARGB image. This is effectively what it's doing. Now here is the problem. When defining a stack array like this, the size should be constant so that way we know that the array will not be stored dynamically to the stack. However, in the image, we don't see that.

We can see the size of the array is `convertedLength`, and we see that the variable is constant which is good, but what about `_width` or `_height`? Well, those are non-constant parameters. So, by our understanding, this is a non-standard array, furthermore, this code shouldn't be compiling right now, even with GCC. Yet, somehow, GCC is letting this build and pass perfectly fine.

That's when Clang came in. It immediately called out this issue, as loud as it could, making it loud and clear that this code was not only illegal but also an error. So with that, I immediately went to correct it.


This simple little change is not only clear but gets the point across evidently how the memory is handled. We aren't going to manually handle the array, we have no reason to. Instead, we will let the unique pointer type manage that, and it will give us a window into the array so we can modify and alter everything how we need to. After making this edit and doing a couple other bits of adjustments to the code, Clang support was now officially added. Wonderful!

Lastly, for our little string issue we had last week, I was able to make the last changes to it, adding the flags for both Clang and GCC and marking it done as well.


Now, what if I told you, that even with all of this progress, we still aren't done, not even halfway? Surprising right? I'm on a roll, and yet, we haven't even gotten to the good part. That is correct, it was after these changes that I started spiraling into a massive problem.

There was a merge request that I made recently to correct some issues with the GAudio tests. I wanted to do this as a way to help lessen some of the Clang warnings. It was a simple syntax issue that was overlooked and nothing I would consider too serious.


After making these changes, I began working on the primary bug that had been marked since the first week, the audio bug. A lot of my work was just on refactoring alone as I tried diagnosing and figuring out the issue at play as it was bewildering. Something internally wasn't working right, and things were breaking quite a bit. I did a lot of debugging, testing, and runs to try and figure out the problem, refactoring the code along the way to help read it easier as I figured this would be my home for the next several days. While doing this, I also decided to start diagnosing a rather annoying window event reporting failure in the unit test. This had been happening for a little while now, and I took note of it but didn't pay it any mind as I figured it may have been configuration-specific. I never had the issue show up until after a recent update to Xorg and a modification to my desktop. So I thought it was just a random one-off issue that wasn't general, as the runners were working and passing fine. However, I felt it would be in my best interest to look into it and see what could've possibly been happening. Upon inspecting the code, I came to find out it was occurring when the `Maximize` function would be called for GWindow. This event to maximize the window would do as expected, but instead of just reporting a singular maximize event, it would report a maximize event, and then do an event burst that was identical to maximize/minimize events after the initial maximize. This was troublesome as it meant a bug slipped in, one that I had introduced.


With this, I started going to work on fixing it. I figured with this being an edge case, the fix would be somewhat targeted but would work.


This code here was the fix. With this little edit, the events started working as expected again. There was a problem, however, if this issue happens to be isolated to just the latest versions of Xorg, then what does that mean for older versions? The problem would arise where the maximize would report fine, but now the minimize would be unreported due to the toggle I introduced for the manual call. This isn't good. I needed to test this. Specifically, I needed to update the Linux Mint runner to the latest version of Xorg, and then test the code before and after the change to verify it was Xorg. This had its own problems.

The Linux Mint runner that was currently being used was 20.3 (I may be slightly off, but it was the major version 20). When I got passed some initial issues updating, I found that this release was already up-to-date with the latest packages possible for that Mint version, which were all out of date by a fair bit. That's not good, I have to be able to test for this issue as best as I can, so I chose the next best option. I made a VM of Linux Mint and then cloned the GSource repo, building and running it on 21.3 of Linux Mint, which was running a much newer version of Xorg. This would give me a good idea if it was Xorg causing the issue or not.

I ran the test and waited... then waited... and waited more. Eventually, the tests were done, and to my surprise, no issue. That's confusing. Is it Xorg causing the issue then? I wanted to do a manual test just to be sure. I toggled the manual tests and then started testing them. I maximized the window and got the maximized event. I minimized it... nothing. Maximized... nothing again. Oh no. Nothing was reported. This isn't good. For some reason, the fix I had developed for this event reporting issue wasn't working at all on Linux Mint, not just that, but Debian. It was only Arch that was working as expected. That's not good at all. It means the most popular branch of Linux still has a bug I thought I fixed, furthermore, it means I have Arch-specific code inside of the Debian build that's doing nothing.


Sadly, after careful consideration, I had to reopen the issue.


Because of this issue resurfacing, I discussed with Lari the course of action and came to the conclusion that the audio bug would have to be put on the back burner while I focus on fixing these event reporting issues. Hopefully, by next week, I will be able to put an end to this issue and have it fixed permanently, or at least until X11 updates with more breaking changes.

For now, this is where I'm at. Next week will be focused heavily on fixing this bug, as well as the child to it. By next week, I hope that this issue will be fixed so that I may move forward with the audio bug and continue fixing it for week four.

No comments:

Post a Comment