Sunday, February 25, 2024

Month 3 - Week 3 - Fixing the Event System and Continuing with Audio Fix

Last week we went into the discussion of all the different things that were being pushed for merging, different issues encountered when refactoring the audio code for the bug fix, and then ending it with the return of the event system issue that was starting to plague throughout the Debian build of the library. This week, we will discuss the changes to the event reporting system and what I did to fix this bug.

This was a rather frustrating issue to tackle as I ended it on the complicated matter of figuring out how to locate a way to implement the fix. I was originally working on the idea that since it was X11 causing the problems, I could check which version of X11 was running and then build a solution for that version and the ones occurring after it. However, this solution wasn’t a clean one. What if the check fails at some point? What if a later version fixes the problem that was being encountered? Lastly, is my solution a topic of over-engineering?

I started to do some minor research into the topic, and the code itself. I looked into the code that specifically gave us the property return value. I figured—maybe—it could shine a light on the matter and help me understand in better detail what could be happening. Looking into some of the scarce documentation on `XGetWindowProperty`, I found something rather interesting of note. The property return isn’t exactly a single value. It can contain a single value—but is capable of sending more than just one return code. So, the property return is treated more like an array rather than a singular value. I had already known this due to the code below, however, I did not consider that there may be something more to this.

Furthermore, it is noted by other developers that the function that gives us the property return values is known to have—in the only words I can describe—undefined behavior. The return can move, reorder, position, and even be absent of values in events. It’s not entirely known why it does this, but depending on the event you are reading, they are things to take careful consideration of.

I started working on this by creating a little logging of the property event codes, this was put inside of a while loop that would cycle through all the codes in the event being fired. Whenever I would maximize the window, or minimize the window, I took note of all the events it would fire. The thing I found interesting about the maximize event is it would report a vertical and horizontal maximize property return.  However, with the minimize, I found that it would report the same two property returns, along with a hidden property return. And much like what was mentioned previously, on Arch, the order of events was different compared to Debian, thus explaining why my initial fix wasn’t working.

I decided that instead of using bitmasks to try and fix the event reporting problem as I did initially, I would instead use these multiple return values to properly determine if a maximize or a minimize event was happening. The first step was the logic of the loop I made for logging. I rearranged the code and proceeded to check for the specific conditions to be met. If the property return value was vertical max, toggle the vertical maximize flag, if it was horizontal, toggle the horizontal flag, and so on. The exception to this was the hidden flag. If the hidden flag was present, we want to toggle the other two off, this is because if the other two are active, the minimize event could be mistaken as a maximize since the maximize only reports the vertical and horizontal property return values.

The next thing needed was to simply form the condition of this check. We can determine a maximize event is occurring fairly quickly, we just need to confirm that 1) the window is the same, 2) the vertical and horizontal flags are true, and 3) the previous event is not a maximize (this will prevent spamming of the event). If we meet all these requirements, then a maximize event has occurred. The minimize event is much the same, but it is looking for the hidden flag being raised over the vertical and horizontal flags.

After we’ve gotten these conditions settled, the replacement of the bitmask was very straightforward. With this replacement, we now report the maximize and minimize of the window without any extra hassle. Furthermore, this also fixes the other bug that was occurring last week where the event test case in the unit test was failing due to a bitmask overflow occurring which resulted in the events inversing, showing minimize on maximize and maximize on minimize.

Lastly, the final change I made was removing this code. I chose to do this as the if chain it was connected to has an else block that will return a maximize immediately, I figured that removing this would improve the performance of the code by minimizing some of the conditional checking it has to do. Something else to add to this is that with the addition of my code, there is a chance this code will no longer run since my code can pick up maximize events very quickly just by reading the multiple return values. So, I removed this and tested and found all the tests worked as expected, and my manual testing also passed, on both Debian and Arch. So, with that, this code was now effectively fixed.

Of course, there was more work this week. Since this bug was fixed rather early in the week, and was considered a high-priority issue, I found myself open to working on the audio issue once more.

I found myself mostly refactoring a lot of the audio code. It was written back when C++98 was used for the library—at least that’s what I’ve been told—so a majority of the code reflected that. Usages of new and delete, the use of NULL over nullptr, and general low-level management that doesn’t reflect the more modern usages seen in C++11. A lot of my changes focused on trying to modernize the code and get it into a more functional state while also removing the rather tricky memory leaks that were spread throughout it. Almost all of them looked to be small oversights that were not noticed when initially written. The way I fixed this problem was by using `unique_ptr`. The fix wasn’t nearly as easy though. Some of the changes required making custom delete functions for the pointers since the pointers were managed by PulseAudio. These had to be handled in a very particular way since the library reference counts these allocations. So, neglecting to delete them properly results in the leakage of memory or worse, runtime crashes.

Just by a surface look at the code, it is clear the kind of work that has to be done to effectively ensure these pointers are managed. There may have been more elegant solutions for this, but this is the one I found to be most readable and straightforward from an outsider's perspective. You may be wondering how these functions get passed into the unique pointer. That, in the end, is handled on the assignment of the pointer.

Almost all of the pointers that need these delete functions are handled like this. This code, in my complete opinion, is messy, however, I can’t use the modern luxuries of `std::make_unique` since it wasn’t introduced until C++17, even then, `std::make_unique` doesn’t support setting the custom delete function we need, so this is our best option. Throughout all the code, we replace the initialization and usages of these variables with the unique pointer equivalent, while also fixing instances of new and delete with other local variables.

This will ensure that no possible memory leaks can occur. While this example is rather clear that no memory leak would happen, what about this one?

We can’t be sure, but I know that `CreateFilePath` performs a memory allocation, and I know that it isn’t freed before that reallocation occurs, so in this instance, a memory leak occurred.


Now in these changes, we know for certain that the memory is no longer leaking. We are returning a unique pointer and instead letting it maintain the memory being used and letting it deallocate using RAII over user control. These little changes just save the confusion and make it clear for new developers what to use without them having to mentally track every allocation and deallocation they make. While I can do it, and many others can, others may have a harder time with that. Especially with classes like this one which do a lot.

After performing these changes and testing them, I continued to do minor refactoring while also including another bug to close the merge request. This other bug simply remarked fixing memory leaks found with the audio class. Since I just made this rather large change to the code, I added it as it will more than likely be fixed along with the initial bug.

Now, to close the topic for this week, I have created two new issues that will have to be tracked. Firstly, recent PulseAudio versions have started crashing during 3D tests. The crash is happening from an assertion failure within the internal code for PulseAudio, within the module `pulsecore`. This is something that will have to be worked on just to ensure that newer versions are supported, as my unit testing is currently broken right now because of it. Lastly, the other issue is that Linux users using PipeWire will suffer from distorted, fast-forwarded, skipping, and corrupted audio. I was using PipeWire initially without realizing it and found out that PipeWire with `pipewire-pulse` does not like or play with the GSound and GMusic code all that well. The workaround for this is using PulseAudio directly, but that introduces the other issue with the new versions not liking GMusic3D. So, in the end, both of these issues are related, and both will need to be fixed and closed at some point soon. I may work towards trying to fix this issue before tackling the ghost window that Lari wants me to work on. Leaving GAudio in the state it is in is not sitting right with me, and the more I tinker with it and use it, the more I find pockets of issues that need to be sorted out. I’d like to work on these and fix them before I feel comfortable enough to move on to the next issue, which would be ghost windows on macOS.

Sunday, February 18, 2024

Month 3 - Week 2 - Into the Storm

 The title of this blog post for this week might be a bit farther out from my normal naming, but by no means am I exaggerating it. This week was chaotic and swamped with different issues that formed into twists and turns, turned into other issues, and outright formed into chaos towards the end. So, strap in, because there is a lot to cover.

In my last article, I discussed the work I was doing to get the Gateware foundation up-to-spec with its CMake. This was primarily focused on Linux as I felt it needed to be the most capable. This is primarily because CMake is built for the Linux family. It has tight integration and support for many of its architectures and strives to support a wide range of purely and wholly Linux-specific options. Because of this, I wanted most, if not all, of the CMake code for Linux to be tuned and optimized for what Linux is all about, options.

I went into further detail about this last week, so I won't linger on the likely tiring topic. However, I will cover some of the changes I made to this.

It was with happy notes that two major branches got merged of the many I made last week. The first one which I'm quite happy with is the general tags. This little change was one I was looking very much forward to and was happy to see it incorporated.


Another merge request, and really, the primary one, was the overhaul to the CMake for Linux.


All of the changes for it are listed concisely on the post as it covers a lot of different changes that were made which I found to be a fair bit useful. Something rather enjoyable of note was the different merge requests that were created from this one.

When this merge request was originally made, it involved mainly bringing the support listed as well as a plethora of refactoring aimed to make the CMake functionally more readable and concise, but it also spawned many new things as well. One in particular was Clang support.

Back in the early days of Gateware, it was primarily developed using Clang, specifically Clang 6 from what the history of commits tells me. At some point during development, around the time the project was facing a problem involving auto-linking, Clang was dropped in turn to support GCC instead. This put the project in a stasis on Linux where you have to use GCC only. This was something I was particularly questionable of as I like Linux's philosophy of giving the user options. So, I questioned if Clang could be brought back. The current version of Clang today is Clang 16, quite a bit further ahead than it was back then, so I figured it would support much of the features Gateware was originally waiting on. So, I did what developers do best, and started prototyping.


The first real edit I did was modifying the flags passed to the compiler. This was really the only major breaking change preventing Clang from being used. Once this change was done, it was off to compile. There were some immediate issues, one in particular was rather simple.


Some of the code inside of GAudio was using `nullptr_t` without the `std::` namespace added. In GCC, this wouldn't cause any problems, however, Clang is a little more picky about usages like this and makes it clear from the start that this should be added. Those changes were made, and then we encountered our first real issue, one that slipped passed even the seasoned devs.


This little piece of code was rather interesting. At first, I questioned how GCC was even compiling it, to begin with, but that led to a deeper rabbit hole. So, in case you happen to not understand what's happening here, let me explain. This code initializes a stack array of a certain size, stashing the width and height in the first two indexes before storing the rest of the space with pixel information. In total, this code takes pixel information stored in a certain way and then repacks it in another way. A great example is taking a BGRA image and then turning it into an ARGB image. This is effectively what it's doing. Now here is the problem. When defining a stack array like this, the size should be constant so that way we know that the array will not be stored dynamically to the stack. However, in the image, we don't see that.

We can see the size of the array is `convertedLength`, and we see that the variable is constant which is good, but what about `_width` or `_height`? Well, those are non-constant parameters. So, by our understanding, this is a non-standard array, furthermore, this code shouldn't be compiling right now, even with GCC. Yet, somehow, GCC is letting this build and pass perfectly fine.

That's when Clang came in. It immediately called out this issue, as loud as it could, making it loud and clear that this code was not only illegal but also an error. So with that, I immediately went to correct it.


This simple little change is not only clear but gets the point across evidently how the memory is handled. We aren't going to manually handle the array, we have no reason to. Instead, we will let the unique pointer type manage that, and it will give us a window into the array so we can modify and alter everything how we need to. After making this edit and doing a couple other bits of adjustments to the code, Clang support was now officially added. Wonderful!

Lastly, for our little string issue we had last week, I was able to make the last changes to it, adding the flags for both Clang and GCC and marking it done as well.


Now, what if I told you, that even with all of this progress, we still aren't done, not even halfway? Surprising right? I'm on a roll, and yet, we haven't even gotten to the good part. That is correct, it was after these changes that I started spiraling into a massive problem.

There was a merge request that I made recently to correct some issues with the GAudio tests. I wanted to do this as a way to help lessen some of the Clang warnings. It was a simple syntax issue that was overlooked and nothing I would consider too serious.


After making these changes, I began working on the primary bug that had been marked since the first week, the audio bug. A lot of my work was just on refactoring alone as I tried diagnosing and figuring out the issue at play as it was bewildering. Something internally wasn't working right, and things were breaking quite a bit. I did a lot of debugging, testing, and runs to try and figure out the problem, refactoring the code along the way to help read it easier as I figured this would be my home for the next several days. While doing this, I also decided to start diagnosing a rather annoying window event reporting failure in the unit test. This had been happening for a little while now, and I took note of it but didn't pay it any mind as I figured it may have been configuration-specific. I never had the issue show up until after a recent update to Xorg and a modification to my desktop. So I thought it was just a random one-off issue that wasn't general, as the runners were working and passing fine. However, I felt it would be in my best interest to look into it and see what could've possibly been happening. Upon inspecting the code, I came to find out it was occurring when the `Maximize` function would be called for GWindow. This event to maximize the window would do as expected, but instead of just reporting a singular maximize event, it would report a maximize event, and then do an event burst that was identical to maximize/minimize events after the initial maximize. This was troublesome as it meant a bug slipped in, one that I had introduced.


With this, I started going to work on fixing it. I figured with this being an edge case, the fix would be somewhat targeted but would work.


This code here was the fix. With this little edit, the events started working as expected again. There was a problem, however, if this issue happens to be isolated to just the latest versions of Xorg, then what does that mean for older versions? The problem would arise where the maximize would report fine, but now the minimize would be unreported due to the toggle I introduced for the manual call. This isn't good. I needed to test this. Specifically, I needed to update the Linux Mint runner to the latest version of Xorg, and then test the code before and after the change to verify it was Xorg. This had its own problems.

The Linux Mint runner that was currently being used was 20.3 (I may be slightly off, but it was the major version 20). When I got passed some initial issues updating, I found that this release was already up-to-date with the latest packages possible for that Mint version, which were all out of date by a fair bit. That's not good, I have to be able to test for this issue as best as I can, so I chose the next best option. I made a VM of Linux Mint and then cloned the GSource repo, building and running it on 21.3 of Linux Mint, which was running a much newer version of Xorg. This would give me a good idea if it was Xorg causing the issue or not.

I ran the test and waited... then waited... and waited more. Eventually, the tests were done, and to my surprise, no issue. That's confusing. Is it Xorg causing the issue then? I wanted to do a manual test just to be sure. I toggled the manual tests and then started testing them. I maximized the window and got the maximized event. I minimized it... nothing. Maximized... nothing again. Oh no. Nothing was reported. This isn't good. For some reason, the fix I had developed for this event reporting issue wasn't working at all on Linux Mint, not just that, but Debian. It was only Arch that was working as expected. That's not good at all. It means the most popular branch of Linux still has a bug I thought I fixed, furthermore, it means I have Arch-specific code inside of the Debian build that's doing nothing.


Sadly, after careful consideration, I had to reopen the issue.


Because of this issue resurfacing, I discussed with Lari the course of action and came to the conclusion that the audio bug would have to be put on the back burner while I focus on fixing these event reporting issues. Hopefully, by next week, I will be able to put an end to this issue and have it fixed permanently, or at least until X11 updates with more breaking changes.

For now, this is where I'm at. Next week will be focused heavily on fixing this bug, as well as the child to it. By next week, I hope that this issue will be fixed so that I may move forward with the audio bug and continue fixing it for week four.

Sunday, February 11, 2024

Month 3 - Week 1 - Preperations for Audio Bug

 This week has been a busy one, possibly more so than even when I implemented MSAA for DX11. There was work that needed to be done for issues unrelated to the bug I had picked for this week and even more work that was related to the bug I picked. So, this post will focus on the issues encountered, and the steps I took towards preparing for next week when I begin tackling this bug in its full entirety.

So, let's start from the beginning. The bug. The bug is an audio bug that has been around for a while now on Linux. This bug was originally not a bug, but rather refactoring that was needed for loading WAV files. It became a bug I assume after most of the refactoring had been done. There are issues with how the audio is played back to the user, specifically a described "popping" sound that is rather easy to catch in the tests. My description of the bug refers to the sounds more as popping, scrambled, or garbled. My assumption from the very start is possible garbage data that's inside the buffer causing the messy sounds that I hear. This could also be the case since this happens with shorter audio clips that aren't filling the buffer completely, resulting in some empty data at the end. If that empty space isn't handled properly, it could result in that data still being played, which results in the sounds I hear.

So, I picked a new bug to tackle and one that seemed like it would be pretty fun to tackle, but then I was hit with a wonderfully surprising issue the following day. The library wasn't compiling.

 

 It seems that some change slipped past into the release code, preventing the entire project from compiling. This isn't good, this is a rather large issue to have because it can heavily affect the stability of the library if something like this happens. So, I went to work on diagnosing it. The errors really piqued my interest, however. I haven't done anything with strings that would be considered an error to the compiler, and generally ensure that my usage of strings and c-strings fit the C++ Standard. However, I did remember some warnings I had gotten last month when I first moved over to Linux that I took note of. There were four of these warnings, all warning of non-standard usage of strings.

 

 These wonderful little warnings are not ones you generally want to see in code. Generally, if the compiler warns about ISO C++, there can be errors later. I also had similar warnings with the MSVC compilers as well. So I took note of these issues and moved on.

At least until now. Looking at the errors being printed, I immediately thought back to these warnings and knew that they were the cause. There were four of the warnings I took note of being in the release code, specifically in GFile. I didn't waste any time going through, making a branch, and fixing these troublesome errors.

 

These little things are what caused the entire library to buckle. The moment I made the variables const for both Linux and Windows, the warnings went away, and the library started compiling just fine. Simple fix thankfully. But this left us needing to figure out how to prevent this issue in the future. I wanted a stern fix, something that would permanently prevent this from happening again. I dug around and eventually found this little argument, "/Zc:strictStrings". What this does is tell MSVC to conform to the ISO C++ Standard and outright fail to compile if it finds poor string usage. So instead of it being a warning about it, it will treat it as an error. A lot of the "Zc" commands are passed into blank default project templates like Console applications, which is why it failed to compile initially. So, I went through and added this flag to the list of debug and release flags passed for the library. Now, if you develop for the library, it will outright fail if you do poor string usage. But I wanted to go further because we support developing on Linux and Mac. So I also introduced flags for them as well.

 

 I do want to note, that currently, this merge request for these changes hasn't been merged and is still in active development. This is because another merge request needs to be combined first, and because I need to ensure the Mac flag is correct before merging completely, as Mac uses a Clang-based compiler for its C/C++ compiling.

After I made these changes, I had a sudden want to implement a couple other additions, mainly due to the bug that I will be working on. I wanted some changes to the library to make bug fixing a little easier. One of these changes is the introduction of general tags to unit tests.

 

This was easily one of the heftiest changes I made. It was the first outright change to code I've done without being asked and involved a lot of testing to ensure systems were working right. The code edits themselves were easy, but I felt I needed to "sell" my changes, since on the surface, they can look pretty generic or even unneeded.

What does this merge request actually do? What it does is allow for IDEs that are capable to test on tests by a specific tag. What this means is if you have a suite of tests that have an "Audio" tag, and you have dozens of others with different general tags, you can target only the tests with the "Audio" tag. This makes it so that you don't have to test all 2,349 tests, but just 40 instead. This is extremely useful for me, as I'm going to be testing audio a lot, and this change will save me having to wait for a minute each time to test audio. It'll instead cost me only 10-20 seconds. Not too bad.

After doing this change and sitting a little more on the project, I decided to tackle another issue. There was a problem with CMake on Linux. There are a few problems with it admittedly. 1) CMake fails to work with the build directory if that directory is not populated with at least one file, which makes it extremely frustrating if you want to remove the build directory for a completely clean build. 2) CMake lacked support for multi-config generators, meaning that if you tried using something like Ninja Multi-Config, it outright wouldn't work. Lastly, 3) there are a few single configuration generators (namely, Ninja) that won't work with CMake at all. I started trying to diagnose this issue to figure out what exactly was causing the problem. I ended up finding the issue pretty quickly.

 

This code here is the entire configuration for Linux in the main CMakeLists file. Besides some of the hard-to-read formatting, there was something I noticed very quickly.

 

The CMakeLists file required that these two directories exist. That's not good. Let me explain. Multi-configuration generators are generators that allow for multiple configurations within the same build. What this means is instead of having your build separated into a debug and release folder like with single configuration generators, you can instead have them stored inside of a main directory that will additionally store its own debug and release folder. It saves you from having to maintain multiple directories and also makes the general process of having multiple profiles for a single project much easier.

What happens is if I try to use a multi-configuration generator and point it to a build directory of "build/Desktop", it will fail to even make the project because it is expecting a specific layout for the build directory. So, I set to work on this problem.

 

The very first thing I chose to do for this was to let CMake manage not only the build directory but also put an empty file inside of it. If we don't make an empty file, we open the door to CMake not understanding the directory, failing to find libraries, and even its own make program. I'm not sure what it's like this, but we do these steps to prevent it. Lastly, we make the project itself. I'm not sure why in the old code the projects were made with different names, but it is generally unneeded, even with single configuration generators.

The next part is important, this is what will help configure multi-configuration generators. What this is doing is checking if the variable "CMAKE_CONFIGURATION_TYPES" is defined. This variable is only ever defined if you are currently using a multi-config setup. So, if it is, then we will set the types to the ones we want (Debug and Release). We also have a message that will print providing information about what generator you are using. This is more debug, but also useful in general as it tells you what step CMake is in currently for the CMakeLists.

 

Lastly, the biggest change. You may have noticed in the old code that each of these build configurations had their separate flags for the compiler.

 

Now, if you have a keen eye, you can remove a majority of these arguments as they are mostly the same with each configuration. The only difference between the two was "-g" being included for the debug configuration. Now, in the change I made, I provided a rather long comment. I'm going to quote this comment verbatim.

There is no configuration needed for debug and release as CMake will already handle this for us through the CMAKE_CXX_FLAGS_DEBUG and CMAKE_CXX_FLAGS_RELEASE variables. On multi-configuration build systems, CMake will automatically use CMAKE_CXX_FLAGS_DEBUG_INIT and CMAKE_CXX_FLAGS_RELEASE_INIT to set the flags for the respective configurations. The debug will use the commands `-g` and the release will use `-O3 -DNDEBUG`.

You may ask, how can I be sure of this? Well, I debugged the CMakeLists file and noticed the variable, I got curious and decided to test my theory with multiple generators, and found it to be true. Here is a snippet of a debug configuration I have using Ninja Multi-Config.

 

This is Ninja's form of a makefile that you are viewing. I took the effort to parse this and highlight the main part for your viewing. You'll notice in the list of flags I highlighted, you can see "-g" is included, even though I don't explicitly add it in CMake. Let's look at the release configuration from Ninja. 

 

If you look here, you'll notice there is a difference. Instead of "-g", we get "-O3 -DNDEBUG", this is rather important, because it means that the explicit inclusion of this in the CMakeLists file was unneeded, CMake does this for us already. Good to know.

A lot of the other changes for this merge request included removing an error message that was not important due to it only reporting a library not being used (this library was "threads" which was excluded from being included in the project at one point in time). There was also a variable that was set for CodeLite, which I took the time to wrap in an if statement so it would only be set when you're using the CodeLite generator.

In total, this was a week that resulted in over 3 merge requests being made on top of the merge request for the bug that I'll be working on next week. I want to get all three merged first if possible so that way I can build, configure, and test the fix for this bug as comfortably as possible. For now, this is all the work that I've done this week.

This isn't a traditional format for how I do these blog posts, but due to the volume of changes I did, I felt all needed to be covered as they were each just as important to me.