FS Gateware

Sunday, February 11, 2024

Month 3 - Week 1 - Preperations for Audio Bug

This week has been a busy one, possibly more so than even when I implemented MSAA for DX11. There was work that needed to be done for issues unrelated to the bug I had picked for this week and even more work that was related to the bug I picked. So, this post will focus on the issues encountered, and the steps I took towards preparing for next week when I begin tackling this bug in its full entirety.

So, let's start from the beginning. The bug. The bug is an audio bug that has been around for a while now on Linux. This bug was originally not a bug, but rather refactoring that was needed for loading WAV files. It became a bug I assume after most of the refactoring had been done. There are issues with how the audio is played back to the user, specifically a described "popping" sound that is rather easy to catch in the tests. My description of the bug refers to the sounds more as popping, scrambled, or garbled. My assumption from the very start is possible garbage data that's inside the buffer causing the messy sounds that I hear. This could also be the case since this happens with shorter audio clips that aren't filling the buffer completely, resulting in some empty data at the end. If that empty space isn't handled properly, it could result in that data still being played, which results in the sounds I hear.

So, I picked a new bug to tackle and one that seemed like it would be pretty fun to tackle, but then I was hit with a wonderfully surprising issue the following day. The library wasn't compiling.

It seems that some change slipped past into the release code, preventing the entire project from compiling. This isn't good, this is a rather large issue to have because it can heavily affect the stability of the library if something like this happens. So, I went to work on diagnosing it. The errors really piqued my interest, however. I haven't done anything with strings that would be considered an error to the compiler, and generally ensure that my usage of strings and c-strings fit the C++ Standard. However, I did remember some warnings I had gotten last month when I first moved over to Linux that I took note of. There were four of these warnings, all warning of non-standard usage of strings.

These wonderful little warnings are not ones you generally want to see in code. Generally, if the compiler warns about ISO C++, there can be errors later. I also had similar warnings with the MSVC compilers as well. So I took note of these issues and moved on.

At least until now. Looking at the errors being printed, I immediately thought back to these warnings and knew that they were the cause. There were four of the warnings I took note of being in the release code, specifically in GFile. I didn't waste any time going through, making a branch, and fixing these troublesome errors.

These little things are what caused the entire library to buckle. The moment I made the variables const for both Linux and Windows, the warnings went away, and the library started compiling just fine. Simple fix thankfully. But this left us needing to figure out how to prevent this issue in the future. I wanted a stern fix, something that would permanently prevent this from happening again. I dug around and eventually found this little argument, "/Zc:strictStrings". What this does is tell MSVC to conform to the ISO C++ Standard and outright fail to compile if it finds poor string usage. So instead of it being a warning about it, it will treat it as an error. A lot of the "Zc" commands are passed into blank default project templates like Console applications, which is why it failed to compile initially. So, I went through and added this flag to the list of debug and release flags passed for the library. Now, if you develop for the library, it will outright fail if you do poor string usage. But I wanted to go further because we support developing on Linux and Mac. So I also introduced flags for them as well.

I do want to note, that currently, this merge request for these changes hasn't been merged and is still in active development. This is because another merge request needs to be combined first, and because I need to ensure the Mac flag is correct before merging completely, as Mac uses a Clang-based compiler for its C/C++ compiling.

After I made these changes, I had a sudden want to implement a couple other additions, mainly due to the bug that I will be working on. I wanted some changes to the library to make bug fixing a little easier. One of these changes is the introduction of general tags to unit tests.

This was easily one of the heftiest changes I made. It was the first outright change to code I've done without being asked and involved a lot of testing to ensure systems were working right. The code edits themselves were easy, but I felt I needed to "sell" my changes, since on the surface, they can look pretty generic or even unneeded.

What does this merge request actually do? What it does is allow for IDEs that are capable to test on tests by a specific tag. What this means is if you have a suite of tests that have an "Audio" tag, and you have dozens of others with different general tags, you can target only the tests with the "Audio" tag. This makes it so that you don't have to test all 2,349 tests, but just 40 instead. This is extremely useful for me, as I'm going to be testing audio a lot, and this change will save me having to wait for a minute each time to test audio. It'll instead cost me only 10-20 seconds. Not too bad.

After doing this change and sitting a little more on the project, I decided to tackle another issue. There was a problem with CMake on Linux. There are a few problems with it admittedly. 1) CMake fails to work with the build directory if that directory is not populated with at least one file, which makes it extremely frustrating if you want to remove the build directory for a completely clean build. 2) CMake lacked support for multi-config generators, meaning that if you tried using something like Ninja Multi-Config, it outright wouldn't work. Lastly, 3) there are a few single configuration generators (namely, Ninja) that won't work with CMake at all. I started trying to diagnose this issue to figure out what exactly was causing the problem. I ended up finding the issue pretty quickly.

This code here is the entire configuration for Linux in the main CMakeLists file. Besides some of the hard-to-read formatting, there was something I noticed very quickly.

The CMakeLists file required that these two directories exist. That's not good. Let me explain. Multi-configuration generators are generators that allow for multiple configurations within the same build. What this means is instead of having your build separated into a debug and release folder like with single configuration generators, you can instead have them stored inside of a main directory that will additionally store its own debug and release folder. It saves you from having to maintain multiple directories and also makes the general process of having multiple profiles for a single project much easier.

What happens is if I try to use a multi-configuration generator and point it to a build directory of "build/Desktop", it will fail to even make the project because it is expecting a specific layout for the build directory. So, I set to work on this problem.

The very first thing I chose to do for this was to let CMake manage not only the build directory but also put an empty file inside of it. If we don't make an empty file, we open the door to CMake not understanding the directory, failing to find libraries, and even its own make program. I'm not sure what it's like this, but we do these steps to prevent it. Lastly, we make the project itself. I'm not sure why in the old code the projects were made with different names, but it is generally unneeded, even with single configuration generators.

The next part is important, this is what will help configure multi-configuration generators. What this is doing is checking if the variable "CMAKE_CONFIGURATION_TYPES" is defined. This variable is only ever defined if you are currently using a multi-config setup. So, if it is, then we will set the types to the ones we want (Debug and Release). We also have a message that will print providing information about what generator you are using. This is more debug, but also useful in general as it tells you what step CMake is in currently for the CMakeLists.

Lastly, the biggest change. You may have noticed in the old code that each of these build configurations had their separate flags for the compiler.

Now, if you have a keen eye, you can remove a majority of these arguments as they are mostly the same with each configuration. The only difference between the two was "-g" being included for the debug configuration. Now, in the change I made, I provided a rather long comment. I'm going to quote this comment verbatim.

There is no configuration needed for debug and release as CMake will already handle this for us through the CMAKE_CXX_FLAGS_DEBUG and CMAKE_CXX_FLAGS_RELEASE variables. On multi-configuration build systems, CMake will automatically use CMAKE_CXX_FLAGS_DEBUG_INIT and CMAKE_CXX_FLAGS_RELEASE_INIT to set the flags for the respective configurations. The debug will use the commands `-g` and the release will use `-O3 -DNDEBUG`.

You may ask, how can I be sure of this? Well, I debugged the CMakeLists file and noticed the variable, I got curious and decided to test my theory with multiple generators, and found it to be true. Here is a snippet of a debug configuration I have using Ninja Multi-Config.

This is Ninja's form of a makefile that you are viewing. I took the effort to parse this and highlight the main part for your viewing. You'll notice in the list of flags I highlighted, you can see "-g" is included, even though I don't explicitly add it in CMake. Let's look at the release configuration from Ninja.

If you look here, you'll notice there is a difference. Instead of "-g", we get "-O3 -DNDEBUG", this is rather important, because it means that the explicit inclusion of this in the CMakeLists file was unneeded, CMake does this for us already. Good to know.

A lot of the other changes for this merge request included removing an error message that was not important due to it only reporting a library not being used (this library was "threads" which was excluded from being included in the project at one point in time). There was also a variable that was set for CodeLite, which I took the time to wrap in an if statement so it would only be set when you're using the CodeLite generator.

In total, this was a week that resulted in over 3 merge requests being made on top of the merge request for the bug that I'll be working on next week. I want to get all three merged first if possible so that way I can build, configure, and test the fix for this bug as comfortably as possible. For now, this is all the work that I've done this week.

This isn't a traditional format for how I do these blog posts, but due to the volume of changes I did, I felt all needed to be covered as they were each just as important to me.

Sunday, February 4, 2024

Month 2 - Week 4 - Working on Event Reporting in Linux

This week has been rather extensive. There was a lot of work involved with not just this class, but my other class as well. I wanted to pour in more time to get this specific bug we'll be talking about fixed and working correctly, however, I was not able to devote as much time as I would've liked. This is primarily because I had to spend a lot of time in the other class working on final assignments. So the majority of my time was spent ensuring that those assignments were handled and that they were turned in on time. As a result, the progress on this bug which I will be discussing in particular is rather slow, but I was able to make enough progress to be able to sufficiently know how to fix it next week.

To give a brief description of the bug, I want to put the key focus on the type of bug that we're dealing with. So whenever an average developer creates a window using our library, they tend to want to keep track of certain events that occur with that window. If you move a window, resize it, or maximize, or minimize it, you preferably want an event to be reported so that way the developer can create certain functionalities that can occur upon any of these events happening. So something I wanted to focus on was trying to determine why this bug occurred. A brief description of the bug is rather simple, if the user maximizes the window and then minimizes the maximized window, no minimize event is reported. It sounds simple on the surface but is actually much harder once you look into the nitty-gritty of it all.

One of the main issues in particular revolves around how these events are determined. I would also like to note that this bug is only occurring on Linux under the X11 library. Whenever an event occurs, the underlying architecture will report over to GWindow to handle it. This carries a couple properties, the main one in particular being the actual property return value. There are a lot of values this variable can be, but there are a few in particular that point to whatever is happening to the window. For instance, if the property value were equal to 384, then we can determine whatever happened to the window resulted in the window's property going to the max vertical height, this would normally mean a resize, maximize, or minimize event.

The first step that needed to be taken was trying to determine why the event wasn't being detected. There were a couple theories that I had, mostly revolving around the conditional logic of how the minimization event was being detected. I eventually concluded that none of the actual conditional logic was the problem. From my findings, it was a preliminary check that was acting up. So the whole reason the minimization event wasn't happening was due to that. The specific preliminary check that I'm referencing is a check that tries to determine if the window has changed all. It checks if there's anything different about the window's property, and by looking at it, if there's no difference, it will break out and report nothing. This is rather problematic though, because a minimization event doesn't inherently change the window. Whoever initially wrote the condition for this if statement also thought the same thing, as there is an additional condition applied to it, ensuring that the window isn't being minimized. However, this is where we face the actual bug.

Whenever a window is maximized and then minimized, the property return value does not report a minimized event. It reports a burst of 4-5 events, all being 384, or 384 with the last being 491 (I may be wrong as it's been a few days since I viewed the exact number). For simplicity, 384 is a property vertical max event, and 491 is unknown. None of these events are reporting the proper property hidden event code. This causes a problem. Since the property return value is no longer returning property hidden, and the window doesn't change due to it only being minimized, this if statement above executes and exits the function without reporting a single event.

There are a couple of things we could do to try and check for a minimize event during a maximize. One of the main checks is seeing if the previous event that was reported is a maximize event. If the previous event was a maximize event, and the property return value is 384, then we could report that a minimize event has occurred. We could also flip this condition to check if the previous event was a minimize event, so that way if you bring the window back up after minimizing, it reports a maximize. This is where the main problem occurs though, because I've already done this code, the window, due to the fact it reports these events in a burst of 4-5 events, we face the issue of it reporting multiple minimize and maximize events. So that leads us to figure out how to lessen that.

To clarify, in its current state, we do have minimize and maximize events reporting properly. The issue we currently have is that it's reporting them multiple times. We only want these events to report once, if it's reporting these events repeatedly, it can cause issues for whatever functionality might be blinded to these events. Since this library is used mostly for game development, you don't want to burst of minimize and maximize events being reported, there could be functionality tied to that game that may break behavior if it's being spammed with events like this. So this is why we need to figure out how to lessen it to just one event being reported.

One of the possible fixes is to use a bitmask or an array that tracks the burst of events, specifically, the property return value. Let's frame it like this. If the window is minimized while it's also maximized, we'll receive the burst of events, around 4-5. If we capture these numbers, and then check if they're equal to 384 or 491, we can tick a bit in the bitmask. We can then check if the bitmask equals 0x11110 or 0x11111. The reason for the two checks is if we only get four events and not five, we will have the last bit be 0, otherwise, they'll all be 1.

This is merely pseudo code, but it does show the idea of how this could work. By doing these checks, and then using the bitmask as a way to track how many times these events are reported, we can use the bitmask as a condition for the minimize and maximize. It would also force the event to be reported only once. This bitmask wouldn't be used anywhere else in the code and would be tracked only for this specific event. However, a change like this will require discussion with Lari and will have to be approved as this introduces more variables, and could have possible side-effects that I may be overlooking. Currently, this is where I am for this bug, but I feel I'm getting closer to fixing it. I feel if I'm allowed another week to tinker with it, I'll be able to get a solution made.

Edit (2/5/2024): I'm going to add a small correction to this blog post as I felt it was needed to correctly explain the bitmask. I made a mistake in my explanation of it and failed to represent the data accurately, this is due to most of my experience with bit manipulation revolving around color. To better explain the actual values, they would need to be displayed in proper binary. So instead of 0x11110 and 0x11111, it would be `0001 1110` or `0001 1111`. The pseudo-code would also be different compared to what was originally shown.

This should hopefully show more accurately what the code is checking for. Here we are checking if the bitmask is equal to `0x0001E` or `0x0001F`, which are just 30 and 31, or `0001 1110` and `0001 1111` respectfully. So this code should actually be more accurate to what the final code would look like.