FS Gateware: January 2024

Sunday, January 28, 2024

Month 2 - Week 3 - The GController Bug is Fixed

Last week I talked about a bug I was facing involving GController. To summarize, if you have a controller already plugged in before the GController is initialized, GController fails to properly "see" the controller. This was a rather tedious issue at first, primarily due to the fact I didn't realize what kind of bug this was. It was only later during this week that I figured out the bug was around threading and event callbacks. This required two changes primarily. The first was the unit test needed alteration so that it could properly locate and trigger the bug consistently, and the second was the controller setup needed to happen manually before the event callback was called.

The unit test was rather easy, as all it needed was a simple addition to the code. This addition is to make the main thread sleep. If the theory is that the event callback works on a separate thread, then we can make our thread rest while the event callback thread does what it needs. So, if the unit test is being done correctly, it should start and initialize the GController, get the connected controller count through GController, wait for one second, and then do a manual count of controllers connected. What this should do is find an inaccuracy. If the user has a single controller connected, the count when GController is initially made should be one. If we wait for one second and then check again, the count from the manual check should also be one. However, with the bug present, what happens is the initial count is zero, and then after one second of waiting, the count in the manual check is one. Since both counts don't match, there is a discrepancy, resulting in the test failing. This means the test located the bug happening, and that there is a bug present.

The second change was easier, as all I needed to do, was take the event callback lambda that was being used and store it inside of an auto variable. This variable will effectively become a function built solely for this initialized function inside of GController. We can then pass this variable to the daemon to make the event callback, and then call the lambda function itself after. This should make it so that when GController is initialized, it will immediately have the information it needs to properly work with the controllers as intended.

From this, I waited for the next meeting and then started discussions with Lari about this bug fix. We eventually concluded that the bug was fixed through his testing, and we discovered something else that was also interesting. Lari mentioned to me that there was another bug with manual unit tests for GController. What would happen is that GController would initialize in one of the manual tests, and if the test were to fail and move to the next manual test, the next one would fail outright, refusing to see any input mapping on the controller, resulting in missed inputs. When he went to show me this bug in action, however, he couldn't recreate the issue. I started diagnosing the problem and reading through the code but came to the conclusion that the bug I had just fixed was also responsible for the bug he was experiencing with the manual tests.

My theory is that the previous test wasn't breaking the next test, but more so, the test that was failing was due to the event callback not happening in time, thus the controller is not seen nor initialized. This would result in the input mapping not being configured correctly, and thus, when the test is reading or looking for input, the controller isn't registered, making the input "miss" even if you press the right button. So, indirectly, I ended up fixing two bugs at the same time without knowing it. I'm happy I did, however, because the more stable the code is for consumers and developers, the better.

Now, this would normally be the end of the post, however, I need to also talk about my next bug. There is a bug right now that involves the infamously hard-to-work-with X11 framework in Linux.

This bug is specifically around event reporting, or lack thereof with X11. The way this bug happens is simple. Take a GWindow, maximize it, and then minimize it. If you do this, you will have no minimize event reported due to the window being maximized. This is mainly due to conditional issues in the code. I've already located the problematic source in particular and will be working on it soon. The only reason I haven't started yet is due to the setup and steps I'm taking to properly get everything working correctly.

I decided that to tackle this bug, I would install Arch Linux on my system, and get it configured for development, so that way, I can fix the bug on my main workstation. The conditional check in specific is already located and marked as a comment on the merge request.

So since this bug has already been located and marked, I just have to prepare my workstation to resolve this bug, which I'm thankfully almost done with.

So far, I'm using an Arch Linux manual installation with Fish as my shell, I have KDE Plasma running under Xorg as my main desktop GUI experience. I'm also prepping some specific IDE-related things to get all of the code work ready. I should be finished with all of the configuring tomorrow, I just have some things with Vulkan that I need to take care of, and then I will be good to go. This bug will be my primary focus next week.

Sunday, January 21, 2024

Month 2 - Week 2 - GController Bug Investigation

Last week I spent time correcting and fixing a bug relating to macro usage and the context of what the macros actually represented. I'm happy to say this bug has been fixed and is now merged. Stepping further into GController, however, there are other bugs present that I decided to investigate. Particularly, there is one I noticed that piqued my interest.

I read through this issue and immediately wanted to diagnose it. At hand, the issue is rather simple. If you have a controller already plugged in before the GController is initialized, GController fails to properly "see" it, resulting in inaccurately reported information. For instance, if you were to call the function `GetNumConnected` which returns the total connected controllers, it would return zero instead of one if you had a single controller plugged in before GController is initialized. I decided to make a little unit test that would check for this inaccuracy, and then determine how GController would react. GController initially reported the unit test passing, which didn't sound right. I updated the unit test a little more, this time to provide printouts for the reported number of controllers compared against a manually checked reported number. To my surprise, they both were reporting zero, meaning that GController didn't even register that the controller was there at all.

This misreported number is unsettling, especially because the manual check is supposed to catch the inaccuracy and fail the test. After around two hours of researching, I reran the test, and amazingly, it failed. It worked as expected. The controllers connected reported zero, however, the confirmed controllers connected reported one. So, off the unit test, we have a bug that may or may not show at all. I decided to dig into the code at this point and debug.

From the debugging, I came across code internally that seems to be the problem:

In the initialize function, we see an event callback that is set up for XInput. In this event callback, we can start to piece together how the function is working. We call onto XInput to get the state of each controller slot, if the controller is connected, it will then work on setting up the controller, mapping what's needed, and then revealing it's connected. Something of note is that this code is bug-free. It works as expected, and also detects my controller fine. The problem is it is an event callback. The reason this becomes an issue is simple.

What if, by chance, you initialize GController and test for a controller before the event callback is called? What happens is the bug we're experiencing. This callback is being handled on a separate thread (if my information is correct). So what happens is the code we call and execute is happening too fast, and due to that, XInput hasn't initialized the controllers yet, which results in GController showing that no controller is plugged in. This means one of two things. We either need to initialize the controllers manually before XInput does, or we delay the program long enough for XInput to initialize itself. There is possibly another way, which is initializing XInput, and then calling the callback itself after (this assumes the back-end is created for XInput to allow this), but currently, these are the options left to choose from.

There is no resolution for this issue currently as I'm still working on getting it completed and fixed. From what's inspected though, I have a pretty good idea of what's happening now. So all that is needed is to test more, make changes, and see if this can be corrected.

Sunday, January 14, 2024

Month 2 - Week 1 - Fixing GController's Incorrect Macro Use

This week consisted of a lot of work from the previous month. Some of it included more MSAA testing, finalizing changes, and last touches, but there were some new things. One in particular was a bug I had assigned myself to last month and haven't had the time to fix, at least until now. With MSAA implemented, I took a look at it again and decided I would work on getting it fully fixed this week. This bug is directly related to the GController, specifically a macro called `G_MAX_CONTROLLER_INDEX`. The problem with this macro was that it was being incorrectly used. In some cases, it was treated as the max count of controllers, but in others, it was treated as the max index for the controllers array. The name of the macro itself can cause confusion upon seeing how it's used in code, and I figured it would be a perfect bug to work on right after finishing my first major feature.

We can see in the issue that it had been sitting for over a year. Admittedly, this issue is rather small, and not a major problem. However, it has been a source of confusion. Due to the name it carries, users of the library could end up mistaking it as the max index for the array, when it's more a max count of controllers allowed. To approach the issue, I waited until the Wednesday meeting where I could talk with Lari, my mentor, and Colby, the discoverer and creator of the issue above. We discussed how the changes would have to be handled, what ways it could be fixed, and the possible problems that fall into editing GController itself.

The route we chose for this fix was a rename. Simple enough. However, we needed to correct parts of the code that were using the macro as a max index. The rename would also modify three macros: `G_MAX_CONTROLLER_INDEX`, `G_MAX_XBOX_CONTROLLER_INDEX`, and `G_MAX_XBOX_CONTROLLER_INDEX_XBOXONE`. For each of these macros, we will rename them so that `INDEX` is replaced with `COUNT`. We have to update these changes throughout the source code and ensure that they are being used properly. There are some problems with the changes being made, however.

Firstly, testing any changes to the code is going to be a process. Since I only have one controller, an Xbox One controller, I can't test the functionality of what's being edited. This testing has to be done by Lari, as he has four controllers that can be used. Secondly, we have to modify some logic in the code, primarily the array bound checking that is being done for `GetState` and `IsConnected`. These checks are currently using the old macros, and they're also using them incorrectly, checking one over the array length (due to zero-based indexing). Lastly, we have to ensure that it runs cross-platform. We can't do this through traditional means as we don't have the proper tooling for it. Our only workaround for this is the runners for GitLab, but they can't test physical controllers, so only the basics. This means every edit we do has to be thoroughly thought out and looked over before they are pushed. This is to ensure we don't indirectly introduce a bug to the codebase, only to repeat the process all over again.

So with the possible issues known, we approached multiple ways to fix them. One has already been mentioned, that being Lari testing the primary manual tests with controllers for Windows. This will cross out Windows and UWP from the list of platforms needing testing. Mac and Linux will have to be handled differently. For those platforms, we will focus more on ensuring code changes are light and efficient. The changes done for them have to be reviewed and have to be ensured to have no logical errors in them. It's not the best fix, but it's the best we have currently. Then, naturally, there are the runners, that will make sure the code compiles okay, and that nothing will slip through for a majority of other tests.

Another fix is the logic changes. I decided that after renaming the macros, I would introduce three new macros that have the old name, however, these macros will equal the max count by subtracting one for zero-based indexing. This means that max count exists for physical controller count, and max index is for array indexing. They are separated. Because of this, I can replace instances of index checking in the Win32, UWP, Mac, and Linux code with the new index macro. This will keep readability while also showing how both macros are intended to be used.

After these changes are done, we can reflect them in the actual code itself.

Then we also update the needed logic checks to better fit the new index macros.

You can probably tell that since the index macros contain the same name as the old count macros, we don't have to replace the macro name, more the logic itself so it is more in line with expected functionality. If the controller index is equal to the max index macro now, it is on the last controller in the array, compared to before where it needs to be greater than or equal to the macro. There was also worry with the old macros since the index checks could be checking one over the array's size, possibly causing overflowing in the library code itself, which is no good. The replacements and logic changes here feel more grounded, and ensure that everything is being used for proper purpose.

With all of the changes done, the main thing to do now is await feedback, have manual testing done, and ensure that all of the code is correct by code reviewing. Once these are handled, then the bug fix can be merged, and I can move on to the next bug on the list. The general workflow for this was very easy, and I didn't find it all too hard to get into. After implementing my first feature, I understand much more about how this library functions, what each class does, and how the code is compiled and formed. After getting a grasp of everything needed, I find the bug fixing to be more of a cooldown, especially in comparison to the nearly 2,600 lines of code I added to tests alone for the MSAA implementation.