domingo, 6 de março de 2016

Games in C++ can't be that hard, right? - Shared Pointers and Ownership.

        C++ was my favorite language back then, at college, and even before. Fun were the days where Segmentation Fault wouldn't immediately translate to time or money wasted around, nor unhappy clients contacting the team. Right after graduating I've started using C# with Unity 3d and got into the Garbage Collected world – and I've stayed with C# from the very then.

        I've decided to start a simple C++ game on my free time. It can't be that hard, right? And the game will run a lot faster, right? Actually, both are wrong. C++ can be very hard if you aren't used to think in terms memory management, resources ownership and mapping, and even when and how their memory will be released. Even with smart pointers and RAII, everything can eventually go south. In fact, even in Garbage Collected environments with RAII and everything else, things WILL eventually go south... And about being faster, even if the compiled code do have raw speed in terms of execution time, a lot more often than not, slowness is caused by design, code architecture, algorithmic decisions and so on. The end result is that in hands of good programmers, it's possible that games in C# or Java run faster simply because they're easier to profile, debug and refactor – and refactoring is as time demanding as it is need for any project survival as it grows.

        Enough about that! I've decided to use c++ and so I've started searching around about what good practices there are when using the "new" standard smart pointers. "New" because they've already existed in Boost since like ever. Then I found this amazing stack overflow answer, by David (quoting here):

Personally, this is how I (more or less) do it:
  • unique_ptrs are for sole ownership
  • raw pointers mean whoever gave me the raw pointer guarantees the lifetime of that object to match or exceed my lifetime.
  • shared_ptrs are for shared ownership
  • weak_ptrs are for when a system wants to check if the object still exists before using it. This is rare in my code since I find it cleaner to have a system guarantee the lifetime of anything it passes it's subsystems (in which case I use a raw pointer)
By far I use more unique_ptrs than shared_ptrs, and more raw pointers than weak pointers.

        These guidelines are indeed very accurate and simple, as, in my opinion, smart pointers greatest virtue is showing intention over “simply” managing memory. There are, of course, exceptions where those guidelines can't be properly followed, but exceptions exists for everything in the coding world. Looking at smart pointers types, we can see that there's an immediate synergy between unique_ptr and raw pointers, and shared_ptr and weak_ptr. It's simple, right? Can't be that hard. Again, this turns out to be wrong. Not because of the guidelines nor the pointers, but because of the problem that came way before them: resource ownership.

        I've structure a simple Entity-Components-System for a game. I want Entities to have a list of components they hold, but I want actual components data to be stored somewhere else, in a ComponentsContainer class. The reason is that I want my systems to be able to iterate over all components of a single type easily.

        I bet that didn't make any sense, right? Well, I've imposed a few restrictions on each part of this structure:

  • [c1] Components can have methods, but "high level" methods such as "Update" or "Render" are desired to be in a System, which will iterate over all components executing their logic.
  • [c2] Components must be able to store references to other components, of the same or any other entity. They must have means of easily checking if that component still exists.
  • [c3] Components can be destroyed anytime, by any component or System.
  • [c4] Components should be automatically deleted if no one references them, if possible.
  • [c5] Components are owned by a single entity.

  • [e1] They have a unique identifier
  • [e2] Must not have logic other than helping methods
  • [e3] They uniquely own their components

  • [s1] Systems have the responsibility of iterating over components of a single type (the one that the system works with), executing logic, rendering, or whatever else it is supposed to do (i.e spatial partitioning for another system, such as collision testing)

        There are problems with [s1], [e3], [c2] and [c5].

        Because of [c2], it is desired that the entities own each component by means of a shared_ptr, even if they are the unique owners. With this, references to any component can be weak_ptr, and we can check whether they (the referenced components) have been deleted or not. Alternatively, using a unique_ptr would mean our references are now raw pointers. By the guidelines above, a raw pointer indicates that the resource pointed to will outlive the component, which is not guaranteed because of [c3]. With shared_ptr, actually, the solution for this is kind of straight away: entities must not ever allow another shared_ptr share the component, pushing us back to the original "unique" ownership.

        Now, [s1] says that we need means to iterate over all components of a single type. It is possible to achieve this by simply iterating over all entities, searching for components of a specific type. There are some ways of achieving this, such as storing a matrix of entities X components, etc, but I personally don't like the concept of checking if an entity has a component in order for us to update it, in a System. If I want to iterate over all components of a type, it does not mean I am also interested in checking if they exists. Checking for this in Systems might be made later, as a need for deep refactoring.

        Trying to not over-engineer, the components storing can be moved out of the entities, into a data container. A data container will also make some stuff a little bit simpler, such as factories and dependency injection, if I decide on them later. I want these characteristics:

  • [d1] They must work with ranged for loops. Come on, that's cool! Every iteration must be over a valid component (in the context of allocated and alive)
  • [d2] They do not need to worry about specific alternative accessing methods. For example, methods to allow accessing specific elements of an octree is too specific. The system will have to know the container type anyway, to do this effectively.
  • [d3] They must not force any specific type of structuring. This means that they might be used as vectors, lists, maps, trees, octrees, etc
  • [d4] They must be able to store data contiguously
  • [d5] Reordering elements should not break any references
  • [d6] Data not used by anyone should be deleted, to avoid leaks.

        The rule [d4] is a critical one. It is listed here to avoid the need of some refactoring later. [d4] together with [d1] permits intuitive (for the System) sequential access over contiguous memory data. Even if I don't want to implement this right now (and I dont), this will inevitably become an important optimization later, as it allows for cache-friendly access.

        With the data container, the components won't be stored in entities anymore, although entities still are their unique owners. Simple? No. There is a problem here. If the data is contiguous, then it's very likely that the components are stored by value in an array. We just can't delete it. However, were it in another data structure, such as tree of elements in the heap, then deleting would be a must. Fortunately, smart_ptr allows us to pass a function – better yet, a lambda - to use as deleting logic. We can pass this complexity back to the container, making it create the new component for us and returning a shared_ptr with the deleter set.

        However, it turns out that smart pointers aren't enough anymore, because of [d5]. If the container reorders the elements, how am I supposed to update the references in the entities? They need to point to something else, instead of the data itself: maybe a handle – and then now we have a smart pointer to a handle. It is kind of like de-referencing twice, to access the data we want.

        Since we have arrived at handles, now it is hard to let go of them. They can still exists even if the resourced they "handle" is not around anymore. If we can update a handle when the resource is deleted, we can make it either return nothing, or return something to expose the error. For example, a handle to a sound effect could return a default error-like sound when accessed after the original sound has been unloaded. This means allowing the data container being able to unload the components even if they are still being used by other entities. The use for this sounds dubious for components, but it is interesting if we can make the container generic.

        So, how to structure everything with handles? I have no idea. I'll be thinking about it and then make a new post in the future.

Nenhum comentário:

Postar um comentário