- 05 Nov, 2020 1 commit
-
-
davean authored
Since the sha1 of the output didn't match the C++ version, we started investigating. We noticed clang++ and g++ actually produced different sha1s, but more importantly for us, we found that the unincremented depth was being used in one branch. Fixing that, we can generate an image with a sha1 identical to the clang++ version, so we are confident in saying that we are doing the same thing in the same way as the C++ code. Generating equal sha1s is sensitive to numerical inaccuracy, and generating identical sha1s is more reliable below 150x150. A ray-by-ray analysis shows some rays can differ in the last decimal digit of the Double, and we conjecture that this same floating-point instability accounts for the g++/clang++ hash difference.
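As a self-contained, hypothetical sketch (the real function and names differ), the bug had this shape: the termination test used the incremented depth, but one recursive branch still passed the unincremented one down.

```haskell
-- Hypothetical sketch of the bug's shape. In the original, one
-- branch recursed with 'depth' instead of 'depth'', so that path
-- terminated later than the C++ version and the hashes diverged.
bounce :: Double -> Int -> Double
bounce energy depth
  | depth' > 5   = 0
  | energy > 0.5 = energy + bounce (energy * 0.9) depth'
  | otherwise    = energy + bounce (energy * 0.5) depth'  -- was: depth
  where
    depth' = depth + 1
```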
-
- 03 Nov, 2020 16 commits
-
-
davean authored
We used +Inf to match the semantics of the Maybe, but that's not what the C++ code originally did. It set 1e20 as the horizon, and if we change our Haskell code to match, our instructions take fewer clock cycles.
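A minimal sketch of the change, with hypothetical names:

```haskell
-- A large finite horizon, as in the C++ original, instead of
-- +Infinity standing in for "no hit".
horizon :: Double
horizon = 1e20   -- previously: 1 / 0

-- Misses fall back to the horizon, so 'min' alone finds the hit.
nearestHit :: [Double] -> Double
nearestHit = foldr min horizon
```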
-
davean authored
The LLVM backend can pick up assembly-level optimizations that GHC has missed, particularly for numerically heavy code, which this is. It is not universally better than the default GHC backend though, so use it with some consideration.
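As a sketch, the backend can be enabled per module with an OPTIONS_GHC pragma (it can equally go in ghc-options in the .cabal file):

```haskell
-- Opt this module into the LLVM code generator; requires a
-- compatible LLVM toolchain on PATH. Drop the pragma to compare
-- against the default native code generator.
{-# OPTIONS_GHC -fllvm #-}
module Main where

main :: IO ()
main = print (sum [sin x | x <- [0, 0.001 .. 1000]] :: Double)
```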
-
davean authored
Strings are slow, and the code was even using a slow conversion path to Strings.
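A minimal sketch of the faster direction, assuming a ByteString Builder replaces the String output (the function and names are hypothetical):

```haskell
import qualified Data.ByteString.Builder as B
import System.IO (stdout)

-- Hypothetical sketch: assemble the PPM output with a Builder
-- instead of concatenating Strings produced by 'show'.
writePPM :: Int -> Int -> [(Int, Int, Int)] -> IO ()
writePPM w h pixels = B.hPutBuilder stdout $
     B.string7 "P3\n"
  <> B.intDec w <> B.char7 ' ' <> B.intDec h <> B.string7 "\n255\n"
  <> mconcat [ B.intDec r <> B.char7 ' ' <> B.intDec g <> B.char7 ' '
               <> B.intDec b <> B.char7 '\n'
             | (r, g, b) <- pixels ]
```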
-
davean authored
It's clear that f will consume these, so making it strict can have some benefit. The problem is finding the right level here. One's first take might be to make the parameters of 'f' strict, and that can have benefits, but it leaves a lot on the table. Moving to an unboxed tuple performs the computation faster but introduces copying. Our 'T' datatype here has the right balance: it makes it clear the data is strict but doesn't copy.
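A sketch of the levels in play, with hypothetical names:

```haskell
{-# LANGUAGE BangPatterns #-}

-- First take: strict parameters. Helps, but leaves a lot on the table.
f1 :: Double -> Double -> Double
f1 !a !b = a * b + a

-- An unboxed tuple (# Double, Double #) computes faster still, but
-- introduces copying. The balance struck here: a datatype with
-- strict fields, evaluated once and then shared by reference.
data T = T !Double !Double

f3 :: T -> Double
f3 (T a b) = a * b + a
```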
-
davean authored
This removes the list, and thus a potential level of indirection and branching. Sadly GHC did not do this for us even though the list was static.
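A self-contained sketch of the shape of the change (the scene and names are hypothetical):

```haskell
data Sphere = Sphere !Double !Double !Double !Double  -- centre x y z, radius

s1, s2, s3 :: Sphere
s1 = Sphere 0 0 0 1
s2 = Sphere 1 2 0 2
s3 = Sphere 0 4 1 3

radius :: Sphere -> Double
radius (Sphere _ _ _ r) = r

-- Before: minimum (map radius [s1, s2, s3]) walks cons cells and
-- branches on list patterns. After: a hand-unrolled chain.
smallest :: Double
smallest = radius s1 `min` radius s2 `min` radius s3
```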
-
davean authored
Our innermost functions are of critical importance. Here we remove a Maybe, which significantly reduces the boxing (it could have been mitigated with a StrictMaybe) and the case analysis. Since a Ray that fails to intersect anything can be said to intersect at infinity, Double already covers the structure at play. This also reduces allocation.
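A minimal sketch of the representation change (names hypothetical):

```haskell
infinity :: Double
infinity = 1 / 0

-- Before: a miss was 'Nothing', boxing every hit in a 'Just'.
intersectMaybe :: Double -> Maybe Double
intersectMaybe disc = if disc < 0 then Nothing else Just (sqrt disc)

-- After: a miss is a hit at infinity, so the nearest intersection
-- is a plain 'min' over unboxable Doubles with no case analysis.
intersect :: Double -> Double
intersect disc = if disc < 0 then infinity else sqrt disc
```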
-
davean authored
Here we go through the program and apply some considered strictness to it. Forcing computation we don't need to do, or forcing it too far before we need it, is a loss. When we put a bang pattern in, GHC doesn't move it for us, so we have to hand-float it to an appropriate location.
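For instance (a hypothetical sketch), a well-placed bang sits exactly where the value is consumed:

```haskell
{-# LANGUAGE BangPatterns #-}

-- The bang on the accumulator at the recursion site keeps the loop
-- in a tight, unboxed form; a bang elsewhere would not be floated
-- here by GHC on our behalf.
sumSquares :: [Double] -> Double
sumSquares = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x * x) xs
```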
-
davean authored
Some inspection shows us which bang patterns are actually carrying the weight. Most are unnecessary.
-
davean authored
This does make a difference, but a small one. The question is which ones matter.
-
davean authored
By removing mutability we remove a lot of barriers to optimization. There is the potential for fusion that was formerly blocked by the references and the mutable vector, and the passes over y become independent should we ever want to parallelize processing. Reordering how we traverse the 'y's lets us skip the intermediary structure entirely and generate the result directly. Since we're generating results directly, we just produce a list and drop the 'i' placement variable, which means the y rows now come out in reverse order (this does potentially change edge cases, and parameter validation should be added to avoid any issues with degenerate parameters).
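A sketch of the resulting shape, with hypothetical names:

```haskell
type Pixel = (Double, Double, Double)

-- The image is generated directly as a pure list, walking 'y' in
-- reverse so the output matches the old vector layout without a
-- placement index 'i' or any mutable buffer.
render :: Int -> Int -> (Int -> Int -> Pixel) -> [Pixel]
render w h pixelAt =
  [ pixelAt x y | y <- [h - 1, h - 2 .. 0], x <- [0 .. w - 1] ]
```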
-
davean authored
Using C via the FFI isn't inherently faster. C isn't always faster, and there is always some overhead wherever there is an impedance mismatch between runtimes, which there almost always is. Haskell's C FFI is fairly light, but there is still some bookkeeping that has to be done.
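For scale, even the lightest possible import still pays the call overhead (a sketch; 'erf' merely stands in for any small C function):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

-- An 'unsafe' import skips the runtime's safe-call bookkeeping,
-- but the foreign call itself is still not free.
foreign import ccall unsafe "math.h erf"
  c_erf :: Double -> Double
```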
-
davean authored
This is a finicky optimization. At first it actually benched worse than the maximum version. The strictness on the binding is required for reliable performance, but it seemed sensible to only apply it where the value is actually used, since it's in a single branch and no longer lazy.
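A hypothetical sketch of the pattern, with the strict binding confined to the one branch that consumes it:

```haskell
{-# LANGUAGE BangPatterns #-}

-- The strict 'let' lives only in the branch that uses the value;
-- without the bang, performance here was unreliable.
step :: Double -> Double -> Double
step x limit
  | x < limit = x
  | otherwise = let !x' = x - limit in x' * x'
```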
-
davean authored
While we weren't able to unpack Refl in the last step because it is a sum type, we can if we use a trick modeled after a C enum. The use of COMPLETE is unsafe here because other values could be constructed; for this small example, though, we rely on the fact that we only create values of Refl via the patterns.
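A minimal sketch of the trick (the constructor names follow the C++ original's DIFF/SPEC/REFR enum; the Int encoding is an assumption):

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Refl becomes a newtype over Int so it can be UNPACKed into a
-- parent constructor; pattern synonyms stand in for the old
-- constructors, and COMPLETE satisfies the exhaustiveness checker.
newtype Refl = Refl Int

pattern DIFF :: Refl
pattern DIFF = Refl 0

pattern SPEC :: Refl
pattern SPEC = Refl 1

pattern REFR :: Refl
pattern REFR = Refl 2

{-# COMPLETE DIFF, SPEC, REFR #-}
```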
-
davean authored
This optimization removes indirection and laziness. We want to leave the Vecs in Ray as references, though, to avoid copying and to allow more optimizations.
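A sketch of the resulting field layout (assuming the smallpt-style Vec and Ray shapes):

```haskell
-- Scalar fields are strict and unpacked; the Vecs in Ray stay as
-- strict references, so a Ray shares them rather than copying
-- their Doubles into every constructor.
data Vec = Vec {-# UNPACK #-} !Double
               {-# UNPACK #-} !Double
               {-# UNPACK #-} !Double

data Ray = Ray !Vec !Vec  -- origin and direction: strict, not unpacked
```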
-
davean authored
Knowing which functions are used, and how, enables many optimizations that would otherwise be detrimental or inapplicable.
-
davean authored
Additionally, for clarity, vector operators have been made infix, and Foreign.C.Types newtype wrappers have been replaced with their Haskell names.
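A sketch of both changes (the operator names are hypothetical):

```haskell
-- Plain Double replaces CDouble from Foreign.C.Types, and the
-- vector operations read infix.
data Vec = Vec Double Double Double

infixl 6 |+|
infixl 7 |*|

(|+|) :: Vec -> Vec -> Vec
Vec a b c |+| Vec x y z = Vec (a + x) (b + y) (c + z)

(|*|) :: Vec -> Double -> Vec
Vec a b c |*| s = Vec (a * s) (b * s) (c * s)

-- e.g. origin |+| direction |*| t
```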
-