1. 05 Nov, 2020 1 commit
    • davean's avatar
      Fix difference with C++ version. · 9ee060dc
      davean authored
      Since the sha1 of the output didn't match the C++ version we
      started investigating. We noticed clang++ and g++ actually
      produced different sha1s, but more importantly for us, we found
      that the unincrimented depth was being used in one branch. Fixing
      that, we can generate an image with an identical sha1 to the
      clang++ version so we are confident in saying that we are doing
      the same thing in the same way as the C++ code.
      
      Generating equal sha1s is sensative to numerical inaccuracy and
      generating identical sha1s is more reliable < 150x150. A ray-by-ray
      analysis shows some rays can differ in the last decimal digit of the
      Double, and we conjecture that this same floating point instability
      accounts for the g++/clang++ hash difference.
      9ee060dc
  2. 03 Nov, 2020 16 commits
    • davean's avatar
      Avoid CPU ieee754 slow paths. · d6347455
      davean authored
      We used +Inf to match the Maybeness, but thats not what the C++
      code actually origionally did. It set 1e20 s the horizon, and if
      we change our Haskell code to match, our instructions take fewer
      clock cycles.
      d6347455
    • davean's avatar
      Use LLVM backend. · 16c5641e
      davean authored
      The LLVM backend can pick up assembly level optimizations that GHC
      has missed. Particularly for numerical heavy code, which this is.
      It is not universally better than the default GHC backend though,
      so use with some consideration.
      16c5641e
    • davean's avatar
      Optimize file writing. · 4e141a75
      davean authored
      Strings are slow and the code was even using a slow conversion
      path to strings.
      4e141a75
    • davean's avatar
      Custom datatype for 'intersects' parameter passing. · 07d68dea
      davean authored
      Its clear that f will consume these, so making it strict can have
      some benefit. The problem is the right level here. One's first
      take might be to set the parameters of 'f' to be strict, and that
      can have benefits, but it leaves a lot on the table. Moving to
      and unboxed tuple performs the computation faster but introduces
      copying. Out 'T' datatype here has the right balance, it makes it
      clear the data is strict but doesn't copy.
      07d68dea
    • davean's avatar
      Hand unroll the fold in intersects. · 80931559
      davean authored
      This removes the list and this a potential level of indirections
      and branches. Sadly GHC did not do this for us even though the
      list was static.
      80931559
    • davean's avatar
      Remove Maybe from intersect(s). · 149bc9fa
      davean authored
      Our innermost functions are of critical importance. Here we remove
      a Maybe which significantly reduces the boxing (which could have
      been mitigated with a StrictMaybe) and the cases.
      
      Since a Ray that fails to intersect something can be said to
      intersect at infinity, Double already actually covers the
      structure at play.
      
      This also reduces allocation.
      149bc9fa
    • davean's avatar
      Strategic application of strictness. · 7911c91f
      davean authored
      Here we go through the program and apply some considered strictness
      to it. Forcing computation we don't need to do, or often forcing it
      to far before we need it is a loss. When we put a bang pattern in,
      GHC doesn't move it so we have to hand float it to an appropriate
      location.
      7911c91f
    • davean's avatar
      Reduce to only useful strictnesses. · 361eb1e1
      davean authored
      Some inspection shows us which bang patters are actually carrying
      the weight. Most are unnecessary.
      361eb1e1
    • davean's avatar
      Set everything in smallpt to be strict. · 5caf84c7
      davean authored
      This does make a difference but a small one. The question is which
      ones matter.
      5caf84c7
    • davean's avatar
      Remove mutability. · eb9a3c19
      davean authored
      By removing mutability we remove a lot of barrier to optimization.
      There is the potential for fusion that was formerly blocked by
      references and the mutable vector, in addition to the passes on
      y being independent if we ever wanted to parallelize procesing.
      
      Reordering the order we do 'y's in allows us to not use an
      intermediary structure at all and directly generate  the result.
      Since we're generating results directly, we just produce a list,
      and don't bother with the 'i' variable for placement, which means
      we want to reorder which y rows we generate first. So we generate
      them in the reverse order now (but this does potentially change
      edge cases and parameter validation should be added to avoid any
      issues with degenerate perameters).
      eb9a3c19
    • davean's avatar
      Convert erand48 to pure Haskell. · 5704ace8
      davean authored
      Using C via FFI isn't inherently faster. C isn't always faster and
      there always exists some overhead when there is any impedance which
      almost always exists between runtimes. Haskell's C FFI is fairly
      light but there is still some book keeping that has to be done.
      5704ace8
    • davean's avatar
      Change from maximum on a list to max. · 66c39a61
      davean authored
      This is a finicky optimization. At first it actually benched worse
      than the maximum version. The strictness on the binding is required
      for reliable performance, but it seemed sensible to only do that
      where it was actually used since its in a single branch and no
      longer lazy.
      66c39a61
    • davean's avatar
      Use a pattern synonym to unpack Refl in Sphere. · fc364302
      davean authored
      While we weren't able to unpack Refl in the last step because it
      is a sum type, we can if we use a trick modeled after an C Enum.
      The use of COMPLETE is unsafe here because other values could be
      constructed. For this small example though we rely on the fact
      that we create values of Refl via the patterns.
      fc364302
    • davean's avatar
      Mark entries of Ray and Sphere as UNPACK and Strict. · f3819f8a
      davean authored
      This optimization removes indirection and laziness.
      We want to leave the Vecs in Ray as references though
      to avoid copying and allow more optimizations.
      f3819f8a
    • davean's avatar
      Restrict the export list to 'main'. · 76a82d53
      davean authored
      Knowing what functions are used how enables many optimizations
      that could otherwise would be detrimental or inapplicable.
      76a82d53
    • davean's avatar
      Warning cleanup import of smallpt.hs and smallpt.cpp. · 0e242c58
      davean authored
      Additionally for clarity, vector operators have been made infix
      and Foreign.C.Types newtype wrappers have been replaced with
      their Haskell names.
      0e242c58