This Summer I am doing work on Herbie with Pavel Panchekha and Zach Tatlock. In particular I am working to classify floating point expressions to come up with a taxonomy describing why some expressions are inaccurate and in what ways, as well as to improve Herbie after figuring out what methods improve the types of expressions that floating point tools currently struggle on. I will be working extensively with Herbie on this, using various parts of its engine to analyze these expressions and see which parts of Herbie improve them.
Herbie has a fairly extensive set of tests that we use for debugging and performance metrics, including a set of “challenge” problems which Herbie and/or other existing floating point tools don’t handle well. These problems can be found here. Herbie flat out fails to get any result for several of these tests, and even on the ones it runs on, it is only able to make a minor improvement, or none at all. Additionally, there are several other expressions Herbie makes significant improvement on, but which still have significant error such as the compound interest expression.
Herbie employs several different methods for improving floating point expressions including expanding expressions in a Taylor Series and branching expressions into different regimes. One thing that we don’t know about Herbie is how much each strategy improves these expressions and how often it will make significant improvement with each strategy. More generally, we don’t know what kinds of equations can be improved by each strategy, or which types of expressions can be improved by existing strategies in any tool. It’s entirely possible that some classes of floating point expressions simply can’t be improved and other strategies like using fixed point or arbitrary precision calculations might be the best choice. There is a lot that we don’t know about floating point expressions, and I hope to answer some of those questions over the Summer.
Two problems that I find particularly interesting are Toniolo and Linder Equation 13 () and Area of a Triangle (). From a cursory glance, it seems like these are best solved by better regimes are better input sampling respectively, so those will be the first to approaches that I investigate.