Learning from planner performance

The planning community has amassed a large body of publicly available problems written in a standardized input language, along with planners that accept that language. We seized this remarkable opportunity to collect data about how some of these planners perform on the benchmark problems. We analyzed the resulting data to learn about the state of the art in classical planning.

Our analyses are retrospective, prescriptive and prospective. The first analyses are retrospective and prescriptive in that they characterize the problems and planners in terms of difficulty, diversity and trends over time. We statistically confirm that problem sets have become more difficult and that new planners are generally more capable. A visualization of planner success on domains shows how the domains distinguish performance. 

The second analyses automatically learn models of success and time for each planner. The models are constructed from easily extracted features of problems and domains and use off-the-shelf machine learning techniques. We find the models of success to be extremely accurate, but the models of time to be less so. These analyses too are both retrospective and prescriptive in demonstrating the predictability of current planner performance.

In a third analysis, we apply the data to an existing explanatory model that relates the topology of the search space to planner performance. Our study validates previous results linking search topology with planner performance, and does so on a wider set of planners than the original study used.

Finally, we fill in some gaps in the observed performance on the benchmark problems by constructing new problems; these problems do turn out to be more challenging. This study of existing and new problems and planners is prescriptive and prospective in that the results should help guide researchers in comparatively evaluating their planners and suggest the need for additional effort.

These analyses highlight the importance of problems in driving research in planning. We show how much can be accomplished with the available resources and point out how much more can be done by broadening the problems available and by learning from what has already been done.