Created on 2026-03-23.14:23:43 by masataro, last changed by gabi.
The SAS file format has worked hard long enough. I propose replacing it with JSON, backed by a proper JSON schema. There are multiple benefits.
1. Better interoperability with a wide range of other applications.
While much work on action model learning, including my Latplan but also recent LLM-based approaches, produces PDDL so that the outputs are compatible with classical planners, it sometimes makes much more sense to model the tasks in SAS with categorical random variables rather than propositional/Boolean/Bernoulli variables.
Allowing multi-valued variables may also improve the performance of the current, not-so-great neural planning solvers written in GPU frameworks. Let's admit it: not being able to use modern GPU frameworks is the biggest hurdle for implementing a GPU-based planner. Python/PyTorch is slow, but there are better JIT frameworks with minimal performance drawbacks, such as JAX (Python) and Julia-based ones. Anyone with sound engineering principles would first try those to see whether a proof of concept works before writing a custom CUDA kernel. But then there is a SAS format that is difficult for them to load. Removing such hurdles one by one makes classical planning available to many who don't want to write C++ (although some people, including myself, have mixed feelings about that).
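To illustrate how low the loading hurdle could become, here is a minimal sketch using only the standard json module and plain lists. The JSON layout (variables with named domains, an integer per variable for the initial state) is a hypothetical illustration, not a proposed schema:

```python
import json

# Hypothetical JSON layout for a SAS task (an assumption, not an agreed schema).
task_text = """
{
  "variables": [
    {"name": "location", "domain": ["kitchen", "hall", "garden"]},
    {"name": "holding",  "domain": ["nothing", "key"]}
  ],
  "initial_state": [0, 1]
}
"""

task = json.loads(task_text)

def one_hot_state(task):
    """Encode the initial state as one one-hot vector per variable,
    the representation a JAX/PyTorch planner would typically consume."""
    vectors = []
    for var, value in zip(task["variables"], task["initial_state"]):
        vec = [0.0] * len(var["domain"])
        vec[value] = 1.0
        vectors.append(vec)
    return vectors

state = one_hot_state(task)
```

From here, the nested lists can be handed to `jax.numpy.asarray` or `torch.tensor` with no custom parser at all.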
Using a JSON-based format also improves interoperability with constrained decoding for LLMs, a mechanism that restricts the output of an LLM using a user-specified context-free grammar to guarantee a specific output format. There are multiple, increasingly sophisticated, fast decoders written in Rust (llguidance, xgrammar, outlines) and supporting libraries that bridge Python dataclasses and the corresponding JSON grammar decoders (pydantic).
It also lowers the hurdle for implementing other grounders, such as the recent Tarski.
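The dataclass-to-grammar bridge mentioned above can be sketched as follows (assuming Pydantic v2; the `SASVariable`/`SASTask` models are hypothetical illustrations, not a proposed schema):

```python
from typing import List
from pydantic import BaseModel

class SASVariable(BaseModel):
    """Hypothetical model of one multi-valued variable
    (an illustration, not an agreed schema for the JSON format)."""
    name: str
    values: List[str]

class SASTask(BaseModel):
    variables: List[SASVariable]
    initial_state: List[int]

# Pydantic emits a JSON Schema; constrained-decoding libraries such as
# llguidance/xgrammar/outlines can compile such a schema into a grammar
# that restricts an LLM's output to valid task instances.
schema = SASTask.model_json_schema()
```

With the current line-oriented SAS format there is no such off-the-shelf path from a type definition to a decoding grammar.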
2. Better maintenance and faster processing, by using an already available open-source header-only parser and the standard Python json module for the writer.
Existing JSON parsers have received rigorous performance tuning, including SIMD processing for faster parsing.
3. Potential to accept multiple input formats in the future, including compact binary formats like Protocol Buffers.
Google's Protocol Buffers is a compact binary format with the same basic data model as JSON. Currently, some benchmark instances produce a SAS file of hundreds of MBs, which strains cluster storage when I want to cache them to reduce the runtime of experiments. Standardizing the output to JSON would make this next step more approachable.
4. Implementation cost is low.
The current SAS format is a simple intermediate file format that translates easily to an equivalent JSON.
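As a sketch of how mechanical that translation is, the following parses one `begin_variable ... end_variable` block of the current translator output and dumps it as JSON. The resulting dict layout (`name`/`axiom_layer`/`values` keys) is an illustrative assumption, not a proposed schema:

```python
import json

# A fragment of the current translator output: one variable block.
sas_fragment = """begin_variable
var0
-1
2
Atom at(robot, kitchen)
NegatedAtom at(robot, kitchen)
end_variable"""

def parse_variable(block):
    """Parse a single begin_variable ... end_variable block.

    Simplified sketch: the block holds the variable name, its axiom
    layer, the domain size, then one value name per line.
    """
    lines = block.splitlines()
    assert lines[0] == "begin_variable" and lines[-1] == "end_variable"
    name = lines[1]
    axiom_layer = int(lines[2])
    domain_size = int(lines[3])
    values = lines[4:4 + domain_size]
    return {"name": name, "axiom_layer": axiom_layer, "values": values}

variable = parse_variable(sas_fragment)
as_json = json.dumps(variable)
```

The remaining sections (operators, goal, mutexes) are equally line-oriented, so the writer-side change in the translator would be a similarly shallow transformation.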
|
| msg12078 (view) |
Author: gabi |
Date: 2026-04-24.12:09:49 |
|
We discussed this in the developer meeting yesterday and would be open to phasing out the SAS format in favor of a JSON file (temporarily supporting both variants).
Nobody wanted to work on this immediately, also because there is some ongoing refactoring of the parser (issue1218) and we intend to revise the translator output format (issue371). It would be good to have only a single revision with the (optional) JSON format being a part of it.
Maybe there is a chance to get all of this done as part of the next sprint?
|
| msg12066 (view) |
Author: malte |
Date: 2026-04-16.15:02:24 |
|
On Discord, Christian mentioned that as a first step, we could *add* JSON output as a translator option without immediately supporting it in the search component (and only later get rid of the old format).
|
| msg12054 (view) |
Author: masataro |
Date: 2026-04-02.15:45:20 |
|
For speed, however: https://github.com/simdjson/simdjson
|
| msg12053 (view) |
Author: masataro |
Date: 2026-04-02.15:41:27 |
|
I used https://github.com/nlohmann/json in 2016, and it was feature-complete and quite pleasant to use.
|
| msg12051 (view) |
Author: malte |
Date: 2026-03-23.14:59:49 |
|
I like this idea. Do you have good recommendations for JSON libraries for C++?
|
|
| Date | User | Action | Args |
| 2026-04-24 12:09:49 | gabi | set | messages: + msg12078; nosy: + gabi |
| 2026-04-16 15:02:24 | malte | set | messages: + msg12066 |
| 2026-04-10 15:29:51 | haz | set | nosy: + haz |
| 2026-04-02 15:45:20 | masataro | set | messages: + msg12054 |
| 2026-04-02 15:41:27 | masataro | set | messages: + msg12053 |
| 2026-03-23 14:59:49 | malte | set | messages: + msg12051; nosy: + malte, jendrik, masataro; status: unread -> chatting |
| 2026-03-23 14:44:04 | masataro | set | summary |
| 2026-03-23 14:42:10 | masataro | set | summary |
| 2026-03-23 14:23:43 | masataro | create | |
|