Issue1213

Title Deprecating the SAS file format in favor of JSON for better interoperability
Priority wish Status chatting
Superseder Nosy List gabi, haz, jendrik, malte, masataro
Assigned To Keywords
Optional summary
The SAS file format has worked hard long enough. I propose replacing it with JSON, backed by a proper JSON schema. There are multiple benefits.

1. Better interoperability with numerous other applications.

While much work on action model learning, including my Latplan but also recent LLM-based approaches, produces PDDL so that the learned models are compatible with classical planners, it sometimes makes much more sense to model the learned domains directly in SAS, with categorical random variables rather than propositional/Boolean/Bernoulli variables.
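To illustrate the difference (a hypothetical sketch; the domain sizes and state are made up for the example), a state over multi-valued variables can be stored as one integer per variable, whereas a propositional encoding needs one Boolean per (variable, value) pair:

```python
# Hypothetical example: a state over three categorical variables
# with domain sizes 4, 3, and 5 (values invented for illustration).
domain_sizes = [4, 3, 5]

# Categorical encoding: one integer per variable.
state_categorical = [2, 0, 4]

# Propositional (one-hot/Boolean) encoding of the same state:
# one Boolean per (variable, value) pair.
state_boolean = [
    value == state_categorical[var]
    for var, size in enumerate(domain_sizes)
    for value in range(size)
]

print(len(state_categorical))  # 3 integers
print(len(state_boolean))      # 12 Booleans (4 + 3 + 5)
print(sum(state_boolean))      # exactly one true value per variable -> 3
```

The categorical representation is both smaller and a more natural fit for frameworks that index into embedding tables or domain-sized tensors.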

Allowing multi-valued variables may also improve the performance of the current, not-so-great neural planning solvers written in GPU frameworks. Let's admit it: not being able to use modern GPU frameworks is the biggest hurdle for implementing a GPU-based planner. Python/PyTorch is slow, but there are better JIT frameworks with minimal performance drawbacks, such as JAX (Python) and Julia-based ones. Anyone with sound engineering principles would first try them to see whether a proof of concept works before writing a custom CUDA kernel. But then the SAS format makes the task file difficult to load from those environments. Removing such hurdles one by one makes classical planning available to many who don't want to write C++ (although some people, including myself, have mixed feelings about that).

Using a JSON-based format also improves interoperability with constrained decoding for LLMs, a mechanism that restricts the output of an LLM using a user-specified context-free grammar to guarantee a specific output format. There are multiple, increasingly sophisticated, fast decoders written in Rust (llguidance, xgrammar, outlines) and supporting libraries that bridge Python dataclasses and the corresponding JSON grammar decoders (Pydantic).

It also lowers the hurdle for implementing other grounders, such as the recent Tarski.

2. Better maintenance and faster processing, by using an already available open-source header-only parser on the C++ side and the standard Python json module for the writer.

Existing JSON parsers have received rigorous performance tuning, including SIMD processing for faster parsing.
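On the writer side, the translator would need nothing beyond the standard json module. A minimal sketch (the dictionary layout below is invented for illustration; it is not a proposed final schema):

```python
import json

# Invented task layout for illustration only; not a proposed schema.
task = {
    "version": 3,
    "metric": False,
    "variables": [
        {"name": "var0", "domain": ["at-a", "at-b"], "axiom_layer": -1},
    ],
    "initial_state": [0],
    "goal": [[0, 1]],
    "operators": [
        {"name": "move a b", "cost": 1,
         "effects": [{"var": 0, "pre": 0, "post": 1}]},
    ],
}

text = json.dumps(task, indent=2)   # writer side (Python translator)
parsed = json.loads(text)           # any consumer, in any language
print(parsed["operators"][0]["name"])  # move a b
```

Any language with a JSON library, from C++ to Julia, can then consume the same file without a hand-written parser.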

3. Potential to accept multiple input formats in the future, including compact binary formats like Protocol Buffers.

Google's Protocol Buffers is a compact binary format with the same basic features as JSON. Currently, some benchmark instances produce SAS files of hundreds of megabytes, which strains cluster storage when I cache these files to reduce the runtime of experiments. Standardizing the output to JSON would make this next step more approachable.

4. Implementation cost is low.

The current SAS format is a simple intermediate format that translates straightforwardly into an equivalent JSON document.
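A minimal sketch of that translation for the variable section (the real output.sas has several more sections; this handles only begin_variable blocks, and the JSON field names are invented for the example):

```python
import json

# A fragment in the style of the current SAS variable section.
sas_fragment = """\
begin_variable
var0
-1
2
Atom at(a)
Atom at(b)
end_variable
"""

def variables_to_json(text):
    """Translate begin_variable blocks into a JSON-friendly structure.

    Only the variable section is handled here; the other SAS sections
    would be translated analogously.
    """
    variables, lines = [], iter(text.splitlines())
    for line in lines:
        if line == "begin_variable":
            name = next(lines)
            axiom_layer = int(next(lines))
            domain_size = int(next(lines))
            values = [next(lines) for _ in range(domain_size)]
            assert next(lines) == "end_variable"
            variables.append({"name": name, "axiom_layer": axiom_layer,
                              "values": values})
    return json.dumps({"variables": variables}, indent=2)

print(variables_to_json(sas_fragment))
```

Since the SAS sections are already line-oriented records, each maps to a JSON object with no information loss.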

Created on 2026-03-23.14:23:43 by masataro, last changed by gabi.

Messages
msg12078 (view) Author: gabi Date: 2026-04-24.12:09:49
We discussed this in the developer meeting yesterday and would be open to phasing out the SAS format in favor of a JSON file (temporarily supporting both variants).

Nobody wanted to work on this immediately, also because there is some ongoing refactoring of the parser (issue1218) and we intend to revise the translator output format (issue371). It would be good to have only a single revision with the (optional) JSON format being a part of it.

Maybe there is a chance to get all of this done as part of the next sprint?
msg12066 (view) Author: malte Date: 2026-04-16.15:02:24
On Discord, Christian mentioned that as a first step, we could *add* JSON output as a translator option without immediately supporting it in the search component (and only later get rid of the old format).
msg12054 (view) Author: masataro Date: 2026-04-02.15:45:20
For speed, however: https://github.com/simdjson/simdjson
msg12053 (view) Author: masataro Date: 2026-04-02.15:41:27
I've used https://github.com/nlohmann/json in 2016 and it was feature-complete & quite pleasant to use.
msg12051 (view) Author: malte Date: 2026-03-23.14:59:49
I like this idea. Do you have good recommendations for JSON libraries for C++?
History
Date User Action Args
2026-04-24 12:09:49 gabi set messages: + msg12078
 nosy: + gabi
2026-04-16 15:02:24 malte set messages: + msg12066
2026-04-10 15:29:51 haz set nosy: + haz
2026-04-02 15:45:20 masataro set messages: + msg12054
2026-04-02 15:41:27 masataro set messages: + msg12053
2026-03-23 14:59:49 malte set messages: + msg12051
 nosy: + malte, jendrik, masataro
 status: unread -> chatting
2026-03-23 14:44:04 masataro set summary: (revised; final text as in the summary above)
2026-03-23 14:42:10 masataro set summary: (earlier draft of the summary above)
2026-03-23 14:23:43 masataro create