When we worked on the new FDR task parsing code (issue1146, which we resolved today, but the actual work was done two sprints ago), we discussed some possible performance improvements that we left as future work.
It looks like we didn't document these at the time. The following list is ChatGPT's suggestions, edited by me and ordered roughly by decreasing expected impact:
1. Disable sync with C stdio
std::ios::sync_with_stdio(false);
std::cin.tie(nullptr);
2. Avoid formatted extraction (operator>>) in tight loops
int x;
while (std::cin >> x) { ... }
Instead, read the raw data and parse it ourselves.
3. Prefer block/line-based reading
Instead of token-by-token extraction, read larger chunks:
std::string line;
while (std::getline(std::cin, line)) {
// parse line manually
}
4. Use std::from_chars for numeric parsing
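Once you have raw text (std::from_chars is declared in <charconv>, C++17):
int value;
auto [ptr, ec] = std::from_chars(line.data(), line.data() + line.size(), value);
if (ec != std::errc()) { /* handle invalid or out-of-range input */ }
Supposedly much faster than stringstream or operator>>.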
5. Avoid std::stringstream
std::stringstream ss(line);
ss >> x;
Replace with std::from_chars or manual parsing.
6. Consider manual pointer-based parsing
For maximum performance, especially with simple formats:
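const char* p = buffer;  // must be null-terminated
while (*p) {
    // skip separators (anything that is not a digit)
    while (*p && (*p < '0' || *p > '9')) ++p;
    if (!*p) break;  // do not step past the terminator
    int x = 0;
    while (*p >= '0' && *p <= '9') {
        x = x * 10 + (*p - '0');  // no sign or overflow handling
        ++p;
    }
    // use x
}
According to ChatGPT, this is common in high-performance parsers.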
7. Increase input buffer size
static char buffer[1 << 20];
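std::cin.rdbuf()->pubsetbuf(buffer, sizeof(buffer));
Note that the effect of pubsetbuf is implementation-defined, and it should be called before any input is read, so we would need to measure whether it helps.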
8. Use lower-level I/O if needed
Instead of iostream, use fread or fgets, or mmap for very large data.
9. Avoid unnecessary copying
Watch for patterns such as repeatedly constructing strings or slicing substrings.
Use pointers and std::string_view instead.
10. Profile the parsing, not just the reading