Datasets
Surogate supports multiple dataset formats configured under datasets (and optionally validation_datasets).
Dataset type can be:
text(pretraining / raw text)instruction(instruction/output)conversation(chat messages)auto(auto-detect)
See the full schema and examples in the config reference.