Just saw this open dataset, RedPajama. It would be nice to integrate it more directly: https://www.together.xyz/blog/redpajama