2023 02 28

Reading time: less than 1 minute

Machine learning

The Pile is a large dataset for training large language models. It can be found here. It is an 825 GiB open-source dataset that includes books, GitHub repositories, web pages, chat logs, and papers in medicine, physics, math, computer science, and philosophy.
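The size is given in GiB (binary gibibytes), which is not the same as the decimal GB that disk vendors and download pages often use. As a rough sketch of the conversion, assuming only the 825 GiB figure quoted above (the helper name `gib_to_bytes` is made up for illustration):

```python
# Convert the quoted dataset size from binary to decimal units.
GIB = 2**30  # bytes in one gibibyte (binary unit)
GB = 10**9   # bytes in one gigabyte (decimal unit)


def gib_to_bytes(gib: float) -> int:
    """Return the number of bytes in `gib` gibibytes."""
    return int(gib * GIB)


pile_bytes = gib_to_bytes(825)  # 825 GiB is the figure from the text
pile_gb = pile_bytes / GB       # the same size in decimal gigabytes

print(pile_bytes)
print(round(pile_gb, 1))
```

So anyone planning to download it should budget somewhat more than 825 "GB" of disk space as most tools report it.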

Citation

If you find this work useful, please cite it as:

@article{yaltirakli,
  title   = "2023 02 28",
  author  = "Yaltirakli, Gokberk",
  journal = "gkbrk.com",
  year    = "2025",
  url     = "https://www.gkbrk.com/journal/2023-02-28"
}

IEEE Citation

Gokberk Yaltirakli, "2023 02 28", June, 2025. [Online]. Available: https://www.gkbrk.com/journal/2023-02-28. [Accessed Jun. 30, 2025].

APA Style

Yaltirakli, G. (2025, June 30). 2023 02 28. https://www.gkbrk.com/journal/2023-02-28

Bluebook Style

Gokberk Yaltirakli, 2023 02 28, GKBRK.COM (Jun. 30, 2025), https://www.gkbrk.com/journal/2023-02-28
