Machine learning
There is a large dataset for training large language models called The Pile. It can be found here. It is an 825 GiB open-source dataset that includes books, GitHub repositories, webpages, chat logs, and papers on medicine, physics, math, computer science, and philosophy.
Citation
If you find this work useful, please cite it as:
@article{yaltirakli,
title = "2023 02 28",
author = "Yaltirakli, Gokberk",
journal = "gkbrk.com",
year = "2025",
url = "https://www.gkbrk.com/journal/2023-02-28"
}
IEEE Citation: Gokberk Yaltirakli, "2023 02 28", March, 2025. [Online]. Available: https://www.gkbrk.com/journal/2023-02-28. [Accessed Mar. 19, 2025].
APA Style: Yaltirakli, G. (2025, March 19). 2023 02 28. https://www.gkbrk.com/journal/2023-02-28
Bluebook Style: Gokberk Yaltirakli, 2023 02 28, GKBRK.COM (Mar. 19, 2025), https://www.gkbrk.com/journal/2023-02-28