
Data science



Data science is the practice of extracting useful, actionable information from structured and unstructured data.

Exploratory Data Analysis (EDA)

When you get a dataset, it’s a set of rows and columns. If it’s a supervised learning task, there are labels as well. But before you go straight to modeling, you should make yourself familiar with the data first.

Oftentimes, one hour spent looking at the data will be more useful than one hour spent tweaking the model. After all, it's garbage in, garbage out, so you should try to feed the model data that is as clean as possible.
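To make "looking at the data" a bit more concrete, here is a minimal sketch of a first pass over a CSV. It only uses the Python standard library and counts missing cells and distinct values per column. The function name `summarize_csv`, the list of strings treated as missing, and the sample data are all made up for illustration; adjust them to whatever your dataset actually uses.

```python
import csv
import io
from collections import Counter


def summarize_csv(text, missing=("", "NA", "NaN", "null")):
    """Return per-column row count, missing count, and distinct values.

    `missing` is an assumed list of markers; real datasets may use others.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    values = {name: Counter() for name in columns}
    missing_counts = {name: 0 for name in columns}
    n_rows = 0

    for row in reader:
        n_rows += 1
        for name in columns:
            # Short rows yield None for absent cells; treat those as empty.
            cell = (row.get(name) or "").strip()
            if cell in missing:
                missing_counts[name] += 1
            else:
                values[name][cell] += 1

    return {
        name: {
            "rows": n_rows,
            "missing": missing_counts[name],
            "distinct": len(values[name]),
            "most_common": values[name].most_common(1),
        }
        for name in columns
    }


# Hypothetical sample data with a label column and a few holes in it.
sample = """age,city,label
34,Leeds,1
,London,0
29,London,1
41,NA,0
"""

for column, stats in summarize_csv(sample).items():
    print(column, stats)
```

Even a crude summary like this tends to surface surprises early, such as a column that is mostly empty or a "numeric" column that has text in it.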

ydata-profiling

This is both a Python library and a command-line tool. The Python library can analyse Pandas dataframes, and the command-line tool can analyse CSV files.

The tool works a bit slowly, and the generated reports make your browser use a lot of RAM. But the analysis is very good and helpful.

You can run it on a CSV like this:

uv run --python cpython-3.12.10-linux-x86_64-gnu --with ydata-profiling --with setuptools -- ydata_profiling data.csv report.html

This will read data.csv and write the report to report.html.

I needed to run it with Python 3.12; for some reason, it didn’t work with Python 3.14. This will probably be fixed in the future.


Citation

If you find this work useful, please cite it as:
@article{yaltirakli,
  title   = "Data science",
  author  = "Yaltirakli, Gokberk",
  journal = "gkbrk.com",
  year    = "2025",
  url     = "https://www.gkbrk.com/data-science"
}
IEEE Citation
Gokberk Yaltirakli, "Data science", July, 2025. [Online]. Available: https://www.gkbrk.com/data-science. [Accessed Jul. 14, 2025].
APA Style
Yaltirakli, G. (2025, July 14). Data science. https://www.gkbrk.com/data-science
Bluebook Style
Gokberk Yaltirakli, Data science, GKBRK.COM (Jul. 14, 2025), https://www.gkbrk.com/data-science


© 2025 Gokberk Yaltirakli