# Underfolder

By default, pipelime reads and writes data in the *Underfolder* format, a flexible, file-system-based dataset storage format. The main benefits of the Underfolder format are:

- Flexibility: a single format for pretty much all the datasets that you will ever use.
- Readability: no special viewer is needed to access the dataset. All data is stored as separate files with common extensions, so it is easy to inspect manually.
- Low disk usage: pipelime takes advantage of hard links and item sharing to minimize disk usage.

An Underfolder **dataset** is a collection of samples. A **sample** is a collection of items. An **item** is a unitary block of data, e.g., a multi-channel image, a Python object, a dictionary and more. The file system structure is summarized below:

![underfolder structure](../images/underfolder.png "underfolder structure")

Any valid Underfolder dataset must contain a subfolder named `data` with samples and items. In addition, *global shared* items can be stored in the root folder. All items are stored as separate files that follow a specific naming rule:

![naming convention](../images/naming.png "naming convention")

Where:

* `$ID` is the sample index, which must be a unique integer for each sample.
* `$ITEM` is the item name.
* `$EXT` is the item extension.

We currently support many common file formats, and users can add others:

* `.png`, `.jpeg/.jpg/.jfif/.jpe`, `.bmp` for images
* `.tiff/.tif` for multi-page images and multi-dimensional numpy arrays
* `.yaml/.yml`, `.json` and `.toml/.tml` for metadata
* `.txt` for numpy 2D matrix notation
* `.npy` for general numpy arrays
* `.pkl/.pickle` for picklable Python objects
* `.bin` for generic binary data

Root files follow the same convention, but they lack the sample identifier part, i.e., `$ITEM.$EXT`.
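
To make the layout concrete, here is a minimal sketch that walks an Underfolder directory using only the Python standard library and groups the files into samples and shared root items. It does not use pipelime's own readers, which should be preferred in practice. The function name `scan_underfolder` is hypothetical, and the sketch assumes that files under `data` are named `$ID_$ITEM.$EXT` (for example `000_image.png`); the exact separator is defined by the naming figure above, so adjust the pattern if it differs.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumed pattern for files under data/: <ID>_<ITEM>.<EXT>, e.g. 000_image.png.
# The underscore separator is an assumption made for this sketch.
SAMPLE_FILE = re.compile(r"^(?P<id>\d+)_(?P<item>.+)\.(?P<ext>[^.]+)$")


def scan_underfolder(root):
    """Group the files of an Underfolder-like directory (hypothetical helper).

    Returns (samples, shared):
      samples maps an integer sample ID to {item name: file path},
      shared maps the item name of each root-level file to its path.
    """
    root = Path(root)
    data_dir = root / "data"
    if not data_dir.is_dir():
        raise ValueError(f"{root} has no 'data' subfolder")

    samples = defaultdict(dict)
    for path in sorted(data_dir.iterdir()):
        match = SAMPLE_FILE.match(path.name)
        if path.is_file() and match:
            samples[int(match["id"])][match["item"]] = path

    # Root files have no sample identifier: $ITEM.$EXT
    shared = {p.stem: p for p in sorted(root.iterdir()) if p.is_file()}
    return dict(samples), shared


if __name__ == "__main__":
    # Build a tiny throwaway dataset with empty files, then scan it.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp)
        (demo / "data").mkdir()
        for name in ("000_image.png", "000_label.txt", "001_image.png"):
            (demo / "data" / name).write_bytes(b"")
        (demo / "camera.yaml").write_text("fx: 1.0\n")

        samples, shared = scan_underfolder(demo)
        for sample_id, items in sorted(samples.items()):
            print(sample_id, sorted(items))
        print("shared:", sorted(shared))
```

The sketch only inspects file names and never decodes file contents, so it works regardless of which of the supported extensions a dataset uses.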