What format to choose to save your data
Which file format should you use when saving your research dataset? Besides the obvious question of how to encode your data structures in a file, you might also want to consider portability (the ability to read and write data across different operating systems, programming languages, and libraries), the inclusion of metadata (a description of the data), I/O bandwidth and file sizes, compression, and the ability to read and write data in parallel for large datasets. In this in-person introductory workshop, we will cover all these aspects, starting with the most basic file formats and ending with scalable scientific dataset formats. We will cover CSV, JSON, YAML, XML, BSON, VTK, NetCDF, and HDF5, using both structured and unstructured datasets and demoing all examples in Python.
In the demos, we will use Python and quite a few Python libraries to write to all these file formats. If you are comfortable running Python and installing Python libraries on your own computer, we will name the libraries as we go. For everyone else, we will provide access to a remote system with all these libraries installed; all you'll need is a browser.
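As a small taste of the trade-offs mentioned above, here is a minimal sketch (not taken from the workshop materials) that writes the same tiny table to CSV and to JSON using only the Python standard library. The sample records, the field names, and the `metadata` layout are hypothetical and chosen purely for illustration.

```python
import csv
import json
import os
import tempfile

# Hypothetical sample data: station names and values are made up.
records = [
    {"station": "A", "temperature_c": 12.5, "samples": 3},
    {"station": "B", "temperature_c": 9.0, "samples": 5},
]

# Write into a temporary directory so the sketch leaves no files behind
# in the current working directory.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "data.csv")
json_path = os.path.join(workdir, "data.json")

# CSV: a plain table of text; there is no standard place for metadata.
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["station", "temperature_c", "samples"])
    writer.writeheader()
    writer.writerows(records)

# JSON: nested structure, so a description of the data can travel with it.
# The "metadata" key is an assumed convention, not part of the JSON format.
document = {
    "metadata": {"units": {"temperature_c": "degrees Celsius"}},
    "records": records,
}
with open(json_path, "w") as f:
    json.dump(document, f)

# Read both files back and compare what survived the round trip.
with open(csv_path, newline="") as f:
    csv_rows = list(csv.DictReader(f))
with open(json_path) as f:
    json_doc = json.load(f)

print(type(csv_rows[0]["temperature_c"]).__name__)             # str
print(type(json_doc["records"][0]["temperature_c"]).__name__)  # float
```

Reading the CSV back yields every value as a string, so the reader has to know which columns are numeric, whereas the JSON round trip keeps numbers as numbers and carries the units alongside the data. Neither format addresses compression or parallel I/O, which is where formats such as NetCDF and HDF5 come in.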
- Friday, November 4, 2022
- 1:00pm - 2:30pm
- 548 and 552 - Presentation Room (Combined)
- Koerner Library
- Marie-Helene Burle and Alex Razoumov, Digital Research Alliance of Canada