
What format to choose to save your data

Abstract:
Which file format should you use when saving your research dataset? Besides the obvious question of how to encode your data structures in a file, you might also want to consider portability (the ability to write and read data across different operating systems, programming languages, and libraries), the inclusion of metadata (a description of the data), I/O bandwidth and file sizes, compression, and the ability to read and write data in parallel for large datasets. In this in-person introductory workshop, we will cover all these aspects, starting with the most basic file formats and ending with scalable scientific dataset formats. We will cover CSV, JSON, YAML, XML, BSON, VTK, NetCDF, and HDF5, using both structured and unstructured datasets and demoing all examples in Python.
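As a rough illustration of two of the trade-offs named above (type preservation and room for metadata), here is a minimal sketch, not taken from the workshop materials, that writes one small hypothetical table to CSV and to JSON using only Python's standard library (`csv`, `json`) and reads both back. The dataset, field names, and helper functions are invented for this example.

```python
import csv
import json
import os
import tempfile

# Hypothetical toy dataset (not workshop data): a few station measurements
records = [
    {"station": "A", "temperature": 12.5, "humidity": 80},
    {"station": "B", "temperature": 9.0, "humidity": 65},
    {"station": "C", "temperature": 15.25, "humidity": 72},
]

def write_csv(path, rows):
    """Write a list of dicts as CSV with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

def read_csv(path):
    """Read the CSV back; every value comes back as a string."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def write_json(path, rows, description):
    """Write the rows to JSON together with a small metadata field."""
    with open(path, "w") as f:
        json.dump({"description": description, "records": rows}, f, indent=2)

def read_json(path):
    """Read the JSON back; numbers keep their int/float types."""
    with open(path) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    json_path = os.path.join(tmp, "data.json")
    write_csv(csv_path, records)
    write_json(json_path, records, "Toy station measurements")
    csv_rows = read_csv(csv_path)
    json_doc = read_json(json_path)
    # Compare on-disk sizes of the two encodings of the same table
    print("CSV bytes:", os.path.getsize(csv_path))
    print("JSON bytes:", os.path.getsize(json_path))
    # CSV gives back strings; JSON gives back a float
    print("CSV temperature type:", type(csv_rows[0]["temperature"]).__name__)
    print("JSON temperature type:", type(json_doc["records"][0]["temperature"]).__name__)
```

In this sketch the CSV file has no natural place for the description and returns every number as a string, while the JSON file carries the metadata alongside the data and keeps numeric types, at the cost of repeating the field names in every record.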

Prerequisites: none.

Software installation:
In the demos, we will use Python and quite a few Python libraries to write data in all these file formats. If you are comfortable running Python and installing Python libraries on your computer, we will name the libraries as we go. For everyone else, we will provide access to a remote system with all these libraries installed; all you will need is a browser.

Date: Friday, November 4, 2022
Time: 1:00pm - 2:30pm
Room: 548 and 552 - Presentation Room
Location: Koerner Library
Categories: Data Research Commons, Research Data Management
Presenter(s): Marie-Helene Burle and Alex Razoumov, Digital Research Alliance of Canada
