Everything you wanted to know (and more) about PyTorch tensors
Before information can be fed to artificial neural networks (ANNs), it needs to be converted to a form ANNs can process: floating point numbers. Indeed, you don't pass a sentence or an image through an ANN; you input numbers representing a sequence of words or pixel values.
All these floating point numbers need to be stored in a data structure. The best-suited structure is multidimensional (to hold several layers of information), and since all the data is of the same type, it is an array.
Python already has several multidimensional array structures—the most popular of which is NumPy's ndarray—but the particularities of deep learning call for special characteristics: the ability to run operations on GPUs and/or in a distributed fashion, as well as the ability to keep track of computation graphs for automatic differentiation.
The PyTorch tensor is a Python data structure with these characteristics. It can also easily be converted to/from NumPy's ndarray and integrates well with other Python libraries such as Pandas.
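As a quick taste of this interoperability, here is a minimal sketch of converting between a NumPy ndarray and a PyTorch tensor (note that `torch.from_numpy` shares memory with the original array rather than copying it):

```python
import numpy as np
import torch

# Create a NumPy array of floating point numbers
a = np.arange(6, dtype=np.float32).reshape(2, 3)

# Convert to a PyTorch tensor: no copy, the memory is shared
t = torch.from_numpy(a)

# Convert back to NumPy: still sharing the same memory
b = t.numpy()

# Modifying the tensor is reflected in both ndarrays
t[0, 0] = 42.0
```

Because the memory is shared, these conversions are essentially free, but a change through any of the three objects is visible through all of them.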
In this workshop, suitable for users of all levels, we will have a deep look at this data structure and go much beyond a basic introduction. In particular, we will:
- see how tensors are stored in memory
- look at the metadata which allows this efficient memory storage
- cover the basics of working with tensors (indexing, vectorized operations...)
- move tensors to/from GPUs
- convert tensors to/from NumPy ndarrays
- see how tensors work in distributed frameworks
- see how linear algebra can be done with PyTorch tensors
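To give a flavour of some of the topics above, here is a small sketch (using standard PyTorch calls) of tensor creation, the stride metadata describing memory layout, view-creating indexing, and moving a tensor to a GPU when one is available:

```python
import torch

# A 3x4 tensor of floats, stored contiguously in memory
t = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# Metadata describing how the flat memory maps to dimensions:
# moving one row ahead skips 4 elements, one column ahead skips 1
print(t.shape)     # torch.Size([3, 4])
print(t.stride())  # (4, 1)

# Slicing returns a view on the same memory, not a copy
v = t[1:, ::2]     # rows 1-2, every other column

# Move to a GPU only if one is present
if torch.cuda.is_available():
    t_gpu = t.to("cuda")
```

How this stride metadata enables copy-free transposes, slices, and reshapes is one of the things covered in the workshop.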
Things to do before arriving
If you want to follow along with the hands-on part of this workshop (which is totally up to you), you will need up-to-date versions of:
- PyTorch (see https://pytorch.org/get-started/locally/ for installation)
- a terminal emulator (Linux and macOS probably already have one; Windows users can install the free version of MobaXterm, see https://mobaxterm.mobatek.net/download.html for installation)
Because we will cover a lot of material, the pace will be brisk without much time for troubleshooting, so you might also choose to simply watch and practice at your own pace after the workshop.
About the presenter:
Marie-Helene Burle: Prior to entering the realm of computing, Marie-Helene Burle spent 15 years roaming the globe from the High Arctic to uninhabited Sub-Antarctic islands or desert tropical atolls, conducting bird and mammal research (she calls those her "years running after penguins"). As a PhD candidate in behavioural and evolutionary biology at Simon Fraser University, she "fell" into Emacs, R, and Linux. This turned Marie into an advocate for open source tools and improved computing literacy for all, as well as better coding practices and more reproducible workflows in science. She started to contribute to the open source community, became a Software and Data Carpentry Instructor, and worked at the SFU Research Commons providing programming support to researchers. She is thrilled to be continuing in this direction with HPC and new languages at WestGrid. When not behind a computer, Marie loves reading history books and looking for powder in the British Columbia backcountry on skis.
If you have any questions, concerns, or accessibility needs please email [firstname.lastname@example.org].
To keep up-to-date with all of the workshops, consults, and events subscribe to the UBC Library Research Commons monthly newsletter.
- Thursday, January 27, 2022
- 1:00pm - 2:30pm
- Marie-Helene Burle, WestGrid