We will need to use reverse-mode automatic differentiation (autodiff).
- Tensor
  - Each Tensor should keep a list of its children, i.e. the previous tensors it was computed from.
  - A Tensor is connected to its children by the Operation that produced it.
  - Tensors that contain weights must be able to accumulate gradients.
- Operation
- Device
  - CPU
  - Metal
- Autodiff lecture: https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf
- Original old-school Autograd: https://github.com/HIPS/autograd
- Simplified Autodiff: https://github.com/mattjj/autodidact
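
A minimal sketch of how the Tensor and Operation pieces listed above might fit together for reverse-mode autodiff. Everything here is an assumption made for illustration, not a settled design: each Tensor holds a single Python float instead of an array, the names `Add`, `Mul`, `apply`, `children`, `op` and `grad` are hypothetical, leaf tensors (those with no Operation) stand in for weights, and Device handling is left out entirely.

```python
from typing import Optional, Sequence, Tuple


class Operation:
    """Hypothetical base class: forward computes a value, backward returns one gradient per child."""

    def forward(self, *inputs: float) -> float:
        raise NotImplementedError

    def backward(self, grad_out: float, *inputs: float) -> Tuple[float, ...]:
        raise NotImplementedError


class Add(Operation):
    def forward(self, a: float, b: float) -> float:
        return a + b

    def backward(self, grad_out: float, a: float, b: float) -> Tuple[float, float]:
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return (grad_out, grad_out)


class Mul(Operation):
    def forward(self, a: float, b: float) -> float:
        return a * b

    def backward(self, grad_out: float, a: float, b: float) -> Tuple[float, float]:
        # d(a * b)/da = b and d(a * b)/db = a
        return (grad_out * b, grad_out * a)


class Tensor:
    def __init__(self, data: float, children: Sequence["Tensor"] = (),
                 op: Optional[Operation] = None) -> None:
        self.data = float(data)
        self.children = list(children)  # the previous tensors this one was computed from
        self.op = op                    # the Operation linking this tensor to its children
        self.grad = 0.0                 # accumulated gradient

    @staticmethod
    def apply(op: Operation, *inputs: "Tensor") -> "Tensor":
        value = op.forward(*(t.data for t in inputs))
        return Tensor(value, children=inputs, op=op)

    def __add__(self, other: "Tensor") -> "Tensor":
        return Tensor.apply(Add(), self, other)

    def __mul__(self, other: "Tensor") -> "Tensor":
        return Tensor.apply(Mul(), self, other)

    def backward(self) -> None:
        # Build a topological order so each tensor is processed after all of its users.
        order, seen = [], set()

        def visit(t: "Tensor") -> None:
            if id(t) in seen:
                return
            seen.add(id(t))
            for child in t.children:
                visit(child)
            order.append(t)

        visit(self)

        # Reset intermediate results; leaves (the weights here) keep accumulating across calls.
        for t in order:
            if t.op is not None:
                t.grad = 0.0
        self.grad = 1.0  # seed: d(self)/d(self) = 1

        for t in reversed(order):
            if t.op is None:
                continue
            child_grads = t.op.backward(t.grad, *(c.data for c in t.children))
            for child, g in zip(t.children, child_grads):
                child.grad += g  # sum contributions from every path


w = Tensor(3.0)
x = Tensor(2.0)
y = w * x + w      # w is used twice, so its gradient has two contributions
y.backward()
print(w.grad, x.grad)  # 3.0 3.0
```

Because `w` appears twice in `y`, its gradient is the sum of both paths (`x + 1`). This is why gradients are added to `grad` instead of overwriting it. Calling `backward()` a second time without resetting `w.grad` keeps adding to the leaf, which is the accumulation behaviour described for weight tensors.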