Currently, our approach for some of the __getitem__ methods is inefficient. For example, column subsetting for CategoricalMatrix converts the full matrix to a csc_matrix.
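The full conversion could likely be avoided by indexing the category codes directly. Below is a minimal sketch. It assumes, hypothetically, that a categorical column is stored as one integer code per row, where row i has a single 1 in column `codes[i]`. The helper name `categorical_column_subset` and its `rows` parameter are illustrative and are not the existing CategoricalMatrix API. Only the selected columns are ever built as a `sps.csc_matrix`.

```python
import numpy as np
import scipy.sparse as sps


def categorical_column_subset(codes, cols, rows=None):
    """Return columns `cols` (optionally restricted to `rows`) of the one-hot
    matrix implied by `codes`, as a csc_matrix.

    Hypothetical helper: assumes row i has a single 1 in column codes[i],
    so only the selected columns are ever materialized.
    """
    codes = np.asarray(codes)
    if rows is not None:
        codes = codes[rows]  # row subsetting is just indexing the codes
    cols = np.asarray(cols, dtype=np.int64)

    # Group row indices by category; a stable sort keeps row indices
    # ascending within each category, as csc_matrix expects.
    order = np.argsort(codes, kind="stable")
    sorted_codes = codes[order]

    # Locate the block of rows belonging to each selected category.
    starts = np.searchsorted(sorted_codes, cols, side="left")
    ends = np.searchsorted(sorted_codes, cols, side="right")

    # Column k of the output holds the rows in order[starts[k]:ends[k]].
    indptr = np.concatenate([[0], np.cumsum(ends - starts)])
    indices = np.concatenate(
        [order[s:e] for s, e in zip(starts, ends)] + [np.empty(0, dtype=order.dtype)]
    )
    data = np.ones(len(indices))
    return sps.csc_matrix((data, indices, indptr), shape=(len(codes), len(cols)))
```

For example, with `codes = [2, 0, 1, 2, 0]` and `cols = [2, 0]`, the result is a 5-by-2 matrix whose first column marks rows 0 and 3 and whose second column marks rows 1 and 4. The argsort costs O(n log n). If the same matrix is subset repeatedly, it could be cached.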
Here's a list of potential improvements per class (to be updated as we go):
DenseMatrix: nothing to do. Already optimized, since it is backed by np.ndarray.
SparseMatrix: nothing to do. Already optimized, since it is backed by sps.csc_matrix.
CategoricalMatrix:
- row: nothing to do, trivial
- column: create a SparseMatrix containing only the selected columns (and rows, if any), instead of converting the full matrix first
SplitMatrix:
- Thoroughly test all the potential ways to index
StandardizedMatrix:
- Not sure whether subsetting columns works when only one row is selected
- Write docstrings for expected behavior
- Write tests covering all expected behavior
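The testing items above could start from a shared check that compares each class's `__getitem__` against plain NumPy indexing on a dense copy. The sketch below is hypothetical. The helper `check_getitem`, the `INDEX_KEYS` list, and the `to_dense` callback are illustrative names, and the keys are restricted to ones that keep results 2-D, because integer keys intentionally differ between NumPy (1-D result) and scipy.sparse (2-D result). It includes the single-row column subset case flagged for StandardizedMatrix. Here it is exercised on a plain `sps.csc_matrix`; each matrix class would need its own `to_dense` conversion.

```python
import numpy as np
import scipy.sparse as sps

# Index keys expected to behave like NumPy indexing on a 2-D array.
INDEX_KEYS = [
    (slice(None), [0, 2]),  # column subset via list
    (slice(None), slice(1, 3)),  # column subset via slice
    ([1, 3], slice(None)),  # row subset via list
    (slice(0, 1), [1, 2]),  # column subset with a single row
    (np.array([True, False, True, False]), slice(None)),  # boolean row mask
]


def check_getitem(mat, dense, to_dense):
    """Hypothetical helper: assert mat[key] matches dense[key] for each key."""
    for key in INDEX_KEYS:
        expected = dense[key]
        got = to_dense(mat[key])
        assert got.shape == expected.shape, (key, got.shape, expected.shape)
        np.testing.assert_allclose(got, expected, err_msg=str(key))


dense = np.arange(12, dtype=float).reshape(4, 3)
check_getitem(sps.csc_matrix(dense), dense, lambda m: m.toarray())
```

In a pytest suite, the same keys would fit naturally into `pytest.mark.parametrize`, so that each indexing pattern reports its own pass or fail.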