Commit ada3c7f

First release
1 parent 168a596 commit ada3c7f

11 files changed: 868 additions, 2 deletions

LICENSE

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2022 JavaScriptDude
+Copyright (c) 2022 Timothy C. Quinn <javascriptdude [at] protonmail.com>
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
@@ -18,4 +18,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+SOFTWARE.

MANIFEST.in

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
include README.md
include LICENSE
include pyproject.toml

README.md

Lines changed: 208 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,208 @@
1+
## `multisort` - NoneType Safe Multi Column Sorting
2+
3+
Simplified multi-column sorting of lists of tuples, dicts, lists or objects that are NoneType safe.
4+
5+
### Installation
6+
7+
```
8+
python3 -m pip install multisort
9+
```
10+
11+
### Dependencies
12+
None
13+
14+
### Performance
15+
Average over 10 iterations with 500 rows.
16+
Test | Secs
17+
---|---
18+
cmp_func|0.0054
19+
pandas|0.0061
20+
reversor|0.0149
21+
msorted|0.0179
22+
23+
As you can see, if the `cmp_func` is by far the fastest methodology as long as the number of cells in the table are 500 rows for 5 columns. However for larger data sets, `pandas` is the performance winner and scales extremely well. In such large dataset cases, where performance is key, `pandas` should be the first choice.
24+
25+
The surprising thing from testing is that `cmp_func` far outperforms `reversor` which which is the only other methodology for multi-columnar sorting that can handle `NoneType` values.
26+
27+
### Note on `NoneType` and sorting
28+
If your data may contain None, it would be wise to ensure your sort algorithm is tuned to handle them. This is because sorted uses `<` comparisons; which is not supported by `NoneType`. For example, the following error will result: `TypeError: '>' not supported between instances of 'NoneType' and 'str'`.
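
A minimal sketch of the failure described above and one common standard-library workaround. The `none_safe_key` helper is hypothetical and not part of `multisort`; it assumes string values and places `None` last.

```python
grades = ['C', None, 'a', 'B']

# Plain sorted() fails because None cannot be ordered against str.
try:
    sorted(grades)
    failed = False
except TypeError:
    failed = True

# Hypothetical helper (not part of multisort): pair each value with a flag
# so that None is ordered by the flag alone and lands after real values.
def none_safe_key(value):
    return (value is None, '' if value is None else value.upper())

print(failed)                              # True
print(sorted(grades, key=none_safe_key))   # ['a', 'B', 'C', None]
```

The flag trick works for a single ascending column; mixing ascending and descending columns is where the methods listed in this README come in.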

### Methodologies
Method|Descr|Notes
---|---|---
cmp_func|Multi-column sorting modeled on `java.util.Comparator`|Fastest for small to medium size data
reversor|Enables multi-column sorting with column-specific reverse sorting|Medium speed. [Source](https://stackoverflow.com/a/56842689/286807)
msorted|Simple one-liner designed after the `multisort` [example from python docs](https://docs.python.org/3/howto/sorting.html#sort-stability-and-complex-sorts)|Slowest of the bunch but not by much
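
For reference, the stable multi-pass idea from the Python sorting HOWTO linked in the table can be sketched as below. This illustrates that documented technique only; it is not the `msorted` implementation in this commit, and the `multisort` function name and sample `rows` are assumptions for the example. It does not handle `None` values on its own.

```python
from operator import itemgetter

def multisort(rows, specs):
    """Sort rows in place by several keys using repeated stable sorts.

    specs is a list of (key, reverse) pairs, most significant first.
    """
    # Apply the least significant key first; sort stability keeps that
    # order among rows that tie on the more significant keys.
    for key, reverse in reversed(specs):
        rows.sort(key=itemgetter(key), reverse=reverse)
    return rows

rows = [
    {'name': 'joh', 'grade': 'C', 'attend': 100},
    {'name': 'dav', 'grade': 'B', 'attend': 85},
    {'name': 'bob', 'grade': 'C', 'attend': 85},
    {'name': 'jim', 'grade': 'F', 'attend': 55},
]

# grade descending, then attend ascending
result = multisort(rows, [('grade', True), ('attend', False)])
print([r['name'] for r in result])  # ['jim', 'bob', 'joh', 'dav']
```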

### Dictionary Examples
For data:
```
rows_dict = [
     {'idx': 0, 'name': 'joh', 'grade': 'C', 'attend': 100}
    ,{'idx': 1, 'name': 'jan', 'grade': 'a', 'attend': 80}
    ,{'idx': 2, 'name': 'dav', 'grade': 'B', 'attend': 85}
    ,{'idx': 3, 'name': 'bob', 'grade': 'C', 'attend': 85}
    ,{'idx': 4, 'name': 'jim', 'grade': 'F', 'attend': 55}
    ,{'idx': 5, 'name': 'joe', 'grade': None, 'attend': 55}
]
```

### `msorted`
Sort rows_dict by _grade_, descending, then _attend_, ascending and put None first in results:
```
from multisort import msorted
rows_sorted = msorted(rows_dict, [
     ('grade', {'reverse': True, 'none_first': True})
    ,'attend'
])
```

Sort rows_dict by _grade_, descending, then _attend_ and call upper() for _grade_:
```
from multisort import msorted
rows_sorted = msorted(rows_dict, [
     ('grade', {'reverse': True, 'clean': lambda s:None if s is None else s.upper()})
    ,'attend'
])
```

### `sorted` with `reversor`
Sort rows_dict by _grade_, descending, then _attend_ and call upper() for _grade_:
```
from multisort import reversor
rows_sorted = sorted(rows_dict, key=lambda o: (
     reversor(None if o['grade'] is None else o['grade'].upper())
    ,o['attend']
))
```

### `sorted` with `cmp_func`
Sort rows_dict by _grade_, descending with None first, then _attend_, ascending:
```
from multisort import cmp_func
def cmp_student(a,b):
    k='grade'; va=a[k]; vb=b[k]
    if va != vb:
        if va is None: return -1
        if vb is None: return 1
        return -1 if va > vb else 1
    k='attend'; va=a[k]; vb=b[k]
    if va != vb: return -1 if va < vb else 1
    return 0
rows_sorted = sorted(rows_dict, key=cmp_func(cmp_student))
```
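
The comparator pattern above can be tried without installing the package, since `cmp_func` in this commit is an alias for `functools.cmp_to_key`. The sketch below is a self-contained variant that also upper-cases grades before comparing; the `upper_or_none` helper is an assumption added for illustration.

```python
from functools import cmp_to_key

rows_dict = [
    {'idx': 0, 'name': 'joh', 'grade': 'C', 'attend': 100},
    {'idx': 1, 'name': 'jan', 'grade': 'a', 'attend': 80},
    {'idx': 2, 'name': 'dav', 'grade': 'B', 'attend': 85},
    {'idx': 3, 'name': 'bob', 'grade': 'C', 'attend': 85},
    {'idx': 4, 'name': 'jim', 'grade': 'F', 'attend': 55},
    {'idx': 5, 'name': 'joe', 'grade': None, 'attend': 55},
]

def upper_or_none(s):
    # Hypothetical helper: normalize case but leave None untouched.
    return None if s is None else s.upper()

def cmp_student(a, b):
    # grade: descending, case-insensitive, None first
    va, vb = upper_or_none(a['grade']), upper_or_none(b['grade'])
    if va != vb:
        if va is None: return -1
        if vb is None: return 1
        return -1 if va > vb else 1
    # attend: ascending
    va, vb = a['attend'], b['attend']
    if va != vb:
        return -1 if va < vb else 1
    return 0

rows_sorted = sorted(rows_dict, key=cmp_to_key(cmp_student))
print([r['name'] for r in rows_sorted])
# ['joe', 'jim', 'bob', 'joh', 'dav', 'jan']
```

Because each column's direction is encoded in the comparator itself, no `reverse=True` is passed to `sorted`; passing it would flip every column at once.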

### Object Examples
For data:
```
class Student():
    def __init__(self, idx, name, grade, attend):
        self.idx = idx
        self.name = name
        self.grade = grade
        self.attend = attend
    def __str__(self): return f"name: {self.name}, grade: {self.grade}, attend: {self.attend}"
    def __repr__(self): return self.__str__()

rows_obj = [
     Student(0, 'joh', 'C', 100)
    ,Student(1, 'jan', 'a', 80)
    ,Student(2, 'dav', 'B', 85)
    ,Student(3, 'bob', 'C', 85)
    ,Student(4, 'jim', 'F', 55)
    ,Student(5, 'joe', None, 55)
]
```

### `msorted`
(Same syntax as with 'dict' example)

### `sorted` with `reversor`
Sort rows_obj by _grade_, descending, then _attend_ and call upper() for _grade_:
```
from multisort import reversor
rows_sorted = sorted(rows_obj, key=lambda o: (
     reversor(None if o.grade is None else o.grade.upper())
    ,o.attend
))
```

### `sorted` with `cmp_func`
Sort rows_obj by _grade_, descending with None first, then _attend_, ascending:
```
from multisort import cmp_func
def cmp_student(a,b):
    if a.grade != b.grade:
        if a.grade is None: return -1
        if b.grade is None: return 1
        return -1 if a.grade > b.grade else 1
    if a.attend != b.attend:
        return -1 if a.attend < b.attend else 1
    return 0
rows_sorted = sorted(rows_obj, key=cmp_func(cmp_student))
```

### List / Tuple Examples
For data:
```
rows_tuple = [
     (0, 'joh', 'a' , 100)
    ,(1, 'joe', 'B' , 80)
    ,(2, 'dav', 'A' , 85)
    ,(3, 'bob', 'C' , 85)
    ,(4, 'jim', None , 55)
    ,(5, 'jan', 'B' , 70)
]
(COL_IDX, COL_NAME, COL_GRADE, COL_ATTEND) = range(0,4)
```

### `msorted`
Sort rows_tuple by _grade_, descending, then _attend_, ascending and put None first in results:
```
from multisort import msorted
rows_sorted = msorted(rows_tuple, [
     (COL_GRADE, {'reverse': True, 'none_first': True})
    ,COL_ATTEND
])
```

### `sorted` with `reversor`
Sort rows_tuple by _grade_, descending, then _attend_ and call upper() for _grade_:
```
from multisort import reversor
rows_sorted = sorted(rows_tuple, key=lambda o: (
     reversor(None if o[COL_GRADE] is None else o[COL_GRADE].upper())
    ,o[COL_ATTEND]
))
```

### `sorted` with `cmp_func`
Sort rows_tuple by _grade_, descending with None first, then _attend_, ascending:
```
from multisort import cmp_func
def cmp_student(a,b):
    k=COL_GRADE; va=a[k]; vb=b[k]
    if va != vb:
        if va is None: return -1
        if vb is None: return 1
        return -1 if va > vb else 1
    k=COL_ATTEND; va=a[k]; vb=b[k]
    if va != vb:
        return -1 if va < vb else 1
    return 0
rows_sorted = sorted(rows_tuple, key=cmp_func(cmp_student))
```

### Tests / Samples
Name|Descr|Other
---|---|---
tests/test_msorted.py|msorted unit tests|-
tests/performance_tests.py|Tunable performance tests using asyncio|requires pandas
tests/hand_test.py|Hand testing|-

dev.env

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
PYTHONPATH=./src:${PYTHONPATH}

pyproject.toml

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
[tool.poetry]
2+
name = "multisort"
3+
version = "0.1.0"
4+
description = "NoneType Safe Multi Column Sorting For Python"
5+
license = "MIT"
6+
authors = ["Timothy C. Quinn"]
7+
readme = "README.md"
8+
homepage = "https://pypi.org/project/multisort"
9+
repository = "https://github.com/JavaScriptDude/multisort"
10+
classifiers = [
11+
'Development Status :: 4 - Beta',
12+
'Environment :: Console',
13+
'Intended Audience :: Developers',
14+
'Operating System :: POSIX :: Linux',
15+
'Operating System :: POSIX :: BSD',
16+
'Operating System :: POSIX :: SunOS/Solaris',
17+
'Operating System :: MacOS :: MacOS X',
18+
'Programming Language :: Python :: 3 :: Only',
19+
'Programming Language :: Python :: 3.7',
20+
'Programming Language :: Python :: 3.8',
21+
'Programming Language :: Python :: 3.9',
22+
'Programming Language :: Python :: 3.10',
23+
'Topic :: Utilities',
24+
]
25+
26+
[tool.poetry.dependencies]
27+
python = "^3.7.9"
28+
29+
[tool.poetry.dev-dependencies]
30+
31+
[build-system]
32+
requires = ["poetry-core>=1.0.0"]
33+
build-backend = "poetry.core.masonry.api"

src/multisort/__init__.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
from .multisort import msorted, cmp_func, reversor

src/multisort/multisort.py

Lines changed: 110 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,110 @@
1+
#########################################
2+
# .: multisort.py :.
3+
# Simplified Multi-Column Sorting For Lists of records
4+
# Installation:
5+
# . pip install multisort
6+
# Author: Timothy C. Quinn
7+
# Home: https://pypi.org/project/multisort
8+
# Licence: MIT
9+
#########################################
10+
from functools import cmp_to_key
11+
cmp_func = cmp_to_key
12+
13+
14+
# .: msorted :.
15+
# spec is a list one of the following
16+
# <key>
17+
# (<key>,)
18+
# (<key>, <opts>)
19+
# where:
20+
# <key> Property, Key or Index for 'column' in row
21+
# <opts> dict. Options:
22+
# reverse: opt - reversed sort (defaults to False)
23+
# clean: opt - callback to clean / alter data in 'field'
24+
# none_first: opt - If True, None will be at top of sort. Default is False (bottom)
25+
class Comparator:
26+
@classmethod
27+
def new(cls, *args):
28+
if len(args) == 1 and isinstance(args[0], (int,str)):
29+
_c = Comparator(spec=args[0])
30+
else:
31+
_c = Comparator(spec=args)
32+
return cmp_to_key(_c._compare_a_b)
33+
34+
def __init__(self, spec):
35+
if isinstance(spec, (int, str)):
36+
self.spec = ( (spec, False, None, False), )
37+
else:
38+
a=[]
39+
for s_c in spec:
40+
if isinstance(s_c, (int, str)):
41+
a.append((s_c, None, None, False))
42+
else:
43+
assert isinstance(s_c, tuple) and len(s_c) in (1,2),\
44+
f"Invalid spec. Must have 1 or 2 params per record. Got: {s_c}"
45+
if len(s_c) == 1:
46+
a.append((s_c[0], None, None, False))
47+
elif len(s_c) == 2:
48+
s_opts = s_c[1]
49+
assert not s_opts is None and isinstance(s_opts, dict), f"Invalid Spec. Second value must be a dict. Got {getClassName(s_opts)}"
50+
a.append((s_c[0], s_opts.get('reverse', False), s_opts.get('clean', None), s_opts.get('none_first', False)))
51+
52+
self.spec = a
53+
54+
def _compare_a_b(self, a, b):
55+
if a is None: return 1
56+
if b is None: return -1
57+
for k, desc, clean, none_first in self.spec:
58+
try:
59+
try:
60+
va = a[k]; vb = b[k]
61+
except Exception as ex:
62+
va = getattr(a, k); vb = getattr(b, k)
63+
64+
except Exception as ex:
65+
raise KeyError(f"Key {k} is not available in object(s) given a: {a.__class__.__name__}, b: {a.__class__.__name__}")
66+
67+
if clean:
68+
va = clean(va)
69+
vb = clean(vb)
70+
71+
if va != vb:
72+
if va is None: return -1 if none_first else 1
73+
if vb is None: return 1 if none_first else -1
74+
if desc:
75+
return -1 if va > vb else 1
76+
else:
77+
return 1 if va > vb else -1
78+
79+
return 0
80+
81+
82+
def msorted(rows, spec, reverse:bool=False):
83+
if isinstance(spec, (int, str)):
84+
_c = Comparator.new(spec)
85+
else:
86+
_c = Comparator.new(*spec)
87+
return sorted(rows, key=_c, reverse=reverse)
88+
89+
# For use in the multi column sorted syntax to sort by 'grade' and then 'attend' descending
90+
# dict example:
91+
# rows_sorted = sorted(rows, key=lambda o: ((None if o['grade'] is None else o['grade'].lower()), reversor(o['attend'])), reverse=True)
92+
# object example:
93+
# rows_sorted = sorted(rows, key=lambda o: ((None if o.grade is None else o.grade.lower()), reversor(o.attend)), reverse=True)
94+
# list, tuple example:
95+
# rows_sorted = sorted(rows, key=lambda o: ((None if o[COL_GRADE] is None else o[COL_GRADE].lower()), reversor(o[COL_ATTEND])), reverse=True)
96+
# where: COL_GRADE and COL_ATTEND are column indexes for values
97+
class reversor:
98+
def __init__(self, obj):
99+
self.obj = obj
100+
def __eq__(self, other):
101+
return other.obj == self.obj
102+
def __lt__(self, other):
103+
return False if self.obj is None else \
104+
True if other.obj is None else \
105+
other.obj < self.obj
106+
107+
108+
def getClassName(o):
109+
return None if o == None else type(o).__name__
110+
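
A minimal, self-contained sketch of how a `reversor`-style wrapper behaves inside a tuple key. The class is restated here with the same comparison logic as in `multisort.py` so the snippet runs without installing the package; the sample rows are made up for illustration.

```python
class reversor:
    """Wrap a value so that '<' is inverted and None never sorts first."""
    def __init__(self, obj):
        self.obj = obj
    def __eq__(self, other):
        return other.obj == self.obj
    def __lt__(self, other):
        if self.obj is None:
            return False
        if other.obj is None:
            return True
        return other.obj < self.obj

# (name, grade, attend); sample data assumed for this example
rows = [('joh', 'C', 100), ('dav', 'B', 85), ('bob', 'C', 85), ('joe', None, 55)]

# grade descending with None last, then attend ascending
rows_sorted = sorted(rows, key=lambda r: (reversor(r[1]), r[2]))
print([r[0] for r in rows_sorted])  # ['bob', 'joh', 'dav', 'joe']
```

Tuple comparison first uses `__eq__` to find the first differing element and then `__lt__` on it, which is why the wrapper only needs those two methods for use with `sorted`.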
