Do you know if keysort/2 and msort/2 are stable sorting predicates ?
Any Prolog where keysort/2 is NOT stable is badly broken.
You will find keysort/2 in boot/sort.pl
Yes, it is stable.
The name `msort/2' originally signified MERGE sort and was
definitely intended to be a stable sort.
You will find sort/2 and msort/2 in src/pl-list.c
No, msort/2 is NOT stable.
But you can't possibly tell, so it doesn't matter.
I note that pl-list.c defines sort/2 and msort/2 in terms of C's qsort()
procedure. This is *NOT* a good idea. I've known commercial C
compilers implement qsort() as Shell sort or worse (yes, really) and the
O(n**2) worst case behaviour of so-called "quick sort" is no mere
theoretical possibility. I'm tired of people quoting textbooks at me to
the effect that with random inputs the O(n**2) worst case is extremely
unlikely; REAL data sets are NOT random (unless you've taken special
care to randomise them) and O(n**2) run times are quite commonly
observed. There are a number of tricks one can play to reduce the
likelihood, median-of-medians pivot selection, fat pivot, but it's all
so complicated and so pointless given the higher efficiency of merge
based sorts.
[Hint: what's _bad_ about having qsort_compare_standard()?]
[Hint: for what inputs is David Warren's sort/2 in Prolog O(N),
and which well known sorting algorithm is likely to go O(N**2)
on such inputs?]
David Warren's classic sort/2 goes like this:
sort(List, Sorted) :-
length(List, Length),
sort_(Length, List, _, Sorted).
sort_(N, L0, L, S) :-
( N > 2 ->
A is N >> 1,
Z is N - A,
sort_(A, L0, L1, S0),
sort_(Z, L1, L, S1),
ord_union(S0, S1, S)
; N =:= 2 ->
L0 = [X1,X2|L],
compare(R, X1, X2),
sort2(R, X1, X2, S)
; N =:= 1 ->
L0 = [X1|L],
S = [X1]
; N =:= 0,
L0 = L,
S = []
).
sort2(<, X1, X2, [X1,X2]).
sort2(>, X1, X2, [X2,X1]).
sort2(=, X1, _, [X1]).
ord_union([], S2, S2).
ord_union([X1|S1], S2, S) :-
ord_union1(S2, X1, S, S1).
ord_union1([], X1, [X1|S1], S1).
ord_union1([X2|S2], X1, S, S1) :-
compare(R, X1, X2),
ord_union12(R, X1, S, S1, X2, S2).
ord_union12(<, X1, [X1|S], S1, X2, S2) :-
ord_union1(S1, X2, S, S2).
ord_union12(>, X1, [X2|S], S1, X2, S2) :-
ord_union1(S2, X1, S, S1).
ord_union12(=, X1, [X1|S], S1, _, S2) :-
ord_union(S1, S2, S).
Changing this to msort/2 is fairly obvious, we just do merging
instead of unioning.
msort(List, Sorted) :-
length(List, Length),
msort_(Length, List, _, Sorted).
msort_(N, L0, L, S) :-
( N > 2 ->
A is N >> 1,
Z is N - A,
msort_(A, L0, L1, S0),
msort_(Z, L1, L, S1),
ord_merge(S0, S1, S)
; N =:= 2 ->
L0 = [X1,X2|L],
compare(R, X1, X2),
msort2(R, X1, X2, S)
; N =:= 1 ->
L0 = [X1|L],
S = [X1]
; N =:= 0,
L0 = L,
S = []
).
msort2(<, X1, X2, [X1,X2]).
msort2(>, X1, X2, [X2,X1]).
msort2(=, X1, X2, [X1,X2]).
ord_merge([], S2, S2).
ord_merge([X1|S1], S2, S) :-
ord_merge1(S2, X1, S, S1).
ord_merge1([], X1, [X1|S1], S1).
ord_merge1([X2|S2], X1, S, S1) :-
compare(R, X1, X2),
ord_merge12(R, X1, S, S1, X2, S2).
ord_merge12(<, X1, [X1|S], S1, X2, S2) :-
ord_merge2(S1, X2, S, S2).
ord_merge12(>, X1, [X2|S], S1, X2, S2) :-
ord_merge1(S2, X1, S, S1).
ord_merge12(=, X1, [X1|S], S1, X2, S2) :-
ord_merge2(S1, X2, S, S2).
ord_merge2([], X2, [X2|S2], S2).
ord_merge2([X1|S1], X2, S, S2) :-
compare(R, X1, X2),
ord_merge12(R, X1, S, S1, X2, S2).
Keysort is nearly the same as msort, except for how the comparisons are done.
(There's no reason keycompare/3 couldn't be a built in predicate.)
keysort(List, Sorted) :-
length(List, Length),
keysort_(Length, List, _, Sorted).
keysort_(N, L0, L, S) :-
( N > 2 ->
A is N >> 1,
Z is N - A,
keysort_(A, L0, L1, S0),
keysort_(Z, L1, L, S1),
key_merge(S0, S1, S)
; N =:= 2 ->
L0 = [X1,X2|L],
key_compare(R, X1, X2),
msort2(R, X1, X2, S)
; N =:= 1 ->
L0 = [X1|L],
S = [X1]
; N =:= 0,
L0 = L,
S = []
).
key_compare(R, K1-_, K2-_) :-
compare(R, K1, K2).
key_merge([], S2, S2).
key_merge([X1|S1], S2, S) :-
key_merge1(S2, X1, S, S1).
key_merge1([], X1, [X1|S1], S1).
key_merge1([X2|S2], X1, S, S1) :-
compare(R, X1, X2),
key_merge12(R, X1, S, S1, X2, S2).
key_merge12(<, X1, [X1|S], S1, X2, S2) :-
key_merge2(S1, X2, S, S2).
key_merge12(>, X1, [X2|S], S1, X2, S2) :-
key_merge1(S2, X1, S, S1).
key_merge12(=, X1, [X1|S], S1, X2, S2) :-
key_merge2(S1, X2, S, S2).
key_merge2([], X2, [X2|S2], S2).
key_merge2([X1|S1], X2, S, S2) :-
key_compare(R, X1, X2),
key_merge12(R, X1, S, S1, X2, S2).
I tested this code in Quintus Prolog and SWI Prolog (renaming the
top-level predicates to avoid clashes with built-ins). For a timing
test, I selected 10000 words at random from /usr/dict/words, and did
findall(W, w(W), Ws),
statistics(runtime, _),
sort(Ws, _),
statistics(runtime, [_,T|_])
with the obvious changes for the other predicates.
To my surprise, the version of sort/2 above was 12% faster than the
one in Quintus Prolog, which I _thought_ was pretty much exactly this code.
Also to my surprise, the built-in keysort/2 was *faster* than the built-in
sort/2 in Quintus Prolog. Maybe they gave it special treatment after I left,
as it is such a VERY useful predicate.
I got two other surprises in SWI Prolog version 5.0.8.
First, sort/2 as shown above is INFINITELY faster than the built-in
(C coded) sort/2, because when I run the timing code shown above, I get
ERROR: sort/2: Caught signal 11 (segv)
Now I can understand sort/2 *failing* if the input is not a list.
But how can sort/2 possibly get a SIGSEGV exception?
Second, keysort/2 as shown above is more than twice as fast on this
test set as the built-in one. I was getting times like 8.6 seconds
with the built-in and 3.75 seconds with the keysort/2 code above.
Can it be that the boot/*.pl files are compiled with 'optimise' false?
A good way to implement these sorting methods via a hybrid of Prolog and C
is
copy_for_sort([], [], N, N).
copy_for_sort([H|T], [H|C], N0, N) :-
N1 is 1 + N0,
copy_for_sort(T, C, N1, N).
sort(List, Sorted) :-
copy_for_sort(List, Working, 0, N),
$sort(Working, N, 1, 0),
Sorted = Working.
msort(List, Sorted) :-
copy_for_sort(List, Working, 0, N),
$sort(Working, N, 0, 0),
Sorted = Working.
keysort(List, Sorted) :-
copy_for_sort(List, Working, 0, N),
$sort(Working, N, 0, 1),
Sorted = Working.
ukeysort(List, Sorted) :-
copy_for_sort(List, Working, 0, N),
$sort(Working, N, 1, 1),
Sorted = Working.
Here $sort(List, Length, Discard-Duplicate-Keys?, Key-Is-First-Argument?)
sorts List by destructively redirecting "tail" pointers. Samsort is a good
sorting method here, it requires O(lgN) workspace, which means that a quite
small array allocated on the C stack is quite good enough, and it causes NO
memory management activity whatever in C code.
I'm afraid that I don't understand the C internals of SWI Prolog well enough
to implement this myself, but if anyone's interested I'll be happy to
explain further.
----------------
* To UNSUBSCRIBE, please use the HTML form at
http://www.swi-prolog.org/mailinglist.html
or send mail to prolog-***@swi.psy.uva.nl using the Subject: "unsubscribe"
(without the quotes) and *no* message body.
** An ARCHIVE of this list is maintained at
http://www.swi.psy.uva.nl/projects/SWI-Prolog/mailinglist/archive/