Faster disjoint/isSubsetOf for Set via unbalanced splitting. #865
| The same should be applicable to  | 
| I'm going to need to see a proof that the stated time bounds still hold. Or if they don't, some reasonably tight new bound. We need to be sure that this doesn't introduce some nasty cases that our benchmarks don't happen to catch. | 
-- Same as 'splitMember' but skips re-balancing by using 'bin' instead of 'link'.
-- Attempting to build new trees out of these will error when re-balancing but
-- this can improve performance when the resulting trees are disposable.
splitMemberUnbalanced :: Ord a => a -> Set a -> (Set a,Bool,Set a)
Tried to factor out bin/link to avoid code duplication but that made the involved functions 2-3x slower.
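For readers without the diff at hand, here is a minimal sketch of what an unbalanced split could look like. It uses a toy size-annotated tree rather than the real `Data.Set` internals: `Tree`, `fromDistinctAscList`, and `toAscList` are stand-ins written for this illustration, and only the name `splitMemberUnbalanced` and the use of a size-only `bin` come from the PR.

```haskell
-- Toy size-annotated search tree (an assumption for illustration,
-- not the containers representation).
data Tree a = Tip | Bin !Int a !(Tree a) !(Tree a)

size :: Tree a -> Int
size Tip = 0
size (Bin n _ _ _) = n

-- Recomputes the size only; performs no rotations.
bin :: a -> Tree a -> Tree a -> Tree a
bin x l r = Bin (size l + size r + 1) x l r

-- Build a tree from a strictly ascending list by halving it.
fromDistinctAscList :: [a] -> Tree a
fromDistinctAscList xs = case splitAt (length xs `div` 2) xs of
  (_, [])      -> Tip
  (ls, x : rs) -> bin x (fromDistinctAscList ls) (fromDistinctAscList rs)

toAscList :: Tree a -> [a]
toAscList t = go t []
  where
    go Tip acc = acc
    go (Bin _ x l r) acc = go l (x : go r acc)

-- Split around x, gluing the pieces back with 'bin' so that no
-- re-balancing happens on the way up.
splitMemberUnbalanced :: Ord a => a -> Tree a -> (Tree a, Bool, Tree a)
splitMemberUnbalanced _ Tip = (Tip, False, Tip)
splitMemberUnbalanced x (Bin _ y l r) = case compare x y of
  LT -> let (lt, found, gt) = splitMemberUnbalanced x l
        in (lt, found, bin y gt r)
  GT -> let (lt, found, gt) = splitMemberUnbalanced x r
        in (bin y l lt, found, gt)
  EQ -> (l, True, r)
```

Splitting `fromDistinctAscList [1 .. 10]` at 5 with this sketch gives a left piece holding 1 to 4 and a right piece holding 6 to 10. The right piece has root 6 with an empty left subtree and a four-element right subtree, which is the kind of lopsided shape a re-balancing join would have repaired.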
| Did you see my question? The bounds we have for these operations lean on the highly nontrivial published proofs for bounds on intersection and difference. My concern is that by allowing one set of trees to become unbalanced (hence potentially deep for their size), you could break those proofs (and bounds). | 
| 
 My bad, hadn't noticed you had already replied by the time I finished commenting. 
 Right. The idea here is that the act of balancing that  That said, I don't currently have a proof, only a gut feeling backed by no more than a specific benchmark. I'll peek at the published proofs. | 
| I think this is interesting, and can be applied to union/intersection/difference/others too. I implemented it and can see some improvements in set-operations-set.
Results:
  union-block_nn:              OK (0.39s)
    419  μs ±  25 μs, 32% less than baseline
  union-block_nn_swap:         OK (0.39s)
    410  μs ±  32 μs, 35% less than baseline
  union-block_ns:              OK (0.42s)
    38.5 μs ± 3.0 μs, 44% less than baseline
  union-block_sn_swap:         OK (0.47s)
    48.0 μs ± 3.5 μs, 35% less than baseline
  union-common_nn:             OK (0.77s)
    512  μs ±  34 μs,       same as baseline
  union-common_nn_swap:        OK (0.84s)
    1.23 ms ± 113 μs, 13% more than baseline
  union-common_ns:             OK (0.42s)
    404  μs ±  28 μs, 43% less than baseline
  union-common_nt:             OK (0.55s)
    34.7 μs ± 1.8 μs, 27% less than baseline
  union-common_sn_swap:        OK (0.32s)
    1.49 ms ± 146 μs,       same as baseline
  union-common_tn_swap:        OK (0.42s)
    95.5 μs ± 5.5 μs, 14% more than baseline
  union-disj_nn:               OK (0.53s)
    2.97 μs ± 174 ns, 36% less than baseline
  union-disj_nn_swap:          OK (0.54s)
    2.82 μs ± 197 ns, 43% less than baseline
  union-disj_ns:               OK (0.49s)
    2.11 μs ± 177 ns, 40% less than baseline
  union-disj_nt:               OK (0.54s)
    1.23 μs ±  89 ns, 44% less than baseline
  union-disj_sn_swap:          OK (0.51s)
    2.41 μs ± 195 ns, 38% less than baseline
  union-disj_tn_swap:          OK (0.47s)
    1.72 μs ± 166 ns, 32% less than baseline
  union-mix_nn:                OK (1.16s)
    16.7 ms ± 624 μs,  6% less than baseline
  union-mix_nn_swap:           OK (0.58s)
    16.6 ms ± 565 μs,       same as baseline
  union-mix_ns:                OK (0.48s)
    1.18 ms ±  42 μs, 31% less than baseline
  union-mix_nt:                OK (0.36s)
    66.1 μs ± 5.6 μs, 16% less than baseline
  union-mix_sn_swap:           OK (0.46s)
    2.25 ms ± 111 μs, 16% more than baseline
  union-mix_tn_swap:           OK (0.46s)
    97.9 μs ± 7.5 μs, 14% more than baseline
  difference-block_nn:         OK (0.40s)
    191  μs ±  16 μs, 56% less than baseline
  difference-block_nn_swap:    OK (0.42s)
    188  μs ±  11 μs, 57% less than baseline
  difference-block_ns:         OK (0.45s)
    18.4 μs ± 1.5 μs, 64% less than baseline
  difference-block_sn_swap:    OK (0.42s)
    17.7 μs ± 1.5 μs, 65% less than baseline
  difference-common_nn:        OK (0.53s)
    3.03 ms ± 189 μs, 14% less than baseline
  difference-common_nn_swap:   OK (0.35s)
    577  μs ±  44 μs, 17% less than baseline
  difference-common_ns:        OK (0.31s)
    1.33 ms ± 104 μs, 46% less than baseline
  difference-common_nt:        OK (0.44s)
    92.2 μs ± 7.3 μs, 29% less than baseline
  difference-common_sn_swap:   OK (0.42s)
    453  μs ±  25 μs, 55% less than baseline
  difference-common_tn_swap:   OK (0.45s)
    43.4 μs ± 3.4 μs, 49% less than baseline
  difference-disj_nn:          OK (0.56s)
    1.53 μs ±  90 ns, 56% less than baseline
  difference-disj_nn_swap:     OK (0.56s)
    1.55 μs ±  85 ns, 47% less than baseline
  difference-disj_ns:          OK (0.51s)
    1.19 μs ±  84 ns, 55% less than baseline
  difference-disj_nt:          OK (0.58s)
    772  ns ±  49 ns, 54% less than baseline
  difference-disj_sn_swap:     OK (0.51s)
    1.19 μs ±  87 ns, 50% less than baseline
  difference-disj_tn_swap:     OK (0.56s)
    736  ns ±  42 ns, 55% less than baseline
  difference-mix_nn:           OK (0.31s)
    3.11 ms ± 213 μs, 49% less than baseline
  difference-mix_nn_swap:      OK (0.58s)
    3.23 ms ± 147 μs, 47% less than baseline
  difference-mix_ns:           OK (0.37s)
    833  μs ±  55 μs, 40% less than baseline
  difference-mix_nt:           OK (0.36s)
    68.5 μs ± 6.1 μs, 28% less than baseline
  difference-mix_sn_swap:      OK (0.33s)
    562  μs ±  47 μs, 61% less than baseline
  difference-mix_tn_swap:      OK (0.45s)
    50.1 μs ± 3.1 μs, 44% less than baseline
  intersection-block_nn:       OK (0.40s)
    191  μs ±  16 μs, 66% less than baseline
  intersection-block_nn_swap:  OK (0.42s)
    189  μs ±  13 μs, 66% less than baseline
  intersection-block_ns:       OK (0.44s)
    18.4 μs ± 1.5 μs, 73% less than baseline
  intersection-block_sn_swap:  OK (0.41s)
    17.8 μs ± 1.5 μs, 74% less than baseline
  intersection-common_nn:      OK (0.27s)
    1.06 ms ±  90 μs, 32% less than baseline
  intersection-common_nn_swap: OK (0.20s)
    545  μs ±  42 μs, 33% less than baseline
  intersection-common_ns:      OK (0.26s)
    975  μs ±  87 μs, 46% less than baseline
  intersection-common_nt:      OK (0.38s)
    77.0 μs ± 6.2 μs, 40% less than baseline
  intersection-common_sn_swap: OK (0.43s)
    430  μs ±  22 μs, 67% less than baseline
  intersection-common_tn_swap: OK (0.45s)
    43.3 μs ± 3.6 μs, 61% less than baseline
  intersection-disj_nn:        OK (0.58s)
    1.54 μs ±  85 ns, 62% less than baseline
  intersection-disj_nn_swap:   OK (0.55s)
    1.55 μs ±  92 ns, 65% less than baseline
  intersection-disj_ns:        OK (0.51s)
    1.20 μs ±  85 ns, 64% less than baseline
  intersection-disj_nt:        OK (0.58s)
    780  ns ±  47 ns, 66% less than baseline
  intersection-disj_sn_swap:   OK (0.51s)
    1.19 μs ±  84 ns, 65% less than baseline
  intersection-disj_tn_swap:   OK (0.57s)
    750  ns ±  47 ns, 65% less than baseline
  intersection-mix_nn:         OK (0.54s)
    3.16 ms ± 183 μs, 60% less than baseline
  intersection-mix_nn_swap:    OK (0.31s)
    3.06 ms ± 300 μs, 62% less than baseline
  intersection-mix_ns:         OK (0.40s)
    839  μs ±  54 μs, 58% less than baseline
  intersection-mix_nt:         OK (0.51s)
    65.5 μs ± 3.1 μs, 47% less than baseline
  intersection-mix_sn_swap:    OK (0.31s)
    585  μs ±  48 μs, 65% less than baseline
  intersection-mix_tn_swap:    OK (0.48s)
    56.1 μs ± 3.0 μs, 53% less than baseline
union and intersection are only changed in terms of the unbalanced split; for difference I had to make a larger change, so it is not a good direct comparison. There are also a handful of increases in union; I'm not sure why. Anyway, this seems useful, so I'll also try to understand the proofs and see if they still apply with this change. | 
| One option to consider is to switch to an unbalanced split (or the "hedge" algorithms we used to use) when the sets/maps get small enough (below some fixed size). That will avoid breaking big O while getting a lot of the performance benefits in the cases where it's good. | 
| I'm surprised it's not always worse for operations that return sets, since they must return balanced sets in the end to preserve invariants, no? Are you doing a single call to balance at the very end? I haven't been in the headspace to look at this in a while, but one thing I'd been meaning to do is try to make this allocation-free. It sounds plausible to me, since without re-balancing the triple that's returned is immediately consumed. I had tried to do this via CPS, but having functions as arguments seemed to kill performance, and I'm always a bit lost when trying to reason about the Core that comes out. | 
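As a rough illustration of the CPS idea mentioned above, the split can hand its three results to a continuation instead of returning a triple. This is only a sketch on a toy tree: `Tree`, `fromDistinctAscList`, `toAscList`, and `splitMemberK` are hypothetical names made up here, and nothing in it shows whether GHC actually avoids the allocations in practice.

```haskell
-- Toy size-annotated search tree (an assumption for illustration,
-- not the containers representation).
data Tree a = Tip | Bin !Int a !(Tree a) !(Tree a)

size :: Tree a -> Int
size Tip = 0
size (Bin n _ _ _) = n

-- Recomputes the size only; performs no rotations.
bin :: a -> Tree a -> Tree a -> Tree a
bin x l r = Bin (size l + size r + 1) x l r

fromDistinctAscList :: [a] -> Tree a
fromDistinctAscList xs = case splitAt (length xs `div` 2) xs of
  (_, [])      -> Tip
  (ls, x : rs) -> bin x (fromDistinctAscList ls) (fromDistinctAscList rs)

toAscList :: Tree a -> [a]
toAscList t = go t []
  where
    go Tip acc = acc
    go (Bin _ x l r) acc = go l (x : go r acc)

-- Continuation-passing unbalanced split: the continuation k receives
-- the smaller part, the membership flag, and the larger part.
splitMemberK :: Ord a => a -> Tree a -> (Tree a -> Bool -> Tree a -> r) -> r
splitMemberK _ Tip k = k Tip False Tip
splitMemberK x (Bin _ y l r) k = case compare x y of
  LT -> splitMemberK x l (\lt found gt -> k lt found (bin y gt r))
  GT -> splitMemberK x r (\lt found gt -> k (bin y l lt) found gt)
  EQ -> k l True r
```

Each recursive step here builds a closure in place of a tuple, which may be part of why passing functions as arguments did not help in the experiment described above.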
| Not all the reconstructed pieces end up getting incorporated. For  | 
| 
 I had to only for
-- A possibly unbalanced set.
-- Invariant: A Bin with non-zero size is balanced.
--            To construct an unbalanced set: Unbalanced (Bin 0 x l r)
newtype Unbalanced a = Unbalanced (Set a)
fromUnbalanced :: Unbalanced a -> Set a
fromUnbalanced (Unbalanced s0) = go s0
  where
    go (Bin 0 x l r) = link x (go l) (go r)
    go s = s
splitSUnbalanced :: Ord a => a -> Unbalanced a -> StrictPair (Unbalanced a) (Unbalanced a)
splitMemberUnbalanced :: Ord a => a -> Unbalanced a -> (Unbalanced a,Bool,Unbalanced a)
union :: Ord a => Set a -> Set a -> Set a
union t10 t20 = go t10 (Unbalanced t20)
  where
    go t1 (Unbalanced Tip) = t1
    go t1 (Unbalanced (Bin _ x Tip Tip)) = insertR x t1
    go (Bin 1 x _ _) t2 = insert x (fromUnbalanced t2)
    go Tip t2 = fromUnbalanced t2
    go t1@(Bin _ x l1 r1) t2 = case splitSUnbalanced x t2 of
      (l2 :*: r2)
        | l1l2 `ptrEq` l1 && r1r2 `ptrEq` r1 -> t1
        | otherwise -> link x l1l2 r1r2
        where !l1l2 = go l1 l2
              !r1r2 = go r1 r2
difference :: Ord a => Set a -> Set a -> Set a
difference t10 t20 = go t10 (Unbalanced t20)
  where
    go Tip _ = Tip
    go t1 (Unbalanced Tip) = t1
    go t1@(Bin _ x l1 r1) t2 = case splitMemberUnbalanced x t2 of
      (l2,b,r2)
        | b -> merge l1l2 r1r2
        | l1l2 `ptrEq` l1 && r1r2 `ptrEq` r1 -> t1
        | otherwise -> link x l1l2 r1r2
        where !l1l2 = go l1 l2
              !r1r2 = go r1 r2
intersection :: Ord a => Set a -> Set a -> Set a
intersection t10 t20 = go t10 (Unbalanced t20)
  where
    go Tip _ = Tip
    go _ (Unbalanced Tip) = Tip
    go t1@(Bin _ x l1 r1) t2
      | b = if l1l2 `ptrEq` l1 && r1r2 `ptrEq` r1
            then t1
            else link x l1l2 r1r2
      | otherwise = merge l1l2 r1r2
      where
        !(l2, b, r2) = splitMemberUnbalanced x t2
        !l1l2 = go l1 l2
        !r1r2 = go r1 r2 | 
Unlike, say, union/intersection, disjoint doesn't return a new structure. It can avoid the re-balancing work because it immediately inspects and forgets the produced tree. This allows significant constant-factor speedups.
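To make the point above concrete, here is a minimal sketch of a disjointness check that only ever inspects the split pieces and never rebuilds a result set. It runs on a toy size-annotated tree, not the `Data.Set` internals: `Tree` and `fromDistinctAscList` are assumptions made for this illustration, and the recursion follows the split-based shape discussed in this PR rather than reproducing the actual patch.

```haskell
-- Toy size-annotated search tree (an assumption for illustration,
-- not the containers representation).
data Tree a = Tip | Bin !Int a !(Tree a) !(Tree a)

size :: Tree a -> Int
size Tip = 0
size (Bin n _ _ _) = n

-- Recomputes the size only; performs no rotations.
bin :: a -> Tree a -> Tree a -> Tree a
bin x l r = Bin (size l + size r + 1) x l r

fromDistinctAscList :: [a] -> Tree a
fromDistinctAscList xs = case splitAt (length xs `div` 2) xs of
  (_, [])      -> Tip
  (ls, x : rs) -> bin x (fromDistinctAscList ls) (fromDistinctAscList rs)

-- Split without re-balancing; the pieces are meant to be thrown away.
splitMemberUnbalanced :: Ord a => a -> Tree a -> (Tree a, Bool, Tree a)
splitMemberUnbalanced _ Tip = (Tip, False, Tip)
splitMemberUnbalanced x (Bin _ y l r) = case compare x y of
  LT -> let (lt, found, gt) = splitMemberUnbalanced x l
        in (lt, found, bin y gt r)
  GT -> let (lt, found, gt) = splitMemberUnbalanced x r
        in (bin y l lt, found, gt)
  EQ -> (l, True, r)

-- The split pieces are consumed by the recursive calls and then dropped,
-- so their balance never has to be restored.
disjoint :: Ord a => Tree a -> Tree a -> Bool
disjoint Tip _ = True
disjoint _ Tip = True
disjoint (Bin _ x l r) t = not found && disjoint l lt && disjoint r gt
  where
    (lt, found, gt) = splitMemberUnbalanced x t
```

For example, `disjoint (fromDistinctAscList [1, 3 .. 19]) (fromDistinctAscList [2, 4 .. 20])` evaluates to `True`, while two trees that share the element 10 give `False`.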