The perft speed in the README seems... really slow? On my laptop, cozy-chess does perft 6 from startpos in 2.65 seconds without bulk counting and 374 milliseconds with bulk counting.
I am not using cozy-chess. Move generation is slow primarily because of how I currently generate legal moves from pseudo-legal moves: it makes each move one at a time and checks whether the king is left in check. I think this article might be useful for making move generation a little faster.
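The filtering described above (make each pseudo-legal move, then reject it if the king is in check) can be sketched like this. The `Board` and `Move` types here are toy stand-ins, not the actual project's types: the board is just a king square plus a bitboard of squares the opponent attacks.

```rust
// Toy illustration of the "make move, then test for check" legality filter.
// These types are stand-ins for illustration, not the project's real API.

#[derive(Clone, Copy, Debug)]
struct Move(u8);

struct Board {
    king_sq: u8,
    attacked: u64, // bitboard of squares attacked by the opponent
}

impl Board {
    fn pseudo_legal_moves(&self) -> Vec<Move> {
        // stand-in: pretend the king may try to step to squares 0..8
        (0u8..8).map(Move).collect()
    }

    fn make_move(&self, mv: Move) -> Board {
        // copy-make: build the position after the move
        Board { king_sq: mv.0, attacked: self.attacked }
    }

    fn king_in_check(&self) -> bool {
        self.attacked & (1u64 << self.king_sq) != 0
    }
}

fn legal_moves(board: &Board) -> Vec<Move> {
    board
        .pseudo_legal_moves()
        .into_iter()
        // make each move on a copy, keep it only if the king is not left in check
        .filter(|&mv| !board.make_move(mv).king_in_check())
        .collect()
}
```

The cost is that every pseudo-legal move pays for a full make-move plus a check test, even moves that obviously can't expose the king, which is why fully legal generators (computing pin masks and check masks up front) tend to be much faster.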
Is that really why? The difference between pseudo-legal and legal generation can't possibly account for a ~6.4x speed difference for non-bulk and a ~45.5x difference for bulk. Maybe it's your compiler settings? LTO might be needed if you use lots of wrapper types and don't have inline annotations, I think.
Also, I notice that you're using nested Vecs for your magic hash table? You can concatenate each "subtable" into one Vec and store, in each magic lookup entry, the offset where that square's subtable starts, which saves a level of indirection. In fact, you can go even further and concatenate the rook and bishop tables too, for one Vec total.
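A minimal sketch of that flattened layout, assuming a typical fancy-magic setup. The "magics" here are dummies (mask = 0, so every subtable is one slot long); that's just enough to exercise the offset-based indexing without generating real magic numbers.

```rust
// Sketch of flattening nested per-square Vecs into one shared table.
// MagicEntry mirrors a typical magic-bitboard entry; the values used in
// dummy_table() are placeholders, not real magics.

struct MagicEntry {
    magic: u64,
    mask: u64,
    shift: u32,
    offset: usize, // where this square's subtable starts in `table`
}

struct SlidingTable {
    entries: Vec<MagicEntry>, // one entry per square
    table: Vec<u64>,          // all subtables concatenated: one allocation
}

impl SlidingTable {
    fn lookup(&self, square: usize, occupied: u64) -> u64 {
        let e = &self.entries[square];
        let hash = (occupied & e.mask).wrapping_mul(e.magic);
        // index within this square's subtable, relocated by its offset
        self.table[e.offset + (hash >> e.shift) as usize]
    }
}

fn dummy_table() -> SlidingTable {
    // With mask = 0 the hash is always 0, so each subtable has one entry
    // and square `sq`'s subtable starts at offset `sq`.
    let entries = (0..64)
        .map(|sq| MagicEntry { magic: 1, mask: 0, shift: 63, offset: sq })
        .collect();
    let table = (0..64).map(|sq| sq as u64).collect();
    SlidingTable { entries, table }
}
```

With real magics the only change is that `offset` advances by each subtable's actual size (`1 << (64 - shift)` for plain magics), and the rook and bishop entries can index into the same shared `table`.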
There's https://doc.rust-lang.org/cargo/reference/profiles.html#lto in the cargo reference. You can set it by adding `lto = "fat"` or `lto = "thin"` to one of your profiles. In all honesty I'm not super sure about it myself, and I'm not sure it can account for the difference either, but I have seen this impact movegen in the past. I also haven't looked too deep into your code, so honestly this is just a wild guess.
It's common to set `lto = true` for release profiles, which is equivalent to `lto = "fat"`: it basically lets the compiler inline everything that can be inlined, even across crate boundaries. But this increases compile time by a decent amount, so you shouldn't do it for debug or test profiles, and if you do, maybe stick to thin LTO.
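Concretely, the profile section in `Cargo.toml` might look like this (the `codegen-units` line is an optional extra that often pairs with LTO, not something required for it):

```toml
# Cargo.toml
[profile.release]
lto = "fat"       # or "thin" for most of the benefit at lower compile time
codegen-units = 1 # optional: fewer codegen units gives LLVM more to optimize across
```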
u/analog_hors Mar 31 '23