Once you start optimizing code by targeting specific instruction sets or, more precisely, instruction set extensions, you suddenly find yourself in a world of bits and bytes with comparatively little guidance on which choices are best and how exactly to implement them. Intel provides excellent manuals, and we post articles on this site with some advice, but if you need to dive deep into all the ins and outs, the manuals I recommend most are those written by Agner Fog. They include Optimizing Software in C++ and Optimizing Subroutines in Assembly Language. Both are undeniably labors of love, and every page rings with the author's personal experience in writing and testing the routines, including observations on the performance he obtained and the undocumented problems he ran into. Having discovered these resources, I can't imagine going down this path without them by my side.