Silicon-DPO Platinum: The Reasoning-Code Dataset