Silicon DPO Platinum: The Runtime-Verified Reasoning Dataset


Static evaluation is dead.

Most coding datasets are just LLMs grading LLMs. They hallucinate correctness.

Silicon-DPO is different. Every single line of code in the "Chosen" split has survived the Runtime Sandbox.

If it didn't compile, it was rejected. If it failed the unit tests, it was rewritten.

We captured the entire trajectory: the failure, the <think> trace, the fix, and the final verified solution.

The Methodology

The dataset was generated by an adversarial multi-agent system (Silicon-MoE) that simulates a range of difficult scenarios.

Data Specs:

• Format: .jsonl (Standard Alpaca/ShareGPT format)

• Verification: Python Runtime + assert Test Suites.

• Legacy Core: Includes curated high-quality samples from V12-V14 development phases.
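As a rough illustration of what a "Python Runtime + assert" check can look like, the sketch below runs a candidate solution together with its assert-based tests in a separate Python process and treats a zero exit code as a pass. It is not the Silicon-DPO pipeline itself: the function name `passes_runtime_check`, the timeout value, and the temp-file layout are assumptions made for this example, and a plain subprocess is not a hardened sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_runtime_check(solution: str, tests: str, timeout_s: float = 10.0) -> bool:
    """Return True if the solution plus its assert tests exit with code 0.

    Hypothetical helper for illustration only. A syntax error, a failed
    assert, an uncaught exception, or a timeout all count as a failure.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        # Concatenate the solution and its tests into one script.
        script.write_text(solution + "\n\n" + tests + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
    return result.returncode == 0
```

A sample would then be kept only if `passes_runtime_check(code, tests)` returns `True`; anything else goes back for a rewrite or is recorded as a failure.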

Licensing Tiers

OPTION A: The Builder's Kit (€29)

Designed for: SFT (Supervised Fine-Tuning), Personal Projects, App Dev.

Train your model to write executable code. Pure and simple.

• Volume: 1,500 Pairs.

• Content:

    • 1,250 SFT Pairs: Instruction + Final Verified Code (reasoning traces removed for compact training).

    • Bonus: 250 Elite DPO Pairs (Full Access) to test the reasoning engine.

• License: Personal / Individual Research.
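For readers curious how reasoning traces can be removed to get compact SFT targets, here is a minimal sketch. It assumes each trace is wrapped in a matching `<think>...</think>` pair inside the response text; the closing tag and the helper name `strip_think` are assumptions for this example, not a documented part of the dataset.

```python
import re

# Assumed trace delimiters: <think> ... </think>, possibly spanning many lines.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(response: str) -> str:
    """Drop every <think>...</think> block and return the remaining text."""
    return THINK_BLOCK.sub("", response).strip()
```

Applied to a response such as `"<think>off-by-one in the loop</think>\ndef f():\n    return 1"`, this leaves only the code.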

OPTION B: The Research Lab (€299)

Designed for: DPO/RLHF, Reasoning Model Training (O1-style), Enterprise.

Train a model that thinks before it codes. The full cognitive stack.

• Volume: 5,000+ Platinum Pairs (Full Corpus).

• Content:

    • <think> Traces: Deep Chain-of-Thought logs diagnosing bugs.

    • Self-Healing Trajectories: Capture the "Error -> Analysis -> Fix" learning loop.

    • Negative Samples (rejected): Critical for Reward Model training.

    • All Domains: Includes the complete Security, Architecture, and Finance subsets.

• License: Commercial Use Allowed. (Train proprietary models and resell the inference).
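To give a feel for working with chosen/rejected pairs stored as .jsonl, the sketch below reads one JSON object per line and yields preference triples. The field names `prompt`, `chosen`, and `rejected` are assumptions for this example (a common convention for DPO data); check the actual files for the real keys before relying on them.

```python
import json
from typing import Dict, Iterable, Iterator

# Hypothetical field names; the real schema may differ.
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def iter_preference_pairs(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {prompt, chosen, rejected} dicts from JSONL lines.

    Blank lines are skipped. A record missing any required field raises
    ValueError so that schema problems surface early.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = [key for key in REQUIRED_FIELDS if key not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        yield {key: record[key] for key in REQUIRED_FIELDS}
```

Because the function accepts any iterable of strings, it works on an open file handle as well as on an in-memory list, for example `with open(path, encoding="utf-8") as f: pairs = list(iter_preference_pairs(f))`, where `path` points at your copy of the data.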

Why "Self-Healing" Data?

Human engineers don't write perfect code on the first try. They debug.

Standard datasets only show the destination. Silicon-DPO (Option B) teaches your model the journey of debugging complex systems.

Deploy the factory.
