A tool that removes censorship from open-weight LLMs
Summary
OBLITERATUS is an open-source toolkit for understanding and removing refusal behavior (guardrails) in open-weight LLMs using abliteration techniques. It provides a multi-stage pipeline (map, break, understand, and informed pursuit) with both zero-code and programmable options, and it emphasizes community-sourced telemetry to build a large-scale dataset of refusal geometry across models. The project highlights reversible and steering-based approaches, but it also raises significant safety and ethical concerns about bypassing content safeguards.