Safety alignment in Large Language Models (LLMs) is no longer a purely technical matter: it is a strategic architectural decision that encodes the values, risk tolerance, and moral philosophy of the organisations that develop them. This paper conducts a comparative analysis of guardrail architectures across prominent commercial LLMs, with particular attention to ChatGPT, Grok, and Perplexity AI, examining how differences in Reinforcement Learning from Human Feedback (RLHF) reward modelling and ethical design choices produce markedly different behavioural outcomes when models face extreme moral dilemmas involving violence. We classify models along a deontological–utilitarian spectrum, demonstrating that so-called “analytical openness” in safety design can constitute a critical alignment failure rather than a mark of sophistication. Our findings indicate that a hard-coded ethical floor, that is, an inviolable set of refusal principles, is necessary for safe enterprise deployment, and that the absence of such a floor represents a measurable liability for business-to-business (B2B) applications. We close by proposing a four-axis framework for auditing LLM ethical alignment and identifying directions for standardisation.
Zen Revista
Zen Revista (Sun,) studied this question.
www.synapsesocial.com/papers/699d3ff8de8e28729cf64ec8 — DOI: https://doi.org/10.5281/zenodo.18729374