As large language models (LLMs) integrate into critical decision-making, their alignment with human values in high-stakes scenarios remains unclear. This study systematically investigates LLM behavioral consistency, focusing on cooperative intent, resource distribution, and moral reasoning, under simulated emergencies. We employed established psychological scales in two crisis scenarios: natural disaster resource allocation and crowd panic response. We use “catalyst” metaphorically: crisis framings serve as an observational stress test that amplifies and reveals latent behavioral trade-offs in LLMs rather than improving the models. Using a standardized API framework, we evaluated three primary LLMs (gpt-4o, DeepSeek-V3, and DeepSeek-R1) across repeated trials, analyzing both quantitative decisions and qualitative justifications. Results reveal that while LLMs reproduce broad human-like preferences (e.g., cooperation over competition), they exhibit systematic variations in ethical trade-offs and “flattened” decision distributions. Models differed significantly in cooperative framing and showed attenuated sensitivity to social variables (e.g., future interaction expectations) compared to humans. These findings advance computational crisis management and AI ethics, demonstrating context-dependent value misalignment risks. We propose a novel framework for evaluating behavioral consistency in silicon-based agents during crises, offering critical methodological and ethical guidance for deploying LLMs in socially complex, high-stakes environments.
Yang et al. studied this question.
www.synapsesocial.com/papers/69e1cfb15cdc762e9d858b40 — DOI: https://doi.org/10.1057/s41599-026-07194-z
Aoxiang Yang
Mingyang ZUO
Rui PENG
Humanities and Social Sciences Communications
Tsinghua University
Communication University of China