Multiple sequence alignment (MSA) is a fundamental tool for identifying conserved regions and inferring molecular structure, function, and evolutionary relationships. Despite decades of progress, aligning large and evolutionarily diverse sequence sets remains computationally challenging and prone to error propagation in order-dependent pipelines. Here, we present a comprehensive performance evaluation of CPA-FL (Clustering-based Progressive Alignment with Fuzzy Logic), a flexible MSA framework designed to improve robustness through graph-based clustering and fuzzy membership refinement. CPA-FL was benchmarked against widely used alignment tools on large protein families and curated reference datasets. Two large-scale protein families, HEN1 (438 sequences) and HST (477 sequences), were used to assess alignment quality under multiple clustering and thresholding strategies. Results show that moderate, well-defined clustering combined with progressive profile HMM merging yields the highest sum-of-pairs (SP) score per aligned column and the highest BLOSUM62-weighted SP score, indicating improved local alignment accuracy and preservation of evolutionary signal. In contrast, overly aggressive clustering under permissive threshold settings led to fragmentation and reduced biological coherence. Viterbi-based profile HMM merging produced the most compact alignments, reflecting efficient gap handling, while progressive profile HMM merging achieved enhanced local accuracy through iterative profile refinement. Comparative benchmarking against Clustal Omega, MUSCLE, Kalign, MAFFT, and T-Coffee demonstrated that CPA-FL configurations achieve competitive or superior performance, particularly in conserved regions. Statistical evaluation using the Friedman non-parametric test on BAliBASE 3.0 reference datasets confirmed significant performance differences across methods (P < 0.00001).
Together, these results establish CPA-FL as a scalable and biologically meaningful framework for large-scale MSA, offering explicit control over clustering granularity while mitigating the brittleness of traditional progressive alignment approaches.

• A new CPA-FL algorithm integrates clustering and fuzzy logic for large-scale MSA.
• Evaluations on HEN1 (438 sequences), HST (477 sequences), and BAliBASE 3.0 show significant performance differences across methods (Friedman test, P < 0.00001).
• Moderate, well-defined clustering with progressive profile HMM merging achieved the highest SP per aligned column and BLOSUM62-weighted SP scores.
• Viterbi-based merging produced compact alignments with efficient gap handling.
• CPA-FL matched or outperformed Clustal Omega, MUSCLE, MAFFT, Kalign, and T-Coffee in conserved regions.
• Minimum-threshold and component-based strategies optimized both accuracy and computational efficiency.
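
The SP-per-aligned-column metric mentioned above can be illustrated with a minimal Python sketch. The exact definition used in the evaluation is not given here, so this version makes explicit assumptions: gaps are written as "-", pairs involving a gap are skipped, and the default pair score is 1 for identical residues and 0 otherwise. The function names `column_sp` and `sp_per_column` are hypothetical and chosen for illustration only.

```python
from itertools import combinations

GAP = "-"  # assumption: gaps are encoded as "-"


def identity_score(a, b):
    """Assumed default pair score: 1 for identical residues, 0 otherwise."""
    return 1 if a == b else 0


def column_sp(column, score=identity_score):
    """Sum the pair scores over all non-gap residue pairs in one column."""
    residues = [c for c in column if c != GAP]
    return sum(score(a, b) for a, b in combinations(residues, 2))


def sp_per_column(alignment, score=identity_score):
    """Total sum-of-pairs score divided by the number of aligned columns."""
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one column")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    # zip(*alignment) walks the alignment column by column.
    total = sum(column_sp(col, score) for col in zip(*alignment))
    return total / length


# Toy alignment, made up purely for illustration (not data from the study).
toy = ["AC-G", "ACTG", "A-TG"]
print(sp_per_column(toy))  # 2.0
```

A BLOSUM62-weighted variant would follow the same shape by passing a substitution-matrix lookup as `score` in place of `identity_score`. Dividing by the column count is what makes alignments of different lengths comparable.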
Behzad Hajieghrari studied this question.