Reducing event and data sizes is critical for experiments at the LHC, where high collision rates and increased detector granularity rapidly increase storage and processing requirements. In the CMS experiment, a recent development to address this challenge is the Raw’ format: a new approach for recording silicon strip data in which only the reconstructed cluster’s barycenter and average charge are stored, rather than the analog-to-digital converter counts from every strip. This format was successfully deployed online during Run-3 for PbPb collisions at CMS, achieving an event size reduction by nearly a factor of two and enabling CMS to record almost all hadronic minimum bias PbPb collisions. To further enhance Raw’, we optimized the number of bits used to encode the cluster barycenter and total charge, using tracking efficiency and resolution as benchmarks. Comparing standard Raw with Raw’ shows that refining the bit precision yields stronger compression while maintaining similar performance. Additionally, we introduce a lossless compression strategy that encodes distances between clusters instead of their absolute positions within a detector module. Unlike absolute positions, the distribution of these distances is peaked around zero, effectively reducing entropy of that variable. Consequently, LZMA compression becomes more efficient, allowing even stronger data reduction than the current Raw’ algorithms without losing information integrity. Lastly, we discuss projected data sizes for Phase-2 and explore extending these techniques to other CMS detectors, notably the High-Granularity Calorimeter, which is anticipated to generate a substantial fraction of future data.
Saswati Nandan (Tue,) studied this question.
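The distance-based encoding described in the abstract can be illustrated with a minimal sketch. This is not the CMS implementation: the function names, the 16-bit little-endian packing, the position range, and the synthetic cluster positions are all assumptions made for illustration. It only shows the general idea of storing the distance to the previous cluster within a module, compressing the result with LZMA, and recovering the absolute positions exactly.

```python
import lzma
import random
import struct


def delta_encode(positions):
    """Replace each position by its distance to the previous one.

    Assumes positions are sorted in ascending order within one module,
    so every distance is non-negative. The first entry is kept as the
    distance from zero, i.e. the absolute position of the first cluster.
    """
    deltas = []
    previous = 0
    for p in positions:
        deltas.append(p - previous)
        previous = p
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum (lossless)."""
    positions = []
    running = 0
    for d in deltas:
        running += d
        positions.append(running)
    return positions


def pack_u16(values):
    """Pack non-negative integers below 65536 as little-endian uint16 (assumed layout)."""
    return struct.pack(f"<{len(values)}H", *values)


def unpack_u16(data):
    """Inverse of pack_u16."""
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def compressed_size(values):
    """Size in bytes of the LZMA-compressed uint16 stream."""
    return len(lzma.compress(pack_u16(values)))


if __name__ == "__main__":
    # Synthetic stand-in for cluster positions: hypothetical modules with a
    # few clusters each, positions drawn uniformly from an assumed range.
    random.seed(0)
    absolute, deltas = [], []
    for _ in range(2000):
        n_clusters = random.randint(1, 8)
        module_positions = sorted(random.sample(range(1536), n_clusters))
        absolute.extend(module_positions)
        deltas.extend(delta_encode(module_positions))

    print("absolute positions, compressed bytes:", compressed_size(absolute))
    print("cluster distances, compressed bytes: ", compressed_size(deltas))
```

The sizes printed for this synthetic sample say nothing about real detector data; the point is that `delta_decode(delta_encode(x)) == x` holds for any sorted input, so whatever gain the compressor obtains from the narrower distance distribution comes without information loss.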