June 3, 2026 | Open Access

Estimating nutrient composition of packaged foods using natural language processing and optimization modeling

Key Points

The aim is to estimate the full nutrient composition of packaged foods using natural language processing and optimization modeling.
Matched ingredients from 5,371 packaged foods to the Canadian Nutrient File using an NLP algorithm.
Assessed match quality with cosine similarity scores.
Used optimization modeling to estimate ingredient proportions and reverse-engineer nutrient composition data.
Over 55% of ingredients matched to the Canadian Nutrient File with cosine similarity scores ≥ 0.9.
Median relative error for nutrient estimates was below 20% in absolute value across combined food categories.
Six food categories showed strong results, with median relative errors below 20% in absolute value for all nutrients.
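
The matching step can be illustrated with a minimal sketch. It assumes ingredient names are compared as bag-of-words count vectors; the summary does not say how the text was vectorized, so the tokenizer, the `best_match` helper, and the reference entries below are hypothetical and are not actual Canadian Nutrient File records.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase a name and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # An empty name has no direction, so treat its similarity as zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(ingredient, reference_names):
    """Return the reference entry with the highest similarity and its score."""
    scored = [(cosine_similarity(ingredient, name), name) for name in reference_names]
    score, name = max(scored)
    return name, score


# Hypothetical reference entries (assumption: not real CNF records).
reference = ["wheat flour", "cane sugar", "sunflower oil", "sea salt"]
name, score = best_match("Sugar (cane)", reference)
print(name, round(score, 2))
```

A match would then count as high quality when its score is at least 0.9, the threshold reported in the key points.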

Abstract

Background: Food composition databases are fundamental for rigorous dietary assessment, yet they often include information only for generic foods.

Objective: This study aimed to estimate the full nutrient composition of packaged foods using natural language processing (NLP) and optimization modeling.

Methods: Nutrition Facts tables (NFTs) and ingredient lists for 5,371 packaged foods collected by the Food Quality Observatory across 17 food categories available in Québec, Canada, were used. First, an NLP algorithm matched individual ingredients from packaged foods to the closest equivalents in the Canadian Nutrient File (CNF) 2015, which contains full nutrient profiles for over 5,690 ingredients and foods in Canada. Match quality was assessed using cosine similarity scores. Second, an optimization model estimated the proportion of each ingredient (g/100 g) in the packaged foods, enabling the reverse-engineering of the nutrient composition data found on the NFT. Model performance was assessed using relative errors comparing estimated nutrient values with the known values reported on NFTs.

Results: Over 55% of ingredients were matched to the CNF with cosine similarity scores ≥ 0.9, indicating high-quality matches. Across all food categories combined, the median relative error for the estimates of energy and the 10 nutrients reported on the NFT was below 20% in absolute value. Six food categories had median relative errors below 20% in absolute value for all nutrients.

Conclusions: A method based on NLP and optimization modeling can reliably estimate ingredient proportions of a wide variety of packaged foods, allowing for the generation of complete nutrient profiles.
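
The optimization step and the relative-error check can be sketched as follows. This is not the authors' model: a brute-force grid search stands in for a proper solver, the objective (sum of squared relative errors) is an assumption, and so is the constraint that proportions are non-increasing in label order. The ingredient profiles and label values are made-up numbers, not CNF or NFT data.

```python
# Hypothetical nutrient profiles per 100 g of ingredient (not CNF data),
# keyed in the order the ingredients would appear on the label.
PROFILES = {
    "flour": {"energy_kcal": 360.0, "protein_g": 10.0, "fat_g": 1.0},
    "sugar": {"energy_kcal": 400.0, "protein_g": 0.0, "fat_g": 0.0},
    "oil": {"energy_kcal": 900.0, "protein_g": 0.0, "fat_g": 100.0},
}
# Hypothetical label values per 100 g of product.
LABEL = {"energy_kcal": 426.0, "protein_g": 6.0, "fat_g": 10.6}


def estimate_nutrients(proportions):
    """Nutrients per 100 g of product implied by proportions in g/100 g."""
    return {
        nutrient: sum(g / 100.0 * PROFILES[ing][nutrient] for ing, g in proportions.items())
        for nutrient in LABEL
    }


def relative_error(estimated, known):
    """Signed relative error in percent: (estimated - known) / known * 100."""
    return (estimated - known) / known * 100.0


def loss(proportions):
    """Sum of squared relative errors over all label nutrients."""
    est = estimate_nutrients(proportions)
    return sum(relative_error(est[n], LABEL[n]) ** 2 for n in LABEL)


def fit_proportions(step=1):
    """Grid search over proportions that sum to 100 g and never increase."""
    names = list(PROFILES)
    best, best_loss = None, float("inf")
    for first in range(0, 101, step):
        # The second ingredient cannot exceed the first or the remaining mass.
        for second in range(0, min(first, 100 - first) + 1, step):
            third = 100 - first - second
            if third > second:
                continue  # would violate the assumed descending order
            candidate = dict(zip(names, (first, second, third)))
            current = loss(candidate)
            if current < best_loss:
                best, best_loss = candidate, current
    return best


best = fit_proportions()
estimated = estimate_nutrients(best)
errors = {n: relative_error(estimated[n], LABEL[n]) for n in LABEL}
print(best)
```

With real data, an exhaustive search does not scale beyond a few ingredients, so a linear or quadratic programming solver would be the more practical choice. The relative errors in `errors` correspond to the performance measure described in the Methods.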
