April 10, 2026 · Open Access

An exploratory hybrid AI workflow for Brazilian federal budget allocation

Key Points

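Assesses the feasibility of a hybrid prediction–optimisation workflow for allocating the Brazilian federal budget under severe data constraints.
Fits a multi-output XGBoost model linking executed expenditure by budgetary function (2000–2023; N = 24) to GDP growth, inflation, and the Gini index.
Applies Bayesian optimisation (Tree-structured Parzen Estimator via Optuna) to search, within explicit bounds and penalties, for allocation vectors that maximise a stated objective function favouring higher growth and lower inflation and inequality.
Augments the 24 real observations with 1048 synthetic observations generated through controlled noise injection, bootstrapped resampling, and variational autoencoder reconstruction.
Evaluates the model with randomised K-fold cross-validation.
Achieves mean R² = 0.97 and mean MSE = 0.04 on the augmented dataset, although errors are larger at extreme values and a training–validation gap persists.
Generalises much more weakly under an anti-leakage check (cross-validation on the real observations, with synthetic data generated only within each training fold): overall mean MSE = 1.03 and overall mean R² = −0.45, with positive performance only for the Gini index (R² = 0.60).
Identifies a scenario that satisfies the objective function on standardised outputs (GDP growth = 1.15; inflation = −0.04; Gini = −0.17), intended for comparing scenarios under explicit assumptions rather than as prescriptive budget guidance.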
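The optimisation step scores each candidate allocation with an objective that favours higher growth and lower inflation and inequality, subject to bounds and penalties. The summary does not give the exact weights or penalty form, so the following is only a minimal sketch under stated assumptions: equal weights on the three standardised outputs, a linear penalty for shares outside per-function bounds, and an assumed constraint that shares sum to 1. The names `objective`, `bound_penalty` and `penalised_score` are hypothetical.

```python
# Hypothetical sketch of a penalised scalar objective; weights, bounds and the
# sum-to-one constraint are illustrative assumptions, not the study's settings.


def objective(z_gdp, z_infl, z_gini, weights=(1.0, 1.0, 1.0)):
    """Reward higher standardised GDP growth; penalise inflation and inequality."""
    w_gdp, w_infl, w_gini = weights
    return w_gdp * z_gdp - w_infl * z_infl - w_gini * z_gini


def bound_penalty(shares, lower, upper, scale=10.0):
    """Linear penalty for shares outside [lower, upper] or not summing to 1 (assumed)."""
    violation = 0.0
    for share, lo, hi in zip(shares, lower, upper):
        violation += max(0.0, lo - share) + max(0.0, share - hi)
    violation += abs(sum(shares) - 1.0)
    return scale * violation


def penalised_score(outputs, shares, lower, upper):
    """Score an allocation: objective on predicted outputs minus constraint penalty."""
    z_gdp, z_infl, z_gini = outputs
    return objective(z_gdp, z_infl, z_gini) - bound_penalty(shares, lower, upper)


if __name__ == "__main__":
    # Standardised outputs reported for the selected scenario, scored with equal weights.
    print(round(objective(1.15, -0.04, -0.17), 2))
```

With equal weights the reported scenario scores 1.36; under the study's own (unstated) weights the value would differ. In an Optuna setup, a function like `penalised_score` would be the value returned from each trial, with `outputs` predicted by the fitted surrogate model, and a study created with `direction="maximize"` uses the TPE sampler by default for a single objective.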

Abstract

This study assesses whether a hybrid prediction–optimisation workflow can be used as an exploratory exercise for Brazilian federal budget allocation under severe data constraints. Using executed expenditure by budgetary function (2000–2023; N = 24), a multi-output XGBoost model is estimated to link spending profiles to GDP growth, inflation, and the Gini index; Bayesian optimisation (Tree-structured Parzen Estimator/Optuna) is then applied to search, within explicit bounds and penalties, for allocation vectors that maximise a stated objective function favouring higher growth and lower inflation and inequality. To mitigate data scarcity, the short series is augmented with 1048 synthetic observations generated through controlled noise injection, bootstrapped resampling and variational autoencoder reconstruction. Under randomised K-fold cross-validation on the augmented dataset, the model achieves mean R² = 0.97 and mean MSE = 0.04, while diagnostics indicate larger errors at extreme values and a persistent training–validation gap. A secondary robustness check uses an anti-leakage design by applying cross-validation to the 24 real observations and generating synthetic data only within each training fold. This yields markedly weaker generalisation for GDP growth and inflation (overall mean MSE = 1.03; overall mean R² = −0.45), with positive performance remaining only for the Gini index (R² = 0.60). Under these conditions, the optimisation step identifies a scenario that satisfies the objective function on standardised outputs (GDP growth = 1.15; inflation = −0.04; Gini = −0.17). The results support the use of the workflow to compare scenarios under explicit assumptions, rather than to produce prescriptive budget guidance.
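The anti-leakage robustness check differs from the first evaluation in one respect: synthetic rows are generated from each training fold only, so no held-out real year can leak into training through a near-copy. The following minimal sketch illustrates that design under loud assumptions. It uses only the standard library, replaces XGBoost with a hypothetical mean predictor (`fit_mean`), approximates augmentation with bootstrap resampling plus Gaussian noise (the variational autoencoder step is omitted), and uses toy random data, an arbitrary `k=5`, `noise_sd` and `n_synthetic`. All function names are hypothetical.

```python
import random
from statistics import mean


def make_toy_rows(n, n_functions=3, seed=1):
    """Hypothetical toy data: per-year spending shares (x) and three standardised outcomes (y)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        raw = [rng.random() for _ in range(n_functions)]
        total = sum(raw)
        x = [v / total for v in raw]
        y = [rng.gauss(0.0, 1.0) for _ in range(3)]
        rows.append((x, y))
    return rows


def augment(train_rows, n_synthetic, noise_sd=0.05, seed=0):
    """Bootstrap-resample training rows and add Gaussian noise (VAE step omitted)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x, y = rng.choice(train_rows)  # draw with replacement from the training fold only
        synthetic.append(([v + rng.gauss(0.0, noise_sd) for v in x],
                          [v + rng.gauss(0.0, noise_sd) for v in y]))
    return synthetic


def kfold_indices(n, k, seed=0):
    """Shuffle row indices once and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(rows):
    """Stand-in model: predicts the training mean of each output."""
    n_out = len(rows[0][1])
    return [mean([y[d] for _, y in rows]) for d in range(n_out)]


def predict_mean(model, x):
    """The baseline ignores x and returns the stored output means."""
    return model


def anti_leakage_cv(rows, k, fit, predict, n_synthetic=40, seed=0):
    """K-fold CV on real rows; synthetic rows are generated inside each training fold."""
    folds = kfold_indices(len(rows), k, seed)
    fold_mse = []
    for i, test_idx in enumerate(folds):
        held_out = set(test_idx)
        train = [rows[j] for j in range(len(rows)) if j not in held_out]
        model = fit(train + augment(train, n_synthetic, seed=seed + i))
        # Pool squared errors over held-out real rows and all outputs.
        sq_err = [(a - b) ** 2
                  for j in test_idx
                  for a, b in zip(rows[j][1], predict(model, rows[j][0]))]
        fold_mse.append(mean(sq_err))
    return fold_mse


if __name__ == "__main__":
    rows = make_toy_rows(24)  # same size as the study's real series; values are random
    scores = anti_leakage_cv(rows, k=5, fit=fit_mean, predict=predict_mean)
    print([round(s, 3) for s in scores])
```

The design choice that matters is that `augment` only ever sees `train`. Augmenting the full series before splitting would let noisy copies of held-out years enter training, which is the leakage the study's robustness check is designed to exclude.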


Authors

Saulo De Oliveira Nonato

Marina Figueiredo Moreira

David Nadler Prata

Journal

Data & Policy

Institutions

Universidade de Brasília

Film Independent

Universidade Federal do Tocantins
