CEPALSTAT parser walkthrough#

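This notebook is a practical guide to parsing the CEPALSTAT bundles supported by MARIO: COU bundles (supply and use tables, parsed with table="SUT") and MIP bundles (input-output tables, parsed with table="IOT").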

Warning

CEPALSTAT support in MARIO is currently best treated as an evolving beta interface. The parser already covers several important country-specific workbook families, but the source repository still contains many layout edge cases and country exceptions that may not behave perfectly yet. Issue reports are therefore especially valuable here: if a country bundle fails or behaves unexpectedly, please open an issue and include the specific country, year, table type, and workbook family when possible.

What this notebook covers#

  • where the official CEPALSTAT repository lives;

  • the difference between SUT and IOT workflows;

  • direct-file versus directory parsing;

  • how year=, country=, and iot_mode= are used;

  • which CEPALSTAT workbook families are currently supported;

  • which parser warnings matter in practice.

Relevant source page#

MARIO does not provide an automatic downloader for this source. The expected workflow is to parse local files that you already downloaded from the repository.

Expected path structure#

path can point either to one downloaded CEPALSTAT bundle archive or to a directory collecting multiple bundle archives.

Typical direct-file inputs look like:

/path/to/COL_COU_2023.zip
/path/to/DOM_MIP_2012.zip

When you want MARIO to select from a local directory, a practical layout is:

cepalstat_directory/
|-- ARG_COU_2020.zip
|-- BRA_COU_2020.zip
|-- COL_COU_2023.zip
|-- DOM_MIP_2012.zip
`-- CHI_MIP_2020.zip

MARIO does not require every country bundle to share the same internal workbook layout, but it does expect CEPALSTAT-style bundle names on local files that you have already downloaded from the repository.
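The example names above suggest a COUNTRY_KIND_YEAR.zip pattern. The following is a minimal sketch for splitting such a name into its parts, which can help when sanity-checking a local folder before calling MARIO. The helper split_bundle_name and the regular expression are illustrative assumptions inferred from the names on this page; they are not part of MARIO's API, and real bundle names may deviate from this pattern.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Assumed pattern, inferred from names such as COL_COU_2023.zip.
BUNDLE_NAME = re.compile(
    r"^(?P<country>[A-Z]{3})_(?P<kind>COU|MIP)_(?P<year>\d{4})\.zip$"
)


class BundleName(NamedTuple):
    country: str
    kind: str
    year: int


def split_bundle_name(path) -> Optional[BundleName]:
    """Return the name parts, or None when the name does not match."""
    match = BUNDLE_NAME.match(Path(path).name)
    if match is None:
        return None
    return BundleName(match["country"], match["kind"], int(match["year"]))


print(split_bundle_name("/path/to/COL_COU_2023.zip"))
```

Returning None instead of raising makes it easy to filter a mixed folder and list the files that do not look like CEPALSTAT bundles.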

Main entry point#

For normal user workflows, the public entry point is:

  • mario.parse_cepalstat(...)

The same function supports both:

  • SUT bundles (table="SUT");

  • IOT bundles (table="IOT").
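The examples on this page pair COU bundles with table="SUT" and MIP bundles with table="IOT". Assuming that pairing holds for your files, the table= argument can be derived from the file name. The helpers table_for_bundle and parse_kwargs below are hypothetical and only build the keyword arguments; they do not call MARIO.

```python
from pathlib import Path

# Pairing taken from the examples on this page (COU -> SUT, MIP -> IOT).
KIND_TO_TABLE = {"COU": "SUT", "MIP": "IOT"}


def table_for_bundle(path) -> str:
    """Derive the table= value from a name like COL_COU_2023.zip."""
    parts = Path(path).stem.split("_")
    if len(parts) != 3 or parts[1] not in KIND_TO_TABLE:
        raise ValueError(f"Not a CEPALSTAT-style bundle name: {Path(path).name}")
    return KIND_TO_TABLE[parts[1]]


def parse_kwargs(path, **extra) -> dict:
    """Build keyword arguments for mario.parse_cepalstat without calling it."""
    return {"path": str(path), "table": table_for_bundle(path), **extra}


kwargs = parse_kwargs("/path/to/DOM_MIP_2012.zip", iot_mode="pxp", calc_all=False)
print(kwargs)
# With MARIO installed and the file present:
# db = mario.parse_cepalstat(**kwargs)
```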

Supported layout families#

CEPALSTAT workbooks do not share one layout, so MARIO maps each bundle to one of several supported layout families behind the same public API.

Current SUT support includes:

  • integrated offer/use workbooks such as Colombia;

  • two-sheet workbooks such as Argentina;

  • split offer/demand workbooks such as Brazil;

  • multi-cuadro workbooks such as Chile.

Current IOT support includes:

  • direct matrix workbooks such as Dominican Republic and Guatemala;

  • cuadro workbooks such as Colombia;

  • symmetric workbooks such as Argentina;

  • demand-at-basic-prices workbooks such as Brazil;

  • matrix workbooks such as Chile.

[1]:
import mario

Parse one SUT bundle directly#

Use year= when the bundle contains more than one annual workbook or when the workbook itself exposes more than one reference year.

[5]:
db = mario.parse_cepalstat(
    path="/path/to/ARG_COU_2020.zip",
    table="SUT",
    year=2019,
)

db
WARNING Parser: CEPALSTAT Argentina SUT workbook does not expose disaggregated value-added rows. Using the aggregate 'Valor Agregado Bruto pb' row.
INFO Parser: CEPALSTAT SUT parsed with 107 activities, 222 commodities and 7 final-demand categories.
INFO Metadata: initialized.
[5]:
name = CEPALSTAT SUT ARG 2019
table = SUT
tech_assumption = industry-based
scenarios = ['baseline']
Activity = 107
Commodity = 222
Factor of production = 8
Satellite account = 1
Consumption category = 7
Region = 1

Parse one IOT bundle directly#

Use iot_mode= to choose between the product-by-product (PxP) and activity-by-activity (AxA) tables when the bundle exposes both, or set iot_mode="auto" to let MARIO choose based on the workbook family.

[7]:
db = mario.parse_cepalstat(
    path="/path/to/DOM_MIP_2012.zip",
    table="IOT",
    iot_mode="pxp",
    calc_all=False,
)

db
INFO Parser: CEPALSTAT IOT parsed with 24 sectors, 7 final-demand categories and 7 factor rows.
INFO Metadata: initialized.
[7]:
name = CEPALSTAT IOT DOM 2012 PXP
table = IOT
scenarios = ['baseline']
Factor of production = 7
Satellite account = 1
Consumption category = 7
Region = 1
Sector = 24

Parse from one directory containing multiple bundles#

When path points to a directory, country= and year= are the main selectors. This is useful when the local CEPALSTAT folder contains many countries and vintages.

[8]:
db = mario.parse_cepalstat(
    path="/path/to/cepalstat_directory",
    table="IOT",
    country="ARG",
    year=1997,
    iot_mode="auto",
    calc_all=False,
)

db
INFO Parser: CEPALSTAT IOT parsed with 124 sectors, 7 final-demand categories and 1 factor rows.
INFO Metadata: initialized.
[8]:
name = CEPALSTAT IOT ARG 1997 AUTO
table = IOT
scenarios = ['baseline']
Factor of production = 1
Satellite account = 1
Consumption category = 7
Region = 1
Sector = 124
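Before relying on directory selection, it can help to list which local bundles match a given country and bundle kind. The sketch below uses only the standard library; find_bundles is a hypothetical helper and does not reproduce MARIO's own selection logic. Because one bundle may contain more than one reference year, the year in a file name is not necessarily the value you pass as year=, so the helper filters on country and kind only. The demo creates empty placeholder files in a temporary directory so that it runs without any downloaded data.

```python
import tempfile
from pathlib import Path


def find_bundles(directory, country, kind):
    """List zip files named <country>_<kind>_<year>.zip, sorted by name."""
    pattern = f"{country.upper()}_{kind.upper()}_*.zip"
    return sorted(Path(directory).glob(pattern))


# Demo with empty placeholder files named like the layout shown on this page.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("ARG_COU_2020.zip", "COL_COU_2023.zip", "DOM_MIP_2012.zip"):
        (Path(tmp) / name).touch()
    matches = [p.name for p in find_bundles(tmp, "dom", "mip")]

print(matches)
```

If the list is empty or contains more than one file, passing the intended file directly as path= avoids any ambiguity.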

Warnings you should expect#

Some CEPALSTAT families require controlled fallbacks. Typical examples are:

  • the offer and use product sets differ slightly, so MARIO keeps only the products common to both and warns;

  • some SUT families expose only aggregate value added rather than the full breakdown;

  • some IOT families do not expose explicit factor rows, so MARIO reconstructs aggregate value added residually and warns.

These warnings are informative: they mark real differences in the source layout, not parser noise.

Practical recommendation#

For CEPALSTAT, start by validating one country family at a time. The repository is consistent at the metadata level, but the workbook layouts differ substantially across countries and vintages. If you are building reproducible ingestion scripts, state the source file names and years explicitly instead of relying on broad directory parsing wherever you can.
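One way to follow this recommendation is an explicit job list in which every path, table, and year is spelled out, with each bundle parsed on its own so that one failing layout does not stop the rest. The sketch below is an assumption-laden illustration: run_jobs and stub_parse are hypothetical, the paths and the year are placeholders, and the parser is passed in as a callable so the example runs without MARIO or local files.

```python
# Explicit jobs with placeholder values: nothing is left to directory selection.
JOBS = [
    {"path": "/path/to/COL_COU_2023.zip", "table": "SUT", "year": 2019},
    {"path": "/path/to/DOM_MIP_2012.zip", "table": "IOT", "iot_mode": "pxp"},
]


def run_jobs(jobs, parse):
    """Run each job separately and record failures instead of stopping."""
    parsed, failed = {}, {}
    for job in jobs:
        try:
            parsed[job["path"]] = parse(**job)
        except Exception as error:  # broad on purpose: layouts fail in many ways
            failed[job["path"]] = repr(error)
    return parsed, failed


def stub_parse(**kwargs):
    """Stand-in for mario.parse_cepalstat so the sketch runs without MARIO."""
    if kwargs["table"] not in ("SUT", "IOT"):
        raise ValueError(f"unsupported table: {kwargs['table']}")
    return f"parsed {kwargs['table']}"


parsed, failed = run_jobs(JOBS, stub_parse)
print(parsed)
print(failed)
# With MARIO installed: parsed, failed = run_jobs(JOBS, mario.parse_cepalstat)
```

The failed mapping keeps the path and error text together, which is the kind of detail an issue report can include.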