Excel custom parser walkthrough#

This notebook shows how to parse existing MARIO-readable Excel workbooks with mario.parse_from_excel(...). It uses the small test workbooks shipped with MARIO, so the focus is on reading a completed workbook rather than generating a new template.

What this notebook covers#

  • parse standard IOT and SUT Excel workbooks;

  • parse Excel workbooks that need explicit matrix_layouts;

  • inspect the parsed baseline blocks and index labels.

Creating a blank workbook from sets and units is covered in the dedicated workflow on providing a custom database.

Example workbooks#

The examples use these packaged workbooks:

  • test_IOT_standard.xlsx: standard IOT layout;

  • test_IOT_special.xlsx: IOT layout with extra semantic levels on V and E;

  • test_SUT_standard.xlsx: standard SUT layout;

  • test_SUT_special.xlsx: SUT layout with extra semantic levels on satellite extensions.

The code below uses paths relative to this notebook source location in the MARIO repository. If you run the notebook elsewhere, replace the path strings with the location of your local Excel files.

Main entry point and key arguments#

Use mario.parse_from_excel(...) when path points to one completed MARIO-readable workbook.

Key arguments:

  • path: workbook to parse;

  • table: choose "IOT" or "SUT";

  • mode: choose "flows" or "coefficients";

  • data_sheet: optional sheet selector for the table payload;

  • unit_sheet: optional sheet selector for the units sheet;

  • matrix_layouts: optional semantic declaration for non-standard matrix layouts;

  • tech_assumption: optional SUT selector for IT or PT.

Workbook structure and packaged examples#

path must point to one Excel workbook.

custom_database.xlsx
├── data sheet
└── units sheet

The data sheet can use the canonical MARIO matrix layout exported by MARIO or a flat template generated with mario.write_parse_template(...). The units sheet must contain the units for the labels used by sectors, activities, commodities, factors, final demand, and satellites.

Packaged example workbooks used in this notebook:

Standard and special layouts#

In the standard workbook layout, the table uses three index levels on both rows and columns: region, level name, and level label. Use mode="flows" or mode="coefficients" to tell MARIO how to interpret the numbers.

For non-standard layouts, use matrix_layouts= to declare extra semantic layers, especially on V and E. VY and EY inherit the same extra layout as V and E, so you do not declare them separately.

[1]:
import mario

Parse a standard IOT workbook#

For a standard IOT workbook, only the file path, table kind and mode are required.

[2]:
iot = mario.parse_from_excel(
    path="/path/to/test_IOT_standard.xlsx",
    table="IOT",
    mode="flows",
)
INFO Parser: excel reading IOT flows from /path/to/test_IOT_standard.xlsx.
INFO Parser: state payload ready with 6 canonical blocks.
INFO Parser: excel state ready for IOT.
INFO Metadata: initialized.
[3]:
iot.meta.table
[3]:
'IOT'
[4]:
iot.get_index("Region")
[4]:
['Reg1', 'Reg2']
[5]:
iot.get_index("Sector")
[5]:
['Agriculture', 'Services', 'Industry']
[6]:
iot.V
[6]:
Region Reg1 Reg2
Level Sector Sector
Item Agriculture Industry Services Agriculture Industry Services
Item
Taxes 5.517199e+05 3.047096e+06 5.010693e+06 4908.415181 36852.481913 143818.770826
Wages 2.576824e+06 6.311953e+06 3.339472e+07 9698.892565 213257.479272 754513.227394
Capital 2.887880e+06 5.616641e+06 2.366083e+07 30784.073998 101949.241428 529918.086220

Parse an IOT workbook with custom matrix layouts#

Use matrix_layouts when specific matrices carry extra semantic levels beyond the default layout. In this example, V is indexed by region and sector, while E is indexed by region.

[7]:
iot_special = mario.parse_from_excel(
    path="/path/to/test_IOT_special.xlsx",
    table="IOT",
    mode="flows",
    matrix_layouts={
        "V": ("Region", "Sector"),
        "E": "Region",
    },
)
INFO Parser: excel reading IOT flows from /path/to/test_IOT_special.xlsx.
INFO Parser: state payload ready with 6 canonical blocks.
INFO Parser: excel state ready for IOT.
INFO Metadata: initialized.
[8]:
iot_special.V
[8]:
Region Reg1 Reg2
Sector Agriculture Services Industry Agriculture Services Industry
Region Sector Factor of production
Reg1 Agriculture Taxes 2.206879e+05 2.004277e+06 1.218838e+06 1963.366072 57527.508330 14740.992765
Services Taxes 5.517199e+04 5.010693e+05 3.047096e+05 490.841518 14381.877083 3685.248191
Industry Taxes 2.758599e+04 2.505346e+05 1.523548e+05 245.420759 7190.938541 1842.624096
Reg2 Agriculture Taxes 1.931020e+05 1.753742e+06 1.066483e+06 1717.945313 50336.569789 12898.368670
Services Taxes 2.758599e+04 2.505346e+05 1.523548e+05 245.420759 7190.938541 1842.624096
Industry Taxes 2.758599e+04 2.505346e+05 1.523548e+05 245.420759 7190.938541 1842.624096
Reg1 Agriculture Wages 1.030730e+06 1.335789e+07 2.524781e+06 3879.557026 301805.290958 85302.991709
Services Wages 2.576824e+05 3.339472e+06 6.311953e+05 969.889257 75451.322739 21325.747927
Industry Wages 1.288412e+05 1.669736e+06 3.155977e+05 484.944628 37725.661370 10662.873964
Reg2 Agriculture Wages 9.018884e+05 1.168815e+07 2.209184e+06 3394.612398 264079.629588 74640.117745
Services Wages 1.288412e+05 1.669736e+06 3.155977e+05 484.944628 37725.661370 10662.873964
Industry Wages 1.288412e+05 1.669736e+06 3.155977e+05 484.944628 37725.661370 10662.873964
Reg1 Agriculture Capital 1.155152e+06 9.464332e+06 2.246656e+06 12313.629599 211967.234488 40779.696571
Services Capital 2.887880e+05 2.366083e+06 5.616641e+05 3078.407400 52991.808622 10194.924143
Industry Capital 1.443940e+05 1.183041e+06 2.808320e+05 1539.203700 26495.904311 5097.462071
Reg2 Agriculture Capital 1.010758e+06 8.281290e+06 1.965824e+06 10774.425899 185471.330177 35682.234500
Services Capital 1.443940e+05 1.183041e+06 2.808320e+05 1539.203700 26495.904311 5097.462071
Industry Capital 1.443940e+05 1.183041e+06 2.808320e+05 1539.203700 26495.904311 5097.462071

Parse a standard SUT workbook#

For SUT workbooks, pass the table kind as "SUT". You can also pass tech_assumption= when you want MARIO to record the technology assumption used downstream.

[9]:
sut = mario.parse_from_excel(
    path="/path/to/test_SUT_standard.xlsx",
    table="SUT",
    mode="flows",
    tech_assumption="IT",
)
INFO Parser: excel reading SUT flows from /path/to/test_SUT_standard.xlsx.
INFO Parser: state payload ready with 10 canonical blocks.
INFO Parser: excel state ready for SUT.
INFO Metadata: initialized.
[11]:
sut.meta.table
[11]:
'SUT'
[12]:
sorted(sut["baseline"].keys())
[12]:
['EY',
 'Ea',
 'Ec',
 'S',
 'U',
 'VY',
 'Va',
 'Vc',
 'Xa',
 'Xc',
 'Ya',
 'Yc',
 'ea',
 's',
 'u',
 'wcc']
[13]:
sut.E
INFO Resolver: resolving E for baseline.
INFO Resolver: trying E via concat.
INFO Resolver: resolved E via concat.
[13]:
Region Region 1 Region 2 Region 1 Region 2
Level Activity Activity Commodity Commodity
Item Manufacturing Services Manufacturing Services Goods Services Goods Services
Item
Employment 3.752922e+06 6.547713e+05 1.926382e+04 3.939633e+03 0.0 0.0 0.0 0.0
CO2 2.220336e+13 8.884627e+12 1.910855e+11 4.839515e+10 0.0 0.0 0.0 0.0

Parse a SUT workbook with custom matrix layouts#

The same matrix_layouts mechanism applies to SUT workbooks. Here, the satellite-account matrices are declared with region and commodity detail.

[14]:
sut_special = mario.parse_from_excel(
    path="/path/to/test_SUT_special.xlsx",
    table="SUT",
    mode="flows",
    tech_assumption="IT",
    matrix_layouts={
        "E": ("Region", "Commodity"),
    },
)
INFO Parser: excel reading SUT flows from /path/to/test_SUT_special.xlsx.
INFO Parser: state payload ready with 10 canonical blocks.
INFO Parser: excel state ready for SUT.
INFO Metadata: initialized.
[15]:
sut_special.E
INFO Resolver: resolving E for baseline.
INFO Resolver: trying E via concat.
INFO Resolver: resolved E via concat.
[15]:
Region Region 1 Region 2 Region 1 Region 2
Level Activity Activity Commodity Commodity
Item Manufacturing Services Manufacturing Services Goods Services Goods Services
Region Commodity Satellite account
Region 1 Goods Employment 1.125877e+06 1.964314e+05 5.779146e+03 1.181890e+03 0.0 0.0 0.0 0.0
Services Employment 7.505844e+05 1.309543e+05 3.852764e+03 7.879266e+02 0.0 0.0 0.0 0.0
Region 2 Goods Employment 1.501169e+06 2.619085e+05 7.705528e+03 1.575853e+03 0.0 0.0 0.0 0.0
Services Employment 3.752922e+05 6.547713e+04 1.926382e+03 3.939633e+02 0.0 0.0 0.0 0.0
Region 1 Goods CO2 7.771177e+12 3.109620e+12 6.687991e+10 1.693830e+10 0.0 0.0 0.0 0.0
Services CO2 3.330504e+12 1.332694e+12 2.866282e+10 7.259272e+09 0.0 0.0 0.0 0.0
Region 2 Goods CO2 8.881345e+12 3.553851e+12 7.643418e+10 1.935806e+10 0.0 0.0 0.0 0.0
Services CO2 2.220336e+12 8.884627e+11 1.910855e+10 4.839515e+09 0.0 0.0 0.0 0.0

Practical notes#

  • mode="flows" reads flow matrices; use mode="coefficients" only for coefficient workbooks.

  • matrix_layouts must use MARIO semantic index names, such as Region, Sector, Activity, Commodity, Factor of production, Final demand, and Satellite account.

  • Fully empty matrices are accepted and filled with zeros; partially empty matrices are treated as invalid input.

Layout notes and caveats#

The workbook that mario.parse_from_excel(...) reads is always one Excel file with at least a data sheet and a units sheet.

In the standard layout, rows and columns use three index levels: region, level name, and level label. The units sheet lists the units for sectors or activities or commodities, factors of production, and satellite accounts used by the workbook. If the sheet names are not the defaults, use data_sheet= and unit_sheet= explicitly.

For non-standard layouts, matrix_layouts= declares the extra semantic levels carried by matrices such as V and E. When extra levels are defined for V or E, MARIO applies the same logic to VY and EY, so they do not need separate declarations.

Practical caveats:

  • Excel is usually the easiest custom entry point, but it is not the best format for large roundtrip workflows.

  • mode= is never inferred automatically: choose "flows" or "coefficients" explicitly.

  • When you define extra indices for E or V, provide the complete label set under those indices, not only the labels visible in one sub-block.