USEEIO parser walkthrough
This notebook is the practical guide for parsing USEEIO workbooks in MARIO.
What this notebook covers
where the currently supported USEEIO workbook exports come from;
what the main published model aliases mean;
what MARIO parses today and what it intentionally does not parse;
how to interpret format= correctly;
how to use path= and format=;
why the resulting database is treated as a SUT;
one important caveat on release year versus internal IO year.
Relevant source pages
Data.gov catalog page: USEEIO v2.5 models
EPA model page: USEEIO models
EPA technical page: USEEIO technical content
Framework and workbook format reference: USEPA/useeior
MARIO does not download USEEIO workbooks automatically. The expected workflow is to download one workbook manually and then point the parser to that local file.
Main entry point
For normal user workflows, the public entry point is:
mario.parse_useeio(...)
At the moment, MARIO supports only the verified workbook export family behind that entry point.
What MARIO currently parses
The current backend is intentionally narrow:
it parses the local Excel workbook export;
it currently supports only the verified v2.5 workbook structure;
it does not parse the full useeior build framework directly.
This is why the parser exposes an explicit format= argument even though only one format is supported today.
What the main model aliases mean
The official USEPA/USEEIO model registry distinguishes model aliases such as yellowthroat, kingbird, or catbird. These aliases describe the content of the model, not the workbook layout.
For the currently relevant national v2.5 families:
yellowthroat: BEA Summary 2017, GLORIA-backed, with GHG and material-footprint extensions;
waxwing: BEA Detail 2017, GLORIA-backed, with GHG and material-footprint extensions;
kingbird: BEA Summary 2017, EXIOBASE-backed, with GHG extensions;
kinglet: BEA Detail 2017, EXIOBASE-backed, with GHG extensions;
oriole: BEA Summary 2017, CEDA-backed, with GHG extensions;
catbird: BEA Detail 2017, CEDA-backed, with GHG extensions.
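The alias list above is easy to keep at hand as a small lookup table. The sketch below only restates the facts listed in this section; the ModelAlias and ALIASES names and the field layout are illustrative and are not part of MARIO.

```python
from typing import NamedTuple


class ModelAlias(NamedTuple):
    """Content of one national USEEIO v2.5 model alias (illustrative structure, not a MARIO type)."""
    bea_level: str     # "Summary" or "Detail" BEA table, both 2017
    backend: str       # the "-backed" source named in the alias list
    extensions: tuple  # environmental extensions listed for the alias


# Hypothetical lookup table restating the alias list above.
ALIASES = {
    "yellowthroat": ModelAlias("Summary", "GLORIA", ("GHG", "material footprint")),
    "waxwing": ModelAlias("Detail", "GLORIA", ("GHG", "material footprint")),
    "kingbird": ModelAlias("Summary", "EXIOBASE", ("GHG",)),
    "kinglet": ModelAlias("Detail", "EXIOBASE", ("GHG",)),
    "oriole": ModelAlias("Summary", "CEDA", ("GHG",)),
    "catbird": ModelAlias("Detail", "CEDA", ("GHG",)),
}


def aliases_for(bea_level: str) -> list:
    """Return the aliases built on the given BEA level, sorted by name."""
    return sorted(name for name, info in ALIASES.items() if info.bea_level == bea_level)


print(aliases_for("Summary"))
```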
The current MARIO parser targets these national workbook exports, not the StateEEIO families that are listed in the same upstream registry.
One caution from the upstream registry: USEEIO v2.5-waxwing-22 is marked as deprecated because it was published with an incorrect extension and replaced upstream by v2.5.1-waxwing-22.
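If you keep several downloads around, a small guard can catch the deprecated release before parsing. This is a hypothetical helper, not a MARIO function; the only entry it knows is the one named in the upstream caution above, and the names assume the USEEIOv… file-stem spelling used elsewhere in this notebook.

```python
# Hypothetical mapping of deprecated USEEIO model names to their upstream replacements.
DEPRECATED = {
    "USEEIOv2.5-waxwing-22": "USEEIOv2.5.1-waxwing-22",
}


def replacement_for(model_name: str):
    """Return the upstream replacement if model_name is deprecated, otherwise None."""
    return DEPRECATED.get(model_name)


def check_not_deprecated(model_name: str) -> None:
    """Raise ValueError when model_name is a known deprecated release."""
    replacement = replacement_for(model_name)
    if replacement is not None:
        raise ValueError(f"{model_name} is deprecated upstream; use {replacement} instead")
```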
What format means
format= is a parser-side selector. It tells MARIO what workbook organization it should expect.
So yellowthroat and kingbird are different models, but they can still share the same parser format if their workbook tabs and matrix semantics are organized in the same way.
Today the parser supports:
format="auto": inspect the workbook and resolve the known layout automatically;
format="v2.5_workbook": force the currently verified workbook structure.
For this verified format, MARIO expects the workbook to expose the standard useeior export components such as V, U, B, q, commodities_meta, final_demand_meta, and value_added_meta.
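A quick way to see whether a downloaded workbook exposes these components is to compare its sheet names against the list above. The sketch below is an illustration of that check, not MARIO's own detection logic; missing_sheets is a hypothetical name, and it takes the sheet names as a plain list so it runs without opening a file.

```python
# Components the verified v2.5 workbook export is expected to expose (listed above).
V25_SHEETS = ("V", "U", "B", "q", "commodities_meta", "final_demand_meta", "value_added_meta")


def missing_sheets(sheet_names) -> list:
    """Return the expected v2.5 sheets that are absent from a workbook's sheet list."""
    present = set(sheet_names)
    return [name for name in V25_SHEETS if name not in present]
```

For a real file, pandas.ExcelFile(path).sheet_names returns the sheet names without loading the data (reading .xlsx requires openpyxl to be installed); an empty result from missing_sheets means the expected components are at least present by name.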
Key arguments and workbook layout
Key public arguments:
path: one local USEEIO*.xlsx workbook, or one directory containing a single workbook;
format: use "auto" or "v2.5_workbook";
table: currently only "SUT" is supported.
Typical local layout:
USEEIO/
├── USEEIOv2.5-yellowthroat-22.xlsx
├── USEEIOv2.5-kingbird-22.xlsx
└── USEEIOv2.5-catbird-22.xlsx
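The file names in this layout follow a recognisable pattern: USEEIO, a version, a model alias and a short release suffix. A regular expression can split them apart, for example to pick the alias out of a folder listing. The pattern below is inferred only from the example names in this notebook and is a hypothetical helper; the suffix is the release label, which is not necessarily the internal IO year stored in the metadata.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from names such as "USEEIOv2.5-yellowthroat-22.xlsx".
_NAME_RE = re.compile(
    r"^USEEIOv(?P<version>\d+(?:\.\d+)*)-(?P<alias>[a-z]+)-(?P<suffix>\d+)\.xlsx$"
)


class WorkbookName(NamedTuple):
    version: str  # e.g. "2.5"
    alias: str    # e.g. "yellowthroat"
    suffix: str   # release label, e.g. "22"; not the internal IO year


def parse_workbook_name(filename: str) -> Optional[WorkbookName]:
    """Split a USEEIO workbook file name into its parts, or return None if it does not match."""
    match = _NAME_RE.match(filename)
    if match is None:
        return None
    return WorkbookName(match["version"], match["alias"], match["suffix"])
```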
Inside the workbook, MARIO expects the useeior export sheets such as V, U, B, q, commodities_meta, final_demand_meta, and value_added_meta. The workbook release label can differ from the internal IO year stored in the parsed metadata.
import mario
Parse one explicit workbook
Use this when you already know the exact workbook you want to ingest.
db = mario.parse_useeio(
path="/path/to/USEEIOv2.5-yellowthroat-22.xlsx",
format="auto",
)
INFO Parser: USEEIO workbook parsed with 71 activities, 73 commodities, 3 value-added rows and 30 satellite rows.
INFO Metadata: initialized.
Pin the workbook format explicitly
If you want to be strict and avoid auto-detection, pin the verified format directly.
db = mario.parse_useeio(
path="/path/to/USEEIOv2.5-waxwing-22.xlsx",
format="v2.5_workbook",
)
INFO Parser: USEEIO workbook parsed with 402 activities, 402 commodities, 3 value-added rows and 30 satellite rows.
INFO Metadata: initialized.
Parse from a directory
You can also point the parser to a directory, but only when that directory contains a single workbook.
db = mario.parse_useeio(
path="/path/to/USEEIO",
model_alias="yellowthroat",
release_year=2022,
)
INFO Parser: USEEIO workbook parsed with 71 activities, 73 commodities, 3 value-added rows and 30 satellite rows.
INFO Metadata: initialized.
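The single-workbook rule for directories can be sketched with pathlib: accept a file path as-is, and for a directory require exactly one matching workbook. This is a hypothetical helper that mirrors the rule stated above, not MARIO's own lookup code; the USEEIO*.xlsx glob is taken from the argument description in this notebook.

```python
from pathlib import Path


def resolve_workbook(path) -> Path:
    """Return the workbook to parse from a file path or a directory holding exactly one workbook."""
    path = Path(path)
    if path.is_file():
        return path
    if not path.is_dir():
        raise FileNotFoundError(path)
    # Only files following the USEEIO*.xlsx naming are considered.
    candidates = sorted(path.glob("USEEIO*.xlsx"))
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one USEEIO*.xlsx in {path}, found {len(candidates)}")
    return candidates[0]
```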
Why MARIO parses USEEIO as a SUT
For the verified workbook family, the structure is closer to a split-native SUT than to a symmetric IOT:
V is the make matrix;
U contains the use block plus final demand columns and value-added rows;
B is a direct environmental coefficient matrix;
q provides the commodity-output vector used to reconstruct direct environmental flows.
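To make the U description concrete, here is a toy pandas example that splits a combined use table into the intermediate use block, the final demand columns and the value-added rows, given the commodity and industry labels. The labels and numbers are made up; the exact row and column names in a real workbook, and how MARIO maps them onto U, Yc and Va, are assumptions not shown here.

```python
import pandas as pd

# Hypothetical labels for a tiny two-commodity, two-industry economy.
commodities = ["c1", "c2"]
industries = ["i1", "i2"]
final_demand = ["households"]
value_added = ["compensation"]

# Toy combined table: rows are commodities followed by value-added rows,
# columns are industries followed by final demand columns.
U_full = pd.DataFrame(
    [[10.0, 5.0, 20.0],
     [3.0, 8.0, 15.0],
     [7.0, 9.0, 0.0]],
    index=commodities + value_added,
    columns=industries + final_demand,
)

use_block = U_full.loc[commodities, industries]             # commodity x industry intermediate use
final_demand_block = U_full.loc[commodities, final_demand]  # commodity x final demand category
value_added_block = U_full.loc[value_added, industries]     # value-added row x industry
```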
So the parser returns the native S, U, Yc, Va, Ec, … blocks rather than forcing a symmetric IOT interpretation.
Commodity-side environmental extension
One important detail of the verified v2.5 workbook layout is that B is aligned with the commodity axis. Because of that, MARIO reconstructs the direct extension on the commodity side:
Ec = B · diag(q)
that is, each commodity column of B is multiplied by that commodity's output in q.
This is deliberate. For this workbook family, Ec is the correct direct extension block, while Ea stays zero-filled.
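A small NumPy check shows what the commodity-side reconstruction does, assuming B has one row per flow and one column per commodity (the orientation implied by "aligned with the commodity axis"). Broadcasting B * q scales each column of B by that commodity's output, which is the same as B @ diag(q). The numbers are toy values, not taken from a USEEIO workbook.

```python
import numpy as np

# Toy direct coefficients: 2 flows x 3 commodities (flow per unit of commodity output).
B = np.array([[0.5, 0.0, 2.0],
              [1.0, 0.1, 0.0]])
# Toy commodity output vector, one entry per commodity.
q = np.array([100.0, 50.0, 10.0])

Ec_broadcast = B * q        # column j of B multiplied by q[j]
Ec_matmul = B @ np.diag(q)  # the same product written as B times diag(q)

assert np.allclose(Ec_broadcast, Ec_matmul)
```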
Release year versus internal IO year
Some USEEIO workbooks expose a release label that differs from the internal IO year used by the model. MARIO stores the internal IO year in the parsed database metadata.
So if a workbook is labelled like ...-22.xlsx but the internal economic base year is 2017, MARIO will record 2017 as db.meta.year.
db.meta.name, db.meta.year, db.meta.price
('USEEIO v2.5 yellowthroat 2022', 2022, 'Model-year USD')
Current limitations
only the verified v2.5_workbook format is supported;
there is no MARIO downloader yet;
the parser targets the workbook export, not the full useeior framework pipeline.
That is also why keeping format= explicit is useful: it leaves room for future workbook families without pretending they are already supported.
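Keeping format= explicit maps naturally onto a small registry, where each supported workbook family is one entry and "auto" tries the registered entries in turn. The sketch below illustrates that design idea only; FORMATS, pick_format and the detector are hypothetical and do not describe MARIO's internals.

```python
from typing import Callable, Dict, Iterable

# A detector decides, from the sheet names alone, whether a workbook belongs to a family.
Detector = Callable[[Iterable[str]], bool]


def _is_v25(sheet_names: Iterable[str]) -> bool:
    """Hypothetical detector for the verified v2.5 workbook export."""
    required = {"V", "U", "B", "q", "commodities_meta", "final_demand_meta", "value_added_meta"}
    return required <= set(sheet_names)


# Hypothetical registry: format name -> detector. A future family would be one more entry.
FORMATS: Dict[str, Detector] = {
    "v2.5_workbook": _is_v25,
}


def pick_format(requested: str, sheet_names: Iterable[str]) -> str:
    """Return the format to use; "auto" picks the first registered family whose detector matches."""
    names = list(sheet_names)
    if requested == "auto":
        for name, detect in FORMATS.items():
            if detect(names):
                return name
        raise ValueError("no registered workbook format matches this workbook")
    if requested not in FORMATS:
        raise ValueError(f"unsupported format: {requested!r}")
    # An explicit format is forced without running detection.
    return requested
```

Pinning a format here skips detection, in the spirit of "force the currently verified workbook structure"; an unknown name fails loudly instead of being silently treated as supported.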