TXT, CSV, and Parquet custom parser walkthrough#

This walkthrough covers the directory-based parsers for TXT, CSV, and Parquet payloads.

Main Entry Points#

mario.parse_from_txt(...)
mario.parse_from_parquet(...)

Key arguments and directory layout#

Use mario.parse_from_txt(...) for TXT or CSV payloads and mario.parse_from_parquet(...) for Parquet payloads.

Key arguments:

path: directory containing the files to parse;
table: choose "IOT" or "SUT";
mode: choose "flows" or "coefficients";
flat: set True for long-format payloads;
sep and _format: field delimiter and file format (for example _format="csv"), used for TXT or CSV only;
matrix_layouts: optional semantic declaration for non-standard matrix layouts;
tech_assumption: optional SUT selector for IT or PT.
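
The accepted values listed above can be checked before calling a parser. The snippet below is a minimal sketch: check_parser_args is a hypothetical helper, not part of MARIO, and it assumes the values are matched exactly as spelled in the list (MARIO itself may be more lenient).

```python
# Hypothetical pre-flight check; the allowed values mirror the argument list above.
ALLOWED_TABLES = {"IOT", "SUT"}
ALLOWED_MODES = {"flows", "coefficients"}
ALLOWED_TECH_ASSUMPTIONS = {"IT", "PT"}


def check_parser_args(table, mode, tech_assumption=None):
    """Return a list of problems found in the arguments (empty if none)."""
    problems = []
    if table not in ALLOWED_TABLES:
        problems.append(f"table must be one of {sorted(ALLOWED_TABLES)}, got {table!r}")
    if mode not in ALLOWED_MODES:
        problems.append(f"mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}")
    # tech_assumption is optional, so None is always accepted here.
    if tech_assumption is not None and tech_assumption not in ALLOWED_TECH_ASSUMPTIONS:
        problems.append(
            f"tech_assumption must be one of {sorted(ALLOWED_TECH_ASSUMPTIONS)}, "
            f"got {tech_assumption!r}"
        )
    return problems


print(check_parser_args("IOT", "flows"))
print(check_parser_args("iot", "flow"))
```

Returning a list instead of raising on the first problem lets a caller report every invalid argument at once.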

Matrix-per-file payloads look like:

custom_txt_database/
├── Z.csv
├── Y.csv
├── V.csv
├── E.csv
├── EY.csv
└── units.csv
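
Before parsing, it can help to confirm that a directory actually matches the tree above. The following is a sketch only: missing_matrix_files is a hypothetical helper, and the list of expected names simply mirrors the example tree. Whether MARIO requires every one of these files is an assumption not confirmed here.

```python
from pathlib import Path
import tempfile

# File stems taken from the example tree above (assumed, not an official MARIO list).
EXPECTED_STEMS = ("Z", "Y", "V", "E", "EY", "units")


def missing_matrix_files(path, extension="csv"):
    """Return the expected file names that are absent from the directory."""
    directory = Path(path)
    return [
        f"{stem}.{extension}"
        for stem in EXPECTED_STEMS
        if not (directory / f"{stem}.{extension}").is_file()
    ]


# Demo on a throwaway directory that holds only three of the six files.
with tempfile.TemporaryDirectory() as tmp:
    for stem in ("Z", "Y", "V"):
        (Path(tmp) / f"{stem}.csv").write_text("", encoding="utf-8")
    print(missing_matrix_files(tmp))  # ['E.csv', 'EY.csv', 'units.csv']
```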

Flat payloads can use one combined data file plus units, or one flat file per matrix plus units. The same directory logic applies to Parquet.
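
To illustrate what "long format" means in general, the sketch below unrolls a small wide matrix into one record per cell. It is purely illustrative: the tuple layout (row label, column label, value) and the sector names are hypothetical, and the actual columns MARIO expects in a flat file are not specified in this walkthrough.

```python
def wide_to_long(row_labels, column_labels, values):
    """Turn a wide matrix (list of rows) into (row, column, value) records."""
    records = []
    for row_label, row in zip(row_labels, values):
        for column_label, value in zip(column_labels, row):
            records.append((row_label, column_label, value))
    return records


# Hypothetical 2x2 intersectoral matrix in wide form.
sectors = ["Agriculture", "Industry"]
z_wide = [
    [10.0, 2.0],
    [3.0, 20.0],
]

for record in wide_to_long(sectors, sectors, z_wide):
    print(record)
```

A wide matrix with n rows and m columns becomes n × m records, which is why flat payloads can hold several matrices in one combined file.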

Packaged example directories#

The parser examples below use exported database folders bundled with the documentation as archives. Extract each archive locally and point path to the inner flows or coefficients directory shown in the code examples below.

import mario

Matrix-per-file TXT or CSV#

Use flat=False for the historical matrix-per-file layout.

db = mario.parse_from_txt(
    path="/path/to/iot_export_csv/flows",
    table="IOT",
    mode="flows",
    _format="csv",
    flat=False,
)

INFO Parser: txt reading IOT flows from /path/to/MARIO/mario/test/supporting_files/iot_export_csv/flows in matrix mode (csv).
INFO Parser: Reading flows from txt files.
INFO Parser: Reading files finished.
INFO Parser: Investigating possible identifiable errors.
INFO Parser: parsing database finished.
INFO Parser: state payload ready with 6 canonical blocks.
INFO Parser: txt state ready for IOT.
INFO Metadata: initialized.
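
The call above does not pass sep. If the files use a different delimiter than the parser expects, sep has to be set explicitly. A simple way to find out which delimiter a file uses is to count candidate characters in its first line. This is a rough heuristic sketch: guess_separator is a hypothetical helper, and the candidate list is an assumption.

```python
from pathlib import Path
import tempfile

# Candidate delimiters to test (an assumption; extend as needed).
CANDIDATE_SEPARATORS = (",", ";", "\t", "|")


def guess_separator(file_path):
    """Return the candidate that occurs most often in the first line, or None."""
    with open(file_path, encoding="utf-8") as handle:
        first_line = handle.readline()
    counts = {sep: first_line.count(sep) for sep in CANDIDATE_SEPARATORS}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


# Demo with a small semicolon-separated file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Z.csv"
    sample.write_text("sector;A;B\nA;1.0;2.0\nB;3.0;4.0\n", encoding="utf-8")
    print(repr(guess_separator(sample)))  # ';'
```

The result can then be passed as sep=... to mario.parse_from_txt(...). The heuristic only looks at the first line, so it can be fooled by headers that contain delimiter characters inside labels.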

Flat TXT or CSV#

Use flat=True for long-format payloads. MARIO accepts either one combined data file or separate flat files per matrix, as long as a units file is also present.

db = mario.parse_from_txt(
    path="/path/to/iot_export_csv/coefficients",
    table="IOT",
    mode="coefficients",
    _format="csv",
    flat=True,
)

INFO Parser: txt reading IOT coefficients from /path/to/MARIO/mario/test/supporting_files/iot_export_csv/coefficients in flat mode (csv).
INFO Parser: reading coefficients from flat txt files.
INFO Parser: state payload ready with 6 canonical blocks.
INFO Parser: txt state ready for IOT.
INFO Metadata: initialized.

Flat Parquet#

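The same logic applies to Parquet exports: mario.parse_from_parquet(...) takes the same path, table, mode, and flat arguments, without sep or _format.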

db = mario.parse_from_parquet(
    path="/path/to/iot_export_parquet/flows",
    table="IOT",
    mode="flows",
    flat=True,
)

INFO Parser: parquet reading IOT flows from /path/to/MARIO/mario/test/supporting_files/iot_export_parquet/flows in flat mode.
INFO Parser: state payload ready with 6 canonical blocks.
INFO Parser: parquet state ready for IOT.
INFO Metadata: initialized.

Reference notes and caveats#

These parsers are directory-based: path must point to a directory, not to an individual file.
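
A small guard can catch the common mistake of passing a file path. The sketch below uses only the standard library; as_payload_directory is a hypothetical helper and not part of MARIO.

```python
from pathlib import Path
import tempfile


def as_payload_directory(path):
    """Return path as a Path if it is a directory, otherwise raise a clear error."""
    candidate = Path(path)
    if candidate.is_dir():
        return candidate
    if candidate.is_file():
        # A file was given; point the user to the directory that contains it.
        raise NotADirectoryError(
            f"{candidate} is a file; pass its parent directory instead: {candidate.parent}"
        )
    raise FileNotFoundError(f"{candidate} does not exist")


# Demo: a directory is accepted, a file inside it is rejected.
with tempfile.TemporaryDirectory() as tmp:
    matrix_file = Path(tmp) / "Z.csv"
    matrix_file.write_text("", encoding="utf-8")
    print(as_payload_directory(tmp) == Path(tmp))  # True
    try:
        as_payload_directory(matrix_file)
    except NotADirectoryError as error:
        print(error)
```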

Use flat=True for long-format payloads. The _format and sep arguments apply only to TXT or CSV parsing; they play no role in Parquet parsing.
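
One way to keep this distinction in a single place is to build the keyword arguments according to the payload format. The sketch below does not import MARIO. parser_call_spec is a hypothetical helper, and treating "txt" and "csv" as the accepted _format values is an assumption based on the formats named in this walkthrough.

```python
def parser_call_spec(path, table, mode, flat, file_format, sep=None):
    """Return (parser function name, keyword arguments) for a payload format."""
    kwargs = {"path": path, "table": table, "mode": mode, "flat": flat}
    if file_format == "parquet":
        # Parquet parsing takes neither sep nor _format.
        return "parse_from_parquet", kwargs
    if file_format not in ("txt", "csv"):
        raise ValueError(f"unsupported file_format: {file_format!r}")
    kwargs["_format"] = file_format
    if sep is not None:
        kwargs["sep"] = sep
    return "parse_from_txt", kwargs


name, kwargs = parser_call_spec(
    "/path/to/iot_export_csv/flows", "IOT", "flows", False, "csv", sep=","
)
print(name, kwargs)
```

With MARIO installed, the result could be dispatched as getattr(mario, name)(**kwargs).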

The same semantic rules used for custom Excel parsing also apply here: if an IOT layout carries extra semantic levels, declare them through matrix_layouts= instead of relying on filename conventions alone.

These formats are usually preferable to Excel when the data already comes from a MARIO export or from an automated preprocessing workflow.
