StatCan parser walkthrough

This notebook is a practical guide to parsing Statistics Canada (StatCan) supply-use tables (SUT) and symmetric input-output tables (IOT) in MARIO.

What this notebook covers

  • where the StatCan tables come from;

  • when direct online parsing is enough and when a local cache is useful;

  • how SUT and IOT workflows differ;

  • how year=, level=, geo=, and valuation= are used;

  • what to expect from download=True;

  • which caveats matter for this API-driven parser.

Relevant source pages

MARIO can query the official StatCan Web Data Service (WDS) API directly, so you do not need to download the raw .csv bundles manually unless you want a local cache.

Main entry point

For normal user workflows, the public entry point is:

  • mario.parse_statcan(...)

The same function supports:

  • SUT tables;

  • IOT tables;

  • summary, detail, and link1997 levels where available;

  • direct online parsing or local cached raw files.

Key arguments

The key public arguments are:

  • year: reference year to parse;

  • table: choose "SUT" or "IOT";

  • level: choose summary or detail; link1997 is available for SUT only;

  • geo: published StatCan geography label, such as Canada or a single province or territory;

  • valuation: IOT only, either basic or purchaser;

  • path: optional directory for locally cached raw files;

  • download: when True, MARIO downloads the raw WDS bundle into path and then parses it locally.

SUT versus IOT

Use table="SUT" when you want the native split structure with S, U, Yc, Ya, Va, and Vc.

Use table="IOT" when you want the symmetric input-output representation with Z, Y, and V.

For SUT, level="link1997" is also available. For IOT, valuation= matters because the tables are exposed at both basic and purchaser prices.

Direct online parsing versus local cache

Use direct online parsing when you just want the database and do not need the raw files afterwards.

Use download=True together with path=... when you want MARIO to keep the raw WDS .csv bundle locally. This is most useful when you parse the same StatCan table repeatedly or want a reproducible local cache.
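
The choice between the two modes comes down to which keyword arguments are passed to mario.parse_statcan. The sketch below is a hypothetical helper (statcan_source_kwargs is not part of MARIO) that returns no extra arguments for direct online parsing, and path plus download=True for the cached mode. It relies only on the argument names described in this notebook.

```python
from pathlib import Path
from typing import Optional, Union


def statcan_source_kwargs(cache_dir: Optional[Union[str, Path]] = None) -> dict:
    """Return the extra keyword arguments for the chosen parsing mode.

    Hypothetical helper for illustration; it is not part of MARIO.
    """
    if cache_dir is None:
        # Direct online parsing: nothing extra to pass.
        return {}
    # Cached mode: ask MARIO to keep the raw WDS bundle in cache_dir.
    return {"path": str(cache_dir), "download": True}


print(statcan_source_kwargs())
print(statcan_source_kwargs("/path/to/StatCan"))
```

The result can be unpacked into the parser call, for example mario.parse_statcan(year=2022, table="SUT", level="summary", geo="Canada", **statcan_source_kwargs(cache_dir)).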

Cache layout and supported values

path is optional and acts as a local WDS cache:

StatCan/
└── cache/
    ├── 36-10-0438-01_*.csv
    └── 36-10-0001-01_*.csv
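
To see what a cache directory already holds, the raw files can be listed with the standard library. This sketch assumes the layout shown above, that is, a cache/ subfolder containing .csv files. list_cached_csv is a hypothetical helper, and the file names MARIO actually writes may differ from the pattern in the tree.

```python
from pathlib import Path


def list_cached_csv(root):
    """Return the sorted names of .csv files under <root>/cache.

    Assumes the cache layout sketched in this notebook; returns an
    empty list when the cache folder does not exist yet.
    """
    cache_dir = Path(root) / "cache"
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.glob("*.csv"))
```

Calling list_cached_csv("/path/to/StatCan") before a run with download=True shows whether a raw bundle is already on disk.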

Supported levels:

  • SUT: summary, detail, link1997;

  • IOT: summary, detail.

For IOT, valuation can be basic or purchaser.
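
The supported combinations above can be checked before any network request is made. The following is a minimal pre-flight sketch. check_statcan_args and the two constants are hypothetical names, and MARIO may apply its own, possibly different, validation. Treating a missing valuation as acceptable for IOT is also an assumption.

```python
SUPPORTED_LEVELS = {
    "SUT": ("summary", "detail", "link1997"),
    "IOT": ("summary", "detail"),
}
IOT_VALUATIONS = ("basic", "purchaser")


def check_statcan_args(table, level, valuation=None):
    """Raise ValueError for combinations outside the documented matrix."""
    if table not in SUPPORTED_LEVELS:
        raise ValueError(f"table must be one of {sorted(SUPPORTED_LEVELS)}, got {table!r}")
    if level not in SUPPORTED_LEVELS[table]:
        raise ValueError(f"level {level!r} is not listed for {table}")
    if table == "SUT" and valuation is not None:
        # valuation is documented as an IOT-only argument.
        raise ValueError("valuation applies to IOT only")
    if table == "IOT" and valuation is not None and valuation not in IOT_VALUATIONS:
        raise ValueError(f"valuation must be one of {IOT_VALUATIONS}, got {valuation!r}")


check_statcan_args("SUT", "link1997")         # accepted
check_statcan_args("IOT", "detail", "basic")  # accepted
```

A call such as check_statcan_args("IOT", "link1997") raises ValueError, because link1997 is listed for SUT only.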

[1]:
import mario

Parse a Canada SUT directly from WDS

This is the simplest StatCan workflow: only the table selectors are passed, with no path and no download.

[2]:
db = mario.parse_statcan(
    year=2022,
    table="SUT",
    level="summary",
    geo="Canada",
)

db
INFO Parser: requesting StatCan WDS metadata for table 36100438.
INFO Parser: downloading StatCan table 36100438 from https://www150.statcan.gc.ca/n1/tbl/csv/36100438-eng.zip.
INFO Parser: StatCan SUT payload ready with shapes S=(32, 63), U=(63, 32), Yc=(63, 18), Va=(9, 32), Vc=(9, 63).
INFO Metadata: initialized.
[2]:
name = StatCan SUT summary Canada 2022
table = SUT
tech_assumption = industry-based
scenarios = ['baseline']
Activity = 32
Commodity = 63
Factor of production = 9
Satellite account = 1
Consumption category = 18
Region = 1
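
The logged shapes and the block counts printed by db describe the same dimensions, which gives a quick sanity check. The sketch below hard-codes the numbers from the output above and compares them. The orientation of each matrix (for example S as activities by commodities) is inferred from those numbers rather than taken from MARIO's documentation, and shape_mismatches is a hypothetical helper.

```python
# Shapes copied from the "payload ready" log line of the Canada summary SUT.
shapes = {"S": (32, 63), "U": (63, 32), "Yc": (63, 18), "Va": (9, 32), "Vc": (9, 63)}

# Block counts copied from the printed database summary.
counts = {
    "Activity": 32,
    "Commodity": 63,
    "Factor of production": 9,
    "Consumption category": 18,
}

# Assumed (row block, column block) for each matrix, inferred from the numbers.
AXES = {
    "S": ("Activity", "Commodity"),
    "U": ("Commodity", "Activity"),
    "Yc": ("Commodity", "Consumption category"),
    "Va": ("Factor of production", "Activity"),
    "Vc": ("Factor of production", "Commodity"),
}


def shape_mismatches(shapes, counts, axes=AXES):
    """Return {matrix: (found, wanted)} for shapes that disagree with the counts."""
    bad = {}
    for name, (row_block, col_block) in axes.items():
        wanted = (counts[row_block], counts[col_block])
        if shapes[name] != wanted:
            bad[name] = (shapes[name], wanted)
    return bad


print(shape_mismatches(shapes, counts))
```

An empty dictionary means every logged shape agrees with the block counts.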

Parse a provincial SUT

The SUT tables expose provinces and territories as separate geographies. The parser expects the published StatCan geography label.

[3]:
db = mario.parse_statcan(
    year=2022,
    table="SUT",
    level="detail",
    geo="Ontario",
)

db
INFO Parser: requesting StatCan WDS metadata for table 36100478.
INFO Parser: downloading StatCan table 36100478 from https://www150.statcan.gc.ca/n1/tbl/csv/36100478-eng.zip.
INFO Parser: StatCan SUT payload ready with shapes S=(216, 473), U=(473, 216), Yc=(473, 289), Va=(9, 216), Vc=(9, 473).
INFO Metadata: initialized.
[3]:
name = StatCan SUT detail Ontario 2022
table = SUT
tech_assumption = industry-based
scenarios = ['baseline']
Activity = 216
Commodity = 473
Factor of production = 9
Satellite account = 1
Consumption category = 289
Region = 1
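
The download log lines in the cells above follow one URL pattern, with the eight-digit table number embedded in the file name. The sketch below rebuilds that URL and accepts either the eight-digit form from the logs or the hyphenated form used in the cache layout. Both helpers are hypothetical, and the pattern is inferred only from the log output in this notebook, not from a MARIO or StatCan specification.

```python
import re

# URL pattern as it appears in the parser's log lines.
CSV_URL_TEMPLATE = "https://www150.statcan.gc.ca/n1/tbl/csv/{number}-eng.zip"


def normalize_table_number(table_number):
    """Reduce '36-10-0438-01' or '36100438' to the eight-digit form."""
    digits = re.sub(r"\D", "", table_number)
    if len(digits) not in (8, 10):
        raise ValueError(f"unexpected table number: {table_number!r}")
    return digits[:8]


def wds_csv_url(table_number):
    """Build the bulk CSV download URL seen in the log output."""
    return CSV_URL_TEMPLATE.format(number=normalize_table_number(table_number))


print(wds_csv_url("36-10-0438-01"))
print(wds_csv_url("36100478"))
```

This is handy for fetching a bundle manually when you want to inspect the raw file outside MARIO.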

Parse an IOT at basic prices

For IOT, pass valuation="basic" or valuation="purchaser". The call below omits geo, and the resulting database is labelled Canada.

[4]:
db = mario.parse_statcan(
    year=2022,
    table="IOT",
    level="detail",
    valuation="basic",
)

db
INFO Parser: requesting StatCan WDS metadata for table 36100001.
INFO Parser: downloading StatCan table 36100001 from https://www150.statcan.gc.ca/n1/tbl/csv/36100001-eng.zip.
INFO Parser: StatCan IOT payload ready with shapes Z=(252, 252), Y=(252, 306), V=(8, 252).
INFO Metadata: initialized.
[4]:
name = StatCan IOT detail Canada 2022 Basic price
table = IOT
scenarios = ['baseline']
Factor of production = 8
Satellite account = 1
Consumption category = 306
Region = 1
Sector = 252

Parse an IOT at purchaser prices

The public API stays the same; only the valuation selector changes.

[5]:
db = mario.parse_statcan(
    year=2022,
    table="IOT",
    level="summary",
    valuation="purchaser",
)

db
INFO Parser: requesting StatCan WDS metadata for table 36100084.
INFO Parser: downloading StatCan table 36100084 from https://www150.statcan.gc.ca/n1/tbl/csv/36100084-eng.zip.
INFO Parser: StatCan IOT payload ready with shapes Z=(32, 32), Y=(32, 28), V=(7, 32).
INFO Metadata: initialized.
[5]:
name = StatCan IOT summary Canada 2022 Purchaser price
table = IOT
scenarios = ['baseline']
Factor of production = 7
Satellite account = 1
Consumption category = 28
Region = 1
Sector = 32
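
The "payload ready" log lines can also be parsed programmatically, for instance to record matrix shapes in a test. The sketch below extracts the shapes from the IOT log line above with a regular expression and checks that they fit together. Both function names are hypothetical, and the log format is assumed to stay as shown in this notebook.

```python
import re

# Matches fragments such as "Z=(32, 32)" in the parser's log line.
SHAPE_PATTERN = re.compile(r"(\w+)=\((\d+), (\d+)\)")


def shapes_from_log(line):
    """Extract {matrix name: (rows, cols)} from a 'payload ready' log line."""
    return {name: (int(r), int(c)) for name, r, c in SHAPE_PATTERN.findall(line)}


def iot_is_consistent(shapes):
    """Z must be square; Y shares its rows and V shares its columns."""
    z_rows, z_cols = shapes["Z"]
    return z_rows == z_cols and shapes["Y"][0] == z_rows and shapes["V"][1] == z_cols


line = (
    "INFO Parser: StatCan IOT payload ready with shapes "
    "Z=(32, 32), Y=(32, 28), V=(7, 32)."
)
shapes = shapes_from_log(line)
print(shapes, iot_is_consistent(shapes))
```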

Keep the raw WDS files locally

When download=True, MARIO stores the downloaded raw .csv bundle inside path and then parses the local file. This is useful when you want to avoid re-downloading the same table.

[6]:
db = mario.parse_statcan(
    year=2022,
    table="SUT",
    level="detail",
    geo="Quebec",
    path="/path/to/StatCan",
    download=True,
)

db
INFO Parser: reading local StatCan CSV statcan_36100478_sut_detail.csv.
INFO Parser: StatCan SUT payload ready with shapes S=(217, 472), U=(472, 217), Yc=(472, 289), Va=(9, 217), Vc=(9, 472).
INFO Metadata: initialized.
[6]:
name = StatCan SUT detail Quebec 2022
table = SUT
tech_assumption = industry-based
scenarios = ['baseline']
Activity = 217
Commodity = 472
Factor of production = 9
Satellite account = 1
Consumption category = 289
Region = 1
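
A local cache pays off most when several geographies come from the same StatCan table, as with the Ontario and Quebec detail SUTs above, which both use table 36100478. The sketch below loops over geographies with one shared cache directory. parse_regions is a hypothetical wrapper, and whether MARIO reuses an existing raw file instead of downloading it again is an assumption suggested by the "reading local StatCan CSV" log line. The parser is passed in as an argument so the loop can be exercised without network access.

```python
def parse_regions(geos, cache_dir, parse=None, **common):
    """Parse one table per geography, sharing a single cache directory.

    parse defaults to mario.parse_statcan; any callable accepting the
    same keyword arguments can be injected instead, e.g. for testing.
    """
    if parse is None:
        import mario  # imported lazily so the sketch loads without MARIO installed

        parse = mario.parse_statcan
    return {
        geo: parse(geo=geo, path=cache_dir, download=True, **common)
        for geo in geos
    }
```

For example, parse_regions(["Ontario", "Quebec"], "/path/to/StatCan", year=2022, table="SUT", level="detail") returns a dictionary of databases keyed by geography label.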

Caveats

  • the parser is API-driven, so network availability matters more than for local-file parsers;

  • download=True is optional; direct online parsing works without it;

  • this walkthrough focuses on the economic StatCan tables only;

  • environmental extensions are intentionally left out here.
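
Because network availability matters, it can help to wrap the parser call in a small retry loop. The sketch below is generic Python, not a MARIO feature. call_with_retries is a hypothetical helper, and it catches Exception broadly because the exception types MARIO raises on network failure are not documented in this notebook.

```python
import time


def call_with_retries(func, attempts=3, delay=2.0, sleep=time.sleep):
    """Call func() up to `attempts` times, waiting longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:  # broad on purpose: error types are an unknown here
            last_error = exc
            if attempt < attempts:
                # Linear back-off: delay, 2 * delay, 3 * delay, ...
                sleep(delay * attempt)
    raise last_error
```

A typical use is call_with_retries(lambda: mario.parse_statcan(year=2022, table="SUT", level="summary", geo="Canada")). If every attempt fails, the last error is re-raised unchanged.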