pwallace/ia-templatizer-flask

IA Templatizer — Quick Start

IA Templatizer is a web application for batch-generating metadata CSV files for Internet Archive ingest. Upload your input CSV and a metadata template (JSON), configure a few options, and download a correctly formatted output CSV ready for use with the Internet Archive upload tools.

Full documentation: see IA-Templatizer-User-and-Developer-Manual.md
Developer reference: see Student-Programmer-Reference.md


Starting the App

You need Python 3.7+. From the project directory, install the dependencies (including Flask) and start the app:

pip install -r requirements.txt
python app.py

Then open http://localhost:5000 in your browser.


Using the Form

The form has three sections.

1 — Template

Select a built-in template from the dropdown, or upload a custom .json template file. Built-in templates are the .json files in the templates/ folder. When you select a built-in template, the Options fields are pre-filled from that template's embedded settings.

You must supply either a built-in template selection or a custom upload — not both.

2 — Input

Upload your input CSV. This is the source metadata file you want to process — either a manually prepared file listing or a MODS-derived export from CONTENTdm.

3 — Options

Field                     Default   Description
Multi-value delimiter     |@|       The string used to separate multiple values within a single cell in your source CSV.
Type column               type      Column that identifies the row type. Used when flattening compound objects.
Flatten compound objects  off       Check this when your source CSV has MODS compound-object structure (one item row followed by child GraphicalPage rows).
Advanced options                    Expand for page type value, images column, and sequence column (these rarely need changing from their defaults).
Expand directories        off       When the file column contains a directory path, generate a row for each file inside that directory.
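
To illustrate what the multi-value delimiter does, here is a minimal sketch that splits one cell on |@|. The helper name split_multi_value is hypothetical and is not taken from the app's code, and trimming whitespace around each value is an assumption; the app's own parsing may differ.

```python
def split_multi_value(cell, delimiter="|@|"):
    """Split one CSV cell into a list of trimmed, non-empty values."""
    # Hypothetical helper for illustration only; not the app's actual function.
    # Assumption: surrounding whitespace is dropped and empty parts are skipped.
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


values = split_multi_value("Hamilton College|@|Photographs")
print(values)
```

If your source CSV uses a different separator (for example a semicolon), the same idea applies with that string entered in the Multi-value delimiter field.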

Click Process CSV → to run.


Results

After processing, you will see a results page showing:

  • A log of how many rows were read, mapped, and written.
  • Any warnings (invalid dates, unrecognised rights statements, etc.).

Click Download to save the output CSV. After downloading, the result is removed from the server. If you need to run again, return to the form with ← Process another file.


Templates

A template is a JSON file that defines default metadata values and (optionally) a column mapping and runtime options. Two formats are supported:

Flat format — all fields at the top level. Best for manually prepared CSVs:

{
  "identifier-prefix": "hamilton",
  "mediatype": "texts",
  "collection": ["hamilton"],
  "rights-statement": "http://rightsstatements.org/vocab/NKC/1.0/",
  "subject": ["Hamilton College", "Photographs"]
}

Combined format — wraps defaults, column mapping, and options in named sections. Used for MODS pipelines:

{
  "defaults": {
    "mediatype": "texts",
    "collection": ["hamilton"],
    "subject": ["Hamilton College", "Communal societies"]
  },
  "mapping": {
    "mods_titleinfo_title": "title",
    "mods_subject_topic": "subject"
  },
  "options": {
    "flatten": false,
    "images_col": "files",
    "delimiter": "|@|"
  }
}

The app detects the format automatically. See the User & Developer Manual for a full reference of template fields, control fields, and the ! override prefix.
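
The README does not say how the automatic detection works. One plausible approach, sketched below, is to treat a template as combined when any of the named sections from the example above (defaults, mapping, options) appears at the top level as an object. The function name detect_template_format and this rule are assumptions for illustration, not the app's confirmed logic.

```python
import json

# Section names taken from the combined-format example in this README.
SECTION_KEYS = ("defaults", "mapping", "options")


def detect_template_format(template):
    """Return 'combined' if the template uses named sections, else 'flat'."""
    # Assumption: a top-level section holding an object marks the combined format.
    if any(isinstance(template.get(key), dict) for key in SECTION_KEYS):
        return "combined"
    return "flat"


flat = json.loads('{"identifier-prefix": "hamilton", "mediatype": "texts"}')
combined = json.loads('{"defaults": {"mediatype": "texts"}, "options": {"flatten": false}}')
print(detect_template_format(flat), detect_template_format(combined))
```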


Output CSV

The output CSV contains one row per input item (plus continuation rows for compound objects), with columns in a fixed order:

identifier → file → mediatype → collection[n] → title → date → creator → description → subject[n] → additional fields

Repeatable fields such as subject and collection are expanded into indexed columns (subject[0], subject[1], …). Template default values are merged with values from the input CSV; duplicates are removed.
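
The merge-and-expand behaviour can be sketched as follows. Both function names are hypothetical, the ordering (template defaults first, then input values) is an assumption, and "Campus views" is a made-up input value used only for illustration.

```python
def merge_values(defaults, row_values):
    """Combine template defaults with row values, dropping duplicates in order."""
    # Assumption: defaults come first; the app may order values differently.
    merged = []
    for value in list(defaults) + list(row_values):
        if value not in merged:
            merged.append(value)
    return merged


def expand_indexed(field, values):
    """Map a repeatable field to indexed column names such as subject[0]."""
    return {f"{field}[{i}]": value for i, value in enumerate(values)}


subjects = merge_values(
    ["Hamilton College", "Photographs"],  # template defaults
    ["Photographs", "Campus views"],      # values from the input row (illustrative)
)
print(expand_indexed("subject", subjects))
```

Here the duplicate "Photographs" is kept once, so the row ends up with three subject columns rather than four.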


Worked Example

The American Socialist newspaper collection uses the built-in template template_oneida-american-socialist.json with a MODS-derived source CSV.

  1. Select template_oneida-american-socialist.json from the dropdown.
  2. Upload MODS_Oneida_American_Socialist_ZIPs.csv as the input CSV.
  3. Leave all options at their defaults (the template sets images_col: files and delimiter: |@| automatically).
  4. Click Process CSV →.
  5. The results page will report 196 input rows and 196 output rows.
  6. Download the output CSV.

The output will have subject[0] through subject[8] (nine subjects from the template), a rights-statement column, and notes and source columns populated from the template defaults.
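
If you want to confirm the indexed columns in a downloaded file, a short script can list them from the header row. This is a minimal sketch using Python's standard csv module; indexed_columns is a hypothetical helper, and the in-memory sample header stands in for a real output file.

```python
import csv
import io


def indexed_columns(header, field):
    """Return the header names of the form field[...] for one repeatable field."""
    prefix = field + "["
    return [name for name in header if name.startswith(prefix) and name.endswith("]")]


# Stand-in for the first line of an output CSV; replace with open("your-output.csv", newline="").
sample = io.StringIO("identifier,file,subject[0],subject[1]\n")
header = next(csv.reader(sample))
print(indexed_columns(header, "subject"))
```

For the worked example above, the list for subject should contain nine names, subject[0] through subject[8].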

