IA Templatizer — Quick Start

IA Templatizer is a web application for batch-generating metadata CSV files for Internet Archive ingest. Upload your input CSV and a metadata template (JSON), configure a few options, and download a correctly formatted output CSV ready for use with the Internet Archive upload tools.

Full documentation: see IA-Templatizer-User-and-Developer-Manual.md
Developer reference: see Student-Programmer-Reference.md

Starting the App

You need Python 3.7+ and Flask installed. From the project directory:

pip install -r requirements.txt
python app.py

Then open http://localhost:5000 in your browser.

Using the Form

The form has three sections.

1 — Template

Select a built-in template from the dropdown, or upload a custom .json template file. Built-in templates are the .json files in the templates/ folder. When you select a built-in template, the Options fields are pre-filled from that template's embedded settings.

You must supply either a built-in template selection or a custom upload — not both.

2 — Input

Upload your input CSV. This is the source metadata file you want to process — either a manually prepared file listing or a MODS-derived export from CONTENTdm.

3 — Options

Field	Default	Description
Multi-value delimiter	`|@|`	The string used to separate multiple values within a single cell in your source CSV.
Type column	`type`	Column that identifies the row type. Used when flattening compound objects.
Flatten compound objects	off	Check this when your source CSV has MODS compound-object structure (one item row followed by child `GraphicalPage` rows).
Advanced options	—	Expand for page type value, images column, and sequence column (rarely need changing from defaults).
Expand directories	off	When the `file` column contains a directory path, generate a row for each file inside that directory.
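
To make the multi-value delimiter concrete, here is a minimal sketch of how one cell could be split into separate values. The function name `split_multi_value` is hypothetical and not taken from the app's code, and trimming whitespace and dropping empty values are assumptions rather than documented behaviour.

```python
DEFAULT_DELIMITER = "|@|"  # default shown in the Options table


def split_multi_value(cell, delimiter=DEFAULT_DELIMITER):
    """Split one CSV cell into a list of trimmed, non-empty values.

    Hypothetical helper for illustration; the app's own splitting
    logic may differ (for example in how it treats whitespace).
    """
    if not cell:
        return []
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


print(split_multi_value("Hamilton College|@|Photographs"))
# ['Hamilton College', 'Photographs']
```

If your source CSV uses a different separator, such as a semicolon, the same idea applies with `delimiter=";"`.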

Click Process CSV → to run.
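
The Expand directories option can be pictured with the following sketch, which turns one input row into one row per file when the `file` column points at a directory. The helper name `expand_directory_rows`, the alphabetical ordering, and the choice to copy every other column unchanged are all assumptions made for illustration; the app may order files or assign identifiers differently.

```python
from pathlib import Path


def expand_directory_rows(row, file_col="file"):
    """Return one copy of `row` per file if row[file_col] is a directory.

    Hypothetical helper: rows whose file column is empty or is not a
    directory are returned unchanged as a single-item list.
    """
    value = row.get(file_col)
    if not value or not Path(value).is_dir():
        return [row]
    # Assumption: only direct children are used, sorted by name.
    files = sorted(p for p in Path(value).iterdir() if p.is_file())
    return [{**row, file_col: str(p)} for p in files]
```

For example, a row whose `file` value is a folder holding `a.jpg` and `b.jpg` would yield two rows, each carrying the original metadata and one of the two file paths.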

Results

After processing, you will see a results page showing:

A log of how many rows were read, mapped, and written.
Any warnings (invalid dates, unrecognised rights statements, etc.).

Click Download to save the output CSV. After downloading, the result is removed from the server. If you need to run again, return to the form with ← Process another file.

Templates

A template is a JSON file that defines default metadata values and (optionally) a column mapping and runtime options. Two formats are supported:

Flat format — all fields at the top level. Best for manually prepared CSVs:

{
  "identifier-prefix": "hamilton",
  "mediatype": "texts",
  "collection": ["hamilton"],
  "rights-statement": "http://rightsstatements.org/vocab/NKC/1.0/",
  "subject": ["Hamilton College", "Photographs"]
}

Combined format — wraps defaults, column mapping, and options in named sections. Used for MODS pipelines:

{
  "defaults": {
    "mediatype": "texts",
    "collection": ["hamilton"],
    "subject": ["Hamilton College", "Communal societies"]
  },
  "mapping": {
    "mods_titleinfo_title": "title",
    "mods_subject_topic": "subject"
  },
  "options": {
    "flatten": false,
    "images_col": "files",
    "delimiter": "|@|"
  }
}

The app detects the format automatically. See the User & Developer Manual for a full reference of template fields, control fields, and the ! override prefix.
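
How the detection works is not described here. One plausible approach, shown as a sketch only, is to treat a template as combined when it contains any of the section names `defaults`, `mapping`, or `options`, and as flat otherwise. The function `split_template` and this heuristic are assumptions, not the app's confirmed logic.

```python
import json

SECTION_KEYS = {"defaults", "mapping", "options"}


def split_template(template):
    """Return (defaults, mapping, options) for either template format.

    Assumed heuristic: any of the named sections at the top level
    means the combined format; otherwise every key is a default.
    """
    if template.keys() & SECTION_KEYS:
        return (
            dict(template.get("defaults", {})),
            dict(template.get("mapping", {})),
            dict(template.get("options", {})),
        )
    return dict(template), {}, {}


flat = json.loads('{"mediatype": "texts", "collection": ["hamilton"]}')
combined = json.loads(
    '{"defaults": {"mediatype": "texts"}, "options": {"flatten": false}}'
)

print(split_template(flat))
print(split_template(combined))
```

Normalising both formats into the same three dictionaries lets the rest of a pipeline ignore which format was supplied.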

Output CSV

The output CSV contains one row per input item (plus continuation rows for compound objects), with columns in a fixed order:

identifier → file → mediatype → collection[n] → title → date → creator → description → subject[n] → additional fields

Repeatable fields such as subject and collection are expanded into indexed columns (subject[0], subject[1], …). Template default values are merged with values from the input CSV; duplicates are removed.
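
The merge and expansion described above can be sketched as follows. The helper names are hypothetical, and two details are assumptions rather than documented behaviour: template values are placed before CSV values, and duplicates are matched by exact, case-sensitive comparison.

```python
def merge_values(template_values, csv_values):
    """Combine two value lists, keeping the first occurrence of each value."""
    merged = []
    for value in list(template_values) + list(csv_values):
        if value not in merged:
            merged.append(value)
    return merged


def expand_indexed(field, values):
    """Map a list of values to indexed column names such as subject[0]."""
    return {f"{field}[{i}]": value for i, value in enumerate(values)}


subjects = merge_values(
    ["Hamilton College", "Photographs"],  # template defaults
    ["Photographs", "Campus views"],      # values from the input CSV
)
print(expand_indexed("subject", subjects))
# {'subject[0]': 'Hamilton College', 'subject[1]': 'Photographs', 'subject[2]': 'Campus views'}
```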

Worked Example

The American Socialist newspaper collection uses the built-in template template_oneida-american-socialist.json with a MODS-derived source CSV.

Select template_oneida-american-socialist.json from the dropdown.
Upload MODS_Oneida_American_Socialist_ZIPs.csv as the input CSV.
Leave all options at their defaults (the template sets images_col: files and delimiter: |@| automatically).
Click Process CSV →.
The results page will report 196 input rows and 196 output rows.
Download the output CSV.

The output will have subject[0] through subject[8] (nine subjects from the template), a rights-statement column, and notes and source columns populated from the template defaults.
