ProPublica: Data Migration & Publishing Automation

Built custom Craft CMS plugins to migrate 3,000+ pages of investigative journalism from ExpressionEngine and to automate publishing schedules

At a Glance

  • 3,000+ pages - migrated from ExpressionEngine to Craft CMS across multiple content structures
  • Custom import plugin - with data cleanup, normalization, and extraction from dirty legacy data
  • Batching & queue system - for long-running migrations without timeouts or memory issues
  • Publications scheduling - a plugin that lets editors schedule articles to auto-publish
  • Image migration - handling broken paths and Craft CMS asset integration challenges

I was the Senior Backend Developer on the ProPublica Craft CMS migration (via Solspace Inc.). I built the backend systems that made this migration possible: a custom import plugin that cleaned and normalized 3,000+ pages of legacy content from ExpressionEngine, and a publications scheduling plugin that gave editors control over when articles auto-publish to the homepage.

ProPublica is a Pulitzer Prize-winning investigative journalism organization. Their legacy ExpressionEngine website held years of journalism that had to move to Craft CMS without data loss or corruption.


The Data Migration Challenge

The ExpressionEngine data was messy: inconsistent date formats, embedded HTML that needed cleaning, missing relationships between content, and duplicate records across sections. A simple export-import would have carried every one of those problems into the new site.

I built a custom Craft CMS import plugin that handled:

  • Data cleanup and normalization: Extracting clean dates from inconsistent formats, normalizing author names, stripping problematic HTML while preserving content integrity
  • Relationship reconstruction: Rebuilding connections between articles, authors, and categories that existed implicitly in the legacy data
  • Deduplication: Identifying and handling duplicate records across different content sections
  • Image migration: Solving broken asset paths and working around Craft CMS’s asset handling to ensure images attached correctly to their articles
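
To make the cleanup and deduplication steps above more concrete, here is a minimal sketch in Python. It is not the plugin's code (the plugin ran inside Craft CMS); the list of legacy date formats, the record fields (title, date), and the function names are all assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Optional

# Assumed legacy formats, for illustration only; the real data had its own mix.
LEGACY_DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%m/%d/%Y", "%B %d, %Y", "%Y-%m-%d")


def normalize_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if no known format matches."""
    text = raw.strip()
    for fmt in LEGACY_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


def normalize_author(raw: str) -> str:
    """Collapse runs of whitespace and drop a leading 'By ' byline prefix."""
    name = " ".join(raw.split())
    if name.lower().startswith("by "):
        name = name[3:]
    return name


def dedupe(records):
    """Keep the first record seen for each (normalized title, normalized date)."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["title"].strip().lower(), normalize_date(rec["date"]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Returning None for an unparseable date, rather than guessing, lets such records be flagged for manual review. Note that in this sketch two records with the same title and unparseable dates share a key and are treated as duplicates.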

The migration had to process 3,000+ pages across multiple content structures. This required a custom batching and queue mechanism to run the import over extended periods without hitting memory limits or timeout errors. The plugin could pause and resume, track progress, and handle failures gracefully.
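
The pause-and-resume behavior can be sketched as a batch loop that saves a checkpoint after every batch. This is an illustrative Python sketch, not the plugin's implementation: the JSON checkpoint file, the batch_size and max_batches parameters, and the import_one callback are all hypothetical.

```python
import json
from pathlib import Path


def run_batches(items, import_one, checkpoint_path, batch_size=50, max_batches=None):
    """Import items in batches, saving progress to disk after each batch.

    Calling the function again with the same checkpoint file resumes where
    the previous run stopped. A failing item is recorded, not fatal.
    """
    path = Path(checkpoint_path)
    if path.exists():
        state = json.loads(path.read_text())
    else:
        state = {"offset": 0, "failed": []}

    batches_done = 0
    while state["offset"] < len(items):
        if max_batches is not None and batches_done >= max_batches:
            break  # pause here; progress is already saved
        start = state["offset"]
        batch = items[start:start + batch_size]
        for index, item in enumerate(batch, start=start):
            try:
                import_one(item)
            except Exception as exc:
                # Record the failure and keep going with the rest of the batch.
                state["failed"].append({"index": index, "error": str(exc)})
        state["offset"] = start + len(batch)
        path.write_text(json.dumps(state))  # checkpoint after every batch
        batches_done += 1
    return state
```

Keeping each run short (a bounded number of small batches) is what avoids timeouts and memory growth; the checkpoint means an interrupted run loses at most one batch of progress.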


Publications Scheduling Plugin

Beyond migration, I built a publications scheduling plugin that solved a specific editorial workflow need. Editors could write articles in advance and link them to homepage components with a scheduled date and time. When that time arrived, the article would auto-publish to the website without manual intervention.

This removed workflow friction for the editorial team: no more logging in at odd hours to publish time-sensitive content, no more missed publication windows.
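
The core of such a scheduler is a periodic check for slots whose time has arrived. The Python sketch below shows that idea only; the ScheduledSlot fields, the component handle, and the publish callback are hypothetical and do not reflect the plugin's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ScheduledSlot:
    article_id: int
    component: str        # hypothetical homepage component handle
    publish_at: datetime  # timezone-aware scheduled time
    published: bool = False


def publish_due(slots, now, publish):
    """Publish every unpublished slot whose time has arrived.

    Slots are processed in scheduled order. Returns the article ids that
    were published during this call.
    """
    done = []
    for slot in sorted(slots, key=lambda s: s.publish_at):
        if not slot.published and slot.publish_at <= now:
            publish(slot)
            slot.published = True
            done.append(slot.article_id)
    return done


# Example: a recurring task would call this with the current time.
# publish_due(slots, datetime.now(timezone.utc), publish=my_publish_function)
```

Marking a slot as published makes the check safe to run repeatedly, so a recurring task can call it every minute without publishing anything twice.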


What I Delivered

  • Custom Craft CMS import plugin with data cleanup, normalization, and extraction logic
  • Batching and queue system for long-running migrations
  • Image migration handling for broken paths and Craft asset integration
  • Publications scheduling plugin for automated article publishing
  • Support for migrating 3,000+ pages across various content structures

Interested in Similar Work?

If you're looking for similar solutions or want to discuss your project, I'd be happy to help.
