Engineering

How to Migrate Production Code to a Monorepo

Posted: August 13, 2024 · 11 min read

In February 2024, the UI Platform team moved 1.3M lines of React micro-frontend code into a monorepo while retaining git history. Our team is responsible for the frontend architecture and UI Engineer experience at DigitalOcean, and moving to a monorepo is part of our frontend vision, much of which is lifted from Monica Lent’s Building Resilient Frontend Architecture talk. With a monorepo, we aimed to reduce our dependency management burden and simplify our micro-frontend boilerplate to ultimately increase developer velocity.

While there are plenty of guides for getting started with monorepos, few touch on migrating existing repositories into one. This is the guide I wish I had when we started, and I hope it helps someone else!

What is a monorepo?

A monorepo is a collection of isolated packages that live in a single repository. It reduces the friction of sharing code while keeping the safety gained from isolation. In contrast to a monolithic repository, where the entire application is deployed as one unit, a monorepo allows each package to be deployed on its own.

Approach: moving to a monorepo

We’re fans of Kent Beck’s famous refactoring quote, “First make the change easy (warning: this may be hard), then make the easy change”, and applied it to this work as best we could. In essence, a monorepo is code colocation, so we restricted the actual migration to that alone; there would be no functional change in any of the apps, they would simply live next to each other. Any changes an app required were applied while it was still in its own repo, so problems with colocation stayed isolated.

Our apps had been created over a period of roughly three years, and in many cases the lessons learned from newer apps were never applied to older ones. That created a fair bit of inconsistency, which added complexity to colocation and kicked off refactoring cycles. As we worked through the apps, each one needed to: run the local dev environment, tests, linters, and IDE plugins; run the CI/CD pipelines; and deploy to our staging environment. At least one of those steps broke with any two apps colocated, so we’d refactor the independent repos until the problem was resolved. Eventually, any two apps worked together, which in practice meant all of the apps worked together.

For this article, I’ll break the project into three stages, though some pre-migration steps only became apparent as we worked through the task:

  1. Pre-migration: making the change easy

  2. Migration: colocating the apps

  3. Post-migration: optimizing the monorepo

Pre-migration: making the change easy

Scripting

We made automation our guiding principle: every change needed to run from a script so that it was reproducible from scratch. We used zx so we could use both Node and CLI tooling in the same script. As we solved problems through refactoring, we’d update the script and the template files (which mimicked the file structure of the future monorepo) and re-run it. We ran the script hundreds of times as it evolved, and because of that approach we were able to eliminate human error on the day of the final migration.

The script ran from an external repo so it wouldn’t be overwritten by force pushes, and performed the following steps:

  1. Initialized git in a temporary monorepo.

  2. Cloned each repo into a temporary folder.

  3. Removed things that would become irrelevant after migration and couldn’t be removed beforehand, like yarn.lock and .nvmrc.

  4. Created a move commit that put all the files in the correct workspace folder.

  5. Merged the unrelated histories from local remotes.

  6. Copied the template files into the monorepo.

  7. Force-pushed the repository.

This is it, with annotations, in its entirety:

process.env.FORCE_COLOR = '1';
import { $, path, os, cd, spinner } from 'zx';

const SCRIPT_ROOT = path.resolve(__dirname);
const MONOREPO = path.join(os.tmpdir(), `monorepo-${Date.now()}`);
const REPO_PREFIX = 'git@github.com:username/';
const REPO_SUFFIX = '.git';
// repo names to fetch from GitHub
const REPOS = ['repo-a', 'repo-b'];

// 1. Initialize git in monorepo
await $`mkdir -p ${MONOREPO}/apps/`;
cd(MONOREPO);
await $`git init -b main`;
await $`git commit --allow-empty -m "Initial commit"`;
cd(SCRIPT_ROOT);

// Merge git histories loop
for await (const repo of REPOS) {
  const repoUrl = `${REPO_PREFIX}${repo}${REPO_SUFFIX}`;
  const tempRepo = path.join(os.tmpdir(), `${repo}-${Date.now()}`);

  // 2. Clone the app into a temporary folder
  await $`mkdir -p ${tempRepo}`;
  await $`git clone ${repoUrl} ${tempRepo}`;
  cd(tempRepo);

  // 3. Remove these files and folders because they're no longer necessary and it speeds up this script
  await $`rm -rf .gitignore .gitattributes .github .nvmrc yarn.lock node_modules .yarn build`;
  // try…catch so non-zero exit codes don't stop the script from continuing
  try {
    await $`git add .`;
    await $`git diff --staged --quiet || git commit -m "[${repo}]: Remove conflicting files" --no-verify`;
  } catch {}

  // 4. Create a move commit
  // In order to preserve git history accurately, we need to create a
  // move commit from the root of the sub-repo into a directory that
  // imitates the monorepo ie. from ./ to ./apps/
  const mainBranch = (await $`git branch --show-current`).stdout.trim();
  await $`mkdir -p apps/${repo}`;
  await $`git ls-tree ${mainBranch} --name-only | xargs -I{} git mv {} apps/${repo}`;
  await $`git commit -m "[${repo}]: Move ${repo} to apps/${repo}"`;

  cd(MONOREPO);

  // 5. Merge git history using local remote so changes wouldn't break live codebases
  await $`git remote add ${repo} ${tempRepo}`;
  await $`git fetch ${repo}`;
  await $`git merge --allow-unrelated-histories ${repo}/${mainBranch}`;
  await $`git remote rm ${repo}`;
}

// 6. Copy template files
cd(SCRIPT_ROOT);
await $`cp -a monorepo-template/. ${MONOREPO}`;
cd(MONOREPO);

// Create fresh yarn.lock, yarn install exits with non-zero
try {
  await $`yarn install --refresh-lockfile`;
} catch {}

await $`git add .`;
await $`git commit -m "Init monorepo"`;

// 7. Rebuild the monorepo every time
await $`git remote add origin git@github.com:username/your-new-monorepo.git`;
await spinner(() => $`git push -f origin main`);
console.log('🎉 monorepo is live');

GitHub Actions workflows

Updating our CI/CD jobs in GitHub Actions to support both single- and multi-app repositories was one of the first tasks. We passed a working-directory input into our shared actions so each job would run from the application’s folder instead of the repository root, as if it were still in a single-app repository. We used working-directory as the input parameter name and set the default to ’.’ for backwards compatibility.

Our deploy workflows had custom keys, like app_name and service_id, which were hard-coded strings in each repo’s deploy workflow. We extracted these values into a separate file and added a step to read them so our workflow actions could stay generic.
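One way to do that, sketched below, is a small metadata file per app that a workflow step reads and exposes as step outputs. The file name, keys, and app path here are assumptions for illustration, not our exact setup.

# Inside a workflow step: read per-app deploy metadata and expose it as step outputs
APP_NAME=$(jq -r '.app_name' apps/repo-a/deploy-config.json)
SERVICE_ID=$(jq -r '.service_id' apps/repo-a/deploy-config.json)
{
  echo "app_name=${APP_NAME}"
  echo "service_id=${SERVICE_ID}"
} >> "$GITHUB_OUTPUT"

Later steps can then reference those outputs instead of hard-coded strings, which is what keeps the shared actions generic.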

In the templated files, we built an action that detects which workspaces changed, then returns a matrix to fire off subsequent jobs for only those workspaces. It reduced wasted GitHub Actions time, but more importantly it prevented unnecessary deployments and e2e jobs from running.
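The detection itself can be done with ordinary git and shell tooling inside a workflow step. Here is a minimal sketch of the idea; the base branch, folder prefix, and output name are assumptions rather than our exact action:

# List the top-level workspace folders touched by this branch and emit them
# as a JSON array that later jobs can consume via a matrix strategy
CHANGED_WORKSPACES=$(git diff --name-only origin/main...HEAD \
  | { grep '^apps/' || true; } \
  | cut -d '/' -f1-2 \
  | sort -u \
  | jq --raw-input . | jq --slurp --compact-output .)
echo "matrix=${CHANGED_WORKSPACES}" >> "$GITHUB_OUTPUT"

A downstream job can then build its strategy.matrix from that output so only the changed apps are built, tested, and deployed.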

Yarn 4 upgrade

After a couple of days attempting to fix inter-app dependency conflicts in Yarn 1, we decided upgrading to Yarn 4 was a required milestone because of its improved workspaces support. With nmHoistingLimits set to workspaces, each app could contain conflicting dependencies, effectively running in isolation.
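For reference, those settings live in .yarnrc.yml. A minimal sketch of what this relies on, assuming the node-modules linker rather than Plug’n’Play (we haven’t moved to PnP yet):

# Writes nodeLinker and nmHoistingLimits to .yarnrc.yml:
# keep a node_modules folder, but stop hoisting above each workspace
yarn config set nodeLinker node-modules
yarn config set nmHoistingLimits workspaces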

Yarn has a great migration guide, and the upgrade was painless for the most part. We broke the work into two pull requests per application: explicitly add undeclared dependencies as per Yarn’s rules, and complete the upgrade to Yarn 4. In practice, I upgraded each app locally, then ran yarn dlx @yarnpkg/doctor and npx depcheck to identify the missing packages. Once I had the list, I reinstalled them on a new branch to safely separate changed dependencies from the Yarn upgrade.
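Roughly, the per-app audit boiled down to a couple of commands run from the app’s root after switching it to Yarn 4 locally:

# Surface dependency problems before committing to the upgrade
yarn dlx @yarnpkg/doctor   # flags undeclared dependencies and other workspace issues
npx depcheck               # lists unused dependencies and packages used but not declared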

The way Yarn is installed has fundamentally changed between versions 1 and 4, so I needed to support the team when they ran into issues upgrading on their machines. In all cases, the problems stemmed from location issues, typically the wrong version of Yarn running. Node, Corepack, and Yarn all need to be installed within your Node version manager, like /Users/you/.nvm/versions/node/v20.9.0/bin/node. You can check the locations with:

which node
# should output something like /Users/you/.nvm/versions/node/v20.9.0/bin/node
# if you're using nvm and you get something else run:
# nvm use

which corepack
# should output something like /Users/you/.nvm/versions/node/v20.9.0/bin/corepack
# if you get something else run:
# corepack enable

which yarn
# should output something like /Users/you/.nvm/versions/node/v20.9.0/bin/yarn
# if you get something else run:
# corepack install

Migration: colocating the apps

Once all apps were running as expected, we announced a migration date and the full plan. Like Stripe’s migration from Flow to TypeScript, we wanted developers to leave Friday afternoon and start work Monday morning in the brand new codebase with no ceremony.

On the day of the migration, we posted steps in Slack so there was a clear record in case anything went wrong and so anyone watching could follow along. The steps were largely double-checks, but of course included the actual migration too.

  1. We ran through one last review of the build script and template files, then compared them against the last working run.

  2. We ran the script for the last time, rewriting the repo history again with a force-push.

  3. We manually kicked off the PR CI/CD pipeline to confirm all the apps passed.

  4. We manually ran the staging deploy jobs to ensure all the apps deployed.

  5. We turned on branch protection, merge checks, permissions, and other repo settings, as well as enabled our automatic CI/CD jobs.

  6. And finally, we archived the old app repositories.

We left instructions for getting started and held office hours each day for the following week so any engineer could drop in and troubleshoot. We also migrated a handful of open PRs that weren’t merged by the migration date with a couple of commands from the command line:

# From the archived repo, rebase your PR commits into a single commit, change the sha prefix to fixup
git rebase main -i

# Run the move commit so all files live within an ./apps/ directory like the monorepo
# This only moves changed files to reduce conflicts + commit noise
APP_NAME=REPLACE_THIS_WITH_YOUR_APP_NAME
for file in $(git diff main --name-only --cached); do target_path=$(dirname $file); mkdir -p "apps/$APP_NAME/$target_path"; git mv $file "apps/$APP_NAME/$target_path" -v; done;

# Squash the commit to previous batch of PR commits
git commit --amend --no-edit

# Copy the sha output
SHA=$(git rev-parse --short HEAD)

# In the monorepo
cd monorepo

# Checkout a new branch that matches the original PR name
git checkout -b …

# Assuming the monorepo and original repo are in sibling folders, run
git --git-dir=../${APP_NAME}/.git format-patch -k -1 --stdout ${SHA} | git am -3 -k
# Then open the new PR

Post-migration: optimizing the monorepo

The few weeks following the migration were spent tidying and optimizing the monorepo.

We installed dependency-cruiser to restrict the ability to reach into sibling modules through the file system and instead require standard package importing. This keeps our monorepo code isolated and prevents a ball-of-mud from forming. The rule that enforces that looks like:

{
  name: 'apps-not-to-apps',
  comment: 'One app should not reach into another app (in a separate folder)',
  severity: 'error',
  from: { path: '(^apps/)([^/]+)/' },
  to: { path: '^$1', pathNot: '$1$2' },
}
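Enforcing the rule is then a matter of running dependency-cruiser over the workspace folders, typically in CI. A sketch of the invocation, where the config filename is an assumption:

# Validate the monorepo against the rules defined in the config file
npx depcruise apps packages --config .dependency-cruiser.js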

We moved packages and settings (like Prettier and Browserslist) that were duplicated across workspaces into the root directory, and then standardized them. We also abstracted developer dependencies (like eslint, stylelint, Cypress, and Jest) into isolated workspaces under ./packages, then imported them into each app with workspace:*. These new packages are self-contained, so all of their plugins and settings can be accessed with a single import and their versions are easy to keep track of.

Our team made several GitHub Actions improvements, as scaling problems surfaced immediately once our pipelines ran across multiple applications.

  • We changed all yarn install commands to yarn workspaces focus so each workflow only installed the dependencies of an isolated app (see the sketch after this list). The first yarn install in a monorepo can take a long time, and we regularly hit workflow timeouts. It’s likely that switching to Yarn Plug’n’Play will speed up installs with a cache as well, but we’re not quite there yet.

  • We gated our PR workflow jobs to reduce the time any workflow takes to complete and the burden on parallel jobs. In order, the gates run: our build matrix job to determine which apps have changed; build commands and a non-matrix lint job that runs across the repo to reduce container setup time; unit tests; and finally e2e tests. The tradeoff in parallelization has been well worth the reliability of successful runs, and we’ve moved individual jobs into combined workflows to reduce container setup time, which has kept our total job time relatively unchanged.

  • We added run-name to our deploy workflows so it’s easy to see which job is associated with which app. We also make heavy use of $GITHUB_STEP_SUMMARY for both debugging jobs and reporting (a small example follows this list).

  • We added max-parallel to our matrix jobs because some of our actions were getting rate-limited by external services.
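As referenced in the list above, the yarn workspaces focus change is a one-line swap in each workflow’s install step; a sketch, with a placeholder workspace name:

# Install only this workspace's dependencies (plus the root) instead of the whole monorepo
yarn workspaces focus repo-a
# For deploy-only jobs, skip devDependencies as well
yarn workspaces focus repo-a --production

And the step summaries are just Markdown appended to a file GitHub provides to every job; the values below are illustrative:

# Inside a deploy job's run step; this renders on the workflow summary page
{
  echo "### Deploy: repo-a"
  echo "- environment: staging"
  echo "- commit: ${GITHUB_SHA}"
} >> "$GITHUB_STEP_SUMMARY"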

Finally, we added a .git-blame-ignore-revs file with the SHAs of the batch commits so git blame skips over them.
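GitHub’s blame view picks this file up by name, but local git needs to be pointed at it explicitly; the file path in the example is a placeholder:

# Tell local git to skip the listed commits when running blame
git config blame.ignoreRevsFile .git-blame-ignore-revs
# Blame now attributes lines to their pre-migration commits
git blame apps/repo-a/src/index.tsx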

Conclusion

This move took us one quarter to complete and was the largest frontend code migration at DigitalOcean thus far. We’ve seen the average number of React-related feature PRs increase by 1.6x, and the average number of internal library bumps decrease by 95%. While it’s harder to get an accurate measurement, each batch of our library bumps used to take most of the day and can now be released and upgraded in under an hour. Soon we will completely eliminate those bumps with Module Federation. It’s also been significantly easier and safer to do sweeping changes, like fixing all our eslint errors and warnings, or upgrading third-party libraries.

There’s always room for improvement, and the two challenges we ran into came from the Yarn 4 upgrade and our CI/CD deploy pipeline. We hadn’t communicated how critical Yarn 4 was to the project and that it was our new norm for frontends, so we inadvertently left some team members behind on Yarn 1. When the monorepo launched, they were unable to get the repo running, and we spent most of the first few days troubleshooting environments. Additionally, while we ran staging deploys both before and on the day of the migration, we failed to consider running production deploys, which were slightly different. Our automated production pipeline was broken first thing Monday morning, but we luckily had it up again before lunch. For the next project, we’ve created a more robust release template that includes communication and support around required developer changes, as well as better steps for the entire production process.

Breaking work down as if it were a refactor worked extremely well. We were able to keep track of progress (even as new tasks were added) and point to discrete batches of work for both issues and successes. The approach felt measured and straightforward with very little room for surprises or risk. There are still things to optimize across our frontend architecture and the monorepo is helping us move through it much faster. If you’re starting with a brand new repo, I’d like to recommend these four articles that helped us along the way:
