HubSpot to SFDC Migration and Loading SFDC Notes through the API


Recently, I was part of an amazing team that completed a project for a client that involved migrating a large amount of data from HubSpot to SalesForce.  My remit was around data migration.

To manage the heavy lifting side of this, I built an app in Python and PostgreSQL to do all of this. The more nuanced/finessed stuff was largely done by the balance of the amazing team (i.e. setting up all the picklists and picklist values in SFDC).

Extract:

  • Companies
  • Contacts
  • Deals
  • Notes
  • Owners

Transform:

  • Companies -> Accounts
  • Contacts -> Contacts
  • Deals -> Opportunities
  • Notes -> Notes
  • Owners -> Owners

Load:

  • Accounts
  • Contacts (associated with Accounts)
  • Opportunities (associated with Accounts, Contacts, and Opportunity Products)
  • Notes (associated with Accounts, Contacts, and Opportunities)
  • Owners

During the load process, each record loaded gets its SFDC ID attributed to it, meaning we can do the association of accounts to contacts etc without a manual re-extraction of data from SFDC to get IDs.

The whole process was built to be iterative. By way of example, if a picklist value is missing for a field against a contact, all the contacts with that value would fail, but the balance would load. The error was logged, and once corrected, the whole process could be re-run with the load only occurring against those records that failed first time around. This made testing an efficient process as we tested in Sandbox (SBX) multiple times and were able to iterate to ‘perfection’. Equally importantly, if an Account failed to load, no contacts for that account would be loaded. Once the error with the Account was resolved and it was loaded, the contacts would get loaded without further effort on my (or anyone else’s) part.

On the day of migration, we ran the whole process in Production (including HS Extracts, Transform, and SFDC Loads), and it was effectively hands off. The data got extracted, stored, transformed, and loaded. We extracted around 16.5k HS Companies and 295k HS Contacts. There were custom rules in place that mean we loaded about 8k SFDC Accounts and 115k SFDC Contacts. The whole process took around 2.5 hours.

As part of the transform process, we (amongst other things):

  • Mapped HS picklist values to SFDC Values.
  • Mapped HS Owners to SFDC Owners.
  • Converted HS Datetime fields to SFDC Date fields.
  • Converted an HS Product Picklist to SFDC Opportunity Products.

A lot of the mappings were not one-to-one. There were owners that were consolidated, and picklist values that were consolidated.

There were plenty of other nuances here, but the one that I wanted to dig into a little was loading notes from HubSpot to SFDC.  Both are stored as HTML encoded data, and allow for formatting such as bold, italics, underline etc.  

I'm not going to lie, I struggled for a while to get Notes to load.  I started off trying:

  • To simply load the data into SFDC in the format that was extracted from HubSpot
  • To strip the HTML out, and load plaintext.  

Both of these approaches failed miserably, but after a lot of digging I came across this approach:

# 1. Encode the text to bytes using UTF-8 
noteContentBytes = aNote['content'].encode('utf-8')
# 2. Base64 encode the bytes
noteContentBase64Bytes = base64.b64encode(noteContentBytes)
# 3. Decode the base64 bytes to a string (for the API call)
noteContentBase64String = noteContentBase64Bytes.decode('utf-8')
newNote = sf.ContentNote.create({
'Title': aNote['title'],
'Content': noteContentBase64String})

The value of aNote['content'] was stored in PostgreSQL as a straightforward varchar(max).  There is a lot of other support code:

  • To extract note data and metadata from HS,
  • Store the note and associations in the DB,
  • Retrieve from the DB,
  • Using the simpleSalesforce library:
    • Insert the note into SFDC,
    • Insert note associations into SFDC,

But if you're ready to load an HTML note into HS, that snippet above will get you over the line.