Recently, I was part of an amazing team that completed a project for a client, migrating a large amount of data from HubSpot to Salesforce. My remit was the data migration itself.
To manage the heavy lifting, I built an app in Python and PostgreSQL to do it all. The more nuanced/finessed work (i.e. setting up all the picklists and picklist values in SFDC) was largely done by the rest of the amazing team.
Extract:
Transform:
Load:
During the load process, each record that loaded got its SFDC ID written back against it, meaning we could do the association of accounts to contacts (and so on) without manually re-extracting data from SFDC to get the IDs.
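To make that concrete, here is a minimal sketch of the idea, assuming a simple_salesforce connection and a psycopg2 connection to the staging database; the table and column names (companies, sfdc_id, etc.) are made-up stand-ins rather than the actual schema:

import psycopg2
from simple_salesforce import Salesforce

pg = psycopg2.connect("dbname=migration")             # hypothetical staging database
sf = Salesforce(username='...', password='...', security_token='...')
cur = pg.cursor()

# Load Accounts, writing each returned SFDC ID straight back onto the staging row
cur.execute("SELECT id, name FROM companies WHERE sfdc_id IS NULL")
for company_id, name in cur.fetchall():
    result = sf.Account.create({'Name': name})        # returns {'id': ..., 'success': ...}
    cur.execute("UPDATE companies SET sfdc_id = %s WHERE id = %s",
                (result['id'], company_id))
pg.commit()

Because the SFDC ID now sits alongside the source record, every subsequent load (contacts, notes, and so on) can simply join to it.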
The whole process was built to be iterative. By way of example, if a picklist value was missing for a field against a contact, all the contacts with that value would fail, but the rest would load. The error was logged, and once corrected, the whole process could be re-run, with the load only occurring against those records that failed the first time around. This made testing efficient: we tested in Sandbox (SBX) multiple times and were able to iterate to ‘perfection’. Equally importantly, if an Account failed to load, no contacts for that Account would be loaded. Once the error with the Account was resolved and it loaded, the contacts would get loaded without further effort on my (or anyone else’s) part.
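Continuing the sketch above, the contact load shows how the re-run behaviour and the Account gating fall out of a couple of WHERE conditions and a try/except (load_errors and Lead_Status__c are invented names for illustration):

from simple_salesforce.exceptions import SalesforceError

# Re-runs pick up only rows that never got an SFDC ID, and contacts are
# skipped until their parent Account has loaded successfully.
cur.execute("""
    SELECT ct.id, ct.last_name, ct.lead_status, co.sfdc_id
      FROM contacts ct
      JOIN companies co ON co.id = ct.company_id
     WHERE ct.sfdc_id IS NULL           -- failed or not-yet-attempted contacts only
       AND co.sfdc_id IS NOT NULL       -- parent Account must already exist in SFDC
""")
for contact_id, last_name, lead_status, account_sfdc_id in cur.fetchall():
    try:
        result = sf.Contact.create({
            'LastName': last_name,
            'Lead_Status__c': lead_status,             # hypothetical picklist field
            'AccountId': account_sfdc_id,
        })
        cur.execute("UPDATE contacts SET sfdc_id = %s WHERE id = %s",
                    (result['id'], contact_id))
    except SalesforceError as exc:
        # e.g. a missing picklist value: log it and carry on with the next contact
        cur.execute("INSERT INTO load_errors (contact_id, error) VALUES (%s, %s)",
                    (contact_id, str(exc)))
pg.commit()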
On the day of migration, we ran the whole process in Production (including HS Extracts, Transform, and SFDC Loads), and it was effectively hands-off. The data got extracted, stored, transformed, and loaded. We extracted around 16.5k HS Companies and 295k HS Contacts. Custom rules were in place that meant we loaded about 8k SFDC Accounts and 115k SFDC Contacts. The whole process took around 2.5 hours.
As part of the transform process, we (amongst other things):
A lot of the mappings were not one-to-one: some owners were consolidated, as were some picklist values.
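For illustration only (the values below are invented, not the client's), those consolidations were essentially many-to-one lookup tables applied during the transform:

# Two HubSpot owners folding into one SFDC owner (made-up IDs)
OWNER_MAP = {
    'alice@example.com': '005000000000001AAA',
    'bob@example.com':   '005000000000001AAA',
}

# Several HubSpot picklist values folding into one SFDC value (made-up values)
LEAD_STATUS_MAP = {
    'Open':        'New',
    'Attempting':  'Working',
    'In Progress': 'Working',
}

def map_owner(hs_owner_email):
    # Fall back to a default owner where no mapping exists
    return OWNER_MAP.get(hs_owner_email, '005000000000002AAA')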
There were plenty of other nuances here, but the one I wanted to dig into a little was loading notes from HubSpot into SFDC. Both systems store notes as HTML-encoded data and allow for formatting such as bold, italics, underline, etc.
I'm not going to lie, I struggled for a while to get Notes to load. I started off trying:
Both of these approaches failed miserably, but after a lot of digging I came across this approach:
import base64

# 1. Encode the HTML note content to bytes using UTF-8
noteContentBytes = aNote['content'].encode('utf-8')
# 2. Base64 encode the bytes
noteContentBase64Bytes = base64.b64encode(noteContentBytes)
# 3. Decode the base64 bytes to a string (for the API call)
noteContentBase64String = noteContentBase64Bytes.decode('utf-8')
# 'sf' is the authenticated Salesforce API connection used throughout the app
newNote = sf.ContentNote.create({
    'Title': aNote['title'],
    'Content': noteContentBase64String
})
The value of aNote['content'] was stored in PostgreSQL as a straightforward unbounded varchar/text column. There is a lot of other support code:
But if you're ready to load an HTML note into SFDC, the snippet above will get you over the line.