Salesforce CRM Data Migrations

A comprehensive guide to planning, executing, and validating data migrations in Salesforce CRM.

01 Foundation

Introduction to Salesforce Data Migration

Understanding the fundamentals of enterprise-grade CRM data migration

Common Migration Scenarios

  • Legacy System Migration: Migrating from legacy on-premise systems (e.g., old databases, spreadsheets)

  • Multi-System Consolidation: Consolidating data from multiple disparate systems into a single Salesforce instance

  • CRM Upgrade: Upgrading from an older CRM system to Salesforce

  • M&A Integration: Merging data from acquired companies into an existing Salesforce org

Understanding Project Complexity

The complexity of a data migration project can vary significantly based on several critical factors:

  • Data Volume: Size and number of records to be migrated
  • Data Variety: Different types and formats of source data
  • Data Velocity: Speed at which data changes in the source system
  • Source Complexity: Complexity of the source data model
  • Target Configuration: Complexity of the target Salesforce configuration
💡 Key Insight

Successful data migration is not just a technical exerciseβ€”it's a strategic initiative that requires collaboration between business stakeholders, IT teams, and migration specialists to ensure business continuity and data quality.

02 Key Phases

Key Phases of Salesforce Data Migration

A typical Salesforce data migration project follows a structured approach, encompassing several key phases to ensure data quality, integrity, and a seamless transition.

2.1

API Mapping Sheet (Data Mapping Document)

Sample API Mapping Sheet Example

Simplified example showing field-level mapping from legacy systems to Salesforce

Source Object | Source Field Name | Source Data Type | Target Salesforce Object | Target Salesforce API Name | Target Salesforce Data Type | Transformation Logic / Business Rules
Legacy_CRM_Accounts | Account_ID | VARCHAR(50) | Account | External_ID__c | Text (External ID) | Direct Map
Legacy_CRM_Accounts | Account_Name | VARCHAR(255) | Account | Name | Text | Direct Map
Legacy_CRM_Accounts | Customer_Type | VARCHAR(20) | Account | Type | Picklist | Map 'Retail' to 'Customer', 'Whole' to 'Partner'. Default to 'Other' if no match.
Legacy_CRM_Accounts | Annual_Revenue_USD | DECIMAL(18,2) | Account | AnnualRevenue | Currency | Direct Map
Legacy_CRM_Accounts | Created_Date | DATETIME | Account | CreatedDate | DateTime | Direct Map (requires the 'Set Audit Fields upon Record Creation' permission; otherwise use a custom field)
Legacy_CRM_Contacts | Contact_ID | VARCHAR(50) | Contact | Legacy_Contact_ID__c | Text (External ID) | Direct Map
Legacy_CRM_Contacts | First_Name | VARCHAR(100) | Contact | FirstName | Text | Direct Map
Legacy_CRM_Contacts | Last_Name | VARCHAR(100) | Contact | LastName | Text | Direct Map
Legacy_CRM_Contacts | Email_Address | VARCHAR(255) | Contact | Email | Email | Validate format (e.g., contains '@' and '.'). Clean leading/trailing spaces.
Legacy_CRM_Contacts | Legacy_Account_ID | VARCHAR(50) | Contact | AccountId | Lookup(Account) | Look up the Salesforce Account ID using Legacy_Account_ID against the Account object's External_ID__c.
Legacy_CRM_Opportunities | Opportunity_ID | VARCHAR(50) | Opportunity | Legacy_Opportunity_ID__c | Text (External ID) | Direct Map
Legacy_CRM_Opportunities | Opportunity_Name | VARCHAR(255) | Opportunity | Name | Text | Direct Map
Legacy_CRM_Opportunities | Deal_Stage | VARCHAR(50) | Opportunity | StageName | Picklist | Map 'Prospecting' to 'Prospecting', 'Negotiation' to 'Negotiation/Review'. Default to 'Closed Lost' if historical status is 'Lost'.
Legacy_CRM_Opportunities | Close_Date | DATE | Opportunity | CloseDate | Date | Direct Map
Legacy_CRM_Opportunities | Deal_Amount_USD | DECIMAL(18,2) | Opportunity | Amount | Currency | Direct Map
Legacy_CRM_Opportunities | Legacy_Account_ID | VARCHAR(50) | Opportunity | AccountId | Lookup(Account) | Look up the Salesforce Account ID using Legacy_Account_ID against the Account object's External_ID__c.
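The transformation rules captured in a mapping sheet like the one above translate directly into ETL code. A minimal Python sketch, using the picklist mapping and email rule from the sample rows (the function and constant names are illustrative, not part of any library):

```python
# Picklist mapping for Account.Type, per the sample mapping sheet:
# 'Retail' -> 'Customer', 'Whole' -> 'Partner', default 'Other'.
CUSTOMER_TYPE_MAP = {"Retail": "Customer", "Whole": "Partner"}

def map_customer_type(source_value: str) -> str:
    """Apply the picklist mapping, defaulting to 'Other' when no match."""
    return CUSTOMER_TYPE_MAP.get(source_value, "Other")

def clean_email(raw: str):
    """Trim whitespace and apply the sheet's basic validity check
    (contains '@' and '.'); return None for invalid values."""
    email = raw.strip()
    return email if "@" in email and "." in email else None

def transform_account(row: dict) -> dict:
    """Transform one legacy Account row into a Salesforce-ready record."""
    return {
        "External_ID__c": row["Account_ID"],         # Direct Map
        "Name": row["Account_Name"],                 # Direct Map
        "Type": map_customer_type(row["Customer_Type"]),
        "AnnualRevenue": row["Annual_Revenue_USD"],  # Direct Map
    }

record = transform_account({
    "Account_ID": "A-1001",
    "Account_Name": "Acme Corp",
    "Customer_Type": "Retail",
    "Annual_Revenue_USD": 1250000.00,
})
```

Keeping each business rule in its own small function makes the mapping sheet and the code easy to review side by side.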
2.2

Code Development & Data Staging

In this phase, the development of ETL (Extract, Transform, Load) code begins. A common practice involves creating intermediary tables to manage data flow and transformation:

📥 Source Data Submission

Data is initially submitted by clients or extracted from source systems.

🔄 Synonym Table (Staging Table)

Source data is first imported into a "Synonym Table" (Staging Table). This table acts as an exact, untouched replica of the raw source data, preserving its original format and source system headers. This ensures that the original source data remains intact and can be re-referenced if any transformations go awry.

📊 Landing Table

Data from the Synonym Table is then imported into the "Landing Table." This is where all necessary data transformations, cleansing, and standardization operations are performed. The Landing Table can contain source system headers or a mix of source and target system headers. Critically, it should include additional columns for tracking:

  • Source System ID: This column acts as a primary key from the source system. It is crucial for lookup operations in Salesforce (often mapped to an External ID field in Salesforce). For child objects (e.g., Contacts, Opportunities), the source system ID of their parent object is also available in the child object's landing table. This facilitates linking to the parent's Salesforce ID before performing data loads.
  • Unique Identifier (e.g., GUID): A new column, typically a Globally Unique Identifier (GUID) or similar unique ID, should be generated for each record in the landing table to serve as a temporary primary key. This is particularly useful for stamping the Salesforce IDs of newly loaded records back onto your SQL Server load tables for any particular object, enabling easy reconciliation.
  • Tracking Columns: Columns like 'Update Message', 'Error Message', and 'Error Flag' (as detailed in section 2.6) should be present to track changes, errors, and transformation outcomes at the record level.
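The bookkeeping columns listed above can be attached when raw rows land. A sketch in Python (column names follow the text; the function itself is illustrative):

```python
import uuid
from datetime import datetime, timezone

def to_landing_row(source_row: dict, source_id_field: str) -> dict:
    """Wrap a raw source row with the landing-table tracking columns."""
    return {
        **source_row,
        "Source_System_ID": source_row[source_id_field],  # key for External ID lookups
        "Row_GUID": str(uuid.uuid4()),                    # temporary primary key
        "Update_Message": None,
        "Error_Message": None,
        "Error_Flag": False,
        "Update_Timestamp": datetime.now(timezone.utc),
    }

landing = to_landing_row({"Account_ID": "A-1001", "Account_Name": "Acme"},
                         source_id_field="Account_ID")
```

Generating the GUID at landing time (rather than at load time) means every record can be traced through every later transformation.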
✅ Load Table (Salesforce Ready Table)

Once all transformations and validations are complete on the Landing Table, the prepared data is moved to the "Load Table." This table is specifically structured to match the Salesforce data model, containing only the exact Salesforce API names (not labels) as headers. This strict adherence to API names is essential for successful backend data loads into Salesforce.

Detailed Data Migration Staging Flow

Source Data → [Extract] → Synonym Table (Raw Source Data) → [Transform] → Landing Table (Transformed) → [Validate] → Load Table (SF Ready) → [Load] → Salesforce CRM

Detailed Conceptual Data Flow: Source to Load Table

2.3

Pre-Validation Checks & Error Reporting

During code development, it's crucial to incorporate robust pre-validation checks. These checks ensure that the data quality meets the minimum requirements of the Salesforce system and identify any records that might cause failures during the actual data load. Generating a detailed error report from these checks and sharing it with the client is a vital preliminary step. This proactive approach helps in rectifying data quality issues at the source, ensuring a smoother migration and validating the data's readiness for Salesforce.
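A minimal sketch of such checks and the resulting client-facing error report, using two common constraints (a required LastName and a basic email shape); the rule set here is illustrative and would be extended to match your org's validation rules:

```python
def prevalidate_contact(row: dict) -> list:
    """Return a list of pre-validation errors for one landing-table row."""
    errors = []
    if not (row.get("Last_Name") or "").strip():
        errors.append("Missing required field: Last_Name")
    email = (row.get("Email_Address") or "").strip()
    if email and ("@" not in email or "." not in email):
        errors.append(f"Invalid email format: {email!r}")
    return errors

def build_error_report(rows: list) -> list:
    """Collect failing rows into a report keyed by source system ID."""
    report = []
    for row in rows:
        errs = prevalidate_contact(row)
        if errs:
            report.append({"Source_System_ID": row.get("Contact_ID"),
                           "Errors": "; ".join(errs)})
    return report

report = build_error_report([
    {"Contact_ID": "C-1", "Last_Name": "Ng", "Email_Address": "ng@x.com"},
    {"Contact_ID": "C-2", "Last_Name": "",   "Email_Address": "bad-email"},
])
```

Because the report carries the source system ID rather than row position, the client can locate and fix each offending record at the source.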

2.4

Data Loading Tools & Methods

When it's time to load data into Salesforce, several tools are available, chosen based on data volume, complexity, and security requirements:

🌐 Salesforce Inspector

A popular open-source browser extension (e.g., for Chrome, Firefox) often used for quick data loads and inspections. It's suitable for loading moderate volumes, typically in the 50,000 to 100,000 record range. Its ease of use has made it a favorite among administrators during testing phases.

⚡ Azure Data Factory (ADF) / Other ETL Tools with Bulk APIs

For very large data volumes, complex integrations, or automation pipelines, tools like Azure Data Factory (ADF) or similar ETL tools (e.g., Informatica, Mulesoft, Talend) can be used. These tools can build data push jobs that extract data from SQL databases (or other sources) and directly load it into Salesforce using Salesforce's Bulk APIs (preferably Bulk API 2.0 for enhanced performance and simplified error handling). This method is highly scalable and efficient for enterprise-level migrations.
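As a rough sketch of the ingest sequence these tools drive, the Bulk API 2.0 calls can be described as plain request payloads: create the job, upload the CSV batch, then mark the upload complete. The endpoint paths and JSON fields follow the Bulk API 2.0 ingest flow; the API version, instance URL, and helper name are illustrative assumptions:

```python
API_VERSION = "v59.0"  # assumed version; use the version your org supports

def bulk_ingest_requests(instance_url: str, object_name: str, csv_path: str) -> list:
    """Describe the three-step Bulk API 2.0 ingest sequence as request payloads.
    An ETL tool (or an HTTP client) would actually send these."""
    base = f"{instance_url}/services/data/{API_VERSION}/jobs/ingest"
    return [
        # 1. Create the ingest job.
        {"method": "POST", "url": base,
         "json": {"object": object_name, "operation": "insert",
                  "contentType": "CSV", "lineEnding": "LF"}},
        # 2. Upload the CSV data ({job_id} comes from the POST response;
        #    left symbolic here).
        {"method": "PUT", "url": f"{base}/{{job_id}}/batches",
         "headers": {"Content-Type": "text/csv"}, "body_file": csv_path},
        # 3. Close the job so Salesforce starts processing.
        {"method": "PATCH", "url": f"{base}/{{job_id}}",
         "json": {"state": "UploadComplete"}},
    ]

steps = bulk_ingest_requests("https://example.my.salesforce.com",
                             "Account", "accounts.csv")
```

Describing the calls as data (rather than issuing them inline) mirrors how pipeline tools like ADF parameterize the same three steps per object.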

2.5

External IDs and Parent-Child Relationship Handling

Parent-Child Lookup Flow

Parent-Child Data Loading Flow
1. Load Parent → 2. Extract SF IDs → 3. Map to Child → 4. Load Child

Parent-Child Data Loading Flow
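Step 3 of the flow above, mapping captured parent IDs onto child records, can be sketched as a dictionary lookup (field names follow the earlier Contact mapping; the record IDs are illustrative):

```python
def resolve_account_lookups(contacts: list, account_crosswalk: dict) -> list:
    """Swap each child's legacy parent ID for the Salesforce Account ID
    captured after the parent load. `account_crosswalk` maps
    Legacy_Account_ID -> Salesforce record ID."""
    resolved = []
    for c in contacts:
        sf_account_id = account_crosswalk.get(c["Legacy_Account_ID"])
        resolved.append({**c,
                         "AccountId": sf_account_id,
                         "Error_Flag": sf_account_id is None})  # flags orphans
    return resolved

crosswalk = {"A-1001": "001XXXXXXXXXXXXAAA"}  # illustrative Salesforce ID
children = resolve_account_lookups(
    [{"Legacy_Contact_ID": "C-1", "Legacy_Account_ID": "A-1001"},
     {"Legacy_Contact_ID": "C-2", "Legacy_Account_ID": "A-9999"}],
    crosswalk)
```

Flagging unresolved parents before the child load surfaces orphan records early, rather than as load failures.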

2.5.1

Handling Self-Referencing Lookups

A common challenge in data migration arises when an object has a lookup relationship to itself (e.g., an Account referring to a Parent Account, or an Opportunity referring to a Predecessor Opportunity). In such cases, a direct single-pass load is not feasible because the Salesforce ID of the referenced record might not exist yet during the initial load.

The solution involves a post-data load update strategy:

  1. Initial Load: All records for the self-referencing object (e.g., all Opportunities) are pushed into Salesforce without populating the self-referencing lookup field (e.g., 'Predecessor Opportunity' or 'PredecessorOpportunityId').
  2. Capture Salesforce IDs: After the initial load, the newly generated Salesforce IDs for all these records are extracted and stamped back onto your load table in the source system, alongside their original source system IDs.
  3. Post-Load Update: A second pass is performed. For each record, the legacy source system ID of its predecessor/parent (e.g., 'Legacy_Predecessor_Opportunity_ID') is matched against the source system IDs of the records already loaded and stamped, yielding the corresponding new Salesforce ID for the predecessor/parent.
  4. Update Lookup Field: Finally, an update operation is performed on the self-referencing object in Salesforce, populating the lookup field with the correct Salesforce ID of its parent/predecessor.
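The second pass of this strategy can be sketched as follows, using the Parent Account case as the example (field names like Legacy_Parent_Account_ID are illustrative):

```python
def build_parent_updates(records: list, id_crosswalk: dict) -> list:
    """After the initial load (lookup left blank) and ID stamping, build the
    update payloads that set the self-referencing lookup.
    `id_crosswalk` maps Legacy_Account_ID -> stamped Salesforce ID."""
    updates = []
    for rec in records:
        legacy_parent = rec.get("Legacy_Parent_Account_ID")
        if legacy_parent and legacy_parent in id_crosswalk:
            updates.append({"Id": rec["Salesforce_ID"],        # stamped in pass 1
                            "ParentId": id_crosswalk[legacy_parent]})
    return updates

stamped = [  # load table after pass 1, with Salesforce IDs stamped back
    {"Legacy_Account_ID": "A-1", "Salesforce_ID": "001A",
     "Legacy_Parent_Account_ID": None},
    {"Legacy_Account_ID": "A-2", "Salesforce_ID": "001B",
     "Legacy_Parent_Account_ID": "A-1"},
]
xwalk = {r["Legacy_Account_ID"]: r["Salesforce_ID"] for r in stamped}
updates = build_parent_updates(stamped, xwalk)
```

Only records whose legacy parent resolved make it into the update batch, so the second pass cannot reintroduce broken references.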

Self-Referencing Lookup Flow

Self-Referencing Lookup Flow
Initial Load (No Lookup) → Extract & Stamp IDs → Map Legacy to SF ID → Update Lookup Field

Conceptual Flow for Handling Self-Referencing Lookups

2.6

Coding Best Practices for Data Transformation

When performing data transformation or update operations, especially within SQL Server stored procedures or ETL scripts, it's a best practice to include tracking columns. These columns can significantly aid in debugging, auditing, and understanding data quality issues.

  • 📝 Error Message Column: A 'VARCHAR' column to store specific error messages if a record fails a validation or transformation rule.
  • 🔄 Update Type Column: A 'VARCHAR' column to indicate the type of transformation or update applied (e.g., 'Amount_Formatted', 'Address_Standardized', 'Default_Applied').
  • 🚩 Error Flag Column: A 'BOOLEAN' or 'BIT' column (e.g., 'Is_Error_Flag') to quickly identify records that encountered issues during processing.
  • ⏰ Update Timestamp: A 'DATETIME' column to record when the last transformation or update was applied to the record.
💡

These tracking columns provide granular visibility into the data's journey and transformations. For example, if an amount value was updated due to a specific business rule, the 'Update Type' column might say "Amount_Value_Updated_XYZ_Factor", making it easy to trace the change and its reason. This level of detail is invaluable for troubleshooting and understanding data quality issues.
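A sketch of a transform that stamps these tracking columns as it runs, using the amount-adjustment example from the text (the rule itself and the factor are illustrative):

```python
from datetime import datetime, timezone

def apply_amount_rule(row: dict, factor: float) -> dict:
    """Adjust an amount and stamp the tracking columns so the change
    (or the failure) is traceable per record."""
    out = dict(row)
    try:
        out["Amount"] = round(float(row["Amount"]) * factor, 2)
        out["Update_Type"] = "Amount_Value_Updated_XYZ_Factor"  # name from the text
        out["Error_Flag"] = False
    except (TypeError, ValueError):
        out["Error_Message"] = f"Non-numeric Amount: {row.get('Amount')!r}"
        out["Error_Flag"] = True
    out["Update_Timestamp"] = datetime.now(timezone.utc).isoformat()
    return out

good = apply_amount_rule({"Amount": "100"}, 1.1)
bad = apply_amount_rule({"Amount": "n/a"}, 1.1)
```

The same pattern applies inside a SQL Server stored procedure: every UPDATE that enforces a business rule also writes the rule's name into the Update Type column.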

2.7

Automated Data Extraction from Salesforce

For environments dealing with large volumes of Salesforce records, manual data extraction or reliance solely on tools like Salesforce Inspector can be inefficient. An automated approach using ETL tools like Azure Data Factory (ADF) combined with "parameter tables" is highly recommended.

These "base tables" or "parameter tables" act as control mechanisms for the ADF pipelines. A common example is a table named 'Salesforce_Extract_Metadata', which stores essential information about Salesforce objects and their fields.

Salesforce_Extract_Metadata Table Structure (Conceptual)

Column Name | Data Type | Description
'Object_API_Name' | VARCHAR(100) | Salesforce API Name of the object (e.g., 'Account', 'Contact', 'Opportunity').
'Field_API_Name' | VARCHAR(255) | Salesforce API Name of the field (e.g., 'Name', 'Industry', 'Email').
'Field_Label' | VARCHAR(255) | User-friendly label of the field (e.g., 'Account Name', 'Industry Type').
'Is_Active' | BOOLEAN | Flag to indicate if the field should be included in extraction (TRUE/FALSE).

Sample 'dbo.salesforce_extract_metadata' Table Entries

Here's a sample of the 'dbo.salesforce_extract_metadata' table entries for the Account object:

EntityName | BusinessEntityName | Field API Name | Filter Condition
Account | Account | Name | where Processing_Status__c='Confirmed'
Account | Account | Source_System_Id__c | where Processing_Status__c='Confirmed'
Account | Account | State | where Processing_Status__c='Confirmed'
Account | Account | City | where Processing_Status__c='Confirmed'
Account | Account | OwnerId | where Processing_Status__c='Confirmed'
Account | Account | RecordTypeId | where Processing_Status__c='Confirmed'
Account | Account | RecordType.Name | where Processing_Status__c='Confirmed'
Account | Account | Owner.Employee_Code__c | where Processing_Status__c='Confirmed'
Account | Account | CreatedDate | where Processing_Status__c='Confirmed'
Account | Account | LastModifiedDate | where Processing_Status__c='Confirmed'

SQL Query to Generate SOQL SELECT Clause

SELECT
    Object_API_Name,
    'SELECT ' + STRING_AGG([Field_API_Name], ', ')
        WITHIN GROUP (ORDER BY [Field_API_Name]) + ' FROM ' + Object_API_Name AS SOQL_Query
FROM
    dbo.salesforce_extract_metadata
WHERE
    -- Restrict to active fields for a specific object, e.g. 'Account'.
    -- Note: do not bracket-quote the field names (e.g. with QUOTENAME);
    -- brackets are valid T-SQL but invalid SOQL.
    Object_API_Name = 'Account'
    AND Is_Active = 1
GROUP BY
    Object_API_Name;

This dynamically generated SOQL query is then executed by ADF against the Salesforce environment. The entire process of generating the SOQL query and writing data back into the database can be visualized as follows:

SOQL Query Generation and Data Write-Back Flow

SOQL Query Generation and Data Write-Back Flow
Metadata Table → Generate SOQL → Execute in Salesforce (ADF) → Write to Database

Conceptual Flow: Metadata to SOQL Query Generation and Data Write-Back

03 Process Flow

End-to-End Data Migration Flow

A successful Salesforce data migration isn't a single event but a carefully orchestrated series of steps, often iterated through various testing environments before reaching production.

Here's a typical end-to-end flow:

1

API Mapping Sheet Finalization

This initial step involves creating a detailed mapping of all source system data fields to their corresponding Salesforce target objects and fields. This process is crucial for defining the project's scope, identifying necessary data transformations, and estimating the effort required for code development. It ensures everyone agrees on what data moves where and how it changes.

2

Mapping Sheet Signoff

Formal approval of the API mapping sheet by all key stakeholders (business, technical, and data governance). This checkpoint confirms that everyone is aligned before actual development work begins.

3

Dummy Data Creation / Sample Data Pre-validation (Development & Validation - DV - Cycle)

Before loading into Salesforce, prepare your data. This might involve creating dummy data sets or taking a small, representative sample of your actual source data. That data then undergoes pre-validation checks to identify and fix any quality issues or transformation errors in a lower sandbox environment (Development or Validation cycle).

4

DV Cycle Data Loads

Perform the initial data loads using the prepared dummy or sample data into a development or validation sandbox. This is where you test your ETL scripts and loading mechanisms in a controlled environment.

5

Post-Data Load Updates (Self-Referencing Lookups)

After initial data loads, especially for objects that refer to themselves (like a Parent Account field on an Account record), perform a separate update step. This ensures that all records link correctly, since the target records within the same object did not yet exist, and had no Salesforce IDs, during the initial load.

6

Crosswalk Generation

Create a "crosswalk" file. This is essentially a merged file containing your original source system IDs alongside the newly generated Salesforce IDs for each record. This file is invaluable for:

  • Data Validation: Easily reconciling data between source and target systems.
  • One-Time Integration: Providing a lookup mechanism for downstream systems that might need to connect to the new Salesforce IDs.
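Generating the crosswalk is a simple merge of source and Salesforce IDs; a sketch that writes it as CSV (column names and IDs are illustrative):

```python
import csv
import io

def write_crosswalk(records: list) -> str:
    """Produce the crosswalk file: one row per migrated record, pairing the
    source system ID with its newly generated Salesforce ID."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["Source_System_ID", "Salesforce_ID", "Object"])
    writer.writeheader()
    for rec in records:
        writer.writerow({"Source_System_ID": rec["Legacy_Account_ID"],
                         "Salesforce_ID": rec["Salesforce_ID"],
                         "Object": "Account"})
    return buf.getvalue()

crosswalk_csv = write_crosswalk(
    [{"Legacy_Account_ID": "A-1001", "Salesforce_ID": "001XXXXXXXXXXXXAAA"}])
```

Downstream systems can then join on Source_System_ID to find the Salesforce record without any knowledge of the migration tooling.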
7

QA Testing & Defect Resolution (for 2 weeks)

Conduct thorough Quality Assurance (QA) testing. This involves comprehensive validation of the migrated data, testing business processes, and verifying integrations. This phase also includes dedicated support for identifying and resolving any defects or data discrepancies found.

8

Sign-off in DV Environment

Obtain formal approval from key stakeholders, confirming that data quality and the migration processes have been validated and meet requirements in the Development/Validation environment.

9

Iterative Load Cycles (E2E, MFAT, SIT, UAT, Production)

The process from Step 3 (Dummy data creation/sample pre-validation) onwards is typically repeated across comprehensive load cycles:

  • End-to-End (E2E) Cycle: Testing the full data flow and integrations.
  • Mock Full Acceptance Testing (MFAT): Loading full production values to test performance and scalability.
  • System Integration Testing (SIT): Focused on verifying integrations with other enterprise systems.
  • User Acceptance Testing (UAT): Business users validate the system and data for readiness.
  • Production Load: The final, live migration.

For the production cycle, the testing support transitions into Hypercare, an intensive support period immediately after go-live to address any critical issues that arise.

Salesforce CRM Data Migration Flowchart

1

API Mapping: Source to Salesforce Object/Field Level

Define exactly how data from your old system maps to Salesforce objects (like Accounts, Contacts) and their fields. This clarifies what data moves, how it transforms, and helps estimate the work.

↓
2

Mapping Sheet Sign-off

Get formal approval on the detailed data mapping document. This ensures everyone agrees on the data structure before development begins.

↓
3

Dummy Data Creation OR Sample Data Validation
(Development/Validation Cycle)

Prepare either test data or a small, representative sample of your actual source data. This is used for initial testing in a non-production Salesforce environment (like a Sandbox).

↓
4

DV Cycle Data Loads

Load the prepared dummy or sample data into your Salesforce Development/Validation Sandbox to test the migration scripts and processes.

↓
5

Post Data Load Updates (e.g., Linking Records in Salesforce)

After initial data loads, perform necessary updates, such as linking related Salesforce records (e.g., setting the 'Parent Account' lookup on Account records, or linking Contacts to Accounts).

↓
6

Crosswalk Generation

Create a "crosswalk" file. This is a crucial document that links your original source system IDs to the new Salesforce IDs for each migrated record. It's vital for validation and integrating with other systems.

↓
7

QA Testing & Defect Resolution (2 Weeks)

Thoroughly test the migrated data and processes in the DV environment for about two weeks, identifying and fixing any issues or "defects."

↓
8

Sign-off in Development/Validation (DV) Environment

Once testing is complete and satisfactory, formally approve the data migration in the DV environment, confirming it's ready for higher environments.

↓

9. Incremental Load Cycles & Environments

This entire process (from Step 3 onwards) is repeated for each subsequent environment to ensure data quality and process readiness.

E2E (End-to-End Testing) → MFAT (Mock Full Acceptance Testing) → SIT (System Integration Testing) → UAT (User Acceptance Testing) → Production (Go-Live & Hypercare) ↻

Each cycle loops back to "Step 3: Dummy Data Creation / Sample Data Validation"

↓

End of Migration Process

Once Production data is loaded and validated, the migration is complete. Hypercare refers to intensive support immediately after the production go-live.

04 Objects

Salesforce Objects in Data Migration

Understanding the core Salesforce objects and their relationships is fundamental for successful data migration. Below is a list of common objects involved in CRM data migrations, along with their functional representation and sample API mappings.

1

User (Related Objects: Permission Sets, Profiles, Roles, Groups)

The User object in Salesforce represents an individual who can log in to Salesforce. It stores information about the user, their permissions, and their access levels within the organization. Migrating user data often involves careful consideration of security, roles, profiles, and permission sets to ensure correct access to records and functionalities.

Sample API Mapping: User Object

Source Field Name | Source Data Type | Target Salesforce API Name | Target Salesforce Data Type | Transformation Logic / Business Rules | Notes / Comments
Employee_ID | VARCHAR(50) | FederationIdentifier | Text (External ID) | Direct Map | Used for SSO or external system integration.
First_Name | VARCHAR(100) | FirstName | Text | Direct Map |
Last_Name | VARCHAR(100) | LastName | Text | Direct Map | Required field.
Email_Address | VARCHAR(255) | Email | Email | Validate format; must be unique. | Required for login.
Username | VARCHAR(255) | Username | Email | Direct Map (often same as Email, but must be unique across all Salesforce orgs) | Required for login.
User_Role_Name | VARCHAR(100) | UserRoleId | Lookup(Role) | Lookup Salesforce Role ID based on Role Name. | Roles define data visibility hierarchy.
User_Profile_Name | VARCHAR(100) | ProfileId | Lookup(Profile) | Lookup Salesforce Profile ID based on Profile Name. | Profiles define object/field permissions.
Is_Active | BOOLEAN | IsActive | Boolean | Map 'Yes'/'1' to TRUE, 'No'/'0' to FALSE. | Deactivated users do not count towards licenses.
2

Account

The Account object represents companies, organizations, or individual consumers that you do business with. It is one of the most fundamental objects in Salesforce, serving as the central hub for all related activities, contacts, opportunities, and cases. Proper migration of accounts is critical for maintaining customer relationships and historical data.

Sample API Mapping: Account Object

Source Field Name | Source Data Type | Target Salesforce API Name | Target Salesforce Data Type | Transformation Logic / Business Rules | Notes / Comments
Legacy_Account_ID | VARCHAR(50) | External_ID__c | Text (External ID) | Direct Map | Used for upsert operations and linking child records.
Account_Name | VARCHAR(255) | Name | Text | Direct Map | Required field.
Industry_Type | VARCHAR(50) | Industry | Picklist | Map source industry values to Salesforce picklist values. Default to 'Other' if no match. | Ensure picklist values exist in Salesforce.
Primary_Contact_Email | VARCHAR(255) | Primary_Contact_Email__c | Email | Direct Map, validate email format. | Custom field for primary contact email.
Annual_Revenue_USD | DECIMAL(18,2) | AnnualRevenue | Currency | Direct Map |
Account_Owner_ID | VARCHAR(50) | OwnerId | Lookup(User) | Lookup Salesforce User ID based on Source Owner ID (e.g., Employee_ID). | Ensures correct ownership in Salesforce.
3

Contact

The Contact object represents individuals associated with an Account. These are the people you interact with at companies or organizations. Contacts are linked to Accounts, forming the basis of customer relationship management. Migrating contacts involves linking them correctly to their respective accounts.

Sample API Mapping: Contact Object

Source Field Name | Source Data Type | Target Salesforce API Name | Target Salesforce Data Type | Transformation Logic / Business Rules | Notes / Comments
Legacy_Contact_ID | VARCHAR(50) | Legacy_Contact_ID__c | Text (External ID) | Direct Map | Used for upsert operations and linking.
First_Name | VARCHAR(100) | FirstName | Text | Direct Map |
Last_Name | VARCHAR(100) | LastName | Text | Direct Map | Required field.
Email_Address | VARCHAR(255) | Email | Email | Validate format (e.g., contains '@' and '.'). Clean leading/trailing spaces. | Invalid emails might cause load failures.
Legacy_Account_ID | VARCHAR(50) | AccountId | Lookup(Account) | Lookup Salesforce Account ID using Legacy_Account_ID from Account object's External_ID__c. | Requires Account records to be loaded first.
4

Individual

The Individual object is part of Salesforce's Person Accounts or Privacy and Data Protection features (like GDPR compliance). It represents a natural person and is used to store privacy preferences and other personal data separate from the Contact or Lead object. It allows for a more granular approach to managing individual consent and data rights.

Sample API Mapping: Individual Object

Source Field Name | Source Data Type | Target Salesforce API Name | Target Salesforce Data Type | Transformation Logic / Business Rules | Notes / Comments
Privacy_Consent_ID | VARCHAR(50) | External_ID__c | Text (External ID) | Direct Map | Used for upsert operations.
Contact_Preference | VARCHAR(50) | CanContact__c | Boolean | Map 'Opt-In' to TRUE, 'Opt-Out' to FALSE. | Custom field for contact preference.
Data_Deletion_Request | BOOLEAN | IsDataSubjectRequest__c | Boolean | Direct Map | Indicates a request for data deletion.
Legacy_Contact_ID | VARCHAR(50) | ContactId | Lookup(Contact) | Lookup Salesforce Contact ID using Legacy_Contact_ID from Contact object's Legacy_Contact_ID__c. | Links the Individual record to a Contact (in standard Salesforce, this link is usually modeled the other way, via Contact.IndividualId).
5

Opportunity

The Opportunity object represents a potential sale or pending deal with an Account. It tracks the sales process from lead qualification to closing the deal, including stages, amounts, and close dates. Opportunities are central to sales forecasting and revenue tracking in Salesforce.

Sample API Mapping: Opportunity Object

Source Field Name | Source Data Type | Target Salesforce API Name | Target Salesforce Data Type | Transformation Logic / Business Rules | Notes / Comments
Legacy_Opportunity_ID | VARCHAR(50) | Legacy_Opportunity_ID__c | Text (External ID) | Direct Map | Used for upsert operations and linking.
Deal_Name | VARCHAR(255) | Name | Text | Direct Map | Required field.
Current_Stage | VARCHAR(50) | StageName | Picklist | Map source stage values to Salesforce picklist values. Handle 'Closed Won'/'Closed Lost' appropriately. | Required field.
Expected_Close_Date | DATE | CloseDate | Date | Direct Map | Required field.
Estimated_Amount_USD | DECIMAL(18,2) | Amount | Currency | Direct Map |
Legacy_Account_ID | VARCHAR(50) | AccountId | Lookup(Account) | Lookup Salesforce Account ID using Legacy_Account_ID from Account object's External_ID__c. | Requires Account records to be loaded first.
Opportunity_Owner_ID | VARCHAR(50) | OwnerId | Lookup(User) | Lookup Salesforce User ID based on Source Owner ID. | Ensures correct ownership.
6

Opportunity Split

The Opportunity Split object allows sales teams to allocate credit for an opportunity to multiple sales team members. This is particularly useful in collaborative sales environments where more than one person contributes to closing a deal. It tracks the percentage of revenue or overlay credit each team member receives.

Sample API Mapping: Opportunity Split Object

Source Field Name | Source Data Type | Target Salesforce API Name | Target Salesforce Data Type | Transformation Logic / Business Rules | Notes / Comments
Legacy_Opp_Split_ID | VARCHAR(50) | External_ID__c | Text (External ID) | Direct Map | Used for upsert operations.
Legacy_Opportunity_ID | VARCHAR(50) | OpportunityId | Lookup(Opportunity) | Lookup Salesforce Opportunity ID using Legacy_Opportunity_ID from Opportunity object's Legacy_Opportunity_ID__c. | Required. Opportunity records must be loaded first.
Sales_Rep_ID | VARCHAR(50) | UserId | Lookup(User) | Lookup Salesforce User ID based on Sales_Rep_ID from User object's FederationIdentifier. | Required. User records must be loaded first.
Split_Percentage | DECIMAL(5,2) | SplitPercentage | Percent | Direct Map. Ensure sum of splits for an opportunity equals 100% (for Revenue splits). | Required.
Split_Type | VARCHAR(50) | SplitTypeId | Lookup(SplitType) | Lookup Salesforce Split Type ID (e.g., 'Revenue', 'Overlay'). | Requires SplitType setup in Salesforce.
7

Opportunity Split Asset

The Opportunity Split Asset object is typically a custom object or part of a specialized Salesforce product (like Revenue Cloud or specific CPQ implementations) used to track how revenue from specific assets (e.g., subscriptions, products) within an opportunity is split among team members. This provides a more granular view of revenue attribution than the standard Opportunity Split.

Sample API Mapping: Opportunity Split Asset Object

Source Field Name | Source Data Type | Target Salesforce API Name | Target Salesforce Data Type | Transformation Logic / Business Rules | Notes / Comments
Legacy_Split_Asset_ID | VARCHAR(50) | External_ID__c | Text (External ID) | Direct Map | Used for upsert operations.
Legacy_Opportunity_ID | VARCHAR(50) | OpportunityId | Lookup(Opportunity) | Lookup Salesforce Opportunity ID. | Required.
Legacy_Product_Asset_ID | VARCHAR(50) | AssetId | Lookup(Asset) | Lookup Salesforce Asset ID. | Requires Asset records to be loaded or created.
Sales_Rep_ID | VARCHAR(50) | UserId | Lookup(User) | Lookup Salesforce User ID. | Required.
Asset_Split_Percentage | DECIMAL(5,2) | Split_Percentage__c | Percent | Direct Map. | Custom field.
8

Opportunity Split Alliance

The Opportunity Split Alliance object is also typically a custom object, often used in scenarios where multiple partners or alliances contribute to an opportunity, and their respective revenue or credit shares need to be tracked. This is common in channel sales or complex enterprise deals involving various stakeholders. It allows for tracking splits beyond just internal sales team members.

Sample API Mapping: Opportunity Split Alliance Object

Source Field Name | Source Data Type | Target Salesforce API Name | Target Salesforce Data Type | Transformation Logic / Business Rules | Notes / Comments
Legacy_Split_Alliance_ID | VARCHAR(50) | External_ID__c | Text (External ID) | Direct Map | Used for upsert operations.
Legacy_Opportunity_ID | VARCHAR(50) | OpportunityId | Lookup(Opportunity) | Lookup Salesforce Opportunity ID. | Required.
Partner_ID | VARCHAR(50) | PartnerAccountId | Lookup(Account) | Lookup Salesforce Account ID for the Partner. | Requires Partner Accounts to be loaded.
Alliance_Split_Percentage | DECIMAL(5,2) | Split_Percentage__c | Percent | Direct Map. | Custom field.
Alliance_Role | VARCHAR(50) | Role__c | Picklist | Map source role to Salesforce picklist values (e.g., 'Referral', 'Co-Sell'). | Custom field.
05 Data Types

Salesforce CRM Data Types

Understanding the various data types available in Salesforce CRM is crucial for accurate data mapping and successful migration. Each data type has specific characteristics, limitations, and implications for how data is stored and displayed.

5.1. Text Field (Open Text)

An Open Text Field in Salesforce allows users to input any free-form text up to a defined character limit. These are versatile fields used for names, codes, short descriptions, or any data that doesn't fit a more structured type.

  • Text (255): The most common text field, supporting up to 255 characters.
  • Text Area: Supports up to 255 characters across multiple lines.
  • Text Area (Long): Extends the limit significantly, typically up to 131,072 characters (128 KB). These are commonly used for detailed descriptions or user-generated input where substantial text is expected.
  • Text Area (Rich): Similar to Text Area (Long) but allows rich text formatting (bold, italics, lists, etc.).

5.2. Lookup Field

A Lookup Field establishes a relationship between two objects, linking a record of one object to a record of another object. It's a "many-to-one" relationship, where multiple child records can relate to a single parent record.

  • Functionality: It allows users to search for and select a record from the related object. For example, an Opportunity record might have a Lookup field to the Account object (e.g., `AccountId`) or to the User object (e.g., `OwnerId` or a custom `Partner_User__c`).
  • Mandatory vs. Optional: While lookup fields define a relationship, they are not inherently mandatory. Their mandatory nature depends on business rules or field properties configured in Salesforce. If a lookup field is marked as "Required" or enforced by a validation rule, it must contain a valid reference to a parent record during data insertion.
  • Migration Impact: During data migration, lookup fields require careful handling. Parent records must be loaded before their child records, and the Salesforce ID of the parent record must be available to populate the lookup field in the child record.
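The lookup-resolution step described above can be sketched as follows. This is an assumed helper, not a Salesforce API: in practice the `id_map` would be built from the parent load's success file (legacy external ID → new Salesforce record ID).

```python
# Hypothetical sketch: resolve a legacy parent reference to the Salesforce
# record ID captured when the parent was loaded.
def resolve_lookup(legacy_id, id_map, required=False):
    """Return the Salesforce ID for a legacy reference, or None.

    Raises if a required lookup cannot be resolved, mirroring how a
    'Required' lookup field would reject the row at load time.
    """
    sf_id = id_map.get(legacy_id)
    if sf_id is None and required:
        raise ValueError(f"Unresolved required lookup: {legacy_id}")
    return sf_id
```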

5.3. Master-Detail Relationship Field

A Master-Detail Relationship is a tighter, stronger relationship between two objects compared to a Lookup. It creates a parent-child dependency where the detail (child) record cannot exist without the master (parent) record.

  • Dependency: Deleting the master record automatically deletes all related detail records (cascading delete).
  • Security & Sharing: The detail record inherits the security and sharing settings of its master record.
  • Roll-Up Summary Fields: Master-Detail relationships allow for the creation of roll-up summary fields on the master object, which can aggregate data from the detail records (e.g., sum of all opportunity amounts on an Account).
  • Required Field: The Master-Detail relationship field on the detail object is always required.
  • Migration Impact: Similar to lookup fields, master records must be loaded before detail records. The strict dependency means data integrity is paramount.
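Deriving a parents-first load sequence can be automated with a topological sort. The object dependencies below are an illustrative assumption, not a definitive Salesforce object model.

```python
from graphlib import TopologicalSorter  # Python 3.9+ stdlib

# Hypothetical sketch: order objects so master/parent objects load before
# their detail/child objects. Dependencies here are assumed examples.
DEPENDS_ON = {
    "Account": [],
    "Contact": ["Account"],
    "Opportunity": ["Account"],
    "Opportunity_Split__c": ["Opportunity", "Account"],
}

def load_order(dependencies):
    """Return object names ordered parents-first."""
    return list(TopologicalSorter(dependencies).static_order())
```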

5.4. Picklist (Normal)

A Normal Picklist provides a drop-down list of predefined values for a field. This helps in controlling data quality, ensuring consistency, and simplifying data entry for users by offering a specific set of values.

  • Controlled Vocabulary: Ensures that users select from a standard list, preventing free-text entry that could lead to inconsistencies.
  • Global Picklists: Values can be shared and managed across multiple objects and fields, promoting standardization.
  • Migration Impact: Source values must be accurately mapped to the predefined picklist values in Salesforce. Any source values not present in the Salesforce picklist will cause load failures unless handled by transformation rules (e.g., mapping to a default value or 'Other').
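The transformation rule mentioned above, mapping unmatched source values to a default such as 'Other', can be sketched like this. The allowed values and synonym table are assumptions for illustration.

```python
# Hypothetical sketch: normalise free-form source values onto the target
# picklist, falling back to a default rather than failing the load.
ALLOWED = {"Prospect", "Customer", "Partner", "Other"}   # assumed picklist
SYNONYMS = {"client": "Customer", "reseller": "Partner"} # assumed mappings

def map_picklist(source_value, allowed=ALLOWED, synonyms=SYNONYMS,
                 default="Other"):
    """Return a valid picklist value for any source value."""
    value = source_value.strip()
    if value in allowed:
        return value
    return synonyms.get(value.lower(), default)
```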

5.5. Dependent Picklist

A Dependent Picklist is a picklist whose available values are filtered based on the value selected in another controlling field (which can be another picklist or a checkbox). This is used to define hierarchical or conditional relationships between values to improve the user interface.

  • Hierarchical Control: For example, selecting a 'Country' (controlling field) might filter the available 'State/Province' options (dependent picklist). This helps in guiding logical data entry.
  • UI Experience: Enhances the user experience by dynamically adjusting available options based on previous selections.
  • Migration Complexity:

    Data Mapping: Requires careful mapping of both controlling and dependent field values.

    Extraction of Values: Historically, extracting the full matrix of dependent picklist values for data mapping has been complex. Some Salesforce extensions and tools have offered streamlined download options in the past, but these are often deprecated or require specific technical expertise.

    Technical Activity: Generating accurate dependent picklist mappings may require a dedicated "tech debt activity" from the development team, involving programmatic extraction (e.g., Apex or API calls) or specialized tools, because direct UI downloads are not always comprehensive or practical for bulk operations. This step is crucial for data integrity during migration: incorrect dependent picklist values lead to load failures.
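Once the dependency matrix has been extracted, each controlling/dependent pair can be validated before loading. The country/state matrix below is an assumed example of such an extract.

```python
# Hypothetical sketch: validate controlling/dependent picklist pairs
# against an extracted dependency matrix before loading.
DEPENDENCY_MATRIX = {  # controlling value -> allowed dependent values (assumed)
    "USA": {"California", "Texas"},
    "India": {"Karnataka", "Maharashtra"},
}

def validate_dependent(controlling, dependent, matrix=DEPENDENCY_MATRIX):
    """Return True if the dependent value is valid for the controlling value."""
    return dependent in matrix.get(controlling, set())
```

Rows failing this check would be routed back to data mapping rather than submitted, since they would otherwise fail at load time.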

06 Testing

Data Migration Testing Cycles

For any Salesforce data migration project, a phased approach involving multiple testing cycles is highly recommended. Each cycle builds upon the previous one, adding more rigor and scope to ensure a robust and successful migration.

6.1. Data Validation Cycle (Unit Testing / Sample Load)

  • Purpose: To verify the correctness of data types, API name mappings, and basic transformations on a small subset of data.
  • Environment: Typically performed in a partial copy sandbox or developer sandbox.
  • Scope: Focuses on a few sample records, not the complete data volume. This helps quickly identify and rectify initial mapping or transformation errors.

6.2. End-to-End Validation Cycle (Integration Testing)

  • Purpose: To ensure that data flows correctly across all integrated systems (both downstream and upstream applications) after migration. This verifies that APIs, integrations, and access permissions are correctly configured and that data flow is not impacted.
  • Environment: Usually conducted in a full sandbox or UAT (User Acceptance Testing) environment that closely mirrors production.
  • Scope: Involves larger data sets and testing of end-to-end business processes that rely on the migrated data.

6.3. MFAT Cycle (Mock Full Acceptance Testing / Performance Testing)

  • Purpose: MFAT (Mock Full Acceptance Testing) involves testing the data migration process with the exact production volume of data. This cycle aims to confirm that the Salesforce environment can support the full data push activity without performance degradation or unexpected failures.
  • Environment: A dedicated full sandbox or a performance testing environment.
  • Scope: The primary goal is to assess the time taken to push the entire production environment data, understand throughput, identify bottlenecks, and ensure the process can complete within acceptable timeframes. This is crucial for planning the actual production cutover window.

The entire data migration process, from initial data validation (DV) through End-to-End (E2E), Mock Full Acceptance Testing (MFAT), System Integration Testing (SIT), and User Acceptance Testing (UAT), is an iterative one. Steps like data preparation, loading, validation, and defect resolution are repeated for each cycle, with increasing data volumes and complexity. The final production cutover is followed by an intensive Hypercare period, providing immediate and critical support to ensure business continuity after go-live.
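A recurring validation step across all of these cycles is count reconciliation between the source extract and the loaded target. The sketch below is a minimal, assumed form of such a check; real reconciliations typically also compare field-level checksums.

```python
# Hypothetical sketch: per-object record-count reconciliation run after
# each load cycle, returning mismatches for defect triage.
def reconcile(source_counts, target_counts):
    """Return {object: (source, target)} for every mismatched object."""
    objects = set(source_counts) | set(target_counts)
    return {
        obj: (source_counts.get(obj, 0), target_counts.get(obj, 0))
        for obj in objects
        if source_counts.get(obj, 0) != target_counts.get(obj, 0)
    }
```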