Copy Production data to Developer Sandbox
I'm on Enterprise Edition and only have a developer sandbox... (I really miss my Full Copy sandbox and UE).
I need to do some tests against "real" data. I have a strict rule of not testing or developing in Production. This means I need to get a sample of my Real data in production into my sandbox.
Is there any way to do this without doing a bunch of exports, imports, and additional reports/imports to get the relationships right?
I have a few ETL scripts I built using Talend Open Studio to move data over.
Using a tool like Talend (which is free and has SFDC connectors), Informatica Cloud, or Boomi, you can create scripts to pull data from your Production Org and populate your Sandbox Org. Of course, depending on your sandbox you'll hit size limits, so you need to filter your data pulls. Not a big deal.
This approach also requires Enterprise Edition as you will need API access.
You can build your scripts to properly create the relationships.
Step 1 - Upsert Accounts. Store the SFDC Production ID in an external ID field, say "PROD_ID__c".
Step 2 - Upsert Contacts. Store the SFDC Production ID in an external ID field, say "PROD_ID__c". Use AccountId on the Contact as the external ID in the script to automatically relate the parent/child objects.
Step 3 - Continue the pattern for the remaining objects.
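The steps above can be sketched in plain Python (no API calls; the record shaping is the same whether the load itself runs through Talend, Data Loader, or the Bulk API). The field name PROD_ID__c is the example external ID from the steps; the exact relationship-column syntax ("Account.PROD_ID__c" here) varies by tool, so treat it as illustrative:

```python
# Sketch: shaping production records for an external-ID upsert into a sandbox.
# PROD_ID__c is the external-ID field from the steps above (an assumption of
# this example, not a standard field).

def to_account_upserts(prod_accounts):
    """Step 1: carry the production Id in PROD_ID__c so each sandbox
    record remembers which production record it came from."""
    return [{"Name": a["Name"], "PROD_ID__c": a["Id"]} for a in prod_accounts]

def to_contact_upserts(prod_contacts):
    """Step 2: the production AccountId is meaningless in the sandbox,
    so reference the parent by its external ID instead; the load tool
    then re-links parent and child automatically."""
    return [
        {
            "LastName": c["LastName"],
            "PROD_ID__c": c["Id"],
            # relate to the parent via its external ID, not its sandbox Id
            "Account.PROD_ID__c": c["AccountId"],
        }
        for c in prod_contacts
    ]

accounts = [{"Id": "001000000000001", "Name": "Acme"}]
contacts = [{"Id": "003000000000001", "LastName": "Smith",
             "AccountId": "001000000000001"}]

acct_rows = to_account_upserts(accounts)
cont_rows = to_contact_upserts(contacts)
```

Because the Contact row carries the parent's external ID rather than a hard-coded sandbox Id, re-running the same script after a sandbox refresh still wires the relationships up correctly.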
Pros of the ETL approach: Repeatable, and saves you tons of time on sandbox refreshes. Less error-prone, as you're not manually running Data Loader / reports. Low cost if you use open source or Jitterbit, etc. You can automate this to keep data in sync as necessary via scheduling services (Apple Automator, Windows Task Scheduler, etc.)
Cons: Requires some up-front time and effort to build out the ETL scripts. If you modify the fields in Production, you need to update them in Sandbox. You need to know your data model. It may sound obvious, but if you have dependencies across multiple objects, you need to build your script so that it loads the object records in an order that makes sense.
Thanks for your suggestion. I've been trying Talend for a few days now and I can only say **I like it a lot**! It seems that you already have some experience with Salesforce and Talend. I've run into some challenges and maybe you could point me in the right direction: http://salesforce.stackexchange.com/questions/47046/copy-opportunities-with-lineitems-with-talend and http://salesforce.stackexchange.com/questions/46939/datamigration-salesforce-to-salesforce-with-talend-best-practice-for-self-refe
Recently, I began using Jitterbit's data loader as a replacement for the official Salesforce version. I found it much better for this type of scenario because of its handling of complex logins, saved source/target configurations, and scheduled actions. Short of an ETL tool or the more complex suggestions above, this seems to be a decent option. And it's free!
You could also look into Monarch, a product by DreamFactory, the folks who do Snapshot.
We use them and it is extremely easy. We have a checkbox on the Account object hidden to all non-sys admin users and then we populate 100 Account Records and then Monarch helps us pull all the dependencies right down to Assets, Service Contracts etc. We use this to import (again using Monarch) to all our Dev sandboxes.
This can be done, for free, with open-source ETL tools, for example Talend.
I've just written a lengthy step-by-step example of how to do this with Talend on my blog:
This covers all the details, including how to use External IDs to make sure relationships between objects are preserved as you copy the data from Source to Target.
Thanks for your example, it helped me a lot! It seems that you already have some experience with Salesforce and Talend. I've run into some challenges and maybe you could point me in the right direction: http://salesforce.stackexchange.com/questions/47046/copy-opportunities-with-lineitems-with-talend and http://salesforce.stackexchange.com/questions/46939/datamigration-salesforce-to-salesforce-with-talend-best-practice-for-self-refe
I have addressed this requirement before using reflection and the SF API. Through an external script, you can use reflection to get the schema and query the data from an object. By setting a recursion level, you can have the script pull related data from production and insert it into the sandbox in reverse order.
There are a lot of hand-coded exceptions that have to be built in, as some data cannot be transferred and other values need to be referenced by name, since IDs will be different (record types and the like).
I implemented my approach in Python, but you can have your pick of the litter. http://www.salesforce.com/us/developer/docs/api/Content/sforce_api_calls_describesobjects.htm is the reference API doc for the reflection call.
This will return a bunch of info on whether each field is writable, its type, etc. Selecting a record by ID afterwards will give you the available data. Put those two things together with some code to recurse over foreign key fields and you're off to the races =)
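The core of the "recurse over foreign keys" idea is just ordering the load so parents exist before children. Here is a self-contained sketch: the schema dict is a hand-made stand-in for what describeSObjects would return (real responses carry far more metadata); only the lookup relationships matter for computing the load order.

```python
# Stand-in for the describe result: each object maps to the objects its
# required lookup fields point at. This dict is illustrative, not a real
# API response.
schema = {
    "Account": [],                          # no required parents
    "Contact": ["Account"],                 # Contact.AccountId -> Account
    "Opportunity": ["Account"],
    "OpportunityLineItem": ["Opportunity"],
}

def load_order(schema):
    """Return objects parents-first (a depth-first topological sort), so
    every record's lookups already exist in the target when it loads."""
    ordered, seen = [], set()

    def visit(obj):
        if obj in seen:
            return
        seen.add(obj)
        for parent in schema[obj]:
            visit(parent)      # parents are appended before children
        ordered.append(obj)

    for obj in schema:
        visit(obj)
    return ordered

order = load_order(schema)
```

In a real script you would build the schema dict from the describeSObjects response (each field's type and referenceTo attributes), and you would also need cycle handling for self-references, which this sketch omits.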
I'm not familiar with reflection. Do you have any recommended sources with examples?
depends on the language, I would suggest looking at http://www.pocketsoap.com/beatbox/ if you are familiar with python. That library has a method called describeSObjects which converts to the API call http://www.salesforce.com/us/developer/docs/api/Content/sforce_api_calls_describesobjects.htm
We did some R&D on this topic and built a tool like this to copy and keep in sync the data between sandboxes and production. We use a self-written Java tool that copies the production data to a MySQL database, and then from that database to a sandbox you specify. In the tool you can specify which objects you want to sync to your sandbox, and so on. Now we are trying to create a subset of data to load into a 10 MB sandbox environment. So we are still improving the tool, but it already has quite some capabilities. If you are interested in such a tool, let me know.
Yes, we are, for particular cases. We upgraded the tool so that we now use Heroku to store the sync mapping between records, not the complete actual data. You can select an object and it will take all related records. That way we can move a test subset of data. We currently use a bottom-up approach, but we are still looking at a top-down approach as well.
It appears that as of Winter '14 Salesforce will be offering something like this: https://success.salesforce.com/ideaView?id=08730000000BrKsAAK
In Winter '14 we will be offering a new product that will allow you to copy a sampled subset of production data based on what is defined in a sandbox template.
For each object selected to be copied, we will copy up to 10,000 records. The storage limit for this sandbox will be 5GB and the refresh interval will be 5 Days. The data sampling does take into account required foreign key relationships.
For example, if the sandbox template you define contains Accounts and Contacts, then, when applied to a partial sandbox creation, we will copy up to 10,000 account records and up to 10,000 related Contacts.
the new documentation says this:
Partial Data sandboxes include all of your organization’s metadata and add a selected amount of your production organization's data that you define using a sandbox template. A Partial Data sandbox is a Developer sandbox plus the data you define in a sandbox template. It includes the reports, dashboards, price books, products, apps, and customizations under Setup (including all of your metadata). Additionally, as defined by your sandbox template, Partial Data sandboxes can include your organization's standard and custom object records, documents, and attachments up to 5 GB of data and a maximum of 10,000 records per selected object. A Partial Data sandbox is smaller than a Full sandbox and has a shorter refresh interval. You can refresh a Partial Data sandbox every 5 days.
Note: Partial Data sandboxes have additional licensing costs associated with them. Your mileage will vary based on your Account Executive, the size of your account with SFDC, etc. If you use an open-source ETL tool often, this is cheaper than the licensing costs. If you don't have skilled resources in-house, then spending more on Partial Data sandbox licenses may be the better option.
Try Informatica Cloud Test Data Management (TDM). You simply choose the objects you're interested in moving, and TDM figures out what related objects to bring over. You can also choose groupings of objects as defined by SFDC, like Sales, Marketing, Case, Process, etc. You can find more at: http://www.informaticacloud.com/cloud-tdm
If you want to see more you can just fill out the Contact us link and someone will get back to you.
I've written an open-source Windows app to do this. It's called SandboxBerry and it's on GitHub:
This is how SandboxBerry works:
- You give it a list of Salesforce objects (in the form of an XML file)
- optionally you can specify filters for those objects in SOQL syntax
- SandboxBerry will then transfer the data for those objects from your Source (usually Production) to a Target (always a Sandbox)
- While transferring the data, it fixes the following problems:
- Relationships between objects will automatically be preserved, substituting new Ids for the old Ids
- Any inactive or missing users will be defaulted to the user specified by you
- Only the object fields that are present in both Source and Target will be transferred
- Optionally, you can skip specific fields or hard-code their values in the Target - this can be useful for anonymisation
It's a tool aimed at developers, so you'll need to be happy editing XML files and knowing the API names of Salesforce objects, but hopefully it will be useful for people.
You can download the built version from the Github Releases section.
The SFDX CLI can download and upsert:
```
sfdx force:data:soql:query -q "SELECT Name FROM Blog__c" -r csv -u prod > data.csv
sfdx force:data:bulk:upsert -s Blog__c -f data.csv -i Name -u sandbox
```
The object must have a field with unique values for all rows — ideally one with a unique constraint or marked as an external ID, though that's not required.
It also supports lookup fields, as long as the lookup object has an external ID field. When selecting the data, use
SELECT Parent.My_External_Id__c, ... FROM ...