Are you saving the CatalogEntryDto one record at a time, or batching them in groups of 50 or so?
One at a time. Using this code as basis:
http://sdk.episerver.com/commerce/1.1.2/Content/Developers%20Guide/Catalog%20System/Catalog%20How%20To%20Code%20Samples/Creating%20Catalog%20Entries.htm
So you can batch them in groups?
How are you handling updates to a single catalog entry if it's the same as before? I'm guessing that most of your entries will be unchanged?
Normally, being able to make a delta update of only the actual changes will probably speed things up. That, at least, took an import job I had from 45 min down to 50 s. (It was a bet to get it below 1 min within a working day... I won :))
Reading and comparing is normally cheap, storing is expensive. Might be an idea?
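A minimal sketch of that "compare first, store only the changes" idea, assuming you can keep a fingerprint per entry code somewhere between runs. RemoteProduct, DeltaFilter and the fields used here are hypothetical names for illustration, not part of the Mediachase API:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Hypothetical shape of one product as delivered by the remote web service.
public class RemoteProduct
{
    public string Code;
    public string Name;
    public decimal Price;
}

public static class DeltaFilter
{
    // Fingerprint of the imported fields; an unchanged product yields the same value.
    public static string Fingerprint(RemoteProduct p)
    {
        string raw = p.Code + "|" + p.Name + "|" + p.Price.ToString(CultureInfo.InvariantCulture);
        using (SHA256 sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(raw)));
        }
    }

    // Returns only new or changed products and records their fingerprints.
    // In a real job you would probably record the fingerprint only after the save succeeded.
    public static List<RemoteProduct> SelectChanged(
        IEnumerable<RemoteProduct> incoming, IDictionary<string, string> storedFingerprints)
    {
        var changed = new List<RemoteProduct>();
        foreach (RemoteProduct p in incoming)
        {
            string fingerprint = Fingerprint(p);
            string previous;
            if (!storedFingerprints.TryGetValue(p.Code, out previous) || previous != fingerprint)
            {
                changed.Add(p);
                storedFingerprints[p.Code] = fingerprint;
            }
        }
        return changed;
    }
}
```

Only the products returned by SelectChanged would then go through the expensive save; everything else is skipped after a cheap comparison.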
We experienced this with two of our clients. It is really very slow to do this using the Mediachase classes. But with the help of EPiServer Support we developed a scheduled job that runs an SSIS package in the background and imports data using custom stored procedures. One of the clients has more than 10000 items and we are still having a tough time rebuilding indexes for them; with 10000+ items it was taking more than an hour for us.
Yup, that seems like the second variant of a solution: batch things directly into the database. I did that to improve performance for an AD import once. It's a bit risky though, and upgrades to the system will likely be tougher, so probably only use that as a last resort...
@Daniel
Well, your idea was up for discussion before we implemented the current solution. Our current solution just deletes all entries and imports them again. The reason for this is that we also need to delete products from the catalog if they are removed from the remote catalog. So the easiest way was just to delete all and create them from scratch each time.
@K Khan
Well, our import takes 1½ hours for 18000 entries (products and variants). But we now face the problem of who will pay for changes to the import functionality. The best situation would be if we can somehow speed up the existing import, although I am pretty pessimistic about this :)
What if you import to a new catalog (invisible to users until the import is done) while keeping the old one during the import? When the import is done you switch to the new catalog and delete the old one. The import job will still take the same time, but downtime for users will be kept to a minimum...
Kind of like how search engines update their index...They keep their old until the new is ready to use, then switch...
I am sorry, but I really did not understand how the import is causing downtime and how visitors are affected by the import process.
I assume:
Product listing pages are displaying results from some search provider (e.g. Lucene search or Apptus). (The index will remain the same until you rebuild it, so it does not matter what you are importing.)
Category pages/product pages are accessing data from the DB. (Users will be able to see whatever data is available at that time. If an item has been removed in the new import at that time, it will require handling.)
Well, it is causing downtime. We delete all entries from the catalog node before creating them again. So the results from the index (in our case Solr) cannot be paired to entries, and therefore no results are shown.
In the standard setup (enoteca) the product listing page displays results in the form of entries, which means that the results from the index are paired with entries from the catalog db.
Well, I don't think we should do this unless we have some very special requirement.
Drawbacks if you are deleting items:
If some item has been ordered and you have deleted it, it will not be available in the order history.
Never delete any item once it has been added to the catalog. Rather, use the IsVisible field on the website to decide whether it is available to show or not.
You can use the 'Update' action for existing entries and the 'Insert' action for new ones.
If you really want to delete entries before inserting and have a very solid reason, then yes, to avoid downtime you have to find a solution like the one you proposed before. That can bring different and more issues.
Yeah, keeping track of changes and only updating / deleting what you need to is probably your best option.
Both for performance and to avoid killing all the references.
Sorry :)
Orderlines and entries are related only by entry code (no direct FK relations). So Orderhistory is intact (of course it should be)
You can easily get the entry by code if you maintain the same entry code. The only thing is that you cannot see detailed product info (but why would you, as the product no longer exists) if the entry was deleted.
There can be more impacts. If some entry is deleted and not inserted again due to any failure, that would mean you have lost all of your related references to that product code.
It looks to me like it will affect DB performance also. We are reindexing an item again. You will have to maintain the DB. Keep an eye on page sizes also.
Still, if you are convinced that you will delete items before inserting, then to reduce the downtime with the same DB you could structure your CSV as
Delete Action for Item 1
Insert Action for Item 1
Delete Action for Item 2
Insert Action for Item 2
First off, you will not lose any references if you create them again with the same entry code. If you need a complete reference for your order history, of course you can't delete the entry then. But we have no such need; the orderline description and price are sufficient for our order history. Remember that the OrderSystem is an independent system from the Catalog system, with no direct relations (my database developer genes actually oppose this, but I kind of like it in this situation).
And Khan, we are not using CSV import. We fetch the entire catalog from a webservice, and then use the API to create the entries. I guess if we used the CSV import we would not have the performance problems?!?!?! (Which would mean we should develop creation of CSV files, instead of the already existing object mapping we have. Not to mention the import of assets and CSV not working that well, right?)
My problem is that the remote catalog will not report any products as DELETED. If products in the remote catalog are deleted, they are simply not transferred.
The products are, however, somewhat categorized. So a possible solution could be to handle the separate categories, fetch them from the db first and check if they are in the bundle from the remote catalog. But this kind of reversed logic is in my opinion more complex and prone to error, and should be avoided.
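For what it's worth, the "reversed" check can be fairly small if it is reduced to a set difference on entry codes. A rough sketch, assuming you can list the codes currently in the catalog (per category or for the whole catalog) and the codes in the remote bundle; RemovedEntries and FindRemoved are hypothetical names, and treating codes as case-insensitive is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RemovedEntries
{
    // Returns the codes that exist locally but are missing from the remote bundle.
    public static List<string> FindRemoved(IEnumerable<string> localCodes, IEnumerable<string> remoteCodes)
    {
        // Assumption: entry codes are compared case-insensitively.
        var remote = new HashSet<string>(remoteCodes, StringComparer.OrdinalIgnoreCase);
        return localCodes.Where(code => !remote.Contains(code)).ToList();
    }
}
```

The resulting list is what would need to be deleted (or deactivated) after the import, while everything else stays in place and can be updated.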
I would rather like Mark (from MediaChase) to elaborate on the grouping (hope its not csv he is talking about))?
In my experience, unfortunately, CSV import will take the same time or 1-5 minutes more than making API calls from some other application.
There are a couple of different ways to get a CatalogEntryDto with multiple entries; I am just showing one way you could get the entries. I am also only showing the CatalogEntryRow. You would have to do something similar for the other related tables.
This will add or update the records. The nice thing about this is that it will not save them all, just the ones that have been modified or added. Since this is just a typed dataset, it will look at the row state to determine whether to update or add. This also does not handle deletes. You could easily write a routine to search for all the ones you need to delete.
If you want to keep doing it the way you currently are, you would just have a for loop with the batch size. Then instead of doing entry.CatalogEntry[0].CatalogEntryId you would do entry.CatalogEntry[i].CatalogEntryId.
Also, if you need to add/update metafields and want to save in batch, you would need to do that after the routine, because the metaobject requires a valid CatalogEntryId. I sometimes update the metafields on updates and then, for new entries, add them after saving the dto. Anyway, let me know if you have any further questions.
// Needs System, System.Data (DataRowState) and System.Linq (Select, FirstOrDefault)
// in addition to the Mediachase.Commerce namespaces.
string[] codes = { "a", "b", "c", "d", "e", "f", "g" };
CatalogSearchParameters parameters = new CatalogSearchParameters()
{
    // Quote each code for the SQL IN clause.
    SqlWhereClause = String.Format("Code in ({0})",
        String.Join(",", codes.Select(c => "'" + c + "'").ToArray()))
};
CatalogSearchOptions options = new CatalogSearchOptions()
{
    RecordsToRetrieve = 25,
    ReturnTotalCount = true,
    StartingRecord = 0
};
int count = 0;
// Load all existing entries for this batch of codes in one call.
CatalogEntryDto entries = CatalogContext.Current.FindItemsDto(parameters, options, ref count,
    new Mediachase.Commerce.Catalog.Managers.CatalogEntryResponseGroup(Mediachase.Commerce.Catalog.Managers.CatalogEntryResponseGroup.ResponseGroup.CatalogEntryFull));
foreach (string code in codes)
{
    // Compare against the unquoted code; the quoted form is only used in the SQL clause.
    CatalogEntryDto.CatalogEntryRow entryRow = entries.CatalogEntry.FirstOrDefault(x => x.Code.Equals(code));
    if (entryRow == null)
    {
        // Not found: create a new row (CatalogId and MetaClassId are example values).
        entryRow = entries.CatalogEntry.NewCatalogEntryRow();
        entryRow.ApplicationId = AppContext.Current.ApplicationId;
        entryRow.CatalogId = 1;
        entryRow.ClassTypeId = "Variation";
        entryRow.Code = code;
        entryRow.MetaClassId = 23;
        entryRow.SetSerializedDataNull();
    }
    // Fields set for both new and existing rows (placeholder values).
    entryRow.EndDate = DateTime.Now.AddYears(2).ToUniversalTime();
    entryRow.IsActive = true;
    entryRow.Name = "PRODUCTNAME";
    entryRow.StartDate = DateTime.UtcNow;
    entryRow.TemplateName = "DigitalCameraTemplate";
    // New rows are detached until they are added to the table.
    if (entryRow.RowState == DataRowState.Detached)
        entries.CatalogEntry.AddCatalogEntryRow(entryRow);
}
// One save for the whole batch; only added or modified rows are written.
CatalogContext.Current.SaveCatalogEntry(entries);
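The "for loop with the batch size" could be driven by a small helper like the one below. This is only a sketch: Batching and InBatches are hypothetical names, and the batch size you pass in is something to tune yourself. Each batch of codes would then go through one find/update/save round like the code above.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits a sequence into consecutive lists of at most batchSize items.
    // The last batch may be smaller; an empty source yields no batches.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");
        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch;
    }
}
```

Usage would be along the lines of `foreach (List<string> batch in Batching.InBatches(allCodes, 50)) { ... }`, where allCodes is your full list of entry codes from the web service.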
We have developed an EPiServer scheduled plugin that handles a nightly import of products / variants from a 3rd party system (via some REST webservices).
But inserting each entry into the catalog is relatively slow. It takes about 30 minutes to insert 6000 items, which is a lot.
Anyone with experience into this?