I'm looking for a way to find duplicate images in Episerver that have different names. Does anybody know if we can do this without doing too much processing of the files in memory?
A bit of context: we currently have images for each product. Some images show multiple products, but the same image is uploaded separately for each product.
We would like to display related products and their images too. The problem is that most of the related products already have the same image, but uploaded separately - which is basically a different image in Epi.
Is there a way to find the duplicate images?
If the Episerver image upload is idempotent for the image file, I guess you could compare the sizes and (most probably) find the duplicates without any error margin :D
I think the images will almost all have the same sizes, as they are already formatted.
I was thinking of a hash or something that should be unique, but that could take a lot of processing. Is this what you had in mind? If not, can you explain a little further?
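To make the size-plus-hash idea concrete, here is a minimal sketch in Python rather than Episerver code. It assumes the images can be read as plain files on disk and that "size" means file size in bytes; in Episerver you would read the media blob stream instead. The names md5_of_file and find_duplicates are made up for the illustration. The file is hashed in chunks, so a whole image is never held in memory, and only files that share a size get hashed at all.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=64 * 1024):
    """Hash a file in fixed-size chunks so the whole image is never in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose file contents have the same size and MD5."""
    # Cheap first pass: files with different byte sizes cannot be identical.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        # Expensive second pass, only for files that share a size.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[md5_of_file(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

If most images really do share the same byte size, the first pass filters out very little and nearly every file ends up hashed once, which is still a single sequential read per file.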
We could also run a job to find them, but then again I'm not sure how to store the updated data, since we don't want to lose the initial names and paths of the images.
I'm thinking of a new property on the media that gets updated after the job - not sure about the impact on performance.
I would do it with an MD5 checksum of the file.
You could calculate this on the published event when an image is uploaded and then save it on the image file's content data in a new property.
To check whether duplicates exist, you could run a scheduled job that saves the result as a custom report.
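As a rough sketch of what that scheduled job would do once every image has a stored checksum (Python again, not Episerver code): group the images by checksum and report every group with more than one entry. The (name, path, checksum) tuple shape and the function name duplicate_report are assumptions made up for the illustration. The original names and paths are left untouched, the report only lists them.

```python
from collections import defaultdict


def duplicate_report(images):
    """Group images by stored checksum and keep only groups with duplicates.

    `images` is an iterable of (name, path, checksum) tuples; the checksum is
    assumed to have been saved on the image when it was uploaded.
    """
    by_checksum = defaultdict(list)
    for name, path, checksum in images:
        by_checksum[checksum].append((name, path))
    # A checksum seen only once has no duplicate, so leave it out of the report.
    return {c: items for c, items in by_checksum.items() if len(items) > 1}
```

The job then only compares short strings and never has to open the image files again.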
Run the images through a Vision API or similar service to also find close matches between them :)