Find duplicate images

Hi,

I'm looking for a way to find duplicate images in Episerver that have different names. Does anybody know if we can do this without doing too much processing of the files in memory?

A bit of context: we currently have images for each product. Some images show multiple products, but each image is uploaded separately for each product.

We would like to display related products and their images too. The problem is that most of the related products already have the same image, but uploaded separately, which makes it a different image as far as Epi is concerned.

Is there a way to find the duplicate images?

Thank you,

Sergiu

#207866
Oct 07, 2019 15:43
If the Episerver image upload is idempotent for the image file, I guess you could compare the sizes and (most probably) find duplicates without any error margin :D

#207867
Oct 07, 2019 15:54
I think almost all the images will have the same sizes, as they are already formatted.

I was thinking of a hash or something that should be unique, but that could take a lot of processing. Is this what you had in mind? If not, can you explain a little further?

We could also run a job to find them, but then again I'm not sure how to store the updated data, since we don't want to lose the initial names and paths of the images.

I'm thinking of a new property on the media that is updated after the job, but I'm not sure about the impact on performance.

#207868
Oct 07, 2019 16:05
I would do it with an MD5 checksum of the file.

https://stackoverflow.com/questions/10520048/calculate-md5-checksum-for-a-file

You could calculate this in a published event when an image is uploaded, and then save it in a new property on the image file's content data.
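
A minimal sketch of the checksum step, written in Python for brevity rather than the C# an Episerver site would use. It reads the file in fixed-size chunks, so the whole image never has to sit in memory at once. The function name `md5_of_stream` and the 64 KB chunk size are assumptions made for illustration; on a real site the stream would come from the image's binary data and the result would go into the new property.

```python
import hashlib
import io

def md5_of_stream(stream, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a binary stream, reading it chunk by chunk."""
    digest = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Two uploads with identical bytes give the same checksum, whatever their names are.
# BytesIO stands in for the uploaded file streams (hypothetical data).
first = md5_of_stream(io.BytesIO(b"same image bytes"))
second = md5_of_stream(io.BytesIO(b"same image bytes"))
other = md5_of_stream(io.BytesIO(b"different image bytes"))
print(first == second, first == other)
```

MD5 is fine here because the goal is spotting accidental duplicates, not resisting deliberate collisions. Only byte-identical files will match, so a re-encoded or resized copy gets a different checksum.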

To check whether duplicates exist, you could run a scheduled job that saves the result as a custom report.
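
What such a job might compute can be sketched like this, again in Python as an illustration only. It assumes the checksum has already been stored per image, and it groups the image paths by checksum without touching the original names or paths. The function `find_duplicates`, the sample paths, and the placeholder checksums are all hypothetical.

```python
from collections import defaultdict

def find_duplicates(checksums):
    """Group image paths by checksum and keep only checksums shared by 2+ images.

    `checksums` maps an image path to its stored checksum (hypothetical input).
    """
    groups = defaultdict(list)
    for path, checksum in checksums.items():
        groups[checksum].append(path)
    return {c: sorted(paths) for c, paths in groups.items() if len(paths) > 1}

# Placeholder data: the same picture uploaded once per product, plus one unique image.
stored = {
    "/products/chair/set.jpg": "aaa",
    "/products/table/set-copy.jpg": "aaa",
    "/products/lamp/lamp.jpg": "bbb",
}
report = find_duplicates(stored)
for checksum, paths in report.items():
    print(checksum, paths)
```

Because the job only reads the stored checksums, it never has to open the image files themselves.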

#207870
Oct 07, 2019 16:49
Run the images through the Vision API or a similar service to also find close matches between them :)

#207883
Oct 07, 2019 23:31