On DXC, assets are not FileBlob but AzureBlob; that's why your code won't run properly. I suggest going with the abstraction (Blob) instead. I'm not familiar with DataLakeHelper, but it might have a method that takes a stream, in which case you can use blob.OpenRead() to supply that stream for the upload.
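For illustration, a minimal sketch of that stream-based approach, assuming the upload can go through the AWS SDK's TransferUtility (which accepts a stream); if DataLakeHelper wraps the SDK, it may expose something similar:

using Amazon.S3;
using Amazon.S3.Transfer;

// Minimal sketch: read through the Blob abstraction instead of assuming a FileBlob.
// The key (file.Name) is a placeholder choice; OpenRead() works for FileBlob and AzureBlob alike.
var file = this._contentLoader.Get<DataLakeDataFile>(this.DataFileContentReference);
if (file?.BinaryData != null)
{
    using (var stream = file.BinaryData.OpenRead())
    using (var s3Client = new AmazonS3Client())
    {
        var transfer = new TransferUtility(s3Client);
        transfer.Upload(stream, DataLakeDocumentReportETLSettings.DocumentReportBucketName, file.Name);
    }
}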
Let me try those options. For the basic part, could someone guide me on whether the approach I am taking is correct?
I use a scheduled job which in turn uses Episerver Find to get content (maybe around 25,000 items), writes it to a file in Azure blob storage, and then reads this file to transfer it to the AWS Data Lake using the Amazon SDK from NuGet.
I'm not sure if there is a preferred way to handle this kind of requirement. I see that Episerver Find sometimes chokes when pulling that amount of content, and secondly, accessing the file from Azure blob storage is getting tricky.
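On the Find side, paging the query is worth a try before anything else. A minimal sketch, assuming a content type ExportPage and a batch size tuned to taste (Skip/Take and GetContentResult() are standard Find API; very deep paging can still hit service-side limits, in which case slicing the query by a filter such as a date range helps):

using System.Linq;
using EPiServer.Find;
using EPiServer.Find.Cms;
using EPiServer.Find.Framework;

// Minimal sketch: pull the ~25,000 items in pages instead of one large request.
var searchClient = SearchClient.Instance;
var batchSize = 500;
var skip = 0;
while (true)
{
    var batch = searchClient.Search<ExportPage>()
        .Skip(skip)
        .Take(batchSize)
        .GetContentResult();

    foreach (var item in batch)
    {
        // Append the item to the export file here.
    }

    if (batch.Count() < batchSize)
    {
        break; // last page reached
    }
    skip += batchSize;
}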
I have no idea whether your approach is the most optimal one, but I would reverse the responsibilities a bit and implement the following workflow: the scheduled job exports the content to a blob, then drops a message on a queue pointing at that blob, and a worker on the receiving side picks the message up and pushes the file to the Data Lake.
It of course depends on the size of your blobs and whether you will be able to embed the blob content into the queue item (queues usually have quite small item-size limits). That means you might need to either add a reference to the blob using a SAS token or a similar access option, or use the Queue Attachment plugin (https://www.nuget.org/packages/ServiceBus.AttachmentPlugin/), for example, if you are on .NET on the other side.
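For the reference-instead-of-content option, a sketch along these lines (using Azure.Storage.Blobs and Azure.Messaging.ServiceBus; the connection strings, container, and queue names are placeholders):

using System;
using Azure.Messaging.ServiceBus;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

// Minimal sketch (inside an async method): instead of embedding the file in
// the queue item, send a short-lived read-only SAS link to the blob.
var blobClient = new BlobClient(storageConnectionString, "exports", "datalake-export.csv");
var sasUri = blobClient.GenerateSasUri(BlobSasPermissions.Read, DateTimeOffset.UtcNow.AddHours(1));

await using (var client = new ServiceBusClient(serviceBusConnectionString))
{
    var sender = client.CreateSender("datalake-transfers");
    await sender.SendMessageAsync(new ServiceBusMessage(sasUri.ToString()));
}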
Why would I split the workflow this way? Because it gives a few benefits out of the box: the export and the transfer can fail and be retried independently, and the two systems stay decoupled from each other.
I have a scheduled job that creates a file using IBlobFactory's CreateBlob method and eventually saves it in media documents using ContentRepository.Save() with SaveAction.Default.
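For reference, the creation side looks roughly like this (simplified; DataLakeDataFile is our own media type, and the folder and file name here are placeholders):

using System;
using System.IO;
using EPiServer;
using EPiServer.DataAccess;
using EPiServer.Framework.Blobs;
using EPiServer.Security;
using EPiServer.Web;

// Simplified sketch of the creation side: write the export into a new blob,
// then save it as a media document.
var blob = this._blobFactory.CreateBlob(Blob.GetContainerIdentifier(Guid.NewGuid()), ".csv");
using (var stream = blob.OpenWrite())
using (var writer = new StreamWriter(stream))
{
    writer.Write(exportPayload); // exportPayload: the serialized export data
}

var file = this._contentRepository.GetDefault<DataLakeDataFile>(SiteDefinition.Current.GlobalAssetsRoot);
file.Name = "datalake-export.csv";
file.BinaryData = blob;
this._contentRepository.Save(file, SaveAction.Default, AccessLevel.NoAccess);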
Now I want to access this file to transfer it to the data lake system. Below is the code I am trying. The line where I check whether the BinaryData is a FileBlob errors out with an object reference error. This works locally but errors out in the DXC environment. Is there anything I am missing in the process of accessing the file, or is this not the correct way to get a handle on the physical file for the transfer?
var file = this._contentLoader.Get<DataLakeDataFile>(this.DataFileContentReference);
if (file != null)
{
    // This is the check that fails in DXC (see the first reply above:
    // BinaryData is an AzureBlob there, so there is no physical file path).
    if (file.BinaryData is FileBlob fileBlob)
    {
        var filePath = fileBlob.FilePath;
        DataLakeHelper.UploadFile(DataLakeDocumentReportETLSettings.DocumentReportBucketName, filePath);
        DataLakeReportsHelper.BroadcastAndLogInformation(this._loggingService, this._statusBroadcast, "File has been transferred.");
    }
    else
    {
        throw new Exception("Error: data file was retrieved but it can't be used as a FileBlob.");
    }
}