Java CAPS 5.1 and Java CAPS 6 – Streaming Large FTP Transfers

Transferring large payloads, on the order of tens or hundreds of megabytes, between an FTP server and a local file system, in either direction, requires selecting appropriate features of the Batch FTP and Batch Local File eWays and tuning certain timing parameters.

Default timing parameter values result in timeout exceptions when transferring large payloads.

The Batch FTP eWay and the Batch Local File eWay are typically used to receive the entire payload before writing it out. This results in attempts to allocate memory many times the size of the payload being transferred and, for large files, causes memory exhaustion and application server failures.

The discussion in the attached document points out which timing parameters need to be tuned to facilitate the transfer of large payloads. It also presents sample Java code that uses facilities of the Batch FTP and Batch Local File eWays to stream the payload between the FTP server and the local file system without using an excessive amount of memory.
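To illustrate the principle only, here is a minimal sketch in plain java.io of copying a payload one buffer at a time, so that memory use is bounded by the buffer size rather than by the payload size. It does not use the Batch eWay API; the class name `ChunkedCopy`, the method `copy` and the 8192-byte buffer are assumptions made for this illustration, and the attached document remains the reference for the actual eWay code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Plain java.io sketch of chunked copying; it does not use the Batch eWay API.
final class ChunkedCopy {

    // Copies everything from in to out through one fixed-size buffer and
    // returns the number of bytes copied. Memory use is bounded by bufferSize.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() returns -1 at end of stream; the last chunk may be shorter than the buffer
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[100_000]; // in-memory stand-in for a large remote file
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(payload), sink, 8192);
        System.out.println("Copied " + copied + " bytes");
    }
}
```

In a real transfer the source and sink would be the FTP data stream and a file stream, not in-memory arrays as in this demonstration.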

The material covered in the document was prepared using Java CAPS projects developed and tested in Java CAPS 5.1.0, exported, imported into Java CAPS Release 6 and tested again. It is expected that the code will work in all versions of Java CAPS from 5.1.0 up.

Streaming Large FTP Transfers with CAPS 5.1 and 6.pdf

FTPtoLocalFileStreaming_5.1.0_project_export.zip

LocalFiletoFTPStreaming_JC6_project_export.zip

34 thoughts on “Java CAPS 5.1 and Java CAPS 6 – Streaming Large FTP Transfers”

  1. Ravi

    Hi Michael,
    Thanks for the screenshots of the new IDE. In any case, a quick question: if I am using streaming and use a JMS queue to store that message, do I still have to factor in the maximum size of the file I am polling for, or can it be something significantly smaller?

    Thanks
    Ravi

  2. Michael Czapski

    Hello, Ravi.

    I assume you are referring to my blog entry on FTP to local file streaming. If not then the response may not make sense.

    The streaming solution relies on the fact that at no point is the entire payload in memory. The collaboration (or rather the eWay infrastructure) reads the file a chunk at a time and writes each chunk to the local file until finished. If you need to put that payload into a JMS message then it _will_ be in memory, several times. Furthermore, it does not make much sense to use streaming if you need to handle the payload as a message. Streaming is for either getting the content of the file from a source to a destination file as efficiently as possible, or for getting the content and breaking it up into individual messages as efficiently as possible. For the latter you would also use a Batch Record eWay. If the payload is on the order of a few MB then reading it in its entirety and passing it around as messages is unlikely to be harmful. If it gets much bigger than that then you will likely have issues. So, in short, you have to factor in the size of the payload if you need to handle it in its entirety. Depending on the solution you have, how many times the message is processed by different components, etc., you may have to factor in the size several times over.

    Regards

    Michael

  3. Meenakshi

    Hi Michael,
    This streaming looks like it can be used only for FTP to BatchLocal.
    Can this be done similarly for FTP to Database table? I mean a bulk insert of the received .CSV file to Database table.

    Thank you,
    Regards,
    Meenakshi Mandal

  4. Michael Czapski

    Hello, Meenakshi.

    To accomplish FTP to DB table streaming you will need to stream from Batch FTP to Batch Record and then insert into a DB table. Batch Record will give you a record, for example CR/LF delimited. You will need to parse the record, map the fields of the record to columns in the DB OTD, then do an Insert and, possibly, a commit after each record or after several records.
    Get record, parse and insert will execute in a loop.
    This will give you the ability to process arbitrarily large files without having to read their contents into memory for parsing.
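
    The get record, parse, insert loop can be sketched in plain Java as follows. This is an illustration only and does not use the Batch Record or database OTD APIs: the `RowSink` interface is a hypothetical stand-in for the DB OTD, the comma split is deliberately naive (no quoted-field handling), and the commit interval is an assumed parameter.

    ```java
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.io.StringReader;

    // Plain Java sketch of a record-at-a-time load loop; not the Batch Record API.
    final class RecordLoop {

        // Hypothetical stand-in for the database OTD: insert one row, commit a batch.
        interface RowSink {
            void insert(String[] columns);
            void commit();
        }

        // Reads one line-delimited record at a time, splits it on commas and inserts it.
        // Commits after every commitEvery records and once more at the end if rows are pending.
        static int load(Reader source, RowSink sink, int commitEvery) throws IOException {
            if (commitEvery <= 0) {
                throw new IllegalArgumentException("commitEvery must be positive");
            }
            BufferedReader reader = new BufferedReader(source);
            int count = 0;
            String line;
            while ((line = reader.readLine()) != null) { // handles LF and CR/LF delimiters
                if (line.isEmpty()) {
                    continue; // skip blank records
                }
                sink.insert(line.split(",", -1)); // naive split, keeps trailing empty fields
                count++;
                if (count % commitEvery == 0) {
                    sink.commit();
                }
            }
            if (count % commitEvery != 0) {
                sink.commit(); // commit the final partial batch
            }
            return count;
        }

        public static void main(String[] args) throws IOException {
            RowSink printer = new RowSink() {
                public void insert(String[] columns) {
                    System.out.println("insert " + columns.length + " columns");
                }
                public void commit() {
                    System.out.println("commit");
                }
            };
            int loaded = load(new StringReader("a,b\r\nc,d\r\ne,f\r\n"), printer, 2);
            System.out.println("Loaded " + loaded + " records");
        }
    }
    ```

    Only one record is held in memory at a time, so the size of the source file does not affect memory use.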

    Regards

    Michael

  5. David

    Hi Michael,

    We are actually trying to accomplish the opposite: make sure the file is completely loaded into memory before processing it. We are using the BatchInbound eWay to poll the directory for the file and then BatchLocal to read the file.

    However, the JCD unmarshals the data before the file is completely transferred. Is there any way to make sure the entire file is loaded into memory before processing the data?

  6. Michael Czapski

    Hello, David.

    By default the Batch Local File eWay does precisely that. The example in this blog entry shows a way to override this default behavior. The Batch eWay documentation for Java CAPS 5.1.3, http://docs.sun.com/app/docs/doc/820-0981, Chapter 7, Using the Batch eWay with Java Collaborations, discusses this and provides examples that may help you.

    Regards

    Michael

  7. David

    Michael,

    Thanks for your reply. I wasn’t clear in my question – the trigger file is large, and therefore the process gets triggered before the entire file has been created. In this situation, the BatchLocal eWay seems to load the partially transferred file. Can we override this behavior?

    Additionally, Batch SFTP doesn’t seem to support data streaming suggested in this post. How can we use data streaming if we need secure FTP?

  8. Michael Czapski

    Hello, David.

    Alas, the only way to prevent Batch Inbound from triggering the Batch Local File prematurely is to make sure the writer of the file writes the file completely and only then renames it to the name the Batch Inbound is looking for.
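
    On the writer's side, the write-then-rename approach can be sketched with java.nio.file as follows. This is a minimal illustration, assuming the writer is a Java program, that the temporary and final names are in the same directory, and that the hypothetical `.part` suffix does not match the name or pattern Batch Inbound polls for.

    ```java
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    // Sketch of a file writer that exposes the final name only once the content is complete.
    final class WriteThenRename {

        // Writes content under a temporary name, then renames it to finalName in one step.
        // The ".part" suffix is an assumption; it must not match what the poller looks for.
        static Path publish(Path dir, String finalName, byte[] content) throws IOException {
            Path temp = dir.resolve(finalName + ".part");
            Path target = dir.resolve(finalName);
            Files.write(temp, content); // the poller ignores this name
            // A rename within one directory; ATOMIC_MOVE fails rather than exposing a partial file
            return Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        }

        public static void main(String[] args) throws IOException {
            Path dir = Files.createTempDirectory("staging");
            byte[] content = "complete payload".getBytes(StandardCharsets.UTF_8);
            Path published = publish(dir, "trigger.dat", content);
            System.out.println("Published " + published);
        }
    }
    ```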

    I have not spent the time looking into SFTP so I have no advice to offer on this topic.

    Regards

    Michael

  9. Naveen

    Hi michael,
    Is there any way to stream data from one BatchFTP External System to another BatchFTP External System? My requirement is that the source and target FTP should handle files of unlimited size.

  10. Michael

    Hello, Naveen.

    There is no way, of which I am aware, to get a StreamAdapter from the FTP OTD, so there is no way to use a StreamAdapter from one FTP OTD in another FTP OTD. The only way to do this that I know of is to stage through the local file system.

    Regards

    Michael

  11. naveen

    Hi Michael,
    Could you please tell me how to calculate the size of a file on an FTP server? The FTP server can be anything, such as HP UNIX, AIX UNIX or Windows. The restriction is that we should not archive the file, because the file size is unlimited. I am using data streaming from one Batch FTP External System to Batch Local, and then transferring from Batch Local to another Batch External System. My idea is to split the data into equal segments based on the file size, put the segments into the Batch Local system, and from there put them into another Batch External System. The problem is how to calculate the size of the input file.

  12. Michael

    Hello, Naveen.

    I am not aware that there is a way to calculate a file size using the Batch eWay. There are a number of assumptions in FTP to Local File (staging) to FTP streaming. One assumption is that the file is ‘complete’ at the time the transfer starts – that is, whatever wrote the file finished writing it before the Batch eWay gets hold of it. Another assumption is that you can process the complete file – stream it to the local file system – before another Batch eWay instance starts processing it (for the scenario you are describing). Again, this is for the same reason: the file must be complete, otherwise the transfer will terminate prematurely if the writer is slower than the reader, and an incomplete file will be transferred. This, in turn, assumes that the staging area has sufficient space to accommodate the file.

    There is a way in which you can stream a file and break it into records of fixed size or at a delimiter. This is described in the product documentation and in the Java CAPS Book, with examples. For the fixed size transfer the file size must be a multiple of the record size, otherwise the transfer will abort on reading the last ‘short’ record. Since you don’t know the size of the file ahead of time, and there may in any case be files whose size is a prime, breaking the file at record size boundaries does not appear to be appropriate or workable for your case. If the files are not delimited then this option is not available to you either. I don’t know what kind of files you are dealing with.

    Regards

    Michael

  13. Naveen

    Hi michael,
    Thanks for your reply.
    My file can be of any type and any size. I have to develop an application which is completely dynamic, and the FTP server from which we are picking up files can be AIX UNIX, HP UNIX or Windows. I have an idea for transferring the data from one Batch FTP External System to another (as I described earlier), but it depends completely on the size of the input file from the source FTP External System. My problem is that the file can be of unlimited size. In that case it is not a good approach to archive the entire file in the local system using Batch Local, because it may lead to an ‘Out of Memory Exception’.
    Thanks in advance.

    Regards,
    Naveen

  14. Michael

    Hello, Naveen.

    You get ‘Out of Memory’ exceptions from loading large payloads into memory, as you normally would with the Batch eWay _NOT_ in streaming mode. In streaming mode the file is transferred from the remote server to the local file system in such a way that at no point in time is the entire file, or even a significant part of it, loaded into memory. As I write in the blog, I transferred payloads ranging from a few KB to over 1GB without a visible change in Application Server memory utilization. I would have been able to transfer files several gigabytes in size had I had the disk space on my laptop 🙂 It is disk space you need to accommodate the file in the local staging file system, not memory.

    Regards

    Michael

  15. Naveen

    Yes, Michael, what you said is right. But in a dynamic environment we cannot expect the local staging file system
    to have enough disk space, because file sizes can be in terabytes and we can deploy this application anywhere.
    Otherwise I could have used something like this:
    byte[] b = otdLocalStaging.getClient().getPayload();
    int size = b.length;
    and divided the payload based on the payload size.

    Thanks and Regards,
    Naveen

  16. Michael

    Hello, Naveen.

    Before you can issue .getPayload() you will have to have the payload in memory via a non-streaming get – precisely what you do _NOT_ want to do.

    The Java CAPS Book has an example solution which reads a file a buffer-full at a time, using a programmer-controlled buffer. This solution does not suffer from the limitation of the file size having to be a multiple of the buffer size or the file having to be delimited.

    Review section 2.6, Data Streaming, in Part II of the Java CAPS Book, specifically Listing 2-44. This may or may not help you.

    It sounds like a very strange requirement – transferring files, terabytes in size, over a network. I think solution architecture review is in order.

    Regards

    Michael

  17. NaveenPRP

    Hi,
    Could anyone please tell me how to calculate the size of a file on a Batch FTP External System without reading the contents of the file? The Batch FTP External System can be in an HP UNIX, AIX UNIX or Windows environment.

  18. naveenprp

    Hi michael,
    This is regarding streaming. My requirement is that I first need to stream from BatchFTP to BatchLocal and then from BatchLocal to BatchFTP, i.e.

    BatchFTP1 ---> BatchLocal
    BatchLocal ---> BatchFTP2

    Is there any way in which we can transfer only a part of the data (a chunk of data) from BatchFTP to BatchLocal, then from BatchLocal to BatchFTP, then the next chunk, and so on? I.e.

                Chunk of data              Chunk of data
    BatchFTP1 ---------------> BatchLocal ---------------> BatchFTP2
    .
    .
                Chunk of data              Chunk of data
    BatchFTP1 ---------------> BatchLocal ---------------> BatchFTP2

    until the end of the data from BatchFTP1, so that I can transfer the entire data from BatchFTP1 to BatchFTP2 irrespective of file size.

    Awaiting your reply.

  19. Nagireddy Patil

    Hi Michael,
    Is it possible to transfer a file from a source FTP system to a destination FTP system without using the local Batch FTP? If not, why is it not possible? Can you explain?

  20. Michael Czapski

    Hello, Nagireddy.

    Using the Batch FTP eWay or the Batch FTP JCA Adapter it is possible to transfer a file between two remote FTP systems, but only if the entire payload can be kept in memory during the transfer. This severely limits the size of the payload that can be so transferred. The basic idea is that the inbound Batch FTP eWay/JCA Adapter reads the content of the remote file (it is represented as a byte array) and writes it to the remote FTP server using another instance of the Batch FTP eWay/JCA Adapter.

    Does this help?

    Regards

    Michael

  21. Nagireddy Patil

    Hello Michael,
    Thanks for your reply.
    I understand the memory constraints involved in transferring a file between two Batch FTP systems without using data streaming. I wanted to know if streaming could be achieved without using/involving the Batch Local File, i.e., instead of streaming data to Batch Local File can’t we use a Batch FTP?
    Is it possible to set a Batch FTP OTD instance’s OutputStreamAdapter to the value of another Batch FTP OTD instance’s OutputStreamAdapter?

    Thanks and regards,
    Nagireddy.

  22. Michael Czapski

    Hello, Nagireddy.

    Alas, no. This question popped up before. The only possibility I can think of, and I have not tried this, is to use Batch FTP -> Batch Record -> Batch FTP. That would require the payload to be breakable into records either on delimiters or on fixed size boundaries. If your payload is of this kind perhaps you can give it a try and post a comment here.

    Regards

    Michael

  23. Nagireddy Patil

    Hello Michael,

    Thanks for your reply. As you suggested, I will try with Batch Record. If it succeeds then I will definitely post it.

    Thanks
    Nagireddy Patil

  24. kevin

    Hi Michael,

    Thanks for the sample project.

    I imported your project and added a Scheduler eWay to trigger the collaboration.
    It works fine if I define the file name in the Target Location of the cBatchFTP eWay connector properties. When I changed these properties to:
    Target File Name: [Sc]CR.*\rtf
    Target File Name Is Pattern: Yes

    I got an exception:

    message=[BATCH-MSG-M0172: FtpFileClientImpl.get(): No qualified file is available for retrieving.].Nested exception follows: —
    java.io.FileNotFoundException: BATCH-MSG-M0172: FtpFileClientImpl.get(): No qualified file is available for retrieving.— End of nested exception.

    I also added the following code to jcdFTPtoLocalFile:

    vFTPIn.getConfiguration().setTargetFileName(vFTPIn.getClient().getResolvedNamesForGet().getTargetFileName());
    vFTPIn.getClient().get();

    Regards

    Kevin

  25. Michael Czapski

    Hello, Kevin.

    The error "No qualified file is available for retrieving" means that your regular expression does not match any files.

    The regular expression you are using for the target file name does not seem right. Try testing your regular expression with one of the online regex testing sites, for example, http://www.regexplanet.com/simple/index.jsp, to see if it returns the kinds of file names you are expecting. I don’t know what kinds of file names you need to match so I cannot suggest a regular expression that would work for you.
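
    A pattern can also be checked locally with java.util.regex. The sketch below assumes the eWay's pattern matching behaves like java.util.regex with "find" semantics, which this thread does not confirm; the file names are hypothetical. It compares the pattern from the question with the corrected pattern reported in the follow-up comment. In java.util.regex, `\r` stands for a carriage return, so the first pattern cannot match a name ending in `.rtf`.

    ```java
    import java.util.regex.Pattern;

    // Local check of file name patterns; assumes java.util.regex semantics.
    final class PatternCheck {

        // Returns true if the regular expression is found anywhere in the file name.
        static boolean matches(String regex, String fileName) {
            return Pattern.compile(regex).matcher(fileName).find();
        }

        public static void main(String[] args) {
            String original = "[Sc]CR.*\\rtf";     // regex [Sc]CR.*\rtf, where \r is a carriage return
            String corrected = "[Ss]CR.*\\.rtf$";  // literal dot, anchored at the end of the name

            System.out.println(matches(original, "SCR_report.rtf"));      // false
            System.out.println(matches(corrected, "SCR_report.rtf"));     // true
            System.out.println(matches(corrected, "SCR_report.rtf.bak")); // false
        }
    }
    ```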

    Regards

    Michael

  26. Kevin

    Hi Michael

    Thanks for your reply

    I fixed the issue by changing the following:

    1. Added $ to the file pattern: [Ss]CR.*\.rtf$
    2. In jcdFTPtoLocalFile, set TargetFileNameIsPattern to true:
    vFTPIn.getConfiguration().setTargetFileNameIsPattern( true );
    Regards

    Kevin

  27. las

    Hi Michael,
    Thanks for the good example.
    I wonder how we can stream more than one file.

    In the payload-based approach, we can do:

    BatchLocalFile_in.getClient().getIfExists();
    while (BatchLocalFile_in.getClient().getPayload() != null) {
        BatchFTP_out.getClient().setPayload( BatchLocalFile_in.getClient().getPayload() );
        BatchFTP_out.getClient().put();
        BatchLocalFile_in.reset();
        BatchLocalFile_in.getClient().getIfExists();
    }

    I am not sure how to do similar coding in the streaming approach.

    Thanks.
    lt

  28. las

    Hi Michael,
    Thanks for your quick reply.

    I have a question: when is the content of the file loaded into memory?
    Right after:

    G_BLFIn.getClient().get();

    or right after:

    G_BLFIn.getClient().getPayload();

    My guess is right after G_BLFIn.getClient().get(), and if so then combining the two approaches does not help much.

    Thanks.
    lt

  29. Michael Czapski

    The following JCD will stream multiple files, identified by a name pattern, from a local file system to a remote FTP server, as fast as it can, until it runs out of files. The name pattern and source directory are configured in the connectivity map. The target FTP server and directory are configured in the connectivity map.

    Beware, the JCD opens and closes connections for each file. If there is a firewall between the app server and the FTP server it may consider this a DoS attack and refuse to allow connections.

    package Stream100sLocal2FTP;

    import com.stc.eways.batchext.LocalFileException;
    import com.stc.eways.common.eway.standalone.streaming.StreamingException;
    import com.stc.eways.batchext.BatchException;
    import java.io.FileNotFoundException;

    public class jcdFilesProcessor
    {
        long lStartMillis = System.currentTimeMillis();
        // 40 minutes, used for all FTP timeouts below
        int iTimeoutMillis = 40 * 60 * 1000;

        public com.stc.codegen.logger.Logger logger;

        public com.stc.codegen.alerter.Alerter alerter;

        public com.stc.codegen.util.CollaborationContext collabContext;

        public com.stc.codegen.util.TypeConverter typeConverter;

        public void receive( com.stc.connectors.jms.Message input, com.stc.eways.batchext.BatchLocal G_BLFIn, com.stc.eways.batchext.BatchFtp vFTPOut )
            throws Throwable
        {
            long lNow = System.currentTimeMillis();
            java.util.Date dtNow = new java.util.Date( lNow );
            logger.debug( "\n===>>> Received trigger " + input.getTextMessage() + " at " + lNow + ", " + dtNow );

            int i = 0;
            boolean blMore = true;
            com.stc.eways.common.eway.standalone.streaming.InputStreamAdapter isa = null;

            // process one file per iteration until no more files match
            while (blMore) {
                try {
                    // log current file
                    logger.debug( "\n===>>> got file " + ++i + " " + G_BLFIn.getClient().getResolvedNamesToGet().getTargetFileName() );

                    // use the resolved local file name as the literal remote file name
                    vFTPOut.getConfiguration().setTargetFileName( G_BLFIn.getClient().getResolvedNamesToGet().getTargetFileName() );
                    vFTPOut.getConfiguration().setTargetFileNameIsPattern( false );

                    // raise the timeouts so that large transfers do not time out
                    vFTPOut.getConfiguration().setDataConnectionTimeout( iTimeoutMillis );
                    vFTPOut.getProvider().setDataSocketTimeout( iTimeoutMillis );
                    vFTPOut.getProvider().setSoTimeout( iTimeoutMillis );

                    // stream from the local file to the FTP server
                    isa = G_BLFIn.getClient().getInputStreamAdapter();
                    vFTPOut.getClient().setInputStreamAdapter( isa );
                    vFTPOut.getClient().put();

                    // prepare for next file
                    isa.releaseInputStream( true );

                    if (!vFTPOut.reset()) {
                        logger.error( "\n===>>> Failed to reset FTP" );
                        throw new Exception( "Failed to reset FTP" );
                    }
                    if (!G_BLFIn.getClient().reset()) {
                        logger.error( "\n===>>> Failed to reset Local File" );
                        throw new Exception( "Failed to reset Local File" );
                    }
                } catch ( com.stc.eways.batchext.BatchException be ) {
                    //
                    // File Not Found is expected and benign.
                    // That exception is so deeply nested that the following code
                    // is needed to determine if this is what caused the exception
                    //
                    logger.debug( "\n===>>> Nested Exception: " + be.getNestedException().getClass().getName() );
                    if (be.getNestedException() instanceof FileNotFoundException) {
                        FileNotFoundException lfe4 = (FileNotFoundException) be.getNestedException();
                        logger.error( "\n===>>> Ignoring expected File Not Found Exception: " + lfe4.getMessage() );
                        blMore = false;
                    } else {
                        logger.error( "\n===>>> Unexpected BatchException:" + be.getClass() + "\n", be );
                        blMore = false;
                    }
                } catch ( Exception e ) {
                    logger.debug( "\n===>>> Exception name: " + e.getClass().getName() );
                    logger.error( "\n===>>> Exception getting file " + e.getCause() + "\n", e );
                    blMore = false;
                }
            }
        }

    }

  30. Pingback: Java CAPS 6, JCA, Note 4 – Streaming FTP Inbound
