Class NetworkCrawler

  • All Implemented Interfaces:
    DataProvider

    public class NetworkCrawler
    extends Object
    implements DataProvider
    Provider for data files fetched directly from the network.

    This class handles a list of URLs pointing to data files or zip/jar archives on the net. Since the net is not a tree structure, the list elements cannot be top-level elements browsed recursively as in DirectoryCrawler; each one must be a data file or a zip/jar archive.

    The files fetched from the network can be cached locally on disk. This avoids overly frequent network access when the URLs are remote ones (for example original internet URLs).

    If the URL points to a remote server (typically on the web) on the other side of a proxy server, you need to configure the networking layer of your application to use the proxy. For a typical authenticating proxy, as used in many corporate environments, this can be done as follows, using for example the AuthenticatorDialog graphical authenticator class that can be found in the tests directories:

       System.setProperty("http.proxyHost",     "proxy.your.domain.com");
       System.setProperty("http.proxyPort",     "8080");
       System.setProperty("http.nonProxyHosts", "localhost|*.your.domain.com");
       Authenticator.setDefault(new AuthenticatorDialog());
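    When no graphical environment is available, a non-interactive authenticator can play the same role as AuthenticatorDialog. The following is a minimal sketch using only java.net; the class name ProxyCredentialsAuthenticator and the way credentials are supplied are assumptions made for illustration, not part of this library.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

/** Hypothetical non-graphical authenticator that answers proxy challenges only. */
final class ProxyCredentialsAuthenticator extends Authenticator {

    private final String user;
    private final char[] password;

    ProxyCredentialsAuthenticator(String user, char[] password) {
        this.user = user;
        this.password = password.clone(); // defensive copy of the caller's array
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        // Reply only when the challenge comes from a proxy, not from an origin server.
        if (getRequestorType() == RequestorType.PROXY) {
            return new PasswordAuthentication(user, password);
        }
        return null; // no credentials offered for other requestors
    }
}
```

    It would be installed in place of the dialog with Authenticator.setDefault(new ProxyCredentialsAuthenticator(user, password)), where user and password come from wherever the application keeps its credentials.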
     

    Gzip-compressed files are supported.

    Zip archive entries are supported recursively.

    This is a simple application of the visitor design pattern for list browsing.

    Author:
    Luc Maisonobe
    See Also:
    DataProvidersManager
    • Constructor Detail

      • NetworkCrawler

        public NetworkCrawler(URL... urls)
        Build a network data crawler.

        The default timeout is set to 10 seconds.

        Parameters:
        urls - list of data file URLs
    • Method Detail

      • setTimeout

        public void setTimeout(int timeout)
        Set the connection timeout.
        Parameters:
        timeout - connection timeout in milliseconds
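        The timeout is given in milliseconds, so the 10 second default corresponds to 10000. The sketch below shows how a millisecond timeout is applied to a standard java.net connection. It assumes, without confirmation from this page, that the crawler relies on URLConnection; TimeoutSketch and the example.invalid host are placeholders.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

/** Hypothetical helper, not part of the documented API. */
final class TimeoutSketch {

    private TimeoutSketch() {
        // utility class, no instances
    }

    /** Prepares a connection (without connecting yet) with a connect timeout in milliseconds. */
    static URLConnection prepare(URI location, int timeoutMillis) {
        try {
            // openConnection() only creates the connection object; no network traffic yet
            URLConnection connection = location.toURL().openConnection();
            connection.setConnectTimeout(timeoutMillis);
            return connection;
        } catch (IOException e) {
            // covers MalformedURLException as well
            throw new UncheckedIOException(e);
        }
    }
}
```

        On the crawler itself, the equivalent call for a 30 second timeout would be setTimeout(30000).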
      • feed

        public boolean feed(Pattern supported,
                            DataLoader visitor)
        Feed a data file loader by browsing the data collection.

        The method crawls all files referenced in the instance (for example all files in a directory tree) and, for each file supported by the file loader, asks the file loader to load it.

        If the method completes without exception, then the data loader is considered to have been fed successfully and the top level data providers manager will return immediately without attempting to use the next configured providers.

        If the method completes abruptly with an exception, then the top level data providers manager will try to use the next configured providers, in case another one can feed the data loader.

        Specified by:
        feed in interface DataProvider
        Parameters:
        supported - pattern for file names supported by the visitor
        visitor - data file visitor to use
        Returns:
        true if some data has been loaded
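        To illustrate how the supported pattern relates to a URL, here is a minimal sketch that extracts the file name from a location and tests it against the pattern. It assumes that matching is done on the last path segment, that a trailing ".gz" is removed first, and that the whole name must match; none of this is stated on this page. SupportedNameSketch, the host name and the file names are placeholders.

```java
import java.net.URI;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the documented API. */
final class SupportedNameSketch {

    private SupportedNameSketch() {
        // utility class, no instances
    }

    /** Last path segment of a hierarchical location, e.g. "tai-utc.dat.gz". */
    static String fileName(URI location) {
        String path = location.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** True if the name, with any trailing ".gz" removed, fully matches the pattern. */
    static boolean isSupported(Pattern supported, URI location) {
        String name = fileName(location);
        if (name.endsWith(".gz")) {
            // assumed handling of gzip-compressed files: match on the uncompressed name
            name = name.substring(0, name.length() - ".gz".length());
        }
        return supported.matcher(name).matches();
    }
}
```

        With the pattern ^tai-utc\.dat$, both http://example.invalid/data/tai-utc.dat and its .gz variant would be accepted under these assumptions, while other file names would be skipped.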