List objects in an Amazon S3 folder without also listing objects in sub-folders

  • I'm using the Amazon S3 Java SDK to fetch a list of files in a (simulated) sub-folder. This code is fairly standard (AWSConfiguration is a class that contains a bunch of account-specific values):



    String prefix = "/images/cars/";
    int prefix_size = prefix.length();
    AmazonS3 s3 = new AmazonS3Client(new AWSConfiguration());
    ObjectListing objectListing = s3.listObjects(new ListObjectsRequest().
        withBucketName(AWSConfiguration.BUCKET).
        withPrefix(prefix));


    Now this list will include objects like /images/cars/default.png as well as /images/cars/ford/Default.png (because they both contain the same prefix). To list only the objects that are directly inside the /images/cars/ "folder" I have the following function (in a class called S3Asset)



    public static boolean isInsideFolder(int root_size, String key) {
        return (key.substring(root_size).indexOf("/") == -1);
    }


    This looks at the full key for any / after the prefix as a clue that it is inside a sub-folder. This lets me iterate over the objects with the following code (I'm trimming the prefix for clarity):



    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        if (S3Asset.isInsideFolder(prefix_size, objectSummary.getKey())) {
            System.out.println(objectSummary.getKey().substring(prefix_size));
        }
    }


    As near as I can tell this is the cleanest way to do this, but it has one characteristic that I don't like. If I'm looking at a root-level folder, I'm requesting the names of all files in all sub-folders, only to iterate over them and learn that there is only one object in the actual root-level folder. I've considered associating a key with each object whose value is the full path of the folder, which would allow me to request objects with a predictable key instead of the prefix. The major downside is that the key would have to be generated in code, and therefore assets uploaded directly into the S3 bucket (through the management console) would not have this key. Does anyone have a better idea?


  • palacsint

    palacsint Correct answer

    9 years ago

    The ListObjectsRequest javadoc lists a method called withDelimiter(String delimiter). If you add .withDelimiter("/") after the .withPrefix(prefix) call, you will receive only the objects at the same folder level as the prefix, which avoids the need to filter the returned ObjectListing after the full list has been sent over the wire. Keys in deeper levels are rolled up into common prefixes instead of being listed individually.
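
    To illustrate what the delimiter changes about the response, here is a small in-memory simulation of the roll-up. It is only a sketch and does not call the SDK: DelimiterSketch, Listing and list(...) are hypothetical names made up for this example, and the keys are the ones from the question. With the real SDK, the rolled-up "sub-folders" should be available through ObjectListing.getCommonPrefixes().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, SDK-free illustration of prefix + delimiter listing semantics.
final class DelimiterSketch {

    /** Simulated response: direct keys plus rolled-up "sub-folder" prefixes. */
    static final class Listing {
        final List<String> keys = new ArrayList<>();
        final Set<String> commonPrefixes = new TreeSet<>();
    }

    static Listing list(List<String> allKeys, String prefix, String delimiter) {
        Listing result = new Listing();
        for (String key : allKeys) {
            if (!key.startsWith(prefix)) {
                continue; // outside the requested "folder"
            }
            // Look for the delimiter only in the part after the prefix.
            int pos = key.indexOf(delimiter, prefix.length());
            if (pos < 0) {
                result.keys.add(key); // object directly under the prefix
            } else {
                // Deeper key: keep only the part up to and including the delimiter.
                result.commonPrefixes.add(key.substring(0, pos + delimiter.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "/images/cars/default.png",
                "/images/cars/ford/Default.png",
                "/images/trucks/default.png");
        Listing listing = list(keys, "/images/cars/", "/");
        System.out.println(listing.keys);           // direct children only
        System.out.println(listing.commonPrefixes); // one entry per sub-folder
    }
}
```

    The point of the design is that the filtering happens on the service side, so a folder with one file and thousands of nested objects yields one key and a handful of prefixes.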



    Some notes about the code:



    1, I'd extract the ListObjectsRequest instance into a local variable:



    final ListObjectsRequest listObjectRequest = new ListObjectsRequest().
        withBucketName(AWSConfiguration.BUCKET).
        withPrefix(prefix);
    final ObjectListing objectListing = s3.listObjects(listObjectRequest);


    It's easier to read.



    2, root_size should be rootSize. (According to the Java Code Conventions.)



    3, I would use String.contains instead of indexOf. It's more meaningful and easier to read, since you don't have to compare against the magic number -1.



    4, In the last snippet I'd create a local variable for the key:



    for (final S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        final String key = objectSummary.getKey();
        if (S3Asset.isImmediateDescendant(prefix, key)) {
            final String relativePath = getRelativePath(prefix, key);
            System.out.println(relativePath);
        }
    }


    5, Furthermore, I'd move the length call inside the helper method:



    // Both helpers are static so that they can be called as S3Asset.isImmediateDescendant(...)
    public static String getRelativePath(final String parent, final String child) {
        if (!child.startsWith(parent)) {
            throw new IllegalArgumentException("Invalid child '" + child
                + "' for parent '" + parent + "'");
        }
        // a String.replace() also would be fine here
        final int parentLen = parent.length();
        return child.substring(parentLen);
    }

    public static boolean isImmediateDescendant(final String parent, final String child) {
        if (!child.startsWith(parent)) {
            // maybe we just should return false
            throw new IllegalArgumentException("Invalid child '" + child
                + "' for parent '" + parent + "'");
        }
        final int parentLen = parent.length();
        final String childWithoutParent = child.substring(parentLen);
        // an immediate descendant has no further "/" after the parent prefix
        return !childWithoutParent.contains("/");
    }


    Note the input check. (Effective Java, Second Edition, Item 38: Check parameters for validity)
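
    A quick way to see how these helpers behave is to run them against the keys from the question. The following is a minimal, self-contained sketch: S3AssetSketch is a hypothetical stand-in for the S3Asset class, and the two checks are folded into one private method purely for brevity. It also shows an edge case worth knowing about: a key that is exactly equal to the prefix (for example a zero-byte "folder" placeholder object) counts as an immediate descendant with an empty relative path.

```java
// Hypothetical stand-in for S3Asset, used only to exercise the helpers.
final class S3AssetSketch {

    private S3AssetSketch() {
        // static helpers only
    }

    static String getRelativePath(final String parent, final String child) {
        checkIsChild(parent, child);
        return child.substring(parent.length());
    }

    static boolean isImmediateDescendant(final String parent, final String child) {
        // No further "/" after the parent prefix means "directly inside".
        return !getRelativePath(parent, child).contains("/");
    }

    private static void checkIsChild(final String parent, final String child) {
        if (!child.startsWith(parent)) {
            throw new IllegalArgumentException("Invalid child '" + child
                    + "' for parent '" + parent + "'");
        }
    }

    public static void main(String[] args) {
        final String prefix = "/images/cars/";
        System.out.println(isImmediateDescendant(prefix, "/images/cars/default.png"));      // true
        System.out.println(isImmediateDescendant(prefix, "/images/cars/ford/Default.png")); // false
        System.out.println(getRelativePath(prefix, "/images/cars/default.png"));            // default.png
        // Edge case: the key equals the prefix itself, relative path is "".
        System.out.println(isImmediateDescendant(prefix, prefix));                          // true
    }
}
```

    Whether the placeholder key should be skipped (for instance by ignoring empty relative paths) depends on how the bucket is populated, so treat that as a decision for the calling code.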



    The multiple calls of length could look redundant and slow but premature optimization is not a good thing (see Effective Java, Second Edition, Item 55: Optimize judiciously). If you check the source of java.lang.String, you will find this:



    /** The count is the number of characters in the String. */
    private final int count;

    ...

    public int length() {
        return count;
    }


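    String is immutable, so it's easy to cache its length, and the JDK does it for you. (The snippet above comes from an older JDK; later versions derive the length from the backing array instead of a separate count field, which is still a constant-time operation.)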


    If this is the kind of input that you can expect from this community then I am 100% behind supporting its promotion to a full-fledged site! Thank you for such a complete answer.

    Point 4 is very close to how this code actually looks in my actual app, and for point 5, in my S3Asset class I have 2 methods (one that takes 2 strings and does the length check on the parent, and one that I referenced in my question). It's a premature optimization, but calling a string length function on each iteration when the value will always be the same just makes me feel bad. Now what will I do with all the extra nanoseconds I'm saving?

    Check the update, there is no extra nanosecond :-)

    It looks like the missing API call is "withDelimiter" which is clearly described on this page http://aws.amazon.com/releasenotes/213, if you update your answer I'll mark it as correct.

    Please give the KeyObjectType jar.

    It was just pseudo-code, the concrete type is `String`. I've updated my answer. Sorry for the inconvenience and thanks for the feedback!

    Great answer. I think it could also be improved by mentioning that, at the ObjectListing level, the resulting list is truncated when there are more than 1000 keys, according to the API documentation: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html. ObjectListing has a method, objectListing.isTruncated(), that indicates whether there are more results; objectListing.getNextMarker() and objectListing.setNextMarker() are used to control the paging. A do-while loop is suggested for the iteration; it would contain the "for (final S3ObjectSummary objectSummary: objectListing.getObjectSummaries())" loop.

    @le0diaz: Thanks! Could you write your comment as a separate answer? I'd upvote that too.

    @palacsint Since this is the best answer, I think it would be better to edit it and include the changes I'm suggesting; it would only add a few lines of code to point 4. You can use this as a reference: https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html . A new answer would not add anything significant to the existing one.

    @le0diaz: I think we can have multiple good answers here, even from the same person: https://codereview.meta.stackexchange.com/questions/20/should-one-person-have-multiple-answers
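
    The paging that le0diaz describes can be sketched as a do-while loop. The example below is SDK-free so that it runs on its own: Page and fetch(...) are hypothetical stand-ins for ObjectListing and the listing call, and the page size is shrunk from 1000 to 2 to make the truncation visible. With the real v1 SDK, the body of the loop would typically obtain the next page with s3.listNextBatchOfObjects(objectListing) instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, SDK-free illustration of marker-based paging.
final class PagingSketch {

    /** Stand-in for one page of results (roughly what ObjectListing carries). */
    static final class Page {
        final List<String> keys;
        final String nextMarker; // null on the last page

        Page(List<String> keys, String nextMarker) {
            this.keys = keys;
            this.nextMarker = nextMarker;
        }

        boolean isTruncated() {
            return nextMarker != null;
        }
    }

    /** Stand-in for the service: returns up to pageSize (>= 1) keys after the marker. */
    static Page fetch(List<String> sortedKeys, String marker, int pageSize) {
        int start = (marker == null) ? 0 : sortedKeys.indexOf(marker) + 1;
        int end = Math.min(start + pageSize, sortedKeys.size());
        List<String> keys = new ArrayList<>(sortedKeys.subList(start, end));
        // More keys remain, so the last key of this page becomes the next marker.
        String next = (end < sortedKeys.size()) ? sortedKeys.get(end - 1) : null;
        return new Page(keys, next);
    }

    static List<String> listAll(List<String> sortedKeys, int pageSize) {
        List<String> all = new ArrayList<>();
        String marker = null;
        Page page;
        do {
            page = fetch(sortedKeys, marker, pageSize);
            all.addAll(page.keys); // the per-object for loop would go here
            marker = page.nextMarker;
        } while (page.isTruncated());
        return all;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a.png", "b.png", "c.png", "d.png", "e.png");
        System.out.println(listAll(keys, 2)); // all five keys, fetched in three pages
    }
}
```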

License under CC-BY-SA with attribution


Content dated before 7/24/2021 11:53 AM
