Most Recent S3 Objects

I have backup files in an S3 bucket with names like db/<dbname>.<timestamp>. I want the name of the most recent backup.

Procedure

```shell
$ aws s3api list-objects-v2 --bucket mybucketname --prefix db/dbname. \
  | jq -r '.Contents[-1].Key'
db/dbname.2023-03-22T00:50:18+00:00
```

An alternative approach might use the standard s3 ls command along with standard shell tools:

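```shell
$ aws s3 ls s3://mybucketname/db/ | awk '$4 ~ /^dbname\./ { print $4 }' \
  | sort -r | head -n 1
dbname.2023-03-22T00:50:18+00:00
```

Here s3 ls prints each key relative to the listed prefix (after the date, time, and size columns), so the result omits db/. The awk pattern matches dbname. literally at the start of the key; a plain grep "dbname." would also match dbname2. because the dot is a regex wildcard.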

Comments

The S3 ListObjectsV2 API returns the objects whose keys begin with the given prefix. For general purpose buckets, the keys are returned in ascending (UTF-8 binary) order. Because every timestamp in the naming convention uses the same fixed-width ISO 8601 format and offset, lexical order matches chronological order, so the last item listed is the most recent backup.
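The prefix has to be chosen with that ordering in mind. The offline sketch below uses hypothetical keys and assumes they are ASCII, so that LC_ALL=C sort reproduces the order S3 would return them in. Because 2 (0x32) sorts after . (0x2E), a bare db/dbname prefix also matches the backups of a hypothetical dbname2 database, and one of those ends up last; adding the trailing dot restricts the match to db/dbname.<timestamp> keys.

```shell
#!/bin/sh
# Hypothetical keys, listed in the order ListObjectsV2 would return them.
keys='db/dbname.2023-03-21T00:50:18+00:00
db/dbname.2023-03-22T00:50:18+00:00
db/dbname2.2023-03-22T01:57:02'

# Print the last line of stdin (in C-locale byte order) that starts with $1.
# index() does a literal match, so "." is not treated as a wildcard.
last_with_prefix() {
  awk -v p="$1" 'index($0, p) == 1' | LC_ALL=C sort | tail -n 1
}

# Bare prefix: db/dbname2.* also matches, and it sorts last.
last_bare=$(printf '%s\n' "$keys" | last_with_prefix 'db/dbname')

# Trailing dot: only db/dbname.<timestamp> keys match.
last_dot=$(printf '%s\n' "$keys" | last_with_prefix 'db/dbname.')

echo "bare prefix:   $last_bare"
echo "dotted prefix: $last_dot"
```

Running it prints db/dbname2.2023-03-22T01:57:02 for the bare prefix and db/dbname.2023-03-22T00:50:18+00:00 for the dotted one.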

Although the API returns at most 1,000 keys per response, the AWS CLI automatically paginates through the results and combines them into a single Contents array. The jq filter then selects the last element of that array and prints its Key.

An alternative procedure could use the --query parameter, which takes a JMESPath expression and works much like jq:

```shell
$ aws s3api list-objects-v2 --bucket mybucketname --prefix db/dbname. \
  --query='Contents | [-1].Key'
"db/dbname.2023-03-22T00:50:18+00:00"
```

Note that the query returns a JSON-quoted string, so some additional massaging may be needed. It is tempting to add --output text to avoid the quotes, but that has an undesirable side effect: with text output the CLI applies the query to each page of results separately instead of to the combined result, so a listing that spans several pages prints the last key of every page.
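One way to drop the quotes while keeping JSON output (and therefore a single query over all pages) is to strip them afterwards. The function below is a minimal sketch: its name and argument order are hypothetical, and the sed step assumes keys contain nothing that JSON would escape (double quotes, backslashes, control characters). When no object matches, Contents is absent and the query yields JSON null, which the sketch turns into a non-zero exit status.

```shell
#!/bin/sh
# Hypothetical helper: print the last key under a prefix, without JSON quotes.
# Usage: latest_key BUCKET PREFIX
latest_key() {
  # Keep --output json so the query runs once over the combined pages.
  out=$(aws s3api list-objects-v2 --bucket "$1" --prefix "$2" \
    --query 'Contents | [-1].Key' --output json) || return
  # With no matching objects the query result is null.
  if [ "$out" = "null" ]; then
    return 1
  fi
  # Drop the surrounding quotes (assumes no JSON escapes inside the key).
  printf '%s\n' "$out" | sed -e 's/^"//' -e 's/"$//'
}
```

For example, latest_key mybucketname db/dbname. would print the key without quotes.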

Fields other than Key can also be used in queries. A ListObjectsV2 response looks like this:

```json
{
  "Contents": [
    {
      "Key": "db/dbname2.2023-03-22T01:57:02",
      "LastModified": "2023-03-22T01:57:02.000Z",
      "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
      "Size": 1000,
      "StorageClass": "STANDARD"
    },
    ...
  ]
}
```

If the objects are not as conveniently named as in this example, a different jq filter can use the LastModified field to find the most recently modified object:

```shell
$ ... | jq -r '.Contents | sort_by(.LastModified)[-1].Key'
```

Or to return the largest matching object:

```shell
$ ... | jq -r '.Contents | sort_by(.Size)[-1].Key'
```
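Without jq, a similar LastModified selection can be made from s3 ls output, whose first two columns are the modification date and time. This is a minimal sketch with a hypothetical function name; it assumes the YYYY-MM-DD HH:MM:SS SIZE KEY line layout, that all times in one listing share a time zone, and that keys contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified key from `aws s3 ls`
# output on stdin. "PRE dir/" lines have fewer than 4 fields and are skipped.
newest_modified() {
  awk 'NF >= 4 { print $1 " " $2 " " $4 }' \
    | LC_ALL=C sort \
    | tail -n 1 \
    | awk '{ print $3 }'
}
```

Usage would be along the lines of aws s3 ls s3://mybucketname/db/ | newest_modified. The date and time columns sort lexically in chronological order because they are zero-padded and fixed-width.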

Regardless of how the response is processed, listing objects this way becomes slower as the number of matching keys grows: the CLI must fetch every page, 1,000 keys at a time, before the last key is known. A single response with 1,000 matches can take several seconds to reach the client. Depending on the use case, maintaining a separate database of objects (perhaps built from S3 Inventory) or managing other aspects of the objects (such as lifecycle rules or naming conventions) may be more appropriate.
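When the naming convention is predictable, the ListObjectsV2 StartAfter parameter (--start-after in the CLI) can reduce how much is listed: S3 begins listing after the given key, so older backups are never sent to the client. A minimal sketch, with a hypothetical function name and a caller-chosen cutoff (for example 2023-03-01) that is assumed to be older than the newest backup:

```shell
#!/bin/sh
# Hypothetical helper: newest db/<dbname>.<timestamp> key listed after a cutoff.
# Usage: latest_since BUCKET DBNAME CUTOFF
latest_since() {
  bucket=$1 dbname=$2 cutoff=$3
  # Keys sorting at or before db/<dbname>.<cutoff> are skipped by S3 itself.
  out=$(aws s3api list-objects-v2 --bucket "$bucket" \
    --prefix "db/${dbname}." --start-after "db/${dbname}.${cutoff}" \
    --query 'Contents | [-1].Key' --output json) || return
  # null means nothing after the cutoff; a caller could retry further back.
  if [ "$out" = "null" ]; then
    return 1
  fi
  # Drop the JSON quotes (assumes no JSON escapes inside the key).
  printf '%s\n' "$out" | sed -e 's/^"//' -e 's/"$//'
}
```

Because db/dbname.2023-03-01T... keys sort after db/dbname.2023-03-01, backups from the cutoff date itself are still included.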

References

AWS CLI: s3api list-objects-v2
AWS S3 API: ListObjectsV2
jq Manual: sort, sort_by
JMESPath: a query language for JSON

#aws #json