Google Cloud Storage List Blob objects with specific file name


With google.cloud.storage and list_blobs I can get the list of files in a specific bucket, but I want to narrow that down to exactly the files matching a pattern such as name*.ext. I was not able to find a way to do this.

For example: bucket = data, prefix_folder_name = sales. Within the prefix folder I have a list of invoices with metadata files, and I want to get specific invoices and their metadata (name*.csv & name.*.meta). If I instead loop over all_blobs for the entire folder to pick out the selected files, that is a huge volume of data and it may affect performance.

It would be great if someone could help me with this.

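from google.cloud import storage

gcs_client = storage.Client()
bucket = gcs_client.get_bucket("data")  # bucket name from the example
all_blobs = bucket.list_blobs(prefix="sales/")  # only objects under sales/
for blob in all_blobs:
    print(blob.name)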

There are 6 answers

Mike Schwartz answered:

You can filter by prefix, but to filter more specifically (e.g., for objects ending with a given file extension) you have to implement the filtering logic on the client side. That's what gsutil does when you run a command like:

gsutil ls gs://your-bucket/abc*.txt
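A minimal sketch of that client-side filtering in Python, using the standard fnmatch module as a stand-in for gsutil's wildcard matching. The object names in the demo are hypothetical, and fnmatch's semantics are not identical to gsutil's (see the docstring).

```python
import fnmatch
from typing import Iterable, List


def filter_names(names: Iterable[str], pattern: str) -> List[str]:
    """Return the object names that match a shell-style wildcard pattern.

    This is purely client-side: every name still has to be listed first.
    Unlike "*" in gsutil, fnmatch's "*" also matches "/", so the pattern
    "sales/*.csv" would also match "sales/2023/name1.csv".
    """
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


# Hypothetical object names, standing in for the blob.name values returned by
# bucket.list_blobs(prefix="sales/").
names = ["sales/name1.csv", "sales/name1.meta", "sales/other.csv"]
print(filter_names(names, "sales/name*.csv"))  # ['sales/name1.csv']
```

Passing prefix= to list_blobs first keeps the listing as small as the API allows; the wildcard check then only runs over objects under that folder.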
gfreeman answered:

It doesn't let you filter, but you can use the fields parameter to return only the name of each object, which limits the amount of data returned and makes it easy to filter on the client side.
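A sketch of that approach, assuming bucket is a google.cloud.storage Bucket. The function name list_matching_names and the suffix argument are illustrative, not part of the library. The fields string uses the JSON API partial-response syntax and keeps nextPageToken so the iterator can still page through results.

```python
from typing import List, Tuple, Union


def list_matching_names(bucket, prefix: str, suffixes: Union[str, Tuple[str, ...]]) -> List[str]:
    # Ask the API for only each object's name (plus nextPageToken so paging
    # still works), which shrinks every page of the listing response.
    blobs = bucket.list_blobs(prefix=prefix, fields="items(name),nextPageToken")
    # The suffix check itself still runs client-side; str.endswith accepts
    # either a single suffix or a tuple of suffixes.
    return [blob.name for blob in blobs if blob.name.endswith(suffixes)]
```

For the question's layout this could be called as list_matching_names(bucket, "sales/", (".csv", ".meta")). Every object under the prefix is still listed, just with a much smaller payload per object.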

ahmetpergamum answered:

According to the google-cloud-storage documentation, blobs are objects that have a name attribute, so you can filter them by this attribute.

from google.cloud import storage

storage_client = storage.Client()

bucket_name = "your-bucket-name"
# Note: Client.list_blobs requires at least package version 1.17.0.
blobs = storage_client.list_blobs(bucket_name)

# Keep only the names of blobs that contain the filter string.
filter_dir = "filter-string"
matching_names = [blob.name for blob in blobs if filter_dir in blob.name]
Hemendra answered:

You can use the following, treating name and .ext as the filters for the files:

# bucket is the Bucket object obtained in the question
all_blobs = bucket.list_blobs()
file_list = [blob.name for blob in all_blobs if blob.name.endswith(".ext") and "name" in blob.name]

for file_name in file_list:
    print(file_name)

Here, name is the file-name filter and .ext is the extension filter.

Michael Vehrs answered:

You can do this with the match_glob parameter, e.g.:

bucket.list_blobs(match_glob='*.ext')
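Applied to the question's layout, a sketch might issue one match_glob listing per extension. This assumes a google-cloud-storage release recent enough that Bucket.list_blobs accepts match_glob; the helper names invoice_globs and list_invoice_files, and the "sales"/"name" values, are hypothetical. The glob is matched against the full object name, so the folder is part of the pattern.

```python
from typing import List


def invoice_globs(folder: str, stem: str) -> List[str]:
    # Hypothetical naming scheme from the question: <folder>/<stem>*.csv
    # plus the matching <folder>/<stem>*.meta metadata files.
    return [f"{folder}/{stem}*.csv", f"{folder}/{stem}*.meta"]


def list_invoice_files(bucket, folder: str, stem: str) -> List[str]:
    # One listing call per pattern. The matching happens server-side, so
    # non-matching objects are never returned to the client. A single "*"
    # does not cross "/", so only objects directly inside folder match.
    names: List[str] = []
    for pattern in invoice_globs(folder, stem):
        names.extend(blob.name for blob in bucket.list_blobs(match_glob=pattern))
    return names


print(invoice_globs("sales", "name"))  # ['sales/name*.csv', 'sales/name*.meta']
```

With a real bucket, list_invoice_files(bucket, "sales", "name") would return only the matching .csv and .meta object names.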
LamerLink answered:

Expanding on @michael-vehrs's answer: if you need to match objects at every directory level, be sure to prepend **/ to the pattern.

from google.cloud import storage
bucket = storage.Client().bucket("bucket-name")
bucket.list_blobs(match_glob="**/*.ext")