I need to extract the text from PDF files stored in Azure Blob Storage. I can list the file names, but I can't extract their content:

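```python
import os

from azure.storage.blob import BlockBlobService, PublicAccess

# Create the BlockBlobService that the system uses to call the Blob service for the storage account.
# Keep the real account key out of posted code.
block_blob_service = BlockBlobService(
    account_name='servicestorageblob', account_key='<your-account-key>')

# Name of the existing container that holds the PDF files.
container_name = 'dataset'

# Set the permission so the blobs are public.
# (Not needed just to read them: the account key already grants access to private blobs.)
block_blob_service.set_container_acl(
    container_name, public_access=PublicAccess.Container)

# Local download folder (assumed path; created if it does not exist).
local_path = './data'
os.makedirs(local_path, exist_ok=True)

# List the blobs in the container.
print("\nList blobs in the container")
generator = block_blob_service.list_blobs(container_name)
for blob in generator:
    print("\t Blob name: " + blob.name)
    # Download the blob to a local file.
    # Use only the last path segment in case the blob name contains '/'.
    local_file_name = os.path.basename(blob.name)
    # Add '_DOWNLOAD' before the .pdf extension so you can see both files in the data directory.
    download_file_path = os.path.join(local_path, local_file_name.replace('.pdf', '_DOWNLOAD.pdf'))
    print("\nDownloading blob to \n\t" + download_file_path)
    with open(download_file_path, "wb") as download_file:
        # get_blob_to_stream writes the blob's bytes into the open file.
        block_blob_service.get_blob_to_stream(container_name, blob.name, download_file)
```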
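Downloading only copies each PDF; the text still has to come out of a PDF parser. Below is a minimal sketch that reads each blob into memory with the legacy `BlockBlobService.get_blob_to_bytes` (so nothing is written to disk) and parses it with `pypdf`. The parser is an assumption, since the question does not name one, and `extract_pdf_text` / `extract_texts_from_container` are hypothetical helper names. The extractor is passed in as a parameter so it can be swapped for another library.

```python
import io


def extract_pdf_text(pdf_bytes):
    """Return the text of every page in a PDF held in memory."""
    # Assumption: pypdf (pip install pypdf) does the parsing. It is imported
    # here so the module still loads when a different extractor is passed in.
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(pdf_bytes))
    # extract_text() yields little or nothing for scanned (image-only) pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_texts_from_container(service, container_name, extract=extract_pdf_text):
    """Map each .pdf blob name to its extracted text, without touching the disk.

    `service` is a legacy BlockBlobService (azure-storage-blob 2.1).
    """
    texts = {}
    for blob in service.list_blobs(container_name):
        if not blob.name.lower().endswith(".pdf"):
            continue  # skip non-PDF blobs in the container
        # get_blob_to_bytes returns a Blob object whose .content holds the raw bytes.
        data = service.get_blob_to_bytes(container_name, blob.name).content
        texts[blob.name] = extract(data)
    return texts


# Usage (needs azure-storage-blob 2.1 and pypdf installed):
# from azure.storage.blob import BlockBlobService
# service = BlockBlobService(account_name='servicestorageblob',
#                            account_key='<your-account-key>')
# for name, text in extract_texts_from_container(service, 'dataset').items():
#     print(name, text[:200])
```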

What I have tried:

I tried the approaches described in:

Azure Storage samples using Python | Microsoft Docs
Is there a way to process PDF files in Blob Storage without downloading them locally using Python? - Stack Overflow
Quickstart: Azure Blob storage client library v2.1 for Python | Microsoft Docs
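Processing the PDFs without downloading them locally, as the Stack Overflow question asks, comes down to reading each blob's bytes into memory and handing them to a PDF parser. In the current `azure-storage-blob` 12.x package, `BlockBlobService` no longer exists; a `ContainerClient` (obtained from `BlobServiceClient`) lists and downloads blobs instead. A sketch under those assumptions, using `pypdf` as the parser and hypothetical helper names `pdf_to_text` and `iter_pdf_texts`:

```python
import io


def pdf_to_text(pdf_bytes):
    """Extract page text from in-memory PDF bytes (assumes pypdf is installed)."""
    # Imported inside the function so the module loads even without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(pdf_bytes))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def iter_pdf_texts(container_client, extract=pdf_to_text):
    """Yield (blob_name, text) for each .pdf blob via a v12 ContainerClient."""
    for props in container_client.list_blobs():
        if not props.name.lower().endswith(".pdf"):
            continue  # ignore non-PDF blobs
        # download_blob() returns a StorageStreamDownloader; readall() gives the bytes.
        downloader = container_client.download_blob(props.name)
        yield props.name, extract(downloader.readall())


# Usage (needs azure-storage-blob 12.x and pypdf installed):
# from azure.storage.blob import BlobServiceClient
# service = BlobServiceClient.from_connection_string('<your-connection-string>')
# container = service.get_container_client('dataset')
# for name, text in iter_pdf_texts(container):
#     print(name, text[:200])
```

Because `iter_pdf_texts` is a generator, it downloads and parses one blob per iteration rather than loading the whole container up front.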

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
