Documentation Index

Fetch the complete documentation index at: https://mintlify.com/delta-io/delta-sharing/llms.txt

Use this file to discover all available pages before exploring further.

Understanding Table Paths

Before accessing shared data, you need to understand how Delta Sharing table paths work. A table path consists of two parts joined by #:
<profile-file-path>#<share-name>.<schema-name>.<table-name>
1. Profile file path

The path to your credential file (can be local or remote):
  • Local: /path/to/profile.share
  • S3: s3a://my-bucket/profile.share
  • ADLS: abfss://container@account.dfs.core.windows.net/profile.share
2. Fully qualified table name

The three-part identifier for the shared table: <share>.<schema>.<table>
The profile file path must use Hadoop FileSystem-compatible URLs. Use s3a:// (not s3://) for S3 paths.
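To make the two halves of the format concrete, here is a small sketch (a hypothetical helper, not part of the Delta Sharing API) that splits a table path on the # separator and the three-part name on dots:

```python
# Hypothetical helper, shown only to illustrate the table path format.
# Not part of the delta-sharing library.
def parse_table_path(table_path):
    # Everything before '#' is the profile file path.
    profile_path, _, qualified_name = table_path.partition("#")
    # Everything after '#' is the three-part <share>.<schema>.<table> name.
    share, schema, table = qualified_name.split(".")
    return profile_path, share, schema, table

profile, share, schema, table = parse_table_path(
    "/tmp/open-datasets.share#delta_sharing.default.boston-housing"
)
```

Here profile is "/tmp/open-datasets.share" and the table resolves to share "delta_sharing", schema "default", table "boston-housing".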

Reading Shared Tables

Once you have your table path, reading shared data is straightforward. The examples below use PySpark.
# Define the table path
table_path = "<profile-file-path>#<share-name>.<schema-name>.<table-name>"

# Load the shared table as a DataFrame
df = spark.read.format("deltaSharing").load(table_path)

# Display the data
df.show()

# Perform operations
df.filter(df.age > 21).select("name", "age").show()

Example with Real Path

# Using the example Delta Sharing server
table_path = "/tmp/open-datasets.share#delta_sharing.default.boston-housing"

df = spark.read.format("deltaSharing").load(table_path)
print(f"Rows: {df.count()}, Columns: {len(df.columns)}")
df.printSchema()

Working with DataFrames

Once loaded, shared tables behave like regular Spark DataFrames:
# Standard DataFrame operations work seamlessly
df = spark.read.format("deltaSharing").load(table_path)

# Aggregations
df.groupBy("category").count().show()

# Joins with other DataFrames
local_df = spark.read.parquet("/path/to/local/data")
joined = df.join(local_df, "id")

# Write results to storage
df.filter(df.status == "active").write.parquet("/output/path")

Common Patterns

# Select only needed columns for better performance
df = spark.read.format("deltaSharing").load(table_path)
result = df.select("id", "name", "timestamp")

# Apply filters to reduce data transfer
df = spark.read.format("deltaSharing").load(table_path)
filtered = df.filter((df.date >= "2024-01-01") & (df.status == "active"))

# Cache the DataFrame if you'll use it multiple times
df = spark.read.format("deltaSharing").load(table_path)
df.cache()

# Now multiple operations won't re-fetch the data
df.count()
df.groupBy("category").count().show()

# Create a temporary view for SQL access
df = spark.read.format("deltaSharing").load(table_path)
df.createOrReplaceTempView("shared_data")

# Query with SQL
spark.sql("SELECT * FROM shared_data WHERE age > 21").show()

Profile File Locations

Profile files can be stored in various locations:
table_path = "/home/user/credentials/profile.share#myshare.myschema.mytable"
df = spark.read.format("deltaSharing").load(table_path)
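Since remote profile files are supported via Hadoop FileSystem-compatible URLs (see the note on s3a:// above), the same format works with cloud storage. A sketch with placeholder bucket and account names:

```python
# Placeholder storage locations; substitute your own bucket/container names.
local_path = "/home/user/credentials/profile.share#myshare.myschema.mytable"
s3_path = "s3a://my-bucket/profile.share#myshare.myschema.mytable"
adls_path = "abfss://container@account.dfs.core.windows.net/profile.share#myshare.myschema.mytable"

# Any of these can be passed to spark.read.format("deltaSharing").load(...),
# provided the cluster has credentials for the storage system.
```

Note the s3a:// scheme: plain s3:// paths are not Hadoop FileSystem-compatible.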

Error Handling

Common Issues:
  • Authentication errors: Verify your profile file credentials are correct
  • Table not found: Check the share, schema, and table names
  • Invalid path: Ensure you’re using # to separate the profile path from the table name
  • Network issues: Confirm connectivity to the Delta Sharing server
# Example error handling
try:
    df = spark.read.format("deltaSharing").load(table_path)
    df.show()
except Exception as e:
    print(f"Error accessing shared table: {e}")
    # Check profile file exists and is valid
    # Verify table path format
    # Confirm network connectivity
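Two of the issues above (invalid path, bad table name) can be caught before Spark is even involved. A hypothetical pre-flight check, not part of the Delta Sharing library, that fails fast on malformed table paths:

```python
# Hypothetical pre-flight validation (not a Delta Sharing API):
# check the table path format before handing it to Spark.
def validate_table_path(table_path):
    if "#" not in table_path:
        raise ValueError(
            "Table path must use '#' to separate the profile file path "
            "from the table name"
        )
    profile_path, qualified_name = table_path.split("#", 1)
    if qualified_name.count(".") != 2:
        raise ValueError("Table name must be <share>.<schema>.<table>")
    return profile_path, qualified_name

profile_path, qualified_name = validate_table_path(
    "/tmp/open-datasets.share#delta_sharing.default.boston-housing"
)
```

This does not replace the try/except around the actual load; it only turns format mistakes into clear error messages before any network call is made.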

Next Steps

SQL Usage

Learn advanced SQL patterns for Delta Sharing

Streaming

Stream data from shared tables

Change Data Feed

Query table changes over time