Flight client example for accessing a data product

Last updated: Dec 13, 2024

After you have subscribed to a data product using the Flight service as the delivery method for one or more items, you can programmatically access the data using an Arrow client. Arrow libraries are available for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R and Ruby. See Apache Arrow for instructions on installing the libraries for each language. This topic provides a Python example for accessing the data in a data product from an Arrow client.

When you subscribe to a data product and select delivery by the Flight service, you receive a Flight URL and Descriptor that are used to access the data product. You can download the Flight URL and Descriptor from the subscription tile that is located under My subscriptions.

The Flight URL points to the external route of the Flight service for your environment. The Descriptor contains an asset ID and a catalog ID used to connect to a data source to deliver the items in a data product.
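The host name used when the client is created in the Python example can be derived from the Flight URL instead of being typed twice. The following is a minimal sketch that uses only the standard library. The helper name split_flight_url is hypothetical, and the URL shown is the one used in the example in this topic, so substitute the Flight URL from your own subscription.

```python
from urllib.parse import urlsplit


def split_flight_url(flight_url):
    """Split a Flight URL into (scheme, hostname, port).

    Hypothetical helper; not part of pyarrow or Data Product Hub.
    """
    parts = urlsplit(flight_url)
    # Flight locations use gRPC schemes such as grpc, grpc+tcp, or grpc+tls.
    if not parts.scheme.startswith("grpc") or parts.hostname is None:
        raise ValueError(f"not a gRPC Flight URL: {flight_url!r}")
    return parts.scheme, parts.hostname, parts.port


scheme, host, port = split_flight_url(
    "grpc+tls://api.dataplatform.cloud.ibm.com:443"
)
print(scheme, host, port)
```

The returned host value can then be passed as the override_hostname argument when the Flight client is created, so that the URL and the host name cannot drift apart.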

Example for accessing a data product with a Python Flight client

Follow these steps to access a data product with a Flight client in Python:

  1. Import the required libraries

    Import the Flight module from pyarrow together with the requests library, which is used to make REST API requests, and the json library, which is used to serialize the descriptor.

    from pyarrow import flight
    import requests
    import json
    
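  2. Define an authentication handler

    The handler sends the bearer token to the Flight service during the handshake and stores the token that the service returns.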

    class TokenClientAuthHandler(flight.ClientAuthHandler):
        def __init__(self, token):
            super().__init__()
            strToken = str(token)
            self.token = strToken.encode('utf-8')
        def authenticate(self, outgoing, incoming):
            outgoing.write(self.token)
            self.token = incoming.read()
        def get_token(self):
            return self.token
    
  3. Authenticate with Data Product Hub by using the REST API

    The following example creates a Flight client, exchanges your API key for an access token through the authentication API, and authenticates the client with that token. See Authentication.

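    # Connect to the Flight service. Replace the URL with your own Flight URL.
    readClient = flight.FlightClient(
        'grpc+tls://api.dataplatform.cloud.ibm.com:443',
        override_hostname='api.dataplatform.cloud.ibm.com',
        disable_server_verification=True)  # skips TLS certificate checks
    
    # Exchange the API key for an access token.
    response = requests.post(
        'https://iam.cloud.ibm.com/identity/token',
        data={'grant_type': 'urn:ibm:params:oauth:grant-type:apikey', 'apikey': API_KEY})
    response.raise_for_status()
    token = 'Bearer ' + response.json()['access_token']
    
    # Authenticate the Flight client with the bearer token.
    readClient.authenticate(TokenClientAuthHandler(token), options=flight.FlightCallOptions(timeout=5.0))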
    

    The variable is defined as follows:

    API_KEY is an API key that you generate in IBM Cloud under API keys.

  4. Get the Flight information for the data product

    flightDescriptor = flight.FlightDescriptor.for_command(json.dumps(DESCRIPTOR))
    flightInfo = readClient.get_flight_info(flightDescriptor)
    

    The variable is defined as follows:

    DESCRIPTOR is the Flight descriptor, in Python format, that you copy from the subscription tile for the data product.

  5. Read the data into a table and load it into pandas

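    import pyarrow as pa

    # Read every endpoint, not only the last one, and combine the results.
    tables = [readClient.do_get(endpoint.ticket).read_all() for endpoint in flightInfo.endpoints]
    table = pa.concat_tables(tables)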
    

    The data is now stored in the table variable as an Arrow table, and you can work with it in your application. For example, call table.to_pandas() to load the table into a pandas DataFrame.

    table.to_pandas()
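
Steps 3 and 4 can fail in ways that are hard to read: a bare KeyError if the token response carries no access token, or a server-side error if the descriptor was pasted in an unexpected form. The following is a minimal sketch of two checks that can run before any Flight call is made. The helper names bearer_from_response and descriptor_command are hypothetical, and the idea that a descriptor might arrive either as a Python dictionary or as a JSON string is an assumption made for illustration, not documented behavior of the service.

```python
import json


def bearer_from_response(payload):
    """Build the 'Bearer <token>' string from a parsed token response.

    Hypothetical helper: raises a descriptive error instead of a KeyError
    when the response has no 'access_token' field.
    """
    token = payload.get("access_token")
    if not token:
        # Report only the field names so that no secrets end up in logs.
        raise RuntimeError(
            f"no access_token in token response; fields: {sorted(payload)}"
        )
    return "Bearer " + token


def descriptor_command(descriptor):
    """Return the JSON command string to pass to FlightDescriptor.for_command.

    Accepts a dict or a JSON string (assumed input forms) and rejects
    anything that is not a JSON object.
    """
    if isinstance(descriptor, str):
        descriptor = json.loads(descriptor)
    if not isinstance(descriptor, dict):
        raise TypeError("descriptor must be a JSON object")
    return json.dumps(descriptor)
```

With these helpers, the token in step 3 would be built with bearer_from_response applied to the parsed JSON response, and the argument to flight.FlightDescriptor.for_command in step 4 would be descriptor_command(DESCRIPTOR).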
    
