Configure the Azure Data Lake Storage Gen 2 target
Complete the procedure below to configure the Azure Data Lake Storage Gen 2 target.
- Complete setting up your Azure Data Lake environment and gathering the information required to configure the target. For more information, see Tips to set up the Azure Data Lake environment.
- Create the Azure Data Lake Storage Gen 2 target before you configure it. For more information, see Add a publish target.
- Grant the user who will connect to Azure Data Lake Storage Gen 2 the following minimum permissions: in Azure Active Directory, register a new application with a client secret (or use an existing application registration), and in your storage account assign the Storage Blob Data Owner role to that application. A quick way to check this setup is sketched below.
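To confirm the application registration and secret before configuring the target, a minimal Python check such as the sketch below (using the azure-identity package, with placeholder values) can request a token for the Azure Storage resource. It verifies authentication only; the Storage Blob Data Owner role assignment is still required for data access.

```python
# Minimal check: confirm the registered application and client secret can
# obtain a token for Azure Storage. All values below are placeholders.
from azure.identity import ClientSecretCredential

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",          # Azure Active Directory tenant
    client_id="<application-id>",     # registered application (client) ID
    client_secret="<client-secret>",  # secret created for the registration
)

# A failure here usually points to a wrong tenant ID, application ID, or secret.
token = credential.get_token("https://storage.azure.com/.default")
print("Token acquired; expires at (epoch seconds):", token.expires_on)
```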
- On the Administration page, click the Targets tab, and then select Azure Data Lake Storage Gen 2 in the Publish targets list.
- Configure the Azure Data Lake Storage Gen 2 target with the following parameters:
| Parameter | Description |
| --- | --- |
| Tenant ID | Identifier for your Azure Active Directory tenant. |
| Application ID | Identifier for the Azure Data Lake Storage Gen 2 application registration that authenticates PI Integrator for Business Analytics with Azure Data Lake Storage Gen 2. |
| Client Secret Key | Key used to authenticate PI Integrator for Business Analytics with your Azure Active Directory application. |
| Azure Storage Account Name | Name of the Azure storage account used to authenticate to Azure services. |
| Data Storage Format | File format in which data is stored. The default is Parquet. |
- Click Authenticate to verify that the provided credentials allow PI Integrator for Business Analytics to connect to Azure Data Lake Storage Gen 2.
If authentication is successful, a list of Data Lake containers appears in the Azure Container list.
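For reference, the following minimal Python sketch (using the azure-identity and azure-storage-file-datalake packages, with placeholder values) shows how the parameters above map onto an Azure Data Lake Storage Gen 2 connection and how the available containers can be enumerated, analogous to the list that populates Azure Container after a successful authentication. It illustrates the Azure SDK, not how PI Integrator for Business Analytics connects internally.

```python
# Sketch: connect to ADLS Gen 2 with the values described above and list the
# containers (file systems). All names below are placeholders.
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",          # Tenant ID
    client_id="<application-id>",     # Application ID
    client_secret="<client-secret>",  # Client Secret Key
)

account_name = "<azure-storage-account-name>"  # Azure Storage Account Name
service_client = DataLakeServiceClient(
    account_url=f"https://{account_name}.dfs.core.windows.net",
    credential=credential,
)

# Enumerate containers; these are the candidates for the Azure Container list.
for file_system in service_client.list_file_systems():
    print(file_system.name)
```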
- Continue configuring the following parameters:
| Parameter | Description |
| --- | --- |
| Azure Container | Azure Data Lake Storage Gen 2 container where your data is stored. |
| ADLS Directory | Location in Azure Data Lake Storage Gen 2 to which data is written. |
| Append Timestamp | When selected, a timestamp of the view's publish time is appended to the file name, and a new file is created each time the view is published; once the maximum rows/objects limit is exceeded, a sequence number is appended to subsequent file names. When cleared, a random GUID is used in the file name, and multiple publish events may append data to the existing file until the maximum rows/objects limit is reached; a new GUID is then generated for the subsequent file. |
| Include Header | When selected, column names are added to the beginning of the table. |
| Field Delimiter | Character(s) that separate the data values in the Azure Data Lake Storage Gen 2 file. The default is a tab character. |
| Maximum Rows/Objects | Maximum number of rows or objects in a file. Default: 100,000. Allowed range: 0 - 10,000,000. |
| Transfer Timeout | Data transfer timeout in seconds. Default: 900. Allowed range: 0 - 86,400 (1 day). (Optional) |
| Transfer Initial Size | Initial data transfer size in bytes. Default: 4,194,304 (4 MB). Allowed range: 0 - 1,073,741,824 (1 GB). This value represents packet size, not entire file size. (Optional) |
| Transfer Maximum Size | Maximum data transfer size in bytes. Default: 4,194,304 (4 MB). Allowed range: 0 - 1,073,741,824 (1 GB). This value represents packet size, not entire file size. (Optional) |
| Transfer Maximum Retries | Maximum number of data transfer retries. Default: 6. Allowed range: 0 - 100. (Optional) |
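The interaction between Append Timestamp and Maximum Rows/Objects can be illustrated with a short sketch. The exact file-name pattern PI Integrator for Business Analytics uses is not documented here; the names below are assumptions used only to show when a timestamp, sequence number, or GUID comes into play.

```python
# Illustration only: how Append Timestamp and Maximum Rows/Objects affect the
# names of the files produced for one published view. The naming pattern is an
# assumption; only the behavior described above is modeled.
import uuid
from datetime import datetime, timezone

MAX_ROWS = 100_000  # Maximum Rows/Objects (default)


def output_file_names(total_rows, base_name, append_timestamp, extension="parquet"):
    """Yield one file name per chunk of at most MAX_ROWS rows."""
    n_files = max(1, -(-total_rows // MAX_ROWS))  # ceiling division
    if append_timestamp:
        # One publish-time timestamp, plus a sequence number once the
        # rows/objects limit forces additional files.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
        for seq in range(n_files):
            suffix = f"_{seq}" if seq > 0 else ""
            yield f"{base_name}_{stamp}{suffix}.{extension}"
    else:
        # A random GUID per file; a new GUID is generated once the limit is
        # reached and a subsequent file is needed.
        for _ in range(n_files):
            yield f"{base_name}_{uuid.uuid4()}.{extension}"


print(list(output_file_names(250_000, "MyView", append_timestamp=True)))
```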
- Click Verify ADL Writer to verify that PI Integrator for Business Analytics can write to the specified Azure Data Lake Storage Gen 2 location.
Note: If you have multiple containers in the Azure Container drop-down list, make sure the folder structure in the ADLS Directory matches the Azure Container you selected.
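If you want to rule out permission or path problems outside the product, the following hedged Python sketch writes and deletes a small test file in the selected container and directory using the azure-storage-file-datalake package; the container, directory, and credential values are placeholders. It is analogous to, but not the same as, what Verify ADL Writer does.

```python
# Sketch: confirm that the credentials can write to the chosen container and
# directory. Placeholders throughout; not the product's own verification.
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<application-id>",
    client_secret="<client-secret>",
)
service_client = DataLakeServiceClient(
    account_url="https://<azure-storage-account-name>.dfs.core.windows.net",
    credential=credential,
)

file_system_client = service_client.get_file_system_client("<azure-container>")
directory_client = file_system_client.get_directory_client("<adls-directory>")
directory_client.create_directory()  # existing directories and their contents are preserved

# Write and then remove a tiny test file.
file_client = directory_client.create_file("_write_test.txt")
file_client.upload_data(b"write test", overwrite=True)
file_client.delete_file()
print("Write access confirmed for the selected container and directory.")
```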
- Click Save Changes.
- Give users access to the Azure Data Lake Storage Gen 2 target. For more information, see Grant access to targets.