Excel

Save PDF

Last UpdatedOct 31, 2025
11 minute read

You can extract the Excel-specific data (.xls and .xlsx) from the Gateway by defining specific settings.

To extract the specific settings for Excel data:

Click Excel.
Click the Input Source drop-down list.

You can select either FileSystem or S3 Bucket from the list.

For FileSystem Input Source

Input Path: To process all relevant files in File System:
1. Browse and select the user input file in the correct format, that is, .xls or .xlsx.
2. Select an input file and connect to it, then select the sheet and import, enter Sheet Structure Details (see the Requirements for the Thermo Library Generation Tool section).
3. Select the relevant columns to extract.
4. Generate the equivalent query and save the configuration.

The Extractor configuration is saved to an XML file with a data source section, for example:

<Datasources>
<Excel hasHeader="false">
<Input source="FileSystem">
<FileSystem>
<InputPath>C:\Users\User_Name\Desktop\SubfolderTestProject\Input\Excel\AttributeList.xls</InputPath>
</FileSystem>
</Input>
<Sheet>Sheet1</Sheet>
<Query />
<Details firstPopulatedRow="1" headingsRow="0" firstDataRow="1" lastDataRow="-1" />
</Excel>
</Datasources>

If you want to apply the same configuration for more than one file in a folder, then you must first select one of the files, set the extract configuration as above and save the settings.

Click the input folder icon and enter the path where the input files are present.

In this folder mode, you cannot modify the Extract setting for columns as this is driven by a specific file’s data. However, you can still modify the query. Save the settings for the selected folder.

<Datasources>
<Excel hasHeader="false">
<Input source="FileSystem">
<FileSystem>
<InputPath>C:\Users\<user_name>\Desktop\SubfolderTestProject\Input\Excel </InputPath>
</FileSystem>
</Input>
<Sheet>Sheet1</Sheet>
<Query />
<Details firstPopulatedRow="1" headingsRow="0" firstDataRow="1" lastDataRow="-1" />
</Excel>
</Datasources>

If you enable the Incremental Scan while processing a folder, you must select the Datetime Column when configuring for a single file. You cannot edit the selected file. The Last Scanned On date can be updated but it will be applied to all of the files in the folder. See the Requirements for the Thermo Library Generation Tool section.

Click Run to process all the files relevant to the extraction settings in the selected folder and its sub-folders, based on the extractor configuration that has been saved.

For S3 Input Source

S3 Details: See Accessing an AWS S3 Bucket for more information.

To process all the relevant files in an S3 bucket:

Fill all the mandatory fields with the required information to access the bucket, but you must not set a value for the Object Key.

The first file in the bucket will be downloaded into a temporary location displayed in the Input Path.
Click Connect to start configuring the file format parameters such as Sheet, Sheet Structure Details and Column Filter (see the Requirements for the Thermo Library Generation Tool section).
Save this configuration as it will be applied to each of the files in the bucket after you click the Run button.

Note: Each download from an S3 bucket is into a new temporary folder to avoid processing previously downloaded files. Hence, its location is not saved in the configuration.

<Datasources>
<Excel hasHeader="false">
<Input source="S3">
<S3>
<Authentication instance="false">
<CredentialFile path="C:\Users\User_name\.aws\credentials" profileName="3pg-dev" />
</Authentication>
<Region>eu-west-1</Region>
<BucketName>1ddatainput</BucketName>
<ObjectKey>Input/Excel/EqList.xls</ObjectKey>
</S3>
<FileSystem>
<InputPath>C:\Users\User_name\Desktop\test\Input\Excel\AttributeList.xls</InputPath>
</FileSystem>
</Input>
<Sheet includePattern="*ALL_WORKSHEETS*" excludePattern="" />
<Query />
<Details firstPopulatedRow="1" headingsRow="0" firstDataRow="1" lastDataRow="-1" />
</Excel>

</Datasources>

Download: Click Download to validate the AWS Credentials and download the specified file from the S3 bucket to the input location of the Project folder. File System Details will be filled with the downloaded location and file details.
Exclude Hidden Data: Enable this check box to exclude Hidden Data from being extracted from the input Excel file (only .xlsx files). The technology used to exclude hidden data is incompatible with extracting the data sheet by sheet, so Import Schema, Column Filter and Query generator sections are disabled when this checkbox is enabled. The default value is to disable this function, in which case any hidden data will be extracted with the unhidden data.

Select Sheet

This section is common for both File System and S3 input file sources.

You can process either single or multiple worksheets using a single extract configuration using this feature.

Using Pattern mapping option, you can include/exclude patterns from the Extract section. The Extractor uses these patterns to select sheets from the available set of sheets in the excel file. This subset of sheets is then used to further extract data. For more information about processing multiple worksheets, see Requirements for the Thermo Library Generation Tool.

Sheet Structure Details

This section is common for both File System and S3 input file sources.

Define the following elements of the Source data structure:

Has Header: Select this option if the Excel file contains a header.
Connect: Click this button to open the Excel file. After this connection is established, the Select Sheet drop-down box is filled with the sheet name and Import Schema box is also populated. If that input is new and is not saved in the configuration file, then by default the first sheet is displayed. Otherwise, it populates the last selected sheet.

Select Sheet: Select the relevant sheet. Only one sheet can be extracted at a time. Click Import Schema to fetch the list of rows in that sheet. These rows are displayed in the Column Filter box, which you can select to generate a query. The sheet structures are described in the following table:

Settings	Description
First Populated Row	First row in the worksheet having any type of content.
Heading Row	The row that contains the column headings. If omitted, it is assumed that there is no Heading Row. If heading row is set to "0", it is assumed that there is no heading row.
First Data Row	The row that contains the first of the data rows. If omitted, it is assumed that the first data row is immediately after the 'HeadingRow' or, if that is also omitted, the first row in the file.
Last Data Row	The last row that contains a valid data row. If the last data row is set to "-1", it is assumed that all rows need to be processed.

ODBC ignores the rows which are empty, if those are present before any data present in the Excel file. Row indices of the table that are extracted by ODBC from the Excel file may not match with the row indices of the Excel file. So the First Populated Row is provided to determine the correct Heading Row, First Data Row and Last Data Row indices.

These sheet structure settings help to extract the data correctly when:

The first populated row in the Excel worksheet does not contain headings.
Empty rows are between heading row and first data row.
Text below the last data row is not considered as data.

Note: You cannot process an Excel spreadsheet through the Gateway if its name contains one or more single quotation marks. For example, a sheet name Pump's Detail is invalid as per the ODBC standards. However, you can still process spreadsheets with other special characters in their name.

Column Filter: Specify the column names and provides the filter to select particular column data for processing. Click Select to select all the columns.
Query Generator: Type the query in this box.
- Generate Query: Click this box to generate the query to fetch the data. Query is based on your selection criteria.
- Edit Query: Click this box to edit the query.
  
  Example of First Populated Row:
  
  Scenario 1:
  
  The XLS input file has a structure where general content is provided in the first row without any tabular data content.

When you click Import Schema, then the Heading rows (ID, Class, PM Summary) will not get extracted as expected. Because ODBC searches only the Data Row before the Heading Row. Hence, Import Schema option, which helps extract the schema of the sheet, will extract columns as F1, F2, F3 in the Column Filter. Hence, to process such files, you must select the required Heading Rows as F1, F2, F3 and so on and write the query manually, for example, SELECT [F1],[F2] FROM [Sheet1$] in the Query Generator.

Also you must specify the sheet structure details as shown in the above scenario:
- First Populated Row = 1
  - Heading Row = 6
  - First Data Row = 7
  - Last Data Row = -1 (as it applies to any extent of data row.)
    
    When you execute the query, then the Object ID within the specified limit will be processed.
    
    Scenario 2:
    
    If the XLS input file has the following structure, having empty content in the first few rows:

Then in this case Import Schema can extract the Heading Rows successfully and these rows will be displayed in the Column Filter. Because the rows before the Heading Row are empty, ODBC skips these rows while processing. ODBC returns 5th, 6th, 7th rows as 1st, 2nd, 3rd as actual first 4 rows are empty.

Also you must specify the Sheet Structure Details as shown in the above scenario:
- First Populated Row = 5
  - Heading Row = 5
  - First Data Row = 6
  - Last Data Row = -1 (as it applies to any extent of data row.)
    
    When you execute the query, then the Object ID within the specified limit will be processed.
    
    Note: The Import Schema is not dependent on Sheet Structure Details. You can populate the Heading Row as per the First Data Row and Last Data Row.

Reveiw from here: https://dev.azure.com/AVEVA-VSTS/Interoperability/_workitems/edit/4286351/

Note: If an Excel file has column names that look the same except for differences in upper/lower case, all of them will be extracted. To avoid confusion, Excel automatically renames the extra ones (for example: Tag, TAG1, and taG2).

Processing Multiple Worksheets

This feature enables you to process multiple worksheets using a single extract configuration.

Using the Pattern mapping option, you can include/exclude patterns from the Extract section. The Extractor uses these patterns to select sheets from the available set of sheets in the Excel file. This subset of sheets is then used to further extract data. You can have the following four scenarios to process the multiple worksheets:
- Scenario 1: When you select a new Excel file (Default scenario)
  
  When you select a new Excel file all the old data in the user interface screen stored previously is cleared. When you click Connect, the All check box is selected by default along with all the sheets in the All Sheets combo box. The Include Pattern is automatically generated and displays *ALL_WORKSHEETS* and the Exclude Pattern is empty by default. When you select more than one sheet then the Import Schema button is disabled along with the Column Filter and Query Generator sections. You can only perform Save Settings or Discard Settings and cannot generate a query.
- Scenario 2: When you select all the sheets
  
  When you select the All check box or individually select all the sheets present in the combo box, it will automatically select the All check box and set the Include Pattern to *ALL_WORKSHEETS* and the Exclude Pattern to empty.
- Scenario 3: When you open a project containing "Include Pattern" and "Exclude Pattern" in the Configuration
  
  You can also open a project with previously defined Include Pattern and Exclude Pattern options that are saved in the configuration file. These values are used to populate the Sheets comb box on screen. The sheets will also be selected according to Include and Exclude Pattern.
- Scenario 4: When you have set IncludePattern ="*ALL_WORKSHEETS*" and ExcludePattern="^xyz"
  
  You have set IncludePattern ="*ALL_WORKSHEETS*" and ExcludePattern="^xyz" (as a random pattern). If no sheets match ExcludePattern, then it will display the sheets according to IncludePattern. The following image displays sheets as per the IncludePattern:

Note: If the spreadsheet has named ranges defined, for example #PrintArea, these can be selected as if a sheet, and only the data in the selected named range will be extracted.

Configuration Attributes in Excel Configuration:

includePattern *required*: Describes the pattern used to match the sheets that will be included in the extraction process.
- valid inputs: Any valid regular expression pattern, as well as *ALL_WORKSHEETS* which extracts every drawing present in the sheet.
excludePattern *optional*: Describes the pattern used to match the sheets that will be excluded from the extraction process.
- valid inputs: Any valid regular expression pattern.

If the selected Excel document EquipmentList.xls contains sheets - Sheet1, Sheet2, Sheet3, Test1, Test2, Test3, then the following are the scenarios of extraction:

To Extract single sheet, you can use the following configuration:

<Excel hasHeader="true">
<Input source="FileSystem">
<FileSystem>
<InputPath>D:\Tabular\Input\EquipmentList.xls</InputPath>
</FileSystem>
</Input>
<Sheet includePattern="^Test1$"/>
</Excel>

^Test1$ is the exact pattern which will match the name of exact sheet.
To Extract all sheets, you can use the following configuration:

<Excel hasHeader="true">
<Input source="FileSystem">
<FileSystem>
<InputPath>D:\Tabular\Input\EquipmentList.xls</InputPath>
</FileSystem>
</Input>
<Sheet includePattern="*ALL_WORKSHEETS*"/>
</Excel>

"*ALL_WORKSHEETS*" indicates that the user wants to extract every sheet present in the Excel document.
Extract subset with include pattern

You can use the pattern that matches the subset of sheets that you want to extract. For example, if you want to extract all the sheets which start with the names Test [Test1, Test2, Test3], then use an includePattern="^Test":

<Excel hasHeader="true">
<Input source="FileSystem">
<FileSystem>
<InputPath>D:\Tabular\Input\EquipmentList.xls</InputPath>
</FileSystem>
</Input>
<Sheet includePattern="^Test"/>
</Excel>
Extract subset with exact matching pattern

You can extract the sheets by providing exact matching pattern for includePattern. For example, if you want to extract data from sheets [Test1, Test2, Sheet1], then use includePattern="^Test1$|^Test2$|^Sheet1$":

<Excel hasHeader="true">
<Input source="FileSystem">
<FileSystem>
<InputPath>D:\Tabular\Input\EquipmentList.xls</InputPath>
</FileSystem>
</Input>
<Sheet includePattern="^Test1$|^Test2$|^Sheet1$"/>
</Excel>

This will match the exact sheet names with all the desired sheets in the Excel document.
Extract subset with exclude pattern

You can use exclude pattern to extract subset when you want to discard some specific sheet(s) from the extraction process. Any sheet name that matches the pattern present in exclude pattern will be removed from the subset.

<Excel hasHeader="true">
<Input source="FileSystem">
<FileSystem>
<InputPath>D:\Tabular\Input\EquipmentList.xls</InputPath>
</FileSystem>
</Input>
<Sheet includePattern="*ALL_WORKSHEETS*" excludePattern="^Sheet2$"/>
</Excel>

includePattern="*ALL_WORKSHEETS*" excludePattern="^Sheet2$"

This configuration describes a pattern that extracts everything from the Excel document excluding Sheet2 from the process.

Customized extraction (extracting columns selectively) is not supported from a single configuration element. To achieve this, you can use the following configuration:

<Excel hasHeader="true">
<Input source="FileSystem">
<FileSystem>
<InputPath>D:\Tabular\Input\EquipmentList.xls</InputPath>
</FileSystem>
</Input>
<Query>SELECT [Column1],[Column2] from [Sheet1$]</Query>
</Excel>

<Excel hasHeader="true">
<Input source="FileSystem">
<FileSystem>
<InputPath>D:\Tabular\Input\EquipmentList.xls</InputPath>
</FileSystem>
</Input>
<Query>SELECT [Column1],[Column2] from [Sheet2$]</Query>
</Excel>

<Excel hasHeader="true">
<Input source="FileSystem">
<FileSystem>
<InputPath>D:\Tabular\Input\EquipmentList.xls</InputPath>
</FileSystem>
</Input>
<Query>SELECT [Column1],[Column2] from [Sheet3$]</Query>
</Excel>

This configuration allows you to only extract selected columns from the source sheet.

Incremental Scan
- Required: If you select this option, the Gateway keeps track of the last scan date by storing it in a configuration file in the same location where the project configuration file exists. When the Gateway is used for the first time, a text file is created. For each subsequent run, it reads the text file and uses the last scan date to select only those rows that have been updated since that date. This function relies on one of the columns in the file to store the date-time when the row was created or updated.
- DateTime Column: Use this option to build the queries for a specific date-time.
- Last Scanned On: Use this option to set the last scan date-time. By default, this field is automatically filled with the latest scan.
Discard Settings: If you do not require those settings any longer, select this box to discard the settings.
Save Settings: Click this to save the Extract Excel settings options.

AVEVA™ Gateway for 1D Data

Excel

Table of Contents

Excel

For FileSystem Input Source

For S3 Input Source

Select Sheet

Sheet Structure Details

Define the following elements of the Source data structure:

In This Topic

Related Links