Masks and wildcards for extracting data

Last Updated Jan 03, 2025

You split a message into fields based on the position of the field, and you use masks and wildcards to extract the required data.

Characters that are used to specify masks:

Character	Matches
?	Any single character.
*	Zero or more characters.
#	Any single digit (0 - 9).
[character string]	Any single character in character string. Must be enclosed in square brackets.
[!character string]	Any single character not in character string. Must be enclosed in square brackets.
( )	Indicates the data to be extracted into the field.
\	Escape character. To match a character that is otherwise used as a filter token, for example a question mark, precede the character with a backslash.
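
As a rough illustration of how the mask characters in the table combine, the following Python sketch translates a mask into a regular expression. It is not the connector's implementation: the function names mask_to_regex and extract are hypothetical, the mask is assumed to cover the whole line, and treating * as a shortest match is an assumption.

```python
import re

def mask_to_regex(mask):
    """Approximate a mask with a Python regular expression (illustrative only)."""
    out = []
    i = 0
    while i < len(mask):
        ch = mask[i]
        if ch == "\\" and i + 1 < len(mask):
            out.append(re.escape(mask[i + 1]))  # escaped token is matched literally
            i += 2
            continue
        if ch == "?":
            out.append(".")        # any single character
        elif ch == "*":
            out.append(".*?")      # zero or more characters (shortest match assumed)
        elif ch == "#":
            out.append("[0-9]")    # any single digit
        elif ch == "[":
            end = mask.index("]", i)  # raises ValueError if the bracket is not closed
            body = mask[i + 1:end]
            negate = body.startswith("!")
            if negate:
                body = body[1:]
            out.append("[" + ("^" if negate else "") + re.escape(body) + "]")
            i = end + 1
            continue
        elif ch in "()":
            out.append(ch)         # parentheses mark the data to extract
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)

def extract(mask, line):
    """Return the text captured by ( ), or None if the line does not match."""
    m = re.fullmatch(mask_to_regex(mask), line)
    return m.group(1) if m else None

print(extract("ID-(###)*", "ID-042 ok"))   # 042
print(extract("*\\?(*)", "what?yes"))      # yes
```

The sketch takes only the mask text, that is, the part between the double quotes of a FIELD definition.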

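To extract data based on starting and ending position, specify the range using the format Cn - Cm, where n and m are the starting and ending character positions and C1 is the first character of the input line. For more flexibility, you can combine masks and wildcards with position specifiers.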

The following examples extract data from messages into fields:

Example	Description
FIELD(1) = C1 - C10	Extract the first ten characters from the input line.
FIELD(2) = C11 - C11(",")	Extract the field that starts at character 11 and ends before the next comma.
FIELD(3) = C11(",") - (",")	Extract the field that starts after the first comma after character 11 and ends before the next comma.
FIELD(4) = C31 - C41("[;,:]")	Extract the characters starting at position 31 up to (but not including) the first semicolon, comma, or colon after position 41.
FIELD(5) = C51 - C51("[!0123456789]")	Extract characters starting at position 51 up to (but not including) the first non-numeric character after position 51.
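
The position-based examples above can be mimicked in Python to check what a definition should return for a sample line. This is a minimal sketch, not the connector's logic: the helper names slice_columns and from_column_until are hypothetical, and returning the rest of the line when no stop character is found is an assumption.

```python
def slice_columns(line, start, end):
    """Cn - Cm: 1-based, inclusive character positions."""
    return line[start - 1:end]

def from_column_until(line, start, is_stop):
    """Start at 1-based column `start` and stop before the first character
    for which is_stop(character) is true."""
    begin = start - 1
    for i in range(begin, len(line)):
        if is_stop(line[i]):
            return line[begin:i]
    return line[begin:]  # assumption: no stop character means the rest of the line

line = "2025-01-03TANK01,42.5,OK"
print(slice_columns(line, 1, 10))                          # 2025-01-03
print(from_column_until(line, 11, lambda c: c == ","))     # TANK01
print(from_column_until("ab123xyz", 3, lambda c: c not in "0123456789"))  # 123
```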

For input that is formatted as orthogonal matrices of rows and columns in comma-separated values (.csv) files, the simplest way to extract and assign values to individual fields without specifying numeric start and end positions is to define the matrix with the following structure. In this example, the separator is a semicolon, and the construct expects exactly three columns in the input file.

FIELD(1)=["(*);*;*"]
FIELD(2)=["*;(*);*"]
FIELD(3)=["*;*;(*)"]

The white space characters, space and tab, can be used as separators. For example:

FIELD(1)=["(*) * *"]
FIELD(2)=["* (*) *"]
FIELD(3)=["* * (*)"]

In cases where there are many separators in the .csv file, you can reduce the effort of defining the matrix by using a diagonal matrix. For example:

FIELD(1)=["(*);*"]
FIELD(2)=["*;(*);*"]
FIELD(3)=["*;*;(*);*"]
FIELD(4)=["*;*;*;(*);*"]
...

The final asterisk in each field definition matches everything after the extracted field, including any remaining separators, which helps avoid errors caused by missed separators.
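
For files with many columns, the diagonal definitions follow a regular pattern and can be generated rather than typed by hand. The following Python sketch prints them; the function name diagonal_fields and its parameters are hypothetical, and the output reproduces only the pattern shown above.

```python
def diagonal_fields(count, sep=";"):
    """Build FIELD(1)..FIELD(count) in the diagonal form shown above."""
    lines = []
    for n in range(1, count + 1):
        # n - 1 skipped columns, then the captured column, then a final wildcard
        mask = ("*" + sep) * (n - 1) + "(*)" + sep + "*"
        lines.append('FIELD({})=["{}"]'.format(n, mask))
    return lines

for definition in diagonal_fields(4):
    print(definition)
# FIELD(1)=["(*);*"]
# FIELD(2)=["*;(*);*"]
# FIELD(3)=["*;*;(*);*"]
# FIELD(4)=["*;*;*;(*);*"]
```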

To extract a double-quoted field and strip the quotes, use a backslash to escape the quotes:

FIELD(4) = ["*,*,*\"(*)\""]
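
To see the intended effect of the quoted-field mask on a sample line, the same pattern can be approximated with a Python regular expression. This is only a sketch: the name extract_quoted is hypothetical, and the shortest-match behavior of * and whole-line matching are assumptions.

```python
import re

# Regex approximation of the mask *,*,*\"(*)\" : the quotes are matched
# literally and only the text between them is captured.
QUOTED = re.compile(r'.*?,.*?,.*?"(.*?)"')

def extract_quoted(line):
    """Return the text between the double quotes, or None if there is no match."""
    m = QUOTED.fullmatch(line)
    return m.group(1) if m else None

print(extract_quoted('a,b,c,"hello world"'))  # hello world
print(extract_quoted('a,b,c,hello'))          # None
```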
