Use of the tunable parameters in the UVCONFIG file

October 5, 2012

Question

How do I use the tunable parameters in the UVCONFIG file?

Answer

The most commonly used parameters in the UVCONFIG file are as follows:

MFILES

This parameter defines the size of the server engine (DSEngine) rotating file pool. This is a per process pool for files such as sequential files that are opened by the DataStage server runtime. It does not include files opened directly at OS level by the parallel engine (PXEngine running osh).

The server engine will logically open and close files at the DataStage application level and physically close them at the OS level when the need arises.

Increase this value if DataStage jobs use a lot of files. Generally, a value of around 250 is suitable. If the value is set too low, performance issues may occur, as the server engine will make more calls to open and close files at the physical OS level in order to map the logical pool to the physical pool.

NOTE: The OS parameter nofiles needs to be set higher than MFILES; ideally, nofiles should be at least 512.

This allows the DataStage process to open up to 512 - (MFILES + 8) files.
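As a rough sketch of how to check where a system stands, the commands below can be run on the engine tier (the grep pattern is an assumption about the uvconfig file layout, and uvconfig shows the configured value, which only takes effect after the engine configuration is regenerated and the engine restarted). For example, with nofiles at 512 and MFILES at its pre-8.5 default of 150, the formula above gives 512 - (150 + 8) = 354 files available to the process.

ulimit -n                          # current per-process nofiles limit for this shell
grep "^MFILES" $DSHOME/uvconfig    # configured MFILES value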

On most UNIX systems, the proc file system can be used to monitor the file handles opened by a given process; for example:

ps -ef|grep dsrpcd

root     23978     1  0 Jul08 ?        00:00:00 /opt/ds753/Ascential/DataStage/DSEngine/bin/accdsrpcd

ls -l /proc/23978/fd

lrwx------  1 root dstage 64 Sep 25 08:24 0 -> /dev/pts/1 (deleted)
l-wx------  1 root dstage 64 Sep 25 08:24 1 -> /dev/null
l-wx------  1 root dstage 64 Sep 25 08:24 2 -> /dev/null
lrwx------  1 root dstage 64 Sep 25 08:24 3 -> socket:[12928306]

So the dsrpcd process (23978) has four files open.

T30FILE

This parameter determines the maximum number of dynamic hash files that can be opened system-wide on the DataStage system.

If this value is too low, expect to find an error message similar to ‘T30FILE table full’.

The following engine command, executed from $DSHOME, shows the number of dynamic files in use:

echo "`bin/smat -d|wc -l` - 3"|bc

Use this command to assist with tuning the T30FILE parameter.
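A rough monitoring sketch, assuming $DSHOME is set and dsenv has been sourced (the 60-second interval is arbitrary): sample the count periodically while a representative job mix is running and compare the peak against the configured T30FILE value.

cd $DSHOME
. ./dsenv
while true
do
    echo "`bin/smat -d|wc -l` - 3"|bc    # number of dynamic hashed files currently open
    sleep 60
done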

A technote has already been issued on this topic (www.ibm.com/support/docview.wss?uid=swg21390117).

Every running DataStage job requires at least 3 slots in this table (RT_CONFIG, RT_LOG, RT_STATUS). Note, however, that multi-instance jobs share slots for these files, because although each job run instance creates a separate file handle, this just increments a usage counter in the table if the file is already open to another instance.

Note that on AIX the T30FILE value should not be set higher than the system setting ulimit -n.

GLTABSZ

This parameter defines the size of a row in the group lock table.

Tune this value if the number of group locks in a given slot is getting close to the value defined.

Use the LIST.READU EVERY command from the server engine shell to assist with monitoring this value. LIST.READU lists the active file and record locks; the EVERY keyword lists the active group locks in addition.
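A minimal sketch of running it non-interactively, assuming you are in $DSHOME with dsenv sourced and that the engine shell binary is bin/uvsh, as on typical installs (the exact report layout varies by release):

cd $DSHOME
. ./dsenv
echo "LIST.READU EVERY" | bin/uvsh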

For example, with a Designer client and a Director client both logged in to a project named “dstage0”, the LIST.READU EVERY report includes the following columns:

  • Device: A number that identifies the logical partition of the disk where the file system is located.
  • Inode: A number that identifies the file that is being accessed.
  • Netnode: A number that identifies the host from which the lock originated. 0 indicates a lock on the local machine, which will usually be the case for DataStage. If other than 0, then on UNIX it is the last part of the TCP/IP host number specified in the /etc/hosts file; on Windows it is either the last part of the TCP/IP host number or the LAN Manager node name, depending on the network transport used by the connection.
  • Userno: The phantom process that set the lock.
  • Pid: A number that identifies the controlling process.
  • Item-ID: The record ID of the locked record.
  • Lmode: The number assigned to the lock, and a code that describes its use.

When the report describes file locks (not shown here), it contains the following additional information:

  • Lmode codes describe shared or exclusive file locks, rarely seen in normal DataStage use:

    FS, IX, CR: Shared file locks.
    FX, XU, XR: Exclusive file locks.

When the report describes group locks, it contains the following additional information. The Lmode codes are:

EX: Exclusive lock.
SH: Shared lock.
RD: Read lock.
WR: Write lock.
IN: System information lock.

  • G-Address: Logical disk address of the group, or its offset in bytes from the start of the file, in hex.
  • Record Locks: The number of locked records in the group.
  • Group RD: Number of readers in the group.
  • Group SH: Number of shared group locks.
  • Group EX: Number of exclusive group locks.

When the report describes record locks, it contains the following additional information. The Lmode codes are:

RL: Shared record lock.
RU: Update record lock.

RLTABSZ

This parameter defines the size of a row in the record lock table.

From a DataStage job point of view, this value affects the number of concurrent DataStage jobs that can be executed, and the number of DataStage Clients that can connect.

Use the LIST.READU command from the DSEngine shell to monitor the number of record locks in a given slot (see previous section for use of the EVERY keyword).

For example, consider running LIST.READU with one Director client logged in to a project named “dstage0” and two instances of a job in that project running.

In the resulting report, an Item-ID of RT_CONFIG456 identifies that the running job is an instance of job number 456, whose compiled job file is locked while the instance is running so that, for example, it cannot be re-compiled during that time. A job’s number within its project can be seen in the Director job status view (Detail dialog) for a particular job.

The unnamed column between Userno and Lmode is the row number within the record lock table. Each row can hold RLTABSZ locks. In the example described, 3 slots out of 75 (the default value for RLTABSZ) were in use for row 62. When the number of entries for a given row gets close to the RLTABSZ value, it is time to consider re-tuning the system.

Jobs can fail to start, or can generate -14 errors, if the RLTABSZ limit is being reached.

DataStage Clients may see an error message similar to ‘DataStage Project locked by Administrator’ when attempting to connect. Note that the error message can be misleading – it means in this case that a lock cannot be acquired because the lock table is full, and not because another user already has the lock.

MAXRLOCK

This should always be set to the value of RLTABSZ – 1.

Each DSD.RUN process takes a record lock on a key named <project>&!DS.ADMIN!& in the UV.ACCOUNT file in $DSHOME (as seen in the examples above). Each DataStage client connection (for example, Designer, Director, Administrator, or the dsjob command) takes this record lock as well. This is the mechanism by which DataStage determines whether operations such as project deletion are safe; such operations cannot proceed while the lock is held by any process.
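A rough way to gauge how much of the record lock table these locks consume (a sketch only; it assumes the engine shell binary is bin/uvsh, that the report prints one line per record lock, and that the &!DS.ADMIN!& key appears verbatim in the Item-ID column):

cd $DSHOME
. ./dsenv
echo "LIST.READU" | bin/uvsh | grep -c "DS.ADMIN"    # roughly one lock per running job, sequence, or client connection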

MAXRLOCK needs to be set to accommodate the maximum number of jobs and sequences, plus client connections, that will be in use at any given time, and RLTABSZ needs to be set to MAXRLOCK + 1.

Keep in mind that changing RLTABSZ greatly increases the amount of memory needed by the disk shared memory segment.

Current Recommended Settings

Customer Support has reported in the past that settings of 130/130/129 (for RLTABSZ/GLTABSZ/MAXRLOCK, respectively) work successfully on most customer installations. There have been reports of high-end customers using settings of 300/300/299, so this is environment specific.

If sequencers or multi-instance jobs are used, start with the recommended settings of 130/130/129, and increase to 300/300/299 if necessary.

Prior to DataStage v8.5 the following settings were pre-defined:

  • MFILES = 150
  • T30FILE = 200
  • GLTABSZ = 75
  • RLTABSZ = 75
  • MAXRLOCK = 74 (that is, 75 - 1)

DataStage v8.5 has the following settings pre-defined:

  • MFILES = 150
  • T30FILE = 512
  • GLTABSZ = 75
  • RLTABSZ = 150
  • MAXRLOCK = 149 (that is, 150 - 1)

However, note that these are the lowest suggested values to accommodate all system configurations, so tuning of these values is often necessary.
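For reference, a rough outline of the usual sequence for changing these values on UNIX is shown below. Treat it as a sketch and verify the exact commands against the documentation for your release; uvconfig changes only take effect after a regen and an engine restart, and no jobs or clients should be connected while this is done.

cd $DSHOME
. ./dsenv
bin/uv -admin -stop      # stop the server engine
vi uvconfig              # adjust MFILES, T30FILE, GLTABSZ, RLTABSZ, MAXRLOCK as required
bin/uvregen              # regenerate the binary configuration from the edited uvconfig
bin/uv -admin -start     # restart the server engine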

DMEMOFF, PMEMOFF, CMEMOFF, NMEMOFF

These are the shared memory address offset values for each of the four DataStage shared memory segments (Disk, Printer, Catalog, NLS). Depending on the platform, PMEMOFF, CMEMOFF, and NMEMOFF may need to be increased to allow a large disk shared memory segment to be used.

Where these values are set to 0x0 (on AIX, for example), the OS takes care of managing these offsets. Otherwise, PMEMOFF minus DMEMOFF determines the largest possible disk shared memory segment size.

Additionally, on Solaris for example, these values may need to be increased to allow a greater heap size for the running DataStage job.

Note that when running the shmtest utility, great care must be taken in interpreting its output. The utility tests the amount of memory that it can allocate at the time it runs, and this is affected by the current uvconfig settings, how much shared memory is already in use, and other activity on the machine at the time.

InfoSphere DataStage parallel jobs: Understanding the content of the APT_DUMP_SCORE report

September 20, 2012

Question

How do I read and interpret the results of the APT_DUMP_SCORE report?

Answer

The APT_DUMP_SCORE report documents activity within the InfoSphere Information Server parallel engine. The parallel engine is a runtime engine that uses a combination of interpreted and compiled code to extract, transform, and load data. The InfoSphere DataStage Designer enables you to specify the data flow and processing steps in a job without having to specify the degree and nature of the parallel processing that happens at job run time.

A configuration file (APT_CONFIG_FILE) specifies the nature and amount of parallelism, and the specific resources that are used to run a job. When a job is to be executed, the data flow information in the compiled job is combined with the configuration file to produce a detailed execution plan called the score. The score is useful in analyzing job performance and diagnosing problems in jobs. If the user sets the environment variable APT_DUMP_SCORE, a text representation of the score (a report) is written to the job’s log.

The job score that populates the report includes information about all operators in the job, including operators that are automatically inserted by the engine at run time. The score shows how many nodes each operator runs on, and the anticipated flow of the job as it goes from operator to operator through internal data sets.

To see the score in the job log, enable APT_DUMP_SCORE (by defining it and supplying a value) at the job level, the project level, or the system level. Enabling APT_DUMP_SCORE at the system level results in all parallel jobs producing a dump score report, and is ideal for development and quality assurance environments.
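For a one-off run, one hedged alternative to changing project defaults is to pass the variable as a job parameter, assuming $APT_DUMP_SCORE has been added to the job's parameters as an environment variable parameter (the project name dstage1 and the job name MyParallelJob are placeholders):

. $DSHOME/dsenv
$DSHOME/bin/dsjob -run -param '$APT_DUMP_SCORE=True' -wait dstage1 MyParallelJob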

This document provides a starting point for understanding how your job works and for analyzing job issues.

The following score example comes from a fairly small job. When you enable APT_DUMP_SCORE and then run the job, you typically see text like the following in the log:

main_program: This step has 10 datasets:
ds0: {op0[1p] (sequential PacifBaseMCES)
eOther(APT_ModulusPartitioner { key={ value=MBR_SYS_ID }
})<>eCollectAny
op1[4p] (parallel RemDups.IndvIDs_in_Sort)}
ds1: {op1[4p] (parallel RemDups.IndvIDs_in_Sort)
[pp] eSame=>eCollectAny
op2[4p] (parallel RemDups)}
ds2: {op2[4p] (parallel RemDups)
[pp] eSame=>eCollectAny
op6[4p] (parallel buffer(0))}
ds3: {op3[1p] (sequential PacifGalaxyMember)
eOther(APT_ModulusPartitioner { key={ value=MBR_SYS_ID }
})<>eCollectAny
op4[4p] (parallel IndvIdJoin.toIndvIdJoin_Sort)}
ds4: {op4[4p] (parallel IndvIdJoin.toIndvIdJoin_Sort)
eOther(APT_HashPartitioner { key={ value=MBR_SYS_ID }
})#>eCollectAny
op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)}
ds5: {op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)
[pp] eSame=>eCollectAny
op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)}
ds6: {op6[4p] (parallel buffer(0))
[pp] eSame=>eCollectAny
op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)}
ds7: {op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)
[pp] eAny=>eCollectAny
op8[4p] (parallel APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 in SplitTran2)}
ds8: {op8[4p] (parallel APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 in SplitTran2)
eSame=>eCollectAny
op9[4p] (parallel buffer(1))}
ds9: {op9[4p] (parallel buffer(1))
>>eCollectOther(APT_SortedMergeCollector { key={ value=MBR_SYS_ID,
subArgs={ asc }
}
})
op10[1p] (sequential APT_RealFileExportOperator in HP37_OvaWestmember_extract_dat)}
It has 11 operators:
op0[1p] {(sequential PacifBaseMCES)
on nodes (
node1[op0,p0]
)}
op1[4p] {(parallel RemDups.IndvIDs_in_Sort)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
node4[op1,p3]
)}
op2[4p] {(parallel RemDups)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
node4[op2,p3]
)}
op3[1p] {(sequential PacifGalaxyMember)
on nodes (
node2[op3,p0]
)}
op4[4p] {(parallel IndvIdJoin.toIndvIdJoin_Sort)
on nodes (
node1[op4,p0]
node2[op4,p1]
node3[op4,p2]
node4[op4,p3]
)}
op5[4p] {(parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)
on nodes (
node1[op5,p0]
node2[op5,p1]
node3[op5,p2]
node4[op5,p3]
)}
op6[4p] {(parallel buffer(0))
on nodes (
node1[op6,p0]
node2[op6,p1]
node3[op6,p2]
node4[op6,p3]
)}
op7[4p] {(parallel APT_JoinSubOperatorNC in IndvIdJoin)
on nodes (
node1[op7,p0]
node2[op7,p1]
node3[op7,p2]
node4[op7,p3]
)}
op8[4p] {(parallel APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 in SplitTran2)
on nodes (
node1[op8,p0]
node2[op8,p1]
node3[op8,p2]
node4[op8,p3]
)}
op9[4p] {(parallel buffer(1))
on nodes (
node1[op9,p0]
node2[op9,p1]
node3[op9,p2]
node4[op9,p3]
)}
op10[1p] {(sequential APT_RealFileExportOperator in HP37_OvaWestmember_extract_dat)
on nodes (
node2[op10,p0]
)}
It runs 35 processes on 4 nodes.

The dump score contains two sections — the data sets (DS) and the operators (OP).
Data sets – The data sets that are listed in the score are the same type of data sets that you create with the Data Set stage — in this context, they are temporary memory and/or disk storage during the job’s run.
Operators – Operators are individual parallel engine stages that you might see on the user interface.

In a typical job flow, operators are end-points, and data sets are the links between the operators. (An exception is when data sets are used to actually output to a file.)

Each and every link in the job design is potentially a data set. However, unlike the Data Set stage, which writes to the resource disk pools associated with the specified node pool in the job's configuration file (APT_CONFIG_FILE), these data sets are held in memory. These temporary data sets are only placed in the scratch disk space when an imposed limit is reached; a limit can be imposed by environment settings or by physical memory limitations.

Each operator listed in the score spawns a number of processes that depend on the following (a quick check of the example's process count follows this list):

  • the job's configuration file (APT_CONFIG_FILE), constrained by the node pool settings
  • the operator configuration in the parallel engine code
  • several environment variables, such as APT_DISABLE_COMBINATION, being set or unset
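As a check, the per-operator partition counts in the example score (1 + 4 + 4 + 1 + 4 + 4 + 4 + 4 + 4 + 4 + 1 for op0 through op10) add up to 35, which matches the score's closing line, “It runs 35 processes on 4 nodes.”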

First, let us focus on the operators, which are listed after the data sets in the score:

op0[1p] {(sequential PacifBaseMCES)
on nodes (
node1[op0,p0]
)}
op1[4p] {(parallel RemDups.IndvIDs_in_Sort)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
node4[op1,p3]
)}

In the preceding example, the two operators are op0 and op1. The operator name is prefixed with “op” and suffixed with an incremental numeric value starting at zero (0). Next to the operator name there is a bracketed value followed by the letter “p”, for example “[1p]”. The value indicates the number of partitions given to that operator by the engine. The first operator is given only one (1) partition, and the second operator is given four (4) partitions.

Within the curly brackets, the execution mode (“parallel” or “sequential”) and the name of that operator are shown. The operator name is based on the stage name shown on the parallel canvas in the Designer client. The operator name is not the same as the operator type.

In the preceding example, the first operator is listed as “PacifBaseMCES”, which is the stage name in its entirety. The second operator, however, is listed as “RemDups.IndvIDs_in_Sort”. The stage name “IndvIDs” is renamed to indicate that the sort process triggered by the Remove Duplicates stage occurred.

Following each operator name are the specific nodes that the operator is tagged to run on. In the preceding example, node1 is used for the first operator, and node1, node2, node3, and node4 are used for the second operator. The node names are defined in your configuration file (APT_CONFIG_FILE).

Now let us focus on the data sets:

ds0: {op0[1p] (sequential PacifBaseMCES)
eOther(APT_ModulusPartitioner { key={ value=MBR_SYS_ID }
})<>eCollectAny
op1[4p] (parallel RemDups.IndvIDs_in_Sort)}
ds1: {op1[4p] (parallel RemDups.IndvIDs_in_Sort)
[pp] eSame=>eCollectAny
op2[4p] (parallel RemDups)}

The name of the data set is provided first. Within the curly brackets there are three parts:

  • the source of the data set (for ds0, operator op0, sequential PacifBaseMCES)
  • the partitioning and collection activity applied to the data set
  • the target of the data set (for ds0, operator op1, parallel RemDups.IndvIDs_in_Sort; ds1 in turn flows from op1 into op2, parallel RemDups)

The source and target are usually operators, although you might see a specific file name provided that indicates that the operator is referencing and reading from a physical data set.

The first data set, ds0, partitions the data from the first operator (“op0”, running in 1 partition). The data set uses APT_ModulusPartitioner (sometimes referred to as modulus) to partition. (Note: APT means Advanced Parallel Technology.) In this scenario, the modulus partitioning uses the key field MBR_SYS_ID. The partitioned data is sent to the second operator (“op1”, running in 4 partitions). Logically, this means that the data is being split into 4 partitions using the modulus method.
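To make the modulus method concrete: each row is assigned to partition number (key value mod number of partitions). Assuming MBR_SYS_ID is an integer, a row with an illustrative MBR_SYS_ID of 10 in this 4-partition job would land in partition 10 mod 4 = 2.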

The second data set, “ds1”, reads from the second operator (“op1”, running in 4 partitions). It uses the eSame method to partition and sends the data to the third operator (“op2”, running in 4 partitions). One special note: the value “[pp]” means preserved partitioning. Preserve partitioning is an option that is usually set by default in the GUI. If the data must be repartitioned, the “[pp]” flag is overridden and a warning message is logged.

In the example for the first data set, you see “eOther” and “eCollectAny”. These are the input partitioning method and the target read method; the second indicates how the receiving operator collects the data.

In this example, “eOther” is the originating or input method for op0. It indicates that something is being imposed outside the expected partitioning option, so you need to look at the string within the parentheses (APT_ModulusPartitioner in this example, meaning modulus partitioning is imposed).

“eCollectAny” is the target read method. Any records that are fed to this data set are collected in a round robin manner. The round robin behavior is less significant than the input partitioning method, which is eOther(APT_ModulusPartitioner) for ds0.

In the first example in this document, where the operator feeding ds9 uses APT_SortedMergeCollector, the “eCollectOther” method indicates where the actual collection occurs; it is usually seen when data is being written to a sequential target such as a flat file. Shown again, in part, is the example:

ds8: {op8[4p] (parallel APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 in SplitTran2)
eSame=>eCollectAny
op9[4p] (parallel buffer(1))}
ds9: {op9[4p] (parallel buffer(1))
>>eCollectOther(APT_SortedMergeCollector { key={ value=MBR_SYS_ID,
subArgs={ asc }

The symbol between the originating partitioning method and the target read method indicates the parallelism of the partitioning. The following is the list of symbols and their definitions:

-> Sequential to Sequential
<> Sequential to Parallel
=> Parallel to Parallel (SAME)
#> Parallel to Parallel (NOT SAME)
>> Parallel to Sequential
> No Source or Target

In summary:

Operator “op0” runs first in sequential mode on node “node1” and sends data to data set “ds0”. Data set “ds0” is partitioned using the modulus method, going from sequential to parallel (4 ways), and is sent to operator “op1”, which runs in parallel mode on “node1”, “node2”, “node3”, and “node4”. Operator “op1” then processes the collected data and sends the results to data set “ds1”. Data set “ds1” simply provides the data to operator “op2” in the same partitioning order it had for operator “op1”.

This is the basic concept of how the parallel engine works at runtime, and what is shown in the dump score report.

* * *

The dump score report also can tell you when the parallel engine inserted an operator based on its internal analysis of each operator’s requirements. For example, Join stages require that the data be sorted, but the GUI does not require that you supply the sort details. The engine is intelligent enough to realize that a sort is required, and supplies the requirement itself. You can see the behavior in the following example:

op5[4p] {(parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)
on nodes (
node1[op5,p0]
node2[op5,p1]
node3[op5,p2]
node4[op5,p3]
)}

In the example, tsort is the name of the sort operator being used. As part of this insertion, the data is also repartitioned, using hash partitioning on the same key (MBR_SYS_ID):

ds4: {op4[4p] (parallel IndvIdJoin.toIndvIdJoin_Sort)
eOther(APT_HashPartitioner { key={ value=MBR_SYS_ID }
})#>eCollectAny
op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)}

All of the partitioning and sorting shown in the preceding example is done for the benefit of the Join stage:

* * *

ds5: {op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)
[pp] eSame=>eCollectAny
op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)}

[…]

op7[4p] {(parallel APT_JoinSubOperatorNC in IndvIdJoin)
on nodes (
node1[op7,p0]
node2[op7,p1]
node3[op7,p2]
node4[op7,p3]
)}

* * *

If you have a keen eye, you might have already noticed a problem with this particular dump score report: one of the two input links for the Join stage (“op7”) is partitioned using the modulus method (“ds0”), while the other input link is partitioned using hash partitioning (“ds4”). The hash partitioning overrode the initial modulus partitioning request (“ds3”).

The first modulus insertion was overridden because the engine detected that the job design did not supply the required fields. Quite frequently, the key fields are supplied in the wrong order, or the job uses different key fields that break the compatibility of the data order requirements for the downstream stages. It is important to review the APT_DUMP_SCORE report and confirm that your job design is interpreted correctly by the parallel engine, and that your intended design is correctly implemented.
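A quick, informal way to spot engine-inserted sorts and partitioners is to save the score text from the job log to a file (score.txt below is just a placeholder name) and grep it:

grep -E "inserted tsort operator|APT_HashPartitioner|APT_ModulusPartitioner" score.txt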

* * *

One additional detail from the above dump score that you may be wondering about is the buffer operator:

op6[4p] {(parallel buffer(0))
on nodes (
node1[op6,p0]
node2[op6,p1]
node3[op6,p2]
node4[op6,p3]
)}

Buffer operators are specifically for instances where the downstream operator is at risk of getting overloaded with data while it is processing. Buffer operators are an attempt to produce a buffer zone where two things happen:

  1. The buffer operator communicates with the upstream operator to slow down its sending of data.
  2. The buffer operator holds on to the data until the downstream operator is ready for the next block of data.

If you are wondering why your job is running slower than other jobs, seeing the number of buffers in effect provides a clue. Buffer operators serve a valuable purpose: the prevention of race conditions between operators. Disabling buffering can cause severe and hard-to-analyze problems. However, better job design can reduce the amount of buffering that occurs.
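Using the same saved score text as above (score.txt is a placeholder name), a rough count of the buffer operators the engine inserted is:

grep -c "parallel buffer" score.txt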

* * *

The following detail is not seen above, but is possible:

op1[2p] {(parallel APT_CombinedOperatorController:
(APT_TransformOperatorImplV0S1_TrafoTest1_Transformer_1 in Transformer_1)
(Peek_2)
) on nodes (
node1[op1,p0]
node2[op1,p1]
)}

Data sets take up memory. As part of optimization, jobs try to combine multiple operators that handle data in the same way (for example, where there is no requirement to change the partitioning or sort order of the data flow). Data is immediately handed off to the next operator when processing completes in the prior operator, with significantly less memory impact.

In the immediately preceding example, two operators are combined: a Transformer and a Peek, running on two partitions.

When the job log indicates that an error occurred in “APT_CombinedOperator”, the APT_DUMP_SCORE report can help you identify which of the combined operators is suspect. To better pinpoint the culprit, enable the environment variable APT_DISABLE_COMBINATION; with combination disabled, each stage runs as its own operator, which helps you identify the stage with the error.

Self-training exercises:

  1. What does the dump score of your job look like when you run it with a 1-node configuration file compared to a multiple-node configuration file?
  2. What happens to the dump score when you enable APT_DISABLE_COMBINATION on a job?
  3. How does the APT_DUMP_SCORE report show multiple inputs and outputs for a job, such as multiple reference links and multiple output files?
  4. What happens to the dump score if, instead of using the internal sort option of the Join stage, the customer uses a separate Sort stage?
  5. What happens if the following environment variables are set: APT_NO_PART_INSERTION? APT_NO_SORT_INSERTION? For 8.7 and higher, APT_NO_PARTSORT_OPTIMIZATION?
  6. What happens to the symbol between the partitioning method and eCollectAny when you change the partitioning methods for a single link? What does it look like at the target data set?
  7. What happens when you add a sequential file for input and output with no other operators in between?

The above information is available on the IBM support site.

istool command line

September 15, 2012

The istool command line is available on the client and engine tiers.

You can import and export many types of assets, and you can query and delete common metadata.

Location of the istool command line:

For UNIX or Linux, the istool command framework is located in installation_directory/Clients/istools/cli, where installation_directory is the directory where you installed InfoSphere Information Server, for example /opt/IBM/InformationServer.

$ pwd
/opt/IBM/InformationServer/Clients/istools/cli

./istool -h
istool Command [ Command_Options ] [ Options ]

Possible values for: Command
workbench query : Executes Metadata Workbench query
deletecm : Delete command set
query : Query command set
workbench extension mapping import : Imports Metadata Workbench Extension Mappings
workbench extension source delete : Deletes Metadata Workbench Extension Data Sources
– : Provide usage information.
help : Provide command-line help information.
: Istool Console.
export : Export to a file.
import : Import from a file.
build package : Build a Deployment Package.
deploy package : Deploy a Deployment Package.
send package : Send package to a file
workbench automated services : Runs the Metadata Workbench automated services
workbench extension source import : Imports Metadata Workbench Extension Sources
workbench extension mapping delete : Deletes Metadata Workbench Extension Mappings

To list Command_Options, enter: istool Command -help

Possible values for: Options
Long Name: Short Name: Description:
-help , -h : print command usage
-verbose , -v : display progress
-silent , -s : silence command output
-script , -script : file to read commands from.

Exit codes:

0 – Success
1 – Warning
2 – Partial failure
3 – Error reading from console
4 – Invalid command history index
5 – Error reading from script file
11 – Invalid command syntax

Export Help:

./istool export -h
istool export [ Options ]

Possible values for: Options
Long Name: Short Name: Description:
-help , -h : print command usage
-domain , -dom : Name of Information Server domain
-username , -u : Name of user account on Information Server domain
-password , -p : Password for Information Server account
-verbose , -v : display progress
-silent , -s : silence command output
-archive , -ar : Archive file
-preview , -pre : Preview export task without executing
-abortIfError , -abort : Abort task on N errors
-updatearchive , -up : Update any existing Archive file
-commonmetadata , -cm : CM Assets

Options for: -commonmetadata
-base , -base : Prefix for Paths
-contactassignmentonly , -caonly : Contact assignment only
-includecontactassignment , -incca : Include contact assignment
-includeannotation , -incannot : Include annotation
-includedataconnection , -incdc : Include data connection
CM item paths
-datastage , -ds : DataStage Assets

Options for: -datastage
-includedependent , -incdep : Include dependent items
-nodesign , -nodes : Exclude design items
-includeexecutable , -incexec : Include executable items
-base , -base : Prefix for Paths
DataStage item paths
-report , -rep : Reporting Assets

Options for: -report
-reportName , -repName : Names of Reports in Reporting Service
-ownedByProduct , -prod : Include Reports belonging to a product
-reportRIDs , -rid : RIDS of the reports to be exported
-includeLastReportResults , -incLastResult : Include Last N Report Results for every report
-includeAllReportResults , -incAllResults : Include all referenced Report results for the Reports Exported
-includeReportResultName , -incResultName : Include Report Result for a given name.
Reporting item paths.
-security , -sec : Security Group Assets.

Options for: -security
-securityGroup , -sg : Security Group Assets.
-groupident , -grp : Specify list of groups. Can use * or ? wildchars.
-includeGroupUserMemberships , -incGrpUsrMems : To specify that group users and their references should be exported along with group.
-includeRoles , -incRole : To specify that user or group role relationships must be exported.
-securityUser , -su : Security User Assets.
-userident , -u : Specify list of users. Can use * or ? wildchars.
-includeUserGroupMemberships , -incUsrGrpMems : To specify that user groups and their references should be exported along with user.
-includeCredential , -incCred : To specify that IS credentials must be exported.
-includeCredentialMappings , -incMap : To specify that DataStage user credential mappings must be exported.

Exit codes:

0 – Success
1 – Warning
2 – Partial failure
3 – Export failed
4 – Export preview failed
5 – Invalid archive file
10 – Failed to connect to Information Server domain
11 – Invalid command syntax

Import Help:

./istool import -h
istool import [ Options ]

Possible values for: Options
Long Name: Short Name: Description:
-help , -h : print command usage
-domain , -dom : Name of Information Server domain
-username , -u : Name of user account on Information Server domain
-password , -p : Password for Information Server account
-verbose , -v : display progress
-silent , -s : silence command output
-preview , -pre : Preview the command, listing items to be imported. Do not execute.
-archive , -ar : Name of source archive for imported items, on local filesystem
-replace , -r : Replace existing versions of items in Information Server
-abortAfter , -abort : Abort after N errors
-commonmetadata , -cm : Common Metadata
-datastage , -ds : DataStage specific options

Options for: -datastage
-nodes , -nodesign : Exclude DataStage design items
server/project : DataStage project
-report , -rep : Reporting Assets

Options for: -report
-replace , -r : Modify assets during the [import] command execution. Must be specified with [import].
Reporting item paths.
-security , -sec : Security Group Assets.

Options for: -security
-replace , -r : Modify assets during the import command execution. Must be specified with import.

Exit codes:

0 – Success
1 – Warning
2 – Partial failure
3 – Import failed
4 – Import preview failed
5 – Invalid archive file
10 – Failed to connect to Information Server domain
11 – Invalid command syntax

=====================================

Now we will see some working examples.

Export command for InfoSphere DataStage and QualityStage assets

Syntax:

istool export
authentication options
[generic options]
-archive "pathname" [-updatearchive]
[-preview]
[-abortIfError=number_of_errors]
-datastage ' [-base="server[:port]/project"]
[-includedependent]
[-nodesign]
[-includeexecutable]
"dsServer[:port]/project/folder/name.type" '

Asset type

Asset types are identified by a type suffix. Type suffixes are not case-sensitive.

Asset type Type suffix
Data element det
IMS database idb
IMS viewset ivs
Mainframe job mjb
Parallel job pjb
Sequence job qjb
Server job sjb
Machine profile mcp
Mainframe routine mrt
Parallel routine prt
Server routine srt
Parallel shared container psc
Server shared container ssc
Table definition tbd
Transform tfm
Data quality specification dqs
Stage type stp
Data connection dcn
Parameter set pst

Examples:

1) Export a parallel job:
istool export -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -datastage ' "dsengine/dstage1/Jobs/oracle/Extract.pjb" '

Beginning Export
[1/1] dsengine/dstage1/Jobs/oracle/Extract.pjb
Elapsed time: 00:00:01
Exported 1 assets

2) Preview export:

istool export -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -pre -datastage ' "dsengine/dstage1/Jobs/oracle/Extract.pjb" '

Beginning Export Preview
[1/1] dsengine/dstage1/Jobs/oracle/Extract.pjb
Previewed 1 assets

3) Export a folder (category):

istool export -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -datastage ' "dsengine/dstage1/Jobs/oracle/*.*" '

Beginning Export
[1/8] dsengine/dstage1/Jobs/oracle/Extract.pjb
[2/8] dsengine/dstage1/Jobs/oracle/Extract1_Optimized1.pjb
[3/8] dsengine/dstage1/Jobs/oracle/Extract2_Oracle_Connector.pjb
[4/8] dsengine/dstage1/Jobs/oracle/Extract3_Using_DRS_Connector.pjb
[5/8] dsengine/dstage1/Jobs/oracle/Extract4.pjb
[6/8] dsengine/dstage1/Jobs/oracle/Load_Oracle_Amazon_Customers.pjb
[7/8] dsengine/dstage1/Jobs/oracle/Load_Oracle_Amazon_Customers2.pjb
[8/8] dsengine/dstage1/Jobs/oracle/Oracle_Connector_Stage_New_Features.pjb
Elapsed time: 00:00:11
Exported 8 assets

4) Using the -incdep option to include dependent items:

istool export -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -datastage ' -incdep "dsengine/dstage1/Jobs/oracle/SrvJob.sjb" '

Beginning Export
[1/2] dsengine/dstage1/Jobs/oracle/SrvJob.sjb
[2/2] dsengine/dstage1/Routines/srvTest.srt
Elapsed time: 00:00:01
Exported 2 assets

The disadvantage of the above command lines is that the password is exposed on the command line. To avoid this, put the command in a script file and run it with:

istool -script scriptfile

Example script file:

cat /tmp/export.txt

export -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -pre -datastage ' "dsengine/dstage1/Jobs/oracle/*.*" '

istool -script /tmp/export.txt

Import command for InfoSphere DataStage and QualityStage assets:

Syntax:

istool import
authentication options
[generic options]
-archive pathname
[-preview|-replace]
[-abortAfter=number_of_errors]
-datastage '[-nodesign]"Server/project" '

Examples:

The following command previews an import of assets from the file arc.isx:

istool import -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -pre -datastage ' -nodesign "dsengine/dstage1" '

Beginning import preview
[1/1] dsengine/dstage1/Jobs/oracle/Extract.pjb ignored – already exists. Use -replace to overwrite
Previewed 0 assets

After the preview, the command is repeated without the -preview option to import the assets into the project named dstage1, with -replace specified so that the existing asset is overwritten.

istool import -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -replace -datastage ' "dsengine/dstage1" '

Beginning import
[1/1] dsengine/dstage1/Jobs/oracle/Extract.pjb
Elapsed time: 00:00:03
Imported 1 assets

dsadmin Command for administering Projects

September 15, 2012

dsadmin is the command used for administering projects and has a large range of options.

Command Syntax:
dsadmin [-file <file> <domain> <DataStage server> | -domain <domain> -user <user> -password <password> -server <DataStage server>]
<primary command> [<arguments>]

Valid primary command options are:
-createproject
-deleteproject
-oshvisible
-enablercp
-enablejobadmin
-envadd
-envdelete
-envset
-advancedruntime
-basedirectory
-deploymentdirectory
-customdeployment
-listprojects
-listproperties
-listenv
-enablegeneratexml
-protectproject

File format:

The format of the file entry should be:

domain,server,user,password

and the dsadmin command then takes the form:

dsadmin -file filename domain server <command arguments>

where domain is the Information Server domain name and server is the DataStage engine tier server name.

Creating Project:

dsadmin -file file  domain dsengine -createproject ProjectName [-location ProjectLocation] [-copyroles <sourceproject>]

ProjectName is the name of the project.

-location specifies the project location and is optional.

-copyroles copies the roles from an existing project and is optional.

If the project location is not specified, the project is created in the Projects directory under the server install directory.

Ex:

dsadmin -file /tmp/dscredentials.txt  isdomain dsengine -createproject dstage2 -location /projects/dstage2
Status code = 0

dsjob -lprojects
dstage2

Status code = 0

/projects>ls -tlr
total 4
drwxrwxr-x 45 dsadm dstage 4096 Sep 15 20:30 dstage2

================

dsadmin -listproperties dstage2
OSHVisible=0
JobAdminEnabled=
RTCPEnabled=
PXAdvRTOptions=
PXRemoteBaseDirectory=
PXDeployJobDirectoryTemplate=
PXDeployCustomAction=
PXDeployGenerateXML=

Status code = 0

Enabling/disabling OSH display

dsadmin -oshvisible TRUE dstage2

Status code = 0

Enabling/disabling runtime column propagation

dsadmin -enablercp TRUE | FALSE ProjectName

ex:

dsadmin -enablercp TRUE dstage2

Enabling/disabling job administration from the Director client

dsadmin -enablejobadmin TRUE | FALSE ProjectName

Example:
dsadmin -enablejobadmin TRUE dstage2

Adding an environment variable

dsadmin can be used to create an environment variable, which is added to the User Defined category.

Syntax:

dsadmin -envadd EnvVarName -type STRING | ENCRYPTED 
-prompt "PromptText" [-value "Value"] ProjectName

Ex:
Now I will add a new environment variable called "TEST" to the dstage2 project.

We use dsadmin -listenv dstage2 to check whether this variable is already defined.

dsadmin -listenv dstage2 | grep TEST

Status code = 0

dsadmin -envadd TEST -type STRING -prompt "TEST ADDING ENV" -value "Test" dstage2

dsadmin -listenv dstage2 | grep TEST
TEST=Test

Deleting an environment variable

dsadmin -envdelete EnvVarName ProjectName

dsadmin -envdelete TEST dstage2

Status code = 0
dsadmin -listenv dstage2 | grep TEST

Status code = 0

Setting the value of an environment variable

Syntax:

dsadmin -envset EnvVarName -value "Value" ProjectName

dsadmin -listenv dstage2 | grep TEST
TEST=Test

Status code = 0

Now we will change the value from Test to Test2

dsadmin -envset TEST -value "Test2" dstage2

dsadmin -listenv dstage2 | grep TEST
TEST=Test2

Listing projects:

dsadmin -listprojects

dstage2


Listing properties

dsadmin -listproperties ProjectName

dsadmin -listproperties dstage2
OSHVisible=1
JobAdminEnabled=1
RTCPEnabled=1
PXAdvRTOptions=
PXRemoteBaseDirectory=
PXDeployJobDirectoryTemplate=
PXDeployCustomAction=
PXDeployGenerateXML=

Status code = 0

Listing environment variables:

dsadmin -listenv ProjectName

EX:
dsadmin -listenv dstage2

Initially I struggled to use the -file option in dsadmin, and the following good technote from IBM helped.

On IBM Information Server 8, dsadmin line command with -file option will fail with error message “Failed to locate information for server”
Technote (FAQ)

Question
When trying to use the dsadmin line command with the -file option in order to avoid specifying the user/password in the line command, the dsadmin command will fail with the following error message : “Failed to locate information for server” (Similar issue will occur with dsjob command)
Cause
Incorrect syntax in the file specified for the -file option
Answer
This issue is caused by a missing or incorrect argument in the file, either the domain name or the server name.
Note that the dsadmin command is using a combination of both the domain name and the server name to lookup the connection details in the file.
Format of the file entry should be like :

domain,server,user,password

and the dsadmin command will be like :

dsadmin -file filename domain server <command arguments>

(This is also available for the dsjob command)

We don’t have command-line options to set up sequence compilation options or to set auto-purge of the job log.

If IBM adds these two options to the command line, then we can fully automate DataStage project creation using a script.

Workaround:
Create a template project and copy the DSParams file from the template project to the new project.
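A minimal sketch of that workaround, reusing the credentials file and path conventions from the examples above (the template project path and the new project name are placeholders, and the new project should not be open in any client while DSParams is replaced):

dsadmin -file /tmp/dscredentials.txt isdomain dsengine -createproject dstage3 -location /projects/dstage3
cp /projects/dstage_template/DSParams /projects/dstage3/DSParams    # carry over sequence compilation and auto-purge settings from the template project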


DSXImportService.sh — Infosphere Information Server 8.x

September 15, 2012

Command : DSXImportService.sh

This command is located under ASBNode/bin on the engine tier.
You can use it to import objects from .dsx files into the specified repository.

Importing objects from a .dsx file

Syntax:
-ISFile <isFile> [-ISHost <isHost>[:<port>]] [-ISUser <isUser>] [-ISPassword <isPassword>] [-DSHost <dsHost>[:<port>]] -DSProject <dsProject> -DSXFile <dsxFile> [-Overwrite | -OverwriteReadOnly] [-Verbose] [-StopOnError] [<selected_import>]
|
-ISHost <isHost>[:<port>] -ISUser <isUser> -ISPassword <isPassword> [-DSHost <dsHost>[:<port>]] -DSProject <dsProject> -DSXFile <dsxFile> [-Overwrite | -OverwriteReadOnly] [-Verbose] [-StopOnError] [<selected_import>]
|
-List -DSXFile <dsxFile>

-ISFile isFile
Specifies the file name that contains the connection details. Using this option provides a level of security by hiding the login details from view. If you use this option, you do not have to provide the connection details on the command line. If any connection details are specified on the command line, however,
they override those defined within the file.

Example isFile:

cat isfile.txt

-ISHost isHost
-ISUser dsadm
-ISPassword password

Listing the contents of a .dsx file

DSXImportService.sh -List -DSXFile /tmp/Null_Handling_Test.dsx

LISTING CONTENTS OF DSX FILE:

*** Job ***
Design item = Null_Handling_Test

Total items = 1
Listed file = /tmp/Null_Handling_Test.dsx

Importing the job from the .dsx file:

DSXImportService.sh -ISFile /home/dsadm/isfile.txt -DSHost dshost -DSProject dstage1 -DSXFile /tmp/Null_Handling_Test.dsx -Overwrite -Verbose

Server import initiated.
Attempting authentication……
Authentication successful..
Processing Jobs….
All EXECJOBS will be ignored due to the Server Version 7.5.2 being too old..
Completing Import
Import completed.
Design item imported successfully. Item Type: Job. Identifier: Null_Handling_Test.

IS Host = ishost
IS Port = 9080
IS User = dsadm
DataStage Project = dstage1
Imported file = /tmp/Null_Handling_Test.dsx
Total items = 1
Items imported = 1
Items not imported = 0
Total import time = 00:00:08

selected_import

You can specify options here to import selected objects from a .dsx file. You specify the object type and the object name as specified in the following table. You can specify a full name or an abbreviated name for the object type.
Table 1. Selected import options (Abbreviated option / Full option / Object type):

-JB -JOB job
-EJ -EXECUTABLEJOB job executable
-DE -DATAELEMENT data element
-TD -TABLEDEFINITION table definition
-ST -STAGETYPE stage type
-TR -TRANSFORM transform
-RT -ROUTINE routine
-ID -IMSDATABASE IMS database
-IV -IMSVIEWSET IMS viewset
-MP -MACHINEPROFILE machine profiles
-SC -SHAREDCONTAINER shared container
-QR -QSRULEASSEMBLY QualityStage rule set
-PS -PARAMETERSET parameter set
-DC -DATACONNECTION data connection

If we have multiple objects in a .dsx file and want to import a selected one, use the options specified above.

First, list the contents of the .dsx file:

DSXImportService.sh -List -DSXFile /tmp/Null_Handling_Test.dsx

LISTING CONTENTS OF DSX FILE

*** Routine ***
Design item = JobControl
Runtime item = DSU.JobControl

*** Shared Container ***
Design item = PtlDeriveSegment

*** Job ***
Design item = Null_Handling_Test

Total items = 4
Listed file = /tmp/Null_Handling_Test.dsx

If you want to import only the shared container from the .dsx file, then use the following syntax:

DSXImportService.sh -ISFile /home/dsadm/isfile.txt -DSHost dshost -DSProject dstage1 -DSXFile /tmp/Null_Handling_Test.dsx -Overwrite -Verbose -SC PtlDeriveSegment

Server import initiated.
Attempting authentication…….
Authentication successful…
Processing Shared Containers………
All EXECJOBS will be ignored due to the Server Version 7.5.2 being too old.
Completing Import.
Import completed.
Design item imported successfully. Item Type: Shared Container. Identifier: PtlDeriveSegment.

IS Host = ishost
IS Port = 9080
IS User = dsadm
DataStage Project = dstage1
Imported file = /tmp/Null_Handling_Test.dsx
Total items = 1
Items imported = 1
Items not imported = 0
Total import time = 00:00:16

JR42922: PROBLEMS WITH TRANSFORMER NULL TESTING EXPRESSIONS WHEN RUNNING WITH APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is set

September 15, 2012

Error description

  • InfoSphere Information Server 8.5 changed how the parallel engine
    Transformer stage handles columns that contain null values.
    These changes were described in Tech Note 1514921.
    
    After InfoSphere Information Server 8.5 shipped,  issues in the
    default null handling behavior of Version 8.5, and problems with
    the implementation of the InfoSphere Information Server 8.1
    compatibility mode  were discovered .
    Jobs that process nulls in Transformer stages show different
    behavior when migrated from 8.1 to 8.5, even when legacy null
    handling (APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING ) is set.
    
    In IS 8.1 Nulls can be converted to other values using any of
    the following 3 functions
     NullToEmpty()
     NullToZero()
     NullToValue()
    
    Even with  APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING  set, in IS
    8.5 these functions do not work the same way they did in
    InfoSphere Information Server 8.1 .  In particular, without this
    fix some rows which were not rejected in IS 8.1  were being
    rejected in IS 8.5
    The fix for this APAR makes the treatment of rows containing
    NULL columns the same in IS 8.5 and IS 8.1, when
    APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is set. .
    
    NOTES:
    1. The patch for this APAR also includes the fix for APAR
    JR40637.
    APAR JR40637 concerns problems in transformer compilation  when
    APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING  is not set.
    
    2. This APAR replaces  APAR JR42336.

     

Local fix

Problem summary

  •  After InfoSphere Information Server 8.5 shipped,  issues in the
    default null handling behavior of Version 8.5, and problems with
    the implementation of the InfoSphere Information Server 8.1
    compatibility mode  were discovered .
    Jobs that process nulls in Transformer stages show different
    behavior when migrated from 8.1 to 8.5, even when legacy null
    handling (APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING ) is set.
    
    In IS 8.1 Nulls can be converted to other values using any of
    the following 3 functions
    · NullToEmpty()
    · NullToZero()
    · NullToValue()
    
    Even with  APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING  set, in IS
    8.5 these functions do not work the same way they did in
    InfoSphere Information Server 8.1 .  In particular, without this
    fix some rows which were not rejected in IS 8.1  were being
    rejected in IS 8.5
    The fix for this APAR makes the treatment of rows containing
    NULL columns the same in IS 8.5 and IS 8.1, when
    APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is set. .
    
    NOTES
    1. The patch for this APAR also included the fix for APAR
    JR40637.
    APAR JR40637 concerns problems in transformer compilation  when
    APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING  is not set.
    2. This APAR replaces  APAR JR42336.

     

Problem conclusion

  •   Install the patch associated with this APAR
    NOTES
    1. The patch for this APAR also included the fix for APAR
    JR40637.
    APAR JR40637 concerns problems in transformer compilation  when
    APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING  is not set.
    2. This APAR replaces  APAR JR42336.

Issues with NULL values in input columns in Parallel Engine Transformer stage

September 13, 2012

Problem(Abstract)

InfoSphere Information Server 8.5 changed how the parallel engine Transformer stage handles columns that contain null values. These changes were described in Tech Note 1514921.

Since InfoSphere Information Server 8.5 shipped, issues in the default null handling behavior of version 8.5, and problems with the implementation of the InfoSphere Information Server 8.1 compatibility mode, were discovered. This note describes the issues and what to do about them.

Symptom

Jobs that process nulls in Transformer stages show different behavior when migrated from 8.1 to 8.5, even when legacy null handling is set.

Resolving the problem

Quick Overview:

In InfoSphere Information Server version 8.1 and prior versions, the job design had to explicitly handle null column values in the Transformer stage. If the Parallel Engine encountered null column values outside of specific contexts, the entire row containing the null was dropped, or sent to a reject link if the Transformer stage had a reject link.
(NOTE: This note refers to SQL value NULL, not the character with value 0x00, and not an empty string.)
Customers complained that the need for explicit null handling made Transformer stage coding too complex and allowed inconsistent behavior. So, starting in InfoSphere Information Server version 8.5 the default behaviors were changed and explicit null handling was no longer required. It was recognized that some customers would want to retain the original null-handling behavior so an environment variable, APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING, was introduced. The environment variable, when defined, preserves compatibility with the behavior of pre-version 8.5 InfoSphere Information Server.
These changes were described in Tech Note 1514921. Since version 8.5 shipped, differences in the default null handling behavior of version 8.5 and problems with the implementation of the InfoSphere Information Server 8.1 compatibility mode have been discovered.

There have been issues with null handling in InfoSphere Information Server 8.5 (with backward compatibility enabled), InfoSphere Information Server 8.1, and earlier versions. Most of these issues were due to lack of clear explanation about how null values should be handled in the Transformer stage.

This Tech Note is a more detailed description about how the null handling should be done in InfoSphere Information Server 8.5 (with backward compatibility mode set), 8.1 and earlier versions. This document also describes an inconsistency in the behavior of version 8.5 (with backward compatibility mode set) compared to the behavior of version 8.1.

1. InfoSphere Information Server 8.1 NULL handling: Explicit handling required.
If you use input columns in an output column expression, a NULL value in that input column will cause the row to be dropped, or rejected if a reject link is defined.

This applies where:
· An input column is used in an output column derivation expression
(for example, an expression like “DSLink4.col1 + 1”).
· An input column is used in an output column constraint.
· An input column is used in a stage variable derivation.

It does not apply where an output column is mapped directly from an input column, with a straight assignment expression.

In InfoSphere Information Server 8.1, there are 3 legal ways to explicitly handle NULL values in input columns in the Transformer stage.

a. Conversion of NULL values:
Nulls can be converted to other values using any of the following 3 functions
· NullToEmpty()
· NullToZero()
· NullToValue()

Example :

    DSLink3.OrderCount + 1 –> If DSLink3.OrderCount is NULL, the whole record containing DSLink3.OrderCount will be dropped or sent to the reject link.

This expression can be changed to:
NullToZero(DSLink3.OrderCount) + 1 –> If DSLink3.OrderCount is NULL, the target field will be assigned the integer value 1.

b. Check for NULL value:

To test if a value is NULL in a logical expression, use one of these two functions.
· IsNotNull()
· IsNull()

Example:
DSLink3.OrderCount + 1
–> If DSLink3.OrderCount is NULL, record will be dropped or rejected.
This expression can be changed to:
If (IsNotNull(DSLink3.OrderCount)) Then DSLink3.OrderCount + 1 Else 1
–> If DSLink3.OrderCount is NULL, the target field will be the integer 1 .

Each nullable column in a given expression needs to be properly NULL checked or the NULL value needs to be converted to a concrete value.

c. IF-ELSE operations on NULL values:

Handling NULL values in IF-ELSE conditions can be complex. Consider the following examples to get familiar with using NULL checks in IF-ELSE statements.

Example 1: Simple IF ELSE statement
If (DSLink1.Col1 > 0) Then xxx Else yyy
In InfoSphere Information Server 8.5 code will be generated to drop records in case DSLink1.Col1 is NULL.

This needs to be written as

If (IsNotNull(DSLink1.Col1) and (DSLink1.Col1 > 0)) Then xxx Else yyy
or
If (IsNull(DSLink1.Col1) or (DSLink1.Col1 > 0)) Then xxx Else yyy
based on the business requirement on how the NULL case should be treated

It can also be written as
If (NullToZero(DSLink1.Col1) > 0) Then xxx Else yyy
or
If (NullToZero(DSLink1.Col1) >= 0) Then xxx Else yyy
again based on the business requirement .

NUllToValue() or NullToEmpty() also can be used instead of NullToZero(), based on the requirement.

NULL conversion functions should not be used for checking a NULL
Example: If ((NullToZero(DSLink1.Col1) = 0) or (DSLink1.Col1 > 0)) Then xxx Else yyy

NullToZero(DSLink1.Col1) = 0 is not considered a NULL check; code will be generated to drop records in case DSLink1.Col1 is NULL.

Example 2: IF ELSE statement with multiple conditions
If ((DSLink1.Col1 = 5) or (DSLink1.Col2 = 8)) Then xxx Else yyy –> Col2 is a nullable field

This cannot be written as
If (DSLink1.Col1 = 5 or IsNotNull(DSLink1.Col2) and (DSLink1.Col2 = 8)) Then xxx Else yyy
This is not the proper way of checking.
If (condition1 or condition2 and condition3) will be treated as If ((condition1 or condition2) and condition3)

It should be written as
If (DSLink1.Col1 = 5 or (IsNotNull(DSLink1.Col2) and (DSLink1.Col2 = 8))) Then xxx Else yyy

The condition that contains the nullable column should be pre-“AND”ed with an IsNotNull() check or pre-“OR”ed with an IsNull() check, with the evaluation order clearly specified using parentheses wherever needed.

Example 3: IF ELSE statement in which the nullable field is used multiple times
If ((DSLink1.Col1 = 3) or (DSLink1.Col1 = 5)) Then xxx Else yyy
Records will be dropped in case Col1 is NULL.

This cannot be written as
If ((IsNotNull(DSLink1.Col1) and (DSLink1.Col1 = 3)) or (DSLink1.Col1 = 5)) Then xxx Else yyy

This should be written as
If ((IsNotNull(DSLink1.Col1) and (DSLink1.Col1 = 3)) or (IsNotNull(DSLink1.Col1) and (DSLink1.Col1 = 5))) Then xxx Else yyy

Each instance of the nullable field needs to be pre-“AND”ed or pre-“OR”ed with a NULL check.

Example 4: Using 2 nullable columns in a condition.

If (DSLink1.Col1 = DSLink1.Col2) Then xxx Else yyy

Both the columns need to be NULL checked or NULL conversion functions should be used on both the columns

If (IsNotNull(DSLink1.Col1) and (IsNotNull(DSLink1.Col2) and (DSLink1.Col1 = DSLink1.Col2))) Then xxx Else yyy

2. InfoSphere Information Server 8.5 NULL handling: Explicit handling not required.

In brief, in InfoSphere Information Server 8.5, a NULL value in an input column will NOT cause the row to be dropped, nor will it be sent to the reject link.
A NULL value in an input column is handled by the Transformer stage, following specific logic. The job designer can skip explicit NULL handling.
Technote 1514921 has further details.
The environment variable APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING can be used if the designer wants to have 8.1 behavior in 8.5.
Enabling old NULL handling can be done at 3 different levels (a scriptable sketch of the first option follows the list):

  1. Setting APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING at the project level.
  2. Setting APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING at the job level.
  3. Checking “Legacy null processing” in the DataStage Designer for individual Transformer stages in a given job.
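As a scriptable sketch of the project-level option, the dsadmin command covered earlier in this blog could be used to define the variable for a project (the credentials file, domain, engine, and project names are placeholders; adding the variable through the Administrator client is the usual route):

dsadmin -file /tmp/dscredentials.txt isdomain dsengine -envadd APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING -type STRING -prompt "Legacy null handling" -value "1" dstage1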

IBM has recently discovered a previously undocumented difference in behavior of InfoSphere Information Server 8.1 and InfoSphere Information Server 8.5 with old NULL handling enabled.
InfoSphere Information Server 8.1 allowed the three “NullToxxxx()” functions to be used as Null tests.

Example:
In InfoSphere Information Server 8.1 the following IF-ELSE condition

    If ((NullToZero(DSLink1.Col1) = 0) or (DSLink1.Col1 > 0)) Then xxx Else yyy
    “(NullToZero(DSLink1.Col1) = 0” was considered as a NULL check and records were not dropped or sent to reject link. This is due to inconsistency in the code.

In InfoSphere Information Server 8.5 this code inconsistency was eliminated, and only IsNull() and IsNotNull() can be used as Null checks. InfoSphere Information Server 8.1 jobs which used NullToZero(), NullToValue(), or NullToEmpty() for null checking must be changed to use IsNull() or IsNotNull().

Note:
!IsNull() can be used instead of IsNotNull()
!IsNotNull() can be used instead of IsNull()
in NULL checking.