Use of the tunable parameters in the UVCONFIG file

October 5, 2012

Question

How do I use the tunable parameters in the UVCONFIG file?

Answer

The most commonly used parameters in the UVCONFIG file are as follows:

MFILES

This parameter defines the size of the server engine (DSEngine) rotating file pool. This is a per-process pool for files, such as sequential files, that are opened by the DataStage server runtime. It does not include files opened directly at the OS level by the parallel engine (PXEngine running osh).

The server engine will logically open and close files at the DataStage application level and physically close them at the OS level when the need arises.

Increase this value if DataStage jobs use a lot of files. Generally, a value of around 250 is suitable. If the value is set too low, then performance issues may occur, as the server engine will make more calls to open and close at the physical OS level in order to map the logical pool to the physical pool.

NOTE: The OS nofiles limit (maximum open file descriptors per process) needs to be set higher than MFILES; a value of at least 512 is recommended.

This will allow the DataStage process to open up to 512 - (MFILES + 8) files.
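As a quick check of both values (a sketch; it assumes $DSHOME points to the DSEngine directory and that you are logged in as the user that runs DataStage):

ulimit -n                      # current per-process open file descriptor limit (nofiles)
grep MFILES $DSHOME/uvconfig   # current MFILES setting in the uvconfig file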

On most UNIX systems, the proc file system can be used to monitor the file handles opened by a given process; for example:

ps -ef|grep dsrpcd

root     23978     1  0 Jul08 ?        00:00:00 /opt/ds753/Ascential/DataStage/DSEngine/bin/accdsrpcd

ls -l /proc/23978/fd

lrwx------  1 root dstage 64 Sep 25 08:24 0 -> /dev/pts/1 (deleted)
l-wx------  1 root dstage 64 Sep 25 08:24 1 -> /dev/null
l-wx------  1 root dstage 64 Sep 25 08:24 2 -> /dev/null
lrwx------  1 root dstage 64 Sep 25 08:24 3 -> socket:[12928306]

So the dsrpcd process (23978) has four files open.

T30FILE

This parameter determines the maximum number of dynamic hash files that can be opened system-wide on the DataStage system.

If this value is too low, expect to find an error message similar to ‘T30FILE table full’.

The following engine command, executed from $DSHOME, shows the number of dynamic files in use:

echo "`bin/smat -d|wc -l` - 3"|bc

Use this command to assist with tuning the T30FILE parameter.
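For ongoing monitoring, you could wrap that command in a simple loop (a sketch only; it assumes you run it from $DSHOME as a user permitted to execute bin/smat, and the 300-second interval is arbitrary):

cd $DSHOME
while true
do
    count=$(( $(bin/smat -d | wc -l) - 3 ))      # same calculation as the command above
    echo "$(date) : $count dynamic hashed files open"
    sleep 300
done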

A Technote has already been issued on this topic (www.ibm.com/support/docview.wss?uid=swg21390117).

Every running DataStage job requires at least 3 slots in this table. (RT_CONFIG, RT_LOG, RT_STATUS). Note, however, that multi-instance jobs share slots for these files, because although each job run instance creates a separate file handle, this just increments a usage counter in the table if the file is already open to another instance.

Note that on AIX the T30FILE value should not be set higher than the system setting ulimit -n.

GLTABSZ

This parameter defines the size of a row in the group lock table.

Tune this value if the number of group locks in a given slot is getting close to the value defined.

Use the LIST.READU EVERY command from the server engine shell to assist with monitoring this value. LIST.READU lists the active file and record locks; the EVERY keyword lists the active group locks in addition.
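For reference, a typical way to reach the engine shell and run the command is shown below (a sketch; it assumes $DSHOME points to the DSEngine directory, and the last two lines are typed at the engine shell prompt, with Q returning you to the operating system shell):

cd $DSHOME
. ./dsenv
bin/uvsh
LIST.READU EVERY
Q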

For example, with a Designer client and a Director client both logged in to a project named "dstage0", the report contains one line per lock with the following fields:

  • Device: A number that identifies the logical partition of the disk where the file system is located.
  • Inode: A number that identifies the file that is being accessed.
  • Netnode: A number that identifies the host from which the lock originated. 0 indicates a lock on the local machine, which will usually be the case for DataStage. If other than 0, then on UNIX it is the last part of the TCP/IP host number specified in the /etc/hosts file; on Windows it is either the last part of the TCP/IP host number or the LAN Manager node name, depending on the network transport used by the connection.
  • Userno: The phantom process that set the lock.
  • Pid: A number that identifies the controlling process.
  • Item-ID: The record ID of the locked record.
  • Lmode: The number assigned to the lock, and a code that describes its use.

When the report describes file locks (not shown here), it contains the following additional information:

  • Lmode codes describe shared or exclusive file locks, rarely seen in normal DataStage use:

    FS, IX, CR Shared file locks.
    FX, XU, XR Exclusive file locks.

When the report describes group locks, it contains the following additional information. The Lmode codes are:

EX Exclusive lock
SH Shared lock.
RD Read lock.
WR Write lock.
IN System information lock

  • G-Address: Logical disk address of the group, or its offset in bytes from the start of the file, in hex.
  • Record Locks: The number of locked records in the group.
  • Group RD: Number of readers in the group.
  • Group SH: Number of shared group locks.
  • Group EX: Number of exclusive group locks.

When the report describes record locks, it contains the following additional information. The Lmode codes are:

RL Shared record lock
RU Update record lock

RLTABSZ

This parameter defines the size of a row in the record lock table.

From a DataStage job point of view, this value affects the number of concurrent DataStage jobs that can be executed, and the number of DataStage Clients that can connect.

Use the LIST.READU command from the DSEngine shell to monitor the number of record locks in a given slot (see previous section for use of the EVERY keyword).

For example, consider a report produced with one Director client logged in to a project named "dstage0" and two instances of a job in that project running.

In such a report, Item-ID=RT_CONFIG456 identifies that the running job is an instance of job number 456, whose compiled job file is locked while the instance is running so that, for example, it cannot be re-compiled during that time. A job's number within its project can be seen in the Director job status view (Detail dialog) for a particular job.

The unnamed column in between UserNo and Lmode relates to a row number within the record lock table. Each row can hold RLTABSZ locks. In this example, 3 slots out of 75 (the default value for RLTABSZ) have been used for row 62. When the number of entries for a given row gets close to the RLTABSZ value, it is time to consider re-tuning the system.

Jobs can fail to start, or generate -14 errors, if the RLTABSZ limit is reached.

DataStage Clients may see an error message similar to ‘DataStage Project locked by Administrator’ when attempting to connect. Note that the error message can be misleading – it means in this case that a lock cannot be acquired because the lock table is full, and not because another user already has the lock.

MAXRLOCK

This should always be set to the value of RLTABSZ - 1.

Each DSD.RUN process takes a record lock on the key <project>&!DS.ADMIN!& of the UV.ACCOUNT file in $DSHOME (as seen in the examples above). Each DataStage client connection (for example, Designer, Director, Administrator, or the dsjob command) takes this record lock as well. This is the mechanism by which DataStage determines whether operations such as project deletion are safe; such operations cannot proceed while a project lock is held by any process.

MAXRLOCK needs to be set to accommodate the maximum number of jobs and sequences plus client connections that will be in use at any given time, and RLTABSZ needs to be set to MAXRLOCK + 1.
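As a purely illustrative calculation (hypothetical numbers): if at peak you expect 100 concurrent job and sequence invocations plus 20 client connections, you need room for at least 120 of these record locks, so MAXRLOCK must be at least 120 and RLTABSZ at least 121; the recommended 130/129 values below would leave some headroom.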

Keep in mind that changing RLTABSZ greatly increases the amount of memory needed by the disk shared memory segment.

Current Recommended Settings

Customer Support has reported in the past that settings of 130/130/129 (for RLTABSZ/GLTABSZ/MAXRLOCK, respectively) work successfully on most customer installations. There have been reports of high-end customers using settings of 300/300/299, so this is environment specific.

If sequencers or multi-instance jobs are used, start with the recommended settings of 130/130/129, and increase to 300/300/299 if necessary.

Prior to DataStage v8.5 the following settings were pre-defined:

MFILES = 150

T30FILE = 200

GLTABSZ = 75

RLTABSZ = 75

MAXRLOCK = 74 (that is, 75-1)

DataStage v8.5 has the following settings pre-defined:

MFILES = 150

T30FILE = 512

GLTABSZ = 75

RLTABSZ = 150

MAXRLOCK = 149 (that is, 150-1)

However, note that these are the lowest suggested values to accommodate all system configurations, so tuning of these values is often necessary.
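For reference, the usual sequence for changing any of these values is roughly as follows (a sketch only; verify the procedure against the documentation for your release, back up uvconfig first, and make sure no jobs or client sessions are active before stopping the engine):

cd $DSHOME
. ./dsenv
bin/uv -admin -stop
vi uvconfig        # adjust MFILES, T30FILE, GLTABSZ, RLTABSZ, MAXRLOCK as required
bin/uvregen
bin/uv -admin -start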

DMEMOFF, PMEMOFF, CMEMOFF, NMEMOFF

These are the shared memory address offset values for each of the four DataStage shared memory segments (Disk, Printer, Catalog, NLS). Depending upon the platform, PMEMOFF, CMEMOFF, and NMEMOFF may need to be increased to allow a larger disk shared memory segment to be used.

Where these values are set to 0x0 (on AIX, for example), the OS takes care of managing these offsets. Otherwise, PMEMOFF - DMEMOFF determines the largest possible disk shared memory segment size.
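For illustration only (hypothetical offsets): if DMEMOFF is 0x90000000 and PMEMOFF is 0xA0000000, the disk shared memory segment can grow to at most 0xA0000000 - 0x90000000 = 0x10000000 bytes, that is, 256 MB.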

Additionally, on Solaris for example, these values may need to be increased to allow a greater heap size for the running DataStage job.

Note that when running the shmtest utility, great care must be taken in interpreting its output. The utility tests the availability of memory that it can allocate at the time it runs, and this is affected by the current uvconfig settings, by how much shared memory is already in use, and by other activity on the machine at the time.
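To see how much shared memory the engine has actually allocated at a given moment, the standard ipcs command can be used; for example (a sketch; filtering on the DataStage administrative user, dsadm in this hypothetical example):

ipcs -m | grep dsadm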

InfoSphere DataStage parallel jobs: Understanding the content of the APT_DUMP_SCORE report

September 20, 2012

Question

How do I read and interpret the results from the APT_DUMP_SCORE report?

Answer

The APT_DUMP_SCORE report documents activity within the InfoSphere Information Server parallel engine. The parallel engine is a runtime engine that uses a combination of interpreted and compiled code to extract, transform, and load data. The InfoSphere DataStage Designer enables you to specify the data flow and processing steps in a job without having to specify the degree and nature of the parallel processing that happens at job run time.

A configuration file (APT_CONFIG_FILE) specifies the nature and amount of parallelism, and the specific resources that are used to run a job. When a job is to be executed, the data flow information in the compiled job is combined with the configuration file to produce a detailed execution plan called the score. The score is useful in analyzing job performance and diagnosing problems in jobs. If the user sets the environment variable APT_DUMP_SCORE, a text representation of the score (a report) is written to the job’s log.

The job score that populates the report includes information about all operators in the job, including operators that are automatically inserted by the Designer client. The score shows how many nodes each operator runs on, and the anticipated flow of the job as it goes from operator to operator through internal data sets.

To see the score in the job log, enable APT_DUMP_SCORE (by defining it and supplying a value) either on the job level, project level, or the system level. Enabling APT_DUMP_SCORE at the system level results in all parallel jobs providing a dump score report, and is ideal for development and quality assurance environments.
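For example, one way to enable it for a whole project from the command line is with the dsadmin command described later in this document (a sketch; it assumes APT_DUMP_SCORE is already defined in the project environment, which is normally the case, and the host names, port, and credentials are hypothetical):

dsadmin -domain ishost:9080 -user dsadm -password password -server dsengine -envset APT_DUMP_SCORE -value True dstage1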

This document provides a start in understanding how your job works, and further analyzing job issues.

The following score example is a fairly small job. When you enable APT_DUMP_SCORE and then run a job, you might typically see in the log the following text:

main_program: This step has 10 datasets:
ds0: {op0[1p] (sequential PacifBaseMCES)
eOther(APT_ModulusPartitioner { key={ value=MBR_SYS_ID }
})<>eCollectAny
op1[4p] (parallel RemDups.IndvIDs_in_Sort)}
ds1: {op1[4p] (parallel RemDups.IndvIDs_in_Sort)
[pp] eSame=>eCollectAny
op2[4p] (parallel RemDups)}
ds2: {op2[4p] (parallel RemDups)
[pp] eSame=>eCollectAny
op6[4p] (parallel buffer(0))}
ds3: {op3[1p] (sequential PacifGalaxyMember)
eOther(APT_ModulusPartitioner { key={ value=MBR_SYS_ID }
})<>eCollectAny
op4[4p] (parallel IndvIdJoin.toIndvIdJoin_Sort)}
ds4: {op4[4p] (parallel IndvIdJoin.toIndvIdJoin_Sort)
eOther(APT_HashPartitioner { key={ value=MBR_SYS_ID }
})#>eCollectAny
op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)}
ds5: {op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)
[pp] eSame=>eCollectAny
op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)}
ds6: {op6[4p] (parallel buffer(0))
[pp] eSame=>eCollectAny
op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)}
ds7: {op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)
[pp] eAny=>eCollectAny
op8[4p] (parallel APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 in SplitTran2)}
ds8: {op8[4p] (parallel APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 in SplitTran2)
eSame=>eCollectAny
op9[4p] (parallel buffer(1))}
ds9: {op9[4p] (parallel buffer(1))
>>eCollectOther(APT_SortedMergeCollector { key={ value=MBR_SYS_ID,
subArgs={ asc }
}
})
op10[1p] (sequential APT_RealFileExportOperator in HP37_OvaWestmember_extract_dat)}
It has 11 operators:
op0[1p] {(sequential PacifBaseMCES)
on nodes (
node1[op0,p0]
)}
op1[4p] {(parallel RemDups.IndvIDs_in_Sort)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
node4[op1,p3]
)}
op2[4p] {(parallel RemDups)
on nodes (
node1[op2,p0]
node2[op2,p1]
node3[op2,p2]
node4[op2,p3]
)}
op3[1p] {(sequential PacifGalaxyMember)
on nodes (
node2[op3,p0]
)}
op4[4p] {(parallel IndvIdJoin.toIndvIdJoin_Sort)
on nodes (
node1[op4,p0]
node2[op4,p1]
node3[op4,p2]
node4[op4,p3]
)}
op5[4p] {(parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)
on nodes (
node1[op5,p0]
node2[op5,p1]
node3[op5,p2]
node4[op5,p3]
)}
op6[4p] {(parallel buffer(0))
on nodes (
node1[op6,p0]
node2[op6,p1]
node3[op6,p2]
node4[op6,p3]
)}
op7[4p] {(parallel APT_JoinSubOperatorNC in IndvIdJoin)
on nodes (
node1[op7,p0]
node2[op7,p1]
node3[op7,p2]
node4[op7,p3]
)}
op8[4p] {(parallel APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 in SplitTran2)
on nodes (
node1[op8,p0]
node2[op8,p1]
node3[op8,p2]
node4[op8,p3]
)}
op9[4p] {(parallel buffer(1))
on nodes (
node1[op9,p0]
node2[op9,p1]
node3[op9,p2]
node4[op9,p3]
)}
op10[1p] {(sequential APT_RealFileExportOperator in HP37_OvaWestmember_extract_dat)
on nodes (
node2[op10,p0]
)}
It runs 35 processes on 4 nodes.

The dump score contains two sections — the data sets (DS) and the operators (OP).
Data sets – The data sets that are listed in the score are the same type of data sets that you create with the Data Set stage — in this context, they are temporary memory and/or disk storage during the job’s run.
Operators – Operators are individual parallel engine stages that you might see on the user interface.

In a typical job flow, operators are end-points, and data sets are the links between the operators. (An exception is when data sets are used to actually output to a file.)

Each and every link on the job design is potentially a data set. However, unlike the Data Set stage, which writes to the resource disk of the node pool specified in the job's configuration file (APT_CONFIG_FILE), these data sets are held in memory. These temporary data sets are only placed in the scratch disk space when an imposed limit is reached. A limit can be imposed by environment settings or by physical memory limitations.

Each operator listed in the score spawns a number of processes. The number depends on:

  • the job's configuration file (APT_CONFIG_FILE), constrained by any node pool settings
  • the operator configuration in the parallel engine code
  • environment variables, such as APT_DISABLE_COMBINATION, being set or unset

First, let us focus on the operators, which are listed after the data sets in the score:

op0[1p] {(sequential PacifBaseMCES)
on nodes (
node1[op0,p0]
)}
op1[4p] {(parallel RemDups.IndvIDs_in_Sort)
on nodes (
node1[op1,p0]
node2[op1,p1]
node3[op1,p2]
node4[op1,p3]
)}

In the preceding example, the two operators are op0 and op1. The operator name is prefixed with the code name "op" and followed by an incremental numeric value starting at zero (0). Next to the operator name is a bracketed value followed by the letter "p", for example "[1p]". The value indicates the number of partitions given to that operator by the engine. The first operator is given only one (1) partition, and the second operator is given four (4) partitions.

Within the curly brackets, the execution mode ("parallel" or "sequential") and the name of that operator are provided. The operator name is based on the name shown on the parallel canvas in the Designer client. The operator name is not the same as the operator type.

In the preceding example, the first operator is listed as "PacifBaseMCES", which is the stage name in its entirety. The second operator, however, is listed as "RemDups.IndvIDs_in_Sort". The stage name "IndvIDs" is renamed to indicate that the sort process triggered by the Remove Duplicates stage (RemDups) occurred.

Following each operator name are the specific nodes that the operator is tagged to run on. In the preceding example, node1 is used for the first operator, and node1, node2, node3, and node4 are used for the second operator. The node names are defined in your configuration file (APT_CONFIG_FILE).

Now let us focus on the data sets:

ds0: {op0[1p] (sequential PacifBaseMCES)
eOther(APT_ModulusPartitioner { key={ value=MBR_SYS_ID }
})<>eCollectAny
op1[4p] (parallel RemDups.IndvIDs_in_Sort)}
ds1: {op1[4p] (parallel RemDups.IndvIDs_in_Sort)
[pp] eSame=>eCollectAny
op2[4p] (parallel RemDups)}

The name of the data set is provided first. Within the curly brackets, there are three parts:

  • the source of the data set – operator 0, sequential PacifBaseMCES
  • the activity of the data set – operator 1, parallel RemDups.IndvIDs_in_Sort
  • the target of the data set – operator 2, parallel RemDups

The source and target are usually operators, although you might see a specific file name provided that indicates that the operator is referencing and reading from a physical data set.

The first data set, ds0, partitions the data from the first operator ("op0", running in 1 partition). The data set uses APT_ModulusPartitioner (sometimes referred to as modulus) to partition. (Note: APT means Advanced Parallel Technology.) The modulus partitioning uses the key field MBR_SYS_ID in this scenario. The partitioned data is sent to the second operator ("op1", running in 4 partitions). Logically, this means that the data is partitioned into 4 partitions using the modulus method.

The second data set, "ds1", reads from the second operator ("op1", running in 4 partitions). The second data set uses the eSame method to partition and sends the data over to the third operator ("op2", running in 4 partitions). One special note: the value "[pp]" means preserved partitioning. Preserved partitioning is an option that is usually set by default in the GUI. If data must be repartitioned, the "[pp]" flag is overridden and a warning message is triggered.

In the example for the first data set, you see "eOther" and "eCollectAny". These are the input (partitioning) method and the target read method. The second method indicates how the receiving operator collects the data.

In this example, "eOther" is the originating or input method for op0. It indicates that something is being imposed outside the expected partitioning option, and that you need to look at the string within the parentheses (APT_ModulusPartitioner in this example, meaning modulus partitioning is imposed).

“eCollectAny” is the target read method. Any records that are fed to this data set are collected in a round robin manner. The round robin behavior is less significant than the behavior that occurs for input partitioning method, which is eOther(APT_ModulusPartitioner) for ds0.

In the first example in this document, where the operator and stage use the APT_SortedMergeCollector for ds9, the "eCollectOther" method indicates where actual collecting occurs; it is usually seen when the data is being written to a sequential flat file. Shown again, in part, is the example:

ds8: {op8[4p] (parallel APT_TransformOperatorImplV22S14_ETLTek_HP37FMember_PMR64262_Test1_SplitTran2 in SplitTran2)
eSame=>eCollectAny
op9[4p] (parallel buffer(1))}
ds9: {op9[4p] (parallel buffer(1))
>>eCollectOther(APT_SortedMergeCollector { key={ value=MBR_SYS_ID,
subArgs={ asc }

The symbol between the originating partitioning method and the target read method translates to the parallelism of the partitioning. The following is the list of the symbols and their definitions:

-> Sequential to Sequential
<> Sequential to Parallel
=> Parallel to Parallel (SAME)
#> Parallel to Parallel (NOT SAME)
>> Parallel to Sequential
> No Source or Target

In summary:

Operator "op0" runs first in sequential mode on node "node1" and sends data to data set "ds0". Data set "ds0" partitions the data using the modulus partitioning method, going from sequential to parallel (4 ways), and sends it to operator "op1", which runs in parallel mode on "node1", "node2", "node3", and "node4". Operator "op1" then handles the collected data and sends the results to data set "ds1". Data set "ds1" simply provides the data to operator "op2" in the same partitioning order as it had for operator "op1".

This is the basic concept of how the parallel engine works at runtime, and what is shown in the dump score report.

* * *

The dump score report also can tell you when the parallel engine inserted an operator based on its internal analysis of each operator’s requirements. For example, Join stages require that the data be sorted, but the GUI does not require that you supply the sort details. The engine is intelligent enough to realize that a sort is required, and supplies the requirement itself. You can see the behavior in the following example:

op5[4p] {(parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)
on nodes (
node1[op5,p0]
node2[op5,p1]
node3[op5,p2]
node4[op5,p3]
)}

In the example, tsort is the name of the sort operator being used. As part of this insertion, the data is repartitioned based on the same key as the hash partitioning:

ds4: {op4[4p] (parallel IndvIdJoin.toIndvIdJoin_Sort)
eOther(APT_HashPartitioner { key={ value=MBR_SYS_ID }
})#>eCollectAny
op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)}

All of this partitioning and sorting provided in the preceding example is for the benefit of the Join stage:

* * *

ds5: {op5[4p] (parallel inserted tsort operator {key={value=MBR_SYS_ID, subArgs={asc}}}(0) in IndvIdJoin)
[pp] eSame=>eCollectAny
op7[4p] (parallel APT_JoinSubOperatorNC in IndvIdJoin)}

[…]

op7[4p] {(parallel APT_JoinSubOperatorNC in IndvIdJoin)
on nodes (
node1[op7,p0]
node2[op7,p1]
node3[op7,p2]
node4[op7,p3]
)}

* * *

If you have a keen eye, you might have already noticed a problem with this particular dump score report: one of the two input links for that join stage ("op7") is partitioned using modulus partitioning ("ds0"), while the other input link is partitioned using hash partitioning ("ds4"). The hash partitioning overrode the initial modulus partitioning request ("ds3").

The first modulus insertion was overridden because the engine detected that the job design did not supply the required fields. Quite frequently, the key fields are supplied in the wrong order, or the job is using different key fields that break the compatibility of the data order requirements for the downstream stages. It is important to review the APT_DUMP_SCORE report and confirm that your valid job design is interpreted correctly by the parallel engine. Ensure that your intended design is correctly implemented.

* * *

One additional detail from the above dump score that you may be wondering about is the buffer operator:

op6[4p] {(parallel buffer(0))
on nodes (
node1[op6,p0]
node2[op6,p1]
node3[op6,p2]
node4[op6,p3]
)}

Buffer operators are specifically for instances where the downstream operator is at risk of getting overloaded with data while it is processing. Buffer operators are an attempt to produce a buffer zone where two things happen:

  1. The buffer operator communicates with the upstream operator to slow down its sending of data.
  2. The buffer operator holds on to the data until the downstream operator is ready for the next block of data.

If you are wondering why your job is running slower than other jobs, seeing the number of buffer operators in effect provides a clue. Buffer operators serve a valuable purpose: the prevention of race conditions between operators. Disabling buffering can cause severe and hard-to-analyze problems. However, better job design can reduce the amount of buffering that occurs.

* * *

The following detail is not seen above, but is possible:

op1[2p] {(parallel APT_CombinedOperatorController:
(APT_TransformOperatorImplV0S1_TrafoTest1_Transformer_1 in Transformer_1)
(Peek_2)
) on nodes (
node1[op1,p0]
node2[op1,p1]
)}

Data sets take up memory. As part of optimization, jobs try to combine multiple operators that handle data in the same way (for example, without any requirement to change the partition or sort order for the data flow). Data is immediately handed off to the next operator when processing is completed in the prior operator with significantly less memory impact.

In the immediately preceding example, there are two operators that are combined: a Transformer and a Peek, running on two partitions.

When the job log indicates an error occurred in “APT_CombinedOperator”, the APT_DUMP_SCORE report can help you identify which of the combined operators is suspect. To better pinpoint the culprit, enable the environment variable APT_DISABLE_COMBINATION. The environment variable can help you to identify the stage with the error.

Self-training exercises:

  1. What does the dump score of your job look like when you run it using a 1-node configuration file compared to a multiple-node configuration file?
  2. What happens to the dump score when you enable APT_DISABLE_COMBINATION on a job?
  3. How does the APT_DUMP_SCORE report represent multiple inputs and outputs for the jobs, such as multiple reference links and multiple output files?
  4. What happens to the dump score if, instead of using the internal sort option of the join stage, the customer uses a separate sort stage?
  5. What happens if the following environment variables are set: APT_NO_PART_INSERTION? APT_NO_SORT_INSERTION? For 8.7 and higher – APT_NO_PARTSORT_OPTIMIZATION?
  6. What happens to the symbol between the partitioning method and eCollectAny when you change the partitioning method for a single link? What does it look like at the target data set?
  7. What happens when you add a sequential file for input and output with no other operators in between?

The above information is available on the IBM support site.

istool command line

September 15, 2012

The istool command line is available on the client and engine tiers.

You can import and export many types of assets, and you can query and delete common metadata.

Location of the istool command line:

For UNIX or Linux, the istool command framework is located in installation_directory/Clients/istools/cli, where installation_directory is the directory where you installed InfoSphere Information Server, for example /opt/IBM/InformationServer.

$ pwd
/opt/IBM/InformationServer/Clients/istools/cli

./istool -h
istool Command [ Command_Options ] [ Options ]

Possible values for: Command
workbench query : Executes Metadata Workbench query
deletecm : Delete command set
query : Query command set
workbench extension mapping import : Imports Metadata Workbench Extension Mappings
workbench extension source delete : Deletes Metadata Workbench Extension Data Sources
– : Provide usage information.
help : Provide command-line help information.
: Istool Console.
export : Export to a file.
import : Import from a file.
build package : Build a Deployment Package.
deploy package : Deploy a Deployment Package.
send package : Send package to a file
workbench automated services : Runs the Metadata Workbench automated services
workbench extension source import : Imports Metadata Workbench Extension Sources
workbench extension mapping delete : Deletes Metadata Workbench Extension Mappings

To list Command_Options, enter: istool Command -help

Possible values for: Options
Long Name: Short Name: Description:
-help , -h : print command usage
-verbose , -v : display progress
-silent , -s : silence command output
-script , -script : file to read commands from.

Exit codes:

0 – Success
1 – Warning
2 – Partial failure
3 – Error reading from console
4 – Invalid command history index
5 – Error reading from script file
11 – Invalid command syntax

Export Help:

./istool export -h
istool export [ Options ]

Possible values for: Options
Long Name: Short Name: Description:
-help , -h : print command usage
-domain , -dom : Name of Information Server domain
-username , -u : Name of user account on Information Server domain
-password , -p : Password for Information Server account
-verbose , -v : display progress
-silent , -s : silence command output
-archive , -ar : Archive file
-preview , -pre : Preview export task without executing
-abortIfError , -abort : Abort task on N errors
-updatearchive , -up : Update any existing Archive file
-commonmetadata , -cm : CM Assets

Options for: -commonmetadata
-base , -base : Prefix for Paths
-contactassignmentonly , -caonly : Contact assignment only
-includecontactassignment , -incca : Include contact assignment
-includeannotation , -incannot : Include annotation
-includedataconnection , -incdc : Include data connection
CM item paths
-datastage , -ds : DataStage Assets

Options for: -datastage
-includedependent , -incdep : Include dependent items
-nodesign , -nodes : Exclude design items
-includeexecutable , -incexec : Include executable items
-base , -base : Prefix for Paths
DataStage item paths
-report , -rep : Reporting Assets

Options for: -report
-reportName , -repName : Names of Reports in Reporting Service
-ownedByProduct , -prod : Include Reports belonging to a product
-reportRIDs , -rid : RIDS of the reports to be exported
-includeLastReportResults , -incLastResult : Include Last N Report Results for every report
-includeAllReportResults , -incAllResults : Include all referenced Report results for the Reports Exported
-includeReportResultName , -incResultName : Include Report Result for a given name.
Reporting item paths.
-security , -sec : Security Group Assets.

Options for: -security
-securityGroup , -sg : Security Group Assets.
-groupident , -grp : Specify list of groups. Can use * or ? wildchars.
-includeGroupUserMemberships , -incGrpUsrMems : To specify that group users and their references should be exported along with group.
-includeRoles , -incRole : To specify that user or group role relationships must be exported.
-securityUser , -su : Security User Assets.
-userident , -u : Specify list of users. Can use * or ? wildchars.
-includeUserGroupMemberships , -incUsrGrpMems : To specify that user groups and their references should be exported along with user.
-includeCredential , -incCred : To specify that IS credentials must be exported.
-includeCredentialMappings , -incMap : To specify that DataStage user credential mappings must be exported.

Exit codes:

0 – Success
1 – Warning
2 – Partial failure
3 – Export failed
4 – Export preview failed
5 – Invalid archive file
10 – Failed to connect to Information Server domain
11 – Invalid command syntax

Import Help:

./istool import -h
istool import [ Options ]

Possible values for: Options
Long Name: Short Name: Description:
-help , -h : print command usage
-domain , -dom : Name of Information Server domain
-username , -u : Name of user account on Information Server domain
-password , -p : Password for Information Server account
-verbose , -v : display progress
-silent , -s : silence command output
-preview , -pre : Preview the command, listing items to be imported. Do not execute.
-archive , -ar : Name of source archive for imported items, on local filesystem
-replace , -r : Replace existing versions of items in Information Server
-abortAfter , -abort : Abort after N errors
-commonmetadata , -cm : Common Metadata
-datastage , -ds : DataStage specific options

Options for: -datastage
-nodes , -nodesign : Exclude DataStage design items
server/project : DataStage project
-report , -rep : Reporting Assets

Options for: -report
-replace , -r : Modify assets during the [import] command execution. Must be specified with [import].
Reporting item paths.
-security , -sec : Security Group Assets.

Options for: -security
-replace , -r : Modify assets during the import command execution. Must be specified with import.

Exit codes:

0 – Success
1 – Warning
2 – Partial failure
3 – Import failed
4 – Import preview failed
5 – Invalid archive file
10 – Failed to connect to Information Server domain
11 – Invalid command syntax

=====================================

Now we will see some working examples.

Export command for InfoSphere DataStage and QualityStage assets.

Syntax:

istool export
authentication options
[generic options]
-archive "pathname" [-updatearchive]
[-preview]
[-abortIfError=number_of_errors]
-datastage ' [-base="server[:port]/project"]
[-includedependent]
[-nodesign]
[-includeexecutable]
"dsServer[:port]/project/folder/name.type" '

Asset type

Asset types are identified by a type suffix. Type suffixes are not case-sensitive.

Asset type Type suffix
Data element det
IMS database idb
IMS viewset ivs
Mainframe job mjb
Parallel job pjb
Sequence job qjb
Server job sjb
Machine profile mcp
Mainframe routine mrt
Parallel routine prt
Server routine srt
Parallel shared container psc
Server shared container ssc
Table definition tbd
Transform tfm
Data quality specification dqs
Stage type stp
Data connection dcn
Parameter set pst

Examples:

1) Export a parallel job:
istool export -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -datastage ' "dsengine/dstage1/Jobs/oracle/Extract.pjb" '

Beginning Export
[1/1] dsengine/dstage1/Jobs/oracle/Extract.pjb
Elapsed time: 00:00:01
Exported 1 assets

2) Preview export:

istool export -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -pre -datastage ' "dsengine/dstage1/Jobs/oracle/Extract.pjb" '

Beginning Export Preview
[1/1] dsengine/dstage1/Jobs/oracle/Extract.pjb
Previewed 1 assets

3) Export a category (folder):

istool export -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -datastage ' "dsengine/dstage1/Jobs/oracle/*.*" '

Beginning Export
[1/8] dsengine/dstage1/Jobs/oracle/Extract.pjb
[2/8] dsengine/dstage1/Jobs/oracle/Extract1_Optimized1.pjb
[3/8] dsengine/dstage1/Jobs/oracle/Extract2_Oracle_Connector.pjb
[4/8] dsengine/dstage1/Jobs/oracle/Extract3_Using_DRS_Connector.pjb
[5/8] dsengine/dstage1/Jobs/oracle/Extract4.pjb
[6/8] dsengine/dstage1/Jobs/oracle/Load_Oracle_Amazon_Customers.pjb
[7/8] dsengine/dstage1/Jobs/oracle/Load_Oracle_Amazon_Customers2.pjb
[8/8] dsengine/dstage1/Jobs/oracle/Oracle_Connector_Stage_New_Features.pjb
Elapsed time: 00:00:11
Exported 8 assets

4) Using -incdep to include dependent items:

istool export -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -datastage ' -incdep "dsengine/dstage1/Jobs/oracle//SrvJob.sjb" '

Beginning Export
[1/2] dsengine/dstage1/Jobs/oracle/SrvJob.sjb
[2/2] dsengine/dstage1/Routines/srvTest.srt
Elapsed time: 00:00:01
Exported 2 assets

The disadvantage of the above command lines is that they expose the password on the command line. To avoid this, place the command in a script file and run it with:

istool -script scriptfile

Example of a script file:

cat /tmp/export.txt

export -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -pre -datastage ' "dsengine/dstage1/Jobs/oracle/*.*" '

istool -script /tmp/export.txt

Import command for InfoSphere DataStage and QualityStage assets:

Syntax:

istool import
authentication options
[generic options]
-archive pathname
[-preview|-replace]
[-abortAfter=number_of_errors]
-datastage '[-nodesign] "Server/project" '

Examples:

The following command previews an import of assets from the file arc.isx:

istool import -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -pre -datastage ' -nodesign "dsengine/dstage1" '

Beginning import preview
[1/1] dsengine/dstage1/Jobs/oracle/Extract.pjb ignored – already exists. Use -replace to overwrite
Previewed 0 assets

After the preview, the command is repeated without the -preview option to import the assets to the project named dstage1, this time with -replace so that the existing asset is overwritten.

istool import -domain ishost:9080 -username dsadm -password password -archive "/tmp/arc.isx" -replace -datastage ' "dsengine/dstage1" '

Beginning import
[1/1] dsengine/dstage1/Jobs/oracle/Extract.pjb
Elapsed time: 00:00:03
Imported 1 assets

dsadmin Command for administering Projects

September 15, 2012

dsadmin is the command used for administering projects and has a large range of options.

Command Syntax:
dsadmin [-file <file> <domain> <DataStage server> | -domain <domain> -user <user> -password <password> -server <DataStage server>]
<primary command> [<arguments>]

Valid primary command options are:
-createproject
-deleteproject
-oshvisible
-enablercp
-enablejobadmin
-envadd
-envdelete
-envset
-advancedruntime
-basedirectory
-deploymentdirectory
-customdeployment
-listprojects
-listproperties
-listenv
-enablegeneratexml
-protectproject

File format:

Each entry in the file has the format:

domain,server,user,password

and the dsadmin command is then invoked as:

dsadmin -file filename domain server <command arguments>

where domain is the Information Server domain name and server is the DataStage engine tier server name.
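A hypothetical credentials file following that format (all values are placeholders):

cat /tmp/dscredentials.txt
isdomain,dsengine,dsadm,password

This is the file referred to as /tmp/dscredentials.txt in the example below.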

Creating Project:

dsadmin -file filename domain server -createproject ProjectName [-location ProjectLocation] [-copyroles <sourceproject>]

ProjectName is the name of the project.

-location specifies the project location and is optional.

-copyroles copies roles from an existing project and is optional.

If the project location is not specified, the project is created in the Projects directory under the server install directory.

Ex:

dsadmin -file /tmp/dscredentials.txt  isdomain dsengine -createproject dstage2 -location /projects/dstage2
Status code = 0

dsjob -lprojects
dstage2

Status code = 0

/projects>ls -tlr
total 4
drwxrwxr-x 45 dsadm dstage 4096 Sep 15 20:30 dstage2

================

dsadmin -listproperties dstage2
OSHVisible=0
JobAdminEnabled=
RTCPEnabled=
PXAdvRTOptions=
PXRemoteBaseDirectory=
PXDeployJobDirectoryTemplate=
PXDeployCustomAction=
PXDeployGenerateXML=

Status code = 0

Enabling/disabling OSH display

dsadmin -oshvisible TRUE dstage2

Status code = 0

Enabling/disabling runtime column propagation

dsadmin -enablercp TRUE | FALSE ProjectName

ex:

dsadmin -enablercp TRUE dstage2

Enabling/disabling job administration from the Director client

dsadmin -enablejobadmin TRUE | FALSE ProjectName

Example:
dsadmin -enablejobadmin TRUE dstage2

Adding an environment variable

dsadmin can be used to create an environment variable, which is added to the User Defined category.

Syntax:

dsadmin -envadd EnvVarName -type STRING | ENCRYPTED 
-prompt "PromptText" [-value "Value"] ProjectName

Ex:
Now I will add a new environment variable called "TEST" to the dstage2 project.

We use dsadmin -listenv dstage2 to check whether this variable is already defined.

dsadmin -listenv dstage2 | grep TEST

Status code = 0

dsadmin -envadd TEST -type STRING -prompt "TEST ADDING ENV" -value "Test" dstage2

dsadmin -listenv dstage2 | grep TEST
TEST=Test

Deleting an environment variable

dsadmin -envdelete EnvVarName ProjectName

dsadmin -envdelete TEST dstage2


Status code = 0
dsadmin -listenv dstage2 | grep TEST

Status code = 0

Setting the value of an environment variable

Syntax:

dsadmin -envset EnvVarName -value "Value" ProjectName

dsadmin -listenv dstage2 | grep TEST
TEST=Test

Status code = 0

Now we will change the value from Test to Test2

dsadmin -envset TEST -value "Test2" dstage2

dsadmin -listenv dstage2 | grep TEST
TEST=Test2

Listing projects:

dsadmin -listprojects

dstage2


Listing properties

dsadmin -listproperties ProjectName

dsadmin -listproperties dstage2
OSHVisible=1
JobAdminEnabled=1
RTCPEnabled=1
PXAdvRTOptions=
PXRemoteBaseDirectory=
PXDeployJobDirectoryTemplate=
PXDeployCustomAction=
PXDeployGenerateXML=

Status code = 0

Listing environment variables:

dsadmin -listenv ProjectName

EX:
dsadmin -listenv dstage2

Initially I struggled to use the -file option with dsadmin, and found the following helpful Technote from IBM.

On IBM Information Server 8, dsadmin line command with -file option will fail with error message “Failed to locate information for server”
Technote (FAQ)

Question
When trying to use the dsadmin line command with the -file option in order to avoid specifying the user/password in the line command, the dsadmin command will fail with the following error message : “Failed to locate information for server” (Similar issue will occur with dsjob command)
Cause
Incorrect syntax in the file specified for the -file option
Answer
This issue is caused by a missing or incorrect argument in the file, either the domain name or the server name.
Note that the dsadmin command is using a combination of both the domain name and the server name to lookup the connection details in the file.
Format of the file entry should be like :

domain,server,user,password

and the dsadmin command will be like :

dsadmin -file filename domain server <command arguments>

(This is also available for the dsjob command)

We do not have command line options to set up the sequence compilation options or to set auto-purge of the job log.

If IBM enhances these two options in the command line, then we can fully automate DataStage project creation using a script.

Workaround:
Create a template project and copy the DSParams file from the template project to the new project.
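A minimal sketch of that workaround (the paths are hypothetical; adjust them to your installation and project locations, and back up the target project's DSParams first):

cp /opt/IBM/InformationServer/Server/Projects/template_project/DSParams \
   /opt/IBM/InformationServer/Server/Projects/dstage2/DSParams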


DSXImportService.sh — InfoSphere Information Server 8.x

September 15, 2012

Command : DSXImportService.sh

This command is located under ASBNode/bin on the engine tier.
You can use it to import objects from .dsx files into the specified repository.

Importing objects from a .dsx file

Syntax:
-ISFile <isFile> [-ISHost <isHost>[:<port>]] [-ISUser <isUser>] [-ISPassword <isPassword>] [-DSHost <dsHost>[:<port>]] -DSProject <dsProject> -DSXFile <dsxFile> [-Overwrite | -OverwriteReadOnly] [-Verbose] [-StopOnError] [<selected_import>]
|
-ISHost <isHost>[:<port>] -ISUser <isUser> -ISPassword <isPassword> [-DSHost <dsHost>[:<port>]] -DSProject <dsProject> -DSXFile <dsxFile> [-Overwrite | -OverwriteReadOnly] [-Verbose] [-StopOnError] [<selected_import>]
|
-List -DSXFile <dsxFile>

-ISFile isFile
Specifies the file name that contains the connection details. Using this option provides a level of security by hiding the login details from view. If you use this option, you do not have to provide the connection details on the command line. If any connection details are specified on the command line, however,
they override those defined within the file.

Example isFile:

cat isfile.txt

-ISHost isHost
-ISUser dsadm
-ISPassword password

Listing the contents of a .dsx file

DSXImportService.sh -List -DSXFile /tmp/Null_Handling_Test.dsx

LISTING CONTENTS OF DSX FILE:

*** Job ***
Design item = Null_Handling_Test

Total items = 1
Listed file = /tmp/Null_Handling_Test.dsx

Importing from the .dsx file:

DSXImportService.sh -ISFile /home/dsadm/isfile.txt -DSHost dshost -DSProject dstage1 -DSXFile /tmp/Null_Handling_Test.dsx -Overwrite -Verbose

Server import initiated.
Attempting authentication……
Authentication successful..
Processing Jobs….
All EXECJOBS will be ignored due to the Server Version 7.5.2 being too old..
Completing Import
Import completed.
Design item imported successfully. Item Type: Job. Identifier: Null_Handling_Test.

IS Host = ishost
IS Port = 9080
IS User = dsadm
DataStage Project = dstage1
Imported file = /tmp/Null_Handling_Test.dsx
Total items = 1
Items imported = 1
Items not imported = 0
Total import time = 00:00:08

selected_import

You can specify options here to import selected objects from a .dsx file. You specify the object type and the object name as specified in the following table. You can specify a full name or an abbreviated name for the object type.
Table 1. Selected import options (Abbreviated option / Full option / Object type)

-JB -JOB job
-EJ -EXECUTABLEJOB job executable
-DE -DATAELEMENT data element
-TD -TABLEDEFINITION table definition
-ST -STAGETYPE stage type
-TR -TRANSFORM transform
-RT -ROUTINE routine
-ID -IMSDATABASE IMS database
-IV -IMSVIEWSET IMS viewset
-MP -MACHINEPROFILE machine profiles
-SC -SHAREDCONTAINER shared container
-QR -QSRULEASSEMBLY QualityStage rule set
-PS -PARAMETERSET parameter set
-DC -DATACONNECTION data connection

If there are multiple objects in the .dsx file and you want to import selected ones, use the options specified above.

First, list the contents of the .dsx file:

DSXImportService.sh -List -DSXFile /tmp/Null_Handling_Test.dsx

LISTING CONTENTS OF DSX FILE

*** Routine ***
Design item = JobControl
Runtime item = DSU.JobControl

*** Shared Container ***
Design item = PtlDeriveSegment

*** Job ***
Design item = Null_Handling_Test

Total items = 4
Listed file = /tmp/Null_Handling_Test.dsx

If you want to import only the shared container from the .dsx file, then use the following syntax:

DSXImportService.sh -ISFile /home/dsadm/isfile.txt -DSHost dshost -DSProject dstage1 -DSXFile /tmp/Null_Handling_Test.dsx -Overwrite -Verbose -SC PtlDeriveSegment

Server import initiated.
Attempting authentication…….
Authentication successful…
Processing Shared Containers………
All EXECJOBS will be ignored due to the Server Version 7.5.2 being too old.
Completing Import.
Import completed.
Design item imported successfully. Item Type: Shared Container. Identifier: PtlDeriveSegment.

IS Host = ishost
IS Port = 9080
IS User = dsadm
DataStage Project = dstage1
Imported file = /tmp/Null_Handling_Test.dsx
Total items = 1
Items imported = 1
Items not imported = 0
Total import time = 00:00:16

JR42922: PROBLEMS WITH TRANSFORMER NULL TESTING EXPRESSIONS WHEN RUNNING WITH APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is set

September 15, 2012

Error description

  • InfoSphere Information Server 8.5 changed how the parallel engine
    Transformer stage handles columns that contain null values.
    These changes were described in Tech Note 1514921.
    
    After InfoSphere Information Server 8.5 shipped,  issues in the
    default null handling behavior of Version 8.5, and problems with
    the implementation of the InfoSphere Information Server 8.1
    compatibility mode  were discovered .
    Jobs that process nulls in Transformer stages show different
    behavior when migrated from 8.1 to 8.5, even when legacy null
    handling (APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING ) is set.
    
    In IS 8.1 Nulls can be converted to other values using any of
    the following 3 functions
     NullToEmpty()
     NullToZero()
     NullToValue()
    
    Even with  APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING  set, in IS
    8.5 these functions do not work the same way they did in
    InfoSphere Information Server 8.1 .  In particular, without this
    fix some rows which were not rejected in IS 8.1  were being
    rejected in IS 8.5
    The fix for this APAR makes the treatment of rows containing
    NULL columns the same in IS 8.5 and IS 8.1, when
    APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is set. .
    
    NOTES:
    1. The patch for this APAR also includes the fix for APAR
    JR40637.
    APAR JR40637 concerns problems in transformer compilation  when
    APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING  is not set.
    
    2. This APAR replaces  APAR JR42336.

     

Local fix

Problem summary

  •  After InfoSphere Information Server 8.5 shipped,  issues in the
    default null handling behavior of Version 8.5, and problems with
    the implementation of the InfoSphere Information Server 8.1
    compatibility mode  were discovered .
    Jobs that process nulls in Transformer stages show different
    behavior when migrated from 8.1 to 8.5, even when legacy null
    handling (APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING ) is set.
    
    In IS 8.1 Nulls can be converted to other values using any of
    the following 3 functions
    · NullToEmpty()
    · NullToZero()
    · NullToValue()
    
    Even with  APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING  set, in IS
    8.5 these functions do not work the same way they did in
    InfoSphere Information Server 8.1 .  In particular, without this
    fix some rows which were not rejected in IS 8.1  were being
    rejected in IS 8.5
    The fix for this APAR makes the treatment of rows containing
    NULL columns the same in IS 8.5 and IS 8.1, when
    APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is set. .
    
    NOTES
    1. The patch for this APAR also included the fix for APAR
    JR40637.
    APAR JR40637 concerns problems in transformer compilation  when
    APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING  is not set.
    2. This APAR replaces  APAR JR42336.

     

Problem conclusion

  •   Install the patch associated with this APAR
    NOTES
    1. The patch for this APAR also included the fix for APAR
    JR40637.
    APAR JR40637 concerns problems in transformer compilation  when
    APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING  is not set.
    2. This APAR replaces  APAR JR42336.

Issues with NULL values in input columns in Parallel Engine Transformer stage

September 13, 2012

Problem(Abstract)

InfoSphere Information Server 8.5 changed how the parallel engine Transformer stage handles columns that contain null values. These changes were described in Tech Note 1514921.

Since InfoSphere Information Server 8.5 shipped, issues in the default null handling behavior of version 8.5, and problems with the implementation of the InfoSphere Information Server 8.1 compatibility mode, were discovered. This note describes the issue and what to do about it.

Symptom

Jobs that process nulls in Transformer stages show different behavior when migrated from 8.1 to 8.5, even when legacy null handling is set.

Resolving the problem

Quick Overview:

In InfoSphere Information Server version 8.1 and prior versions, the job design had to explicitly handle null column values in the Transformer stage. If the Parallel Engine encountered null column values outside of specific contexts, the entire row containing the null was dropped, or sent to a reject link if the Transformer stage had a reject link.
(NOTE: This note refers to SQL value NULL, not the character with value 0x00, and not an empty string.)
Customers complained that the need for explicit null handling made Transformer stage coding too complex and allowed inconsistent behavior. So, starting in InfoSphere Information Server version 8.5 the default behaviors were changed and explicit null handling was no longer required. It was recognized that some customers would want to retain the original null-handling behavior so an environment variable, APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING, was introduced. The environment variable, when defined, preserves compatibility with the behavior of pre-version 8.5 InfoSphere Information Server.
These changes were described in Tech Note 1514921. Since version 8.5 shipped, differences in the default null handling behavior of Version 8.5 and problems with the implementation of the InfoSphere Information Server 8.1 compatibility mode have been discovered .

There have been issues with null handling in InfoSphere Information Server 8.5 (with backward compatibility enabled), InfoSphere Information Server 8.1, and earlier versions. Most of these issues were due to lack of clear explanation about how null values should be handled in the Transformer stage.

This Tech Note is a more detailed description about how the null handling should be done in InfoSphere Information Server 8.5 (with backward compatibility mode set), 8.1 and earlier versions. This document also describes an inconsistency in the behavior of version 8.5 (with backward compatibility mode set) compared to the behavior of version 8.1.

1. InfoSphere Information Server 8.1 NULL handling: Explicit handling required.
If you use input columns in an output column expression, a NULL value in that input column will cause the row to be dropped, or rejected if a reject link is defined.

This applies where:
· An input column is used in an output column derivation expression
(for example, an expression like "DSLink4.col1 + 1").
· An input column is used in an output column constraint.
· An input column is used in a stage variable derivation.

It does not apply where an output column is mapped directly from an input column, with a straight assignment expression.

In InfoSphere Information Server 8.1, there are 3 legal ways to explicitly handle NULL values in input columns in the Transformer stage.

a. Conversion of NULL values:
Nulls can be converted to other values using any of the following 3 functions
· NullToEmpty()
· NullToZero()
· NullToValue()

Example :

    DSLink3.OrderCount + 1 --> If DSLink3.OrderCount is NULL, the whole record containing

      DSLink3.OrderCount will be dropped or sent to the reject link.

This expression can be changed to:
NullToZero(DSLink3.OrderCount) + 1 --> If DSLink3.OrderCount is NULL, the target field will be assigned the integer value 1.

b. Check for NULL value:

To test if a value is NULL in a logical expression, use one of these two functions.
· IsNotNull()
· IsNull()

Example:
DSLink3.OrderCount + 1
--> If DSLink3.OrderCount is NULL, the record will be dropped or rejected.
This expression can be changed to:
If (IsNotNull(DSLink3.OrderCount)) Then DSLink3.OrderCount + 1 Else 1
--> If DSLink3.OrderCount is NULL, the target field will be the integer 1.

Each nullable column in a given expression needs to be properly NULL checked or the NULL value needs to be converted to a concrete value.

c. IF-ELSE operations on NULL values:

Handling NULL values in IF-ELSE conditions can be complex. Consider the following examples to get familiar with using NULL checks in IF-ELSE statements.

Example 1: Simple IF ELSE statement
If (DSLink1.Col1 > 0) Then xxx Else yyy
In InfoSphere Information Server 8.1 (and in 8.5 with legacy null handling enabled), code will be generated to drop records in case DSLink1.Col1 is NULL.

This needs to be written as

If (IsNotNull(DSLink1.Col1) and (DSLink1.Col1 > 0)) Then xxx Else yyy
or
If (IsNull(DSLink1.Col1) or (DSLink1.Col1 > 0)) Then xxx Else yyy
based on the business requirement on how the NULL case should be treated

It can also be written as
If (NullToZero(DSLink1.Col1) > 0) Then xxx Else yyy
or
If (NullToZero(DSLink1.Col1) >= 0) Then xxx Else yyy
again based on the business requirement .

NUllToValue() or NullToEmpty() also can be used instead of NullToZero(), based on the requirement.

NULL conversion functions should not be used for checking a NULL.
Example: If ((NullToZero(DSLink1.Col1) = 0) or (DSLink1.Col1 > 0)) Then xxx Else yyy

NullToZero(DSLink1.Col1) = 0 is not considered a NULL check, so code will be generated to drop records in case DSLink1.Col1 is NULL.

Example 2: IF ELSE statement with multiple conditions
If ((DSLink1.Col1 = 5) or (DSLink1.Col2 = 8)) Then xxx Else yyy –> Col2 is a nullable field

This cannot be written as
If (DSLink1.Col1 = 5 or IsNotNull(DSLink1.Col2) and (DSLink1.Col2 = 8)) Then xxx Else yyy
This is not the proper way of checking.
If (condition1 or condition2 and condition3) will be treated as If ((condition1 or condition2) and condition3)

It should be written as
If (DSLink1.Col1 = 5 or (IsNotNull(DSLink1.Col2) and (DSLink1.Col2 = 8))) Then xxx Else yyy

The condition which contains the nullable column should be properly (order should be clearly specified using parentheses where ever needed) pre-“AND”ed with IsNotNull() check or pre-“OR”ed with IsNull() check.

Example 3: IF ELSE statement in which the nullable field is used multiple times
If ((DSLink1.Col1 = 3) or (DSLink1.Col1 = 5)) Then xxx Else yyy
Records will be dropped in case Col1 is NULL.

This cannot be written as
If ((IsNotNull(DSLink1.Col1) and (DSLink1.Col1 = 3)) or (DSLink1.Col1 = 5)) Then xxx Else yyy

This should be written as
If ((IsNotNull(DSLink1.Col1) and (DSLink1.Col1 = 3)) or (IsNotNull(DSLink1.Col1) and (DSLink1.Col1 = 5))) Then xxx Else yyy

Each instance of a nullable field needs to be pre-"AND"ed or pre-"OR"ed with a NULL check.

Example 4: Using 2 nullable columns in a condition.

If (DSLink1.Col1 = DSLink1.Col2) Then xxx Else yyy

Both columns need to be NULL checked, or NULL conversion functions should be used on both columns:

If (IsNotNull(DSLink1.Col1) and (IsNotNull(DSLink1.Col2) and (DSLink1.Col1 = DSLink1.Col2))) Then xxx Else yyy

2. InfoSphere Information Server 8.5 NULL handling: Explicit handling not required.

In brief, in InfoSphere Information Server 8.5 a NULL value in an input column will NOT cause the row to be dropped or sent to the reject link.
A NULL value in an input column is handled by the Transformer stage according to specific logic, so the job designer can skip explicit NULL handling.
Technote 1514921 has further details.
The environment variable APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING can be used if the designer wants the 8.1 behavior in 8.5.
Enabling old NULL handling can be done at three different levels:

      1. Setting APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING at the project level.
      2. Setting APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING at the job level.
      3. Checking “Legacy null processing” in the DataStage Designer for individual Transformer stages in a given job.

IBM has recently discovered a previously undocumented difference in behavior of InfoSphere Information Server 8.1 and InfoSphere Information Server 8.5 with old NULL handling enabled.
InfoSphere Information Server 8.1 allowed the three “NullToxxxx()” functions to be used as NULL tests.

Example:
In InfoSphere Information Server 8.1 the following IF-ELSE condition

    If ((NullToZero(DSLink1.Col1) = 0) or (DSLink1.Col1 > 0)) Then xxx Else yyy
    “(NullToZero(DSLink1.Col1) = 0)” was considered a NULL check and records were not dropped or sent to the reject link. This was due to an inconsistency in the code.

In InfoSphere Information Server 8.5 this code inconsistency was eliminated and only IsNull() and IsNotNull() can be used as NULL checks. InfoSphere Information Server 8.1 jobs that used NullToZero(), NullToValue(), or NullToEmpty() for NULL checking must be changed to use IsNull() or IsNotNull().

Note:
!IsNull() can be used instead of IsNotNull()
!IsNotNull() can be used instead of IsNull()
in NULL checking.

Important Technotes — IBM InfoSphere Information Server 8.5/8.7

September 12, 2012 Leave a comment

Determine PXEngine Version for IBM InfoSphere DataStage

Question

How do you determine the exact version of DataStage installed based on the PXEngine?

Answer

Perform the following (the directory path may differ based on the install location):

cd /opt/IBM/InformationServer/Server/DataStage/PXEngine/lib
and run the following script:

for i in *
do
strings $i | grep \$Version
done

Example output:

Release 8.5
$Version: AIX64 Torrent 2_3 2010/09/20 23:45:58 GMT aixbld05:/builds/IS_8_5_RC_1_T1/PXEngine.src/orch_master/orchestrate//builds/IS_8_5_RC_1
_T1/PXEngine.src/orch_master/Libs/liborchaix64.a root Exp $

Release 7.5.2
$Version: AIX3 Torrent 2_3 2006/03/24 05:00:47 GMT aixbld01:/builds/PXEngine.src/orch_master/orchestrate//builds/PXEngine.src/
orch_master/Libs/liborchaix3.a root Exp $
$Version: AIX3 Torrent 2_3 2007/05/25 18:16:45 GMT aixbld01:/tmp_mnt/apt/aixrelease/patches/px752-102719.752.1/orch_master/ase
sybase//tmp_mnt/apt/aixrelease/patches/px752-102719.752.1/orch_master/Libs/liborchasesybaseaix3.a syamalat Exp $
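
If looping over every library produces too much output, the check can be limited to a single library. This is only a sketch; the library name shown (liborchgeneral*) is an example and actual file names vary by platform and release:

cd /opt/IBM/InformationServer/Server/DataStage/PXEngine/lib
strings liborchgeneral* 2>/dev/null | grep '\$Version'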

Note that other methods include reviewing /opt/IBM/InformationServer/Version.xml and the /opt/IBM/InformationServer/Server/DataStage/.dsrel file; however, these methods do not provide the in-depth information you get from running the script.

 

Datastage server job using Oracle connector errors with DataStage Phantom Aborting with @ABORT.CODE = 3

Problem(Abstract)

In a server job that is using the Oracle Connector stage, the following error occurs:

DataStage Job xxxx Phantom yyyy
Program “JOB.1875607158.DT.1629731972.TRANS1”: Line nn, Exception
raised in GCI subroutine:
Access violation.
Attempting to Cleanup after ABORT was raised in stage
CopyOfOp_Region_426_Org_Hier_Outbound2..Xfm_Op_Region
DataStage Phantom Aborting with @ABORT.CODE = 3

Resolving the problem

To resolve the issue:

If the Connector stage is used, you will need to do the following when loading the columns:

  • Use Import Table Definitions
  • Use Start Connector Import Wizard to bring in the columns

This will ensure that the data type and length are correctly defined.

How can I use external C++ code with DataStage as a transformer function (PX Routine)

Question

How can I use external C++ code with DataStage as a transformer function (Parallel Job Transform Routine, referred to here as a PX Routine)?

Answer

PX Routine Tips and Tricks

The PX Routine supports two types of external object linking.

Object Type:
Choose Library or Object. This specifies how the C function is linked in the job. If you choose Library, the function is not linked into the job and you must ensure that the shared library is available at run time. For the Library invocation method the routine must be provided in a shared library rather than an object file. If you choose Object the function is linked into the job, and so does not need to be available at run time. Note that, if you use the Object option, and subsequently update the function, the job will need to be recompiled to pick up the update. <From 7.5.2 help screen>

Notes:
-Some compilers require that the source code extension be “C”, not “c”. “C” indicates a C++ compile, which is required for linking into DataStage.
-Make sure you are using the SAME compiler and options to compile your code as are defined in the Administrator client in APT_COMPILER/APT_COMPILEOPT and APT_LINKER/APT_LINKOPT; these should be the native compiler and options set by the installer.

This first example was done on a RHEL 4 system, with dsenv sourced so that the environment is DataStage-aware.

Steps to use Object code: (Simplest)

1. Compile the external C++ code with -c option:

g++ -c myTest.C -o myTest.o

2. Add a new PX Routine in Designer.
-Routine Name: This is the name used in the Transformer stage to call your function
-Select Object Type
-External subroutine name: This is the actual function name in the C++ code
-Put the full path of the object in the routine definition
-Return Type: Match this datatype to the actual return type of your C++ function
-Arguments: create any arguments that are required by your external C++ function

3. Create a job with a Transformer stage that calls your routine, then compile the job and run it.

Steps to use Library option: (more complex but allows for linking in other libraries)

1. Compile the code with the -shared option:

g++ -shared myTest.C -o libmyTest.so    (note that the library name must begin with “lib”)

2. Same as step 2 from the Object method above, except:
-Select the Library option
-Use the new libmyTest.so for the library name
-You are required to put the new shared object (libmyTest.so) in a directory that is in the library path: LD_LIBRARY_PATH, LIBPATH, or SHLIB_PATH, depending on your OS (see the sketch after step 3).

3. Compile the job and run
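
As a sketch only, here is one way to make the shared library visible at run time on a Linux system that uses LD_LIBRARY_PATH (substitute LIBPATH on AIX or SHLIB_PATH on HP-UX); the /opt/dslibs directory is just an example:

# Example directory; any directory readable by the DataStage user will do
mkdir -p /opt/dslibs
cp libmyTest.so /opt/dslibs

# Append to $DSHOME/dsenv so all engine processes inherit the setting,
# then restart the DataStage engine so running processes pick it up
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/dslibs
export LD_LIBRARY_PATH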

Example Code….

int my_funct(int x)
{
    return x + 1;
}

$ g++ -c myTest.C -o myTest.o

$ g++ -shared myTest.C -o libmyTest.so

-rwxrwxr-x 1 dsadm dsadm 4477 Feb 17 14:40 libmyTest.so

-rw-rw-r-- 1 dsadm dsadm 699 Feb 17 14:40 myTest.o

Notice the size and permission difference.

For Solaris you are required to link to get the shared object….

$ /opt/SUNWspro/bin/CC -dalign -O -PIC -library=iostream -c myTest.C -o myTest.o
$ /opt/SUNWspro/bin/CC -G myTest.o -o libmyTest.so

First command gets you the object file.
Second gets you the shared object.

$ ls -l libmyTest.so myTest.o
-rwxr-xr-x 1 dsadm dstage 4064 Feb 17 17:54 libmyTest.so
-rw-r--r-- 1 dsadm dstage 820 Feb 17 17:54 myTest.o

Advanced:

You can link to other libraries to open up the functionality of your routine to call internal DataStage functionality.

For example to include the functionality of dsapi.h you can use the “library” method listed above except in the linking step you include the required library.

This example was compiled on a Solaris OS.

Example Code: (myProjects.C)

#############CUT#######################
#include<stdio.h>
#include<dsapi.h>

char* projects()
{
char* prlist;
prlist=DSGetProjectList();
return prlist;
}
############PASTE#########################

/opt/SUNWspro/bin/CC -I/opt/IBM/InformationServer/Server/DSEngine/include -dalign -O -PIC -library=iostream -c myProjects.C -o myProjects.o
/opt/SUNWspro/bin/CC -L/opt/IBM/InformationServer/Server/DSEngine/include -lvmdsapi -G myProjects.o -o libmyProjects.so

Copy libmyProjects.so to a path in your library path:

cp libmyProjects.so $APT_ORCHHOME/user_lib <note: you may have to create this directory>

You will need sufficient skills in compiling and linking for your environment. This document is meant as a guide to get you started in understanding the requirements external to DataStage or Information Server products.

Why would an expired Datastage userid be able to log in to the DataStage server but not to the UNIX system?

Problem(Abstract)

The DataStage userid is expired on the UNIX system. This userid is an LDAP userid. It cannot log in to the UNIX system but is able to log in to the DataStage Designer.

Cause

The issue is due to the AUTHORIZATION variable being set to 0 in the uvconfig file. When it is set to 0, the password authorization code is bypassed.

The AUTHORIZATION variable in the uvconfig file deals with whether the UNIX password is authorized.

Valid values are the following:

0 = the default value, password authorization is not done.
1 = password authorization is done.

When a client connects to the DataStage server (i.e. the dsrpcd process in the DataStage engine) with mapped OS credentials, the normal behavior for the DataStage engine is to do password verification (i.e. check whether the mapped OS credentials match the UNIX user credentials).
Password verification is done for all valid values of AUTHORIZATION.
When the AUTHORIZATION variable is set to 1, the engine additionally checks for an expired password.

Diagnosing the problem

To diagnose the issue:

  1. Log in as the DataStage admin user (eg. dsadm)
  2. Go to the /../InformationServer/Server/DSEngine directory
  3. Source the dsenv file (ie. . ./dsenv)
  4. From $DSHOME, run the command, bin/smat -t
  5. Look for the setting called AUTHORIZATION and its value:

AUTHORIZATION =
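
For convenience, the diagnostic steps can be run as a single shell session; the install path below is an example and should be adjusted for your system:

cd /opt/IBM/InformationServer/Server/DSEngine    # your DSEngine directory
. ./dsenv
cd $DSHOME
bin/smat -t | grep AUTHORIZATION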

Resolving the problem

To resolve the issue, perform the following steps:

  1. In $DSHOME, issue the command, bin/uv -admin -stop (this will stop the DataStage engine)
  2. In $DSHOME, make a copy of the uvconfig file
  3. Edit the uvconfig file and modify AUTHORIZATION to 1
  4. Save the changes made to the uvconfig file
  5. Issue the command, bin/uvregen
  6. Issue the command, bin/uv -admin -start (to start the DataStage engine)
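
The same resolution steps expressed as a shell session (a sketch that assumes dsenv has already been sourced so that $DSHOME is set):

cd $DSHOME
bin/uv -admin -stop              # stop the DataStage engine
cp uvconfig uvconfig.bak         # keep a copy of the original uvconfig
vi uvconfig                      # change AUTHORIZATION to 1
bin/uvregen                      # regenerate the engine configuration
bin/uv -admin -start             # start the DataStage engine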

DataStage Parallel framework changes may require DataStage job modifications

Problem(Abstract)

This technote documents changes made in the DataStage parallel framework which may require job modifications when DataStage jobs are upgraded from earlier releases.

Resolving the problem

IBM tries to avoid making code changes that require customers to modify their existing DataStage jobs. However, sometimes it is necessary to make such changes in order to introduce new features or to fix errors. This technote documents areas where changes have taken place in DataStage releases which may require customers to make changes to jobs that were created in earlier versions.

Partitioning and Sort Insertion 

Information Server releases affected: 8.0.1 Fix Pack 1 and higher, 8.1 GA and higher, 8.5 GA

For any stage that requires data to be hash partitioned and sorted (such as Join, Merge, Difference, Compare, ChangeCapture, ChangeApply) the parallel framework automatically inserts a hash partitioner and a sort on each input link to ensure input data is partitioned and sorted properly. Prior to Information Server 8.0.1 Fix Pack 1, if the Preserve Partitioning flag was set on the input link, the parallel framework would not automatically insert the partitioner or sort. Not re-partitioning or re-sorting could result in unexpected results because input data might be partitioned and sorted using different keys from those specified by the stage.

To avoid this problem, the parallel framework was changed in Information Server 8.0.1 Fix Pack 1 so that a hash partitioner and sort would automatically be inserted even in the presence of a (framework inserted) Preserve Partitioning flag, but not in the case of user-specified partitioning, as the latter takes higher precedence. However, the problem could still occur if the user-specified partitioning and sort keys don’t match those required by the stage. Here are some example scenarios that may experience problems as a result of this change:

  • A Join stage has two keys “A” and “B”. The user explicitly specifies a hash partitioning method and inserts a sort stage on the producing side of the primary link. The hash key is “A”, and the sort keys are “A” and “B”. Input data of the reference link has been partitioned or sorted upstream or in another job. The partitioning method on both the primary link and the reference link of Join is set to Auto. When the parallel framework analyzes partitioning and sort requirements at job startup time, it inserts hash and tsort stages on the reference link using the same two keys as specified by Join, and keeps what the user has defined on the primary link. This can cause data to be distributed to wrong partitions.
  • A Join stage has one key. The user explicitly specifies a hash partitioning method and inserts a sort stage upstream of the primary link. The hash and sort key is “A” with the case-sensitive property. Input data of the reference link is not pre-partitioned and pre-sorted. The partitioning method on both the primary link and the reference link of Join is set to Auto. When the parallel framework analyzes partitioning and sort requirements at job startup time, it inserts a hash partitioner on both links and the hash key does not have the case-sensitive property. The framework also inserts a tsort on the reference link, but not on the primary link because data has already been sorted. This can break the sort order of input data on the primary link.
  • A sequential stage or a parallel stage running in a sequential mode will produce this warning message if its producing stage is hash partitioned: “Sequential operator cannot preserve the partitioning of the parallel data set on input port 0. “

These issues can be worked around by setting the environment job parameters APT_NO_PART_INSERTION=True and APT_NO_SORT_INSERTION=True and then modifying the job to ensure that the partitioning and sorting requirements are met by explicit insertion.

Default Decimal Separator

Information Server releases affected: 8.0.1 Fix Pack 1 and higher, 8.1 Fix Pack 1 and higher, 8.5 GA

Prior to Information Server Version 8.0.1 Fix Pack 1, the default decimal separator specified via Job Properties->Defaults was not recognized by the APT_Decimal class in the parallel framework. This caused problems for the DB2 API stage where decimals with a comma decimal point could not be processed correctly. This issue was fixed in release 8.0.1 Fix Pack 1, as APAR JR31597. The default decimal separator can be specified via a job parameter (e.g. #SEPARATOR#). However, if the job parameter does not contain any value, '#' will be taken as the decimal separator. This can cause the following error if the actual decimal separator is not '#':

"Fatal Error: APT_Decimal::assignFromString: invalid format for the source string."

If you encounter this problem after upgrading, please make sure the job parameter representing the default decimal separator contains the actual decimal separator character used by the input data. If changing the job parameter is not an option, you can set the environment variable APT_FORCE_DECIMAL_SEPARATOR. The value of APT_FORCE_DECIMAL_SEPARATOR overrides the value set for the “Decimal separator” property. If more than one character is set for this environment variable, the decimal separator will default to a dot character, '.'

Embedded Nulls in Unicode Strings 

Information Server releases affected: 8.1 Fix Pack 1 and higher, 8.5 GA

Prior to Information Server 8.1 Fix Pack 1, nulls embedded in Unicode strings were not treated as data; rather, they were treated as string terminators. This caused data after the first null to be truncated. The issue was fixed in Fix Pack 1, as APAR JR33408, for Unicode strings that are converted to or from UTF-8 strings. As a result of this change, you may observe a change in job behavior where a bounded-length string is padded with trailing nulls. These extra nulls can change the comparison result of two string fields, generate duplicate records, make data conversion fail, etc., depending on the job logic. To solve this problem, the job should be modified to set APT_STRING_PADCHAR=0x20 and to call Trim() in the Transformer stage if needed.

Null Handling at column level

Information Server releases affected: 8.1 GA and higher, 8.5 GA

In parallel jobs, nullability is checked at runtime. It is possible for the user to set a column as nullable in the DataStage Designer, but at runtime the column is actually mapped as non-nullable (to match the actual database table, for example). Prior to 8.1 GA the parallel framework issued a warning for this mismatch, but the job could potentially crash with a segmentation violation as a result. The warning was changed to a fatal error in 8.1 GA as ECASE 124987 to prevent the job from aborting with SIGSEGV. After this change, jobs that used to run with this warning present will now abort with a fatal error. For example, this problem is often seen in the Lookup stage. To solve the problem, modify the job to make sure the nullability of each input field of the Lookup stage matches the nullability of the same output field of the stage that is upstream of the Lookup.

Transformer Stage: Run-Time Column Propagation (RCP)

DataStage releases affected: 7.5 and higher

Information Server releases affected: 8.0 GA and higher, 8.1 GA and higher, 8.5 GA

When RCP is enabled at any DataStage 7.x release prior to 7.5, for an input field “A” which is mapped to an output field “B”, both “A” and “B” are present in the output record. Starting with DataStage 7.5, it appears that “A” is simply being renamed to “B”, so that only “B” appears in the output. In order to improve transform performance, a straight assignment like “B=A” is considered as renaming “A” to “B”. Prior to the change, the straight assignment was considered as creating an additional field by copying “A” to “B”. With this change in place, the user now needs to explicitly specify both “A” and “B” in the output schema in order to prevent “A” from being renamed to “B” and to create a new field “B”. Refer to the following Transformer stage screen-shot that shows how to ensure that both values are propagated to the output link.

Transformer Stage: Decimal Assignment

Information Server releases affected: 8.0 GA and higher, 8.1 GA and higher, 8.5 GA

The parallel framework used to issue a warning if the target decimal had smaller precision and scale than the source decimal. The warning was changed to an error in Information Server 8.0 GA, and as a result the input record will be dropped if a reject link is not present. This behavior change was necessary to catch the error earlier to avoid data corruption. The user should modify the job to make sure the target decimal is big enough to hold the decimal value. Alternatively, the user can add a reject link to prevent records from being dropped.

Important: This change in behavior does not apply to any Linux platforms (Red Hat, SUSE, or zLinux). The parallel framework does not enable exception handling on Linux platforms, so the behavior remains the same as it was prior to 8.0 GA.

Transformer Stage: Data Conversion 

Information Server releases affected: 8.0 GA and higher, 8.1 GA and higher, 8.5 GA

Prior to Information Server 8.0 GA, an invalid data conversion in the transformer would result in the following behavior:

  1. A warning message is issued to the DataStage job log
  2. A default value was assigned to the destination field according to its data type
  3. The record was written to the output link.
  4. If a reject link was present, nothing was sent to the reject link.

The behavior has changed in the 8.0 GA release when a reject link is present. Instead of the record being written to the output link with a default value, it will be written to the reject link instead. This may lead to data loss if the job is expecting those records to be passed through to the output. To get to the original behavior of passing the records through, the job would need to be modified to remove the reject link.

Note: there is an environment variable which was added along with this change, to add the capability of aborting the job. To use this option, ensure that there is no reject link and then set the environment variable APT_TRANSFORM_ABORT_ON_CONVERSION_ERROR=True. The job will now abort from an invalid data conversion scenario.

Surrogate Key Generator 

Information Server releases affected: 8.0.1 Fix Pack1 and higher, 8.1 Fix Pack 1 and higher, 8.5 GA

The surrogate key stage reserves keys in blocks. Prior to Information Server 8.1 Fix Pack 1, if only one record (suppose it was value 5, because an initial value was set) was generated, the surrogate key generator would use values beginning with 6 and greater as available keys for incoming records. The surrogate key generator was changed in 8.1 Fix Pack 1, as APAR JR29667. With this change, DataStage will now consider values 1 to 4 as well as any value 6 and greater as available keys. This behavior change may cause the SCD stage to produce incorrect results in the database or generate the wrong surrogate keys for the new records of the dimension. If required, the job can be modified to revert back to the old behavior (start generating keys from the highest key value last used) by setting option ‘Generate Key From Last Highest Value’ to Yes. This approach however may result in gaps in used keys. It is recommended that the user understand how the key file is initialized and decide if it is necessary to modify job based on business logic.

Sequential File Format on Windows

Information Server releases affected: (Windows Platforms) 8.1 GA and higher, 8.5 GA

Prior to Information Server 8.1 GA, the default format for sequential files was Unix format which requires a newline character as the delimiter of a record. The default format for the Sequential File stage was changed to Windows format in the Information Server 8.1 GA release. Due to this change, data files previously created with UNIX format will not import properly. To solve this issue, set the environment variable APT_USE_CRLF=FALSE at the DataStage project level or within the system environment variables (requires a Windows reboot).

/tmp filled with large DYNLUT* files during DataStage job runs

Problem(Abstract)

When running DataStage jobs, the /tmp system directory becomes full with large files having names that begin with dynlut*

Cause

DYNLUT* files are dynamic lookup table files created by the parallel engine lookup operator. DataStage writes these files to the /tmp directory unless the environment variable TMPDIR is defined with an alternate location.

Resolving the problem

If the /tmp directory is small, consider increasing the space allocation.

If you need to redirect the DYNLUT* files (and other DataStage temporary files) to another directory, set the TMPDIR environment variable to the new location in the dsenv script in the DSEngine directory. After making this change, be sure to restart both the DataStage engine and the node agents to ensure all components look for files in the new location.
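
For illustration only, the change might look like the following in $DSHOME/dsenv; /data/dstmp is a hypothetical local directory with sufficient free space:

# Added to $DSHOME/dsenv
TMPDIR=/data/dstmp
export TMPDIR

# Then restart the DataStage engine (and the node agents, as noted above)
cd $DSHOME
. ./dsenv
bin/uv -admin -stop
bin/uv -admin -start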

Also, note that while the DataStage parallel engine primarily uses the TMPDIR environment variable to determine the location of temporary files, other variables may also need to be set if you wish to redirect all DataStage temporary files currently going to the /tmp directory. Refer to the following technote for additional environment variables to set:
http://www-01.ibm.com/support/docview.wss?uid=swg21423485

IBM InfoSphere DataStage Compare Stage: Fatal Error: Attempt to Drop Schema Variable

Problem(Abstract)

Error produced by compare stage: Message Id: IIS-DSEE-TFIP-00099: Message: main_program: Fatal Error: Attempt to drop schema variable: “first.rec”

Cause

The Compare stage does not change the table definition, partitioning, or content of the records in either input data set. It transfers both data sets intact to a single output data set generated by the stage. The comparison results are also recorded in the output data set.

Resolving the problem

Use runtime column propagation in this stage and allow DataStage to define the output column schema.

The compare stage outputs a data set with three columns:

  • result – Carries the code giving the result of the comparison.
  • first – A sub-record containing the columns of the first input link.
  • second – A sub-record containing the columns of the second input link.

If the output link metadata is specified, it is necessary to fully qualify names for the column definitions (e.g. first.col1, second.col1, etc.), because DataStage will not allow two sets of identical column names to be specified.

If you want to specify the output link metadata manually, keep in mind that the Compare stage output takes a specific schema that cannot be altered, and that the first and second columns are of the sub-record type.

Or select the “Runtime column propagation” option.

NOTE: This option can only be seen if the “Enable Runtime Column Propagation for Parallel Jobs” option is selected in the Administrator client. Restart the Designer client once this option is selected in the Administrator client.

Characters conversion issue when importing an InfoSphere DataStage job containing a Connector type stage with the Designer

Problem(Abstract)

Attempts to import a dsx file containing a job that includes an Oracle Connector with the IBM InfoSphere DataStage Designer can result in the following type of error message:
“Unicode character ‘6D75’ appears not to be supported by code page 1252. ignoring it …”

Cause

The .dsx file includes the following passage:

      BEGIN DSSUBRECORD
         Name "Password"
         Value "\(6D75)"
      END DSSUBRECORD

instead of

      BEGIN DSSUBRECORD
         Name "Password"
         Value "/Connection/Password"
      END DSSUBRECORD
The source of this behaviour is explained in APAR JR39074.
The string “\(6D75)” is read and interpreted as if it were Unicode by the “Import” function of the Designer.

Resolving the problem

Apply the patch JR39074.

This patch concerns the Connector stage Editor.
Bad property value will need to be fixed by re-editing the Password property in the Stage Editor.

There are however a number of work-arounds to transfer a job edited before the patch installation:

1) Acknowledge the message box and manually re-enter the password through the stage editor after the import has completed.

2) Perform the job transfer with istool via an isx file. This uses a Java mechanism that does not have the same restriction.

3) Import the dsx file on the server using ASBNode/bin/DSXImportService.sh. This is another Java written tool that allows dsx files to be imported without the use of the client side Visual Basic code.

4) Update the .dsx file with the “-NOCHECK” suffix for the “CharacterSet” property:

BEGIN HEADER
    CharacterSet "CP1252-NOCHECK"

DataStage jobs abort due to Internal data error in SYS.MESSAGE file

Problem(Abstract)

DataStage jobs abort because the SYS.MESSAGE file has become corrupted.
The error in the job log may contain the words “Internal data error” and “Computed blink”.

Symptom

You see a message like this in the DataStage job log:

Internal data error.
File ‘/opt/IBM/InformationServer/Server/DSEngine/SYS.MESSAGE’:
Computed blink of 0x53747167 does not match expected blink of 0x0!
Detected within group starting at address 0x45000!

Cause

The SYS.MESSAGE file may become corrupted because a file system ran out of space or a DataStage process crashed.

Resolving the problem

The SYS.MESSAGE file does not change once installed. You can restore it from a backup, even if the backup is not current.

Also, the file is not unique to a particular machine, so you can take the SYS.MESSAGE file from another DataStage server running the same version of DataStage.

The SYS.MESSAGE file resides in the DSEngine directory.

DataStage project creation problem

Problem(Abstract)

DataStage project creation works using root user and fails using dsadm user

Symptom

When creating a new project using the DataStage Administrator client as the user dsadm, the following error is reported:

Error calling subroutine: *DataStage*DSR.ADMIN (Action=20); check DataStage is set up correctly in project UV (subroutine failed to complete successfully (30107)

This error is reported during the Setting SQL Permissions part of the project creation.

This same error is NOT reported if using the root user to create the project.

Resolving the problem

This problem can occur if there is an issue with the UV_USERS file. Try to run the following steps to resize the UV_USERS file:

NOTE: Suggest that you take a backup of the UV_USERS file prior to running the below steps

>cd $DSHOME
>bin/uv (or bin/uvsh)
>LOGTO UV 
>RESIZE UV_USERS 30 
>QUIT

Using the ps command to check for memory utilization for DataStage processes on AIX

Question

How can the ps command be used to check on the memory utilization for running DataStage processes?

Answer

The following processes are started on the DataStage Engine server as follows:

dsapi_slave – server side process for DataStage clients like Designer

osh – Parallel Engine process

DSD.StageRun – Server Engine Stage process

DSD.RUN – DataStage supervisor process used to initialize Parallel Engine and Server Engine jobs. There is one DSD.RUN process for each active job

If a memory leak is suspected, a command like the following can be invoked periodically during a task’s execution to check on memory utilization:
ps auxw | head -1;ps auxw | grep dsapi_slave | sort -r -n +5 | head -10

The example shown lists the top 10 dsapi_slave processes from a memory utilization perspective. You should substitute an appropriate argument for grep like osh, DSD.RUN or even the user name that was used to invoke a DataStage task to get a list that matches your criteria.

The column with the RSS heading (sort column) specifies the real memory (resident set) size of the process in pages.

You can use the pagesize command to determine the page size used by the system:
$ pagesize
4096
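
If a rough figure in megabytes is easier to read, the RSS column can be scaled by the page size. This is a sketch that assumes RSS is reported in pages as described above (on some platforms and ps options RSS is reported in 1 KB units instead, in which case divide by 1024 rather than multiplying by the page size):

# Approximate resident memory in MB for each dsapi_slave process (owner, PID, MB)
ps auxw | awk -v pgsz=$(pagesize) '/dsapi[_]slave/ { printf "%-10s %8s %10.1f MB\n", $1, $2, $6 * pgsz / 1048576 }'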

Connection refused during DataStage job startup

Problem(Abstract)

Connection to a remote node fails with a connection refused message during DataStage job startup

Symptom

The observed behavior in the job log is the following series of warning messages followed with a fatal error message, where {hostname} is the actual host name of the conductor node.

Type: Warning
Message: **** Startup error on db2node0 connecting with conductor on {hostname} with cookie 1336156580.429846.195058: unable to connect to port 10001 on conductor; socket = 3, Connection refused retries left: 3

Type: Warning
Message: **** Startup error on db2node0 connecting with conductor on {hostname} with cookie 1336156580.429846.195058: unable to connect to port 10001 on conductor; socket = 3, Connection refused retries left: 2

Type: Warning
Message: **** Startup error on db2node0 connecting with conductor on {hostname} with cookie 1336156580.429846.195058: unable to connect to port 10001 on conductor; socket = 3, Connection refused retries left: 1

Type: Fatal
Message: **** Startup error on db2node0 connecting with conductor on {hostname} with cookie 1336156580.429846.195058: unable to connect to port 10001 on conductor; socket = 3, No such file or directory

Cause

DataStage uses an initial ssh or rsh connection to run a process called “section leader” on each remote server machine. This section leader then tries to connect back via TCP (not ssh/rsh) to the section leader process running locally on the conductor machine. This problem is caused because this TCP connection is being refused by the server.

Diagnosing the problem

The “connection refused” message is important because it means that a connection IS being made to a server. If the connection could not be made, it would return a different message such as a host unreachable or a timeout message. The following should be checked by the system and/or network administrator. The DataStage software does not contain tools to check these items automatically.

  • Is the connection actually getting routed to the correct server ( the DataStage conductor) ?
  • Is there a firewall blocking the connection?

Resolving the problem

In the majority of situations, one of the two items above (firewall or routing issue) will be the cause of the problem. Verify these things first. If these are determined not to be the problem, then set the following environment variable job parameters and run the job again (you may need to create some of these manually if they do not already exist.)

  • APT_STARTUP_STATUS=True
  • APT_PM_SHOWRSH=True
  • APT_PM_CONNECTIONTRACE=1

Export the job log with the print to file option in the DataStage Director, and be sure to use the All Entries and Full Details options. Open a service request with IBM Technical support and supply the job log containing the output when these three variables are set.

Using the AIX proctree command to determine which client processes are attached to a specific DataStage Engine instance

Question

In an environment with multiple DataStage Engine instances, including release 7.5.3 and Information Server 8.7, is there a way to determine how many DataStage client processes are connected to each instance?

Answer

You can determine how many clients are connected to a specific instance by using the ps -ef command to get a list of running processes, then using grep to find the PID for the dsrpcd process that is associated with the instance of interest. In the following example we determine the PID for the dsrpcd process that has a DataStage Engine instance tag of eee:

$ ps -ef | grep eeedsrpcd
    root 2121770       1   0   Nov 08      -  0:00 /opt/IS87/IBM/InformationServer/Server/DSEngine/bin/eeedsrpcd
   dsadm 3616958  700498   0 07:30:04  pts/4  0:00 grep eedsrpcd

We then use the proctree command with the PID of the dsrpcd process from the prior step to get a listing of all DataStage clients that are established for the specified instance:

$ proctree 2121770
2121770    /opt/IS87/IBM/InformationServer/Server/DSEngine/bin/eeedsrpcd
   1056770    eeedscs 4 7200 0
      2379936    dsapi_slave 7 6 0
   2916408    eeedscs 4 7200 0
      1069254    dsapi_slave 7 6 0
   3485886    eeedscs 4 7200 0
      1228814    dsapi_slave 7 6 0
   2592832    eeedscs 4 7200 0
      3449004    dsapi_slave 7 6 0
   2707492    eeedscs 4 7200 0
      2846728    dsapi_slave 7 6 0
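
The two steps can also be combined into one command. This is a sketch that assumes the same eee instance tag used in the example above:

proctree $(ps -ef | awk '/[e]eedsrpcd/ { print $2 }')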

DataStage job fails with code=-14 [Timed out while waiting for an event]

Problem(Abstract)

DataStage job fails with code=-14 [Timed out while waiting for an event] – this generally happens when jobs are called from sequencers

Resolving the problem

Follow steps 1-3 in resolving this error.

1) DSWaitStartup / DSWaitResetStartup environment variables:
When multiple instances of a job are run from a sequence, and one or more of the job instances are set to reset, the sequence might report a status of 99 or -14. This can occur because the controlling sequence did not give the job instances enough time to reset before polling their status. The start-up time allowed for a job reset must be increased; the environment variable DSWaitResetStartup can be used for this purpose.

(The maximum value that can be set for DSWaitResetStartup is the value of DSWaitStartup (default is 60). For example, if a value of 120 is required for DSWaitResetStartup, then ensure that DSWaitStartup is also set to a minimum of 120.)

2) uvconfig changes:
Unix/Linux:
a. Log on as dsadm or root.
b. Make sure there are no connections to DataStage (these are Unix/Linux-specific instructions; Windows will need to use Task Manager):
ps -ef | grep dsapi –> tests for open client connections
ps -ef | grep dscs –> tests for running jobs
If you find any client connections and you know everyone is out, they are probably dormant; kill them using kill -9 <pid>.
c. If all is clear, from the DSEngine directory, source dsenv and shut down the server:
. ./dsenv
./bin/uv -admin -stop
d. Edit the uvconfig file and make the necessary changes:
MFILES to 150
T30FILE to 300
RLTABSZ to 200
MAXRLOCK to 199
e. To save the changes to uvconfig and generate a new .uvconfig binary file, you will need to run regen:
./bin/uv -admin -regen
f. Now restart the server:
./bin/uv -admin -start

Windows:
a. Make sure you are the Administrator on this machine
b. Next, make sure there are no connections to DataStage
c. If there are no connections, stop the dsrpc services.
d. Edit the uvconfig file in the DSEngine directory and make the necessary changes:
MFILES to 150
T30FILE to 300
RLTABSZ to 200
MAXRLOCK to 199
SYNCALOC to 0
e. To save the changes to uvconfig and generate a new .uvconfig binary file, you will need to run regen from the DSEngine directory: bin\uvregen
f. Now restart the dsrpc services

3) If the above two steps do not resolve the -14 error and you are using DataStage/Information Server 8.1, verify that either patch JR30015 is listed in Version.xml or that Fix Pack 1 for 8.1 is installed. NOTE: You may need to schedule your job runs so that fewer processes are running at the same time.

Datastage command using dsjob -ljobs does not display jobs for a project

Problem(Abstract)

When using dsjob -ljobs to display the jobs in a project, you receive the following output:
<none>
Status code = 0

Diagnosing the problem

The output displays the list of jobs for the other version of DataStage that is using the default port 31538, with the same project name; because that project had no jobs, the display shows no jobs. To resolve the problem, make sure you know the port your version is running on so that you get the correct data.

Resolving the problem

When there is more than one instance of DataStage on the same server, use dsjob with the following syntax. Make sure to provide the server:port#:

dsjob -domain dname:port# -server sname:port# -user uname -password XXXXXXXX -ljobs projectname

For example;
dsjob -domain dname:9081 -server sname:31539 -user uname -password XXXXXXXX -ljobs projectname

How do I uninstall a plugin in Information Server DataStage for Version 8

Question

How do I uninstall a plugin? For example, I installed the MQSeries Client plugin but I really want the MQSeries Server plugin.

Answer

1. Make sure everyone is out of the DataStage clients so that nothing is locked, or the uninstall will fail. Run ps -ef | grep dsapi_slave to confirm.

2. First you need to figure out what the correct component name is for the uninstall (It is not necessarily the same as the plugin name that you see within DataStage)

      a. cd into IBM/InformationServer/Server/StagingArea/Installed.
      b. List out the directory and it will show you all of the components installed.
      c. The names of the directories listed are what you will use for the component name. For this example, MQSeries is what I want to uninstall

3. Next cd to IBM/InformationServer/Server/DSComponents/bin and run dscompinst:
For 8.1:
./dscompinst -username <username> -password <password> -v -remove -log /tmp/removeMQ.log MQSeries
For 8.0.1:
./dscompinst -v -remove -log /tmp/removeMQ.log MQSeries
For 8.5 and 8.7:
./dscompinst -username <isadmin> -password <isadminPassword> -locale <en_US> -remove -log /tmp/removeMQ.log mqseries.jar|mqclient.jar

4. This will remove the plugin out of all projects and it will put the plugin components, including its jar file, into a directory called IBM/InformationServer/Server/StagingArea/MothBalled

5. The plugin is now removed.

6. If you want to reinstall a previously uninstalled plugin, you can use the jar file that gets put in the MothBalled directory.

7. This works the same way on Windows.

How to turn on ODBC tracing to determine issue with DataStage Job

Question

How do I turn on ODBC tracing to determine issue with DataStage Job?

Answer

Depending on the platform you are using, use the procedure below to initiate trace for ODBC activity.

  • For Windows®, use the Tracing tab of the ODBC function, as follows:
    • Click Start –> Settings –> Control Panel –> Administrative Tools.
    • Double-click Data Sources.
    • Select the Tracing tab. The Log File Path displays the default path and file name (/sql.log) that trace information is written to.
    • Click the Start Tracing Now button.
    • Click OK.
    • To stop ODBC tracing, go to the same panel and click the Stop Tracing Now button, then click OK.
  • For UNIX® platforms, edit the odbc.ini file to initiate trace for ODBC activity (if you are using the installed .odbc.ini file, it is located at /$DSHOME/.odbc.ini):
    • Under the stanza entry [ODBC] change Trace=0 to Trace=1.
    • Optionally, modify the TraceFile file path value to a preferred value. All trace records go to one file, which can be confusing with multiple flows or execution groups.
    • The DataStage engine does not have to be stopped and restarted.

    Unix examples of the ODBC section

    8.5, 8.1, 8.0 and 7.5 versions:

    [ODBC]
    IANAAppCodePage=4
    InstallDir=/opt/IBM/InformationServer/Server/branded_odbc
    Trace=0
    TraceDll=/opt/IBM/InformationServer/Server/branded_odbc/lib/odbctrac.so
    TraceFile=/tmp/odbctrace.out
    UseCursorLib=0

    8.7 version:

    [ODBC]
    IANAAppCodePage=4
    InstallDir=/opt/IBM/InformationServer/Server/branded_odbc
    Trace=0
    TraceDll=/opt/IBM/InformationServer/Server/branded_odbc/lib/VMtrc25.so
    TraceFile=/tmp/odbctrace.out
    UseCursorLib=0
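
    A minimal sketch of switching tracing on from the command line, assuming the installed .odbc.ini under $DSHOME is the file in use (back it up first, as shown):

    cp $DSHOME/.odbc.ini $DSHOME/.odbc.ini.bak
    sed 's/^Trace=0/Trace=1/' $DSHOME/.odbc.ini.bak > $DSHOME/.odbc.ini

    # To stop tracing, restore the original file:
    cp $DSHOME/.odbc.ini.bak $DSHOME/.odbc.ini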

InfoSphere DataStage : disconnected dsapi_slave processes on the engine tier

Problem(Abstract)

Disconnected or orphaned InfoSphere DataStage dsapi_slave processes might be left running on systems where the InfoSphere Information Server engine tier is installed. In general, disconnected dsapi_slave processes are not a major issue, but follow these instructions when several orphaned processes cause degradation in system performance or cause problems administering the system.

Cause

A dsapi_slave process is required for communication between an InfoSphere DataStage client and the engine tier. For example, clients such as Designer, Director, Administrator, and DSJob command connect to the engine tier to design and control jobs.

If the client is unexpectedly terminated during a period of inactivity, it is possible for the dsapi_slave to become orphaned. If the current connection inactivity timeout is set to “Do not timeout,” the process does not terminate until the machine is rebooted. If an inactivity timeout is specified, the process terminates after the timeout period.

Note. Normal client program termination does not result in orphaned dsapi_slave processes.

Environment

The dsapi_slave processes run only on the engine tier.

Diagnosing the problem

There should not be any dsapi_slave processes running if there are no clients connected. Otherwise, the number of dsapi_slave processes should roughly correspond to the number of clients.

On Linux/UNIX based operating systems:

    Run “ps -ef | grep dsapi_slave” to determine the number of processes

On Microsoft Windows operating systems:

      Use the “Processes” tab of the Windows Task Manager to manually count the number of processes that have the image name dsapi_slave.exe.

Some clients are capable of using more than one connection and therefore more than one dsapi_slave process can be associated to a client.

Resolving the problem

The dsapi_slave processes can be stopped using the “kill” command on Linux/UNIX based systems or by using the Task Manager “End Process” button on Microsoft Windows systems.

To prevent this issue, you can set an inactivity time out for all connections. Use the InfoSphere Information Server Administrator client to configure connections:

      On the Administrator client “General” tab, uncheck “Do not timeout”, and set the “Inactivity timeout” value to an appropriate number of seconds for your environment.

Note. As the name suggests, if a DataStage client application is left inactive for the specified inactivity time out period, the client connection is terminated and the corresponding dsapi_slave process ends.

Subsequent attempts to use the client application result in connection errors being displayed. You can no longer use that client session, and any work that has not been saved is lost.

To use the client application again, close it in the normal fashion, ignore any errors, and then run it again.

When open stage properties in DataStage Designer client, users see warning: No configuration file set.

Problem(Abstract)

When using DataStage with the Grid Toolkit, users who open stage properties for a job in the DataStage Designer client may see the following warning on the Stage tab:
“Warning: No configuration file set”

and the value of the “Configuration file” field is blank instead of containing the name of the configuration file defined in APT_CONFIG_FILE.

Resolving the problem

If the site has set up the DataStage environment variable APT_CONFIG_FILE in the project or job properties, then that configuration file should be visible in the Stage tab of the stage properties dialog.

If the user receives “Warning: No configuration file set” and the configuration file field contains a filename (i.e. 4node.apt), then this warning message usually means either that the full path defined in APT_CONFIG_FILE does not point to a valid file, or that the userid for the Designer client session receiving the error does not have permission to view that configuration file. Ensure that the file exists (usually in the IBM/InformationServer/Server/Configurations directory) and confirm that the file either has public read permission or has read privileges assigned to the dstage group, of which all users should be a member.

If the user receives the warning message and the value of the configuration file field is blank (even though APT_CONFIG_FILE has a valid value), then this is a symptom that the site may have enabled conflicting grid settings in the failing project. DataStage grid configurations use either the Grid Toolkit (enabled via APT_GRID_* environment variables) or IBM Tivoli LoadLeveler (enabled via the Grid enabled checkbox on the Parallel tab of Project properties).

You can set up either one of these grid methods, but not both at once. If the site is set up to use the Grid Toolkit, log in to the DataStage Administrator client, open the project properties for the failing project, and then, on the Parallel tab, de-select the Grid enabled checkbox and save the changes. On the next login, DataStage Designer client users should no longer receive the configuration file warning.

Information Server DataStage Generate Report function in Designer fails with error “Failed to import report”

Problem(Abstract)

Generating a report for the DataStage job using the Generate Report option under File menu in DataStage Designer client issues an error “Failed to import report”

Symptom

Login to DataStage Designer client and open an existing job. On the File menu click Generate Report. In the dialog window specify a Report name and Report Description and click OK. DataStage Designer client issues an error “Failed to import report”

This error can also occur when trying to generate a HTML report from the Advanced Find in the DataStage Designer client.

Cause

There are two possible causes for this issue:

  1. The user running WebSphere does not have write permissions to the DataStage Reporting workspace directory
  2. The Reporting workspace directory is missing and needs to be recreated

Diagnosing the problem

Verify the Reporting workspace directory exists and has proper file level permissions.

Inspect the /IBM/WebSphere/AppServer/profiles/<ProfileName>/logs/<server>/SystemOut.log for an entry similar to:
[SystemOut Caused by: javax.ejb.EJBException: Failed to initialize reporting service due to following reason: [Failed to initialize the following reporting engine: [JREPORT]. Stop the application server, delete the folder and its contents located at[/tmp/informationServer/Reporting<Machinename><Node><server>/engine/JREPORT], and then restart the application server.]; nested exception is: com.ascential.acs.reporting.ReportingEngineException: Failed to initialize the following reporting engine: [JREPORT]. Stop the application server, delete the folder and its contents located at [/tmp/informationServer/Reporting<Machinename><Node><server>/engine/JREPORT], and then restart the application server.

Resolving the problem

When the Information Server services tier is installed on UNIX systems, WebSphere is by default installed and configured to run as root. The WebSphere processes on the services tier run as root, and the Information Server Reporting workspace directory is owned by root with permissions of 755 (drwxr-xr-x).
After installation of Information Server, WebSphere can also be configured to run using a non-root user ID. The following command can be issued to identify whether WebSphere has been configured to run as a non-root user ID:

ps -ef | grep server1

If the WebSphere server1 process returned by the command is owned by a non-root user ID, the owner of the Reporting workspace directory should also be changed to the same non-root user ID.

Following is the path where Information Server creates the Reporting workspace directory by default:

DataStage V8.0.1 the Reporting workspace directory is created in
/tmp/WDISuite/Reporting/

DataStage V8.1 the Reporting workspace directory is created in
<install-folder>/IBM/InformationServer/ASBServer/apps/Reporting/

DataStage V8.5 and V8.7 the Reporting workspace directory is created in
/tmp/informationServer/Reporting<machine_name><Node><server>/

Example:
To modify the Reporting workspace directory for DataStage Version 8.5, issue the following commands:

cd /tmp
chown -R <WAS-non-root user> informationServer

Additionally, this directory will be recreated every time WebSphere is restarted. If you receive an error in the SystemOut.log:

      “Failed to initialize the following reporting engine: [JREPORT]. Stop the application server, delete the folder and its contents located at [/tmp/informationServer/Reporting<Machinename><Node><server>/engine/JREPORT], and then restart the application server.”

remove the informationServer directory and restart WebSphere to have it regenerated. If you would like to change the default location of this directory please reference:

InfoSphere Information Server: Configuring the location of the Reporting workspace

How to find patch history for IBM InfoSphere DataStage and Information Server

Question

Where can I find the history of patches applied to my DataStage or Information Server installation?

Answer

Customers often need to know the history of patches that have been applied to their DataStage or Information Server installations.

For 7.5.x versions of DataStage, there is no required place where patch history is kept. However, there is a customary default location, namely $DSHOME/../../patches, that is referenced in the patch installation notes for most 7.5 patches. So this is the first place to look for the patch history of your 7.5.x system. Be aware that this is not guaranteed to contain all of the patches that have been applied to your system; you should confirm this with your DataStage administrator.

For 8.x versions of DataStage / Information Server, patch history is retained in the <History> section of the Version.xml file. Every 8.x version maintains a Version.xml file that includes information about which modules have been installed and other relevant information, including which patches and fix packs have been applied. This file exists on any machine where DataStage / Information Server is installed, including servers and clients. There is one case where you may not have a Version.xml file: an 8.0 system where no patches or fix packs have been applied. In that case the absence of the file itself tells you that no patches or fix packs have been applied. In all other cases, the Version.xml file will exist, will have a history of the patches and fix packs applied, and will also indicate whether each patch was successfully applied.

The default locations for IS 8.x versions are:

Unix/Linux:

/opt/IBM/InformationServer/Version.xml

Windows (including clients):

C:\IBM\Information Server\version.xml
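
A quick way to view just the patch history is to print the <History> section of Version.xml. This is a sketch for Unix/Linux; adjust the path for Windows or a non-default install location:

sed -n '/<History>/,/<\/History>/p' /opt/IBM/InformationServer/Version.xml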

After increasing DataStage tunable parameter in uvconfig receive an error when running uvregen

Problem(Abstract)

After increasing a DataStage tunable parameter in uvconfig, the following error is received when running uvregen:

Disk shared memory size of 27236016 exceeds maximum allowed size of 27230208

Symptom

After increasing a DataStage tunable parameter in uvconfig, for example T30FILE or RLTABSZ, the following error is received when uvregen is run.

$DSHOME/bin/uv -admin -regen
Disk shared memory size of 27236016 exceeds maximum allowed size of 27230208
uvregen: error during regen, process aborted
uvregen: error during regen, process aborted

Resolving the problem

The uvconfig needs to be updated to reduce the size of the shared memory segment that will be created.

  • One of the items in uvconfig that can be checked is GLTABSZ. The default for GLTABSZ is 75. This value should be sufficient. If GLTABSZ is set higher consider reducing to 75.
  • Other items that you can check are T30FILE, MFILES, or RLTABSZ. See if any of these can be reduced.
  • If all tunables are needed at the current size then consider running shmtest to get the correct MEMOFF settings. See shmtest utility syntax and options.
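
To see the current values of these tunables before deciding what to reduce, a simple check from $DSHOME is the following sketch (parameter names as used in uvconfig):

cd $DSHOME
grep -E '^(GLTABSZ|RLTABSZ|T30FILE|MFILES)' uvconfig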

Failure to load libraries after restoring IBM InfoSphere DataStage from backup

Problem(Abstract)

After successfully restoring your system from a backup copy, jobs that were working before fail to load the required libraries and abort.

Sample error messages follow; there are others you may see, but all have the same underlying cause of a failure to load a library:

INFO: com.ibm.is.install.core.CommonProcess
Exit Code: 255, Elapsed Time(ms) = 205, ErrorBuffer:
Could not load program /opt/IBM/InformationServer/Server/DSEngine/bin/uvsh:
Could not load module /opt/IBM/InformationServer/Server/PXEngine/lib/libicui18n.a.
Dependent module /opt/IBM/InformationServer/Server/PXEngine/lib/libicuuc32.a could not be loaded.
The module has too many section headers or the file has been truncated.
Could not load module uvsh.
Dependent module /opt/IBM/InformationServer/Server/PXEngine/lib/libicui18n.a could not be loaded.
Could not load module .

or

Fatal Error: Duplicate class registration for: APT_DBSequenceAdminOp

Cause

When you restore your system from a backup copy, the soft links are not restored. Therefore, the required soft links in the /opt/IBM/InformationServer/Server/PXEngine/lib directory are not there and DataStage jobs fail because they cannot load the expected libraries.

Environment

Linux and Unix systems with IBM InfoSphere DataStage

Diagnosing the problem

Look for errors such as those shown above, or any failure to load a library. If you recently restored from backup and are now seeing these errors, the soft links may be missing.

Resolving the problem

To resolve this issue you must restore the soft links.

Below is a sample list of soft links in the /opt/IBM/InformationServer/Server/PXEngine/lib directory from an AIX system with Information Server 8.1 installed:
libcdebugaix64.so -> cdebug.o
liborchbuildopaix64.so -> orchbuildop.o
liborchcommondbaix64.so -> orchcommondb.o
liborchfreqdistaix64.so -> orchfreqdist.o
liborchgeneralaix64.so -> orchgeneral.o
liborchio64aix64.so -> orchio64.o
liborchmonitoraix64.so -> orchmonitor.o
liborchresestaix64.so -> orchresest.o
liborchresourceaix64.so -> orchresource.o
liborchsasop9aix64.so -> orchsasop9.o
liborchsortaix64.so -> orchsort.o
liborchstatsaix64.so -> orchstats.o
liboshaix64.so -> osh.o

All of these must be present in order for the system to function properly.

Note: the above is a sample list for one architecture at one release level. The names of the files or links may be different on your system.

Recommended actions:

  • If you have another system with the same architecture and release level, use it as a guide for restoring your soft links
  • If you kept a list of the soft links in your /opt/IBM/InformationServer/Server/PXEngine/lib directory, use that list as a guide for restoring them (see the sketch following this list)
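
As a minimal sketch, assuming the AIX 8.1 names listed above (your file and link names may differ), the missing links could be recreated with ln -s:

cd /opt/IBM/InformationServer/Server/PXEngine/lib
ln -s osh.o liboshaix64.so
ln -s orchgeneral.o liborchgeneralaix64.so
ln -s orchsort.o liborchsortaix64.so
# ...repeat for each entry in your reference list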

Jobs missing from IBM InfoSphere DataStage Director Client

Problem(Abstract)

No jobs or logs are shown in the IBM DataStage Director client, although the jobs are still accessible from the Designer client.

Cause

This behavior indicates that the Project has been corrupted, and cannot display the job information correctly in the Client.

Resolving the problem

We recommend using the SyncProject tool, which is installed by default with DataStage 8.5 and newer. It is located under the following directory:

For Linux/Unix:

/opt/IBM/InformationServer/ASBNode/bin

Windows:

C:\IBM\InformationServer\ASBNode\bin

The SyncProject tool can be run to analyze and recover projects.

The tool does not require root authority (for example, it can be run as dsadm), but the ISUser and ISPassword provided must be a defined Information Server user ID, which is not necessarily an operating system ID.

The options are:

-Fix [filename]

Specifies that any inconsistencies are fixed. You can optionally specify a file name, and information about the fix, and its success or failure, is written to this file.

-Reconstruct

Specifies that a project is to be reconstructed.

-Project Projectname…

Specifies one or more projects to be checked.

-Job Jobname…

If you specify only one project in your command, specifies a list of jobs within that project to check.

-Report [filename]

Specifies that a report is required. You can optionally specify a file name, and the report is written to this file.

-Backup [filename]

Optionally, specify a file name for the tar archive file that is created. If you do not specify a file name, the tar archive file is called Projectname.tar

-Restore [filename]

Specifies the name of the tar archive file that you previously created. This restores the project from the .tar backup.

If you are trying to repair a corrupted project and -fix does not repair the project, the next best option is to run the script with -Reconstruct.

Be aware that the -Reconstruct might not work to restore your project to 100% of what it was before the corruption. Also you will need to test each and every job to check that they are working properly.

Note that if your project is corrupted, the best outcomes arise from a reimport of dsx or restore from a previous backup.
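
As an illustration only, a SyncProject invocation might look like the sketch below; the connection flags (-ISHost, -ISUser, -ISPassword) and the script name are assumptions inferred from the description above, so confirm the exact syntax in the command reference linked below.

cd /opt/IBM/InformationServer/ASBNode/bin
./SyncProject.sh -ISHost myhost:9080 -ISUser isadmin -ISPassword secret \
    -Project MyProject -Fix fix_report.txt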

For additional information on the SyncProject tool, refer to:

Commands for checking and repairing projects

IBM InfoSphere Information Server DataStage job fails with error: Not enough space on node

Problem(Abstract)

IBM Information Server DataStage job fails with fatal error message:
Fatal Error: APT_Communicator::shmemInitBuffers: createMapFile (/tmp) failed: Not enough space on node.

Cause

This message can be caused by either:

  • The specified scratch directories or TMPDIR is not local to the node (for example, it is an NFS file system).
  • The temporary directory is filling up and is out of disk space. To address this, set TMPDIR to another temporary directory that has sufficient disk space.
  • A system-wide limit on the number of mmap'ed shared memory segments. This often occurs when the value of APT_DEFAULT_TRANSPORT_BLOCK_SIZE is set too high. Check the environment variable APT_DEFAULT_TRANSPORT_BLOCK_SIZE: the default for this variable is 131072 (128 KB), and the maximum value should be 1048576 (1 MB). See page 79 of the Parallel Job Advanced Developer Guide for additional information regarding this environment variable.

Resolving the problem

To resolve this issue, ensure that there is adequate space in the assigned TMPDIR (temporary directory) or verify the setting of the environment variable APT_DEFAULT_TRANSPORT_BLOCK_SIZE.
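
As a minimal sketch (the replacement directory is illustrative), free space can be checked and TMPDIR pointed at a larger local file system before the job is run again:

df -k /tmp                     # check free space in the current temporary directory
mkdir -p /data/ds_tmp
export TMPDIR=/data/ds_tmp     # use a local file system with sufficient space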

Does IBM InfoSphere DataStage 8.5 Oracle EE Stage Support Parameters in the SQL Text File?

Question

The parameters in the sql text file used in the query in the DataStage Oracle EE Stage are not being recognized as parameters and converted to the parameter value. Are parameters supported in the SQL Text file?

Answer

Parameters are not supported in the SQL query text files.

Disable automatic startup for all Information Server Services

Question

How do I disable autostart on server reboot for all the Information Server components? These include the Engine, Agents, WebSphere Application Server (WAS), and DB2.

Answer

There are a few steps to disable automatic startup of all the services.

Disable Engine Autostart:

Log in to the server as root, go to $DSHOME (the DSEngine directory), and type:
. ./dsenv

cd to the scripts directory and type:
DSEdisable_autostartup.sh

Disable WAS and Agents Autostart:

Go to:

/etc/rc.d/rc2.d

You may want to take a backup of these files if you decide to enable auto-startup again. Delete the files:
SISFAgents
SISFServer

Disable DB2 Autostart:
Depending on where DB2 is installed (for example, /IBM/db2/V9.5/bin), run the command:
db2iauto -off db2inst1
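
A minimal consolidated sketch of the steps above, assuming the default paths shown in this document (your DB2 path and instance name may differ):

cd $DSHOME
. ./dsenv
cd scripts
./DSEdisable_autostartup.sh                                      # disable engine autostart

cp /etc/rc.d/rc2.d/SISFAgents /etc/rc.d/rc2.d/SISFServer /tmp    # optional backup
rm /etc/rc.d/rc2.d/SISFAgents /etc/rc.d/rc2.d/SISFServer         # disable WAS and agents autostart

/IBM/db2/V9.5/bin/db2iauto -off db2inst1                         # disable DB2 autostart for instance db2inst1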

DataStage job started via dsjob received Status code = -14 DSJE_TIMEOUT

Starting IBM InfoSphere Information Server node agents as a non-root user

Question

Can we start the node agents (ASB and logging agents) for IBM InfoSphere Information Server as a non-root user on Linux and UNIX?

Answer

The node agents (mainly the ASB and logging agents) can be started as the DataStage administrator user.
Before you begin

Important: Before doing these steps, back up your system so that you can use the backup to restore the original state, if necessary.

About this task
These steps must be completed on all engine tier systems by a system administrator who has root access.
The task needs to be completed only once, but to maintain the configuration, you must redo steps 4-8 after applying any fix packs.

Default installation locations
Instructions in this document use the default installation locations. Your path varies if you installed InfoSphere Information Server in a different location.

The following directory is the Linux and UNIX default InfoSphere Information Server installation location:

  • /opt/IBM/InformationServer

Instructions
1. Install IBM InfoSphere Information Server, the available fix packs, and the required patches.

2. Verify that the DataStage administrator account that originally installed the engine tier (by default, dsadm) exists and belongs to the DataStage primary group (by default, dstage).

Note: In the case that you did not install InfoSphere DataStage or InfoSphere Information Analyzer, you can select any trusted user of your choice because the DataStage administrator account will not exist in the system.

3. On IBM AIX computers, make sure that the stack_hard variable in the /etc/security/limits file is set to -1 for the user that you selected in the previous step.

4. Make sure that the user can write to the temporary directory.

5. Log in as a system administrator with root access and go to /opt/IBM/InformationServer/ASBNode/bin

6. Run this command to stop the node agents:
./NodeAgents.sh stop
Note: If you are using InfoSphere Information Services Director, you should verify that all the related jobs are stopped. Typically, stopping the node agents should stop all Information Services Director jobs.

7. Remove any remaining *.out, *.err, and *.pid files from the /opt/IBM/InformationServer/ASBNode and /opt/IBM/InformationServer/ASBNode/bin directories.

Note: If the agents were intentionally or accidentally launched by root and subsequently started as a non-root user, the process output files, such as *.out and *.err files, that are located in the /opt/IBM/InformationServer/ASBNode and ASBNode/bin folder must be deleted to allow the new owners of the agent processes to regenerate those output files. This could have occurred during an install process if the agents also had been configured to run as non-root and were restarted as root during the installation or if the agents were ever intentionally started as root.

8. If neither InfoSphere DataStage nor InfoSphere Information Analyzer are installed, change the ownership of the /opt/IBM/InformationServer/ASBNode directory to the trusted user selected in step 2, by running the command: chown -R <user> /opt/IBM/InformationServer/ASBNode

9. Do the following steps to configure the node agents to start as the non-root user when a computer restarts. You must locate and change the content of the ISFAgents files on the engine tiers. Note: The location and file name are different for each operating system, but the content of the file is the same.

      a. Find the ISFAgents file that must be modified.

      • On HP-UX, run the command: cd /sbin
      • On Linux, AIX, or Solaris, run the command: cd /etc
      b. Run the command:

 find . -name "*ISFAgents*"

Note:

      This step might return multiple files with various prefixes in the name. Some files might link to other files and might reflect the change that you made in the original file so that you do not have to edit each file found. The main file is typically located in

rc.d/init.d/ISFAgents

      .
      c. Change and save the content of these files from:

#!/bin/sh
# chkconfig: 2345 85 60
# description: Information Services Framework server.
IS_INIT_D=true;export IS_INIT_D
"/opt/IBM/InformationServer/ASBNode/bin/NodeAgents.sh" "$@"

to:

#!/bin/sh
# chkconfig: 2345 85 60
# description: Information Services Framework server.
IS_INIT_D=true;export IS_INIT_D
/usr/bin/su - dsadm -c "/opt/IBM/InformationServer/ASBNode/bin/NodeAgents.sh $*"

Note: dsadm should be changed to the alternate user that you selected in step 2, in case you did not install InfoSphere DataStage or InfoSphere Information Analyzer.

Remember: Instructions in this document use the default installation locations. Your path varies if you installed InfoSphere Information Server in a different location.
10. Log in as dsadm (or the user that you selected in step 2) and go to:
/opt/IBM/InformationServer/ASBNode/bin
11. Run the following command to start the node agents:
./NodeAgents.sh start
Verification steps

To ensure that your system is configured correctly, do the following steps. Once verified, restart your system and run the same commands again to ensure that the startup scripts were properly modified:

  • If the default ports for the logging and ASB agents are 31531 and 31533, as specified during the installation, run this command:
    netstat -a | grep 3153
    If the agents are not running, you must stop and start the node agents again.
  • Run the following command to verify that the agents are up and running as the specified user:
    ps -ef | grep Agent

How do you report the list of users having access to InfoSphere Information Server components based on Role type?

Question

How do you report the list of users having access to InfoSphere Information Server components based on Role type?

Cause

The Information Server administrator often needs to report on the list of users having access to specific components for compliance and security reasons.

Answer

Information Server Reporting provides a report template named “List of users” to report on the users having access to specific Information Server components.

For example,

To report the list of users having access to the Business Glossary Component:

1. Log in to the InfoSphere Information Server web console and navigate to Reporting -> Report Templates -> Administration -> Security -> View Report Templates

2. Select the List of Users report and click Create New Report. Provide a name and description for the new report

3. Select the Role Type and the InfoSphere Information Server Component

4. Click on Retrieve Values to populate the Role ID based on the selected Role Type and Component

5. Select the Output format, Expiration, and History Policy for the report

6. Select Finish -> Save and Run now

7. Select the report name -> View Report Result to get the list of users having access to Business Glossary Component

The above steps can be repeated to report the list of users for other Information Server components such as DataStage, QualityStage, FastTrack, and Information Analyzer.

For additional information on Administration report templates follow the link:

http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/index.jsp?topic=%2Fcom.ibm.swg.im.iis.found.moz.reporting.doc%2Ftopics%2Fadministrationreporttemplates.html

Passing data between jobs with a parameter set file in IBM InfoSphere Information Server

Question

How do I pass parameters between jobs in a sequence?

Answer

When a file-based parameter set is used, the parameter set picks up the contents of the file at the time the job is started. This provides excellent flexibility, as one job can manipulate a parameter set file and leave new parameter values for a subsequent job. The key is that the parameter values are not compiled into the job or the parameter set. The following steps detail the creation and use of file-based parameter sets.

  1. Select File > New > Other > Parameter Set.
  2. In the General tab, name the parameter set (alphanumeric and underscore characters only, no spaces). For example: my_file_parm_set
  3. In the Parameters tab, create the parameters and their defaults.
  4. In the Values tab, put the name of the file in the "Value File Name" column. Specify a file name only, with no path component, using alphanumeric characters and no spaces. For example: parm_file.txt. In the same row, list values for each parameter defined in the Parameters tab.
  5. Click OK and save the parameter set.
  6. At this point the parameter set file should have been created using the following naming scheme: Projects/<project_name>/ParameterSets/<parm_set_name>/<parm_file_name>. In our example the file name would be: InformationServer\Server\Projects\foo2\ParameterSets\my_file_parm_set\parm_file.txt
  7. The contents of the file look like this:
    foo=Parm file value for foo
    bar=Parm file value for bar
  8. In the job, navigate to Edit > Job Properties and select the Parameters tab. Click Add Parameter Set and select the newly created parameter set.
  9. Create a new sequence job and add a Job Activity stage to it. Double-click the stage to edit its properties, and click the button to select a job name. Once the job is loaded, its parameters become visible in the Parameters section of the Job Activity screen. The new parameter set should be in the list. Click the drop-down selector in the Value Expression column and select the parameter file. Click OK and save the sequence.
  10. Now every time the sequence is run, parameters are read from the file at run time and passed to the job. Any preceding job in the sequence can change the file to dynamically modify the parameters (see the sketch after this list).

Note: When the number of parameters in a parameter set is changed, all jobs using the parameter set must be recompiled.
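
As a minimal sketch, assuming a default Linux/UNIX install path and the example project and parameter set names above, a preceding job or Execute Command activity could rewrite the file so that the next job picks up new values at run time:

PARMFILE=/opt/IBM/InformationServer/Server/Projects/foo2/ParameterSets/my_file_parm_set/parm_file.txt
cat > "$PARMFILE" <<EOF
foo=New value for foo
bar=New value for bar
EOF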

How to configure ODBC connection to SQLServer using dynamic ports in IBM Information Server

Question

SQL Server is set up to dynamically assign ports. In the .odbc.ini file, the Address parameter is usually set to hostname colon port number (Address=HostName:1433), but the port may change. How should we handle this?

Answer

For the Address parameter value, instead of entering the hostname, a colon, and the port, enter the hostname, a backslash, and the server instance name.

For example, in Unix/Linux, use the IBM SQLServer Wire Protocol driver and enter the following in the .odbc.ini file in the DSN definition for the connection to the SQLServer data source:

Address=HostName\Server_Instance_Name
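
A minimal sketch of an .odbc.ini entry using the instance name is shown below; the DSN name, database, driver path, and .odbc.ini location are illustrative, so copy the Driver line from an existing Wire Protocol entry on your system:

cat >> $DSHOME/.odbc.ini <<'EOF'
[SQLServer_Dynamic]
Driver=<path to the IBM SQLServer Wire Protocol driver library>
Description=SQL Server instance reached via dynamic ports
Address=dbhost\SQLEXPRESS
Database=mydb
EOF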

For Windows, use the ODBC Data Sources Administrator to configure a System DSN for the data source using the IBM SQLServer Wire Protocol driver.

Note: For the Windows DSN, the parameter is named Server.

How to find patch history for IBM InfoSphere DataStage and Information Server

Question

Where can I find the history of patches applied to my DataStage or Information Server installation?

Answer

Customers often need to know the history of patches that have been applied to their DataStage or Information Server installations.

For 7.5.x versions of DataStage, there is not a required place where patch history is kept. However, there is a customary default location, namely $DSHOME/../../patches that is referenced in the patch installation notes for most 7.5 patches. So this is the first place to look for the patch history of your 7.5.x system. You should be aware that this is not guaranteed to have all of the patches that have been applied to your system. You should confirm this with your DataStage administrator.

For 8.x versions of DataStage / Information Server, patch history is retained in the <History> section of the Version.xml file. Every 8.x installation maintains a Version.xml file that records which modules have been installed, along with other relevant information including which patches and fix packs have been applied. This file exists on any machine where DataStage / Information Server is installed, including servers and clients. The one exception is an 8.0 system where no patches or fix packs have been applied; in that case there is no Version.xml file, which itself tells you that no patches or fix packs have been applied. In all other cases the Version.xml file exists, holds a history of the patches and fix packs applied, and also indicates whether each patch was successfully applied.

The default locations for IS 8.x versions are:

Unix/Linux:

/opt/IBM/InformationServer/Version.xml

Windows (including clients):

C:\IBM\Information Server\version.xml
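
As a minimal sketch (the grep pattern is only an assumption; inspect the <History> section directly if the element names differ on your release), the recorded history can be listed from the command line:

grep -i -E "patch|fixpack|history" /opt/IBM/InformationServer/Version.xml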

How do I repair a corrupt DataStage project?

Question

How do I repair a corrupt DataStage project?

Cause

The DataStage project is corrupt.

Answer

Use the ISALite utilities option to detect issues, view issues, and fix invalid Information Server 8.5/8.7 DataStage projects.

You will see output similar to the following when using this ISALite utilities option:

DSEngine Restorer Report

Sun May 20 00:15:28 CDT 2012

IS Host = test
IS Port = 9080
IS User = isadmin

DS Host = test
DS Port = 31538

DataStage Project: TestProject
——————————-

1 Issue Found.
ISSUE: DS Engine Project file ‘DS_STAGETYPES’ is corrupt.

Overall Summary
—————

1 Issue Found.

…….

DSEngine Restorer Fix Results

Sun May 20 00:20:19 CDT 2012

IS Host = test
IS Port = 9080
IS User = isadmin

DS Host = test
DS Port = 31538

DataStage Project: TestProject
——————————-

RESOLVED: DS Engine Project file ‘DS_STAGETYPES’ is corrupt.

Overall Summary
—————

1 Issue Resolved.
0 Issues Unresolved.

Please refer to page 63 of the ISALite User Guide for additional documentation:

http://www.ibm.com/support/docview.wss?uid=swg24022700&aid=10

How do you configure the Information Server installation temporary directory for base, fix pack, and patch installations?

Question

How do you configure the Information Server installation temporary directory for base, fix pack, and patch installations?

Answer

The Information Server installation temporary directory controls where installation logs are written, where the installation tools generate temporary files and scripts, and where the installation tools unpack archives prior to installing their contents. It does not affect the use of temporary storage space by the runtime Information Server products.

Information Server 8.5 and 8.7:

Base Install:
Create a file named .jvm_args in the is-suite or is-client directory. (The top-level directory of the expanded software distribution archive is either is-suite or is-client; the archive is a .zip file on Windows and a .tar.gz compressed tar file on UNIX and Linux.) Add the line -Djava.io.tmpdir=temp_dir
Example: -Djava.io.tmpdir=/isinstalls/temp

Update Installer:
Create a file .jvm_args in the InformationServer/_uninstall directory and add the line
-Djava.io.tmpdir=temp_dir
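
A minimal sketch for the update installer case, assuming the default install location and an illustrative temporary directory:

echo "-Djava.io.tmpdir=/isinstalls/temp" > /opt/IBM/InformationServer/_uninstall/.jvm_args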

Information Server 8.0.1 and 8.1:

Base Install:
Supply the command-line argument -is:tempdir temp_dir
Examples:
install.exe -is:tempdir temp_dir
./install.sh -is:tempdir temp_dir

Update Installer:
java-path/java -Djava.io.tmpdir=temp_dir -jar updater.jar updater-args

Alternatively, the -Djava.io.tmpdir=temp_dir argument can be added to the InformationServer/Updates/bin/InstallUpdate.bat or InstallUpdate.sh script to make the temporary directory setting permanent.

Using istool script file in Information Server when job folders contain a space

Question

How can I use an istool script file in Information Server to export jobs when the job folder name contains a space (or does not)?

Answer

Any of the following three commands will work in the script file for both types of folders (with or without a space):

  • export -dom <domain_name>:9080 -u <user> -p <password> -ar <path_to_isx_file>/<filename>.isx -ds
    '"<ServerName>/<ProjectName>/Shared Containers/<common_part_of_job_name>*.*"'
    (Note the space in the Shared Containers folder.) Sample entry with a server name of "ABC" and project name of "Test": export -dom <domain_name>:9080 -u <user> -p <password> -ar /home/dsadm/Test.isx -ds
    '"ABC/Test/Shared Containers/ScGetKey*.*"'
  • export -dom <domain_name>:9080 -u <user> -p <password> -ar <path_to_isx_file>/<filename>.isx -ds
    '"<ServerName>/<ProjectName>/Jobs/<common_part_of_job_name>*.*"' Sample entry: export -dom <domain_name>:9080 -u <user> -p <password> -ar /home/dsadm/Test.isx -ds
    '"ABC/Test/Jobs/CopyOfHistoryProcessing*.*"'
  • export -dom <domain_name>:9080 -u <user> -p <password> -ar <path_to_isx_file>/<filename>.isx -ds
    '"<ServerName>/<ProjectName>/Jobs/<category_folder_name>/*/*.*"' Sample entry: export -dom <domain_name>:9080 -u <user> -p <password> -ar /home/dsadm/Test.isx -ds
    '"ABC/Test/Jobs/Test BatchJob/*/*.*"'

JR41712: AFTER INSTALLATION OF 8.5 FP2 DERIVATION EDITOR IN TRANSFORMER STAGE EMITS WINDOWS ERROR “UNEXPECTED ERROR; QUITTING”

APAR status

  • Closed as program error.

Error description

  • After upgrading to 8.5 FP2, the DataStage Designer derivation editor in
    the Transformer stage emits the error "Unexpected error; quitting"

Local fix

  • Re-register vmdsflt.exe, that is:
    
    start a command prompt, change directory to
    <INSTALL_DIR>\Clients\Classic, and issue the command
    
    vmdsflt.exe  /REGSERVER
    
    Note: if using Windows Vista or Windows 7 you will need to run
    cmd.exe with elevated Administrator permissions; to do this,
    when invoking cmd.exe via Run on the Start menu, do not
    press Enter but Ctrl+Shift+Enter.

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    
    DataStage Designer users editing Transformer derivations:
    using the right-click "input column" option would not show any
    editor window.
    ****************************************************************
    PROBLEM DESCRIPTION:
    
    After upgrading to 8.5 FP2, the DataStage Designer derivation
    editor in the Transformer stage emits the error "Unexpected
    error; quitting"
    
    ****************************************************************
    RECOMMENDATION:
    
    Install patch JR41712
    
    ****************************************************************

Problem conclusion

  • Install patch JR41712

Temporary fix

Issue the command vmdsflt.exe /REGSERVER in clients\classic

New DB2 Connector Properties

Question

What are the new properties that are added to the DB2 Connector?

Answer

The new DRS Connector stage uses the DB2 Connector stage to access DB2 databases. The following properties have been added, or changed, in the DB2 Connector in order to support the DRS Connector. (Some of these new DB2 Connector properties have equivalent properties in the legacy DRS stage, DB2 UDB API stage or DB2/UDB Enterprise stage.) These new properties in the DB2 Connector are available after installing the DRS Connector patch on top of InfoSphere Information Server 8.5.

“Insert new rows only” Write mode

The "Write mode" property specifies the mode of work for the connector when writing records to the target database. The property provides a list of modes for the user to choose from. This list has been enhanced with a new write mode option called "Insert new rows only".

The new write mode behaves similarly to the existing "Insert" write mode. The difference between the two is in how they handle records that cannot be written to the database due to a primary key or unique constraint:

  • In the case of the "Insert" write mode, this condition is treated as an error. If a reject link is defined and configured to accept records that failed for this reason, the records are rejected; otherwise a fatal error message is logged and the job stops.
  • In the case of the "Insert new rows only" write mode, the records that could not be written to the database because of a primary key or unique constraint are ignored and the connector proceeds to process the remaining records. Any error other than a primary key or unique constraint violation still results in a fatal error message being logged and the job stopping.

The following example compares the "Insert", "Insert then update", and "Insert new rows only" write modes and illustrates their similarities and differences.

Presume that the table TABLEX in the database has primary key column COLA and contains the following rows:

COLA  COLB
2     OLD VALUE TWO
5     OLD VALUE FIVE
6     OLD VALUE SIX

Presume that the records on the input link to the DB2 Connector stage arrive as follows:

COLA  COLB
1     NEW VALUE ONE
2     NEW VALUE TWO
3     NEW VALUE THREE
6     NEW VALUE SIX

If the "Write mode" property is set to "Insert", the job will fail when it tries to insert row COLA=2, because a row with the same primary key already exists in the table.

If the "Write mode" property is set to "Insert then update", the rows COLA=1 and COLA=3 will be inserted, and the rows COLA=2 and COLA=6 will be updated. When the job completes, the table will contain the following rows:

COLA  COLB
1     NEW VALUE ONE
2     NEW VALUE TWO
3     NEW VALUE THREE
5     OLD VALUE FIVE
6     NEW VALUE SIX

If the "Write mode" property is set to "Insert new rows only", the rows COLA=1 and COLA=3 will be inserted but the existing rows COLA=2 and COLA=6 will be ignored. When the job completes, the table will contain the following rows:

COLA  COLB
1     NEW VALUE ONE
2     OLD VALUE TWO
3     NEW VALUE THREE
5     OLD VALUE FIVE
6     OLD VALUE SIX

Notice that the "Insert new rows only" write mode produced the same results in the target table that the "Insert" write mode would have produced had a reject link been defined for the stage and configured to accept rows that cannot be inserted into the table due to a constraint violation. With the "Insert new rows only" write mode such rows are ignored, whereas with the "Insert" write mode such rows must be sent to the reject link in order for the job to be able to continue processing the remaining rows.

Select statement -> Read select statement from file

The new "Read select statement from file" property has been added as a sub-property of the "Select statement" property, which is only available when reading data (that is, when using the DB2 Connector as a source). The "Read select statement from file" property can be set to Yes or No. The default is No.

  • When the "Read select statement from file" property is set to No, the behavior of the "Select statement" property is unaffected: enter into the "Select statement" property the SQL SELECT statement you want to run.
  • When the "Read select statement from file" property is set to Yes, enter into the "Select statement" property the absolute path of a file that contains the SQL SELECT statement you want to run.
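
As a minimal illustration (the file path and query are hypothetical), the file referenced by the property could be prepared as follows and its absolute path entered into the "Select statement" property:

cat > /home/dsadm/sql/get_active_customers.sql <<'EOF'
SELECT CUSTID, CUSTNAME FROM CUSTOMERS WHERE STATUS = 'ACTIVE'
EOF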

Fail On Row Error

A new property called "Fail on row error" has been added to the Usage section of the DB2 Connector properties. The "Fail on row error" property specifies whether the connector should log a fatal error message and stop the job when an error occurs while writing a record to the database. The property is not available if the "Write mode" property is set to "Bulk load". The allowed values for the property are Yes and No. Their meaning is as follows:

  • When the property is set to No and a record cannot be written to the database, the connector logs a warning message and continues processing the remaining input records.
  • When the property is set to Yes and a record cannot be written to the database, the connector logs a fatal error and the job stops.

The default value for the property depends on the type of job in which the connector stage is running:

  • For parallel jobs the default value is Yes. Note that if a reject link is defined for the stage, this property is not available and automatically defaults to Yes.
  • For server jobs the default value is No. By default, if an error occurs when writing a record to the database, a warning message is logged and the job continues. If the input link to the DB2 Connector stage comes from a Transformer stage that has been configured to reject the rows that the DB2 Connector stage could not write to the database, the "Fail on row error" property must be set to No in order to allow the Transformer to route those rows to the reject link.

Logging multiple matches

A new property called "Logging multiple matches" has been added to the Usage section of the DB2 Connector properties. The "Logging multiple matches" property is available in the DB2 Connector stage only when the stage is running in a server job and in the lookup mode of operation. In this mode one or more reference links connect the DB2 Connector stage with the Transformer stage.

This property specifies whether the connector should log a message when the lookup statement returns multiple matching records for the input key record. The check is performed for each input record separately. Note that even if the lookup statement in the connector returns multiple rows, only the first row is provided by the connector on the reference link.

The supported values for the property are:

  • None – No message is logged for multiple matches.
  • Informational – A message of informational severity is logged.
  • Warning – A message of warning severity is logged.
  • Fatal – A message of fatal severity is logged and the job stops.

Note that when the DB2 Connector stage is running in a parallel job and in lookup mode, it is connected with a reference link to the Lookup stage, and the Lookup stage provides support for handling multiple lookup matches.

Before SQL -> Read Before SQL statement from file

The new "Read Before SQL statement from file" property has been added as a sub-property of the "Before SQL" property. The "Read Before SQL statement from file" property can be set to Yes or No. The default is No.

  • When the "Read Before SQL statement from file" property is set to No, the behavior of the "Before SQL" property is unaffected: enter into the "Before SQL" property the SQL statement(s) you want executed.
  • When the "Read Before SQL statement from file" property is set to Yes, enter into the "Before SQL" property the absolute path of a file that contains the SQL statement(s) you want executed.

Before SQL (node) -> Read Before SQL (node) statement from file

The new "Read Before SQL (node) statement from file" property has been added as a sub-property of the "Before SQL (node)" property. The "Read Before SQL (node) statement from file" property can be set to Yes or No. The default is No. The property is only available when using the DB2 Connector stage in a parallel job.

  • When the "Read Before SQL (node) statement from file" property is set to No, the behavior of the "Before SQL (node)" property is unaffected: enter into the "Before SQL (node)" property the SQL statement(s) you want executed.
  • When the "Read Before SQL (node) statement from file" property is set to Yes, enter into the "Before SQL (node)" property the absolute path of a file that contains the SQL statement(s) you want executed.

After SQL -> Read After SQL statement from file

The new "Read After SQL statement from file" property has been added as a sub-property of the "After SQL" property. The "Read After SQL statement from file" property can be set to Yes or No. The default is No.

  • When the "Read After SQL statement from file" property is set to No, the behavior of the "After SQL" property is unaffected: enter into the "After SQL" property the SQL statement(s) you want executed.
  • When the "Read After SQL statement from file" property is set to Yes, enter into the "After SQL" property the absolute path of a file that contains the SQL statement(s) you want executed.

After SQL (node) -> Read After SQL (node) statement from file

The new "Read After SQL (node) statement from file" property has been added as a sub-property of the "After SQL (node)" property. The "Read After SQL (node) statement from file" property can be set to Yes or No. The default is No. The property is only available when using the DB2 Connector stage in a parallel job.

  • When the "Read After SQL (node) statement from file" property is set to No, the behavior of the "After SQL (node)" property is unaffected: enter into the "After SQL (node)" property the SQL statement(s) you want executed.
  • When the "Read After SQL (node) statement from file" property is set to Yes, enter into the "After SQL (node)" property the absolute path of a file that contains the SQL statement(s) you want executed.

The following new properties are only available when the “Write mode” property is set to “Bulk load”:

New load modes: Restart and Terminate

Two new modes have been added to the "Load mode" property. The new values are Restart and Terminate. These modes have been added to support situations where a previous load operation did not complete successfully (and the table is in an abnormal state such as the LOAD PENDING state).

  • Specify the Restart load mode if you wish to resume an interrupted load operation. (Setting the "Load mode" to Restart enables the new "Restart phase" property, where you can specify which phase of the load should be resumed.)
  • Specify the Terminate load mode if you wish to roll back a failed load operation.

For more information on the DB2 load modes, refer to the "Load overview" topic in the DB2 Information Center:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.dm.doc/doc/c0004587.html
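
Before choosing between these modes, it can help to confirm the table state first; a minimal sketch using the DB2 command line processor (database and table names are illustrative):

db2 connect to MYDB
db2 "LOAD QUERY TABLE MYSCHEMA.TABLEX"     # reports states such as Load Pending
db2 connect reset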

Restart phase

The new "Restart phase" property specifies which DB2 load phase is to be restarted. The options are Build, Load, and Delete. (The index copy phase is not supported.) Specifying the Build or Delete phase means the DB2 Connector will ignore the input data. Specifying the Load phase means the input data must be the same as it was when the load was originally started; that is, the data on the input link to the DB2 Connector stage must be identical to the data provided on the input link when the load was first attempted.

The "Restart phase" property is only enabled when the "Load mode" property is set to the new Restart value. The default value is Load.

For more information on the DB2 load phases, refer to the "Load overview" topic in the DB2 Information Center:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.dm.doc/doc/c0004587.html

Command file generation (“Input data files directory” property renamed to “Directory for data and command files”)

The property formerly named "Input data files directory" has been renamed to "Directory for data and command files" to more closely match the equivalent property in the legacy DRS stage (DRS plug-in). The DB2 Connector now generates a command file which can be used to execute an equivalent load using the DB2 command line LOAD utility. This command file, along with a batch file (on Windows) or shell script (on UNIX) for launching it, is generated in the directory specified by this property.
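
As a sketch of how this output might be used (directory and file names are chosen by the connector and are illustrative here), the generated files can be inspected and the launcher script re-run with the DB2 command line utility when diagnosing a load outside of DataStage:

cd /data/ds_bulkload     # the "Directory for data and command files" (illustrative path)
ls                       # shows the generated data file, load command file, and launcher script
# Running the generated shell script replays the equivalent DB2 LOAD command.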

Remove intermediate data file

The new "Remove intermediate data file" property specifies whether the sequential file used for bulk loading should be deleted by the DB2 Connector when the bulk load operation completes. This property can be set to Yes or No:

  • If set to Yes, the sequential file is deleted when the bulk load operation completes.
  • If set to No, the sequential file is left on disk. Leaving the sequential file behind can be useful for investigating bulk load problems.

This property is enabled when the "Bulk load to LOB or XML column(s)" property is set to No, the "Load method" property is set to "Sequential File(s)", and the "Files only" property is set to No.

File type

The new "File type" property specifies the format used by the DB2 Connector to pass data to DB2. Set this property to ASC if you want the connector to pass data in non-delimited ASCII format. Set this property to DEL if you want the connector to pass data in delimited ASCII format. The ASC format is the default, as it generally has better performance. The DEL format is preferable when you want to inspect or use the data file generated by the connector for bulk loading, as the DEL format is more readable. (To preserve the data file generated by the connector for bulk loading, see the "Remove intermediate data file" property in this technote.)

For more information on these formats, refer to the DB2 "Export/Import/Load utility file formats" topic:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.dm.doc/doc/r0004646.html

LOB path list

The new "LOB path list" property specifies a comma-delimited list of fully qualified paths where the LOB files to be loaded are located. These LOB files must be local to the node(s) where the DB2 Connector runs. The LOB columns in the input records must contain LOB Location Specifiers (LLS) that instruct DB2 how to locate the LOBs in the path(s) you supplied in the "LOB path list" property.

For example, suppose there is a table created as:

create table LOBTABLE (COL1 integer, COL2 clob(256))

The table is described on the Columns tab of the DB2 Connector stage as:

Name  Type         Length
COL1  Integer
COL2  LongVarChar  256

The new "File type" property is set to DEL, and the "LOB path list" property contains the following path:

C:\mylobs

In the rows sent to the DB2 Connector stage, there is an LLS in COL2:

COL1  COL2
1     myfile.del.001.lob.0.3
2     myfile.del.002.lob.0.4
3     myfile.del.003.lob.0.4

In the C:\mylobs directory there are three files:

myfile.del.001.lob, which contains: IBM
myfile.del.002.lob, which contains: IBM2
myfile.del.003.lob, which contains: IBM3

Using the LLS in COL2, DB2 reads the data from the myfile.del.00n.lob files in C:\mylobs and writes it to COL2 in LOBTABLE. When the job completes, the table LOBTABLE contains the following rows:

COL1  COL2
1     IBM
2     IBM2
3     IBM3

The "LOB path list" property is enabled when the "Bulk load with LOB or XML column(s)" property is set to No. (When the "Bulk load with LOB or XML column(s)" property is set to No, the "LOB path list" property must be used if you wish to load LOB columns.)

For more information on LOB Location Specifiers (LLS), refer to the following topics in the DB2 Information Center:

LOB import considerations:
https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.dm.doc/doc/c0004581.html

LOB export considerations:
https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.dm.doc/doc/c0004562.html

LOB and XML file behavior when importing and exporting:
https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.dm.doc/doc/c0004562.html

Without prompting

The new "Without prompting" property controls the appearance of the "WITHOUT PROMPTING" phrase in the command file generated for bulk loads (as described under the "Command file generation" topic in this technote). Setting this property to Yes causes the DB2 Connector to add the phrase "WITHOUT PROMPTING" to the load command in the generated command file. Setting this property to No means the "WITHOUT PROMPTING" phrase is left out of the generated command file. This property only affects the command file and does not impact any other aspect of the DB2 Connector's behavior.

This property is only available when the "Bulk load with LOB or XML column(s)" property is set to No and the "Load method" property is set to "Sequential File(s)".

For more information on the WITHOUT PROMPTING parameter of the DB2 load utility, refer to the DB2 load utility Information Center topic:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.cmd.doc/doc/r0008305.html

Row count

The new Row count property specifies the number of records to be loaded. This property can be used to limit the number of records loaded. For example, if there are 100 input records but the Row count property is set to 10, only 10 records will be loaded. The default value is zero, which means load all input records (no restriction).

Troubleshooting problems with creating InfoSphere DataStage projects

Problem(Abstract)

This document describes common problems and useful troubleshooting techniques when the creation of a new InfoSphere DataStage project fails.

Symptom

Creating a project fails during the installation of InfoSphere Information Server, when using the Administrator client, or when using the dsadmin command.

Cause

There can be various causes of these types of problems, such as environmental issues, firewall configuration and repository database configuration.

Diagnosing the problem

When project creation fails, the first step is to locate the error message. The error message location is based on how the project creation was started:

  • Installation – search in the installation logs (for example, /opt/IBM/InformationServer/logs/) for “RUN BP DSR_QUICKADD.B”. The error message is a few lines following this text.
  • Administrator client – the error message is displayed in a message box.
  • dsadmin command – the error message is displayed on the console.

After you locate the error message, there might be more detailed error information in the following log file on the DataStage Server system:

/home/_Credential_Mapped_Username_/ds_logs/dstage_wrapper_trace_N.log

  • where N is a number from 1 to 20 for the log files for the last 20 DataStage sessions
  • where _Credential_Mapped_Username_ is the DataStage Server user name that credentials are mapped to in Web Administration console

Note: The logs go to ${user.home}/ds_logs where ${user.home} is the home directory of the credential mapped user or the user running the installation for projects created at installation time. On Windows computers, the user home directory is usually C:\Documents and Settings\_Credential_Mapped_UserName_

Tracing DataStage Server processes
Sometimes tracing the dsrpcd and child processes can produce useful information from which IBM support can narrow down the causes of a project creation problem. Operating systems differ in how to trace processes but a few examples are shown here:

AIX: "truss -o /tmp/dsrpcd_truss.out -fp <dsrpcdPID>"
RedHat: "strace -fp <dsrpcdPID> > /tmp/dsrpcd_strace.out"

The above commands attach to the running dsrpcd process and will record all of the system calls made by that process and its children during subsequent client-server sessions; for example creating a project from the Administrator client or the dsadmin command line.

Enabling JVM startup tracing
To produce extra diagnostic information for the JVM initialization (after all of its libraries have been successfully loaded), you can add the following lines to /opt/IBM/InformationServer/Server/DSEngine/dsenv:

XMOG_TRACE_LEVEL=TraceVerbose
XMOG_TRACE_FILE=/tmp/xmogtrace.txt
export XMOG_TRACE_LEVEL
export XMOG_TRACE_FILE

On Windows these tracing options can be set as System Environment Variables by using the System Control panel.

Remember to restart the DataStage Server engine processes after adding these variables, and to remove these environment variables after they are no longer needed.

Enabling repository database tracing
To enable tracing of the code that populates the repository database follow these steps:

  1. Create a file on the DataStage Server system in /opt/IBM/InformationServer/ASBNode/conf/ called NewRepos.debug.properties (the file name is case sensitive)
  2. In the file add the following three lines (a shell sketch that creates this file follows the list):
    log4j.logger.com.ascential.dstage=DEBUG
    log4j.logger.com.ibm.datastage=DEBUG
    NewRepos.spy.trace=true
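
A minimal shell sketch that creates the file described above, assuming the default install location:

cat > /opt/IBM/InformationServer/ASBNode/conf/NewRepos.debug.properties <<'EOF'
log4j.logger.com.ascential.dstage=DEBUG
log4j.logger.com.ibm.datastage=DEBUG
NewRepos.spy.trace=true
EOF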

The dstage_wrapper_trace_N.log will then contain extra tracing information the next time a project creation is attempted. Ensure that you delete the NewRepos.debug.properties file when finished. In addition, spy trace files, such as dstage_wrapper_spy_N.log, are produced in the same directory as the log files. These files contain a detailed record of low level method calls and can grow quite large.

Running project creation manually
The project creation code runs in the context of a dsapi_slave process which does not have any console output. Locate the full “RUN BP DSR_QUICKADD.B” command line from the domain installation log files on /opt/IBM/InformationServer/logs/.

Use the following commands to run the project creation code so that you can view the console output:
Linux and UNIX

      1. cd /opt/IBM/InformationServer/Server/DSEngine
      2. . ./dsenv
      3. bin/uvsh
      4. RUN BP DSR_QUICKADD.B <arguments from log file> <newProjectName> /opt/IBM/InformationServer/Server/Projects/<newProjectName> CREATE
      5. QUIT

Windows

    1. cd C:\IBM\InformationServer\Server\DSEngine
    2. bin\uvsh
    3. RUN BP DSR_QUICKADD.B <arguments from log file> <newProjectName>  C:\IBM\InformationServer\Server\Projects\<newProjectName> CREATE
    4. QUIT

Resolving the problem

The dstage_wrapper_trace_N.log usually gives an indication of where the problem lies. These log files might require IBM customer support, but some errors can be interpreted to attempt further diagnostics.

In some cases the error message explicitly says what the problem is, for example: “There is a mis-match between the code on the client and the code on the server. Please ensure that patch JRXXXXX (or later) is installed on the client machine.”

The following is a summary of the known issues that can cause project creation to fail.

1) Incorrectly configured repository database.

      8.0.x messages: “Error creating DR elements, Error was -1”, “Invalid node name: %1”
      8.1 and later message: “DSR.ADMIN: Error creating DR elements, Error was Unique constraint violation.”

These types of errors usually occur because the repository database returns an error when attempting to make an update. The dstage_wrapper_trace_N.log file may contain more specific details about the exact database error.

There may also be a database log, depending on what type of database the repository is running in, which contains more information. For example, DB2 has the db2diag tool which can be run to find out the exact reason why an update failed. Typical failures are: out of disk space, memory configuration problems etc.

For repository database errors it is important to confirm that the database has been created using the scripts supplied on the installation media. These scripts configure important database parameters which if missed may cause project creation problems.

It is also important that the database has been created using the correct character set, as per the database creation script documentation on the installation media (typically UTF16/32). If a different character set was used some of the metadata stored can become corrupted or may cause unexpected primary key violations. If the wrong character set has been used the product needs to be reinstalled.

For errors at this level the WebSphere Application Server logs may contain additional information. The files SystemOut.log and SystemErr.log can be found in:

...WebSphere/AppServer/profiles/_profile_name_/logs/server1/
2) Leftover metadata in repository database from a previously failed project creation

      8.0.x message: “Error creating DR elements, Error was -1”

This problem only occurs on 8.0.x systems and can be identified by looking in the dstage_wrapper_trace_N.log file for a “unique constraint violation” error. This can occur when a project creation failed and did not remove all of its metadata from the repository. Even though the project cannot be seen in DataStage, attempting to create a new project of the same name will result in this error.

To work around this problem, you can simply create a project with a different name. Alternatively, IBM support can provide a tool and instructions for how to remove the leftover data from the repository.
3) Unable to create log file on the DataStage Server

      8.0.x message: “Error creating DR elements, Error was -1”
      8.1 and later message: “DSR.ADMIN: Error creating DR elements, Error was log4j:ERROR setFile(null,true) call failed.”

Just before the metadata repository is populated with the default project contents, a log file is created on the DataStage Server system in /home/_Credential_Mapped_Username_/ds_logs/. If this log file cannot be created, the project creation will fail. On Windows computers, the user home directory is usually C:\Documents and Settings\_Credential_Mapped_UserName_

The usual reasons why this log could not be created are either because the user has no home directory at all or they do not have appropriate permissions on it.
4) Incorrectly configured locale on the DataStage Server

      8.0.x message: “Error creating DR elements, Error was -1”
      8.1 and later message: “DSR.ADMIN: Error creating DR elements, Error was Unmatched quotation marks”

This problem is ultimately caused by bad locale configuration on the DataStage Server system. This manifests itself because the “hostname” command is run during project creation, and instead of returning the correct host name it returns a string such as “couldn’t set locale correctly”.
5) Failed to load JVM into the DataStage Server process (dsapi_slave)

      8.0.x message: “(The connection is broken (81002))”
      8.1 and later message: CDSRUserAccess.EnterCS: Internal error – Failed to open file : UV.ACCOUNT (81002)

The JVM (Java Virtual Machine) can fail to load for several reasons. If it does fail to load, the dsapi_slave process will be terminated, resulting in connection broken errors (81002) being produced on the client.

A core file may be produced which can be used to determine what caused the process to be terminated.

Possible causes of this problem are:

    • The LIBPATH (or equivalent) is too long and caused a buffer overflow. This can be confirmed by using the Administrator client to execute the “env” command using the Command button. If the contents of LIBPATH appear to have been duplicated then it is probable that dsenv has been sourced twice (dsenv does not need to be sourced when starting the DataStage Server engine processes via uv -admin -start).
    • Incompatible or missing patches on the Client, Server, and Domain systems. By looking in the version.xml file of each system you can confirm which patches have been installed. Ensure that patches have been installed on all appropriate systems.
    • Environment variables such as LDR_CNTRL may have been added or modified in the IBM/InformationServer/Server/DSEngine/dsenv file. Generally speaking LDR_CNTRL settings in dsenv should not be modified unless otherwise directed by IBM.
    • Incompatible operating system kernel parameters.

6) Firewall configuration

      8.0.x message: “Error creating DR elements, Error was -1”
      8.1 and later message: “DSR.ADMIN: Error creating DR elements, Error was com.ascential.xmeta.exception.ServiceException”

The DataStage Server system needs to communicate with the domain system, which means that certain ports need to be open between these systems if they are located on separate machines.

This sort of problem can be confirmed by looking in the dstage_wrapper_trace_N.log file for errors of the kind “Connection refused:host=<hostname>,port=2809”. Ensure the firewall is correctly configured and use telnet <hostname> <port> from the DataStage Server machine to confirm that the port is accessible.

The necessary firewall configuration can be found in the installation guide.
7) Trusted authentication between DataStage Server system and the Domain system failed

      8.0.x message: “Error creating DR elements, Error was -1”
      8.1 and later message: “DSR.ADMIN: Error creating DR elements, Error was Mapping failed to copy attributes: MetaTable -> DSTableDefinition (EObject: null, MetaTable)”

The DataStage Server system authenticates with the Domain system by a process called trusted authentication. This uses a secure certificate exchange rather than explicit user name and password authentication. If this fails, the project creation will not succeed. Trusted authentication failure is usually identified by multiple exceptions in the DataStage Server ds_logs that says “Null session”.

This can fail for a number of reasons:

      • If the DataStage Server is installed onto a Windows system (say C:\IBM\InformationServer), installing the clients into a different directory (say C:\IBM\InformationServer2) will cause the certificate exchange to fail, ultimately causing the project creation to fail. See technote #1409412 and APAR JR34441 for more information.
      • The number of trusted sessions reaches a maximum limit, so a new session cannot be started. This is usually identified by an entry in the WebSphere logs saying that the limit has been reached. If so, restarting WebSphere Application Server will clear everything so that new sessions can be created and project creation can succeed.

8) DataStage was not installed on the Domain system

      8.0.x message: “Error creating DR elements, Error was -1”
      8.1 and later message: “DSR.ADMIN: Error creating DR elements, Error was Mapping failed to copy attributes: MetaTable -> DSTableDefinition (EObject: null, MetaTable)”

When installing the Domain and DataStage Server onto different physical systems, the installation of DataStage Server will fail to create projects specified in the installer if DataStage has not been installed onto the Domain. These errors can be found in the installation logs.

Furthermore, attempting to create projects using the Administrator client or command line will also fail. In both these cases, the exceptions will say that “The package with URI “http:///1.1/DataStageX.ecore” is not registered”.

DataStage can be added to the Domain system by re-running the installer, selecting DataStage and deselecting the other components.
9) Locale (regional settings) customized on the Client system

      8.0.x message:

“Error creating DR elements, Error was -1”

8.1 and later message: Invalid Node Name %1

If the regional language settings have been modified to use a customized short date format (for example “ddd dd/MM/yyyy”) it can cause the DataStage Administrator client to send the wrong date information to the DataStage Server, causing project creation to fail. A patch for this issue has been created, under APAR JR34770.
10) Disk / partition full or user quota reached on DataStage Server system

      8.0.x message:

“Error creating DR elements, Error was -1”

8.1 and later message: DSR.ADMIN: Error creating DR elements, error was log4j: ERROR failed to flush writer.

The project creation operation will create a log file on the DataStage Server system, called dstage_wrapper_trace_N.log, in the path indicated at the beginning of this document. This will fail if the disk/partition is full or if the user to which credentials have been mapped has reached their disk quota. Free up space as necessary and retry the operation.
11) Project creation fails at “Initializing demo files…” within the Administrator client.

      8.5 message: “Errors were detected during project creation that may render project <name> unstable.

Caused by: DSR.ADMIN: Error creating DR elements, Error was <date timestamp> java.utils.prefs.FileSystemPreferences$2 run.

This error indicates that there was a problem writing Java preference data.

There are two things that can cause this error to occur.

i) SE (Security Enhanced) Linux is enabled.

If SELinux is enabled, please disable it.

          To determine if SELinux is installed and in enforcing mode, you can do one of the following:

          • check the /etc/sysconfig/selinux file
          • run the sestatus command
          • check the /var/log/messages file for SELinux notices (Notice format might differ between RHEL 4 and RHEL 5.)

To disable SELinux, you can do one of the following:

        • set it to permissive mode by running the setenforce 0 command as a superuser
        • modify /etc/sysconfig/selinux and reboot the machine.

ii) The user ID that is trying to create the project does not have a local home directory to write to.

If there is no home directory for the user ID, create a local home directory with write permissions (766) and with its group set to the local dstage group.
12) Stack Execution Disable (SED) is enabled (AIX only)

      8.0.x message:

“Error creating DR elements, Error was -1”

8.1 and later message: “Unable to confirm the JVM can be loaded into the DataStage server process ‘DSR_CREATE.PROJECT.B TestJVM’ failed”

If Stack Execution Disable (SED) is enabled in AIX, the JIT compiler fails when trying to execute code it generated in the process data area. This occurs with all of the DataStage executables that have embedded JVMs. The solution is to turn off SED at the system level and reboot the machine. To turn off SED, use the command:

sedmgr -m off

(The above is also covered in APAR JR40821)

13) Unable to increase the tablespace for the metadata repository (XMETA)

      8.0.x message:

“Error creating DR elements, Error was -1”

8.1 and later message: DSR.ADMIN: Error creating DR elements, Error was unable to save

The SystemOut.log may show an error like:

Oracle: ORA-01653: unable to extend table XMETAUSER.LOGGING_LOGGINGEVENT1466CB5F by 128 in tablespace XMETA

If DB2 is used for the metadata repository (XMETA) look in the <db2instance_home>/sqllib/db2dump/db2diag.log for errors.

To resolve this problem, increase the tablespace, and retry the operation. Note that it may be necessary to manually delete any partially created project, which can be done by following the material here: http://www-01.ibm.com/support/docview.wss?uid=swg27021312.
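
For an Oracle-hosted repository, a DBA might extend the XMETA tablespace with something like the following sketch; the datafile path and sizes are examples only and should be agreed with your DBA:

sqlplus / as sysdba <<'EOF'
-- Add space to the XMETA tablespace (path and size are placeholders)
ALTER TABLESPACE XMETA ADD DATAFILE '/u01/oradata/xmeta02.dbf' SIZE 1G AUTOEXTEND ON NEXT 100M;
EXIT;
EOF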

14) Error updating secondary indices.

      Error message:

“DSR.ADMIN: Error updating secondary indices. Status code = -135 DSJE_ADDPROJECTFAILED”

      A known cause for the error “error updating secondary indices” is one or more missing I_* directories in the

/opt/IBM/InformationServer/Server/Template

      directory. If there is another DataStage engine installation (of the same version and patch level) available, it is possible to copy the Template directory from the working engine and use it to replace the Template directory on the broken engine. However, be careful to back up the existing Template directory first. Note that if a Template directory is taken from a working engine of a different patch level, some of the patches on the broken engine may be rendered ineffective.

How to determine the number of connections for each Teradata Connector stage in Information Server?

Question

How to determine the number of connections for each Teradata Connector stage in Information Server?

Answer

The Teradata database has a limit on the number of connected sessions. This is controlled by the number of PEs configured in the Teradata database server. Basically, each PE module in the Teradata database configuration supports up to 120 sessions. So, if you are connecting to a Teradata database that is configured with only 1 PE, then you will be able to run only 120 jobs concurrently. All other concurrent jobs beyond 120 would fail, because they cannot get a connection to the Teradata database.
Please check the number of PEs configured in Teradata database server.

Each Teradata Connector stage requires at least one connection. For example, if a job has four Teradata Connector stages, each Teradata Connector stage would require (at least) one connection to the Teradata database, so such a job would consume 4 connections.

NFS mount options required when installing the Information Server Engine tier

Question

What NFS mount options are required when installing the Information Server Engine tier into an NFS mounted filesystem?

Answer

There are specific mount options which must be used when installing the Information Server Engine tier into an NFS mounted filesystem. Note: these options are only required when you are performing a full Engine tier installation. They are not required when you are cross-mounting the Engine tier onto additional hosts to be used in a cluster or GRID configuration.

When the Engine will be installed into an NFS mounted filesystem, you must ensure that the following options are used to mount the file system before you begin the installation. If you are not currently using these options, you must unmount it, and then re-mount it with these options before you begin the Information Server Engine tier installation.

no_root_squash (enabled)
This option allows the nfs client to maintain root file permission. This option is required because there are files which must be owned by root and accessed by root for the Information Server software to function properly. These files must not have ownership squashed.

nosuid (disabled)
This option will suppress suid execution mode. This option must NOT be present. Information Server has a requirement to use suid executable files for installation and post-installation operation.

These mount options must be used to mount the filesystem before installation, and must remain in use permanently in order for the Information Server software to function properly.

In addition to these mount options, it is also a requirement that the NFS lockd must be running. The lockd is part of the network lock manager, which allows record locking on NFS files. Information Server has a requirement to perform file record locking for installation and post-installation operation.
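
For illustration only, the server-side export and the client-side mount might look like the sketch below; host names and paths are placeholders, and the remaining options are assumptions for your environment. Note that no_root_squash is set in the NFS server's export, while nosuid is simply omitted from the client mount:

# On the NFS server, in /etc/exports (no_root_squash keeps root ownership intact)
/export/iisengine   enginehost(rw,sync,no_root_squash)

# On the engine host, mount without the nosuid option so suid executables still work
mount -t nfs nfsserver:/export/iisengine /opt/IBM/InformationServer

# Verify the options actually in effect
mount | grep InformationServer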

Note:

From a performance perspective we do not recommend the use of NFS for scratchdisk storage (or for temporary storage as defined by the TMPDIR environment variable.) It is preferred to use dedicated high speed SAN or local disk storage. If using shared SAN, it is possible that other applications will be impacted when the Parallel Engine is used for disk intensive operations like sorting. Likewise, the activity of other applications could impact DataStage performance.

How DataStage TDMLoad plugin works with the Teradata FastExport utility

Question

How does DataStage Teradata MultiLoad (TDMLoad) plugin work with the Teradata FastExport utility?

Answer

The TDMLoad plugin does not directly create any connections to Teradata. The TDMLoad plugin generates the script file and the data file. Then the plugin invokes the Teradata Fastexport utility with the script file as input. At this point, the TDMLoad plugin spawns the “Fastexport” as a separate process and waits for the completion of the Fastexport.

When the sessions connected by the Fastexport process get disconnected, then the “Fastexport” process should exit. If it does not, contact Teradata support.

If for some reason, the “Fastexport” process is still running, but the TDMLoad process died due to some other failure in the job, the session may be left open. This is an expected behavior.

Always check to see if there is a Fastexport process that is still running. To clear the connection to the database, this process will need to be terminated.
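
A minimal check along these lines can be used; it assumes the FastExport executable shows up in the process list as fexp, which may differ on your system:

# Look for a FastExport process that did not exit
ps -ef | grep -i fexp | grep -v grep

# If an orphaned process is still holding sessions, terminate it by its PID
# kill <pid>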

Parameters passed to the dsjob command can expose userids and passwords. How can I workaround this issue in DataStage?

Question

Parameters passed to the dsjob command can expose userids and passwords
when ps command is run. How can I workaround this issue in DataStage?

Cause

When a password is passed with -param in the dsjob command, the password shows up as clear text in the ps command output.

Answer

To work around this, use the -paramfile option of the dsjob command.

  • In DataStage 7.5.3, patch JR33752 added the new option -paramfile.
  • In Information Server version 8.0.1 or higher, this functionality is included.

Here is how -paramfile works:
The format of the file is:
<param>=<value>.

The dsjob -run command can read job parameter values from both a file and command line. The values are set depending upon the order specified on the command line. Later ones will overwrite matching ones specified before.

For example, the contents of myfile is:

  • jobparam1=value1
  • jobparam2=value2

The following command is executed:
dsjob -run -paramfile myfile -param jobparam2=cmdvalue2 -param jobparam3=cmdvalue3…

This will set parameter values as:

  • jobparam1=value1
  • jobparam2=cmdvalue2
  • jobparam3=cmdvalue3
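
Putting this together, a sketch of keeping the password off the command line; the project, job, and parameter names are placeholders:

# Create the parameter file so that only the owner can read it
umask 077
cat > /tmp/myjob.params <<'EOF'
dbuser=etl_user
dbpassword=secret
EOF

# The password never appears in the ps output for dsjob
dsjob -run -paramfile /tmp/myjob.params myproject myjob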

Connectivity Guide for Sybase – new environment variables and options

Question

What are the new environment variables and options that are available for Sybase Enterprise Edition stage?

Answer

The following environment variables are available for Sybase Enterprise Edition stage:

  1. Environment variables that affect Sybase IQ read operation:
    • APT_SYBASE_NULL_AS_EMPTY: Set this environment variable to extract null values as empty string. This environment variable has no effect when fixed width extract is done.
    • APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL: Prerequisite for APT_SYBASE_NULL_AS_EMPTY.
  2. Environment variables that affect Sybase IQ write operation:
    • APT_SYBASE_NULL_AS_EMPTY: Set this environment variable to load empty string as null value. To use this functionality, the IQ server should support the Load_ZeroLength_AsNull option and should be enabled.
    • APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL: Prerequisite for APT_SYBASE_NULL_AS_EMPTY.
  3. APT_SYBASE_PRESERVE_BLANKS: Set this environment variable to load blanks to varchar.
  4. APT_SYBASE_CONVERSION_ERROR: Set this environment variable to turn off conversion error warnings during load.
  5. APT_SYBASE_CHECKPOINT_OFF: Set this environment variable to disable checkpoint during load. This helps improve performance.
  6. APT_SYBASE_LOAD_MEMORY_MB: Use this environment variable to set the load memory, in MB.

For Sybase IQ read:
Fixedwidth: Set this option to extract rows as fixed width records.

For Sybase IQ write:
Fixedwidth: Set this option to load rows as fixed width ASCII. This option improves the performance of the load.

For Sybase ASE write:
Delimiter: Specify the column delimiter for the load. The delimiter can be a string of up to 30 characters in length.
Use Identity: Set this option to false to load identity columns from incoming data.
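
These variables are normally added as user-defined environment variables at the project or job level; if you prefer to set them engine-wide, a sketch of dsenv entries might look like this (the value 1 and the 512 MB figure are assumptions, not documented defaults):

# Example dsenv entries; restart the DataStage engine after editing dsenv
APT_SYBASE_NULL_AS_EMPTY=1; export APT_SYBASE_NULL_AS_EMPTY
APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL=1; export APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL
APT_SYBASE_LOAD_MEMORY_MB=512; export APT_SYBASE_LOAD_MEMORY_MB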

How do you modify the default auto log purge schedule in Information Server?

Question

How do you modify the default auto log purge schedule in Information Server?

Cause

Default auto log purging deletes events until only 2000 events remain. In some cases it is necessary to keep more log events, such as when debugging is turned on.

Answer

The following procedure can be used to change how many events will be deleted and allows more log events to be viewed from the Information Server Web Console.

The LoggingAdmin.sh under the folder <InformationServer_Dir>/ASBServer/bin/ can be used to modify the auto log purge schedule:

./LoggingAdmin.sh -user <suiteAdmin> -password <password> -modify -schedule -name "Log event deletion routine" -threshold <newThreshold> -percentage <newPercentage> -frequency -minutes 30

where threshold is the minimum number of log events that must exist in the database before any log events are purged.

percentage governs the number of events to delete:
numberOfEventsToDelete = totalEvents - (((100 - percentage) * threshold)/100)

For example, if you execute:

./LoggingAdmin.sh -user <suiteAdmin> -password <password> -modify -schedule -name "Log event deletion routine" -threshold 15000 -percentage 75 -frequency -minutes 30

when the log purge task runs and the number of events is greater or equal to 15,000, 11,250 events will be purged, which is 75% of 15,000. 3,750 events will remain, which is 25% of 15,000.

Note: The default value for threshold is 10,000 and the default value for percentage is 80.
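
As a quick sanity check of the formula, the following shell arithmetic reproduces the example above (threshold 15000, percentage 75, 15000 total events):

threshold=15000; percentage=75; totalEvents=15000
remaining=$(( (100 - percentage) * threshold / 100 ))
toDelete=$(( totalEvents - remaining ))
echo "purge $toDelete events, keep $remaining"   # purge 11250 events, keep 3750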

Can I copy the DSParams file of an InfoSphere DataStage Project to replicate its environment variables

Question

Can I copy the DSParams file of an InfoSphere DataStage Project to replicate its environment variables?

Answer

The DSParams file contains the values for all the parameters and environment variables of a DataStage project. The structure of this file changes between releases of DataStage. As a result, copying a DSParams file from a different release is not a valid procedure and should never be attempted. Doing so will cause unexpected behavior in DataStage, including abnormal terminations. Depending on the version of DataStage that you are using, there are different methods that you can use to copy environment variables.

For Information Server 8.5 and higher

Starting with version 8.5, the Administrator client allows you to export environment variables to an *.env file. This file can then be imported into a different project. This is the supported way of copying environment variables across projects. For more information, refer to the online documentation.

For Information Server 8.1 and lower

Before 8.5, DataStage did not include any functionality to export and import variables. If you have a large number of variables that you need to copy between projects and don’t want to manually create or set these values in Administrator, then you can manually edit or replace the DSParams file if you follow these recommendations:

  • Before you edit or replace the DSParams, always take a backup in case you need to roll back the changes.
  • Never copy a DSParams file from a project running on a different version of DataStage or on a different platform, as this file may introduce or omit parameters or variables that don’t apply to your current version. Only copy DSParams files from projects running on the exact same DataStage version and platform.
  • After editing or replacing the DSParams file, restart the Engine to reflect the changes. Then confirm the values are correct using Administrator.

How do I change dsadm password in IBM InfoSphere Information Server?

Question

How do I change the password for the dsadm DataStage (DS)/Information Server (IS) user?

Answer

Check whether dsadm is used as the default DataStage and QualityStage credentials:

  1. Log on to IS Web Console
  2. Administration > Domain Management > DataStage Credentials
  3. Select the server
  4. Click on Open Configuration
  5. Then go back and click on Open My Credentials
  6. If dsadm is filled in the above two windows, then you are using dsadm for the credentials.

If you are using OS authentication and use dsadm as the default DataStage and QualityStage credentials, please perform the following:

  1. Change dsadm password at the OS level (on Unix/Linux box).
  2. Log on to the IS Web Console using the isadmin account. Change the dsadm password in: Administration > Domain Management > DataStage Credentials. Select the server, then:
    a. Open Configuration
    b. Open My Credentials
    In other words, change the dsadm password in both the Open Configuration window and the Open My Credentials window.

If you are using OS authentication and do not use dsadm as the default DataStage and QualityStage credentials, you only have to change the dsadm password at the OS level (Step 1 above).

Manually Starting Resource Tracker in InfoSphere Information Server

Question

What are the procedures for starting the resource tracker daemon without shutting down all other daemons and restarting?

Answer

The resource_tracker daemon is started by the jobmoninit script which resides in the PXEngine/java directory, and is run by the ds.rc script which resides in the DSEngine/sample directory whenever the DataStage Engine (DSEngine) is started and stopped.

There are three ways to shutdown the resource_tracker:

  1. /opt/IBM/InformationServer/Server/DSEngine/bin/uv -admin -stop
    This will stop the DSEngine, the uvrpc daemon, the JobMonApp, and the resource_tracker.
  2. /opt/IBM/InformationServer/Server/PXEngine/java/jobmoninit -stop
    This will stop the JobMonApp and the resource_tracker.
  3. There is also a binary in the PXEngine bin directory named shutdown_trackers which will shut down only the resource_tracker(s):
    /opt/IBM/InformationServer/Server/PXEngine/bin/shutdown_trackers [-local | -configFile $CONFIG_FILE]

Note 1: You must be the product owner (dsadm), you must have sourced DSEngine/dsenv, and you MUST set APT_ORCHHOME to reference the location of the PXEngine directory.

Note 2: You must have LD_LIBRARY_PATH / LIBPATH / SHLIB_PATH set in the environment to include the PXEngine/lib directory if you use option 3  (shutdown_trackers).
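
Putting Notes 1 and 2 together, a sketch of the setup before running option 3 might look like the following; the paths assume a default /opt/IBM/InformationServer installation, and the LD_LIBRARY_PATH line should be replaced with LIBPATH on AIX or SHLIB_PATH on HP-UX:

# Run as the product owner (dsadm)
. /opt/IBM/InformationServer/Server/DSEngine/dsenv
export APT_ORCHHOME=/opt/IBM/InformationServer/Server/PXEngine
export LD_LIBRARY_PATH=$APT_ORCHHOME/lib:$LD_LIBRARY_PATH
/opt/IBM/InformationServer/Server/PXEngine/bin/shutdown_trackers -local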

InfoSphere Information Server : Format of the “ISFile” file that contains the connection details for the DSXImportService script

Question

The script DSXImportService (DSXImportService.sh or DSXImportService.bat) allows you to import a .dsx file from the Engine or Client tier into the IBM InfoSphere Information Server repository.
The option “-ISFile <filename>” allows to store connection parameters in a sequential file.

What is the format of this external parameter file specified by the option “-ISFile” on the command line?

Answer

The “-ISFile <filename>” command line option reads the specified file as a Java properties file to obtain values for the property names “-ISHost”, “-ISUser” and “-ISPassword”, each of which has to be on a separate line. Thus, the format of the file must be the following:
cat isfile.txt
-ISHost hostname:9080
-ISUser dsadm
-ISPassword dsadm
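
As a usage sketch, the import might then be invoked as shown below; the -ISFile option is described above, while the project and .dsx file options are recalled from memory and should be verified against the command reference linked under Related information:

./DSXImportService.sh -ISFile isfile.txt -DSProject myproject -DSXFile /tmp/myjob.dsx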

Related information

Commands for importing from .dsx files

Rules for handling Nulls in a transformer for InformationServer DataStage Version 8.5+

Question

How has InformationServer DataStage changed null handling in Version 8.5?

Answer

Null handling has changed in Version 8.5:

  • Null values can now be included in any expression
  • Null values no longer need to be explicitly handled
  • A null value in an expression will return a null value result
  • As long as the target column is nullable, records will not be dropped
  • Stage variables are now always nullable
  • Stage variable derivations can now include any input fields

Backward Compatibility
Set APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING=1, as a user-defined environment variable, to use the previous null handling mechanism at the project level
OR
Use the check box per Transformer Stage to use previous null handling for that transformer

Null in Expressions:
(Any operation involving null will return null)
1 + null = null
“abc”:null = null
trim(null) = null

exceptions:
IsNull(null) and IsNotNull(null) return true or false as expected

Any comparison involving null is unknown (i.e. is not true)

1 < null is not true
1 >= null is not true
null == null is not true
null != null is not true
Logical operators – null is unknown (i.e. is not true)
True AND null is not true
True OR null is true
False OR null is not true

How to check if the scheduler is running and how to monitor the scheduler in IBM InfoSphere Information Server

Question

Where in the WAS console can we check whether the scheduler is running or stopped?

Can you please provide additional details on schedule monitoring?

Answer

Regarding the ISF scheduling service, it is part of the ACS_server.ear application which can be viewed from the WAS Admin Console at:

Applications > Application Types > WebSphere enterprise applications

A green arrow means the ACS_server application is running. Note that there are a number of services in ACS_server.ear. If it was not running, a lot more than just the scheduling service would be failing.

Information about ISF schedules can be viewed using a Scheduling View, created in the IS Web Console:

  1. Select Administration > Schedule Monitoring > Views of Schedules
  2. Select ‘New Scheduling View’
  3. Fill in ‘Name’ and ‘Description’ fields
  4. Change ‘Access’ to ‘Shared’ if you want to allow other users to see the scheduling view. Note: The rest of the fields are for filtering which schedules are viewable through this schedule view. If you leave the default values, all schedules will be shown.
  5. Select ‘Save and Close’ when all changes have been made.
  6. Back on the ‘Views of Schedules’ page, select the Scheduling View just created and then select ‘View Schedules’. This lists schedules according to the filter criteria set in the Scheduling View.

From the list of schedules, there are several tasks, listed on the right, that can be performed:

Start – allows the task to run at the scheduled time
Stop – prevents the task from running
View Task – lists details about the scheduled task
View Complete – lists task runs that have been completed
View Running – lists tasks currently in the running state
View Forecast – lists when tasks are scheduled to run in the future
Purge – removes run history of tasks which have already executed

Also, note that the ISF scheduling service purges task run history, so not all task histories will be available.

How do I monitor ASBAgent and Logging Agent in IBM InfoSphere Information Server?

Question

Can you please advise on how we can monitor these processes (ASB Agent and Logging Agent), whether our syntax for doing so is correct, and whether it is in fact normal that at times not both of these processes have a Unix/Linux PID?

Cause

Current script may contain checks similar to:

{
echo "Check if ASB (Application Service BackBone) Agent and Logging Agent are running"
if [[ $(ps -ef | grep -i /opt/IBM/InformationServer/ASBNode/apps/jre/bin/java | grep -vc grep) -eq 2 ]]; then
    echo " **ASBAgent and Logging Agent are running"
else
    echo " !! FAILURE FOR ASBAgent and Logging Agent CHECK"
    status_flag="1"
fi
}

Answer

When you check for the ASBAgent and Logging Agent by issuing ps -ef | grep -i agent, there will be two agents running for Information Server.

For example, if the agents are running, you should see output similar to the following:

root 1392838 1 0 Oct 10 – 169:48 /opt/IBM/InformationServer/ASBNode/bin/RunAgent -Xbootclasspath/a:conf -Djava.ext.dirs=apps/jre/lib/ext:lib/java -Djava.class.path=conf -Djava.security.auth.login.config=/opt/IBM/InformationServer/ASBNode/conf/auth.conf -Dcom.ibm.CORBA.ConfigURL=file:/opt/IBM/InformationServer/ASBNode/conf/sas.client.props -Djava.util.logging.manager=com.ibm.ws.bootstrap.WsLogManager -Djava.util.logging.configureByServer=true -class com/ascential/asb/agent/impl/AgentImpl

root 1302646 1 0 Oct 10 – 16:10 ./../apps/jre/bin/java -Djava.security.auth.login.config=./../conf/auth.conf -Dcom.ibm.CORBA.ConfigURL=file:./../conf/sas.client.props -Djava.util.logging.manager=com.ibm.ws.bootstrap.WsLogManager -Djava.util.logging.configureByServer=true -classpath ./../conf:./../lib/java/ACS_client.jar:./../lib/java/ACS_common.jar:./../lib/java/ASB_utils.jar:./../lib/java/ASB_util_client.jar:./../lib/java/WASClient.jar com.ascential.acs.logging.agent.LoggingAgentSocketImpl

In order to specifically identify the two agents for Information Server, issue the following commands:
ps -ef | grep -i AgentImpl

ps -ef | grep -i LoggingAgentSocketImpl
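
Building on the two commands above, a more precise version of the check script could look like this sketch (adapt the status handling to your monitoring framework):

echo "Check if ASB Agent and Logging Agent are running"
asb=$(ps -ef | grep -i AgentImpl | grep -vc grep)
logagent=$(ps -ef | grep -i LoggingAgentSocketImpl | grep -vc grep)
if [ "$asb" -ge 1 ] && [ "$logagent" -ge 1 ]; then
    echo " **ASBAgent and Logging Agent are running"
else
    echo " !! FAILURE FOR ASBAgent and Logging Agent CHECK"
    status_flag="1"
fi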

New Oracle Connector Properties

Question

What are the new properties that are added to the Oracle Connector?

Answer

The new DRS Connector stage uses the Oracle Connector stage to access Oracle databases. The following properties have been added in the Oracle Connector in order to support the DRS Connector. These new properties in the Oracle Connector are available after installing DRS Connector patch on top of IBM InfoSphere Information Server 8.5.

“Insert new rows only” Write mode

The Write mode property specifies the mode of work for the connector when writing records to the target database. The property provides a list of modes for the user to choose from. This list was enhanced with the new write mode option called Insert new rows only.

The new write mode behaves similarly to the existing Insert write mode. The difference between the two is in how they handle records that could not be written to the database due to a primary key or unique constraint:

  • In case of the Insert write mode, this condition is treated as an error. If a reject link is defined and configured to accept records that failed for this reason, then the records are rejected; otherwise a fatal error message is logged and the job stops.
  • In case of the Insert new rows only write mode, the records that could not be written to the database because of a primary key or unique constraint are ignored and the connector proceeds to process the remaining records. Any error other than a primary key or unique constraint violation still results in logging a fatal error message and stopping the job.

The following example compares the Insert, Insert then update and Insert new rows only write modes and illustrates their similarities and differences.

Presume that the database table TABLEX has primary key column COLA and contains the following rows:

COLA (PK)   COLB
2           OLD VALUE TWO
5           OLD VALUE FIVE
6           OLD VALUE SIX

Presume that the Oracle Connector stage is configured to write data to this table and that the records arriving on the input link of the stage are as follows:

COLA (Key)  COLB
1           NEW VALUE ONE
2           NEW VALUE TWO
3           NEW VALUE THREE
6           NEW VALUE SIX

If the Write mode property in the stage is set to Insert, the job will fail when it tries to insert row COLA=2 because a row with the same primary key value already exists in the table.

If the Write mode property is set to Insert then update, the rows COLA=1 and COLA=3 will be inserted, and the rows COLA=2 and COLA=6 will be updated. When the job completes the table will contain the following rows:

COLA (PK)   COLB
1           NEW VALUE ONE
2           NEW VALUE TWO
3           NEW VALUE THREE
5           OLD VALUE FIVE
6           NEW VALUE SIX

If the Write mode property is set to Insert new rows only, the rows COLA=1 and COLA=3 will be inserted but the existing rows COLA=2 and COLA=6 will be ignored. When the job completes the table will contain the following rows:

COLA (PK)   COLB
1           NEW VALUE ONE
2           OLD VALUE TWO
3           NEW VALUE THREE
5           OLD VALUE FIVE
6           OLD VALUE SIX

Notice that the Insert new rows only write mode produced the same results in the target table that would have been produced with the Insert write mode had a reject link been defined for the stage and configured to accept rows that cannot be inserted into the table due to a constraint violation. Namely, with the Insert new rows only write mode such rows are ignored, whereas with the Insert write mode such rows need to be sent to the reject link in order for the job to be able to continue to process the remaining rows.

Fail on row error

The Fail on row error property specifies whether the connector should log a fatal error message and stop the job when an error occurs while writing a record to the database. The property is not available if the Write mode property is set to Bulk load. The allowed values for the property are Yes and No. Their meaning is as follows:

  • When the property is set to No and a record could not be written to the database, the connector logs a warning message and continues processing the remaining input records.
  • When the property is set to Yes and a record could not be written to the database, the connector logs a fatal error and the job stops.

The default value for the property depends on the type of job in which the connector stage is running:

  • For parallel jobs the default value is Yes. Note that if a reject link is defined for the stage, this property is not available and automatically defaults to Yes.
  • For server jobs the default value is No. By default, if an error occurs when writing a record to the database, a warning message is logged and the job continues. If the input link to the Oracle Connector stage is coming from a Transformer stage which has been configured to reject the rows that the Oracle Connector stage could not write to the database, the Fail on row error property must be set to No in order to allow the Transformer to route those rows to the reject link.

Log multiple matches

The Log multiple matches property is available in the Oracle Connector stage only when the stage is running in a server job and in the lookup mode of operation. In this mode one or more reference links connect the Oracle Connector stage with the Transformer stage.

This property specifies whether the connector should log a message when the lookup statement returns multiple matching records for the input key record. The checking is performed for each input record separately. Note that even if the lookup statement in the connector returns multiple rows, only the first row is provided by the connector on the reference link. This property only controls whether to log a message if such a situation occurs.

The supported values for the property are:

  • None: No message is logged for multiple matches.
  • Informational: A message of informational severity is logged.
  • Warning: A message of warning severity is logged.
  • Fatal: A message of fatal severity is logged and the job stops.

Note that when the Oracle Connector stage is running in a parallel job and in lookup mode, it is connected with a reference link to the Lookup stage, and the Lookup stage provides support for handling multiple lookup matches.

Preserve trailing blanks

The Preserve trailing blanks property specifies whether the stage should preserve trailing whitespace characters in the text field values of the records that it receives from the framework and then passes to the database. It is therefore applicable to columns on the input link and key columns on the reference link, when those columns are of one of the character data types, such as VarChar or NVarChar.

The supported values for this property are Yes (default) and No, and their meaning is as follows:

  • When the property is set to Yes, the trailing whitespace characters are treated like any other characters. They are preserved along with the other characters and the data is passed to the database in its original form. This is the default behavior for the connector.
  • When the property is set to No, the stage removes trailing whitespace characters from the text field values that it receives from the framework. The trimmed values are passed to the database. Any leading whitespace characters in the values are preserved.

This property is available for all modes specified in the Write mode property, including the Bulk load mode.

Manual mode
This Manual mode property is enabled when the Write mode property is set to Bulk load. The property is located under the Bulk load options property and accepts values Yes and No. The meaning of the values is as follows:

  • When set to Yes, the stage does not load the data directly into the Oracle database table; instead it creates control and data files which can then be used to load the data into the database using the Oracle SQL*Loader utility.
  • When set to No, the connector loads the data directly into the database table using the Oracle direct path load. The default value for the property is No.

Setting the Manual mode property to Yes enables a number of additional properties under it:

Directory for data and control files

This property specifies the directory to which the connector saves the control and data files that it generates for a manual load. This property does not have a default value and the value must be provided by the user. If the connector fails to open the specified directory, it logs a fatal message and the job stops.

Control file name

This property specifies the name of the control file. The stage generates this file and stores it in the directory specified in the Directory for data and control files property. The property does not have a default value. If the control file name is not specified, the connector generates the name automatically in the form servername_tablename.ctl, where servername is the value specified in the Server connection property and tablename is the value specified in the Table name property. If the connector fails to save the control file under the specified file name, it logs a fatal message and the job stops.

Data file name

This property specifies the name of the data file. The stage generates this file and stores it in the directory specified in the Directory for data and control files property. The property does not have a default value. If the data file name is not specified, the connector generates the name automatically in the form servername_tablename.dat, where servername is the value specified in the Server connection property and tablename is the value specified in the Table name property. If the connector fails to save the data file under the specified file name, it logs a fatal message and the job stops.

Load options
This property specifies the bulk load options that the connector should include in the generated control file. The value contains parameters that are passed to the Oracle SQL*Loader utility when it is invoked to process the generated control and data files.

The property value needs to conform to the syntax for the OPTIONS clause for Oracle SQL*Loader. The default value is:
OPTIONS(DIRECT=FALSE,PARALLEL=TRUE)

The DIRECT=FALSE parameter tells the Oracle SQL*Loader to use the conventional path load instead of the direct path load. The PARALLEL=TRUE parameter tells it that the data can be loaded in parallel from multiple concurrent sessions.

Refer to the Oracle product documentation for information about these options and other available options.

Note that the word OPTIONS and the parentheses must be included in the value specified for the property. The connector saves this property value in its original form to the generated control file and does not check its syntax.
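
When Manual mode is used, the generated files are later processed with SQL*Loader outside of DataStage. A hedged sketch of such an invocation, using the default file naming described above and a placeholder connect string:

# Run from the 'Directory for data and control files'
cd /data/oracle_manual_load
sqlldr userid=scott/tiger@ORADB control=myserver_mytable.ctl log=myserver_mytable.log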

Deleting *.log, *.out, *.err, *.txt files in ASBNode in IBM InfoSphere Information Server

Question

Is it safe to delete the asbagent_startup.err file and other files from ASBNode directory?

There are several files in my IBM Information Server installation and they are taking up too much disk space.

Answer

You can safely maintain (remove) the following files from ASBNode/ or ASBNode\ and ASBNode/bin/ or ASBNode\bin\:

  • *.log, *.out, *.err files
  • *.txt files named like:
    orbmsg.XXXXXXXX.YYYY.ZZ.txt
    orbtrc.XXXXXXXX.YYYY.ZZ.txt
  • old javacore*.txt and heapdump*.txt files

These files can be safely deleted, unless you are investigating problems where you need to review them.  If so, you may want to keep the ones with the timestamp that is relevant to those issues.
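
A housekeeping sketch; the installation path is a placeholder, and the list should be reviewed before anything is actually removed so that files needed for an open investigation are kept:

cd /opt/IBM/InformationServer/ASBNode
# Review the candidates first
ls -l *.log *.out *.err orbmsg.*.txt orbtrc.*.txt bin/*.log bin/*.out bin/*.err 2>/dev/null
# Then remove the ones that are no longer needed, for example:
# rm -f asbagent_startup.err bin/orbtrc.*.txt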

InfoSphere Information Server 8.5 Logging Subsystem Changes (Log Helper, Logging Agent, and log purging)

Question

What are the changes to the logging subsystem, in IBM InfoSphere Information Server, Version 8.5?

Answer

In InfoSphere Information Server, Version 8.5, several changes were made to the Logging Subsystem:

InfoSphere DataStage Administration Client

  • Option to enable or disable Operational Repository (OR) logging on a per project basis.
  • Options to limit the number and type of log events that are logged to the OR, per job run.

These features are described in the InfoSphere DataStage Administration Client documentation, but not described here.

Log Helper, Logging Agent, and purging logs

These features are described in the present technote.

  • Log Helper’s usage of temporary, intermediate log files for processing log events generated by an InfoSphere DataStage job.
  • Option to set a maximum log file size and create a rotating set of historical files for the Logging Agent log files.
  • Events that are purged by using the InfoSphere DataStage Director client are purged in both the RT and OR databases.
  • Events that are purged by using the InfoSphere Information Server Web Console or InfoSphere Information Server Console are purged only in the OR database.

Note: The present technote applies only to Version 8.5 and reflects changes after Version 8.1. For Version 8.1-specific information, see the technote “DataStage logging changes introduced at 8.1 can impact performance and can also result in missing log entries”.

Log Helper new features

When DataStage is configured for OR logging, log messages from running DataStage jobs are written to the Operational Repository using the Log Helper. In Version 8.1 and earlier, the Log Helper would connect to the Logging Agent to log events. However, if the Logging Agent, Logging Service, or Application Server was not available, the Log Helper would fail and subsequently cause the job to fail.

To remove this dependency, now in Version 8.5, the Log Helper queues log events to intermediate log files on the disk. The Logging Agent reads events from an intermediate log event file and sends the events to the Logging Service to insert them into the OR. Finally, after all events have been logged, the intermediate log event file is deleted from the disk.

The intermediate log event files are written to the <ISInstallDir>/ASBNode/logs directory, by default. You can change the location by modifying the logFilePath property, as described in the “Configuring Logging Agent properties” section, below.

Possible performance implications:

  • OR log event latency and engine disk usage can be affected, depending on how many events a job logs, and how many jobs are running simultaneously.
  • Because log events are first queued to the disk, there is some delay before the events are available for viewing in InfoSphere Information Server Web console. (Note: Viewing DataStage events using the Director Client will not be affected.)
  • Engine disk usage can be affected in terms of storage space and I/O activity.

Logging Agent new features

The Logging Agent logs informational, status and error messages to its own log files. In Version 8.5, the Logging Agent log files are maintained as a rotating historical set. You can set optional properties to specify the maximum size of each file and the maximum number of historical files to maintain, as described in the “Configuring Logging Agent properties” section, below.

Also, the Logging Agent log files were renamed in Version 8.5, as follows:

Log file names in Version 8.1: logging-agent.out, logging-agent.err
Log file names in Version 8.5: logging-agent-N.out, logging-agent-N.err
Note: N is the generation number (0-19 for the default limit of 20 files).

Configuring Logging Agent properties by using the RegistrationCommand script

The values for the following properties can be changed by using the RegistrationCommand script which is in ASBNode/bin (property names are case sensitive):

logFilePath=<ISInstallDir>/ASBNode/logs

    Type a fully qualified directory where the intermediate log files are written. The InfoSphere DataStage primary user group (dstage) must have read and write permissions to this directory. The default value for this directory is <ISInstallDir>/ASBNode/logs.

agentLogLimit=1000000

    Optional. Set the maximum size of Logging Agent log files to be retained (maximum size is 1 MB by default).

agentLogCount=20

    Optional. Set the maximum number of Logging Agent log files to be retained (maximum count is 20 files by default).

Syntax:
RegistrationCommand.<sh|bat> -setproperty <propertyName> 
-value <propertyValue> -app LoggingAgent -user <isAdminUserid> -pw <password> 

Where <propertyName> is one of the property names listed above, <propertyValue> is the value you want to set, and <isAdminUserid> and <password> are the credentials for a suite administrator.

Example:
To set the directory for the log files on a Microsoft Windows system, run the command:
RegistrationCommand.bat -setproperty logFilePath
-value "d:\InformationServer\LoggingAgent\logs" -app LoggingAgent
-user <isAdminUserid> -pw <password>

You can use a similar invocation of RegistrationCommand to change the values of the log file properties, agentLogLimit and agentLogCount.

You must restart the Logging Agent for the new properties to take effect. See “Starting services” in the IBM InfoSphere Information Server Administration Guide.

Log Purging

In Version 8.1, InfoSphere DataStage log events are written either to the RT database or the OR database (before Version 8.1 there was no option to write events to the OR database). In Version 8.5, the Infosphere DataStage Director always logs/queries to/from the RT database. A subset of InfoSphere DataStage events can also be logged to the OR database (which is viewable from the Web Console and Information Services Console).

When log events are purged by using the DataStage Director, they are purged from both the RT database and OR database. When log events are purged by using the Information Server Web Console or Information Services Console, they are purged only from the OR database.

How do you disable the Resource Tracker or move the logs to a different directory in IBM InfoSphere Information Server?

Question

Resource tracker files are accumulating in /tmp or %TEMP%. What are the steps for disabling the Resource Tracker or moving the output files to a different directory?

Cause

The Resource Tracker logs the processor, memory, and I/O usage on each computer that runs parallel jobs. The machine log files are created by the DataStage Resource Tracker program.

There will be machine log files in /tmp or %TEMP%.

The Resource Tracker uses a lot of memory and swap space, so you may want to consider disabling it or moving the logs it generates to a different directory.

Answer

The resource_tracker is controlled by a file located in PXEngine/etc named restrack_settings. The format is shown in the following example:
LOGGING_INTERVAL=4
RUN_TIMEOUT=0
LOG_INCREMENT_INTERVAL=86400
CLEANUP_INTERVAL=432000
HANDLE_DIRECTORY=/tmp
LOG_DIRECTORY=/tmp

Here the logging interval is set to 4, meaning the active log is written to every 4 seconds. The log increment interval is 86400 seconds, so the current log is closed and a new log started every 24 hours. The cleanup interval is set to 432000 seconds, so logs older than 5 days are deleted. Lastly, the handle and log directories are set to /tmp, which is where the pid and log files are written.

Files in /tmp:

resource_tracker.8001.log – activity log file
apt_resource_track.8001.wb-machine – current settings in effect and resource_tracker process pid
machineLog.8001.wb-machine.20080508100536 – machine log data file(s)

The file that contains the activity for the resource_tracker is named resource_tracker.8001.log. In this log, you can see how the log manager rolls over the log files after the defined time of 86400 seconds or 24 hours and deletes log files older than 432000 seconds or 5 days. Below is a snapshot of a resource_tracker log file:

##IIIS-DSEE-TRES-0000310:05:37(000)Resource Tracker Log
##IIIS-DSEE-TRES-0000810:05:37(001)Opening log file: /tmp/machineLog.8001.hawkcewin01.20080508100536.
##IIIS-DSEE-TRES-0000910:05:37(000)Closing log file.
##IIIS-DSEE-TRES-0000810:05:37(001)Opening log file: /tmp/machineLog.8001.hawkcewin01.20080509100537.
##IIIS-DSEE-TRES-0001010:05:37(002)Removing old log: machineLog.8001.hawkcewin01.20080502061733.
##IIIS-DSEE-TRES-0001010:05:37(003)Removing old log: machineLog.8001.hawkcewin01.20080503061735.
##IIIS-DSEE-TRES-0001010:05:37(004)Removing old log: machineLog.8001.hawkcewin01.20080504061737.
##IIIS-DSEE-TRES-0000910:05:38(000)Closing log file.
##IIIS-DSEE-TRES-0000810:05:38(001)Opening log file: /tmp/machineLog.8001.hawkcewin01.20080510100538.
##IIIS-DSEE-TRES-0001010:05:38(002)Removing old log: machineLog.8001.hawkcewin01.20080505061739.
##IIIS-DSEE-TRES-0000910:05:42(000)Closing log file.
##IIIS-DSEE-TRES-0000810:05:42(001)Opening log file: /tmp/machineLog.8001.hawkcewin01.20080511100542.

You can either move the location of the files or disable the resource_tracker from starting.

  • In order to move the location to which the files are generated, edit the restrack_settings file in the PXEngine/etc directory. Then, restart the DataStage Engine for the settings to take effect.
  • To disable the resource_tracker from starting:
    Unix/Linux platforms: Add the following to the dsenv file:
    APT_DISABLE_TRACKER_STARTUP=1; export APT_DISABLE_TRACKER_STARTUP
    Then, restart the DataStage Engine for the settings to take effect.
    Or, add the following to the System Environment variables:
    APT_DISABLE_TRACKER_STARTUP=1
    Then, restart the DataStage Engine for the settings to take effect.
    Windows platforms: Add the following to the System Environment variables:
    APT_DISABLE_TRACKER_STARTUP=1
    Then, reboot the machine.

The resource_tracker is started by the same script that starts the JobMonApp. The script is located in the PXEngine/java directory and it is named jobmoninit. In the jobmoninit script there is a bit of code that looks like this:

# startup resource tracker

if [ "$APT_DISABLE_TRACKER_STARTUP"X = "X" ] ; then
if [ -z "$rtlist" ] ; then

So, if APT_DISABLE_TRACKER_STARTUP is set, it will not start the resource_tracker here either.

On Unix (but NOT Windows) it is possible to have two instances of DataStage installed and running on the same Engine tier machine. However, multiple instances of the Resource Tracker will interfere with each other unless some adjustments are made. Users can either:

  1. Disable one instance of the resource tracker by setting APT_DISABLE_TRACKER_STARTUP in the dsenv file for one of the DataStage instances
  2. Edit the PXEngine/etc/restrack_settings file for one of the instances and change the following values:
    HANDLE_DIRECTORY=/tmp
    LOG_DIRECTORY=/tmp
    so that they point to a different directory than the values used by the other instance. Be sure that the directories exist and have the right permissions.
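
For option 2, the adjusted restrack_settings for the second instance might look like the following sketch; the directory is an example and must already exist with the correct permissions:

LOGGING_INTERVAL=4
RUN_TIMEOUT=0
LOG_INCREMENT_INTERVAL=86400
CLEANUP_INTERVAL=432000
HANDLE_DIRECTORY=/var/opt/ds_instance2/restrack
LOG_DIRECTORY=/var/opt/ds_instance2/restrack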

InfoSphere Information Server DataStage Session Monitor

Question

In this technote:
http://www.ibm.com/support/docview.wss?rs=14&uid=swg21402667
it mentions the use of DataStage Session Monitor.
How does the DataStage Session Monitor get enabled?

Answer

To enable the DataStage Session Monitor, follow these steps:

  • configure the value “MaxAllowedMissedHeartbeatCount” with a positive value in the isfconfig.properties file, for example, 2
      This indicates that only 2 missed heartbeats are allowed prior to a session being taken down. Each client has a different heartbeat which can range from 30 to 60 seconds.
      This file is located under the directory, /IBM/WebSphere/AppServer/profiles/default/classes
  • Restart WebSphere Application Server
  • The DataStage engine can be restarted, but this is not required
  • After restarting WebSphere Application Server, the following should show in the SystemOut.log (located under the directory, /IBM/WebSphere/AppServer/profiles/default/logs/server1):
    [ISF-DEFAULT] [] Session Monitor has been enabled.

Related information

Session Monitor Documentation

Rebuild Repository Indexes in IBM InfoSphere Information Server

Question

How do I run Rebuild Repository Indexes on Windows? Unix?

Answer

Windows:

  1. Open the DataStage Administrator
  2. Under the Projects tab, select the project and click the Command button.
  3. In the command window, enter the command DS.TOOLS (case sensitive) and click Execute.
  4. A menu of DS.TOOLS options is displayed.
  5. Enter the number 2 in the input field at the bottom, and click the Respond button.
  6. Uncheck the “Pause output at end of page” checkbox, and click Next. The rebuild runs and its output is displayed when complete.
  7. Click Respond when complete.

Unix/Linux:

  1. Backup the project
  2. cd `cat /.dshome`
  3. . ./dsenv
  4. ./bin/uvsh
  5. LOGTO project_name
  6. DS.TOOLS
  7. Select option 2 Rebuild Repository indexes
    The second option requires the least amount of effort but you need to ensure that you either have an OS backup of the project directory or an export of the project before you begin.

Information Server 8.x transformer stage functions @INROWNUM and DSLinkRowCount return different results from the previous versions of DataStage

Question

Why do the functions @INROWNUM and DSLinkRowCount no longer start at 0 in Information Server 8.x versions?

Answer

The design intent was that row counts should start at 1. It was found that the results were not consistent across different retrieval methods, for example using DSLinkRowCount in a derivation worked correctly, but was wrong if used in a stage variable. The change seen is due to a fix, ecase 82865. The fix standardized the behavior of DSLinkRowCount and @INROWNUM to start from one in both cases, for transformers with no input links.

Starting IBM WebSphere Application Server (Linux, UNIX)

Starting a stand-alone WebSphere Application Server configuration:
  1. Log in to the computer that hosts the services tier. Use the following credentials:
    • If you configured WebSphere Application Server for non-root administration, use the credentials for the non-root user that is configured to administer WebSphere Application Server.
    • If you did not configure WebSphere Application Server in this manner, log in as root.
  2. Run the following commands:
    cd /opt/IBM/InformationServer/ASBServer/bin
    ./MetadataServer.sh run
    Note: The run argument echoes all output to the console.

    Alternatively, if you want to embed this script in another script, use the MetadataServer.sh start command to launch the start process in the background:

    cd /opt/IBM/InformationServer/ASBServer/bin
    ./MetadataServer.sh start

    If you configured WebSphere Application Server for non-root administration, you can use this line in your script:

    /usr/bin/su - wasadmin -c "/opt/IBM/InformationServer/ASBServer/bin/
       MetadataServer.sh start"
  3. On each computer, start the following services: ASB Agent, Logging Agent, and DataStage® Service.
    1. Wait until WebSphere Application Server is fully started.
    2. Log in to each computer that hosts an engine tier. Use the following credentials:
      • If you configured the InfoSphere Information Server agents for non-root administration, use the credentials for the administrator user that you selected.
      • If you did not configure the agents in this manner, log in as root.
    3. Run the following command to source the dsenv file:
      . /opt/IBM/InformationServer/Server/DSEngine/dsenv
    4. Make sure that the /.dshome file contains the current engine location. UNIX systems support multiple instances of InfoSphere DataStage. The bin/uv -admin -start command starts the instance of InfoSphere DataStage that is in the /.dshome file.
    5. Run the following commands to start the InfoSphere DataStage services:
      cd /opt/IBM/InformationServer/Server/DSEngine
      ./bin/uv -admin -start
    6. Run the following commands to start the ASB Agent and the Logging Agent:
      cd /opt/IBM/InformationServer/ASBNode/bin
      ./NodeAgents.sh start

Environment Variables and using Parameter Sets — Infosphere information Server 8.5

September 1, 2012 Leave a comment

In this article we will see how we can use parameter sets to define environment variables in parallel jobs, depending on the job size and the volume of the data.

In the Administrator tool, environment variables are set project-wide, and we can override them by adding the needed environment variables to the DataStage parallel job.

Usually, in the DataStage Designer, we add environment variables to the job one at a time from the job properties.

I found that adding environment variables through the GUI is not user friendly: you have to select one parameter at a time, which is a time-consuming process.

To overcome this, we can use the new parameter sets feature to add environment variables.

GridJobs:

If you have grid enabled, then create parameter sets PS_GRID_SMALL_PX_JOB, PS_GRID_MEDIUM_PX_JOB and PS_GRID_LARGE_PX_JOB and add the following environment variables to each parameter set.

1) APT_GRID_QUEUE

The name of the resource manager queue to which a job is submitted.

2) APT_GRID_COMPUTENODES

The number of compute nodes to request from the resource manager for the job.

High Volume Jobs:

It is always a good idea to add the following environment variables to high volume parallel jobs:

PS_HIGH_VOLUME_PX_JOBS

  • $APT_CONFIG_FILE: lets you define the biggest configuration file with the largest number of nodes. (If grid is enabled, we can remove this parameter.)
  • $APT_SCORE_DUMP: when switched on it creates a job run report that shows the partitioning used, degree of parallelism, data buffering and inserted operators.  Useful for finding out what your high volume job is doing.
  • $APT_PM_PLAYER_TIMING: this reporting option lets you see what each operator in a job is doing, especially how much data they are handling and how much CPU they are consuming.  Good for spotting bottlenecks.

One way to speed up very high volume jobs is to pre-sort the data and make sure it is not resorted in the DataStage job.  This is done by turning off auto sorting in high volume jobs:

  • APT_NO_SORT_INSERTION: stops the job from automatically adding a sort operation to the start of a job that has stages that need sorted data, such as Remove Duplicates. You can also add a sort stage to the job and set it to a value of “Previously Sorted” to avoid this in a specific job path.

Buffering is another thing that can be tweaked. It controls how data is passed between stages; usually you just leave it alone, but on a very high volume job you might want custom settings:

  • APT_BUFFER_MAXIMUM_MEMORY: Sets the default value of Maximum memory buffer size.
  • APT_BUFFER_DISK_WRITE_INCREMENT: For systems where small to medium bursts of I/O are not desirable, the default 1 MB write-to-disk chunk size may be too small. APT_BUFFER_DISK_WRITE_INCREMENT controls this and can be set larger than 1048576 (1 MB). The setting may not exceed max_memory * 2/3.
  • APT_IO_MAXIMUM_OUTSTANDING: Sets the amount of memory, in bytes, allocated to a WebSphere DataStage job on every physical node for network communications. The default value is 2097152 (2MB). When you are executing many partitions on a single physical node, this number may need to be increased.
  • APT_FILE_EXPORT_BUFFER_SIZE: if your high volume jobs are writing to sequential files you may be overheating your file system, increasing the size of this value can deliver data to files in bigger chunks to combat long latency.

Small Jobs:

$APT_CONFIG_FILE: lets you define a small configuration file with fewer nodes. (If grid is enabled, we can remove this parameter.)
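
For illustration, a minimal single-node configuration file that $APT_CONFIG_FILE could point to for small jobs; the host name and resource paths are placeholders:

{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
  }
}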

$APT_EXECUTION_MODE: By default, the execution mode is parallel, with multiple processes. Set this variable to one of the following values to run an application in sequential execution mode: ONE_PROCESS, MANY_PROCESS and NO_SERLIALIZE

 

If we have many stages in a parallel job, we can use the $APT_THIN_SCORE parameter.

  • APT_THIN_SCORE: Setting this variable decreases the memory usage of steps with 100 operator instances or more by a noticeable amount. To use this optimization, set APT_THIN_SCORE=1 in your environment. There are no performance benefits in setting this variable unless you are running out of real memory at some point in your flow or the additional memory is useful for sorting or buffering. This variable does not affect any specific operators which consume large amounts of memory, but improves general parallel job memory handling.

 

Vertica database installation steps on a two node cluster (RedHat 5.8 x86_64)

July 6, 2012 Leave a comment

Vertica is a high-performing, advanced RDBMS that is very simple to install and administer, thanks to its modern design and purpose-built architecture.
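The preparatory steps themselves are out of scope for this post, but on RedHat 5 they typically look something like the hedged sketch below, run as root on each node. The host-to-IP mapping is inferred from the installer output further down; the authoritative list of prerequisites is in the Installation Guide.

# /etc/hosts: every node must resolve every other node in the cluster
192.168.2.172   vertica1.ukatru.com vertica1
192.168.2.173   vertica2.ukatru.com vertica2

# keep clocks in sync across the cluster
service ntpd start ; chkconfig ntpd on

# SELinux and the local firewall are normally disabled (or the Vertica ports opened explicitly)
setenforce 0
service iptables stop ; chkconfig iptables off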

Once we have executed all the preparatory steps on the database servers and downloaded the Vertica software as per the Installation Guide, we start the installation process on a two node cluster (vertica1, vertica2):

[root@vertica1 tmp]# rpm -ivh vertica-ce-6.0.0-1.x86_64.RHEL5.rpm
Preparing… ########################################### [100%]
1:vertica-ce ########################################### [100%]

Vertica Analytic Database V6.0.0-1 successfully installed on host vertica1.ukatru.com

———————————————————————————-
Important Information
———————————————————————————-
If you are upgrading from a previous version, you must backup your database before
continuing with this install. After restarting your database, you will be unable
to revert to a previous version of the software.
———————————————————————————-

To download the latest Vertica documentation in zip or tar format please visit the
myvertica web site.

To complete installation and configuration of the cluster,
run: /opt/vertica/sbin/install_vertica
[root@vertica1 tmp]# /opt/vertica/sbin/install_vertica -s vertica1.ukatru.com,vertica2.ukatru.com -r vertica-ce-6.0.0-1.x86_64.RHEL5.rpm
Vertica Analytic Database 6.0.0-1 Installation Tool
Starting installation tasks…
Getting system information for cluster (this may take a while)….
Enter password for root@192.168.2.173 (2 attempts left):
Checking/fixing OS parameters…..

Setting vm.min_free_kbytes to 4096 …
Creating/Checking Vertica DBA group

Creating/Checking Vertica DBA user

Installing/Repairing SSH keys for dbadmin

Modifying existing Vertica DBA user

Creating Vertica Data Directory…

Testing N-way network test. (this may take a while)
All hosts are available …
Verifying system requirements on cluster.
IP configuration …
IP configuration …
Testing hosts (1 of 2)….

Running Consistency Tests
LANG and TZ environment variables …
Running Network Connectivity and Throughput Tests…
Waiting for 1 of 2 sites… …

Consistency Test (ok)
=========================

Info: The $TZ environment variable is not set on 192.168.2.172
Info: The $TZ environment variable is not set on 192.168.2.173

Network Test (ok)
=====================

Network communication (ok)
——————————
Low throughput 192.168.2.172 to 192.168.2.173: 88.8967450338 Mbps; check network interface/switch configuration
Low throughput 192.168.2.173 to 192.168.2.172: 87.0492314477 Mbps; check network interface/switch configuration
Updating spread configuration…
Verifying spread configuration on whole cluster.
Creating node node0001 definition for host 192.168.2.172
… Done
Creating node node0002 definition for host 192.168.2.173
… Done
Error Monitor 0 errors 0 warnings
Installation complete.

To create a database:
1. Logout and login as dbadmin.**
2. Run /opt/vertica/bin/adminTools as dbadmin
3. Select Create Database from the Configuration Menu

** The installation modified the group privileges for dbadmin.
If you used sudo to install vertica as dbadmin, you will
need to logout and login again before the privileges are applied.
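As an alternative to the interactive menu, the database can also be created from the command line with the adminTools create_db tool. A hedged sketch, run as dbadmin; the database name, password, and directory paths are placeholders:

/opt/vertica/bin/admintools -t create_db \
    -d testdb \
    -p 'dbadmin_password' \
    -s vertica1.ukatru.com,vertica2.ukatru.com \
    -c /home/dbadmin/catalog \
    -D /home/dbadmin/data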

Documentation:

http://my.vertica.com/docs/CE/6.0.0-1/HTML/index.htm#11927.htm

Categories: vertica