Question & Answer
Question
ZFS Migration and calculating the -size parm
Answer
Abstract: Conversion to zFS File System
USS Support,
We are reviewing and starting to convert HFS file system datasets to zFS
file system datasets for as many datasets as we can.
A partner-in-crime discovered that when he attempted to convert one of
our USS root datasets for one of our MVS/USS LPARs, the conversion
utility complained about the /tmp and /dev directories, stating they are
in use and the root file system cannot be converted.
What I actually suspect is that the root dataset cannot be unmounted from
a running OMVS system since it is in use.
The errors received from the conversion program prompted some questions
in my mind about issues that have caused us some grief up to this point,
as well as a question about the proper means of converting a root
dataset from an HFS file to a zFS file.
Q1) It appears to me that if /etc, which is a symbolic link to
/SYSTEM/etc, can be used as a mount point, then I should be able to do
the same with /tmp.
Is this true?
We have been bitten a few times in the past by folks installing rather
large products, say BMC/Boole for example, whose install processes
create many large files in /tmp, blowing the root dataset out to 1000+
cylinders on DASD. As you know, that size is not easy to reduce back
down. So my thought is to create a separate zFS dataset to represent
/tmp and set a MOUNT command in BPXPRMxx that looks something like the
following:
MOUNT FILESYSTEM('OMVS.Z18.TMP.SY3')
TYPE(ZFS)
MODE(RDWR)
MOUNTPOINT('/tmp')
Let me know if this should work.
Q2) The only way I can think of to copy the HFS root dataset to a zFS
dataset is as follows. If there is a better documented way, can you
please point me to the doc or outline the process for me?
1) Back up and restore the HFS root dataset to a new DSN.
2) Create a temporary mount point directory on the running OMVS system
and mount the alternate root dataset.
3) Run the conversion utility against this directory mount point and
dataset hopefully resulting in a new zFS dataset which will remount
to the temporary directory mount point.
4) Shutdown the OMVS system and restart with the new zFS root dataset
pointed to by the BPXPRMxx member at IPL.
Will this work???
Thanks in advance for your thoughts on these issues.
Regards,
Hello,
The conversion utility does not work very well for system filesystems,
especially the Root. The utility also uses /tmp and /dev, so neither
of those should be converted with this tool.
Q2) Your suggestion is indeed a good way to convert the root.
Clone the root and mount it somewhere else then convert the clone
and at the next IPL use the new ZFS for the Root.
You can also install z/OS onto ZFS instead of HFS which is another
good way to convert.
The paths /dev , /tmp , /etc and /var should all be separate filesystems
and not part of the root. /tmp and /dev can be ZFS, HFS or TFS.
There should be no need to convert these but simply create a new ZFS
and use that at the next IPL for /tmp and /dev . The system will
recreate any files needed in these directories.
Q1) Your suggested MOUNT statement for /tmp will work just fine.
The path /tmp will be resolved as it is actually a symlink to
$SYSNAME/tmp which sounds like it will resolve to /SYSTEM/tmp in
your case.
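If you want to double-check how /tmp resolves on your system before adding that MOUNT, a quick look from the shell (just a sketch; exact output varies by setup) would be:
ls -ld /tmp     # shows the symlink and its target, typically $SYSNAME/tmp
df -v /tmp      # shows which file system currently contains /tmp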
RESPOND ELECTRONICALLY:
Would it be OK, and in your opinion a good approach, to complete a setup
as follows for the /tmp directory, to remove the stored files from the
root file system?
1) Establish a DD statement in the OMVS procedure similar to the one below:
//USSTMP DD DSN=SYS2.OMVS.SY4.TMPSTOR,
// DISP=(NEW,DELETE,DELETE),
// DATACLAS=HFSDATA
where SMS, via the data class and the specific dataset name, controls
the space parameters associated with this HFS dataset.
2) Set the BPXPRMxx MOUNT command for /tmp to use DDNAME(USSTMP) to get
/tmp out of the root file system (see the sketch just below).
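In BPXPRMxx the mount itself might then look something like this rough sketch (my assumption is that DDNAME simply replaces the FILESYSTEM operand):
MOUNT DDNAME(USSTMP)
      TYPE(HFS)
      MODE(RDWR)
      MOUNTPOINT('/tmp')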
Note I am using HFS rather than zFS because zFS does not lend itself to
secondaries and dynamic expansion due to the preformatting required.
Thanks for your thoughts on this setup.
High regards,
Action taken:
Hello
I have never actually seen the DDNAME parm of a Mount used but it
is supposed to work. Yes, you put the DD card in the OMVS proc and
specify DDNAME on the mount.
I checked with our DFSMS folks and this should work.
One thing to check is that SMS is up before OMVS tries to do the mounts.
Look for IGD020I SMS IS NOW ACTIVE before the first Mount is processed
from BPXPRMxx ( BPXF013I FILE SYSTEM OMVS.Z19RS1.ROOT WAS SUCCESSFULLY
MOUNTED.)
You definitely want /tmp to be a separate filesystem.
Many folks use TFS for /tmp as it is recreated at each MOUNT or
IPL. The drawback to TFS is that it cannot be dynamically grown in
size.
Many folks also use cron and skulker to clean out /tmp on a regular
basis. You might consider adding something like
rm -R /tmp/*
to /etc/rc to empty it at IPL time.
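If you go the cron route, a crontab entry for the superuser along these lines would do it (just a sketch; I am assuming the sample skulker script shipped in /samples, and the 02:00 schedule and 3-day age are only examples):
0 2 * * * /samples/skulker /tmp 3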
The only other thing I can think of about the DDNAME is that it implies
you will shut down OMVS gracefully and unmount everything so that the
/tmp HFS can be deleted.
-------
As for ZFS, you can have a ZFS dynamically grow itself and it will
dynamically format the new additional space for you as part of
the grow process.
You can specify secondary extents and multiple volumes with a ZFS
when you define it. You can format the entire thing or just part of it
and let the grow process format the added part.
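You can also grow an aggregate by hand from the shell with zfsadm; a sketch, using a made-up aggregate name (as I recall, -size is the new total size in kilobytes and the newly added space is formatted for you):
zfsadm grow -aggregate OMVS.EXAMPLE.ZFS -size 216000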
Hope this helps.
Many Thanks
RESPOND ELECTRONICALLY:
The plot thickens and I am beginning to believe that maybe I have drawn
a bad conclusion regarding our current setup.
Presently I have the following directives in BPXPRMxx on each MVS LPAR.
FILESYSTYPE TYPE(TFS)
ENTRYPOINT(BPXTFS)
..
MOUNT FILESYSTEM('/tmp')
TYPE(TFS)
MODE(RDWR)
MOUNTPOINT('/tmp')
PARM('-s 100')
The reason I became suspicious of /tmp was that, for no apparent reason,
we discovered one of our root HFS DASD files on one system had grown
from its original size of roughly 2,250 tracks (150 cyls) to over 50,000
tracks (3,300+ cyls).
I may have wrongly assumed that /tmp was the culprit, but I cannot find
any other additions of permanent files or directory structures in the
file systems, nor was there any residue or files of any significant size
in /tmp when we discovered the out-of-hand growth.
Let me ask this:
With the above directives in BPXPRMxx for TFS and the mount of /tmp to
TFS, where do temporary files actually get stored? If they go into a
dataspace, will heavy usage overflow to the root HFS, which is where the
original /tmp directory lives?
Also, I have not discovered a means of preventing the format utility for
a zFS VSAM linear dataset from formatting all primary and secondary
space, e.g. you indicated we should be able to initially format only the
original primary space and not the secondaries?
Thanks again for adding to my education.
Regards,
Action taken:
Hello
OK, let's fall back and Punt.... ;-)
If your Root is getting filled up, then that is not overflow from
/tmp. It is something else that is writing into your Root filesystem.
One common culprit is CEEDUMPs, so if you see any files that have
CEEDUMP as part of the filename, then those need to be removed.
SMP/E can also write to the Root if you do not have your DDDEFs set up
just the right way.
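One quick way to hunt for such files without wandering into other mounted filesystems is find with -xdev (a sketch; adjust the size threshold to taste):
find / -xdev -name 'CEEDUMP*'
find / -xdev -type f -size +20000    # roughly 10MB and larger; -size counts 512-byte units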
Perhaps we should check how full the root currently is with the
OMVS command:
df -vk /
On our 1.9 system it looks like this:
$df -vk /Z19RS1
Mounted on Filesystem Avail/Total
/Z19RS1 (OMVS.Z19RS1.ROOT) 132186/2179440
ZFS, Read Only, Device:2, ACLS=Y
File System Owner : ZOS1A Automove=Y Client=N
Filetag : T=off codeset=0
Aggregate Name : OMVS.Z19RS1.ROOT
The total size of 2179440 1K blocks needs to be divided by 8 to get
8K blocks, then by 6 to get tracks, then by 15 to get cylinders,
which comes to 3027 cylinders (2179440/8/6/15).
New products should be installed in their own filesystems usually
mounted off of /usr/lpp/<product-mount-point>
We have Shared Filesystem in a Sysplex, so I needed to put the Version
ZFS in the df command here.
A df -vk will show all your filesystems.
Our 1.8 Version filesystem is
$df -vk /Z18RS1
Mounted on Filesystem Avail/Total
/Z18RS1 (OMVS.Z18RS1.ROOT) 194953/1700640
ZFS, Read Only, Device:39, ACLS=Y
File System Owner : ZOS18 Automove=Y Client=N
Filetag : T=off codeset=0
Aggregate Name : OMVS.Z18RS1.ROOT
and that equates to 2362 cylinders (1700640/8/6/15)
----
With a TFS mounted at /tmp, anything written to /tmp will be placed
in a dataspace up to the 100 MB size you have specified in the
PARM('-s 100').
If that fills up, then you get the standard out of space error.
There is no overflowing to the Root.
----
If you have userids in your RACF database that have HOME=/ and any of
these userids log in to OMVS, then they may be writing into the Root.
Files like /.sh_history and /.profile may be out there.
If any of these users are using Ported Tools (sftp), then they may have
created a /.ssh sub-directory with stuff in there.
--------
Now, for your ZFS question, you may want to have a look at the
DFS ZFS Admin manual in section 2.2.1 ioeagfmt
found at URL
publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/FCXD5A71/2.2.1
The "trick" is what you specify for the allocation and the -size
and -grow parms.
For example, if you specify CYL(10 5) and do not specify -size
or -grow, then ioeagfmt will format the primary space of 10 cylinders.
When that fills, then (assuming you mounted with AGGRGROW) the
secondary size of 5 cylinders will be allocated and formatted for
you dynamically.
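Putting that together, a minimal define-and-format sketch for the CYL(10 5) example might look like the following (the dataset name and volume are placeholders, your SMS/storage standards apply, and a STEPLIB may be needed if the zFS modules are not in the linklist):
//DEFINE   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE CLUSTER (NAME(OMVS.EXAMPLE.ZFS) -
         VOLUMES(WORK01) -
         LINEAR CYL(10 5))
/*
//* With no -size or -grow, only the 10-cylinder primary is formatted;
//* growth beyond that is handled later (for example via AGGRGROW).
//FORMAT   EXEC PGM=IOEAGFMT,REGION=0M,
// PARM='-aggregate OMVS.EXAMPLE.ZFS -compat'
//SYSPRINT DD SYSOUT=*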
The example in section
1.4.4, Creating a multi-volume compatibility mode aggregate,
shows formatting the entire ZFS on all the volumes.
It takes time to format 3336 cylinders so if this is done dynamically
then the end user sits and waits for that to finish. The initial
allocation on a new volume is the primary size....
The example shown means that all 10 volumes have been formatted, and
when all 10 volumes fill up, there is no more growth available and it
will report that it is full.
Clear as mud ??
Hope this helps
RESPOND ELECTRONICALLY:
My first read of what you have provided gives me a lot to review and is
great as a starting point. Let me digest what you have provided a bit,
do some displays and research, and see if we can determine a culprit.
I love to give folks a hard time so maybe we can have some fun here if
we can find who nailed us! Just joking!
I will investigate and play with the zFS format utility and try to get a
better understanding of aggregates, a concept I am admittedly dumb about
at the present time, and review the available mount parameters you have
provided to get a better understanding of the zFS approach.
I just allocated and formatted a new zFS dataset for use this upcoming
weekend as a SYSLOGD dataset storage area for our Production MVS/USS
LPAR.
It took some trial and error, but hopefully I have some basic idea of
how the format utility works after reading the information you provided.
Let's see if I determined the correct values for the allocate/format
values.
1) I used the following utility parameters to allocate and format the
primary extent only of a 200 cylinder dataset.
PARM='-aggregate &HFSDSN -compat -size 17820 -grow 8910'
2) I calculated the -size value as (200 cyls - 2 cyls [1% of size]) *
90 [8K CIs/cyl].
So 198 * 90 = 17,820.
This leaves 1% for -logsize.
3) I calculated the -grow value the same way:
(100 cyls - 1 cyl [1% of 2nd space]) * 90
So 99 * 90 = 8,910.
4) Resulting messages from format request:
IOEZ00004I Formatting to 8K block number 18000 for primary extent of
SYS2.OMVS.ZFS.SYSLOGD.SY2.
IOEZ00005I Primary extent loaded successfully for
SYS2.OMVS.ZFS.SYSLOGD.SY2.
IOEZ00535I *** Using initialempty value of 1.
*** Using 17999 (8192-byte) blocks
*** Defaulting to 179 log blocks(maximum of 19 concurrent transactions).
IOEZ00327I Done. SYS2.OMVS.ZFS.SYSLOGD.SY2 is now a zFS aggregate.
IOEZ00048I Detaching aggregate SYS2.OMVS.ZFS.SYSLOGD.SY2
IOEZ00071I Attaching aggregate SYS2.OMVS.ZFS.SYSLOGD.SY2 to create
HFS-compatible file system
IOEZ00074I Creating file system of size 144000K, owner id 0, group id 9,
permissions x1ED
IOEZ00048I Detaching aggregate SYS2.OMVS.ZFS.SYSLOGD.SY2
IOEZ00077I HFS-compatibility aggregate SYS2.OMVS.ZFS.SYSLOGD.SY2 has
been successfully created
<Questions/Clarifications>
Q1) Are the above formulas correct or do they need adjustments for
normal use?
Q2) I assumed, since the messages did not say otherwise, that a log area
will be reserved within each additional extension when the file is
extended and formatted?
Q3) Are there any special attributes that need to be specified with the
USS mount command to allow for the extensions to take place if/when
required?
Q4) When file data is deleted/removed from a zFS file, is the space
available for immediate reuse?
Thanks for all your insight into this utility and USS in general.
High Regards,
----------------------------------------------------------------------
I think you have over-calculated here a bit and been confused by
some default actions.
The -size parm should be the total size including the log area.
You do not need to try to calculate the size without the log stuff.
When you specified less than the initial allocation of 200 cylinders
the program rounded that value up to the 200 on you.
So you wound up with 18000 8K blocks which is 200 cylinders.
Out of that 18000 blocks, the log area will be taken.
--------------------
The -grow option will increase the size BEYOND the initial allocation
size of 200 cylinders by the amount you specify but since you
specified less than 200 cylinders for the size, it rounded the size
to 200 cylinders and did not need to grow it to get to 200 cylinders.
So if you allocated this ZFS with CYL(200 100) and you want it to be
300 cylinders in size then you would specify
-size 27000 -grow 9000
The primary size of 18000 will be taken and then it will grow that to
the total size asked for by an additional 9000 blocks.
If you used a secondary size of 50 instead of 100, then two extents
would have been taken to get to the desired size.
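In numbers, using the 90 8K-blocks-per-cylinder figure from your own calculation:
300 cyl x 90 = 27000 blocks total (-size)
200 cyl x 90 = 18000 blocks primary
27000 - 18000 = 9000 blocks to grow (-grow), i.e. one 100-cylinder secondary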
-
Out of that 27000 blocks, the log area will be taken so you do not
have to try to calculate what that would be.
If the ZFS grows, then additional log space will be taken as
needed.
-
Hopefully that helps with your Q1 and Q2
For Q3, when you mount the ZFS, you need one of
a) specify PARM(AGGRGROW) on the MOUNT command
OR
b) specify aggrgrow in the IOEPRMxx file for all ZFS's.
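For example (the mount point is only an illustration; the aggregate name is the SYSLOGD one from your earlier messages):
MOUNT FILESYSTEM('SYS2.OMVS.ZFS.SYSLOGD.SY2')
      TYPE(ZFS)
      MODE(RDWR)
      MOUNTPOINT('/var/syslogd')
      PARM('AGGRGROW')
or, in IOEPRMxx (or IOEFSPRM), a single line:
aggrgrow=on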
For Q4, when you remove a file from a ZFS ( or HFS ), if the file
is not in use, then the space is immediately available for reuse.
You will want to read section 1.4.9, zFS disk space allocation,
in the ZFS Admin manual a couple of times though, as the way the space
is managed will be of interest.
The URL for that section is
publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/FCXD5A80/1.4.9
Hope this helps
Please let us know if you have further questions,
RESPOND ELECTRONICALLY:
You say I over-calculated. If you believe I did, then please shed some
light on the circumstances and messages below. Here is what I have
tried on our system, with the results, for your evaluation.
Here are three executions using the same VSAM LDS define but different
format utility parms, with the results received from the utility for
each.
1) LDS file CYL(200 100).
PARM='-aggregate &HFSDSN -compat -size 18000 -grow 9000'
<Msgs>
IOEZ00004I Formatting to 8K block number 18000 for primary extent of
SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST.
IOEZ00005I Primary extent loaded successfully for
SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST.
IOEZ00535I *** Using initialempty value of 1.
IOEZ00323I Attempting to extend SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST to 18000
8K blocks.
IOEZ00326E Error 8 extending SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST
IOEZ00328E Errors encountered making SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST a
zFS aggregate.
==> I believe the utility is stating that 18000 blocks will not fit in
the 1st extent.
2) LDS file CYL(200 100).
Note: -size, -grow reduced by one.
// PARM='-aggregate &HFSDSN -compat -size 17999 -grow 8999'
<Msgs>
IOEZ00004I Formatting to 8K block number 18000 for primary extent of
SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST.
IOEZ00005I Primary extent loaded successfully for
SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST.
IOEZ00535I *** Using initialempty value of 1.
*** Using 17999 (8192-byte) blocks
*** Defaulting to 179 log blocks(maximum of 19 concurrent transactions).
IOEZ00327I Done. SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST is now a zFS aggregate.
IOEZ00048I Detaching aggregate SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST
IOEZ00071I Attaching aggregate SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST to create
HFS-compatible file system
IOEZ00074I Creating file system of size 144000K, owner id 0, group id 9,
permissions x1ED
IOEZ00048I Detaching aggregate SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST
IOEZ00077I HFS-compatibility aggregate SYS2.OMVS.ZFS.SYSLOGD.SY2.TEST
has been successfully created
==> Works OK with 17999; 18,000 specified does not??
This is what led me to the conclusion that I needed to reduce the -size
and -grow by 1%, because on my initial try, when I used 18000, the
format failed as you can see above.
Ref. Q4: space reuse.
You seem to be implying you must dismount a file to ever get space
reuse?
This is a bit hard to do on a running system, say for SYSLOGD, without
losing messages. I will thoroughly read the doc you have pointed me to,
but this just does not seem good! Maybe this is why my Production
SYSLOGD file has grown to over 1,000 cylinders.
There has got to be a better way?
I use skulker to clean older log files and rmdir to remove empty past
directories after skulker runs each morning just after midnight against
the SYSLOGD storage file structures. But my DASD dataset just keeps
growing and growing like the energizer bunny!
Thanks for your continued feedback.
Regards,
--------------------------------------------------------------
Hello
This is becoming very interesting.
If you do not specify -size or -grow, then you get a ZFS with
18000 8K blocks, which is 200 cylinders, the primary allocation.
If you try to specify -size 18000 with no -grow parm, I would have
expected that to also work and give exactly the same results, but it
does not.
Using 17999 is less than the primary allocation, so it basically ignores
the value, performs the same as not specifying -size, and works.
There may be some sort of counting from zero instead of counting from
one type of thing here (0-17999 is 18000 blocks).
It almost looks like a -size of 18000 is actually 18001 blocks maybe ??
I need to discuss this with Level 2...
Many Thanks
Hello
Well, I managed to recreate what you are seeing and this may be
a bug..
With a primary size of 200 cylinders which is 18000 blocks,
if I do NOT specify size, then it creates a ZFS of 18000 blocks.
If I do specify size of 18000 then it fails trying to get a secondary
extent but it should not need to do that.
If I use -size 27000 and -grow 9000 then that gives me a ZFS of
27000 blocks.
Can you please open a NEW PMR to the zFS Defect folks reporting this
inconsistency when the -size value is exactly the Primary
allocation size. In this case, the bypass is simply to not specify
-size at all.
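In other words, for your CYL(200 100) test the bypass would be along these lines, keeping your symbolic &HFSDSN and letting AGGRGROW at mount time handle any later growth:
// PARM='-aggregate &HFSDSN -compat'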
---------------------------------------------------
For Q4, the space should be returned if the file is removed and
nothing is using it.... If the file is still open by some application
then it will continue to be used and the dasd space will continue
to be held until the application closes the file.
We may need to review exactly how you are handling this switch of
syslogd log files. Unfortunately, syslogd is part of TCPIP so perhaps
you can please open a new PMR to TCPIP Q&A to ask how to handle
the switch of syslogd log files.
Just to make sure things are really closed, you can kill syslogd
at some convenient time and check the size of the filesystem
with either the df -vk command or the zfsadm aggrinfo command.
That should show you how much space is available.
You will want to do those commands before stopping syslogd and then
again after to see if there is a difference.
You should not have to unmount the filesystem to get space back.
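As a sketch of that check (the mount point shown is only a guess, so use whatever directory the SYSLOGD zFS is actually mounted on; the aggregate name is the one from your messages):
df -vk /var/syslogd
zfsadm aggrinfo -aggregate SYS2.OMVS.ZFS.SYSLOGD.SY2
# stop syslogd at a convenient time, then repeat the two commands above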
RESPOND ELECTRONICALLY:
Many thanks back at ya for all the help in understanding these pieces of
the puzzle.
I agree, from my testing, with Level 2's assessment of the format
utility. It seems to only fail when you specify the exact block count.
Who else would try that and find this but me!
One final clarification if I may and we are ready to close the ETR.
I plan to add the aggrgrow= parameter to my SYS1.PARMLIB member
"IOEFSPRM" on two LPARs for this upcoming weekend's IPL sequence.
From the V1R8 DFS ZFS Admin manual it is not obvious to me what the
added parameter should look like, "case"-wise.
In the doc on page 122 it shows the following:
  Default Value:  Off
  Expected Value: On or off
  Example:        aggrgrow=on
whereas when I review the IBM delivered member "IOEFSPRM" it shows for
example
*auto_attach=ON
Q) What is the correct form of the "on" keyword, case-wise:
all upper case,
all lower case,
or mixed case, e.g. On?
I cannot find any statements about the parameter values and case in
the doc so far.
I did find the following text in the Admin manual:
Any line beginning with # or * is considered a comment.
==> The text in the IOEFSPRM file is case insensitive.
==> Any option or value can be upper or lower case.
Blank lines are allowed.
You should not have any sequence numbers in the IOEFSPRM file.
If you specify an invalid text value, the default value will be
assigned.
If you specify an invalid numeric value, and it is smaller than the
minimum allowed value, the minimum value will be assigned.
If you specify an invalid numeric value, and it is larger than the
maximum allowed value, the maximum value will be assigned.
Is this stating that "on", "On", or "ON" are all interchangeable as a
stated option for aggrgrow?
Thanks and hopefully we can get this closed today.
Again, thanks for all your assistance.
Action taken:
Hello
Just to clarify a small point (sorry for being nit-picky):
There are 2 ways to specify ZFS parms.
1) use a //IOEZPRM DD in the ZFS proc to point at IOEFSPRM
OR
2) use the PARM('PRM=(01,02,03)') on the FILESYSTYPE in BPXPRMxx
to point to Parmlib member(s) IOEPRMnn where nn is 01,02,03
In both cases, the doc indicates that the case does not matter
so you can have On or ON or oN or on and all will work.
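For the parmlib route, a sketch (the 01 suffix is just an example): on the FILESYSTYPE TYPE(ZFS) statement in BPXPRMxx add
PARM('PRM=(01)')
and then in SYS1.PARMLIB(IOEPRM01) code a line such as
aggrgrow=on
(any of on, On or ON will do, per the doc text you quoted).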
---
Reason For Closure:
Thanks for all your patience and support on this matter. I am closing
this ETR as complete. I do plan to open a DEFECT PMR with IBM for
resolution of the error discovered and discussed in this ETR.
So I will most likely refer to this ETR in the defect
PMR. I have a printed copy of the text in this ETR on file at my desk
for reference in creating the PMR defect request.
Again thanks for the service. This ETR is being closed by the customer
as complete.
Regards,