Friday, September 23, 2011

Running IOmeter from multiple machines

IOmeter is broken into two pieces: dynamo, which is the workload generator, and IOmeter, which is the GUI used to control dynamo. You can run instances of dynamo on multiple machines and have them all controlled by one IOmeter GUI instance.

If you would like to run IOmeter from multiple machines against one target:

1) Make sure you have the target drive mapped on all of the source machines

2) Open a command prompt and change to the IOmeter directory

3) Execute the following command to start dynamo and have it connect to the remote IOmeter instance:

c:\Program Files (x86)\Iometer.org\Iometer 2006.07.27>dynamo -i 10.0.101.51 -m 10.0.102.185

In this example, 10.0.101.51 is the remote machine running the IOmeter GUI that will control the test, and 10.0.102.185 is the local machine running dynamo (a batch-file version of this for multiple source machines is sketched after step 4).

4) You should now see a new manager listed on the IOmeter server

In this picture, LAB-NEXENTA-CL is a remote dynamo session being controlled by an IOmeter running on my desktop
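If you have a bunch of source machines, the same command can be dropped into a small batch file and run on each of them. This is just a minimal sketch, assuming the IOmeter GUI is on 10.0.101.51 and that you substitute each source machine's own IP for the -m value:

cd /d "c:\Program Files (x86)\Iometer.org\Iometer 2006.07.27"
rem -i points at the machine running the IOmeter GUI
rem -m identifies this local machine to the GUI (use this machine's own IP)
dynamo -i 10.0.101.51 -m 10.0.102.185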

Monday, September 12, 2011

Nexenta - reverting to a previous checkpoint

In the previous post I spoke of corruption on the syspool volume. Instead of reloading the OS, I thought I would try to revert to a previous checkpoint and see if that would clear things up. Keep in mind that a checkpoint is really just a snapshot of the syspool.
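If you're curious what a checkpoint looks like under the covers, you can drop to the bash shell and list what's inside the syspool; the checkpoint names from NMC should line up with the rootfs-nmu-* datasets there. A quick sketch, assuming the default syspool name:

# list the filesystems under syspool (the checkpoints show up as rootfs-nmu-* datasets)
zfs list -r syspool
# list any snapshots under syspool as well
zfs list -r -t snapshot syspool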

I'm going to do all of this through the NMC; the NMV always seems to have limitations.
To see checkpoints:

nmc@lab-storage:/$ setup appliance checkpoint show
ROOTFS          CREATION          CHECKPOINT-TYPE  CURRENT  DEFAULT  VERSION
rootfs-nmu-004  Sep 7 14:02 2011  rollback         No       No       3.1.1
rootfs-nmu-003  Sep 7 13:16 2011  rollback         No       No       3.1.1
rootfs-nmu-000  Sep 7 11:44 2011  initial          No       No       3.1.0
rootfs-nmu-002  Sep 7 11:24 2011  upgrade          Yes      Yes      3.1.1
rootfs-nmu-001  Sep 7 11:13 2011  rollback         No       No       3.1.0

You can see that "rootfs-nmu-002" is the current checkpoint. I want to make rootfs-nmu-004 the current one.

nmc@lab-storage:/$ setup appliance checkpoint rootfs-nmu-004 activate
Activate rollback checkpoint 'rootfs-nmu-004'? Yes
Checkpoint 'rootfs-nmu-004' has been activated. You can reboot now.

nmc@lab-storage:/$ setup appliance reboot
Reboot appliance 'lab-storage' ? Yes
Operation in progress, it may take up to 30sec, please do not interrupt...

Once the box came back up I re-ran the scrub using "zpool scrub syspool" and there were fewer errors detected, but there were still errors. I went further and further back through the checkpoints and they all had corruption. So now I'll reload the OS. One nice thing about ZFS is that all the volume information is saved on the drives.
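That means after the reload the data pool can simply be re-imported from the bash shell. A rough sketch, using my data pool volume01 as the example:

# show pools that exist on the disks but are not currently imported
zpool import
# import the data pool by name
zpool import volume01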

Lesson learned: always use mirrored syspools
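For reference, turning a single-disk syspool into a mirror is just a zpool attach. A sketch assuming c0t0d0s0 is the existing syspool disk on this box and c0t1d0s0 is a hypothetical second disk of at least the same size:

# mirror the existing syspool device onto a second disk
zpool attach syspool c0t0d0s0 c0t1d0s0
# the resilver progress shows up in zpool status
zpool status syspool
# (on Nexenta/Solaris you would also want to make the second disk bootable,
#  e.g. with installgrub, before relying on it)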

nexenta / zfs volume scrub

I saw that the auto-scrub service on the lab-storage Nexenta box came across some corruption on the syspool volume and wanted to verify it by running the scrub process again. You can do this through the GUI, but that's no fun. Jump to the bash shell and run "zpool scrub [-s] pool ..."

From the man page:
zpool scrub [-s] pool ...

Begins a scrub. The scrub examines all data in the specified pools to verify that it checksums correctly.
For replicated (mirror or raidz) devices, ZFS automatically repairs any damage discovered during the
scrub. The "zpool status" command reports the progress of the scrub and summarizes the results of the
scrub upon completion.

Scrubbing and resilvering are very similar operations. The difference is that resilvering only examines
data that ZFS knows to be out of date (for example, when attaching a new device to a mirror or replacing
an existing device), whereas scrubbing examines all data to discover silent errors due to hardware faults
or disk failure.

Because scrubbing and resilvering are I/O-intensive operations, ZFS only allows one at a time. If a scrub
is already in progress, the "zpool scrub" command terminates it and starts a new scrub. If a resilver is
in progress, ZFS does not allow a scrub to be started until the resilver completes.

-s Stop scrubbing.

zpool status [-xv] [pool] ...

Displays the detailed health status for the given pools. If no pool is specified, then the status of each
pool in the system is displayed. For more information on pool and device health, see the "Device Failure
and Recovery" section.

If a scrub or resilver is in progress, this command reports the percentage done and the estimated time to
completion. Both of these are only approximate, because the amount of data in the pool and the other
workloads on the system can change.

-x Only display status for pools that are exhibiting errors or are otherwise unavailable.

-v Displays verbose data error information, printing out a complete list of all data errors since the
last complete pool scrub.


Output of the commands:

root@lab-storage:/volumes/volume01/vlab# zpool scrub syspool
root@lab-storage:/volumes/volume01/vlab# zpool status syspool
pool: syspool
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scan: scrub in progress since Mon Sep 12 07:04:33 2011
515M scanned out of 4.36G at 51.5M/s, 0h1m to go
0 repaired, 11.52% done
config:

        NAME        STATE     READ WRITE CKSUM
        syspool     ONLINE       0     0     0
          c0t0d0s0  ONLINE       0     0     0

errors: 3 data errors, use '-v' for a list
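The scrub was still running when I grabbed that output. Here is a rough sketch of waiting for it to finish and then pulling the list of damaged files (it assumes the "scrub in progress" wording that this version of zpool status prints):

# poll until the scrub completes, then list the files with errors
while zpool status syspool | grep -q "scrub in progress"; do
    sleep 30
done
zpool status -v syspool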