zfs-discuss@opensolaris.org

[zfs-discuss] Re: [zfs-code] Space allocation failure

Subject: [zfs-discuss] Re: [zfs-code] Space allocation failure
From: Manoj Joseph
Date: Thu, 28 Jun 2007 01:42:57 +0530

Hi,

In brief, what I am trying to do is to use libzpool to access a zpool - like ztest does.

Matthew Ahrens wrote:
Manoj Joseph wrote:
Hi,

Replying to myself again. :)

I see this problem only if I attempt to use a zpool that already exists. If I create one (using files instead of devices, don't know if it matters) like ztest does, it works like a charm.

You should probably be posting on zfs-discuss.

Switching from zfs-code to zfs-discuss.

The pool you're trying to access is damaged. It would appear that one of the devices can not be written to.

No, AFAIK, the pool is not damaged. But yes, it looks like the device can't be written to by the userland zfs.

bash-3.00# zpool import test
bash-3.00# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test    85K  1.95G  24.5K  /test
bash-3.00# ./udmu test
 pool: test
 state: ONLINE
 scrub: none requested
 config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c2t0d0    ONLINE       0     0     0

errors: No known data errors
Export the pool.
cannot open 'test': no such pool
Import the pool.
error: ZFS: I/O failure (write on <unknown> off 0: zio 8265d80 [L0 unallocated] 4000L/400P DVA[0]=<0:1000:400> DVA[1]=<0:18001000:400> fletcher4 lzjb LE contiguous birth=245 fill=0 cksum=6bba8d3a44:2cfa96558ac7:c732e55bea858:2b86470f6a83373): error 28
Abort (core dumped)
bash-3.00# zpool import test
bash-3.00# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c2t0d0    ONLINE       0     0     0

errors: No known data errors
bash-3.00# touch /test/z
bash-3.00# sync
bash-3.00# ls -l /test/z
-rw-r--r--   1 root     root           0 Jun 28 04:18 /test/z
bash-3.00#

The userland zfs's export succeeds. But doing a system("zpool status test") right after spa_export() returns shows that the 'kernel zfs' still thinks the pool is imported.

I guess that makes sense. Nothing has been told to the 'kernel zfs' about the export.

But I still do not understand why the 'userland zfs' can't write to the pool.

Regards,
Manoj

PS: The code I have been tinkering with is attached.


--matt


Any clue as to why this is so would be appreciated.

Cheers
Manoj

Manoj Joseph wrote:
Hi,

I tried adding an spa_export();spa_import() to the code snippet. I get a similar crash while importing.

I/O failure (write on <unknown> off 0: zio 822ed40 [L0 unallocated] 4000L/400P DVA[0]=<0:1000:400> DVA[1]=<0:18001000:400> fletcher4 lzjb LE contiguous birth=4116 fill=0 cksum=69c3a4acfc:2c42fdcaced5:c5231ffcb2285:2b8c1a5f2cb2bfd): error 28
Abort (core dumped)
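
A note on reading the message above. As far as the block pointer formatting in this code base goes, each DVA[n]=<vdev:offset:asize> triple appears to print the vdev id in decimal and the offset and allocated size in hex, and 4000L/400P would be the logical and physical block sizes in hex. The following is a minimal sketch in plain C under that assumption; parse_dva() is a hypothetical helper written for illustration, not a libzpool function.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of libzpool: split one
 * "<vdev:offset:asize>" triple as printed in the zio error message.
 * Assumes vdev is decimal and offset/asize are hexadecimal.
 * Returns 0 on success, -1 if the string does not match.
 */
static int
parse_dva(const char *s, unsigned long long *vdev,
    unsigned long long *offset, unsigned long long *asize)
{
	char close = '\0';

	/* All three fields plus the closing '>' must be present. */
	if (sscanf(s, "<%llu:%llx:%llx%c", vdev, offset, asize, &close) != 4)
		return (-1);
	return (close == '>' ? 0 : -1);
}
```

Under that reading, DVA[1]=<0:18001000:400> decodes to vdev 0, offset 0x18001000, size 0x400, so both copies of the block target the same single top-level vdev.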

I thought ztest could use an existing pool. Is that assumption wrong?

These are the stacks of interest.

 d11d78b9 __lwp_park (81c3e0c, 81c3d70, 0) + 19
 d11d1ad2 cond_wait_queue (81c3e0c, 81c3d70, 0, 0) + 3e
 d11d1fbd _cond_wait (81c3e0c, 81c3d70) + 69
 d11d1ffb cond_wait (81c3e0c, 81c3d70) + 24
 d131e4d2 cv_wait  (81c3e0c, 81c3d6c) + 5e
 d12fe2dd txg_wait_synced (81c3cc0, 1014, 0) + 179
 d12f9080 spa_config_update (819dac0, 0) + c4
 d12f467a spa_import (8047657, 8181f88, 0) + 256
 080510c6 main     (2, 804749c, 80474a8) + b2
 08050f22 _start   (2, 8047650, 8047657, 0, 804765c, 8047678) + 7a


 d131ed79 vpanic   (d1341dbc, ca5cd248) + 51
 d131ed9f panic    (d1341dbc, d135a384, d135a724, d133a630, 0, 0) + 1f
 d131921d zio_done (822ed40) + 455
 d131c15d zio_next_stage (822ed40) + 161
 d1318b92 zio_wait_for_children (822ed40, 11, 822ef30) + 6a
 d1318c88 zio_wait_children_done (822ed40) + 18
 d131c15d zio_next_stage (822ed40) + 161
 d131ba83 zio_vdev_io_assess (822ed40) + 183
 d131c15d zio_next_stage (822ed40) + 161
 d1307011 vdev_mirror_io_done (822ed40) + 421
 d131b8a2 zio_vdev_io_done (822ed40) + 36
 d131c15d zio_next_stage (822ed40) + 161
 d1318b92 zio_wait_for_children (822ed40, 11, 822ef30) + 6a
 d1318c88 zio_wait_children_done (822ed40) + 18
 d1306be6 vdev_mirror_io_start (822ed40) + 1d2
 d131b862 zio_vdev_io_start (822ed40) + 34e
 d131c313 zio_next_stage_async (822ed40) + 1ab
 d131bb47 zio_vdev_io_assess (822ed40) + 247
 d131c15d zio_next_stage (822ed40) + 161
 d1307011 vdev_mirror_io_done (822ed40) + 421
 d131b8a2 zio_vdev_io_done (822ed40) + 36
 d131c15d zio_next_stage (822ed40) + 161
 d1318b92 zio_wait_for_children (822ed40, 11, 822ef30) + 6a
 d1318c88 zio_wait_children_done (822ed40) + 18
 d1306be6 vdev_mirror_io_start (822ed40) + 1d2
 d131b862 zio_vdev_io_start (822ed40) + 34e
 d131c15d zio_next_stage (822ed40) + 161
 d1318dc1 zio_ready (822ed40) + 131
 d131c15d zio_next_stage (822ed40) + 161
 d131b41b zio_dva_allocate (822ed40) + 343
 d131c15d zio_next_stage (822ed40) + 161
 d131bdcb zio_checksum_generate (822ed40) + 123
 d131c15d zio_next_stage (822ed40) + 161
 d1319873 zio_write_compress (822ed40) + 4af
 d131c15d zio_next_stage (822ed40) + 161
 d1318b92 zio_wait_for_children (822ed40, 1, 822ef28) + 6a
 d1318c68 zio_wait_children_ready (822ed40) + 18
 d131c313 zio_next_stage_async (822ed40) + 1ab
 d1318b1f zio_nowait (822ed40) + 1b
 d12c6941 arc_write (82490c0, 819dac0, 7, 3, 2, 1014) + 1ed
 d12ce7ae dbuf_sync (82bd008, 82490c0, 82beb40) + e6e
 d12e2ecb dnode_sync (82ea090, 0, 82490c0, 82beb40) + 517
 d12d663a dmu_objset_sync_dnodes (82a6e00, 82a6ee4, 82beb40) + 14e
 d12d6983 dmu_objset_sync (82a6e00, 82beb40) + 137
 d12ec20e dsl_pool_sync (81c3cc0, 1014, 0) + 182
 d12f7db6 spa_sync (819dac0, 1014, 0) + 26e
 d12fdf14 txg_sync_thread (81c3cc0) + 2a8
 d11d7604 _thr_setup (ccdda400) + 52
 d11d7860 _lwp_start (ccdda400, 0, 0, 0, 0, 0)

Regards,
Manoj

Manoj Joseph wrote:
Hi,

Trying to understand the zfs code, I was playing with libzpool - like ztest does.

What I am trying to do is create an object in a zfs filesystem. But I am seeing failures when I try to sync changes.

bash-3.00# ./udmu test
Object 4
error: ZFS: I/O failure (write on <unknown> off 0: zio 81f5d00 [L0 unallocated] 200L/200P DVA[0]=<0:0:200> fletcher2 uncompressed LE contiguous birth=7 fill=0 cksum=4141414141414141:4141:2828282828282820:82820): error 28
Abort (core dumped)

I can see that space does not get allocated and sync results in ENOSPC. There is plenty of space in the pool. So, that is not the issue. I guess I am missing the step of allocating space. Could someone help me figure out what it is?
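
The trailing "error 28" in the message is the errno value, and 28 is ENOSPC on Solaris (and on Linux). A small sketch in plain C for checking that mapping; both function names are hypothetical helpers, not part of libzpool.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map the numeric code at the end of the
 * "I/O failure (...): error N" message to its errno description.
 */
static const char *
zio_error_text(int err)
{
	return (strerror(err));
}

/* Hypothetical helper: non-zero if the code means "no space left". */
static int
is_out_of_space(int err)
{
	return (err == ENOSPC);
}
```

Calling zio_error_text(28) should return the platform's "no space left on device" text, which matches the allocation failure described here rather than a media error.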

I have tried to follow what ztest does. This is what I do:

<code snippet>
kernel_init(FREAD | FWRITE);
buf = malloc(BUF_SIZE);
memset(buf, 'A', BUF_SIZE);

error = dmu_objset_open(osname, DMU_OST_ZFS, DS_MODE_PRIMARY, &os);
if (error) {
    fprintf(stderr, "dmu_objset_open() = %d", error);
    return (error);
}
tx = dmu_tx_create(os);
dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, BUF_SIZE);
// dmu_tx_hold_bonus(tx, DMU_NEW_OBJECT);

error = dmu_tx_assign(tx, TXG_WAIT);
if (error) {
    dmu_tx_abort(tx);
    return (error);
}

object = dmu_object_alloc(os, DMU_OT_UINT64_OTHER, 0,
    DMU_OT_NONE, 0, tx);

printf("Object %lld\n", object);

dmu_write(os, object, 0, BUF_SIZE, buf, tx);
dmu_tx_commit(tx);

txg_wait_synced(dmu_objset_pool(os), 0);
</code snippet>

It is interesting that the checksum that is reported is the pattern that I try to write.
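
The four checksum words can be reproduced by hand. They are what a Fletcher-2 style running sum gives for a 512-byte block (200L) whose first 10 bytes are 'A' and whose remainder is zero. That block layout is an assumption made here to match the numbers; the message itself does not say how many bytes reached the block. The sketch below is written from the general shape of the algorithm (two lanes of 64-bit words, each with a sum and a sum of sums), not copied from the ZFS source, and all names in it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode 8 bytes as a little-endian 64-bit word, whatever the host order. */
static uint64_t
le64(const unsigned char *p)
{
	uint64_t w = 0;
	int i;

	for (i = 7; i >= 0; i--)
		w = (w << 8) | p[i];
	return (w);
}

/*
 * Fletcher-2 style sums over pairs of 64-bit words.
 * ck[0], ck[1] are the plain sums of the even and odd words;
 * ck[2], ck[3] are the running sums of those sums.
 * Any trailing bytes beyond a multiple of 16 are ignored.
 */
static void
fletcher2_sketch(const unsigned char *buf, size_t size, uint64_t ck[4])
{
	uint64_t a0 = 0, a1 = 0, b0 = 0, b1 = 0;
	size_t off;

	for (off = 0; off + 16 <= size; off += 16) {
		a0 += le64(buf + off);
		a1 += le64(buf + off + 8);
		b0 += a0;
		b1 += a1;
	}
	ck[0] = a0;
	ck[1] = a1;
	ck[2] = b0;
	ck[3] = b1;
}

/* Assumed block: 512 bytes, first 10 bytes 'A', the rest zero. */
static void
demo_checksum(uint64_t ck[4])
{
	unsigned char block[512];

	memset(block, 0, sizeof (block));
	memset(block, 'A', 10);
	fletcher2_sketch(block, sizeof (block), ck);
}
```

With that input the first word pair is 0x4141414141414141 and 0x4141, and since the other 31 pairs are zero, the sums of sums are those two values times 32: 0x2828282828282820 (truncated to 64 bits) and 0x82820. So the reported value looks like a genuine checksum of mostly empty data rather than the raw buffer leaking into the checksum field.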

This is the panic stack:
 d11d8e65 _lwp_kill (98, 6) + 15
 d1192102 raise    (6) + 22
 d1170dad abort    (81f5d00, d1354000, ce3fdcc8, ce3fdcbc, d13c0568, ce3fdcbc) + cd
 d131ed79 vpanic   (d1341dbc, ce3fdcc8) + 51
 d131ed9f panic    (d1341dbc, d135a384, d135a724, d133a630, 0, 0) + 1f
 d131921d zio_done (81f5d00) + 455
 d131c15d zio_next_stage (81f5d00) + 161
 d1318b92 zio_wait_for_children (81f5d00, 11, 81f5ef0) + 6a
 d1318c88 zio_wait_children_done (81f5d00) + 18
 d131c15d zio_next_stage (81f5d00) + 161
 d131ba83 zio_vdev_io_assess (81f5d00) + 183
 d131c15d zio_next_stage (81f5d00) + 161
 d1307011 vdev_mirror_io_done (81f5d00) + 421
 d131b8a2 zio_vdev_io_done (81f5d00, 0, d0e0ac00, d1210000, d11ba2df, 3) + 36
 d131f585 taskq_thread (81809c0) + 89
 d11d7604 _thr_setup (d0e0ac00) + 52
 d11d7860 _lwp_start (d0e0ac00, 0, 0, 0, 0, 0)

Thanks in advance!

Regards,
Manoj

#include <sys/zfs_context.h>
#include <sys/spa.h>
#include <sys/dmu.h>
#include <sys/txg.h>
#include <sys/zap.h>
#include <sys/dmu_traverse.h>
#include <sys/dmu_objset.h>
#include <sys/poll.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/zio.h>
#include <sys/zio_checksum.h>
#include <sys/zio_compress.h>
#include <sys/zil.h>
#include <sys/vdev_impl.h>
#include <sys/spa_impl.h>
#include <sys/dsl_prop.h>
#include <sys/refcount.h>
#include <stdio.h>
#include <stdio_ext.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <umem.h>
#include <dlfcn.h>
#include <ctype.h>
#include <math.h>
#include <sys/fs/zfs.h>

#include <sys/dditypes.h>
#include <devid.h>

#define BUF_SIZE 4192
#define FATAL_MSG_SZ    1024

int zs_vdev_primaries = 0;
static char ztest_dev_template[] = "%s/%s.%llua";
static char *zopt_dir = "/tmp";
static char *zopt_pool = "udmu";
static int zopt_ashift = SPA_MINBLOCKSHIFT;
static int zopt_raidz_parity = 1;
char *fatal_msg;

static void
fatal(int do_perror, char *message, ...)
{
        va_list args;
        int save_errno = errno;
        char buf[FATAL_MSG_SZ];

        (void) fflush(stdout);

        va_start(args, message);
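        /* Bounded writes so a long message cannot overrun buf. */
        (void) snprintf(buf, sizeof (buf), "ztest: ");
        /* LINTED */
        (void) vsnprintf(buf + strlen(buf), sizeof (buf) - strlen(buf),
            message, args);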
        va_end(args);
        if (do_perror) {
                (void) snprintf(buf + strlen(buf), FATAL_MSG_SZ - strlen(buf),
                    ": %s", strerror(save_errno));
        }
        (void) fprintf(stderr, "%s\n", buf);
        fatal_msg = buf;
        exit(3);
}

static nvlist_t *
make_vdev_file(size_t size)
{
        char dev_name[MAXPATHLEN];
        uint64_t vdev;
        uint64_t ashift = zopt_ashift;
        int fd;
        nvlist_t *file;

        if (size == 0) {
                (void) snprintf(dev_name, sizeof (dev_name), "%s",
                    "/dev/bogus");
        } else {
                vdev = zs_vdev_primaries++;
                (void) sprintf(dev_name, ztest_dev_template,
                    zopt_dir, zopt_pool, vdev);

                fd = open(dev_name, O_RDWR | O_CREAT | O_TRUNC, 0666);
                if (fd == -1)
                        fatal(1, "can't open %s", dev_name);
                if (ftruncate(fd, size) != 0)
                        fatal(1, "can't ftruncate %s", dev_name);
                (void) close(fd);
        }

        VERIFY(nvlist_alloc(&file, NV_UNIQUE_NAME, 0) == 0);
        VERIFY(nvlist_add_string(file, ZPOOL_CONFIG_TYPE, VDEV_TYPE_FILE) == 0);
        VERIFY(nvlist_add_string(file, ZPOOL_CONFIG_PATH, dev_name) == 0);
        VERIFY(nvlist_add_uint64(file, ZPOOL_CONFIG_ASHIFT, ashift) == 0);

        return (file);
}

static nvlist_t *
make_vdev_dev()
{
        char dev_name[MAXPATHLEN] = "/dev/dsk/c2t0d0s0";
        uint64_t vdev;
        uint64_t ashift = zopt_ashift;
        int fd;
        nvlist_t *vdev_nv;
        boolean_t wholedisk = B_FALSE;

        ddi_devid_t devid;
        char *minor = NULL, *devid_str = NULL;

        VERIFY(nvlist_alloc(&vdev_nv, NV_UNIQUE_NAME, 0) == 0);
        VERIFY(nvlist_add_string(vdev_nv, ZPOOL_CONFIG_TYPE,
            VDEV_TYPE_DISK) == 0);
        VERIFY(nvlist_add_string(vdev_nv, ZPOOL_CONFIG_PATH, dev_name) == 0);
        VERIFY(nvlist_add_uint64(vdev_nv, ZPOOL_CONFIG_ASHIFT, ashift) == 0);
        VERIFY(nvlist_add_uint64(vdev_nv, ZPOOL_CONFIG_WHOLE_DISK,
                (uint64_t)wholedisk) == 0);

        if ((fd = open(dev_name, O_RDONLY)) < 0) {
                (void) fprintf(stderr, "cannot open '%s': %s\n",
                    dev_name, strerror(errno));

                nvlist_free(vdev_nv);
                return (NULL);
        }

        if (devid_get(fd, &devid) == 0) {
                if (devid_get_minor_name(fd, &minor) == 0 &&
                    (devid_str = devid_str_encode(devid, minor)) !=
                    NULL) {
                        verify(nvlist_add_string(vdev_nv,
                                ZPOOL_CONFIG_DEVID, devid_str) == 0);
                }
                if (devid_str != NULL)
                        devid_str_free(devid_str);
                if (minor != NULL)
                        devid_str_free(minor);
                devid_free(devid);
        }

        (void) close(fd);
        return (vdev_nv);
}


static nvlist_t *
make_vdev_root(size_t size)
{
        nvlist_t *root, **child;
        int c;
        int t = 1;

        child = umem_alloc(t * sizeof (nvlist_t *), UMEM_NOFAIL);
        child[0] = make_vdev_dev();

        VERIFY(nvlist_alloc(&root, NV_UNIQUE_NAME, 0) == 0);
        VERIFY(nvlist_add_string(root, ZPOOL_CONFIG_TYPE, VDEV_TYPE_ROOT) == 0);
        VERIFY(nvlist_add_nvlist_array(root, ZPOOL_CONFIG_CHILDREN,
            child, t) == 0);

        nvlist_free(child[0]);
        umem_free(child, t * sizeof (nvlist_t *));

        return (root);
}

int
create_pool(char *pool)
{
        nvlist_t *nvroot;
        int error = 0;

        printf("Create a new storage pool.\n");
        nvroot = make_vdev_root(SPA_MINDEVSIZE);
        error = spa_create(pool, nvroot, NULL);
        nvlist_free(nvroot);

        return (error);
}

int
main(int argc, char **argv)
{
        objset_t *os = NULL;
        zilog_t *zilog;
        uint64_t object = 0ULL;
        uint64_t seq;
        uint64_t txg;
        dmu_tx_t *tx;
        itx_t *itx;
        int error;
        char *buf;
        char *osname;
        nvlist_t *config;
        struct rlimit rl = { 1024, 1024 };

        if (argc != 2) {
                fprintf(stderr, "Usage: %s <pool>\n", argv[0]);
                exit (1);
        }

        osname = argv[1];

        buf = malloc(BUF_SIZE);
        memset(buf, 'A', BUF_SIZE);

        // spa_config_dir = "/tmp";

        (void) setvbuf(stdout, NULL, _IOLBF, 0);
        (void) setrlimit(RLIMIT_NOFILE, &rl);
        (void) enable_extended_FILE_stdio(-1, -1);


        kernel_init(FREAD | FWRITE);

/*
        printf("Destroy existing storage pool.\n");
        (void) spa_destroy(osname);

        error = create_pool(osname);
        if (error) {
                printf("create_pool(%s) failed - %d\n", osname, error);
                exit(error);
        }
*/

        system("zpool status test");

        printf("Export the pool.\n");
        error = spa_export(osname, &config);
        if (error) {
                printf("spa_export('%s') = %d\n", osname, error);
                exit(error);
        }

        system("zpool export test");
        system("zpool status test");
        spa_config_dir = "/tmp";

        printf("Import the pool.\n");
        error = spa_import(osname, config, NULL);
        if (error) {
                printf("spa_import('%s') = %d\n", osname, error);
                exit(error);
        }

        printf("Do IO on the pool.\n"); 
        error = dmu_objset_open(osname, DMU_OST_ANY, DS_MODE_NONE, &os);
        if (error) {
                fprintf(stderr, "dmu_objset_open() = %d\n", error);
                return (error);
        }

        // zilog = zil_open(os, NULL);

        tx = dmu_tx_create(os);
        
        dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, BUF_SIZE);
        // dmu_tx_hold_bonus(tx, DMU_NEW_OBJECT);

        error = dmu_tx_assign(tx, TXG_WAIT);
        if (error) {
                dmu_tx_abort(tx);
                return (error);
        }
        object = dmu_object_alloc(os, DMU_OT_UINT64_OTHER, 0,
            DMU_OT_NONE, 0, tx);

        printf("Object %lld\n", object);
/*
        error = dmu_object_set_blocksize(os, object,
            1ULL << SPA_MINBLOCKSHIFT, 0, tx);

        if (error) {
                fprintf(stderr, "dmu_object_set_blocksize() = %d", error);
                return (error);
        }
        // itx = zil_itx_create(TX_CREATE, );
        // seq = zil_itx_assign(zilog, itx, tx);
*/
        dmu_write(os, object, 0, BUF_SIZE, buf, tx);

        dmu_tx_commit(tx);

        // zil_commit(zilog, seq, object);
        txg_wait_synced(dmu_objset_pool(os), 0);

        return (0);

}