I can only speculate, but maybe they're referring to the same thing Andrew Morton meant when he described ZFS as a rampant layering violation.
i.e. ZFS isn't just a file system. It's a volume manager, RAID layer, and file system rolled into one holistic system, versus, for example, LVM + MD + ext4.
And (again, I'm only speculating) in their microkernel design they want to have individual components running separately and layered together into a complete solution.
Good question. I don't know about other microkernels, but NetBSD is a small kernel that supports ZFS. The support has been there since 4.0.5 and 5.3[0], possibly earlier too. I'm not adept at navigating the mailing lists here, but I imagine a good place to learn about the challenges of porting ZFS to a smaller kernel would be the NetBSD and ZFS lists from that era (2008-2009). What NetBSD does today is use a 'zfs' module that depends on a 'solaris' kernel module. The dependency on Solaris primitives is probably one of the major challenges with porting ZFS to any kernel. FWIW, somehow a ZFS port for the "hybrid" kernel in Windows also exists[1].
[0] https://vermaden.wordpress.com/2022/03/25/zfs-compatibility/
[1] https://github.com/openzfsonwindows/openzfs
“I don’t know about other microkernels” implies that NetBSD is also a microkernel. It is not.
Microkernel is not a size distinction. The NetBSD kernel may even be smaller in terms of LOC or binary size than some microkernels, I don't know. But that is beside the point.
Microkernel is an *architecture*. It is a name for a specific type of kernel design, which NetBSD is not.
I particularly don't buy it because ZFS used to have a FUSE build, and I'm pretty sure there's at least one company still running it in userspace in some form (something for k8s, IIRC?)
Yeah, I'd always written this off as a fun side project for a group of people, but after seeing consistent updates and improvements over the last several years I've been really impressed by how far this project has come.
> Redox had a read-only ZFS driver but it was abandoned because of the monolithic nature of ZFS that created problems with the Redox microkernel design.
Curious about the details behind those compatibility problems.
If it relied on OpenZFS, then I wouldn't be too surprised.
The whole ARC thing, for example, sidestepping the general block cache, feels like a major hack resulting from how it was brutally extracted from Solaris at the time...
The way ZFS just doesn't "fit" was why I had hope for btrfs... ZFS is still great for a file server, but I wouldn't use it on a general-purpose machine.
Solaris had a unified page cache, and the ARC existed separately, alongside it, there as well.
One huge problem with ZFS is that there is no zero copy due to the ARC wart. E.g., if you're doing sendfile() from a ZFS filesystem, every byte you send is copied into a network buffer. But if you're doing sendfile() from a UFS filesystem, the pages are just loaned to the network.
This means that on the Netflix Open Connect CDN, where we serve close to the hardware limits of the system, we simply cannot use ZFS for video data due to ZFS basically doubling the memory bandwidth requirements. Switching from UFS to ZFS would essentially cut the maximum performance of our servers in half.
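To make the zero-copy distinction concrete, here is a minimal sketch of the sendfile() pattern being described, using the Linux sendfile(2) signature for brevity (the FreeBSD variant Netflix uses takes different arguments); serve_file and connected_socket are placeholder names. Whether any copying happens underneath is up to the filesystem: pages already in the normal page cache can be loaned to the socket, while data held only in a separate cache like the ARC has to be copied first.

    /* Hand a file to the kernel for transmission over an already-connected
     * TCP socket. Error handling trimmed to keep the sketch short. */
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int serve_file(int connected_socket, const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return -1;
        }

        off_t offset = 0;
        while (offset < st.st_size) {
            /* The kernel moves the data; userspace never touches the bytes. */
            ssize_t sent = sendfile(connected_socket, fd, &offset,
                                    st.st_size - offset);
            if (sent <= 0)
                break;              /* error, or the peer went away */
        }

        close(fd);
        return offset == st.st_size ? 0 : -1;
    }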
Can you elaborate on the last paragraph? In what way doesn't ZFS "fit"? (I couldn't make it out from the first two paragraphs.) Where did btrfs fall short of your expectations? Why would you avoid ZFS on general-purpose machines if you deem it good enough for file servers?
Even on Solaris the ARC existed. ZFS replaces a lot of systems traditionally not directly related to a filesystem implementation.
For instance, using the `zfs` tool one doesn't just configure file system properties but can also control NFS exports (the `sharenfs` property), which traditionally was done by editing /etc/exports.
ZFS relies on Solaris (Unix) kernel primitives IIRC ... I remember hearing that to get ZFS to work with an OS you basically have to implement a good portion of the Solaris kernel interface as shims.
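As a rough illustration of what those shims amount to: ZFS code calls Solaris kernel primitives such as kmem_zalloc()/kmem_free() and mutex_enter()/mutex_exit(), and a port has to supply implementations of them on top of whatever the host actually provides (the SPL on Linux and FreeBSD, the 'solaris' module on NetBSD). The sketch below maps a couple of them onto plain userspace C just to show the shape; a real shim layer is far larger and sits on kernel APIs instead.

    /* Illustrative only: Solaris-style primitives ZFS expects, mapped onto
     * portable C. A real port maps them onto the host kernel's allocator
     * and locking primitives. */
    #include <pthread.h>
    #include <stdlib.h>

    typedef pthread_mutex_t kmutex_t;

    #define KM_SLEEP   0x0000   /* allocation may block */
    #define KM_NOSLEEP 0x0001   /* allocation must not block */

    void *kmem_zalloc(size_t size, int kmflag)
    {
        (void)kmflag;           /* sleep semantics ignored in this sketch */
        return calloc(1, size);
    }

    void kmem_free(void *buf, size_t size)
    {
        (void)size;             /* Solaris frees by (pointer, size) pairs */
        free(buf);
    }

    void mutex_enter(kmutex_t *mp) { pthread_mutex_lock(mp); }
    void mutex_exit(kmutex_t *mp)  { pthread_mutex_unlock(mp); }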
It doesn't currently have any GPU support (for example) - even for a pretty simple desktop, CPU rendering is rather incompatible with battery life or performance in a laptop form factor.
Innovation is wonderful, but it’s hard to believe this has enough users to flush out the challenging bugs. Maybe if it had some kind of correctness proof, but it just seems like there are way too many subtle bugs in file systems in general for me to try a new FS.
Ext4 and NTFS both have a 2^32-1 limit on number of files as well. Realistically, you never actually want to make tons of files, so I have a pretty hard time seeing this being an issue in practice.
Files in nested folders are primarily an abstraction for humans. They are a maximally flexible and customizable system. This has substantial costs (especially in environments with parallel work). As such, no one really has millions of pieces of fully separate, unstructured, hierarchical data. Once you have that much data, there is almost always additional structure that would be better represented in something like a database where you can actually express the invariants that you have.
Aren’t block sizes (and thus the minimum on-disk file size) normally around 4 kB? So the maximum number of 1-byte files would take up around 16 TB (2^32 files × 4 kB each), without adding any overhead. Drives that size are available these days.
Well they’d have to write their own driver anyway for one. If they were going to take an existing design and write a new driver, ZFS would be the better choice by far. Much longer and broader operational history and much better documentation.
And you might not get sued by Oracle! RedoxOS seems to use the MIT license while OpenZFS is under the CDDL. Given Oracle's litigious nature, they'd have to make sure none of their code looked like OpenZFS code; even better, make sure none of the developers had ever even looked at the ZFS code.
It's much better to hope that OpenZFS decides to create a RedoxOS implementation themselves than to try to make a clean-room ZFS implementation.
Fair enough, though you can't really understand how BTRFS works without reading the GPLed Linux source, while ZFS has some separate on-disk format documentation. Don't know that anyone would sue you though.
License is the obvious blocker, aside from all the technical issues[0]. Btrfs is GPL, RedoxOS is MIT, ZFS is CDDL. You can integrate CDDL into an MIT project without problems[1], but due to the viral nature of the GPL, integrating btrfs would have impacts on the rest of the project.
What I'm wondering is what about HAMMER2? It's under a copyfree license and it is developed for a microkernel operating system (DragonflyBSD). Seems like a natural fit.
[0] btrfs holds the distinction of being the only filesystem that has lost all of my data, and it managed to do it twice! Corrupt my drive once, shame on you. Corrupt my drive twice, can't corrupt my drive again.
[1] further explanation: The CDDL is basically "the GPL but it only applies to the files under the CDDL, rather than the whole project". So the code for ZFS would remain under the CDDL and it would have all the restrictions that come with that, but the rest of the code base can remain under MIT. This is why FreeBSD can have ZFS fully integrated whereas on Linux ZFS is an out-of-tree module.
> Corrupt my drive twice, can't corrupt my drive again.
Exact same drive? You might want to check that drive isn't silently corrupting data.
I still blame btrfs, something very similar happened to me.
I had a WD Green drive with a known flaw where it would just silently zero data on writes in some random situations. EXT4 worked fine on this drive for years (the filesystem was fine; my files had random zeroed sections). But btrfs just couldn't handle this situation and immediately got itself into an unrecoverable state; scrub and fsck just couldn't fix the issue.
In one way, I was better off. At least I now knew that drive had been silently corrupting data for years. But it destroyed my confidence in btrfs forever. Btrfs didn't actually lose any additional data for me; it was in RAID and the data was all still there, so it should have been able to recover itself.
But it simply couldn't. I had to manually use a hex editor to piece a few files back together (and restore many others from backup).
Even worse, when I talked to people on the #btrfs IRC channel, not only was nobody surprised that btrfs had borked itself due to bad hardware, but everyone recommended that a btrfs filesystem that had been borked could never be trusted again. Instead, the only way to get a trustworthy, clean, and canonical btrfs filesystem was to delete it and start from scratch (this time without the stupid faulty drive).
Basically, btrfs appears to be not fit for purpose. The entire point of such a filesystem is that it should be able to run in adverse environments (like faulty hardware) and be tolerant to errors. It should always be possible to repair such a filesystem back to a canonical state.
Last time I looked at DragonflyBSD, it was kind of an intermediate between a traditional kernel and a microkernel. There certainly was a lot more in the kernel as compared to systems built on e.g. L4.
There certainly is a continuum. I've always wanted to build a microkernel-ish system on top of Linux that only has userspace options for block devices, file systems and tcp/ip. It would be dog-slow but theoretically work.
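For what it's worth, the file-system part of that already has a hook on Linux in the form of FUSE (it's also how the old zfs-fuse port mentioned above ran). Below is a minimal read-only sketch against libfuse 3 that exposes a single file; the names and contents are placeholders, and it's meant only to show the shape of a userspace filesystem, not anything production-grade. Build with something like `gcc hellofs.c $(pkg-config fuse3 --cflags --libs)` and mount it on an empty directory.

    /* Tiny read-only FUSE filesystem: one directory containing one file. */
    #define FUSE_USE_VERSION 31
    #include <fuse.h>
    #include <sys/stat.h>
    #include <string.h>
    #include <errno.h>

    static const char *file_name = "hello";
    static const char *file_data = "hello from userspace\n";

    static int fs_getattr(const char *path, struct stat *st,
                          struct fuse_file_info *fi)
    {
        (void)fi;
        memset(st, 0, sizeof(*st));
        if (strcmp(path, "/") == 0) {
            st->st_mode = S_IFDIR | 0755;
            st->st_nlink = 2;
            return 0;
        }
        if (strcmp(path + 1, file_name) == 0) {
            st->st_mode = S_IFREG | 0444;
            st->st_nlink = 1;
            st->st_size = (off_t)strlen(file_data);
            return 0;
        }
        return -ENOENT;
    }

    static int fs_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                          off_t off, struct fuse_file_info *fi,
                          enum fuse_readdir_flags flags)
    {
        (void)off; (void)fi; (void)flags;
        if (strcmp(path, "/") != 0)
            return -ENOENT;
        fill(buf, ".", NULL, 0, 0);
        fill(buf, "..", NULL, 0, 0);
        fill(buf, file_name, NULL, 0, 0);
        return 0;
    }

    static int fs_read(const char *path, char *buf, size_t size, off_t off,
                       struct fuse_file_info *fi)
    {
        (void)fi;
        if (strcmp(path + 1, file_name) != 0)
            return -ENOENT;
        size_t len = strlen(file_data);
        if ((size_t)off >= len)
            return 0;                       /* read past EOF */
        if (off + size > len)
            size = len - (size_t)off;
        memcpy(buf, file_data + off, size);
        return (int)size;
    }

    static const struct fuse_operations fs_ops = {
        .getattr = fs_getattr,
        .readdir = fs_readdir,
        .read    = fs_read,
    };

    int main(int argc, char *argv[])
    {
        return fuse_main(argc, argv, &fs_ops, NULL);
    }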
You mean because the CDDL files would have to be licensed under GPL, and that's not compatible with the CDDL? I assume MIT-licensed files can be relicensed as GPL, which is why that mix is fine?
Yes, if ZFS (CDDL) was integrated into Linux (GPL) then the GPL would need to apply to the CDDL files, which causes a conflict because the CDDL is not compatible with the GPL.
This isn't a problem integrating MIT code into a GPL project, because MIT's requirements are a subset of the GPL's requirements, so the combined project being under the GPL is no problem. (Going the other way by integrating GPL code into an MIT project is technically also possible, but it would convert that project to a GPL project, so most MIT projects would be resistant to this.)
This isn't a problem combining MIT and CDDL because both lack the GPL's virality. They can happily coexist in the same project, leaving each other alone.
Anyone have an idea what this actually means and what problems they were having in practice?
It's great, of course, to provide correct information. But please do it without putdowns; you don't need them, and they acidify discussion.
I admire these projects & the teams for their tenacity.
Four bells! Damn the torpedoes.
I agree, can’t say “dead” but it is a Google project so it’s like being born with a terminal condition.
32-bit inodes? Why?
Other systems had to go through pains to migrate to 64-bit. Why not skip that?