
Linux and broken O_DIRECT

2004-10-29 03:56:49 PM
dist/configure.ac in BDB 4.2 contains the following comment:
# Linux has a broken O_DIRECT flag, but we allow people to override it from
# the command line.
In what way is Linux broken? Are there problems with metadata
updates, like on other systems? Is the brokenness specific to certain
kernel versions?
-
 

Re:Linux and broken O_DIRECT

Florian Weimer <fw@deneb.enyo.de> wrote in message news:<871xfiot32.fsf@deneb.enyo.de>...
Quote
# Linux has a broken O_DIRECT flag, but we allow people to override it from
# the command line.

In what way is Linux broken?

We've seen a variety of behaviors, including:
+ Systems with O_DIRECT in their include files, but on which
  the open system call will fail if O_DIRECT is specified,
+ Systems where the open call will succeed when the O_DIRECT
  flag is specified, but any subsequent read or write using
  the file descriptor returned by the open call will fail,
+ Systems where O_DIRECT worked with some filesystems but not
  with others,
+ Systems where buffers require specific alignment if they
  are to be written to a file descriptor for which O_DIRECT
  was specified; if a buffer isn't properly aligned the
  read/write call will fail (rather than the system falling
  back to a slower read/write).
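The first three behaviors in the list above can be told apart at run time with a small probe. What follows is only a sketch under stated assumptions, not what Berkeley DB's configure script or library does: the function `probe_o_direct`, the `odirect_status` codes, and the idea of using one value for both buffer alignment and transfer size are all made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc only exposes O_DIRECT with this set */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result codes, not part of Berkeley DB. */
enum odirect_status {
    ODIRECT_OK = 0,         /* open and one aligned write/read succeeded */
    ODIRECT_UNAVAILABLE,    /* the headers do not define O_DIRECT at all */
    ODIRECT_OPEN_FAILED,    /* open() rejected the flag (or the path) */
    ODIRECT_IO_FAILED,      /* open() worked, aligned I/O still failed */
    ODIRECT_ALLOC_FAILED    /* posix_memalign() failed (bad align or no memory) */
};

/*
 * Create `path`, write one block through O_DIRECT, read it back, and
 * remove the file again. `align` is used as both the buffer alignment
 * and the transfer size (an assumption of this sketch).
 */
enum odirect_status probe_o_direct(const char *path, size_t align)
{
    assert(path != NULL && align != 0);
#ifndef O_DIRECT
    return ODIRECT_UNAVAILABLE;
#else
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0)
        return ODIRECT_OPEN_FAILED;

    enum odirect_status st = ODIRECT_OK;
    void *buf = NULL;
    if (posix_memalign(&buf, align, align) != 0) {
        st = ODIRECT_ALLOC_FAILED;
    } else {
        memset(buf, 0xA5, align);
        /* A short transfer is treated as a failure here. */
        if (pwrite(fd, buf, align, 0) != (ssize_t)align ||
            pread(fd, buf, align, 0) != (ssize_t)align)
            st = ODIRECT_IO_FAILED;
        free(buf);
    }
    close(fd);
    unlink(path);
    return st;
#endif
}
```

Running something like `probe_o_direct("probe.tmp", 4096)` once in each directory that will hold database files would show whether the result depends on the filesystem. The 4096 is a guess at a safe alignment, not a value taken from this thread.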
Quote
Are there problems with metadata updates, like on other systems?
Is the brokenness specific to certain kernel versions?

I can't answer either of these questions; we never researched the
problems to that level.

In the Berkeley DB 4.3 release, we default to not using the O_DIRECT
flag. You can always override the default and configure O_DIRECT
explicitly, however, using:

    env db_cv_open_o_direct=yes ../dist/configure [args]

Regards,
--keith
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic@sleepycat.com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 www.sleepycat.com
-

Re:Linux and broken O_DIRECT

* Keith Bostic:
Quote
In what way is Linux broken?

We've seen a variety of behaviors, including:
[...]
+ Systems where buffers require specific alignment if they
  are to be written to a file descriptor for which O_DIRECT
  was specified; if a buffer isn't properly aligned the
  read/write call will fail (rather than the system falling
  back to a slower read/write).

Linux 2.6 requires that block boundaries in the user-space buffer do
not cross page boundaries. In practice, this means that the buffer
has to be page-aligned. I think it's possible to work around this
problem in the read/write routines at the expense of an extra copy.
Since the kernel caching algorithms are bypassed, it might still be a
win.
-