Commit Graph

67196 Commits

Author SHA1 Message Date
Peter Korsgaard
90ceb9644d serial: samsung.c: mark s3c24xx_serial_remove as __devexit
Mark the remove function as __devexit so it gets eliminated in
CONFIG_HOTPLUG=n builds.  Saves ~100 bytes.

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:32:24 -07:00
Alan Cox
94362fd7fb tty: fix some bogns in the serqt_usb2 driver
Remove the replicated urban legends from the comments and fix a couple of
other silly calls

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:32:24 -07:00
Alan Cox
a6540f731d ppp: Fix throttling bugs
The ppp layer goes around calling the unthrottle method from non sleeping
paths. This isn't safe because the unthrottle methods in the tty layer need
to be able to sleep (consider a USB dongle).

Until now this didn't show up because the ppp layer never actually throttled
a port so the unthrottle was always a no-op. Currently it's a mutex taking
path so warnings are spewed if the unthrottle occurs via certain paths.

Fix this by removing the unneccessary unthrottle calls.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:32:24 -07:00
Jiri Slaby
a115902f67 vt_ioctl: fix lock imbalance
Don't return from switch/case directly in vt_ioctl. Set ret and break
instead so that we unlock BKL.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:32:24 -07:00
Jiri Slaby
69ae59d7d8 pcmcia/cm4000: fix lock imbalance
Don't return from switch/case, break instead, so that we unlock BKL.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:32:24 -07:00
Jiri Slaby
eca4104426 n_r3964: fix lock imbalance
There is omitted BKunL in r3964_read.

Centralize the paths to one point with one unlock.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:32:23 -07:00
Roel Kluin
52e3632ea6 serial: fix off by one errors
In zs_console_putchar() occurs:

	if (zs_transmit_drain(zport, irq))
		write_zsdata(zport, ch);

However if in zs_transmit_drain() no empty Tx Buffer occurs, limit reaches
-1 => true, and the write still occurs.

This patch changes postfix to prefix decrements in this and similar
functions to prevent similar mistakes in the future.  This decreases the
iterations with one but the chosen loop count was arbitrary anyway.

In sunhv limit reaches -1, not 0, so the test is off by one.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:32:23 -07:00
Mike Frysinger
607c268ef9 serial: bfin_5xx: fix building as module when early printk is enabled
Since early printk only makes sense/works when the serial driver is built
into the kernel, disable the option for this driver when it is going to be
built as a module.  Otherwise we get build failures due to the ifdef
handling.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:32:23 -07:00
Mike Frysinger
9c529a3d76 serial: bfin_5xx: add missing spin_lock init
The Blackfin serial driver never initialized the spin_lock that is part of
the serial core structure, but we never noticed because spin_lock's are
rarely enabled on UP systems.  Yeah lockdep and friends.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:32:23 -07:00
Mike Frysinger
56578abfd1 bfin_jtag_comm: clean up printk usage
The original patch garned some feedback and a v2 was posted, but that
version seems to have been missed when merging the driver.

At any rate, this cleans up the printk usage as suggested by Jiri Slaby.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:32:23 -07:00
FUJITA Tomonori
dfa7c4d869 parport_pc: set properly the dma_mask for parport_pc device
parport_pc_probe_port() creates the own 'parport_pc' device if the
device argument is NULL. Then parport_pc_probe_port() doesn't
initialize the dma_mask and coherent_dma_mask of the device and calls
dma_alloc_coherent with it. dma_alloc_coherent fails because
dma_alloc_coherent() doesn't accept the uninitialized dma_mask:

http://lkml.org/lkml/2009/6/16/150

Long ago, X86_32 and X86_64 had the own dma_alloc_coherent
implementations; X86_32 accepted a device having dma_mask that is not
initialized however X86_64 didn't. When we merged them, we chose to
prohibit a device having dma_mask that is not initialized. I think
that it's good to require drivers to set up dma_mask (and
coherent_dma_mask) properly if the drivers want DMA.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reported-by: Malcom Blaney <malcolm.blaney@maptek.com.au>
Tested-by: Malcom Blaney <malcolm.blaney@maptek.com.au>
Cc: stable@kernel.org
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:29:00 -07:00
Jens Rottmann
e2434dc1c1 parport_pc: after superio probing restore original register values
CONFIG_PARPORT_PC_SUPERIO probes for various superio chips by writing
byte sequences to a set of different potential I/O ranges.  But the
probed ranges are not exclusive to parallel ports.  Some of our boards
just happen to have a watchdog in one of them.  Took us almost a week
to figure out why some distros reboot without warning after running
flawlessly for 3 hours.  For exactly 170 = 0xAA minutes, that is ...

Fixed by restoring original values after probing.  Also fixed too small
request_region() in detect_and_report_it87().

Signed-off-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: <stable@kernel.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:28:59 -07:00
Linus Torvalds
752a478751 Revert "char: moxa, prevent opening unavailable ports"
This reverts commit a90b037583, which
already got fixed as commit f0e8527726:
the same patch (trivial differences) got applied twice.

Requested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:24:43 -07:00
Jiri Slaby
129dd98194 fusion: mptsas, fix lock imbalance
Fix two typos in mptsas_not_responding_devices. It was mutex_lock instead
of unlock.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-22 08:54:14 -05:00
Sebastian Ott
181d95229b [S390] dasd: fix refcounting in dasd_change_state
To set a dasd online dasd_change_state is called twice. The first
cycle will schedule initial analysis of the device, set the rc to
-EAGAIN and will not touch the device state any more.
The initial analysis will in turn call dasd_change_state to increase
the state to the final DASD_STATE_ONLINE.

If the dasd_change_state on the second thread outruns the other one
both finish with the state set to DASD_STATE_ONLINE and the device
refcount will be decreased by 2.

Fix this by leaving dasd_change_state on rc == -EAGAIN so that the
refcount will always be decreased by 1.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:23 +02:00
Martin Schwidefsky
4f0076f77f [S390] driver_data access
Replace the remaining direct accesses to the driver_data pointer
with calls to the dev_get_drvdata() and dev_set_drvdata() functions.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:23 +02:00
Stefan Haberland
e6125fba81 [S390] dasd_pm: fix stop flag handling
The stop flags are handled in the generic restore function so the
stop flag is removed also for FBA and DIAG devices.

Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:22 +02:00
Felix Beck
772f54720a [S390] ap/zcrypt: Suspend/Resume ap bus and zcrypt
Add Suspend/Resume support to ap bus and zcrypt. All enhancements are
done in the ap bus. No changes in the crypto card specific part are
necessary.

Signed-off-by: Felix Beck <felix.beck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:21 +02:00
Jan Glauber
6618241b47 [S390] qdio: Sanitize do_QDIO sanity checks
Remove unneeded sanity checks from do_QDIO since this is the hot path.
Change the type of bufnr and count to unsigned int so the check for the
maximum value works.

Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:21 +02:00
Jan Glauber
f0a0b15e0f [S390] qdio: leave inbound SBALs primed
It is not required to change the state of primed SBALs. Leaving them
primed saves a SQBS instruction under z/VM.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:21 +02:00
Jan Glauber
cf9a031c2c [S390] qdio: merge AI tasklet into interrupt handler
Since the adapter interrupt tasklet only schedules the queue tasklets
and contains no code that requires serialization in can be merged
with the adapter interrupt handler. That possibly safes some CPU
cycles.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:20 +02:00
Jan Glauber
36e3e72120 [S390] qdio: extract all primed SBALs at once
For devices without QIOASSIST primed SBALS were extracted in a loop.
Remove the loop since get_buf_states can already return more than
one primed SBAL.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:20 +02:00
Jan Glauber
9a2c160a8c [S390] qdio: fix check for running under z/VM
The check whether qdio runs under z/VM was incorrect since SIGA-Sync is not
set if the device runs with QIOASSIST. Use MACHINE_IS_VM instead to prevent
polling under z/VM.

Merge qdio_inbound_q_done and tiqdio_is_inbound_q_done.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:20 +02:00
Jan Glauber
60b5df2f12 [S390] qdio: move adapter interrupt tasklet code
Move the adapter interrupt tasklet function to the qdio main code
since all the functions used by the tasklet are located there.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:19 +02:00
Michael Holzheu
f3dfa86caa [S390] Use del_timer instead of del_timer_sync
When syncing the sclp console queue, we call del_timer_sync() while holding
the "sclp_con_lock" spinlock. This lock is also taken in the timer function
"sclp_console_timeout". Therefore the sync version of del_timer() cannot be
used here. Because the synchronous deletion of the timer is only needed
in the suspend callback and in that case only one CPU is remaining and
therefore it is not possible that the timer function is running in parallel,
we can safely use del_timer() instead of del_timer_sync().

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:19 +02:00
Heiko Carstens
5c0792f692 [S390] vt220 console: convert from bootmem to slab
The slab allocator is earlier available so convert the
bootmem allocations to slab/gfp allocations.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:18 +02:00
Heiko Carstens
4c8f4794b6 [S390] sclp console: convert from bootmem to slab
The slab allocator is earlier available so convert the
bootmem allocations to slab/gfp allocations.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:17 +02:00
Heiko Carstens
33403dcfcd [S390] 3270 console: convert from bootmem to slab
The slab allocator is earlier available so convert the
bootmem allocations to slab/gfp allocations.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:17 +02:00
Heiko Carstens
6d56eee2c0 [S390] 3215 console: convert from bootmem to slab
The slab allocator is earlier available so convert the
bootmem allocations to slab/gfp allocations.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22 12:08:17 +02:00
Oskar Schirmer
8b0215aa5b s6gmac: xtensa s6000 on-chip ethernet driver
The s6000 on-chip MAC supports 10/100/1000Mbit and is connected to an
external PHY via MII or RGMII interface.

[jw@emlix.com: don't use device->bus_id directly]
Signed-off-by: Oskar Schirmer <os@emlix.com>
Signed-off-by: Daniel Glockner <dg@emlix.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Zankel <chris@zankel.net>
2009-06-22 02:37:34 -07:00
Kiyoshi Ueda
f40c67f0f7 dm mpath: change to be request based
This patch converts dm-multipath target to request-based from bio-based.

Basically, the patch just converts the I/O unit from struct bio
to struct request.
In the course of the conversion, it also changes the I/O queueing
mechanism.  The change in the I/O queueing is described in details
as follows.

I/O queueing mechanism change
-----------------------------
In I/O submission, map_io(), there is no mechanism change from
bio-based, since the clone request is ready for retry as it is.
However, in I/O complition, do_end_io(), there is a mechanism change
from bio-based, since the clone request is not ready for retry.

In do_end_io() of bio-based, the clone bio has all needed memory
for resubmission.  So the target driver can queue it and resubmit
it later without memory allocations.
The mechanism has almost no overhead.

On the other hand, in do_end_io() of request-based, the clone request
doesn't have clone bios, so the target driver can't resubmit it
as it is.  To resubmit the clone request, memory allocation for
clone bios is needed, and it takes some overheads.
To avoid the overheads just for queueing, the target driver doesn't
queue the clone request inside itself.
Instead, the target driver asks dm core for queueing and remapping
the original request of the clone request, since the overhead for
queueing is just a freeing memory for the clone request.

As a result, the target driver doesn't need to record/restore
the information of the original request for resubmitting
the clone request.  So dm_bio_details in dm_mpath_io is removed.

multipath_busy()
---------------------
The target driver returns "busy", only when the following case:
  o The target driver will map I/Os, if map() function is called
  and
  o The mapped I/Os will wait on underlying device's queue due to
    their congestions, if map() function is called now.

In other cases, the target driver doesn't return "busy".
Otherwise, dm core will keep the I/Os and the target driver can't
do what it wants.
(e.g. the target driver can't map I/Os now, so wants to kill I/Os.)

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:37 +01:00
Kiyoshi Ueda
523d9297d4 dm: disable interrupt when taking map_lock
This patch disables interrupt when taking map_lock to avoid
lockdep warnings in request-based dm.

request-based dm takes map_lock after taking queue_lock with
disabling interrupt:
  spin_lock_irqsave(queue_lock)
  q->request_fn() == dm_request_fn()
    => dm_get_table()
         => read_lock(map_lock)
while queue_lock could be (but isn't) taken in interrupt context.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:37 +01:00
Kiyoshi Ueda
5d67aa2366 dm: do not set QUEUE_ORDERED_DRAIN if request based
Request-based dm doesn't have barrier support yet.
So we need to set QUEUE_ORDERED_DRAIN only for bio-based dm.
Since the device type is decided at the first table loading time,
the flag set is deferred until then.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:36 +01:00
Kiyoshi Ueda
e6ee8c0b76 dm: enable request based option
This patch enables request-based dm.

o Request-based dm and bio-based dm coexist, since there are
  some target drivers which are more fitting to bio-based dm.
  Also, there are other bio-based devices in the kernel
  (e.g. md, loop).
  Since bio-based device can't receive struct request,
  there are some limitations on device stacking between
  bio-based and request-based.

                     type of underlying device
                   bio-based      request-based
   ----------------------------------------------
    bio-based         OK                OK
    request-based     --                OK

  The device type is recognized by the queue flag in the kernel,
  so dm follows that.

o The type of a dm device is decided at the first table binding time.
  Once the type of a dm device is decided, the type can't be changed.

o Mempool allocations are deferred to at the table loading time, since
  mempools for request-based dm are different from those for bio-based
  dm and needed mempool type is fixed by the type of table.

o Currently, request-based dm supports only tables that have a single
  target.  To support multiple targets, we need to support request
  splitting or prevent bio/request from spanning multiple targets.
  The former needs lots of changes in the block layer, and the latter
  needs that all target drivers support merge() function.
  Both will take a time.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:36 +01:00
Kiyoshi Ueda
cec47e3d4a dm: prepare for request based option
This patch adds core functions for request-based dm.

When struct mapped device (md) is initialized, md->queue has
an I/O scheduler and the following functions are used for
request-based dm as the queue functions:
    make_request_fn: dm_make_request()
    pref_fn:         dm_prep_fn()
    request_fn:      dm_request_fn()
    softirq_done_fn: dm_softirq_done()
    lld_busy_fn:     dm_lld_busy()
Actual initializations are done in another patch (PATCH 2).

Below is a brief summary of how request-based dm behaves, including:
  - making request from bio
  - cloning, mapping and dispatching request
  - completing request and bio
  - suspending md
  - resuming md

  bio to request
  ==============
  md->queue->make_request_fn() (dm_make_request()) calls __make_request()
  for a bio submitted to the md.
  Then, the bio is kept in the queue as a new request or merged into
  another request in the queue if possible.

  Cloning and Mapping
  ===================
  Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()),
  when requests are dispatched after they are sorted by the I/O scheduler.

  dm_request_fn() checks busy state of underlying devices using
  target's busy() function and stops dispatching requests to keep them
  on the dm device's queue if busy.
  It helps better I/O merging, since no merge is done for a request
  once it is dispatched to underlying devices.

  Actual cloning and mapping are done in dm_prep_fn() and map_request()
  called from dm_request_fn().
  dm_prep_fn() clones not only request but also bios of the request
  so that dm can hold bio completion in error cases and prevent
  the bio submitter from noticing the error.
  (See the "Completion" section below for details.)

  After the cloning, the clone is mapped by target's map_rq() function
    and inserted to underlying device's queue using
    blk_insert_cloned_request().

  Completion
  ==========
  Request completion can be hooked by rq->end_io(), but then, all bios
  in the request will have been completed even error cases, and the bio
  submitter will have noticed the error.
  To prevent the bio completion in error cases, request-based dm clones
  both bio and request and hooks both bio->bi_end_io() and rq->end_io():
      bio->bi_end_io(): end_clone_bio()
      rq->end_io():     end_clone_request()

  Summary of the request completion flow is below:
  blk_end_request() for a clone request
    => blk_update_request()
       => bio->bi_end_io() == end_clone_bio() for each clone bio
          => Free the clone bio
          => Success: Complete the original bio (blk_update_request())
             Error:   Don't complete the original bio
    => blk_finish_request()
       => rq->end_io() == end_clone_request()
          => blk_complete_request()
             => dm_softirq_done()
                => Free the clone request
                => Success: Complete the original request (blk_end_request())
                   Error:   Requeue the original request

  end_clone_bio() completes the original request on the size of
  the original bio in successful cases.
  Even if all bios in the original request are completed by that
  completion, the original request must not be completed yet to keep
  the ordering of request completion for the stacking.
  So end_clone_bio() uses blk_update_request() instead of
  blk_end_request().
  In error cases, end_clone_bio() doesn't complete the original bio.
  It just frees the cloned bio and gives over the error handling to
  end_clone_request().

  end_clone_request(), which is called with queue lock held, completes
  the clone request and the original request in a softirq context
  (dm_softirq_done()), which has no queue lock, to avoid a deadlock
  issue on submission of another request during the completion:
      - The submitted request may be mapped to the same device
      - Request submission requires queue lock, but the queue lock
        has been held by itself and it doesn't know that

  The clone request has no clone bio when dm_softirq_done() is called.
  So target drivers can't resubmit it again even error cases.
  Instead, they can ask dm core for requeueing and remapping
  the original request in that cases.

  suspend
  =======
  Request-based dm uses stopping md->queue as suspend of the md.
  For noflush suspend, just stops md->queue.

  For flush suspend, inserts a marker request to the tail of md->queue.
  And dispatches all requests in md->queue until the marker comes to
  the front of md->queue.  Then, stops dispatching request and waits
  for the all dispatched requests to complete.
  After that, completes the marker request, stops md->queue and
  wake up the waiter on the suspend queue, md->wait.

  resume
  ======
  Starts md->queue.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:35 +01:00
Jonthan Brassow
f5db4af466 dm raid1: add userspace log
This patch contains a device-mapper mirror log module that forwards
requests to userspace for processing.

The structures used for communication between kernel and userspace are
located in include/linux/dm-log-userspace.h.  Due to the frequency,
diversity, and 2-way communication nature of the exchanges between
kernel and userspace, 'connector' was chosen as the interface for
communication.

The first log implementations written in userspace - "clustered-disk"
and "clustered-core" - support clustered shared storage.   A userspace
daemon (in the LVM2 source code repository) uses openAIS/corosync to
process requests in an ordered fashion with the rest of the nodes in the
cluster so as to prevent log state corruption.  Other implementations
with no association to LVM or openAIS/corosync, are certainly possible.

(Imagine if two machines are writing to the same region of a mirror.
They would both mark the region dirty, but you need a cluster-aware
entity that can handle properly marking the region clean when they are
done.  Otherwise, you might clear the region when the first machine is
done, not the second.)

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:35 +01:00
Mike Snitzer
754c5fc7eb dm: calculate queue limits during resume not load
Currently, device-mapper maintains a separate instance of 'struct
queue_limits' for each table of each device.  When the configuration of
a device is to be changed, first its table is loaded and this structure
is populated, then the device is 'resumed' and the calculated
queue_limits are applied.

This places restrictions on how userspace may process related devices,
where it is often advantageous to 'load' tables for several devices
at once before 'resuming' them together.  As the new queue_limits
only take effect after the 'resume', if they are changing and one
device uses another, the latter must be 'resumed' before the former
may be 'loaded'.

This patch moves the calculation of these queue_limits out of
the 'load' operation into 'resume'.  Since we are no longer
pre-calculating this struct, we no longer need to maintain copies
within our dm structs.

dm_set_device_limits() now passes the 'start' of the device's
data area (aka pe_start) as the 'offset' to blk_stack_limits().

init_valid_queue_limits() is replaced by blk_set_default_limits().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: martin.petersen@oracle.com
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:34 +01:00
Mike Snitzer
18d8594dd9 dm log: fix create_log_context to use logical_block_size of log device
create_log_context() must use the logical_block_size from the log disk,
where the I/O happens, not the target's logical_block_size.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:33 +01:00
Mike Snitzer
af4874e03e dm target:s introduce iterate devices fn
Add .iterate_devices to 'struct target_type' to allow a function to be
called for all devices in a DM target.  Implemented it for all targets
except those in dm-snap.c (origin and snapshot).

(The raid1 version number jumps to 1.12 because we originally reserved
1.1 to 1.11 for 'block_on_error' but ended up using 'handle_errors'
instead.)

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: martin.petersen@oracle.com
2009-06-22 10:12:33 +01:00
Mike Snitzer
1197764e40 dm table: establish queue limits by copying table limits
Copy the table's queue_limits to the DM device's request_queue.  This
properly initializes the queue's topology limits and also avoids having
to track the evolution of 'struct queue_limits' in
dm_table_set_restrictions()

Also fixes a bug that was introduced in dm_table_set_restrictions() via
commit ae03bf639a.  In addition to
establishing 'bounce_pfn' in the queue's limits blk_queue_bounce_limit()
also performs an allocation to setup the ISA DMA pool.  This allocation
resulted in "sleeping function called from invalid context" when called
from dm_table_set_restrictions().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:32 +01:00
Mike Snitzer
5ab97588fb dm table: replace struct io_restrictions with struct queue_limits
Use blk_stack_limits() to stack block limits (including topology) rather
than duplicate the equivalent within Device Mapper.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:32 +01:00
Mike Snitzer
be6d4305db dm table: validate device logical_block_size
Impose necessary and sufficient conditions on a devices's table such
that any incoming bio which respects its logical_block_size can be
processed successfully.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:31 +01:00
Mike Snitzer
02acc3a4fa dm table: ensure targets are aligned to logical_block_size
Ensure I/O is aligned to the logical block size of target devices.

Rename check_device_area() to device_area_is_valid() for clarity and
establish the device limits including the logical block size prior to
calling it.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:30 +01:00
Milan Broz
60935eb21d dm ioctl: support cookies for udev
Add support for passing a 32 bit "cookie" into the kernel with the
DM_SUSPEND, DM_DEV_RENAME and DM_DEV_REMOVE ioctls.  The (unsigned)
value of this cookie is returned to userspace alongside the uevents
issued by these ioctls in the variable DM_COOKIE.

This means the userspace process issuing these ioctls can be notified
by udev after udev has completed any actions triggered.

To minimise the interface extension, we pass the cookie into the
kernel in the event_nr field which is otherwise unused when calling
these ioctls.  Incrementing the version number allows userspace to
determine in advance whether or not the kernel supports the cookie.
If the kernel does support this but userspace does not, there should
be no impact as the new variable will just get ignored.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:30 +01:00
Peter Rajnoha
486d220fe4 dm: sysfs add suspended attribute
Add a file named 'suspended' to each device-mapper device directory in
sysfs.  It holds the value 1 while the device is suspended.  Otherwise
it holds 0.

Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:29 +01:00
Jonthan Brassow
1b6da75459 dm table: improve warning message when devices not freed before destruction
Report any devices forgotten to be freed before a table is destroyed.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:29 +01:00
Kiyoshi Ueda
f392ba889b dm mpath: add service time load balancer
This patch adds a service time oriented dynamic load balancer,
dm-service-time, which selects the path with the shortest estimated
service time for the incoming I/O.
The service time is estimated by dividing the in-flight I/O size
by a performance value of each path.

The performance value can be given as a table argument at the table
loading time.  If no performance value is given, all paths are
considered equal.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:28 +01:00
Kiyoshi Ueda
fd5e033908 dm mpath: add queue length load balancer
This patch adds a dynamic load balancer, dm-queue-length, which
balances the number of in-flight I/Os across the paths.

The code is based on the patch posted by Stefan Bader:
https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:27 +01:00
Kiyoshi Ueda
02ab823fd1 dm mpath: add start_io and nr_bytes to path selectors
This patch makes two additions to the dm path selector interface for
dynamic load balancers:
  o a new hook, start_io()
  o a new parameter 'nr_bytes' to select_path()/start_io()/end_io()
    to pass the size of the I/O

start_io() is called when a target driver actually submits I/O
to the selected path.
Path selectors can use it to start accounting of the I/O.
(e.g. counting the number of in-flight I/Os.)
The start_io hook is based on the patch posted by Stefan Bader:
https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html

nr_bytes, the size of the I/O, is so path selectors can take the
size of the I/O into account when deciding which path to use.
dm-service-time uses it to estimate service time, for example.
(Added the nr_bytes member to dm_mpath_io instead of using existing
 details.bi_size, since request-based dm patch deletes it.)

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:27 +01:00
Mikulas Patocka
2bd0234525 dm snapshot: use barrier when writing exception store
Send barrier requests when updating the exception area.

Exception area updates need to be ordered w.r.t. data writes, so that
the writes are not reordered in hardware disk cache.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:26 +01:00