android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Marc Zyngier	31d9d9b6d8	genirq: Add support for per-cpu dev_id interrupts The ARM GIC interrupt controller offers per CPU interrupts (PPIs), which are usually used to connect local timers to each core. Each CPU has its own private interface to the GIC, and only sees the PPIs that are directly connect to it. While these timers are separate devices and have a separate interrupt line to a core, they all use the same IRQ number. For these devices, request_irq() is not the right API as it assumes that an IRQ number is visible by a number of CPUs (through the affinity setting), but makes it very awkward to express that an IRQ number can be handled by all CPUs, and yet be a different interrupt line on each CPU, requiring a different dev_id cookie to be passed back to the handler. The _percpu_irq() functions is designed to overcome these limitations, by providing a per-cpu dev_id vector: int request_percpu_irq(unsigned int irq, irq_handler_t handler, const char devname, void __percpu percpu_dev_id); void free_percpu_irq(unsigned int, void __percpu ); int setup_percpu_irq(unsigned int irq, struct irqaction new); void remove_percpu_irq(unsigned int irq, struct irqaction act); void enable_percpu_irq(unsigned int irq); void disable_percpu_irq(unsigned int irq); The API has a number of limitations: - no interrupt sharing - no threading - common handler across all the CPUs Once the interrupt is requested using setup_percpu_irq() or request_percpu_irq(), it must be enabled by each core that wishes its local interrupt to be delivered. Based on an initial patch by Thomas Gleixner. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1316793788-14500-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-10-03 15:35:26 +02:00
Wu Fengguang	143dfe8611	writeback: IO-less balance_dirty_pages() As proposed by Chris, Dave and Jan, don't start foreground writeback IO inside balance_dirty_pages(). Instead, simply let it idle sleep for some time to throttle the dirtying task. In the mean while, kick off the per-bdi flusher thread to do background writeback IO. RATIONALS ========= - disk seeks on concurrent writeback of multiple inodes (Dave Chinner) If every thread doing writes and being throttled start foreground writeback, it leads to N IO submitters from at least N different inodes at the same time, end up with N different sets of IO being issued with potentially zero locality to each other, resulting in much lower elevator sort/merge efficiency and hence we seek the disk all over the place to service the different sets of IO. OTOH, if there is only one submission thread, it doesn't jump between inodes in the same way when congestion clears - it keeps writing to the same inode, resulting in large related chunks of sequential IOs being issued to the disk. This is more efficient than the above foreground writeback because the elevator works better and the disk seeks less. - lock contention and cache bouncing on concurrent IO submitters (Dave Chinner) With this patchset, the fs_mark benchmark on a 12-drive software RAID0 goes from CPU bound to IO bound, freeing "3-4 CPUs worth of spinlock contention". * "CPU usage has dropped by ~55%", "it certainly appears that most of the CPU time saving comes from the removal of contention on the inode_wb_list_lock" (IMHO at least 10% comes from the reduction of cacheline bouncing, because the new code is able to call much less frequently into balance_dirty_pages() and hence access the global page states) * the user space "App overhead" is reduced by 20%, by avoiding the cacheline pollution by the complex writeback code path * "for a ~5% throughput reduction", "the number of write IOs have dropped by ~25%", and the elapsed time reduced from 41:42.17 to 40:53.23. * On a simple test of 100 dd, it reduces the CPU %system time from 30% to 3%, and improves IO throughput from 38MB/s to 42MB/s. - IO size too small for fast arrays and too large for slow USB sticks The write_chunk used by current balance_dirty_pages() cannot be directly set to some large value (eg. 128MB) for better IO efficiency. Because it could lead to more than 1 second user perceivable stalls. Even the current 4MB write size may be too large for slow USB sticks. The fact that balance_dirty_pages() starts IO on itself couples the IO size to wait time, which makes it hard to do suitable IO size while keeping the wait time under control. Now it's possible to increase writeback chunk size proportional to the disk bandwidth. In a simple test of 50 dd's on XFS, 1-HDD, 3GB ram, the larger writeback size dramatically reduces the seek count to 1/10 (far beyond my expectation) and improves the write throughput by 24%. - long block time in balance_dirty_pages() hurts desktop responsiveness Many of us may have the experience: it often takes a couple of seconds or even long time to stop a heavy writing dd/cp/tar command with Ctrl-C or "kill -9". - IO pipeline broken by bumpy write() progress There are a broad class of "loop {read(buf); write(buf);}" applications whose read() pipeline will be under-utilized or even come to a stop if the write()s have long latencies _or_ don't progress in a constant rate. The current threshold based throttling inherently transfers the large low level IO completion fluctuations to bumpy application write()s, and further deteriorates with increasing number of dirtiers and/or bdi's. For example, when doing 50 dd's + 1 remote rsync to an XFS partition, the rsync progresses very bumpy in legacy kernel, and throughput is improved by 67% by this patchset. (plus the larger write chunk size, it will be 93% speedup). The new rate based throttling can support 1000+ dd's with excellent smoothness, low latency and low overheads. For the above reasons, it's much better to do IO-less and low latency pauses in balance_dirty_pages(). Jan Kara, Dave Chinner and me explored the scheme to let balance_dirty_pages() wait for enough writeback IO completions to safeguard the dirty limit. However it's found to have two problems: - in large NUMA systems, the per-cpu counters may have big accounting errors, leading to big throttle wait time and jitters. - NFS may kill large amount of unstable pages with one single COMMIT. Because NFS server serves COMMIT with expensive fsync() IOs, it is desirable to delay and reduce the number of COMMITs. So it's not likely to optimize away such kind of bursty IO completions, and the resulted large (and tiny) stall times in IO completion based throttling. So here is a pause time oriented approach, which tries to control the pause time in each balance_dirty_pages() invocations, by controlling the number of pages dirtied before calling balance_dirty_pages(), for smooth and efficient dirty throttling: - avoid useless (eg. zero pause time) balance_dirty_pages() calls - avoid too small pause time (less than 4ms, which burns CPU power) - avoid too large pause time (more than 200ms, which hurts responsiveness) - avoid big fluctuations of pause times It can control pause times at will. The default policy (in a followup patch) will be to do ~10ms pauses in 1-dd case, and increase to ~100ms in 1000-dd case. BEHAVIOR CHANGE =============== (1) dirty threshold Users will notice that the applications will get throttled once crossing the global (background + dirty)/2=15% threshold, and then balanced around 17.5%. Before patch, the behavior is to just throttle it at 20% dirtyable memory in 1-dd case. Since the task will be soft throttled earlier than before, it may be perceived by end users as performance "slow down" if his application happens to dirty more than 15% dirtyable memory. (2) smoothness/responsiveness Users will notice a more responsive system during heavy writeback. "killall dd" will take effect instantly. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-10-03 21:08:57 +08:00
Wu Fengguang	9d823e8f6b	writeback: per task dirty rate limit Add two fields to task_struct. 1) account dirtied pages in the individual tasks, for accuracy 2) per-task balance_dirty_pages() call intervals, for flexibility The balance_dirty_pages() call interval (ie. nr_dirtied_pause) will scale near-sqrt to the safety gap between dirty pages and threshold. The main problem of per-task nr_dirtied is, if 1k+ tasks start dirtying pages at exactly the same time, each task will be assigned a large initial nr_dirtied_pause, so that the dirty threshold will be exceeded long before each task reached its nr_dirtied_pause and hence call balance_dirty_pages(). The solution is to watch for the number of pages dirtied on each CPU in between the calls into balance_dirty_pages(). If it exceeds ratelimit_pages (3% dirty threshold), force call balance_dirty_pages() for a chance to set bdi->dirty_exceeded. In normal situations, this safeguarding condition is not expected to trigger at all. On the sqrt in dirty_poll_interval(): It will serve as an initial guess when dirty pages are still in the freerun area. When dirty pages are floating inside the dirty control scope [freerun, limit], a followup patch will use some refined dirty poll interval to get the desired pause time. thresh-dirty (MB) sqrt 1 16 2 22 4 32 8 45 16 64 32 90 64 128 128 181 256 256 512 362 1024 512 The above table means, given 1MB (or 1GB) gap and the dd tasks polling balance_dirty_pages() on every 16 (or 512) pages, the dirty limit won't be exceeded as long as there are less than 16 (or 512) concurrent dd's. So sqrt naturally leads to less overheads and more safe concurrent tasks for large memory servers, which have large (thresh-freerun) gaps. peter: keep the per-CPU ratelimit for safeguarding the 1k+ tasks case CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Andrea Righi <andrea@betterlinux.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-10-03 21:08:57 +08:00
Wu Fengguang	7381131cbc	writeback: stabilize bdi->dirty_ratelimit There are some imperfections in balanced_dirty_ratelimit. 1) large fluctuations The dirty_rate used for computing balanced_dirty_ratelimit is merely averaged in the past 200ms (very small comparing to the 3s estimation period for write_bw), which makes rather dispersed distribution of balanced_dirty_ratelimit. It's pretty hard to average out the singular points by increasing the estimation period. Considering that the averaging technique will introduce very undesirable time lags, I give it up totally. (btw, the 3s write_bw averaging time lag is much more acceptable because its impact is one-way and therefore won't lead to oscillations.) The more practical way is filtering -- most singular balanced_dirty_ratelimit points can be filtered out by remembering some prev_balanced_rate and prev_prev_balanced_rate. However the more reliable way is to guard balanced_dirty_ratelimit with task_ratelimit. 2) due to truncates and fs redirties, the (write_bw <=> dirty_rate) match could become unbalanced, which may lead to large systematical errors in balanced_dirty_ratelimit. The truncates, due to its possibly bumpy nature, can hardly be compensated smoothly. So let's face it. When some over-estimated balanced_dirty_ratelimit brings dirty_ratelimit high, dirty pages will go higher than the setpoint. task_ratelimit will in turn become lower than dirty_ratelimit. So if we consider both balanced_dirty_ratelimit and task_ratelimit and update dirty_ratelimit only when they are on the same side of dirty_ratelimit, the systematical errors in balanced_dirty_ratelimit won't be able to bring dirty_ratelimit far away. The balanced_dirty_ratelimit estimation may also be inaccurate near @limit or @freerun, however is less an issue. 3) since we ultimately want to - keep the fluctuations of task ratelimit as small as possible - keep the dirty pages around the setpoint as long time as possible the update policy used for (2) also serves the above goals nicely: if for some reason the dirty pages are high (task_ratelimit < dirty_ratelimit), and dirty_ratelimit is low (dirty_ratelimit < balanced_dirty_ratelimit), there is no point to bring up dirty_ratelimit in a hurry only to hurt both the above two goals. So, we make use of task_ratelimit to limit the update of dirty_ratelimit in two ways: 1) avoid changing dirty rate when it's against the position control target (the adjusted rate will slow down the progress of dirty pages going back to setpoint). 2) limit the step size. task_ratelimit is changing values step by step, leaving a consistent trace comparing to the randomly jumping balanced_dirty_ratelimit. task_ratelimit also has the nice smaller errors in stable state and typically larger errors when there are big errors in rate. So it's a pretty good limiting factor for the step size of dirty_ratelimit. Note that bdi->dirty_ratelimit is always tracking balanced_dirty_ratelimit. task_ratelimit is merely used as a limiting factor. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-10-03 21:08:57 +08:00
Wu Fengguang	be3ffa2764	writeback: dirty rate control It's all about bdi->dirty_ratelimit, which aims to be (write_bw / N) when there are N dd tasks. On write() syscall, use bdi->dirty_ratelimit ============================================ balance_dirty_pages(pages_dirtied) { task_ratelimit = bdi->dirty_ratelimit * bdi_position_ratio(); pause = pages_dirtied / task_ratelimit; sleep(pause); } On every 200ms, update bdi->dirty_ratelimit =========================================== bdi_update_dirty_ratelimit() { task_ratelimit = bdi->dirty_ratelimit * bdi_position_ratio(); balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate; bdi->dirty_ratelimit = balanced_dirty_ratelimit } Estimation of balanced bdi->dirty_ratelimit =========================================== balanced task_ratelimit ----------------------- balance_dirty_pages() needs to throttle tasks dirtying pages such that the total amount of dirty pages stays below the specified dirty limit in order to avoid memory deadlocks. Furthermore we desire fairness in that tasks get throttled proportionally to the amount of pages they dirty. IOW we want to throttle tasks such that we match the dirty rate to the writeout bandwidth, this yields a stable amount of dirty pages: dirty_rate == write_bw (1) The fairness requirement gives us: task_ratelimit = balanced_dirty_ratelimit == write_bw / N (2) where N is the number of dd tasks. We don't know N beforehand, but still can estimate balanced_dirty_ratelimit within 200ms. Start by throttling each dd task at rate task_ratelimit = task_ratelimit_0 (3) (any non-zero initial value is OK) After 200ms, we measured dirty_rate = # of pages dirtied by all dd's / 200ms write_bw = # of pages written to the disk / 200ms For the aggressive dd dirtiers, the equality holds dirty_rate == N * task_rate == N * task_ratelimit_0 (4) Or task_ratelimit_0 == dirty_rate / N (5) Now we conclude that the balanced task ratelimit can be estimated by write_bw balanced_dirty_ratelimit = task_ratelimit_0 * ---------- (6) dirty_rate Because with (4) and (5) we can get the desired equality (1): write_bw balanced_dirty_ratelimit == (dirty_rate / N) * ---------- dirty_rate == write_bw / N Then using the balanced task ratelimit we can compute task pause times like: task_pause = task->nr_dirtied / task_ratelimit task_ratelimit with position control ------------------------------------ However, while the above gives us means of matching the dirty rate to the writeout bandwidth, it at best provides us with a stable dirty page count (assuming a static system). In order to control the dirty page count such that it is high enough to provide performance, but does not exceed the specified limit we need another control. The dirty position control works by extending (2) to task_ratelimit = balanced_dirty_ratelimit * pos_ratio (7) where pos_ratio is a negative feedback function that subjects to 1) f(setpoint) = 1.0 2) df/dx < 0 That is, if the dirty pages are ABOVE the setpoint, we throttle each task a bit more HEAVY than balanced_dirty_ratelimit, so that the dirty pages are created less fast than they are cleaned, thus DROP to the setpoints (and the reverse). Based on (7) and the assumption that both dirty_ratelimit and pos_ratio remains CONSTANT for the past 200ms, we get task_ratelimit_0 = balanced_dirty_ratelimit * pos_ratio (8) Putting (8) into (6), we get the formula used in bdi_update_dirty_ratelimit(): write_bw balanced_dirty_ratelimit = pos_ratio ---------- (9) dirty_rate Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-10-03 21:08:56 +08:00
Wu Fengguang	af6a311384	writeback: add bg_threshold parameter to __bdi_update_bandwidth() No behavior change. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-10-03 21:08:56 +08:00
Wu Fengguang	c8e28ce049	writeback: account per-bdi accumulated dirtied pages Introduce the BDI_DIRTIED counter. It will be used for estimating the bdi's dirty bandwidth. CC: Jan Kara <jack@suse.cz> CC: Michael Rubin <mrubin@google.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-10-03 21:08:56 +08:00
Linus Walleij	b1e3be0647	clocksource: fixup ux500 build problems Based on a patch from Arnd Bergmann this fixes up the build problem of assigning a non-existing global when the ux500 PRCMU timer is not linked in by passing its base address to the init function. We also add a missing <linux/errno.h> inclusion and staticize the dummy function. Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>	2011-10-03 09:34:16 +02:00
Dan Williams	ac013ed1cb	[SCSI] isci: export phy events via ->lldd_control_phy() Allow the sas-transport-class to update events for local phys via a new PHY_FUNC_GET_EVENTS command to ->lldd_control_phy(). Fixup drivers that are not prepared for new enum phy_func values, and unify ->lldd_control_phy() error codes. These are the SAS defined phy events that are reported in a smp-report-phy-error-log command: * /sys/class/sas_phy/<phyX>/invalid_dword_count * /sys/class/sas_phy/<phyX>/running_disparity_error_count * /sys/class/sas_phy/<phyX>/loss_of_dword_sync_count * /sys/class/sas_phy/<phyX>/phy_reset_problem_count Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-10-02 13:24:26 -05:00
Dan Williams	b50102d3e9	[SCSI] isci: atapi support Based on original implementation from Jiangbi Liu and Maciej Trela. ATAPI transfers happen in two-to-three stages. The two stage atapi commands are those that include a dma data transfer. The data transfer portion of these operations is handled by the hardware packet-dma acceleration. The three-stage commands do not have a data transfer and are handled without hardware assistance in raw frame mode. stage1: transmit host-to-device fis to notify the device of an incoming atapi cdb. Upon reception of the pio-setup-fis repost the task_context to perform the dma transfer of the cdb+data (go to stage3), or repost the task_context to transmit the cdb as a raw frame (go to stage 2). stage2: wait for hardware notification of the cdb transmission and then go to stage 3. stage3: wait for the arrival of the terminating device-to-host fis and terminate the command. To keep the implementation simple we only support ATAPI packet-dma protocol (for commands with data) to avoid needing to handle the data transfer manually (like we do for SATA-PIO). This may affect compatibility for a small number of devices (see ATA_HORKAGE_ATAPI_MOD16_DMA). If the data-transfer underruns, or encounters an error the device-to-host fis is expected to arrive in the unsolicited frame queue to pass to libata for disposition. However, in the DONE_UNEXP_FIS (data underrun) case it appears we need to craft a response. In the DONE_REG_ERR case we do receive the UF and propagate it to libsas. Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-10-02 13:20:03 -05:00
Vasu Dev	49a198898e	[SCSI] libfc: cache align struct fc_exch fields cache aligned xid and ex_lock beside removing holes. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-10-02 12:56:26 -05:00
Vasu Dev	ed26cfece6	[SCSI] libfc: cache align struct fc_fcp_pkt fields Re-arrange its fields to avoid padding and have better cacheline alignments. Removed not used start_time, end_time and last_pkt_time fields. This all reduced this struct size to 448 from 480 and that also reduced one cacheline on x86_64 beside eliminating 8 pads. However kept logical fields together. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-10-02 12:55:07 -05:00
Dan Williams	05a2a17317	[SCSI] libsas: fix warnings when checking sata/stp protocol Several sas drivers legitimately check the protocol against the union of SAS_PROTOCOL_SATA and SAS_PROTOCOL_STP. Provide a SAS_PROTOCOL_STP_ALL to silence warnings like: drivers/scsi/pm8001/pm8001_sas.c:438:3: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch] drivers/scsi/mvsas/mv_sas.c:798:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch] drivers/scsi/mvsas/mv_sas.c:1783:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch] drivers/scsi/mvsas/mv_sas.c:1886:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch] drivers/scsi/isci/request.c:3565:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-10-02 12:51:53 -05:00
Dan Williams	d962480e9a	[SCSI] libsas: fix try_test_sas_gpio_gp_bit() build error If the user has disabled CONFIG_SCSI_SAS_HOST_SMP then libsas drivers will not be receiving smp-gpio frames and do not need this lookup code. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Tested-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-10-02 12:47:40 -05:00
Dan Williams	f6e67035a9	[SCSI] libsas,libata: fix ->change_queue_{depth\|type} for sata devices Pass queue_depth change requests to libata, and prevent queue_type changes for ATA devices. Otherwise: 1/ we do not honor the libata specific restrictions on the queue depth 2/ libsas drivers that do not set sdev->tagged_supported are unable to change the queue_depth of ata devices via sysfs Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-10-02 12:30:30 -05:00
Luben Tuikov	ffaac8f45b	[SCSI] libsas: Allow expander T-T attachments Allow expander table-to-table attachments for expanders that support it. Signed-off-by: Luben Tuikov <ltuikov@yahoo.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-10-02 12:23:11 -05:00
MyungJoo Ham	ce26c5bb95	PM / devfreq: Add basic governors Four cpufreq-like governors are provided as examples. powersave: use the lowest frequency possible. The user (device) should set the polling_ms as 0 because polling is useless for this governor. performance: use the highest freqeuncy possible. The user (device) should set the polling_ms as 0 because polling is useless for this governor. userspace: use the user specified frequency stored at devfreq.user_set_freq. With sysfs support in the following patch, a user may set the value with the sysfs interface. simple_ondemand: simplified version of cpufreq's ondemand governor. When a user updates OPP entries (enable/disable/add), OPP framework automatically notifies devfreq to update operating frequency accordingly. Thus, devfreq users (device drivers) do not need to update devfreq manually with OPP entry updates or set polling_ms for powersave , performance, userspace, or any other "static" governors. Note that these are given only as basic examples for governors and any devices with devfreq may implement their own governors with the drivers and use them. Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mike Turquette <mturquette@ti.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-10-02 00:19:34 +02:00
MyungJoo Ham	a3c98b8b2e	PM: Introduce devfreq: generic DVFS framework with device-specific OPPs With OPPs, a device may have multiple operable frequency and voltage sets. However, there can be multiple possible operable sets and a system will need to choose one from them. In order to reduce the power consumption (by reducing frequency and voltage) without affecting the performance too much, a Dynamic Voltage and Frequency Scaling (DVFS) scheme may be used. This patch introduces the DVFS capability to non-CPU devices with OPPs. DVFS is a techique whereby the frequency and supplied voltage of a device is adjusted on-the-fly. DVFS usually sets the frequency as low as possible with given conditions (such as QoS assurance) and adjusts voltage according to the chosen frequency in order to reduce power consumption and heat dissipation. The generic DVFS for devices, devfreq, may appear quite similar with /drivers/cpufreq. However, cpufreq does not allow to have multiple devices registered and is not suitable to have multiple heterogenous devices with different (but simple) governors. Normally, DVFS mechanism controls frequency based on the demand for the device, and then, chooses voltage based on the chosen frequency. devfreq also controls the frequency based on the governor's frequency recommendation and let OPP pick up the pair of frequency and voltage based on the recommended frequency. Then, the chosen OPP is passed to device driver's "target" callback. When PM QoS is going to be used with the devfreq device, the device driver should enable OPPs that are appropriate with the current PM QoS requests. In order to do so, the device driver may call opp_enable and opp_disable at the notifier callback of PM QoS so that PM QoS's update_target() call enables the appropriate OPPs. Note that at least one of OPPs should be enabled at any time; be careful when there is a transition. Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mike Turquette <mturquette@ti.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-10-02 00:19:15 +02:00
Linus Torvalds	f72a209a3e	Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip * 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: irq: Fix check for already initialized irq_domain in irq_domain_add irq: Add declaration of irq_domain_simple_ops to irqdomain.h * 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: x86/rtc: Don't recursively acquire rtc_lock * 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: posix-cpu-timers: Cure SMP wobbles sched: Fix up wchan borkage sched/rt: Migrate equal priority tasks to available CPUs	2011-10-01 08:37:25 -07:00
Ingo Molnar	048b718029	Merge branch 'rcu/next' of git://github.com/paulmckrcu/linux into core/rcu	2011-10-01 14:21:36 +02:00
MyungJoo Ham	03ca370fbf	PM / OPP: Add OPP availability change notifier. The patch enables to register notifier_block for an OPP-device in order to get notified for any changes in the availability of OPPs of the device. For example, if a new OPP is inserted or enable/disable status of an OPP is changed, the notifier is executed. This enables the usage of opp_add, opp_enable, and opp_disable to directly take effect with any connected entities such as cpufreq or devfreq. Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mike Turquette <mturquette@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-09-30 22:35:12 +02:00
Johannes Berg	4b801bc969	mac80211: document client powersave With the addition of uAPSD and driver buffering the powersave handling has gotten quite complex. Add a section to the documentation to explain it for anyone wanting to implement it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:24 -04:00
Johannes Berg	37fbd90800	mac80211: allow out-of-band EOSP notification iwlwifi has a separate EOSP notification from the device, and to make use of that properly it needs to be passed to mac80211. To be able to mix with tx_status_irqsafe and rx_irqsafe it also needs to be an "_irqsafe" version in the sense that it goes through the tasklet, the actual flag clearing would be IRQ-safe but doing it directly would cause reordering issues. This is needed in the case of a P2P GO going into an absence period without transmitting any frames that should be driver-released as in this case there's no other way to inform mac80211 that the service period ended. Note that for drivers that don't use the _irqsafe functions another version of this function will be required. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:23 -04:00
Johannes Berg	40b9640883	mac80211: explicitly notify drivers of frame release iwlwifi needs to know the number of frames that are going to be sent to a station while it is asleep so it can properly handle the uCode blocking of that station. Before uAPSD, we got by by telling the device that a single frame was going to be released whenever we encountered IEEE80211_TX_CTL_POLL_RESPONSE. With uAPSD, however, that is no longer possible since there could be more than a single frame. To support this model, add a new callback to notify drivers when frames are going to be released. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:21 -04:00
Johannes Berg	deeaee197b	mac80211: reply only once to each PS-poll If a PS-poll frame is retried (but was received) there is no way to detect that since it has no sequence number. As a consequence, the standard asks us to not react to PS-poll frames until the response to one made it out (was ACKed or lost). Implement this by using the WLAN_STA_SP flags to also indicate a PS-Poll "service period" and the IEEE80211_TX_STATUS_EOSP flag for the response packet to indicate the end of the "SP" as usual. We could use separate flags, but that will most likely completely confuse drivers, and while the standard doesn't exclude simultaneously polling using uAPSD and PS-Poll, doing that seems quite problematic. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:18 -04:00
Johannes Berg	47086fc51a	mac80211: implement uAPSD Add uAPSD support to mac80211. This is probably not possible with all devices, so advertising it with the cfg80211 flag will be left up to drivers that want it. Due to my previous patches it is now a fairly straight-forward extension. Drivers need to have accurate TX status reporting for the EOSP frame. For drivers that buffer themselves, the provided APIs allow releasing the right number of frames, but then drivers need to set EOSP and more-data themselves. This is documented in more detail in the new code itself. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:15 -04:00
Johannes Berg	4049e09acd	mac80211: allow releasing driver-buffered frames If there are frames for a station buffered in the driver, mac80211 announces those in the TIM IE but there's no way to release them. Add new API to release such frames and use it when the station polls for a frame. Since the API will soon also be used for uAPSD it is easily extensible. Note that before this change drivers announcing driver-buffered frames in the TIM bit actually will respond to a PS-Poll with a potentially lower priority frame (if there are any frames buffered in mac80211), after this patch a driver that hasn't been changed will no longer respond at all. This only affects ath9k, which will need to be fixed to implement the new API. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:15 -04:00
Johannes Berg	948d887dec	mac80211: split PS buffers into ACs For uAPSD support we'll need to have per-AC PS buffers. As this is a major undertaking, split the buffers before really adding support for uAPSD. This already makes some reference to the uapsd_queues variable, but for now that will never be non-zero. Since book-keeping is complicated, also change the logic for keeping a maximum of frames only and allow 64 frames per AC (up from 128 for a station). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:12 -04:00
Johannes Berg	042ec45337	mac80211: let drivers inform it about per TID buffered frames For uAPSD implementation, it is necessary to know on which ACs frames are buffered. mac80211 obviously knows about the frames it has buffered itself, but with aggregation many drivers buffer frames. Thus, mac80211 needs to be informed about this. For now, since we don't have APSD in any form, this will unconditionally set the TIM bit for the station but later with uAPSD only some ACs might cause the TIM bit to be set. ath9k is the only driver using this API and I only modify it in the most basic way, it won't be able to implement uAPSD with this yet. But it can't do that anyway since there's no way to selectively release frames to the peer yet. Since drivers will buffer frames per TID, let them inform mac80211 on a per TID basis, mac80211 will then sort out the AC mapping itself. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:10 -04:00
Arik Nemtsov	07ba55d7f1	nl80211/mac80211: allow adding TDLS peers as stations When adding a TDLS peer STA, mark it with a new flag in both nl80211 and mac80211. Before adding a peer, make sure the wiphy supports TDLS and our operating mode is appropriate (managed). In addition, make sure all peers are removed on disassociation. A TDLS peer is first added just before link setup is initiated. In later setup stages we have more info about peer supported rates, capabilities, etc. This info is reported via nl80211_set_station(). Signed-off-by: Arik Nemtsov <arik@wizery.com> Cc: Kalyan C Gaddam <chakkal@iit.edu> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:08 -04:00
Arik Nemtsov	dfe018bf99	mac80211: handle TDLS high-level commands and frames Register and implement the TDLS cfg80211 callback functions. Internally prepare and send TDLS management frames. We incorporate local STA capabilities and supported rates with extra IEs given by usermode. The resulting packet is either encapsulated in a data frame, or assembled as an action frame. It is transmitted either directly or through the AP, as mandated by the TDLS specification. Declare support for the TDLS external setup wiphy capability. This tells usermode to handle link setup and discovery on its own, and use the kernel driver for sending TDLS mgmt packets. Signed-off-by: Arik Nemtsov <arik@wizery.com> Cc: Kalyan C Gaddam <chakkal@iit.edu> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:07 -04:00
Arik Nemtsov	768db3438b	mac80211: standardize adding supported rates IEs Relocate the mesh implementation of adding the (extended) supported rates IE to util.c, anticipating its use by other parts of mac80211. Signed-off-by: Arik Nemtsov <arik@wizery.com> Cc: Kalyan C Gaddam <chakkal@iit.edu> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:06 -04:00
Arik Nemtsov	109086ce0b	nl80211: support sending TDLS commands/frames Add support for sending high-level TDLS commands and TDLS frames via NL80211_CMD_TDLS_OPER and NL80211_CMD_TDLS_MGMT, respectively. Add appropriate cfg80211 callbacks for lower level drivers. Add wiphy capability flags for TDLS support and advertise them via nl80211. Signed-off-by: Arik Nemtsov <arik@wizery.com> Cc: Kalyan C Gaddam <chakkal@iit.edu> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:05 -04:00
Johannes Berg	3b9ce80ce9	cfg80211/mac80211: apply station uAPSD parameters selectively Currently, when hostapd sets the station as authorized we also overwrite its uAPSD parameter. This obviously leads to buggy behaviour (later, with my patches that actually add uAPSD support). To fix this, only apply those parameters if they were actually set in nl80211, and to achieve that add a bitmap of things to apply. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-09-30 15:57:03 -04:00
John W. Linville	8e00f5fbb4	Merge branch 'master' of git://git.infradead.org/users/linville/wireless-next into for-davem Conflicts: drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/wl12xx/main.c	2011-09-30 14:52:29 -04:00
Ohad Ben-Cohen	0ed6d2d27b	iommu/core: let drivers know if an iommu fault handler isn't installed Make report_iommu_fault() return -ENOSYS whenever an iommu fault handler isn't installed, so IOMMU drivers can then do their own platform-specific default behavior if they wanted. Fault handlers can still return -ENOSYS in case they want to elicit the default behavior of the IOMMU drivers. Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-09-30 16:40:32 +02:00
Tomi Valkeinen	212b0d50e2	OMAPDSS: remove vaddr from overlay info overlay_info struct, used to configure overlays, currently includes both physical and virtual addresses for the pixels. The vaddr was added to support more exotic configurations where CPU would be used to update a display, but it is not currently used and there has been no interest in the feature. Using CPU to update a screen is also less interesting now that OMAP4 has two LCD outputs. This patch removes the vaddr field, and modifies the users of omapdss accordingly. This makes the use of omapdss a bit simpler, as the user doesn't need to think if it needs to give the vaddr. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:17:32 +03:00
Tomi Valkeinen	562a060611	OMAPDSS: Add N800 panel driver This is a driver for N800's display, ported from the old omapfb. This is a slightly lighter version of the driver as not all features of the old driver can be ported without big changes to DSS2, and also because some of the HW features used in the old driver are unclear (e.g. the power management part). That said, the new driver works fine for basic use. Architecturally the driver is not as neat as it could be. N800's display HW consists of a display buffer chip and a panel, and ideally they would be represented by separate, independent drivers. This is not currently possible, and this driver contains both buffer chip and panel driver. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:17:30 +03:00
Tomi Valkeinen	bb36dbfd23	OMAPDSS: Taal: remove external backlight support Taal panel driver supports two kinds of backlight control: 1) using DSI commands sent to the panel to control the backlight, 2) calling function pointers going to the board file to control the backlight. The second option is a bit hacky, and will no longer be needed when the PWM driver supports the backlight features. After that we can use the standard PWM backlight driver. This patch removes the second backlight control mechanism, and adds a boolean field, use_dsi_backlight, to nokia_dsi_panel_data which the board file can use to inform whether the panel driver should use DSI commands to control the backlight. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:17:27 +03:00
Tomi Valkeinen	ba2eac9ed3	OMAP: DSS2: add panel-dvi driver We have currently panel-generic-dpi driver, which is a combined driver for dummy panels and also for DVI output. The aim is to split the panel-generic-dpi into two, one for fixed size dummy panels connected via DPI, and the other (this) for variable resolution output which supports DDC channel (in practice a DVI framer chip connected to DPI output). Original i2c code by: Ricardo Salveti de Araujo <ricardo.salveti@canonical.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:16:48 +03:00
Tomi Valkeinen	df4769c9c4	OMAP: DSS2: add detect() to omap_dss_driver struct detect() can be used to probe if the display is connected. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:16:44 +03:00
Tomi Valkeinen	3d5e0ef746	OMAP: DSS2: add read_edid() to omap_dss_driver struct read_edid() can be used to get the EDID information from the display. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:16:44 +03:00
Tomi Valkeinen	7f6f3c4bf3	OMAP: DSS2: DISPC: Add missing IRQ definitions Add IRQ definitions for missing OMAP4 IRQs: FRAMEDONEWB, FRAMEDONETV, WBBUFFEROVERFLOW. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:16:42 +03:00
Tomi Valkeinen	c90a78ecc2	OMAP: DSS2: DSI: Add comment about regn regn divider is one greater than the REGN divider in TRM. Add a comment to point this out. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:16:41 +03:00
Tomi Valkeinen	b44e45825d	OMAP: DSS2: HDMI: change regn definition regn divider is currently programmed to the registers without change, but when calculating clock frequencies it is used as regn+1. To make this similar to how DSI handles the dividers this patch changes the regn value to be used as such for calculations, but the value programmed to registers is regn-1. This simplifies the clock frequency calculations, makes it similar to DSI, and also allows us to use regn value 0 as undefined. Cc: Mythri P K <mythripk@ti.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:16:41 +03:00
Archit Taneja	8af6ff0107	OMAP: DSS2: DSI Video mode support Add initial support for DSI video mode panels: - Add a new structure omap_dss_dsi_videomode_data in the member "panel" in omap_dss_device struct. This allows panel driver to configure dsi video_mode specific parameters. - Configure basic DSI video mode timing parameters: HBP, HFP, HSA, VBP, VFP, VSA, TL and VACT. - Configure DSI protocol engine registers for video_mode support. - Introduce functions dsi_video_mode_enable() and dsi_video_mode_disable() which enable/disable video mode for a given virtual channel and a given pixel format type. Things left for later - Add functions to check for errors in video mode timings provided by panel. - Configure timing registers required for command mode interleaving. Signed-off-by: Archit Taneja <archit@ti.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:16:30 +03:00
Archit Taneja	a3b3cc2b88	OMAP: DSS2: Create an enum for DSI pixel formats Currently, DSI pixel info is only represented by the pixel size in bits using the pixel_size parameter in omap_dss_device struct's ctrl member. This is not sufficient information for DSI video mode usage, as two of the supported formats(RGB666 loosely packed, and RGB888) have the same pixel container size, but different data_type values for the video mode packet header. Create enum "omap_dss_dsi_pixel_format" which describes the pixel data format the panel is configured for. Create helper function dsi_get_pixel_size() which returns the pixel size of the given pixel format. Modify functions omapdss_default_get_recommended_bpp() and dss_use_replication() to use dsi_get_pixel_size(). Signed-off-by: Archit Taneja <archit@ti.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:16:29 +03:00
Archit Taneja	b3b89c05cb	OMAP: DSS2: DSI: Introduce generic read functions Introduce read functions which use generic Processor-to-Peripheral transaction types. These are needed by some devices which may not support corresponding DCS commands. Add function dsi_vc_generic_send_read_request() which can send a short packet with 0, 1 or 2 bytes of request data and the corresponding generic data type. Rename function dsi_vc_dcs_read_rx_fifo() to dsi_vc_read_rx_fifo() and modify it to take the enum "dss_dsi_content_type" as an argument to use either DCS or GENERIC Peripheral-to-Processor transaction types while parsing data read from the device. Signed-off-by: Archit Taneja <archit@ti.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:16:28 +03:00
Archit Taneja	5c716a04ed	OMAP: DSS2: DSI: Remove functions dsi_vc_dcs_read_1() and dsi_vc_dcs_read_2() Remove functions dsi_vc_dcs_read_1() and dsi_vc_dcs_read_2(), these are used when the panel is expected to return 1 and 2 bytes respecitvely. This was manily used for debugging purposes. These functions should be implemented in the panel driver if needed. Signed-off-by: Archit Taneja <archit@ti.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:16:27 +03:00
Archit Taneja	6ff8aa3182	OMAP: DSS2: DSI: Introduce generic write functions Intoduce enum "dss_dsi_content_type" to differentiate between DCS and generic content types. Introduce short and long packet write functions which use generic Processor-to-Peripheral transaction types. These are needed by some devices which may not support corresponding DCS commands. Create common write functions which allow code reuse between DCS and generic write functions. Signed-off-by: Archit Taneja <archit@ti.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>	2011-09-30 16:16:26 +03:00

... 14 15 16 17 18 ...

46526 Commits