Speaker protection research #53
Comments
Useful resources from TI: How could we use I/V sense:
Bonus points for the latter two: hwmon and power subsystem reporting of average power draw. |
I believe the second option is the sanest. Userspace support like in (3) also requires some form of integrity assurance: the kernel would first have to perform a handshake to ensure that protection-enabled software is running, and then run a watchdog that checks the userspace process at some short interval (milliseconds, I think?), which would require a messy API between the kernel and userspace audio. It also rules out using native ALSA without a sound server. If we instead enforce safety through power monitoring in the kernel, we get protection we can be more confident in. I believe the algorithm might make its way into the ASoC subsystem, as it should be pretty much universal, and it's the kernel's job to drive hardware and protect it. I might try to develop various algorithmic solutions once some data is obtained. This is a safety measure like a circuit breaker, not an operational mode. So the kernel is only responsible for essentially cutting power to the amp by turning hardware mute on when something goes wrong, while userspace still holds the responsibility for impulse response filters for sound quality and for preventing overcurrent. This protection should only trip when userspace bugs out or is misconfigured. Also, nothing stops us from doing both for the best combination of power and safety: the kernel monitors the parameters so it can shut the amps down if there's a risk, and also streams that data to userspace. PipeWire (which we mainly target for now) tries to avoid fault mode and muting by adjusting the volume and (maybe) filters to keep the current in range. But the kernel still keeps things safe even if we break PipeWire, misconfigure it, hit bugs, etc. So no watchdog or authentication handshake is needed. |
That's all fine and good, but as I said in the first sentence of the thread, let's keep this issue about the research. We've already covered the obvious bases as far as possible implementations, and any further discussion is pointless until we actually find out what we need to do to keep the speakers in a safe operating envelope. |
Okay, so I did a bit of recon in macOS and it turns out this stuff is all out in the open in plist files. I think we can afford to look at this (the speaker limits are facts about the hardware, so not copyrightable).
<key>SpeakerName</key>
<string>LT</string>
<key>SpeakerGroup</key>
<integer>1</integer>
<key>IgnoreTelemetry</key>
<false/>
<key>OL_thermal</key>
<dict>
<key>Rshunt</key>
<real>0</real>
<key>Reb_ref</key>
<real>3.72</real>
<key>Rampout</key>
<integer>0</integer>
<key>T_sett_vc</key>
<real>104.9</real>
<key>tau_Tvc</key>
<real>3.9</real>
<key>T_sett_mg</key>
<real>129.5</real>
<key>tau_Tmg</key>
<integer>70</integer>
<key>ThermalFFSpeedupFactor</key>
<real>0.25</real>
<key>HardTempLimitHeadroom</key>
<real>10</real>
<key>TemperatureLimit</key>
<integer>140</integer>
</dict>
The ADT also contains
The parameters are each 16 bits it seems. The first one that has data is DC resistance in milliohms. The second one is called 't' judging by some strings, and I'm not sure what it is (but it doesn't look individually calibrated?) If we only care about DC resistance for the safety model, the amp chips can measure that (and I was planning on adding that feature for experimentation), so it might be preferable to just have the driver/protection daemon do a measurement at startup and avoid having to drag around calibration data from the ADT (which could be out of date anyway, if the user has done a non-official speaker replacement). |
Okay, so let's do a bit of math. Assuming a constant resistance (which isn't true but it'll do for now), with the amp at 15.5 dBV, and input at -2dBFS, that's 13.5 dBV = 4.73 Vrms, current at 3.72Ω is 1.27 A. Almost exactly 6 W I was pumping into each tweeter. We have τ_Tvc = 3.9 and τ_Tmg = 70, and I think T_sett_vc=104.9 and T_sett_mg=129.5 are the thermal resistances in °C/W. That means that, if I'm mental mathing right, the tweeter voice coil would've reached >600°C after a few seconds, give or take (and longer as the magnet heats up). Yeah, no wonder I cooked it. Given both thermal resistances and max temp, the long term average power (= maximum safe kernel level cap) is on the order of 0.5W or so, 12 times lower than what I have. That's a little over 10 dB lower, add 2 for the signal headroom, you end up at 3.5 dBV max safe output. The lowest we can go for the amp gain is 11 dBV, so that means we need to cap the DVC at -7.5 dB or so on top. I sure hope I screwed up the math somewhere, because otherwise that's a ridiculously small safe volume limit to put in the kernel. |
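For anyone who wants to re-check the arithmetic above, here is a minimal sketch in Python. It assumes, as the comment does, a constant coil resistance and that T_sett_vc is a thermal resistance in °C/W; the variable names are mine, and the 0.5 W figure is just the rough long-term cap quoted above, not a measured value.

```python
import math

def dbv_to_vrms(dbv: float) -> float:
    """Convert a level in dBV (dB relative to 1 V RMS) to volts RMS."""
    return 10 ** (dbv / 20)

AMP_GAIN_DBV = 15.5    # amp output level at full scale, from the comment above
SIGNAL_DBFS = -2.0     # test signal level
R_COIL_OHM = 3.72      # Reb_ref from the plist
RTH_VC_C_PER_W = 104.9 # T_sett_vc, read as °C/W (assumption from the thread)
SAFE_POWER_W = 0.5     # rough long-term cap quoted in the comment

v_rms = dbv_to_vrms(AMP_GAIN_DBV + SIGNAL_DBFS)
i_rms = v_rms / R_COIL_OHM
power_w = v_rms ** 2 / R_COIL_OHM

# Steady-state voice coil rise if the resistance really stayed constant.
delta_t_c = power_w * RTH_VC_C_PER_W

# How far the power would have to drop to reach the rough safe cap.
reduction_db = 10 * math.log10(power_w / SAFE_POWER_W)

print(f"{v_rms:.2f} Vrms, {i_rms:.2f} A, {power_w:.2f} W")
print(f"steady-state rise: {delta_t_c:.0f} °C, reduction needed: {reduction_db:.1f} dB")
```

The numbers land where the comment says: about 4.73 Vrms, 1.27 A, 6 W, a rise above 600 °C, and a little over 10 dB of reduction.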
More resources: https://liu.diva-portal.org/smash/get/diva2:954210/FULLTEXT01.pdf In particular, vs. these keys:
<key>PilotAmplHi_dB</key>
<integer>-30</integer>
<key>PilotAmplLo_dB</key>
<integer>-40</integer>
<key>PilotUpperThres</key>
<integer>90</integer>
<key>PilotLowerThres</key>
<real>80</real>
<key>PilotDecayTime</key>
<real>0.05</real>
<key>PilotFreq</key>
<real>43.0664</real>
... make me think whichever Apple engineers came up with this were reading that same paper. |
Checks out for me. (Including crosschecking with the amp datasheet but assuming 3.72 ohms is right.)
So the steady state temperature difference should be power times the thermal resistance, giving the 600 degrees, hmm. How sure are we about the units? Also what about the taus? If they are in joules per degree, that would take a bit of time to heat up. |
The taus would be in seconds, since they are time constants. So after 3.9 seconds you'd be ~63% of the way to the target. Hence my 40-second sweeps should've effectively reached the new steady state temperature for the voice coil vs. magnet system (the magnet itself would only be part of the way up, since its time constant is 70). As for the units, strings from the DSP plugin say:
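A quick sketch of the time-constant reasoning, assuming each stage behaves as a simple first-order system (the function name is mine, not something from the plist or the DSP plugin):

```python
import math

def step_fraction(t_s: float, tau_s: float) -> float:
    """Fraction of the final temperature rise reached t_s seconds after a
    power step, for a first-order system with time constant tau_s."""
    return 1.0 - math.exp(-t_s / tau_s)

TAU_VC_S = 3.9   # tau_Tvc from the plist
TAU_MG_S = 70.0  # tau_Tmg from the plist
SWEEP_S = 40.0   # length of the test sweep

vc_after_one_tau = step_fraction(TAU_VC_S, TAU_VC_S)
vc_after_sweep = step_fraction(SWEEP_S, TAU_VC_S)
mg_after_sweep = step_fraction(SWEEP_S, TAU_MG_S)

print(f"voice coil after one tau: {vc_after_one_tau:.1%}")
print(f"voice coil after sweep:   {vc_after_sweep:.1%}")
print(f"magnet after sweep:       {mg_after_sweep:.1%}")
```

Under that assumption the voice coil stage is essentially settled after a 40-second sweep, while the magnet stage is less than halfway there.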
Of course, I can't know whether those map 1:1 to the parameters in the plist, but it would be weird if they don't. |
OK, convinces me |
Oh yeah, and this math is for sine waves, but a square wave with the same peak amplitude has 1.414 times the RMS voltage, and therefore delivers 2 times the power. Assuming the amp definition of output gains is for sine waves, that means we need another factor of 2 (3dB) in our safety calculation (this is the I Won The Loudness War safety factor ;) - and this checks out, since the song is ~+3dB LUFS!). |
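The sine vs. square factor is easy to confirm numerically. This sketch compares one period of each waveform at the same peak amplitude; nothing in it is specific to the amp.

```python
import math

N = 10_000   # samples per period
PEAK = 1.0   # same peak amplitude for both waveforms

sine = [PEAK * math.sin(2 * math.pi * k / N) for k in range(N)]
square = [PEAK if s >= 0 else -PEAK for s in sine]

def rms(samples):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

voltage_ratio = rms(square) / rms(sine)   # square RMS relative to sine RMS
power_ratio = voltage_ratio ** 2          # power into a fixed resistance
extra_db = 10 * math.log10(power_ratio)

print(f"voltage x{voltage_ratio:.3f}, power x{power_ratio:.2f}, +{extra_db:.2f} dB")
```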
Also, since coil resistance increases with temperature, it wouldn't actually hit 600°C. With the tempco from the plist, you get 2x resistance at ~285°C, so let's say equilibrium at 300°C or so. That's ~soldering temperature, which also checks out with the damage not being an instant open circuit. High enough to solidly melt the plastic, low enough that the coil itself probably survived (though might've partially shorted out). |
So, before I go to sleep, let's see how badly I abused the woofers. 3.9Ω, same amp settings, so 1.21 A and about 5.7W. Those have a T_sett_vc of 32.1, so 183°C plus ambient, call it 200°C. Due to the tempco, equilibrium is more like at +126°C, where resistance is 45% higher. That's ~150°C with ambient. Max temperature is... 140°C. Checks out. I abused them but not too badly. Which means the existing kernel limit is actually not too shabby for the woofers. -2dB for the signal, -3dB for square waves, -1dB so we don't actually exceed 140°C, let's say -6dB would get us into 100% safe territory. Honestly? They're loud enough, -6dB is usable, and if I put them first in the channel map no-userspace playback will default to the woofers. So I'm actually tempted to enable audio for this model, putting a crazy cap on the tweeters but just -6dB on the woofers, and at least then people can start playing around with userspace and get some audio. |
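Here is a sketch of the equilibrium estimate with the tempco taken into account. It assumes the amp behaves as a voltage source, so power falls as the coil resistance rises, and that resistance is linear in temperature. The tempco below is not the plist value (which isn't quoted in this thread); it is back-derived from the "45% higher at +126°C" figure above, so treat it as a placeholder.

```python
import math

def equilibrium_rise_c(p_cold_w: float, rth_c_per_w: float, alpha_per_c: float) -> float:
    """Steady-state temperature rise under constant-voltage drive.

    With R = R0 * (1 + alpha * dT), power is P_cold / (1 + alpha * dT), so
    dT = Rth * P_cold / (1 + alpha * dT), i.e.
    alpha * dT**2 + dT - Rth * P_cold = 0. Returns the positive root.
    """
    dt_cold = p_cold_w * rth_c_per_w
    return (-1.0 + math.sqrt(1.0 + 4.0 * alpha_per_c * dt_cold)) / (2.0 * alpha_per_c)

ALPHA_PER_C = 0.45 / 126.0    # placeholder tempco, back-derived from the thread
V_RMS = 10 ** (13.5 / 20)     # 13.5 dBV, same amp settings as before

woofer_rise = equilibrium_rise_c(V_RMS ** 2 / 3.9, 32.1, ALPHA_PER_C)
woofer_r_ratio = 1.0 + ALPHA_PER_C * woofer_rise
tweeter_rise = equilibrium_rise_c(V_RMS ** 2 / 3.72, 104.9, ALPHA_PER_C)

print(f"woofer:  +{woofer_rise:.0f} °C, resistance x{woofer_r_ratio:.2f}")
print(f"tweeter: +{tweeter_rise:.0f} °C")
```

With these inputs the woofer settles at roughly +126 °C and the tweeter at roughly +300 °C, consistent with the hand estimates in the comments above.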
Ah, but we need to consider the magnet temp too, that's 20°C/W. Let's round that up to same as the voice coil, and call it 2x in total. So -3dB on top of the prior calculation, for sustained power safety. |
My guess would be 't' refers to Qt (or Qts), which is maybe the most relevant Thiele-Small parameter. What's the value of the 't' parameter? |
After going through the speaker protection model stuff, I'm pretty sure it's temperature at which the DC resistance was measured. For this laptop it's 2801 for all speakers (28.01°C). It doesn't make much sense for a Thiele-Small parameter to be the same for woofers and tweeters, and temperature makes sense since it gives you the reference point for the DC impedance vs. temperature curve, which allows you to calculate voice coil temperature given that and the tempco (which is in the speaker protection profile). |
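To make the reference-point idea concrete, this is roughly how a voice coil temperature estimate would fall out of a resistance measurement. It is only a sketch under a linear resistance model; the function names are mine, the tempco is the same placeholder back-derived from the woofer numbers in this thread (not the plist value), and the reference resistance uses Reb_ref from the plist rather than a per-unit ADT value.

```python
def adt_temp_to_celsius(raw: int) -> float:
    """Interpret the ADT 't' field as hundredths of a degree Celsius
    (2801 -> 28.01 °C), per the reading proposed above."""
    return raw / 100.0

def coil_temperature_c(r_measured_ohm: float, r_ref_ohm: float,
                       t_ref_c: float, alpha_per_c: float) -> float:
    """Invert R = R_ref * (1 + alpha * (T - T_ref)) to get T."""
    return t_ref_c + (r_measured_ohm / r_ref_ohm - 1.0) / alpha_per_c

ALPHA_PER_C = 0.45 / 126.0          # placeholder tempco, not the plist value
T_REF_C = adt_temp_to_celsius(2801)
R_REF_OHM = 3.72                    # placeholder: Reb_ref, not calibration data

t_cold = coil_temperature_c(R_REF_OHM, R_REF_OHM, T_REF_C, ALPHA_PER_C)
t_hot = coil_temperature_c(R_REF_OHM * 1.45, R_REF_OHM, T_REF_C, ALPHA_PER_C)

print(f"at reference resistance: {t_cold:.2f} °C")
print(f"at +45% resistance:      {t_hot:.2f} °C")
```

A coil measuring 45% above its reference resistance comes out at about 154 °C with these placeholder values, which is why the reference temperature matters for the estimate.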
Makes sense -- except that temperature is not part of the Thiele-Small parameter set. |
Turns out this all works pretty much as expected. J314 tweeter voice coil temperature (blue) vs. model estimate (orange). Green is the magnet temperature estimate. The woofer model is a bit worse, but not much. Now we just need to implement this in the production daemon, and take a guess as to what a good temperature governor model is (since it's not entirely clear what Apple does here exactly). |
Could you maybe give a status update? Thanks :) |
@hackgrid I have an unfounded theory that this will be a surprise feature drop, along with the official Fedora distro release coming at some point later this month |
@DavidBuchanan314 That would be very nice. The next Fedora release is scheduled for 06.11. But could you explain what makes you assume that speaker support could be included in the Fedora release? @marcan and everyone else, I would like to thank you for the great work. Can you say anything about how we can help move the speaker work forward? For me it is the last piece of the puzzle to be able to use Linux more or less productively on the MacBook. I assume some others feel the same way as I do. |
If you follow the IRC channels and Lina's latest webcam videos, you get the
impression it is coming soon.
|
@Markus-Be I don't have any evidence, hence "unfounded theory". I am reading tea leaves. It would be pointless to speculate any further. Nevertheless, I'm in the mood to write a bunch of pointless words, so here goes. It is my understanding that most of the speaker-related research has been completed, and the remaining work required is "just" integrating it all. Given that Fedora is the new flagship Asahi distro, that's where I expect to see it integrated first. Given that both Fedora and Asahi have high quality standards, I'd expect speaker support out-of-the-box in an "official" release. Estimating software engineering progress is famously impossible, so when that official release arrives is anyone's guess. |
Opening this to track the ongoing research into the safety envelope of the speakers (focusing mostly on testing; we can open other threads to discuss potential solutions once we have more data on what's needed).
The first target is an M2 MacBook Air, which is interesting because:
Initial testing done with povik's latest audio branch, including volume caps and tweeter HPF. Configured as follows:
No VISENSE support yet, that's in the works.
Testing using a sine sweep, 10-20kHz, 40 seconds, volume 0.8 (~-2dB). Initial testing was ad-hoc since I was not expecting damage this quickly, but a couple sweeps from Linux were enough to destroy the left tweeter and possibly damage the right one.
The failure mode for the tweeters seems to be a severe drop in volume, except for a small band of improved reproduction (that varies). My theory is that this is thermal damage, i.e. the tweeter melted itself and seized up. Left: dead tweeter; right: possibly-damaged but still functional tweeter.
In addition, there is rattling, which can be identified by the presence of sub harmonics. See the band in the middle of this sweep of the right tweeter:
It's not clear whether I actually damaged the woofers during this test. More controlled testing will follow. For now, my conclusion is that the tweeters are the major damage risk, and that damage occurs fairly quickly, even with just 40-second sweeps. If this is thermal, that suggests even short-term power excursions are dangerous.
After the initial testing, a follow-up with I Won The Loudness War successfully killed the right tweeter in the same way as the left one. I am sending the machine in for repairs.
Next steps:
For Linux testing:
A priori, it would seem that we have to set the tweeter amp gain much lower than we do now.
Bonus points once most of the work is done: