Speaker protection research #53

Open
marcan opened this issue Sep 13, 2022 · 22 comments

@marcan
Member

marcan commented Sep 13, 2022

Opening this to track the ongoing research into the safety envelope of the speakers (focusing mostly on testing; we can open other threads to discuss potential solutions once we have more data on what's needed).

The first target is an M2 MacBook Air, which is interesting because:

  • macOS uses speaker protection (VI sense)
  • It has 4 speakers
  • It has funny grille-less speakers, which might need more EQ than usual to sound good
  • I have AppleCare on it

Initial testing done with povik's latest audio branch, including volume caps and tweeter HPF. Configured as follows:

  • Tweeter HPF: 800 Hz
  • Woofer HPF: 2 Hz
  • Amp gain: 9 (15.5 dBV) (macOS: 15 / 18.5 dBV) for all speakers

No VISENSE support yet, that's in the works.

Testing using a sine sweep, 10-20kHz, 40 seconds, volume 0.8 (~-2dB). Initial testing was ad-hoc since I was not expecting damage this quickly, but a couple sweeps from Linux were enough to destroy the left tweeter and possibly damage the right one.

The failure mode for the tweeters seems to be a severe drop in volume, except for a small band of improved reproduction (that varies). My theory is that this is thermal damage, i.e. the tweeter melted itself and seized up. Left: dead tweeter; right: possibly-damaged but still functional tweeter.
Screenshot_20220913_131632

In addition, there is rattling, which can be identified by the presence of sub harmonics. See the band in the middle of this sweep of the right tweeter:
Screenshot_20220913_132052

It's not clear whether I actually damaged the woofers during this test. More controlled testing will follow. For now, my conclusion is that the tweeters are the major damage risk, and that damage occurs fairly quickly, even with just 40-second sweeps. If this is thermal, that suggests even short-term power excursions are dangerous.

After the initial testing, a follow-up with I Won The Loudness War successfully killed the right tweeter in the same way as the left one. I am sending the machine in for repairs.

Next steps:

  • Add in/out capture to hypervisor tracer so we can see both what macOS sends to the speakers and what it gets back from VISENSE
  • Capture impulse response of macOS EQ/DSP
  • Find out whether macOS seems to do any kind of explicit power capping using VISENSE (e.g. by trying sine sweeps at varying volumes)
  • Figure out maximum amplitude/power curves for macOS

For Linux testing:

  • Get VISENSE working
  • Switch to 5-second or shorter sweeps to reduce damage risk
  • Do multiple sweeps at increasing power levels and note any appearance of damage
  • Generate more controlled test files (e.g. 4ch sweeps) so we can more easily correlate things

A priori, it would seem that we have to set the tweeter amp gain much lower than we do now.

Bonus points once most of the work is done:

  • Try to destroy the speakers from macOS, so we can point at that when people inevitably complain that our volume caps are lower than macOS' (I strongly suspect I Won The Loudness War on loop in macOS will cause damage with Apple's DSP).
@dsseng

dsseng commented Sep 13, 2022

Useful resources from TI:

How could we use I/V sense:

  • Tracing in m1n1/kernel during testing to ensure we set safe limits
  • Kernel support with a safety mute. If an overcurrent condition persists for longer than a specified time, the codec is muted.
  • Kernel support with volume scaling. This can be hard and will likely require userspace support. It is hard to ensure safety (i.e. that userspace doesn't freeze or behave erroneously).

Bonus points for the latter two: hwmon and power subsystem reporting of average power draw.

@dsseng

dsseng commented Sep 13, 2022

I believe the second option is the most sane. Userspace support as in (3) requires some form of integrity assurance as well: the kernel would first have to perform a handshake to ensure that protection-enabled software is running, and would also need a watchdog to check the userspace process at some interval (every few milliseconds, I think?), which would require a messy API between the kernel and userspace audio. It also rules out using native ALSA without a sound server.

However, if we enforce safety with power monitoring in the kernel, protection is ensured with more confidence. I believe the algorithm might make its way into the ASoC subsystem, as it should be pretty much universal, and it's the kernel's job to drive hardware and protect it. I might try to develop various algorithmic solutions once some data is obtained.

This is a safety measure, like a circuit breaker, not the normal operating mode. The kernel is only responsible for essentially cutting power to the amp by turning on the hardware mute when something goes wrong. Userspace still holds the responsibility for impulse response filters for quality sound and for prevention of overcurrent. This protection should only trip when userspace bugs out or is misconfigured.

Also, nothing prevents us from doing both, for the best combination of power and safety. The kernel both monitors the parameters, shutting the amps down if there's a risk, and streams that data to userspace. PipeWire (which we mainly target for now) tries to avoid the fault mode and muting by adjusting the volume and (maybe) filters to keep the current in range. But the kernel still keeps things safe even if we break PW, misconfigure it, hit bugs, etc. So no watchdog/authentication handshake is needed.
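The "circuit breaker" idea described above can be sketched in a few lines. This is only an illustration in Python, not kernel code: the class name `PowerBreaker`, its fields, and the thresholds in the usage example are all hypothetical, and it assumes the driver can obtain a periodic power estimate from the I/V sense data.

```python
from dataclasses import dataclass


@dataclass
class PowerBreaker:
    """Latching safety mute: trips if power stays above a limit for too long.

    All names and thresholds here are hypothetical illustrations.
    """
    limit_w: float        # power level considered unsafe when sustained
    max_over_s: float     # how long the limit may be exceeded before muting
    over_s: float = 0.0   # time spent continuously above the limit so far
    muted: bool = False   # latched fault state

    def update(self, power_w: float, dt_s: float) -> bool:
        """Feed one power estimate covering dt_s seconds; return mute state."""
        if self.muted:
            return True  # stay muted until something explicitly resets us
        if power_w > self.limit_w:
            self.over_s += dt_s
        else:
            self.over_s = 0.0
        if self.over_s >= self.max_over_s:
            self.muted = True
        return self.muted


if __name__ == "__main__":
    breaker = PowerBreaker(limit_w=0.5, max_over_s=1.0)
    for step in range(6):
        print(step, breaker.update(power_w=6.0, dt_s=0.25))
```

The breaker latches on purpose: once tripped it stays muted, which matches the idea that it should only fire when userspace has already failed to keep the speaker within limits.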

@marcan
Member Author

marcan commented Sep 13, 2022

That's all fine and good, but as I said in the first sentence of the thread, let's keep this issue about the research. We've already covered the obvious bases as far as possible implementations, and any further discussion is pointless until we actually find out what we need to do to keep the speakers in a safe operating envelope.

@marcan
Member Author

marcan commented Sep 13, 2022

Okay, so I did a bit of recon in macOS and it turns out this stuff is all out in the open in plist files. I think we can afford to look at this (the speaker limits are facts about the hardware, so not copyrightable).

/System/Library/Audio/Tunings/AID<id> has all the DSP chains and speaker protection parameters, where <id> is /product/audio.acoustic-id from the ADT. For example, the speaker protection model for the M2 Air (AID13) is in AID13/DSP/Strips/aid13-aufx-spp3-appl.plist. Snippet from the tweeter temperature model:

                        <key>SpeakerName</key>
                        <string>LT</string>
                        <key>SpeakerGroup</key>
                        <integer>1</integer>
                        <key>IgnoreTelemetry</key>
                        <false/>
                        <key>OL_thermal</key>
                        <dict>
                                <key>Rshunt</key>
                                <real>0</real>
                                <key>Reb_ref</key>
                                <real>3.72</real>
                                <key>Rampout</key>
                                <integer>0</integer>
                                <key>T_sett_vc</key>
                                <real>104.9</real>
                                <key>tau_Tvc</key>
                                <real>3.9</real>
                                <key>T_sett_mg</key>
                                <real>129.5</real>
                                <key>tau_Tmg</key>
                                <integer>70</integer>
                                <key>ThermalFFSpeedupFactor</key>
                                <real>0.25</real>
                                <key>HardTempLimitHeadroom</key>
                                <real>10</real>
                                <key>TemperatureLimit</key>
                                <integer>140</integer>
                        </dict>

The ADT also contains /product/audio.speaker-thiele-small, which is a thing. That blob structure roughly decodes like this:

02 00 5e 04

00000000 860f f10a 00000000 00000000 0000 30417073 'spA0'
00000000 610f f10a 00000000 00000000 0000 31417073 'spA1'
00000000 e60d f10a 00000000 00000000 0000 34417073 'spA4'
00000000 ba0d f10a 00000000 00000000 0000 35417073 'spA5'

97f3 checksum?

The parameters are each 16 bits it seems. The first one that has data is DC resistance in milliohms. The second one is called 't' judging by some strings, and I'm not sure what it is (but it doesn't look individually calibrated?)
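To make the decoding concrete, here is a small Python sketch that unpacks the four records from the dump above. The 22-byte little-endian record layout (`<IHHIIH4s`) is an assumption inferred from how the hex is grouped, and the function name `decode_record` is made up for illustration. The second 16-bit field is returned raw, since its meaning is not established at this point.

```python
import struct

# Assumed layout per record, little-endian, 22 bytes total:
# u32 unknown, u16 DC resistance (milliohms), u16 't', u32, u32, u16, 4-byte tag
RECORD = struct.Struct("<IHHIIH4s")

RECORDS_HEX = [
    "00000000 860f f10a 00000000 00000000 0000 30417073",
    "00000000 610f f10a 00000000 00000000 0000 31417073",
    "00000000 e60d f10a 00000000 00000000 0000 34417073",
    "00000000 ba0d f10a 00000000 00000000 0000 35417073",
]


def decode_record(hexline: str):
    """Return (tag, dc_resistance_ohms, t_raw) for one dumped record."""
    raw = bytes.fromhex(hexline.replace(" ", ""))
    _, r_mohm, t_raw, _, _, _, tag = RECORD.unpack(raw)
    # The tag is stored byte-reversed ('0Aps' on disk reads as 'spA0').
    return tag[::-1].decode("ascii"), r_mohm / 1000.0, t_raw


if __name__ == "__main__":
    for line in RECORDS_HEX:
        print(decode_record(line))
```

Under this layout, the first two entries decode to roughly 3.9 Ω and the last two to roughly 3.5 Ω, and the 't' field is 2801 for all four.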

If we only care about DC resistance for the safety model, the amp chips can measure that (and I was planning on adding that feature for experimentation), so it might be preferable to just have the driver/protection daemon do a measurement at startup and avoid having to drag around calibration data from the ADT (which could be out of date anyway, if the user has done a non-official speaker replacement).

@marcan
Member Author

marcan commented Sep 13, 2022

Okay, so let's do a bit of math. Assuming a constant resistance (which isn't true but it'll do for now), with the amp at 15.5 dBV, and input at -2dBFS, that's 13.5 dBV = 4.73 Vrms, current at 3.72Ω is 1.27 A. Almost exactly 6 W I was pumping into each tweeter.
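The arithmetic above can be reproduced with a few lines of Python. This is just a check of the numbers in this comment under the same constant-resistance assumption; the helper name `dbv_to_vrms` is made up for illustration.

```python
def dbv_to_vrms(dbv: float) -> float:
    """Convert a level in dBV (dB relative to 1 Vrms) to volts RMS."""
    return 10 ** (dbv / 20)


amp_gain_dbv = 15.5   # amp gain setting used in the test
signal_dbfs = -2.0    # sweep level relative to full scale
r_dc_ohm = 3.72       # Reb_ref from the plist, treated as constant

v_rms = dbv_to_vrms(amp_gain_dbv + signal_dbfs)  # output voltage
i_rms = v_rms / r_dc_ohm                         # current through the coil
power_w = v_rms ** 2 / r_dc_ohm                  # dissipated power

if __name__ == "__main__":
    print(f"{v_rms:.2f} Vrms, {i_rms:.2f} A, {power_w:.2f} W")
```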

We have τ_Tvc = 3.9 and τ_Tmg = 70, and I think T_sett_vc=104.9 and T_sett_mg=129.5 are the thermal resistances in °C/W. That means that, if I'm mental mathing right, the tweeter voice coil would've reached >600°C after a few seconds, give or take (and longer as the magnet heats up).

Yeah, no wonder I cooked it.

Given both thermal resistances and max temp, the long term average power (= maximum safe kernel level cap) is on the order of 0.5W or so, 12 times lower than what I have. That's a little over 10 dB lower, add 2 for the signal headroom, you end up at 3.5 dBV max safe output. The lowest we can go for the amp gain is 11 dBV, so that means we need to cap the DVC at -7.5 dB or so on top.

I sure hope I screwed up the math somewhere, because otherwise that's a ridiculously small safe volume limit to put in the kernel.
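For reference, the same estimate without rounding can be sketched as follows. The 25 °C ambient temperature is an assumption (the thread does not give the value used), and the model simply adds the voice coil and magnet thermal resistances for the long-term case.

```python
import math

T_LIMIT_C = 140.0         # TemperatureLimit from the plist
T_AMBIENT_C = 25.0        # assumed ambient; not stated in the thread
RTH_VC_C_PER_W = 104.9    # T_sett_vc, read as thermal resistance
RTH_MG_C_PER_W = 129.5    # T_sett_mg, read as thermal resistance
R_DC_OHM = 3.72           # Reb_ref
MIN_AMP_GAIN_DBV = 11.0   # lowest amp gain mentioned above

# Long-term power that just reaches the temperature limit.
p_safe_w = (T_LIMIT_C - T_AMBIENT_C) / (RTH_VC_C_PER_W + RTH_MG_C_PER_W)

# Output level of a full-scale sine that dissipates p_safe_w.
v_safe_rms = math.sqrt(p_safe_w * R_DC_OHM)
safe_out_dbv = 20 * math.log10(v_safe_rms)

# Digital volume cap needed on top of the minimum amp gain.
dvc_cap_db = safe_out_dbv - MIN_AMP_GAIN_DBV

if __name__ == "__main__":
    print(f"safe power  {p_safe_w:.2f} W")
    print(f"safe output {safe_out_dbv:.1f} dBV")
    print(f"DVC cap     {dvc_cap_db:.1f} dB")
```

With these inputs the unrounded result lands about 1 dB below the rounded 3.5 dBV / -7.5 dB figures above, which is within the precision of the estimate.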

@marcan
Member Author

marcan commented Sep 13, 2022

More resources: https://liu.diva-portal.org/smash/get/diva2:954210/FULLTEXT01.pdf

In particular,

Screenshot_20220913_234706

vs. these keys:

                                <key>PilotAmplHi_dB</key>
                                <integer>-30</integer>
                                <key>PilotAmplLo_dB</key>
                                <integer>-40</integer>
                                <key>PilotUpperThres</key>
                                <integer>90</integer>
                                <key>PilotLowerThres</key>
                                <real>80</real>
                                <key>PilotDecayTime</key>
                                <real>0.05</real>
                                <key>PilotFreq</key>
                                <real>43.0664</real>

... make me think whichever Apple engineers came up with this were reading that same paper.

@povik
Member

povik commented Sep 13, 2022

> Okay, so let's do a bit of math. Assuming a constant resistance (which isn't true but it'll do for now), with the amp at 15.5 dBV, and input at -2dBFS, that's 13.5 dBV = 4.73 Vrms, current at 3.72Ω is 1.27 A. Almost exactly 6 W I was pumping into each tweeter.

Checks out for me. (Including crosschecking with the amp datasheet but assuming 3.72 ohms is right.)

> We have τ_Tvc = 3.9 and τ_Tmg = 70, and I think T_sett_vc=104.9 and T_sett_mg=129.5 are the thermal resistances in °C/W. That means that, if I'm mental mathing right, the tweeter voice coil would've reached >600°C after a few seconds, give or take (and longer as the magnet heats up).

So the steady state temperature difference should be power times the thermal resistance, giving the 600 degrees, hmm. How sure are we about the units? Also what about the taus? If they are in joules per degree, that would take a bit of time to heat up.

@marcan
Member Author

marcan commented Sep 13, 2022

The taus would be in seconds, since they are time constants. So after 3.9 seconds you'd be ~63% of the way to the target. Hence my 40-second sweeps should've effectively reached the new steady state temperature for the voice coil vs. magnet system (the magnet itself would only be part of the way up, since its time constant is 70 s).

As for the units, strings from the DSP plugin say:

speakerType A: VoiceCoil: DC resistance [Ohms]
speakerType A: VoiceCoil: thermal resistance [C/Watt]
speakerType A: Magnet: thermal resistance  [C/Watt]
speakerType A: Voice Coil: thermal time constant [s]
speakerType A: Magnet: thermal time constant [s]
speakerType A: Ambient temperature, [C]
speakerType A: Temperature limit [C]
speakerType A: Attack time
speakerType A: Release time
speakerType A: Temperature hard limit headroom [C]

Of course, I can't know whether those map 1:1 to the parameters in the plist, but it would be weird if they don't.
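The time-constant reasoning above follows from the step response of a first-order system, which reaches a fraction 1 - exp(-t/τ) of its final value after time t. A minimal Python sketch, assuming both stages behave as simple first-order low-pass systems (the function name `fraction_reached` is made up for illustration):

```python
import math


def fraction_reached(t_s: float, tau_s: float) -> float:
    """Fraction of the steady-state rise reached t_s seconds after a power step."""
    return 1.0 - math.exp(-t_s / tau_s)


TAU_VC_S = 3.9   # tau_Tvc from the plist
TAU_MG_S = 70.0  # tau_Tmg from the plist

if __name__ == "__main__":
    print(f"voice coil after 3.9 s: {fraction_reached(3.9, TAU_VC_S):.1%}")
    print(f"voice coil after 40 s:  {fraction_reached(40.0, TAU_VC_S):.1%}")
    print(f"magnet after 40 s:      {fraction_reached(40.0, TAU_MG_S):.1%}")
```

After one time constant the voice coil is about 63% of the way up, after a 40-second sweep it is essentially at steady state, and the magnet is a bit under halfway.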

@povik
Member

povik commented Sep 13, 2022

OK, convinces me

@marcan
Member Author

marcan commented Sep 13, 2022

Oh yeah, and this math is for sine waves, but a square wave has 1.414 times the RMS voltage, and therefore delivers 2 times the power. Assuming the amp definition of output gains is for sine waves, that means we need another factor of 2 (3dB) in our safety calculation (this is the I Won The Loudness War safety factor ;) - and this checks out, since the song is ~+3dB LUFS!).
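The factor of 2 can be checked numerically by comparing the RMS of a sine and a square wave with the same peak amplitude. A small self-contained Python sketch (the helper name `rms` is made up):

```python
import math


def rms(samples):
    """Root-mean-square of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


N = 1000  # samples over exactly one period
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
square = [1.0 if s >= 0 else -1.0 for s in sine]  # same peak amplitude

voltage_ratio = rms(square) / rms(sine)    # about 1.414
power_ratio = voltage_ratio ** 2           # about 2
power_ratio_db = 10 * math.log10(power_ratio)

if __name__ == "__main__":
    print(f"voltage x{voltage_ratio:.3f}, power x{power_ratio:.2f} "
          f"({power_ratio_db:.2f} dB)")
```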

@marcan
Member Author

marcan commented Sep 13, 2022

Also, since coil resistance increases with temperature, it wouldn't actually hit 600°C. With the tempco from the plist, you get 2x resistance at ~285°C, so let's say equilibrium at 300°C or so. That's ~soldering temperature, which also checks out with the damage not being an instant open circuit. High enough to solidly melt the plastic, low enough that the coil itself probably survived (though might've partially shorted out).
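The self-limiting effect can be sketched as follows. With a fixed drive voltage, power falls as the coil heats: P = P0 / (1 + α·ΔT), and at equilibrium ΔT = P·Rth, which gives a quadratic in ΔT. The tempco value used here (0.0036 per °C) is an assumption, back-computed from the "45% higher at +126°C" figure quoted for the woofers in a later comment; the actual plist value is not shown in this thread. The function name is made up for illustration.

```python
import math

ALPHA_PER_C = 0.0036  # assumed tempco, inferred from 0.45 / 126 in this thread


def equilibrium_rise_c(v_rms: float, r_ref_ohm: float, rth_c_per_w: float,
                       alpha: float = ALPHA_PER_C) -> float:
    """Steady-state voice coil temperature rise for a constant-voltage drive.

    Solves alpha*dT^2 + dT - P0*Rth = 0 for the positive root, where
    P0 is the power at the reference (cold) resistance.
    """
    p0_w = v_rms ** 2 / r_ref_ohm
    x = p0_w * rth_c_per_w  # rise if resistance did not change
    return (-1.0 + math.sqrt(1.0 + 4.0 * alpha * x)) / (2.0 * alpha)


if __name__ == "__main__":
    tweeter = equilibrium_rise_c(4.73, 3.72, 104.9)
    woofer = equilibrium_rise_c(4.73, 3.9, 32.1)
    print(f"tweeter rise: about {tweeter:.0f} C above ambient")
    print(f"woofer rise:  about {woofer:.0f} C above ambient")
```

This only considers the voice coil stage; the slower magnet heating would add to the rise over longer periods.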

@marcan
Member Author

marcan commented Sep 13, 2022

So, before I go to sleep, let's see how badly I abused the woofers. 3.9Ω, same amp settings, so 1.21 A and about 5.7W.

Those have a T_sett_vc of 32.1, so 183°C plus ambient, call it 200°C. Due to the tempco, equilibrium is more like at +126°C, where resistance is 45% higher. That's ~150°C with ambient. Max temperature is... 140°C.

Checks out. I abused them but not too badly. Which means the existing kernel limit is actually not too shabby for the woofers. -2dB for the signal, -3dB for square waves, -1dB so we don't actually exceed 140°C, let's say -6dB would get us into 100% safe territory.

Honestly? They're loud enough, -6dB is usable, and if I put them first in the channel map no-userspace playback will default to the woofers. So I'm actually tempted to enable audio for this model, putting a crazy cap on the tweeters but just -6dB on the woofers, and at least then people can start playing around with userspace and get some audio.

@marcan
Member Author

marcan commented Sep 14, 2022

Ah, but we need to consider the magnet temp too, that's 20°C/W. Let's round that up to same as the voice coil, and call it 2x in total. So -3dB on top of the prior calculation, for sustained power safety.

@mbrennwa

> The parameters are each 16 bits it seems. The first one that has data is DC resistance in milliohms. The second one is called 't' judging by some strings, and I'm not sure what it is (but it doesn't look individually calibrated?)

My guess would be 't' refers to Qt (or Qts), which is maybe the most relevant Thiele-Small parameter. What's the value of the 't' parameter?

@marcan
Member Author

marcan commented Sep 17, 2022

> > The parameters are each 16 bits it seems. The first one that has data is DC resistance in milliohms. The second one is called 't' judging by some strings, and I'm not sure what it is (but it doesn't look individually calibrated?)
>
> My guess would be 't' refers to Qt (or Qts), which is maybe the most relevant Thiele-Small parameter. What's the value of the 't' parameter?

After going through the speaker protection model stuff, I'm pretty sure it's temperature at which the DC resistance was measured. For this laptop it's 2801 for all speakers (28.01°C). It doesn't make much sense for a Thiele-Small parameter to be the same for woofers and tweeters, and temperature makes sense since it gives you the reference point for the DC impedance vs. temperature curve, which allows you to calculate voice coil temperature given that and the tempco (which is in the speaker protection profile).
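The calculation described here, going from a measured resistance back to a voice coil temperature, can be sketched with a linear resistance model R(T) = R_ref · (1 + α · (T - T_ref)). The tempco value below is an assumption (inferred from figures earlier in the thread, not read from the plist), and the function name is made up for illustration.

```python
ALPHA_PER_C = 0.0036  # assumed tempco; the real value lives in the plist


def coil_temp_c(r_measured_ohm: float, r_ref_ohm: float, t_ref_c: float,
                alpha: float = ALPHA_PER_C) -> float:
    """Estimate voice coil temperature from its measured DC resistance.

    Inverts R(T) = R_ref * (1 + alpha * (T - T_ref)).
    """
    return t_ref_c + (r_measured_ohm / r_ref_ohm - 1.0) / alpha


if __name__ == "__main__":
    r_ref = 3.974   # ohms, from the calibration blob
    t_ref = 28.01   # degrees C, the 't' field (2801) scaled by 1/100
    # A resistance reading 45% above the reference value:
    print(f"{coil_temp_c(r_ref * 1.45, r_ref, t_ref):.1f} C")
```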

@mbrennwa
Copy link

Makes sense -- except that temperature is not part of the Thiele-Small parameter set.

@marcan
Member Author

marcan commented Feb 6, 2023

Turns out this all works pretty much as expected. J314 tweeter voice coil temperature (blue) vs. model estimate (orange). Green is the magnet temperature estimate. The woofer model is a bit worse, but not much.

Now we just need to implement this in the production daemon, and take a guess as to what a good temperature governor model is (since it's not entirely clear what Apple does here exactly).

Screenshot_20230206_221432
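A model of the kind plotted above can be sketched as two first-order stages driven by dissipated power. This is an illustrative guess at the structure, not the actual daemon code: it assumes the voice coil rise sits on top of the magnet rise, uses the tweeter parameters from the AID13 plist quoted earlier, assumes a 25 °C ambient, and all function and variable names are made up.

```python
import math


def simulate_temps(power_w, dt_s, rth_vc, tau_vc, rth_mg, tau_mg, t_ambient_c):
    """Return a list of (voice_coil_c, magnet_c) estimates, one per power sample.

    Each stage is a first-order low-pass of power * thermal resistance,
    discretized exactly for a piecewise-constant input.
    """
    a_vc = 1.0 - math.exp(-dt_s / tau_vc)
    a_mg = 1.0 - math.exp(-dt_s / tau_mg)
    rise_vc = 0.0  # voice coil rise above magnet
    rise_mg = 0.0  # magnet rise above ambient
    temps = []
    for p in power_w:
        rise_vc += a_vc * (p * rth_vc - rise_vc)
        rise_mg += a_mg * (p * rth_mg - rise_mg)
        magnet_c = t_ambient_c + rise_mg
        temps.append((magnet_c + rise_vc, magnet_c))
    return temps


if __name__ == "__main__":
    # 1000 s of constant 0.5 W in 0.1 s steps.
    trace = simulate_temps([0.5] * 10000, 0.1,
                           rth_vc=104.9, tau_vc=3.9,
                           rth_mg=129.5, tau_mg=70.0,
                           t_ambient_c=25.0)
    coil_c, magnet_c = trace[-1]
    print(f"voice coil {coil_c:.1f} C, magnet {magnet_c:.1f} C")
```

A temperature governor would then reduce gain as the estimated voice coil temperature approaches the limit; how Apple does that is not established in this thread.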

@hackgrid

hackgrid commented Aug 2, 2023

Could you maybe give a status update? Thanks :)

@DavidBuchanan314

@hackgrid I have an unfounded theory that this will be a surprise feature drop, along with the official Fedora distro release coming at some point later this month

@Markus-Be

@DavidBuchanan314 That would be very nice. The next Fedora release is scheduled for 06.11.

But could you explain what leads you to assume that speaker support might be included in the Fedora release?

@marcan and everyone else, I would like to thank you for the great work. Can you say anything about how we could help move the speaker work forward? For me it is the last piece of the puzzle to be able to use Linux more or less productively on the MacBook. I assume some others feel the same way as I do.

@hackgrid

hackgrid commented Sep 29, 2023 via email

@DavidBuchanan314

@Markus-Be I don't have any evidence, hence "unfounded theory". I am reading tea leaves. It would be pointless to speculate any further.

Nevertheless, I'm in the mood to write a bunch of pointless words, so here goes. It is my understanding that most of the speaker-related research has been completed, and the remaining work required is "just" integrating it all. Given that Fedora is the new flagship Asahi distro, that's where I expect to see it integrated first. Given that both Fedora and Asahi have high quality standards, I'd expect speaker support out-of-the-box in an "official" release.

Estimating software engineering progress is famously impossible, so when that official release arrives is anyone's guess.
