Thoughts on this AP20 Automation Glitch?

  • #31
    Originally posted by Carsten Kurz
    As the AP20 offers plenty of presets, I would just create a preset for 'mute' and name it accordingly. Switching to 'Mute' would then also indicate at least '0' on the display.
    Having a long fade attached to Mute generically is probably not a good idea, as usually when you want to mute, you really want it to mute quickly.
    Maybe I'm missing a nuance, but suppose you have a DCI format with a 0 fade configured, and a "Mute" format like you suggest with fades configured. My impression is that triggering the switch will instantly mute the playing DCI format and then "fade up" over your configured time to 0.0 on your suggested Mute custom format.

    Is that incorrect? Does it actually fade out the DCI format that was configured with no fades?

    Getting to a mute button with zero fades is not a problem, as that is how the factory mute behaves (and it does not seem changeable), and it is doable in automation too.

    I could see your suggestion being useful if there WAS a way to change the factory mute fade behavior, to preserve an instant mute. Maybe that was what you were really pointing out, in the hypothetical case where the factory mute behavior could be changed?
    Last edited by Ryan Gallagher; 07-31-2025, 01:43 PM.



    • #32
      Your point about mute having long fades is valid, though: that's not ideal when what you are actually trying to do is mute, like during tech checks. And retraining people to use some other button when they really are trying to mute in a rush seems problematic too.

      With that in mind, if automation methods are out, I'm starting to come around to the idea of having "DCI 5.1 Preroll" and "DCI 7.1 Preroll" formats. Those could be configured with zero or minimal fades. Features would then play in our traditional "DCI 5.1" or "DCI 7.1" formats, where a 5s fade-out could be configured. You'd never hear it under normal conditions because you don't switch formats mid-feature... but in the event of a Q&A interruption request, you could change to any other format and trigger the 5s format fade.

      It just means a couple more formats to maintain, but as they inherit almost everything, it's not much extra work... if Dolby visits and makes a new EQ, I'd just have to remember to apply it to those preroll formats too. We'd only have to train operators to use them for pre-show content, and I feel like the labeling helps encourage that.



      • #33
        A scheduled reboot bothers me as well. I cringe when our users suggest they need to do it. To me that is a 'work-around' and not a 'fix'. Even worse, it assumes that there is in fact an issue and that it lies in the device, rather than in something else that might be easily resolved if we actually took the time to debug it. Instead, you just start over. It is probably the most successful generic technical support solution ever.

        To clarify, the AP20 works properly but after some time it appears that the fade macro locks up the AP20 in some mode where its displayed status does not align with reality? Is that a fair assessment?

        Now bumping the hard controls (kicking it) gets it to again do what it says it is doing. But from that point on it does not perform the fader actions? Not until you start over with a reboot?

        So this is not a network problem with the Doremi (for which there is precedent). We all agree that this is likely a failure in the AP20 macro execution engine given the uncommon stress that Ryan has brought upon it? Am I on the right page here?

        There are alternatives (maybe not perfect) that avoid the issue, but Ryan has lost confidence in the AP20. Hence the testing.

        I am guessing that Datasat has moved on to the AP25 and the AP20 firmware is likely no longer easily debugged?

        This is one of those issues that you run into with mature products. It takes some weird combination of events and time to trigger some dormant issue that no one has seen before. We call those 'gremlins' by the way.

        An example would be a bad pointer in the macro engine. Maybe that pointer addresses a memory buffer that has already been freed, but it still gets used and written to when the macro completes, or something like that. So some memory location gets improperly written. 999 times out of 1000 that location has not been reallocated, so nothing comes of it. But given enough time, the stray write eventually lands on something that locks the system up.

        Another twist is when a memory block gets freed twice. Often the second free() doesn't cause any problem, because the system decides the block isn't valid. But there is a chance that, in between, code elsewhere needed some memory and was handed a block at the same location. The second free() then incorrectly releases that block, and the rug is pulled out from underneath the system.

        Those are a real challenge to find. It is fun, though. I added a memory checker to the JNIOR OS (JANOS) that catches and reports these things, so we don't get far (certainly not to a release) without knowing about it, and it tells me exactly what code is at fault. Works nicely. I don't think Linux has quite the same thing. It catches memory leaks too; given enough time, those can run the system so low on memory that portions of it lock up.
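To make the stale-pointer and double-free scenarios above concrete, here is a toy Python model of a checking allocator. It is not how JANOS (or the AP20) does it: the `CheckedHeap` class, the fixed 16-byte blocks, and the explicit `owner` tag passed to `free()` are all assumptions for illustration. A real checker would record something like the allocating process or call site rather than take a tag. Freed addresses go back on a last-in-first-out list, so an address is handed out again right away, which is exactly what makes a stale second free() dangerous.

```python
class CheckedHeap:
    """Toy fixed-block heap that records who owns each block (illustrative only)."""

    def __init__(self, size, block=16):
        # free addresses with the lowest on top; reuse is last-in-first-out
        self.free_list = list(range(size - block, -1, -block))
        self.live = {}     # address -> owner tag of the current allocation
        self.errors = []   # reports a memory checker would log

    def malloc(self, owner):
        if not self.free_list:
            raise MemoryError("heap exhausted")
        addr = self.free_list.pop()
        self.live[addr] = owner
        return addr

    def free(self, addr, owner):
        holder = self.live.get(addr)
        if holder is None:
            self.errors.append(f"{owner}: double free of {addr:#x}")
        elif holder != owner:
            # the address was reallocated; freeing it would pull the rug out
            self.errors.append(f"{owner}: stale free of {addr:#x}, now owned by {holder}")
        else:
            del self.live[addr]
            self.free_list.append(addr)

    def leaks(self):
        """Blocks still allocated, i.e. what a leak report would list."""
        return sorted(self.live.items())


heap = CheckedHeap(size=64)
a = heap.malloc("macro")
heap.free(a, "macro")
b = heap.malloc("network")   # reuses the address just freed
heap.free(a, "macro")        # stale pointer: caught, network's block survives
c = heap.malloc("ui")
heap.free(c, "ui")
heap.free(c, "ui")           # plain double free: caught
```

Without the ownership check, the stale `free(a, "macro")` would have silently released the `network` block, which is the "rug pulled out" case. The block that is never freed shows up in `heap.leaks()`, so the same bookkeeping doubles as a leak report.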

        Ryan, if you can find a work-around, that seems to be the best path. Maybe abandoning the macros is the thing to do. But I would (repeatedly) remind Datasat that they should look into it. Until they know exactly what it is, they don't know how many people are experiencing it, and those people aren't likely to be happily upgrading to the AP25. Maybe you can borrow an AP25 from them to see if you can break that? Then you would get their attention.



        • #34
          Originally posted by Bruce Cloutier
          To clarify, the AP20 works properly but after some time it appears that the fade macro locks up the AP20 in some mode where its displayed status does not align with reality? Is that a fair assessment?

          Now bumping the hard controls (kicking it) gets it to again do what it says it is doing. But from that point on it does not perform the fader actions? Not until you start over with a reboot?

          So this is not a network problem with the Doremi (for which there is precedent). We all agree that this is likely a failure in the AP20 macro execution engine given the uncommon stress that Ryan has brought upon it? Am I on the right page here?

          [[ SNIP ]]

          Ryan, if you can find a work-around, that seems to be the best path. Maybe abandoning the macros is the thing to do. But I would (repeatedly) remind Datasat that they should look into it. Until they know exactly what it is, they don't know how many people are experiencing it, and those people aren't likely to be happily upgrading to the AP25. Maybe you can borrow an AP25 from them to see if you can break that? Then you would get their attention.
          Yup, that is pretty much accurate. The individual fader-setting commands do come from the Doremi; I'm not having the AP20 do any scripting there. I had only built out a couple of scripted ones for executing fades mid-content (called by the Doremi and executed by the AP20 macro engine talking to itself).

          I have stopped using my fade-to-0.0 scripts where they aren't needed, leaning instead on an added format fade time coming out of walk-in music, etc. Q&As that interrupt the playback audio are the one place where I'd still like to have something, but because of my current AP20 automation trust issues, I'm leaning towards duplicating a 5.1 format without fades for preroll use, and letting all the typical formats we use for features include a long fade-out via the AP20 format config. If you ever "break away" from the feature format, it will execute that 5s fade, but otherwise it won't affect the faster-paced pre-show format changes. I suppose it might make a Q&A that actually waits until the feature ends a bit harder to transition to... but for most Q&As we have full staffing and their audio goes through the live PA, so it doesn't matter (to us).

          Also, the issue has not recurred since my OP. I still do a quick test, but I no longer run the abusive scripts as part of it, and I'll probably delete them soon if the format approach covers most situations.



          • #35
            I wonder if avoiding the "abusive scripts" just makes it take more time for the system to crash. If it were a memory leak that happens on each command, and the system crashes after 10,000 commands, shorter scripts would just make it take longer to get to 10,000 commands.
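The point can be put as simple arithmetic. A minimal sketch, assuming (hypothetically) that every command leaks the same fixed number of bytes and that the unit fails once the leaked total exceeds its spare heap. The 640,000-byte heap and 64-byte leak below are invented numbers chosen to reproduce the 10,000-command figure, not measurements of the AP20.

```python
def commands_until_failure(spare_heap_bytes, leak_per_command):
    """Commands survived before a fixed per-command leak exhausts the heap."""
    return spare_heap_bytes // leak_per_command


def days_until_failure(spare_heap_bytes, leak_per_command, commands_per_day):
    """The same failure point, expressed in days at a given command rate."""
    return commands_until_failure(spare_heap_bytes, leak_per_command) / commands_per_day


# hypothetical figures, not AP20 measurements
SPARE, LEAK = 640_000, 64
budget = commands_until_failure(SPARE, LEAK)       # 10,000 commands either way
abusive = days_until_failure(SPARE, LEAK, 2_000)   # heavy macro use
gentle = days_until_failure(SPARE, LEAK, 200)      # occasional fades only
```

Cutting the command rate by a factor of ten moves the failure from day 5 to day 50 without removing it, so a quiet few weeks would not by itself show that the bug is gone.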

            Most of the embedded stuff I've written calls malloc() to set up various buffers during initialization and never frees them. They are used continuously by the system, so there's never a memory leak (no missing free()). I realize, however, that on more complex systems, dynamic memory allocation is appropriate (and dangerous).
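The allocate-once pattern looks roughly like this. It is a Python stand-in for what would be C in firmware; `RxBufferPool` and its round-robin reuse are illustrative assumptions, not code from any product. Every buffer is created at start-up, and afterwards the pool only hands existing buffers back out, so the memory footprint can never grow.

```python
class RxBufferPool:
    """Buffers created once at start-up and recycled forever; nothing is ever freed."""

    def __init__(self, count, size):
        self._buffers = [bytearray(size) for _ in range(count)]
        self._next = 0

    def acquire(self):
        # Round-robin reuse: the caller must be finished with a buffer
        # before the pool wraps around and hands it out again.
        buf = self._buffers[self._next]
        self._next = (self._next + 1) % len(self._buffers)
        return buf


pool = RxBufferPool(count=4, size=256)
handed_out = {id(pool.acquire()) for _ in range(1000)}   # only 4 distinct buffers
```

The trade-off is that the pool size has to cover the worst case up front, which is why this fits fixed-function embedded code better than a system running arbitrary user macros.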



            • #36
              Only a lot of testing will give a better idea of what is going on there. I don't know how the master volume is physically adjusted on the AP20. There are different ways to do this: some could be pure DSP software, and some may involve setting registers in a hardware device, which may act up under stress or tight timing. I would still bring this issue up with Datasat/ATI; maybe when they see the macro, they will be able to track the problem down. If it is solved by a soft power cycle, that is also an easy workaround, as it takes just a few seconds to recover. Why not always switch the processor off that way? The fans will live longer.



              • #37
                Back to the "just turn it off / reboot it" method. I had asked earlier in the thread whether anyone knew if a soft power cycle is "safe" with the amps unmuted.

                While we will do it once the glitch occurs, we have not resorted to it as a "preventative" step because of the extra care one typically takes when powering on equipment upstream of the amplifiers. So when we do have to do it, out of an abundance of caution, it involves front-panel mutes on the surround amps and opening up EAWPilot to do the same for the stage amps. That's a bit advanced to expect of every single operator who might be in the booth. House staff have been trained to do it but have had little real-world practice, and on the occasions we have overhire replacements it might be less reliable. I also don't necessarily want overhire operators interacting with EAWPilot at all!

                Doubling back to macros: maybe an "amp mute/unmute" could be scripted into the AP20 automation buttons, assuming those would still do what they are supposed to with outbound automation while the glitch is occurring. That would make it a little more accessible. But again, I'm not even sure it's needed in practice.

                There is an HDMI disembedder used to get a booth feed of HDMI walk-in music at a volume independent of the fader level, and it certainly freaks out a little during the soft power cycle: if you had anything playing, it comes blasting out of the poor little booth speakers. But the AP20 is not controlling that volume; it's just some weird behavior when the other device loses its HDMI connection.



                • #38
                  Originally posted by Harold Hallikainen
                  Most of the embedded stuff I've written calls malloc() to set up various buffers during initialization and never frees them. They are used continuously by the system, so there's never a memory leak (no missing free()). I realize, however, that on more complex systems, dynamic memory allocation is appropriate (and dangerous).
                  The Series 3 JNIOR used an OS supplied by Dallas Semiconductor, based on an embedded JVM, part of Sun's '90s push to embed Java everywhere. But its memory manager and garbage collector are just horrible. So to the memory handling concerns I mentioned in my prior post you can add memory fragmentation. After a while, the memory on those old JNIORs gets so broken up that there isn't a large enough contiguous block available to execute processes, which run in memory on those units. So a reboot is (unfortunately) the answer; actually a REBOOT -A, which reformats the memory. There is an alternative reboot that runs a defrag procedure, but that isn't always successful at cleaning things up, depending on what is unmovable. We tried hard to reuse memory and not challenge the memory manager. We had to get away from that.
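A toy first-fit model shows how fragmentation bites even when plenty of memory is free in total. This is a simplified Python simulation, not the Dallas Semiconductor memory manager; the 64-cell heap, the 4-cell blocks and the helper names are assumptions for illustration.

```python
def first_fit(heap, size, tag):
    """Claim the first run of `size` free cells; return its start or None."""
    run = 0
    for i, cell in enumerate(heap):
        run = run + 1 if cell is None else 0
        if run == size:
            start = i - size + 1
            heap[start:i + 1] = [tag] * size
            return start
    return None


def release(heap, tag):
    """Free every cell belonging to `tag`."""
    for i, cell in enumerate(heap):
        if cell == tag:
            heap[i] = None


def largest_free_run(heap):
    best = run = 0
    for cell in heap:
        run = run + 1 if cell is None else 0
        best = max(best, run)
    return best


heap = [None] * 64
for n in range(16):
    first_fit(heap, 4, n)          # 16 blocks of 4 cells fill the heap
for n in range(0, 16, 2):
    release(heap, n)               # free every other block

total_free = heap.count(None)      # half the heap is free...
big = first_fit(heap, 8, "big")    # ...yet no 8-cell hole exists, so this fails
```

Sliding the used blocks together (a defrag) would recover a single 32-cell hole, but only if every block can be moved, which matches the caveat about what is unmovable.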

                  With the Series 4 (JANOS) we used no third-party code. Nothing was handed to us, so there was no initial seeding of bugs and performance traps. I wrote a proper memory manager, which has now stood the test of time (knock on wood). Each process (the system is preemptive multitasking) owns the memory it allocates, so when a process terminates, all of its memory is released. That means your malloc() doesn't need a free() if the process is short-lived. I don't like to assume that, though, and when temporary buffers are needed they are still released. We all make mistakes, so I have a little low-level code (the memory checker) that catches slop on my part. So it's not so 'dangerous', but it does require attention to detail.
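Per-process ownership can be sketched like this. It is a hypothetical Python model, not JANOS code: `OwnedHeap`, its integer handles and `exit_process()` are names invented here. The allocator records which process owns each block, so terminating a process reclaims whatever it never freed.

```python
from collections import defaultdict


class OwnedHeap:
    """Toy allocator that tags each block with the process that allocated it."""

    def __init__(self):
        self.owned = defaultdict(dict)   # pid -> {handle: size}
        self.next_handle = 1
        self.in_use = 0                  # bytes currently allocated

    def malloc(self, pid, size):
        handle = self.next_handle
        self.next_handle += 1
        self.owned[pid][handle] = size
        self.in_use += size
        return handle

    def free(self, pid, handle):
        # raises KeyError on a double free or a free by the wrong process
        self.in_use -= self.owned[pid].pop(handle)

    def exit_process(self, pid):
        """Reclaim everything the process still holds; return the block count."""
        remaining = self.owned.pop(pid, {})
        self.in_use -= sum(remaining.values())
        return len(remaining)


heap = OwnedHeap()
tmp = heap.malloc(pid=7, size=512)
heap.free(7, tmp)                  # temporary buffer released explicitly
heap.malloc(pid=7, size=128)       # never freed...
heap.malloc(pid=7, size=256)       # ...nor this one
heap.malloc(pid=9, size=64)        # belongs to another process
reclaimed = heap.exit_process(7)   # process 7 ends; its two blocks come back
```

After process 7 exits, only process 9's 64 bytes remain in use, even though process 7 never freed two of its blocks.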

                  The memory manager in JANOS uses AVL trees to manage free blocks. Adjacent blocks are coalesced on the fly as they are freed. Immutable memory is biased toward one end of memory and everything else toward the other. On the Series 4 only the applications are Java, and each application runs its own instance of the JVM. Each instance is periodically asked to mark all of its in-scope objects so that anything no longer needed is swept up. This is integrated with the low-level memory management. It all works really well.
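Coalescing freed neighbours is what keeps fragmentation in check. The sketch below keeps free blocks as `(start, size)` pairs in a Python list sorted by address, using `bisect` to find where a freed block belongs. That sorted list is a deliberately simpler stand-in for the AVL tree described above, and `FreeList`/`release()` are hypothetical names.

```python
import bisect


class FreeList:
    """Free blocks as (start, size) pairs kept sorted by address."""

    def __init__(self):
        self.blocks = []

    def release(self, start, size):
        i = bisect.bisect_left(self.blocks, (start, 0))
        # merge with the following block if it begins where this one ends
        if i < len(self.blocks) and self.blocks[i][0] == start + size:
            size += self.blocks.pop(i)[1]
        # merge with the preceding block if it ends where this one begins
        if i > 0:
            prev_start, prev_size = self.blocks[i - 1]
            if prev_start + prev_size == start:
                self.blocks[i - 1] = (prev_start, prev_size + size)
                return
        self.blocks.insert(i, (start, size))


fl = FreeList()
fl.release(0, 16)
fl.release(32, 16)     # not adjacent yet: two separate free blocks
fl.release(16, 16)     # fills the gap; all three merge into one block
merged = list(fl.blocks)
fl.release(64, 8)      # not adjacent to anything: stays separate
```

An AVL tree keyed by address finds the neighbours and inserts in O(log n); the list here pays O(n) for `insert` and `pop`, which is fine for a sketch.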

                  But if something like what Ryan has experienced with the AP20 happened with the JNIOR, we probably also wouldn't have been able to reproduce it. I can say that when we have reports from users of weird things that we cannot reproduce, we end up testing and trying everything we can for days afterwards. I eventually get involved and start to lose sleep over it. We have, in many cases, eventually distilled things to a point where we find a suspected cause. We cut an update and have the customer validate our possible fix.

                  We've had to deal with some interesting things along these lines. The worst, I think, was some overzealous IT people running scanners on internal networks looking for systems that are vulnerable to attack. Basically, the approach there is to attack everything yourself, I guess, and aggressively! The JNIOR (Series 4) generally survives that abuse, but one creative attack caused a glitch. It took us a long time to discover that was the cause. The customer didn't even know that their IT was doing anything.

                  So these things can be anything. It still might not be Ryan's overzealous macros.




                  • #39
                    We usually soft- and hard-power down the AP20 with the amps on and have never noticed anything. Okay, usually I'm in the booth, but often I or somebody else is also cleaning the auditorium when that is done. In 12 years of using the AP20, we certainly would have noticed something.



                    • #40
                      Originally posted by Carsten Kurz
                      We usually soft- and hard-power down the AP20 with the amps on and have never noticed anything. Okay, usually I'm in the booth, but often I or somebody else is also cleaning the auditorium when that is done. In 12 years of using the AP20, we certainly would have noticed something.
                      Good to know. I was just going on learned best-practice habits. I expect the relays you hear clicking in the unit are there specifically to avoid such transients.



                      • #41
                        Ideally, you can power the sound processor off (or, for that matter, on) without sending a damaging transient to the amplifier. We paid special attention to this in the USL JSD series. There are analog switches between the line driver chips and the outputs, and we made sure the switches were off at power-up and during power-down, because the line driver output voltage would swing all over the place as the supply voltages went up and down.

                        In this specific instance, if you suspect that there is an issue after a large number of commands are received, it would be great to generate a test case so the manufacturer can duplicate the problem and know when it is fixed. Problems that can't be duplicated are very difficult to solve.

                        But I remember something from the radio program "Car Talk" where they discussed an issue that could not be reproduced. In that case (assuming it was not a design problem), they said to replace the part most likely to cause the problem. Sending the customer away without doing anything does not fix the problem, and this might...

                        Anyway, a repeatable test would be good!



                        • #42
                          I could easily create an SPL that calls a ton of copies of my fade macro. If I could force the issue shortly after a reboot, I think that would at least get it into reproducible territory.
