Bug 914614 - media-video/ffmpeg: 'av1_twopass_postencode_update: Assertion `cpi->twopass_frame.stats_in > twopass->stats_buf_ctx->stats_in_start' failed' with media-libs/libaom-3.6.1
Summary: media-video/ffmpeg: 'av1_twopass_postencode_update: Assertion `cpi->twopass_f...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Media-video project
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-24 21:10 UTC by bzipitidoo
Modified: 2024-02-02 00:14 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
ffmpeg command and output (libaom_error.txt,5.61 KB, text/plain)
2023-09-24 21:15 UTC, bzipitidoo
emerge --info (libaom_error_emerge_info.txt,6.44 KB, text/plain)
2023-09-24 22:11 UTC, bzipitidoo

Description bzipitidoo 2023-09-24 21:10:42 UTC
4 days into transcoding 48 minutes of MPEG4 video to AV1, ffmpeg aborted with this message:

ffmpeg: /var/tmp/portage/media-libs/libaom-3.6.1/work/libaom-3.6.1/av1/encoder/pass2_strategy.c:4047: av1_twopass_postencode_update: Assertion `cpi->twopass_frame.stats_in > twopass->stats_buf_ctx->stats_in_start' failed.
Aborted

It produced 13 minutes 21 seconds of working AV1 video before aborting.

Yes, that's right, libaom is so incredibly slow that it needs a full day to encode a mere 3 to 4 minutes of video.  So, no, I am not going to try to reproduce.

Reproducible: Didn't try

Steps to Reproduce:
Perhaps this abort would happen no matter what video was being encoded, as long as it is at least 15 minutes long?  Have had success encoding very short videos of less than 1 minute.



Bug 908139 may be related.
Comment 1 bzipitidoo 2023-09-24 21:15:17 UTC
Created attachment 871262 [details]
ffmpeg command and output

The files in question:

-rwxr-xr-x 1 u u 456792454 Sep 19 22:55 10144-p3_H264_1200kbps_AAC_und_ch2_64kbps.mp4
-rw-r--r-- 1 u u  12320768 Sep 23 04:40 p3.webm
Comment 2 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-09-24 21:16:53 UTC
Please include ffmpeg -version and emerge --info.

I'll add libaom-3.7.0 in a minute, but you'll likely need to report this upstream if you can reproduce it.
Comment 3 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-09-24 21:17:15 UTC
I also think it's probably time to just stable ffmpeg-6. ffmpeg-4 is technically supported still upstream but they don't backport loads of fixes..
Comment 4 bzipitidoo 2023-09-24 22:09:28 UTC
It's ffmpeg 4.4.4, as you can see in the attachment.

$ emerge --info
Portage 3.0.49 (python 3.11.5-final-0, default/linux/amd64/17.1/desktop, gcc-12, glibc-2.37-r3, 5.17.4-x86_64 x86_64)
=================================================================
System uname: Linux-5.17.4-x86_64-x86_64-AMD_Ryzen_5_5600G_with_Radeon_Graphics-with-glibc2.37
KiB Mem:    32281968 total,   2643564 free
KiB Swap:          0 total,         0 free
Timestamp of repository gentoo: Sun, 17 Sep 2023 04:00:01 +0000
Head commit of repository gentoo: 8ade673e48a5d41eca988c291c0a0f2c9a28c5d5
Timestamp of repository steam-overlay: Thu, 14 Sep 2023 05:49:13 +0000
Head commit of repository steam-overlay: 603076369c21489ea4ddbdbed19616cd437dde4f

Rest of emerge --info is in the next attachment.
Comment 5 bzipitidoo 2023-09-24 22:11:07 UTC
Created attachment 871263 [details]
emerge --info
Comment 6 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-09-24 22:13:04 UTC
(In reply to bzipitidoo from comment #4)
> It's ffmpeg 4.4.4, as you can see in the attachment.

I'd written the comment when you hadn't posted it, then collided later.
Comment 7 7e3 2023-09-26 09:16:57 UTC
I can confirm I've had this problem across multiple ffmpeg 4.X versions, ever since libaom went past 3.5.0 (I think at some stage 3.5.1 was in portage?).  Strangely, I also have an AMD CPU, which might contribute (but read more below)?  I also see the problem only after the processing command has been running for some time.

It's an esoteric error that I've looked into elsewhere, and in the code, for some time.  I was just camping out until I could return to the problem, and in the meantime I parked myself at libaom 3.5.0.

I do have some notes on the issue, but sorry, I don't have them to hand.  Something led me to believe months ago that the passing of parameters from ffmpeg 4 to libaom > 3.5.0 is at fault.

Sam's comments combined with my research+instinct might suggest it is a Gentoo specific problem, or at least, the combination of an older ffmpeg 4.X + a newer libaom>3.5.0?

Further, I *think* there was a change in parameter option parsing in later ffmpegs.  Incidentally, I discovered this while diverting myself to play with libsvtav1 as a preferred encoder in ffmpeg!

I think Sam's instinct is already on it and it might be worth experimenting with the libaom3.6.1+ffmpeg6.X combination and see if it fixes it?
Comment 8 7e3 2023-09-27 06:45:43 UTC
Excuse my spelling errors above.

More info:  same behaviour with amd64 and ~arm64

Will try ffmpeg 6 next...
Comment 9 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-09-27 07:04:19 UTC
Came up at https://www.reddit.com/r/AV1/comments/11quo2t/aom_360_fails_to_encode/ ...

I can't find any bug that they would've filed though.
Comment 10 7e3 2023-09-27 11:25:41 UTC
(In reply to Sam James from comment #9)
> Came up at
> https://www.reddit.com/r/AV1/comments/11quo2t/aom_360_fails_to_encode/ ...
> 
> I can't find any bug that they would've filed though.

Nice find Sam, I wish I'd found that back then, a compadre would've helped.
I think from memory there was something else similar quoting the error, in some GitHub entry or the like, but it was about as useful: obscure and about equally looked into.  Honest! I DID look into it!! :)

Is it just me, or did the ffmpeg version for Gentoo amd64 just jump?
Mine was skipped as it had many dependency conflicts.
I'm about to force a less important amd64 build to go ahead.

Did I read that correctly; that reddit post seems to imply it's nothing to do with ffmpeg and is using aomenc raw, huh?

Maybe I should try that first, hmm...
Comment 11 7e3 2023-09-27 11:30:17 UTC
One thing I *am* sure of is that I'm camped out on libaom-3.5.0 and that works, thereafter gives the error.  So their reddit of 3.6.0 correlates to some degree (I vaguely remember picking it up just prior with 3.5.1 *I think* but I didn't do any diffs or anything that deep).
Comment 12 7e3 2023-09-27 13:23:15 UTC
OK, some limited progress, re-involving ffmpeg...

ffmpeg does not pass the requisite parameters to libaom > 3.5.0, just a limited subset.

Somewhere between 3.5.0 and 3.6.0 the default changed in libaom (or in aomenc, if using that) for:
--force-video-mode=<arg>
Force video mode even for a single frame (0: false (default), 1: true)

It changed from "1" to "0".

I believe this is largely in relation to 2-pass as default, as well as other parameters.  It *might* be related to this post:
https://www.reddit.com/r/AV1/comments/lfheh9/encoder_tuning_part_2_making_aomencav1libaomav1/

Where it states:
"The behaviour of 2-pass mode seems to have changed in late 2020, making it less critical to overall quality than in libvpx-vp9.

--webm

– Enables WebM output for the encoder, and passes the encoder flags set. It is not necessary to enable it, but since it passes the encoder flags, I would use it."

I have no idea why passing encoder flags is tied to the webm format, but it seems to be partly what led me to some progress here.

This may not be the reason, but it seems that 2-pass is important.  2-pass is the norm now for modern codecs like AV1, because of key-frame placement and how that can best be done automatically and efficiently relative to compression ratio.  (see here: https://www.reddit.com/r/AV1/comments/ge7hvu/keyframes_libaom_and_you_a_basic_primer/)

Either way, I don't see a way to restore the old default of "force-video-mode" except by passing it directly to the libaom encoder via:
" -aom-params force-video-mode=1"

This gives the old behaviour, à la 3.5.0, when added to the ffmpeg command that invokes the libaom-av1 encoder.  I *think* this may very well be expected with ffmpeg 4.X: the old ffmpeg only knows the old style of two-pass, where ffmpeg is in control and does a dead-headed full conversion in two separate passes, and is not aware of the modern AV1 2-pass that seeks efficiently for key-frames and such, maybe?
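
For anyone wanting to try the workaround described above, a minimal sketch of such an invocation follows. It assumes an ffmpeg build whose libaom-av1 encoder exposes the -aom-params option. The function name, the DRY_RUN switch and the file names are made up for illustration, and the bitrates are simply borrowed from the command in comment 18. It is an experiment to run, not a confirmed fix.

```shell
#!/bin/sh
# Sketch only: hand force-video-mode=1 straight to libaom through ffmpeg's
# libaom-av1 wrapper. Assumes the wrapper supports -aom-params.
# encode_force_video_mode INPUT OUTPUT
# Set DRY_RUN=1 to print the command instead of starting a multi-day encode.
encode_force_video_mode() {
    src=$1
    dst=$2
    set -- ffmpeg -i "$src" \
        -c:v libaom-av1 -b:v 100k \
        -aom-params force-video-mode=1 \
        -c:a libopus -b:a 16k \
        "$dst"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"      # show what would be run
    else
        "$@"                    # actually run ffmpeg
    fi
}
```

Calling it as `DRY_RUN=1 encode_force_video_mode input.mp4 output.webm` prints the full command line, which is worth checking before committing days of CPU time to it.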

[[I know, sorry, bad formatting, but posting now in case anyone is interested and spending time elsewhere]]
Comment 13 7e3 2023-09-28 00:43:04 UTC
Well, still more testing has proven my ramblings above both possibly partly correct, but mostly wrong.  Interestingly, the ffmpeg 6.X + libaom 3.6 wasn't a fix as predicted.

The same problem exists with ffmpeg 6.0.0-r2 + libaom 3.6.1 on amd64.
To cause the issue I used an old script with options for keyframe lookahead, various thresholds, multithreading, manual rows+columns processing and constant quality settings passed from ffmpeg.

So it might be a libaom bug with certain settings, possibly nonsensical settings being allowed from ffmpeg, or libaom allowing nonsensical combinations.

More testing to isolate...
Comment 14 7e3 2023-09-28 15:49:54 UTC
Latest tests suggest that most libaom versions after 3.5.0 all the way through to 3.7.0 are just buggy.  There is definitely something I'm missing here.

I'm reminding myself why I threw out this investigation months ago.
All my contortions of AV1 scripts for ffmpeg work using 3.5.0 or 3.7.0.

Whatever happened in the 3.5.1-3.6.X series is very hard to pin down (at least in my testing).  My latest success with 3.6.X shows that reducing the complexity of threads and blocking *might* solve the issue, but that reduces the processing speed to unusable, so it's hard to even know yet.

I would formally file a bug if I could pin it down, but it's such an esoteric fail mode that it seems only to occur with me and a few others, and possibly with Gentoo.  Surely there are more people using libaom via ffmpeg out there??  I don't really want to file it with upstream without knowing that is the root cause.

I think my comment#13 is close to the answer but I'm still baffled by how to properly fix this one.  It's definitely buried deep in the libaom settings about 2-pass vs. ffmpeg traditional 2-pass combined with how it figures keyframes, but more detail beyond that could be quite the lengthy investigation.

If anyone is actually following this verbose rambling, I suggest simply avoiding libaom-3.6.X and jumping to 3.7.0, as it has significant gains over 3.5.0, if you specifically wish to use libaom.
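
On Gentoo, one way to act on that suggestion is a package mask. The sketch below appends a mask for the 3.6 series to a file given as an argument; the function name is invented here, and the real target would be something like /etc/portage/package.mask/libaom (or /etc/portage/package.mask itself if that is a plain file on your system), written as root.

```shell
#!/bin/sh
# Sketch only: append a mask for the libaom 3.6 series to a Portage mask file.
# mask_libaom_36 MASK_FILE
# The path is an argument so the function can be tried on a scratch file first.
mask_libaom_36() {
    mask_file=$1
    {
        printf '%s\n' '# bug 914614: two-pass assertion failures seen with libaom 3.6.x'
        printf '%s\n' '=media-libs/libaom-3.6*'
    } >> "$mask_file"
}
```

With the mask in place, Portage should skip the 3.6.x versions when resolving media-libs/libaom and pick an unmasked version instead.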

Did you get any further "bzipitidoo@gmail.com"?
Comment 15 bzipitidoo 2023-09-28 23:34:07 UTC
(In reply to 7e3 from comment #14)

Patience!

I have upgraded to ffmpeg 6.0-r6 and libaom 3.7.0, and for the heck of it, started the same job I ran with ffmpeg 4 and libaom 3.6.1.  It will take another 2 to 3 days to find out if it gets further than the previous run.
Comment 16 bzipitidoo 2023-09-30 16:52:49 UTC
I can now report that ffmpeg 6 with libaom 3.7.0 has progressed past the point where ffmpeg 4 with libaom 3.6.1 aborted.  So perhaps the bug is fixed.  I estimate it will take another 9 days to finish transcoding this 48 minute MPEG4 file.
Comment 17 7e3 2023-10-01 04:31:19 UTC
Yeah, good to be cautious on how the success is reported!
I too saw progression past previous fail points, and in the case of the first _four_ video files all completed with no errors...

But since then, I've found one file, my 5th test, that returns the same error using ffmpeg-6.0-r6+libaom-3.7.0 at precisely the same point in the file, somewhere about halfway through a ~30m HD stream.

I've since been through the upstream bug list and found some weird behaviour that depends on the load from other processes.

https://bugs.chromium.org/p/aomedia/issues/detail?id=3284

So strange bugs appear, such as:
"Errors when encoding under high CPU load with -threads set to any value but 1"
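
One way to check whether a failing file is hitting that upstream symptom is to rerun it with threading taken out of the picture. A rough sketch, assuming ffmpeg's generic -threads option and the libaom-av1 wrapper's -row-mt option; the function name, DRY_RUN switch and bitrates are placeholders, and a single-threaded run will be slower still.

```shell
#!/bin/sh
# Sketch only: re-encode with a single thread and row-based multithreading
# off, to see whether the assertion still fires without threading involved.
# encode_single_thread INPUT OUTPUT
# Set DRY_RUN=1 to print the command instead of running it.
encode_single_thread() {
    src=$1
    dst=$2
    set -- ffmpeg -i "$src" \
        -c:v libaom-av1 -b:v 100k \
        -threads 1 -row-mt 0 \
        -c:a libopus -b:a 16k \
        "$dst"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"      # show what would be run
    else
        "$@"                    # actually run ffmpeg
    fi
}
```

If a file that fails with the usual settings completes this way, that points towards the threading issue rather than the input itself.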

I've also seen that certain files consistently trigger various errors while some nearly identical files do not.  It seems completely arbitrary.

It's difficult testing to do when it takes some systems more than a week per testcase, phew.  What makes it harder still is that some files exhibit the behaviour while other similar files do not, and even then, different versions of the software seem to behave differently on different files.

I'm starting to think it's the format of the video input.  Currently I've tested predominantly with DVB-T .ts with both "mpeg2video" according to ffmpeg (viz. h262?) and h264 but considering ffmpeg abstracts away that complexity it would probably require switching to aomenc.

What a bizarre bug.  Summary: still NOT fixed.
Comment 18 bzipitidoo 2023-10-05 23:09:58 UTC
Finally done, and success!  

time ffmpeg -i 10144-p3_H264_1200kbps_AAC_und_ch2_64kbps.mp4 -c:v av1 -b:v 100k -c:a libopus -b:a 16k p3v3.webm
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12 (Gentoo 12.3.1_p20230526 p2)

...

[libaom-av1 @ 0x56270ee9b250] v3.7.0

...

frame=86854 fps=0.1 q=0.0 Lsize=   44259kB time=00:48:18.05 bitrate= 125.1kbits/s speed=0.00465x
video:37173kB audio:5862kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.845705%

real	10380m37.363s
user	16842m31.428s
sys	4m48.813s
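
The speed ffmpeg reports can be cross-checked against the other figures in that output: 48:18.05 of video is 2898.05 s, and a wall-clock time of 10380m37.363s is 622837.363 s. A small sketch of the arithmetic (the helper name is invented here):

```shell
#!/bin/sh
# encode_speed MEDIA_SECONDS WALL_SECONDS
# Prints seconds of video encoded per second of wall-clock time, in the
# same form as ffmpeg's "speed=" field.
encode_speed() {
    awk -v media="$1" -v wall="$2" 'BEGIN { printf "%.5f\n", media / wall }'
}

# time=00:48:18.05    -> 48*60 + 18.05     = 2898.05 s
# real 10380m37.363s  -> 10380*60 + 37.363 = 622837.363 s
encode_speed 2898.05 622837.363    # prints 0.00465
```

That agrees with speed=0.00465x, i.e. roughly 7.2 days of wall-clock time for 48 minutes of video.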
Comment 19 7e3 2023-10-23 00:28:24 UTC
I can confirm that 3.7.X is _far_ less prone to failure than the 3.6.X series.

The fail rate for aomenc is low enough that it's usable again with 3.7.X but annoying enough to consider SVT.  I've not seen the SVT encoder fail; it performs better in all ways over aomenc.

My samples are mostly DVB-T transport streams of a few minutes to a few hours.  With 3.7.X the two-pass failure occurs in a low single-digit percentage of them, whereas with 3.6.X failure was the usual behaviour.

I've isolated fail segments and would start providing samples upstream, but I'm uncertain about the copyright of the free-to-air transport streams.  Interestingly, it seems to occur at highly complex, yet similar, scene changes in certain types of 720p programming (viz. video style) and not in others, but I could be reading into it too deeply, as it's difficult to isolate and I'd need many more samples to draw any firm conclusion.

Another point is that the fail point is consistent: a file that fails will always fail at the same point, and one that processes will consistently finish.
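
Since the fail point is repeatable, a short sample around it can be cut out without re-encoding. A minimal sketch, assuming ffmpeg's -ss/-t input options and stream copy; the function name, DRY_RUN switch and the times in the usage note are placeholders. Whether a cut-down sample still triggers the assertion is not guaranteed, because the encoder then sees different two-pass statistics than it did for the full file.

```shell
#!/bin/sh
# Sketch only: copy a window of the input around the suspected fail point.
# cut_segment INPUT START DURATION OUTPUT
# With -c copy the cut snaps to keyframes, so leave some margin on both sides.
# Set DRY_RUN=1 to print the command instead of running it.
cut_segment() {
    src=$1
    start=$2
    dur=$3
    dst=$4
    set -- ffmpeg -ss "$start" -t "$dur" -i "$src" -c copy "$dst"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"      # show what would be run
    else
        "$@"                    # actually run ffmpeg
    fi
}
```

For example, `cut_segment recording.ts 00:14:00 120 sample.ts` would copy two minutes starting at the 14-minute mark, and the resulting sample.ts can then be fed to the failing encode command.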
Comment 20 7e3 2023-10-25 02:42:58 UTC
Does anyone want to add some theories, or even wild guesses, about what might be productive to look at next?  For a variety of reasons, it looks like I'm one of the few looking at this.

I found my next failed video example yesterday and it seems consistent with a pattern.

I once again found an error in a 720p mpeg2video (H.262?) transport stream of a documentary, and both times it happens when presenting artwork of subtle and similar complexity, in related scenes where it jumps from one artwork to a different artwork.  Lots of subtle but related video changes: similar palette, similar layout, saturation etc.  So who knows what is happening in 'yuv' land, but I think I remember that the automatic calculation of keyframes depends heavily on the yuv statistics.  Maybe it's related to this?

I also *think* that the aomenc changes that were causing a lot of similarly anomalous errors, like some I came across on the forums mentioned above, were often about new technology for the automatic estimation and placement of keyframes.

Another wild extrapolation I've thought of is that the key difference might be the H.262 nature of blocking and how that differs from the modern AV1 codec.  Though, having said that, I have reduced the tiles, columns and processor threading of aomenc to try to isolate that away, without any noticeable correlation.

I'm not sure this bug thread is staying relevant to Gentoo specifically, as it's mostly upstream, but I'm also not sure how many others are encountering this error or even have exposure to it in the first place.  I might be commenting in a mostly lonely echo-chamber.
Comment 21 Larry the Git Cow gentoo-dev 2024-02-02 00:08:09 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=6d3e8e731cc4f8c341762c5d15569ae19cf4d471

commit 6d3e8e731cc4f8c341762c5d15569ae19cf4d471
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-02-02 00:06:53 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-02-02 00:06:53 +0000

    media-libs/libaom: build without asserts by default
    
    Both to follow upstream recommendations (see bug #921438 -- I did also have a better
    source for this than the README but I can't find it now, or maybe it's changed
    in the meantime), but also to avoid asserts firing during daily use which don't
    seem to be bothering anybody upstream (bug #914614). Not ideal but it's been reported
    already and went nowhere.
    
    Closes: https://bugs.gentoo.org/914614
    Closes: https://bugs.gentoo.org/921438
    Signed-off-by: Sam James <sam@gentoo.org>

 media-libs/libaom/libaom-3.8.0-r1.ebuild | 148 +++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)