Advanced search

Media (0)

Word: - Tags - / signalement

No media matching your criteria is available on the site.

Other articles (94)

  • Contribute to documentation

    13 April 2011

    Documentation is vital to the development of improved technical capabilities.
    MediaSPIP welcomes documentation by users as well as developers - including: critique of existing features and functions; articles contributed by developers, administrators, content producers and editors; screenshots to illustrate the above; translations of existing documentation into other languages
    To contribute, register to the project users’ mailing (...)

  • Use, discuss, criticize

    13 April 2011

    Talk to people directly involved in MediaSPIP’s development, or to people around you who could use MediaSPIP to share, enhance or develop their creative projects.
    The bigger the community, the more MediaSPIP’s potential will be explored and the faster the software will evolve.
    A discussion list is available for all exchanges between users.

  • A selection of projects using MediaSPIP

    29 April 2011

    The examples listed below are representative of specific uses of MediaSPIP in certain projects.
    Do you think you have built a "remarkable" site with MediaSPIP? Let us know about it here.
    MediaSPIP farm @ Infini
    The Infini association runs activities around public reception, internet access points, training, innovative projects in the field of information and communication technologies, and website hosting. It plays a unique role in this area (...)

On other sites (7541)

  • Using FFmpeg to stitch together H.264 videos and variably-spaced JPEG pictures; dealing with ffmpeg warnings

    19 October 2022, by LB2

    Context

    


    I have a process flow that may output either H.264 Annex B streams, variably-spaced JPEGs, or a mixture of the two. By variably-spaced I mean that the elapsed time between any two adjacent JPEGs may be (and likely is) different from that between any other two adjacent JPEGs. So examples of possible inputs are:

    


      

    1. stream1.h264
    2. {Set of JPEGs}
    3. stream1.h264 + stream2.h264
    4. stream1.h264 + {Set of JPEGs}
    5. stream1.h264 + {Set of JPEGs} + stream2.h264
    6. stream1.h264 + {Set of JPEGs} + stream2.h264 + {Set of JPEGs} + ...
    7. stream1.h264 + stream2.h264 + {Set of JPEGs} + ...


    


    The output needs to be a single stitched (i.e. concatenated) file in an MPEG-4 container.

    


    Requirements: no re-encoding or transcoding of the existing video compression (a one-time conversion of the JPEG sets to a video format is okay).

    


    Solution Prototype

    


    To prototype the solution, I found that ffmpeg has a concat demuxer that lets me specify an ordered sequence of inputs, which ffmpeg then concatenates together, but all inputs must be of the same format. So, to meet that requirement, I:

    


      

    1. Convert every JPEG set to an .mp4 using the concat demuxer (and a duration # directive to specify the time spacing between each JPEG).
    2. Convert every .h264 to .mp4 using -c copy to avoid transcoding.
    3. Stitch all generated interim .mp4 files into the single final .mp4 using -f concat and -c copy.


    


    Here's the bash script, in parts, that performs the above:

    


      

    1. Ignore the curl comment; that's from originally generating 100 JPEG images with numbers, which are simply saved locally. What the loop does is generate a concat input file with file sequence#.jpeg directives and a duration # directive, where each successive JPEG delay is incremented by 0.1 seconds (0.1 between the first and second, 0.2 b/w the 2nd and 3rd, 0.3 b/w the 3rd and 4th, and so on). Then it runs the ffmpeg command to convert the set of JPEGs to an .mp4 interim file.

      


      echo "ffconcat version 1.0" >ffconcat-jpeg.txt
echo >>ffconcat-jpeg.txt

for i in {1..100}
do
    echo "file $i.jpeg" >>ffconcat-jpeg.txt
    d=$(echo "$i" | awk '{printf "%f", $1 / 10}')
    # d=$(echo "scale=2; $i/10" | bc)
    echo "duration $d" >>ffconcat-jpeg.txt
    echo "" >>ffconcat-jpeg.txt
    # curl -o "$i.jpeg" "https://math.tools/equation/get_equaimages?equation=$i&fontsize=256"
done

ffmpeg \
    -hide_banner \
    -vsync vfr \
    -f concat \
    -i ffconcat-jpeg.txt \
    -r 30 \
    -video_track_timescale 90000 \
    video-jpeg.mp4
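
    The first few entries of the generated ffconcat-jpeg.txt then look like this (the durations come straight from the awk line above):

      ffconcat version 1.0

      file 1.jpeg
      duration 0.100000

      file 2.jpeg
      duration 0.200000

      file 3.jpeg
      duration 0.300000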


      




    2. Convert the two streams from .h264 to .mp4 via copy (no transcoding).

      


      ffmpeg \
    -hide_banner \
    -i low-motion-video.h264 \
    -c copy \
    -vsync vfr \
    -video_track_timescale 90000 \
    low-motion-video.mp4

ffmpeg \
    -hide_banner \
    -i full-video.h264 \
    -c copy \
    -video_track_timescale 90000 \
    -vsync vfr \
    full-video.mp4


      




    3. Stitch everything together by generating another concat directive file.

      


      echo "ffconcat version 1.0" >ffconcat-h264.txt
echo >>ffconcat-h264.txt
echo "file low-motion-video.mp4" >>ffconcat-h264.txt
echo >>ffconcat-h264.txt
echo "file full-video.mp4" >>ffconcat-h264.txt
echo >>ffconcat-h264.txt
echo "file video-jpeg.mp4" >>ffconcat-h264.txt
echo >>ffconcat-h264.txt

ffmpeg \
    -hide_banner \
    -f concat \
    -i ffconcat-h264.txt \
    -pix_fmt yuv420p \
    -c copy \
    -video_track_timescale 90000 \
    -vsync vfr \
    video-out.mp4



      




    


    Problem (and attempted troubleshooting)

    


    The above does produce a reasonable output: it plays the first video, then the second video with no timing/rate issues AFAICT, then the JPEGs with the time between each JPEG "frame" growing successively, as expected.

    


    But the conversion process produces warnings that concern me (for compatibility with players, or for other real-world streams that may run into some issue my prototyping content doesn't make obvious). Initial attempts generated hundreds of warnings, but with some arguments added I reduced them to just a handful; this handful is stubborn, though, and nothing I tried helps.

    


    The first conversion of JPEGs to .mp4 goes fine with the following output:

    


    Input #0, concat, from 'ffconcat-jpeg.txt':
  Duration: 00:08:25.00, start: 0.000000, bitrate: 0 kb/s
  Stream #0:0: Video: png, pal8(pc), 176x341 [SAR 3780:3780 DAR 16:31], 25 fps, 25 tbr, 25 tbn, 25 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[libx264 @ 0x7fe418008e00] using SAR=1/1
[libx264 @ 0x7fe418008e00] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
[libx264 @ 0x7fe418008e00] profile High 4:4:4 Predictive, level 1.3, 4:4:4, 8-bit
[libx264 @ 0x7fe418008e00] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=4 threads=11 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'video-jpeg.mp4':
  Metadata:
    encoder         : Lavf58.76.100
  Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv444p(tv, progressive), 176x341 [SAR 1:1 DAR 16:31], q=2-31, 30 fps, 90k tbn
    Metadata:
      encoder         : Lavc58.134.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame=  100 fps=0.0 q=-1.0 Lsize=     157kB time=00:07:55.33 bitrate=   2.7kbits/s speed=2.41e+03x    
video:155kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.800846%
[libx264 @ 0x7fe418008e00] frame I:1     Avg QP:20.88  size:   574
[libx264 @ 0x7fe418008e00] frame P:43    Avg QP:14.96  size:  2005
[libx264 @ 0x7fe418008e00] frame B:56    Avg QP:21.45  size:  1266
[libx264 @ 0x7fe418008e00] consecutive B-frames: 14.0% 24.0% 30.0% 32.0%
[libx264 @ 0x7fe418008e00] mb I  I16..4: 36.4% 55.8%  7.9%
[libx264 @ 0x7fe418008e00] mb P  I16..4:  5.1%  7.5% 11.2%  P16..4:  5.6%  8.1%  4.5%  0.0%  0.0%    skip:57.9%
[libx264 @ 0x7fe418008e00] mb B  I16..4:  2.4%  0.9%  3.9%  B16..8: 16.2%  8.8%  4.6%  direct: 1.2%  skip:62.0%  L0:56.6% L1:38.7% BI: 4.7%
[libx264 @ 0x7fe418008e00] 8x8 transform intra:28.3% inter:3.7%
[libx264 @ 0x7fe418008e00] coded y,u,v intra: 26.5% 0.0% 0.0% inter: 9.8% 0.0% 0.0%
[libx264 @ 0x7fe418008e00] i16 v,h,dc,p: 82% 13%  4%  0%
[libx264 @ 0x7fe418008e00] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20%  8% 71%  1%  0%  0%  0%  0%  0%
[libx264 @ 0x7fe418008e00] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 41% 11% 29%  4%  2%  3%  1%  7%  1%
[libx264 @ 0x7fe418008e00] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x7fe418008e00] ref P L0: 44.1%  4.2% 28.4% 23.3%
[libx264 @ 0x7fe418008e00] ref B L0: 56.2% 32.1% 11.6%
[libx264 @ 0x7fe418008e00] ref B L1: 92.4%  7.6%
[libx264 @ 0x7fe418008e00] kb/s:2.50


    


    The conversion of individual streams from .h264 to .mp4 generates two types of warnings each. One is [mp4 @ 0x7faee3040400] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly, and the other is [mp4 @ 0x7faee3040400] pts has no value.

    


    Some posts on SO (I can't find my original finds on that now) suggested that it's safe to ignore and comes from H.264 being an elementary stream that supposedly doesn't contain timestamps. It surprises me a bit, since I produce that stream using the NVENC API and clearly supply timing information for each frame via the PIC_PARAMS structure: NV_STRUCT(PIC_PARAMS, pp); ...; pp.inputTimeStamp = _frameIndex++ * (H264_CLOCK_RATE / _params.frameRate);, where #define H264_CLOCK_RATE 9000 and _params.frameRate = 30.

    


    Input #0, h264, from 'low-motion-video.h264':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: h264 (High), yuv420p(progressive), 1440x3040 [SAR 1:1 DAR 9:19], 30 fps, 30 tbr, 1200k tbn, 60 tbc
Output #0, mp4, to 'low-motion-video.mp4':
  Metadata:
    encoder         : Lavf58.76.100
  Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1440x3040 [SAR 1:1 DAR 9:19], q=2-31, 30 fps, 30 tbr, 90k tbn, 1200k tbc
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
[mp4 @ 0x7faee3040400] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
[mp4 @ 0x7faee3040400] pts has no value
[mp4 @ 0x7faee3040400] pts has no value0kB time=-00:00:00.03 bitrate=N/A speed=N/A    
    Last message repeated 17985 times
frame=17987 fps=0.0 q=-1.0 Lsize=   79332kB time=00:09:59.50 bitrate=1084.0kbits/s speed=1.59e+03x    
video:79250kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.103804%
Input #0, h264, from 'full-video.h264':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: h264 (High), yuv420p(progressive), 1440x3040 [SAR 1:1 DAR 9:19], 30 fps, 30 tbr, 1200k tbn, 60 tbc
Output #0, mp4, to 'full-video.mp4':
  Metadata:
    encoder         : Lavf58.76.100
  Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1440x3040 [SAR 1:1 DAR 9:19], q=2-31, 30 fps, 30 tbr, 90k tbn, 1200k tbc
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
[mp4 @ 0x7f9381864600] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
[mp4 @ 0x7f9381864600] pts has no value
[mp4 @ 0x7f9381864600] pts has no value0kB time=-00:00:00.03 bitrate=N/A speed=N/A    
    Last message repeated 17981 times
frame=17983 fps=0.0 q=-1.0 Lsize=   52976kB time=00:09:59.36 bitrate= 724.1kbits/s speed=1.33e+03x    
video:52893kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.156232%
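
    For reference, raw Annex B input can also be given explicit timestamps at demux time by forcing an input frame rate (as an input option, -r makes ffmpeg generate constant-frame-rate timestamps instead of reading them from the stream). A sketch of that variant of the copy step, purely illustrative:

      ffmpeg \
          -hide_banner \
          -r 30 \
          -i low-motion-video.h264 \
          -c copy \
          -video_track_timescale 90000 \
          low-motion-video.mp4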


    


    But the most worrisome errors for me are from stitching together all the interim .mp4 files into one:

    


    [mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f9ff2010e00] Auto-inserting h264_mp4toannexb bitstream filter
Input #0, concat, from 'ffconcat-h264.txt':
  Duration: N/A, bitrate: 1082 kb/s
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1440x3040 [SAR 1:1 DAR 9:19], 1082 kb/s, 30 fps, 30 tbr, 90k tbn, 60 tbc
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Output #0, mp4, to 'video-out.mp4':
  Metadata:
    encoder         : Lavf58.76.100
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1440x3040 [SAR 1:1 DAR 9:19], q=2-31, 1082 kb/s, 30 fps, 30 tbr, 90k tbn, 90k tbc
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f9fe1009c00] Auto-inserting h264_mp4toannexb bitstream filter
[mp4 @ 0x7f9ff2023400] Non-monotonous DTS in output stream 0:0; previous: 53954460, current: 53954460; changing to 53954461. This may result in incorrect timestamps in the output file.
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f9fd1008a00] Auto-inserting h264_mp4toannexb bitstream filter
[mp4 @ 0x7f9ff2023400] Non-monotonous DTS in output stream 0:0; previous: 107900521, current: 107874150; changing to 107900522. This may result in incorrect timestamps in the output file.
[mp4 @ 0x7f9ff2023400] Non-monotonous DTS in output stream 0:0; previous: 107900522, current: 107886150; changing to 107900523. This may result in incorrect timestamps in the output file.
frame=36070 fps=0.0 q=-1.0 Lsize=  132464kB time=00:27:54.26 bitrate= 648.1kbits/s speed=6.54e+03x    
video:132296kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.126409%


    


    I'm not sure how to deal with those non-monotonous DTS errors, and no matter what I try, nothing budges. I analyzed the interim .mp4 files using ffprobe -show_frames and found that the last frame of each interim .mp4 does not have a DTS, while the previous frames do. E.g.:

    


    ...
[FRAME]
media_type=video
stream_index=0
key_frame=0
pkt_pts=53942461
pkt_pts_time=599.360678
pkt_dts=53942461
pkt_dts_time=599.360678
best_effort_timestamp=53942461
best_effort_timestamp_time=599.360678
pkt_duration=3600
pkt_duration_time=0.040000
pkt_pos=54161377
pkt_size=1034
width=1440
height=3040
pix_fmt=yuv420p
sample_aspect_ratio=1:1
pict_type=B
coded_picture_number=17982
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
color_range=unknown
color_space=unknown
color_primaries=unknown
color_transfer=unknown
chroma_location=left
[/FRAME]
[FRAME]
media_type=video
stream_index=0
key_frame=0
pkt_pts=53927461
pkt_pts_time=599.194011
pkt_dts=N/A
pkt_dts_time=N/A
best_effort_timestamp=53927461
...


    


    My guess is that as the concat demuxer reads the input (or somewhere else in ffmpeg's conversion pipeline), it sees no DTS set for the last frame and produces a virtual value equal to the last one seen. Then further down the pipeline it consumes this input, sees that the DTS value is being repeated, issues a warning and offsets it by an increment of one, which may be a somewhat nonsensical/unrealistic timing value.
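
    One way to check that guess per interim file is to dump the trailing packet timestamps (a sketch, shown here for low-motion-video.mp4):

      ffprobe -v error \
          -select_streams v:0 \
          -show_entries packet=pts,dts,duration \
          -of csv=p=0 \
          low-motion-video.mp4 | tail -n 5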

    


    I tried using -fflags +genpts as suggested in this SO answer, but that doesn't change anything.
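
    For reference, with -fflags +genpts applied to the stitching step, the command shape is roughly this (the flag is given as an input option, before -i):

      ffmpeg \
          -hide_banner \
          -fflags +genpts \
          -f concat \
          -i ffconcat-h264.txt \
          -c copy \
          -video_track_timescale 90000 \
          -vsync vfr \
          video-out.mp4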

    


    Per yet other posts suggesting the issue lies with incompatible tbn and tbc values and possible timebase problems, I tried adding -time_base 1:90000 and -enc_time_base 1:90000 and -copytb 1, and nothing budges. The -video_track_timescale 90000 is there b/c it helped reduce those DTS warnings from hundreds down to 3, but it doesn't eliminate them all.
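
    For reference, a sketch of the stitching command with those timebase-related options added (their exact placement here is illustrative, not a verified fix):

      ffmpeg \
          -hide_banner \
          -f concat \
          -i ffconcat-h264.txt \
          -c copy \
          -copytb 1 \
          -time_base 1:90000 \
          -enc_time_base 1:90000 \
          -video_track_timescale 90000 \
          -vsync vfr \
          video-out.mp4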

    


    Question

    


    What is missing, and how can I get ffmpeg to perform these conversions without the warnings, so I can be sure it produces proper, well-formed output?

    


  • Getting green screen in ffplay: Streaming desktop (DirectX surface) as H264 video over RTP stream using Live555

    7 November 2019, by Ram

    I'm trying to stream the desktop (a DirectX surface in NV12 format) as H264 video over an RTP stream using Live555 and Windows Media Foundation's hardware encoder on Windows 10, and expecting it to be rendered by ffplay (ffmpeg 4.2). But I'm only getting a green screen like below:

    (screenshots: ffplay shows only a green frame)

    I referred to the MFWebCamToRTP Media Foundation sample and "Encoding DirectX surface using hardware MFT" for implementing Live555's FramedSource, changing the input source to a DirectX surface instead of the webcam.

    Here is an excerpt of my implementation of Live555's doGetNextFrame callback to feed input samples from the DirectX surface:

    virtual void doGetNextFrame()
    {
       if (!_isInitialised)
       {
           if (!initialise()) {
               printf("Video device initialisation failed, stopping.");
               return;
           }
           else {
               _isInitialised = true;
           }
       }

       //if (!isCurrentlyAwaitingData()) return;

       DWORD processOutputStatus = 0;
       HRESULT mftProcessOutput = S_OK;
       MFT_OUTPUT_STREAM_INFO StreamInfo;
       IMFMediaBuffer *pBuffer = NULL;
       IMFSample *mftOutSample = NULL;
       DWORD mftOutFlags;
       bool frameSent = false;
       bool bTimeout = false;

       // Create sample
       CComPtr<IMFSample> videoSample = NULL;

       // Create buffer
       CComPtr<IMFMediaBuffer> inputBuffer;
       // Get next event
       CComPtr<IMFMediaEvent> event;
       HRESULT hr = eventGen->GetEvent(0, &event);
       CHECK_HR(hr, "Failed to get next event");

       MediaEventType eventType;
       hr = event->GetType(&eventType);
       CHECK_HR(hr, "Failed to get event type");


       switch (eventType)
       {
       case METransformNeedInput:
           {
               hr = MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), surface, 0, FALSE, &inputBuffer);
               CHECK_HR(hr, "Failed to create IMFMediaBuffer");

               hr = MFCreateSample(&videoSample);
               CHECK_HR(hr, "Failed to create IMFSample");
               hr = videoSample->AddBuffer(inputBuffer);
               CHECK_HR(hr, "Failed to add buffer to IMFSample");

               if (videoSample)
               {
                   _frameCount++;

                   CHECK_HR(videoSample->SetSampleTime(mTimeStamp), "Error setting the video sample time.\n");
                   CHECK_HR(videoSample->SetSampleDuration(VIDEO_FRAME_DURATION), "Error getting video sample duration.\n");

                   // Pass the video sample to the H.264 transform.

                   hr = _pTransform->ProcessInput(inputStreamID, videoSample, 0);
                   CHECK_HR(hr, "The resampler H264 ProcessInput call failed.\n");

                   mTimeStamp += VIDEO_FRAME_DURATION;
               }
           }

           break;

       case METransformHaveOutput:

           {
               CHECK_HR(_pTransform->GetOutputStatus(&mftOutFlags), "H264 MFT GetOutputStatus failed.\n");

               if (mftOutFlags == MFT_OUTPUT_STATUS_SAMPLE_READY)
               {
                   MFT_OUTPUT_DATA_BUFFER _outputDataBuffer;
                   memset(&_outputDataBuffer, 0, sizeof _outputDataBuffer);
                   _outputDataBuffer.dwStreamID = outputStreamID;
                   _outputDataBuffer.dwStatus = 0;
                   _outputDataBuffer.pEvents = NULL;
                   _outputDataBuffer.pSample = nullptr;

                   mftProcessOutput = _pTransform->ProcessOutput(0, 1, &_outputDataBuffer, &processOutputStatus);

                   if (mftProcessOutput != MF_E_TRANSFORM_NEED_MORE_INPUT)
                   {
                       if (_outputDataBuffer.pSample) {

                           //CHECK_HR(_outputDataBuffer.pSample->SetSampleTime(mTimeStamp), "Error setting MFT sample time.\n");
                           //CHECK_HR(_outputDataBuffer.pSample->SetSampleDuration(VIDEO_FRAME_DURATION), "Error setting MFT sample duration.\n");

                           IMFMediaBuffer *buf = NULL;
                           DWORD bufLength;
                           CHECK_HR(_outputDataBuffer.pSample->ConvertToContiguousBuffer(&buf), "ConvertToContiguousBuffer failed.\n");
                           CHECK_HR(buf->GetCurrentLength(&bufLength), "Get buffer length failed.\n");
                           BYTE * rawBuffer = NULL;

                           fFrameSize = bufLength;
                           fDurationInMicroseconds = 0;
                           gettimeofday(&fPresentationTime, NULL);

                           buf->Lock(&rawBuffer, NULL, NULL);
                           memmove(fTo, rawBuffer, fFrameSize);

                           FramedSource::afterGetting(this);

                           buf->Unlock();
                           SafeRelease(&buf);

                           frameSent = true;
                           _lastSendAt = GetTickCount();

                           _outputDataBuffer.pSample->Release();
                       }

                       if (_outputDataBuffer.pEvents)
                           _outputDataBuffer.pEvents->Release();
                   }

                   //SafeRelease(&pBuffer);
                   //SafeRelease(&mftOutSample);

                   break;
               }
           }

           break;
       }

       if (!frameSent)
       {
           envir().taskScheduler().triggerEvent(eventTriggerId, this);
       }

       return;

    done:

       printf("MediaFoundationH264LiveSource doGetNextFrame failed.\n");
       envir().taskScheduler().triggerEvent(eventTriggerId, this);
    }

    Initialise method:

    bool initialise()
    {
       HRESULT hr;
       D3D11_TEXTURE2D_DESC desc = { 0 };

       HDESK CurrentDesktop = nullptr;
       CurrentDesktop = OpenInputDesktop(0, FALSE, GENERIC_ALL);
       if (!CurrentDesktop)
       {
           // We do not have access to the desktop so request a retry
           return false;
       }

       // Attach desktop to this thread
       bool DesktopAttached = SetThreadDesktop(CurrentDesktop) != 0;
       CloseDesktop(CurrentDesktop);
       CurrentDesktop = nullptr;
       if (!DesktopAttached)
       {
           printf("SetThreadDesktop failed\n");
       }

       UINT32 activateCount = 0;

       // h264 output
       MFT_REGISTER_TYPE_INFO info = { MFMediaType_Video, MFVideoFormat_H264 };

       UINT32 flags =
           MFT_ENUM_FLAG_HARDWARE |
           MFT_ENUM_FLAG_SORTANDFILTER;

       // ------------------------------------------------------------------------
       // Initialize D3D11
       // ------------------------------------------------------------------------

       // Driver types supported
       D3D_DRIVER_TYPE DriverTypes[] =
       {
           D3D_DRIVER_TYPE_HARDWARE,
           D3D_DRIVER_TYPE_WARP,
           D3D_DRIVER_TYPE_REFERENCE,
       };
       UINT NumDriverTypes = ARRAYSIZE(DriverTypes);

       // Feature levels supported
       D3D_FEATURE_LEVEL FeatureLevels[] =
       {
           D3D_FEATURE_LEVEL_11_0,
           D3D_FEATURE_LEVEL_10_1,
           D3D_FEATURE_LEVEL_10_0,
           D3D_FEATURE_LEVEL_9_1
       };
       UINT NumFeatureLevels = ARRAYSIZE(FeatureLevels);

       D3D_FEATURE_LEVEL FeatureLevel;

       // Create device
       for (UINT DriverTypeIndex = 0; DriverTypeIndex < NumDriverTypes; ++DriverTypeIndex)
       {
           hr = D3D11CreateDevice(nullptr, DriverTypes[DriverTypeIndex], nullptr,
               D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
               FeatureLevels, NumFeatureLevels, D3D11_SDK_VERSION, &device, &FeatureLevel, &context);
           if (SUCCEEDED(hr))
           {
               // Device creation success, no need to loop anymore
               break;
           }
       }

       CHECK_HR(hr, "Failed to create device");

       // Create device manager
       UINT resetToken;
       hr = MFCreateDXGIDeviceManager(&resetToken, &deviceManager);
       CHECK_HR(hr, "Failed to create DXGIDeviceManager");

       hr = deviceManager->ResetDevice(device, resetToken);
       CHECK_HR(hr, "Failed to assign D3D device to device manager");


       // ------------------------------------------------------------------------
       // Create surface
       // ------------------------------------------------------------------------
       desc.Format = DXGI_FORMAT_NV12;
       desc.Width = surfaceWidth;
       desc.Height = surfaceHeight;
       desc.MipLevels = 1;
       desc.ArraySize = 1;
       desc.SampleDesc.Count = 1;

       hr = device->CreateTexture2D(&desc, NULL, &surface);
       CHECK_HR(hr, "Could not create surface");

       hr = MFTEnumEx(
           MFT_CATEGORY_VIDEO_ENCODER,
           flags,
           NULL,
           &info,
           &activateRaw,
           &activateCount
       );
       CHECK_HR(hr, "Failed to enumerate MFTs");

       CHECK(activateCount, "No MFTs found");

       // Choose the first available encoder
       activate = activateRaw[0];

       for (UINT32 i = 0; i < activateCount; i++)
           activateRaw[i]->Release();

       // Activate
       hr = activate->ActivateObject(IID_PPV_ARGS(&_pTransform));
       CHECK_HR(hr, "Failed to activate MFT");

       // Get attributes
       hr = _pTransform->GetAttributes(&attributes);
       CHECK_HR(hr, "Failed to get MFT attributes");

       // Unlock the transform for async use and get event generator
       hr = attributes->SetUINT32(MF_TRANSFORM_ASYNC_UNLOCK, TRUE);
       CHECK_HR(hr, "Failed to unlock MFT");

       eventGen = _pTransform;
       CHECK(eventGen, "Failed to QI for event generator");

       // Get stream IDs (expect 1 input and 1 output stream)
       hr = _pTransform->GetStreamIDs(1, &inputStreamID, 1, &outputStreamID);
       if (hr == E_NOTIMPL)
       {
           inputStreamID = 0;
           outputStreamID = 0;
           hr = S_OK;
       }
       CHECK_HR(hr, "Failed to get stream IDs");

        // ------------------------------------------------------------------------
       // Configure hardware encoder MFT
      // ------------------------------------------------------------------------
       CHECK_HR(_pTransform->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER, reinterpret_cast<ULONG_PTR>(deviceManager.p)), "Failed to set device manager.\n");

       // Set low latency hint
       hr = attributes->SetUINT32(MF_LOW_LATENCY, TRUE);
       CHECK_HR(hr, "Failed to set MF_LOW_LATENCY");

       hr = MFCreateMediaType(&outputType);
       CHECK_HR(hr, "Failed to create media type");

       hr = outputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
       CHECK_HR(hr, "Failed to set MF_MT_MAJOR_TYPE on H264 output media type");

       hr = outputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
       CHECK_HR(hr, "Failed to set MF_MT_SUBTYPE on H264 output media type");

       hr = outputType->SetUINT32(MF_MT_AVG_BITRATE, TARGET_AVERAGE_BIT_RATE);
       CHECK_HR(hr, "Failed to set average bit rate on H264 output media type");

       hr = MFSetAttributeSize(outputType, MF_MT_FRAME_SIZE, desc.Width, desc.Height);
       CHECK_HR(hr, "Failed to set frame size on H264 MFT out type");

       hr = MFSetAttributeRatio(outputType, MF_MT_FRAME_RATE, TARGET_FRAME_RATE, 1);
       CHECK_HR(hr, "Failed to set frame rate on H264 MFT out type");

       hr = outputType->SetUINT32(MF_MT_INTERLACE_MODE, 2);
       CHECK_HR(hr, "Failed to set MF_MT_INTERLACE_MODE on H.264 encoder MFT");

       hr = outputType->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, TRUE);
       CHECK_HR(hr, "Failed to set MF_MT_ALL_SAMPLES_INDEPENDENT on H.264 encoder MFT");

       hr = _pTransform->SetOutputType(outputStreamID, outputType, 0);
       CHECK_HR(hr, "Failed to set output media type on H.264 encoder MFT");

       hr = MFCreateMediaType(&inputType);
       CHECK_HR(hr, "Failed to create media type");

       for (DWORD i = 0;; i++)
       {
           inputType = nullptr;
           hr = _pTransform->GetInputAvailableType(inputStreamID, i, &inputType);
           CHECK_HR(hr, "Failed to get input type");

           hr = inputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
           CHECK_HR(hr, "Failed to set MF_MT_MAJOR_TYPE on H264 MFT input type");

           hr = inputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12);
           CHECK_HR(hr, "Failed to set MF_MT_SUBTYPE on H264 MFT input type");

           hr = MFSetAttributeSize(inputType, MF_MT_FRAME_SIZE, desc.Width, desc.Height);
           CHECK_HR(hr, "Failed to set MF_MT_FRAME_SIZE on H264 MFT input type");

           hr = MFSetAttributeRatio(inputType, MF_MT_FRAME_RATE, TARGET_FRAME_RATE, 1);
           CHECK_HR(hr, "Failed to set MF_MT_FRAME_RATE on H264 MFT input type");

           hr = _pTransform->SetInputType(inputStreamID, inputType, 0);
           CHECK_HR(hr, "Failed to set input type");

           break;
       }

       CheckHardwareSupport();

       CHECK_HR(_pTransform->ProcessMessage(MFT_MESSAGE_COMMAND_FLUSH, NULL), "Failed to process FLUSH command on H.264 MFT.\n");
       CHECK_HR(_pTransform->ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, NULL), "Failed to process BEGIN_STREAMING command on H.264 MFT.\n");
       CHECK_HR(_pTransform->ProcessMessage(MFT_MESSAGE_NOTIFY_START_OF_STREAM, NULL), "Failed to process START_OF_STREAM command on H.264 MFT.\n");

       return true;

    done:

       printf("MediaFoundationH264LiveSource initialisation failed.\n");
       return false;
    }


       HRESULT CheckHardwareSupport()
       {
           IMFAttributes *attributes;
           HRESULT hr = _pTransform->GetAttributes(&attributes);
           UINT32 dxva = 0;

           if (SUCCEEDED(hr))
           {
               hr = attributes->GetUINT32(MF_SA_D3D11_AWARE, &dxva);
           }

           if (SUCCEEDED(hr))
           {
               hr = attributes->SetUINT32(CODECAPI_AVDecVideoAcceleration_H264, TRUE);
           }

    #if defined(CODECAPI_AVLowLatencyMode) // Win8 only

           hr = _pTransform->QueryInterface(IID_PPV_ARGS(&mpCodecAPI));

           if (SUCCEEDED(hr))
           {
               VARIANT var = { 0 };

               // FIXME: encoder only
               var.vt = VT_UI4;
               var.ulVal = 0;

               hr = mpCodecAPI->SetValue(&CODECAPI_AVEncMPVDefaultBPictureCount, &var);

               var.vt = VT_BOOL;
               var.boolVal = VARIANT_TRUE;
               hr = mpCodecAPI->SetValue(&CODECAPI_AVEncCommonLowLatency, &var);
               hr = mpCodecAPI->SetValue(&CODECAPI_AVEncCommonRealTime, &var);

               hr = attributes->SetUINT32(CODECAPI_AVLowLatencyMode, TRUE);

               if (SUCCEEDED(hr))
               {
                   var.vt = VT_UI4;
                   var.ulVal = eAVEncCommonRateControlMode_Quality;
                   hr = mpCodecAPI->SetValue(&CODECAPI_AVEncCommonRateControlMode, &var);

                   // This property controls the quality level when the encoder is not using a constrained bit rate. The AVEncCommonRateControlMode property determines whether the bit rate is constrained.
                   VARIANT quality;
                   InitVariantFromUInt32(50, &quality);
                   hr = mpCodecAPI->SetValue(&CODECAPI_AVEncCommonQuality, &quality);
               }
           }
    #endif

           return hr;
       }

    ffplay command:

    ffplay -protocol_whitelist file,udp,rtp -i test.sdp -x 800 -y 600 -profile:v baseline

    SDP:

    v=0
    o=- 0 0 IN IP4 127.0.0.1
    s=No Name
    t=0 0
    c=IN IP4 127.0.0.1
    m=video 1234 RTP/AVP 96
    a=rtpmap:96 H264/90000
    a=fmtp:96 packetization-mode=1
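
    One way to see what ffmpeg itself detects on the receiving end is to probe the same SDP (a sketch, reusing the protocol whitelist from the ffplay command above):

      ffprobe -protocol_whitelist file,udp,rtp -show_streams -i test.sdp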

    I don't know what I am missing. I have been trying to fix this for almost a week without any progress, and I have tried almost everything I could. Also, the online resources for encoding a DirectX surface as video are very limited.

    Any help would be appreciated.

  • Increasing memory occupancy while recording the screen and saving to disk with ffmpeg [on hold]

    19 May 2016, by vbtang

    I wonder if there are any resources I didn't free, or whether I need to do something special so that I can free this memory?

    PS: I ran the demo step by step and found it has the same problem; even as it quits the main function, it still has 40 MB of memory occupied. I also found that the memory increases noticeably in the screen capture thread, but once it has grown to about 150 MB it stops increasing; when I quit the program, 40 MB of memory is still left. I feel confused.

    PPS: I downloaded the ffmpeg dev and shared builds from here https://ffmpeg.zeranoe.com/builds/; it seems to be a debug version of the DLL? Do I need a release version?
    Here is my demo code.

       #include "stdafx.h"

    #ifdef  __cplusplus
    extern "C"
    {
    #endif
    #include "libavcodec/avcodec.h"
    #include "libavformat/avformat.h"
    #include "libswscale/swscale.h"
    #include "libavdevice/avdevice.h"
    #include "libavutil/audio_fifo.h"

    #pragma comment(lib, "avcodec.lib")
    #pragma comment(lib, "avformat.lib")
    #pragma comment(lib, "avutil.lib")
    #pragma comment(lib, "avdevice.lib")
    #pragma comment(lib, "avfilter.lib")

    //#pragma comment(lib, "avfilter.lib")
    //#pragma comment(lib, "postproc.lib")
    //#pragma comment(lib, "swresample.lib")
    #pragma comment(lib, "swscale.lib")
    #ifdef __cplusplus
    };
    #endif

    AVFormatContext *pFormatCtx_Video = NULL, *pFormatCtx_Audio = NULL, *pFormatCtx_Out = NULL;
    AVCodecContext  *pCodecCtx_Video;
    AVCodec         *pCodec_Video;
    AVFifoBuffer    *fifo_video = NULL;
    AVAudioFifo     *fifo_audio = NULL;
    int VideoIndex, AudioIndex;

    CRITICAL_SECTION AudioSection, VideoSection;



    SwsContext *img_convert_ctx;
    int frame_size = 0;

    uint8_t *picture_buf = NULL, *frame_buf = NULL;

    bool bCap = true;

    DWORD WINAPI ScreenCapThreadProc( LPVOID lpParam );
    DWORD WINAPI AudioCapThreadProc( LPVOID lpParam );

    int OpenVideoCapture()
    {
       AVInputFormat *ifmt=av_find_input_format("gdigrab");
       //
       AVDictionary *options = NULL;
       av_dict_set(&options, "framerate", "15", NULL);
       //av_dict_set(&options,"offset_x","20",0);
       //The distance from the top edge of the screen or desktop
       //av_dict_set(&options,"offset_y","40",0);
       //Video frame size. The default is to capture the full screen
       //av_dict_set(&options,"video_size","320x240",0);
       if(avformat_open_input(&pFormatCtx_Video, "desktop", ifmt, &options)!=0)
       {
           printf("Couldn't open input stream.\n");
           return -1;
       }
       if(avformat_find_stream_info(pFormatCtx_Video,NULL)<0)
       {
           printf("Couldn't find stream information.\n");
           return -1;
       }
       if (pFormatCtx_Video->streams[0]->codec->codec_type != AVMEDIA_TYPE_VIDEO)
       {
           printf("Couldn't find video stream information.\n");
           return -1;
       }
       pCodecCtx_Video = pFormatCtx_Video->streams[0]->codec;
       pCodec_Video = avcodec_find_decoder(pCodecCtx_Video->codec_id);
       if(pCodec_Video == NULL)
       {
           printf("Codec not found.\n");
           return -1;
       }
       if(avcodec_open2(pCodecCtx_Video, pCodec_Video, NULL) < 0)
       {
           printf("Could not open codec.\n");
           return -1;
       }



       img_convert_ctx = sws_getContext(pCodecCtx_Video->width, pCodecCtx_Video->height, pCodecCtx_Video->pix_fmt,
           pCodecCtx_Video->width, pCodecCtx_Video->height, PIX_FMT_YUV420P, SWS_BICUBIC, NULL, NULL, NULL);

       frame_size = avpicture_get_size(pCodecCtx_Video->pix_fmt, pCodecCtx_Video->width, pCodecCtx_Video->height);
       //
       fifo_video = av_fifo_alloc(30 * avpicture_get_size(AV_PIX_FMT_YUV420P, pCodecCtx_Video->width, pCodecCtx_Video->height));

       return 0;
    }

    static char *dup_wchar_to_utf8(wchar_t *w)
    {
       char *s = NULL;
       int l = WideCharToMultiByte(CP_UTF8, 0, w, -1, 0, 0, 0, 0);
       s = (char *) av_malloc(l);
       if (s)
           WideCharToMultiByte(CP_UTF8, 0, w, -1, s, l, 0, 0);
       return s;
    }

    int OpenAudioCapture()
    {
       //
       AVInputFormat *pAudioInputFmt = av_find_input_format("dshow");

       //
       char * psDevName = dup_wchar_to_utf8(L"audio=virtual-audio-capturer");

       if (avformat_open_input(&pFormatCtx_Audio, psDevName, pAudioInputFmt,NULL) < 0)
       {
           printf("Couldn't open input stream.\n");
           return -1;
       }

       if(avformat_find_stream_info(pFormatCtx_Audio,NULL)<0)
           return -1;

       if(pFormatCtx_Audio->streams[0]->codec->codec_type != AVMEDIA_TYPE_AUDIO)
       {
           printf("Couldn't find audio stream information.\n");
           return -1;
       }

       AVCodec *tmpCodec = avcodec_find_decoder(pFormatCtx_Audio->streams[0]->codec->codec_id);
       if(0 > avcodec_open2(pFormatCtx_Audio->streams[0]->codec, tmpCodec, NULL))
       {
           printf("can not find or open audio decoder!\n");
       }



       return 0;
    }

    int OpenOutPut()
    {
       AVStream *pVideoStream = NULL, *pAudioStream = NULL;
       const char *outFileName = "test.mp4";
       avformat_alloc_output_context2(&pFormatCtx_Out, NULL, NULL, outFileName);

       if (pFormatCtx_Video->streams[0]->codec->codec_type == AVMEDIA_TYPE_VIDEO)
       {
           VideoIndex = 0;
           pVideoStream = avformat_new_stream(pFormatCtx_Out, NULL);

           if (!pVideoStream)
           {
               printf("can not new stream for output!\n");
               return -1;
           }

           //set codec context param
           pVideoStream->codec->codec = avcodec_find_encoder(AV_CODEC_ID_MPEG4);
           pVideoStream->codec->height = pFormatCtx_Video->streams[0]->codec->height;
           pVideoStream->codec->width = pFormatCtx_Video->streams[0]->codec->width;

           pVideoStream->codec->time_base = pFormatCtx_Video->streams[0]->codec->time_base;
           pVideoStream->codec->sample_aspect_ratio = pFormatCtx_Video->streams[0]->codec->sample_aspect_ratio;
           // take first format from list of supported formats
           pVideoStream->codec->pix_fmt = pFormatCtx_Out->streams[VideoIndex]->codec->codec->pix_fmts[0];

           //open encoder
           if (!pVideoStream->codec->codec)
           {
               printf("can not find the encoder!\n");
               return -1;
           }

           if (pFormatCtx_Out->oformat->flags & AVFMT_GLOBALHEADER)
               pVideoStream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;

           if ((avcodec_open2(pVideoStream->codec, pVideoStream->codec->codec, NULL)) < 0)
           {
               printf("can not open the encoder\n");
               return -1;
           }
       }

       if(pFormatCtx_Audio->streams[0]->codec->codec_type == AVMEDIA_TYPE_AUDIO)
       {
           AVCodecContext *pOutputCodecCtx;
           AudioIndex = 1;
           pAudioStream = avformat_new_stream(pFormatCtx_Out, NULL);

           pAudioStream->codec->codec = avcodec_find_encoder(pFormatCtx_Out->oformat->audio_codec);

           pOutputCodecCtx = pAudioStream->codec;

           pOutputCodecCtx->sample_rate = pFormatCtx_Audio->streams[0]->codec->sample_rate;
           pOutputCodecCtx->channel_layout = pFormatCtx_Out->streams[0]->codec->channel_layout;
           pOutputCodecCtx->channels = av_get_channel_layout_nb_channels(pAudioStream->codec->channel_layout);
           if(pOutputCodecCtx->channel_layout == 0)
           {
               pOutputCodecCtx->channel_layout = AV_CH_LAYOUT_STEREO;
               pOutputCodecCtx->channels = av_get_channel_layout_nb_channels(pOutputCodecCtx->channel_layout);

           }
           pOutputCodecCtx->sample_fmt = pAudioStream->codec->codec->sample_fmts[0];
           AVRational time_base={1, pAudioStream->codec->sample_rate};
           pAudioStream->time_base = time_base;
           //audioCodecCtx->time_base = time_base;

           pOutputCodecCtx->codec_tag = 0;  
           if (pFormatCtx_Out->oformat->flags & AVFMT_GLOBALHEADER)
               pOutputCodecCtx->flags |= CODEC_FLAG_GLOBAL_HEADER;

           if (avcodec_open2(pOutputCodecCtx, pOutputCodecCtx->codec, 0) < 0)
           {
               //
               return -1;
           }
       }

       if (!(pFormatCtx_Out->oformat->flags & AVFMT_NOFILE))
       {
           if(avio_open(&pFormatCtx_Out->pb, outFileName, AVIO_FLAG_WRITE) < 0)
           {
               printf("can not open output file handle!\n");
               return -1;
           }
       }

       if(avformat_write_header(pFormatCtx_Out, NULL) < 0)
       {
           printf("can not write the header of the output file!\n");
           return -1;
       }

       return 0;
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
       av_register_all();
       avdevice_register_all();
       if (OpenVideoCapture() < 0)
       {
           return -1;
       }
       if (OpenAudioCapture() < 0)
       {
           return -1;
       }
       if (OpenOutPut() < 0)
       {
           return -1;
       }

       InitializeCriticalSection(&VideoSection);
       InitializeCriticalSection(&AudioSection);

       AVFrame *picture = av_frame_alloc();
       int size = avpicture_get_size(pFormatCtx_Out->streams[VideoIndex]->codec->pix_fmt,
           pFormatCtx_Out->streams[VideoIndex]->codec->width, pFormatCtx_Out->streams[VideoIndex]->codec->height);
       picture_buf = new uint8_t[size];

       avpicture_fill((AVPicture *)picture, picture_buf,
           pFormatCtx_Out->streams[VideoIndex]->codec->pix_fmt,
           pFormatCtx_Out->streams[VideoIndex]->codec->width,
           pFormatCtx_Out->streams[VideoIndex]->codec->height);



       //star cap screen thread
       CreateThread( NULL, 0, ScreenCapThreadProc, 0, 0, NULL);
       //star cap audio thread
       CreateThread( NULL, 0, AudioCapThreadProc, 0, 0, NULL);
       int64_t cur_pts_v=0,cur_pts_a=0;
       int VideoFrameIndex = 0, AudioFrameIndex = 0;

       while(1)
       {
           if (_kbhit() != 0 && bCap)
           {
               bCap = false;
               Sleep(2000);//
           }
           if (fifo_audio && fifo_video)
           {
               int sizeAudio = av_audio_fifo_size(fifo_audio);
               int sizeVideo = av_fifo_size(fifo_video);
               //
               if (av_audio_fifo_size(fifo_audio) <= pFormatCtx_Out->streams[AudioIndex]->codec->frame_size &&
                   av_fifo_size(fifo_video) <= frame_size && !bCap)
               {
                   break;
               }
           }

           if(av_compare_ts(cur_pts_v, pFormatCtx_Out->streams[VideoIndex]->time_base,
               cur_pts_a,pFormatCtx_Out->streams[AudioIndex]->time_base) <= 0)
           {
               //read data from fifo
               if (av_fifo_size(fifo_video) < frame_size && !bCap)
               {
                   cur_pts_v = 0x7fffffffffffffff;
               }
               if(av_fifo_size(fifo_video) >= size)
               {
                   EnterCriticalSection(&VideoSection);
                   av_fifo_generic_read(fifo_video, picture_buf, size, NULL);
                   LeaveCriticalSection(&VideoSection);

                   avpicture_fill((AVPicture *)picture, picture_buf,
                       pFormatCtx_Out->streams[VideoIndex]->codec->pix_fmt,
                       pFormatCtx_Out->streams[VideoIndex]->codec->width,
                       pFormatCtx_Out->streams[VideoIndex]->codec->height);

                   //pts = n * ((1 / timbase)/ fps);
                   picture->pts = VideoFrameIndex * ((pFormatCtx_Video->streams[0]->time_base.den / pFormatCtx_Video->streams[0]->time_base.num) / 15);

                   int got_picture = 0;
                   AVPacket pkt;
                   av_init_packet(&pkt);

                   pkt.data = NULL;
                   pkt.size = 0;
                   int ret = avcodec_encode_video2(pFormatCtx_Out->streams[VideoIndex]->codec, &pkt, picture, &got_picture);
                   if(ret < 0)
                   {
                       //
                       continue;
                   }

                   if (got_picture==1)
                   {
                       pkt.stream_index = VideoIndex;
                       pkt.pts = av_rescale_q_rnd(pkt.pts, pFormatCtx_Video->streams[0]->time_base,
                           pFormatCtx_Out->streams[VideoIndex]->time_base, (AVRounding)(AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX));  
                       pkt.dts = av_rescale_q_rnd(pkt.dts,  pFormatCtx_Video->streams[0]->time_base,
                           pFormatCtx_Out->streams[VideoIndex]->time_base, (AVRounding)(AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX));  

                       pkt.duration = ((pFormatCtx_Out->streams[0]->time_base.den / pFormatCtx_Out->streams[0]->time_base.num) / 15);

                       cur_pts_v = pkt.pts;

                       ret = av_interleaved_write_frame(pFormatCtx_Out, &pkt);
                       //delete[] pkt.data;
                       av_free_packet(&pkt);
                   }
                   VideoFrameIndex++;
               }
           }
           else
           {
               if (NULL == fifo_audio)
               {
                   continue;//
               }
               if (av_audio_fifo_size(fifo_audio) < pFormatCtx_Out->streams[AudioIndex]->codec->frame_size && !bCap)
               {
                   cur_pts_a = 0x7fffffffffffffff;
               }
               if(av_audio_fifo_size(fifo_audio) >=
                   (pFormatCtx_Out->streams[AudioIndex]->codec->frame_size > 0 ? pFormatCtx_Out->streams[AudioIndex]->codec->frame_size : 1024))
               {
                   AVFrame *frame;
                   frame = av_frame_alloc();
                   frame->nb_samples = pFormatCtx_Out->streams[AudioIndex]->codec->frame_size>0 ? pFormatCtx_Out->streams[AudioIndex]->codec->frame_size: 1024;
                   frame->channel_layout = pFormatCtx_Out->streams[AudioIndex]->codec->channel_layout;
                   frame->format = pFormatCtx_Out->streams[AudioIndex]->codec->sample_fmt;
                   frame->sample_rate = pFormatCtx_Out->streams[AudioIndex]->codec->sample_rate;
                   av_frame_get_buffer(frame, 0);

                   EnterCriticalSection(&AudioSection);
                   av_audio_fifo_read(fifo_audio, (void **)frame->data,
                       (pFormatCtx_Out->streams[AudioIndex]->codec->frame_size > 0 ? pFormatCtx_Out->streams[AudioIndex]->codec->frame_size : 1024));
                   LeaveCriticalSection(&AudioSection);

                   if (pFormatCtx_Out->streams[0]->codec->sample_fmt != pFormatCtx_Audio->streams[AudioIndex]->codec->sample_fmt
                       || pFormatCtx_Out->streams[0]->codec->channels != pFormatCtx_Audio->streams[AudioIndex]->codec->channels
                       || pFormatCtx_Out->streams[0]->codec->sample_rate != pFormatCtx_Audio->streams[AudioIndex]->codec->sample_rate)
                   {
                       //
                   }

                   AVPacket pkt_out;
                   av_init_packet(&pkt_out);
                   int got_picture = -1;
                   pkt_out.data = NULL;
                   pkt_out.size = 0;

                   frame->pts = AudioFrameIndex * pFormatCtx_Out->streams[AudioIndex]->codec->frame_size;
                   if (avcodec_encode_audio2(pFormatCtx_Out->streams[AudioIndex]->codec, &pkt_out, frame, &got_picture) < 0)
                   {
                       printf("can not encode a frame");
                   }
                   av_frame_free(&frame);
                   if (got_picture)
                   {
                       pkt_out.stream_index = AudioIndex;
                       pkt_out.pts = AudioFrameIndex * pFormatCtx_Out->streams[AudioIndex]->codec->frame_size;
                       pkt_out.dts = AudioFrameIndex * pFormatCtx_Out->streams[AudioIndex]->codec->frame_size;
                       pkt_out.duration = pFormatCtx_Out->streams[AudioIndex]->codec->frame_size;

                       cur_pts_a = pkt_out.pts;

                       int ret = av_interleaved_write_frame(pFormatCtx_Out, &pkt_out);
                       av_free_packet(&pkt_out);
                   }
                   AudioFrameIndex++;
               }
           }
       }

       delete[] picture_buf;

       delete[]frame_buf;
       av_fifo_free(fifo_video);
       av_audio_fifo_free(fifo_audio);

       av_write_trailer(pFormatCtx_Out);

       avio_close(pFormatCtx_Out->pb);
       avformat_free_context(pFormatCtx_Out);

       if (pFormatCtx_Video != NULL)
       {
           avformat_close_input(&pFormatCtx_Video);
           pFormatCtx_Video = NULL;
       }
       if (pFormatCtx_Audio != NULL)
       {
           avformat_close_input(&pFormatCtx_Audio);
           pFormatCtx_Audio = NULL;
       }
       if (NULL != img_convert_ctx)
       {
           sws_freeContext(img_convert_ctx);
           img_convert_ctx = NULL;
       }

       return 0;
    }

    DWORD WINAPI ScreenCapThreadProc( LPVOID lpParam )
    {
       AVPacket packet;/* = (AVPacket *)av_malloc(sizeof(AVPacket))*/;
       int got_picture;
       AVFrame *pFrame;
       pFrame= av_frame_alloc();

       AVFrame *picture = av_frame_alloc();
       int size = avpicture_get_size(pFormatCtx_Out->streams[VideoIndex]->codec->pix_fmt,
           pFormatCtx_Out->streams[VideoIndex]->codec->width, pFormatCtx_Out->streams[VideoIndex]->codec->height);
       //picture_buf = new uint8_t[size];

       avpicture_fill((AVPicture *)picture, picture_buf,
           pFormatCtx_Out->streams[VideoIndex]->codec->pix_fmt,
           pFormatCtx_Out->streams[VideoIndex]->codec->width,
           pFormatCtx_Out->streams[VideoIndex]->codec->height);

    //  FILE *p = NULL;
    //  p = fopen("proc_test.yuv", "wb+");
       av_init_packet(&packet);
       int height = pFormatCtx_Out->streams[VideoIndex]->codec->height;
       int width = pFormatCtx_Out->streams[VideoIndex]->codec->width;
       int y_size=height*width;
       while(bCap)
       {
           packet.data = NULL;
           packet.size = 0;
           if (av_read_frame(pFormatCtx_Video, &packet) < 0)
           {
               av_free_packet(&packet);
               continue;
           }
           if(packet.stream_index == 0)
           {
               if (avcodec_decode_video2(pCodecCtx_Video, pFrame, &got_picture, &packet) < 0)
               {
                   printf("Decode Error.\n");
                   continue;
               }
               if (got_picture)
               {
                   sws_scale(img_convert_ctx, (const uint8_t* const*)pFrame->data, pFrame->linesize, 0,
                       pFormatCtx_Out->streams[VideoIndex]->codec->height, picture->data, picture->linesize);

                   if (av_fifo_space(fifo_video) >= size)
                   {
                       EnterCriticalSection(&VideoSection);
                       av_fifo_generic_write(fifo_video, picture->data[0], y_size, NULL);
                       av_fifo_generic_write(fifo_video, picture->data[1], y_size/4, NULL);
                       av_fifo_generic_write(fifo_video, picture->data[2], y_size/4, NULL);
                       LeaveCriticalSection(&VideoSection);
                   }
               }
           }
           av_free_packet(&packet);
           //Sleep(50);
       }
       av_frame_free(&pFrame);
       av_frame_free(&picture);
       //delete[] picture_buf;
       return 0;
    }

    DWORD WINAPI AudioCapThreadProc( LPVOID lpParam )
    {
       AVPacket pkt;
       AVFrame *frame;
       frame = av_frame_alloc();
       int gotframe;
       while(bCap)
       {
           pkt.data = NULL;
           pkt.size = 0;
           if(av_read_frame(pFormatCtx_Audio,&pkt) < 0)
           {
               av_free_packet(&pkt);
               continue;
           }

           if (avcodec_decode_audio4(pFormatCtx_Audio->streams[0]->codec, frame, &gotframe, &pkt) < 0)
           {
               av_frame_free(&frame);
               printf("can not decode a frame");
               break;
           }
           av_free_packet(&pkt);

           if (!gotframe)
           {
               continue;//
           }

           if (NULL == fifo_audio)
           {
               fifo_audio = av_audio_fifo_alloc(pFormatCtx_Audio->streams[0]->codec->sample_fmt,
                   pFormatCtx_Audio->streams[0]->codec->channels, 30 * frame->nb_samples);
           }

           int buf_space = av_audio_fifo_space(fifo_audio);
           if (av_audio_fifo_space(fifo_audio) >= frame->nb_samples)
           {
               EnterCriticalSection(&AudioSection);
               av_audio_fifo_write(fifo_audio, (void **)frame->data, frame->nb_samples);
               LeaveCriticalSection(&AudioSection);
           }
       }
       av_frame_free(&frame);
       return 0;
    }