
Media (91)
-
999,999
26 September 2011
Updated: September 2011
Language: English
Type: Audio
-
The Slip - Artworks
26 September 2011
Updated: September 2011
Language: English
Type: Text
-
Demon seed (wav version)
26 September 2011
Updated: April 2013
Language: English
Type: Audio
-
The four of us are dying (wav version)
26 September 2011
Updated: April 2013
Language: English
Type: Audio
-
Corona radiata (wav version)
26 September 2011
Updated: April 2013
Language: English
Type: Audio
-
Lights in the sky (wav version)
26 September 2011
Updated: April 2013
Language: English
Type: Audio
Other articles (95)
-
MediaSPIP 0.1 Beta version
25 April 2011
MediaSPIP 0.1 beta is the first version of MediaSPIP proclaimed as "usable".
The zip file provided here only contains the sources of MediaSPIP in its standalone version.
To get a working installation, you must manually install all software dependencies on the server.
If you want to use this archive for an installation in "farm mode", you will also need to proceed to other manual (...)
-
Multilang: improving the interface for multilingual blocks
18 February 2011
Multilang is an additional plugin that is not enabled by default when MediaSPIP is initialized.
Once it is activated, MediaSPIP init automatically applies a preconfiguration so that the new feature works right away. No separate configuration step is therefore required.
-
HTML5 audio and video support
13 April 2011
MediaSPIP uses HTML5 video and audio tags to play multimedia files, taking advantage of the latest W3C innovations supported by modern browsers.
The MediaSPIP player used has been created specifically for MediaSPIP and can be easily adapted to fit in with a specific theme.
For older browsers the Flowplayer flash fallback is used.
MediaSPIP allows for media playback on major mobile platforms with the above (...)
On other sites (9218)
-
Bootstrapping an AI UGC system — video generation is expensive, APIs are limiting, and I need help navigating it all [closed]
24 June, by Barack _ Ouma
I’m building a solo AI-powered UGC (User-Generated Content) platform: something that automates the creation of short-form content using AI avatars, voices, visuals, and scripts. But I’ve hit a wall with video generation and API limitations.


So far, I’ve integrated TTS and voice cloning (using ElevenLabs), and I’ve gotten image generation working. But video generation (especially talking avatars) has been a nightmare — both financially and technically.


🛠️ Features I’m trying to build :


AI avatars (face + lip-syncing)
Script generation (LLM-driven)
Image generation
Video composition


I’m trying to build a faceless AI content-creation automation platform, an alternative to Makeugc.com, Reelfarm.org, or postbridge.com. For now I’m just trying to get a working pipeline for automated content.


❌ Challenges so far :


Services like D-ID, Synthesia, Magic Hour, and Luma are either paywalled, have no trials, or are very expensive.


D-ID does support avatar creation, but you need to pay upfront to even access those features. There's no easy/free entry point.


Tools like Google Veo 3 are powerful but clearly not accessible for indie builders.
I’ve looked into open-source models like WAN 2.1, CogVideo, etc., but I have no clue how to run them or what infra is needed.


Now I’m torn between buying my own GPU or renting compute power to self-host these models.


💸 Cost is a huge blocker


I’ve been looking through Replicate’s pricing, and while some models (especially image gen) are manageable, video models get expensive fast. Even GPU rental rates stack up quickly, especially if you’re testing often or experimenting with pipelines. Plus, idle time billing doesn’t help.


💭 What I could really use help with :


Has anyone successfully stitched together APIs (voice, avatar, video) into a working UGC pipeline?


Should I use separate services (e.g. ElevenLabs + Synthesia + WAN) or try to host my own end-to-end system?


Is it cheaper (long term) to buy a used GPU like a 4090 and run things locally? Or better to rent compute short-term?


Any open-source solutions that are beginner-friendly or have minimal setup ?
Any existing frameworks or wrappers for UGC media pipelines that make all this easier ?


I’ve spent weeks researching, testing APIs, and hitting walls — and while I’ve learned a lot, I’d really appreciate any guidance from folks who’ve been here before.
Thanks in advance 🙏


And good luck to everyone else trying to build with AI on a budget — this stuff isn’t as plug-and-play as it looks on launch videos 💀


-
Linux Media Player Survey Circa 2001
2 September 2010, by Multimedia Mike (General)
Here’s a document I scavenged from my archives. It was dated September 1, 2001 and I now publish it 9 years later. It serves as sort of a time capsule for the state of media player programs at the time. Looking back on this list, I can’t understand why I couldn’t find MPlayer while I was conducting this survey, especially since MPlayer is the project I eventually started to work for a few months after writing this piece.
For a little context, I had been studying multimedia concepts and tech for a year and was itching to get my hands dirty with practical multimedia coding. But I wanted to tackle what I perceived as unsolved problems, like playback of proprietary codecs. I didn’t want to have to build a new media playback framework just to start working on my problems. So I surveyed the players available to see which ones I could plug into and use as a testbed for implementing new decoders.
Regarding Real Player, I wrote: “We’re trying to move away from the proprietary, closed-source ‘solutions’.” Heh. Was I really an insufferable open source idealist back in the day?
Anyway, here’s the text with some “Where are they now?” commentary [in brackets]:
Towards an All-Inclusive Media Playing Solution for Linux
I don’t feel that the media playing solutions for Linux set their sights high enough, even though they do tend to be quite ambitious.
I want to create a media player for Linux that can open a file, figure out what type of file it is (AVI, MOV, etc.), determine the compression algorithms used to encode the audio and video chunks inside (MPEG, Cinepak, Sorenson, etc.) and replay the file using the best audio, video, and CPU facilities available on the computer.
Video and audio playback is a solved problem on Linux; I don’t wish to solve that problem again. The problem that isn’t solved is reliance on proprietary multimedia solutions through some kind of WINE-like layer in order to decode compressed multimedia files.
Survey of Linux solutions for decoding proprietary multimedia
updated 2001-09-01
AVI Player for XMMS
This is based on Avifile. All the same advantages and limitations apply.
[Top Google hit is a Freshmeat page that doesn’t indicate activity since 2001-2002.]
Avifile
This player does a great job at taking apart AVI and ASF files and then feeding the compressed chunks of multimedia data through to the binary Win32 decoders.
The program is written in C++ and I’m not very good at interpreting that kind of code. But I’m learning all over again. Examining the object hierarchy, it appears that the designers had the foresight to include native support for decoders that are compiled into the program from source code. However, closer examination reveals that there is support for ONE source decoder and that’s the “decoder” for uncompressed data. Still, I tried to manipulate this routine to accept and decode data from other codecs but no dice. It’s really confounding. The program always crashes when I feed non-uncompressed data through the source decoder.
[Lives at http://avifile.sourceforge.net/; not updated since 2006.]
Real Player
There’s not much to do with this since it is closed source and proprietary. Even though there is a plugin architecture, that’s not satisfactory. We’re trying to move away from the proprietary, closed-source “solutions”.
[Still kickin’ with version 11.]
XAnim
This is a well-established Unix media player. To his credit, the author does as well as he can with the resources he has. In other words, he supports the non-proprietary video codecs well, and even has support for some proprietary video codecs through binary-only decoders.
The source code is extremely difficult to work with as the author chose to use the X coding format which I’ve never seen used anywhere else except for X header files. The infrastructure for extending the program and supporting other codecs and file formats is there, I suppose, but I would have to wrap my head around the coding style. Maybe I can learn to work past that. The other thing that bothers me about this program is the decoding approach: it seems that each video decoder includes routines to decompress the multimedia data into every conceivable RGB and YUV output format. This seems backwards to me; it seems better to have one decoder function that decodes the data into the native format it was compressed from (e.g., YV12 for MPEG data) and then pass that data to another layer of the program that’s in charge of presenting the data and possibly converting it if necessary. This layer would encompass highly-optimized software conversion routines including special CPU-specific instructions (e.g., MMX and SSE) and eliminate the need to place those routines in lots of other routines. But I’m getting ahead of myself.
[This one was pretty much dead before I made this survey, the most recent update being in 1999. Still, we owe it much respect as the granddaddy of Unix multimedia playback programs.]
Xine
This seems like a promising program. It was originally designed to play MPEGs from DVDs. It can also play MPEG files on a hard drive and utilizes the Xv extensions for hardware YUV playback. It’s also supposed to play AVI files using the same technique as Avifile but I have never, ever gotten it to work. If an AVI file has both video and sound, the binary video decoder can’t decode any frames. If the AVI file has video and no sound, the program gets confused and crashes, as far as I can tell.
Still, it’s promising, and I’ve been trying to work around these crashes. It doesn’t yet have the type of modularization I’d like to see. Right now, it is tailored to suit MPEG playback and AVI playback is an afterthought. Still, it appears to have a generalized interface for dropping in new file demultiplexers.
I tried to extend the program for supporting source decoders by rewriting w32codec.c from scratch. I’m not having a smooth time of it so far. I’m able to perform some manipulations on the output window. However, I can’t get the program to deal with an RGB image format. It has trouble allocating an RGB surface with XvShmCreateImage(). This isn’t surprising given my limited knowledge of X, which is that Xv applies to YUV images, though it could apply to RGB images as well. Anyway, the program should be able to fall back on regular RGB pixmaps if that Xv call fails.
Right now, this program is looking the most promising. It will take some work to extend the underlying infrastructure, but it seems doable since I know C quite well and can understand the flow of this program, as opposed to Avifile and its C++. The C code also compiles about 10 times faster.
[My home project for many years after a brief flirtation with MPlayer. It is still alive; its latest release was just a month ago.]
XMovie
This library is a Quicktime movie player. I haven’t looked at it too extensively yet, but I do remember looking at it at one point and reading the documentation that said it doesn’t support key frames. Still, I should examine it again since they released a new version recently.
[Heroine Virtual still puts out some software but XMovie has not been updated since 2005.]
XMPS
This program compiles for me, but doesn’t do much else. It can play an MP3 file. I have been able to get MPEG movies to play through it, but it refuses to show the full video frame, constricting it to a small window (obviously a bug).
[This project is hosted on SourceForge and is listed with a registration date of 2003, well after this survey was made. So the project obviously lived elsewhere in 2001. Meanwhile, it doesn’t look like any files ever made it to SF for hosting.]
XTheater
I can’t even get this program to compile. It’s supposed to be an MPEG player based on SMPEG. As such, it probably doesn’t hold much promise for being easily extended into a general media player.
[Last updated in 2002.]
GMerlin
I can’t get this to compile yet. I have a bug report in to the dev group.
[Updated consistently in the last 9 years. Last update was in February of this year. I can’t find any record of my bug report, though.]
-
Inside WebM Technology : VP8 Intra and Inter Prediction
20 July 2010, by noreply@blogger.com (Lou Quillio)
Continuing our series on WebM technology, I will discuss the use of prediction methods in the VP8 video codec, with special attention to the TM_PRED and SPLITMV modes, which are unique to VP8.
First, some background. To encode a video frame, block-based codecs such as VP8 first divide the frame into smaller segments called macroblocks. Within each macroblock, the encoder can predict redundant motion and color information based on previously processed blocks. The redundant data can be subtracted from the block, resulting in more efficient compression.
Image by Fido Factor, licensed under Creative Commons Attribution License.
Based on a work at www.flickr.com
A VP8 encoder uses two classes of prediction:
- Intra prediction uses data within a single video frame
- Inter prediction uses data from previously encoded frames
The residual signal data is then encoded using other techniques, such as transform coding.
VP8 Intra Prediction Modes
VP8 intra prediction modes are used with three types of macroblocks:
- 4x4 luma
- 16x16 luma
- 8x8 chroma
Four common intra prediction modes are shared by these macroblocks:
- H_PRED (horizontal prediction). Fills each column of the block with a copy of the left column, L.
- V_PRED (vertical prediction). Fills each row of the block with a copy of the above row, A.
- DC_PRED (DC prediction). Fills the block with a single value, the average of the pixels in the row above, A, and the column to the left, L.
- TM_PRED (TrueMotion prediction). A mode that gets its name from a compression technique developed by On2 Technologies. In addition to the row A and column L, TM_PRED uses the pixel P above and to the left of the block. Horizontal differences between pixels in A (starting from P) are propagated using the pixels from L to start each row.
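The four shared modes can be sketched in a few lines of Python. This is a minimal illustration and not the libvpx implementation: the function names are hypothetical, samples are assumed to be 8-bit, both neighbors are assumed to be available (no frame-edge handling), and the rounding in DC_PRED and the clamp in TM_PRED are assumptions added here to keep results in the 0 to 255 range.

```python
def clamp(value):
    """Keep a sample in the 8-bit range 0..255 (assumption: 8-bit video)."""
    return max(0, min(255, value))

def h_pred(above, left, p):
    """H_PRED: every column is a copy of the left column L, so row i is all left[i]."""
    n = len(left)
    return [[left[i]] * n for i in range(n)]

def v_pred(above, left, p):
    """V_PRED: every row is a copy of the above row A."""
    n = len(above)
    return [list(above) for _ in range(n)]

def dc_pred(above, left, p):
    """DC_PRED: fill the block with the rounded average of A and L."""
    n = len(above)
    dc = (sum(above) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]

def tm_pred(above, left, p):
    """TM_PRED: predicted[i][j] = left[i] + above[j] - p, clamped to 8 bits."""
    n = len(above)
    return [[clamp(left[i] + above[j] - p) for j in range(n)] for i in range(n)]

# Hypothetical neighbor values for a 4x4 block.
A = [10, 20, 30, 40]   # row above the block
L = [50, 60, 70, 80]   # column to the left of the block
P = 5                  # pixel above and to the left

for name, mode in [("H_PRED", h_pred), ("V_PRED", v_pred),
                   ("DC_PRED", dc_pred), ("TM_PRED", tm_pred)]:
    print(name, mode(A, L, P))
```

All four functions take the same arguments so that an encoder-style loop could try each mode on a block and keep the one with the smallest residual; that selection step is left out here.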
For 4x4 luma blocks, there are six additional intra modes that are similar to V_PRED and H_PRED but predict pixels in different directions. These modes are outside the scope of this post, but if you want to learn more see the VP8 Bitstream Guide.
As mentioned above, the TM_PRED mode is unique to VP8. The following figure uses an example 4x4 block of pixels to illustrate how the TM_PRED mode works:
[Figure: example 4x4 block with above row A, left column L, and corner pixel C]
where C, the As and the Ls represent reconstructed pixel values from previously coded blocks (C is the above-left pixel, called P earlier), and X00 through X33 represent predicted values for the current block. TM_PRED uses the following equation to calculate Xij:
Xij = Li + Aj - C (i, j = 0, 1, 2, 3)
Although the above example uses a 4x4 block, the TM_PRED mode for 8x8 and 16x16 blocks works in the same fashion.
TM_PRED is one of the more frequently used intra prediction modes in VP8, and for common video sequences it is typically used by 20% to 45% of all blocks that are intra coded. Overall, together with other intra prediction modes, TM_PRED helps VP8 to achieve very good compression efficiency, especially for key frames, which can only use intra modes (key frames by their very nature cannot refer to previously encoded frames).
VP8 Inter Prediction Modes
In VP8, inter prediction modes are used only on inter frames (non-key frames). For any VP8 inter frame, there are typically three previously coded reference frames that can be used for prediction. A typical inter prediction block is constructed using a motion vector to copy a block from one of the three frames. The motion vector points to the location of a pixel block to be copied. In most video compression schemes, a good portion of the bits are spent on encoding motion vectors; the portion can be especially large for video encoded at lower datarates.
Like previous VPx codecs, VP8 encodes motion vectors very efficiently by reusing vectors from neighboring macroblocks (a macroblock includes one 16x16 luma block and two 8x8 chroma blocks).
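The block copy behind a basic inter prediction can be sketched as follows. This is a simplified illustration under stated assumptions, not decoder code: `predict_block` is a hypothetical name, the motion vector is restricted to whole pixels, and coordinates that fall outside the reference frame are simply clamped to its border. A real VP8 decoder also handles fractional-pixel vectors with interpolation filters, which this sketch omits.

```python
def predict_block(ref, x, y, mv, size=16):
    """Build a size x size prediction by copying from a reference frame.

    ref  : reference frame as a list of rows of samples
    x, y : top-left corner of the current block in the frame
    mv   : (dx, dy) whole-pixel motion vector (assumption: no sub-pixel part)
    """
    height, width = len(ref), len(ref[0])
    dx, dy = mv
    block = []
    for row in range(size):
        # Clamp the source row to the frame (simplified edge handling).
        src_y = min(max(y + dy + row, 0), height - 1)
        line = []
        for col in range(size):
            src_x = min(max(x + dx + col, 0), width - 1)
            line.append(ref[src_y][src_x])
        block.append(line)
    return block

# Hypothetical 8x8 reference frame whose samples encode their own position.
reference = [[r * 8 + c for c in range(8)] for r in range(8)]

# Predict a 2x2 block at (2, 2) displaced one pixel right and one pixel up.
print(predict_block(reference, 2, 2, (1, -1), size=2))
```

The caller would pick which of the three reference frames to pass as `ref`; the residual is then the difference between the actual block and this prediction.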
VP8 uses a similar strategy in the overall design of inter prediction modes. For example, the prediction modes "NEAREST" and "NEAR" make use of last and second-to-last, non-zero motion vectors from neighboring macroblocks. These inter prediction modes can be used in combination with any of the three different reference frames.
In addition, VP8 has a very sophisticated, flexible inter prediction mode called SPLITMV. This mode was designed to enable flexible partitioning of a macroblock into sub-blocks to achieve better inter prediction. SPLITMV is very useful when objects within a macroblock have different motion characteristics. Within a macroblock coded using SPLITMV mode, each sub-block can have its own motion vector. Similar to the strategy of reusing motion vectors at the macroblock level, a sub-block can also use motion vectors from neighboring sub-blocks above or to the left of the current block. This strategy is very flexible and can effectively encode any shape of sub-macroblock partitioning, and does so efficiently. Here is an example of a macroblock with 16x16 luma pixels that is partitioned into 16 4x4 blocks:
[Figure: 16x16 macroblock split into 16 4x4 blocks labeled New, Left, or Above]
where New represents a 4x4 block coded with a new motion vector, and Left and Above represent a 4x4 block coded using the motion vector from the left and above, respectively. This example effectively partitions the 16x16 macroblock into 3 different segments with 3 different motion vectors (represented below by 1, 2 and 3):
[Figure: the same macroblock with each 4x4 block labeled by segment 1, 2, or 3]
Through effective use of intra and inter prediction modes, WebM encoder implementations can achieve great compression quality on a wide range of source material. If you want to delve further into VP8 prediction modes, read the VP8 Bitstream Guide or examine the reconintra.c and rdopt.c files in the VP8 source tree.
Yaowu Xu, Ph.D. is a codec engineer at Google.
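As a rough illustration of the SPLITMV inheritance described above, the sketch below resolves one motion vector per 4x4 sub-block from a grid of New, Left, and Above labels. Everything specific in it is an assumption: the function name, the label grid, and the vector values are hypothetical (the grid is not the one from the original figure), only the three labels named in the text are modeled, and sub-blocks on the macroblock edge inherit from caller-supplied neighbor vectors that default to zero.

```python
ZERO = (0, 0)

def resolve_split_mvs(modes, new_mvs, left_edge=None, above_edge=None):
    """Return a 4x4 grid of motion vectors for one SPLITMV macroblock.

    modes      : 4x4 grid of "New", "Left", or "Above" labels
    new_mvs    : vectors for the "New" sub-blocks, in raster order
    left_edge  : vectors of the neighbors left of column 0 (assumed zero)
    above_edge : vectors of the neighbors above row 0 (assumed zero)
    """
    left_edge = left_edge or [ZERO] * 4
    above_edge = above_edge or [ZERO] * 4
    pending = iter(new_mvs)
    mvs = [[None] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            mode = modes[r][c]
            if mode == "New":
                mvs[r][c] = next(pending)      # explicitly coded vector
            elif mode == "Left":
                mvs[r][c] = mvs[r][c - 1] if c > 0 else left_edge[r]
            elif mode == "Above":
                mvs[r][c] = mvs[r - 1][c] if r > 0 else above_edge[c]
            else:
                raise ValueError(f"unknown sub-block mode: {mode}")
    return mvs

# Hypothetical labels: only three sub-blocks carry a new vector.
example_modes = [
    ["New",   "Left",  "New",   "Left"],
    ["Above", "Above", "Above", "Above"],
    ["New",   "Left",  "Left",  "Left"],
    ["Above", "Above", "Above", "Above"],
]
example_new = [(1, 0), (0, 2), (-3, 1)]

for row in resolve_split_mvs(example_modes, example_new):
    print(row)
```

With these labels, three coded vectors cover all sixteen sub-blocks and the macroblock ends up split into three segments, which is the saving the article attributes to reusing neighboring vectors.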