
Other articles (65)
-
The user profile
12 April 2011
Each user has a profile page allowing them to edit their personal information. In the default top-of-page menu, a menu item is created automatically when MediaSPIP is initialized; it is visible only when the visitor is logged in to the site.
The user can access profile editing from their author page; a "Modifier votre profil" link in the navigation is (...)
-
Configuring language support
15 November 2010
Accessing the configuration and adding supported languages
To configure support for new languages, you need to go to the "Administrer" section of the site.
From there, in the navigation menu, you can access a "Gestion des langues" section that lets you enable support for new languages.
Each newly added language can still be deactivated as long as no object has been created in that language; once one has, it becomes greyed out in the configuration and (...)
-
XMP PHP
13 May 2011
According to Wikipedia, XMP stands for:
Extensible Metadata Platform, or XMP, an XML-based metadata format used in PDF, photography, and graphics applications. It was launched by Adobe Systems in April 2001, integrated into version 5.0 of Adobe Acrobat.
Being based on XML, it manages a set of dynamic tags for use in the Semantic Web.
XMP makes it possible to record, in the form of an XML document, information relating to a file: title, author, history (...)
On other sites (14750)
-
Progress with rtc.io
12 August 2014, by silvia
At the end of July, I gave a presentation about WebRTC and rtc.io at the WDCNZ Web Dev Conference in beautiful Wellington, NZ.
Putting that talk together reminded me how far we have come in the last year, both with the progress of WebRTC, its standards, and its browser implementations, and with our own small team at NICTA and our rtc.io WebRTC toolbox.
One of the most exciting opportunities is still under-exploited: the data channel. When I talked about the above slide and pointed out Bananabread, PeerCDN, Copay, PubNub and also later WebTorrent, that's where I really started to get Web developers excited about WebRTC. They can totally see the paradigm shift to peer-to-peer applications, away from the server-based architecture of the current Web.
Many were also excited to learn more about rtc.io, our own npm-modules-based approach to a JavaScript API for WebRTC.
We believe that the world of JavaScript has reached a critical stage where we can no longer code by copy-and-pasting JavaScript snippets from all over the Web. We need a more structured approach to module reuse in JavaScript. Node, with JavaScript on the back end, really only motivated this development; we have needed it for a long time on the front end, too. One big library (jQuery, anyone?) that does everything anyone could ever need on the front end isn't going to work any longer with the amount of functionality we now expect Web applications to support. Just look at the insane growth of npm compared to other module collections:
Packages per day across popular platforms (shamelessly copied from http://blog.nodejitsu.com/npm-innovation-through-modularity/). For those who, like myself, found it difficult to understand how to tap into the sheer power of npm modules as a front-end developer: simply use browserify. npm modules are prepared following the CommonJS module definition spec. Browserify works natively with that and "compiles" all the dependencies of an npm module into a single bundle.js file that you can use on the front end through a script tag, just as you would in plain HTML. You can learn more about browserify and module definitions, and about how to use browserify.
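To make that concrete, here is a minimal sketch of the CommonJS pattern that browserify consumes; the file names and the greet function are made up for illustration:

// greeter.js - a CommonJS module, the same format npm packages use
module.exports = function greet(name) {
  return 'Hello, ' + name + '!';
};

// main.js - entry point; require() resolves local files and npm packages alike
var greet = require('./greeter');
document.body.textContent = greet('WebRTC');

Running "browserify main.js -o bundle.js" resolves the require() calls and produces a single bundle.js that a plain script tag can load.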
For those of you not quite ready to dive in with browserify, we have prepared the rtc module, which exposes the most commonly used packages of rtc.io through an "RTC" object from a browserified JavaScript file. You can also directly download the JavaScript file from GitHub.
Using the rtc.io rtc JS library
So, I hope you enjoy rtc.io, and I hope you enjoy my slides and the large collection of interesting links inside the deck. And of course: enjoy WebRTC! Thanks to Damon, Jeff, Cathy, Pete and Nathan - you're an awesome team!
On a side note, I was really excited to meet the author of browserify, James Halliday (@substack), at WDCNZ. His talk on "building your own tools" seemed to take me back to the times when everything was done on the command line. I think James is using Node and the Web in a way that would appeal to a Linux kernel developer. Fascinating!
The post Progress with rtc.io first appeared on ginger’s thoughts.
-
FFmpeg overlay positioning issue: Converting frontend center coordinates to FFmpeg top-left coordinates
25 January, by tarun
I'm building a web-based video editor where users can:


Add multiple videos
Add images
Add text overlays with background color


The frontend sends coordinates where each element's (x, y) represents its center position.
When the export button is clicked, I want all the data to be exported as one final video.
On click, I send the data to the backend like this:


const exportAllVideos = async () => {
  try {
    const formData = new FormData();

    const normalizedVideos = videos.map(video => ({
      ...video,
      startTime: parseFloat(video.startTime),
      endTime: parseFloat(video.endTime),
      duration: parseFloat(video.duration)
    })).sort((a, b) => a.startTime - b.startTime);

    for (const video of normalizedVideos) {
      const response = await fetch(video.src);
      const blobData = await response.blob();
      const file = new File([blobData], `${video.id}.mp4`, { type: "video/mp4" });
      formData.append("videos", file);
    }

    const normalizedImages = images.map(image => ({
      ...image,
      startTime: parseFloat(image.startTime),
      endTime: parseFloat(image.endTime),
      x: parseInt(image.x),
      y: parseInt(image.y),
      width: parseInt(image.width),
      height: parseInt(image.height),
      opacity: parseInt(image.opacity)
    }));

    for (const image of normalizedImages) {
      const response = await fetch(image.src);
      const blobData = await response.blob();
      const file = new File([blobData], `${image.id}.png`, { type: "image/png" });
      formData.append("images", file);
    }

    const normalizedTexts = texts.map(text => ({
      ...text,
      startTime: parseFloat(text.startTime),
      endTime: parseFloat(text.endTime),
      x: parseInt(text.x),
      y: parseInt(text.y),
      fontSize: parseInt(text.fontSize),
      opacity: parseInt(text.opacity)
    }));

    formData.append("metadata", JSON.stringify({
      videos: normalizedVideos,
      images: normalizedImages,
      texts: normalizedTexts
    }));

    const response = await fetch("my_flask_endpoint", {
      method: "POST",
      body: formData
    });

    if (!response.ok) {
      console.log('wtf', response);
    }

    const finalVideo = await response.blob();
    const url = URL.createObjectURL(finalVideo);
    const a = document.createElement("a");
    a.href = url;
    a.download = "final_video.mp4";
    a.click();
    URL.revokeObjectURL(url);
  } catch (e) {
    console.log(e, "err");
  }
};



On the frontend, each object type (text, image, and video) is stored as an array of objects. Below is the data structure for each object:


// the frontend data for each object type
const newVideo = {
  id: uuidv4(),
  src: URL.createObjectURL(videoData.videoBlob),
  originalDuration: videoData.duration,
  duration: videoData.duration,
  startTime: 0,
  playbackOffset: 0,
  endTime: videoData.endTime || videoData.duration,
  isPlaying: false,
  isDragging: false,
  speed: 1,
  volume: 100,
  x: window.innerHeight / 2,
  y: window.innerHeight / 2,
  width: videoData.width,
  height: videoData.height,
};

const newTextObject = {
  id: uuidv4(),
  description: text,
  opacity: 100,
  x: containerWidth.width / 2,
  y: containerWidth.height / 2,
  fontSize: 18,
  duration: 20,
  endTime: 20,
  startTime: 0,
  color: "#ffffff",
  backgroundColor: hasBG,
  padding: 8,
  fontWeight: "normal",
  width: 200,
  height: 40,
};

const newImage = {
  id: uuidv4(),
  src: URL.createObjectURL(imageData),
  x: containerWidth.width / 2,
  y: containerWidth.height / 2,
  width: 200,
  height: 200,
  borderRadius: 0,
  startTime: 0,
  endTime: 20,
  duration: 20,
  opacity: 100,
};
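For reference, the center-to-top-left mapping these overlays need looks roughly like the sketch below. This is a minimal illustration, not the code in question: the helper name is made up, and it assumes the preview container dimensions are known, so that positions can also be rescaled to the backend's 1920x1080 output before the center point is converted to a top-left corner.

// Hypothetical helper: map a center-based (x, y) in the preview container
// to a top-left (x, y) in the FFmpeg output coordinate space.
const toFfmpegTopLeft = (el, containerW, containerH, outW = 1920, outH = 1080) => {
  const scaleX = outW / containerW; // preview-to-output horizontal scale
  const scaleY = outH / containerH; // preview-to-output vertical scale
  return {
    x: Math.round(el.x * scaleX - (el.width * scaleX) / 2),
    y: Math.round(el.y * scaleY - (el.height * scaleY) / 2),
    width: Math.round(el.width * scaleX),
    height: Math.round(el.height * scaleY),
  };
};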




BACKEND CODE -


import os
import shutil
import subprocess
from flask import Flask, request, send_file
import ffmpeg
import json
from werkzeug.utils import secure_filename
import uuid
from flask_cors import CORS


app = Flask(__name__)
CORS(app, resources={r"/*": {"origins": "*"}})



UPLOAD_FOLDER = 'temp_uploads'
if not os.path.exists(UPLOAD_FOLDER):
    os.makedirs(UPLOAD_FOLDER)


@app.route('/')
def home():
    return 'Hello World'


OUTPUT_WIDTH = 1920
OUTPUT_HEIGHT = 1080



@app.route('/process', methods=['POST'])
def process_video():
    work_dir = None
    try:
        work_dir = os.path.abspath(os.path.join(UPLOAD_FOLDER, str(uuid.uuid4())))
        os.makedirs(work_dir)
        print(f"Created working directory: {work_dir}")

        metadata = json.loads(request.form['metadata'])
        print("Received metadata:", json.dumps(metadata, indent=2))

        video_paths = []
        videos = request.files.getlist('videos')
        for idx, video in enumerate(videos):
            filename = f"video_{idx}.mp4"
            filepath = os.path.join(work_dir, filename)
            video.save(filepath)
            if os.path.exists(filepath) and os.path.getsize(filepath) > 0:
                video_paths.append(filepath)
                print(f"Saved video to: {filepath} Size: {os.path.getsize(filepath)}")
            else:
                raise Exception(f"Failed to save video {idx}")

        image_paths = []
        images = request.files.getlist('images')
        for idx, image in enumerate(images):
            filename = f"image_{idx}.png"
            filepath = os.path.join(work_dir, filename)
            image.save(filepath)
            if os.path.exists(filepath):
                image_paths.append(filepath)
                print(f"Saved image to: {filepath}")

        output_path = os.path.join(work_dir, 'output.mp4')

        filter_parts = []

        base_duration = metadata["videos"][0]["duration"] if metadata["videos"] else 10
        filter_parts.append(f'color=c=black:s={OUTPUT_WIDTH}x{OUTPUT_HEIGHT}:d={base_duration}[canvas];')

        for idx, (path, meta) in enumerate(zip(video_paths, metadata['videos'])):
            x_pos = int(meta.get("x", 0) - (meta.get("width", 0) / 2))
            y_pos = int(meta.get("y", 0) - (meta.get("height", 0) / 2))

            filter_parts.extend([
                f'[{idx}:v]setpts=PTS-STARTPTS,scale={meta.get("width", -1)}:{meta.get("height", -1)}[v{idx}];',
                f'[{idx}:a]asetpts=PTS-STARTPTS[a{idx}];'
            ])

            if idx == 0:
                filter_parts.append(
                    f'[canvas][v{idx}]overlay=x={x_pos}:y={y_pos}:eval=init[temp{idx}];'
                )
            else:
                filter_parts.append(
                    f'[temp{idx-1}][v{idx}]overlay=x={x_pos}:y={y_pos}:'
                    f'enable=\'between(t,{meta["startTime"]},{meta["endTime"]})\':eval=init'
                    f'[temp{idx}];'
                )

        last_video_temp = f'temp{len(video_paths)-1}'

        if video_paths:
            audio_mix_parts = []
            for idx in range(len(video_paths)):
                audio_mix_parts.append(f'[a{idx}]')
            filter_parts.append(f'{"".join(audio_mix_parts)}amix=inputs={len(video_paths)}[aout];')

        if image_paths:
            for idx, (img_path, img_meta) in enumerate(zip(image_paths, metadata['images'])):
                input_idx = len(video_paths) + idx

                x_pos = int(img_meta["x"] - (img_meta["width"] / 2))
                y_pos = int(img_meta["y"] - (img_meta["height"] / 2))

                filter_parts.extend([
                    f'[{input_idx}:v]scale={img_meta["width"]}:{img_meta["height"]}[img{idx}];',
                    f'[{last_video_temp}][img{idx}]overlay=x={x_pos}:y={y_pos}:'
                    f'enable=\'between(t,{img_meta["startTime"]},{img_meta["endTime"]})\':'
                    f'alpha={img_meta["opacity"]/100}[imgout{idx}];'
                ])
                last_video_temp = f'imgout{idx}'

        if metadata.get('texts'):
            for idx, text in enumerate(metadata['texts']):
                next_output = f'text{idx}' if idx < len(metadata['texts']) - 1 else 'vout'

                escaped_text = text["description"].replace("'", "\\'")

                x_pos = int(text["x"] - (text["width"] / 2))
                y_pos = int(text["y"] - (text["height"] / 2))

                text_filter = (
                    f'[{last_video_temp}]drawtext=text=\'{escaped_text}\':'
                    f'x={x_pos}:y={y_pos}:'
                    f'fontsize={text["fontSize"]}:'
                    f'fontcolor={text["color"]}'
                )

                if text.get('backgroundColor'):
                    text_filter += f':box=1:boxcolor={text["backgroundColor"]}:boxborderw=5'

                if text.get('fontWeight') == 'bold':
                    text_filter += ':font=Arial-Bold'

                text_filter += (
                    f':enable=\'between(t,{text["startTime"]},{text["endTime"]})\''
                    f'[{next_output}];'
                )

                filter_parts.append(text_filter)
                last_video_temp = next_output
        else:
            filter_parts.append(f'[{last_video_temp}]null[vout];')

        filter_complex = ''.join(filter_parts)

        cmd = [
            'ffmpeg',
            *sum([['-i', path] for path in video_paths], []),
            *sum([['-i', path] for path in image_paths], []),
            '-filter_complex', filter_complex,
            '-map', '[vout]'
        ]

        if video_paths:
            cmd.extend(['-map', '[aout]'])

        cmd.extend(['-y', output_path])

        print(f"Running ffmpeg command: {' '.join(cmd)}")
        result = subprocess.run(cmd, capture_output=True, text=True)

        if result.returncode != 0:
            print(f"FFmpeg error output: {result.stderr}")
            raise Exception(f"FFmpeg processing failed: {result.stderr}")

        return send_file(
            output_path,
            mimetype='video/mp4',
            as_attachment=True,
            download_name='final_video.mp4'
        )

    except Exception as e:
        print(f"Error in video processing: {str(e)}")
        return {'error': str(e)}, 500

    finally:
        if work_dir and os.path.exists(work_dir):
            try:
                print(f"Directory contents before cleanup: {os.listdir(work_dir)}")
                if not os.environ.get('FLASK_DEBUG'):
                    shutil.rmtree(work_dir)
                else:
                    print(f"Keeping directory for debugging: {work_dir}")
            except Exception as e:
                print(f"Cleanup error: {str(e)}")


if __name__ == '__main__':
    app.run(debug=True, port=8000)




I'm also attaching what the final result looks like on the frontend vs. in the downloaded video.
As you can see, the downloaded video has all the coordinates and positions messed up, whether for the texts, the images, or the videos.




Can somebody please help me figure this out? :)


-
The problems with wavelets
I have periodically noted in this blog and elsewhere various problems with wavelet compression, but many readers have requested that I write a more detailed post about it, so here it is.
Wavelets have been researched for quite some time as a replacement for the standard discrete cosine transform used in most modern video compression. Their methodology is basically the opposite: each coefficient in a DCT represents a constant pattern applied to the whole block, while each coefficient in a wavelet transform represents a single, localized pattern applied to a section of the block. Accordingly, wavelet transforms are usually very large, with the intention of taking advantage of large-scale redundancy in an image. DCTs are usually quite small and are intended to cover areas of roughly uniform patterns and complexity.
Both are complete transforms, offering equally accurate frequency-domain representations of pixel data. I won't go into the mathematical details of each here; the real question is whether one offers better compression opportunities for real-world video.
DCT transforms, though it isn’t mathematically required, are usually found as block transforms, handling a single sharp-edged block of data. Accordingly, they usually need a deblocking filter to smooth the edges between DCT blocks. Wavelet transforms typically overlap, avoiding such a need. But because wavelets don’t cover a sharp-edged block of data, they don’t compress well when the predicted data is in the form of blocks.
Thus motion compensation is usually performed as overlapped-block motion compensation (OBMC), in which every pixel is calculated by performing the motion compensation of a number of blocks and averaging the result based on the distance of those blocks from the current pixel. Another option, which can be combined with OBMC, is "mesh MC", where every pixel gets its own motion vector, which is a weighted average of the closest nearby motion vectors. The end result of either is the elimination of sharp edges between blocks and better prediction, at the cost of greatly increased CPU requirements. For an overlap factor of 2, it's 4 times the amount of motion compensation, plus the averaging step. With mesh MC, it's even worse, with SIMD optimizations becoming nearly impossible.
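To make the cost concrete, here is a minimal, illustrative 1D OBMC sketch (not taken from any real codec) with an overlap factor of 2 and a triangular weighting window; in 2D the overlap applies in both dimensions, which is where the 4x motion compensation cost comes from:

// Illustrative 1D OBMC: blocks of size B are spaced B/2 apart, so every
// pixel is covered by two blocks whose predictions are blended by weight.
function obmc1d(ref, mvs, B, length) {
  const acc = new Float32Array(length);  // weighted prediction accumulator
  const wsum = new Float32Array(length); // accumulated weights
  for (let b = 0; b < mvs.length; b++) {
    const start = b * (B / 2) - B / 2;   // overlapping block placement
    for (let i = 0; i < B; i++) {
      const x = start + i;
      if (x < 0 || x >= length) continue;
      // Triangular window: strongest at the block centre, weak at the edges.
      const w = 1 - Math.abs(i - (B - 1) / 2) / ((B + 1) / 2);
      const src = Math.min(Math.max(x + mvs[b], 0), ref.length - 1);
      acc[x] += w * ref[src];
      wsum[x] += w;
    }
  }
  return acc.map((v, i) => (wsum[i] > 0 ? v / wsum[i] : 0));
}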
At this point, it would seem wavelets would have pretty big advantages: when used with OBMC, they have better inter prediction, eliminate the need for deblocking, and take advantage of larger-scale correlations. Why hasn't everyone switched over to wavelets, then? Dirac and Snow offer modern implementations. Yet despite decades of research, wavelets have consistently disappointed for image and video compression. It turns out there are a lot of serious practical issues with wavelets, many of which are open problems.
1. No known method exists for efficient intra coding. H.264's spatial intra prediction is extraordinarily powerful, but relies on knowing the exact decoded pixels to the top and left of the current block. Since there is no such boundary in overlapped-wavelet coding, such prediction is impossible. Newer intra prediction methods, such as Markov-chain intra prediction, also seem to require an H.264-like situation with exactly-known neighboring pixels. Intra coding in wavelets is in the same state that DCT intra coding was in 20 years ago: the best known method was to simply transform the block with no prediction at all besides DC. NB: as described by Pengvado in the comments, the switching between inter and intra coding is potentially even more costly than the inefficient intra coding.
2. Mixing partition sizes has serious practical problems. Because the overlap between two motion partitions depends on the partitions' size, mixing block sizes becomes quite difficult to define. While in H.264 a smaller partition always gives equal or better compression than a larger one when one ignores the extra overhead, with OBMC it is actually possible for a larger partition to win due to the larger overlap. All of this makes it very difficult both to define the result of mixing block sizes and to make decisions about it.
Both Snow and Dirac offer variable block sizes, but the overlap amount is constant; larger blocks serve only to save bits on motion vectors, not to offer better overlap characteristics.
3. Lack of spatial adaptive quantization. As shown in x264 with VAQ, and correspondingly in HCEnc's implementation and Theora's recent implementation, spatial adaptive quantization has staggeringly impressive (before, after) effects on visual quality. Only Dirac seems to have such a feature, and its encoder doesn't even use it. No other wavelet format (Snow, JPEG2K, etc.) seems to have such a feature. This results in serious blurring problems in areas with subtle texture (as in the comparison below).
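For intuition, here is a minimal, illustrative sketch of variance-based adaptive quantization in the spirit of x264's VAQ (the function and its parameters are made up for illustration, not x264's actual code): flat, low-energy blocks get a finer quantizer, complex ones a coarser one.

// Illustrative variance-based AQ: return a per-block quantizer offset.
// Positive offsets mean coarser quantization, negative mean finer.
function aqOffsets(blockVariances, strength = 1.0) {
  const eps = 1; // avoid log2(0) on perfectly flat blocks
  const logs = blockVariances.map(v => Math.log2(v + eps));
  const mean = logs.reduce((a, b) => a + b, 0) / logs.length;
  // Blocks with below-average energy get negative offsets (finer quant),
  // which is what preserves subtle textures that flat-rate quantization blurs.
  return logs.map(l => strength * (l - mean));
}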
4. Wavelets don't seem to code visual energy effectively. Remember that a single coefficient in a DCT represents a pattern which applies across an entire block: this makes it very easy to create apparent "detail" with a DCT. Furthermore, the sharp edges of DCT blocks, despite being an apparent weakness, often result in a "fake sharpness" that can actually improve the visual appearance of videos, as was seen with Xvid. Thus wavelet codecs have a tendency to look much blurrier than DCT-based codecs, but since PSNR likes blur, this is often seen as a benefit during video compression research. Some of the consequences of these factors can be seen in this comparison; it is somewhat outdated and not general-case, but it very effectively shows the difference in how wavelets handle sharp edges and subtle textures.
Another problem that periodically crops up is the visual aliasing that tends to be associated with wavelets at lower bitrates. Standard wavelets effectively consist of a recursive function that upscales the coefficients coded by the previous level by a factor of 2 and then adds a new set of coefficients. If the upscaling algorithm is naive (as it often is, for the sake of speed), the result can look quite ugly, as if parts of the image were coded at a lower resolution and then badly scaled up. Of course, it looks like that because they were coded at a lower resolution and then badly scaled up.
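That recursive structure is easy to see in code. The following is a hand-written, illustrative sketch (not any codec's actual implementation) of a one-dimensional reconstruction using naive nearest-neighbour upscaling, the "fast but ugly" case described above:

// Illustrative wavelet-style reconstruction: each level doubles the previous
// one and adds that level's detail coefficients. At low bitrates the detail
// is mostly quantized to zero, so a naive upscaler leaves visible
// "badly scaled up" regions.
function reconstruct(levels) {
  // levels[0] is the coarsest band; levels[k] holds the detail for level k
  // and is assumed to be twice as long as the level below it.
  let img = Float32Array.from(levels[0]);
  for (let k = 1; k < levels.length; k++) {
    const up = new Float32Array(img.length * 2);
    for (let i = 0; i < up.length; i++) up[i] = img[i >> 1]; // naive 2x upscale
    img = up.map((v, i) => v + levels[k][i]); // add this level's coefficients
  }
  return img;
}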
JPEG2000 is a classic example of wavelet failure: despite having more advanced entropy coding, being designed much later than JPEG, being much more computationally intensive, and having much better PSNR, comparisons have consistently shown it to be visually worse than JPEG at sane file sizes. Here's an example from Wikipedia. By comparison, H.264's intra coding, when used for still-image compression, can beat JPEG by a factor of 2 or more (I'll make a post on this later). With the various advancements in DCT intra coding since H.264, I suspect that a state-of-the-art DCT compressor could win by an even larger factor.
Despite the promised benefits of wavelets, a wavelet encoder even close to competitive with x264 has yet to be created. With some tests even showing Dirac losing to Theora in visual comparisons, it’s clear that many problems remain to be solved before wavelets can eliminate the ugliness of block-based transforms once and for all.