Keyword: Tags / Rennes

Other articles (51)

  • Update from version 0.1 to 0.2

    24 June 2013

    Explanation of the notable changes involved in moving from MediaSPIP version 0.1 to version 0.3. What's new?
    Regarding software dependencies: use of the latest versions of FFmpeg (>= v1.2.1); installation of the dependencies for Smush; installation of MediaInfo and FFprobe for retrieving metadata; ffmpeg2theora is no longer used; flvtool2 is no longer installed, in favour of flvtool++; ffmpeg-php, which is no longer maintained, is no longer installed (...)

  • Customize by adding your logo, banner or background image

    5 September 2013

    Some themes support three customization elements: adding a logo; adding a banner; adding a background image.

  • Writing a news item

    21 June 2013

    Present changes to your MediaSPIP, or news about your projects, on your MediaSPIP using the news section.
    In MediaSPIP's default theme, spipeo, news items are displayed at the bottom of the main page, below the editorials.
    You can customize the form used to create a news item.
    News item creation form: for a document of the "news item" type, the default fields are: publication date (customize the publication date) (...)

On other sites (6956)

  • Dealing with long conversion times on nginx, ffmpeg and Ruby on Rails

    19 April 2013, by Graeme

    I have developed a Ruby on Rails-based app which allows users to upload videos to one of our local servers (Ubuntu 10.04 LTS). The server uses nginx.

    Through the paperclip-ffmpeg gem, videos are converted to mp4 format using the ffmpeg library.

    Everything appears to be working fine in production, except that Rails' own 500 page (not the customised version I have provided, but that's a different issue) is displayed whenever certain videos are uploaded. Otherwise, videos are being converted as expected.

    Having done a bit of investigation, I think the default 500 page is being displayed because a 502 error has occurred. Having uploaded the videos locally, I think what is happening is that some videos take an extensive amount of time to convert and that an interruption occurs on the server (I'm not a server expert by any means).

    Following the excellent Railscasts episode on deployment, I use Capistrano to deploy the app. Here's the unicorn.rb file:

    root = "XXXXXXX"
    working_directory root
    pid "#{root}/tmp/pids/unicorn.pid"
    stderr_path "#{root}/log/unicorn.log"
    stdout_path "#{root}/log/unicorn.log"

    listen "/tmp/unicorn.XXXXXXXXX.sock"
    worker_processes 2
    timeout 200   # Unicorn kills a worker whose request takes longer than this many seconds

    And here's the nginx.conf file. Note that client_max_body_size has been set to a fairly hefty 4 GB:

    upstream unicorn {
     server unix:/tmp/unicorn.XXXXXXXXX.sock fail_timeout=0;
    }

    server {
     listen 80 default deferred;
     root XXXXXXXXX;


     location ^~ /assets/ {
       gzip_static on;
       expires max;
       add_header Cache-Control public;
     }

     try_files $uri/index.html $uri @unicorn;
     location @unicorn {
       proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
       proxy_set_header Host $http_host;
       proxy_read_timeout 600;
       proxy_redirect off;
       proxy_pass http://unicorn;
     }

     error_page 500 502 503 504 /500.html;
     client_max_body_size 4G;
     keepalive_timeout 10;

    }

    So, my question is: how could I edit either of the above two files to deal with the extensive time that certain videos take to convert through ffmpeg, possibly an hour, two hours or even more?

    Should I extend timeout in the former and/or keepalive_timeout in the latter, or is there a more efficient way (given that I have no idea how long certain videos will take to convert)?

    Or is there possibly a more significant issue I should consider, e.g. the amount of memory on the server?

    I'm not an nginx/server expert, so any advice would be useful (particularly where to put extra lines of code) - however, as the rest of the app just "works", I'm not keen to make a huge amount of changes !

  • Method For Crawling Google

    28 May 2011, by Multimedia Mike — Big Data

    I wanted to crawl Google in order to harvest a large corpus of certain types of data as yielded by a certain search term (we'll call it "term" for this exercise). Google doesn't appear to offer any API to automatically harvest their search results (why would they?). So I sat down and thought about how to do it. This is the solution I came up with.



    FAQ
    Q: Is this legal / ethical / compliant with Google's terms of service?
    A: Does it look like I care? Moving right along…

    Manual Crawling Process
    For this exercise, I essentially automated the task that would be performed by a human. It goes something like this:

    1. Search for “term”
    2. On the first page of results, download each of the 10 results returned
    3. Click on the next page of results
    4. Go to step 2, until Google doesn't return any more pages of search results

    Google returns up to 1000 results for a given search term. Fetching them 10 at a time is less than efficient. Fortunately, the search URL can easily be tweaked to return up to 100 results per page.
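
    A rough sketch of how such a URL might be built (Python 2, matching the urllib2-era tooling mentioned later; the num and start query parameters are assumptions about how Google's result URLs looked at the time):

    import urllib

    def search_url(term, start=0, per_page=100):
        # Build a Google search URL asking for `per_page` results beginning at
        # offset `start`; the `num` and `start` parameter names are assumptions.
        params = urllib.urlencode({"q": term, "num": per_page, "start": start})
        return "http://www.google.com/search?" + params

    print(search_url("term", start=100))   # second block of 100 results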

    Expanding Reach
    Problem: 1000 results for the "term" search isn't that many. I need a way to expand the search. I'm not aiming for relevancy; I'm just searching for random examples of some data that occurs around the internet.

    My solution for this is to refine the search using the "site" wildcard. For example, you can ask Google to search for "term" at all Canadian domains using "site:.ca". So, the manual process now involves harvesting up to 1000 results for every single internet top level domain (TLD). But many TLDs can be more granular than that. For example, there are 50 sub-domains under .us, one for each state (e.g., .ca.us, .ny.us). Those all need to be searched independently. Same for all the sub-domains under TLDs which don't allow domains under the main TLD, such as .uk (search under .co.uk, .ac.uk, etc.).

    Another extension is to combine “term” searches with other terms that are likely to have a rich correlation with “term”. For example, if “term” is relevant to various scientific fields, search for “term” in conjunction with various scientific disciplines.
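
    To make the seed idea concrete, generating the queries could look roughly like this (the TLD and related-keyword lists are placeholders, not the ones actually used):

    # Placeholder lists: the real TLD/sub-TLD enumeration and the related
    # keywords are not given in the post.
    TLDS = [".ca", ".de", ".fr", ".co.uk", ".ac.uk", ".ca.us", ".ny.us"]
    RELATED = ["physics", "chemistry", "biology"]

    def make_seeds(term):
        # One "site:" query per (sub-)TLD, plus queries pairing the term with
        # each related keyword.
        seeds = ["%s site:%s" % (term, tld) for tld in TLDS]
        seeds += ["%s %s" % (term, extra) for extra in RELATED]
        return seeds

    print(make_seeds("term"))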

    Algorithmically
    My solution is to create an SQLite database that contains a table of search seeds. Each seed is essentially a "site:" string combined with a starting index.

    Each TLD and sub-TLD is inserted as a searchseed record with a starting index of 0.

    A script performs the following crawling algorithm (a code sketch follows the list):

    • Fetch the next record from the searchseed table which has not been crawled
    • Fetch search result page from Google
    • Scrape URLs from page and insert each into URL table
    • Mark the searchseed record as having been crawled
    • If the results page indicates there are more results for this search, insert a new searchseed for the same seed but with a starting index 100 higher
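
    A minimal sketch of that loop, assuming a hypothetical schema for the searchseed and url tables (the post doesn't give the real column names) and an assumed fetch_results(query, start) helper that returns the scraped URLs plus a flag for whether more pages exist:

    import sqlite3

    conn = sqlite3.connect("crawl.db")
    conn.execute("CREATE TABLE IF NOT EXISTS searchseed "
                 "(query TEXT, start INTEGER, crawled INTEGER DEFAULT 0)")
    conn.execute("CREATE TABLE IF NOT EXISTS url "
                 "(url TEXT, downloaded INTEGER DEFAULT 0, location TEXT)")

    def crawl_one(fetch_results):
        # Grab the next seed that hasn't been crawled yet.
        row = conn.execute("SELECT rowid, query, start FROM searchseed "
                           "WHERE crawled = 0 LIMIT 1").fetchone()
        if row is None:
            return False                      # nothing left to crawl
        rowid, query, start = row
        urls, more_pages = fetch_results(query, start)
        conn.executemany("INSERT INTO url (url) VALUES (?)", [(u,) for u in urls])
        conn.execute("UPDATE searchseed SET crawled = 1 WHERE rowid = ?", (rowid,))
        if more_pages:
            # Same seed, next block of 100 results.
            conn.execute("INSERT INTO searchseed (query, start) VALUES (?, ?)",
                         (query, start + 100))
        conn.commit()
        return True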

    Digging Into Sites
    Sometimes, Google notes that certain sites are particularly rich sources of "term" and offers to let you search that site for "term". This basically links to another search for "term site:somesite". That site gets its own search seed and the program might harvest up to 1000 URLs from that site alone.

    Harvesting the Data
    Armed with a database of URLs, employ the following algorithm (again, a sketch follows the list):

    • Fetch a random URL from the database which has yet to be downloaded
    • Try to download it
    • For goodness sake, have a mechanism in place to detect whether the download process has stalled and automatically kill it after a certain period of time
    • Store the data and update the database, noting where the information was stored and that it is already downloaded

    This step is easy to parallelize by simply executing multiple copies of the script. It is useful to update the URL table to indicate that one process is already trying to download a URL so multiple processes don’t duplicate work.
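
    A sketch of one download worker along those lines, reusing the hypothetical url table from the previous sketch; the claim-before-download flag, the socket timeout and the assumed store(url, data) helper are just one way to implement the stall detection and bookkeeping described above:

    import sqlite3
    import socket
    import urllib2

    socket.setdefaulttimeout(60)   # crude stall detection: abort a read after 60 s
    conn = sqlite3.connect("crawl.db")

    def harvest_one(store):
        # Pick a random URL that no process has claimed yet.
        row = conn.execute("SELECT rowid, url FROM url WHERE downloaded = 0 "
                           "ORDER BY RANDOM() LIMIT 1").fetchone()
        if row is None:
            return False
        rowid, url = row
        # Claim it immediately so parallel copies of the script skip it.
        conn.execute("UPDATE url SET downloaded = 1 WHERE rowid = ?", (rowid,))
        conn.commit()
        try:
            data = urllib2.urlopen(url).read()
            location = store(url, data)       # caller decides where the data lives
            conn.execute("UPDATE url SET location = ? WHERE rowid = ?",
                         (location, rowid))
            conn.commit()
        except Exception:
            pass                              # a real script would log and retry
        return True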

    Acting Human
    A few factors here:

    • Google allegedly doesn't like automated programs crawling its search results. Thus, at the very least, don't let your script advertise itself as an automated program. At a basic level, this means forging the User-Agent: HTTP header. By default, Python's urllib2 will identify itself as a programming language. Change this to a well-known browser string (see the sketch after this list).
    • Be patient; don't fire off these search requests as quickly as possible. My crawling algorithm inserts a random delay of a few seconds in between each request. This can still yield hundreds of useful URLs per minute.
    • On harvesting the data: Even though you can parallelize this and download data as quickly as your connection can handle, it's a good idea to randomize the URLs. If you hypothetically had 4 download processes running at once and they got to a point in the URL table which had many URLs from a single site, the server might be configured to reject too many simultaneous requests from a single client.
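
    For the first two points, the urllib2 side might look something like this (the User-Agent value is merely an example of a common browser string):

    import random
    import time
    import urllib2

    BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; rv:2.0) "
                  "Gecko/20100101 Firefox/4.0")     # example browser string only

    def polite_fetch(url):
        # Wait a few random seconds between requests and present a browser-like
        # User-Agent instead of urllib2's default.
        time.sleep(random.uniform(2.0, 5.0))
        request = urllib2.Request(url, headers={"User-Agent": BROWSER_UA})
        return urllib2.urlopen(request).read()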

    Conclusion
    Anyway, that's just the way I would (and did) do it. What did I do with all the data? That's a subject for a different post.

    Adorable spider drawing from here.

  • ffmpeg Python Subprocess Error returned non-zero exit status 1

    21 April 2019, by Varda Elbereth

    So I have a line here that is meant to dump frames from a movie via Python and ffmpeg.

    subprocess.check_output([ffmpeg, "-i", self.moviefile, "-ss 00:01:00.000 -t 00:00:05 -vf scale=" + str(resolution) + ":-1 -r", str(framerate), "-qscale:v 6", self.processpath + "/" + self.filetitles + "-output%03d.jpg"])

    And currently it's giving me the error:

    'CalledProcessError: Command ... returned non-zero exit status 1'

    The command Python SAYS it's running is:

    '['/var/lib/openshift/id/app-root/data/programs/ffmpeg/ffmpeg', '-i', u'/var/lib/openshift/id/app-root/data/moviefiles/moviename/moviename.mp4', '-ss 00:01:00.000 -t 00:00:05 -vf scale=320:-1 -r', '10', '-qscale:v 6', '/var/lib/openshift/id/app-root/data/process/moviename/moviename-output%03d.jpg']'

    But when I run the following command via ssh...

    '/var/lib/openshift/id/app-root/data/programs/ffmpeg/ffmpeg' -i '/var/lib/openshift/id/app-root/data/moviefiles/moviename/moviename.mp4' -ss 00:01:00.000 -t 00:00:05 -vf scale=320:-1 -r 10 -qscale:v 6 '/var/lib/openshift/id/app-root/data/process/moviename/moviename-output%03d.jpg'

    It works just fine. What am I doing wrong? I think I'm misunderstanding the way subprocess field parsing works...
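
    For what it's worth, subprocess passes each list element to the program as exactly one argument and never splits elements on spaces the way a shell does, so "-ss 00:01:00.000 -t 00:00:05 -vf scale=320:-1 -r" reaches ffmpeg as a single unrecognised option. A sketch of the same call with every flag and value as its own element (variable names reused from the question):

    # Each flag and each value must be its own list element; subprocess does
    # not split strings on spaces the way a shell would.
    subprocess.check_output([
        ffmpeg,
        "-i", self.moviefile,
        "-ss", "00:01:00.000",
        "-t", "00:00:05",
        "-vf", "scale=" + str(resolution) + ":-1",
        "-r", str(framerate),
        "-qscale:v", "6",
        self.processpath + "/" + self.filetitles + "-output%03d.jpg",
    ])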