
Media (91)
-
Spitfire Parade - Crisis
15 May 2011, by
Updated: September 2011
Language: English
Type: Audio
-
Wired NextMusic
14 May 2011, by
Updated: February 2012
Language: English
Type: Video
-
Bee video in portrait format
14 May 2011, by
Updated: February 2012
Language: French
Type: Video
-
Sintel MP4 Surround 5.1 Full
13 May 2011, by
Updated: February 2012
Language: English
Type: Video
-
Map of Schillerkiez
13 May 2011, by
Updated: September 2011
Language: English
Type: Text
-
Publishing an image simply
13 April 2011, by
Updated: February 2012
Language: French
Type: Video
Other articles (2)
-
MediaSPIP 0.1 Beta version
25 April 2011, by
MediaSPIP 0.1 beta is the first version of MediaSPIP proclaimed as "usable".
The zip file provided here only contains the sources of MediaSPIP in its standalone version.
To get a working installation, you must manually install all software dependencies on the server.
If you want to use this archive for an installation in "farm mode", you will also need to carry out other manual (...)
-
MediaSPIP v0.2
21 June 2013, by
MediaSPIP 0.2 is the first stable release of MediaSPIP.
Its official release date is June 21, 2013, and it is announced here.
The zip file provided here only contains the sources of MediaSPIP in its standalone version.
To get a working installation, you must manually install all software dependencies on the server.
If you want to use this archive for an installation in "farm mode", you will also need to carry out other manual (...)
On other sites (1499)
-
ffmpeg smooth video upscaling
21 January 2013, by bbbonthemoon
I've got some old videos in 320x240 that I want to upscale to 640x480.
DISCLAIMER: I know upscaling always ruins quality.
My problem is that the "regular" upscaling options in ffmpeg (-s or the scale filter) always give much worse results compared to manual resizing in any media player while watching the video (the Resize 2:1 option in Ubuntu's Movie Player, for example). So it's clear that there are better algorithms, which the player uses to upscale the video. I just don't know how to make ffmpeg use them; I'm a newbie at video conversion :-) I need your help with ffmpeg options for as smooth an upscale as possible.
Thanks in advance!
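The question above doesn't include an answer. One commonly suggested approach, sketched here as an assumption rather than as the thread's resolution, is to choose the scale filter's interpolation algorithm explicitly through its `flags` option (for example `lanczos` or `bicubic`) instead of relying on the default. The helper names and file names below are hypothetical.

```python
import shlex
import subprocess
from typing import List


def upscale_cmd(src: str, dst: str, width: int = 640, height: int = 480,
                algo: str = "lanczos") -> List[str]:
    """ffmpeg arguments that resize src to width x height with a chosen scaler.

    The scale filter's `flags` option picks the interpolation algorithm
    (e.g. "bilinear", "bicubic", "lanczos", "spline").
    """
    return ["ffmpeg", "-i", src, "-vf", f"scale={width}:{height}:flags={algo}", dst]


def upscale(src: str, dst: str, **kwargs) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(upscale_cmd(src, dst, **kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical file names; prints the command rather than running it.
    print(shlex.join(upscale_cmd("old_320x240.avi", "new_640x480.mp4")))
```

Comparing a few `algo` values on the same clip is a cheap way to find the one that best matches what the media player does.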
Method For Crawling Google
28 May 2011, by Multimedia Mike (Big Data)
I wanted to crawl Google in order to harvest a large corpus of certain types of data as yielded by a certain search term (we’ll call it “term” for this exercise). Google doesn’t appear to offer any API to automatically harvest their search results (why would they?). So I sat down and thought about how to do it. This is the solution I came up with.
FAQ
Q: Is this legal / ethical / compliant with Google’s terms of service?
A: Does it look like I care? Moving right along…
Manual Crawling Process
For this exercise, I essentially automated the task that would be performed by a human. It goes something like this:
- Search for “term”
- On the first page of results, download each of the 10 results returned
- Click on the next page of results
- Go to step 2, until Google doesn’t return any more pages of search results
Google returns up to 1000 results for a given search term. Fetching them 10 at a time is less than efficient. Fortunately, the search URL can easily be tweaked to return up to 100 results per page.
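The paging scheme can be sketched as follows. The post doesn't say which URL tweak it used; this assumes Google's `q`, `num`, and `start` query parameters, and Google may no longer honor `num`. The endpoint constant and function names are illustrative only.

```python
from typing import List
from urllib.parse import urlencode

SEARCH_URL = "https://www.google.com/search"  # assumption: search endpoint
MAX_RESULTS = 1000  # Google's cap per search term, per the post
PAGE_SIZE = 100     # results per page after the URL tweak


def results_page_url(query: str, start: int, num: int = PAGE_SIZE) -> str:
    """URL for one page of results beginning at 0-based result index `start`."""
    return f"{SEARCH_URL}?{urlencode({'q': query, 'num': num, 'start': start})}"


def all_page_urls(query: str) -> List[str]:
    """10 requests of 100 results instead of 100 requests of 10."""
    return [results_page_url(query, start) for start in range(0, MAX_RESULTS, PAGE_SIZE)]
```

For example, `results_page_url("term site:.ca", 0)` yields `...?q=term+site%3A.ca&num=100&start=0`.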
Expanding Reach
Problem: 1000 results for the “term” search isn’t that many. I need a way to expand the search. I’m not aiming for relevancy; I’m just searching for random examples of some data that occurs around the internet.
My solution is to refine the search using the “site:” operator. For example, you can ask Google to search for “term” across all Canadian domains using “site:.ca”. So the manual process now involves harvesting up to 1000 results for every single internet top-level domain (TLD). But many TLDs can be more granular than that. For example, there are 50 sub-domains under .us, one for each state (e.g., .ca.us, .ny.us), and each needs to be searched independently. The same goes for the sub-domains of TLDs that don’t allow registrations directly under the main TLD, such as .uk (search under .co.uk, .ac.uk, etc.).
Another extension is to combine “term” searches with other terms that are likely to have a rich correlation with “term”. For example, if “term” is relevant to various scientific fields, search for “term” in conjunction with various scientific disciplines.
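Both expansion ideas (per-TLD “site:” restrictions and correlated extra terms) amount to generating many distinct queries, each of which can return up to 1000 results. A sketch, where every list is abbreviated and hypothetical:

```python
from typing import List

# Abbreviated, hypothetical lists; a real crawl would enumerate every TLD.
TLDS = [".com", ".org", ".net", ".ca", ".de"]
US_STATES = ["ca", "ny", "tx"]                # in practice all 50: .ca.us, .ny.us, ...
UK_SECOND_LEVEL = ["co", "ac", "org", "gov"]  # .co.uk, .ac.uk, ...
RELATED_TERMS = ["physics", "chemistry"]      # hypothetical correlated terms


def site_filters() -> List[str]:
    """One "site:" restriction per TLD and per separately searchable sub-domain."""
    filters = [f"site:{tld}" for tld in TLDS]
    filters += [f"site:.{state}.us" for state in US_STATES]
    filters += [f"site:.{sld}.uk" for sld in UK_SECOND_LEVEL]
    return filters


def expanded_queries(term: str) -> List[str]:
    """Every distinct query gets its own budget of up to 1000 results."""
    queries = [f"{term} {site}" for site in site_filters()]
    queries += [f"{term} {extra}" for extra in RELATED_TERMS]
    return queries
```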
Algorithmically
My solution is to create an SQLite database that contains a table of search seeds. Each seed is essentially a “site:” string combined with a starting index.
Each TLD and sub-TLD is inserted as a searchseed record with a starting index of 0.
A script performs the following crawling algorithm :
- Fetch the next record from the searchseed table which has not been crawled
- Fetch search result page from Google
- Scrape URLs from page and insert each into URL table
- Mark the searchseed record as having been crawled
- If the results page indicates there are more results for this search, insert a new searchseed for the same seed but with a starting index 100 higher
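The seed-driven loop above can be sketched with Python's built-in sqlite3 module. Apart from the searchseed name, the table and column names are assumptions, and `fetch` is a hypothetical placeholder for whatever downloads and scrapes one results page; the post doesn't show its own code.

```python
import sqlite3
from typing import Callable, List, Tuple

# fetch(seed, start_index) -> (urls_on_page, has_more_results); hypothetical placeholder.
FetchFn = Callable[[str, int], Tuple[List[str], bool]]

SCHEMA = """
CREATE TABLE IF NOT EXISTS searchseed (
    id          INTEGER PRIMARY KEY,
    seed        TEXT    NOT NULL,           -- e.g. 'site:.ca'
    start_index INTEGER NOT NULL,
    crawled     INTEGER NOT NULL DEFAULT 0,
    UNIQUE (seed, start_index)
);
CREATE TABLE IF NOT EXISTS url (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
"""

PAGE_SIZE = 100  # the next seed's start index is 100 higher


def init_db(seeds, path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables and insert each seed with a starting index of 0."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT OR IGNORE INTO searchseed (seed, start_index) VALUES (?, 0)",
        [(s,) for s in seeds],
    )
    db.commit()
    return db


def crawl_step(db: sqlite3.Connection, fetch: FetchFn) -> bool:
    """Process one uncrawled seed; return False when none are left."""
    row = db.execute(
        "SELECT id, seed, start_index FROM searchseed "
        "WHERE crawled = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    seed_id, seed, start = row
    urls, has_more = fetch(seed, start)
    db.executemany("INSERT OR IGNORE INTO url (url) VALUES (?)", [(u,) for u in urls])
    db.execute("UPDATE searchseed SET crawled = 1 WHERE id = ?", (seed_id,))
    if has_more:
        # Queue the next page of the same search.
        db.execute(
            "INSERT OR IGNORE INTO searchseed (seed, start_index) VALUES (?, ?)",
            (seed, start + PAGE_SIZE),
        )
    db.commit()
    return True
```

A driver would call `while crawl_step(db, fetch): ...` with a random pause between steps. Site-specific searches such as “term site:somesite” can be added as ordinary seeds with a starting index of 0.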
Digging Into Sites
Sometimes, Google notes that certain sites are particularly rich sources of “term” and offers to let you search that site for “term”. This basically links to another search for “term site:somesite”. That site gets its own search seed, and the program might harvest up to 1000 URLs from that site alone.
Harvesting the Data
Armed with a database of URLs, employ the following algorithm:
- Fetch a random URL from the database which has yet to be downloaded
- Try to download it
- For goodness sake, have a mechanism in place to detect whether the download process has stalled and automatically kill it after a certain period of time
- Store the data and update the database, noting where the information was stored and that it is already downloaded
This step is easy to parallelize by simply executing multiple copies of the script. It is useful to update the URL table to indicate that one process is already trying to download a URL so multiple processes don’t duplicate work.
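A sketch of the harvesting loop, including the "already being downloaded" marker that keeps parallel workers from duplicating work. It assumes a `url` table with hypothetical `in_progress`, `downloaded`, and `stored_at` columns. The claim is made safe across processes by a conditional UPDATE whose row count shows who won. Note that urllib's `timeout` bounds each blocking socket operation, not the whole transfer, so it is weaker than the post's advice to kill a stalled download process outright.

```python
import os
import sqlite3
import urllib.request
from typing import Callable, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS url (
    id          INTEGER PRIMARY KEY,
    url         TEXT    NOT NULL UNIQUE,
    in_progress INTEGER NOT NULL DEFAULT 0,  -- claimed by a worker
    downloaded  INTEGER NOT NULL DEFAULT 0,  -- 1 = stored, -1 = failed
    stored_at   TEXT
);
"""


def http_fetch(url: str, timeout: float) -> bytes:
    # timeout bounds each blocking socket operation, not the whole transfer
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def claim_random_url(db: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Pick a random pending URL and mark it in progress, or return None."""
    while True:
        row = db.execute(
            "SELECT id, url FROM url WHERE downloaded = 0 AND in_progress = 0 "
            "ORDER BY RANDOM() LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with db:  # commits the UPDATE on success
            cur = db.execute(
                "UPDATE url SET in_progress = 1 WHERE id = ? AND in_progress = 0",
                (row[0],),
            )
        if cur.rowcount == 1:
            return row
        # Another process claimed it between SELECT and UPDATE; pick again.


def harvest_one(db: sqlite3.Connection, out_dir: str,
                fetch: Callable[[str, float], bytes] = http_fetch,
                timeout: float = 30.0) -> bool:
    """Download one claimed URL; return False when nothing is left to do."""
    claimed = claim_random_url(db)
    if claimed is None:
        return False
    url_id, url = claimed
    try:
        data = fetch(url, timeout)
    except (OSError, ValueError):
        # Timeouts, HTTP errors, bad URLs: mark failed so no worker retries forever.
        with db:
            db.execute("UPDATE url SET in_progress = 0, downloaded = -1 WHERE id = ?", (url_id,))
        return True
    path = os.path.join(out_dir, f"{url_id}.bin")
    with open(path, "wb") as f:
        f.write(data)
    with db:
        db.execute(
            "UPDATE url SET in_progress = 0, downloaded = 1, stored_at = ? WHERE id = ?",
            (path, url_id),
        )
    return True
```

Each worker process simply runs `while harvest_one(db, out_dir): pass` against the same database file.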
Acting Human
A few factors here:
- Google allegedly doesn’t like automated programs crawling its search results. Thus, at the very least, don’t let your script advertise itself as an automated program. At a basic level, this means forging the User-Agent: HTTP header. By default, Python’s urllib2 identifies itself by the language’s name (a User-Agent of the form "Python-urllib/2.x"). Change this to a well-known browser string.
- Be patient; don’t fire off these search requests as quickly as possible. My crawling algorithm inserts a random delay of a few seconds between requests. This can still yield hundreds of useful URLs per minute.
- On harvesting the data : Even though you can parallelize this and download data as quickly as your connection can handle, it’s a good idea to randomize the URLs. If you hypothetically had 4 download processes running at once and they got to a point in the URL table which had many URLs from a single site, the server might be configured to reject too many simultaneous requests from a single client.
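The three precautions can be sketched with Python 3's urllib.request, the successor to urllib2. The browser string is a hypothetical example, and the 2 to 6 second delay range is an assumed reading of "a few seconds".

```python
import random
import time
import urllib.request
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical browser User-Agent string; any mainstream browser's would do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:4.0) Gecko/20100101 Firefox/4.0"


def browser_request(url: str) -> urllib.request.Request:
    """Build a request that presents a browser User-Agent instead of Python-urllib."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def polite_delay(low: float = 2.0, high: float = 6.0,
                 sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep a random few seconds between requests; return the delay used."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


def shuffled(urls: Iterable[str]) -> List[str]:
    """Randomize order so parallel workers don't hit one site simultaneously."""
    items = list(urls)
    random.shuffle(items)
    return items


def fetch_politely(urls: Iterable[str], timeout: float = 30.0) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, body) pairs with a browser User-Agent and random pauses."""
    for i, url in enumerate(urls):
        if i:
            polite_delay()
        with urllib.request.urlopen(browser_request(url), timeout=timeout) as resp:
            yield url, resp.read()
```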
Conclusion
Anyway, that’s just the way I would (and did) do it. What did I do with all the data? That’s a subject for a different post.
-
Further Dreamcast Hacking
3 February 2011, by Multimedia Mike (Sega Dreamcast)
I’m still haunted by Sega Dreamcast programming, specifically the fact that I used to be able to execute custom programs on the thing (roughly 8-10 years ago) and now I cannot. I’m going to compose a post describing my current adventures on this front. There are 3 approaches I have been using: raw, KallistiOS, and the almighty Linux.
Raw
What I refer to as "raw" is an assortment of programs that lived in a small number of source files (sometimes just one ASM file) and could be compiled with the most basic SH-4 toolchain. The advantage here is that there aren’t many moving parts and not many things that can possibly go wrong, so it provides a good functional baseline.
One of the original Dreamcast hackers was Marcus Comstedt, who still hosts his original DC material at the reasonably easy-to-remember URL mc.pp.se/dc. I can get some of these simple demos to work, but not others.
I also successfully assembled and ran a pair of 256-byte (!!) demos from this old DC scene page.
KallistiOS
KallistiOS (or just KOS) was a real-time OS developed for the DC and was popular among the DC homebrew community. All the programming I did back in the day was based around KOS. Now I can’t get any of it to work. More specifically, KOS can’t seem to make it past a certain point in its system initialization.
The Linux Option
I was never that excited about running Linux on my Dreamcast. For some hackers, running Linux on a given piece of consumer electronics is the highest attainable goal. Back in the day, I looked at it from a much more pragmatic perspective: I didn’t see much use in running Linux on the DC, certainly not compared with running KOS, which was designed to be a much more appropriate fit.
However, some months ago I was able to burn a CD-R of an old binary image of Linux 2.4.5 compiled for the Dreamcast and boot it. So I at least have a feeling that this should work. I have never cross-compiled a kernel of my own (though I have compiled many, many x86 kernels in my time, so I’m not a total n00b in this regard). I figured this might be a good time to start.
The first item that worries me is getting a functional cross-compiling toolchain. Fortunately, a little digging in the Linux kernel documentation pointed me in the direction of a bunch of ready-made toolchains hosted at kernel.org. So I grabbed one of the SH toolchains (gcc-4.3.3-nolibc) and got rolling.
I’m quite familiar with the cycle of 'make menuconfig' to pick configuration options and then 'make' to build a kernel (or, more usually, 'make zImage' or 'make bzImage' to create compressed images). For cross-compiling, the primary difference seems to be editing the root Makefile in the Linux source tree (I’m using 2.6.37, the latest stable release as of this writing) and setting a value for the CROSS_COMPILE variable. Then run 'make menuconfig' followed by 'make' as normal.
The Linux 2.6 series is supposed to support a range of Renesas (formerly Hitachi) SH processors and board configurations, including reasonable defaults for the Sega Dreamcast hardware. I got everything compiling except for a series of .S files. Linus Torvalds once helped me debug a program I work on, so I thought I’d see if there was something I could help debug here.
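As an alternative to editing the root Makefile, the kernel build system also accepts ARCH and CROSS_COMPILE as make variables on the command line; ARCH=sh selects the SuperH tree. A sketch that wraps the same menuconfig-then-make cycle, where the toolchain prefix is a hypothetical value that has to match the compiler binary in your toolchain's bin/ directory:

```python
import subprocess
from typing import List, Optional

# Hypothetical toolchain prefix: whatever precedes "gcc" in the name of the
# compiler binary shipped in your cross toolchain's bin/ directory.
CROSS_PREFIX = "sh4-linux-"


def kbuild_cmd(target: Optional[str] = None, arch: str = "sh",
               cross: str = CROSS_PREFIX, jobs: int = 1) -> List[str]:
    """Return a kernel 'make' invocation with ARCH/CROSS_COMPILE set on the command line."""
    cmd = ["make", f"ARCH={arch}", f"CROSS_COMPILE={cross}", f"-j{jobs}"]
    if target:
        cmd.append(target)
    return cmd


def build_kernel(src_dir: str, jobs: int = 1) -> None:
    """Interactive configuration, then the default build (which produces vmlinux)."""
    subprocess.run(kbuild_cmd("menuconfig"), cwd=src_dir, check=True)
    subprocess.run(kbuild_cmd(jobs=jobs), cwd=src_dir, check=True)
```

Passing the variables on every invocation keeps the source tree's Makefile pristine, which makes switching between native and cross builds less error-prone.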
The first issue was with ASM statements of a form similar to:
mov #0xffffffe0, r1
Now, the DC’s SH-4 is a RISC CPU. Many RISC architectures adopt a fixed instruction size of 32 bits. You can’t encode an entire 32-bit immediate value inside a 32-bit instruction (there would be no room left for the opcode bits). Further, the SH series encodes instructions in a mere 16 bits. The move-immediate instruction only allows for an 8-bit, sign-extended value.
I decided that the above statement is equivalent to:
mov #-32, r1
I’ll give this statement the benefit of the doubt that it used to work with the gcc toolchain somewhere along the line. I assume that the assembler is supposed to know enough to substitute the first form with the second.
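The equivalence holds because 0xffffffe0 = 2^32 - 32, so as a 32-bit two's-complement value it is -32. That lies within the -128..127 range an 8-bit sign-extended immediate can hold: the low byte 0xe0 sign-extends back to 0xffffffe0. A small sketch of the substitution the assembler would have to make (the function names are illustrative, not part of any toolchain):

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def fits_sh_imm8(value: int) -> bool:
    """True if an 8-bit sign-extended 'mov #imm, Rn' immediate can represent value."""
    return -128 <= to_signed32(value) <= 127


def mov_imm(value: int, reg: str = "r1") -> str:
    """Rewrite a 32-bit constant as the equivalent signed 8-bit 'mov #imm' form."""
    signed = to_signed32(value)
    if not fits_sh_imm8(signed):
        raise ValueError(f"{value:#x} does not fit in a signed 8-bit immediate")
    return f"mov #{signed}, {reg}"
```

For example, `mov_imm(0xffffffe0)` returns `"mov #-32, r1"`, while a constant such as 0xffffff7f (-129) is rejected.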
The next problem is that an 'sti' instruction shows up in a number of spots. In Intel x86 conventions, this is a "set interrupt flag" instruction (I remember that the 6502 CPU had a similar instruction, SEI, though its interrupt flag’s operation was the opposite of the x86’s). The SH-4 reference manual lists no 'sti' instruction. When it gets to these lines, the assembler complains about immediate move instructions with too-large data, like the instructions above. I’m guessing they must be macro’d to something else, but I failed to find where. I commented out those lines for the time being. Probably not that smart, but I want to keep this moving for now.
So I got the code to compile into a kernel file called 'vmlinux'. I’ve seen this file many times before but never thought about how to run it directly. The process has usually been to compress it and hand it over to lilo or grub for loading, as that is the job of the bootloader. Until now, I had never even wondered what format the vmlinux file takes. It seems that 'vmlinux' is just a plain old ELF file:
$ file vmlinux
vmlinux: ELF 32-bit LSB executable, Renesas SH, version 1 (SYSV), statically linked, not stripped
The 'dc-tool' program that uploads executables to the waiting bootloader on the Dreamcast is perfectly happy to accept ELF files (as well as S-record files and raw binary files). After a very lengthy upload process, execution fails (the system resets).
For the sake of comparison, I dusted off that bootable Linux 2.4.5 Dreamcast CD-ROM and directly uploaded the vmlinux file from that disc. That works just fine (until it’s time to go to the next loading phase, i.e., finding a filesystem). Possible issues here could include the commented-out 'sti' instructions (it could be that they aren’t just decoration). I’m also trying to understand the memory organization; perhaps the bootloader wants the ELF to be based at a different address. Or maybe the kernel and the bootloader don’t like each other in the first place, in which case I need to study the bootable Linux CD-ROM to see how it’s done.
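Because vmlinux is a plain ELF file, its header records the architecture that `file` reported, plus the entry point and segment load addresses that bear on the "different address" question. A minimal reader, based on the standard ELF32 header and program-header layout and using only Python's struct module, makes it easy to compare the working 2.4.5 vmlinux against the new 2.6.37 build. This comparison is a suggestion, not something from the post, and the class and function names are illustrative.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

EM_SH = 42    # e_machine for Hitachi/Renesas SuperH ("Renesas SH" in `file` output)
ET_EXEC = 2   # e_type for an executable
PT_LOAD = 1   # program header type for a loadable segment


@dataclass
class Elf32Info:
    little_endian: bool
    e_type: int
    e_machine: int
    entry: int
    # (virtual address, physical address, size in memory) of each PT_LOAD segment
    load_segments: List[Tuple[int, int, int]] = field(default_factory=list)


def parse_elf32(data: bytes) -> Elf32Info:
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1:  # EI_CLASS: 1 = 32-bit
        raise ValueError("only 32-bit ELF is handled here")
    end = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian (LSB)
    # e_type, e_machine, e_version, e_entry, e_phoff start at offset 16
    e_type, e_machine, _version, entry, phoff = struct.unpack_from(end + "HHIII", data, 16)
    # e_phentsize and e_phnum sit at offsets 42 and 44
    phentsize, phnum = struct.unpack_from(end + "HH", data, 42)
    info = Elf32Info(end == "<", e_type, e_machine, entry)
    for i in range(phnum):
        # Elf32_Phdr begins: p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz
        p_type, _off, vaddr, paddr, _filesz, memsz = struct.unpack_from(
            end + "6I", data, phoff + i * phentsize)
        if p_type == PT_LOAD:
            info.load_segments.append((vaddr, paddr, memsz))
    return info


def describe(path: str) -> str:
    """One-line summary of an ELF32 file's machine, entry point and load segments."""
    with open(path, "rb") as f:
        info = parse_elf32(f.read())
    segs = ", ".join(f"vaddr={v:#010x} paddr={p:#010x} size={m:#x}"
                     for v, p, m in info.load_segments)
    return f"machine={info.e_machine} entry={info.entry:#010x} load: {segs}"
```

Running `describe("vmlinux")` on each kernel and diffing the two lines would show whether the entry point or load addresses moved between versions.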
Optimism
Even though I’m meeting with rather marginal success, this is tremendously educational. I greatly enjoy these exercises if only for the deeper understanding they bring for the lowest-level system details.