
Media (91)
-
Spoon - Revenge!
15 September 2011
Updated: September 2011
Language: English
Type: Audio
-
My Morning Jacket - One Big Holiday
15 September 2011
Updated: September 2011
Language: English
Type: Audio
-
Zap Mama - Wadidyusay?
15 September 2011
Updated: September 2011
Language: English
Type: Audio
-
David Byrne - My Fair Lady
15 September 2011
Updated: September 2011
Language: English
Type: Audio
-
Beastie Boys - Now Get Busy
15 September 2011
Updated: September 2011
Language: English
Type: Audio
-
Granite de l’Aber Ildut
9 September 2011
Updated: September 2011
Language: French
Type: Text
Other articles (51)
-
MediaSPIP v0.2
21 June 2013
MediaSPIP 0.2 is the first stable version of MediaSPIP.
Its official release date is 21 June 2013, announced here.
The zip file provided here contains only the MediaSPIP sources in the standalone version.
As with the previous version, all of the software dependencies must be installed manually on the server.
If you want to use this archive for a farm-mode installation, you will also need to make further modifications (...)
-
MediaSPIP version 0.1 Beta
16 April 2011
MediaSPIP 0.1 beta is the first version of MediaSPIP declared "usable".
The zip file provided here contains only the MediaSPIP sources in the standalone version.
For a working installation, all of the software dependencies must be installed manually on the server.
If you want to use this archive for a farm-mode installation, you will also need to make further modifications (...)
-
HTML5 audio and video support
10 April 2011
MediaSPIP uses the HTML5 video and audio tags to play multimedia documents, taking advantage of the latest W3C innovations supported by modern browsers.
For older browsers, the Flowplayer Flash player is used instead.
The HTML5 player was created specifically for MediaSPIP: its appearance is fully customizable to match a chosen theme.
These technologies make it possible to deliver video and sound to conventional computers as well as (...)
On other sites (4877)
-
Using PyAV to encode mono audio to file, params match docs, but still causes Errno 22
20 February 2023, by andrew8088
While trying to use PyAV to encode live mono audio from a microphone into a compressed audio stream (using mp2 or flac as the encoder), the program kept raising this exception:
ValueError: [Errno 22] Invalid argument

To rule out the live microphone source as a cause of the problem, and to make the problematic code easier for others to run and test, I have removed the mic source and now just generate a pure tone as a sequence of input buffers.


Every attempt to figure out which argument is missing, mismatched, or incorrect has only turned up documentation and examples that match my code.


I would like to know from someone who has used PyAV successfully for mono audio what the correct method and parameters are for encoding mono frames into the mono stream.


The package used is av 10.0.0, installed with

pip3 install av --no-binary av

so that it uses the ffmpeg library provided by my package manager, which is version 4.2.7.

The problematic Python code is:


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Recreating an error 22 when encoding sound with PyAV.

Created on Sun Feb 19 08:10:29 2023
@author: andrewm
"""
import typing
import sys
import math
import fractions

import av
from av import AudioFrame

""" Ensure some PyAudio constants are still defined without changing
    the PyAudio recording callback function and without depending
    on PyAudio simply for reproducing the PyAV bug [Errno 22] thrown in
    File "av/filter/context.pyx", line 89, in av.filter.context.FilterContext.push
"""
class PA_Stub():
    paContinue = True
    paComplete = False

pyaudio = PA_Stub()


"""Generate pure tone at given frequency with amplitude 0...1.0 at
   sampling frequency fs and beginning at phase offset 'phase'.
   Returns the new phase after the sinusoid has cycled over the
   sampling window length.
"""
def generate_tone(
        freq: int, phase: float, amp: float, fs, samp_fmt, buffer: bytearray
) -> float:
    assert samp_fmt == "s16", "Only s16 supported atm"
    samp_size_bytes = 2
    n_samples = int(len(buffer) / samp_size_bytes)
    window = [int(0) for i in range(n_samples)]
    theta = phase
    phase_inc = 2 * math.pi * freq / fs
    for i in range(n_samples):
        v = amp * math.sin(theta)
        theta += phase_inc
        s = int((2**15 - 1) * v)
        window[i] = s
    for sample_i in range(len(window)):
        byte_i = sample_i * samp_size_bytes
        enc = window[sample_i].to_bytes(
            2, byteorder=sys.byteorder, signed=True
        )
        buffer[byte_i] = enc[0]
        buffer[byte_i + 1] = enc[1]
    return theta


channels = 1
fs = 44100  # Record at 44100 samples per second
fft_size_samps = 256
chunk_samps = fft_size_samps * 10  # Record in chunks that are multiples of fft windows.

# print(f"fft_size_samps={fft_size_samps}\nchunk_samps={chunk_samps}")

seconds = 3.0
out_filename = "testoutput.wav"

# Store data in chunks for 3 seconds
sample_limit = int(fs * seconds)
sample_len = 0
frames = []  # Initialize array to store frames

ffmpeg_codec_name = 'mp2'  # flac, mp3, or libvorbis make same error.

sample_size_bytes = 2
buffer = bytearray(int(chunk_samps * sample_size_bytes))
chunkperiod = chunk_samps / fs
total_chunks = int(math.ceil(seconds / chunkperiod))
phase = 0.0

### uncomment if you want to see the synthetic data being used as a mic input.
# with open("test.raw", "wb") as raw_out:
#     for ci in range(total_chunks):
#         phase = generate_tone(2600, phase, 0.8, fs, "s16", buffer)
#         raw_out.write(buffer)
# print("finished gen test")
# sys.exit(0)
# #----

# Using mp2 or mkv as the container format gets the same error.
with av.open(out_filename + '.mp2', "w", format="mp2") as output_con:
    output_con.metadata["title"] = "My title"
    output_con.metadata["key"] = "value"
    channel_layout = "mono"
    sample_fmt = "s16p"

    ostream = output_con.add_stream(ffmpeg_codec_name, fs, layout=channel_layout)
    assert ostream is not None, "No stream!"
    cctx = ostream.codec_context
    cctx.sample_rate = fs
    cctx.time_base = fractions.Fraction(numerator=1, denominator=fs)
    cctx.format = sample_fmt
    cctx.channels = channels
    cctx.layout = channel_layout
    print(cctx, f"layout#{cctx.channel_layout}")

    # Define PyAudio-style callback for recording plus PyAV transcoding.
    def rec_callback(in_data, frame_count, time_info, status):
        global sample_len
        global ostream
        frames.append(in_data)
        nsamples = int(len(in_data) / (channels * sample_size_bytes))

        frame = AudioFrame(format=sample_fmt, layout=channel_layout, samples=nsamples)
        frame.sample_rate = fs
        frame.time_base = fractions.Fraction(numerator=1, denominator=fs)
        frame.pts = sample_len
        frame.planes[0].update(in_data)
        print(frame, len(in_data))

        for out_packet in ostream.encode(frame):
            output_con.mux(out_packet)
        for out_packet in ostream.encode(None):
            output_con.mux(out_packet)

        sample_len += nsamples
        retflag = pyaudio.paContinue if sample_len < sample_limit else pyaudio.paComplete
        return (in_data, retflag)

    # The tail of the listing was cut off by the page's HTML escaping; the
    # driver loop below is reconstructed from the traceback, which shows
    # rec_callback(buffer, ci, {}, 1) being called at module level.
    print("Beginning")
    for ci in range(total_chunks):
        phase = generate_tone(2600, phase, 0.8, fs, "s16", buffer)
        ret_data, ret_flag = rec_callback(buffer, ci, {}, 1)
        print(".", end="")


If you uncomment the RAW output part, you will find that the generated data can be imported into Audacity as PCM s16 mono 44100 Hz and plays the expected tone, so the generated audio data does not seem to be the problem.
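A quick programmatic way to make the same check without Audacity (a sketch; it assumes NumPy is available and that test.raw was produced by the commented-out block above with the 2600 Hz tone):

import numpy as np

# The FFT peak of the raw s16 samples should land near the 2600 Hz
# synthesis frequency if the generated data is good.
data = np.fromfile("test.raw", dtype=np.int16).astype(np.float64)
spectrum = np.abs(np.fft.rfft(data))
peak_hz = np.argmax(spectrum) * 44100 / len(data)
print(f"dominant frequency: {peak_hz:.1f} Hz")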


The normal program console output up until the exception is:


mp2 at 0x7f8e38202cf0> layout#4
Beginning
 5120
. 5120



The stack trace is:


Traceback (most recent call last):

 File "Dev/multichan_recording/av_encode.py", line 147, in <module>
 ret_data, ret_flag = rec_callback(buffer, ci, {}, 1)

 File "Dev/multichan_recording/av_encode.py", line 121, in rec_callback
 for out_packet in ostream.encode(frame):

 File "av/stream.pyx", line 153, in av.stream.Stream.encode

 File "av/codec/context.pyx", line 484, in av.codec.context.CodecContext.encode

 File "av/audio/codeccontext.pyx", line 42, in av.audio.codeccontext.AudioCodecContext._prepare_frames_for_encode

 File "av/audio/resampler.pyx", line 101, in av.audio.resampler.AudioResampler.resample

 File "av/filter/graph.pyx", line 211, in av.filter.graph.Graph.push

 File "av/filter/context.pyx", line 89, in av.filter.context.FilterContext.push

 File "av/error.pyx", line 336, in av.error.err_check

ValueError: [Errno 22] Invalid argument



edit: It's interesting that the error happens on the 2nd AudioFrame; apparently the first one was encoded okay, even though both are given the same attribute values aside from the presentation timestamp (pts). But leaving the pts out and letting PyAV/ffmpeg generate it by itself does not fix the error, so an incorrect pts does not seem to be the cause.


After a brief glance in av/filter/context.pyx, the exception must come from a bad return value from res = lib.av_buffersrc_write_frame(self.ptr, frame.ptr).

Trying to dig into av_buffersrc_write_frame in the ffmpeg source, it is not clear what could be causing this error. The only obvious candidate is a mismatch between channel layouts, but my code sets the layout the same in the Stream and in the Frame. That problem had already been found by an old question, "pyav - cannot save stream as mono", and its answer (that one required parameter is undocumented) is the only reason the code now has the layout='mono' argument when creating the stream.

The program output shows layout #4 is being used, and from https://github.com/FFmpeg/FFmpeg/blob/release/4.2/libavutil/channel_layout.h you can see this is the value of the symbol AV_CH_FRONT_CENTER, which is the only channel in the MONO layout.
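PyAV also exposes the layout directly, so this can be sanity-checked from Python; a small sketch, assuming the av 10.x AudioLayout API:

from av.audio.layout import AudioLayout

# "mono" should resolve to a single FC (front center) channel, i.e.
# FFmpeg's AV_CH_FRONT_CENTER bit, which is the layout#4 printed above.
layout = AudioLayout("mono")
print(layout.name, [ch.name for ch in layout.channels])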


The mismatch is surely some other object property or an undocumented parameter requirement.


How do you encode mono audio to a compressed stream with PyAV?
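Not an authoritative answer, but for contrast, here is a minimal sketch of mono encoding as I understand the av 10.x API (assumptions: NumPy available, output file named tone.mp2; note that encode(None) flushes the encoder and is called exactly once, after the last frame, unlike the per-chunk flush in the repro above):

import av
import numpy as np
from av import AudioFrame

fs = 44100
t = np.arange(fs, dtype=np.float64) / fs  # one second of samples
tone = (0.8 * np.sin(2 * np.pi * 440 * t) * (2**15 - 1)).astype(np.int16)

with av.open("tone.mp2", "w") as container:
    stream = container.add_stream("mp2", rate=fs, layout="mono")
    # Packed s16 mono frames use an ndarray of shape (1, n_samples).
    frame = AudioFrame.from_ndarray(tone.reshape(1, -1), format="s16", layout="mono")
    frame.sample_rate = fs
    for packet in stream.encode(frame):
        container.mux(packet)
    for packet in stream.encode(None):  # flush once, at the very end
        container.mux(packet)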


-
Use FFMPEG as a demuxer-muxer when knowing all needed information about input stream
7 February 2023, by Javierd98
I'm working on a project where I need to use FFMPEG only as a demuxer-muxer, in order to receive an MPEG source with an audio and a video stream and output the streams separately. I also need it to introduce the smallest possible delay.
To do this, I already have all the needed information about the input: the container used, the audio and video codecs, the audio and video rates, and the audio channels. So I would like to provide FFMPEG with that information and skip the probing phase. However, I can't get FFMPEG to work without specifying a big probesize, which introduces latency and, depending on the input, causes FFMPEG to fail because it cannot extract the needed info (which I already have beforehand).


I've been researching the different options that FFMPEG provides, and I've found the following ones, which could help me:

- -codec:a and -codec:v to provide the information about the input's audio and video codecs.
- -f to provide the information about the container format used.
- -r:v to provide the information about the video framerate. (I've also tried -framerate, but it seems that it is only supported by a few demuxers, and mpeg is not one of them.)
- -probesize to avoid spending time probing the input, as I can provide the information myself.
- -copy_unknown to simply copy the streams instead of failing if one can't be recognized.

With all this, I'm using the following command:

./ffmpeg -hide_banner -loglevel debug -codec:v h264 -codec:a pcm_alaw -flags low_delay -probesize 32 -analyzeduration 0 -r:v 15 -f mpeg -i udp://127.0.0.1:41071?listen -y -map 0:a:0? -acodec copy -copy_unknown -ar 8000 -payload_type 8 -f rtp udp://127.0.0.1:33605?pkt_size=1200 -map 0:v:0? -vcodec copy -copy_unknown -payload_type 96 -f rtp udp://127.0.0.1:48527?pkt_size=1200


Splitting the commandline.
Reading option '-hide_banner' ... matched as option 'hide_banner' (do not show program banner) with argument '1'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument 'debug'.
Reading option '-codec:v' ... matched as option 'codec' (codec name) with argument 'h264'.
Reading option '-codec:a' ... matched as option 'codec' (codec name) with argument 'pcm_alaw'.
Reading option '-flags' ... matched as AVOption 'flags' with argument 'low_delay'.
Reading option '-probesize' ... matched as AVOption 'probesize' with argument '32'.
Reading option '-analyzeduration' ... matched as AVOption 'analyzeduration' with argument '0'.
Reading option '-r:v' ... matched as option 'r' (set frame rate (Hz value, fraction or abbreviation)) with argument '15'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'mpeg'.
Reading option '-i' ... matched as input url with argument 'udp://127.0.0.1:41071?listen'.
Reading option '-y' ... matched as option 'y' (overwrite output files) with argument '1'.
Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '0:a:0?'.
Reading option '-acodec' ... matched as option 'acodec' (force audio codec ('copy' to copy stream)) with argument 'copy'.
Reading option '-copy_unknown' ... matched as option 'copy_unknown' (Copy unknown stream types) with argument '1'.
Reading option '-ar' ... matched as option 'ar' (set audio sampling rate (in Hz)) with argument '8000'.
Reading option '-payload_type' ... matched as AVOption 'payload_type' with argument '8'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'rtp'.
Reading option 'udp://127.0.0.1:33605?pkt_size=1200' ... matched as output url.
Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '0:v:0?'.
Reading option '-vcodec' ... matched as option 'vcodec' (force video codec ('copy' to copy stream)) with argument 'copy'.
Reading option '-copy_unknown' ... matched as option 'copy_unknown' (Copy unknown stream types) with argument '1'.
Reading option '-payload_type' ... matched as AVOption 'payload_type' with argument '96'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'rtp'.
Reading option 'udp://127.0.0.1:48527?pkt_size=1200' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option hide_banner (do not show program banner) with argument 1.
Applying option loglevel (set logging level) with argument debug.
Applying option y (overwrite output files) with argument 1.
Applying option copy_unknown (Copy unknown stream types) with argument 1.
 Last message repeated 1 times
Successfully parsed a group of options.
Parsing a group of options: input url udp://127.0.0.1:41071?listen.
Applying option codec:v (codec name) with argument h264.
Applying option codec:a (codec name) with argument pcm_alaw.
Applying option r:v (set frame rate (Hz value, fraction or abbreviation)) with argument 15.
Applying option f (force format) with argument mpeg.
Successfully parsed a group of options.
Opening an input file: udp://127.0.0.1:41071?listen.
[mpeg @ 0x55a769cc3d40] Opening 'udp://127.0.0.1:41071?listen' for reading
[udp @ 0x55a769cc4b00] No default whitelist set
[udp @ 0x55a769cc4b00] end receive buffer size reported is 425984
[mpeg @ 0x55a769cc3d40] Before avformat_find_stream_info() pos: 6 bytes read:40 seeks:0 nb_streams:0
[mpeg @ 0x55a769cc3d40] probing stream 1 pp:2500
[mpeg @ 0x55a769cc3d40] probed stream 1
[mpeg @ 0x55a769cc3d40] parser not found for codec pcm_alaw, packets or times may be invalid.
[mpeg @ 0x55a769cc3d40] Probe buffer size limit of 32 bytes reached
[mpeg @ 0x55a769cc3d40] Stream #0: not enough frames to estimate rate; consider increasing probesize
[mpeg @ 0x55a769cc3d40] Could not find codec parameters for stream 0 (Video: h264, 1 reference frame, none): unspecified size
Consider increasing the value for the 'analyzeduration' (0) and 'probesize' (32) options
[mpeg @ 0x55a769cc3d40] After avformat_find_stream_info() pos: 87804 bytes read:88476 seeks:0 frames:1
Input #0, mpeg, from 'udp://127.0.0.1:41071?listen':
 Duration: N/A, start: 50436.772978, bitrate: 64 kb/s
 Stream #0:0[0x1e0], 0, 1/90000: Video: h264, 1 reference frame, none, 90k tbr, 90k tbn
 Stream #0:1[0x1c0], 1, 1/90000: Audio: pcm_alaw, 8000 Hz, mono, s16, 64 kb/s
Successfully opened the file.
Parsing a group of options: output url udp://127.0.0.1:33605?pkt_size=1200.
Applying option map (set input stream mapping) with argument 0:a:0?.
Applying option acodec (force audio codec ('copy' to copy stream)) with argument copy.
Applying option ar (set audio sampling rate (in Hz)) with argument 8000.
Applying option f (force format) with argument rtp.
Successfully parsed a group of options.
Opening an output file: udp://127.0.0.1:33605?pkt_size=1200.
[udp @ 0x55a769cf6500] No default whitelist set
Successfully opened the file.
Parsing a group of options: output url udp://127.0.0.1:48527?pkt_size=1200.
Applying option map (set input stream mapping) with argument 0:v:0?.
Applying option vcodec (force video codec ('copy' to copy stream)) with argument copy.
Applying option f (force format) with argument rtp.
Successfully parsed a group of options.
Opening an output file: udp://127.0.0.1:48527?pkt_size=1200.
[udp @ 0x55a769d08fc0] No default whitelist set
Successfully opened the file.
Output #0, rtp, to 'udp://127.0.0.1:33605?pkt_size=1200':
 Metadata:
 encoder : Lavf59.16.100
 Stream #0:0, 0, 1/8000: Audio: pcm_alaw, 8000 Hz, mono, s16, 64 kb/s
[rtp @ 0x55a769d06f40] dimensions not set
Could not write header for output file #1 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 1:0 -- 
Stream mapping:
 Stream #0:1 -> #0:0 (copy)
 Stream #0:0 -> #1:0 (copy)
 Last message repeated 1 times
[AVIOContext @ 0x55a769d06d40] Statistics: 0 bytes written, 0 seeks, 0 writeouts
[AVIOContext @ 0x55a769d19800] Statistics: 0 bytes written, 0 seeks, 0 writeouts
[AVIOContext @ 0x55a769cd4fc0] Statistics: 88476 bytes read, 0 seeks



I'm using FFMPEG 5.0.2, which is a fairly recent version.


As you can see, the -copy_unknown option has no effect, as FFMPEG still fails when trying to recognize the input (I've also tried specifying it on the input side, but the result is the same). From the logs, I think the main problem is that FFMPEG is not interpreting the framerate correctly, which prevents it from demuxing the video. I've been searching for another way to specify it, but couldn't find one.

Is this a bug? Or am I maybe missing something? Does somebody know how I could achieve my goal?
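For comparison, the same demux-remux idea sketched with PyAV in Python rather than the ffmpeg CLI (URLs and probe options mirror the command above; this is not a fix, since add_stream(template=...) can only copy whatever stream parameters probing actually recovered):

import av

# Open the MPEG-PS input with the same minimal probing as the CLI flags.
inp = av.open("udp://127.0.0.1:41071?listen", format="mpeg",
              options={"probesize": "32", "analyzeduration": "0"})
video_in = inp.streams.video[0]

out = av.open("udp://127.0.0.1:48527?pkt_size=1200", "w", format="rtp")
video_out = out.add_stream(template=video_in)  # packet-level copy

for packet in inp.demux(video_in):
    if packet.dts is None:  # skip the flush packet demux() emits
        continue
    packet.stream = video_out
    out.mux(packet)

out.close()
inp.close()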


-
Developing MobyCAIRO
26 May 2021, by Multimedia Mike — General
I recently published a tool called MobyCAIRO. The 'CAIRO' part stands for Computer-Assisted Image ROtation, while the 'Moby' prefix refers to its role in helping process artifact image scans for submission to the MobyGames database. The tool is meant to provide an accelerated workflow for rotating and cropping image scans. It works on both Windows and Linux. Hopefully, it can solve similar workflow problems for other people.
As of this writing, MobyCAIRO has not been tested on Mac OS X yet– I expect some issues there that should be easily solvable if someone cares to test it.
The rest of this post describes my motivations and how I arrived at the solution.
Background
I have scanned well in excess of 2100 images for MobyGames and other purposes in the past 16 years or so. The workflow looks like this:
[figure: image workflow]
It should be noted that my original workflow featured me manually rotating the artifact on the scanner bed in order to ensure straightness, because I guess I thought that rotate functions in image editing programs constituted dark, unholy magic or something. So my workflow used to be even more arduous:
[figure] I can’t believe I had the patience to do this for hundreds of scans.
Sometime last year, I was sitting down to perform some more scanning and found myself dreading the oncoming tedium of straightening and cropping the images. This prompted a pivotal question:
Why can’t a computer do this for me?
After all, I have always been a huge proponent of making computers handle the most tedious, repetitive, mind-numbing, and error-prone tasks. So I did some web searching to find out whether any solutions dealt with this. I also consulted with some like-minded folks who have to cope with the same tedious workflow.
I came up empty-handed. So I endeavored to develop my own solution.
Problem Statement and Prior Work
I want to develop a workflow that can automatically rotate an image so that it is straight, and also find the most likely crop rectangle, uniformly whitening the area outside of the crop area (in the case of circles). As mentioned, I checked to see if any other programs can handle this, starting with my usual workhorse, Photoshop Elements. But I can’t expect the trimmed-down version to do everything. I tried to find out if its big brother could handle the task, but couldn’t find a definitive answer. Nor could I find any other tools that seem to take an interest in optimizing this particular workflow.
When I brought this up to some peers, I received some suggestions, including an idea that the venerable GIMP had a feature like this, but I could not find any evidence. Further, I would get responses of “Program XYZ can do image rotation and cropping.” I had to tamp down on the snark to avoid saying “Wow! An image editor that can perform rotation AND cropping? What a game-changer!” Rotation and cropping features have been table stakes for any halfway competent image editor for at least the last 25 years. I am hoping to find or create a program which can lend a bit of programmatic assistance to the task.
Why can’t other programs handle this? The answer seems fairly obvious: image editing tools are general tools, and I want a highly customized workflow. It’s not reasonable to expect a turnkey solution to do this.
Brainstorming An Approach
I started with the happiest of happy cases: a disc that needed archiving (a marketing/press assets CD-ROM from a video game company, contents described here) which appeared to have some pretty clear straight lines:
My idea was to find straight lines in the image and then rotate the image parallel to the horizontal, based on the longest single straight line detected.
I just needed to figure out how to find a straight line inside of an image. Fortunately, I quickly learned that this is very much a solved problem thanks to something called the Hough transform. As a bonus, I read that this is also the tool I would want to use for finding circles, when I got to that part. The nice thing about knowing the formal algorithm to use is being able to find efficient, optimized libraries which already implement it.
Early Prototype
A little searching for how to perform a Hough transform in Python led me first to scikit. I was able to rapidly produce a prototype that did some basic image processing. However, running the Hough transform directly on the image and rotating according to the longest line segment discovered turned out not to yield the expected results.
It also took a very long time to chew on the 3300×3300 raw image, certainly longer than I care to wait for an accelerated workflow concept. The key, however, is that you are apparently not supposed to run the Hough transform on a raw image: you need to compute the edges first, and then attempt to determine which edges are ‘straight’. The recommended algorithm for this step is the Canny edge detector. After applying this, I get the expected rotation:
The algorithm also completes in a few seconds. So this is a good early result and I was feeling pretty confident. But, again– happiest of happy cases. I should also mention at this point that I had originally envisioned a tool that I would simply run against a scanned image and it would automatically/magically make the image straight, followed by a perfect crop.
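For concreteness, the edges-then-Hough-then-rotate idea looks roughly like this in OpenCV terms (the library the finished tool ended up using; the file name and detector thresholds are illustrative assumptions, not MobyCAIRO's actual values):

import cv2
import numpy as np

img = cv2.imread("scan.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)  # edge map first, not the raw image

# Probabilistic Hough transform returns segments as (x1, y1, x2, y2),
# here with 0.1-degree angular resolution.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 1800, threshold=100,
                        minLineLength=200, maxLineGap=5)
assert lines is not None, "no line segments detected"

# Rotate so the longest detected segment becomes horizontal
# (sign conventions may need flipping depending on image orientation).
x1, y1, x2, y2 = max(lines[:, 0], key=lambda s: np.hypot(s[2] - s[0], s[3] - s[1]))
angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
h, w = gray.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
straight = cv2.warpAffine(img, M, (w, h), borderValue=(255, 255, 255))
cv2.imwrite("scan-straightened.png", straight)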
Along came my MobyGames comrade Foxhack to disabuse me of the hope of ever developing a fully automated tool. Just try to find a usefully long straight line in this:
Darn it, Foxhack…
There are straight edges, to be sure. But my initial brainstorm of rotating according to the longest straight edge looks infeasible. Further, it was at this point that we started brainstorming that perhaps we could match on ratings badges, such as the standard ESRB badges omnipresent on U.S. video games. This gets into feature detection and complicates things.
This Needs To Be Interactive
At this point in the effort, I came to terms with the fact that the solution would need some element of interactivity. I would also need to get out of my safe Linux haven and figure out how to develop this on a Windows desktop, something I am not experienced with.
I initially dreamed up an impressive beast of a program written in C++ that leverages Windows desktop GUI frameworks, OpenGL for display and real-time rotation, GPU acceleration for image analysis and processing tricks, and some novel input concepts. I thought GPU acceleration would be crucial, since I have a fairly good GPU on my main Windows desktop and I hear that these things are pretty good at image processing.
I created a list of prototyping tasks on a Trello board and made a decent amount of headway on prototyping all the various pieces that I would need to tie together in order to make this a reality. But it was ultimately slow going when you can only grab an hour or two here and there to try to get anything done.
Settling On A Solution
Recently, I was determined to get a set of old shareware discs archived. I ripped the data a year ago, but I was blocked on the scanning task because I knew it would also involve tedious straightening and cropping. So I finally got all the scans done, which was reasonably quick. But I was determined not to post-process them manually.
This was fairly recent, but I can’t quite recall how I managed to come across the OpenCV library and its Python bindings. OpenCV is an amazing library that provides a significant toolbox for performing image processing tasks. Not only that, it provides “just enough” UI primitives to be able to quickly create a basic GUI for your program, including image display via multiple windows, buttons, and keyboard/mouse input. Furthermore, OpenCV seems to be plenty fast enough to do everything I need in real time, just with (accelerated where appropriate) CPU processing.
So I went to work porting the ideas from the simple standalone Python/scikit tool. I thought of a refinement to the straight line detector: instead of just finding the longest straight edge, it creates a histogram of 360 rotation angles and builds a list of lines corresponding to each angle. It then sorts the angles by cumulative line length and allows the user to iterate through this list, which will hopefully present the most likely straightening angle up front. Further, the tool allows making fine adjustments of 1/10 of a degree via the keyboard, not the mouse. It does all this while highlighting in red the straight line segments that are parallel to the horizontal axis, per the current candidate angle.
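A sketch of that refinement, reusing the edge map from the earlier snippet (bin width and Hough parameters are again illustrative):

import collections
import cv2
import numpy as np

def rank_angles(edges):
    # Group Hough segments into whole-degree bins, then rank the bins by
    # cumulative segment length: most likely straightening angle first.
    lines = cv2.HoughLinesP(edges, 1, np.pi / 1800, 100,
                            minLineLength=100, maxLineGap=5)
    if lines is None:
        return []
    by_angle = collections.defaultdict(list)
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = int(round(np.degrees(np.arctan2(y2 - y1, x2 - x1)))) % 360
        by_angle[angle].append((x1, y1, x2, y2))

    def total_length(item):
        return sum(np.hypot(x2 - x1, y2 - y1) for x1, y1, x2, y2 in item[1])

    return sorted(by_angle.items(), key=total_length, reverse=True)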
The tool draws a light-colored grid over the frame to aid the user in visually verifying the straightness of the image. Further, the program has a mode that allows the user to see the algorithm’s detected edges:
For the cropping phase, the program uses the Hough circle transform in a similar manner, finding the most likely circles (if the image to be processed is supposed to be a circle) and allowing the user to cycle among them while making precise adjustments via the keyboard, again, rather than the mouse.
Running the Hough circle transform is a significantly more intensive operation than the line transform. When I ran it on a full 3300×3300 image, it ran for a long time. I didn’t let it run for longer than a minute before forcibly ending the program. Is this approach unworkable? Not quite: it turns out that the transform is just as effective when shrinking the image to 400×400, and it completes in under 2 seconds on my Core i5 CPU.
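The shrink-then-detect trick as an OpenCV sketch (the image sizes come from the post; gray is the grayscale scan from the straightening snippet, and the detector parameters are assumptions):

import cv2

# Run the circle transform on a 400x400 downscale, then scale the
# detected centers and radii back up to the full-resolution image.
small = cv2.resize(gray, (400, 400), interpolation=cv2.INTER_AREA)
circles = cv2.HoughCircles(small, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                           param1=100, param2=60, minRadius=120, maxRadius=200)
scale = gray.shape[1] / 400.0
if circles is not None:
    for cx, cy, r in circles[0]:
        print(f"candidate circle: center=({cx * scale:.0f}, {cy * scale:.0f}), "
              f"radius={r * scale:.0f}")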
For rectangular cropping, I just settled on using OpenCV’s built-in region-of-interest (ROI) facility. I tried to intelligently find the best candidate rectangle and allow fine adjustments via the keyboard, but I wasn’t having much success, so I took the path of lesser resistance.
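The ROI facility referred to is presumably cv2.selectROI, OpenCV's built-in interactive rectangle picker; a minimal usage sketch:

import cv2

img = cv2.imread("scan.png")
# Drag a rectangle in the window, then press ENTER or SPACE to confirm
# (c cancels); selectROI returns (x, y, width, height).
x, y, w, h = cv2.selectROI("crop", img, showCrosshair=False)
cropped = img[y:y + h, x:x + w]
cv2.imwrite("scan-cropped.png", cropped)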
Packaging and Residual Weirdness
I realized that this tool would be more useful to a broader Windows-using base of digital preservationists if they didn’t have to install Python, establish a virtual environment, and install the prerequisite dependencies. Thus, I made the effort to figure out how to wrap the entire thing up into a monolithic Windows EXE binary. It is available from the project’s GitHub release page (another thing I figured out for the sake of this project!).
The binary is pretty heavy, weighing in at a bit over 50 megabytes. You might advise using compression: it IS compressed! Before I figured out the --onefile option for pyinstaller.exe, the generated dist/ subdirectory was 150 MB. Among other things, there’s a 30 MB FORTRAN BLAS library packaged in!
Conclusion and Future Directions
Once I got it all working, with a simple tkinter UI up front to select between circle and rectangle crop modes, I unleashed the tool on 60 or so scans in bulk, using the Windows forfiles command (another learning experience). I didn’t put a clock on the effort, but it felt faster. Of course, I was bursting with pride the whole time because I was using my own tool. I just wish I had thought of it sooner. But, really, with 2100+ scans under my belt, I’m just getting started: I literally have thousands more artifacts to scan for preservation.
The tool isn’t perfect, of course. Just tonight, I threw another scan at MobyCAIRO. Just go ahead and try to find straight lines in this specimen:
I eventually had to use the text left and right of center to line up against the grid with the manual keyboard adjustments. Still, I’m impressed by how these computer vision algorithms can see patterns I can’t, highlighting lines I never would have guessed at.
I’m eager to play with OpenCV some more, particularly the video processing functions, perhaps even some GPU-accelerated versions.