
Other articles (71)
-
Automatic backup of SPIP channels
1 April 2010
As part of setting up an open platform, it is important for hosts to have reasonably regular backups available to guard against any problem that may arise.
To carry out this task, we rely on two SPIP plugins: Saveauto, which makes regular backups of the database as a MySQL dump (usable in phpMyAdmin), and mes_fichiers_2, which creates a zip archive of the site's important data (the documents, the elements (...)
-
MediaSPIP v0.2
21 June 2013
MediaSPIP 0.2 is the first stable version of MediaSPIP.
Its official release date is 21 June 2013, and it is announced here.
The zip file provided here contains only the sources of MediaSPIP in its standalone version.
As with the previous version, all software dependencies must be installed manually on the server.
If you want to use this archive for a farm-mode installation, you will also need to make other modifications (...)
-
Making files available
14 April 2011
By default, when it is initialized, MediaSPIP does not let visitors download files, whether they are originals or the result of their transformation or encoding. It only lets visitors view them.
However, it is possible and easy to give visitors access to these documents, in several forms.
All of this is done on the template configuration page. Go to the channel's administration area and choose in the navigation (...)
On other sites (9445)
-
Problem downloading the audio of a trimmed YouTube video using youtube-dl and FFmpeg
15 May 2022, by rkc
I'm trying to download the audio of a trimmed YouTube video using youtube-dl (v2021.12.17) and FFmpeg (v4.2.2) from Python, on a Windows 10 PC. The download works when I run the command in the PyCharm terminal, but not when I run the same command through os.system().


For example, the download works when I run the following command in the PyCharm terminal:


ffmpeg -ss 30 -t 10 -i $(youtube-dl -f best -g https://www.youtube.com/watch?v=3NGZcpAZcl0) -- output1.wav


But I get an error when I run the same command from the editor window, as shown below:


os.system('ffmpeg -ss 30 -t 10 -i $(youtube-dl -f best -g https://www.youtube.com/watch?v=3NGZcpAZcl0) -- output1.wav')

error: $(youtube-dl: No such file or directory



I also tried enclosing the $() expression in quotes. After doing this, I get a different error, shown below:


os.system('ffmpeg -ss 30 -t 10 -i "$(youtube-dl -f best -g https://www.youtube.com/watch?v=3NGZcpAZcl0)" -- output1.wav')

error: $(youtube-dl -f best -g https://www.youtube.com/watch?v=3NGZcpAZcl0): Invalid argument



Can anyone help me resolve this error?
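
A possible direction, offered as a sketch and not a confirmed fix: $(...) is shell command substitution, and on Windows os.system() hands the string to cmd.exe, which does not expand it (the PyCharm terminal was presumably running a shell that does). One way to avoid depending on the shell is to run youtube-dl first, capture the URL it prints, and pass that URL to ffmpeg as an ordinary argument. The helper names get_stream_url and build_ffmpeg_command are hypothetical, and the sketch assumes both executables are on PATH.

```python
import subprocess


def get_stream_url(video_url: str) -> str:
    """Ask youtube-dl for the direct media URL (the job $(...) did in the shell)."""
    result = subprocess.run(
        ["youtube-dl", "-f", "best", "-g", video_url],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


def build_ffmpeg_command(stream_url: str, start: int, duration: int, output: str) -> list:
    """Build the ffmpeg argument list; no shell quoting or expansion is involved."""
    return ["ffmpeg", "-ss", str(start), "-t", str(duration), "-i", stream_url, output]


# Usage (needs youtube-dl and ffmpeg on PATH, plus network access):
# stream = get_stream_url("https://www.youtube.com/watch?v=3NGZcpAZcl0")
# subprocess.run(build_ffmpeg_command(stream, 30, 10, "output1.wav"), check=True)
```

Passing the command as a list means the URL reaches ffmpeg unchanged, whatever characters it contains.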


-
PARADISEC catalog for Users
3 May, by silvia
This screencast shows how a user of the PARADISEC catalog logs in and explores the collections, items and files that the archive contains.
Category: 2
Uploaded by: Silvia Pfeiffer
Hosted: youtube
The post PARADISEC catalog for Users first appeared on ginger's thoughts.
-
Parallelize Youtube video frame download using yt-dlp and cv2
4 March 2023, by zulle99
My task is to download multiple sequences of consecutive low-resolution frames from YouTube videos.


The main parts of the process are:


- Each bag of shots spans half a second of video, so its frame count depends on the video's fps.
- To grab useful frames, I remove the first and last 10% of each video, since intros and outros are common.
- I build an array of (start frame, end frame) pairs to distribute the load across multiple processes using ProcessPoolExecutor(max_workers=multiprocessing.cpu_count()).
- On failure or exception, I completely remove the corresponding directory.
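
The chunking described in the list above can be sketched on its own, without any download. This is a minimal illustration that mirrors the arithmetic used later in video_to_frames; the function name compute_chunks is hypothetical, and it assumes an integer fps and a duration in whole seconds.

```python
def compute_chunks(duration_s, fps, percentage_to_ignore=10, percentage_of_bags=20):
    """Return (start_frame, end_frame) pairs, one per half-second bag of shots."""
    video_length = duration_s * fps                            # total number of frames
    shots = fps // 2                                           # frames in half a second
    to_ignore = (video_length * percentage_to_ignore) // 100   # frames cut at each end
    new_len = video_length - 2 * to_ignore                     # usable frames
    # Keep only a percentage of the possible bags, but at least one.
    total_bags = max(1, ((new_len // shots) * percentage_of_bags) // 100)
    # Gap between consecutive bags so that they spread over the usable range.
    skip = (new_len - total_bags * shots) // (total_bags - 1) if total_bags > 1 else 0
    return [
        (to_ignore + bag * (skip + shots), to_ignore + bag * (skip + shots) + shots)
        for bag in range(total_bags)
    ]


chunks = compute_chunks(duration_s=100, fps=30)
```

For a 100-second video at 30 fps this yields 32 bags of 15 frames each, the first covering frames 300 to 315, so the first and last 300 frames are never touched.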










The problem is that it does not scale: while it runs, I noticed that every CPU stays at roughly 20% load or less. In addition, since I have to run multiple CNNs on these shots, a large dataset is recommended to prevent overfitting, not just a handful of shots.


Here is the code:


import yt_dlp
import os
from tqdm import tqdm
import cv2
import shutil
import time
import random
from concurrent.futures import ProcessPoolExecutor
import multiprocessing
import pandas as pd
import numpy as np
from pathlib import Path
import zipfile


# PARAMETERS
percentage_train_test = 50
percentage_bag_shots = 20
percentage_to_ignore = 10

zip_f_name = f'VideoClassificationDataset_{percentage_train_test}_{percentage_bag_shots}_{percentage_to_ignore}'
dataset_path = Path('/content/VideoClassificationDataset')

# DOWNLOAD ZIP FILES
!wget --no-verbose https://github.com/gtoderici/sports-1m-dataset/archive/refs/heads/master.zip

# EXTRACT AND DELETE THEM
!unzip -qq -o '/content/master.zip' 
!rm '/content/master.zip'

DATA = {'train_partition.txt': {},
 'test_partition.txt': {}}

LABELS = []

train_dict = {}
test_dict = {}

path = '/content/sports-1m-dataset-master/original'

for f in os.listdir(path):
    with open(path + '/' + f) as f_txt:
        lines = f_txt.readlines()
        for line in lines:
            splitted_line = line.split(' ')
            label_indices = splitted_line[1].rstrip('\n').split(',')
            DATA[f][splitted_line[0]] = list(map(int, label_indices))

with open('/content/sports-1m-dataset-master/labels.txt') as f_labels:
    LABELS = f_labels.read().splitlines()


TRAIN = DATA['train_partition.txt']
TEST = DATA['test_partition.txt']
print('Original Train Test length: ', len(TRAIN), len(TEST))

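# sample a subset of size percentage_train_test
# (random.sample needs a sequence, so the dict items are converted to a list first)
TRAIN = dict(random.sample(list(TRAIN.items()), (len(TRAIN)*percentage_train_test)//100))
TEST = dict(random.sample(list(TEST.items()), (len(TEST)*percentage_train_test)//100))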

print(f'Sampled {percentage_train_test} Percentage Train Test length: ', len(TRAIN), len(TEST))


if not os.path.exists(dataset_path): os.makedirs(dataset_path)
if not os.path.exists(f'{dataset_path}/train'): os.makedirs(f'{dataset_path}/train')
if not os.path.exists(f'{dataset_path}/test'): os.makedirs(f'{dataset_path}/test')



Function to extract a sequence of consecutive frames:


def extract_frames(directory, url, idx_bag, start_frame, end_frame):
    capture = cv2.VideoCapture(url)
    count = start_frame

    # jump to the first frame of this bag
    capture.set(cv2.CAP_PROP_POS_FRAMES, count)
    os.makedirs(f'{directory}/bag_of_shots{str(idx_bag)}')

    while count < end_frame:

        ret, frame = capture.read()

        if not ret:
            # a frame could not be read: discard the whole bag
            capture.release()
            shutil.rmtree(f'{directory}/bag_of_shots{str(idx_bag)}')
            return False

        filename = f'{directory}/bag_of_shots{str(idx_bag)}/shot{str(count - start_frame)}.png'

        cv2.imwrite(filename, frame)
        count += 1

    capture.release()
    return True



Function to spread the load across multiple processes:


def video_to_frames(video_url, labels_list, directory, dic, percentage_of_bags):
    url_id = video_url.split('=')[1]
    path_until_url_id = f'{dataset_path}/{directory}/{url_id}'
    try:

        ydl_opts = {
            'ignoreerrors': True,
            'quiet': True,
            'nowarnings': True,
            'simulate': True,
            'ignorenoformatserror': True,
            'verbose':False,
            'cookies': '/content/all_cookies.txt',
            #https://stackoverflow.com/questions/63329412/how-can-i-solve-this-youtube-dl-429
        }
        ydl = yt_dlp.YoutubeDL(ydl_opts)
        info_dict = ydl.extract_info(video_url, download=False)

        if(info_dict is not None and info_dict['fps'] >= 20):
            # I must have at least 20 frames per second, since I take a half-second bag of shots for every video

            formats = info_dict.get('formats', None)

            # excluding the initial and final 10% of each video to avoid noise
            video_length = info_dict['duration'] * info_dict['fps']

            shots = info_dict['fps'] // 2

            to_ignore = (video_length * percentage_to_ignore) // 100
            new_len = video_length - (to_ignore * 2)
            tot_stored_bags = ((new_len // shots) * percentage_of_bags) // 100 # ((total_possible_bags // shots) * percentage_of_bags) // 100
            if tot_stored_bags == 0: tot_stored_bags = 1 # minimum 1 bag of shots

            skip_rate_between_bags = (new_len - (tot_stored_bags * shots)) // (tot_stored_bags-1) if tot_stored_bags > 1 else 0

            chunks = [[to_ignore+(bag*(skip_rate_between_bags+shots)), to_ignore+(bag*(skip_rate_between_bags+shots))+shots] for bag in range(tot_stored_bags)]
            # sequence of [[start_frame, end_frame], [start_frame, end_frame], [start_frame, end_frame], ...]


            # ----------- For the moment I download shots only from videos that have 144p resolution -----------

            res = {
                '160': '144p',
                '133': '240p',
                '134': '360p',
                '135': '360p',
                '136': '720p'
            }

            format_id = {}
            for f in formats: format_id[f['format_id']] = f
            #for res in resolution_id:
            if list(res.keys())[0] in list(format_id.keys()):
                video = format_id[list(res.keys())[0]]
                url = video.get('url', None)
                if(video.get('url', None) != video.get('manifest_url', None)):

                    if not os.path.exists(path_until_url_id): os.makedirs(path_until_url_id)

                    with ProcessPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
                        for idx_bag, f in enumerate(chunks):
                            res = executor.submit(
                                extract_frames, directory = path_until_url_id, url = url, idx_bag = idx_bag, start_frame = f[0], end_frame = f[1])

                            if res.result() is True:
                                l = np.zeros(len(LABELS), dtype=int)
                                for label in labels_list: l[label] = 1
                                l = np.append(l, [shots]) # appending the number of shots taken to the list before adding it to the dictionary

                                dic[f'{directory}/{url_id}/bag_of_shots{str(idx_bag)}'] = l.tolist()


    except Exception as e:
        shutil.rmtree(path_until_url_id)
        pass
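


One thing that may explain the low CPU load, offered as a hypothesis and not a measured result: in the loop above, res.result() is called right after each executor.submit(), and result() blocks until that task has finished, so only one worker is busy at a time. Network and decoding waits may also contribute. A minimal sketch of the alternative, submitting every chunk first and collecting results afterwards, is shown below. The names run_chunks and fake_extract are hypothetical; fake_extract stands in for extract_frames, and a ThreadPoolExecutor is used only so the example runs anywhere, whereas a ProcessPoolExecutor would be passed in the same way.

```python
from concurrent.futures import Executor, ThreadPoolExecutor, as_completed


def run_chunks(executor: Executor, worker, chunks):
    """Submit every chunk first, then collect results as they finish."""
    # Map each future back to the index of the chunk it was created for.
    futures = {
        executor.submit(worker, idx, start, end): idx
        for idx, (start, end) in enumerate(chunks)
    }
    results = {}
    for future in as_completed(futures):
        # result() is only called once all tasks have already been queued.
        results[futures[future]] = future.result()
    return results


def fake_extract(idx_bag, start_frame, end_frame):
    # Hypothetical stand-in for extract_frames: reports how many frames it would read.
    return end_frame - start_frame


with ThreadPoolExecutor(max_workers=4) as pool:
    done = run_chunks(pool, fake_extract, [(0, 15), (100, 115), (200, 230)])
```

With this shape, the labels for each bag can be written to the dictionary inside the as_completed loop, using the chunk index stored alongside each future.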



Download of the TRAIN bags of shots:


start_time = time.time()
pbar = tqdm(enumerate(TRAIN.items()), total = len(TRAIN.items()), leave=False)

for _, (url, labels_list) in pbar: video_to_frames(
 video_url = url, labels_list = labels_list, directory = 'train', dic = train_dict, percentage_of_bags = percentage_bag_shots)

print("--- %s seconds ---" % (time.time() - start_time))



Download of the TEST bags of shots:


start_time = time.time()
pbar = tqdm(enumerate(TEST.items()), total = len(TEST.items()), leave=False)

for _, (url, labels_list) in pbar: video_to_frames(
 video_url = url, labels_list = labels_list, directory = 'test', dic = test_dict, percentage_of_bags = percentage_bag_shots)

print("--- %s seconds ---" % (time.time() - start_time))



Save the .csv files


train_df = pd.DataFrame.from_dict(train_dict, orient='index', dtype=int).reset_index(level=0)
train_df = train_df.rename(columns={train_df.columns[-1]: 'shots'})
train_df.to_csv('/content/VideoClassificationDataset/train.csv', index=True)

test_df = pd.DataFrame.from_dict(test_dict, orient='index', dtype=int).reset_index(level=0)
test_df = test_df.rename(columns={test_df.columns[-1]: 'shots'})
test_df.to_csv('/content/VideoClassificationDataset/test.csv', index=True)