
Recherche avancée
Médias (91)
-
Collections - Formulaire de création rapide
19 février 2013, par
Mis à jour : Février 2013
Langue : français
Type : Image
-
Les Miserables
4 juin 2012, par
Mis à jour : Février 2013
Langue : English
Type : Texte
-
Ne pas afficher certaines informations : page d’accueil
23 novembre 2011, par
Mis à jour : Novembre 2011
Langue : français
Type : Image
-
The Great Big Beautiful Tomorrow
28 octobre 2011, par
Mis à jour : Octobre 2011
Langue : English
Type : Texte
-
Richard Stallman et la révolution du logiciel libre - Une biographie autorisée (version epub)
28 octobre 2011, par
Mis à jour : Octobre 2011
Langue : English
Type : Texte
-
Rennes Emotion Map 2010-11
19 octobre 2011, par
Mis à jour : Juillet 2013
Langue : français
Type : Texte
Autres articles (64)
-
Personnaliser en ajoutant son logo, sa bannière ou son image de fond
5 septembre 2013, parCertains thèmes prennent en compte trois éléments de personnalisation : l’ajout d’un logo ; l’ajout d’une bannière l’ajout d’une image de fond ;
-
Ecrire une actualité
21 juin 2013, parPrésentez les changements dans votre MédiaSPIP ou les actualités de vos projets sur votre MédiaSPIP grâce à la rubrique actualités.
Dans le thème par défaut spipeo de MédiaSPIP, les actualités sont affichées en bas de la page principale sous les éditoriaux.
Vous pouvez personnaliser le formulaire de création d’une actualité.
Formulaire de création d’une actualité Dans le cas d’un document de type actualité, les champs proposés par défaut sont : Date de publication ( personnaliser la date de publication ) (...) -
Publier sur MédiaSpip
13 juin 2013Puis-je poster des contenus à partir d’une tablette Ipad ?
Oui, si votre Médiaspip installé est à la version 0.2 ou supérieure. Contacter au besoin l’administrateur de votre MédiaSpip pour le savoir
Sur d’autres sites (8152)
-
Need to Download Videos From (Ap.lk) This site [closed]
20 mars 2024, par SeamonGetting this error when using the URL with FFmpeg


[https @ 00000191353bfc00] No trailing CRLF found in HTTP header. Adding it
[ in#0 @ 00000191353aa580] Error opening input: Invalid data found when processing input

Error opening input file https://video.ap.lk/stream/ad470e41-5a62-43da-b8ab-6a0e9d0a7316 

Error opening input files: Invalid data found when processing input



This is the regular URL that I use to get the video :


https://video.ap.lk/stream/ad470e41-5a62-43da-b8ab-6a0e9d0a7316


Using this FFmpeg command :


ffmpeg -user_agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 OPR/107.0.0.0" -headers "Cookie: google-analyze=2f506b9c-9387-4727-ab57-821718fedb38; Path=/" -referer "https://ap.lk" -i "https://video.ap.lk/stream/ad470e41-5a62-43da-b8ab-6a0e9d0a7316" -c copy "ad470e41-5a62-43da-b8ab-6a0e9d0a7316.mp4"



What should I do ? Is there anything wrong with URL or what else ?


-
Download m3u8 video from web page with authorization
31 août 2024, par maya_solovevaI try download m3u8 video from webpage with access by authorization.


I use ffmpeg command.
I get headers from network tab.


ffmpeg -y -loglevel verbose -headers "$(cat headers.txt)" -i "https://apps.foxford.ru/webinar-foxford/storage/api/v2/backends/yandex/sets/hls.webinar.foxford.ru::420e27e1-1c7d-43e9-9195-fc8b99e6f50a/objects/long.v2.yandex.video.360p.master.m3u8" output2.mp4 -c copy -f mp4 file:foxford_01.mp4.part



headers.txt content :


Host: apps.foxford.ru
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:129.0) Gecko/20100101 Firefox/129.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br, zstd
Referer: https://apps.foxford.ru/webinar/?audience=foxford.ru&brand=foxford&embedded_origin=https%3A%2F%2Ffoxford.ru&external_id=5aefae2d-e810-49ef-8344-d3807535fd9e&lesson_id=250799&scope=Z2lkOi8vc3RvZWdlL1VsbXM6OlJvb21zOjpXZWJpbmFyLzUwNDcwMA&utm_referrer=https%3A%2F%2Ffoxford.ru%2Fcourses%2F9193%2Flessons%2F250799&backurl=https%253A%252F%252Fapps%252Efoxford%252Eru%252Fapi%252Fv1%252Fredirs%252Faudiences%252Ffoxford%252Eru%252Fapps%252Fwebinar
DNT: 1
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
authorization: Bearer eyJhbGciOiJFUzI1NiJ9.eyJleHAiOjE3MjUwODU1MzUsImlzcyI6InN2Yy5mb3hmb3JkLnJ1IiwiYXVkIjoidXNyLmZveGZvcmQucnUiLCJzdWIiOiJaMmxrT2k4dmMzUnZaV2RsTDFWelpYSTZPbEIxY0dsc0x6RXdPRFU0TXpJeSJ9.NE9IsraTHFKicr3lkjzTTEkGLoGP-bL74nlX1umcPjC62h9MGXjPUFXL53jsnixd7b-WPtIc8N2WhsC6Ipt8IQ
Connection: keep-alive
Cookie: _csrf_token=%2Ba0goi1UnpRzFxpPOIbnZBaxtY0YUlZyBnyGohC5NbfBaVzLvwpEtM0x%2BjEPvJqJrcvsy2qMZHTStvusdxeUmQ%3D%3D; advcake_trackid=4b724357-eb38-b2f0-6919-4a4363aef58a; _tm_lt_sid=1720871531732.305536; client_timezone=Europe/Moscow; _ga=GA1.2.506102438.1720871532; tmr_lvid=7b674da8235ea642c3bfacf244eeae53; tmr_lvidTS=1720871533984; _ym_uid=1720871534834198811; _ym_d=1720871534; _tt_enable_cookie=1; _ttp=El4ZowXxQWdsnA31_DDrCt_AT_4; popmechanic_sbjs_migrations=popmechanic_1418474375998%3D1%7C%7C%7C1471519752600%3D1%7C%7C%7C1471519752605%3D1; _foxford_cookie_consent=yes; advcake_session_id=2fb2830e-9b68-2529-f9e4-2dec39973ef9; adtech_uid=ba6ca4c8-055a-4f46-b06c-6df09c20e8a4%3Afoxford.ru; top100_id=t1.7729862.2001929594.1723667001899; t3_sid_7729862=s1.1788907302.1725084603508.1725085234295.9.11; _ym_isad=2; mindboxDeviceUUID=9d785862-f8db-4ba8-b7bc-ccd9dbba98e1; directCrm-session=%7B%22deviceGuid%22%3A%229d785862-f8db-4ba8-b7bc-ccd9dbba98e1%22%7D; qrator_jsid=1725084559.622.G14rJEU37yUAYfzF-0l4eiu40m3uudgv95tu3faluhe7up6nv; mindboxSessionIsOpen=true
Priority: u=4
Pragma: no-cache
Cache-Control: no-cache
TE: trailers



ffmpeg returns HTTP error Server returned 401 Unauthorized (authorization failed) :




Headers from network tab :



-
Parallelize Youtube video frame download using yt-dlp and cv2
4 mars 2023, par zulle99My task is to download multiple sequences of successive low resolution frames of Youtube videos.


I summarize the main parts of the process :


- 

- Each bag of shots have a dimension of half a second (depending on the current fps)
- In order to grab useful frames I've decided to remove the initial and final 10% of each video since it is common to have an intro and outro. Moreover
- I've made an array of pair of initial and final frame to distribute the load on multiple processes using
ProcessPoolExecutor(max_workers=multiprocessing.cpu_count())
- In case of failure/exception I completly remove the relative directory










The point is that it do not scale up, since while running I noticesd that all CPUs had always a load lower that the 20% more or less. In addition since with these shots I have to run multiple CNNs, to prevent overfitting it is suggested to have a big dataset and not a bounch of shots.


Here it is the code :


import yt_dlp
import os
from tqdm import tqdm
import cv2
import shutil
import time
import random
from concurrent.futures import ProcessPoolExecutor
import multiprocessing
import pandas as pd
import numpy as np
from pathlib import Path
import zipfile


# PARAMETERS
percentage_train_test = 50
percentage_bag_shots = 20
percentage_to_ignore = 10

zip_f_name = f'VideoClassificationDataset_{percentage_train_test}_{percentage_bag_shots}_{percentage_to_ignore}'
dataset_path = Path('/content/VideoClassificationDataset')

# DOWNOAD ZIP FILES
!wget --no-verbose https://github.com/gtoderici/sports-1m-dataset/archive/refs/heads/master.zip

# EXTRACT AND DELETE THEM
!unzip -qq -o '/content/master.zip' 
!rm '/content/master.zip'

DATA = {'train_partition.txt': {},
 'test_partition.txt': {}}

LABELS = []

train_dict = {}
test_dict = {}

path = '/content/sports-1m-dataset-master/original'

for f in os.listdir(path):
 with open(path + '/' + f) as f_txt:
 lines = f_txt.readlines()
 for line in lines:
 splitted_line = line.split(' ')
 label_indices = splitted_line[1].rstrip('\n').split(',') 
 DATA[f][splitted_line[0]] = list(map(int, label_indices))

with open('/content/sports-1m-dataset-master/labels.txt') as f_labels:
 LABELS = f_labels.read().splitlines()


TRAIN = DATA['train_partition.txt']
TEST = DATA['test_partition.txt']
print('Original Train Test length: ', len(TRAIN), len(TEST))

# sample a subset percentage_train_test
TRAIN = dict(random.sample(TRAIN.items(), (len(TRAIN)*percentage_train_test)//100))
TEST = dict(random.sample(TEST.items(), (len(TEST)*percentage_train_test)//100))

print(f'Sampled {percentage_train_test} Percentage Train Test length: ', len(TRAIN), len(TEST))


if not os.path.exists(dataset_path): os.makedirs(dataset_path)
if not os.path.exists(f'{dataset_path}/train'): os.makedirs(f'{dataset_path}/train')
if not os.path.exists(f'{dataset_path}/test'): os.makedirs(f'{dataset_path}/test')



Function to extract a sequence of continuous frames :


def extract_frames(directory, url, idx_bag, start_frame, end_frame):
 capture = cv2.VideoCapture(url)
 count = start_frame

 capture.set(cv2.CAP_PROP_POS_FRAMES, count)
 os.makedirs(f'{directory}/bag_of_shots{str(idx_bag)}')

 while count < end_frame:

 ret, frame = capture.read()

 if not ret: 
 shutil.rmtree(f'{directory}/bag_of_shots{str(idx_bag)}')
 return False

 filename = f'{directory}/bag_of_shots{str(idx_bag)}/shot{str(count - start_frame)}.png'

 cv2.imwrite(filename, frame)
 count += 1

 capture.release()
 return True



Function to spread the load along multiple processors :


def video_to_frames(video_url, labels_list, directory, dic, percentage_of_bags):
 url_id = video_url.split('=')[1]
 path_until_url_id = f'{dataset_path}/{directory}/{url_id}'
 try: 

 ydl_opts = {
 'ignoreerrors': True,
 'quiet': True,
 'nowarnings': True,
 'simulate': True,
 'ignorenoformatserror': True,
 'verbose':False,
 'cookies': '/content/all_cookies.txt',
 #https://stackoverflow.com/questions/63329412/how-can-i-solve-this-youtube-dl-429
 }
 ydl = yt_dlp.YoutubeDL(ydl_opts)
 info_dict = ydl.extract_info(video_url, download=False)

 if(info_dict is not None and info_dict['fps'] >= 20):
 # I must have a least 20 frames per seconds since I take half of second bag of shots for every video

 formats = info_dict.get('formats', None)

 # excluding the initial and final 10% of each video to avoid noise
 video_length = info_dict['duration'] * info_dict['fps']

 shots = info_dict['fps'] // 2

 to_ignore = (video_length * percentage_to_ignore) // 100
 new_len = video_length - (to_ignore * 2)
 tot_stored_bags = ((new_len // shots) * percentage_of_bags) // 100 # ((total_possbile_bags // shots) * percentage_of_bags) // 100
 if tot_stored_bags == 0: tot_stored_bags = 1 # minimum 1 bag of shots

 skip_rate_between_bags = (new_len - (tot_stored_bags * shots)) // (tot_stored_bags-1) if tot_stored_bags > 1 else 0

 chunks = [[to_ignore+(bag*(skip_rate_between_bags+shots)), to_ignore+(bag*(skip_rate_between_bags+shots))+shots] for bag in range(tot_stored_bags)]
 # sequence of [[start_frame, end_frame], [start_frame, end_frame], [start_frame, end_frame], ...]


 # ----------- For the moment I download only shots form video that has 144p resolution -----------

 res = {
 '160': '144p',
 '133': '240p',
 '134': '360p',
 '135': '360p',
 '136': '720p'
 }

 format_id = {}
 for f in formats: format_id[f['format_id']] = f
 #for res in resolution_id:
 if list(res.keys())[0] in list(format_id.keys()):
 video = format_id[list(res.keys())[0]]
 url = video.get('url', None)
 if(video.get('url', None) != video.get('manifest_url', None)):

 if not os.path.exists(path_until_url_id): os.makedirs(path_until_url_id)

 with ProcessPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
 for idx_bag, f in enumerate(chunks): 
 res = executor.submit(
 extract_frames, directory = path_until_url_id, url = url, idx_bag = idx_bag, start_frame = f[0], end_frame = f[1])
 
 if res.result() is True: 
 l = np.zeros(len(LABELS), dtype=int) 
 for label in labels_list: l[label] = 1
 l = np.append(l, [shots]) # appending the number of shots taken in the list before adding it on the dictionary

 dic[f'{directory}/{url_id}/bag_of_shots{str(idx_bag)}'] = l.tolist()


 except Exception as e:
 shutil.rmtree(path_until_url_id)
 pass



Download of TRAIN bag of shots :


start_time = time.time()
pbar = tqdm(enumerate(TRAIN.items()), total = len(TRAIN.items()), leave=False)

for _, (url, labels_list) in pbar: video_to_frames(
 video_url = url, labels_list = labels_list, directory = 'train', dic = train_dict, percentage_of_bags = percentage_bag_shots)

print("--- %s seconds ---" % (time.time() - start_time))



Download of TEST bag of shots :


start_time = time.time()
pbar = tqdm(enumerate(TEST.items()), total = len(TEST.items()), leave=False)

for _, (url, labels_list) in pbar: video_to_frames(
 video_url = url, labels_list = labels_list, directory = 'test', dic = test_dict, percentage_of_bags = percentage_bag_shots)

print("--- %s seconds ---" % (time.time() - start_time))



Save the .csv files


train_df = pd.DataFrame.from_dict(train_dict, orient='index', dtype=int).reset_index(level=0)
train_df = train_df.rename(columns={train_df.columns[-1]: 'shots'})
train_df.to_csv('/content/VideoClassificationDataset/train.csv', index=True)

test_df = pd.DataFrame.from_dict(test_dict, orient='index', dtype=int).reset_index(level=0)
test_df = test_df.rename(columns={test_df.columns[-1]: 'shots'})
test_df.to_csv('/content/VideoClassificationDataset/test.csv', index=True)