Multithreading not achieving performance difference Python

meg hidey

Below is a program that makes multiple GET requests and writes the response images to my directory. The requests are meant to run in separate threads and therefore finish sooner than without threads, but I'm not seeing any performance difference.

Printing active_count() shows that 9 threads were created. However, the run still takes around 40 seconds whether or not I use threading.

Below is the version using threading.

from threading import active_count
import requests
import time
import concurrent.futures

img_urls = [
    'https://images.unsplash.com/photo-1516117172878-fd2c41f4a759',
    'https://images.unsplash.com/photo-1532009324734-20a7a5813719',
    'https://images.unsplash.com/photo-1524429656589-6633a470097c',
    'https://images.unsplash.com/photo-1530224264768-7ff8c1789d79',
    'https://images.unsplash.com/photo-1564135624576-c5c88640f235',
    'https://images.unsplash.com/photo-1541698444083-023c97d3f4b6',
    'https://images.unsplash.com/photo-1522364723953-452d3431c267',
    'https://images.unsplash.com/photo-1513938709626-033611b8cc03',
    'https://images.unsplash.com/photo-1507143550189-fed454f93097',
    'https://images.unsplash.com/photo-1493976040374-85c8e12f0c0e',
    'https://images.unsplash.com/photo-1504198453319-5ce911bafcde',
    'https://images.unsplash.com/photo-1530122037265-a5f1f91d3b99',
    'https://images.unsplash.com/photo-1516972810927-80185027ca84',
    'https://images.unsplash.com/photo-1550439062-609e1531270e',
    'https://images.unsplash.com/photo-1549692520-acc6669e2f0c'
]

t1 = time.perf_counter()


def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_image, img_urls)
    print(active_count())


t2 = time.perf_counter()

print(f'Finished in {t2-t1} seconds')

Below is the version without threading (same imports and img_urls as above).

def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


for img_url in img_urls:
    download_image(img_url)

Could someone explain why this is happening? Thanks

Vollfeiw

This is the result I got with your code, with the start and end time printed next to each download (the sequential run first, then the threaded run). The overall time is about the same on my "normal" network, not the slow one I mentioned in my comment.

The reason is that multiple threads don't increase I/O throughput or bandwidth; the limitation could also be the website itself. It looks like the issue is not in your code.

EDIT (misleading statement): as MisterMiyagi mentioned in the comment below (read his comment, he explains why), threading should increase I/O throughput, which is why I get a 10 s gain on a slow network (the limited connection in my work lab). It doesn't increase I/O throughput or bandwidth in this specific case (with full bandwidth on my "normal" connection). That could have many causes, but in my opinion the code itself is not one of them.

I also tried with max_workers=5; the overall time was the same.
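To see the effect in isolation, here is a minimal sketch that takes the network out of the picture. `fake_download` is a hypothetical stand-in for `requests.get()` that only sleeps, which is an assumption about how a slow but unthrottled server behaves. When the waits are independent like this, the thread pool overlaps them; when the server or the connection is the shared limit, it cannot.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(delay):
    """Hypothetical stand-in for requests.get(): only waits, no real I/O."""
    time.sleep(delay)
    return delay


def run_sequential(delays):
    """Run the fake downloads one after another and return the elapsed time."""
    start = time.perf_counter()
    for delay in delays:
        fake_download(delay)
    return time.perf_counter() - start


def run_threaded(delays):
    """Run the fake downloads in a thread pool and return the elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor() as executor:
        # list() consumes the results, so an exception raised in a worker
        # is re-raised here instead of being silently dropped.
        list(executor.map(fake_download, delays))
    return time.perf_counter() - start


if __name__ == '__main__':
    delays = [0.2] * 5
    print(f'sequential: {run_sequential(delays):.2f} s')
    print(f'threaded:   {run_threaded(delays):.2f} s')
```

If the threaded run is clearly faster here but not against the real URLs, that points at the remote side or the connection rather than at the threading code.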

    photo-1516117172878-fd2c41f4a759.jpg was downloaded... 1.0464828 - 1.7136098
    photo-1532009324734-20a7a5813719.jpg was downloaded... 1.7140197 - 5.6327612
    photo-1524429656589-6633a470097c.jpg was downloaded... 5.6339666 - 8.3146478
    photo-1530224264768-7ff8c1789d79.jpg was downloaded... 8.3160157 - 10.474087
    photo-1564135624576-c5c88640f235.jpg was downloaded... 10.4749598 - 11.2431941
    photo-1541698444083-023c97d3f4b6.jpg was downloaded... 11.2436369 - 15.6939695
    photo-1522364723953-452d3431c267.jpg was downloaded... 15.6954112 - 18.3257819
    photo-1513938709626-033611b8cc03.jpg was downloaded... 18.3269668 - 21.0607191
    photo-1507143550189-fed454f93097.jpg was downloaded... 21.0621265 - 22.2371699
    photo-1493976040374-85c8e12f0c0e.jpg was downloaded... 22.2375931 - 26.4375676
    photo-1504198453319-5ce911bafcde.jpg was downloaded... 26.4393404 - 28.3477933
    photo-1530122037265-a5f1f91d3b99.jpg was downloaded... 28.348679 - 30.4626719
    photo-1516972810927-80185027ca84.jpg was downloaded... 30.4636931 - 32.2621345
    photo-1550439062-609e1531270e.jpg was downloaded... 32.2628976 - 34.7331719
    photo-1549692520-acc6669e2f0c.jpg was downloaded... 34.7341393 - 35.5910094
    Finished in 34.545366900000005 seconds
    21
    photo-1516117172878-fd2c41f4a759.jpg was downloaded... 35.5960486 - 46.1692758
    photo-1564135624576-c5c88640f235.jpg was downloaded... 35.6110777 - 47.3780254
    photo-1507143550189-fed454f93097.jpg was downloaded... 35.6265503 - 47.4433963
    photo-1549692520-acc6669e2f0c.jpg was downloaded... 35.6692061 - 49.7097683
    photo-1516972810927-80185027ca84.jpg was downloaded... 35.6420564 - 57.2326763
    photo-1504198453319-5ce911bafcde.jpg was downloaded... 35.6340008 - 61.4597509
    photo-1550439062-609e1531270e.jpg was downloaded... 35.6637577 - 62.0488296
    photo-1530224264768-7ff8c1789d79.jpg was downloaded... 35.6072146 - 63.4139648
    photo-1513938709626-033611b8cc03.jpg was downloaded... 35.6223106 - 63.8149815
    photo-1524429656589-6633a470097c.jpg was downloaded... 35.6032493 - 63.8284464
    photo-1530122037265-a5f1f91d3b99.jpg was downloaded... 35.6352735 - 65.0513042
    photo-1522364723953-452d3431c267.jpg was downloaded... 35.6182243 - 65.5005548
    photo-1532009324734-20a7a5813719.jpg was downloaded... 35.5994888 - 66.2930857
    photo-1541698444083-023c97d3f4b6.jpg was downloaded... 35.6144996 - 67.8115219
    photo-1493976040374-85c8e12f0c0e.jpg was downloaded... 35.6301133 - 68.5357319
    Finished in 32.946069800000004 seconds
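Start and end times like the ones in the listing above could be collected with a small wrapper, sketched here. The `timed` decorator and the sleeping `download_image` body are illustrative assumptions, not the exact code I ran; in practice the body would be the `requests.get()` and file write from the question.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import wraps


def timed(func):
    """Print and return the start and end time of each call (perf_counter clock)."""
    @wraps(func)
    def wrapper(item):
        start = time.perf_counter()
        result = func(item)
        end = time.perf_counter()
        print(f'{item} was downloaded... {start} - {end}')
        return start, end, result
    return wrapper


@timed
def download_image(img_url):
    # Hypothetical stand-in for the real download: sleeps instead of
    # calling requests.get(), so the sketch runs without network access.
    time.sleep(0.1)
    return len(img_url)


if __name__ == '__main__':
    urls = ['photo-a', 'photo-b', 'photo-c']  # placeholder names
    with ThreadPoolExecutor() as executor:
        spans = list(executor.map(download_image, urls))
```

In a sequential run each start time follows the previous end time; in a threaded run the intervals overlap, which is the pattern the second half of the listing shows.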

EDIT 2 (more testing) : I tried with one of my webserver (Same code, just different image list), and I got an overall decrease of 60-70% of downloading time. Work best with limited workers in that case. The problem come from the website, not your code.
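Limiting the workers only needs the `max_workers` argument of `ThreadPoolExecutor`. The sketch below uses a hypothetical `fake_download` that sleeps instead of hitting a server, plus a `stats` dict (also an assumption, added only for illustration) to record how many downloads were in flight at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Illustrative bookkeeping: how many fake downloads run at the same time.
stats = {'active': 0, 'peak': 0}
_lock = threading.Lock()


def fake_download(img_url):
    """Hypothetical stand-in for the real download; sleeps instead of doing I/O."""
    with _lock:
        stats['active'] += 1
        stats['peak'] = max(stats['peak'], stats['active'])
    time.sleep(0.05)
    with _lock:
        stats['active'] -= 1
    return img_url


def download_all(urls, max_workers=5):
    """Download every URL with at most max_workers threads running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() keeps the input order; list() surfaces worker exceptions.
        return list(executor.map(fake_download, urls))


if __name__ == '__main__':
    download_all([f'img-{i}' for i in range(12)], max_workers=3)
    print('peak concurrent downloads:', stats['peak'])
```

Which worker count works best depends on the server, so it is worth timing a few values rather than assuming the default is right.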
