data_garden - 1 month ago
Python Question

Python - reduce complexity using sets

I am using the url_analysis tools from the Spotify API (via the spotipy wrapper, with client object sp) to process tracks, using the following code:

def loudness_drops(track_ids):

    names = set()
    tids = set()
    tracks_with_drop_name = set()
    tracks_with_drop_id = set()

    for id_ in track_ids:
        track_id = sp.track(id_)['uri']
        tids.add(track_id)
        track_name = sp.track(id_)['name']
        names.add(track_name)
        #get audio features
        features = sp.audio_features(tids)
        #and then audio analysis id
        urls = {x['analysis_url'] for x in features if x}
        print len(urls)
        #fetch analysis data
        for url in urls:
            # print len(urls)
            analysis = sp._get(url)
            #extract loudness sections from analysis
            x = [_['start'] for _ in analysis['segments']]
            print len(x)
            l = [_['loudness_max'] for _ in analysis['segments']]
            print len(l)
            #get max and min values
            min_l = min(l)
            max_l = max(l)
            #normalize stream
            norm_l = [(_ - min_l)/(max_l - min_l) for _ in l]
            #define silence as a value below 0.1
            silence = [l[i] for i in range(len(l)) if norm_l[i] < .1]
            #more than one silence means one of them happens in the middle of the track
            if len(silence) > 1:
                tracks_with_drop_name.add(track_name)
                tracks_with_drop_id.add(track_id)
    return tracks_with_drop_id


The code works, but if the number of songs I search is set to, say, limit=20, the time it takes to process all the audio segments x and l makes the process too expensive, e.g.:

time.time() prints 452.175742149
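A timing of that sort can be reproduced with a simple wrapper along these lines (track_ids standing in for whatever id list the search returned):

import time

start = time.time()
result = loudness_drops(track_ids)
print time.time() - start   # e.g. 452.175742149 for limit=20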


QUESTION:

How can I drastically reduce the complexity here?

I've tried to use sets instead of lists, but working with set objects prohibits indexing.

EDIT: 10 urls:

[u'https://api.spotify.com/v1/audio-analysis/5H40slc7OnTLMbXV6E780Z', u'https://api.spotify.com/v1/audio-analysis/72G49GsqYeWV6QVAqp4vl0', u'https://api.spotify.com/v1/audio-analysis/6jvFK4v3oLMPfm6g030H0g', u'https://api.spotify.com/v1/audio-analysis/351LyEn9dxRxgkl28GwQtl', u'https://api.spotify.com/v1/audio-analysis/4cRnjBH13wSYMOfOF17Ddn', u'https://api.spotify.com/v1/audio-analysis/2To3PTOTGJUtRsK3nQemP4', u'https://api.spotify.com/v1/audio-analysis/4xPRxqV9qCVeKLQ31NxhYz', u'https://api.spotify.com/v1/audio-analysis/1G1MtHxrVngvGWSQ7Fj4Oj', u'https://api.spotify.com/v1/audio-analysis/3du9aoP5vPGW1h70mIoicK', u'https://api.spotify.com/v1/audio-analysis/6VIIBKYJAKMBNQreG33lBF']

Answer

This is what I see, not knowing much about Spotify:

for id_ in track_ids:
    # this runs N times, where N = len(track_ids)
    ...
    tids.add(track_id)  # tids contains all track_ids processed until now
    # in the end: len(tids) == N
    ...
    features = sp.audio_features(tids)
    # features contains features of all tracks processed until now
    # in the end, I guess: len(features) == N * num_features_per_track

    urls = {x['analysis_url'] for x in features if x}
    # very probably: len(urls) == len(features)

    for url in urls:
        # for the first track, this processes features of the first track only
        # for the second track, this processes features of 1st and 2nd
        # etc.
        # in the end, this loop repeats N * N * num_features_per_track times

You should not process any url twice. And you do, because you keep all tracks in tids and then, for each new track, you re-process everything already in tids, which turns the complexity of this into O(n²).

In general, always look for loops inside loops when trying to reduce complexity.
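A stripped-down illustration of the difference (expensive_call is a hypothetical stand-in for the per-url analysis fetch):

def expensive_call(x):
    pass  # stand-in for a network fetch such as sp._get(url)

def quadratic(items):
    seen = set()
    for item in items:
        seen.add(item)
        for x in seen:            # re-processes everything seen so far
            expensive_call(x)     # runs 1 + 2 + ... + N times, about N*N/2

def linear(items):
    for item in items:
        expensive_call(item)      # runs exactly N times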

I believe this should work in this case, if audio_features accepts a set of ids:

# replace this: features = sp.audio_features(tids)
# with:
features = sp.audio_features({track_id})
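
Putting it together, here is a minimal sketch of the whole function with each track processed exactly once. This is not a definitive implementation: it assumes sp is the same authenticated spotipy client, that audio_features accepts the full id list in one call (which the original code's usage already suggests), and that each returned feature dict carries the track's uri:

def loudness_drops(track_ids):
    tracks_with_drop_id = set()
    # one batched call, outside the loop
    features = sp.audio_features(track_ids)
    for feat in features:
        if not feat:
            continue
        # each analysis url is fetched exactly once -> O(n) overall
        analysis = sp._get(feat['analysis_url'])
        l = [seg['loudness_max'] for seg in analysis['segments']]
        min_l, max_l = min(l), max(l)
        if max_l == min_l:
            continue  # flat loudness, nothing to normalize
        norm_l = [(v - min_l) / (max_l - min_l) for v in l]
        # silence: normalized loudness below 0.1, as in the question
        silence = [v for v, n in zip(l, norm_l) if n < .1]
        if len(silence) > 1:
            tracks_with_drop_id.add(feat['uri'])  # assumes feature dicts include 'uri'
    return tracks_with_drop_id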