So I ran my summarizer yesterday and it took literally all day to run only 200 products through the lex sum function. So I went through my code and added a timer for each major step in the process like so:
start = time.time() asin_list = get_asins(limit) end = time.time() print('Get ASINs: ', end - start)
Turns out it was taking over 60 seconds per query . I did the math and at the rate it was going, it would take almost two years to complete every product in my database. So I started looking around at different ways to group large databases. Turns out databases are a lot more complicated than I believed. It felt like looking for a PHP solution back in high school when I didn't know enough to know what to look for. Finally I stumbled upon a feature called Indexing. First I added the indexing code inside of my script, which had no effect, but it seemed like it had worked properly. Still though I was not going to give up that easy and I decided to open up postgres directly in the terminal and poke around to see if the indexing was applied properly. Turns out that it was not applied at all. Here is the code I used to index the asin table in reviews:
# Remote Connect
postgres psql -U ryan -h 126.96.36.199 -p 5432 databasename
# Display table Indexes SELECT * FROM pg_indexes WHERE tablename = 'reviews'; # Create Index CREATE INDEX asin_index ON reviews (asin);
Ureka! It worked, now the script that took all day to run yesterday ran in about a minute flat! That is the biggest difference in performance time I've ever experienced and I cant wait to see where else indexing will help my databases.
Other than that, Erin showed me a bunch of stuff in illustrator and Phototshop.
- ctrl+click with select tool enables auto-select
- ctrl+d — deselect
- ctrl+shift+i — invert selection
- ctrl+j — duplicate layer
- ctrl+alt+j — duplicate and name layer