dashwood.net -

Ryan Stefan's Micro Blog

Fix PGP Keys - apt-get update error

Nov 03 2019

Can't install security updates:

sudo apt-get update && sudo apt-get dist-upgrade

First, enter the following command in the terminal (it deletes apt's cached package lists, which are rebuilt on the next update):

sudo rm /var/lib/apt/lists/* -vf


Then update your system by entering the following command in the terminal:

sudo apt-get update && sudo apt-get upgrade


After this there should be no errors and everything should work fine.

The key(s) in the keyring /etc/apt/trusted.gpg.d/*** are ignored as the file has an unsupported filetype.

Installing from Github with pipenv - Fix pip on Linux

Jun 22 2019

Obviously not every package on GitHub is going to be available via pip, but downloading and installing manually clutters up your project directory, which defeats the purpose of using pipenv in the first place. However, you can install a package from its git URI with pipenv just like you can with pip. Here's what you type (use https:// rather than git://, since GitHub no longer supports the unauthenticated git:// protocol):

pipenv install -e git+https://github.com/user/project.git#egg=<project>

Pretty simple, right? Here's an example of one that I've used recently, just in case:

pipenv install -e git+https://github.com/miso-belica/sumy.git#egg=sumy

Which is the command to install this package: https://github.com/miso-belica/sumy


If you get "pipenv: command not found", use this to fix it:

sudo -H pip install -U pipenv

For Scrapy with Python 3, you'll need:

sudo apt-get install python3 python3-dev \
     build-essential libssl-dev libffi-dev \
     libxml2-dev libxslt1-dev zlib1g-dev \
     python3-pip

With Python 2, you'll need:

sudo apt-get install python-dev \
     build-essential libssl-dev libffi-dev \
     libxml2-dev libxslt1-dev zlib1g-dev \
     python-pip

Multithreading and Run Once Decorator

Jun 11 2019

I've been coding again and just remembered how well this website works for keeping track of cool tricks I learn. Sometimes it's really hard to find simple and generic examples of things to help teach the fundamentals. I needed to write to a file without opening the text document 1000 times and I finally found a really clean example that helped me understand the pieces.

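Edit: ThreadPool is a lot easier, and you can thread inside a loop:

from multiprocessing.pool import ThreadPool as Pool

def function(item):
    # placeholder for the real per-item work
    return item * 2

threads = 100
items = range(1000)  # whatever you're iterating over

with Pool(threads) as p:
    results = p.map(function, items)  # blocks until every item is done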

More complicated version:

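import threading

lock = threading.Lock()

def thread_test(num, f):
    phrase = "I am number " + str(num)
    # only one thread at a time may print and write to the file
    with lock:
        print(phrase)
        f.write(phrase + "\n")

threads = []
with open("text.txt", "w") as f:  # opened once, shared by every thread
    for i in range(100):
        t = threading.Thread(target=thread_test, args=(i, f))
        threads.append(t)
        t.start()
    # wait for all threads instead of busy-looping on activeCount()
    for t in threads:
        t.join()
# leaving the with-block closes the file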

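Close something when a Scrapy spider closes, without using a pipeline (scrapy.xlib.pydispatch has been removed from current Scrapy, so connect the signal in from_crawler instead):

from scrapy import signals
from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):
    name = "myspider"  # placeholder; every spider needs a name

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # call spider_closed when the spider_closed signal fires
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        # second param is the instance of the spider about to be closed;
        # close files, connections, etc. here
        spider.logger.info("Closing %s", spider.name)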

Instead of using an if-time or if-count check to trigger something, I found a decorator that makes sure the function only runs once:

def run_once(f):
    def wrapper(*args, **kwargs):
        if not wrapper.has_run:
            wrapper.has_run = True
            return f(*args, **kwargs)
    wrapper.has_run = False
    return wrapper


@run_once
def my_function(foo, bar):
    return foo+bar
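Worth knowing: after the first call the wrapper silently returns None. Here's a quick usage sketch of the same decorator, with functools.wraps added (my addition, not part of the original snippet) so the decorated function keeps its name and docstring:

```python
import functools


def run_once(f):
    """Call f only the first time; later calls return None without running it."""
    @functools.wraps(f)  # copy f's __name__/__doc__ onto the wrapper
    def wrapper(*args, **kwargs):
        if not wrapper.has_run:
            wrapper.has_run = True
            return f(*args, **kwargs)
        return None  # every call after the first is skipped
    wrapper.has_run = False
    return wrapper


@run_once
def my_function(foo, bar):
    return foo + bar


first = my_function(1, 2)   # runs the function
second = my_function(3, 4)  # skipped, gives None
```

Since has_run is just an attribute, setting my_function.has_run = False re-arms it. The check isn't guarded by a lock, so two threads calling at the same instant could both get through.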

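You can also resize the terminal from inside the code (this sends an xterm escape sequence, so it only works in terminals that support it):

import sys
sys.stdout.write("\x1b[8;{rows};{cols}t".format(rows=46, cols=54))
sys.stdout.flush()  # make sure the sequence is actually sent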

I got stuck for a while trying to get my repository to let me log in without creating an SSH key (super annoying imo). It turned out I had set the origin URL to the SSH URL, and I needed to reset it to the HTTPS one:

Change the origin URL:
git remote set-url origin <url-with-your-username>

Combine mp3 files with Linux:

ls *.mp3
sudo apt-get install mp3wrap
mp3wrap output.mp3 *.mp3

Regex is always better than splitting a bunch of times and making the code messy. Plus it's a lot easier to pick up the code later on and figure out what's going on. So I decided to take my regex to the next level and start naming groups (I'm even going to give it its very own tag :3):

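import re

# Python spells a named group (?P<name>...), not (?<name>...)
pat = r'(?<=,"searchResults":\{)(?P<list_results>.*)(?=,"resultsHash":)'

# re.search scans the whole string; re.match only tries position 0,
# where the lookbehind can never succeed
m = re.search(pat, url)
if m:
    self.domain = m.group('list_results')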
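To see the named group on its own, here's a self-contained sketch run against a made-up sample string (the sample and the extract_results helper are hypothetical, not real search-results data). The pattern is compiled once and the group is read back by name:

```python
import re

# lookbehind + named group + lookahead: capture what sits between the markers
PAT = re.compile(r'(?<=,"searchResults":\{)(?P<list_results>.*)(?=,"resultsHash":)')


def extract_results(text):
    """Return the text between the two markers, or None if they're absent."""
    m = PAT.search(text)
    return m.group('list_results') if m else None


sample = '{"a":1,"searchResults":{"listResults":[1,2]},"resultsHash":"x"}'
print(extract_results(sample))
```

Because .* is greedy, the capture runs up to the last ,"resultsHash": in the string and includes the closing brace of the searchResults object.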

Postgres Password Reset and Allowing Connections

Feb 10 2019

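No password on postgres user fix:

In pg_hba.conf, temporarily let local connections in without a password:

local  all   all   trust

Then restart the server so the change takes effect:

sudo /etc/init.d/postgresql restart

Connect with psql -U postgres and set a new password:

ALTER USER postgres with password 'newpassword';

Then you can add your user account as a superuser:

ALTER ROLE ryan with SUPERUSER;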

Afterwards, change your pg_hba.conf file back to something that asks for authentication again (current PostgreSQL versions also require a USER column), like this:

local  all        all                                      peer
host   all        all   127.0.0.1      255.255.255.255     md5
host   booktown   all   192.168.1.3    255.255.255.255     ident  map=sales
host   all        all   192.168.1.4    255.255.255.255     ident  map=audit

Increasing File Limits on Linux Mint: Errno 24 Too Many Open Files

Feb 10 2019

This problem has been driving me nuts for a while now, but on the bright side it got me in the habit of closing all of my files. The solution should have been obvious, but like many things in my life, it just didn't click. The problem is that I was trying to change global OS settings from a non-root account. You have to change the config files from the root account and give all the other accounts the ability to raise their limits. Also note that the command "sudo ulimit -n 9000000" does not work, because ulimit is a shell builtin rather than a program that sudo can run.

Solution

Temporarily extend limits by switching to root and typing:

ulimit -n 900000

It's better to extend it for all users though, so that you don't have to do everything as root. It took me a while to figure this out because I was editing config files from my user account, but you have to switch to root and edit the config there to change the allowed limits.

sudo su root
cd /etc/security
rmate limits.conf

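Then add this to the file, save, and log out and back in (the limits are applied at login). The root lines are there because the * wildcard doesn't apply to root: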

* soft nofile 900000
* hard nofile 900000
<user> soft nofile 900000
<user> hard nofile 900000
root soft nofile 900000
root hard nofile 900000
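To check what limit a Python process actually got, and to raise its soft limit up to the hard limit from limits.conf without root, the standard-library resource module (Unix only) can be used. A minimal sketch; raise_nofile_limit is a hypothetical helper name:

```python
import resource


def raise_nofile_limit(target=None):
    """Raise this process's soft open-file limit (default: up to the hard limit).

    An unprivileged process can move its soft limit anywhere up to the hard
    limit, but cannot raise the hard limit itself.
    Returns the (soft, hard) pair in effect afterwards.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if target is None:
        target = hard
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Calling raise_nofile_limit() near the start of a scraper returns the new (soft, hard) pair; if the hard value is still low, the limits.conf change hasn't taken effect for that login session yet.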

Windows WSL Tips and Solved Issues

Dec 22 2018

I've completely switched over to Windows 10 with WSL on my main development computer and it's going pretty well. I just can't stand coding in Windows because everything is different and nothing works as well as it does on Linux. My job requires a lot of design work, so having my home computers on Linux was not very practical. So when I heard about a native Linux subsystem I jumped at it. I will be putting any issues that I solve in this article.

Getting Rsub Working with Windows WSL & Ubuntu 18.04

  1. Add rsub to sublime with package control (on Windows)
  2. Install & configure rmate (on Linux)
  3. Install openssh-server (on Linux)
  4. configure ssh (on Linux)
  5. add bashrc script with sudo and -f (on Linux)

Install & Configure Rmate

pip install rmate
sudo nano /etc/rmate.rc

127.0.0.1
52698

ctrl+o
ctrl+x

Install & Configure Openssh Server

sudo apt install openssh-server

sudo nano /etc/ssh/sshd_config

Port 2222
ListenAddress 0.0.0.0
Protocol 2
PasswordAuthentication yes
StrictModes no

ctrl+o
ctrl+x

sudo nano /etc/ssh/ssh_config

Host *
    RemoteForward 52698 localhost:52698
    Port 2222
    SendEnv LANG LC_*
    HashKnownHosts yes
    GSSAPIAuthentication yes

sudo service ssh --full-restart

Bashrc Configurations

nano ~/.bashrc

Open any file with Sublime Text

I plan on expanding this so that it can open on other windows drives like E:/

function subl {
 CUR_PATH=$(readlink -f "$1")
 if [[ $CUR_PATH == /mnt/c/* ]]; then
 "/mnt/c/Program Files/Sublime Text 3/subl.exe" "C:${CUR_PATH:6}"
 else
 sudo rmate "$CUR_PATH" -f
 fi
}
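wslpath -w already handles any drive. As a sketch of the mapping the subl function does by hand, generalized to other drive letters like E:/, here's a hypothetical to_windows_path helper in Python. It assumes WSL's default automount root of /mnt/<letter>; a custom root set in wsl.conf would break it:

```python
import re

# /mnt/<drive letter>[/rest]  ->  <DRIVE>:\rest
_MNT = re.compile(r'^/mnt/([a-zA-Z])(/.*)?$')


def to_windows_path(linux_path):
    """Map a WSL /mnt/<drive>/... path to a Windows path, or None if it isn't one."""
    m = _MNT.match(linux_path)
    if not m:
        return None  # e.g. /home/... lives inside the Linux filesystem
    drive = m.group(1).upper()
    rest = m.group(2) or '/'  # bare /mnt/d maps to the drive root
    return drive + ':' + rest.replace('/', '\\')
```

The None case corresponds to the else branch in subl, where the file is opened through rmate instead.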

Convert and Open Shell Directory in Explorer

  • $(...) and backticks both run wslpath in a subshell and substitute its output (neither one adds or strips quotes)
  • "$PWD/$1" is in quotes because spaces in directory names break the wslpath call
  • the outer quotes keep the converted path as a single argument to explorer.exe
  • /$1 is an optional parameter for a subdir path
open() { explorer.exe "$(wslpath -w "$PWD/$1")"; }

Handy Bash Aliases

alias bashrc='subl ~/.bashrc' # open bashrc config
alias rbash='. ~/.bashrc' # reset bash shell to use changes

alias startredis='sudo service redis-server start'
alias stopredis='sudo service redis-server stop'

Windows Python Path Conflicting with Pipenv

This one is pretty annoying. I installed Python 3.7 on my Windows computer so that I could do linting in Sublime Text, and it caused pipenv to start using that path for the --three flag. I suppose I could have specified a different version, but I assumed there would be a way to turn off the Windows Python path inside WSL. I tried a few different ways, but none of them worked. I gave up and just made a bash function that points to my Linux path:

##! Don't install packages with this, it will break dependency matching
pipenv3() { pipenv --python=/usr/bin/python3 install "$@"; }

Note: bash doesn't expand variables inside single quotes, so "$@" has to stay in double quotes.

Other Things

  • ConEmu as my bash terminal
  • DejaVu Sans Mono font for everything (11pt)
  • Started saving appdata inside Google Drive
  • win+x shows the "Power Menu"
  • win+→ or win+← snaps the window to half the screen
  • DisplayFusion allows shortcuts on the secondary taskbar
  • Stickies: sticky notes that minimize to the tray
  • MusicBee: powerful music player that remembers your spot

Side Note about Pip

Something that has been bothering me for a while now is whether I should install pipenv with pip or pip3. Turns out that pip isn't necessarily the Python 2 version of pip: pip3 always belongs to Python 3 and pip2 to Python 2, while plain pip belongs to whichever Python is in charge in the current context (an active virtualenv, or whichever install comes first on your PATH). So the obvious answer is to install it using plain pip.

"pip3 always operates on the Python3 environment only, as pip2 does with Python2. pip operates on whichever environment is appropriate to the context."

Use "sudo apt install python3-pip" on Ubuntu (there's no apt package called just pip). Doesn't work well on Mint.

Setting up Postgresql Properly

sudo apt install postgresql
sudo service postgresql start
sudo su - postgres
createuser --superuser ryan
psql # <- command line tool for making queries
\password ryan
\q # <- exit psql to create new users/dbs or import/export db's (psql is for sql)
createdb ryan # or whatever
exit # now you can run psql in your own console with your username

#start automatically
sudo systemctl enable postgresql

Setting up Redis

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install redis-server

sudo service redis-server start
sudo service redis-server stop
sudo service redis-server restart

# running just redis-server will force you to keep a bash window open
# I usually just create a bashrc alias for this /shrug

# for automatically starting redis enter

sudo systemctl enable redis-server


Search Inside Files - The Power of Grep

Nov 11 2018

I noticed that my blog has footprints showing which script it runs on, which might lead someone to seek it out and try to exploit it. So I was digging through my code looking for an error message so I could change it, and was getting very frustrated. Then I looked into searching for strings inside files and it worked instantly:

# Recursive Find
grep -rnw '<file path>' -e '<string to find>'

# Recursive Replace (-Z with xargs -0 keeps file names with spaces intact)
grep -lRZ "<search_phrase>" <file path> | xargs -0 sed -i 's/<search_phrase>/<replace_phrase>/g'

# Current Dir
grep -lRZ "<search_phrase>" . | xargs -0 sed -i 's/<search_phrase>/<replace_phrase>/g'

Original Post Text

grep -rnw '/path/to/somewhere/' -e 'pattern'

  • -r or -R is recursive,
  • -n is line number, and
  • -w stands for match the whole word.
  • -l (lower-case L) can be added to just give the file name of matching files.


Along with these, --exclude, --include, --exclude-dir flags could be used for efficient searching:

This will only search through those files which have .c or .h extensions:

grep --include=\*.{c,h} -rnw '/path/to/somewhere/' -e "pattern"
This will exclude searching all the files ending with .o extension:

grep --exclude=*.o -rnw '/path/to/somewhere/' -e "pattern"
For directories it's possible to exclude a particular directory(ies) through --exclude-dir parameter. For example, this will exclude the dirs dir1/, dir2/ and all of them matching *.dst/:

grep --exclude-dir={dir1,dir2,*.dst} -rnw '/path/to/somewhere/' -e "pattern"
This works very well for me, to achieve almost the same purpose like yours.

For more options check man grep.
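For cases where you want the results inside a script, a rough Python analogue of grep -rnw with --include and --exclude-dir can be sketched with os.walk. grep_words is a hypothetical name, and \b only approximates grep's -w word rules:

```python
import fnmatch
import os
import re


def grep_words(root, word, include_ext=None, exclude_dirs=()):
    """Yield (path, line_number, line) for whole-word matches of `word`.

    Rough Python analogue of:  grep -rnw root -e word
      include_ext:  suffixes such as ('.c', '.h')            (like --include)
      exclude_dirs: glob patterns for directory names        (like --exclude-dir)
    """
    # \b approximates grep -w: the match must not touch letters, digits or _
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    for dirpath, dirnames, filenames in os.walk(root):
        # prune in place so os.walk never descends into excluded dirs
        dirnames[:] = [d for d in dirnames
                       if not any(fnmatch.fnmatch(d, pat) for pat in exclude_dirs)]
        for name in filenames:
            if include_ext and not name.endswith(tuple(include_ext)):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding='utf-8', errors='ignore') as f:
                    for lineno, line in enumerate(f, start=1):
                        if pattern.search(line):
                            yield path, lineno, line.rstrip('\n')
            except OSError:
                continue  # unreadable file: skip it (grep would print an error)
```

For example, list(grep_words('/path/to/somewhere/', 'pattern', include_ext=('.c', '.h'), exclude_dirs=('dir1', 'dir2', '*.dst'))) mirrors the combined --include/--exclude-dir commands above.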


Sharing and Flask Dev

Nov 10 2018

Added NTFS folder sharing over the network without my user actually owning the folder. Here's how I enabled it: add usershare owner only = false below [global]

sudo nano /etc/samba/smb.conf

# Any line which starts with a ; (semi-colon) or a # (hash) 
# is a comment and is ignored. In this example we will use a #
# for commentary and a ; for parts of the config file that you
# may wish to enable
#
# NOTE: Whenever you modify this file you should run the command
# "testparm" to check that you have not made any basic syntactic 
# errors. 
#

#======================= Global Settings =======================

[global]

usershare owner only = false

## Browsing/Identification ###

ctrl + o to save, ctrl + x to exit (then run testparm to check the syntax)

Fix NTFS Permissions

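Found some hopeful-looking insight on how to give users access to mounted drives.

If you mount a partition to a folder within /home/user, the user can get at it there. For ext4 the mount point takes on the ownership of the filesystem's own root directory, so a one-time sudo chown may still be needed; an NTFS partition needs uid=/gid= mount options instead. Here's the line I added to my /etc/fstab: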

UUID=9e5bb53c-4443-4124-96a8-baeb804da204 /home/fragos/Data ext4 errors=remount-ro 0 1

Keyword Raking / Splitting

Going to rake keywords from the comments and then use a one-sentence lexsum of all of the titles for the loop display and other stuff.

from rake_nltk import Metric, Rake

# Rake keywords (textjoin and sumkeywords come from the surrounding script)
rake = Rake(min_length=2, max_length=6,
            ranking_metric=Metric.DEGREE_TO_FREQUENCY_RATIO)
rake.extract_keywords_from_text(textjoin)
sumkeywords.append(' : '.join(rake.get_ranked_phrases()))

Source: https://github.com/csurfer/rake-nltk

I had to change the word tokenizer in the class to NLTK's Twitter tokenizer (TweetTokenizer) so that it wouldn't split words on apostrophes.

from nltk.tokenize import wordpunct_tokenize, TweetTokenizer
tknzr = TweetTokenizer()

...

word_list = [word.lower() for word in tknzr.tokenize(sentence)]

I've also decided to use ' : ' as my official list of terms splitting format. Commas are too common and might add complications in the future.
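Commas inside phrases survive the round trip with this format, which is the point; the one thing to guard against is a phrase that itself contains ' : '. A small helper pair as a sketch (join_terms and split_terms are hypothetical names):

```python
SEP = ' : '


def join_terms(terms):
    """Join keyword phrases with ' : ', refusing phrases that contain it."""
    for t in terms:
        if SEP in t:
            raise ValueError('term contains separator: {!r}'.format(t))
    return SEP.join(terms)


def split_terms(joined):
    """Inverse of join_terms; an empty string gives an empty list."""
    return joined.split(SEP) if joined else []
```

Splitting an empty string would otherwise give [''], which is why split_terms special-cases it.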

Flask Dev

I used the CSV file generated by the lexsum generator to preview the summaries and keyword extraction in the Flask app.

import pandas as pd

# load data and create sub dataframe for product asin
# (asin comes from the Flask route; product is the dict passed to the template)
data = pd.read_csv('./static/data/sample-products.csv', index_col=0)
product_comments = data.loc[data['asin'] == asin]

# create variables for each rating, skipping ratings with no rows
for number in range(1, 6):
    current = product_comments.loc[product_comments['rating'] == number]
    if current.empty:
        continue
    product['{}_keywords'.format(number)] = current['keywords'].iloc[0]
    product['{}_title'.format(number)] = current['title'].iloc[0]
    product['{}_text'.format(number)] = current['text'].iloc[0]

# load variables inside flask template
<p>{{product['4_text']}}</p>
<p><strong>{{product['4_keywords']}}</strong></p>


Lexsum in Action

Nov 09 2018

I finally got around to working on my Amazon project again. 

Misc Notes

# Change postgres data directory

Set data_directory in this file:
/etc/postgresql/10/main/postgresql.conf

File System Headache

I decided to clean up my hard drives, but I forgot what a headache it is to get an NTFS drive working with transmission-daemon. Whatever, I'll just save to my ext4 partition for now and fix it later.

*Update

I bricked my OS install and went through a 3-hour nightmare trying to fix it. I eventually discovered that the culprit was the label of my old partition's mount point in the fstab file. Solution:

sudo nano /etc/fstab

# comment out old label

ctrl + o to save
ctrl + x to exit

reboot

My computer still doesn't restart properly because I broke something in the boot order trying to fix it. Not a big deal I just enter my username/password in the terminal then type startx.

LexSum Progress

Had to slice to 50 for each rating to save time, but I can probably make it longer for launch. At first I was thinking there would be 60 million entities to process, but actually it's more like 900k x 5 (one for each rating), and as long as I don't lexsum 1000+ reviews for a rating it should finish in a few days. I reallllly need to add a timer function asap. I can just time 1000 or so products, divide to get the time per product, and multiply that by 900k or whatever the total number of products in my database is, and I should have a pretty good idea how long it will take.

if len(titles) > 50:
    titlejoin = ' '.join(lex_sum(' '.join(titles[:50]), sum_count))
    textjoin = ' '.join(lex_sum(' '.join(comments[:50]), sum_count))
else:
    titlejoin = ' '.join(lex_sum(' '.join(titles), sum_count))
    textjoin = ' '.join(lex_sum(' '.join(comments), sum_count))

I'm thinking I can clean these lines up now that I'm staring at them. Slicing past the end of a list just returns the whole list, so the length check isn't needed at all:

titlejoin = ' '.join(lex_sum(' '.join(titles[:50]), sum_count))
textjoin = ' '.join(lex_sum(' '.join(comments[:50]), sum_count))

My estimated-time-remaining function adds the time elapsed every ten iterations to a list, takes the last 500 (or fewer) entries of that list and averages them, and finally multiplies that average by the number of remaining blocks of ten and displays it in a human-readable format:

import time

avg_sec = 0
times = []
start = time.time()

# --- inside the main loop; count and limit come from the loop ---
# Display time remaining
if avg_sec:
    seconds_left = ((limit - count) / 10) * avg_sec
    m, s = divmod(seconds_left, 60)
    h, m = divmod(m, 60)
    print('Estimated Time Left: {}h {}m {}s'.format(
        round(h), round(m), round(s)))

# every 10 iterations, record how long the last block of 10 took
if not count % 10:
    end = time.time()
    times.append(end - start)
    start = end
    recent = times[-500:]  # average over at most the last 500 blocks
    avg_sec = sum(recent) / len(recent)
    print('Average time per 10:', round(avg_sec, 2), 'seconds')
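The same estimate can be packaged as a small class so the bookkeeping doesn't clutter the loop. This is a sketch under my own assumptions: EtaTracker, tick, seconds_left and format_hms are hypothetical names, a deque with maxlen=500 replaces the manual times[-500:] slice, and the clock is injectable so the math can be checked without waiting:

```python
import time
from collections import deque


class EtaTracker:
    """Estimate time remaining from a rolling average of block durations."""

    def __init__(self, total, block=10, window=500, clock=time.monotonic):
        self.total = total                  # total number of iterations
        self.block = block                  # record a duration every `block` items
        self.times = deque(maxlen=window)   # keeps only the newest `window` blocks
        self.clock = clock
        self.start = clock()

    def tick(self, count):
        """Call once per iteration with the number of items done so far."""
        if count and count % self.block == 0:
            now = self.clock()
            self.times.append(now - self.start)
            self.start = now

    def seconds_left(self, count):
        """Average block time x remaining blocks, or None before the first block."""
        if not self.times:
            return None
        avg_block = sum(self.times) / len(self.times)
        return (self.total - count) / self.block * avg_block

    @staticmethod
    def format_hms(seconds):
        m, s = divmod(int(seconds), 60)
        h, m = divmod(m, 60)
        return '{}h {}m {}s'.format(h, m, s)
```

In the main loop you'd call tracker.tick(count) every iteration and print EtaTracker.format_hms(tracker.seconds_left(count)) whenever seconds_left isn't None.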

Another thought I had is that this save_df module I coded (it's at like 400 lines of code already x_x) is actually a crucial part of my ultimate code base. I'm pretty happy that I spent so much time writing it into proper functions.

Fixed ppc64el apt-get Error THANK GOD

Oct 28 2018

Today was decent. I managed to fix my main Linux install by removing the ppc64el dpkg architecture that was breaking it, hallelujah!

sudo apt-get purge ".*:ppc64el"
sudo dpkg --remove-architecture ppc64el

I got cuda samples running on it too and have my scraping back up and running. Oh, Amazon changed their date format slightly, but that was an easy fix.

I've been thinking a lot about the party last night and am really going to have to start working on the visualize technique Ron suggested.

Dad and I went over some potential pen holder kit ideas and I think we have a good handle on what we want to do.