Tuesday, September 30, 2014

Review entire git commit history of a file, for security issues, passwords, sensitive information, etc.

Say you want to push a git repo that you've been working on privately to a publicly viewable place. How do you review not just the current working copy / HEAD for sensitive information, but the whole history of files that might contain details you don't want to make public? Something like this seems reasonably efficient:

git log --patch --reverse path/to/file.py \
  | grep '\(^+\|^commit\)' \
  | sed 's/^+//'
Including only lines starting with "+" or "commit" shows you only what's added to the file. As long as you start at the beginning, the deletions (lines starting with "-") and context (everything else) don't matter.

Deleting the '+' at the beginning of each line means you can dump the output from the above into your favorite editor to get syntax highlighting, which perhaps makes it easier to read. You might need to use a temporary file with an appropriate extension, e.g.:

git log --patch --reverse path/to/file.py \
  | grep '\(^+\|^commit\)' \
  | sed 's/^+//; s/^commit/#commit/' \
  >delme.py
(adding the # in front of commit lines is nice with syntax highlighting in Python)
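
The same filter can be sketched in Python, in case grep and sed aren't to hand. This is only an illustration that mirrors the pipeline above: the function name added_lines and the sample patch text are made up for the example.

```python
def added_lines(patch_text):
    """Keep commit headers and added lines from `git log --patch` output."""
    kept = []
    for line in patch_text.splitlines():
        if line.startswith("commit"):
            # comment out commit headers so Python highlighting stays sane
            kept.append("#" + line)
        elif line.startswith("+"):
            # drop the leading '+', as sed 's/^+//' does
            kept.append(line[1:])
    return kept


# Made-up sample, not real repository history
sample = """commit abc123
--- a/file.py
+++ b/file.py
@@ -0,0 +1,2 @@
+PASSWORD = 'hunter2'
 context line
-removed line
"""

print("\n".join(added_lines(sample)))
```

Like the grep version, this also keeps the "+++ b/file.py" header lines (minus their first "+"), which is harmless noise when reviewing.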

Thursday, April 19, 2012

Claws-mail - search for messages in date range

Claws-mail message search uses age in days to select messages by time, which is fine for "something more than two weeks ago but less than a month ago" (ag 14 & al 31) but not helpful for "a message in October 2010". This python script prints the Claws-mail extended search string for a given date range:

(replace raw_input with input for Python 3.x)

#!/usr/bin/python
import datetime
FROM = raw_input("From date YYYY-MM-DD: ")
TO = raw_input("To date YYYY-MM-DD: ")
FROM = datetime.datetime.strptime(FROM, "%Y-%m-%d")
TO = datetime.datetime.strptime(TO, "%Y-%m-%d")
NOW = datetime.date.today()
START = (NOW-FROM.date()).days
STOP = (NOW-TO.date()).days
print("al %d & ag %d" % (START, STOP-1))
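
For reuse, the same arithmetic could be wrapped in a function. A minimal Python 3 sketch: the name claws_age_query and the optional today parameter are additions for illustration, not part of the script above.

```python
import datetime


def claws_age_query(from_date, to_date, today=None):
    """Return the Claws-mail age search string for a YYYY-MM-DD date range."""
    today = today or datetime.date.today()
    start = datetime.datetime.strptime(from_date, "%Y-%m-%d").date()
    stop = datetime.datetime.strptime(to_date, "%Y-%m-%d").date()
    # same calculation as the script: ages in days, counted back from today
    return "al %d & ag %d" % ((today - start).days, (today - stop).days - 1)


print(claws_age_query("2010-10-01", "2010-10-31"))
```

Passing today explicitly makes the result repeatable, which helps when checking the output against a known date.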

Wednesday, November 30, 2011

Compiling QR reader for python / Ubuntu 11.10

Steps that apparently worked to get a QR code reader compiled for python (2.7) on Ubuntu 11.10 64-bit as of 2011-11-30.

(an alternative to the steps here might be this: http://pyqrcode.sourceforge.net/ but that didn't work easily for me in recent Ubuntus)

These instructions:
http://hi.baidu.com/paulau/blog/item/915e860ffbf7032c6059f34c.html

cover most of the steps, but lots of details need changing now vs. 2008.

The PyQrcodec_Linux.tar.gz mentioned in various places on the web has disappeared from www.pedemonte.eu - in fact that whole domain is gone. You can get it from http://gentoo.mirrors.pair.com/distfiles/PyQrcodec_Linux.tar.gz

The instructions above say:

sudo apt-get install g++
sudo apt-get install python-dev
sudo apt-get install libcv-dev libcvaux-dev

in addition you now need

sudo apt-get install libhighgui-dev

Untar and cd into the PyQrCodec folder.

then edit setup.py and add

extra_compile_args = ['-fpermissive'],

before each of the "sources = ..." lines.

then in the shell run (once and once only)
for file in $(grep -lre BEGIN ./*|grep -v svn); do sed -e "s/_BEGIN_/_CV_BEGIN_/;s/_END_/_CV_END_/" $file -i; done
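
The "once and once only" warning matters because the substitution matches its own output: a second pass turns _CV_BEGIN_ into _CV_CV_BEGIN_. As a hedged illustration, here is a Python sketch of a variant that is safe to repeat. The function name rename_macros and the sample string are made up for the example, not taken from the PyQrcodec sources, and unlike the sed command it replaces every occurrence on a line rather than just the first.

```python
import re


def rename_macros(text):
    """Rename _BEGIN_/_END_ unless they already carry the _CV prefix."""
    text = re.sub(r"(?<!_CV)_BEGIN_", "_CV_BEGIN_", text)
    text = re.sub(r"(?<!_CV)_END_", "_CV_END_", text)
    return text


line = "__BEGIN__; work(); __END__;"  # made-up sample line
once = rename_macros(line)
twice = rename_macros(once)
print(once)
print(once == twice)

# what the plain substitution does when run a second time
print(once.replace("_BEGIN_", "_CV_BEGIN_"))
```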

now python setup.py build and sudo python setup.py install
should work (with all kinds of warnings).

The usage example should be:

import PyQrcodec
size, image = PyQrcodec.encode('www.example.com')
image.save('example.png')
status, text = PyQrcodec.decode('example.png')
print(text)

.encode() params are:

Definition:     PyQrcodec.encode(string, image_width=400,
  case_sensitive=True, version=5,
  error_correction_level='QR_ECLEVEL_L',
  encoding_hint='QR_MODE_AN')
Docstring:
  Returns a PIL image of the QR code generated
  from the given string

Note: for http://upload.wikimedia.org/wikipedia/commons/thumb/9/9b/Wikipedia_mobile_en.svg/220px-Wikipedia_mobile_en.svg.png I had to flatten the image (replace the transparent background) for decode() to work.

Tuesday, June 21, 2011

Can I delete that branch? Check bzr branch relationships.

EDIT: It turns out bzr missing performs this function. Use bzr missing path/to/other/branch | head -n 2 to keep the important info from scrolling away. I'll assume bzr missing was added after I wrote this :-)


If you use bazaar (bzr) and end up with several branches for the same project, you can end up wondering if one branch contains all the commits in another, e.g. you need to check that all the work done in a successful experimental branch has been moved into the trunk.


This small python program does that:


leo.repo> bzrin free_layout trunk
Checking for commits of revs in 'free_layout' in 'trunk'
Status of 'free_layout':
unknown:
  .thumbnails/
  demo.jpg
  nohup.out
Status of 'trunk':
unknown:
  *.g1.dml
Counting revs in free_layout
6026 revs in free_layout
Counting revs in trunk
6683 revs in trunk
All revs in free_layout exist in trunk : OK

`bzrin` checks that all the commits in the "free_layout" branch have been merged into the trunk - in this case they have, and you can safely delete "free_layout".


The code uses `subprocess` rather than the python bzr bindings to do its work, but it gets the job done and has proved very useful for tidying up a directory full of branches for various subprojects.


#!/usr/bin/python
"""Check that all the commits in bzr branch A exist in bzr branch B
"""

# bzrin2
# Author: Terry Brown
# Created: Mon Sep  8 12:18:21 CDT 2008

import subprocess, sys, os
import tempfile  # ? because subprocess.PIPE hangs in .wait() ?

def emit(s):
    sys.stdout.write(s)

def main():
    branch = tuple(sys.argv[1:3])
    emit("Checking for commits of revs in '%s' in '%s'\n" % branch)

    # show status
    for i in branch:
        emit("Status of '%s':\n" % i)
        cmd = subprocess.Popen(('bzr status '+i).split())
        cmd.wait()

    revs = []

    for i in branch:
        emit("Counting revs in %s\n" % i)
        revs.append(set())
        tmpFile, tmpName = tempfile.mkstemp()
        cmd = subprocess.Popen(('bzr log --show-ids --levels=0 '+i).split(),
            stdout = tmpFile)
        os.close(tmpFile)
        cmd.wait()
        source = file(tmpName)
        for line in source:
            content = line.strip()
            if content.startswith('revision-id:'):
                id_ = content.split(None,1)[1]
                while not line.strip() == 'message:':
                    line = source.next()
                line = source.next()
                msg = []
                while not line.strip().startswith('-'*10):
                    msg.append(line.strip())
                    try:
                        line = source.next()
                    except StopIteration:  # end of file
                        break
                revs[-1].add((id_, tuple(msg)))
        os.remove(tmpName)
        emit("%d revs in %s\n" % (len(revs[-1]), i))

    diff = revs[0].difference(revs[1])

    if not diff:
        emit ("All revs in %s exist in %s : OK\n" % branch)
    else:
        emit ("WARNING: %s contains revs NOT in %s\n" % branch)
        for i in diff:
            # one message line per output line
            emit("%s\n%s\n" % (i[0], '\n'.join(['  '+m for m in i[1]])))
        emit ("WARNING: %s contains revs NOT in %s\n" % branch)

if __name__ == '__main__':
    main()
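
The core of the check is a set difference on revision ids. Here is a stripped-down Python 3 sketch of just that part. It compares ids only (the script above also compares commit messages), and the names revision_ids, missing_from and bzr_log, plus the sample log text, are made up for illustration. The bzr_log helper assumes that letting subprocess.run collect the output avoids the PIPE-plus-wait() hang that the temporary file works around; it isn't exercised here.

```python
import subprocess


def revision_ids(log_text):
    """Collect revision-id values from `bzr log --show-ids` style text."""
    ids = set()
    for line in log_text.splitlines():
        content = line.strip()
        if content.startswith("revision-id:"):
            ids.add(content.split(None, 1)[1])
    return ids


def missing_from(log_a, log_b):
    """Revision ids present in log_a but absent from log_b."""
    return revision_ids(log_a) - revision_ids(log_b)


def bzr_log(branch):
    """Fetch the log text for a branch (assumes bzr is on the PATH)."""
    done = subprocess.run(
        ["bzr", "log", "--show-ids", "--levels=0", branch],
        stdout=subprocess.PIPE, universal_newlines=True)
    return done.stdout


# Made-up log fragments, trimmed to the lines that matter
experimental = "revision-id: me@example-1\nmessage:\n  first\nrevision-id: me@example-2\n"
trunk = "revision-id: me@example-1\nmessage:\n  first\n"

print(missing_from(experimental, trunk))
```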

Thursday, October 28, 2010

Use multiple cores for shell scripts in Ubuntu

So you want to use all your CPU's cores for some shell-based batch processing task. It seems there should already be an app for that, and there is: parallel, in the moreutils package in Ubuntu. But it's not that easy to use the target file argument in a shell script. (Note: I think there may be more than one version of this utility; I'm referring to the one that ships with Ubuntu.)

For example, I wanted to use all cores for this operation:

for i in svg/*.svg; do f=$(basename $i); \
  inkscape-devel --without-gui \
  --export-background=white \
  --export-ps ps/${f%%svg}ps $i; \
  echo $f; done

parallel has an -i flag which enables replacement of {} with the target argument ($i in the above), but only if {} is surrounded by spaces and not quoted, which is hardly convenient for scripting. This simple wrapper (saved in a file called task, made executable and placed somewhere on your $PATH) gets around that problem:

# helper for parallel
#
# usage: task 'shell-pattern' 'shell commands'

GLOB="$1"
shift
SCRIPT=$(mktemp)
echo "$@" >"$SCRIPT"
chmod +x "$SCRIPT"
parallel "$SCRIPT" -- $GLOB
rm "$SCRIPT"

So now you can use $1 (not $i) in your shell code without any complications. The above example becomes:

task 'svg/*.svg' 'f=$(basename $1); inkscape-devel \
--without-gui --export-background=white --export-ps \
ps/${f%%svg}ps $1; echo $f'

...and running on four cores it's much quicker :-)
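
If Python is handy, a similar fan-out can be sketched with the standard library instead of parallel. This is an alternative, not what the wrapper above does: run_for_each is a made-up name, and the demo command is a harmless stand-in for the inkscape call. Threads are enough here because the real work happens in the child processes.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_for_each(command, targets, workers=4):
    """Run command + [target] for every target, `workers` at a time."""
    def run(target):
        done = subprocess.run(command + [target], stdout=subprocess.PIPE,
                              universal_newlines=True)
        return done.stdout.strip()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as targets
        return list(pool.map(run, targets))


# Stand-in command: upper-case the file name instead of converting it
upper = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]
print(run_for_each(upper, ["a.svg", "b.svg"]))
```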

Wednesday, May 26, 2010

Loading SQL tables column by column

Goal: Load data copied from a PDF table into an RDBMS table column by column, using SQL.

Selecting and copy/pasting the whole PDF table at once didn't extract the data in a clean or usable way; things got jumbled. But selecting one column at a time (using xpdf) cleanly extracted the data in that column. So how can you insert it into the table without messing up the ordering of each column's content? OMG! The Excel "reordering destroys data integrity" problem has come to SQL! :-) Anyway, given a table like this:

21  Ant  One
31  Bat  Two
76  Cat  Three
89  Dog  Four

The following approach will work (from a postgres / psql session):

create table rescued_data (
  col1 int,
  col2 text,
  col3 text,
  ordering int
);

create temp sequence s;
create temp table col (val text);

\copy col from stdin
21
31
76
89
\.

insert into rescued_data (col1, ordering)
  select val::int, nextval('s') from col;

-- note need to match type with ::int in the above

select setval('s', 1, false);  -- reset the sequence
truncate col;

\copy col from stdin
Ant
Bat
Cat
Dog
\.

update rescued_data set col2 = val
  from (select val, nextval('s') as seq from col) as x
  where seq = ordering;

-- repeating above for next column

select setval('s', 1, false);  -- reset the sequence
truncate col;

\copy col from stdin
One
Two
Three
Four
\.

update rescued_data set col3 = val
  from (select val, nextval('s') as seq from col) as x
  where seq = ordering;

select * from rescued_data;

-- if necessary, you can
alter table rescued_data drop column ordering;
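
The same idea, numbering each pasted column and joining on that number, can be sketched in Python. This version uses an in-memory SQLite database rather than Postgres so that it is self-contained; the function name load_columns is made up for the example.

```python
import sqlite3


def load_columns(columns):
    """columns: dict of column name -> values, every list in the same row order."""
    db = sqlite3.connect(":memory:")
    db.execute("create table rescued_data "
               "(col1 int, col2 text, col3 text, ordering int)")
    names = list(columns)
    # the first column creates the rows and fixes the ordering
    db.executemany(
        "insert into rescued_data (%s, ordering) values (?, ?)" % names[0],
        [(val, n) for n, val in enumerate(columns[names[0]], 1)])
    # later columns are matched to existing rows by position
    for name in names[1:]:
        db.executemany(
            "update rescued_data set %s = ? where ordering = ?" % name,
            [(val, n) for n, val in enumerate(columns[name], 1)])
    return db.execute("select col1, col2, col3 from rescued_data "
                      "order by ordering").fetchall()


rows = load_columns({
    "col1": [21, 31, 76, 89],
    "col2": ["Ant", "Bat", "Cat", "Dog"],
    "col3": ["One", "Two", "Three", "Four"],
})
print(rows)
```

The SQL above relies on Postgres returning rows from the freshly loaded temp table in insertion order, which is usual in practice but not something SQL guarantees. The sketch sidesteps that by numbering the values in Python before they reach the database. Column names are interpolated straight into the SQL, so only use trusted names.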

Sunday, May 16, 2010

Python/PyQt upgrade triggers strange bug

Percy: Look, look, I just can't take the pressure of all these omens anymore!
Edmund: Percy...
Percy: No, no, really, I'm serious! Only this morning in the courtyard I saw a horse with two heads and two bodies!
Edmund: Two horses standing next to each other?
Percy: Yes, I suppose it could have been.
Blackadder, "Witchsmeller Pursuivant"

Today I saw a bug with one head and two bodies. Upgrading Ubuntu from 9.10 to 10.04 broke a tool bar button in Leo, the world's best code editor / project manager / note sorter. The upgrade involved transitions from Python 2.6.4 -> 2.6.5 and PyQt 4.6 -> 4.7.2. The forward and back browsing buttons supplied by Leo's nav_qt plugin stopped working.

After Brain had been debugging, testing, googling, comparing etc. for over two hours, Intuition wanders past and says, "oh, ha, why not try

def __init__ (self,c):
         self.c = c
+        c._prev_next = self
         self.makeButtons()
Sometimes, Brain doesn't like Intuition very much.

Fortunately Brain was able to save some face, as

-        act_l = QtGui.QAction(icon_l, 'prev', ib_w)           
-        act_r = QtGui.QAction(icon_r, 'next', ib_w)           
+        act_l = QtGui.QAction(icon_l, 'prev', ib_w, triggered=self.clickPrev)   
+        act_r = QtGui.QAction(icon_r, 'next', ib_w, triggered=self.clickNext)  
was also required.

So it seems like the upgrade caused two changes which both had the same symptom, making debugging a challenge. It seems like the plugin class instance or the actions it was creating are now being garbage collected where they weren't before. The c._prev_next = self would prevent the instance being collected, although it's unclear that it should also prevent the actions being collected. You would think the GUI's links to the actions would be enough to protect them, so perhaps that bug body wasn't an old glitch going away, but a new one being introduced. OTOH the gui must have a link to the actions, as it's able to trigger them.
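
The garbage collection half of that guess is easy to show without PyQt. In this sketch Commander and NavButtons are made-up stand-ins for Leo's c and the plugin class; it only illustrates why storing the instance on c keeps it alive, and says nothing about what PyQt itself does with the actions.

```python
import gc
import weakref


class Commander(object):
    """Stand-in for Leo's commander object `c`."""


class NavButtons(object):
    """Stand-in for the plugin class that owns the buttons."""
    def __init__(self, c, keep_alive):
        self.c = c
        if keep_alive:
            c._prev_next = self  # the one-line fix: c now references us


def survives(keep_alive):
    c = Commander()
    # nothing else holds the new instance unless the fix is applied
    ref = weakref.ref(NavButtons(c, keep_alive))
    gc.collect()
    return ref() is not None


print(survives(False))
print(survives(True))
```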

The triggered=self.clickPrev addition presumably covers a change in the emission of 'clicked()' by QToolButton, or a change in default actions, or something. Passing the parameter that way is a PyQt alternative to act_r.connect(act_r, QtCore.SIGNAL("triggered()"),self.clickNext), which would probably also have worked.