Goal: Load data copied from an PDF table into a RDMS table column by column, using SQL.
Selecting and copy/pasting the whole PDF table at once didn't extract the data in clean or usable way, things got jumbled. But selecting one column at a time (using xpdf) cleanly extracted the data in that column. But how can you insert it into the table without messing up the ordering of each columns content? OMG! The Excel "reordering destroys data integrity" problem has come to SQL! :-) Anyway, given a table like this:
21 | Ant | One |
31 | Bat | Two |
76 | Cat | Three |
89 | Dog | Four |
The following approach will work (from a postgres / psql session):
create table rescued_data ( col1 int, col2 text, col3 text, ordering int ); create temp sequence s; create temp table col (val text); \copy col from stdin 21 31 76 89 \. insert into rescued_data (col1, ordering) select val::int, nextval('s') from col; -- note need to match type with ::int in the above select setval('s', 1, false); -- reset the sequence truncate col; \copy col from stdin Ant Bat Cat Dog \. update rescued_data set col2 = val from (select val, nextval('s') as seq from col) as x where seq = ordering; -- repeating above for next column select setval('s', 1, false); -- reset the sequence truncate col; \copy col from stdin One Two Three Four \. update rescued_data set col3 = val from (select val, nextval('s') as seq from col) as x where seq = ordering; select * from rescued_data; -- if necessary, you can alter table rescued_data drop column ordering;
No comments:
Post a Comment