Wednesday, May 26, 2010

Loading SQL tables column by column

Goal: Load data copied from an PDF table into a RDMS table column by column, using SQL.

Selecting and copy/pasting the whole PDF table at once didn't extract the data in clean or usable way, things got jumbled. But selecting one column at a time (using xpdf) cleanly extracted the data in that column. But how can you insert it into the table without messing up the ordering of each columns content? OMG! The Excel "reordering destroys data integrity" problem has come to SQL! :-) Anyway, given a table like this:

21AntOne
31BatTwo
76CatThree
89DogFour

The following approach will work (from a postgres / psql session):

create table rescued_data (
  col1 int,
  col2 text,
  col3 text,
  ordering int
);

create temp sequence s;
create temp table col (val text);

\copy col from stdin
21
31
76
89
\.

insert into rescued_data (col1, ordering)
  select val::int, nextval('s') from col;

-- note need to match type with ::int in the above

select setval('s', 1, false);  -- reset the sequence
truncate col;

\copy col from stdin
Ant
Bat
Cat
Dog
\.

update rescued_data set col2 = val
  from (select val, nextval('s') as seq from col) as x
  where seq = ordering;

-- repeating above for next column

select setval('s', 1, false);  -- reset the sequence
truncate col;

\copy col from stdin
One
Two
Three
Four
\.

update rescued_data set col3 = val
  from (select val, nextval('s') as seq from col) as x
  where seq = ordering;

select * from rescued_data;

-- if necessary, you can
alter table rescued_data drop column ordering;

No comments:

Post a Comment