[Dcmlib] [Fwd: pdf parser for generating XML like document]
Mathieu Malaterre
mathieu.malaterre at kitware.com
Tue Oct 25 21:10:53 CEST 2005
One more thing: KWord has a pretty decent PDF importer. You can even
select a range of pages to import. Open source is cool :)
Mathieu
Mathieu Malaterre wrote:
>
> No, PDF has no concept of tables, as such. It's just commands to select
> fonts and draw text, and some other commands to draw horizontal lines,
> etc.
>
> I don't know of any easy way to convert PDF to XML for the sort of
> application you're working on, sorry.
>
> - Derek
>
> -------- Original Message --------
> Subject: pdf parser for generating XML like document
> Date: Sun, 23 Oct 2005 17:31:56 -0400
>
> Hello,
>
> I did search for a mailing list on the following web site:
> http://www.foolabs.com/xpdf/
>
> and since I could not find it, I am writing to you directly.
>
> I have the following problem. DICOM is a file format that is specified
> by NEMA at:
>
> http://medical.nema.org/dicom/2004.html
>
> In particular, see the following document (1):
> http://medical.nema.org/dicom/2004/04_06PU.PDF
>
> The spec is huge. Therefore I am using pdftotext + python script to
> generate a custom output. You can find everything here:
>
> The Python script (basically, it takes as input the output of
> `pdftotext -raw -nopgbrk`):
> http://cvs.creatis.insa-lyon.fr/viewcvs/viewcvs.cgi/gdcm/Dicts/ParseDict.py
>
> And here is the cleaned-up output (Python script + hand editing):
> http://cvs.creatis.insa-lyon.fr/viewcvs/viewcvs.cgi/gdcm/Dicts/dicomV3.dic
>
> This is very difficult to maintain, as a new spec is released every year.
>
> Therefore I was wondering if you could give me some advice on how to
> parse the PDF document (1). Is there some table start/end marker in the
> PDF file that I can use? Is there any API in the PDF library that would
> allow me to generate an XML-like description of the PDF in a neutral
> way?
>
> Thanks so much for your time,
> Mathieu
> PS: If such a mailing list exists, forgive me and please give me the
> reference so that I can ask this question there.
>
> _______________________________________________
> Dcmlib mailing list
> Dcmlib at creatis.insa-lyon.fr
> http://www.creatis.insa-lyon.fr/mailman/listinfo/dcmlib
>
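
As a rough illustration of the pdftotext + Python approach described in the
quoted message, here is a minimal sketch. It is not the actual ParseDict.py.
It assumes each dictionary row comes out of `pdftotext -raw -nopgbrk` as a
single line of the form "(gggg,eeee) Name VR VM", which may not hold for
every row of the spec. The names ROW and rows_to_xml, and the XML element
names "dict" and "entry", are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed row layout (an assumption, not taken from the spec's PDF):
#   "(gggg,eeee) Name VR VM"
ROW = re.compile(
    r"^\((?P<group>[0-9A-Fa-fx]{4}),(?P<element>[0-9A-Fa-fx]{4})\)\s+"
    r"(?P<name>.+?)\s+"
    r"(?P<vr>[A-Z]{2}(?: or [A-Z]{2})*)\s+"
    r"(?P<vm>\S+)$"
)

def rows_to_xml(lines):
    """Build an XML tree from text lines, skipping lines that do not match."""
    root = ET.Element("dict")
    for line in lines:
        m = ROW.match(line.strip())
        if m is None:
            continue  # page headers, wrapped text, and other noise
        entry = ET.SubElement(
            root,
            "entry",
            group=m.group("group"),
            element=m.group("element"),
            vr=m.group("vr"),
            vm=m.group("vm"),
        )
        entry.text = m.group("name")
    return root

# Illustrative input lines only; real pdftotext output may differ.
sample = [
    "(0008,0005) Specific Character Set CS 1-n",
    "Page 12",
    "(0010,0010) Patient's Name PN 1",
]
root = rows_to_xml(sample)
print(ET.tostring(root, encoding="unicode"))
```

Since PDF has no table markers, a line-oriented filter like this one can only
guess at rows from their textual shape, so rows that wrap across lines would
still need hand cleanup.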