DocHandler
dcmspec.doc_handler.DocHandler
Base class for DICOM document handlers.
Handles DICOM documents in various formats (e.g., XHTML, PDF).
Subclasses must implement the load_document method to handle reading and parsing input files. The base class provides a generic download method for both text and binary files.
Source code in src/dcmspec/doc_handler.py
__init__(config=None, logger=None)
Initialize the document handler with an optional config and logger.
PARAMETER | DESCRIPTION |
---|---|
config | Config instance to use. If None, a default Config is created. |
logger | Logger instance to use. If None, a default logger is created. |
clean_text(text)
Clean text content before saving.
Subclasses can override this to perform format-specific cleaning (e.g., remove ZWSP/NBSP for XHTML). By default, returns the text unchanged.
PARAMETER | DESCRIPTION |
---|---|
text | The text content to clean. |

RETURNS | DESCRIPTION |
---|---|
str | The cleaned text. |
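As an illustration of the override hook described above, here is a minimal sketch. `BaseHandler` and `XhtmlLikeHandler` are hypothetical stand-ins written for this example, not classes from dcmspec; only the `clean_text(text)` signature and its default pass-through behavior come from the description. Replacing NBSP with a regular space (rather than deleting it) is a choice made here for readability.

```python
class BaseHandler:
    """Hypothetical stand-in for DocHandler; only clean_text is modeled."""

    def clean_text(self, text: str) -> str:
        # Default behavior as documented: return the text unchanged.
        return text


class XhtmlLikeHandler(BaseHandler):
    """Hypothetical subclass that strips invisible characters."""

    def clean_text(self, text: str) -> str:
        # Drop zero-width spaces (U+200B) and turn non-breaking
        # spaces (U+00A0) into ordinary spaces (assumption of this sketch).
        return text.replace("\u200b", "").replace("\u00a0", " ")


raw = "Patient\u200b\u00a0Name"
default_result = BaseHandler().clean_text(raw)       # unchanged
cleaned_result = XhtmlLikeHandler().clean_text(raw)  # "Patient Name"
```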
download(url, file_path, binary=False)
Download a file from a URL and save it to the specified path.
By default, the file is saved as UTF-8 text; if binary is True, it is saved as binary (for PDFs, images, etc.).
Subclasses may override this method or the clean_text hook for format-specific processing.
PARAMETER | DESCRIPTION |
---|---|
url | The URL to download the file from. |
file_path | The path to save the downloaded file. |
binary | If True, save as binary. If False, save as UTF-8 text. |

RETURNS | DESCRIPTION |
---|---|
str | The file path where the document was saved. |

RAISES | DESCRIPTION |
---|---|
RuntimeError | If the download or save fails. |
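The text/binary split and the RuntimeError contract can be sketched as a standalone function. This is not the dcmspec implementation: it assumes the standard library's `urllib.request` for fetching (the library may use a different HTTP client), takes `clean_text` as a hypothetical callable parameter instead of a method, and creates the parent directory itself. The demonstration uses a `file://` URL so it runs without network access.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download(url: str, file_path: str, binary: bool = False,
             clean_text=lambda text: text) -> str:
    """Fetch url, save it to file_path, and return file_path."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            payload = response.read()
    except (OSError, ValueError) as exc:  # URLError is a subclass of OSError
        raise RuntimeError(f"Failed to download {url}: {exc}") from exc

    try:
        parent = os.path.dirname(file_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        if binary:
            # Binary mode: write the bytes untouched (PDFs, images, ...).
            with open(file_path, "wb") as out:
                out.write(payload)
        else:
            # Text mode: decode as UTF-8, apply the cleaning hook, save as UTF-8.
            text = clean_text(payload.decode("utf-8"))
            with open(file_path, "w", encoding="utf-8") as out:
                out.write(text)
    except (OSError, UnicodeDecodeError) as exc:
        raise RuntimeError(f"Failed to save {file_path}: {exc}") from exc
    return file_path


# Demonstration against a local file:// URL (no network needed).
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_text("Attribute\u00a0Name", encoding="utf-8")

    saved = download(source.as_uri(), str(Path(tmp) / "cache" / "copy.txt"))
    saved_text = Path(saved).read_text(encoding="utf-8")

    saved_bin = download(source.as_uri(),
                         str(Path(tmp) / "cache" / "copy.bin"), binary=True)
    saved_bytes = Path(saved_bin).read_bytes()
```

Wrapping both the fetch and the write in RuntimeError gives callers a single exception type to handle, which matches the RAISES entry above.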
load_document(cache_file_name, url=None, force_download=False, *args, **kwargs)
Read and parse the document file, returning a parsed object.
Subclasses must implement this method to load a document file (e.g., XHTML, PDF, CSV) and return a format-specific parsed object. The exact return type depends on the subclass (e.g., BeautifulSoup for XHTML, pdfplumber.PDF for PDF).
PARAMETER | DESCRIPTION |
---|---|
cache_file_name | Path or name of the local cached file. |
url | URL to download the file from if not cached or if force_download is True. |
force_download | If True, download the file even if it exists locally. |
*args | Additional positional arguments for format-specific loading. |
**kwargs | Additional keyword arguments for format-specific loading. |

RETURNS | DESCRIPTION |
---|---|
Any | The parsed document object (type depends on subclass). |
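The caching behavior implied by the parameters above (download when the file is missing or when force_download is True, then parse) might look like the following sketch. `BaseHandler` and `CsvHandler` are hypothetical stand-ins, not dcmspec classes. The sketch also assumes that `cache_file_name` is a full path and that a missing `url` should raise ValueError, neither of which is stated in the description.

```python
import csv
import os
import tempfile
import urllib.request
from pathlib import Path
from typing import Any


class BaseHandler:
    """Hypothetical stand-in for DocHandler with the documented method names."""

    def download(self, url: str, file_path: str, binary: bool = False) -> str:
        # Simplified for this sketch: always copies the raw bytes.
        with urllib.request.urlopen(url) as response, open(file_path, "wb") as out:
            out.write(response.read())
        return file_path

    def load_document(self, cache_file_name, url=None, force_download=False,
                      *args, **kwargs) -> Any:
        raise NotImplementedError("Subclasses must implement load_document")


class CsvHandler(BaseHandler):
    """Hypothetical subclass that parses a cached CSV file into rows."""

    def load_document(self, cache_file_name, url=None, force_download=False,
                      *args, **kwargs):
        # Download only when the cache is missing or a refresh is forced.
        if force_download or not os.path.exists(cache_file_name):
            if url is None:
                raise ValueError("url is required when the file is not cached")
            self.download(url, cache_file_name)
        with open(cache_file_name, newline="", encoding="utf-8") as handle:
            return list(csv.reader(handle))


# Demonstration with a local file:// URL standing in for a remote document.
with tempfile.TemporaryDirectory() as tmp:
    remote = Path(tmp) / "remote.csv"
    remote.write_text("tag,name\n00100010,PatientName\n", encoding="utf-8")
    cache = str(Path(tmp) / "cache.csv")
    handler = CsvHandler()

    first = handler.load_document(cache, url=remote.as_uri())   # downloads
    remote.write_text("tag,name\n00080060,Modality\n", encoding="utf-8")
    second = handler.load_document(cache, url=remote.as_uri())  # served from cache
    third = handler.load_document(cache, url=remote.as_uri(),
                                  force_download=True)          # refreshed
```

The second call returns the old rows because the cached copy is reused; only force_download=True picks up the changed source.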