DocHandler
dcmspec.doc_handler.DocHandler
Base class for DICOM document handlers.
Handles DICOM documents in various formats (e.g., XHTML, PDF).
Subclasses must implement the load_document method to handle reading/parsing input files. The base class provides a generic download method for both text and binary files.
Progress Reporting: Progress is reported through the observer pattern. Subclasses may extend the Progress class and use progress_observer to report information beyond percent complete (e.g., status, errors, or other context), which leaves room for richer progress tracking later.
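As a rough illustration of the observer-based reporting described above, the sketch below uses a stand-in Progress dataclass and a plain callable as the observer. The names Progress.percent, StatusProgress, and report_steps are assumptions made for this example only; the actual dcmspec Progress class and observer signature may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Progress:
    """Stand-in progress event (hypothetical, not the dcmspec class)."""

    percent: int  # 0-100, or -1 if indeterminate


@dataclass
class StatusProgress(Progress):
    """Extended event carrying extra context beyond percent complete."""

    status: str = ""


def report_steps(total_steps: int, observer: Callable[[Progress], None]) -> None:
    """Notify the observer once per completed step."""
    for step in range(1, total_steps + 1):
        percent = step * 100 // total_steps
        observer(StatusProgress(percent=percent, status=f"step {step}/{total_steps}"))


# Any callable that accepts an event can act as the observer;
# here the events are simply collected in a list.
events: List[Progress] = []
report_steps(4, events.append)
```

Because StatusProgress is still a Progress, an observer that only reads percent keeps working when a producer starts sending the richer event.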
Source code in src/dcmspec/doc_handler.py
__init__(config=None, logger=None)
Initialize the document handler with an optional config and logger.
| PARAMETER | DESCRIPTION |
|---|---|
| config | Config instance to use. If None, a default Config is created. |
| logger | Logger instance to use. If None, a default logger is created. |
Logging
A logger may be passed for custom logging control. If no logger is provided, a default logger for this class is used. In both cases, no logging handlers are added by default. To see log output, logging should be configured in the application (e.g., with logging.basicConfig()).
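The behaviour described in the note above can be reproduced with the standard logging module alone. This is a minimal sketch: the logger name is hypothetical, and an explicit StreamHandler is attached instead of calling logging.basicConfig() only so that the output can be captured and inspected.

```python
import io
import logging

# Hypothetical logger name; dcmspec may use a different one internally.
logger = logging.getLogger("docs_example.doc_handler")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example isolated from any root handlers

# No handler is attached yet, so this INFO record is not written anywhere.
logger.info("not captured")

# Application-level configuration: attach a handler explicitly.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
logger.addHandler(handler)

logger.info("captured")
log_output = stream.getvalue()
```

A logger configured this way could then be handed to a handler subclass through the logger parameter of __init__.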
clean_text(text)
Clean text content before saving.
Subclasses can override this to perform format-specific cleaning (e.g., remove ZWSP/NBSP for XHTML). By default, returns the text unchanged.
| PARAMETER | DESCRIPTION |
|---|---|
| text | The text content to clean. |

| RETURNS | DESCRIPTION |
|---|---|
| str | The cleaned text. |
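A format-specific override of clean_text might look like the sketch below. Both classes are stand-ins written for this example, not the dcmspec implementations, and the decision to replace NBSP with a regular space (rather than delete it) is an assumption.

```python
class BaseHandlerSketch:
    """Stand-in for the base class: returns the text unchanged by default."""

    def clean_text(self, text: str) -> str:
        return text


class XhtmlHandlerSketch(BaseHandlerSketch):
    """Hypothetical subclass that cleans characters common in XHTML content."""

    def clean_text(self, text: str) -> str:
        # Drop zero-width spaces (U+200B) and turn non-breaking
        # spaces (U+00A0) into regular spaces.
        return text.replace("\u200b", "").replace("\u00a0", " ")


raw = "Patient\u200b\u00a0Name"
cleaned = XhtmlHandlerSketch().clean_text(raw)
unchanged = BaseHandlerSketch().clean_text(raw)
```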
download(url, file_path, binary=False, progress_observer=None, progress_callback=None)
Download a file from a URL and save it to the specified path.
| PARAMETER | DESCRIPTION |
|---|---|
| url | The URL to download the file from. |
| file_path | The path to save the downloaded file. |
| binary | If True, save as binary. If False, save as UTF-8 text. |
| progress_observer | Optional observer to report download progress. |
| progress_callback | [LEGACY, Deprecated] Optional callback to report progress as an integer percent (0-100, or -1 if indeterminate). Use progress_observer instead. Will be removed in a future release. |

| RETURNS | DESCRIPTION |
|---|---|
| str | The file path where the document was saved. |

| RAISES | DESCRIPTION |
|---|---|
| RuntimeError | If the download or save fails. |
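The percent convention used for progress (0-100, or -1 if indeterminate) can be sketched without any network access. The helpers percent_complete and copy_with_progress below are hypothetical and only illustrate the convention; they do not show how download is actually implemented.

```python
import io
from typing import BinaryIO, Callable, List, Optional


def percent_complete(bytes_done: int, total_bytes: Optional[int]) -> int:
    """Return an integer percent (0-100), or -1 when the total size is unknown."""
    if not total_bytes or total_bytes <= 0:
        return -1
    return min(100, bytes_done * 100 // total_bytes)


def copy_with_progress(
    source: BinaryIO,
    dest: BinaryIO,
    total_bytes: Optional[int],
    report: Callable[[int], None],
    chunk_size: int = 4,
) -> int:
    """Copy source to dest in chunks, reporting progress after each chunk."""
    bytes_done = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        dest.write(chunk)
        bytes_done += len(chunk)
        report(percent_complete(bytes_done, total_bytes))
    return bytes_done


payload = b"0123456789"

# Total size known (for example from a Content-Length header).
known: List[int] = []
dest = io.BytesIO()
copy_with_progress(io.BytesIO(payload), dest, len(payload), known.append)

# Total size unknown: every report is -1 (indeterminate).
unknown: List[int] = []
copy_with_progress(io.BytesIO(payload), io.BytesIO(), None, unknown.append)
```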
load_document(cache_file_name, url=None, force_download=False, progress_observer=None, progress_callback=None, *args, **kwargs)
Read and parse the document file and return a parsed object.
Subclasses must implement this method to load and parse a document file (e.g., XHTML, PDF, CSV). The type of the returned object depends on the subclass (e.g., BeautifulSoup for XHTML, pdfplumber.PDF for PDF).
| PARAMETER | DESCRIPTION |
|---|---|
| cache_file_name | Path or name of the local cached file. |
| url | URL to download the file from if not cached or if force_download is True. |
| force_download | If True, download the file even if it exists locally. |
| progress_observer | Optional observer to report download progress. |
| progress_callback | [LEGACY, Deprecated] Optional callback to report progress as an integer percent (0-100, or -1 if indeterminate). Use progress_observer instead. Will be removed in a future release. |
| *args | Additional positional arguments for format-specific loading. |
| **kwargs | Additional keyword arguments for format-specific loading. |

| RETURNS | DESCRIPTION |
|---|---|
| Any | The parsed document object (type depends on subclass). |
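The cache-or-download flow implied by cache_file_name, url, and force_download could be implemented along the following lines. This is a sketch under stated assumptions: BaseHandlerSketch and TextHandlerSketch are stand-ins invented for the example, the stand-in download writes canned text instead of fetching the URL, and the cache_dir attribute and the ValueError for a missing url are choices made here rather than documented behaviour.

```python
import os
import tempfile


class BaseHandlerSketch:
    """Stand-in for the base class; download() writes canned text, no network."""

    def download(self, url, file_path, binary=False):
        with open(file_path, "w", encoding="utf-8") as f:
            f.write(f"content from {url}")
        return file_path

    def load_document(self, cache_file_name, url=None, force_download=False):
        raise NotImplementedError


class TextHandlerSketch(BaseHandlerSketch):
    """Hypothetical subclass that 'parses' a text file into a list of lines."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.download_count = 0  # bookkeeping for the example only

    def download(self, url, file_path, binary=False):
        self.download_count += 1
        return super().download(url, file_path, binary=binary)

    def load_document(self, cache_file_name, url=None, force_download=False):
        path = os.path.join(self.cache_dir, cache_file_name)
        # Download only when the file is missing or a refresh is forced.
        if force_download or not os.path.exists(path):
            if url is None:
                raise ValueError("url is required when the file is not cached")
            self.download(url, path)
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()


with tempfile.TemporaryDirectory() as cache_dir:
    handler = TextHandlerSketch(cache_dir)
    url = "https://example.invalid/doc"
    first = handler.load_document("doc.txt", url=url)  # downloads
    second = handler.load_document("doc.txt")  # served from the cache
    third = handler.load_document("doc.txt", url=url, force_download=True)
    downloads = handler.download_count
```

Here the second call reuses the cached file, so only the first and third calls trigger a download.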