XHTMLDocHandler
dcmspec.xhtml_doc_handler.XHTMLDocHandler
Bases: DocHandler
Handler class for DICOM specification documents in XHTML format.
Provides methods to download, cache, and parse XHTML documents, returning a BeautifulSoup DOM object. Inherits configuration and logging from DocHandler.
Source code in src/dcmspec/xhtml_doc_handler.py
__init__(config=None, logger=None)
Initialize the XHTML document handler and set cache_file_name to None.
Source code in src/dcmspec/xhtml_doc_handler.py
clean_text(text)
Clean text content before saving.
Removes zero-width space (ZWSP) and non-breaking space (NBSP) characters.
PARAMETER | DESCRIPTION |
---|---|
text | The text content to clean. |

RETURNS | DESCRIPTION |
---|---|
str | The cleaned text. |
Source code in src/dcmspec/xhtml_doc_handler.py
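As a rough illustration of the cleaning step described for clean_text, the standalone sketch below strips both characters from a string. It does not reproduce the library's implementation: the function name `strip_invisible_spaces` and the sample string are hypothetical, and the sketch assumes NBSP is deleted outright, as the description says, rather than replaced by an ordinary space.

```python
ZWSP = "\u200b"  # zero-width space
NBSP = "\u00a0"  # non-breaking space


def strip_invisible_spaces(text: str) -> str:
    """Hypothetical stand-in for clean_text: delete ZWSP and NBSP characters."""
    # A translation table that maps a code point to None deletes that character.
    return text.translate({ord(ZWSP): None, ord(NBSP): None})


# Made-up sample containing one ZWSP and one NBSP.
sample = "Patient\u200bID\u00a0(0010,0020)"
cleaned = strip_invisible_spaces(sample)
print(repr(cleaned))
```

Using a single `str.translate` call keeps the cleaning to one pass over the text; chained `str.replace` calls would work equally well for two characters.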
download(url, cache_file_name)
Download and cache an XHTML file from a URL.
Uses the base class download method, saving as UTF-8 text and cleaning ZWSP/NBSP.
PARAMETER | DESCRIPTION |
---|---|
url | The URL of the XHTML document to download. |
cache_file_name | The filename of the cached document. |

RETURNS | DESCRIPTION |
---|---|
str | The file path where the document was saved. |

RAISES | DESCRIPTION |
---|---|
RuntimeError | If the download or save fails. |
Source code in src/dcmspec/xhtml_doc_handler.py
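The save half of download (clean the text, write it as UTF-8, return the path, and report failures as RuntimeError) might look roughly like the sketch below. This is an illustration under stated assumptions, not the library's code: `save_text_utf8`, the `cache` subdirectory, and the file name `part03.xhtml` are hypothetical, the network step is left out entirely, and both characters are assumed to be deleted.

```python
import os
import tempfile


def save_text_utf8(text: str, file_path: str) -> str:
    """Hypothetical helper: clean text, write it as UTF-8, return the path."""
    # Assumption: ZWSP and NBSP are both deleted, following the description.
    cleaned = text.replace("\u200b", "").replace("\u00a0", "")
    try:
        # Create the parent directory if it does not exist yet.
        os.makedirs(os.path.dirname(file_path) or ".", exist_ok=True)
        with open(file_path, "w", encoding="utf-8") as handle:
            handle.write(cleaned)
    except OSError as exc:
        # Report I/O failures as RuntimeError, matching the RAISES table.
        raise RuntimeError(f"Failed to save {file_path}: {exc}") from exc
    return file_path


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "cache", "part03.xhtml")
    saved_path = save_text_utf8("<p>Modality\u200b</p>", target)
    with open(saved_path, encoding="utf-8") as handle:
        saved_content = handle.read()
```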
load_document(cache_file_name, url=None, force_download=False)
Open and parse an XHTML file, downloading it if needed.
PARAMETER | DESCRIPTION |
---|---|
cache_file_name | Path to the local cached XHTML file. |
url | URL to download the file from if not cached or if force_download is True. |
force_download | If True, do not use cache and download the file from the URL. |

RETURNS | DESCRIPTION |
---|---|
BeautifulSoup | Parsed DOM. |
Source code in src/dcmspec/xhtml_doc_handler.py
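The cache-or-download decision described for load_document can be sketched independently of the library. Everything named here is hypothetical: `ensure_cached`, the injected `fetch` callable, and the example URL are illustrative only, and raising RuntimeError when the file is missing and no URL is given is an assumption, since the description does not say what happens in that case. Parsing is omitted; the sketch only returns the path that would then be parsed.

```python
import os
import tempfile
from typing import Callable, List, Optional


def ensure_cached(
    cache_path: str,
    url: Optional[str] = None,
    force_download: bool = False,
    fetch: Optional[Callable[[str, str], str]] = None,
) -> str:
    """Hypothetical helper: return a local file path, fetching when required."""
    need_download = force_download or not os.path.exists(cache_path)
    if need_download:
        if url is None or fetch is None:
            # Assumption: nothing to download from is treated as an error.
            raise RuntimeError(f"{cache_path} is not cached and no URL was given")
        fetch(url, cache_path)
    return cache_path


calls: List[str] = []


def fake_fetch(url: str, path: str) -> str:
    """Stand-in for a real download: records the call and writes a stub file."""
    calls.append(url)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("<html></html>")
    return path


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "doc.xhtml")
    url = "https://example.org/doc.xhtml"  # placeholder URL
    ensure_cached(path, url, fetch=fake_fetch)  # file missing: fetches
    ensure_cached(path, url, fetch=fake_fetch)  # file present: cache hit
    ensure_cached(path, url, force_download=True, fetch=fake_fetch)  # fetches again
```

Injecting the download function keeps the decision logic testable without network access; in this run `fake_fetch` is called twice, once for the initial miss and once for the forced download.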
parse_dom(file_path)
Parse a cached XHTML file into a BeautifulSoup DOM object.
PARAMETER | DESCRIPTION |
---|---|
file_path | Path to the cached XHTML file to parse. |

RETURNS | DESCRIPTION |
---|---|
BeautifulSoup | The parsed DOM object. |

RAISES | DESCRIPTION |
---|---|
RuntimeError | If the file cannot be read or parsed. |
Source code in src/dcmspec/xhtml_doc_handler.py