Find and Replace with StructuredDocumentTag
Working with StructuredDocumentTags
Hi,
We are currently evaluating Aspose.Words for use in our product.
As part of the evaluation we are building a demo application to
demonstrate Aspose.Words' potential internally.
The demo application should work with Word templates (read
and write), and we have the following questions:
1) How do we create a StructuredDocumentTag that allows the user to enter only numbers (and how do we control whether the user can use the dot (.) sign)?
2) How do we set the “watermark” (placeholder) of a StructuredDocumentTag?
3) How do we create a repeating section content control?
Thanks,
Omri
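For question 1, whether a content control can enforce numeric-only input by itself is not established in this thread. One possible fallback is to validate the entered text in application code after reading it from the document. The following is a minimal sketch under that assumption; the class and method names (`NumericInputCheck`, `isValid`) are hypothetical and are not part of Aspose.Words.

```java
import java.util.regex.Pattern;

// Hypothetical helper: validates text read from a content control.
// It does not use or depend on any Aspose.Words API.
class NumericInputCheck {
    // One or more ASCII digits, nothing else.
    private static final Pattern DIGITS_ONLY = Pattern.compile("\\d+");
    // Digits, optionally followed by a single dot and more digits.
    private static final Pattern DIGITS_WITH_DOT = Pattern.compile("\\d+(\\.\\d+)?");

    /** Returns true if text is a number; allowDot controls whether "12.5" is accepted. */
    static boolean isValid(String text, boolean allowDot) {
        if (text == null) {
            return false;
        }
        Pattern pattern = allowDot ? DIGITS_WITH_DOT : DIGITS_ONLY;
        return pattern.matcher(text.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("123", false));
        System.out.println(isValid("12.5", false));
        System.out.println(isValid("12.5", true));
    }
}
```

The `allowDot` flag mirrors the requirement to control whether the dot sign is permitted; a trailing dot such as "12." is rejected in both modes.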
Unable to read tables data from StructuredDocumentTag directly
Hi,
Attached is a document with some StructuredDocumentTags. The StructuredDocumentTags contain text and table data.
If I convert all the nodes to HTML in a generic method, I
get all the data from the document. But if I convert only the StructuredDocumentTags,
I don’t get the table data.
All the tables and text are inside the StructuredDocumentTags,
so logically the results should be the same.
I think this is a bug in your parsing: you treat the table as being outside the StructuredDocumentTag, but Word shows that it is inside.
Here is example code that takes this Word document and
creates two .htm files: one is good (using the generic approach) and one is bad (recursively
going over the StructuredDocumentTags and getting the data out of them).
using System.IO;
using System.Text;
using Aspose.Words;

var html = string.Empty;
var htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions
{
    ExportImagesAsBase64 = true,
    ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.None
};

// Generic approach: convert every top-level node of the document
using (var inputStream = File.OpenRead(@"E:\WordTest\11.docx"))
{
    var doc = new Aspose.Words.Document(inputStream);
    foreach (Aspose.Words.Node node in doc.ChildNodes)
    {
        html += node.ToString(htmlSaveOptions);
    }
    File.WriteAllText(@"E:\WordTest\11_good.htm", html, Encoding.UTF8);
}

// Exclusive approach: convert only the content of the StructuredDocumentTags
html = string.Empty;
using (var inputStream = File.OpenRead(@"E:\WordTest\11.docx"))
{
    var doc = new Aspose.Words.Document(inputStream);
    html = ReadStructuredDocumentTagOnly(html, htmlSaveOptions, doc);
    File.WriteAllText(@"E:\WordTest\11_bad.htm", html, Encoding.UTF8);
}
Helper method:
// Recursively appends the HTML of everything found inside StructuredDocumentTag nodes under parent.
private static string ReadStructuredDocumentTagOnly(string html, Aspose.Words.Saving.HtmlSaveOptions htmlSaveOptions, CompositeNode parent)
{
    foreach (Aspose.Words.Node node in parent.ChildNodes)
    {
        if (node.NodeType == NodeType.StructuredDocumentTag)
        {
            // Export each direct child of the tag
            StructuredDocumentTag structuredDocumentTag = (StructuredDocumentTag)node;
            foreach (Aspose.Words.Node textNode in structuredDocumentTag.ChildNodes)
            {
                html += textNode.ToString(htmlSaveOptions);
            }
        }
        else if (node is CompositeNode)
        {
            // Not a tag: descend and look for tags deeper in the tree
            html = ReadStructuredDocumentTagOnly(html, htmlSaveOptions, (CompositeNode)node);
        }
    }
    return html;
}
Please fix this bug or advise us on how to work around it.
Thanks
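The expectation stated in this post (that exporting only the tag contents should yield the same data when everything sits inside the tags) can be illustrated with a toy tree. The sketch below is not Aspose.Words code: `ToyNode`, `render`, and `renderTagsOnly` are hypothetical stand-ins that mimic the two traversal strategies from the post, and the sample tree is invented for illustration. It says nothing about why the real document behaves differently.

```java
import java.util.ArrayList;
import java.util.List;

class SdtTraversalSketch {
    /** Hypothetical stand-in for a document node; not an Aspose.Words type. */
    static final class ToyNode {
        final String type;
        final String text;
        final List<ToyNode> children = new ArrayList<>();

        ToyNode(String type, String text) {
            this.type = type;
            this.text = text;
        }

        ToyNode add(ToyNode child) {
            children.add(child);
            return this;
        }
    }

    /** Generic approach: concatenates the text of a node and all descendants, depth first. */
    static String render(ToyNode node) {
        StringBuilder sb = new StringBuilder(node.text);
        for (ToyNode child : node.children) {
            sb.append(render(child));
        }
        return sb.toString();
    }

    /** Exclusive approach: renders only content that sits inside "sdt" nodes. */
    static String renderTagsOnly(ToyNode parent) {
        StringBuilder sb = new StringBuilder();
        for (ToyNode child : parent.children) {
            if (child.type.equals("sdt")) {
                for (ToyNode inner : child.children) {
                    sb.append(render(inner));
                }
            } else {
                sb.append(renderTagsOnly(child));
            }
        }
        return sb.toString();
    }

    /** Invented sample: a tag that wraps one paragraph and one table with two cells. */
    static ToyNode sample() {
        ToyNode table = new ToyNode("table", "")
                .add(new ToyNode("cell", "A1;"))
                .add(new ToyNode("cell", "B1;"));
        ToyNode sdt = new ToyNode("sdt", "")
                .add(new ToyNode("paragraph", "intro;"))
                .add(table);
        ToyNode body = new ToyNode("body", "").add(sdt);
        return new ToyNode("document", "").add(body);
    }

    public static void main(String[] args) {
        ToyNode doc = sample();
        System.out.println(render(doc));         // prints "intro;A1;B1;"
        System.out.println(renderTagsOnly(doc)); // prints "intro;A1;B1;"
    }
}
```

In this toy tree both strategies return the same string because the table is a child of the tag. If the table were instead a sibling or ancestor of the tag in the parsed tree, the second strategy would omit it, which matches the symptom described above.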
Text scrambled when converting Word to HTML
Hi,
For some reason, when we convert the following document to
HTML, the text in the rows is scrambled; it seems that the last word becomes the
first.
Here is our code:
var sourcefile = @"E:\WordTest\15.docx";
var html = string.Empty;
var htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions
{
    ExportImagesAsBase64 = true,
    ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.None
};

// Convert every top-level node of the document to HTML and write the result to disk
using (var inputStream = File.OpenRead(sourcefile))
{
    var doc = new Aspose.Words.Document(inputStream);
    foreach (Aspose.Words.Node node in doc.ChildNodes)
    {
        html += node.ToString(htmlSaveOptions);
    }
    File.WriteAllText(@"E:\WordTest\15.html", html, Encoding.UTF8);
}
Please advise how to work around this bug, or release a fix.
Thanks!
TXT file rendered not as expected.
Aspose.Words .Net4.5.2 Support
importing html extremely slow in linux
OutOfMemory with updateFields and other operations
The issues reported here were discovered while investigating the workaround suggested in the topic "low performance when save document to PDF format through Aspose Word Java library".
Using updateFields, getPageCount, or saving to PDF/XPS (regardless of whether only the first page or all pages are saved) for a document larger than 3000 pages causes an OOM exception on a 32-bit JRE with 1 GB of heap available. The test code is attached if needed.
From the discussion in the other thread, I assume the problem is caused by Aspose building the APS (Aspose Page Specification) model in memory. While I understand why this happens and the technical challenges involved, it is a serious limitation of the updateFields functionality, in addition to PDF saving.
Regards,
Dragos
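When reproducing a report like this, it may help to log how much heap the JVM is actually allowed to use before starting the memory-heavy call. The following is a minimal sketch using only the standard `Runtime` API; the class name `HeapHeadroom` and the 512 MB threshold are hypothetical, and nothing here changes how much memory Aspose.Words itself needs.

```java
// Hypothetical helper: reports JVM heap limits before a memory-heavy operation.
class HeapHeadroom {
    private static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently occupied by objects, in megabytes (approximate). */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** True if at least requiredMb megabytes could still be allocated in principle. */
    static boolean hasHeadroom(long requiredMb) {
        return maxHeapMb() - usedHeapMb() >= requiredMb;
    }

    public static void main(String[] args) {
        long requiredMb = 512; // hypothetical threshold, tune per document size
        System.out.printf("max heap: %d MB, used: %d MB%n", maxHeapMb(), usedHeapMb());
        if (!hasHeadroom(requiredMb)) {
            System.out.println("Not enough headroom; skipping the heavy operation.");
        }
    }
}
```

The check is only an estimate, since garbage collection can free memory between the check and the operation, but it makes the heap limit of the test environment visible in the logs.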
System.ArgumentException while saving odt document
Replace a word using an image
Adding Watermark to Document
Corrupted RTF with footnote
Aspose.Words for Java (tested with v16.11.0) cannot read the following RTF file.
"com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded."
We don't know how this file was created, but it seems to contain an invalid footnote attached to a table cell (rather than a paragraph). MS Word 2010 seems to keep this footnote when saving to RTF. However, it is lost when saving to DOCX.
I suggest that such invalid footnotes be ignored, as MS Word does when saving to DOCX.
Thanks
Romain
Re: The document appears to be corrupted and cannot be loaded
Replacing Error
Images in table cells
Charts in Word
border-collapse and border-spacing seems to be ignored
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;

/**
 * Created by Alexander.Joerg on 09.11.2016.
 */
public class AsposeTest {
    public static void main(String[] args) {
        try {
            // HTML table styled with border-collapse: separate and border-spacing: 5px
            String html = "<style>\n" +
                    " #article {\n" +
                    " width: 100%;\n" +
                    " border-collapse: separate;\n" +
                    " border-spacing: 5px\n" +
                    " }\n" +
                    "\n" +
                    " #article td, #article th {\n" +
                    " font-size: 1em;\n" +
                    " border: 1px solid #98bf21;\n" +
                    " padding: 3px 7px 2px 7px;\n" +
                    " }\n" +
                    "\n" +
                    " #article th {\n" +
                    " font-size: 1.1em;\n" +
                    " text-align: left;\n" +
                    " padding-top: 5px;\n" +
                    " padding-bottom: 4px;\n" +
                    " background-color: #a7c942;\n" +
                    " color: #fff;\n" +
                    " }\n" +
                    "\n" +
                    " #article tr.alt td {\n" +
                    " color: #000;\n" +
                    " background-color: #eaf2d3;\n" +
                    " }\n" +
                    "</style>\n" +
                    "<table id=\"article\">\n" +
                    " <tr>\n" +
                    " <th>Position</th>\n" +
                    " <th>Article</th>\n" +
                    " <th>Desc</th>\n" +
                    " <th>Tax</th>\n" +
                    " <th>Amount</th>\n" +
                    " <th>Unitcost\n</th>\n" +
                    " <th>TotalPrice</th>\n" +
                    " </tr>\n" +
                    " <tr class=\"alt\">\n" +
                    " <td>1</td>\n" +
                    " <td>0000001</td>\n" +
                    " <td>Table</td>\n" +
                    " <td>19,00</td>\n" +
                    " <td>1 ST</td>\n" +
                    " <td>250 EUR</td>\n" +
                    " <td>250 EUR</td>\n" +
                    " </tr>\n" +
                    " <tr>\n" +
                    " <td>2</td>\n" +
                    " <td>0000002</td>\n" +
                    " <td>Bench</td>\n" +
                    " <td>19,00</td>\n" +
                    " <td>2 ST</td>\n" +
                    " <td>100 EUR</td>\n" +
                    " <td>200 EUR</td>\n" +
                    " </tr>\n" +
                    " <tr class=\"alt\">\n" +
                    " <td></td>\n" +
                    " <td></td>\n" +
                    " <td></td>\n" +
                    " <td></td>\n" +
                    " <td></td>\n" +
                    " <td><b>TotalPrice</b></td>\n" +
                    " <td><b>450 EUR</b></td>\n" +
                    " </tr>\n" +
                    "</table>\n";

            // Insert the HTML into a new document and save it as DOCX and PDF
            Document document = new Document();
            DocumentBuilder documentBuilder = new DocumentBuilder(document);
            documentBuilder.insertHtml(html);
            document.save("C:\\temp\\testDocument.docx");
            document.save("C:\\temp\\testDocument.pdf");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Copying Sections - How to reduce space between sections?
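// Append the content of every other section to the first section
for (int i = 0; i < cSections.Length; i++)
{
    Section sect = cSections[i];
    if (!sect.Equals(contentPage.FirstSection))
    {
        contentPage.FirstSection.AppendContent(sect);
    }
}

// Remove the now-redundant sections so that only the first one remains
while (!contentPage.LastSection.Equals(contentPage.FirstSection))
{
    contentPage.LastSection.Remove();
}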