TL;DR
Before you publish AI-generated text, check for: hidden Unicode, raw Markdown, em-dashes, smart quotes, reasoning blocks, template placeholders, and spacing issues. TextPurify runs all seven checks in one click. This guide explains exactly what each problem is and where it causes damage.
You have a 1,200-word article from ChatGPT, a product description from Claude, or a landing page draft from Jasper. It reads well. The brief is covered. You are ready to publish. But if you copy and paste directly from the AI interface into your CMS, email tool, or document, you are almost certainly carrying invisible problems into production.
These are not rare edge cases. Every major AI writing tool outputs text that looks clean in its own interface but is loaded with formatting conventions, invisible characters, and special Unicode that become visible — or cause errors — the moment the text leaves the AI app. This happens because AI tools are built to render their own output, not to deliver clean plain text to every destination you might need.
Why AI Text Needs Cleaning
Three things are going on simultaneously when an AI tool writes for you:
First, the model is trained to output Markdown because its own interface renders it. The**bold**, *italic*, and ## Heading formatting that looks perfect in ChatGPT is raw syntax everywhere else.
Second, the model is trained on professionally typeset text — books, articles, web pages — and replicates their typographic conventions. Em-dashes (—), curly quotes (“text”), and the Unicode ellipsis character (…) are all standard in typeset prose but cause specific downstream problems in email clients, CMSs, and code.
Third, the model embeds invisible characters from its training data: zero-width spaces, non-breaking spaces, byte-order marks. Some models — DeepSeek in particular — add them as watermarks. These characters are undetectable by eye but break word counts, CMS validation, database imports, and string comparisons.
The 7-Point Cleaning Checklist
1. Hidden Zero-Width Characters
What to look for: U+200B (zero-width space), U+FEFF (byte-order mark), U+00A0 (non-breaking space), U+00AD (soft hyphen), and related invisible Unicode.
Why it matters: They break word counts, make find-and-replace fail, corrupt CMS text field validation, cause database query mismatches, and can be used for prompt injection attacks in AI-assisted workflows. You cannot see them — you can only see the damage they cause.
In TextPurify: Enable "Hidden chars" (on by default). All invisible Unicode is stripped in one pass. Non-breaking spaces are converted to regular spaces.
2. Markdown Formatting Syntax
What to look for: **bold**, *italic*, ## Headings, - bullet lists, `inline code`, ```code blocks```
Why it matters: Markdown renders correctly only in tools built to render it. In WordPress, Gmail, HubSpot, Mailchimp, Microsoft Word, and plain-text editors, the raw syntax is visible. A blog post with ## Introduction at the top looks broken to every reader.
In TextPurify: Enable "Strip markdown" to remove all Markdown syntax while keeping the text. Enable "Remove asterisks" for an additional pass that removes every * character, including bullet-point symbols that Strip Markdown might leave as hyphens.
3. Em-Dashes and Typographic Punctuation
What to look for: — (U+2014 em-dash), – (U+2013 en-dash), … (U+2026 Unicode ellipsis), and curly quotes (“” ‘’)
Why it matters: Em-dashes cause layout issues in Webflow, Squarespace, and HubSpot rich-text fields. Curly quotes fail JSON parsing, corrupt HTML attribute values, and break SQL strings. The Unicode ellipsis is counted as one character by CMSs with character-count limits, versus three characters for three periods — causing unpredictable behavior in length-constrained fields.
In TextPurify: Enable "Dashes," "Quotes," and "Ellipsis." All three are normalized to plain ASCII equivalents: em-dashes to spaced hyphens, curly quotes to straight quotes, Unicode ellipsis to three dots.
4. AI Reasoning Blocks
What to look for: <think>...</think>, <thinking>...</thinking>, <reasoning>...</reasoning>
Why it matters: DeepSeek R1 and Claude with extended thinking generate chain-of-thought reasoning that can run hundreds of lines. This block is hidden in the AI interface but appears in full when you access the API, use third-party tools, or copy from certain automation workflows. Including it in published content is always an error.
In TextPurify: Enable "Remove think blocks." The entire reasoning section is deleted; the final answer that follows </think> is preserved exactly.
5. Template Placeholders
What to look for: [KEYWORD], [BRAND], {{first_name}}, [INSERT NAME], [CTA]
Why it matters: Jasper, some custom ChatGPT prompts, and workflow templates use placeholder variables. When these go unfilled and you publish without catching them, you end up with "Buy [PRODUCT] today" on a live page or "Hi {{first_name}}," in a sent email.
In TextPurify: Enable "Template tags" to strip all [UPPERCASE_BRACKET] patterns and {{handlebars}} variables. Review the cleaned output before publishing to confirm no legitimate content was caught.
6. Line Breaks and Trailing Whitespace
What to look for: Windows-style CRLF line endings mixed with Unix LF, trailing spaces at line ends, three or more consecutive blank lines
Why it matters: CRLF line endings cause display issues when AI text is processed by Unix-based systems or pasted into plain-text fields. Trailing whitespace inflates character counts in strict CMS fields. Excessive blank lines add unwanted vertical space in editors that do not collapse them.
In TextPurify: Enable "Line breaks" and "Trailing spaces" (both on by default). All line endings are normalized to LF; trailing whitespace is trimmed from every line.
7. Sentence Spacing and Punctuation Gaps
What to look for: Missing space after a period, double spaces between words, space before a comma or period
Why it matters: AI models occasionally drop a space after a period when transitioning between sentences, or insert a space before punctuation. These are minor but they accumulate in long documents and look unprofessional in published content.
In TextPurify: Enable "Fix spacing" to normalize sentence and punctuation spacing automatically.
Cleaning for Different Destinations
Different publishing destinations have different tolerances for the issues above. A quick reference:
- WordPress / CMS: Run everything. These platforms are the most sensitive to encoding noise, Markdown syntax, and invisible characters. Use Writer mode.
- Gmail / email clients: Run hidden chars, dashes, quotes, and fix spacing. Turn off Strip Markdown if you want bullet points (they will be formatted by Gmail). Enable Remove Asterisks to kill bold markers.
- HubSpot / Mailchimp: Same as CMS. Full cleaning. These tools have strict encoding requirements and behave unpredictably with Unicode formatting.
- Microsoft Word / Google Docs: Use Writer mode. Strip Markdown, normalize typography, remove hidden chars. Word and Docs will handle their own formatting from there.
- Notion: Notion renders Markdown natively. Leave Strip Markdown off if you want formatting preserved. Enable hidden char removal and quote normalization.
- Developer / code context: Use Developer mode. This preserves quotes and Markdown structure but removes zero-width characters and normalizes whitespace — keeping code-relevant syntax intact.
Building This Into Your Workflow
Manual cleaning — find-and-replacing Markdown, em-dashes, curly quotes, and invisible characters individually — takes five to ten minutes per document. For a content operation producing ten articles a week, that is meaningful overhead, and it is easy to miss items when doing it by hand.
The practical workflow: paste your AI output into TextPurify, select the mode that matches your destination, and click Clean. The full seven-point checklist runs in under three seconds. Copy the cleaned text and paste it into your CMS, email tool, or document. Done.
All processing happens in your browser. Your text never leaves your device, is never stored, and is never transmitted to any server. The only data that persists across sessions is your cleaning history and personal stats, both saved to your browser's localStorage.
For model-specific issues — DeepSeek <think> blocks, Claude <thinking> output, or ChatGPT-specific formatting — the dedicated pages for each model have presets configured for exactly what that tool produces.