Human translators, powered by technology

Machine translation generates a first draft. AI scoring highlights problem areas. Human translators make the final call.

Spanish, Portuguese, French, Korean, Chinese, Vietnamese, Arabic, Haitian Creole, Hindi, Urdu

Hawk is the translation system built by the Center for Cooperative Media for NJ News Commons partner newsrooms. Professional human translators are the core of the process — machine translation and AI scoring are tools that make them faster and more effective, not replacements for human expertise. You submit an article. Hawk generates a machine draft, flags potential issues, and hands everything to a human translator who reviews, edits, and approves the final version before it reaches your readers.

The newsroom's API key is checked

Every partner newsroom has a unique API key — like a password that identifies them. Hawk checks the key instantly to confirm: is this a real partner? Do they still have translation credits available today? If yes, the job moves forward. If not, they get an immediate error message so they know what to fix.
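
The two checks described above (is the key known, and are credits left today) can be sketched roughly as follows. This is only an illustration: the Newsroom record, the NEWSROOMS store, the key strings, and the error messages are all hypothetical, and Hawk's actual storage and wording are not described here.

```python
from dataclasses import dataclass


@dataclass
class Newsroom:
    name: str
    daily_credits: int   # hypothetical: translations allowed per day
    used_today: int = 0  # hypothetical: translations already used today


# Hypothetical in-memory key store; a real service would query a database.
NEWSROOMS = {
    "key-abc123": Newsroom("Example Gazette", daily_credits=5, used_today=2),
    "key-def456": Newsroom("Sample Ledger", daily_credits=3, used_today=3),
}


def check_api_key(api_key: str) -> tuple[bool, str]:
    """Return (ok, message) so the caller can reject bad requests right away."""
    newsroom = NEWSROOMS.get(api_key)
    if newsroom is None:
        return False, "Unknown API key"
    if newsroom.used_today >= newsroom.daily_credits:
        return False, "No translation credits left today"
    return True, f"Accepted for {newsroom.name}"
```

A request with an unknown key or an exhausted daily allowance gets a specific message back, which matches the "so they know what to fix" behavior described above.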

A translation job is created and queued

Hawk records the job in its database — the original article text, the target language, the newsroom's preferences — and immediately replies to the newsroom with a job ID. The newsroom doesn't have to wait. They get that ID right away and can check back for results. The actual translation happens in the background.
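
The "reply now, translate later" pattern might look something like the sketch below. The Job fields, the JOBS dictionary standing in for the database, and the WORK_QUEUE standing in for the background queue are assumptions made for illustration, not Hawk's real schema.

```python
import queue
import uuid
from dataclasses import dataclass, field


@dataclass
class Job:
    source_html: str
    target_lang: str
    preferences: dict = field(default_factory=dict)
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "queued"


JOBS: dict[str, Job] = {}                       # stand-in for the database
WORK_QUEUE: "queue.Queue[str]" = queue.Queue()  # stand-in for the background queue


def submit_job(source_html: str, target_lang: str, preferences=None) -> str:
    """Record the job, queue it for background work, and return its ID at once."""
    job = Job(source_html, target_lang, preferences or {})
    JOBS[job.job_id] = job
    WORK_QUEUE.put(job.job_id)
    return job.job_id


def get_status(job_id: str) -> str:
    """What a newsroom would see when it checks back with its job ID."""
    return JOBS[job_id].status
```

The caller only waits for the record to be written and the ID to be queued; a separate worker would pull IDs off the queue and do the slow translation work.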

The article is broken into translatable pieces

News articles aren't plain text — they contain HTML formatting (bold text, links, image captions, pull quotes). Hawk's segmenter carefully separates the readable text from the formatting code so only the words get translated, while all the layout instructions pass through untouched. After translation, the pieces are reassembled exactly as they were.
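
As a rough illustration of the idea (not Hawk's actual segmenter), the sketch below uses Python's standard html.parser to split an article into tag pieces and text pieces, passes only the text pieces to a translate function, and joins everything back in the original order. The names Segmenter, segment, and reassemble are hypothetical, and this simplified version drops comments and doctype declarations.

```python
from html.parser import HTMLParser


class Segmenter(HTMLParser):
    """Split HTML into ("tag", raw) and ("text", raw) pieces, in document order."""

    def __init__(self):
        # Keep entities such as &amp; as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        self.pieces.append(("tag", self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        self.pieces.append(("tag", self.get_starttag_text()))

    def handle_endtag(self, tag):
        # The parser lowercases tag names, so end tags are rebuilt in lowercase.
        self.pieces.append(("tag", f"</{tag}>"))

    def handle_data(self, data):
        self.pieces.append(("text", data))

    def handle_entityref(self, name):
        self.pieces.append(("tag", f"&{name};"))

    def handle_charref(self, name):
        self.pieces.append(("tag", f"&#{name};"))


def segment(html: str):
    parser = Segmenter()
    parser.feed(html)
    parser.close()
    return parser.pieces


def reassemble(pieces, translate):
    """Apply translate() to non-empty text pieces only; tags pass through untouched."""
    return "".join(
        translate(raw) if kind == "text" and raw.strip() else raw
        for kind, raw in pieces
    )
```

Using str.upper as a stand-in for a translation engine shows that the words change while the markup, including the link target, stays exactly as submitted.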

Local terms and dialect choices are locked in before translation

This is where local and cultural knowledge matters most. Hawk applies a custom journalism glossary before anything is sent to the translation engine. Terms that must not be left to the engine, such as "Board of Education," "Paterson," or "NJ Transit," are locked to an approved rendering so they come through correctly in every language: brand and place names pass through unchanged, and official titles get one canonical translation. The glossary also enforces regional dialect choices, using the Caribbean Spanish familiar to NJ's largely Puerto Rican and Dominican communities rather than Peninsular Spanish. STNS colleagues maintain this glossary.

DeepL generates a machine draft

Hawk uses DeepL — one of the most accurate machine translation services available — to produce an initial draft for seven of the ten languages. This draft is a starting point for human translators, not the final product. DeepL was chosen because it handles journalistic prose better than most alternatives, giving human translators a stronger foundation to work from. For Haitian Creole, Hindi, and Urdu (which DeepL doesn't support as well), segments are flagged for extra human translator attention.
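
The seven-versus-three split could be expressed as a simple routing rule, sketched below. The two sets use the language codes from the language table on this page; the function name plan_draft and the needs_extra_review field are hypothetical, and no real translation API is called.

```python
# Codes taken from the language table: seven "Available", three "Limited".
FULL_SUPPORT = {"ES", "PT", "FR", "KO", "ZH", "VI", "AR"}
LIMITED_SUPPORT = {"HT", "HI", "UR"}


def plan_draft(target_lang: str) -> dict:
    """Decide whether segments for this language get flagged for extra review."""
    code = target_lang.upper()
    if code in FULL_SUPPORT:
        return {"lang": code, "needs_extra_review": False}
    if code in LIMITED_SUPPORT:
        return {"lang": code, "needs_extra_review": True}
    raise ValueError(f"Unsupported target language: {target_lang}")
```

Keeping the rule in one place means a language can move from limited to full support by changing a set, without touching the rest of the pipeline.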

AI scoring highlights what needs human attention

After the machine draft is generated, Hawk runs each paragraph through an AI quality check using Claude (an AI model made by Anthropic). Claude reads both the original English and the machine-translated version and scores each paragraph on fluency (does it read naturally?) and accuracy (is the meaning preserved?). Anything scoring below a 3 out of 5 gets flagged so human translators know exactly where to focus their effort. This scoring exists to make human translators more efficient — not to replace their judgment.
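
The flagging rule can be sketched as below, assuming the scores have already been parsed into integers from 1 to 5. The ParagraphScore type is hypothetical, and so is the choice to flag a paragraph when either fluency or accuracy falls below 3; the text above only says that anything scoring below 3 is flagged.

```python
from dataclasses import dataclass

FLAG_THRESHOLD = 3  # from the text: anything below 3 out of 5 is flagged


@dataclass
class ParagraphScore:
    index: int     # position of the paragraph in the article
    fluency: int   # 1-5: does it read naturally?
    accuracy: int  # 1-5: is the meaning preserved?


def flagged_paragraphs(scores):
    """Return the indexes of paragraphs a translator should look at first."""
    # Assumption: a paragraph is flagged if EITHER score is below the threshold.
    return [s.index for s in scores if min(s.fluency, s.accuracy) < FLAG_THRESHOLD]
```

A score of exactly 3 is not flagged under this rule, since the threshold is "below 3".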

The human-approved translation is delivered

For reviewed and certified tiers, the translation is delivered only after a human translator has reviewed and approved it. For the instant tier, the machine draft is delivered with AI quality scores so newsroom editors can assess it themselves. If the newsroom provided a webhook URL, Hawk sends the result there automatically. The newsroom can then publish it directly to their CMS. No manual copy-paste required.
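
Webhook delivery might look something like this sketch, which uses only Python's standard library. The payload field names are assumptions for illustration; Hawk's actual webhook format is not documented on this page.

```python
import json
import urllib.request


def build_payload(job_id: str, tier: str, target_lang: str,
                  translated_html: str, flagged: list) -> dict:
    """Assemble the result a newsroom would receive (field names are hypothetical)."""
    return {
        "job_id": job_id,
        "tier": tier,
        "target_lang": target_lang,
        "translated_html": translated_html,
        "flagged_paragraphs": flagged,
    }


def send_webhook(url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload as JSON to the newsroom's webhook URL; return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

On the receiving side, a newsroom's CMS integration would read translated_html from the JSON body and create the translated story from it.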

Three levels of human involvement

Newsrooms choose the tier that fits their deadline and the sensitivity of the content. The reviewed and certified tiers — where human translators edit and approve the output — are the standard for publication-quality journalism.

Instant
~2 min, machine draft only
Machine draft plus AI quality scoring — no human translator review. Good for breaking news or lower-risk content where speed matters most. Quality scores are included so newsroom editors can assess the output themselves.
Reviewed
4–24 hours, one human translator
A professional human translator reviews and edits the machine draft before delivery. AI quality scores highlight which passages need the most attention. The translator sees both the original English and the machine draft side by side, and makes the final editorial decisions. The standard for publication-quality journalism.
Certified
24–72 hours, two human translators
One human translator reviews and edits. A second human translator certifies the final version. All edits are tracked for a full audit trail. For high-stakes content — legal matters, health information, elections coverage — where accuracy is critical and errors could cause real harm.
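
One way to picture the three tiers is as the number of distinct human translators who must sign off before delivery: zero, one, or two. The sketch below encodes that reading; the names REQUIRED_APPROVALS and ready_for_delivery and the translator IDs are hypothetical.

```python
# Number of distinct human translators who must approve before delivery.
REQUIRED_APPROVALS = {"instant": 0, "reviewed": 1, "certified": 2}


def ready_for_delivery(tier: str, approver_ids: list) -> bool:
    """True once enough different translators have approved the job."""
    required = REQUIRED_APPROVALS[tier]
    # A set ensures the same translator cannot count twice on the certified tier.
    return len(set(approver_ids)) >= required
```

Under this rule a certified job approved twice by the same person is still held back, which reflects the requirement that a second translator certifies the final version.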

Ten languages, chosen for New Jersey's communities

The ten launch languages were selected based on NJ demographic data — specifically, the language communities most underserved by English-only local news coverage.

Language | Native name | Code | Community | Status
Spanish | Español | ES | Largest non-English speaking community in NJ | Available
Portuguese | Português | PT | Brazilian and Portuguese communities statewide | Available
French | Français | FR | West African diaspora communities | Available
Korean | 한국어 | KO | Bergen County | Available
Chinese (Simplified) | 中文 | ZH | Northern NJ Chinese communities | Available
Vietnamese | Tiếng Việt | VI | South Jersey, Hudson County | Available
Arabic | العربية | AR | Paterson, Edison | Available
Haitian Creole | Kreyòl ayisyen | HT | Large community in Newark, Trenton | Limited
Hindi | हिन्दी | HI | Edison, Middlesex County | Limited
Urdu | اردو | UR | Pakistani community, shared with Hindi speakers | Limited

Languages marked "limited" are supported but the machine drafts require more human translator attention because the automated translation engine has weaker coverage for those language pairs. Human translator review is especially important for these languages.

Why local terms and dialect choices matter

Machine translation often fails on local names: it may translate "Board of Education" word for word, or turn "Paterson" into something unrecognizable. It also defaults to generic, often Peninsular, variants of a language instead of the regional dialect spoken by the community being served. NJ's Spanish-speaking population is predominantly Puerto Rican and Dominican; NJ's Portuguese speakers are largely Brazilian. The glossary enforces both correct proper nouns and the colloquial, regional, and cultural vocabulary choices that make a translation feel natural to local readers.

How glossary protection works

STNS maintains a list of terms — proper nouns, government titles, and regional vocabulary — that should always appear a specific way in each target language. Before translation, Hawk scans every paragraph and substitutes the correct version of each term. This includes enforcing dialect-appropriate word choices (e.g. "guagua" instead of "autobús" for Caribbean Spanish readers). Longer phrases take priority — "Board of Education" is matched before "Board" — so no phrase gets partially overwritten.

Term | Spanish glossary entry | Note
NJ Transit | NJ Transit | Brand name, passed through unchanged
Board of Education | Junta de Educación | Protected in Spanish, not auto-translated
Borough Council | Consejo Municipal | Official title, canonical translation enforced
Freeholder | Concejal del condado | NJ-specific government role
Assembly member | Miembro de la Asamblea | Legislative title, format controlled
Passaic County | Condado de Passaic | Place name, glossary controls the format
bus | guagua | Caribbean dialect, not "autobús" (Peninsular)
apartment | apartamento | Regional preference, not "piso" (Spain)
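
The "longer phrases take priority" rule can be sketched with a single regular expression whose alternatives are sorted longest first. This is a minimal illustration, not Hawk's implementation: the short "Board" entry is hypothetical and exists only to show the priority rule, and matching here is case-sensitive on whole words.

```python
import re

# A few Spanish entries from the examples above.
GLOSSARY_ES = {
    "NJ Transit": "NJ Transit",
    "Board of Education": "Junta de Educación",
    "Board": "Junta",  # hypothetical short entry, included to show the priority rule
    "Passaic County": "Condado de Passaic",
}


def build_pattern(glossary: dict) -> re.Pattern:
    """Build one pattern with the longest terms listed first."""
    terms = sorted(glossary, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")


def apply_glossary(text: str, glossary: dict) -> str:
    """Substitute every glossary term in one pass over the text."""
    pattern = build_pattern(glossary)
    return pattern.sub(lambda match: glossary[match.group(0)], text)
```

Because the substitution happens in a single pass, text that has already been replaced is never scanned again, so "Board of Education" cannot be partially overwritten by the shorter "Board" entry.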

For STNS colleagues: the glossary is editable per-newsroom, per-language. If you spot a recurring translation error — whether a mistranslated proper noun or a dialect choice that doesn't match your community — adding a glossary entry fixes it for every future story from that partner.