How Hawk works — human translators powered by technology

Hawk is the translation system built by the Center for Cooperative Media for NJ News Commons partner newsrooms. Professional human translators are the core of the process — machine translation and AI scoring are tools that make them faster and more effective, not replacements for human expertise. You submit an article. Hawk generates a machine draft, flags potential issues, and hands everything to a human translator who reviews, edits, and approves the final version before it reaches your readers.

The newsroom's API key is checked

Every partner newsroom has a unique API key — like a password that identifies them. Hawk checks the key instantly to confirm: is this a real partner? Do they still have translation credits available today? If yes, the job moves forward. If not, they get an immediate error message so they know what to fix.
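A minimal sketch of that check, for illustration only. The `Newsroom` fields, the in-memory `NEWSROOMS` store, and the error wording are assumptions, not Hawk's actual schema or messages.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Newsroom:
    name: str
    daily_credit_limit: int   # hypothetical field: credits allowed per day
    credits_used: dict        # hypothetical field: maps a date to credits used


# Hypothetical in-memory key store; a real service would use a database.
NEWSROOMS = {
    "key-abc123": Newsroom("Example Gazette", daily_credit_limit=5, credits_used={}),
}


def check_api_key(api_key: str, today: date) -> tuple[bool, str]:
    """Return (ok, message) for a submission attempt."""
    newsroom = NEWSROOMS.get(api_key)
    if newsroom is None:
        # Not a known partner: fail fast with something actionable.
        return False, "Unknown API key: check that the key was copied correctly."
    used = newsroom.credits_used.get(today, 0)
    if used >= newsroom.daily_credit_limit:
        return False, "No translation credits left for today."
    return True, "OK"
```

Both failure cases return a message right away, which mirrors the "immediate error message" behavior described above.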

A translation job is created and queued

Hawk records the job in its database — the original article text, the target language, the newsroom's preferences — and immediately replies to the newsroom with a job ID. The newsroom doesn't have to wait. They get that ID right away and can check back for results. The actual translation happens in the background.
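The "reply now, translate later" pattern can be sketched as follows. The `JOBS` dictionary and `WORK_QUEUE` stand in for Hawk's database and background queue, and every name here is hypothetical.

```python
import queue
import uuid

JOBS = {}                    # stand-in for the jobs database
WORK_QUEUE = queue.Queue()   # stand-in for the background work queue


def create_job(article_html: str, target_lang: str, preferences: dict) -> str:
    """Record the job, enqueue it, and return the job ID without translating."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {
        "article_html": article_html,
        "target_lang": target_lang,
        "preferences": preferences,
        "status": "queued",
    }
    WORK_QUEUE.put(job_id)   # a background worker would pick this up later
    return job_id


def get_status(job_id: str) -> str:
    """What a newsroom would see when it checks back with its job ID."""
    return JOBS[job_id]["status"]
```

Because `create_job` only writes a record and enqueues an ID, it returns quickly no matter how long the translation itself takes.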

The article is broken into translatable pieces

News articles aren't plain text — they contain HTML formatting (bold text, links, image captions, pull quotes). Hawk's segmenter carefully separates the readable text from the formatting code so only the words get translated, while all the layout instructions pass through untouched. After translation, the pieces are reassembled exactly as they were.
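One way to separate text from markup, sketched with Python's standard `html.parser`. This is not Hawk's segmenter: the `Segmenter` class and `translate_html` function are illustrative, and `translate` is any caller-supplied function.

```python
from html.parser import HTMLParser


class Segmenter(HTMLParser):
    """Split HTML into ("markup", ...) and ("text", ...) tokens."""

    def __init__(self):
        # Keep entities such as &amp; as raw markup instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("markup", self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        self.tokens.append(("markup", self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.tokens.append(("markup", f"</{tag}>"))

    def handle_entityref(self, name):
        self.tokens.append(("markup", f"&{name};"))

    def handle_charref(self, name):
        self.tokens.append(("markup", f"&#{name};"))

    def handle_comment(self, data):
        self.tokens.append(("markup", f"<!--{data}-->"))

    def handle_data(self, data):
        self.tokens.append(("text", data))


def translate_html(html: str, translate) -> str:
    """Apply `translate` to text tokens only, then reassemble in order."""
    parser = Segmenter()
    parser.feed(html)
    parser.close()
    return "".join(
        translate(value) if kind == "text" and value.strip() else value
        for kind, value in parser.tokens
    )
```

For example, `translate_html('<p>Hello <a href="/x">world</a></p>', str.upper)` leaves the tags and the `href` untouched and changes only the words. A production segmenter would likely group inline tags so that a sentence split by a link or bold tag is still translated as one unit; this sketch translates each text node separately.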

Local terms and dialect choices are locked in before translation

This is where local and cultural knowledge matters most. Hawk applies a custom journalism glossary before anything is sent to the translation engine. Proper nouns that shouldn't be left to automatic translation, like "Board of Education," "Paterson," or "NJ Transit," get locked in so they come through correctly in every language. The glossary also enforces regional dialect choices, using the Caribbean Spanish familiar to NJ's largely Puerto Rican and Dominican communities rather than Peninsular Spanish. STNS colleagues maintain this glossary.

DeepL generates a machine draft

Hawk uses DeepL — one of the most accurate machine translation services available — to produce an initial draft for seven of the ten languages. This draft is a starting point for human translators, not the final product. DeepL was chosen because it handles journalistic prose better than most alternatives, giving human translators a stronger foundation to work from. For Haitian Creole, Hindi, and Urdu (which DeepL doesn't support as well), segments are flagged for extra human translator attention.
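The routing rule can be sketched without touching any translation API. The language codes below are the ones this page lists for the ten launch languages; the function name and the `extra_attention` field are assumptions made for illustration.

```python
# Seven languages with a full-strength machine draft, three with weaker coverage.
FULL_SUPPORT = {"ES", "PT", "FR", "KO", "ZH", "VI", "AR"}
LIMITED_SUPPORT = {"HT", "HI", "UR"}


def plan_segments(segments: list[str], target_lang: str) -> list[dict]:
    """Attach routing metadata to each segment before drafting."""
    lang = target_lang.upper()
    if lang not in FULL_SUPPORT | LIMITED_SUPPORT:
        raise ValueError(f"Unsupported target language: {target_lang}")
    extra = lang in LIMITED_SUPPORT   # flag for extra human translator attention
    return [
        {"source": text, "target_lang": lang, "extra_attention": extra}
        for text in segments
    ]
```

Keeping the flag on each segment means the translator's review screen can surface it wherever the segment is shown.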

AI scoring highlights what needs human attention

After the machine draft is generated, Hawk runs each paragraph through an AI quality check using Claude (an AI model made by Anthropic). Claude reads both the original English and the machine-translated version and scores each paragraph on fluency (does it read naturally?) and accuracy (is the meaning preserved?). Anything scoring below a 3 out of 5 gets flagged so human translators know exactly where to focus their effort. This scoring exists to make human translators more efficient — not to replace their judgment.
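The flagging rule itself is simple to sketch. The example assumes the scores have already come back from the model and that a low score on either measure triggers a flag; the text above does not say whether the threshold applies per measure or to a combined score.

```python
from dataclasses import dataclass

FLAG_THRESHOLD = 3   # paragraphs scoring below 3 out of 5 are flagged


@dataclass
class ParagraphScore:
    index: int      # position of the paragraph in the article
    fluency: int    # 1 to 5: does it read naturally?
    accuracy: int   # 1 to 5: is the meaning preserved?


def flag_for_review(scores: list[ParagraphScore]) -> list[int]:
    """Return indexes of paragraphs a human translator should look at first."""
    # Assumption: the lower of the two scores decides whether to flag.
    return [s.index for s in scores if min(s.fluency, s.accuracy) < FLAG_THRESHOLD]
```

A paragraph scoring exactly 3 on both measures is not flagged under this rule, since the threshold is "below 3".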

The human-approved translation is delivered

For reviewed and certified tiers, the translation is delivered only after a human translator has reviewed and approved it. For the instant tier, the machine draft is delivered with AI quality scores so newsroom editors can assess it themselves. If the newsroom provided a webhook URL, Hawk sends the result there automatically. The newsroom can then publish it directly to their CMS. No manual copy-paste required.
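A webhook delivery could look roughly like the sketch below, using only Python's standard library. The payload field names are hypothetical, since this page does not document Hawk's actual webhook format.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, job_id: str, translated_html: str,
                          quality_scores: list) -> urllib.request.Request:
    """Package a finished translation as a JSON POST request."""
    payload = {
        "job_id": job_id,                    # hypothetical field names
        "translated_html": translated_html,
        "quality_scores": quality_scores,
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deliver(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

On the receiving side, a newsroom's CMS integration would read the JSON body and match `job_id` against the ID it received when it submitted the article.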

Three levels of human involvement

Newsrooms choose the tier that fits their deadline and the sensitivity of the content. The reviewed and certified tiers — where human translators edit and approve the output — are the standard for publication-quality journalism.

Instant

~2 min, machine draft only

Machine draft plus AI quality scoring — no human translator review. Good for breaking news or lower-risk content where speed matters most. Quality scores are included so newsroom editors can assess the output themselves.

Reviewed

4–24 hours, one human translator

A professional human translator reviews and edits the machine draft before delivery. AI quality scores highlight which passages need the most attention. The translator sees the original English and the machine draft side by side and makes the final editorial decisions. This tier is the standard for publication-quality journalism.

Certified

24–72 hours, two human translators

One human translator reviews and edits. A second human translator certifies the final version. All edits are tracked for a full audit trail. This tier is meant for high-stakes content, such as legal matters, health information, and election coverage, where accuracy is critical and errors could cause real harm.
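The three tiers amount to a rule about how many human sign-offs a job needs before delivery. A minimal sketch, with hypothetical names throughout:

```python
# Human approvals required before delivery, per the tier descriptions above.
REQUIRED_APPROVALS = {"instant": 0, "reviewed": 1, "certified": 2}


def ready_to_deliver(tier: str, approvers: list[str]) -> bool:
    """True once enough distinct human translators have signed off."""
    needed = REQUIRED_APPROVALS[tier]
    # Count distinct people, so one translator cannot both edit and certify.
    return len(set(approvers)) >= needed
```

For example, `ready_to_deliver("certified", ["ana", "ana"])` is `False` because the certified tier calls for a second translator, while `ready_to_deliver("instant", [])` is `True`.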

Ten languages, chosen for New Jersey's communities

The ten launch languages were selected based on NJ demographic data — specifically, the language communities most underserved by English-only local news coverage.

Language	Native	Code	Community	Status
Spanish	Español	ES	Largest non-English speaking community in NJ	Available
Portuguese	Português	PT	Brazilian and Portuguese communities statewide	Available
French	Français	FR	West African diaspora communities	Available
Korean	한국어	KO	Bergen County	Available
Chinese (Simplified)	中文	ZH	Northern NJ Chinese communities	Available
Vietnamese	Tiếng Việt	VI	South Jersey, Hudson County	Available
Arabic	العربية	AR	Paterson, Edison	Available
Haitian Creole	Kreyòl ayisyen	HT	Large community in Newark, Trenton	Limited
Hindi	हिन्दी	HI	Edison, Middlesex County	Limited
Urdu	اردو	UR	Pakistani community, shared with Hindi speakers	Limited

Languages marked "Limited" are supported, but their machine drafts need more human translator attention because the automated translation engine has weaker coverage for those language pairs. Human translator review is especially important for these languages.

Why local terms and dialect choices matter

Machine translation often stumbles on local names: it may translate "Board of Education" too literally, or turn "Paterson" into something unrecognizable. It also defaults to generic, often Peninsular, variants of a language instead of the regional dialect spoken by the community being served. NJ's Spanish-speaking population is predominantly Puerto Rican and Dominican; NJ's Portuguese speakers are largely Brazilian. The glossary enforces both correct proper nouns and the colloquial, regional, and cultural vocabulary choices that make a translation feel natural to local readers.

How glossary protection works

STNS maintains a list of terms — proper nouns, government titles, and regional vocabulary — that should always appear a specific way in each target language. Before translation, Hawk scans every paragraph and substitutes the correct version of each term. This includes enforcing dialect-appropriate word choices (e.g. "guagua" instead of "autobús" for Caribbean Spanish readers). Longer phrases take priority — "Board of Education" is matched before "Board" — so no phrase gets partially overwritten.

English term	Glossary output	Note
NJ Transit	NJ Transit	Brand name, passed through unchanged
Board of Education	Junta de Educación	Protected in Spanish, not auto-translated
Borough Council	Consejo Municipal	Official title, canonical translation enforced
Freeholder	Concejal del condado	NJ-specific government role
Assembly member	Miembro de la Asamblea	Legislative title, format controlled
Passaic County	Condado de Passaic	Place name, glossary controls the format
bus	guagua	Caribbean dialect, not "autobús" (Peninsular)
apartment	apartamento	Regional preference, not "piso" (Spain)
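The longest-phrase-first rule can be sketched with a single regular expression. This illustrates the matching order only; it is not Hawk's implementation. The glossary entries come from the examples on this page, except the standalone "Board" entry, which is hypothetical and included only to show the priority rule.

```python
import re

GLOSSARY_ES = {
    "NJ Transit": "NJ Transit",
    "Board of Education": "Junta de Educación",
    "Board": "Junta",   # hypothetical entry, only to demonstrate priority
    "Borough Council": "Consejo Municipal",
    "Passaic County": "Condado de Passaic",
    "bus": "guagua",
}


def apply_glossary(text: str, glossary: dict) -> str:
    """Substitute glossary terms, trying longer phrases before shorter ones."""
    # Sorting by length puts "Board of Education" ahead of "Board" in the
    # alternation, so the longer phrase wins when both could match.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b"
    )
    # One pass over the text: substituted output is never rescanned.
    return pattern.sub(lambda match: glossary[match.group(0)], text)
```

The word boundaries keep "bus" from matching inside "business". Matching here is case-sensitive, so a sentence-initial "Bus" would need its own entry or a case-insensitive variant.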

For STNS colleagues: the glossary is editable per newsroom and per language. If you spot a recurring translation error, whether a mistranslated proper noun or a dialect choice that doesn't match your community, adding a glossary entry fixes it for every future story from that partner.