Background
The reach of your content is a key consideration when running a technical blog. In particular, content written only in Korean has a limited audience, which restricts knowledge sharing with developers around the world. In fact, developer demographics show that South Korea does not appear among the top 15 countries by number of professional developers, which further underscores the need for multilingual support.
(See: top 15 countries by number of professional developers)
The Need for Automation
Modern AI translation and traditional translation tools produce fairly accurate results. However, manually translating and uploading every new article is inefficient, and if you want to provide not only English but also Chinese and Japanese versions, which serve large developer populations, doing it by hand is simply not practical.
To solve this problem, I implemented an automatic translation script that uses the DeepL API.
Development Environment Configuration
First, I installed the packages needed to run Node.js scripts in the Next.js environment:
pnpm install deepl-node dotenv tsx
Initially, I tried to use ts-node, but it had configuration conflicts with the Next.js environment. Instead, I set up a standalone execution environment using the tsx library.
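The script below reads the DeepL API key from the environment via dotenv. A minimal sketch of the expected `.env` file at the project root (the key name matches what the script reads; the value here is a placeholder):

```shell
# .env — keep this file out of version control
DEEPL_API_KEY=your-deepl-api-key
```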
Project Structure
First of all, my project has roughly the following structure:
.
├── src/
│   └── app/
│       └── posts/
│           └── [slug]/
│               └── page.tsx
├── posts/
│   └── post1.mdx
└── package.json
Then I created a script like this (scripts/translate-posts.ts):
import fs from "fs/promises";
import path from "path";
import * as deepl from "deepl-node";
import matter from "gray-matter";
import dotenv from "dotenv";
dotenv.config();
const DEEPL_API_KEY = process.env.DEEPL_API_KEY!;
const translator = new deepl.Translator(DEEPL_API_KEY);
const SOURCE_DIR = "src/posts";
const TARGET_DIR = "src/posts/en";
interface PostContent {
  content: string;
  data: {
    title: string;
    description: string;
    [key: string]: string;
  };
}
async function translatePost(content: {
  data: { [p: string]: string };
  content: string;
}): Promise<PostContent> {
  const translatedTitle = await translator.translateText(
    content.data.title,
    "ko",
    "en-US",
  );
  const translatedDescription = await translator.translateText(
    content.data.description,
    "ko",
    "en-US",
  );
  const translatedContent = await translator.translateText(
    content.content,
    "ko",
    "en-US",
  );
  return {
    content: translatedContent.text,
    data: {
      ...content.data,
      title: translatedTitle.text,
      description: translatedDescription.text,
      originalLang: "ko",
    },
  };
}
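translatePost above hard-codes which frontmatter fields get translated. If more fields are added later, a small helper could separate translatable fields from ones that should pass through unchanged. This helper and the TRANSLATABLE_KEYS list are my own illustration, not part of the original script:

```typescript
// Hypothetical helper: split frontmatter into fields to send to the
// translator and fields to copy through unchanged (dates, tags, slugs, ...).
const TRANSLATABLE_KEYS = ["title", "description"];

function splitFrontmatter(data: Record<string, string>) {
  const toTranslate: Record<string, string> = {};
  const passThrough: Record<string, string> = {};
  for (const [key, value] of Object.entries(data)) {
    if (TRANSLATABLE_KEYS.includes(key)) {
      toTranslate[key] = value;
    } else {
      passThrough[key] = value;
    }
  }
  return { toTranslate, passThrough };
}
```

With this in place, translatePost could loop over `toTranslate` instead of naming each field, and spread `passThrough` into the result as-is.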
async function processFile(filename: string) {
  try {
    const sourcePath = path.join(SOURCE_DIR, filename);
    const targetPath = path.join(TARGET_DIR, filename);

    // Check that the source file exists
    try {
      await fs.access(sourcePath);
    } catch (error) {
      throw new Error(`File not found: ${sourcePath}`);
    }

    // Read the MDX file
    const fileContent = await fs.readFile(sourcePath, "utf-8");
    const { data, content } = matter(fileContent);

    // Run the translation
    console.log(`Translating ${filename}...`);
    const translated = await translatePost({ data, content });

    // Write the translated MDX file
    const translatedFileContent = matter.stringify(
      translated.content,
      translated.data,
    );
    await fs.mkdir(TARGET_DIR, { recursive: true });
    await fs.writeFile(targetPath, translatedFileContent);
    console.log(`${filename} translation complete!`);
  } catch (error) {
    console.error(`Error:`, error);
    process.exit(1);
  }
}
// Get the filename from the command-line arguments
const filename = process.argv[2];

if (!filename) {
  console.error("Error: please pass a filename, e.g. post1.mdx");
  process.exit(1);
}

// Check the file extension
if (!filename.endsWith(".mdx")) {
  console.error("Error: only MDX files are supported.");
  process.exit(1);
}

processFile(filename);
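The script handles one file per run. As a possible extension (not part of the original post), every post could be processed in one go; the only new logic is the filename filter, sketched here:

```typescript
// Hypothetical batch mode: pick out the .mdx files from a directory listing
// so each one can be passed to processFile.
function mdxFilesOf(entries: string[]): string[] {
  return entries.filter((name) => name.endsWith(".mdx"));
}

// Usage sketch (not run here):
// const entries = await fs.readdir(SOURCE_DIR);
// for (const file of mdxFilesOf(entries)) {
//   await processFile(file);
// }
```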
Add the script to `package.json` as well:
{
  "scripts": {
    "translate": "tsx scripts/translate-posts.ts"
  }
}
Running the Script
Now, type the command in the terminal, e.g. `pnpm translate post1.mdx`, and a translated MDX file will be generated under src/posts/en.
Problems
In my initial implementation, I sent the text of the MDX file directly to the DeepL API, but ran into the following issues:
- Broken Markdown syntax
- Unnecessary translation of code blocks
- Distorted image tags and link structure
(Screenshots comparing the original and the broken Japanese translation are omitted here.)
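Before getting to the fix this post settles on, one common alternative worth noting (not what I ended up using) is to mask fenced code blocks with placeholder tokens before translation and restore them afterwards. A minimal sketch; the placeholder format is my own choice:

```typescript
// Sketch: replace fenced code blocks with placeholders so the translator
// never sees them, then restore the originals after translation.
const FENCE = "`".repeat(3); // a literal three-backtick fence
const BLOCK_RE = new RegExp(`${FENCE}[\\s\\S]*?${FENCE}`, "g");

function maskCodeBlocks(markdown: string) {
  const blocks: string[] = [];
  const masked = markdown.replace(BLOCK_RE, (block) => {
    blocks.push(block);
    return `@@CODE_BLOCK_${blocks.length - 1}@@`;
  });
  return { masked, blocks };
}

function restoreCodeBlocks(masked: string, blocks: string[]) {
  return masked.replace(/@@CODE_BLOCK_(\d+)@@/g, (_, i) => blocks[Number(i)]);
}
```

This keeps code out of the API entirely, but it does nothing for inline Markdown syntax, images, or links, which is why the HTML-based approach below is more thorough.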
Workaround
I was wondering what to do and came up with the following idea. I saw the HTML handling section in the DeepL API documentation and realized that sending the text as HTML seemed to preserve the structure without breaking it.
Therefore, I implemented the following improved process to solve the problems above:
- Convert the MDX to HTML
- Send it to the DeepL API with the tag-preserving option enabled
- Convert the translated HTML back to MDX
import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkHtml from "remark-html";
import rehypeParse from "rehype-parse";
import rehypeRemark from "rehype-remark";
import remarkStringify from "remark-stringify";

const convertMDXToHtml = async (markdown: string) => {
  try {
    const html = await unified()
      .use(remarkParse)
      .use(remarkHtml)
      .process(markdown);
    return html.toString();
  } catch (err) {
    console.error("An error occurred during the MD => HTML conversion:", err);
    throw err;
  }
};
const convertHtmlToMDX = async (html: string) => {
  try {
    const markdown = await unified()
      .use(rehypeParse)
      .use(rehypeRemark)
      .use(remarkStringify)
      .process(html);
    return markdown.toString();
  } catch (err) {
    console.error("An error occurred during the HTML => MD conversion:", err);
    throw err;
  }
};
async function translatePost(
  content: { data: { [p: string]: string }; content: string },
  targetLang: TargetLanguageCode,
): Promise<PostContent> {
  // ...frontmatter translation omitted...

  // md => html
  const html = await convertMDXToHtml(content.content);

  // html => translated html
  const translatedContent = await translator.translateText(
    html,
    "ko",
    targetLang,
    {
      tagHandling: "html",
    },
  );

  // translated html => md
  const mdx = await convertHtmlToMDX(translatedContent.text);

  return {
    content: mdx,
    // ...
  };
}
Now the formatting comes through correctly!
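Since the intro mentions Chinese and Japanese as well, the hard-coded TARGET_DIR could become a per-language lookup. A minimal sketch; the directory layout and the ja/zh entries are my assumption, while en-US mapping to src/posts/en matches the script above:

```typescript
// Hypothetical mapping from DeepL target-language codes to output
// directories, assuming one subdirectory per language under src/posts.
function targetDirFor(lang: string): string {
  const dirByLang: Record<string, string> = {
    "en-US": "src/posts/en",
    ja: "src/posts/ja",
    zh: "src/posts/zh",
  };
  const dir = dirByLang[lang];
  if (!dir) throw new Error(`Unsupported target language: ${lang}`);
  return dir;
}
```

processFile could then take the target language as a second argument and write to `targetDirFor(targetLang)` instead of the single TARGET_DIR constant.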
Closing Thoughts
With this automation in place, I can deploy multilingual versions of my blog posts without any hassle. I'll keep verifying that the translated posts read as I intended, but for now, I'm excited to see whether generating static files in multiple languages actually brings in more traffic through SEO!