Unofficial Google Translate in C# and VB.NET

Let me start by apologizing for being away for such a long time without writing any posts. I was just not in the mood, if you know what I mean.

But today I’m going to show you how to translate some text using Google Translate, right inside your own .NET programs.

I needed that feature today, and at first glance I just thought “sure, Google is nice, they have an API for programmers like me”. Turns out they do, but it’s a JSON API intended for JavaScript. Yes yes, we can indeed access that from .NET, but I didn’t want to start messing with JSON at the moment, and I always like a challenge. So I began looking into the possibilities for screen scraping, a process I’m quite comfortable with as I’ve done a lot of it.

But be warned: if Google changes anything in their layout, your application is likely to break. I will not offer support on this, but if you ask nicely I may be able to help you anyway.

For me that doesn’t really matter as it’s primarily intended for personal use on a project I’m constantly working on.

So let’s get coding. First, take a look at the translation itself. For testing purposes I will go from Danish to English. Normally Google hides the parameters inside Ajax and POST requests, so I have done the hard work for you and dug out this URL:

http://translate.google.com/?hl=en&ie=UTF8&text=Hej+verden&langpair=da|en


It’s the direct translation URL. Put in your own text instead of “Hej Verden”, change the langpair to suit your needs (da|en means from Danish to English) and you are good to go.

Let’s take that into .NET. Today I’m going to present it in C# but there will be a VB example at the end.

Let’s start out by creating a new Windows Forms project and opening the code view. You should begin with some imports. There should be some auto-generated ones already, so just insert this at the end of the imports.

using System.Net;
using System.Text.RegularExpressions;

After that we can create our initial function. It will take 3 parameters. “input” is the text to be translated, “langFrom” is the language to translate from as a short code (like da or en, for Danish or English) and “langTo” is the language to translate to, in the same format as “langFrom”.

public string TranslateText(string input, string langFrom, string langTo)
{
	//Function here
}

Now we can move on to the next step, where we will actually fetch some data from Google. We will create a new instance of WebClient and add the appropriate headers to it (to make sure Google is going to send us UTF-8 encoded text). From there we will take the parameters passed to the function, insert them into our previous translation URL, and after that fetch the content.

public string TranslateText(string input, string langFrom, string langTo)
{
	//Defines a new WebClient
	WebClient Client = new WebClient();
	//Sets the client encoding to UTF8
	Client.Headers.Add("Charset", "text/html; charset=UTF-8");
	//Creates the string (the input is URL-escaped first). And yes I prefer this over string.format ! ;)
	string downloadUrl = "http://www.google.com/translate_t?hl=da&ie=UTF8&text=" + Uri.EscapeDataString(input) + "&langpair=" + langFrom + "|" + langTo;
	//Downloads the string from the URL above
	string data = Client.DownloadString(downloadUrl);
	return data;
}
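
Because the text ends up in the query string, characters such as spaces, “&” or “æ” can cut the text short or garble it unless they are escaped. Here is a minimal sketch of how the URL could be built in a small helper of its own. The class name TranslateUrl and the method Build are my own names for illustration; the URL is the same one used in the function above.

```csharp
using System;

public static class TranslateUrl
{
    // Hypothetical helper (not part of the original function): builds the request URL.
    public static string Build(string input, string langFrom, string langTo)
    {
        // Percent-encode the text as UTF-8 so spaces, '&', '#' and 'æ' survive the query string.
        string escapedText = Uri.EscapeDataString(input);

        // The language pair is left as "from|to", exactly as in the URL shown earlier.
        return "http://www.google.com/translate_t?hl=da&ie=UTF8&text=" + escapedText
            + "&langpair=" + langFrom + "|" + langTo;
    }
}
```

Calling TranslateUrl.Build("Hej verden", "da", "en") gives a URL where the text part reads “Hej%20verden”, and the result can be passed straight to Client.DownloadString.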

Now we have the data stored inside our “data” variable. Let’s just parse it real fast. We will begin by finding where the “result_box” span starts and afterwards parse our way through until we hit two “</span>” tags right after each other (indicating the end of the result box).

public string TranslateText(string input, string langFrom, string langTo)
{
	//Defines a new WebClient
	WebClient Client = new WebClient();
	//Sets the client encoding to UTF8
	Client.Headers.Add("Charset", "text/html; charset=UTF-8");
	//Creates the string (the input is URL-escaped first). And yes I prefer this over string.format ! ;)
	string downloadUrl = "http://www.google.com/translate_t?hl=da&ie=UTF8&text=" + Uri.EscapeDataString(input) + "&langpair=" + langFrom + "|" + langTo;
	//Downloads the string from the URL above
	string data = Client.DownloadString(downloadUrl);
	//Searches for the beginning of the resultbox and cuts everything away before that
	data = data.Substring(data.IndexOf("<span id=result_box")+19);
	//Finds the ending of the resultbox by searching for two spans right after each other
	data = data.Remove(data.IndexOf("</span></span>")+7);
	return data;
}
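
Since IndexOf returns -1 when a marker is missing, the two cuts above will quietly produce garbage the day Google changes the layout. Below is a rough sketch of the same two cuts with an explicit check, so the failure is at least visible. The class ResultBoxCutter, the method Cut and the exception messages are hypothetical names of mine, not part of the code above.

```csharp
using System;

public static class ResultBoxCutter
{
    // Hypothetical helper: isolates the contents of the result box from the downloaded page.
    public static string Cut(string html)
    {
        const string startMarker = "<span id=result_box";
        const string endMarker = "</span></span>";

        // Same cut as before: drop everything up to and including the start marker.
        int start = html.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0)
            throw new InvalidOperationException("result_box not found; the page layout may have changed.");
        string data = html.Substring(start + startMarker.Length);

        // Same cut as before: keep everything up to and including the first of the two closing spans.
        int end = data.IndexOf(endMarker, StringComparison.Ordinal);
        if (end < 0)
            throw new InvalidOperationException("End of result_box not found; the page layout may have changed.");
        return data.Remove(end + "</span>".Length);
    }
}
```

The caller can then catch InvalidOperationException and show a friendly message instead of displaying half a web page as the translation.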

Now we have the contents of the “result_box” (and a little of its beginning) and are ready to move on to the next step. Here we will use a regex for counting the occurrences of spans and afterwards loop through the entire data block, extract the text of each span, put the pieces together in the variable “translatedText” and return it at the end.

public string TranslateText(string input, string langFrom, string langTo)
{
	//Defines a new WebClient
	WebClient Client = new WebClient();
	//Sets the client encoding to UTF8
	Client.Headers.Add("Charset", "text/html; charset=UTF-8");
	//Creates the string (the input is URL-escaped first). And yes I prefer this over string.format ! ;)
	string downloadUrl = "http://www.google.com/translate_t?hl=da&ie=UTF8&text=" + Uri.EscapeDataString(input) + "&langpair=" + langFrom + "|" + langTo;
	//Downloads the string from the URL above
	string data = Client.DownloadString(downloadUrl);
	//Searches for the beginning of the resultbox and cuts everything away before that
	data = data.Substring(data.IndexOf("<span id=result_box")+19);
	//Finds the ending of the resultbox by searching for two spans right after each other
	data = data.Remove(data.IndexOf("</span></span>")+7);
	//Defines a new regex used for counting all spans inside the resultbox
	Regex spans = new Regex("<span");
	//Finds the count and puts it inside the variable spanOccurences
	int spanOccurences = spans.Matches(data).Count;
	//Defines an empty string for use in the for loop
	string translatedText = "";
	//Extract each tiny bit of text from each span in the resultbox
	for (int i = 0; i < spanOccurences; i++)
	{
		//Defines currentBlock and sets it to everything which comes after the first "<span"
		string currentBlock = data.Substring(data.IndexOf("<span") + 5);
		//Finds the ending of the current span and removes everything after that
		currentBlock = currentBlock.Remove(currentBlock.IndexOf("</span>"));
		//Goes back to the beginning and cleans everything from inside the first span
		currentBlock = currentBlock.Substring(currentBlock.IndexOf(">") + 1);
		//Removes the current processed span from the beginning of the data for next extraction
		data = data.Substring(data.IndexOf("</span>") + 7);
		//Adds the extracted text to the translatedText variable
		translatedText += currentBlock;
	}
	//Returns the translated text
	return translatedText;
}
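
If you would rather let the regex do all the work, the counting and the loop can be folded into a single pattern. This is only a sketch of an alternative, not the method used above: it assumes the spans inside the result box are flat (not nested inside each other), and it additionally decodes HTML entities such as “&amp;”, which the loop above does not do. The names SpanText and Extract are made up for the example.

```csharp
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class SpanText
{
    // Hypothetical helper: collects the inner text of every span in the result box.
    public static string Extract(string resultBox)
    {
        var text = new StringBuilder();

        // Each match is one "<span ...>text</span>"; group 1 holds the text in between.
        foreach (Match m in Regex.Matches(resultBox, "<span[^>]*>(.*?)</span>", RegexOptions.Singleline))
        {
            text.Append(m.Groups[1].Value);
        }

        // Assumption: the page HTML-encodes characters like '&', so decode them at the end.
        return WebUtility.HtmlDecode(text.ToString());
    }
}
```

The input is the same trimmed “data” string that the loop works on, so SpanText.Extract(data) could stand in for the loop if the flat-span assumption holds for the pages you get back.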

And the full code in C# looks like this:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Net;
using System.Text.RegularExpressions;
namespace TranslateScraper
{
	public partial class Form1 : Form
	{
		public Form1()
		{
			InitializeComponent();
		}
		private void Form1_Load(object sender, EventArgs e)
		{
			MessageBox.Show(TranslateText("Tillykke! Dine programmer kan nu bruge Google Oversæt", "da", "en"));
		}
		/// <summary>
		/// Translates a text using screen scraping on Google Translate
		/// </summary>
		/// <param name="input">The string to translate</param>
		/// <param name="langFrom">The language to translate from, e.g. "en" for English or "da" for Danish</param>
		/// <param name="langTo">The language to translate to in the same format as langFrom</param>
		/// <returns>The translated text</returns>
		public string TranslateText(string input, string langFrom, string langTo)
		{
			//Defines a new WebClient
			WebClient Client = new WebClient();
			//Sets the client encoding to UTF8
			Client.Headers.Add("Charset", "text/html; charset=UTF-8");
			//Creates the string (the input is URL-escaped first). And yes I prefer this over string.format ! ;)
			string downloadUrl = "http://www.google.com/translate_t?hl=da&ie=UTF8&text=" + Uri.EscapeDataString(input) + "&langpair=" + langFrom + "|" + langTo;
			//Downloads the string from the URL above
			string data = Client.DownloadString(downloadUrl);
			//Searches for the beginning of the resultbox and cuts everything away before that
			data = data.Substring(data.IndexOf("<span id=result_box")+19);
			//Finds the ending of the resultbox by searching for two spans right after each other
			data = data.Remove(data.IndexOf("</span></span>")+7);
			//Defines a new regex used for counting all spans inside the resultbox
			Regex spans = new Regex("<span");
			//Finds the count and puts it inside the variable spanOccurences
			int spanOccurences = spans.Matches(data).Count;
			//Defines an empty string for use in the for loop
			string translatedText = "";
			//Extract each tiny bit of text from each span in the resultbox
			for (int i = 0; i < spanOccurences; i++)
			{
				//Defines currentBlock and sets it to everything which comes after the first "<span"
				string currentBlock = data.Substring(data.IndexOf("<span") + 5);
				//Finds the ending of the current span and removes everything after that
				currentBlock = currentBlock.Remove(currentBlock.IndexOf("</span>"));
				//Goes back to the beginning and cleans everything from inside the first span
				currentBlock = currentBlock.Substring(currentBlock.IndexOf(">") + 1);
				//Removes the current processed span from the beginning of the data for next extraction
				data = data.Substring(data.IndexOf("</span>") + 7);
				//Adds the extracted text to the translatedText variable
				translatedText += currentBlock;
			}
			//Returns the translated text
			return translatedText;
		}
	}
}

And in VB.

Imports System.Text.RegularExpressions
Imports System.Net
Public Class Form1
	Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
		MsgBox(TranslateText("Tillykke! Dine programmer kan nu bruge Google oversæt", "da", "en"))
	End Sub
	''' <summary>
	''' Translates a text using screen scraping on Google Translate
	''' </summary>
	''' <param name="input">The string to translate</param>
	''' <param name="langFrom">The language to translate from, e.g. "en" for English or "da" for Danish</param>
	''' <param name="langTo">The language to translate to in the same format as langFrom</param>
	''' <returns>The translated text</returns>
	Public Function TranslateText(ByVal input As String, ByVal langFrom As String, ByVal langTo As String) As String
		'Defines a new WebClient
		Dim Client As New WebClient()
		'Sets the client encoding to UTF8
		Client.Headers.Add("Charset", "text/html; charset=UTF-8")
		'Creates the string (the input is URL-escaped first). And yes I prefer this over string.format ! ;)
		Dim downloadUrl As String = "http://www.google.com/translate_t?hl=da&ie=UTF8&text=" & Uri.EscapeDataString(input) & "&langpair=" & langFrom & "|" & langTo
		'Downloads the string from the URL above
		Dim data As String = Client.DownloadString(downloadUrl)
		'Searches for the beginning of the resultbox and cuts everything away before that
		data = data.Substring(data.IndexOf("<span id=result_box") + 19)
		'Finds the ending of the resultbox by searching for two spans right after each other
		data = data.Remove(data.IndexOf("</span></span>") + 7)
		'Defines a new regex used for counting all spans inside the resultbox
		Dim spans As New Regex("<span")
		'Finds the count and puts it inside the variable spanOccurences
		Dim spanOccurences As Integer = spans.Matches(data).Count
		'Defines an empty string for use in the for loop
		Dim translatedText As String = ""
		'Extract each tiny bit of text from each span in the resultbox
		For i As Integer = 0 To spanOccurences - 1
			'Defines currentBlock and sets it to everything which comes after the first "<span"
			Dim currentBlock As String = data.Substring(data.IndexOf("<span") + 5)
			'Finds the ending of the current span and removes everything after that
			currentBlock = currentBlock.Remove(currentBlock.IndexOf("</span>"))
			'Goes back to the beginning and cleans everything from inside the first span
			currentBlock = currentBlock.Substring(currentBlock.IndexOf(">") + 1)
			'Removes the current processed span from the beginning of the data for next extraction
			data = data.Substring(data.IndexOf("</span>") + 7)
			'Adds the extracted text to the translatedText variable
			translatedText += currentBlock
		Next
		'Returns the translated text
		Return translatedText
	End Function
End Class

And that concludes this tutorial. Thanks for reading, I hope to be back soon with some fresh new content, and possibly an article about my home automation system.
