PHP – get domain name from URL

Getting the domain name from a URL in PHP seems trivial at first, but on closer inspection there is a catch. Because top-level domains (TLDs) can consist of an arbitrary number of labels (for example .com, .co.uk, .lib.ca.us, and so on), it’s not possible to tell where the domain name ends and the TLD begins without a full list of all TLDs.
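To make the problem concrete, here is a minimal sketch of the naive approach, which assumes the domain is always the last two labels of the hostname. The function name naive_domain_from_hostname is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical naive helper: assumes the TLD is always a single label,
// so the domain is taken to be the last two labels of the hostname.
function naive_domain_from_hostname(string $hostname): string
{
    $labels = explode('.', $hostname);
    // Keep only the last two labels (or fewer, if the hostname is shorter)
    return implode('.', array_slice($labels, -2));
}

echo naive_domain_from_hostname('www.example.com'), "\n";   // example.com (correct)
echo naive_domain_from_hostname('www.example.co.uk'), "\n"; // co.uk (wrong, should be example.co.uk)
```

The second call shows why a list of known suffixes is needed: nothing in the hostname itself reveals that .co.uk is a two-label TLD.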

Fortunately, the Mozilla Foundation has compiled and published such a list here (the Public Suffix List), under the Mozilla Public License v2.0. Since it’s a very big list, and stripping the comments from it on every call would consume noticeable resources, I’ve preprocessed it into a comment-free version that PHP can load quickly. You can either make a processed version of the TLD list yourself, by removing all the comments from the file provided by Mozilla, or you can download the processed list of TLDs I made here. Note, however, that my processed list is not kept up to date with TLDs released after it was generated.
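If you prefer to build the processed list yourself, the comment stripping could look roughly like the sketch below. It assumes that comment lines in the Mozilla file start with "//" and that every other non-blank line holds one suffix. The function name strip_suffix_list_comments and the file names are assumptions made for this illustration, not part of my original script.

```php
<?php
// Sketch: strip comments and blank lines from a Public Suffix List style file.
// The function name and both file names are hypothetical.
function strip_suffix_list_comments(string $source_filename, string $target_filename): int
{
    $lines = file($source_filename, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $source_filename");
    }

    $suffixes = array();
    foreach ($lines as $line) {
        $line = trim($line);
        // Skip blank lines and comment lines (assumed to start with "//")
        if ($line === '' || strpos($line, '//') === 0) {
            continue;
        }
        $suffixes[] = $line;
    }

    // Write one suffix per line; LOCK_EX guards against concurrent writers
    $payload = $suffixes ? implode("\n", $suffixes) . "\n" : '';
    if (file_put_contents($target_filename, $payload, LOCK_EX) === false) {
        throw new RuntimeException("Cannot write $target_filename");
    }
    return count($suffixes);
}
```

A call such as strip_suffix_list_comments('public_suffix_list.dat', 'tld_list_processed.txt') would then produce the processed file and return the number of suffixes written. Wildcard rules such as *.ck pass through unchanged in this sketch, so a lookup that only does exact matches will not apply them.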

One last note before going into the code: on Linux hosts, the function copies the TLD list to the RAM-backed “/dev/shm” filesystem. This way, under heavy load, the array of TLDs is populated from memory rather than from the hard drive every time the function is called. You can disable this behaviour by passing any other value for the $hosting_type parameter, for example “linux_custom” or “windows”.

try
	{
		$tld_list_filename_to_be_used = $tld_list_filename;
		
		if ($hosting_type == 'linux')
		{
			// if the hosting environment is Linux and PHP has access to /dev/shm/,
			// keep a copy of the TLD list there so that repeated calls read it from RAM instead of from disk
			$memory_filename = "/dev/shm/tld_list_processed.txt";
			if (!file_exists($memory_filename)) copy($tld_list_filename, $memory_filename);	
			$tld_list_filename_to_be_used = $memory_filename;
		}
		
		
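		// read the suffix list under a shared (read) lock
		$fh = fopen($tld_list_filename_to_be_used, "r");
		$domain_suffixes = array();
		if ($fh) 
		{
			flock($fh, LOCK_SH);
			while (($line = fgets($fh)) !== false) 
			{
				// use the suffix as the array key so that lookups are a single isset() call
				$domain_suffixes[rtrim($line)] = true;
			}
			flock($fh, LOCK_UN);
			fclose($fh);
		}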
		
		$hostname = get_hostname_from_url($url);
		$domain = $hostname;
			
		// DEBUG: uncomment to check that the suffix list is properly initialized
		// print_r($domain_suffixes);
		
		// try to get domain name without subdomain
		// this can be done by finding the longest TLD suffix that fits
		$hostname_components = explode(".", $hostname);
		$count_hostname_components = count($hostname_components);
		for ($i = 0; $i < $count_hostname_components; $i++)
		{
			$potential_domain_suffix = '';
			$first = true;
			for ($j = ($i+1); $j < $count_hostname_components; $j++)
			{
				if (!$first) $potential_domain_suffix .= ".";
				$potential_domain_suffix .= $hostname_components[$j];
				if ($first) $first = false;
			}
			
			
			if (isset($domain_suffixes[$potential_domain_suffix]))
			{
				$domain = $hostname_components[$i].".".$potential_domain_suffix;
				break;
			}
		}
		
		return $domain;
	}
	catch (Exception $e)
	{
		 error_log($e -> getMessage());
		 return "";
	}
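The code above relies on a get_hostname_from_url() helper that is not shown here. The following is only a hypothetical stand-in built on PHP's parse_url(), offered as a sketch rather than the original implementation. It assumes that inputs without a scheme (such as "www.example.co.uk/path") should still be accepted.

```php
<?php
// Hypothetical stand-in for the get_hostname_from_url() helper called above;
// the original implementation is not part of this article.
function get_hostname_from_url(string $url): string
{
    $url = trim($url);
    // parse_url() only reports a host when a scheme or a leading "//" is present,
    // so prepend "//" to bare inputs such as "www.example.co.uk/path"
    if (!preg_match('#^([a-z][a-z0-9+.\-]*:)?//#i', $url)) {
        $url = '//' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() yields null (no host) or false (malformed URL) on failure
    return is_string($host) ? strtolower($host) : '';
}
```

With this helper, a URL like https://www.example.co.uk/page gives the hostname www.example.co.uk. The loop above first tries the suffix example.co.uk, then co.uk. Provided co.uk is in the list and example.co.uk is not, the function returns example.co.uk.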
