This is a Java implementation of the WHATWG URL Living Standard.
The main advantage of the WHATWG URL standard is that it fixes the various shortcomings and quirks
of java.net.URL, RFC 3986, RFC 3987, etc.
- parse URLs into an easy-to-use representation
- read and set individual URL components (scheme, host, port, path, query, fragment, …)
- work with URL search parameters via a dedicated UrlSearchParams API
- report validation errors and failures as defined by the WHATWG specification
This is the code repository of the URL support used by HtmlUnit.
The library was forked from the whatwg-url project
by Stephane Bastian.
The code has been adapted to match HtmlUnit's code style rules and the Gson dependency has been
removed.
The code is being expanded, restructured and improved primarily to meet the requirements of this
project.
It is in sync with this specific WHATWG commit (27 September 2023).
Add to your pom.xml:
<dependency>
<groupId>org.htmlunit</groupId>
<artifactId>htmlunit-url</artifactId>
<version>4.0.0</version>
</dependency>
Add to your build.gradle:
implementation group: 'org.htmlunit', name: 'htmlunit-url', version: '4.0.0'
The WHATWG URL Living Standard is the authoritative reference for
how URLs should be parsed and serialized across browsers. It supersedes RFC 3986, RFC 3987 and the
older java.net.URL behavior and fixes many long-standing inconsistencies.
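To make the contrast with the JDK concrete, here is a minimal sketch that uses only `java.net.URI` (chosen as a stand-in for the older JDK behavior; it does not use this library). It shows two inputs where the JDK parser is either stricter or less normalizing than the WHATWG rules: a space in the path is rejected outright, and the host is reported in its original case. Under the WHATWG standard, the space would be percent-encoded as `%20` and the host lowercased. The class and method names are made up for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class JavaNetUriQuirks {

    /** Returns true if java.net.URI accepts the input, false if it rejects it. */
    static boolean acceptedByJavaNetUri(String input) {
        try {
            new URI(input);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Returns the host exactly as java.net.URI reports it (no case normalization). */
    static String hostFromJavaNetUri(String input) {
        return URI.create(input).getHost();
    }

    public static void main(String[] args) {
        // A raw space in the path is a syntax error for java.net.URI.
        System.out.println(acceptedByJavaNetUri("http://example.com/a b")); // false
        // The host keeps the case it was written in.
        System.out.println(hostFromJavaNetUri("http://EXAMPLE.com/")); // EXAMPLE.com
    }
}
```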
The library is rather slim (~55 KB), has only one dependency (on ICU4J) and ships with 3 000+ tests. Some test data are borrowed from Web Platform Tests, the cross-browser test suite used by Safari, Chrome, Firefox and Edge.
As a side note, there is a basic benchmark (built with JMH) that iterates over 500+ typical URLs and measures throughput (~350 000 ops/s on an AMD Ryzen 5, but your mileage may vary).
import org.htmlunit.url.Url;
Url url = Url.create("http://www.myurl.com/path1?a=1&b=2#hash1");
System.out.println(url.hash()); // #hash1
System.out.println(url.host()); // www.myurl.com
System.out.println(url.hostname()); // www.myurl.com
System.out.println(url.href()); // http://www.myurl.com/path1?a=1&b=2#hash1
System.out.println(url.origin()); // http://www.myurl.com
System.out.println(url.password()); //
System.out.println(url.pathname()); // /path1
System.out.println(url.port()); //
System.out.println(url.protocol()); // http:
System.out.println(url.search()); // ?a=1&b=2
System.out.println(url.searchParams()); // a=1&b=2
System.out.println(url.username()); //
import org.htmlunit.url.Url;
Url url = Url.create("path1?a=1&b=2#hash1", "http://www.myurl.com/path2?c=3&d=2#hash2");
System.out.println(url.hash()); // #hash1
System.out.println(url.host()); // www.myurl.com
System.out.println(url.hostname()); // www.myurl.com
System.out.println(url.href()); // http://www.myurl.com/path1?a=1&b=2#hash1
System.out.println(url.origin()); // http://www.myurl.com
System.out.println(url.password()); //
System.out.println(url.pathname()); // /path1
System.out.println(url.port()); //
System.out.println(url.protocol()); // http:
System.out.println(url.search()); // ?a=1&b=2
System.out.println(url.searchParams()); // a=1&b=2
System.out.println(url.username()); //
import org.htmlunit.url.Url;
Url url = Url.create("http://www.myurl.com/path1?a=1&b=2#hash1");
// hash
System.out.println(url.hash()); // #hash1
url.hash("hash2");
System.out.println(url.hash()); // #hash2
// host
System.out.println(url.host()); // www.myurl.com
url.host("anotherhost.io");
System.out.println(url.host()); // anotherhost.io
// set other properties such as username, password, pathname, port, protocol, etc.
The specification defines ValidationError. A validation error does not stop processing the URL unless it is also a failure.
If a failure occurs when calling Url.create(…), a ValidationException is thrown.
However, if a failure occurs when calling a setter, no exception is thrown; the
ValidationError is instead added to Url.validationErrors().
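The split between "validation error", "failure", creation and setters can be modeled in a few lines. The following is only a toy sketch with hypothetical names (`ToyError`, `ToyValidationException`, `ToyPort`); it is not the library's implementation and models nothing but a port number. It assumes that errors accumulate across calls and that a failed setter keeps the previous value; both are choices made for the sketch, not statements about this library.

```java
import java.util.ArrayList;
import java.util.List;

final class ValidationModelSketch {

    /** Hypothetical error kinds; not the library's ValidationError constants. */
    enum ToyError {
        SURROUNDING_WHITESPACE(false), // validation error only: processing continues
        PORT_INVALID(true);            // also a failure: the input is rejected

        private final boolean failure;

        ToyError(boolean failure) {
            this.failure = failure;
        }

        boolean isFailure() {
            return failure;
        }
    }

    /** Hypothetical stand-in for the exception thrown on creation. */
    static final class ToyValidationException extends RuntimeException {
        ToyValidationException(ToyError error) {
            super(error.name());
        }
    }

    /** Toy value holder that only models a port. */
    static final class ToyPort {
        private int value;
        private final List<ToyError> errors = new ArrayList<>();

        /** Creation: a failure aborts with an exception. */
        static ToyPort create(String input) {
            ToyPort port = new ToyPort();
            if (!port.parse(input)) {
                throw new ToyValidationException(ToyError.PORT_INVALID);
            }
            return port;
        }

        /** Setter: a failure is only recorded; the old value is kept (sketch assumption). */
        ToyPort set(String input) {
            parse(input);
            return this;
        }

        int value() {
            return value;
        }

        List<ToyError> validationErrors() {
            return errors;
        }

        /** Records every error; returns false only if one of them is a failure. */
        private boolean parse(String input) {
            String trimmed = input.trim();
            if (!trimmed.equals(input)) {
                errors.add(ToyError.SURROUNDING_WHITESPACE); // not a failure, keep going
            }
            int parsed;
            try {
                parsed = Integer.parseInt(trimmed);
            } catch (NumberFormatException e) {
                parsed = -1;
            }
            if (parsed < 0 || parsed > 65535) {
                errors.add(ToyError.PORT_INVALID);
                return false;
            }
            value = parsed;
            return true;
        }
    }
}
```

In this sketch, `ToyPort.create(" 8080 ")` succeeds and records one non-failure error, a later `set("99999")` records `PORT_INVALID` without throwing, and `ToyPort.create("abc")` throws.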
import org.htmlunit.url.Url;
import org.htmlunit.url.ValidationError;
// failure during creation → ValidationException thrown
Url url = Url.create("https://exa%23mple.org"); // throws ValidationException
// failure during a setter → recorded in validationErrors()
Url url2 = Url.create("http://www.myurl.com");
url2.host("1.2.3.4.5");
System.out.println(url2.validationErrors().get(0)); // IPV4_TOO_MANY_PARTS
public interface Url {
static boolean canParse(String url);
static boolean canParse(String url, String baseUrl);
static Url create();
static Url create(String input);
static Url create(String input, String baseUrl);
String hash();
Url hash(String value);
String host();
Url host(String value);
String hostname();
Url hostname(String value);
String href();
Url href(String value);
String origin();
String password();
Url password(String value);
String pathname();
Url pathname(String value);
String port();
Url port(String value);
String protocol();
Url protocol(String value);
String search();
Url search(String value);
UrlSearchParams searchParams();
String toJSON();
String username();
Url username(String value);
// not in the spec, but very useful to list validation errors when parsing
// the initial raw URL or when setting properties.
List<ValidationError> validationErrors();
}
public interface UrlSearchParams {
UrlSearchParams append(String name, String value);
Collection<String> delete(String name);
boolean delete(String name, String value);
UrlSearchParams entries(BiConsumer<String, String> consumer);
String get(String name);
Collection<String> getAll(String name);
boolean has(String name);
boolean has(String name, String value);
UrlSearchParams set(String name, String value);
int size();
UrlSearchParams sort();
}
public enum ValidationError {
// ... various enum constants ...
String description();
boolean isFailure();
}
The latest builds are available from our Jenkins CI build server.
If you use Maven please add:
<dependency>
<groupId>org.htmlunit</groupId>
<artifactId>htmlunit-url</artifactId>
<version>5.0.0-SNAPSHOT</version>
</dependency>
You have to add the Sonatype Central snapshot repository to the repositories section of your
pom.xml:
<repositories>
<repository>
<name>Central Portal Snapshots</name>
<id>central-portal-snapshots</id>
<url>https://central.sonatype.com/repository/maven-snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
You need:
- Java 17 or later
- A local Maven installation
Create a local clone of the repository and you are ready to start.
Open a command line window from the root folder of the project and call
mvn compile
mvn test
Pull Requests and all other Community Contributions are essential for open source software. Every contribution — from bug reports to feature requests, typos to full new features — is greatly appreciated.
This part is intended for committers who are packaging a release.
- Check all your files are checked in
- Execute these Maven commands to be sure all tests are passing and everything is up to date
mvn versions:display-plugin-updates
mvn versions:display-dependency-updates
mvn -U clean test
- Update the version number in pom.xml and README.md
- Commit the changes
- Build and deploy the artifacts
mvn -up clean deploy
- Go to Maven Central Portal and process the deploy
  - publish the package and wait until it is processed
- Create the version on GitHub
  - Login to GitHub and open the project https://github.com/HtmlUnit/htmlunit-url
  - Click Releases > Draft new release
  - Fill the tag and title fields with the release number (e.g. 4.0.0)
  - Append
    - htmlunit-url-4.x.x.jar
    - htmlunit-url-4.x.x.jar.asc
    - htmlunit-url-4.x.x.pom
    - htmlunit-url-4.x.x.pom.asc
    - htmlunit-url-4.x.x-javadoc.jar
    - htmlunit-url-4.x.x-javadoc.jar.asc
    - htmlunit-url-4.x.x-sources.jar
    - htmlunit-url-4.x.x-sources.jar.asc
  - And publish the release
- Update the version number in pom.xml to start the next snapshot development
- Update the HtmlUnit pom.xml to use the new release
- RBRi
- Stephane Bastian and all contributors to whatwg-url
This project is licensed under the Apache 2.0 License
Many thanks to all of you contributing to whatwg-url in the past.
Special thanks to:
JetBrains for providing IntelliJ IDEA under their
open source development license and