I want to extract the domain name from a URI.
For example, the input to the regular expression may take several forms;
in every case the result should be test.net.
Below is the code I implemented for this purpose:
val re = "([http[s]?://[w{3}\\.]?]+)(.*)".r
But I didn't get the expected result.
Below is my output:
val re(prefix, domain) = "https://www.test.net"
prefix: String = https://www.t
domain: String = est.net
What is the problem with my regular expression, and how can I fix it?
You are using a character class:
[http.?://(www.)?]
This means: match a single character that is any one of
h, t, t, p, ., ?, :, /, /, (, w, w, w, ., ), ?
It does not include an s, so it will not match https://.
It is not clear to me why you are using a character class here, nor why you are using duplicate characters in the class.
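The difference between a character class and a group can be checked directly. The following is a minimal sketch, not part of the original answer; the value names (oneChar, withS, withoutS, greedy) are made up for illustration, and the last pattern is the bracketed part of the regex from the question.

```scala
// Inside [...] every character is a member of a set, so "[http]" matches
// ONE character out of {h, t, p}, not the sequence "http".
val oneChar = "[http]".r.findFirstIn("https://www.test.net")

// A non-capturing group (?:...) matches a sequence; s? makes the s optional.
val scheme   = "(?:https?://)".r
val withS    = scheme.findFirstIn("https://www.test.net")
val withoutS = scheme.findFirstIn("http://test.net")

// The class from the question, repeated with +, greedily consumes every
// leading character that belongs to the set, including the first t of "test".
val greedy = "[http[s]?://[w{3}\\.]?]+".r.findPrefixOf("https://www.test.net")

println(oneChar)   // Some(h)
println(withS)     // Some(https://)
println(withoutS)  // Some(http://)
println(greedy)    // Some(https://www.t)
```

The last result is the same prefix that appears in the question's output, which is why the domain came back as est.net.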
Ideally, you shouldn't try to parse URIs yourself; someone else has already done the hard work. You could, for example, use the java.net.URI class:
import java.net.URI
val u1 = new URI("test.net")
u1.getHost
// res: String = null
val u2 = new URI("https://www.test.net")
u2.getHost
// res: String = www.test.net
val u3 = new URI("https://test.net")
u3.getHost
// res: String = test.net
val u4 = new URI("http://www.test.net")
u4.getHost
// res: String = www.test.net
val u5 = new URI("http://test.net")
u5.getHost
// res: String = test.net
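Unfortunately, as you can see, what you want to achieve does not actually comply with the official URI syntax: an input without a scheme, such as test.net, is parsed as a path, so getHost returns null, and a leading www. is a legitimate part of the host, so it is not stripped.
If you can fix that, then you can use java.net.URI. Otherwise, you will need to go back to your old solution and parse the URI yourself: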
val re = "(?>https?://)?(?>www\\.)?([^/?#]*)".r
val re(domain1) = "test.net"
//=> domain1: String = test.net
val re(domain2) = "https://www.test.net"
//=> domain2: String = test.net
val re(domain3) = "https://test.net"
//=> domain3: String = test.net
val re(domain4) = "http://www.test.net"
//=> domain4: String = test.net
val re(domain5) = "http://test.net"
//=> domain5: String = test.net
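If the inputs can be normalized first, java.net.URI remains usable. The following is a rough sketch under two assumptions that are not in the original answer: that an input without "://" is a bare host to which a scheme may be prepended, and that a leading "www." should always be dropped. The helper name domainOf is hypothetical.

```scala
import java.net.URI

// Hypothetical helper: prepend a scheme when the input has none, so that
// java.net.URI parses the text as an authority instead of as a path.
def domainOf(input: String): Option[String] = {
  val withScheme = if (input.contains("://")) input else "http://" + input
  // getHost returns null when no host can be parsed, hence Option(...).
  Option(new URI(withScheme).getHost).map(_.stripPrefix("www."))
}

println(domainOf("test.net"))              // Some(test.net)
println(domainOf("https://www.test.net"))  // Some(test.net)
println(domainOf("http://test.net"))       // Some(test.net)
```

Note that the URI constructor throws a URISyntaxException for input it cannot parse at all, so a caller handling untrusted text would likely want to wrap the call in scala.util.Try.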