Let me get straight to my problem.
public static final String EXAMPLE_TEST = "<span id=\"lblObject\"><a href=\"http://www.guideline.gov/content.aspx?id=15135\" alt=\"View object\">Manual medicine guidelines for musculoskeletal injuries.</a></span>";
//public static final String EXAMPLE_TEST ="<a href=\"http://www.guideline.gov/content.aspx?id=1112\"></a>";
public static void main(String[] args) {
Pattern pattern = Pattern.compile("<a href=\"http://www.guideline.gov/content.aspx?id=(\\d+)\"");
// in case you would like to ignore case sensitivity,
// you could use this statement:
// Pattern pattern = Pattern.compile("\\s+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(EXAMPLE_TEST);
// check all occurrences
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
System.out.println(matcher.group());
}
}
Something is wrong with the regex. The example string is just a dummy; the real input will be an HTML file containing many links of the form http://www.guideline.gov/content.aspx?id=some_number. I need to extract those links from the file. Can you help me find what's wrong with my regex?
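The ? in your pattern is a regex metacharacter (it makes the preceding x optional), so it has to be escaped as \\? to match a literal question mark. Use the program below.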
// Requires java.util.regex.Pattern and java.util.regex.Matcher.
String htmlText = "<span id=\"lblObject\"><a href=\"http://www.guideline.gov/content.aspx?id=15135\" alt=\"View object\">Manual medicine guidelines for musculoskeletal injuries.</a></span>";
// '?' and '.' are escaped so they match literally; group 1 captures the whole URL.
// The id is assumed to consist of digits only, as described in the question.
Pattern pattern = Pattern.compile("href=\"(http://www\\.guideline\\.gov/content\\.aspx\\?id=\\d+)\"");
Matcher matcher = pattern.matcher(htmlText);
while (matcher.find()) {
// group(1) already holds the URL, so a second regex pass is not needed.
String url = matcher.group(1);
System.out.println(url);
}
// output : http://www.guideline.gov/content.aspx?id=15135
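If you would rather not escape the metacharacters by hand, one option is to let Pattern.quote do it for the literal part of the URL. The following is only a sketch under a few assumptions: the class name GuidelineLinks and the method extractLinks are made up for illustration, the id is assumed to be digits only, and the href attribute is assumed to use double quotes as in your sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class GuidelineLinks {
    // Literal URL prefix from the question; Pattern.quote escapes '.' and '?' in it.
    private static final String PREFIX = "http://www.guideline.gov/content.aspx?id=";

    // Group 1 captures the full URL: the quoted prefix followed by one or more digits.
    private static final Pattern LINK =
            Pattern.compile("href=\"(" + Pattern.quote(PREFIX) + "\\d+)\"");

    // Returns every matching link in the order it appears in the HTML text.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = LINK.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<span id=\"lblObject\"><a href=\"http://www.guideline.gov/content.aspx?id=15135\""
                + " alt=\"View object\">Manual medicine guidelines for musculoskeletal injuries.</a></span>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

For a whole HTML file, read its contents into a String and pass that to extractLinks. Note that this sketch will miss links whose href uses single quotes or has extra whitespace around the = sign.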