Regex pattern to grab URL not working

ntstha

Let me get straight to my problem.

    public static final String EXAMPLE_TEST = "<span id=\"lblObject\"><a href=\"http://www.guideline.gov/content.aspx?id=15135\" alt=\"View object\">Manual medicine guidelines for musculoskeletal injuries.</a></span>";

    //public static final String EXAMPLE_TEST ="<a href=\"http://www.guideline.gov/content.aspx?id=1112\"></a>";
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("<a href=\"http://www.guideline.gov/content.aspx?id=(\\d+)\"");
        // in case you would like to ignore case sensitivity,
        // you could use this statement:
        // Pattern pattern = Pattern.compile("\\s+", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(EXAMPLE_TEST);
        // check all occurrences
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(matcher.group());
        }


    }

There is a problem with the regex: it does not match. The example string above is just a dummy. In practice I will have an HTML file containing many links of the form http://www.guideline.gov/content.aspx?id=some_number, and I need to grab those links from the file. Can you please help me find what is wrong with my regex?

Amit Kumar user3037405

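The `?` in your pattern is not escaped, so the regex engine treats it as a quantifier that makes the preceding `x` optional instead of matching a literal question mark. Escape it (`\\?` inside a Java string literal), as in the program below.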

    // Requires java.util.regex.Pattern and java.util.regex.Matcher.
    String htmlText = "<span id=\"lblObject\"><a href=\"http://www.guideline.gov/content.aspx?id=15135\" alt=\"View object\">Manual medicine guidelines for musculoskeletal injuries.</a></span>";
    // '?' and '.' are escaped so they match literally; group 1 captures only the URL.
    Pattern pattern = Pattern.compile("href=\"(http://www\\.guideline\\.gov/content\\.aspx\\?id=\\d+)\"");

    Matcher matcher = pattern.matcher(htmlText);
    while (matcher.find()) {
        // group(0) is the whole match including href="..."; group(1) is just the URL,
        // so no second regex is needed to pull it out.
        String url = matcher.group(1);
        System.out.println(url);
    }

// output : http://www.guideline.gov/content.aspx?id=15135
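Since the real input is an HTML file with many links, here is a minimal sketch of the same idea wrapped in a helper that collects every matching URL. The class name `GuidelineLinks`, the method `extractLinks`, and the sample HTML string are made up for illustration. It assumes the links always appear as `href="..."` with double quotes; for messier HTML a real parser would likely be more robust than a regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class GuidelineLinks {
    // Literal '.' and '?' are escaped; group 1 captures the whole URL.
    private static final Pattern LINK = Pattern.compile(
            "href=\"(http://www\\.guideline\\.gov/content\\.aspx\\?id=\\d+)\"");

    // Returns every guideline.gov content URL found in the given HTML text.
    static List<String> extractLinks(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample input: two matching links and one unrelated link.
        String html = "<a href=\"http://www.guideline.gov/content.aspx?id=15135\">A</a>"
                + "<a href=\"http://example.com/other\">B</a>"
                + "<a href=\"http://www.guideline.gov/content.aspx?id=1112\">C</a>";
        for (String url : extractLinks(html)) {
            System.out.println(url);
        }
    }
}
```

With the sample string above, this prints the two guideline.gov URLs and skips the example.com link. To run it on a file, read the file into a String first and pass that to `extractLinks`.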
