I want to filtrate website content that I have stored in a String with StringUtils.
Got some problems with the libraries.
Java-code:
import java.io.*;
import java.net.URL;
import java.net.URLConnection;
import org.apache.commons.lang3.*;
public class URLConnectionReader {
public static void main(String[] args) {
String siteContent = getUrlContents("https://www.tradegate.de/indizes.php?buchstabe=A");
System.out.println(siteContent);
inputHandler(siteContent);
}
public static void inputHandler(String input) {
String str = StringUtils.substringBetween(input, "<a id=", "</a>");
System.out.println(str);
}
private static String getUrlContents(String theUrl)
{
StringBuilder content = new StringBuilder();
try
{
URL url = new URL(theUrl);
URLConnection urlConnection = url.openConnection();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
String line;
while ((line = bufferedReader.readLine()) != null)
{
content.append(line + "\n");
}
bufferedReader.close();
}
catch(Exception e)
{
e.printStackTrace();
}
return content.toString();
}
}
The following steps were performed:
- Downloading commons-lang3-3.12.0-bin.zip
- Unpacking and saving the JAR-files to the eclipse directory
- Add the external libraries to the JAVA build path and apply changes
- Deleting and reassigning, restarting ECLIPSE
- autobuild function = on
Although it is referenced as an external library, this exception happens:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/lang3/StringUtils
at URLConnectionReader.inputHandler(URLConnectionReader.java:21)
at URLConnectionReader.main(URLConnectionReader.java:16)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.lang3.StringUtils
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 2 more
I searched some threads for troubleshooting, but i don´t get a clue at which point i made the mistake.

When parsing HTML, you should use a HTML-parser instead of trying manualy to manipulate using string methods or regex. Among many, Jsoup is one of the best known and in my opinion the most intuitive and easiest parser you can use when working with HTML using Java. Look at this examples to get familiar with the selector syntax or/and read the documentation of the Selector API
Get the jar or dependency from Maven central jsoup 1.15.3
Using Jsoup and assuming you are interessted in the content of the table body of that page from your question, something like below should give you a starting point:
Output: