I'm trying to write a program that downloads all the issues from a GitHub repository and stores their IDs and bodies in a .csv file. This is what I wrote; it partly works (it does download the issues):
FileWriter writer = new FileWriter("ISSUE-DOWNLOAD.csv");
writer.append("Id \t Body Text");
writer.append("\n");
for (GHIssue issue : repository.getIssues(stateOpen)) {
    String body = issue.getBody();
    if (body != null) {
        writer.append(issue.getNumber() + "\t");
        writer.append(body + "\t");
    }
    writer.append("\n");
}
I think the problem might be the Markdown in the GitHub issues, combined with Excel not handling characters that are not valid UTF-8. The CSV is indeed full of "???????". And if I try to read the file with Python (pandas), I get UTF-8 decoding errors:
df = pd.read_csv('ISSUE-DOWNLOAD.csv', sep='\t', na_values='n/a')
File "pandas_libs\parsers.pyx", line 542, in pandas._libs.parsers.TextReader.cinit
File "pandas_libs\parsers.pyx", line 642, in pandas._libs.parsers.TextReader._get_header
File "pandas_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas_libs\parsers.pyx", line 1917, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 73222: invalid start byte
Does anybody know how I can handle this? Thanks so much in advance!
Never write your own parser or writer for a non-trivial format. Use a library such as Apache Commons CSV:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-csv</artifactId>
<version>1.9.0</version>
</dependency>
Then create a CSVPrinter with the configuration you need (e.g. the default format):
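// Write UTF-8 explicitly: before Java 18, FileWriter(String) uses the platform default charset
try (Writer out = Files.newBufferedWriter(Paths.get("/tmp/uster_issues.csv"), StandardCharsets.UTF_8);
     CSVPrinter printer = new CSVPrinter(out, CSVFormat.DEFAULT)) {
    // Header row
    printer.printRecord("number", "title", "createdAt", "body");
    // One record per issue; the printer takes care of quoting the fields
    for (GHIssue issue : repository.getIssues(GHIssueState.ALL)) {
        printer.printRecord(issue.getNumber(), issue.getTitle(), issue.getCreatedAt(), issue.getBody());
    }
} catch (IOException ex) {
    ex.printStackTrace();
}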
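As an aside on why the hand-rolled writer in the question is fragile regardless of encoding: issue bodies are Markdown, so they routinely contain line breaks, quotes and delimiter characters. The following is only a rough sketch of the kind of quoting a CSV library applies for you. The class `QuotingSketch` and its helpers `quote` and `record` are hypothetical names made up for this illustration, not part of Commons CSV, and the exact rules a given library follows may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class QuotingSketch {
    // Hypothetical helper: wrap a field in double quotes only when it contains
    // the delimiter, a double quote, or a line break; embedded quotes are doubled.
    static String quote(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Join the quoted fields into one record (without the line terminator).
    static String record(List<String> fields) {
        return fields.stream().map(QuotingSketch::quote).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String body = "Steps:\n1. run \"build\"\n2. see error";
        // Naive concatenation: the line breaks in the body spread one issue over three lines.
        System.out.println("42\t" + body);
        // Quoted record: a CSV-aware reader can treat this as a single record.
        System.out.println(record(Arrays.asList("42", body)));
    }
}
```

This is for illustration only; in real code, let the library do the quoting.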
You can then open the CSV file written by CSVPrinter by setting the right CSV import options, for example in LibreOffice, and get the expected result: all fields (numbers, strings, dates, and long strings such as the body) are imported.
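On the decoding error itself: byte 0x93 is what a left curly double quote (U+201C) becomes in windows-1252, while in UTF-8 the same character is the three bytes E2 80 9C. It is an assumption here, not something the question confirms, that the file was written with windows-1252 as the platform default charset, which is what `new FileWriter(String)` uses before Java 18. The small sketch below (the class name `EncodingDemo` and helper `hex` are made up for this example) just shows the two encodings side by side using JDK classes only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encode the text with the given charset and render the bytes as hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String quote = "\u201C"; // left curly double quote, common in pasted issue text
        // Assumed platform default of the machine that wrote the file
        System.out.println(hex(quote, Charset.forName("windows-1252"))); // 93
        // What pandas expects by default when reading the file
        System.out.println(hex(quote, StandardCharsets.UTF_8)); // E2 80 9C
    }
}
```

If that assumption holds, writing the file with an explicit UTF-8 charset on the Java side should make it readable with the pandas default; alternatively, passing the matching `encoding` to `pd.read_csv` would read the existing file.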