Having problems while writing a .CSV file in Java?

Nicola C.

So, I'm trying to write a program that downloads all the issues from a GitHub repository and stores their IDs and bodies in a .CSV file. This is what I wrote, and it kind of works (it downloads the issues):

        FileWriter writer = new FileWriter("ISSUE-DOWNLOAD.csv");
        writer.append("Id \t Body Text");
        writer.append("\n");

        for (GHIssue issue : repository.getIssues(stateOpen)) {
            String body = issue.getBody(); 
            if( body!=null ) 
            {   
                writer.append(issue.getNumber() + "\t");
                writer.append(body + "\t");
            }
            writer.append("\n");
        }

The problem is that I'm not really creating a .CSV file where every row has an ID and a body in two columns; the body ends up scattered all over the file.

I think the problem might be the Markdown in GitHub issues, and the fact that Excel may not read non-UTF-8 characters. Indeed, the CSV is full of "???????". And if I try reading the file with Python, I get UTF-8 decoding errors:

    df = pd.read_csv('ISSUE-DOWNLOAD.csv', sep='\t', na_values='n/a')

    File "pandas\_libs\parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
    File "pandas\_libs\parsers.pyx", line 642, in pandas._libs.parsers.TextReader._get_header
    File "pandas\_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
    File "pandas\_libs\parsers.pyx", line 1917, in pandas._libs.parsers.raise_parser_error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 73222: invalid start byte

Does anybody know how I can handle this? Thanks so much in advance!

josejuan

Never write your own parser or writer for a non-trivial format. Use a library such as Apache Commons CSV:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.9.0</version>
</dependency>

Then create the printer with the configuration you need (e.g. the default):

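// Imports needed: java.nio.file.Files, java.nio.file.Paths, java.nio.charset.StandardCharsets
// Write UTF-8 explicitly; FileWriter falls back to the platform default charset on JVMs before Java 18.
try (CSVPrinter printer = new CSVPrinter(
        Files.newBufferedWriter(Paths.get("/tmp/uster_issues.csv"), StandardCharsets.UTF_8),
        CSVFormat.DEFAULT)) {
    printer.printRecord("number", "title", "createdAt", "body");
    for (GHIssue issue : repository.getIssues(GHIssueState.ALL))
        printer.printRecord(issue.getNumber(), issue.getTitle(), issue.getCreatedAt(), issue.getBody());
} catch (IOException ex) {
    ex.printStackTrace();
}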
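On the decoding error from the question: byte 0x93 is how Windows-1252 encodes the left curly quote (U+201C), so the file was most likely written in the platform default charset rather than UTF-8, which is what `FileWriter` did on JVMs before Java 18. A minimal stdlib-only sketch of the difference, assuming that diagnosis is right; the class name, helper names, and sample text are made up for illustration:

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class EncodingSketch {
    // Encode the text with the given charset and return the bytes as hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    // Write text with an explicit charset instead of the platform default.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        String quote = "\u201C"; // left curly quote, common in pasted issue text
        System.out.println(hex(quote, Charset.forName("windows-1252"))); // 93
        System.out.println(hex(quote, StandardCharsets.UTF_8));          // E2 80 9C

        // Hypothetical output file, only to show the explicit-charset writer.
        Path tmp = Files.createTempFile("issues", ".csv");
        writeUtf8(tmp, "1," + quote + "quoted body\u201D\n");
    }
}
```

A lone 0x93 is not a valid UTF-8 start byte, which matches the pandas message. Either write the file as UTF-8, or tell the reader which encoding the file really uses.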

You can then open the file with the right CSV import options, for example in LibreOffice, and get the expected result: all fields (numbers, strings, dates, and long strings like the body) are imported.
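As for why the hand-written writer puts the body "wherever": issue bodies are Markdown and routinely contain tabs, line breaks, and quotes, and a plain `append(body + "\t")` passes each of those straight into the file as a column or row break. A CSV library wraps such fields in quotes and doubles any embedded quotes. A rough sketch of that rule as RFC 4180 describes it, for illustration only; `quoteField` is a hypothetical helper, not part of Commons CSV, and not a replacement for the library:

```java
class CsvQuoteSketch {
    // Wrap the field in double quotes when it contains the delimiter,
    // a double quote, or a line break, and double any embedded quotes.
    static String quoteField(String field, char delimiter) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // Made-up issue body with line breaks and quotes.
        String body = "Steps:\n1. run \"build\"\n2. see error";
        // Written unquoted, this body would spread over three rows.
        System.out.println(42 + "," + quoteField(body, ','));
    }
}
```

With quoting in place, a reader that follows the same rule treats the whole body as one field even though it spans several physical lines.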
