Posted 18 Nov 2021

Creating a OneNote Markdown Converter

In this article of the series, we’ll use the Graph API client to read OneNote documents through a microservice that converts them to Markdown.
Here we’ll build a Spring Boot web app and microservice using the Graph API client to query for a list of OneNote notebooks, display an HTML preview of the notebook pages, and convert and download the pages as Markdown.

This is a sponsored article. Articles such as these are intended to provide you with information on products and services that we consider useful and of value to developers.

OneNote is a great tool for creating notes, either through a desktop application or via the online Office platform. These notes can be exported as Word or PDF documents, but many enterprises require content in other formats, like Markdown.

Let’s build on the work we did in the previous article. We’ll build a Spring Boot web app and microservice using the Graph API client to query for a list of OneNote notebooks, display an HTML preview of the notebook pages, and convert and download the pages as Markdown. We’ll see how teams can automate the process of converting content without needing to first download files as Word documents and convert them into secondary formats.

The Sample Application

You can find the source code for this sample application on GitHub. The MSALOneNoteConverter repo contains the frontend web app, and the MSALOneNoteBackend repo contains the microservice.

Build the Back-End Microservice Application

We’ll start with the back-end microservice. This is responsible for returning the list of notebooks to the front-end and providing the ability to convert notebook pages from HTML to Markdown.

Bootstrap the Spring Project

We’ll generate the initial application template using Spring Initializr to create a Java Maven project, which generates a JAR file using the latest non-snapshot version of Spring against Java 11.

The microservice requires the following dependencies:

Image 1

Expose the Graph API Client

The first step is to expose the Graph API client — configured with the authentication provider we created in the previous article — as a bean. We’ll do this in the GraphClientConfiguration class in the following package:

Java
package com.matthewcasperson.onenotebackend.configuration;

We inject an instance of the AADAuthenticationProperties class. This provides access to the values in our Spring configuration file, including the client ID, client secret, and tenant ID.

Java
@Autowired
AADAuthenticationProperties azureAd;

We then create an instance of the Graph API client using the OboAuthenticationProvider created in the previous article. Note that we’re requesting a token with a scope of https://graph.microsoft.com/Notes.Read.All, granting us read access to OneNote notes.

Java
  @Bean
  public GraphServiceClient<Request> getClient() {
    return GraphServiceClient.builder()
        .authenticationProvider(new OboAuthenticationProvider(
            Set.of("https://graph.microsoft.com/Notes.Read.All"),
            azureAd.getTenantId(),
            azureAd.getClientId(),
            azureAd.getClientSecret()))
        .buildClient();
  }
}

Configure Spring Security

We configure our microservice to require a valid token for all requests through the AuthSecurityConfig class:

Java
package com.matthewcasperson.onenotebackend.configuration;
 
import com.azure.spring.aad.webapi.AADResourceServerWebSecurityConfigurerAdapter;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
 
@EnableWebSecurity
public class AuthSecurityConfig extends AADResourceServerWebSecurityConfigurerAdapter {
 
  @Override
  protected void configure(final HttpSecurity http) throws Exception {
    super.configure(http);
    // @formatter:off
    http
        .authorizeRequests()
        .anyRequest()
        .authenticated();
    // @formatter:on
  }
}

Add a Conversion Library

Our microservice will use Pandoc to convert HTML to Markdown. Pandoc is an open-source document converter that we’ll invoke through a community Java wrapper library.

We add the following dependency to the Maven pom.xml file to include the Pandoc wrapper in our project.

XML
<dependency>
<groupId>org.bitbucket.leito</groupId>
<artifactId>universal-document-converter</artifactId>
<version>1.1.0</version>
</dependency>

Note that the wrapper simply calls the Pandoc executable, so Pandoc needs to be installed and available on the operating system path.
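Because a missing Pandoc install only shows up when a conversion fails, it can help to check for the executable when the service starts. The following is a minimal sketch of such a check using only the JDK. The ExecutableLocator class and its findOnPath method are hypothetical names that are not part of the sample repositories, and on Windows we would need to pass pandoc.exe rather than pandoc.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper, not part of the sample repositories.
final class ExecutableLocator {

  // Looks in each directory listed in the PATH environment variable for a
  // regular, executable file with the given name.
  static Optional<Path> findOnPath(final String executable) {
    final String pathVariable = System.getenv("PATH");
    if (pathVariable == null) {
      return Optional.empty();
    }
    return Arrays.stream(pathVariable.split(File.pathSeparator))
        .filter(dir -> !dir.isEmpty())
        .map(dir -> resolve(dir, executable))
        .filter(candidate -> candidate != null
            && Files.isRegularFile(candidate)
            && Files.isExecutable(candidate))
        .findFirst();
  }

  // Returns null for PATH entries that are not valid paths on this platform.
  private static Path resolve(final String dir, final String executable) {
    try {
      return Paths.get(dir, executable);
    } catch (final InvalidPathException e) {
      return null;
    }
  }

  public static void main(final String[] args) {
    System.out.println(findOnPath("pandoc")
        .map(path -> "Found Pandoc at " + path)
        .orElse("Pandoc was not found on the PATH"));
  }
}
```

A check like this could run once at startup and log a warning, rather than letting the first conversion request fail.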

Add the Spring REST Controller

The bulk of our microservice is found in the REST controller handling requests from the front-end web application. This controller is found in the OneNoteController class, in the following package:

Java
package com.matthewcasperson.onenotebackend.controllers;

This class is doing a lot of work, so let’s examine it piece by piece.

We start by injecting an instance of the Graph API client.

Java
@RestController
public class OneNoteController {
 
  @Autowired
  GraphServiceClient<Request> client;

Our front-end web application needs a list of the notebooks created by the currently logged-in user. This is provided by the getNotes method.

Java
@GetMapping("/notes")
public List<String> getNotes() {
  return getNotebooks()
      .stream()
      .map(n -> n.displayName)
      .collect(Collectors.toList());
}

To keep this sample application simple, we’ll provide the ability to view and convert the first page of the first section in any selected notebook. The getNoteHtml method provides the page HTML.

Java
@GetMapping("/notes/{name}/html")
public String getNoteHtml(@PathVariable("name") final String name) {
  return getPageHTML(name);
}

In addition to the page HTML, our microservice allows us to retrieve the page Markdown. The Markdown content is returned by the getNoteMarkdown method.

Java
@GetMapping("/notes/{name}/markdown")
public String getNoteMarkdown(@PathVariable("name") final String name) {
  final String content = getPageHTML(name);
  return convertContent(content);
}

We have several private methods to support the public endpoint methods. These private methods are responsible for querying the Graph API and performing the content conversion.

The getPageHTML method returns the first page from the first section of the named notebook.

One thing to note while using the Graph API client is that many methods can return null values. Fortunately, the client methods that can return null have been annotated with @Nullable. This provides IDEs with the information required to warn us when we might be referencing possible null values.

We make liberal use of the Optional class to avoid littering our code with null checks:

Java
private String getPageHTML(final String name) {
  return getNotebooks()
      .stream()
      // find the notebook that matches the supplied name
      .filter(n -> name.equals(n.displayName))
      // we only expect one notebook to match
      .findFirst()
      // get the notebook sections
      .map(notebook -> notebook.sections)
      // get the first page from the first section
      .map(sections -> getSectionPages(sections.getCurrentPage().get(0).id).get(0))
      // get the page id
      .map(page -> page.id)
      // get the content of the page
      .flatMap(this::getPageContent)
      // if any of the operations above returned null, return an error message
      .orElse("Could not load page content");
}
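One caveat with the chain above: calling get(0) on an empty list throws an IndexOutOfBoundsException instead of falling through to orElse, so a notebook with no sections or pages would not produce the fallback message. The sketch below shows one way to keep the whole lookup inside Optional. The Notebook and Section classes here are simplified, hypothetical stand-ins for the Graph SDK models, and first is a helper we introduce for illustration.

```java
import java.util.List;
import java.util.Optional;

final class OptionalChainSketch {

  // Hypothetical stand-in for a notebook section holding page ids.
  static final class Section {
    final List<String> pageIds;

    Section(final List<String> pageIds) {
      this.pageIds = pageIds;
    }
  }

  // Hypothetical stand-in for a notebook; sections may be null, as SDK fields can be.
  static final class Notebook {
    final String displayName;
    final List<Section> sections;

    Notebook(final String displayName, final List<Section> sections) {
      this.displayName = displayName;
      this.sections = sections;
    }
  }

  // Returns the first element, or an empty Optional for a null or empty list.
  static <T> Optional<T> first(final List<T> list) {
    return list == null || list.isEmpty()
        ? Optional.empty()
        : Optional.ofNullable(list.get(0));
  }

  // Finds the first page id of the first section of the named notebook.
  static String firstPageId(final List<Notebook> notebooks, final String name) {
    return notebooks.stream()
        .filter(n -> name.equals(n.displayName))
        .findFirst()
        .flatMap(n -> first(n.sections))
        .flatMap(s -> first(s.pageIds))
        .orElse("Could not load page content");
  }
}
```

Using flatMap with a helper that returns Optional means that missing notebooks, null fields, and empty lists all end up at the same orElse fallback.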

The conversion of HTML to Markdown is performed in the convertContent method. We use the Pandoc wrapper exposed by the DocumentConverter class to convert the original page HTML into Markdown.

Note that DocumentConverter constructs the arguments to be passed to the external Pandoc application, but doesn’t include the Pandoc app itself. This means we need to install Pandoc alongside our microservice. It also means we pass data through external files instead of directly passing strings.

The convertContent method creates two temporary files: the first containing the input HTML, and the second for the output Markdown. It then passes those files to Pandoc, reads the content of the output file, and cleans everything up.

To convert notes to different formats, this method could be edited to specify different Pandoc arguments, or swapped out completely to replace Pandoc as a conversion tool:

Java
private String convertContent(final String html) {
  Path input = null;
  Path output = null;

  try {
    input = Files.createTempFile(null, ".html");
    output = Files.createTempFile(null, ".md");

    // Pandoc expects UTF-8 input, so don't rely on the platform default charset
    Files.write(input, html.getBytes(StandardCharsets.UTF_8));

    new DocumentConverter()
        .fromFile(input.toFile(), InputFormat.HTML)
        .toFile(output.toFile(), "markdown_strict-raw_html")
        .convert();

    return Files.readString(output);
  } catch (final IOException e) {
    // silently ignore
  } finally {
    try {
      if (input != null) {
        Files.delete(input);
      }
      if (output != null) {
        Files.delete(output);
      }
    } catch (final Exception ex) {
      // silently ignore
    }
  }

  return "There was an error converting the file";
}
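As an example of swapping out the wrapper, the sketch below calls the Pandoc executable directly and passes the HTML through standard input instead of temporary files. This is an alternative to the approach above, not the code in the sample repository. The PandocProcessSketch class and its method names are hypothetical, and the sketch assumes that pandoc is on the path and that the documents are page-sized, so writing all of the input before reading the output does not stall.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical alternative to the wrapper library, for illustration only.
final class PandocProcessSketch {

  // Builds the Pandoc command line. With no input or output files given,
  // Pandoc reads from standard input and writes to standard output.
  static List<String> buildCommand(final String from, final String to) {
    return List.of("pandoc", "--from", from, "--to", to);
  }

  // Pipes the HTML to Pandoc and returns whatever it writes to standard output.
  static String convert(final String html, final String from, final String to)
      throws IOException, InterruptedException {
    final Process process = new ProcessBuilder(buildCommand(from, to))
        // discard diagnostics so an unread error stream cannot block Pandoc
        .redirectError(ProcessBuilder.Redirect.DISCARD)
        .start();

    // closing standard input tells Pandoc that the document is complete
    try (OutputStream stdin = process.getOutputStream()) {
      stdin.write(html.getBytes(StandardCharsets.UTF_8));
    }

    final String converted =
        new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);

    final int exitCode = process.waitFor();
    if (exitCode != 0) {
      throw new IOException("Pandoc exited with code " + exitCode);
    }
    return converted;
  }
}
```

A call such as convert(html, "html", "markdown_strict-raw_html") would request the same output format as the method above. For large documents, reading the output on a separate thread would be the safer design.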

The next set of methods is responsible for calling the Graph API.

The getNotebooks method retrieves a list of notebooks created by the currently logged-in user.

One thing to be aware of when interacting with the Graph API is that it typically won’t return child resources when requesting a parent resource. However, it’s possible to override this behavior with the $expand query parameter. Here, we request a list of notebook resources and expand their sections:

Java
private List<Notebook> getNotebooks() {
  return Optional.ofNullable(client
          .me()
          .onenote()
          .notebooks()
          .buildRequest(new QueryOption("$expand", "sections"))
          .get())
      .map(BaseCollectionPage::getCurrentPage)
      .orElseGet(List::of);
}
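For orientation, the query option ends up as a parameter on the underlying REST request. The sketch below builds the URL we would expect for the Graph v1.0 notebooks endpoint. The GraphExpandUrlSketch class is hypothetical and only concatenates strings, so treat it as an illustration of the URL shape rather than a description of what the SDK does internally.

```java
// Hypothetical helper that shows the shape of the raw Graph request URL.
final class GraphExpandUrlSketch {

  static final String GRAPH_ROOT = "https://graph.microsoft.com/v1.0";

  // Builds the notebooks URL, adding $expand only when relationships are requested.
  static String notebooksUrl(final String... expand) {
    final String base = GRAPH_ROOT + "/me/onenote/notebooks";
    return expand.length == 0
        ? base
        : base + "?$expand=" + String.join(",", expand);
  }

  public static void main(final String[] args) {
    System.out.println(notebooksUrl("sections"));
  }
}
```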

Because sections don’t support the expansion of child pages, we use the getSectionPages method to make a second request to return the list of pages associated with each section.

Java
private List<OnenotePage> getSectionPages(final String id) {
  return Optional.ofNullable(client
          .me()
          .onenote()
          .sections(id)
          .pages()
          .buildRequest()
          .get())
      .map(OnenotePageCollectionPage::getCurrentPage)
      .orElseGet(List::of);
}

The OnenotePage class doesn’t include the content of the page. To access the content, we need to make one more API request:

Java
private Optional<String> getPageContent(final String id) {
    return Optional.ofNullable(client
        .me()
        .onenote()
        .pages(id)
        .content()
        .buildRequest()
        .get())
        .map(s -> toString(s, null));
}

The toString method converts a stream to a string and captures any exceptions, allowing us to perform this conversion in a lambda. Checked exceptions don’t play well with lambdas passed to classes like Optional.

Java
  private String toString(final InputStream stream, final String defaultValue) {
    try (stream) {
      return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    } catch (final IOException e) {
      return defaultValue;
    }
  }
}
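If several lambdas need this treatment, the try/catch can be factored into a small adapter. The sketch below is one possible generalization of the toString approach. The IoFunction interface and the lift method are hypothetical names introduced here, and, like toString, the adapter swallows the exception, which may not suit code that needs to report the cause.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.function.Function;

final class CheckedLambdaSketch {

  // Like java.util.function.Function, but allowed to throw IOException.
  @FunctionalInterface
  interface IoFunction<T, R> {
    R apply(T input) throws IOException;
  }

  // Adapts an IoFunction so it can be passed to Optional.flatMap.
  // An IOException or a null result becomes an empty Optional.
  static <T, R> Function<T, Optional<R>> lift(final IoFunction<T, R> function) {
    return input -> {
      try {
        return Optional.ofNullable(function.apply(input));
      } catch (final IOException e) {
        return Optional.empty();
      }
    };
  }

  // Reads the whole stream as UTF-8 and closes it.
  static String readUtf8(final InputStream stream) throws IOException {
    try (stream) {
      return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  public static void main(final String[] args) {
    final Function<InputStream, Optional<String>> reader = lift(CheckedLambdaSketch::readUtf8);
    final InputStream body =
        new ByteArrayInputStream("<p>Hello</p>".getBytes(StandardCharsets.UTF_8));
    System.out.println(Optional.of(body)
        .flatMap(reader)
        .orElse("Could not load page content"));
  }
}
```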

Build the Front-End Web Application

The frontend web application displays the list of notebooks created by the currently logged-in user, previews the first page of the first section of a selected notebook, and allows the page to be downloaded as a Markdown file.

The MSALOneNoteConverter repo contains the code for this section.

Bootstrap the Spring Project

Just as we did for the back-end, we’ll generate the initial application template using Spring Initializr to create a Java Maven project, which generates a JAR file using the latest non-snapshot version of Spring against Java 11.

The web application requires the following dependencies:

Image 2

Configure Spring Security

Like the microservice, our web application is configured to require authenticated access to all pages through the AuthSecurityConfig class.

Java
package com.matthewcasperson.onenote.configuration;
 
...
// imports
...

@EnableWebSecurity
@EnableGlobalMethodSecurity(prePostEnabled = true)
public class AuthSecurityConfig extends AADWebSecurityConfigurerAdapter {
 
    @Override
    protected void configure(final HttpSecurity http) throws Exception {
        super.configure(http);
        // @formatter:off
        http
            .authorizeRequests()
                .anyRequest().authenticated()
            .and()
                .csrf()
                .disable();
        // @formatter:on
    }
}

Build a WebClient

The front-end application needs a WebClient to call the microservice. WebClient is Spring’s newer, non-blocking HTTP client, and is preferred over the older RestTemplate.

To call the microservice, each request must have an associated access token. The WebClientConfig class configures an instance of WebClient to include a token sourced from an OAuth2AuthorizedClient:

Java
package com.matthewcasperson.onenote.configuration;

...
// imports
...

@Configuration
public class WebClientConfig {
  @Bean
  public OAuth2AuthorizedClientManager authorizedClientManager(
      final ClientRegistrationRepository clientRegistrationRepository,
      final OAuth2AuthorizedClientRepository authorizedClientRepository) {
 
    final OAuth2AuthorizedClientProvider authorizedClientProvider =
        OAuth2AuthorizedClientProviderBuilder.builder()
            .clientCredentials()
            .build();
 
    final DefaultOAuth2AuthorizedClientManager authorizedClientManager =
        new DefaultOAuth2AuthorizedClientManager(
            clientRegistrationRepository, authorizedClientRepository);
    authorizedClientManager.setAuthorizedClientProvider(authorizedClientProvider);
 
    return authorizedClientManager;
  }
 
  @Bean
  public static WebClient webClient(final OAuth2AuthorizedClientManager oAuth2AuthorizedClientManager) {
    final ServletOAuth2AuthorizedClientExchangeFilterFunction function =
        new ServletOAuth2AuthorizedClientExchangeFilterFunction(oAuth2AuthorizedClientManager);
    return WebClient.builder()
        .apply(function.oauth2Configuration())
        .build();
  }
}

Build the MVC Controller

The MVC controller defined in the OneNoteController class exposes the endpoints that users access via their web browsers. It lives in the following package:

Java
package com.matthewcasperson.onenote.controllers;

Let’s break down and examine this code.

We inject an instance of the WebClient created by the WebClientConfig class.

Java
@Controller
public class OneNoteController {
 
  @Autowired
  WebClient webClient;

The getIndex method receives an OAuth2AuthorizedClient configured to access the microservice. This client is passed to the WebClient to retrieve a list of the notebooks created by the currently logged-in user. The resulting list is saved as the model attribute notes:

Java
@GetMapping("/")
public ModelAndView getIndex(
    @RegisteredOAuth2AuthorizedClient("api") final OAuth2AuthorizedClient client) {
  final List notes = webClient
      .get()
      .uri("http://localhost:8081/notes/")
      .attributes(oauth2AuthorizedClient(client))
      .retrieve()
      .bodyToMono(List.class)
      .block();

  final ModelAndView mav = new ModelAndView("index");
  mav.addObject("notes", notes);
  return mav;
}

The getPageView method captures two paths that allow the selected notebook to be previewed in HTML form and downloaded as Markdown.

The iframesrc model attribute is a path to an endpoint that returns the notebook page as HTML. The markdownsrc model attribute is a path to an endpoint that provides the page as a downloadable Markdown file:

Java
@GetMapping("/notes/{name}")
public ModelAndView getPageView(@PathVariable("name") final String name) {
  final ModelAndView mav = new ModelAndView("pageview");
  mav.addObject("iframesrc", "/notes/" + name + "/html");
  mav.addObject("markdownsrc", "/notes/" + name + "/markdown");
  return mav;
}
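Notebook names can contain spaces and other characters that are not safe in a URL path, so a more defensive version would encode the name before concatenating it. The sketch below shows one way to do that with the JDK alone. PathSegmentSketch and its methods are hypothetical names, and the same concern applies to the link built in the index.html script.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building links that contain a notebook name.
final class PathSegmentSketch {

  // URLEncoder performs form encoding, which turns spaces into "+".
  // A literal "+" in the input is already encoded as "%2B", so replacing
  // the remaining "+" characters with "%20" is safe for a path segment.
  static String encodePathSegment(final String segment) {
    return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
  }

  // Builds the path of the HTML preview endpoint for a notebook.
  static String htmlPath(final String notebookName) {
    return "/notes/" + encodePathSegment(notebookName) + "/html";
  }

  public static void main(final String[] args) {
    System.out.println(htmlPath("My Notes"));
  }
}
```

Names containing a slash are encoded as %2F, which some servlet containers and firewalls reject by default, so those may need extra configuration or a different lookup key such as the notebook ID.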

To preview the notebook page’s HTML, the getNoteHtml method returns the raw HTML, along with the X-Frame-Options and Content-Security-Policy headers that allow this endpoint to be viewed in an HTML iframe element.

Java
@GetMapping(value = "/notes/{name}/html", produces = MediaType.TEXT_HTML_VALUE)
@ResponseBody
public String getNoteHtml(
    @RegisteredOAuth2AuthorizedClient("api") final OAuth2AuthorizedClient client,
    @PathVariable("name") final String name,
    final HttpServletResponse response) {
  response.setHeader("X-Frame-Options", "SAMEORIGIN");
  response.setHeader("Content-Security-Policy", "frame-ancestors 'self'");
  return webClient
      .get()
      .uri("http://localhost:8081/notes/" + name + "/html")
      .attributes(oauth2AuthorizedClient(client))
      .retrieve()
      .bodyToMono(String.class)
      .block();
}

The getNoteMarkdown method provides the page as a downloadable Markdown file. By returning a ResponseEntity object and defining the Content-Type and Content-Disposition headers, we instruct the browser to download the returned content rather than display it in the browser.

Java
  @GetMapping("/notes/{name}/markdown")
  public ResponseEntity<byte[]> getNoteMarkdown(
      @RegisteredOAuth2AuthorizedClient("api") final OAuth2AuthorizedClient client,
      @PathVariable("name") final String name) {
    final String markdown = webClient
        .get()
        .uri("http://localhost:8081/notes/" + name + "/markdown")
        .attributes(oauth2AuthorizedClient(client))
        .retrieve()
        .bodyToMono(String.class)
        .block();
 
    final HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.TEXT_MARKDOWN);
    final String filename = "page.md";
    headers.setContentDispositionFormData(filename, filename);
    return new ResponseEntity<>(markdown.getBytes(), headers, HttpStatus.OK);
  }
}
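The download always uses the file name page.md. If we wanted the file to carry the notebook name instead, we could derive a conservative file name and build the header value ourselves. The sketch below assumes that the attachment disposition type is what we want. AttachmentHeaderSketch and its methods are hypothetical, and the character whitelist is a deliberately strict choice rather than a requirement.

```java
// Hypothetical helper for naming the downloaded Markdown file.
final class AttachmentHeaderSketch {

  // Replaces every run of characters outside a small whitelist with "_"
  // and appends the Markdown extension. Empty names fall back to "page".
  static String safeFilename(final String notebookName) {
    final String cleaned = notebookName.replaceAll("[^A-Za-z0-9._-]+", "_");
    return (cleaned.isEmpty() ? "page" : cleaned) + ".md";
  }

  // Builds a Content-Disposition value asking the browser to save the response.
  static String contentDisposition(final String filename) {
    return "attachment; filename=\"" + filename + "\"";
  }

  public static void main(final String[] args) {
    System.out.println(contentDisposition(safeFilename("My Notes")));
  }
}
```

The resulting string could be set with headers.set("Content-Disposition", value) in place of the setContentDispositionFormData call.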

Create the Thymeleaf Templates

The index.html page displays the list of notebooks, and provides a button to redirect the browser to the next page:

HTML
<html>
<head>
  <link rel="stylesheet" href="/style.css">
  <script>
    function handleClick() {
      if (note.selectedIndex !== -1) {
        location.href='/notes/' + note.options[note.selectedIndex].value;
      } else {
        alert("Please select a notebook");
      }
    }
  </script>
</head>
<body>
<div class="container">
  <div class="header">
    <div class="title"><a href="/">ONENOTE CONVERTER</a></div>
  </div>
  <div class="main">
    <form class="formContainer">
      <div class="formRow">
        <select style="display: block" size="5" id="note">
          <option th:each="note: ${notes}" th:value="${note}" th:text="${note}">
          </option>
        </select>
      </div>
      <div class="formRow">
        <input type="button" value="View Note" onclick="handleClick();">
      </div>
    </form>
  </div>
</div>
</body>
</html>

Image 3

The pageview.html page displays the page’s HTML in an iframe and provides a form button to download the Markdown file.

HTML
<html>
<head>
  <link rel="stylesheet" href="/style.css">
</head>
<body>
<div class="container">
  <div class="header">
    <div class="title"><a href="/">ONENOTE CONVERTER</a></div>
  </div>
  <div class="main">
    <!-- HTML forms cannot be nested, so the outer container is a div -->
    <div class="formContainer">
      <div class="formRow">
        <iframe style="width: 100%; height: 400px" th:src="${iframesrc}"></iframe>
      </div>
      <div class="formRow">
        <form style="margin-top: 10px" th:action="${markdownsrc}">
          <input type="submit" value="Download Markdown" />
        </form>
      </div>
    </div>
  </div>
</div>
</body>
</html>

Image 4

Conclusion

By taking advantage of the Graph API client, we can interact with the Microsoft Graph API using a fluent and type-safe interface. It’s far more convenient and reliable than performing raw HTTP requests and processing the returned JSON.

In this article, we used the Graph API client to retrieve OneNote notebook pages, preview the original page HTML, and provide the ability to download a Markdown version of the page. Though this was a relatively simple example, it demonstrates how Spring Boot applications can seamlessly interact with Microsoft Office documents on behalf of an end user, by using the Microsoft Graph API and Azure AD.

In the final article of this series, we’ll see how to integrate Spring with Microsoft Teams to create a simple incident management bot.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Matthew Casperson
Technical Writer
Australia

Building Rich Microsoft Graph Apps with MSAL and Graph SDK for Java