Using Python and OCaml in the same Jupyter notebook (janestreet.com)
165 points by l-m-z on Dec 20, 2019 | 34 comments


The Jupyter notebook interface is, IMO, not really suited for interactive development. Notebooks force code to be broken up into cells to be evaluated. The code must be written in a specific linear way to cater to the notebook format. Also, the gap between code and notebooks is too big, and it's clunky to move between them: you have to resort to copy-pasting back and forth. It's always frustrating when I try to do development in an interactive way and find out that hooking into Jupyter notebook (whether the language is Python, Node.js, etc.) is the best way.

I think there should be better integration between editors and REPLs, something like Common Lisp and Emacs SLIME. SLIME queries a server running inside the Common Lisp process to incrementally compile and evaluate code from the editor (like LSP, but with evaluation queries instead of autocomplete queries). I hope LSP gains the ability to evaluate code for languages with REPLs; it would be awesome if it allowed interactive development across multiple editors and multiple languages (the way LSP already facilitates autocomplete).
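
A rough sketch of what the server side of such an evaluation query could look like, in Python rather than Common Lisp. `EvalSession` and its `evaluate` method are made-up names for illustration; they are not part of SLIME or LSP, and a real server would also need a transport and error reporting:

```python
import contextlib
import io


class EvalSession:
    """Hypothetical stand-in for a SLIME-style evaluation server."""

    def __init__(self):
        # One namespace persists across requests, like a running REPL image.
        self.namespace = {}

    def evaluate(self, source):
        """Run a snippet sent by the editor; return (value, printed output)."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            try:
                # Expressions yield a value the editor could display inline.
                code = compile(source, "<editor>", "eval")
            except SyntaxError:
                # Statements (def, import, assignment) only update the namespace.
                exec(compile(source, "<editor>", "exec"), self.namespace)
                value = None
            else:
                value = eval(code, self.namespace)
        return value, buffer.getvalue()


session = EvalSession()
session.evaluate("def area(r):\n    return r * r")
first = session.evaluate("area(2)")

# Re-sending the definition replaces it in the live session, no restart needed.
session.evaluate("def area(r):\n    return 3 * r * r")
second = session.evaluate("area(2)")
```

The point is only that state lives in the long-running session, so the editor can send one definition at a time.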


> Also, the gap between code and notebooks is too big, and it's clunky to move between them: you have to resort to copy-pasting back and forth.

We have solved this problem by providing 2-way sync between notebooks and libraries, using a system called nbdev:

https://www.fast.ai/2019/12/02/nbdev/


Jeremy Howard is a genius, do you have any release date for fastai v2?


It's feature complete now, and we're writing the paper and book about it. So it's ready for early adopters to start prototyping with. Lots of folks are already doing so and discussing it here: https://forums.fast.ai/c/fastai-users/fastai-v2

Not sure when the final version will be released. The main thing we need to do is clean up the docs.


Another complaint I have with jupyter notebooks is that they don't play nice with git. For this (and other reasons), I much prefer Emacs org-mode, or even running Python in Rmarkdown+Rstudio+reticulate.
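
For those who stay with .ipynb files, one common way to soften the git problem is to strip outputs before committing. A minimal standard-library sketch, assuming the nbformat 4 JSON layout; `strip_outputs` is a hypothetical helper, not the API of any existing tool:

```python
import json


def strip_outputs(notebook_json):
    """Return notebook JSON text with outputs and execution counts removed."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            # Outputs and counters change on every run and dominate the diffs.
            cell["outputs"] = []
            cell["execution_count"] = None
    # Stable key order and indentation keep later diffs small.
    return json.dumps(notebook, indent=1, sort_keys=True) + "\n"


# A tiny in-memory notebook, made up for the example.
example = json.dumps({
    "nbformat": 4, "nbformat_minor": 2, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["1 + 1"],
         "execution_count": 7,
         "outputs": [{"output_type": "execute_result", "execution_count": 7,
                      "data": {"text/plain": ["2"]}, "metadata": {}}]},
    ],
})
cleaned = json.loads(strip_outputs(example))
```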


I second Emacs org-mode. It is extremely powerful, though I don't know if it addresses GP's concerns. You could just execute the compiler etc. against the actual program from within the code block and then use "#+CALL" in other sections to avoid the copy-pasting, I guess. I don't do very advanced stuff so I haven't run into problems with just using code in the blocks.

The main problem with Emacs org-mode is that users probably need to use Emacs, and lots of people have their own editor preferences; while some editors have org-mode-alikes, they aren't as fully capable as Emacs org-mode. That is what Jupyter addresses by making the interface more accessible and editable for everyone.

That said, I think one of the most important things the linear notebook flow encourages is a focus on reproducibility, whether it's Jupyter or org-mode. I feel like overcomplicating the dev environment would make that much harder and would at least need to be planned for.


You can also extract the code via tangling so you can run it outside emacs.


Even more, you can compose a Jupyter notebook from an org file.


I didn't know that! Cool!


Jupytext solves this for me completely. It is a Jupyter extension that can be turned on (globally or per notebook, with additional configuration options) and automatically syncs the notebook to a file that can be checked into git and has nice diffs. It is a two-way binding, so edits to the file show up in Jupyter, and if you check out a file in that format it can easily be converted into a Jupyter notebook on your side as well.


For Python, Visual Studio Code has some of this integration. It still runs a Jupyter kernel, but will run "cells" directly from your source code file (where the cells are delimited by "# %%" comments) and output to a separate pane with a Jupyter-driven REPL in it.

I accidentally discovered this after exporting a jupyter notebook to python and opening it in vscode.
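
For anyone who hasn't seen the format, here is roughly what such a file looks like. The variable names and data are made up; the only convention taken from the comment above is the "# %%" marker, which Python itself treats as an ordinary comment, so the file still runs as a normal script:

```python
# %%
# First cell: imports and input data.
import statistics

samples = [2.0, 4.0, 4.0, 5.0]

# %%
# Second cell: can be re-run on its own after editing, reusing `samples`.
mean = statistics.mean(samples)

# %%
# Third cell: depends on state left behind by the earlier cells.
spread = statistics.pstdev(samples)
print(mean, spread)
```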


Emacs and Atom also support interacting with running Jupyter kernels when editing scripts ([1], [2]). In a sense, notebook-less Jupyter kernels already are the "LSP for REPLs".

[1] https://github.com/dzop/emacs-jupyter

[2] https://atom.io/packages/hydrogen


PyCharm also has this integration. You can run Python files that use "#%%" cell delimiters, as well as Jupyter notebooks. I had issues with scrolling because opening the notebooks results in a split view with editor and notebook rendering.


That's pretty cool! It would be even better if it didn't need the #%% markers and just evaluated top-level structures (like function definitions) with Jupyter.

Still, just merging the notebooks into the editor is a big step. :-)
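
A sketch of how marker-free cells could be derived, assuming each top-level statement counts as one "cell". `top_level_chunks` is a hypothetical helper built on the standard `ast` module (Python 3.8+); as far as I know this is not how any of the editors above work, and it ignores decorators and the comments between statements:

```python
import ast


def top_level_chunks(source):
    """Split module source into one chunk of text per top-level statement."""
    tree = ast.parse(source)
    return [ast.get_source_segment(source, node) for node in tree.body]


SCRIPT = '''\
import math

def circle_area(radius):
    return math.pi * radius ** 2

result = circle_area(2)
'''

# Evaluate chunk by chunk in one shared namespace, the way a kernel would.
namespace = {}
chunks = top_level_chunks(SCRIPT)
for chunk in chunks:
    exec(chunk, namespace)
```

An editor could then send only the chunk under the cursor, for example just the `def`, to the running kernel.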


> The code must be written in a specific linear way to cater to the notebook format.

This issue is super annoying. Notebooks _force_ you into that linear style -- which then packages up your code in a form that can't compose with anything ... It's not even a limitation of the notebook file format, as is often thought -- it's simply the case that notebooks work best when the majority of code in the notebook is in the same scope -- otherwise you can't interactively inspect intermediate variables ...

I was helping a friend of mine with some data science tasks and made a little library to restore some ability to compose a notebook written in that linear style into larger programs ...

https://github.com/breathe/NotebookScripter

It's relatively easy to take a notebook written normally, define a few parameters for it that can be configured externally, and turn that notebook into a callable function ...

I really think that the various notebook implementations (Jupyter, Swift playgrounds) _need_ to add some sort of parameterization model to the notebook concept. It's not that hard to come up with a transformation that allows one to (1) work interactively in a notebook which has its parameters inferred from defaults and (2) expose a callable interface to that notebook that allows the caller to supply new values for parameters.

A Notebook <-> function transformation is really all that is needed to create a real programming model for notebooks, which would let one compose notebooks into larger programs while still writing the notebooks themselves in a way that's useful for data exploration ...

Going both ways Notebook <-> function and function <-> Notebook would be such a nice general purpose development feature... I'd love to take any function in a swift project and be able to interactively develop the function body in a 'playground like environment'. All that's required would be to supply values for the function's arguments -- and then there is no reason (aside from the limitations of the existing tools) why you couldn't compile the function into a form that supported a 'playground-like' experience for working on the function body.
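
A minimal sketch of the notebook -> function direction, assuming the notebook is just an ordered list of cell sources sharing one scope. `notebook_to_function`, `CELLS`, and the parameter names are invented for illustration and are not NotebookScripter's API:

```python
def notebook_to_function(cells, defaults):
    """Wrap a linear list of cell sources as a callable with keyword parameters."""

    def run(**overrides):
        unknown = set(overrides) - set(defaults)
        if unknown:
            raise TypeError(f"unknown parameters: {sorted(unknown)}")
        # Defaults play the role of the values used when working interactively.
        namespace = {**defaults, **overrides}
        for source in cells:
            # Every cell runs in the same scope, as in a live notebook.
            exec(source, namespace)
        namespace.pop("__builtins__", None)
        return namespace

    return run


# Two "cells" written in the usual linear, shared-scope style.
CELLS = [
    "values = [x * scale for x in range(count)]",
    "total = sum(values)",
]
run_notebook = notebook_to_function(CELLS, {"scale": 2, "count": 4})
default_run = run_notebook()
custom_run = run_notebook(scale=10)
```

The caller gets the whole final scope back, so intermediate variables stay inspectable even when the notebook is used as a function.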


Jupyter notebook is not meant for that. It's meant for code and data exploration. If you dev something using the notebook, you are doing it wrong. That's what IDEs are for.


You mean something like Julia from Atom?


This drops graphs straight back into emacs:

https://github.com/dzop/emacs-jupyter


As static images, though? Not interactive?


Notebooks are not for interactively developing large codebases; instead, see them as extremely powerful scientific calculators and you'll understand their appeal.


I'm not saying that I don't understand the appeal; it's just that interactive development is usually done in Jupyter notebooks because various editors can't do code evaluation.

I'm dreaming about something like Emacs Slime[0] for other languages (that aren't Common Lisp) where you can write code, evaluate it in the editor, switch to the REPL and test it, recompile while debugging.... etc. [1]

And the most similar tool for other languages... is basically Jupyter notebook. There is no such tool for interactive development apart from the bare-bones REPL in the shell.

[0] https://common-lisp.net/project/slime/

[1] https://www.youtube.com/watch?v=_B_4vhsmRRI, https://www.youtube.com/watch?v=sBcPNr1CKKw


CIDER is a SLIME-like environment for Clojure and ClojureScript, with a client for Emacs, among many other editors. If there's not an equivalent for Python, I hope someone is working on it :-)

https://github.com/clojure-emacs/cider


CIDER has been the best programming environment I have experienced (probably haven't tried enough of SLIME though). It combines coding with dynamic experimentation in such a fun way.

For literate programming, the Clojure/Clojurescript ecosystem keeps bringing up fascinating environments (in addition to CIDER+Org-mode and Clojupyter).

https://github.com/metasoarous/oz

https://github.com/jsa-aerial/saite

https://github.com/pink-gorilla/gorilla-notebook

are all actively developed and worth following, and they are all rather innovative in the ideas they bring to the table.

For polyglot reproducible literate programming, there is also https://nextjournal.com (implemented in Clojure/Clojurescript).


The pain point with CIDER for me has been the complexity of setting it up and keeping it current (so many components: Clojure, Leiningen, CIDER, nREPL, etc.), and the fact that it can't drop to a live REPL at the point of exception.


Worth mentioning: there is an emerging scientific computing framework, Owl [1][2], and the interesting concept of using OCaml-based languages for more formal protocol or logic description with Imandra [3].

[1] https://ocaml.xyz

[2] https://github.com/owlbarn

[3] https://imandra.ai


Really nice. Whenever I read a Jane Street OCaml blog article, I wish I had the same thing for Haskell. Oh well, life is not perfect.

I have used the Haskell TensorFlow bindings, but the OCaml examples look more straightforward.


There is also a notebook project from Netflix called Polynote that supports Scala and Python interop. I think they opted for not making a Jupyter kernel because of the need for fast code completion.

https://polynote.org/


Why use Python though? Once you are rolling with an ML language wouldn't it make sense to use the C++ interfaces of the machine learning/data science projects that make Python interesting at all?


Many Python libraries' C implementations don't have ergonomic C bindings; they were explicitly made by Python developers to be called from Python, so porting them to other languages is usually pointless. They are user-friendly thanks to the large amount of Python wrapper code; throw that away and you don't have much of value left.


One of the worst examples is TensorFlow, which is really, really hard to use from anything other than Python


That's pretty cool. Reminds me of mixing blocks from different languages in org mode.


Neat! I love FFI and I love functional programming.


Very cool, but expected a pic of snake riding a camel


..on Jupiter



