We are a medical device company developing solutions for the diagnosis and treatment of lung cancer, with core technology in computer vision and augmented reality (AR). Most of our value proposition comes from software and algorithm development. To meet the algorithmic challenges in our work, we do a lot of research and iterate quickly to convert what we learn into working features.
In addition to being able to implement an idea fast, we look for methodologies and tools that minimize the time between having a working prototype and having a production-ready feature in the product.
As part of our philosophy, all team members have a solid background in software engineering, so they can write production-quality code from the beginning.
Choosing the Programming Language
Selecting the right programming language is important because porting product and infrastructure code after months of development wastes time. The programming language should fit the company and its mission while supporting the development culture and methodology of the R&D team.
The ideal programming language should:
- Be a general-purpose language:
  - Support solid modern programming concepts
  - Allow simplicity in general tasks like multi-threading, networking, databases, etc.
- Allow fast prototyping
- Have a large, active community
- Have good tooling (IDE, build tools, CI)
- Enable algorithm development in the field of Machine Learning/Computer Vision/Image processing and numerical computations in general
- Support building graphical user interfaces
- Support both 2D and 3D visualization
- Be interoperable with other languages
Available options:
- C++
- Python
- C#
- Go
- MATLAB
As the pace of development is a key parameter, we have ranked the languages by how high-level they are, how difficult they are to use and how likely bugs are, starting with the most favorable:
- Python
- Go
- C#
- C++
Python and Go are comparable in terms of development speed, especially for web development. However, as we are focusing on data science programming, Python remains in first place.
Advantages of Python:
- Python is simple, productive and readable, so the code is easier to maintain
- Python is a popular, mature and modern OOP language with a huge active community
- Python and its packages cover all software fields, allowing easy integration of numerous libraries in one code base. Many tasks related to web development, networking, cryptography, async programming, scientific programming and machine learning are already implemented in the standard library or in widely used packages, so it’s easy to become productive quickly
- As Python is the most common language in the research community, academics often publish Python source code alongside their research papers. This provides easy access to cutting edge findings in the Computer Vision and Deep Learning research communities
Limitations of Python:
- Performance issues due to:
  - Being an interpreted language
  - The Global Interpreter Lock (GIL), which prevents threads from executing Python bytecode in parallel
- Bugs due to the dynamic nature of the language (duck typing)
- Weak packaging system
  - pip and conda help with package management, but both have issues involving dependencies
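The duck typing risk listed above can be illustrated with a short sketch. The function name `mean_intensity` and the sample values are hypothetical, chosen only for illustration: a call with the wrong argument type is accepted without complaint and fails only when the offending line actually runs.

```python
def mean_intensity(pixels):
    """Return the average of a sequence of numeric pixel values."""
    return sum(pixels) / len(pixels)


# Correct use: a list of numbers.
ok = mean_intensity([10, 20, 30])

# Incorrect use: values parsed from a text file are still strings.
# Nothing flags the mistake until this line executes.
try:
    mean_intensity(["10", "20", "30"])
    error = None
except TypeError as exc:
    error = exc

print(ok)
print(type(error).__name__)
```

In a larger code base such a call may sit on a rarely exercised path, which is why this class of bug tends to surface late.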
Options that address these limitations:
- For CPU-intensive tasks, multiprocessing can be used to avoid the limitation of the GIL. There are external native libraries available for most computational tasks (like NumPy), so heavy number crunching does not have to run in pure Python. If there is no native library for the task, then it’s possible to implement critical parts in C++.
- For IO-bound tasks, Python’s multithreading is adequate because the GIL is released while a thread waits on blocking IO.
- To address the duck typing issue, mature static analysis tools are available, and since Python 3.5 there are built-in type hints that modern IDEs use to find errors
- Isolating components with containers (like Docker) or virtual environments reduces the chance of package dependency conflicts
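The first three options can be sketched together using only the standard library. This is a minimal illustration under stated assumptions, not our production code: `count_primes`, `fake_download`, `run_cpu_bound` and `run_io_bound` are hypothetical names, the prime counter stands in for any CPU-bound algorithm, and `time.sleep` stands in for a blocking network or disk read.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import List


def count_primes(limit: int) -> int:
    """CPU-bound stand-in: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def fake_download(delay_s: float) -> float:
    """IO-bound stand-in: sleeping releases the GIL like a blocking read."""
    time.sleep(delay_s)
    return delay_s


def run_cpu_bound(limits: List[int]) -> List[int]:
    # Each worker process has its own interpreter and its own GIL,
    # so the calls can run in parallel on several cores.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))


def run_io_bound(delays: List[float]) -> List[float]:
    # Threads are enough here: the GIL is released while each one waits.
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(fake_download, delays))


if __name__ == "__main__":
    # The guard is needed on platforms that start worker processes by
    # re-importing this module.
    print(run_cpu_bound([10_000, 20_000]))
    print(run_io_bound([0.1, 0.1, 0.1]))
```

The annotations do not change runtime behavior; a static checker or an IDE that understands type hints can use them to flag a call such as `count_primes("100")` before the code runs.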
Summary
Today Python is used as the default language by many medical imaging companies while C++ is used as a fallback when performance improvement is needed.
The following list of libraries is a good start:
- OpenCV - for image processing and some geometry related algorithms
- ITK - for image segmentation
- VTK, Qt 3D - for 3D rendering
- Qt with PyQt (Qt Quick, Qt Widgets) - for application UI development
- NumPy - for all matrix and vector operations
- SciPy - general purpose algorithms and optimization
- Pandas - structured data manipulation and reporting
- pydicom - for loading DICOM images
- Numba - JIT compiler that translates a subset of Python and NumPy code into fast machine code
- Nose - for automatic testing
- TensorFlow / Theano - backend engine for deep learning
- Keras - research and production of deep learning algorithms
This software stack may be applicable to many companies beyond the field of medical imaging.