Show HN: I am building an open-source Confluence and Notion alternative
Docmost is an open-source collaborative wiki software in beta, offering real-time collaboration, permissions management, and more. Users can access the website for details, documentation, and development resources.
Docmost is an open-source collaborative wiki and documentation software currently in beta. It offers features like real-time collaboration, spaces, permissions management, groups, comments, page history, search, and file attachment. Users can access the Docmost website for more information and the documentation to get started. Feedback is encouraged as the software progresses towards a stable release. For those interested in contributing, there is development documentation available on self-hosting and development. Screenshots of the home and editor interfaces are also provided on the Docmost website for a visual representation of the software.
Related
Show HN: Online OPML editor to manage subscription lists
The OPML Editor on GitHub manages RSS and Atom feeds. Users can add, merge OPML files, and remove duplicates using Svelte and CodeMirror technologies. Licensed under AGPL-3.0. Access at opml.imadij.com.
Documentation Driven Development (2022)
Documentation Driven Development (DDD) is proposed as a more effective approach than Test-Driven Development (TDD) for software development. DDD involves starting with documentation to iron out implementation details before coding, helping to address API shifts and scope misunderstandings early on. By documenting requirements and potential future API changes, developers can better plan and refine their code, avoiding costly refactors later. DDD emphasizes the importance of various forms of documentation, including design mockups, API references, and tests, to communicate thoughts and guide development. This method encourages a feedback loop on APIs and work scope, enhancing code quality and project outcomes. DDD aligns with concepts like Behavioral Driven Development (BDD) and Acceptance Test-Driven Development (ATDD), emphasizing user behavior validation and strong communication practices. By incorporating DDD into their workflow, developers can improve collaboration, goal refinement, and code quality, leading to more successful project outcomes.
OpenEMR: Open-source medical record software
OpenEMR is a feature-rich open-source electronic health records and medical practice management solution, offering ONC Certification, advanced features, multilingual support, and community-driven development. It prioritizes data ownership, security, and accessibility.
The Eternal Truth of Markdown
Markdown, a simplified code alternative to HTML, enables diverse document formats from plain text. Despite lacking standardization, it thrives for its adaptability and simplicity, appealing to writers and programmers alike.
Overleaf: An open-source online real-time collaborative LaTeX editor
Overleaf is an open-source online LaTeX editor on GitHub. It offers project details, installation instructions, Docker setup, and contribution guidelines. For more, refer to the GitHub repository.
One thing to remember: devs and tech-savvy people skip everything and look directly at the terminal commands/code. It’s the reason you should never insert the “don’ts” in your repository readme too high on the page: they will be the first things we’ll cut and paste :D
This is not a criticism; it seems you did a wonderful job. Just the feedback of one of many dummy experimenters that you might lose on that page :)
- An export function (PDFs).
- An integrated diagram editor like Gliffy.
- History / diffs.
Outline is the closest to this so far, but we are in no rush, so we'll watch the development of this as well. Thanks for sharing!
I bring this up because a feature that could set you apart from others is the concept of a “merge request” for documentation. Where someone can make a document, another can modify it and submit changes for review.
GitBook has this but it lacks in some other key ways for us.
(But I'm not as big a fan of certain wiki software products that seem guided by enterprise sales to customers who don't seem to understand wikis. :) )
One thing an enterprise product did do passably well, for a big win, was integration of a drawing tool. Not everyone in a company needs that integration, but some users will, and its presence can mean that a super-helpful visual is captured when it otherwise wouldn't.
1. Everything is locked in. I want to be able to easily export or back up my notes.
2. The pricing is so nickel and dimey. Have more than 100 nodes in the document tree? Upgrade your tier. Adding new people to projects is a buying decision every time and it’s fatiguing.
Can you tell us more about how it uses pg and redis?
I work on XWiki [1]. Nice to see fellows building open source alternatives, we can't have enough of this. I hope you succeed.
It takes a lot of work to build something comparable to Confluence. XWiki has been there since the beginning. How do you position yourself compared to XWiki? What made you decide not to join forces?
- Managing pages in git/other vcs as plain text, using any editor I choose. I can commit pages using git or other vcs, don't have to use the browser to add pages.
- Writing pages in some markup language, maybe not markdown, as it is not expressive enough in some areas. Maybe markdown is possible for simple pages and the wiki knows it is markdown from the file extension, but the wiki also allows more powerful formats like restructuredText, which can be extended by the user.
- Server-side rendering of pages, that can easily be cached (since pages are files, one could easily check the shasum of the file to determine cache validity), which makes display of pages almost instant, as opposed to laggy shitty Confluence.
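The checksum-based caching idea in the last bullet can be sketched roughly as follows. This is a minimal illustration, not how any existing wiki does it: `RenderCache` and the toy `render` callback are hypothetical names, and the cache lives in memory only.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, Tuple


class RenderCache:
    """Cache rendered pages, keyed by the SHA-256 of the source file (hypothetical helper)."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render
        self._entries: Dict[Path, Tuple[str, str]] = {}  # path -> (digest, html)
        self.renders = 0  # how many times the renderer actually ran

    def get(self, path: Path) -> str:
        data = path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        cached = self._entries.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]  # file content unchanged: reuse the rendered HTML
        html = self._render(data.decode("utf-8"))
        self.renders += 1
        self._entries[path] = (digest, html)
        return html


# Small demonstration with a throwaway file and a stand-in renderer.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "home.md"
    page.write_text("hello", encoding="utf-8")
    cache = RenderCache(lambda text: f"<p>{text}</p>")
    first = cache.get(page)
    second = cache.get(page)  # unchanged file, served from the cache
    page.write_text("hello again", encoding="utf-8")
    third = cache.get(page)  # content changed, so the page is rendered again
    render_count = cache.renders
```

Hashing still reads the file on every request, so the saving is the rendering step, not the disk read.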
Very long coffee breaks, maybe down the street, for documentation from across town to load. We didn’t attempt to update that documentation, so anything better, is better. I’m barely exaggerating
I also noticed that the documentation is using Docusaurus - it would be awesome to use Docmost for it, so that you have both a demo environment (at least R/O) and do dogfooding
You do not want to run this in Postgres, or any RDBMS for that matter. I promise you. Here [0] is `y-sweet` [1] discussing (at a shallow level) why persisting the actual content in an RDBMS isn't great. At $COMPANY, we ran Postgres on massive EC2s with native NVMe drives for storage, and they still struggled with this stuff (albeit with the rest of the app also using them). Use an object store, use an LSM-tree solution like MyRocks [2], just don't use an RDBMS, and especially not Postgres. It is uniquely bad at this. I'll explain.
Let's say I'm storing RFC2324 [3]. In TXT format, this is just shy of 20 KB. Even if it's 1/5th that size, it doesn't matter for the purposes of this discussion. As you may or may not know, Postgres uses something called TOAST [4] for storing large amounts of data (by default, any time a tuple hits 2 KB). This is great, except there's an overhead to de-TOAST things. This overhead can add up on retrievals.
Then there's WAL amplification. Postgres doesn't really do an `UPDATE`, it does a `DELETE` + `INSERT`. Even worse, it has to write entire pages (8 KB) [5], not just the changed content (there are circumstances in which this isn't true, but assume it is in general). Here's a view of `pg_stat_wal`, after I've been playing with it:
docmost=# SELECT wal_fpi, wal_bytes FROM pg_stat_wal;
wal_fpi | wal_bytes
---------+-----------
1641 | 11537465
(1 row)
Now I'll change a single byte in the aforementioned RFC, and run that again:
docmost=# SELECT wal_fpi, wal_bytes FROM pg_stat_wal;
wal_fpi | wal_bytes
---------+-----------
1654 | 11656052
(1 row)
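For reference, the difference between the two `pg_stat_wal` snapshots above can be worked out as below. The numbers are copied from the output; the `WalSnapshot` type is a hypothetical helper, and the 8192-byte block size is an assumption (the Postgres default), so the full-page-image figure is only an upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalSnapshot:
    """One row of pg_stat_wal, restricted to the two columns shown above."""
    wal_fpi: int    # full-page images written
    wal_bytes: int  # total WAL bytes generated


def wal_delta(before: WalSnapshot, after: WalSnapshot) -> WalSnapshot:
    """WAL activity between two snapshots."""
    return WalSnapshot(after.wal_fpi - before.wal_fpi,
                       after.wal_bytes - before.wal_bytes)


BLOCK_SIZE = 8192  # assumed default block size in bytes

before = WalSnapshot(wal_fpi=1641, wal_bytes=11537465)
after = WalSnapshot(wal_fpi=1654, wal_bytes=11656052)

delta = wal_delta(before, after)
# Upper bound on the bytes attributable to full-page images
# (an image can be smaller than a full block).
fpi_upper_bound = delta.wal_fpi * BLOCK_SIZE

print(delta.wal_fpi, delta.wal_bytes, fpi_upper_bound)
```

So 13 full-page images were written, which could explain most of the extra WAL.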
That is nearly 120 KB of WAL written to change one byte. This is of course dependent upon the size of the document being edited, but it's always going to be bad.
Now let's look at the search query [6], which I've reproduced (mostly; I left out creator_id and the ORDER BY) here:
docmost=# EXPLAIN(ANALYZE, BUFFERS, COSTS) SELECT id, title, icon, parent_page_id, slug_id, creator_id, created_at, updated_at, ts_headline('english', text_content, to_tsquery('english', 'method'), 'MinWords=9, MaxWords=10, MaxFragments=10') FROM pages WHERE space_id = '01906698-1b7c-712b-8d4f-935930b03318' AND tsv @@ to_tsquery('english', 'method');
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Seq Scan on pages (cost=0.00..12.95 rows=1 width=192) (actual time=13.473..48.684 rows=3 loops=1)
Filter: ((tsv @@ '''method'''::tsquery) AND (space_id = '01906698-1b7c-712b-8d4f-935930b03318'::uuid))
Rows Removed by Filter: 3
Buffers: shared hit=32
Planning:
Buffers: shared hit=1
Planning Time: 0.261 ms
Execution Time: 48.717 ms
~50 msec to do a relatively simple SELECT with no JOINs isn't great, and it's from the use of `ts_headline`. Unfortunately, it has to parse the original document, not just the tsvector summary to produce results. If I remove that function from the query, it plummets to sub-msec times, as I would expect.
It doesn't get better if I forcibly disable sequential scans to get it to favor the GIN index on `tsv` (unsurprising, given the small dataset):
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.pages (cost=106.29..110.56 rows=1 width=192) (actual time=17.983..51.424 rows=3 loops=1)
Recheck Cond: (pages.tsv @@ '''method'''::tsquery)
Filter: (pages.space_id = '01906698-1b7c-712b-8d4f-935930b03318'::uuid)
Heap Blocks: exact=1
Buffers: shared hit=41
-> Bitmap Index Scan on pages_tsv_idx (cost=0.00..106.29 rows=1 width=0) (actual time=1.231..1.231 rows=7 loops=1)
Index Cond: (pages.tsv @@ '''method'''::tsquery)
Buffers: shared hit=25
Planning:
Buffers: shared hit=1
Planning Time: 0.343 ms
Execution Time: 51.647 ms
And speaking of GIN indices, while they're great for this, they also need regular maintenance, else you risk massive slowdowns [7]. This was after having inserted a few large-ish documents similar to the RFC, and creating a few short pages organically.
docmost=# SELECT * FROM pgstatginindex('pages_tsv_idx');
version | pending_pages | pending_tuples
---------+---------------+----------------
2 | 23 | 26
Let's force an early cleanup:
docmost=# EXPLAIN (ANALYZE, BUFFERS, COSTS) SELECT gin_clean_pending_list('pages_tsv_idx'::regclass);
QUERY PLAN
--------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=8) (actual time=16.574..16.577 rows=1 loops=1)
Buffers: shared hit=4659 dirtied=47 written=22
Planning Time: 0.322 ms
Execution Time: 16.776 ms
17 msec doesn't sound like a lot, but bear in mind this was only hitting 4659 pages, or 37 MB. It can get worse.
You should also take a look at the DB config if you're to keep using it, starting with `shared_buffers`, since it's currently at the default value of 128 MB. That is not going to work well for anyone trying to use this for real work.
You should also optimize your column ordering. EDB has a great writeup [8] on why this matters.
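To illustrate why column order matters, here is a rough sketch of alignment padding. The column list is hypothetical (it is not Docmost's schema), the sizes and alignments are the usual ones for `boolean`, `integer`, and `timestamptz`, and the calculation ignores the tuple header, NULL bitmap, and variable-length columns.

```python
from typing import List, Tuple

# (name, size in bytes, required alignment in bytes)
Column = Tuple[str, int, int]


def data_size(columns: List[Column]) -> int:
    """Bytes used by fixed-width columns laid out in order, including padding."""
    offset = 0
    for _name, size, align in columns:
        offset += (-offset) % align  # pad up to the next multiple of `align`
        offset += size
    return offset


# Hypothetical table, in the order the columns were declared.
as_declared: List[Column] = [
    ("is_public", 1, 1),   # boolean
    ("created_at", 8, 8),  # timestamptz
    ("is_locked", 1, 1),   # boolean
    ("updated_at", 8, 8),  # timestamptz
    ("position", 4, 4),    # integer
]

# Same columns, widest alignment first.
reordered = sorted(as_declared, key=lambda col: col[2], reverse=True)

declared_bytes = data_size(as_declared)
reordered_bytes = data_size(reordered)
print(declared_bytes, reordered_bytes)
```

In this made-up case the reordered layout needs 22 bytes of column data per row instead of 36, because the booleans no longer force padding in front of the 8-byte columns.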
Finally, I would like to commend you for using UUIDv7. While ideally I'd love (as someone who works with DBs) to see integers or natural keys, at least these are k-sortable. Oh, and foreign keys – thank you! They're so often eschewed in favor of "we'll handle it in the app", but they can absolutely save your data from getting borked.
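The k-sortable property comes from the UUIDv7 layout, which puts a 48-bit millisecond timestamp in the most significant bits. A minimal sketch of that layout, following RFC 9562, is below; `uuid7_like` is a hypothetical helper for illustration and not the generator Docmost uses.

```python
import os
import uuid


def uuid7_like(unix_ms: int) -> uuid.UUID:
    """Build a UUID with the UUIDv7 bit layout for a given millisecond timestamp."""
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # 48-bit timestamp
    value |= 0x7 << 76                             # version 7
    value |= rand_a << 64                          # 12 random bits
    value |= 0b10 << 62                            # RFC variant
    value |= rand_b                                # 62 random bits
    return uuid.UUID(int=value)


earlier = uuid7_like(1_700_000_000_000)
later = uuid7_like(1_700_000_000_001)

# IDs created in later milliseconds compare greater, both as integers and as text.
print(earlier < later, str(earlier) < str(later))
```

Within the same millisecond the order depends on the random bits, so the IDs are only roughly time-ordered, which is still enough to keep B-tree inserts mostly sequential.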
[0]: https://digest.browsertech.com/archive/browsertech-digest-fi...
[1]: https://github.com/jamsocket/y-sweet
[2]: http://myrocks.io
[3]: https://www.rfc-editor.org/rfc/rfc2324.txt
[4]: https://www.postgresql.org/docs/current/storage-toast.html
[5]: https://wiki.postgresql.org/wiki/Full_page_writes
[6]: https://github.com/docmost/docmost/blob/main/apps/server/src...
[7]: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/4...
Note: Outline is another Open-Source Documentation/Wiki and Collaboration tooling option I like.
Knowledge management is a special area. Look forward to seeing this grow.
As a heavy user of both Confluence and Notion, and in the interest of seeing alternatives like this grow:
Is there any plan to make this tool local/offline-first and mobile-first? There's a big need for this feature, and it's something that's best baked into the bread early. It's a big gap in Notion and ultimately why I had to ditch it.
Confluence has some ways to at least cache enough of it, or use a plugin. Confluence is also massive, with lots of features (including workflows and approvals). It might be worth clarifying which ones you're covering and planning to cover.
* markdown support (for writing/formatting)
* mermaid support (for diagrams)
Did you consider Nix/Guix instead of Docker as the suggested way to deploy? Docker is a harmful and very common tool, which leads to a gazillion wasted resources and a security nightmare due to pulling anything from unknown sources and putting it in production.
As an aside, similarly, Markdown is popular but it's really a crappy set of markup that fails in the most useful productivity aspect: outlining. Org-mode is less known, being tied to Emacs, but it's far, far more featureful and immediate to use.
Besides that, I wish the best of luck to all the devs. There are a gazillion webapps all suffering from the modern stack issue: the inability to integrate anything, hence the need to reinvent the wheel every time and incorporate one feature at a time to the point of becoming monsters. But your sauce so far seems to be the most polished I've seen.
Do you have plans to offer a hosted/managed/SaaS service? As others have pointed out, not everyone wants to self-host, and offering a managed service doesn't diminish the advantages of it being Free and Open Source (assuming good data export/import features).
For comparison, the SourceHut project offers a managed service, which is well-run, well-liked, and brings them good revenue.
I consider NextCloud to be an example of what not to do. There are plenty of NextCloud providers, but (from what I can tell) none of them are closely tied to the development of NextCloud itself. Bug reports to service providers can be expected to be met with "that's a NextCloud bug, not our problem."
Unfortunately I’d never advocate for something like this at my work. Self-hosting doesn’t make sense in terms of total cost of ownership. I’d rather engineers spent time solving problems in our core business than making sure our wiki is online.
I use PlantUML extensively and tools like Znai and others have native support for it.
This is something I’m going to keep a close eye on. My company is using Confluence and I hate how slow Confluence is.
On your marketing site, the menu doesn’t close when clicking on an item in mobile Firefox on iOS.
On docmost.com, pinch to zoom is disabled when viewing screenshots (Firefox Android).
What was the thought process for AGPL instead of something else?
I will admit that it would be very cool to see a client that abstracted git and markdown away for non-technical users.
I feel like obsidian with some more git polish could get it done.
I like the focus on UI (many open source projects are missing this aspect).
I will definitely check it out.